Introduction

Discerning the interplay between positive Darwinian (diversifying) selection and negative (purifying) selection is fundamental to our understanding of adaptive evolution. Selection can either be pervasive, with selection affecting all lineages, or episodic, with selection affecting only a subset of lineages. Ecological systems experiencing incessant reciprocal adaptation, such as the conflicts between host-pathogen and predator–prey, represent practical models for studying the pervasive and episodic nature of selection (Jansa and Voss 2011; Daugherty and Malik 2012; Khan et al. 2019). Venoms, in particular, are protein-dominated secretions with clear genotype-phenotype maps, which are generally believed to experience high rates of gene duplication and subsequent neofunctionalization (Casewell et al. 2013). These high duplication rates are usually coupled with strong diversifying selection resulting from perpetual coevolution between predator and prey (Casewell et al. 2013). However, Sunagar and Moran (2015) recently challenged this theory, proposing that venoms mostly tend towards a “two-speed” evolutionary model where the substantial influence of diversifying selection seems to coincide with earlier periods of ecological specialization in a species’ evolutionary history. First, as most evidence detailing the rapid diversification of venom components was the result of research on snakes (Župunski et al. 2003; Lynch 2007; Gibbs and Rossiter 2008; Sunagar et al. 2014) and cone snails (Dutertre et al. 2014), Sunagar and Moran (2015) argued that stronger diversifying selection should be prevalent in younger lineages currently undergoing shifts in ecological niches (e.g., diet or range expansion). Conversely, research on venoms from the older venomous lineages (e.g., centipedes, scorpions, and spiders) has revealed that these venoms are governed predominantly by purifying selection (Sunagar et al. 2013; Sunagar and Moran 2015). 
This led Sunagar and Moran (2015) to suggest that, after an initial ecological expansion period, evolutionarily optimized toxins from the older lineages should be tending towards longer periods of purifying selection to maintain their fine-tuned functions. Although Sunagar and Moran (2015) detailed the lack of pervasive diversifying selection in older lineages, they emphasized that the strong influence of episodic diversifying selection in these same lineages indicates the older lineages may re-experience rapid diversification of venom components with new ecological shifts (i.e., diet or range expansion).

Scorpions (order Scorpiones), for example, represent one of the oldest venomous lineages, having originated approximately 430 million years ago (Soleglad and Fet 2003; Waddington et al. 2015). At the time Sunagar and Moran (2015) had proposed their two-speed venom evolution model, scorpion venom research was primarily concentrated on the medically significant Buthidae family of scorpions. Therefore, most empirical studies of scorpion venom selective pressures prior to Sunagar and Moran (2015) had unsurprisingly been focused on toxins typically found in Buthidae venoms (Zhu et al. 2004; Weinberger et al. 2010), although homologous toxins from two non-Buthidae scorpions were used in one study (Sunagar et al. 2013). In contrast to the enzyme-dominated snake venoms (Fatima and Fatah 2014; Oliveira et al. 2022), scorpions in the family Buthidae contain venoms rich in short peptide toxins, including the predominantly neurotoxic CS\(\alpha \)/\(\beta \) (cysteine-stabilized \(\alpha \)/\(\beta \)) toxins, and the functionally diverse non-CS\(\alpha \)/\(\beta \) toxins (Cid-Uribe et al. 2020). The scorpion CS\(\alpha \)/\(\beta \) toxins (e.g., K+ and Na+-channel modulating toxins) are thought to be under the influence of diversifying selection (Zhu et al. 2004; Weinberger et al. 2010; Gao et al. 2011) with much of this selection being episodic, while the non-CS\(\alpha \)/\(\beta \) toxins (e.g., bradykinin potentiating and non-disulfide bridge peptides) showed little diversifying selection, but strong purifying selection (Sunagar et al. 2013).

With advancements in high-throughput sequencing technologies, more opportunities for characterizing the venoms of non-Buthidae scorpions, and thus testing for selection across other scorpion venoms, have emerged. Characterizations of non-Buthidae scorpion venoms have revealed venoms with a diversity of ion-channel toxins, metalloproteases, non-disulfide bridge peptides (NDBPs), phospholipase A2 toxins (PLA2s), serine proteases (SPs), and other peptidases (Luna-Ramírez et al. 2015; Quintero-Hernández et al. 2015; Santibáñez-López et al. 2017; Rokyta and Ward 2017; Cid-Uribe et al. 2018; Nystrom et al. 2023). One group of non-Buthidae scorpions that have received significant attention are the Giant Desert Hairy Scorpions (genus Hadrurus) from the family Hadruridae (formerly Caraboctonidae; Santibáñez-López et al. 2020). Hadrurus are a non-medically significant genus of large (up to 15 cm), fossorial scorpions that range from the southwestern United States to Mexico and include at least seven currently recognized species (Soleglad and Fet 2010; Soleglad et al. 2011; Santibáñez-López et al. 2020). High-throughput proteomic and/or transcriptomic venom characterizations of Hadrurus spadix (Rokyta and Ward 2017) and Hadrurus concolorous (Santibáñez-López et al. 2019) have demonstrated Hadrurus venoms as rich sources of alpha K+-channel toxins (\(\alpha \)KTxs), La1-like peptides (La1s), NDBPs, PLA2s, peptidases, scorpine-like antimicrobial peptides (abbreviated either AMPs or SLPs), SPs, and uncharacterized venom proteins. The venom repertoire of Hadrurus scorpions is likely the result of the diverse selective pressures experienced by these species, including prey capture, defense, sexual conflict via sexual stinging (Tallarovic et al. 2000), and potentially venom-gland microbiome regulation (Gao et al. 2007). 
Sexual stinging, more specifically, describes a courtship ritual where male Hadrurus scorpions sting their female counterparts prior to copulation (Tallarovic et al. 2000). Although this ritual may serve as the basis for a sex-based variation in venom, a phenomenon detected in scorpion venoms before (De Sousa et al. 2010; Miller et al. 2016; Ward et al. 2018a; Olguín-Pérez et al. 2021), no tests for intersexual venom variation in Hadrurus have been performed.

In addition, only two studies have tested for evidence of diversifying selection across Hadrurus scorpions (Rokyta and Ward 2017; Santibáñez-López et al. 2019). Using pairwise tests of nonsynonymous and synonymous (dN/dS) substitution rates of orthologous toxins between H. spadix and the close relative Hoffmannihadrurus gertschi, Rokyta and Ward (2017) identified higher rates of synonymous substitutions across both CS\(\alpha \)/\(\beta \) and non-CS\(\alpha \)/\(\beta \) toxins, providing support for the two-speed model proposed by Sunagar and Moran (2015). Santibáñez-López et al. (2019) were the first to test for episodic and pervasive diversifying and/or purifying selection in Hadrurus toxins, specifically codon site episodic and pervasive selection affecting CS\(\alpha \)/\(\beta \) scorpine-like 1 (SLP1) and scorpine-like 2 (SLP2) peptides. Santibáñez-López et al. (2019) detected a disproportionately large number of sites in SLPs across two Hadrurus (H. spadix and H. concolorous) and two Hoffmannihadrurus (H. aztecus and H. gertschi) species under strong pervasive purifying selection. They also observed comparable signals of strong purifying selection and weak diversifying selection in SLP1 and SLP2 paralogs from diverse scorpion families. Although Sunagar et al. (2013) provided evidence that not all CS\(\alpha \)/\(\beta \) toxins were under strong influences of episodic diversifying selection, the work by Santibáñez-López et al. (2019) suggests that more research is needed to fully understand the selection footprints underlying the evolution of scorpion CS\(\alpha \)/\(\beta \) toxins. Additionally, some scorpion toxins, particularly the non-CS\(\alpha \)/\(\beta \) NDBPs, exhibit extreme post-translational modifications resulting in mature peptide lengths of only a fraction of the precursor sequence (Zeng et al. 2005; Delgado-Prudencio et al. 2019). As sites outside the mature peptide (e.g., signal peptide and propeptide) should experience more profound purifying selection, Rokyta and Ward (2017) suggested that the lower proportion of sites under diversifying selection in older lineages may partly be a function of these extreme post-translational modifications.

Furthermore, the two studies on scorpion venoms that tested whether diversifying selection at codon sites is episodic primarily tested for selection at diverse phylogenetic scales (i.e., family level; Sunagar et al. 2013; Santibáñez-López et al. 2019). Although studies across more diverse phylogenetic scales are important for uncovering broader macroevolutionary trends in venom evolution, they are limited in their ability to assess evolutionary trends at lower taxonomic levels, arguably the primary drivers of venom variation observed at the more diverse phylogenetic scales. To investigate the selection footprints underlying venom evolution within a genus of scorpions, we first performed a joint high-throughput venom proteomic and venom-gland transcriptomic analysis of male and female Hadrurus arizonensis from three populations across their range. In parallel, we also reanalyzed raw data from the previously published venom proteome and venom-gland transcriptome of the black back scorpion, H. spadix (Rokyta and Ward 2017). We not only tested for a sex-based variation in Hadrurus venoms, but also evaluated episodic and pervasive signatures of gene-wide, branch-site, and codon site diversifying and/or purifying selection across five major scorpion toxin families (i.e., \(\alpha \)KTxs, NDBPs, SLP2s, PLA2s, and SPs), two of which have not previously been considered (i.e., PLA2s and SPs).

Results

Hadrurus Cytochrome c Oxidase I Phylogenetics

To confirm the species identities of our six Hadrurus arizonensis specimens, we generated a maximum likelihood phylogeny using 1,029 bp cytochrome c oxidase subunit I (COI) fragments from 256 published sequences of H. arizonensis (Graham et al. 2013) and the eight Hadrurus used in this study (Fig. 1). In their published phylogeny, Graham et al. (2013) defined a southern clade and a northern clade of H. arizonensis, the latter of which contained 251 of the 256 specimens further divided into six groups (i.e., Groups I–VI). Our phylogeny revealed that the H. arizonensis from the La Paz County, Arizona (LPC) and San Diego County, California (SDC) populations were members of the northern clade Group I, while the H. arizonensis from the Pima County, Arizona (PC) population were members of the northern clade Group VI (Fig. 1). The Group I clade from Graham et al. (2013) displays the largest geographic range with H. arizonensis distributed across the center of their reported range, extending down towards the southeastern tip of California and southwestern Arizona and up through the Lower Colorado River Valley into southern Nevada and Utah. Hadrurus arizonensis-LPC individuals showed close phylogenetic similarity to Group I members from the towns of Surprise and Quartzite, Arizona, the latter of which is in La Paz County. The H. arizonensis-SDC population displayed close phylogenetic similarity to Group I specimens from several southwestern California localities near the Mexican border, including Salton City, just east of San Diego County. The Group VI clade includes H. arizonensis distributed across southern Arizona, primarily in the Pima County town of Marana, Arizona. Therefore, H. arizonensis-LPC individuals in our study showed close phylogenetic similarity with scorpions collected from the same county. This result, along with the phylogenetic similarity our H. arizonensis-LPC and -SDC populations exhibited towards the Graham et al. (2013) Group I clade, provides convincing evidence that we not only correctly classified H. arizonensis individuals, but also collected specimens representing multiple, diverse clades across the species range. However, we recognize that the clades described by Graham et al. (2013) did not all show strong statistical support. Therefore, we opted to categorize each of the H. arizonensis in our study as belonging to one of three populations (i.e., LPC, PC, and SDC) based on their collection localities.

Fig. 1

Maximum likelihood IQ-TREE of cytochrome c oxidase I subunit genes from the eight Hadrurus used in this study and the 256 Hadrurus arizonensis from Graham et al. (2013). Tree tips are colored by the respective H. arizonensis clades defined by Graham et al. (2013). Numbers in parentheses correspond to ultrafast/nonparametric bootstrap support values. Inset map details sampling localities for Hadrurus spadix from Rokyta and Ward (2017) and the three Hadrurus arizonensis populations used in this study from La Paz (LPC) and Pima Counties (PC) in Arizona and San Diego County (SDC), California. Hadrurus spadix and H. arizonensis ranges are based on reported distributions from Soleglad et al. (2011)

The Transcriptomic and Proteomic Foundations of Hadrurus Venoms

We sequenced 12,202,727–25,722,039 Illumina quality-filtered read pairs per individual, leaving 10,571,050–20,400,575 after read trimming and merging (Table 1). Using three transcriptome assemblers (i.e., DNAStar NGen, Extender, and Trinity) together with a mass-spectrometry-based approach, we generated consensus transcriptomes for each of the four Hadrurus populations and identified 1,549–2,002 protein-encoding sequences in each individual Hadrurus venom-gland transcriptome. Between the two H. spadix individuals, we identified a total of 1,550 unique protein-encoding sequences, 1,415 of which corresponded to nontoxins, 95 to proteomically confirmed toxins, and 40 to homology-only toxins. Of the 95 proteomically confirmed toxins identified from H. spadix, 61 were detected in the male, 90 were detected in the female, and 56 were detected in both sexes. In the H. arizonensis-LPC individuals, we identified 1,616 unique protein-encoding sequences, including 1,504 nontoxins, 88 proteomically confirmed toxins, and 24 homology-only toxins. Of the 88 proteomically confirmed toxins from H. arizonensis-LPC, 78 were found in the male and female each, and 68 were shared between both. In the H. arizonensis-PC individuals, we identified 2,004 unique protein-encoding sequences, including 1,868 nontoxins, 97 proteomically confirmed toxins, and 39 homology-only toxins. Of the 97 proteomically confirmed toxins from the H. arizonensis-PC population, 66 were found in the male, 83 were found in the female, and 52 were detected in both individuals. Finally, in the H. arizonensis-SDC individuals, we identified 1,917 unique protein-encoding sequences, including 1,754 nontoxins, 121 proteomically confirmed toxins, and 42 homology-only toxins. Of the 121 proteomically confirmed toxins from the H. arizonensis-SDC population, 97 were detected in the male and female each, and 73 were shared between the two.
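The per-population totals of proteomically confirmed toxins follow from inclusion-exclusion: the total equals the number detected in the male plus the number detected in the female minus the number shared by both. The following is a minimal Python sketch that checks the reported counts; the dictionary layout and the function name `union_size` are illustrative assumptions, not part of the authors' pipeline.

```python
# Proteomically confirmed toxin counts as reported in the text:
# (detected in male, detected in female, detected in both, reported total)
reported = {
    "H. spadix": (61, 90, 56, 95),
    "H. arizonensis-LPC": (78, 78, 68, 88),
    "H. arizonensis-PC": (66, 83, 52, 97),
    "H. arizonensis-SDC": (97, 97, 73, 121),
}


def union_size(male, female, shared):
    """Inclusion-exclusion: toxins found in at least one of the two sexes."""
    return male + female - shared


for population, (male, female, shared, total) in reported.items():
    combined = union_size(male, female, shared)
    status = "matches" if combined == total else "differs from"
    print(f"{population}: {combined} {status} the reported total of {total}")
```

The same bookkeeping applies to any pair of samples, as long as "shared" counts toxins detected in both.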

Table 1 Summary of Hadrurus specimen information, sequencing read-pair counts, and numbers of identified sequences, including toxins

Using the population-level consensus transcriptomes, we observed venom from each Hadrurus individual to contain an abundance (Fig. 2) and diversity (Table 2) of toxins frequently detected from scorpion venoms, including scorpine-like antimicrobial peptides (AMPs), ion-channel toxins (e.g., \(\alpha \)-potassium channel toxins), La1-like peptides, non-disulfide bridge peptides (NDBPs), peptidases (e.g., Peptidase M2s), phospholipase A2 toxins (PLA2s), and serine proteases (SPs). We also identified a diversity and abundance of uncharacterized venom proteins (VPs; Fig. 2, Table 2), or those for which we could not provide any clear functional classification on the basis of homology. Numbers of total toxin transcripts and proteomically confirmed toxin transcripts corresponding to each putative toxin family are reported for each individual in Table 2. The toxins identified from each individual, along with the respective amino acid sequence, precursor amino acid length, signal peptide length, top nr protein database BLAST matches, transcript abundance (TPM), and proteomic abundance (fmol) for H. spadix and the H. arizonensis-LPC, -PC, and -SDC populations, are reported in Supplemental Table 1.

Fig. 2

Hadrurus venom-gland transcriptomic (left) and venom proteomic (right) abundances of major toxin families from Hadrurus spadix (C0195 and C0196) and the three Hadrurus arizonensis populations from Pima County (PC), San Diego County (SDC), and La Paz County (LPC). Abbreviations: AMP—antimicrobial peptide, La1—La1-like peptide, NDBP—non-disulfide bridge peptide, PLA2—phospholipase A2 toxins, SP—serine protease, and VP—uncharacterized venom protein

Table 2 Number of different mRNA transcripts coding for putative venom proteins identified from Hadrurus populations with the number of proteomically confirmed transcripts shown in brackets

No Significant Sex-Based Variation in Hadrurus Venom Composition

We detected several major toxin families expressed at disproportionate transcriptomic (i.e., AMPs and NDBPs) and/or proteomic (i.e., \(\alpha \)KTxs, AMPs, and NDBPs) abundances between male and female Hadrurus (Fig. 2). NDBPs were expressed at much higher relative abundances in the females at 11.7–32.4% (23.2% on average) of total toxin transcript abundance compared to the males, where they were responsible for 2.9–17.6% (7.6% on average) of the total toxin transcript abundance (Fig. 2). NDBPs were also difficult to detect proteomically and only contributed 0.4–9.4% (4.7% on average) and 0.0–2.3% (1.3% on average) of the total venom protein abundance in the females and males, respectively (Fig. 2). Ion-channel toxins, which did not show a discrepancy in transcriptomic expression, contributed 20.0–26.5% (22.9% on average) and 15.8–29.1% (20.1% on average) of the total toxin transcript abundance in the males and females, respectively (Fig. 2). However, ion-channel toxins were expressed at lower relative abundances in the venom proteomes of both the males (7.0–18.8%; 12.8% on average) and, to a lesser extent, females (3.1–12.0%; 6.2% on average). The \(\alpha \)-K+-channel toxins (\(\alpha \)KTxs) were by far the most abundant ion-channel toxin family, contributing 17.1–22.9% (20.4% on average) and 13.4–27.1% (17.9% on average) of the total toxin transcript abundance and 6.5–17.6% (12.0% on average) and 2.1–11.4% (5.2% on average) of the total venom protein abundance in the males and females, respectively (Supplemental Table 1). Finally, AMPs contributed approximately 0.7–6.4% (2.4% on average) and 3.3–7.4% (4.9% on average) of the total toxin transcript abundance in the males and females, respectively (Fig. 2). AMPs were observed at higher relative abundances in the venom proteomes of both the male and female Hadrurus at 1.4–10.6% (6.0% on average) and 7.9–22.1% (14.8% on average) of the total venom protein abundance, respectively (Fig. 2). Although we identified expression differences in \(\alpha \)KTxs, AMPs, and NDBPs, these differences did not translate to any significant sex-based variation in overall venom composition on the proteomic (p = 0.400; PERMANOVA) or transcriptomic (p = 0.171; PERMANOVA) level.

Although we did not detect a significant difference in venom composition between male and female Hadrurus, we did observe weak or no agreement in venom-gland transcript abundances of the proteomically confirmed toxins (\(R^2 =\) 0.00–0.35; Fig. 3, E–H) and homology-only toxins (\(R^2 =\) 0.03–0.33; Fig. 3, I–L) between males and females. The disagreement in toxin transcript abundances between males and females of the same Hadrurus population provides a contrast to the much stronger agreements in nontoxin transcript abundances within populations (\(R^2 > 0.85\) in all cases; Fig. 3, A–D). Similar to the toxin transcript abundances, we also observed weak positive correlations in venom proteomic abundances between male and female H. arizonensis (\(R^2 =\) 0.14–0.29; Fig. 4), although the correlation between the H. spadix male and female was relatively stronger (\(R^2 = 0.48\); Fig. 4).
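To make the agreement statistic concrete, the sketch below computes a coefficient of determination between two abundance profiles after a centered logratio (clr) transformation. It assumes, as the figure abbreviations suggest, that abundances were clr-transformed before comparison and that \(R^2\) is the squared Pearson correlation. The function names and the TPM values are hypothetical, and zero abundances would need separate handling before taking logarithms.

```python
import math


def clr(abundances):
    """Centered logratio transform of strictly positive abundances."""
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical TPM values for the same five toxin transcripts in two individuals.
male_tpm = [120.0, 35.0, 900.0, 14.0, 260.0]
female_tpm = [95.0, 410.0, 150.0, 22.0, 300.0]

r_squared = pearson_r(clr(male_tpm), clr(female_tpm)) ** 2
print(f"R^2 between male and female clr abundances: {r_squared:.2f}")
```

An \(R^2\) near one would indicate that the two individuals express the same transcripts at similar relative levels; values near zero indicate little linear agreement.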

Fig. 3

Within population mRNA abundance comparisons between male and female Hadrurus show strong agreements in nontoxins (A–D), and weak or no agreement in the proteome (E–H) and homology-only (I–L) toxins. Solid lines represent a correlation coefficient of one, dashed lines illustrate the lines of best fit, and shaded regions show the 95% confidence intervals for the true linear regression line. Abbreviations: clr—centered logratio transformation, n-number of transcripts, \(\rho \)—Spearman’s rank correlation coefficient, R—Pearson’s correlation coefficient and R2—coefficient of determination

Fig. 4

Within population venom proteomic abundance comparisons between male and female Hadrurus revealed weak agreements for proteins detected in the venom of both individuals. Solid lines represent a correlation of 1 and dashed lines represent the line of best fit. Shaded regions correspond to 95% confidence intervals for the true linear regression line. Abbreviations: clr—centered logratio transformation, n—number of proteins, \(\rho \)—Spearman’s rank correlation coefficient, R—Pearson’s correlation coefficient, and R2—coefficient of determination

Neurotoxic \(\alpha \)-Potassium-Channel Toxins are Under Strong Diversifying Selection

Not only were the \(\alpha \)KTxs one of the more abundant toxin families in Hadrurus venoms, but they were also the most diverse, with 24–38 \(\alpha \)KTxs identified in the venom of each of our four Hadrurus populations (Table 2). After clustering the \(\alpha \)KTxs from the three H. arizonensis populations at 98% to remove nearly identical sequences, we identified a total of 78 \(\alpha \)KTxs across all four Hadrurus populations for use in our phylogenetic and selection analyses. However, one \(\alpha \)KTx (i.e., Hspad_aKTx-28; Supplemental Table 1) was substantially longer than all other \(\alpha \)KTxs, with an 81 amino acid precursor. Although this longer sequence may be the result of a sequencing or assembly error, we did not identify any chimeric or other properties that warranted its removal from our list of assembled toxins. As a precaution, we excluded it from our phylogenetic and selection analyses. A maximum likelihood phylogeny of the 77 remaining \(\alpha \)KTxs revealed three groups of 5–50 toxins in which every member displayed at least 55% sequence identity to every other member of its group (Groups-1–3; Fig. 5).
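The grouping criterion rests on pairwise sequence identity. As an illustration only, the sketch below computes percent identity between two rows of an alignment and the minimum identity within a group. Counting only columns where neither sequence has a gap is one common convention and is an assumption here; the text does not state which definition was used, and the sequences shown are made-up fragments.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def minimum_group_identity(aligned_seqs):
    """Lowest pairwise identity among all pairs in a group."""
    return min(pairwise_identity(a, b) for a, b in combinations(aligned_seqs, 2))


# Hypothetical aligned fragments, not real toxin sequences.
group = ["MKFLA-CGK", "MKFLAYCGR", "MRFLA-CGK"]
print(f"minimum pairwise identity: {minimum_group_identity(group):.1f}%")
```

A group would satisfy a 55% criterion under this definition if `minimum_group_identity` returned at least 55.0.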

Fig. 5

Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom \(\alpha \)KTxs showing three toxin groups that share >55% identity. Branch lengths represent the number of nucleotide substitutions per codon site and ultrafast/nonparametric bootstrap support values (out of 100) for branches corresponding to each of the three groups are displayed

Fig. 6

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom Group-1 \(\alpha \)KTxs and the corresponding multiple protein sequence alignment. Green branches represent nodes experiencing significant episodic diversifying selection (aBSREL). (B) Hadrurus venom Group-1 \(\alpha \)KTx codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by protein region, where the green region represents alignment overlap between signal peptides and mature peptides

Fig. 7

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom Group-2 \(\alpha \)KTxs and the corresponding multiple protein sequence alignment. Green branches represent nodes experiencing significant episodic diversifying selection (aBSREL). (B) Hadrurus venom Group-2 \(\alpha \)KTx codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by protein region, where the green region represents alignment overlap between signal peptides and mature peptides

The Group-1 \(\alpha \)KTxs, which included a total of 50 sequences, contained precursors of 61–69 amino acids with 18–26 amino acid long signal peptides and at least 56.9% pairwise sequence identities to all other Group-1 \(\alpha \)KTxs (Fig. 6A and Supplemental Table 2). We detected significant gene-wide episodic diversifying selection (p < 0.001; BUSTED) across Group-1 \(\alpha \)KTxs. We also detected significant episodic diversifying selection across 22 of the 97 branches in the Group-1 \(\alpha \)KTx phylogeny (Fig. 6A). Of the 71 total codon sites in the Group-1 \(\alpha \)KTx alignment, we detected 19 sites under episodic diversifying selection (p < 0.05; MEME) and 13 sites under pervasive diversifying selection (p < 0.05; FEL), 12 of which were identified as significant in both tests (Fig. 6B). We also detected eight sites under pervasive purifying selection (p < 0.05; FEL, Fig. 6B). As a proportion of all Group-1 \(\alpha \)KTx codon sites, we identified a much stronger signal of diversifying selection (28.2% of all sites) relative to purifying selection (11.3% of all sites). Of the 20 total sites under episodic and/or pervasive diversifying selection, four were in the signal peptide or signal peptide-mature peptide overlapping regions (15.4% of all signal peptide sites) and 16 were in the mature peptide region (35.6% of all mature peptide sites), providing evidence that the mature peptide region of \(\alpha \)KTxs is experiencing stronger diversifying selection relative to the signal peptide region. More specifically, of the 19 sites under episodic diversifying selection, we identified four in the signal peptide and signal peptide-mature peptide overlapping region (15.4% of all signal peptide sites) and 15 in the mature peptide (33.3% of all mature peptide sites). Of the 13 sites under pervasive diversifying selection, we identified four in the signal peptide and signal peptide-mature peptide overlapping region (15.4% of all signal peptide sites) and nine in the mature peptide (20.0% of all mature peptide sites). Of the eight sites under purifying selection, four were in the signal peptide or signal peptide-mature peptide overlapping regions (15.4% of all signal peptide sites) and four were in the mature peptide region (8.9% of all mature peptide sites), indicating that the signal peptide region may be under a proportionally stronger purifying selection pressure relative to the mature peptide region.
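The site counts above can be recombined with simple arithmetic. The sketch below reproduces the number of sites under diversifying selection in either test and the per-region percentages. The region sizes (26 signal peptide or overlap sites and 45 mature peptide sites) are not stated in the text; they are an assumption inferred from the reported percentages (4 sites = 15.4% and 16 sites = 35.6%) and from the 71-site alignment length.

```python
TOTAL_SITES = 71
# Assumed region sizes, inferred from the reported percentages (26 + 45 = 71).
REGION_SITES = {"signal": 26, "mature": 45}

# Reported site counts for the Group-1 alignment.
meme_sites = 19      # episodic diversifying selection (MEME)
fel_div_sites = 13   # pervasive diversifying selection (FEL)
both_tests = 12      # significant in both MEME and FEL


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Sites significant in at least one test (inclusion-exclusion).
diversifying_sites = meme_sites + fel_div_sites - both_tests

diversifying_by_region = {"signal": 4, "mature": 16}
purifying_by_region = {"signal": 4, "mature": 4}

print("diversifying sites:", diversifying_sites)
print("share of all sites:", percent(diversifying_sites, TOTAL_SITES))
for region, size in REGION_SITES.items():
    print(
        region,
        "diversifying %:", percent(diversifying_by_region[region], size),
        "purifying %:", percent(purifying_by_region[region], size),
    )
```

Normalizing by region size rather than by alignment length is what allows the signal peptide and mature peptide regions to be compared despite their different lengths.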

The Group-2 \(\alpha \)KTxs included a total of 14 toxins, all of which had precursors of 60–62 amino acids with a 25–27 amino acid long signal peptide (Fig. 7A) and displayed at least 80.6% pairwise sequence identities to all group members. We detected significant gene-wide episodic diversifying selection in the Group-2 \(\alpha \)KTxs (p = 0.014; BUSTED). We also detected one out of the 25 branches in the Group-2 \(\alpha \)KTx phylogeny under episodic diversifying selection (Fig. 7A). Our site-specific tests of selection revealed one codon site under episodic diversifying selection in the mature peptide region and two sites under pervasive purifying selection in the signal peptide region (Fig. 7B). Conversely, the Group-3 \(\alpha \)KTxs, which only included five toxins with at least 95.7% pairwise sequence identities to all members, displayed no evidence for any diversifying or purifying selection.
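The branch totals quoted for the branch-level tests are consistent with fully bifurcating unrooted trees, which have 2n − 3 branches for n tips. The sketch below checks this for the Group-1 and Group-2 phylogenies; treating the tested trees as unrooted and fully resolved is an assumption, and the function name is illustrative.

```python
def unrooted_branch_count(n_tips):
    """Number of branches in a fully bifurcating unrooted tree with n_tips tips."""
    if n_tips < 3:
        raise ValueError("an unrooted bifurcating tree needs at least three tips")
    return 2 * n_tips - 3


# (number of sequences, branches reported, branches under episodic selection)
groups = {"Group-1": (50, 97, 22), "Group-2": (14, 25, 1)}

for name, (n_tips, reported_branches, selected) in groups.items():
    expected = unrooted_branch_count(n_tips)
    share = 100.0 * selected / reported_branches
    print(f"{name}: expected {expected} branches, reported {reported_branches}, "
          f"{share:.1f}% under episodic diversifying selection")
```

Expressing the selected branches as a share of all branches makes the contrast between the two groups easier to compare than the raw counts alone.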

Small Alignment Size Limits Detectability of Codon Site Diversifying Selection in \(\alpha \)-Potassium-Channel Toxins

To test for an effect of alignment size on our ability to detect diversifying and purifying selection in Hadrurus venom \(\alpha \)KTxs, we tested for selection on 25 randomly re-sampled subsets of five \(\alpha \)KTxs each. We detected significant gene-wide episodic diversifying selection (BUSTED) in 21 of the 25 samples and at least one branch under episodic diversifying selection (aBSREL) in 22 of the 25 samples. However, we only identified an average of 0.8 sites under episodic diversifying selection (MEME) and 0.6 sites under pervasive diversifying selection (FEL). Conversely, we detected a larger signal of purifying selection, with each sample displaying an average of 3.2 sites under pervasive purifying selection (FEL). Although we detected gene-wide and branch-site episodic diversifying selection in nearly all samples, our results demonstrate greater signals of codon-site purifying selection in \(\alpha \)KTxs when testing for selection with small alignment sizes, the opposite of what we observed in the Group-1 \(\alpha \)KTxs. Data on \(\alpha \)KTx re-sampled subsets, including the average pairwise sequence identity and selection analysis, are provided in Supplemental Table 3.
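A re-sampling design of this kind can be sketched in a few lines of Python. The example draws 25 subsets of five sequences without replacement within each subset, using a fixed seed for reproducibility. The sequence labels, the pool of 77 sequences, the seed, and the function name are all assumptions for illustration; the text does not specify which pool was sampled or how the draws were made.

```python
import random


def resample_subsets(names, n_subsets=25, subset_size=5, seed=0):
    """Draw n_subsets random subsets; no sequence repeats within a subset."""
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    return [sorted(rng.sample(names, subset_size)) for _ in range(n_subsets)]


# Hypothetical labels standing in for the pool of toxin sequences.
toxin_pool = [f"aKTx-{i}" for i in range(1, 78)]
subsets = resample_subsets(toxin_pool)

print(len(subsets), "subsets of", len(subsets[0]), "sequences each")
```

Each subset would then be aligned and passed to the selection tests independently, so that results can be averaged across the 25 draws.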

Evidence for Episodic Diversifying Selection of Non-disulfide Bridge Peptides

In addition to being one of the most abundant putative toxin families, NDBPs were also the second most diverse putative toxin family in all Hadrurus populations, with each population containing 8–12 sequences (Table 2). After clustering H. arizonensis NDBPs, we identified 22 total NDBPs from all Hadrurus populations for inclusion in our selection analyses. A maximum likelihood phylogeny of the 22 NDBPs revealed members of three of the five recognized scorpion venom NDBP subfamilies (i.e., NDBP2, NDBP3, and NDBP4; Fig. 8), as defined by Almaaytah and Albalas (2014).

The most diverse NDBP subfamily in Hadrurus venom was the NDBP4 subfamily of short-chain NDBPs. We identified ten members of the NDBP4 subfamily, two of which contained mature peptides of 18 amino acids and eight of which contained mature peptides of 12–13 amino acids. As the eight NDBPs with 12–13 amino acid mature peptides all showed at least 62.9% pairwise sequence identities, but were highly divergent from those with 18 amino acid mature peptides (Fig. 8), we only tested for selection on those with 12–13 amino acid mature peptides. In addition to their 12–13 amino acid mature peptides, these short-chain NDBPs all contained 68–71 amino acid precursors, a 23 amino acid N-terminal signal peptide, and a C-terminal 32–35 amino acid propeptide with a glycine-lysine-arginine (GKR) propeptide processing signal (Fig. 9A). Although we did not detect any evidence for gene-wide episodic diversifying selection in the NDBP4 subfamily (p = 0.353; BUSTED), we did detect significant episodic diversifying selection across two of the 13 branches in our NDBP4 phylogeny (aBSREL; Fig. 9A). We also detected evidence for site-specific episodic diversifying selection at one codon site in the mature peptide region of our NDBP4 subfamily alignment (MEME; Fig. 9B), representing 7.7% of all mature peptide sites. We did not, however, detect any sites under pervasive diversifying selection (FEL; Fig. 9B) in the NDBP4 subfamily, which is consistent with the two-speed model of Sunagar and Moran (2015). Furthermore, we detected three sites under pervasive purifying selection, one in the signal peptide region and two in the propeptide region (FEL; Fig. 9B). We did not detect any pervasive purifying selection in the mature peptide region. Therefore, our results provide evidence that while the mature peptide region is under stronger episodic diversifying selection, the signal peptide and propeptide regions are experiencing stronger pervasive purifying selection.

Fig. 8

Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom non-disulfide bridge peptides (NDBPs) highlighted by NDBP subfamily. Hadrurus toxins in the NDBP4 subfamily consisted of two groups, including short-chain NDBPs with 18 amino acid (aa) or 13 amino acid mature peptides. Branch lengths represent the number of nucleotide substitutions per codon site, and ultrafast/nonparametric bootstrap support values (out of 100) for branches corresponding to each subfamily are displayed

Fig. 9

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom NDBP4 subfamily of short-chain toxins and the corresponding multiple protein sequence alignment. Branch lengths represent the number of nucleotide substitutions per codon site. Green branches represent nodes experiencing significant episodic diversifying selection (aBSREL). (B) Hadrurus venom NDBP4 subfamily codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by signal peptide, mature peptide, and propeptide regions

Fig. 10

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom NDBP3 subfamily of medium-chain toxins and the corresponding multiple protein sequence alignment. Branch lengths represent the number of nucleotide substitutions per codon site. Green branches represent nodes experiencing significant episodic diversifying selection (aBSREL). (B) Hadrurus venom NDBP3 subfamily codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by signal peptide, mature peptide, and the expected propeptide regions

We also tested for selection on members of the NDBP3 subfamily of medium-chain NDBPs. We identified five members of the NDBP3 subfamily, all of which had 74–78 amino acid long precursors, 23 amino acid long signal peptides, and at least 55.1% pairwise sequence identities to all subfamily members (Fig. 10A). Although we were able to hypothesize an expected arginine-lysine-arginine (RKR) propeptide cleavage site for one member of the NDBP3 subfamily (Hariz-LPC_NDBP-4) based on homology to previously described scorpion venom NDBPs, we did not identify any propeptide cleavage sites for the other members of this subfamily (Fig. 10A). We detected significant gene-wide episodic diversifying selection in the NDBP3 subfamily (p < 0.001; BUSTED). We also detected evidence for episodic diversifying selection across one of the seven branches in our NDBP3 subfamily phylogeny (aBSREL; Fig. 10A). Our site-specific analyses of selection revealed one site in the mature peptide region of the NDBP3 subfamily alignment under episodic diversifying selection (MEME; Fig. 10B). However, similar to the NDBP4 subfamily, we did not detect any sites under pervasive diversifying selection, consistent with the two-speed model. We also detected three sites under pervasive purifying selection: one in the signal peptide region, one in the mature peptide region, and one that may be in the propeptide region (FEL; Fig. 10B). Finally, although we did detect seven members of the NDBP2 subfamily of long-chain NDBPs (Fig. 8), these toxins displayed high pairwise sequence divergence. Therefore, we did not test for selection acting on any members of the NDBP2 subfamily.
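The site-level tallies reported for the NDBP subfamilies follow from a simple reading of FEL-style output: a site counts as significant when its p-value falls below 0.05, and the direction of selection is taken from whether the nonsynonymous rate exceeds the synonymous rate. The sketch below illustrates that rule only. The `Site` record, its field names, and all numeric values are hypothetical and are not taken from the analyses in this study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    codon: int      # 1-based codon position in the alignment
    region: str     # "signal", "mature", or "propeptide"
    alpha: float    # synonymous rate estimate (hypothetical value)
    beta: float     # nonsynonymous rate estimate (hypothetical value)
    p_value: float  # p-value of the per-site test that alpha != beta


def classify(site, threshold=0.05):
    """Label one site as diversifying, purifying, or not significant."""
    if site.p_value >= threshold:
        return "not significant"
    return "diversifying" if site.beta > site.alpha else "purifying"


def tally(sites, threshold=0.05):
    """Count significant sites per (region, selection class)."""
    counts = Counter()
    for site in sites:
        label = classify(site, threshold)
        if label != "not significant":
            counts[(site.region, label)] += 1
    return counts


# Hypothetical sites chosen only to mimic the NDBP4 pattern described above:
# purifying selection in the signal peptide and propeptide, none in the mature peptide.
sites = [
    Site(5, "signal", 2.1, 0.1, 0.01),
    Site(30, "mature", 0.4, 0.9, 0.40),
    Site(45, "propeptide", 1.8, 0.0, 0.02),
    Site(60, "propeptide", 2.5, 0.2, 0.03),
]
counts = tally(sites)
for key, n in sorted(counts.items()):
    print(key, n)
```

Keeping the region label on each site makes it straightforward to compare the mature peptide against the signal peptide and propeptide, which is the comparison the text draws.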

Hadrurus Toxins Under Stronger Purifying Selection

Scorpine-Like 2 Antimicrobial Peptides

We observed four scorpine-like antimicrobial peptides (AMPs) in the venom of each Hadrurus population (Table 2). After clustering AMPs between H. arizonensis populations at 98% sequence identity to remove nearly identical sequences, we identified a total of eight AMPs for use in our selection analyses. Of these AMPs, two showed homology to scorpine-like 1 AMPs (SLP1s) and six showed homology to scorpine-like 2 AMPs (SLP2s). In our maximum likelihood phylogeny for the eight identified AMPs, all six SLP2 toxins showed at least 59.9% pairwise sequence identities to all other SLP2s and contained 94–103 amino acid precursors with 19 amino acid signal peptides (Fig. 11A). In our selection analyses, however, we did not identify any gene-wide episodic diversifying selection (p = 0.478; BUSTED) or any branches from an SLP2-only phylogeny under episodic diversifying selection (aBSREL). We did identify one codon site in the SLP2s under episodic diversifying selection, but this site was in the signal peptide region (MEME; Fig. 11B). Conversely, we identified seven sites in the SLP2 mature peptide region experiencing pervasive purifying selection (FEL; Fig. 11B). Therefore, of the 84 codon sites in the mature peptide region of our SLP2 alignment, we detected purifying selection across approximately 8.3% of mature peptide sites, suggesting that these SLP2s are tending towards stronger purifying selection pressures.

Phospholipase A2s

We identified 3–5 phospholipase A2 (PLA2) toxins in the venom of all four Hadrurus populations (Table 2). Although these PLA2s were expressed at relatively low abundances in both the male (0.6–2.6%; 1.5% on average) and female (1.1–8.6%; 4.2% on average) transcriptomes, they were expressed at slightly higher abundances in the venom proteomes of both male (0.7–6.9%; 4.3% on average) and female (2.9–9.3%; 6.3% on average) Hadrurus (Fig. 2). After clustering the PLA2s within H. arizonensis, we identified a total of ten PLA2s in the venom of our four Hadrurus populations. However, these PLA2s showed high levels of sequence divergence, with only one group of four sequences having high enough pairwise sequence identities to be included in our selection analyses (Fig. 12A). These four PLA2s all showed at least 75.9% pairwise sequence identities to each other, and contained 242–244 amino acid precursors with 19–21 amino acid signal peptides. Although we did not identify any gene-wide episodic diversifying selection (p = 0.500; BUSTED), we did observe that two of the five branches from a phylogeny of these four PLA2s were under significant episodic diversifying selection (Fig. 12B; aBSREL). Our site-specific selection analyses did not reveal any codon sites under episodic diversifying selection (MEME; Fig. 12C). However, our analyses did detect six sites under pervasive purifying selection in the PLA2 mature peptide region (2.7% of all mature peptide sites), suggesting that Hadrurus venom PLA2s are experiencing stronger purifying selection relative to diversifying selection (FEL; Fig. 12C).
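Deciding which toxins are similar enough to analyze together depends on the minimum pairwise sequence identity within a candidate group. The following is a minimal sketch of that check, assuming sequences are already aligned to equal length and that identity is computed over columns where neither sequence has a gap. That denominator is one common convention and may differ from the one used in this study. The sequence names and residues are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def min_pairwise_identity(aligned):
    """Lowest percent identity among all pairs in a dict of name -> aligned sequence."""
    return min(
        percent_identity(aligned[i], aligned[j])
        for i, j in combinations(aligned, 2)
    )


# Hypothetical toy alignment (not real toxin sequences)
aligned = {
    "tox_a": "MKTLLVA-GC",
    "tox_b": "MKTLLVASGC",
    "tox_c": "MKSLIVASGC",
}
lowest = min_pairwise_identity(aligned)
print(f"minimum pairwise identity: {lowest:.1f}%")
```

A group would be retained only if this minimum clears the chosen cutoff, so a single divergent sequence is enough to exclude the group or to be dropped from it.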

Fig. 11

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom antimicrobial peptides (AMPs) with Scorpine-like 1 (SLP1) and Scorpine-like 2 (SLP2) peptides highlighted in gray and orange, respectively. Branch lengths represent the number of nucleotide substitutions per codon site, and the ultrafast/nonparametric bootstrap support values (out of 100) for the branch corresponding to the SLP2s are provided. (B) Hadrurus venom SLP2 codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by signal peptide and mature peptide regions

Fig. 12

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom phospholipase A2 toxins (PLA2s). PLA2s with >55% sequence similarity were used in the selection analyses and are highlighted in orange. Branch lengths represent the number of nucleotide substitutions per codon site, and the ultrafast/nonparametric bootstrap support values (out of 100) for the branch corresponding to the PLA2s used in the selection analyses are provided. (B) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom PLA2s used in the selection analyses. Green branches represent nodes experiencing significant episodic diversifying selection (aBSREL). (C) PLA2 codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by signal peptide and mature peptide regions

Serine Proteases

We also identified 4–5 serine proteases (SP) in the venom of each Hadrurus population (Table 2). Similar to the PLA2s, the SPs were expressed at relatively low abundances in venom-gland transcriptomes of both the males and females. These SPs contributed 2.4–2.8% (2.5% on average) and 1.3–6.1% (3.1% on average) of the total toxin transcript abundance in the males and females, respectively. However, SPs were expressed at higher abundances in the venom proteomes of both the males and the females at 6.8–13.8% (8.8% on average) and 4.3–15.3% (8.9% on average), respectively. After clustering H. arizonensis SPs, we identified 11 SPs, five of which contained at least 56.8% pairwise sequence identities with each other (Fig. 13A). These five SPs contained 298–303 amino acid precursors with 16–25 amino acid signal peptides. We did not identify any gene-wide episodic diversifying selection across these five SPs (p = 0.063; BUSTED). We also did not detect any episodic diversifying selection across any branches of a phylogeny generated from the five SPs (aBSREL). Our site-specific selection analyses did reveal one codon site in the SP mature peptide region under episodic diversifying selection (MEME; Fig. 13B). Conversely, we identified 39 sites in the mature peptide region under pervasive purifying selection (13.7% of all mature peptide sites), providing strong evidence that these SPs are primarily under purifying selection. The stronger purifying selection signals in Hadrurus venom SPs, along with similar signals in the SLP2s and PLA2s, provide a stark contrast to the strong diversifying selection signals detected in the \(\alpha \)KTxs and, to a lesser extent, the NDBPs.
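The percentages of sites under purifying selection reported for the SLP2s, PLA2s, and SPs are simple ratios of significant sites to mature peptide alignment sites. The sketch below reproduces the SLP2 value from the two counts given in the text (7 of 84 sites). For the PLA2s and SPs only the site count and the rounded percentage are reported, so the helper `compatible_totals` (a hypothetical name) lists which alignment lengths would round to the reported percentage. These are back-calculations under a rounding assumption, not values reported in this study.

```python
def percent_of_sites(n_selected, n_total, decimals=1):
    """Percentage of alignment sites flagged as under selection, rounded."""
    return round(100.0 * n_selected / n_total, decimals)


def compatible_totals(n_selected, reported_percent, decimals=1, max_total=2000):
    """Alignment lengths whose rounded percentage matches the reported one."""
    return [
        n
        for n in range(max(n_selected, 1), max_total + 1)
        if abs(percent_of_sites(n_selected, n, decimals) - reported_percent) < 1e-9
    ]


# SLP2s: both counts are stated in the text
slp2_percent = percent_of_sites(7, 84)

# PLA2s and SPs: only the site count and rounded percentage are stated,
# so these ranges are inferred, not reported
pla2_totals = compatible_totals(6, 2.7)
sp_totals = compatible_totals(39, 13.7)

print(slp2_percent)
print(pla2_totals[0], pla2_totals[-1])
print(sp_totals[0], sp_totals[-1])
```

The inferred ranges are broadly in line with the precursor lengths minus the signal peptide lengths given for each family.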

Fig. 13

(A) Midpoint-rooted maximum likelihood IQ-TREE of Hadrurus venom serine proteases (SPs). SPs with >55% sequence similarity were used in the selection analyses and are highlighted in orange. Branch lengths represent the number of nucleotide substitutions per codon site, and the ultrafast/nonparametric bootstrap support values (out of 100) for the branch corresponding to the SPs used in the selection analyses are provided. (B) Hadrurus venom SP codon sites experiencing episodic diversifying selection via MEME (top) and pervasive diversifying or pervasive purifying selection via FEL (bottom). Dotted red lines indicate significance thresholds of p < 0.05. Codon sites are highlighted by protein region, where the green region represents alignment overlap between signal peptide and mature peptide regions

Fig. 14

We observed weak agreement between mRNA transcript and protein abundances within individuals across each Hadrurus population. Solid lines represent a correlation coefficient of one, longer dashed lines illustrate the line of best fit, and shaded regions show the 95% confidence interval for the true linear regression line. Abbreviations: clr, centered log-ratio transformation; n, number of transcripts; \(\rho \), Spearman’s rank correlation coefficient; R, Pearson’s correlation coefficient; \(R^2\), coefficient of determination

Transcript and Protein Abundances Show Weak Agreement Within Individuals

We also observed a discrepancy between venom-gland transcriptome and venom proteome abundances in several major Hadrurus toxin families. For example, ion-channel toxins, La1s, and NDBPs were detected at higher average abundances in the transcriptomes, while AMPs, peptidases, PLA2s, and SPs were observed at higher average abundances in the proteomes (Fig. 2). Furthermore, we detected weak positive correlations between venom-gland transcript and venom protein abundances of individual toxins within all male Hadrurus (\(R^2 =\) 0.10–0.25; Fig. 14). We observed similar weak positive correlations in transcript and protein abundances of individual toxins within all female Hadrurus (\(R^2 =\) 0.10–0.34; Fig. 14).
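The agreement statistics named in the Fig. 14 caption (clr-transformed abundances, Spearman's \(\rho \), Pearson's R, and \(R^2\)) can be illustrated with a short, dependency-free sketch. The abundance values below are hypothetical. The sketch also assumes all abundances are strictly positive, since the text does not describe how zeros were handled before the log-ratio transformation.

```python
import math


def clr(values):
    """Centered log-ratio transform of strictly positive abundances."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-toxin abundances for one individual (not study data)
transcript = [120.0, 45.0, 300.0, 10.0, 80.0]
protein = [30.0, 60.0, 90.0, 5.0, 200.0]

t_clr, p_clr = clr(transcript), clr(protein)
r = pearson(t_clr, p_clr)
rho = spearman(t_clr, p_clr)
r_squared = r ** 2  # equals the coefficient of determination for a simple linear fit
print(f"R = {r:.2f}, R^2 = {r_squared:.2f}, rho = {rho:.2f}")
```

Because the clr transform is monotonic within a sample, Spearman's \(\rho \) is the same before and after transformation, whereas Pearson's R and \(R^2\) are not.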

Other Toxins Identified from Hadrurus Venoms

We also identified venom proteins from several other toxin families often found in scorpions (Ca2+-channel toxins, cysteine-rich secretory proteins, La1s, lipases, Na+-channel toxins, nucleotidases, peptidases, protease inhibitors, and VPs; Table 2), but because these toxins (1) were observed at lower diversities and/or abundances or (2) displayed high levels of sequence divergence, they were not included in our selection analyses. Peptidases (primarily peptidase family M2 angiotensin converting enzymes), La1s, and VPs were observed at particularly high abundances in the venom proteomes and/or venom-gland transcriptomes (Fig. 2). The VPs were an exceptionally diverse group, with each Hadrurus population containing 39–65 unique sequences (Table 2). Finally, we identified a diversity of other low abundance toxins, including both enzymatic and non-enzymatic proteins, all of which are reported in Table 2 and Supplemental Table 1.

Discussion

Contrasting Selection Patterns Across Hadrurus Venoms

As members of one of the more ancient venomous lineages, scorpions maintain venoms that are thought to be under strong purifying selection, but may experience periods of episodic diversification with shifting ecological conditions (e.g., diet and range expansion; Sunagar and Moran 2015). We identified three Hadrurus toxin families (PLA2s, SLP2s, and SPs) experiencing predominant purifying selection, and one toxin family (NDBPs) with signals of episodic diversifying selection, but no pervasive diversifying selection, all of which are consistent with the two-speed model of venom evolution proposed by Sunagar and Moran (2015). However, we also identified one Hadrurus toxin family (\(\alpha \)KTxs) experiencing stronger signals of both episodic and pervasive diversifying selection compared to purifying selection, the latter comparison of which has not been observed in scorpion venoms. Even in cases where a strong influence of episodic diversifying selection was observed in scorpion venoms, such as in CS\(\alpha \)/\(\beta \) ion-channel modulating toxins, a larger proportion of toxin sites were still reported to be under purifying selection (Sunagar et al. 2013). However, we observed more than twice as many Hadrurus venom Group-1 \(\alpha \)KTx sites under significant episodic and/or pervasive diversifying selection compared to purifying selection, which, in conjunction with significant gene-wide and branch-site episodic diversifying selection across \(\alpha \)KTxs, provides evidence for a group of scorpion toxins under more rapid diversification than previously expected. We also observed evidence for significant episodic diversifying selection across Hadrurus venom NDBPs from the NDBP4 and NDBP3 subfamilies, although the difference in the number of total sites under diversifying versus purifying selection was minimal. 
Previous tests of selection on scorpion venom NDBPs suggest they are governed primarily by purifying selection with some evidence of episodic diversifying selection (Sunagar et al. 2013), and our results support this finding. Furthermore, our results provide evidence that the mature peptide region in the Hadrurus venom \(\alpha \)KTxs and NDBPs is under stronger episodic diversifying selection relative to the signal peptide and/or propeptide regions. Likewise, signals of purifying selection in both the \(\alpha \)KTxs and NDBPs (particularly the NDBP4 subfamily) are more pronounced in the signal peptide and/or propeptide regions, suggesting that tests of selection across more than just the mature peptide sequence could result in overestimations of purifying selection. Similar mature and propeptide selection patterns have also been observed in disulfide-rich peptides from Mygalomorphae and Araneomorphae spiders (Shaikh and Sunagar 2023). As the venoms of several younger lineages (e.g., snakes and Toxicofera lizards) encompass toxins with significantly longer protein sequences compared to those of scorpions and spiders, selection analyses on these older lineages that include the signal peptide and/or propeptides will likely lead to relatively larger underestimations of diversifying selection.

Scorpion venom \(\alpha \)KTxs, which target voltage-gated potassium ion channels, are thought to be a major component of venom neurotoxicity (Zhu et al. 2011; Quintero-Hernández et al. 2013). Along with their apparent toxic functions, \(\alpha \)KTxs are one of the most abundant and diverse toxin families in Hadrurus venoms (Rokyta and Ward 2017; Santibáñez-López et al. 2019), indicating they likely play major roles in prey subjugation and/or predator defense, providing a coevolutionary basis for their rapid diversification. Scorpion venom NDBPs, which were also one of the most abundant and diverse toxin families in Hadrurus, have shown a more diverse set of mostly cytotoxic functions, including bradykinin-potentiating, hemolytic, and immune modulating activities (Almaaytah and Albalas 2014). NDBPs from Hadrurus venoms have also shown effective antimicrobial activity (Torres-Larios et al. 2000; Trentini et al. 2017), indicating they could be important for regulating the internal venom-gland microbiome (Gao et al. 2007). Some scorpion species are also thought to spray themselves with their own venom, which could function to regulate their external microbiome (Torres-Larios et al. 2000), to tend to wounds (Gao et al. 2007), or as a form of maternal care by female scorpions carrying offspring on their backs (Polis and Sissom 1990). Self-spraying of venom for antimicrobial purposes could suggest an opportunity for new coevolutionary selection pressures between scorpion hosts and their potential pathogens. During electrostimulation of Hadrurus venom glands, we noticed that Hadrurus possess the mechanical capability to spray venom, but additional behavioral studies would be needed to confirm any hypothesized function for venom spraying. 
Although their exact function is unclear, the diversity of potential functions for Hadrurus venom NDBPs provides the opportunity for a complex set of selection pressures, which could help explain the episodic diversifying selection underlying the evolution of this toxin family.

In contrast to the strong signals of episodic and pervasive diversifying selection observed in \(\alpha \)KTxs and the episodic diversifying selection in the mature peptide of the NDBPs, the toxin families observed under predominant purifying selection (PLA2s, SLPs, and SPs) were detected at lower average abundances across Hadrurus venoms, indicating they play more ancillary roles in overall venom function. Although PLA2s and SLPs could manifest neurotoxicity (Harrison et al. 2014; Krayem and Gargouri 2020), SPs are likely involved in cytotoxicity (Almeida et al. 2002; Brazón et al. 2014). Furthermore, SPs could also be involved in prey digestion and/or post-translational processing where purifying selection could be important for maintaining current function. As power to detect diversifying selection with MEME decreases with small alignment sizes (Murrell et al. 2012), we recognize that the small number of sequences included in our selection analyses on PLA2s, SLPs, and SPs, along with the Group-3 \(\alpha \)KTxs and toxins of the NDBP4 subfamily, could have impacted our ability to detect diversifying selection in these toxin groups. We attempted to test for an effect of alignment size in the Group-1 \(\alpha \)KTxs and found that although gene-wide and branch-site episodic diversifying selection were detected in nearly all re-sampled subsets of five \(\alpha \)KTxs, codon site diversifying selection was less prevalent than expected. However, our inability to detect gene-wide episodic diversifying selection in the PLA2s, SLP2s, and SPs, and branch-site episodic diversifying selection in the SLP2s and SPs, indicates that the small alignment sizes are not the only reason we were unable to detect diversifying selection in these toxin families. MEME also performs conservatively with low levels of pairwise sequence divergence (Murrell et al. 2012), which could explain why we were unable to detect any diversifying selection in the Group-3 \(\alpha \)KTxs. 
Furthermore, we used FEL to detect pervasive diversifying and purifying selection in Hadrurus venoms, but the work by Sunagar et al. (2013) used the Fast, Unconstrained Bayesian AppRoximation (FUBAR; Murrell et al. 2013). Although FUBAR is less conservative for detecting pervasive diversifying and purifying selection (Murrell et al. 2013), we opted to use FEL because it involves a comparable maximum likelihood approach with similar assumptions to MEME (Murrell et al. 2012). Methodological differences in the detection of purifying selection could have contributed to differences between our results and those of Sunagar et al. (2013), but they are unlikely to have drastically increased the number of sites we detected under purifying selection (Murrell et al. 2013).
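The alignment-size check described above, in which subsets of five Group-1 \(\alpha \)KTxs were re-sampled and re-tested, can be outlined as follows. This is only a sketch of the re-sampling logic. The sequence identifiers, the pool of 20 sequences, the number of replicates, and the `detects_selection` callback are all hypothetical, and the callback stands in for whatever selection test would be run on each subset.

```python
import random


def resample_subsets(sequence_ids, subset_size=5, n_replicates=100, seed=0):
    """Draw random subsets of sequence IDs without replacement within each subset."""
    ids = sorted(sequence_ids)
    if subset_size > len(ids):
        raise ValueError("subset_size exceeds the number of sequences")
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(ids, subset_size)) for _ in range(n_replicates)]


def fraction_detected(subsets, detects_selection):
    """Fraction of subsets in which the supplied test reports selection."""
    hits = sum(1 for subset in subsets if detects_selection(subset))
    return hits / len(subsets)


# Hypothetical identifiers; the true number of Group-1 sequences is not assumed here
ids = [f"aKTx_{i:02d}" for i in range(1, 21)]
subsets = resample_subsets(ids, subset_size=5, n_replicates=10, seed=1)

# Placeholder test: a real analysis would align each subset, infer a tree,
# and run the gene-wide, branch-site, or site-level test of interest.
always_detected = fraction_detected(subsets, lambda subset: True)
print(len(subsets), always_detected)
```

Comparing the fraction of subsets with a detected signal across the gene-wide, branch-level, and site-level tests indicates which of them lose the most power when alignments shrink to five sequences.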

In their two-speed model, Sunagar and Moran (2015) suggested that venom proteins rapidly diversify in organisms experiencing major ecological shifts (e.g., range and diet expansion), a phenomenon thought to be ongoing in the younger venomous lineages. Contrarily, organisms from older venomous lineages likely experienced rapid diversification of toxins early in their evolutionary history, but those fine-tuned toxins are now under predominant purifying selection to maintain toxic function. As previously discussed, Sunagar and Moran (2015) posited that venoms from the older lineages may re-enter episodic bouts of diversifying selection with new ecological shifts, which has been further supported by the strong influence of episodic diversifying selection observed in more recent studies on spider venoms (Brewer and Cole 2023; Shaikh and Sunagar 2023). Similar to studies on episodic diversifying selection in scorpions and other ancient lineages, these recent observations in spider venoms resulted from testing for selection at more diverse phylogenetic scales. Our results within a single genus of scorpions provide evidence for strong episodic diversifying selection in a more recently diverged group of arachnids. On first glance, this strong episodic diversifying selection seems to be in support of the two-speed model, suggesting that these recently diverged scorpions are undergoing ecological shifts in diet or geographic distribution. However, we also observed strong pervasive diversifying selection relative to pervasive purifying selection in one of the most prominent toxin families in Hadrurus venoms (Group-1 \(\alpha \)KTxs), indicating that our results for at least one major toxin family are inconsistent with the two-speed model.

In general, a large portion of the toxins observed in venoms represent paralogous sequences (Zancolli and Casewell 2020). Therefore, our detection of selection across Hadrurus toxins is, in large part, a result of selection across paralogs. By clustering toxins within species, we ensured that we only included unique paralogs within each of our two Hadrurus species. This would suggest that our analyses represent intraspecific and/or interspecific tests of selective pressures within and between Hadrurus species. As we tested for selection across toxin gene families and not individual orthologs between H. arizonensis and H. spadix, the bulk of any signal for selection in our gene-family phylogenies likely represents divergence among paralogs. For example, the Group-1 \(\alpha \)KTxs did not separate into distinct clades for each of our two Hadrurus species. Although we observed multiple clades containing only within-species paralogs, we also observed many clades containing both H. arizonensis and H. spadix sequences. In addition, many of the longer branches in our phylogeny represent divergence between paralogs, while the shallower branches often lead to orthologs between the two species. The origins of many of these paralogs clearly predate the split between H. spadix and H. arizonensis, highlighting that our analyses integrate the signal for selection over the history of each toxin gene family and represent more than just a test of selection within and between our two Hadrurus species. Much of the signal for selection across entire venom gene families can be captured with sampling from one or a few species, an approach that has often been used to test for selection across venom genes (Gibbs and Rossiter 2008; Dutertre et al. 2014; Sunagar et al. 2014). By sampling multiple individuals from multiple populations across the H. arizonensis range, which should all have approximately the same set of toxin loci, we not only generated a more complete set of paralogous toxins than would likely be possible with a single transcriptome, but also generated a more complete set of toxins orthologous to those from other scorpion families. Our observation of strong pervasive diversifying selection in the \(\alpha \)KTxs, which encompasses selection predating at least the split of H. arizonensis and H. spadix, provides a contrast to the two-speed model of Sunagar and Moran (2015). Furthermore, the proportion of \(\alpha \)KTx sites under episodic and pervasive diversifying selection in Hadrurus venoms is consistent with those found in the younger snake and cone snail lineages and more than double the rates identified in the “intermediate” lineage of Toxicofera lizards (Sunagar et al. 2012; Dutertre et al. 2014; Sunagar and Moran 2015). Sunagar and Moran (2015) did suggest that the predatory and defensive roles of venom also play a part in its evolution, which could explain the weaker diversifying selection in the Toxicofera lizards relative to snakes and cone snails. Our results support this idea; the more abundant toxins with clear roles in prey capture and defense (e.g., neurotoxic \(\alpha \)KTxs) are under stronger diversifying selection, while the less abundant toxins with less significant roles in prey capture and defense (e.g., SPs) are dominated by purifying selection.

Furthermore, testing for selection on venoms across diverse phylogenetic scales could limit the ability to effectively test for diversifying selection as repeated substitutions at the same sites become indistinguishable (Anisimova et al. 2001). In addition, trench warfare coevolution, where phenotypes between coevolving species flip back and forth between optima, is thought to be a driving evolutionary force in other venomous species, such as the coevolution between the Brazilian pit viper (Bothrops jararaca) and its Didelphini possum prey (Drabeck et al. 2022). If this phenotype matching occurs in scorpions and results in venom protein codon sites switching between the same few amino acids, then strong coevolutionary pressures and subsequent diversifying selection could be more difficult to detect. Consequently, strong signals of purifying selection at more diverse phylogenetic scales, such as those observed in work on ancient venomous lineages (Sunagar et al. 2013; Sunagar and Moran 2015; Brewer and Cole 2023; Shaikh and Sunagar 2023), could be masking signals of diversifying selection.

Although our understanding of venoms from the younger venomous lineages, where diversifying selection is ubiquitous, has materialized from studies on more closely related species (Dutertre et al. 2014; Sunagar et al. 2014; Aird et al. 2015; Dashevsky and Fry 2018), comparable studies at similar phylogenetic scales are lacking for the ancient venomous lineages. Our observation of both strong episodic and pervasive diversifying selection on the more abundant toxins within a genus of scorpions, which is in accordance with expected selection patterns of the younger venomous lineages, indicates that the selection mechanisms underlying the evolution of some scorpion venoms are more ambiguous than previously suspected under the two-speed model. We recognize that unlike the single toxin family identified under strong diversifying selection in Hadrurus venoms from our study, previous work on younger lineages has described a majority of toxin families under strong diversifying selection (Sunagar and Moran 2015). Nonetheless, our results emphasize that additional, comparable sampling at lower taxonomic levels in the ancient venomous lineages is needed to accurately test the two-speed venom evolution model of Sunagar and Moran (2015) and unravel the molecular dynamics underlying the evolution of scorpion venoms.

Discrepancies in Toxin Expression

Although we observed AMPs and NDBPs expressed at disproportionate abundances between male and female Hadrurus, along with weak or no agreement between venom-gland transcript and venom protein abundances, we did not identify any significant sex-based variation in overall venom composition. However, sex-based venom variation is not uncommon, having been observed across other venomous lineages, including centipedes (Nystrom et al. 2019), snakes (Furtado et al. 2006; Menezes et al. 2006), and spiders (Herzig et al. 2008; Binford et al. 2016). As previously discussed, sex-based variation in venom has also been observed in scorpions, with some of this variation leading to sex-based differences in prey toxicity (De Sousa et al. 2010; Miller et al. 2016; Ward et al. 2018a; Olguín-Pérez et al. 2021). Our inability to detect a sex-based difference in Hadrurus venoms could have stemmed from our sample size, which was smaller than those of studies where sex-based variation in scorpion venom was detected (Miller et al. 2016; Ward et al. 2018a; Olguín-Pérez et al. 2021). Therefore, we recommend additional sampling of male and female Hadrurus to determine if the expression differences in AMPs, NDBPs, and, to a lesser extent, \(\alpha \)KTxs are the result of sex-based variation in venom.

We also observed discrepancies in toxin transcriptomic and proteomic expression, a well-documented phenomenon in scorpion venoms (Zhang et al. 2015; Rokyta and Ward 2017; Ward et al. 2018b; Romero-Gutiérrez et al. 2018; Nystrom et al. 2023). These discrepancies generally stem from proteomic detectability challenges associated with small molecular weight toxins, such as ion-channel toxins and NDBPs. Several plausible explanations exist for these toxin expression disparities in scorpion venoms. First, many small molecular weight scorpion toxins (e.g., NDBPs) require substantial post-translational modifications (Zeng et al. 2005; Delgado-Prudencio et al. 2019), which led Rokyta and Ward (2017) to suggest that these post-translational modifications, in conjunction with inconsistent LC-MS/MS-detection capabilities for trypsin-digested peptides, could be making mature toxins difficult to detect. Second, asynchronous regeneration of venom components has been observed in scorpion venoms (Pimenta et al. 2003; Nisani et al. 2012; Díaz-García et al. 2019; Carcamo-Noriega et al. 2019), but we recognize that this may not always be the case (Nystrom et al. 2022). If Hadrurus scorpions regenerate venom components asynchronously, then performing RNA-seq on venom glands at a single time point (i.e., four days post-venom extraction) could hinder our ability to proteomically detect specific toxins. Finally, some scorpions can expel venom heterogeneously (Inceoglu et al. 2003; Abdel-Rahman et al. 2009), suggesting that inconsistent venom extraction techniques could result in drastic differences in transcript versus protein expression if some toxins (e.g., NDBPs) are expelled at different rates. Ultimately, these discrepancies underscore the importance of not only improving venom extraction and proteomic detection methods, but also performing venom characterizations with combined proteomic and transcriptomic approaches.

Comparison to Other Scorpion Venom Compositions

Our venom-gland transcriptomic and venom proteomic characterizations showcased a diverse set of toxins similar to those seen in other scorpion venoms from the family Hadruridae, including H. concolorous, H. aztecus, and H. gertschi (Schwartz et al. 2007; Rokyta and Ward 2017; Santibáñez-López et al. 2019). In addition, our venom characterization of H. spadix showed strong resemblance to the original H. spadix venom characterization of Rokyta and Ward (2017). Compared to Rokyta and Ward (2017), we detected 16 additional proteomically confirmed toxins in H. spadix, most of which were \(\alpha \)KTxs previously only detected by homology.

We observed an abundance of ion-channel modulating toxins in Hadrurus venoms, although these toxins were predominantly K+-channel modulating toxins (i.e., \(\alpha \)KTxs). K+-channel modulating toxins are widespread across all scorpion venoms (Cid-Uribe et al. 2020). Conversely, NaTxs are observed at much higher diversities and abundances in Buthidae scorpion venoms (Cid-Uribe et al. 2020), so the scarcity of NaTxs in Hadrurus venoms is not surprising. Relative to other ion-channel modulating toxins, Ca2+-channel modulating toxins (CaTxs) are less abundant in scorpion venoms. However, most scorpions have at least 1–2 CaTxs (Cid-Uribe et al. 2020), similar to what we observed in Hadrurus.

NDBPs, which were one of the most abundant and diverse venom proteins in Hadrurus, are also present in most scorpion venoms (Cid-Uribe et al. 2020). Although we detected toxins from three of the five NDBP subfamilies (Almaaytah and Albalas 2014), those from the NDBP2 subfamily of long-chain peptides and NDBP4 subfamily of short-chain peptides showed exceptional sequence divergence. As previous NDBP classifications were assigned using relatively few known sequences (Zeng et al. 2005; Almaaytah and Albalas 2014), the high sequence divergence within NDBP subfamilies from Hadrurus, as well as other scorpions (Quintero-Hernández et al. 2015; Romero-Gutierrez et al. 2017; Nystrom et al. 2023), indicates that this putative toxin family may be in need of renewed subfamily classifications.

Although Hadrurus venom was not as enzyme-dominated as that of some scorpion species, such as Diplocentrus whitei (Nystrom et al. 2023), we did identify peptidases at high proteomic abundances in both H. spadix and H. arizonensis. However, this was expected as Rokyta and Ward (2017) already detected peptidases at similar abundances in H. spadix. More specifically, Peptidase M2 enzymes, such as those from Tityus serrulatus, are thought to induce hypertension in sting victims (Duzzi et al. 2021). Although their high proteomic abundance could suggest they are important for overall Hadrurus venom function, that function is still unclear. We also identified serine proteases at moderate abundances in venom proteomes, although they were not nearly as abundant as those seen in D. whitei (Nystrom et al. 2023).

Persistent research continues to demonstrate scorpion venoms as richer sources of enzymatic toxins than previously thought and it seems that all scorpion venoms may possess a “core” set of these enzymatic toxins (Delgado-Prudencio et al. 2022). Using high-throughput proteomic and/or transcriptomic data from 14 species across six of the 23 recognized scorpion families (i.e., Buthidae, Euscorpiidae, Hadruridae, Superstitionidae, Urodacidae, and Vaejovidae; Santibáñez-López, 2023 In Press), Delgado-Prudencio et al. (2022) identified a set of 24 core enzymes present in each venom. Of the 24 core enzymes, we detected apparent evidence for 12 in at least one Hadrurus population, including AChE, acid phosphatase (HistP), CarbAn, carboxypeptidase E, chitinase, HYAL, lysozyme (lysozyme C), neprilysin-2 (PeptidaseM13), peptidyl-dipeptidase A (Peptidase M2), PLA2, SupDismutase, and receptor protein-tyrosine kinase (TKTR). Along with the PLA2s, AChE likely plays a role in neurotoxicity (Delgado-Prudencio et al. 2022), although the low abundance in Hadrurus venom suggests that it may not contribute a major functional role. HYALs are commonly regarded as spreading agents in scorpion venoms (Oliveira-Mendes et al. 2019) and Delgado-Prudencio et al. (2022) suggested that acid phosphatases, such as HistP, may also act as spreading agents. As other venom acid phosphatases serve as venom allergens, HistP could also be responsible for a similar functional role (Barboni et al. 1987). Carboxypeptidase E, along with another non-core enzyme (PaHM), is critical to the C-amidation of NDBPs (Delgado-Prudencio et al. 2019). Delgado-Prudencio et al. (2022) suggested that receptor protein-tyrosine kinases and lysozymes, such as lysozymeC, could also be involved in activating other venom components. Next, CarbAn and SupDismutase could be involved in toxicity by producing harmful reactive oxygen species in prey, similar to L-amino acid oxidases from snake venom (Guo et al. 2012). 
However, Delgado-Prudencio et al. (2022) suggested that because venoms are susceptible to reactive oxygen species, these enzymes may assist with venom preservation. We also observed chitinases in Hadrurus venoms, which could play a role in digestion of chitin-containing invertebrate prey. Finally, Delgado-Prudencio et al. (2022) suggested that PeptidaseM13s play some multifunctional role, but that role is still not well understood.

Although recent venom characterizations have brought new species and their toxins to light, scorpion venoms continue to exhibit a vast set of proteins for which we have no clear functional classification. The abundance and diversity of VPs detected across all Hadrurus used in this study is no different. Therefore, continued research on scorpion venoms will not only increase the number of known scorpion venom toxins, but will also provide new opportunities for testing the molecular evolutionary dynamics underlying this unique venomous lineage.

Conclusions

We performed high-throughput venom-gland transcriptomic and venom proteomic characterizations for one male and one female individual from each of three H. arizonensis populations and one H. spadix population. We identified a diversity of proteins and peptides commonly found in scorpion venoms, including AMPs, ion-channel toxins, La1s, NDBPs, peptidases, PLA2s, and SPs. We also observed discrepancies in toxin expression between male and female Hadrurus, although this did not translate to any significant sex-based variation in venom. However, our low sample size could have affected our ability to detect a statistical difference, and we suggest additional sampling to more accurately determine if sex-biased venom gene expression is present in Hadrurus. Furthermore, we detected strong signals of episodic and pervasive diversifying selection across one of the most abundant toxin families (\(\alpha \)KTxs), which likely play a major role in overall venom function. We also detected stronger signals of pervasive purifying selection in three other toxin families (i.e., PLA2s, SLP2s, and SPs), which likely play more secondary roles in overall venom function. Although mostly in support of the two-speed model of venom evolution, the stronger signals of episodic and pervasive diversifying selection in \(\alpha \)KTxs from Hadrurus venoms provide a contrast to previous work detailing the predominant purifying selection acting on scorpion venoms, indicating that the evolutionary trends underlying scorpion venoms are more complex than previously anticipated.

Materials and Methods

Specimen and Sample Collection

We collected a total of six Hadrurus arizonensis scorpions from the southwestern United States, including two scorpions from each of three different localities. Scorpions were sexed by counting pectinal teeth (Stahnke 1945). Scorpions C0337 () and C0340 () were collected from Pima County (PC), Arizona, USA, in October of 2015. C1032 () and C1034 () were collected from La Paz County (LPC), Arizona, USA, in August of 2019. C0343 () and C0344 () represent individuals collected from Borrego Springs in San Diego County (SDC), California, USA in October of 2015. Venom extractions, venom processing, and venom-gland dissections were performed as previously described (Rokyta and Ward 2017; Ward et al. 2018b; Nystrom et al. 2023). Venom samples were centrifuged at 12,000 G for 3 min, frozen at \(-80^{\circ }\hbox {C}\), lyophilized, and stored at \(-80^{\circ }\hbox {C}\) until further use. We dissected venom-glands from each scorpion four days after venom extraction, submerged glands in ice-cold RNAlater, and incubated overnight at \(4^{\circ }\hbox {C}\) before storing at \(-80^{\circ }\hbox {C}\). Specimens were preserved in 95% ethanol at \(-80^{\circ }\hbox {C}\).

Map Figure

To generate a map of sampling localities for H. arizonensis and H. spadix used in this study, we plotted the corresponding GPS points in QGIS version 3.30 (QGIS Development Team, 2023). Approximate species’ ranges for H. arizonensis and H. spadix were also plotted in QGIS using the distributions reported for each species by Soleglad et al. (2011).

Venom Proteomics

Lyophilized Hadrurus venom samples were resuspended in quantitative mass spectrometry (LC-MS/MS) quality water and total protein content was quantified using the Nanodrop 2000c spectrophotometer (Thermo Fisher Scientific Inc., Waltham, Massachusetts, USA). Trypsin (Promega Cat. No. V5111) digestion was performed on approximately 11 \(\mu \)g of each venom sample by the Florida State University (FSU) Department of Biological Science Core Facilities as previously described (Nystrom et al. 2023). Following the trypsin digest, samples were submitted to the FSU College of Medicine’s Translational Science Laboratory and LC-MS/MS was executed as previously described (Rokyta and Ward 2017; Ward et al. 2018b; Nystrom et al. 2023).

Venom peptide and protein identities were distinguished in each sample using Proteome Discoverer (version 2.5; Thermo Fisher Scientific Inc.) and Scaffold (version 5.1.2; Proteome Software Inc., Portland, OR, USA). First, the raw LC-MS/MS data were analyzed using Proteome Discoverer with Percolator for peptide and protein validation and a custom FASTA database containing protein sequences for the respective venom-gland consensus transcriptome (see section 5.5 for transcriptome generation methodology). SequestHT was used as the search engine and run under the following settings: Trypsin as the enzyme name, dynamic modifications, a maximum peptide length of 144, a minimum peptide length of six, a maximum of two missed cleavages, a fragment mass tolerance of 0.2 Da, oxidation +15.995 Da (M), carbamidomethyl +57.021 Da (C), a maximum delta Cn of 0.05, and a precursor mass tolerance of 10 ppm. Next, the identities of the validated peptides and proteins were confirmed using Scaffold with the protein and peptide false discovery rates set at 1.0% and a minimum of one recognized peptide. Peptide and protein abundances were estimated as described by Rokyta and Ward (2017) and Ward et al. (2018b). First, conversion factors for each replicate from each MS sample were calculated in Scaffold using the normalized spectral counts of three spike-in control proteins of known concentrations. More specifically, individual conversion factors were estimated using the best-fit line of the spike-in control concentrations and their normalized spectral counts, with an intercept at the origin. Normalized spectral counts for each replicate in each sample were converted to concentrations using these conversion factors. Concentrations were averaged across all three replicates to provide a final concentration for each individual.
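The spike-in calibration described above can be sketched in a few lines. This is a minimal illustration and not the authors' Scaffold workflow: it assumes the conversion factor is the least-squares slope of concentration on normalized spectral count with the intercept fixed at zero, and the function names, data layout, and numbers are hypothetical.

```python
def conversion_factor(spike_counts, spike_concentrations):
    """Least-squares slope through the origin: concentration ~ k * count.

    Assumption: concentration is regressed on normalized spectral count;
    the source only states that the line passes through the origin.
    """
    sxy = sum(x * y for x, y in zip(spike_counts, spike_concentrations))
    sxx = sum(x * x for x in spike_counts)
    if sxx == 0:
        raise ValueError("all spike-in spectral counts are zero")
    return sxy / sxx


def mean_concentrations(replicates):
    """Convert counts to concentrations per replicate, then average.

    Each replicate is a dict with the hypothetical keys 'spike_counts',
    'spike_conc', and 'toxin_counts' (toxin name -> normalized count).
    """
    converted = []
    for rep in replicates:
        k = conversion_factor(rep["spike_counts"], rep["spike_conc"])
        converted.append({name: k * c for name, c in rep["toxin_counts"].items()})
    names = converted[0].keys()
    return {n: sum(rep[n] for rep in converted) / len(converted) for n in names}


# Invented values, for illustration only.
example_replicates = [
    {"spike_counts": [10, 20, 40], "spike_conc": [1.0, 2.0, 4.0],
     "toxin_counts": {"toxinA": 30}},
    {"spike_counts": [20, 40, 80], "spike_conc": [1.0, 2.0, 4.0],
     "toxin_counts": {"toxinA": 100}},
]
example_result = mean_concentrations(example_replicates)
```

Fixing the intercept at zero means a protein with no spectral counts is assigned a concentration of zero, which matches the description of a line passing through the origin.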

Venom-gland RNA-seq

Venom-gland RNA-seq was performed as previously described (Rokyta et al. 2012; Ward et al. 2018b; Nystrom et al. 2023). We extracted total RNA using a TRIzol (Invitrogen) and chloroform extraction. Total RNA content was quantified with a Qubit RNA Broad-range kit (ThermoFisher Scientific) and quality checked with an RNA 6000 Pico Bioanalyzer chip (Agilent Technologies). Next, we used a NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs) with a fragmentation time of 15.5 min to generate isolated mRNA samples with average fragment sizes of about 370 bp. To produce cDNA libraries for sequencing from our mRNA samples, we used a NEBNext Ultra RNA Library Prep Kit with a High Fidelity 2 \(\times \) Hot Start PCR Mix. cDNA libraries were purified and size-selected using Agencourt AMPure XP beads. Samples were indexed using Illumina Multiplex Oligos (New England Biolabs). Average fragment size and library quality were assessed with a High Sensitivity DNA Bioanalyzer chip (Agilent Tech.). Sequencing-ready libraries were quantified with KAPA PCR by the Florida State University (FSU) Molecular Cloning Facility and sequenced at the FSU College of Medicine’s Translational Science Laboratory with 150 paired-end (PE) reads on an Illumina HiSeq2500 (C0337, C0340, C0343, and C0344) or NovaSeq6000 system (C1032 and C1034).

Transcriptome Assembly and Analysis

Venom-gland transcriptomes were assembled as previously described (Nystrom et al. 2023), with slight modifications. First, 150 PE reads were filtered and quality checked to remove any cross contamination between samples using FastQC (version 0.11.5; Andrews et al. 2010) and custom Python scripts. Filtered reads were trimmed with Trim Galore (version 0.4.4; Krueger 2015), merged with PEAR (version 0.9.6; Zhang et al. 2014), and de novo assembled using a multi-assembly approach consisting of DNAStar NGen (version 12.3.1), Extender (version 1.04; Rokyta et al. 2012), and Trinity (version 2.4.0; Grabherr et al. 2011). Using multiple assemblers with differing algorithmic strategies has been shown to generate more comprehensive venom-gland transcriptomes (Holding et al. 2018). More specifically, Holding et al. (2018) demonstrated that recovery of a more complete set of nontoxin and toxin transcripts was achieved with a combination of Trinity, SeqMan NGen, and Extender, which is the basis for the approach we applied. We then filtered out contigs from the three assemblers using custom Python scripts and annotated putative Hadrurus venom proteins based on their homology to previously identified scorpion venom proteins from the UniProt (UPT; downloaded April 13, 2018) toxin database. However, these homologous toxin searches were only executed on contigs with 90% matches to the total length of toxins from the UPT database. The top Basic Local Alignment Search Tool (BLAST) hit for each contig passing the above quality check was queried to identify valid stop codons and signal peptides with SignalP (version 4.1 under sensitive settings; Petersen et al. 2011). Any quality-checked contigs with a valid stop codon and signal peptide were maintained and named by referencing their top UPT BLAST match.

We also used a mass-spectrometry-based approach to annotate toxins and improve our ability to detect venom components. Using Proteome Discoverer and Scaffold, we searched the LC-MS/MS data against custom databases generated from our transcriptome assemblies. After collecting all available open reading frames across each contig with the getorf function in Emboss (version 6.6.0.0; Rice et al. 2000), we annotated putative sequences that had open reading frames with proteomic evidence, a valid stop codon, and a signal peptide. As with the transcriptome assemblies, validated sequences were maintained and named by referencing the top UPT BLAST match. Putative toxin sequences from our mass-spectrometry-based approach were combined within individuals and clustered at 100% sequence identity using cd-hit-est (version 4.6; Li and Godzik 2006). To check for chimeric sequences, we aligned the merged reads against our putative toxins using bwa (version 0.7.12; Li 2013) and used a 151-bp sliding window to screen read distributions and remove sequences without any coverage between windows. All maintained sequences with >20-fold differences in read distributions were checked manually for chimeras and discarded as necessary. All maintained putative toxin sequences were clustered with cd-hit-est once at 99% within individuals and a second time at 98% between individuals from the same population (i.e., H. spadix, H. arizonensis-LPC, H. arizonensis-PC, and H. arizonensis-SDC).
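One way to picture the chimera screen is as a pass over per-base read coverage. The sketch below is an interpretation, not the authors' script: it assumes coverage is available as a list of read depths per base, that a sequence is discarded when any window has zero mean coverage, and that the 20-fold criterion compares the highest and lowest window means. All function names are hypothetical.

```python
def window_means(coverage, window=151):
    """Mean read depth in every sliding window along one sequence."""
    n = len(coverage)
    if n == 0:
        return []
    w = min(window, n)  # sequences shorter than the window get one window
    total = sum(coverage[:w])
    means = [total / w]
    for i in range(w, n):
        total += coverage[i] - coverage[i - w]
        means.append(total / w)
    return means


def screen_sequence(coverage, window=151, max_fold=20.0):
    """Return 'discard', 'manual_check', or 'keep' for one putative toxin.

    Assumptions: a window with no coverage marks a likely chimera, and the
    fold difference is the ratio of the highest to the lowest window mean.
    """
    means = window_means(coverage, window)
    if not means or min(means) == 0:
        return "discard"
    if max(means) / min(means) > max_fold:
        return "manual_check"
    return "keep"


# Invented coverage profiles, for illustration only.
even_profile = [10] * 400
gapped_profile = [10] * 200 + [0] * 151 + [10] * 200
uneven_profile = [100] * 200 + [1] * 200
```

With these invented profiles, the evenly covered sequence is kept, the sequence with an uncovered stretch is discarded, and the sequence with a 100-fold drop in depth is flagged for manual inspection.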

All nontoxin transcripts were annotated using BUSCO (version 5.1.2; Seppey et al. 2019), genomic settings, the Arachnida Odb10 database (downloaded August 2020), and contigs from the Trinity assembly. For each individual, BUSCO matches with a single copy were named and checked for valid start and stop positions using custom python scripts. The final consensus transcriptome was constructed by combining the validated nontoxin sequences with our putative toxin sets from mass spectrometry and homology-based annotation approaches and clustering with cd-hit-est at 98% between individuals of the same population. In our analyses, venom components with both proteomic and transcriptomic evidence are referred to as “proteomically confirmed toxins”, while venom components with homology to putative scorpion toxin families and only transcriptomic evidence are referred to as “homology-only toxins.” Transcript abundances were quantified simultaneously with the parallel shell utility (Tange et al. 2011) using RSEM (version 1.3.2; Li and Dewey 2011) and bowtie2 (version 2.5.1; Langmead and Salzberg 2012) coding sequence alignments to the transcriptome. In modification from Nystrom et al. (2023) and to ensure we only maintained high quality transcripts, we removed sequences from our analysis that were absent in one individual and made up less than 250 transcripts per million (TPM) in the other. We then re-ran RSEM with bowtie2 to obtain final transcript abundances.
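The abundance filter applied before the final RSEM run can be expressed as a simple predicate. This is a sketch under stated assumptions: "absent" is taken to mean a TPM of exactly zero, the comparison is made between the two individuals of one population, and the function and variable names are hypothetical.

```python
def passes_tpm_filter(tpm_first, tpm_second, min_tpm=250.0):
    """Return False for transcripts absent in one individual and rare in the other.

    Assumption: 'absent' means TPM == 0. A transcript missing from one
    individual is kept only if it reaches min_tpm in the other individual.
    """
    if tpm_first == 0 and tpm_second < min_tpm:
        return False
    if tpm_second == 0 and tpm_first < min_tpm:
        return False
    return True


# Invented TPM pairs (individual 1, individual 2), for illustration only.
example_tpm = {
    "transcript_1": (0.0, 100.0),   # absent in one, below threshold in the other
    "transcript_2": (0.0, 300.0),   # absent in one, but abundant in the other
    "transcript_3": (50.0, 60.0),   # present in both individuals
}
retained = [name for name, (a, b) in example_tpm.items() if passes_tpm_filter(a, b)]
```

Under this reading, a transcript detected in both individuals is always retained, regardless of how low its abundance is.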

Statistical Analyses

To test for relationships between venom proteomic and venom-gland transcriptomic expression within and between individuals from the same population, we first centered log-ratio (clr; Aitchison 1986) transformed the proteomic and transcriptomic abundances. Correlations were then estimated using the built-in lm function in R version 4.2.1 (R Core Team 2021). To test for sex-based variation in Hadrurus venoms, we grouped toxins into nine major toxin families (i.e., AMPs, ion-channel toxins, La1-like peptides, NDBPs, peptidases, PLA2s, SPs, VPs, and other toxins) and isometric log-ratio (ilr; Egozcue et al. 2003) transformed the relative transcriptomic and protein abundance data from each putative toxin family. We then tested for a significant difference in venom composition between males and females by running a PERMANOVA with the adonis function from the vegan package (Oksanen et al. 2013) in R. All figures were visualized using the ggplot2 package (Wickham 2016) with colorblind and grayscale friendly color schemes from the viridis package (Garnier et al. 2021) in R.
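For readers unfamiliar with the centered log-ratio transform, the following sketch shows the calculation for a single composition. It is written in Python for illustration only (the analyses themselves were run in R), it assumes all abundances are strictly positive, and it does not address how zeros were handled, which the text does not specify.

```python
import math


def clr(abundances):
    """Centered log-ratio transform of one composition.

    Each value becomes log(x_i) minus the mean of the logs, which is the
    same as log(x_i / geometric mean). Requires strictly positive inputs.
    """
    if any(a <= 0 for a in abundances):
        raise ValueError("clr is undefined for zero or negative abundances")
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Invented abundances, for illustration only.
raw = [1.0, 2.0, 4.0]
rescaled = [10.0, 20.0, 40.0]  # same proportions, different total
clr_raw = clr(raw)
clr_rescaled = clr(rescaled)
```

Because the transform depends only on ratios, the two invented compositions give the same clr values, and the transformed values of any composition sum to zero.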

Mitochondrial Gene Phylogeny

As species-level phylogenies for scorpions are not as refined as those for better-studied venomous species (e.g., snakes), we opted to use mitochondrial gene phylogenetics to confirm the species of H. arizonensis collected in this study. We generated a maximum likelihood phylogeny of 1,029 bp fragments of cytochrome c oxidase subunit I (COI) genes from the eight Hadrurus used in this study and 256 COI sequences from H. arizonensis collected across their range (KC347040–KC347295; Graham et al. 2013). The optimal maximum likelihood tree was generated using IQ-TREE 2 (version 2.0.3; Nguyen et al. 2015; Minh et al. 2020) using a codon model and 1000 ultrafast bootstrap replicates (Minh et al. 2013). Additional bootstrap support was generated using 500 nonparametric bootstraps in IQ-TREE 2. The optimal maximum likelihood tree was then visualized in the Interactive Tree Of Life (iTOL version 5; Letunic and Bork 2021) and rooted using the two H. spadix COI sequences as an outgroup.

Selection Analyses

Selection Analyses of Major Toxin Families

To prepare sequences for selection analyses, we first grouped venom gene coding sequences from H. spadix and the three H. arizonensis populations by putative toxin family. To remove nearly identical sequences and maintain consistency with clustering methods used in our transcriptome annotation process as described above (see Section 4.5), we clustered coding sequences from the three H. arizonensis populations within each putative toxin family at 98% using cd-hit. Clustered sequences were renamed to include the toxin name of the initial sequence and the population abbreviation for each population where that toxin was found. For example, Hariz-PC_aKTx-1, Hariz-LPC_aKTx-2, and Hariz-SDC_aKTx-26 were clustered to Hariz-PC-LPC-SDC_aKTx-1. After clustering, we removed stop codons from each toxin sequence and imported sequences into Geneious version 2022.2.2 (Kearse et al. 2012). Hadrurus spadix and clustered H. arizonensis sequences in the same putative toxin family were aligned using the translation alignment under default settings in Geneious. Optimal maximum likelihood IQ-TREEs were generated for each major putative toxin family using a codon model and 1000 ultrafast bootstrap replicates and visualized with iTOL.
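The renaming convention for clustered sequences can be sketched as follows. This is an illustration rather than the authors' script: it assumes every name follows the pattern species-population_toxin, that the first name in the list is the cluster's initial sequence, and that populations are listed in the order they are encountered. The function name is hypothetical.

```python
def merged_cluster_name(member_names):
    """Build one name for a cluster from its member sequence names.

    Assumptions: names look like 'Hariz-PC_aKTx-1' (species-population,
    underscore, toxin label), and member_names[0] is the initial sequence
    whose toxin label is copied verbatim into the merged name.
    """
    first_prefix, toxin = member_names[0].split("_", 1)
    species = first_prefix.split("-", 1)[0]
    populations = []
    for name in member_names:
        prefix = name.split("_", 1)[0]
        population = prefix.split("-", 1)[1]
        if population not in populations:  # keep first-seen order, no repeats
            populations.append(population)
    return f"{species}-{'-'.join(populations)}_{toxin}"


example_cluster = ["Hariz-PC_aKTx-1", "Hariz-LPC_aKTx-2", "Hariz-SDC_aKTx-26"]
example_name = merged_cluster_name(example_cluster)
```

A cluster with a single member keeps its original name, since only one population abbreviation is present.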

However, as some scorpion toxin families display high sequence divergence with up to 32 defined subfamilies (i.e., \(\alpha \)-K+-channel toxins and non-disulfide bridge peptides) and because highly divergent sequences can cause site-specific tests of positive selection to perform anti-conservatively (Murrell et al. 2012), we did not test entire toxin families for selection. Rather, we only tested groups of sequences from each putative toxin family that displayed at least 55% pairwise sequence identities to each toxin from that group. Removing overly divergent sequences to limit false positives is not uncommon when testing for diversifying and purifying selection (Rech et al. 2012; Spielman et al. 2019).
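The grouping criterion can be made concrete with a small check on aligned sequences. This is a sketch under stated assumptions: identity is computed over alignment columns in which at least one sequence has a residue, a gap opposite a residue counts as a mismatch, and the minimum group size of four sequences is included as a default. The source does not state how identity was calculated, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned sequences.

    Assumption: columns where both sequences have a gap are ignored, and a
    gap opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def group_is_testable(aligned_seqs, min_identity=0.55, min_size=4):
    """True if every pair in the group meets the identity threshold."""
    if len(aligned_seqs) < min_size:
        return False
    return all(
        pairwise_identity(a, b) >= min_identity
        for a, b in combinations(aligned_seqs, 2)
    )


# Invented aligned sequences, for illustration only.
similar_group = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIMM", "ACDEFGHMMM"]
mixed_group = similar_group + ["WWWWWWWWWW"]
```

Requiring every pair to meet the threshold, rather than the group average, means a single divergent sequence excludes the whole candidate group until it is removed.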

Putative toxin family groups of at least four sequences with shared 55% pairwise sequence identities were first aligned using a translational alignment under default settings in Geneious. Optimal maximum likelihood IQ-TREEs were then generated for each putative toxin family group using a codon model and 1000 ultrafast bootstrap replicates (with 500 nonparametric bootstraps for additional support) before being visualized with iTOL. We then used the HyPhy version 2.5.50 (Hypothesis Testing using Phylogenies) software package to test each putative toxin family group for signals of diversifying and/or purifying selection. First, we used HyPhy to test for site-specific episodic positive selection under the mixed effects model of evolution (MEME; Murrell et al. 2012) using default settings. We then used the fixed effects likelihood (FEL; Kosakovsky Pond and Frost 2005) test, with default settings, to infer site-specific pervasive diversifying and purifying selection. Evidence for site-specific selection from MEME and FEL was plotted using the ggplot2 package in R. To test all branches across each tree for signals of episodic diversifying selection, we performed an adaptive Branch-Site Random Effects Likelihood (aBSREL; Smith et al. 2015) test. Finally, we tested for gene-wide diversifying selection using the Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED; Murrell et al. 2015) under default settings.
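The four tests can be organized as one command per method for each toxin family group. The sketch below only assembles the command lines and does not run them. It assumes the HyPhy 2.5 command-line interface, in which each method is a subcommand that accepts --alignment and --tree arguments and otherwise uses default settings. The file names are hypothetical, and the source does not state how HyPhy was invoked.

```python
# Method keywords as used on the HyPhy command line (assumed CLI usage).
HYPHY_METHODS = ("meme", "fel", "absrel", "busted")


def hyphy_commands(alignment_path, tree_path):
    """Return one argument list per selection test for a single group."""
    return [
        ["hyphy", method, "--alignment", alignment_path, "--tree", tree_path]
        for method in HYPHY_METHODS
    ]


# Hypothetical input files for one toxin family group.
example_commands = hyphy_commands("aKTx_group1.fasta", "aKTx_group1.nwk")
for command in example_commands:
    print(" ".join(command))
```

Each argument list could be passed to subprocess.run on a machine with HyPhy installed; keeping the construction separate from execution makes it easy to review the commands first.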

Subset Resampling and Analysis of \(\alpha \)KTxs

As power to detect diversifying selection increases with alignment size (i.e., number of sequences; Murrell et al. 2012), we decided to re-perform our selection analyses on a subset of the \(\alpha \)KTxs (the most diverse putative toxin family) to test for an effect of alignment size on our capability to detect selection across scorpion venoms. We randomly sampled 25 subsets of five toxins from the Group-1 \(\alpha \)KTxs. Each subset of five toxins was translation-aligned in Geneious before producing a maximum likelihood IQ-TREE. We then repeated each test for episodic and pervasive diversifying/purifying selection (i.e., BUSTED, aBSREL, MEME, and FEL) as described above (see Sect. 4.8.1).
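The resampling step amounts to drawing 25 random subsets of five sequences from the Group-1 pool. The sketch below shows one reproducible way to do this. The fixed seed, the sampling without replacement within a subset, and the placeholder sequence names are assumptions; the source does not state whether subsets were required to differ from one another.

```python
import random


def sample_subsets(sequence_names, n_subsets=25, subset_size=5, seed=0):
    """Draw n_subsets random subsets, each without replacement.

    Assumption: subsets are drawn independently, so two subsets may share
    sequences or, rarely, be identical.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    names = list(sequence_names)
    return [sorted(rng.sample(names, subset_size)) for _ in range(n_subsets)]


# Placeholder names standing in for the Group-1 sequences.
pool = [f"seq_{i}" for i in range(1, 21)]
subsets = sample_subsets(pool)
```

Each subset would then be aligned and analyzed separately, so the spread of results across the 25 subsets indicates how sensitive the tests are to alignment size.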

Multiple Protein Sequence Alignment

All multiple protein sequence alignments of Hadrurus toxins were produced in Geneious version 2022.2.2 (Kearse et al. 2012) using Clustal Omega (version 1.2.3; Sievers et al. 2011; Sievers and Higgins 2018) under default settings. The resulting multiple protein sequence alignments were imported into R and plotted using the ggmsa package (Zhou et al. 2022). Signal peptides were identified using SignalP version 4.1 under sensitive settings (Petersen et al. 2011). Arginine and lysine propeptide cleavage sites were detected using ProP (version 1.0; Duckert et al. 2004). Other expected propeptide cleavage sites were identified by homology to known scorpion toxins with propeptides, where possible.