Introduction

Hottentotta tamulus (Fabricius), the Indian red scorpion of the family Buthidae, is one of the most venomous scorpions of the Indian subcontinent (Murthy and Zare 1998; Strong et al. 2015; Ratnayake et al. 2016). Its sting can be lethal if not treated in time. This is especially true for children in rural areas, where envenomation can lead to several complications, including cardiovascular, haemodynamic and haematological alterations (Murthy and Zare 1998; Bawaskar and Bawaskar 1999; Strong et al. 2015). The species is widely distributed in the Indian subcontinent, and the toxicity of envenomation has been suggested to differ substantially among geographical locations (Kankonkar et al. 1998). For instance, within the Indian state of Maharashtra, fatal human envenomations are recorded mainly from the Konkan region (Bawaskar and Bawaskar 1999).

Fig. 1 Sampling localities of H. tamulus.

Earlier investigations into the reasons for geographical variation in the toxicity of red scorpion envenomation have focussed on variation in venom peptides (Badhe et al. 2006; Newton et al. 2007; Strong et al. 2015). However, it is not clear whether this is also reflected in variation in the population genetic structure of the species. Suranse et al. (2017) showed that there is high genetic variation in the mitochondrial cytochrome oxidase subunit I (COI) partial gene sequence of H. tamulus and that genetic distance increased linearly with geographical separation. However, their result was based on a small sample size per population.

In this study, we investigate the haplotype diversity of eight populations of H. tamulus from different geographical regions of Maharashtra state. Using model-based clustering, we group the eight populations into three clusters, which correspond to the high, moderate and low rainfall areas. We provide the first account of the diversity and biogeographical distribution of H. tamulus haplotypes.

Materials and methods

Sample collection

A total of 66 specimens of H. tamulus were collected from eight populations in Maharashtra (figure 1). The specimens were collected within a 2-km radius of the point locality listed in table 1. Species identification was confirmed using Tikader and Bastawade (1983) and Kovařík (2007). The eight populations were divided geographically into three groups, namely high, moderate and low rainfall areas, based on the amount of precipitation they receive. Populations at Bhatye Plateau, Sangameshwar and Kalyan are on the western side of the Western Ghats, which receives very high precipitation. Populations at Jejuri, Shindavane, Pashan and Alandi are on the eastern side of the Western Ghats, in the rain shadow area with moderate precipitation. The population at Jalna in the far east is from a dry area with very low precipitation. Collected specimens were preserved in 100% ethanol. Specimens used in this study are currently in the museum collections of the Bombay Natural History Society (BNHS), Mumbai; the Wildlife Information Liaison Development Society (WILD), Coimbatore; and the Institute of Natural History Education and Research (INHER), Pune, India. Museum accession numbers are provided in table 1.

Table 1 Specimens used for morphological and genetic analysis. GenBank accession numbers are provided for specimens used for genetic analysis.

Genetic analysis

DNA was extracted from 36 specimens of H. tamulus following the methods explained by Suranse et al. (2017). Cytochrome oxidase subunit 1 (COI) was amplified using forward primer LCO1490 (\(5^\prime\)-GGT CAA CAA ATC ATC ATA AAG ATA TTG G-\(3^\prime\); Folmer et al. 1994) and reverse primer HCO2198 (\(5^\prime\)-TAA ACT TCA GGG TGA CCA AAA AAT CA-\(3^\prime\); Folmer et al. 1994) or Nancy (\(5^\prime\)-CCH GGT AAA ATT AAA ATA TAA ACT TC-\(3^\prime\); Simon et al. 1994). PCR amplification, PCR product purification and sequencing followed the protocol detailed in Suranse et al. (2017). Sequences were edited manually and deposited in GenBank (accession numbers are provided in table 1). Six additional sequences were obtained from Suranse et al. (2017). Sequences were aligned using MUSCLE (Edgar 2004) and trimmed to the coding region. The final alignment had 606 bases. Uncorrected raw (p) distances between pairs of sequences were calculated in MEGA 7 (Kumar et al. 2016).
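
The uncorrected p distance is simply the proportion of compared sites at which two aligned sequences differ. The following is a minimal sketch of that calculation, not the MEGA 7 implementation. The function name `p_distance` and the choice to skip sites with gaps or ambiguity codes (pairwise deletion) are assumptions made for illustration, and the example sequences are invented.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ.

    Assumption for this sketch: sites where either sequence has a gap or an
    ambiguity code are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented 10-base example: one difference in ten compared sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
```

On a 606-base alignment, a p distance of 5% corresponds to roughly 30 differing sites.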

For phylogenetic analysis using the maximum likelihood method, Orthochirus bicolor (COI gene GenBank accession number KT716038) was used as an outgroup. Sequences were partitioned into first, second and third codon positions, and the best partition scheme after merging was determined using the greedy strategy (Lanfear et al. 2012) implemented in IQ-Tree (Nguyen et al. 2015) based on the minimum Bayesian information criterion (BIC) (Schwarz 1978). Maximum likelihood analysis was performed in IQ-Tree (Nguyen et al. 2015) with ultrafast bootstrap support (Minh et al. 2013) for 1000 iterations. The phylogenetic tree was edited using FigTree v1.4.2 (Rambaut 2009).

A genetic network was constructed using the median joining method with \(\varepsilon = 0\) to get the minimum spanning network (Bandelt et al. 1999) in the freeware POPART (Leigh and Bryant 2015). DNA polymorphism analysis and Tajima’s neutrality test were performed in DnaSP (Librado and Rozas 2009). The site-wise codon selection pattern was studied using the single-likelihood ancestor counting method (Pond and Frost 2005) implemented in Datamonkey (Delport et al. 2010). The model-based clustering method implemented in Structure 2.3.1 was used to understand the grouping of the eight populations (Pritchard et al. 2000). Structure was run with a burn-in period of 5000 and 50000 Markov chain Monte Carlo (MCMC) iterations after burn-in. Analysis of molecular variance (AMOVA) (Excoffier et al. 1992) was performed in Arlequin suite ver 3.5 (Excoffier and Lischer 2010) to partition the genetic variance within and among the groups identified by the model-based clustering method.
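
For readers unfamiliar with the summary statistics involved, the sketch below shows how haplotype diversity, nucleotide diversity and Tajima's D can be computed from an alignment using the standard textbook formulas. It is an illustration only and not the DnaSP implementation. It assumes gap-free sequences of equal length, all function names are hypothetical, and the four toy sequences are invented.

```python
import math
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    """Number of alignment columns with more than one state (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D: compares k with S / a1, scaled by the expected standard deviation."""
    n = len(seqs)
    k = mean_pairwise_differences(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Invented toy alignment: four distinct haplotypes, three singleton sites.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
print(haplotype_diversity(toy), nucleotide_diversity(toy), tajimas_d(toy))
```

In this toy alignment every variable site is a singleton, so D is negative. Values close to zero, as reported in this study, are consistent with neutrality.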

Table 2 Haplotype diversity (H), nucleotide diversity (\(\pi \)) and Tajima’s neutrality test for three groups.

Morphometric analysis

We measured 13 morphometric characters, namely cephalothorax length, carapace length, anterior carapace width, posterior carapace width, mesosoma length, pedipalp length, femur length, patella length, manus length, fixed finger length, movable finger length, pectin length and pectinal teeth length, along with two meristic characters, namely left and right pectin teeth count. All morphometric characters had a positive linear relationship with cephalothorax length. Therefore, to remove size bias, we expressed all other characters as a percentage of cephalothorax length. Twelve size-adjusted morphometric characters and two meristic characters were used to understand whether the specimens from different locations formed different morphological groups by plotting the Euclidean distances using nonmetric multidimensional scaling (NMDS). To check whether the low-dimensional NMDS plot was representative of the multidimensional data, we calculated the stress value using a Shepard plot. NMDS was performed in PAST 3.15 (Hammer et al. 2001).
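
The size adjustment and distance calculation described above can be sketched as follows. This is a simplified illustration rather than the PAST workflow. The function names, dictionary keys and measurements are hypothetical, only two of the twelve size-adjusted characters are shown, and the meristic counts (which would be appended unscaled) are omitted for brevity.

```python
import math


def size_adjust(measurements, size_key="cephalothorax_length"):
    """Express every character except the size reference as a percentage of it."""
    size = measurements[size_key]
    return {
        name: 100.0 * value / size
        for name, value in measurements.items()
        if name != size_key
    }


def euclidean(a, b):
    """Euclidean distance between two specimens with the same set of characters."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


# Invented measurements (mm): specimen_2 is twice the size of specimen_1
# but has identical proportions; specimen_3 has a relatively longer carapace.
specimen_1 = {"cephalothorax_length": 10.0, "carapace_length": 8.0, "femur_length": 6.0}
specimen_2 = {"cephalothorax_length": 20.0, "carapace_length": 16.0, "femur_length": 12.0}
specimen_3 = {"cephalothorax_length": 10.0, "carapace_length": 9.0, "femur_length": 6.0}

adj_1, adj_2, adj_3 = (size_adjust(s) for s in (specimen_1, specimen_2, specimen_3))
print(euclidean(adj_1, adj_2))  # same proportions, so the distance is zero
print(euclidean(adj_1, adj_3))
```

After adjustment, two specimens that differ only in overall size are indistinguishable, which is the purpose of the correction. The resulting distance matrix is what an NMDS ordination would take as input.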

Results

Genetic analysis based on COI showed that the 42 specimens from eight populations harboured 22 haplotypes with high haplotype diversity and nucleotide diversity (table 2). The divergence among alleles was not significantly different from neutrality (Tajima’s D, \(P>\) 0.1; table 2). Analyses of nucleotide substitution models suggested F81 for the combined first and second codon positions (BIC = 4197.104) and K3Pu+G4 for the third codon position (BIC = 2436.328) as the best partition scheme, suggesting that transitions and transversions occurred at the same rate at the first two codon positions. However, the codon substitution pattern gave a \(d_{\mathrm{N}}/d_{\mathrm{S}}\) ratio of 0.0202, and 18 sites were under negative selection at the 0.1 significance level.

Model-based clustering grouped the eight populations into three clusters (lnL = –466.0 ± 8.7) corresponding to the three geographical groups with high, medium and low rainfall (figure 2c). Biogeographical groups of H. tamulus are also evident from the maximum likelihood tree based on the best partition scheme, which clustered populations from the three groups into monophyletic clades (figure 2a). The median joining network also formed three distinct clusters corresponding to high, medium and low rainfall areas (figure 2b). These groups were statistically supported by AMOVA, which showed a high percentage of variation among groups compared with that among populations within groups and among individuals within populations (table 3), confirming that the high, moderate and low rainfall areas harbour distinct haplotypes.

Maximum haplotype diversity was in the low rainfall region, whereas maximum nucleotide diversity was in the high rainfall region (table 2). Allelic frequencies were not significantly different from neutrality (table 2). Although there was a high number of mutations within populations as well as within groups (figure 2b), most of these were synonymous (silent) mutations that yielded an identical protein (figure 2d), except for two different nonsynonymous substitutions in the low and moderate rainfall areas (figure 2d). The maximum raw genetic distance between a pair of sequences was as high as 5% (figure 2e). Although both the maximum likelihood tree and the median joining network formed three distinct clusters, there was no distinct genetic gap separating intra-group distances from inter-group distances (figure 2e), suggesting that the three groups cannot be delineated into different species.

Despite the genetic distance among populations in the three groups based on geographical separation and precipitation, we could not identify any morphological differences (figure 3). With a stress value of less than 0.05 (see figure 3 inset), NMDS provided a good fit to the actual ranked distances among individuals, indicating that the observed overlap in morphometric space (figure 3) reflects true similarity in the morphometric characters.

Discussion

Suranse et al. (2017) suggested that there is high genetic variation in H. tamulus populations based on 11 specimens from nine locations. Since each locality was represented by only a few individuals, they had limited information on genetic variation among and within populations. Based on a larger sample size, we validate their claim that H. tamulus harbours high genetic variability, as evident from high haplotype and nucleotide diversity. Further, we have shown that populations from different geographical locations harbour distinct haplotypes.

Suranse et al. (2017) suggested that the high genetic variation in H. tamulus warrants the re-evaluation of subspecies, which are currently synonymized with the species. We have topotypes of H. tamulus from Kalyan and of Buthus tamulus concanensis Pocock from Bhatye Plateau, and both populations fell into the single cluster corresponding to the high rainfall area. We therefore genetically validate the synonymy of Buthus tamulus concanensis with H. tamulus by Kovařík (2007). The barcode gap, which is defined as a gap between intraspecific diversity and interspecific diversity, is often used for delimitation of species (Puillandre et al. 2012). However, the absence of such a gap among the haplotypes of the three geographical clusters and the strong overlap in morphometric clusters suggest that the observed variations are at the level of populations and not species.
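
A simple way to express the barcode gap check is to compare the largest intra-group distance with the smallest inter-group distance. The sketch below illustrates the idea only; it is not the procedure of Puillandre et al. (2012). The function name `barcode_gap` is hypothetical and the distance values are invented rather than taken from this study.

```python
def barcode_gap(intra_distances, inter_distances):
    """Smallest inter-group distance minus largest intra-group distance.

    A positive value indicates a gap between the two distributions;
    zero or a negative value means the distributions touch or overlap.
    """
    return min(inter_distances) - max(intra_distances)


# Invented p distances: the ranges overlap, so no gap is present.
overlapping = barcode_gap([0.00, 0.01, 0.03], [0.02, 0.04, 0.05])

# Invented p distances with a clear gap between the two ranges.
separated = barcode_gap([0.00, 0.01], [0.05, 0.08])

print(overlapping < 0, separated > 0)
```

Overlapping ranges, as in the first example, correspond to the situation described above, in which intra-group and inter-group distances are not separated by a gap.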

The high haplotype and nucleotide diversity in our study is consistent with that of other buthid taxa from Africa and Europe (Sousa et al. 2010, 2011, 2012; Husemann et al. 2012). Similar to our study, Sousa et al. (2010) also noted that the change in allelic frequencies was not significantly different from neutrality. Nevertheless, similar to Miller et al.’s (2014) observations for the boreal North American scorpion, we also found that COI has experienced purifying selection, with a larger number of synonymous than nonsynonymous mutations.

Fig. 2 Molecular analysis based on mitochondrial cytochrome oxidase partial sequence and its translated protein sequence. (a) Maximum likelihood tree superimposed on distribution map. Orthochirus bicolor, used as an outgroup, is not shown. Values along the nodes are percentage bootstrap values for 1000 iterations; values less than 50 are not shown. (b) Median joining network of nucleotide sequences showing the relationship between different haplotypes. Mutations are indicated with dashes. (c) Triangular plot showing the clustering of populations using the model-based clustering method. Note that all the points are on the respective vertices of the triangle. (d) Median joining tree of amino acid sequences showing the relationship between different haplotypes. The number of changes in amino acid sequences is depicted with dash marks. (e) Box plot of intra-group and inter-group raw p genetic distances for groups depicted in (b) and (c). The solid line in the box is the median, the dashed line is the mean and outliers are shown by solid circles. In (b) and (d), the number of individuals in each haplotype is indicated by the divisions of the circle. In (a), numbers in parentheses after the locality name are the numbers of samples used for genetic analysis.

The biogeographical distribution of H. tamulus haplotypes, with distinct genetic structure across the high, moderate and low rainfall areas, is an interesting finding for two reasons. First, although it has been suggested that there are distinct biogeographical patterns in the distribution of the scorpion fauna of India (Tikader and Bastawade 1983), we provide the first evidence that a widespread species such as H. tamulus could have distinct genetic haplotypes in different biogeographical regions. The genetic separation along the western and eastern slopes of the Western Ghats mountain ranges is of particular interest because, although there is genetic evidence that westward and eastward flowing rivers of the northern Western Ghats may harbour distinct species of freshwater fishes (see for example Kumkar et al. 2016), to our knowledge there is no such evidence for the land fauna of this region. Rainfall itself has been considered a major factor affecting the diversity and distribution of scorpions (see Koch 1977; Brites-Neto and Duarte 2015), as it could affect environmental factors such as soil texture, vegetation type and the availability of the prey base.

Table 3 AMOVA of eight populations from three geographical groups inferred from model-based clusters.
Fig. 3 Nonmetric multidimensional scaling (NMDS) of H. tamulus specimens collected from different localities using size-adjusted morphometric characters. A Shepard plot depicting the reliability of the NMDS plot is shown in the inset. Numbers in parentheses after the locality name are the numbers of samples used for morphometric analysis.

The second interesting aspect of the biogeographical separation of the haplotypes is its possible connection to the toxicity of envenomation from geographically separated populations of H. tamulus. There are anecdotal reports suggesting that the potency of scorpion stings from the Konkan region on the western side of the Western Ghats is higher than that of stings from areas on the eastern side of the Western Ghats (Kankonkar et al. 1998; Murthy and Zare 1998; Bawaskar and Bawaskar 1999). To check these claims, Newton et al. (2007) studied the venom peptide composition of populations from the western and eastern slopes of the Western Ghats and found marked variation in the venom peptides of scorpions collected from high and low rainfall areas. Our study suggests that the geographically separated populations have genetically distinct haplotypes. It is therefore possible that variation in the venom peptides could be attributed to genetic variation among the populations.

The current study is still preliminary, as we could investigate only one gene for logistical reasons. Nevertheless, we provide the first evidence of high haplotype diversity in the widespread species H. tamulus and show that there is a distinct biogeographical pattern in haplotype distribution.