Introduction

A widely recognized approach to characterizing diversity is to study and compare genetic variation within and between life forms (Noss 1990). Because genetic diversity is necessary for evolution to occur (Reed and Frankham 2003), it is often used as a measure of fitness and of the long-term viability of populations. The genetic variability of an organism allows us to assess its ability to respond to changing environmental conditions or disease epidemics (Rasmussen et al. 2014; Rivers et al. 2014; Hume et al. 2016). The maintenance of genetic diversity is largely shaped by population size and gene flow between populations (Frankel and Soule 1981; Frankham 1996), with larger, connected populations showing higher genetic diversity (Futuyma 1986; Falconer 1996). In this regard, extrinsic factors such as habitat continuity and natural barriers play an important role in shaping genetic diversity. Depending on the species' traits, landscape features such as mountains, forests, dry grasslands and river basins may hinder movement (Roonwal 1984; Knowles 2000; Nag et al. 2011), facilitate connections (Willis et al. 2010) or even act as refugia (Ripley and Beehler 1990; Modolo et al. 2005) during unfavourable conditions, thereby modifying patterns of gene flow and generating genetic structure. More recently, anthropogenic factors such as habitat loss and/or fragmentation have also reduced connectivity, fostering genetic structure as well as loss of genetic diversity. The combination of limited gene flow and small population size can cause genetic differences to accumulate rapidly between geographically isolated populations (Worley et al. 2004). In the context of habitat fragmentation, knowing how diversity is partitioned within and among populations is important for species-centric conservation programs. Small patches separated by strong barriers, whether natural or human-made, could lead to distinct populations or even subspecies on either side of the barrier.
Fragmentation could also reduce effective population size or cause bottlenecks (Lukoschek 2018; Frankham et al. 2002), which may be inferred from the genetic makeup of the affected populations. Further, markers with higher mutation rates can provide information on recent population genetic processes (Wang et al. 2019). This knowledge can be used to identify the sources of introduced populations (Goodman et al. 2001), the most vulnerable populations requiring immediate action, and genetically diverse populations that could serve as sources for their recovery.

The Indian sub-continent is a prime example of a mosaic landscape with a complex ecological, climatic and geological history (Biswas and Pawar 2006). It harbours diverse ecosystems, criss-crossed by topographical barriers as well as patches of human-altered land, that shelter an astounding biodiversity (Ghosh-Harihar et al. 2019). However, few studies in India have examined the impact of landscape alteration and habitat fragmentation on the genetic diversity of a species, especially across its entire range. Additionally, regional biodiversity patterns in India are not completely understood (Reddy 2014; Tamma and Ramakrishnan 2015). This is even more so for meso- and mega-herbivores, particularly in grassland ecosystems. The few available studies indicate both hitherto unknown genetic connectivity and barriers to gene flow, depending on the species. Research on tigers from six protected areas in central India showed genetic connectivity between these populations despite seemingly inhospitable regions between them (Joshi et al. 2013). By contrast, leopard cats from northern and southern India show strong population structure, indicating a climatic barrier (Mukherjee et al. 2010). A leopard metapopulation in central India shows genetic structuring due to increasing habitat fragmentation, at both landscape and fine scales (Dutta et al. 2013). Elephant populations in southern India belong to two genetic clusters, corresponding to the Nilgiri and Periyar-Anamalai regions, which are further distinct from those in central and northern India (Vidya et al. 2005a, b). While sloth bears in the Satpura-Maikal region of central India separate into two genetic clusters, there is also evidence of contemporary gene flow (Dutta et al. 2015). All of these examples come from forest-dwelling species, while species of drier habitats have been largely ignored.
The same topographic and climatic barriers would affect the distribution of arid-zone species differently, in addition to the challenges posed by habitat fragmentation. Indian grey wolves, for instance, inhabit semi-arid grasslands and scrub forests (Jhala and Giles 1991), and their phylogeography has been well studied (Sharma et al. 2004), albeit with a distinct lack of samples from southern India. Another species that could potentially display strong genetic structure is the grassland-dwelling blackbuck (Antilope cervicapra), an antelope endemic to the Indian sub-continent. Blackbucks are phylogenetically nested within the gazelle clade (Jana and Karanth 2019), yet are morphologically distinct and sexually dimorphic. They are medium-sized antelopes (23–45 kg; Mungall 1978; Ranjitsinh 1989) in the family Bovidae. They are selective grazers that live in groups ranging from two to several hundred individuals (Ranjitsinh 1989). While places like Velavadar in Gujarat and Tal Chhappar in Rajasthan hold the largest natural blackbuck populations, small groups of fewer than ten individuals are also found in places like Kolar in Karnataka or Ananthapur district in Andhra Pradesh (personal observations). Blackbucks occur in isolated clusters as well as in seemingly connected metapopulations across different parts of the country, making the species an ideal system for studying patterns of genetic structure and variability, as well as demographic history, in the context of habitat fragmentation. Blackbucks have never been found, even as fossils, outside the sub-continent, although they have been introduced to North America and Argentina for game hunting. Once prevalent across India, their numbers were greatly reduced by indiscriminate poaching and game hunting. Subsequently, the shrinkage and fragmentation of grasslands into relict pockets further strained these isolated populations (Schaller 2009).
Currently, very few patches of grassland and scrubland across the country remain relatively undisturbed, and these areas continue to be regarded as wastelands and are often prime targets for development, urbanisation and conversion to agriculture. Blackbucks currently occur in multiple areas across the semi-arid regions of India. Given that grasslands in India continue to be heavily fragmented, it is imperative to understand the population structure of blackbucks in order to make effective decisions regarding their conservation.

Most of our information on antelopes comes from African species. Asia has fewer antelope species, but they occur in a wider range of habitats (Mallon and Kingswood 2001). Non-invasive methods offer a way to conduct genetic analyses of species that may otherwise be difficult to sample. Fecal sampling, for example, yields genetic information without having to euthanize, injure or even capture the focal animals, which is advantageous from both conservation and ethical perspectives. However, the extracted DNA is often highly fragmented, of poor quality and quantity, and mixed with microbial and dietary genetic material, making downstream experiments and analyses more difficult. Samples that do not pass quality thresholds have to be discarded, reducing sample sizes. Strategies that support more reliable results include repeated genotyping of samples to ensure accuracy (Taberlet et al. 1996) and rejection of samples whose DNA content falls below a reliability threshold (Morin 2001; Horvath et al. 2005).

Here we study how genetic variation is distributed across the range of a native Asian grassland-specialist antelope, the blackbuck. While previous work has reported forensic identification of blackbucks for wildlife investigations (Kumar et al. 2018), two recent studies have begun to advance our understanding of genetic diversity in this species. Bhaskar et al. used three mitochondrial markers, covering cytochrome b, cytochrome c oxidase subunit 1 and the partial control region, to study multiple populations in southern India, and found higher than usual haplotype diversity and some genetic structure among three clusters. De et al. sampled a single region in northern India and cross-amplified a panel of ungulate microsatellite markers that could potentially be used for population and landscape genetic studies of the blackbuck. However, we still lack comprehensive knowledge of the species' genetic diversity based on multiple marker types, multilocus analyses and extensive sampling across its pan-India distribution. To elucidate patterns of genetic structure within this species, we used both nuclear (microsatellite) and mitochondrial (D-loop) markers. We also explore the drivers of genetic variation and make inferences on the species' past demography.

Materials and methods

Sampling

Fecal samples of blackbucks were collected non-invasively, without handling animals, from 12 locations (map in Fig. 1) across the species' geographic range, in the states of Rajasthan, Gujarat, Maharashtra, Karnataka, Tamil Nadu, Andhra Pradesh, Uttar Pradesh and Orissa (details in Table 1). For protected areas, research permissions were obtained from the relevant forest departments. Animals were observed from a distance and tracked on foot or followed by vehicle. Blackbucks are diurnal and spend most of their time foraging in the early morning and afternoon, so most sampling was conducted during these periods. Population densities at each sampling location were obtained from the literature, previous surveys or forest department records, where available (Bikash Rath and Rao 2005; Asif and Modse 2016; Baskaran et al. 2016; Mamatha and Hosetti 2018; Meena and Saran 2018; BirdLife International, 2020). Density was used as a proxy for population size in this study, because records of density were more accurate than counts of individuals at a given location. Since blackbuck fecal pellets have a characteristic shape (Bhaskar et al. 2021), they are easily distinguished from those of other bovids in the region, minimising the risk of misidentification. Fresh fecal samples from individual adult blackbucks were collected using two methods. First, the outer layer of the pellets was swabbed and preserved in lysis buffer (White and Densmore 1992) in 2.0 ml cryovials to prevent DNA degradation. The buffer was prepared from 2.5 g of sodium dodecyl sulphate (SDS), 100 ml of 0.5 M ethylenediaminetetraacetic acid (EDTA), 1 ml sodium chloride (NaCl) and 50 ml of 1 M Tris-hydrochloric acid (HCl), made up to 500 ml with milliQ water. Second, whole pellets were collected and stored in absolute alcohol.
The samples were stored at room temperature in the field and DNA extraction was carried out after they were transferred to the laboratory.
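As a quick arithmetic check on the lysis buffer recipe, the final working concentrations follow from C1V1 = C2V2. The Python sketch below takes the stated stock amounts at face value and assumes the 500 ml final volume; NaCl is left out because its stock concentration is not given, and the helper names (`diluted_molarity`, `percent_w_v`) are illustrative only.

```python
# Final working concentrations of the swab lysis buffer, computed from the
# stock amounts stated in the text (final volume 500 ml). NaCl is omitted
# because its stock concentration is not stated.
FINAL_VOLUME_ML = 500.0


def diluted_molarity(stock_molar, stock_volume_ml, final_volume_ml=FINAL_VOLUME_ML):
    """Solve C1 * V1 = C2 * V2 for C2, returning mol/L."""
    return stock_molar * stock_volume_ml / final_volume_ml


def percent_w_v(mass_g, final_volume_ml=FINAL_VOLUME_ML):
    """Grams of solute per 100 ml of final solution."""
    return 100.0 * mass_g / final_volume_ml


sds_pct = percent_w_v(2.5)              # 2.5 g SDS
edta_m = diluted_molarity(0.5, 100.0)   # 100 ml of 0.5 M EDTA
tris_m = diluted_molarity(1.0, 50.0)    # 50 ml of 1 M Tris-HCl

print(f"SDS:  {sds_pct:.2f} % (w/v)")
print(f"EDTA: {edta_m * 1000:.0f} mM")
print(f"Tris: {tris_m * 1000:.0f} mM")
```

Under these assumptions the buffer works out to roughly 0.5% (w/v) SDS, 100 mM EDTA and 100 mM Tris-HCl.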

Table 1. List of populations sampled, along with the population densities and genetic information
Fig. 1

Map showing the sampling locations. The populations were grouped into East (green), North (red) and South (blue) clusters and are colour-coded as in the labels of the haplotype network. Inset: adult male blackbuck

Molecular work

DNA was extracted from the swabbed samples using the Wizard Genomic DNA Purification kit (Promega, Singapore) and from whole pellets using the QIAamp Fast Stool Extraction kit (Qiagen, Germany), following the manufacturers' protocols with modifications. After the first step, in which the outer layer of the pellets was scraped into a fine powder, samples were kept in extraction buffer overnight before proceeding to the next steps. DNA extracts were quality-checked using a NanoDrop spectrophotometer, aliquoted, diluted about five-fold and stored at −20 °C. For the mitochondrial marker, the D-loop sequence was obtained from the complete mitochondrial genome of A. cervicapra (GenBank accession number AP003422.1). The region was divided into four overlapping segments, which served as templates for primer design with the web tool Primer-BLAST. Default primer stringency settings were used, with the primer melting temperature (Tm) range set to 45–60 °C and a maximum Tm difference of 3 °C. Primer pairs were selected from the resulting options on the basis of maximum coverage and higher GC content. Following preliminary trials, four primer pairs (HV1D, INT, DLF3 and DLF4; Table 2), each amplifying < 300 bp, were finalized to amplify overlapping fragments spanning ~800 bp of the D-loop. Double-stranded polymerase chain reactions (PCRs) were performed using ~5–20 ng/μl of extracted DNA, 1 U Taq polymerase, 2 mM magnesium chloride (MgCl2), 0.15 mM deoxynucleotide triphosphates (dNTPs), 2 ng/ml bovine serum albumin (BSA), 10 μM forward and reverse primers and milliQ water to a final reaction volume of 10 μl. The thermocycler settings were: initial denaturation at 94 °C (5 min); 45–50 cycles of denaturation at 94 °C (30 s), annealing at 45–63 °C (30 s) and extension at 72 °C (1 min); and a final extension at 72 °C (10 min).
PCR products were run on 1% agarose gels and visualized under ultraviolet light to confirm successful amplification, and Sanger sequencing was outsourced to Medauxin and Barcode Biosciences. Each sample was sequenced in both forward and reverse directions, and the resulting sequences were aligned against GenBank data to screen out potential nuclear mitochondrial pseudogenes (NUMTs) and confirm that they represented the blackbuck mitochondrial DNA region. Sequences of unique haplotypes were submitted to GenBank (accession numbers OP794109-OP794335).

Because no published microsatellite primers were available for blackbucks, primers from related species were tested on randomly selected samples. Of the 25 bovid primer pairs tested, seven (bm302, bm415, maf70, hdz496, inra040, tgla122 and sps115; Table 2) amplified successfully during preliminary trials and were used for genotyping. PCRs were performed using ~5–20 ng/μl of extracted DNA as template, 1x Qiagen Multiplex master mix, 10 μM fluorescence-labelled forward (FAM or 6HEX) and reverse primers, 2 ng/ml BSA and milliQ water to a final volume of 10 μl. The thermocycler settings were: initial denaturation at 94 °C (5 min); 50–55 cycles of denaturation at 94 °C (30 s), annealing at 50–60 °C (30 s) and extension at 72 °C (1 min); and a final extension at 72 °C (10 min). Following the strategies for minimizing genotyping errors outlined in the Introduction, each sample was run at least three times to confirm consistent results, minimize scoring errors and verify the final allele calls. Fragment analysis of the PCR products was outsourced to Barcode Biosciences, using LIZ500 as a size standard. Care was taken to avoid exposing the reactions to direct light during setup, to prevent degradation of the fluorescently labelled primers.

Table 2. List of primers used in this study, along with details of primer sequence, labels used, fragments obtained and annealing temperature

Analyses

Mitochondrial markers

The sequences obtained were edited in Chromas v2.6.5 (technelysium.com.au/chromas.html) to correct erroneous base calls. The cleaned sequences were aligned using the MUSCLE algorithm with default parameters in MEGA v7 (Tamura et al. 2011), and the four overlapping sequences obtained for each individual were combined using MEGA and Geneious v10.1 (Kearse et al. 2012). Nucleotide and haplotype diversity for each sampling location were calculated using DnaSP v6.11.01 (Rozas et al. 2017). The concatenated dataset (> 800 bp), containing only complete sequences (without missing data), was used to calculate haplotype diversity and to generate a Roehl data file in DnaSP. This file was used to build a median-joining haplotype network in NETWORK v5.1 (fluxus-engineering.com), without star contraction and with the default epsilon values. The original median-joining network was post-processed under maximum parsimony using the Steiner algorithm (Polzin and Daneshmand 2001) to reduce the number of crossing loops and improve visualisation. The entire dataset, treated as a single population, was also used to calculate Tajima's D and Fu's FS in Arlequin v3.5.2.2 (Excoffier and Lischer 2010). The mitochondrial sequences were grouped into three clusters, North (N), East (E) and South (S), based on geographical location (Table 1), and used for an analysis of molecular variance (AMOVA) in Arlequin v3.5.2.2 to compare genetic differences between clusters and quantify the percentage of genetic variation explained within and between clusters. Finally, a Mantel test between pairwise genetic and geographic distances among samples was performed to check for isolation by distance (IBD).
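Haplotype and nucleotide diversity have simple closed forms. As an illustration only (this is not DnaSP's implementation), the Python sketch below computes Nei's unbiased haplotype diversity, n/(n − 1)(1 − Σp_i²), and nucleotide diversity π as the mean number of pairwise differences per site. It assumes equal-length, gap-free aligned sequences, matching the complete-sequence dataset described above; the four-sequence alignment is hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi).

    Assumes equal-length, gap-free aligned sequences, i.e. a complete
    concatenated dataset with no missing data.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / len(pairs) / length


# Toy alignment of four hypothetical 5-bp haplotypes (not study data).
toy = ["ACGTA", "ACGTA", "ACGTT", "TCGTT"]
print(round(haplotype_diversity(toy), 4))   # 0.8333
print(round(nucleotide_diversity(toy), 4))  # 0.2333
```

Because haplotype diversity depends only on haplotype frequencies while π depends on how different the haplotypes are, a dataset can combine very high haplotype diversity with comparatively low nucleotide diversity.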

Microsatellite data

For microsatellite screening, the 25 candidate loci were tested in > 25 samples, including individuals from all sampling locations. Most loci failed to amplify, even under multiple cycling conditions. Loci were retained for analysis if they showed clear amplification in > 50% of the test samples drawn from different sampling locations.

Each locus was amplified and scored at least three times per sample to minimize genotyping errors. Samples that did not amplify successfully or give consistent peaks across the three genotyping replicates were discarded, and null alleles, allelic dropout and scoring errors were assessed using MICROCHECKER v2.2.3 (Van Oosterhout et al. 2004). Genotyping results were viewed using the microsatellite plug-in in Geneious v10.1 to identify the size standard, and alleles were called manually after inspecting the stutter peaks. The probability of identity, Pid/PI (the probability that two individuals drawn from the same randomly mating population are misidentified as a single individual), and Pidsibs/PIsibs (the same probability for siblings, accounting for their genetic similarity) were calculated for each population separately, both per locus and cumulatively, using GenAlEx v6.503 (Peakall and Smouse 2006, 2012) in Microsoft Excel 2016. Genetic diversity was measured as the number of alleles per population (Na) and the population-wise unbiased expected (He) (Nei 1978) and observed (Ho) heterozygosities, and deviations from Hardy–Weinberg equilibrium (HWE) were tested. The complete microsatellite dataset (ignoring location information) was also used to calculate overall observed and expected heterozygosities and to test for deviations from Hardy–Weinberg equilibrium and for linkage disequilibrium in Arlequin v3.5.2.2. Allele sizes at each locus were also used to build a pairwise distance matrix (unbiased Nei's distance) between individuals, which was used for a principal coordinates analysis (PCoA) and to build a neighbour-joining tree in MEGA. Finally, pairwise FST values between populations were compared with pairwise geographic distances using a Mantel test in GenAlEx to check for isolation by distance.
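The two heterozygosity measures can be illustrated with a short Python sketch, offered as an illustration rather than GenAlEx's or Arlequin's code. Ho is the fraction of heterozygous individuals at a locus, and Nei's (1978) unbiased He is 2N/(2N − 1)(1 − Σp_i²), where p_i are allele frequencies among the 2N sampled gene copies. The genotypes (allele sizes in bp) are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of scored individuals that are heterozygous at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def unbiased_expected_heterozygosity(genotypes):
    """Nei's (1978) unbiased He: 2N/(2N - 1) * (1 - sum of squared allele frequencies)."""
    alleles = [allele for pair in genotypes for allele in pair]
    two_n = len(alleles)
    counts = Counter(alleles)
    sum_p2 = sum((c / two_n) ** 2 for c in counts.values())
    return two_n / (two_n - 1) * (1.0 - sum_p2)


# Hypothetical diploid genotypes (allele sizes in bp) at one locus.
locus = [(150, 152), (150, 150), (152, 156), (150, 156), (152, 152)]
print(observed_heterozygosity(locus))                     # 0.6
print(round(unbiased_expected_heterozygosity(locus), 4))  # 0.7111
```

An observed heterozygosity below the expected value, as in this toy locus, is the pattern that a Hardy–Weinberg test would flag as a heterozygote deficit.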

Genetic clusters were inferred using the Bayesian clustering approach implemented in STRUCTURE v2.3.4 (Pritchard et al. 2000) to test for hidden population structure, irrespective of the geographical origin of samples. STRUCTURE uses a Markov chain Monte Carlo (MCMC) approach to group individuals into K clusters based on their genotypes, without prior information on their geographical location. An admixture model was used, with a Dirichlet prior D, where the relative contributions of the K populations were modelled using K parameters. A uniform prior with a maximum of 10.0 was used, with the initial alpha set to 1.0 and its standard deviation (SD) to 0.025. Allele frequencies were assumed to be correlated among populations, and FST was allowed to differ among subpopulations ('different for different subpopulations'), with a prior mean of 0.01 and SD of 0.05. The analysis was run for K = 1 to 12, with 20 runs for each K of 10,000 iterations following a burn-in of 1000 steps. The results were input to Structure Harvester to determine the optimal number of genetic clusters using the method of Evanno et al. (2005), which is based on the rate of change in the mean log-likelihood across successive K values.
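The Evanno statistic can be written as ΔK = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / SD[L(K)], where L(K) are the ln P(D) values from the replicate runs at a given K. The Python sketch below computes this from means across runs, which is our understanding of how Structure Harvester proceeds; the use of the sample standard deviation and the toy likelihoods are assumptions for illustration, and `evanno_delta_k` is a hypothetical helper name.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """DeltaK = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)).

    lnp_by_k maps consecutive K values to the list of ln P(D) values from
    replicate runs; DeltaK is undefined for the smallest and largest K.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / stdev(lnp_by_k[k])  # sample SD across runs
    return delta


# Hypothetical ln P(D) values from three replicate runs per K.
toy_runs = {
    1: [-1000.0, -1000.0, -1000.0],
    2: [-900.0, -902.0, -904.0],
    3: [-850.0, -851.0, -852.0],
    4: [-848.0, -850.0, -852.0],
}
delta = evanno_delta_k(toy_runs)
print(delta)                      # {2: 23.5, 3: 50.0}
print(max(delta, key=delta.get))  # 3
```

Note that ΔK peaks where the likelihood curve bends most sharply relative to run-to-run variability, not where the likelihood itself is highest.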

A simulation-based approach was applied to the whole dataset to test for departure from mutation-drift equilibrium and detect signatures of past population dynamics. Two mutation models, the stepwise mutation model (SMM) and the two-phase model (TPM), were used, and the Wilcoxon signed-rank test was implemented in BOTTLENECK v1.2.02 (Cornuet and Luikart 1996). The TPM was run both with the default values for the variance of the geometric distribution and the proportion of SMM in TPM, and with variance = 0.36 and proportion = 0.0, as suggested by the authors because these correspond to sensible parameter values for most microsatellites. The analysis was run for the species as a whole, pooling samples from all sampling locations across its range into a single population.
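For each locus, BOTTLENECK compares the observed gene diversity (He) with the heterozygosity expected at mutation-drift equilibrium (Heq) given the observed number of alleles, and then applies a one-tailed Wilcoxon signed-rank test to the per-locus differences. The Python sketch below covers only that final test step: it assumes the differences He − Heq are already available (the values here are hypothetical) and computes exact one-tailed p-values by enumerating all 2^n sign assignments, which is feasible for seven loci. BOTTLENECK's own implementation may differ in detail, and the function names are illustrative.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_exact_one_tailed(diffs):
    """Exact one-tailed signed-rank p-values for (excess, deficiency).

    diffs are per-locus differences He - Heq; zero differences are dropped
    and the null distribution is built by enumerating all 2**n sign patterns.
    """
    d = [x for x in diffs if x != 0]
    ranks = average_ranks([abs(x) for x in d])
    w_obs = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive ranks
    n_ge = n_le = 0
    for signs in product((False, True), repeat=len(d)):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        n_ge += w >= w_obs
        n_le += w <= w_obs
    total = 2 ** len(d)
    return n_ge / total, n_le / total


# Hypothetical He - Heq differences for seven loci (not study data).
toy_diffs = [-0.10, -0.05, -0.20, 0.02, -0.15, -0.08, -0.12]
p_excess, p_deficiency = wilcoxon_exact_one_tailed(toy_diffs)
print(p_excess, p_deficiency)  # 0.9921875 0.015625
```

With only seven loci the smallest attainable one-tailed p-value is 1/128 ≈ 0.0078, which is worth keeping in mind when interpreting significance with a small marker panel.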

The nucleotide diversity from the mitochondrial data and the Ho and Na values from the microsatellite markers were used as proxies for genetic diversity and compared against population densities at each sampling location. Linear regression was used to test for an association between population density and genetic diversity, and analysis of variance (ANOVA) to test whether these relationships were significant.
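For a regression with a single predictor, the ANOVA F statistic equals the square of the t statistic for the slope, and adjusted R² = 1 − (1 − R²)(n − 1)/(n − 2). The stdlib-only Python sketch below, with hypothetical density and diversity values and an illustrative function name, computes these quantities. It does not compute p-values, which require the F or t distribution (for example from a statistics library).

```python
from math import sqrt


def simple_regression(x, y):
    """Ordinary least squares for y = a + b*x, with the ANOVA F and slope t."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df_res = n - 2
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_res
    f_stat = (ss_tot - ss_res) / (ss_res / df_res)  # 1 and n-2 degrees of freedom
    t_slope = slope / sqrt(ss_res / df_res / sxx)   # slope divided by its standard error
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "adj_r2": adj_r2, "F": f_stat, "t": t_slope}


# Hypothetical densities (x) and diversity values (y), not study data.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({k: round(v, 4) for k, v in fit.items()})
# For a single predictor, F equals t squared.
```

The identity F = t² also offers a consistency check on reported statistics: for example, 4.365² ≈ 19.05.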

Historical demography

Finally, to understand past demographic changes in this species, Approximate Bayesian Computation (ABC) was applied to the microsatellite dataset using DIYABC v2.0 (Cornuet et al. 2014). The three geographic clusters (North, South and East) were coded as populations, and three demographic scenarios (Fig. 5) were compared using 3 million simulated datasets. In the first scenario, the North cluster harboured the ancestral population, from which the South cluster was derived (at time t2), and the East cluster diverged from the South at a later time (t1). This scenario reflects the fact that bovids are known to have dispersed into the Indian subcontinent predominantly through the northwestern faunal gateway (Kurup 1974); since the ancestral range of antelopes is the Saharo-Arabian region (comprising northern Africa and the Middle East), antelope lineages in India are presumed to have followed a similar route (Jana and Karanth 2019). In the second scenario, the South cluster was the most ancient, and the North and East clusters were derived from it at successive times (t2 and t1, respectively). This allows for the possibility that southern India served as a refugium during the Pleistocene (Vidya et al. 2009), from which the other two clusters diverged more recently. In the third scenario, an ancestral population (NA) gave rise to the North and South clusters (at time t2), and the East cluster was derived from the South at a later time (t1). Combinations of scenarios and parameter priors were first pre-evaluated using principal component analysis (PCA), to check whether the observed dataset fell within the range of the simulated datasets. The stepwise mutation model was used, and posterior probabilities of the scenarios were computed to identify the most likely demographic history.
The Bayesian model choice was made more efficient by applying linear discriminant analysis to the summary statistics, as suggested by the authors. Each scenario was evaluated using both the direct estimate and logistic regression, comparing the number of selected simulated datasets closest to the observed dataset. Confidence in scenario choice was evaluated using posterior-based errors (test samples drawn from the simulated datasets closest to the observed dataset) and prior-based errors (test samples drawn randomly from scenario IDs and parameter values in the prior distributions), computed globally over all scenarios. In addition, the prior-based error rate of the chosen scenario was assessed separately against all other scenarios considered, to estimate the type I error rate.
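The "direct estimate" of scenario probabilities is essentially rejection ABC: keep the simulated datasets whose summary statistics lie closest to the observed ones and take each scenario's share among them. The Python sketch below illustrates that idea only; it is not DIYABC's code, it omits the logistic regression and linear discriminant steps, it assumes the summary statistics are already standardised, and the toy statistics and the `direct_estimate` name are hypothetical.

```python
from collections import Counter
from math import dist


def direct_estimate(observed, simulated, n_closest):
    """Posterior scenario probabilities by the 'direct' (rejection) approach.

    simulated is a list of (scenario_id, summary_statistics) pairs. The
    n_closest datasets by Euclidean distance to the observed statistics are
    retained, and each scenario's probability is its share of those datasets.
    Summary statistics are assumed to be already standardised.
    """
    ranked = sorted(simulated, key=lambda rec: dist(rec[1], observed))
    kept = Counter(scenario for scenario, _ in ranked[:n_closest])
    scenarios = sorted({scenario for scenario, _ in simulated})
    return {s: kept[s] / n_closest for s in scenarios}


# Hypothetical simulated datasets: (scenario, (stat1, stat2)).
sims = [
    (1, (0.1, 0.0)),
    (1, (0.0, 0.2)),
    (2, (0.5, 0.5)),
    (3, (0.05, 0.05)),
    (3, (0.1, 0.1)),
    (2, (1.0, 1.0)),
]
est = direct_estimate((0.0, 0.0), sims, n_closest=3)
print(est)
```

In practice the retained fraction is much larger (thousands of the millions of simulations), and the estimate is reported for several values of n_closest to check its stability.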

Fig. 2

Haplotype network of all samples showing unique haplotypes for all locations

Results

Mitochondrial dataset

About 400 individuals were sampled for both the mitochondrial and microsatellite analyses, and ~50% of them amplified successfully in each case. Because extraction from fecal samples yielded low-quality, fragmented DNA, separate PCRs were performed to amplify four smaller, overlapping regions of the D-loop, which were then concatenated. Unambiguous results from each sequencing reaction were selected, and the final dataset consisted of 810 bp of D-loop sequence from 227 individuals. The three samples from Jayamangali failed to amplify the first ~300 bp of the D-loop, which contains hypervariable region 1, so these samples were removed from the dataset used to build the haplotype network in NETWORK. A total of 186 haplotypes were found in the dataset. The network showed distinct haplotypes at all sampling locations but no geographical clustering of haplotypes (Fig. 2). The Mantel test detected an overall pattern of isolation by distance across the study region (r = 0.25, p < 0.01).

Fig. 3

Neighbour joining tree built using pairwise genetic distance between all samples, from 7 microsatellite loci. The highlighted region shows the clusters where samples from Bhetanai (East) are present. The sample labels are colour coded according to the North (red), South (blue) and East (green) clusters

The analysis of molecular variance (AMOVA) showed that 89.52% of the variation was explained within the three geographic clusters (North, South and East) and only 10.48% among clusters. These geographic clusters were delimited based on the STRUCTURE analyses of microsatellite data (see below). For the species as a whole, haplotype diversity was high (0.997) while nucleotide diversity was low (0.0667). Tajima's D (−1.474) and Fu's FS (−23.877) were both significantly negative. Pairwise FST values between clusters indicated 9.57% differentiation between the North and South clusters, 10% between the South and East, and 13.45% between the North and East, all significant (p << 0.01). Nucleotide diversity from the mitochondrial D-loop data was significantly correlated with population density at the sampling locations (F = 19.05, p < 0.01; adjusted R² = 0.6673), with mitochondrial genetic diversity positively associated with blackbuck density (t = 4.365, p = 0.0024).
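Tajima's D contrasts two estimates of the population mutation parameter: the mean number of pairwise differences (k) and the estimate based on the number of segregating sites (S/a1). The Python sketch below implements Tajima's standard formula for gap-free aligned sequences, as an illustration rather than Arlequin's code; it does not reproduce the significance test, which Arlequin obtains by coalescent simulation. Negative values arise when low-frequency variants are in excess relative to neutral, constant-size expectations, as the hypothetical star-like sample shows.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D from aligned, gap-free sequences of equal length."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(len({seq[i] for seq in seqs}) > 1 for i in range(length))
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    # k: mean number of pairwise differences between sequences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples: intermediate-frequency variants vs. singletons only.
intermediate = ["ACGTA", "ACGTA", "ACGTT", "TCGTT"]
star = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
print(round(tajimas_d(intermediate), 3), round(tajimas_d(star), 3))
```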

Microsatellite dataset

In our experiments, only seven of the tested loci amplified successfully. Successful amplification of microsatellite markers from fecal samples is affected by many factors, including climatic conditions, sample preservation, time between collection and DNA extraction, PCR inhibitors, improper annealing and low DNA template. Other work has also shown that a large proportion of markers screened for blackbuck could not be amplified, suggesting mutated or non-conserved flanking regions at the primer binding sites (De et al. 2021). Extensive allelic dropout and null alleles also hinder accurate genotyping, making it challenging to obtain a large panel of microsatellite markers for diversity-based statistical analyses. Further, genotyping errors occur frequently with microsatellite markers, especially with non-invasive samples. Such errors may introduce a positive bias that accumulates with the number of markers used (Creel et al. 2003). We obtained a final dataset of 213 of the 400 sampled individuals, which showed consistent peaks in the genotyping results. MICROCHECKER did not show evidence of large allelic dropout at any of the seven loci. Null alleles were indicated at the sampling sites (at one to six loci per site), although in each case they were inferred from homozygote excess. The repeat regions could not be amplified for most samples from Challakere, so this location was dropped from further microsatellite analyses (Table 1). All seven loci were polymorphic, with 8–20 alleles each; the mean number of alleles per site ranged from 3.143 to 7.714 and did not depend on population density (linear regression: r = 0.505, p >> 0.05).
All loci showed an overall deviation from HWE, although 34 of the 77 possible locus-site combinations (7 loci × 11 populations) did not (p > 0.01). Expected heterozygosity ranged from 0.509 to 0.730, and observed heterozygosity was lower (0.256–0.579). Unlike nucleotide diversity from mtDNA, microsatellite heterozygosity showed no significant correlation with population density at the sampling locations (p >> 0.05). The cumulative Pid/PI was < 0.000001 in most cases, the exception being Kolar (0.0001), corresponding to a << 0.001% probability that two individuals were mistakenly considered as one, and a < 0.01% chance that two siblings were misidentified as a single blackbuck. The cumulative Pid/PI and Pidsibs/PIsibs values indicated that a minimum of five microsatellite loci was required for robust discrimination between two blackbucks (PID < 0.001 and PIDsib < 0.01).
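The probability-of-identity values follow from allele frequencies: per locus, PI = Σp_i⁴ + Σ_{i<j}(2p_ip_j)² and PIsibs = 0.25 + 0.5Σp_i² + 0.5(Σp_i²)² − 0.25Σp_i⁴, and cumulative values are products across loci, assuming independent loci. The Python sketch below uses these commonly cited expressions (we believe GenAlEx uses the same ones, but this is not its code) and finds how many loci are needed to fall below a threshold. Combining loci from most to least informative is our assumption, and the allele frequencies are hypothetical.

```python
from itertools import combinations


def pi_unrelated(freqs):
    """Probability of identity for unrelated individuals at one locus."""
    s4 = sum(p ** 4 for p in freqs)
    cross = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return s4 + cross


def pi_sibs(freqs):
    """Probability of identity among full siblings at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def min_loci(per_locus_values, threshold):
    """Smallest number of loci whose cumulative product falls below threshold.

    Loci are combined from most to least informative (lowest value first);
    returns None if all loci together never reach the threshold.
    """
    cumulative = 1.0
    for n_loci, value in enumerate(sorted(per_locus_values), start=1):
        cumulative *= value
        if cumulative < threshold:
            return n_loci
    return None


# Hypothetical locus with four equally frequent alleles, repeated seven times.
freqs = [0.25, 0.25, 0.25, 0.25]
print(pi_unrelated(freqs))                         # 0.109375
print(min_loci([pi_unrelated(freqs)] * 7, 0.001))  # 4
print(min_loci([pi_sibs(freqs)] * 7, 0.01))        # 6
```

Because PIsibs is always much larger than PI for the same locus, the sibling criterion usually determines the minimum number of loci.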

The neighbour-joining tree (Fig. 3) showed no particular structure for most sampling locations, except Bhetanai, most of whose samples fell in a single cluster (four samples were closer to other populations). This was further supported by the PCoA of blackbucks across eleven sites: the first axis explained 10.95% of the variation, and the first three axes combined explained 26.7% (Supplementary Fig. S1). The Mantel test showed no correlation between pairwise FST values between sampling locations and their geographical distances (r << 0.001, p >> 0.05), indicating no isolation by distance.
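A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one matrix together. The Python sketch below is a generic one-tailed permutation version for illustration, not GenAlEx's implementation; the six equally spaced sites, the number of permutations and the function names are assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries with rows and columns relabelled by order."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel_test(genetic, geographic, permutations=999, seed=1):
    """Return (r, one-tailed p) for a positive association between two distance matrices."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of the genetic matrix together
        if pearson(upper_triangle(genetic, perm), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: six sites on a line, genetic distance rising with distance.
coords = [0, 1, 2, 3, 4, 5]
geo_d = [[abs(a - b) for b in coords] for a in coords]
gen_d = [[0.01 * abs(a - b) for b in coords] for a in coords]
r, p = mantel_test(gen_d, geo_d)
print(round(r, 3), p)
```

With few sampling locations the number of distinct permutations is limited (720 for six sites), which bounds how small the permutation p-value can meaningfully get.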

Fig. 4

a ΔK values for each K tested in STRUCTURE. b Bar plot of STRUCTURE results arranged according to the three clusters (K = 3)

STRUCTURE and Structure Harvester identified K = 3 as the optimal value, with the highest ΔK (29.946), indicating three most likely partitions (Fig. 4). Cluster 1 (C1, Eastern cluster) consisted almost entirely of individuals from Bhetanai, whereas cluster 2 (C2, Northern cluster) was mostly composed of individuals from Velavadar, Tal Chhappar, Uttar Pradesh and Jayamangali, and cluster 3 (C3, Southern cluster) mainly of individuals from Rollapadu, Point Calimere, Kolar and Ranebennur. Samples from Nannaj were almost evenly split between clusters 2 and 3 (9 in C2 and 6 in C3), and two thirds of the animals from Timbaktu belonged to C3 and the rest to C2 (details of sampling locations in Supplementary Fig. S2).

Fig. 5

Historical demographic scenarios tested using DIYABC. The green, red and blue regions correspond to the East (Pop1), North (Pop2) and South (Pop3), respectively. Results indicated Scenario 3 as the most likely scenario (see Supplementary Fig. S3)

The Wilcoxon signed-rank test implemented in BOTTLENECK showed significant heterozygosity deficiency under both the SMM (p = 0.012) and TPM (p = 0.027) models (with the parameters suggested for microsatellites).
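BOTTLENECK's one-tailed Wilcoxon test compares, locus by locus, the gene diversity computed from allele frequencies (He) with the value expected at mutation-drift equilibrium (Heq) under a given mutation model. The sketch below computes an exact one-tailed p-value for heterozygosity deficiency by enumerating every sign assignment of the ranked differences. It assumes the Heq values are already available (BOTTLENECK estimates them by simulation under SMM or TPM, which is not reproduced here); the per-locus numbers and function names are hypothetical.

```python
from itertools import product


def _average_ranks(values):
    # Rank values from 1..n, giving tied values the mean of their ranks
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_deficiency_p(he, heq):
    # Differences He - Heq; zero differences are dropped, as is conventional
    diffs = [a - b for a, b in zip(he, heq) if a != b]
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern is equally likely; a deficiency gives small W+
    at_most = 0
    total = 0
    for signs in product((0, 1), repeat=len(diffs)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) <= w_plus + 1e-12:
            at_most += 1
    return w_plus, at_most / total


# Hypothetical values for seven loci, all showing He < Heq
he = [0.60, 0.62, 0.55, 0.70, 0.58, 0.66, 0.52]
heq = [0.65, 0.70, 0.61, 0.74, 0.66, 0.71, 0.60]
w_plus, p_deficiency = wilcoxon_deficiency_p(he, heq)
```

Exact enumeration is practical only for small numbers of loci (2ⁿ sign patterns), which suits a seven-locus panel.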

Historical demography

The first two axes of the PCA plot generated using DIYABC explained more than 55% of the variation. The observed dataset lay within the space of the simulated datasets, and most summary statistics did not differ significantly between the observed and simulated datasets. Both the direct estimate and the logistic regression identified scenario 3 as the best historical model of demographic change, by a considerable margin (Fig. 5; Supplementary Fig. S3). In this scenario, the ancestral population gave rise to the North and South clusters, and the East cluster was derived from the South at a later period. Further evaluation of confidence in our scenario choice showed a low probability of type I error (i.e., of rejecting scenario 3 when it is in fact the correct scenario), giving more validity to the results.
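The direct estimate mentioned above ranks scenarios by their share among the simulated datasets closest to the observed data. A minimal rejection-style sketch is shown below, assuming Euclidean distance on summary statistics scaled by their standard deviation across simulations; the scaling choice, the function name and the toy data are assumptions rather than DIYABC's internals, and the logistic-regression estimate is not reproduced.

```python
from collections import Counter
from math import sqrt
from statistics import pstdev


def direct_estimate(simulated, observed, n_closest):
    # simulated: list of (scenario_label, tuple_of_summary_statistics)
    # Scale each statistic by its spread across simulations so that no single
    # statistic dominates the distance (assumed scaling choice).
    scales = []
    for i in range(len(observed)):
        sd = pstdev([stats[i] for _, stats in simulated])
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats):
        return sqrt(sum(((s - o) / sc) ** 2 for s, o, sc in zip(stats, observed, scales)))

    closest = sorted(simulated, key=lambda item: distance(item[1]))[:n_closest]
    counts = Counter(label for label, _ in closest)
    labels = sorted({label for label, _ in simulated})
    # Posterior probability of each scenario = its share of the retained datasets
    return {label: counts[label] / n_closest for label in labels}


# Hypothetical simulations: scenario 1 near (0, 0), scenario 3 near (5, 5)
sims = [
    (1, (0.0, 0.1)), (1, (0.1, 0.0)), (1, (0.2, 0.2)),
    (3, (5.0, 5.1)), (3, (4.9, 5.0)), (3, (5.2, 4.8)),
]
posterior = direct_estimate(sims, observed=(5.0, 5.0), n_closest=3)
```

In practice the number of retained datasets is a tuning choice, and results are usually checked over several values of `n_closest`.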

Discussion

Habitat loss and fragmentation are known to influence genetic diversity in many organisms, although the extent of such effects has not been studied in grassland ecosystems in India. Here, we focus on an endemic antelope, the blackbuck (Antilope cervicapra), to elucidate patterns of genetic structure within the species and to make inferences on its past demography, using both nuclear (microsatellite) and mitochondrial markers. Since we were interested in both the phylogeography and the population genetics of the species, we prioritised maximising coverage of the blackbuck's geographical distribution. Although we collected ~ 400 samples during fieldwork, logistical constraints limited larger-scale sample collection, immediate DNA extraction and amplification of the complete mitochondrial D-loop. Non-invasive genetic studies are known to suffer from poor DNA quality and quantity (ref), and one way of ensuring accurate results is to discard samples below a certain reliability threshold. Our final tally of ~ 230 samples reported in this study comprises those that showed robust amplification across the complete mitochondrial D-loop as well as successful and consistent genotyping in three replicates. Although the study uses a small set of microsatellite markers, the combination of seven loci provided sufficient resolution to distinguish correctly between individuals. Since population genetic parameters and frequency-based diversity analyses are less affected by genotyping errors than individual identification is (Pompanon et al. 2005), the seven selected loci should have sufficient resolution for further genetic analysis of the focal species.

The mitochondrial haplotype network did not show any geographical clustering, although each sampling location had unique haplotypes. The lack of shared haplotypes even between blackbucks from nearby regions suggests restricted female dispersal (Kerth et al. 2000; Lappan 2007). This is further supported by signatures of isolation by distance in the maternally inherited mitochondrial DNA but not in the microsatellite data, suggesting that males might be the dispersing sex in this species. Many mammals show similar patterns owing to female philopatry, or male-biased dispersal, and a polygynous mating system (Greenwood 1980; Clutton-Brock 1989; Waterman 2008; Nutt 2008).

The neighbour-joining tree generated from the microsatellite data also did not show geographical clustering of samples, with one notable exception: most samples from Bhetanai (the East cluster) fell in a single cluster, a pattern also highlighted by the PCoA, which showed notable separation between samples from Bhetanai and those from the other sites. The Bayesian clustering method implemented in STRUCTURE supported three optimal clusters, one of which primarily contained samples from the East (Bhetanai), while the other two consisted mostly of samples from the North (Velavadar, Tal Chhappar and Haliya) and the South (Rollapadu, Point Calimere, Ranebennur, Kolar, Timbaktu). When these geographic clusters were delineated in the mitochondrial data for comparison, a small, albeit significant, percentage of the genetic variation was explained by differences between clusters.

Although the microsatellite diversity indices showed no significant correlation with population density, mitochondrial nucleotide diversity was positively correlated with population density across the sampled regions, with larger populations showing higher mitochondrial genetic diversity. For nuclear markers, the lack of correlation could again reflect greater dispersal by male blackbucks and hence greater geneflow. In contrast, restricted movement of blackbuck females, coupled with the smaller effective population size of the mitochondrial genome (about one quarter that of nuclear autosomal loci; Moore 1995), may have contributed to the observed correlation between mitochondrial nucleotide diversity and population density.

Whereas the mitochondrial haplotype network does not support geographic clustering of haplotypes, the microsatellite data show some genetic clustering into three populations. Simulation-based approximate Bayesian computation strongly supported a past demographic scenario in which an ancestral population gave rise to two genetic clusters, corresponding to populations in the North and South. The third population diverged from the South cluster at a later time, giving rise to the current population in the East. These clusters are congruent with potential geographical barriers to species distribution and movement (Mani 1974; Ripley and Beehler 1990). Peninsular India is separated from northern India by several hill ranges, most prominently the Satpura range, flanked by the Vindhyan and Ajanta hills (Thangaraj et al. 2010). The Narmada and Tapti river basins are also among the prominent barriers (Thangaraj et al. 2010; Ramachandran et al. 2017). In addition, northern India experiences longer periods of extreme temperature, in both summer and winter, than much of peninsular India (Roy 2019). These two regions correspond to the areas encompassing the South and North clusters in our study. The East cluster, although falling within peninsular India, is also isolated by forests and hill ranges of the northern Eastern Ghats, as well as by the Mahanadi and the Godavari and Krishna river basins on either side (Ramachandran et al. 2017). The highly fragmented series of hills in the southern Eastern Ghats may not pose as strong a barrier to the Point Calimere population, which is also located on the East Coast and is one of the southernmost populations of this species.
Although most of these relief features have been considered refugia or stepping-stones that aided the expansion of wet-zone elements (Hora 1949), the same topography could also have disrupted the movement of dry-zone species, even if it did not form a stringent barrier. However, signatures of historical demography can be distorted by recent geneflow. In the case of blackbucks, our molecular data do not support strong population genetic structure, suggesting the possibility of such recent geneflow. Hence, the demographic scenario proposed here needs further validation.

The BOTTLENECK analysis of the entire species across its range showed significant heterozygosity deficiency under both the SMM and TPM models, which is indicative of recent population growth (Cornuet and Luikart 1996; Sonsthagen et al. 2017). In the mitochondrial data, significantly negative Tajima's D (Tajima 1996) and Fu's FS (Fu 1997), together with low nucleotide but high haplotype diversity, also point to recent expansion in this species. Large numbers of blackbucks were killed in India during the pre-independence era (before 1947) (Hughes 2009; Gilmour 2018), and poaching remained prevalent thereafter, resulting in a precipitous population decline that warranted threatened status for this species (Mallon 2008). However, recent surveys indicate a rise in numbers (IUCN SSC Antelope Specialist Group 2017). The protection accorded to this species under the Wildlife (Protection) Act, 1972 might have helped stabilize blackbuck populations. Furthermore, blackbucks seem to be well adapted to woodlands, scrublands, agricultural fields and even plantations. Thus, a combination of better protection and the species' adaptability to human-modified landscapes probably aided its recent demographic expansion, despite the continued loss and fragmentation of grasslands brought about by anthropogenic activities. This demographic expansion might in turn have resulted in the lack of mitochondrial and nuclear population genetic structure (Excoffier et al. 2009; Garcia-Cisneros et al. 2016; Sabuni et al. 2016; Wereszczuk et al. 2017). However, much of the geneflow appears to be male-mediated, and it remains to be seen how restricted female movement might shape the demography of this species in the future.
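The mitochondrial statistics cited above can be computed from aligned sequences. The sketch below implements Tajima's D with the standard constants (a1, a2, b1, b2, c1, c2, e1, e2) and Nei's unbiased haplotype diversity, assuming equal-length sequences without gaps or missing data. The toy star-shaped sample, in which every variant is a singleton as expected after a rapid expansion, is hypothetical, and Fu's FS is omitted because it requires Ewens sampling probabilities.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    # Returns (pi, number of segregating sites, D); D is None if no site varies
    n = len(seqs)
    seg_sites = sum(1 for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1)
    pairs = list(combinations(seqs, 2))
    # pi: mean number of pairwise differences
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if seg_sites == 0:
        return pi, 0, None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its SD
    d = (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return pi, seg_sites, d


def haplotype_diversity(seqs):
    # Nei's unbiased estimator: n/(n-1) * (1 - sum of squared haplotype frequencies)
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


sample = ["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA"]
pi, s_sites, d = tajimas_d(sample)
hd = haplotype_diversity(sample)
```

An excess of rare variants pushes pi below the Watterson estimate S/a1, giving a negative D, while every sequence being a distinct haplotype gives maximal haplotype diversity.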

Conclusion

Our study sheds light on the genetic diversity of an endemic antelope species across its range and uncovers several interesting patterns. Males of this species appear to disperse more than females, as indicated by isolation by distance in mitochondrial but not nuclear markers, along with unique mitochondrial haplotypes in each region. We identify three genetic clusters that coincide with current geographic regions, potentially shaped by a host of biogeographic barriers in the Indian subcontinent. Our results indicate that a single ancestral population gave rise to blackbuck populations in the North and South, while the East population arose more recently from the South and is more distinct than the rest. Recent population expansion in A. cervicapra, as observed in both nuclear and mitochondrial analyses, together with its increasing adaptation to human-modified landscapes, points to the stability of this species and renews hopes for its long-term survival. This study adds to the growing body of phylogeographic work on the Indian subcontinent and shows how historic (geographic) and current anthropogenic processes might together be shaping the demography of species. Future research that considers genetic markers with different evolutionary histories would give a clearer overall picture and help in designing conservation strategies, especially for endangered or vulnerable species.