Introduction

Developing an effective conservation strategy for long-ranging mammalian species, which often traverse human-modified landscapes, requires fine-scale information on resource requirements (Sanderson et al. 2002). Multiple recent studies across various taxa have reported individual as well as sex-linked variability in patterns of resource use, reflecting differing physiological requirements at various life-history stages (Sprogis et al. 2018; Ofstad et al. 2019; Carrasco et al. 2020; Johnson and Derocher 2020; Rehnus and Bollmann 2020). Individual-level studies of animals with distinct morphological marks or tags can track their life histories while also providing population-level information such as abundance and survivorship, but they often require ample resources (Galvis et al. 2014; Sadhu et al. 2017). For species lacking such markings, invasive physical tags (e.g. coloured bands or collars, ear tags, transmitters and skin brands) are used; these may be lost over time and are cost- and time-intensive (Woods et al. 1999).

With recent advances, permanent genetic tags, such as multi-locus microsatellite genotypes, and molecular sexing from non-invasive faecal samples overcome these limitations and are therefore useful for long-term monitoring across large landscapes and over the focal species’ lifespan. Molecular tracking and faecal DNA-based abundance estimation have been used as cost-effective alternatives to physical tagging or camera trapping, with comparable accuracy and precision (Ernest et al. 2000; Janečka et al. 2011; Hedges et al. 2013; Caniglia et al. 2014; Gray et al. 2014).

However, some key parameters for obtaining reliable demographic estimates from non-invasive sampling (e.g. the age of the individual and the decay rate of deposited faeces) have been calibrated for only a limited number of species (Eggert et al. 2003; Flagstad et al. 2012; Hedges et al. 2013; Poutanen et al. 2019). Additionally, obtaining reliable genotypes from low-quality faecal DNA is a substantial challenge for molecular tagging because of intrinsic errors, i.e. allelic drop-outs and false alleles (Fernando et al. 2003a; Scandura et al. 2006; Sethi et al. 2014). Too few microsatellites, lacking adequate resolution, produce ‘shadow’ genotypes by merging individual identities, thereby underestimating the actual count (Mills et al. 2000; Sethi et al. 2014). Conversely, ‘ghost’ individuals arise when samples from the same individual generate non-identical genotypes owing to accumulated genotyping errors (Creel et al. 2003; Lampa et al. 2013; Sethi et al. 2014). The proportion of ‘ghost’ individuals is positively correlated with the number of microsatellite loci used for individual identification and can inflate counts of unique individuals by up to five-fold (Creel et al. 2003). Multiple studies (Wang et al. 2012; Rothstein et al. 2016; Wang 2016) have developed algorithms that incorporate genotyping error rates to minimise ‘ghost’ errors.
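The link between marker number and ‘ghost’ individuals can be illustrated with a back-of-the-envelope calculation. The sketch below is our own simplified illustration, not a model from the cited studies: it assumes a constant, hypothetical per-locus error rate e that is independent across loci, so that a multilocus genotype carries at least one error with probability 1 − (1 − e)^L for L loci.

```python
def p_any_error(per_locus_error: float, n_loci: int) -> float:
    """Probability that a multilocus genotype carries at least one locus error,
    assuming independent errors at a constant per-locus rate (a simplification)."""
    return 1.0 - (1.0 - per_locus_error) ** n_loci


if __name__ == "__main__":
    e = 0.05  # hypothetical per-locus error rate, for illustration only
    for n_loci in (5, 10, 14):
        print(f"{n_loci:>2} loci: P(at least one error) = {p_any_error(e, n_loci):.3f}")
```

With e = 0.05, the chance that a genotype contains at least one error rises from about 0.23 with 5 loci to about 0.51 with 14 loci, which is why each added locus raises the risk of splitting one individual into several ‘ghosts’.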

The Asian elephant (Elephas maximus) has been extirpated from 95% of its historical range, and in India its geographic distribution has shrunk by 70% since the 1960s (Sukumar 2006). Additionally, the loss of a quarter of elephant habitat in India within the last century (Padalia et al. 2019) underscores the importance of regularly monitoring elephant populations to understand habitat use, movement patterns, abundance and the dynamics of human-elephant interactions. Monitoring elephants individually based on features such as body shape, ears, tail and tusks requires enormous effort to maintain such a database over the species’ lifespan and across the vast landscapes it inhabits (Morley and van Aarde 2007; de Silva et al. 2011; Goswami et al. 2019).

For Asian elephants, non-invasive genetic sampling (gNIS) has been employed to address a broad range of research topics, including population monitoring (Vidya et al. 2007; Flagstad et al. 2012; Hedges et al. 2013; Chakraborty et al. 2014; Gray et al. 2014; Zhang et al. 2015), social organization (Vidya and Sukumar 2005a; Ahlering et al. 2011a), population and landscape genetics (Fernando et al. 2003b; Vidya et al. 2005; Goossens et al. 2016; De et al. 2021; Parida et al. 2022), demographic history (Sharma et al. 2018) and phylogeography (Vidya et al. 2009). However, the marker sets used by different research groups often do not overlap, making the data difficult to compare, as is the case for microsatellite data from most other species (Garner et al. 2005; Li and Kimmel 2013). Generating harmonized, comparable data is critical for landscape-level conservation planning, especially for threatened taxa with wide distribution ranges (de Groot et al. 2016).

Molecular sexing in elephants has been carried out using one of the following strategies: restriction fragment length polymorphism (RFLP) analysis (Fernando and Melnick 2001; Vidya et al. 2003; Ahlering et al. 2011a; Hedges et al. 2013); polymerase chain reaction (PCR) followed by gel-based evaluation (Gupta et al. 2006; Vidya et al. 2007; Munshi-South et al. 2008; Ahlering et al. 2011b; Chakraborty et al. 2014); or PCR followed by fragment analysis of fluorescent dye-labelled products, or a quantitative PCR (qPCR)-based assay (Aznar-Cormano et al. 2021).

A gNIS-based method for cost-effective identification of unique individuals and their sex is crucial for augmenting information on elephant ecology and demography and for understanding patterns of human-elephant interactions at the landscape scale (Chiyo et al. 2011; Chakraborty et al. 2014). Therefore, we aimed to propose a microsatellite panel with sufficient resolution while using the minimum number of microsatellites, to avoid compounding genotyping errors that inflate counts of unique individuals (Creel et al. 2003). We prioritized co-amplification of the microsatellite markers with a sexing marker in a single multiplex, thereby minimizing reagent and plasticware consumption and the chance of human error during handling. Using degraded elephant faecal DNA, the specific objectives of this study were: (i) to standardize a co-amplifiable multiplex panel, consisting of a sex-linked marker and microsatellite markers, that provides low misidentification probability and genotyping error rates, and (ii) to empirically validate the proposed microsatellite panel using a larger set of published markers.

Materials and methods

Study area

The sampling for this study was conducted in the human-dominated landscape around Rajaji Tiger Reserve (RTR) in the Shivalik Elephant Reserve (SER), with the majority of the samples being from areas under Haridwar and Dehradun Forest Divisions (FD), Uttarakhand, north-west India.

Field sampling and DNA extraction

We opportunistically collected spatially segregated elephant faecal samples (n = 149, Fig. 1a) from the landscape between August 2014 and May 2018. We placed the samples in sterile containers (50 ml) over silica gel following Wasser et al. (1997). Within 30 days of collection, the samples were oven-dried at 56 °C, placed ~15 cm apart from one another (~10 samples per rack). We set the oven’s airflow velocity to the minimum to reduce the chance of cross-contamination. The dried samples were capped and stored at room temperature away from sunlight for up to 32 months until DNA extraction.

Fig. 1

Locations of (a) the elephant faecal samples collected (n = 149) and (b) the samples used in analyses (n = 105), along with their identified sex, in the vicinity of Rajaji Tiger Reserve, Uttarakhand, India

Faecal samples of multiple mammalian species were routinely dried together in the same oven, so we assessed the possibility of cross-contamination based on > 1000 DNA sequences from samples processed using this protocol. Downstream Sanger sequencing of amplicons (~400 bp) generated with universal mammalian primers consistently yielded distinct sequences of the target species (Goyal et al. unpublished data), as identified using nucleotide BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). On scrutiny of the raw chromatograms (Q > 20), none of the samples displayed ambiguous peaks at the variable sites. Similar oven-drying protocols have been used in non-invasive genotyping studies to desiccate samples before long-term storage (Murphy et al. 2000; Borthakur et al. 2011, 2013; De et al. 2021, 2022). We therefore consider the chance of the results being affected by cross-contamination during oven drying of elephant faecal samples to be negligible.

We scraped the top layer of the dung boluses with a sterile blade into 2 ml polypropylene centrifuge tubes. We isolated DNA from the faecal scrapings with the QIAamp DNA Stool Mini Kit (Qiagen GmbH, Hilden, Germany), following the manufacturer’s stool DNA extraction protocol after overnight digestion in stool lysis buffer at 56 °C. We carried out all DNA extractions in a separate facility dedicated to low-quantity DNA isolation, including negative controls to track contamination.

PCR amplification

Selection of microsatellite markers

We initially selected a published multiplex microsatellite panel (EMU13, EMU17, LafMS02 and LafMS05; Nyakaana and Arctander 1998; Kongrit et al. 2008) that showed high amplification success with faecal DNA (De et al. 2021). We modified the combination by excluding one marker (EMU17), which produced inconsistent ‘stutter’ bands with degraded samples during initial testing (unpublished data), and replacing it with two loci (EMU07 and EMU12; Kongrit et al. 2008) with higher success and lower error rates (De et al. 2021). The final panel thus consisted of five dinucleotide microsatellite loci: EMU07, EMU12, EMU13, LafMS05 and LafMS02. Among several panels tested using a blind-test approach, this combination assigned the highest number of individuals to their respective identities (De et al. 2022).

Molecular sexing

For molecular sex identification from faecal DNA, we selected an elephant-specific, Y-chromosome-linked Amelogenin marker (Ahlering et al. 2011b) for co-amplification with the microsatellite panel. The sex marker, AMELY2, produces a 121 bp amplicon in Asian elephant males and does not amplify in females (Ahlering et al. 2011b). A sample was assigned female origin when the AMELY2 fragment did not amplify while at least four microsatellites co-amplified, and male origin when AMELY2 co-amplified with at least four microsatellites.
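The sex-assignment rule can be written as a small decision function. Requiring at least four co-amplified microsatellites acts as an internal positive control, so that a failed reaction is not mistaken for an AMELY2-negative (female) result. This is a minimal sketch of the stated criteria for a single sample; the function name and the `None` (undetermined) outcome are our own illustrative choices.

```python
from typing import Optional

MIN_MICROSATS = 4  # minimum co-amplified microsatellites required (from the text)


def assign_sex(amely2_amplified: bool, n_microsats_amplified: int) -> Optional[str]:
    """Apply the AMELY2 + microsatellite co-amplification rule to one sample.

    Returns "male", "female", or None when too few microsatellites amplified
    to distinguish a true AMELY2 absence from a failed PCR.
    """
    if n_microsats_amplified < MIN_MICROSATS:
        return None  # insufficient internal positive control: sex undetermined
    return "male" if amely2_amplified else "female"


print(assign_sex(True, 5), assign_sex(False, 4), assign_sex(False, 2))
```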

Reaction composition and conditions

The forward primers of the microsatellite panel (n = 5) and the sexing marker (n = 1) were labelled with fluorescent dyes of the G5 dye set (Applied Biosystems, CA, USA). Each 10 µl reaction consisted of 5 µl Qiagen Multiplex PCR Master Mix, 10 µg bovine serum albumin (BSA), 2 µl DNA template, 1.0 µl labelled primer cocktail containing equal proportions of the 10 µM microsatellite and sexing primers, and nuclease-free water to make up the volume. The thermal cycling profile (Veriti thermocycler, Applied Biosystems) consisted of initial denaturation at 95 °C for 15 min; 45 cycles of denaturation at 95 °C for 30 s, annealing for 1 min and extension at 72 °C for 40 s, with touchdown annealing from 62 to 52 °C (a drop of 1 °C every two cycles up to the 20th cycle, then 52 °C for the remaining 25 cycles); a final extension at 60 °C for 30 min; and a hold at 4 °C. To screen PCR performance, we visualized the products on a 2% w/v agarose gel stained with ethidium bromide.
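As one reading of the touchdown profile (62 °C for two cycles, then 1 °C lower every two cycles through cycle 20, then 52 °C for the remaining 25 cycles), the per-cycle annealing temperatures can be generated as follows. The function and its parameter names are hypothetical; the defaults reproduce the values given in the text.

```python
from typing import List


def touchdown_schedule(start: float = 62.0, end: float = 52.0, step: float = 1.0,
                       cycles_per_step: int = 2, total_cycles: int = 45) -> List[float]:
    """Annealing temperature (°C) for each cycle of a touchdown PCR."""
    temps: List[float] = []
    t = start
    # Touchdown phase: hold each temperature for cycles_per_step cycles, stopping above `end`.
    while t > end and len(temps) < total_cycles:
        temps.extend([t] * cycles_per_step)
        t -= step
    temps = temps[:total_cycles]
    # Plateau phase: remaining cycles at the final annealing temperature.
    temps.extend([end] * (total_cycles - len(temps)))
    return temps


schedule = touchdown_schedule()
print(schedule[:6], "...", schedule[18:22])
```

With the defaults, cycles 19 and 20 anneal at 53 °C and cycles 21 to 45 at 52 °C, giving 20 touchdown cycles plus 25 plateau cycles.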

We followed the multi-tube approach (Taberlet et al. 1996), with minor modifications, to limit genotyping errors. Instead of the two-step protocol amplifying up to seven replicates (Taberlet et al. 1996), we replicated the multiplex reaction four times (Ruiz-González et al. 2013; Bhatt et al. 2020; De et al. 2021) for each DNA isolate (n = 149).
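To see why four replicates can be sufficient, the chance of reaching the correct consensus can be approximated with a binomial model. This sketch rests on simplifying assumptions not made in the text: each replicate independently shows the true genotype with probability 1 − ADO (heterozygotes) or 1 − FA (homozygotes), complete amplification failure is ignored, and the mean error rates reported in the Results are used purely as illustrative inputs. The consensus thresholds (two matching heterozygous replicates, three matching homozygous replicates) are those described under Data analysis.

```python
from math import comb


def p_at_least(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


N_REPLICATES = 4
ADO_RATE = 0.11  # mean allelic dropout rate from the Results (illustrative input)
FA_RATE = 0.05   # mean false allele rate from the Results (illustrative input)

# Heterozygote: consensus needs >= 2 replicates showing both alleles.
p_het_consensus = p_at_least(2, N_REPLICATES, 1 - ADO_RATE)
# Homozygote: consensus needs >= 3 replicates showing only the true allele.
p_hom_consensus = p_at_least(3, N_REPLICATES, 1 - FA_RATE)

print(f"P(correct heterozygous consensus) ~ {p_het_consensus:.3f}")
print(f"P(correct homozygous consensus)   ~ {p_hom_consensus:.3f}")
```

Under these assumptions the probabilities come out at about 0.995 and 0.986; observed success is lower because whole replicates can also fail to amplify.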

All PCR sets included positive and negative controls to track PCR failure and reagent contamination. We mixed 1 µl of each PCR product with 8.93 µl Hi-Di formamide and 0.07 µl GeneScan 500 LIZ size standard (Applied Biosystems) and denatured the mixture at 95 °C for 5 min before loading it onto an ABI 3500xl automated genetic analyser (Applied Biosystems) for fragment analysis.
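The per-sample loading volumes scale linearly when preparing a master mix for a plate. A minimal calculator, assuming a hypothetical 10% pipetting overage that is not specified in the text:

```python
def loading_mix(n_samples: int, overage: float = 0.10,
                formamide_ul: float = 8.93, size_std_ul: float = 0.07) -> dict:
    """Volumes of Hi-Di formamide and GeneScan 500 LIZ for n_samples wells.

    Per-sample volumes follow the text; the overage fraction is an assumption.
    """
    n_eff = n_samples * (1 + overage)
    return {
        "formamide_ul": round(n_eff * formamide_ul, 2),
        "size_standard_ul": round(n_eff * size_std_ul, 2),
        "product_per_well_ul": 1.0,  # 1 µl PCR product is added to each well separately
    }


print(loading_mix(96))
```

Each well then holds 8.93 + 0.07 + 1.0 = 10 µl before denaturation.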

Data analysis

The resulting electropherograms were visualized using GENEMAPPER 5.0 (Applied Biosystems). Alleles were scored automatically, and we manually verified each call. We only considered alleles that produced sharp, clear peaks without any ambiguity caused by stutter or +A peaks (Matsumoto et al. 2004). Additionally, we carefully reviewed any peak below a relative fluorescence unit (RFU) value of 1000. We removed data with RFU < 500 from all further analyses, because the probability of allelic drop-out (ADO), which decreases as peak height increases, has been shown to be ≤ 5% at RFU ≥ 487 (Gill et al. 2015). For heterozygous genotypes, we used a minimum peak height ratio (PHR; the signal intensity of the longer allele relative to that of the shorter allele) of 40% (Mäck et al. 2021) to minimize false alleles in the dataset. Retaining only high-RFU peaks ensured higher data quality with unambiguous profiles, corresponding to the top two ‘SEQ-score’ categories (1 and 2) suggested in the literature (Scandura et al. 2006).
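The peak-level filters (discard < 500 RFU, flag < 1000 RFU for manual review, heterozygote PHR ≥ 40%) can be sketched for one locus in one replicate. This is an illustrative sketch, not GeneMapper functionality: it assumes stutter and +A peaks have already been excluded by the analyst, and treating a heterozygote that fails the PHR threshold as a no-call is our assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_RFU = 500.0      # peaks below this are discarded (threshold from the text)
REVIEW_RFU = 1000.0  # peaks below this are flagged for manual review (from the text)
MIN_PHR = 0.40       # minimum peak height ratio for heterozygotes (from the text)


@dataclass
class Peak:
    size_bp: int  # binned allele size
    rfu: float    # peak height in relative fluorescence units


def call_replicate(peaks: List[Peak]) -> Tuple[Optional[Tuple[int, int]], bool]:
    """Call one locus in one PCR replicate: (allele pair or None, needs_review).

    `peaks` are candidate allelic peaks, assumed to already exclude stutter and
    +A artefacts. Homozygotes are returned as (a, a).
    """
    kept = [p for p in peaks if p.rfu >= MIN_RFU]
    if not kept or len(kept) > 2:
        return None, False  # nothing above threshold, or more than two alleles
    needs_review = any(p.rfu < REVIEW_RFU for p in kept)
    if len(kept) == 1:
        return (kept[0].size_bp, kept[0].size_bp), needs_review
    shorter, longer = sorted(kept, key=lambda p: p.size_bp)
    # Directional PHR as defined in the text: longer allele relative to shorter allele.
    if longer.rfu / shorter.rfu < MIN_PHR:
        return None, needs_review  # assumption: imbalanced heterozygote -> no-call
    return (shorter.size_bp, longer.size_bp), needs_review


print(call_replicate([Peak(120, 1500), Peak(124, 900)]))
```

Because the ratio is directional as defined, only a weak longer allele is penalised; a symmetric smaller-to-larger peak ratio is a common alternative.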

We binned the raw fragment length data using the AUTOBIN v0.9 Excel macro (Salin 2010). A consensus homozygous genotype required amplification of the same allele in at least three replicates, while a consensus heterozygous genotype was recorded when at least two replicates produced exactly the same pair of alleles (Sawaya et al. 2011; Morin et al. 2016). Replicate genotypes that did not meet the consensus criteria were recorded as missing data. We considered a sample to be of male origin when the AMELY2 fragment co-amplified with at least four microsatellites, and of female origin when AMELY2 did not amplify but at least four microsatellites did. We used GIMLET v1.3.3 (Valière 2002) to compute ADO and false allele (FA) rates per heterozygous and homozygous genotype, respectively, based on the four multi-tube replicate genotypes.
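The consensus rules, and a simple way of counting dropouts and false alleles against the consensus, are sketched below. Genotypes are sorted allele pairs, homozygotes are (a, a) and failed replicates are `None`. The error-rate function is a simplified reference-based count that follows the denominators stated above (heterozygous loci for ADO, homozygous loci for FA); it is not a reimplementation of GIMLET, whose estimates may differ.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Genotype = Tuple[int, int]  # sorted allele pair; homozygote represented as (a, a)


def consensus(replicates: Sequence[Optional[Genotype]]) -> Optional[Genotype]:
    """Consensus genotype from multi-tube replicates, or None (missing data)."""
    calls = Counter(g for g in replicates if g is not None)
    hets = [g for g, n in calls.items() if g[0] != g[1] and n >= 2]
    if len(hets) == 1:
        return hets[0]
    if len(hets) > 1:
        return None  # two different heterozygotes each seen twice: ambiguous
    homs = [g for g, n in calls.items() if g[0] == g[1] and n >= 3]
    return homs[0] if len(homs) == 1 else None


def error_rates(loci: Sequence[Tuple[Sequence[Optional[Genotype]], Optional[Genotype]]]):
    """Return (ADO rate, FA rate) from (replicates, consensus) pairs."""
    ado = het_reps = fa = hom_reps = 0
    for reps, cons in loci:
        if cons is None:
            continue  # no reference genotype to compare against
        for g in reps:
            if g is None:
                continue  # failed amplification: neither dropout nor false allele
            if cons[0] != cons[1]:
                het_reps += 1
                if g[0] == g[1] and g[0] in cons:
                    ado += 1  # one of the two true alleles dropped out
            else:
                hom_reps += 1
                if any(a != cons[0] for a in g):
                    fa += 1  # an allele absent from the consensus appeared
    return (ado / het_reps if het_reps else 0.0,
            fa / hom_reps if hom_reps else 0.0)
```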

We performed individual identification for all samples with consensus microsatellite data at a minimum of four loci, using CERVUS 3.0.7 (Kalinowski et al. 2007). A match at four or more loci (i.e., exact matches across all five loci or across four loci), together with the molecular sexing data, was the criterion for assigning unique genotypes. We used the R package ‘diveRsity’ (Keenan et al. 2013; R Core Team 2019) to compute diversity statistics for the microsatellite markers from the data of the identified individuals. We tested for linkage disequilibrium (LD) using GENEPOP 4.7 (Rousset 2008) with a dememorization number of 1000, 100 batches and 1000 iterations per batch. Null allele frequencies were estimated using the EM algorithm (Dempster et al. 1977) as implemented in FreeNA (Chapuis and Estoup 2007). We used GenAlEx 6.5.0.1 (Peakall and Smouse 2006, 2012) to calculate, for the chosen panel of markers, the probability of misidentifying two different random individuals as a single individual (PID) and of misidentifying two full siblings as a single individual (PIDsib).
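The matching criterion can be illustrated with a pairwise comparison. In this hypothetical sketch, two samples are treated as the same individual when they share the same molecular sex (our reading of “together with the molecular sexing data”) and have identical genotypes at four or more loci, allowing the fifth locus to be missing or mismatching. Matched pairs are then merged transitively into individuals with a union-find structure; transitive merging is our assumption about how pairwise matches are combined, since CERVUS itself reports matching pairs.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # sorted allele pair, or None if missing


def is_match(g1: List[Genotype], g2: List[Genotype], sex1: str, sex2: str,
             min_loci: int = 4) -> bool:
    """True if two samples share sex and are identical at >= min_loci typed loci."""
    if sex1 != sex2:
        return False
    identical = sum(1 for a, b in zip(g1, g2)
                    if a is not None and b is not None and a == b)
    return identical >= min_loci


def group_samples(genotypes: List[List[Genotype]], sexes: List[str],
                  min_loci: int = 4) -> List[List[int]]:
    """Merge matching sample pairs into putative individuals (union-find)."""
    parent = list(range(len(genotypes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All n(n - 1)/2 pairwise comparisons, e.g. 5460 for n = 105 samples.
    for i, j in combinations(range(len(genotypes)), 2):
        if is_match(genotypes[i], genotypes[j], sexes[i], sexes[j], min_loci):
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(genotypes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```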

Empirical validation of panel resolution

We amplified and scored nine additional microsatellite markers (EMU03, EMU04, EMU09, EMU10, EMU11, EMU14, EMU15, EMU17 and LafMS03; Nyakaana and Arctander 1998; Kongrit et al. 2008) for the samples meeting the evaluation criteria, using the reaction conditions described in De et al. (2021). We used the five-locus and 14-locus (including the original five loci) panels separately to identify first-order relatives (parent-offspring and full siblings) in COLONY v2.0.6.6 (Jones and Wang 2010) under a full-likelihood framework, with medium run length and a weak sibship prior, in three independent runs. We screened for potential ‘shadow’ errors by comparing the two datasets for incongruent assignments. We then manually counted the sample pairs assigned as first-order relatives with the 14-locus panel but merged as the same individual using five-locus genotypes.

Results and discussion

Marker amplification and genotyping errors

We recorded an average success rate of 74.1% (calculated as the proportion of successful consensus genotypes out of the total attempted genotypes) across the five microsatellite loci, ranging from 59.7 to 83.2% across loci (Table 1). Microsatellite amplicons ranged from 101 to 157 bp, and AMELY2 fragments were 121 bp (Fig. 2). All positive controls amplified successfully, with accurate identification of sex, and none of the negative controls showed any amplification. Most alleles (95.6%) produced peak heights > 1000 RFU (Supplementary Fig. S1). Allele frequencies across the markers ranged from 0.01 to 0.62 (Supplementary Fig. S2). The mean ADO rate across loci was 0.11 ± 0.02, and the mean FA rate was 0.05 ± 0.01. Null allele frequencies varied between 0.12 and 0.22, with a mean of 0.15 ± 0.02. None of the locus pairs showed evidence of linkage disequilibrium (LD) at the 5% significance level. The locus LafMS02 produced allelic peaks followed by a ‘+A’ peak one bp apart (Fig. 2); we retained the allelic peak and ignored the trailing +A peak despite their comparable heights. Accurate allele scoring has been a significant limitation in obtaining reliable data from non-invasive faecal samples. However, characterizing each observed allele across loci using stutter-to-peak and first-to-second allele ratios (Table 1), as suggested by Matsumoto et al. (2004), may aid accurate allele scoring by providing a quantitative criterion.

Fig. 2

Electropherograms of the microsatellites and the Y-linked sex marker co-amplified from faecal DNA extracts of (a) male and (b) female elephant individuals

Table 1 Characteristics of the microsatellite markers (n = 5) co-amplified from faecal DNA (n = 149) of the Asian elephant

Individual identification

We dropped samples with consensus genotypes at fewer than four microsatellite loci (n = 44) from further analyses. These 44 samples had 148 missing genotypes in total. Most of these gaps (n = 127; 85.8%) were caused by non-amplification, while consensus could not be reached for the remaining 14.2% (n = 21) (Supplementary Fig. S3).

The remaining samples (n = 105) had only 4.6% missing genotypes across the five-microsatellite panel. Complete five-locus data were available for 81 samples (77.1%), while the other 24 samples (22.9%) had one missing genotype each. Based on the threshold set for microsatellite similarity, CERVUS identified 151 pairs of matching genotypes (putative recaptures) out of the 5460 possible pairwise comparisons between samples (n = 105). Of these 151 matching pairs, 47% matched exactly across all five microsatellite loci, whereas 53% matched at four loci, with data at the remaining locus either missing or mismatching. We identified a minimum of 51 unique genotypes (individuals) from the 105 samples analyzed. Thirty-nine individuals (76.5%) appeared in the dataset only once (single genetic captures), while the remaining 12 individuals (23.5%) were captured multiple times and accounted for 62.9% of the samples analyzed. Five individuals were captured two to three times, one individual four times, four individuals six to seven times, and one individual each nine and fifteen times.

Molecular sexing

Molecular sexing was successful for all 105 samples: 73 samples were of male and 32 of female origin (Fig. 1b). Of the 51 individuals identified using microsatellite data, 25 (49.0%) were male and 26 (51.0%) were female.

Marker characteristics

The number of alleles per locus (Na) varied from four to seven (mean Na = 4.80 ± 0.58) (Table 1). For the 51 individuals, expected heterozygosity (HE) ranged from 0.54 to 0.77 across loci (mean HE = 0.63 ± 0.04), and mean observed heterozygosity (HO) was 0.40 ± 0.05 (range 0.22–0.56). Per-locus PID varied between 0.09 and 0.27, and PIDsib between 0.38 and 0.55 (Table 1).
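For reference, observed heterozygosity is the proportion of typed individuals that are heterozygous at a locus, and expected heterozygosity under Hardy–Weinberg equilibrium is HE = 1 − Σ p_i², where p_i are the allele frequencies. The sketch below uses this uncorrected estimator for illustration; values reported by diveRsity may include a small-sample correction, so they need not match exactly.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def heterozygosities(genotypes: Sequence[Optional[Tuple[int, int]]]) -> Tuple[float, float]:
    """Return (HO, HE) at one locus; None entries are untyped individuals."""
    typed = [g for g in genotypes if g is not None]
    if not typed:
        raise ValueError("no typed individuals at this locus")
    ho = sum(1 for a, b in typed if a != b) / len(typed)
    counts = Counter(allele for g in typed for allele in g)
    n_alleles = 2 * len(typed)
    he = 1.0 - sum((c / n_alleles) ** 2 for c in counts.values())  # 1 - sum(p_i^2)
    return ho, he


print(heterozygosities([(120, 124), (120, 120), (124, 124), (120, 124), None]))
```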

We designed this microsatellite panel to reliably identify Asian elephant individuals under the worst-case scenario of genotyping degraded DNA, as we used dry-stored samples kept at room temperature for up to ~3 years after field collection. The mean success rate (74.1%) and genotyping error rates (Table 1; allelic dropout: 0.11 ± 0.02; false alleles: 0.05 ± 0.01) were comparable to those of other non-invasive studies on Asian elephants (Vidya and Sukumar 2005b; Flagstad et al. 2012; Hedges et al. 2013; Chakraborty et al. 2014; Gray et al. 2014; Goossens et al. 2016). Therefore, a similar methodology, a single multiplex PCR with four replicates using elephant faecal DNA, would be useful for rapid sweeping surveys to identify individuals and their sex under financial and logistic constraints. The efficacy of the panel with dried samples stored at room temperature also indicates that retrospective analyses are possible with this approach where faecal samples have already been collected for nutritional, parasitological or endocrinological studies.

The five-locus microsatellite panel suggested in this study has a probability of misidentification (PID) of 0.04% for random individuals (recommended < 1%; Waits et al. 2001) and 3% for full siblings (Supplementary Fig. S4). The probability of ‘shadow’ genotypes (Mills et al. 2000) in the dataset can be reduced further by scrutinizing matching genotypes for differences in molecular sex, bolus morphometry, and the location and time of sampling.
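The per-locus probabilities of identity follow the standard formulas summarized by Waits et al. (2001): PID = Σ p_i⁴ + Σ_{i<j} (2 p_i p_j)², which simplifies to 2(Σ p_i²)² − Σ p_i⁴, for unrelated individuals, and PIDsib = 0.25 + 0.5 Σ p_i² + 0.5 (Σ p_i²)² − 0.25 Σ p_i⁴ for full siblings. The multilocus value is the product across loci, assuming loci are independent (consistent with the absence of LD reported above). The allele frequencies in the example are hypothetical, and GenAlEx output may differ in detail.

```python
from math import prod
from typing import Sequence, Tuple


def _moments(freqs: Sequence[float]) -> Tuple[float, float]:
    """Return (sum p_i^2, sum p_i^4) for one locus."""
    return sum(p**2 for p in freqs), sum(p**4 for p in freqs)


def pid_unrelated(freqs: Sequence[float]) -> float:
    s2, s4 = _moments(freqs)
    return 2 * s2**2 - s4


def pid_sibs(freqs: Sequence[float]) -> float:
    s2, s4 = _moments(freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2**2 - 0.25 * s4


def multilocus(per_locus: Sequence[float]) -> float:
    """Combine per-locus probabilities assuming independence among loci."""
    return prod(per_locus)


if __name__ == "__main__":
    loci = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.1, 0.1]]  # hypothetical allele frequencies
    print("PID   :", multilocus([pid_unrelated(f) for f in loci]))
    print("PIDsib:", multilocus([pid_sibs(f) for f in loci]))
```

Because PIDsib per locus cannot fall below 0.25, sibling misidentification declines far more slowly with added loci than PID does.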

Validation of the suggested panel

It has been established that ‘ghost’ individuals, whose frequency increases with the number of markers used for individual identification, may incur significant positive bias when estimating wildlife abundance (Creel et al. 2003; Lampa et al. 2013, 2015; Winiarski and McGarigal 2016; De et al. 2022). Because genotyping errors are prevalent in non-invasive microsatellite data, several studies suggest using the minimum number of markers that provides sufficient resolution (i.e. a reasonably low PID) for identifying unique individuals (Waits et al. 2001; Creel et al. 2003; Wang 2016). The social organization of elephants, in which closely related individuals typically live together in family groups, warrants validating the suggested panel for its ability to distinguish between close relatives, as the theoretical PIDsib obtained may be considered sub-optimal.

Using the 14-locus dataset generated for validation, we recorded 86 dyads of first-order relatives. Of these, the suggested five-locus panel correctly differentiated 83 dyads (96.5%) as distinct individuals. The erroneous merging of three first-order-relative dyads with the five-locus panel translated into a ‘shadow’ error of only 0.2% across the 1275 possible dyads among 51 individuals. The empirical validation therefore suggests that the five-locus multiplex panel, together with the sexing marker, is sufficient for identifying unique elephant individuals for monitoring purposes. The nine additional markers did not add substantial information on individual identity, as indicated by the cumulative PID and PIDsib values plotted against the number of markers used (Supplementary Fig. S4).
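The validation percentages follow from simple counts: the number of possible dyads among n individuals is n(n − 1)/2, and the ‘shadow’ error is the number of wrongly merged dyads divided by that total. A quick check using the counts reported above:

```python
from math import comb

n_individuals = 51
possible_dyads = comb(n_individuals, 2)  # n(n - 1) / 2
first_order_dyads = 86                   # detected with the 14-locus panel
resolved_dyads = 83                      # still distinguished with 5 loci
merged_dyads = first_order_dyads - resolved_dyads

shadow_error = merged_dyads / possible_dyads
resolution = resolved_dyads / first_order_dyads

print(f"possible dyads: {possible_dyads}")
print(f"shadow error: {shadow_error:.1%}")
print(f"first-order dyads resolved: {resolution:.1%}")
```

This prints 1275 possible dyads, a shadow error of 0.2% and 96.5% of first-order dyads resolved, matching the values in the text.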

Cost-effectiveness of the described protocol

We propose collecting 10 to 15 g of faecal material from the outer surface of an intact elephant dung bolus as a DNA source during routine field surveys or patrols. Following the protocol outlined in this study, the cost of laboratory analyses to obtain sex and individual identity is US$ 18/sample, including reagents and laboratory staff (Supplementary Table S1); we excluded the cost of permanent laboratory equipment. In comparison, Hedges et al. (2013) reported a cost of US$ 55/sample (reagents and human resources) for laboratory analyses to estimate demographic parameters of the Asian elephant in Laos, covering microsatellite markers (n = 8), mtDNA sequences and RFLP-based molecular sexing, including sequencing costs (US$ 14). While acknowledging intrinsic differences, such as the age of the samples collected, the impact of environmental conditions on DNA quality and the skill of the personnel involved, we believe the protocol suggested in the current study provides a low-cost alternative for faecal DNA-based monitoring of the Asian elephant.

Conclusion

We suggest harmonized use of the multiplex panel described in this study for multi-locus genotyping in future status surveys of Asian elephants across their range. Identifying elephant individuals and their sex using the optimized single multiplex panel has high potential to reveal additional information on (i) sex-specific spatio-temporal distribution patterns, abundance and habitat use (Fig. 1b), (ii) population estimates from periodic surveys, and (iii) human-elephant interactions, including individual-level crop-raiding behaviour. Where required, the additional markers (n = 9) can be supplemented, as described in a previous pan-India study using non-invasive faecal sampling of the Asian elephant (De et al. 2021), to obtain information on population and landscape genetics and kinship patterns. Detailed information on fine-scale spatio-temporal resource use is lacking for most elephant populations in India (Vijayakrishnan et al. 2020). Hence, we believe that efforts to understand demography and individual-level distribution and ranging patterns using the described protocol would provide additional insight critical to formulating successful elephant conservation strategies.