Introduction

Measures of census (Nc) and effective (Ne) population size are essential for the effective management and conservation of natural populations (Charlesworth 2009; Frankham 2005; Gurov et al. 2017; Hedgecock et al. 2007; Mowat and Strobeck 2000). Nc is a direct measure of the number of individuals present in a population and provides a demographic estimate of population viability. In contrast, Ne reflects the number of reproductive individuals that contribute to the next generation and is a measure of the rate at which genetic diversity is lost due to genetic drift (Frankham et al. 2002; Luikart et al. 2010). However, estimating these two population parameters in the wild remains a significant challenge, especially for rare and elusive species. Although both parameters can be estimated directly from field observation or demographic information (Bata et al. 2017; Caballero 1994; Frankham 1995; Gittleman 2001; Hedwig et al. 2018; Johnson et al. 2005; Kimura and Crow 1963; Leberg 2005; Nunney and Elam 1994; Ruiz-Olmo et al. 2001; Schmeller and Merilä 2007; Wright 1938), obtaining these data can be logistically difficult for wild populations.

Genetic data are a common alternative (Do et al. 2014; Jones and Wang 2010; Miller et al. 2005; Otis et al. 1978; White and Burnham 1999) and have been applied to a wide range of wildlife species (Banks et al. 2003; Bergl and Vigilant 2007; Frankham 1995; Langergraber et al. 2007; Lucchini et al. 2002). Tissue samples can provide high quality genetic material, but their collection is not always feasible for at-risk species. As an alternative, genetic data can be collected from non-invasive samples, such as shed hairs or feces, without disturbing the target species. Such samples tend to be of lower quality and present numerous technical challenges (Clemento et al. 2009; Dawnay et al. 2011; Dou et al. 2016; Ernest et al. 2000; Granjon et al. 2020; Puechmaille and Petit 2007). However, non-invasive sampling enables the collection of a higher volume of samples than may be possible if using tissue. Furthermore, a plethora of different methods have been developed to estimate Nc or Ne from genetic information using a single or multiple sample periods (Do et al. 2014; Jorde and Ryman 2007; Miller et al. 2005; Otis et al. 1978; White and Burnham 1999). Several studies have used one or both methods to produce credible population size values (Arandjelovic et al. 2010; Bellemain et al. 2005; England et al. 2010; Tallmon et al. 2004). Unlike methods that require multiple sampling sessions, estimating from a single sampling period is often very useful for species where sampling is costly or difficult over multiple time periods.

Nevertheless, these approaches require sufficient available data to obtain precise and accurate estimates (Miller et al. 2005; Waples 2006; Waples and Do 2010). Fortunately, tools such as the NeOGen software (Blower et al. 2019) are now available that allow researchers to determine in advance the minimum number of samples and loci needed to provide a reliable estimate of Ne. Taken together, these approaches can provide crucial information and increase the essential knowledge base that informs conservation and management decisions of threatened or endangered species.

One such threatened species for which a strong knowledge base is lacking is the mandrill (Mandrillus sphinx). This primate species is endemic to Central Africa and is distributed across the tropical forests of Cameroon, Equatorial Guinea, Congo, and Gabon (Abernethy and Maisels 2019; Kingdon 1997). Mandrills are highly social and live in large groups or "hordes," which can make them particularly vulnerable to hunting pressure and habitat loss (Abernethy and Maisels 2019). Field observations have reported that hordes may have shrunk or disappeared in some areas of the Cameroon and Equatorial Guinea forests where pressure is more intense (Abernethy and Maisels 2019). Because of this, the mandrill is listed in Appendix I by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and categorized as "Vulnerable" on the International Union for Conservation of Nature (IUCN) Red List (Abernethy and Maisels 2019; Oates and Butynski 2008).

Wild mandrills are generally difficult to observe directly due to the closed habitat that they occupy (Abernethy and Maisels 2019; Oates and Butynski 2008), making counts of horde size difficult. Nevertheless, the first estimates of mandrill Nc were obtained using camera traps and direct observations from a focal horde at the Station d'Etudes des Gorilles et Chimpanzés (SEGC) in the Lopé National Park (LNP), Gabon (Abernethy et al. 2002; Rogers et al. 1996). This horde frequents the savanna-forest mosaic in the northern portion of the park during the breeding season (June to September, with a peak in reproductive effort in July–August), enabling direct counts. The size of the horde was first estimated to be over 600 individuals (Rogers et al. 1996), and a second count reported a range of 340–845 individuals, with an average of 620 (Abernethy et al. 2002). More recent unpublished observations have suggested as many as 1,250 mandrills in the horde (Lehmann D., 2019, personal communication). In contrast, observational estimates of Nc from another horde in Moukalaba-Doudou National Park in Gabon are comparatively smaller (169–442 individuals; Hongo 2014). Although these intensive field studies have provided valuable information on the likely range in the horde sizes, it is difficult to replicate these kinds of studies in other parts of the mandrill range without taking a non-invasive genetic approach.

Therefore, the objective of this study was to use a panel of 16 microsatellite loci to genotype fecal samples obtained from successive annual sampling (2016–2018) of the SEGC mandrill horde to: (1) estimate the census size (Nc) of the SEGC horde using several mark-recapture genetic estimators and compare these estimates with those previously obtained in the field, (2) validate the minimum number of samples and loci needed to obtain accurate estimates of Ne, and (3) derive estimates of Ne using a range of available genetic estimators. This research will also allow us to evaluate the feasibility of non-invasive genetics to monitor the population size of wild mandrills at other sites across their range.

Materials and methods

Study site and sample collection

Samples were collected in the northern part (00.12S, 11.36E) of the LNP, adjacent to the SEGC field station in Gabon. Although the LNP covers an area of approximately 5,000 km² of lowland tropical rainforest, the northern part of the SEGC is dominated by a mosaic vegetation cover of grassy savannahs and fragments of natural forest (White 1994; White and Abernethy 1997). The Ogooué River borders the park at its northern-most extent and provides a natural barrier for many animals (Abernethy et al. 2002). The site is characterized by two dry seasons: the little dry season from December to February and the long dry season that extends from mid-June to mid-September. Temperatures at the site vary little with a mean monthly minimum of 20–23.8°C in the dry season and 26–33.8°C in the wet season (1984–98) (Abernethy et al. 2002).

We sampled mandrill feces (n = 927) from the SEGC horde over three successive years (2016–2018) during the long summer dry season (July and August). This period corresponds to the mandrill breeding season, when mature males and females are present in the horde (Abernethy et al. 2002). We collected only fresh (< 3 h) fecal samples to maximize DNA quality for downstream molecular analyses (Regazzi 2007). Mandrill fecal samples are similar to those of other large primates, in that they are generally solid and physically preserve well. We placed fecal samples in a 50 mL Falcon tube half-filled with silica gel beads, as previous work has shown that this storage medium is the best for preserving nuclear DNA in central African forest antelope (Soto-Calderón et al. 2009). We stored the samples in a freezer at −20°C prior to DNA extraction. In an unrelated study, a small number of females and males of the horde were fitted with radio collars, allowing us to locate the focal horde and collect fecal samples more easily. Aliquots of blood (n = 14) and hair (n = 9) samples were also collected from a subset of these individuals.

DNA extraction and amplification of microsatellites

We extracted DNA from fecal samples 1 to 2 months after collection using the QIAamp Fast DNA Stool Mini Kit (Qiagen, CA). DNA from blood and hair was extracted using the DNeasy Blood and Tissue kit (Qiagen, CA). All extractions included blanks to control for DNA contamination. We performed all extractions in a dedicated fecal DNA extraction room, which was kept separate from all other sources of DNA to minimize the risk of contamination.

We selected a panel of 16 microsatellite loci previously isolated from mandrills (Benoit et al. 2014) and amplified them in four multiplex reaction mixes (M1-4), each containing four loci (Supplementary Table 1). Forward primers were labeled with fluorophore dyes (6-FAM, HEX, or NED) to discriminate individual loci within each multiplex. We performed polymerase chain reaction (PCR) amplification of each multiplex in a total volume of 10 µl. PCRs contained 0.1 µl of each primer (reverse and forward) at 0.2 µM final concentration, 0.5 µl of 20 mg/ml BSA (Bovine Serum Albumin), 5 µl of 2X multiplex PCR kit (Qiagen, CA), 1.7 µl of RNase-free water, and 2 µl of DNA extract. We performed PCR amplification using a touch-down protocol, with the duration of cycling steps following the PCR kit manufacturer's instructions. The cycle began with an initial denaturation step for 15 min at 95°C to allow activation of the hot start Taq polymerase. For M1 and M3, we then followed this step with 10 cycles of 30 s at 95°C for denaturation, 90 s of annealing at 60°C (with a 1°C decrease after each cycle), and a 60-s extension step at 72°C. We then performed 30 additional cycles using the following conditions: 94°C for 30 s, 50°C for 90 s, and 72°C for 60 s, followed by a final extension step of 60°C for 30 min. PCR conditions for M2 and M4 were the same as for M1 and M3, except that the initial annealing temperature during the first 10 cycles of the protocol started at 63°C, and decreased to 53°C over the course of the reaction. PCR products were then analyzed on an ABI3130xl sequencer at either the Department of Biological Sciences, University of New Orleans, USA, or the Georgia Genomics and Bioinformatics Core (Georgia, USA).
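As a quick consistency check on the protocol above, the following sketch adds up the stated reagent volumes and lists the annealing temperature for each cycle of the touch-down program. The function and variable names are illustrative, and the plateau of 53°C for M2 and M4 is an assumption based on the statement that annealing "decreased to 53°C"; the text does not state the annealing temperature of the final 30 cycles for these multiplexes explicitly.

```python
# Reaction mix for one four-locus multiplex, volumes in microlitres (values from the text).
N_LOCI = 4
PRIMER_VOL = 0.1  # per primer; each locus has one forward and one reverse primer

components = {
    "primers": N_LOCI * 2 * PRIMER_VOL,
    "BSA (20 mg/ml)": 0.5,
    "2X multiplex PCR mix": 5.0,
    "RNase-free water": 1.7,
    "DNA extract": 2.0,
}
total_volume = round(sum(components.values()), 2)


def touchdown_annealing(start_temp, n_touchdown=10, step=1.0, n_fixed=30):
    """Return the annealing temperature (deg C) of every cycle in a touch-down run."""
    # Touch-down phase: temperature drops by `step` after each cycle.
    touchdown = [start_temp - step * i for i in range(n_touchdown)]
    # Fixed phase: assumed to continue one step below the last touch-down cycle.
    fixed_temp = start_temp - step * n_touchdown
    return touchdown + [fixed_temp] * n_fixed


m1_m3 = touchdown_annealing(60.0)  # 60, 59, ..., 51, then 30 cycles at 50
m2_m4 = touchdown_annealing(63.0)  # assumption: plateau at 53 for M2 and M4

print("total reaction volume (ul):", total_volume)
print("M1/M3 touch-down phase:", m1_m3[:10], "fixed phase:", m1_m3[10])
```

The component volumes sum to the stated 10 µl, and the M1/M3 schedule ends its touch-down phase at 51°C before the 30 cycles at 50°C described in the text.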

Microsatellite genotyping

We determined raw allele sizes for each microsatellite locus using the GENEIOUS R 6.1.8 program (Kearse et al. 2012) and binned alleles using the TANDEM program (Matschiner and Salzburger 2009). Because of the generally low amounts of DNA in fecal samples and the high risk of genotyping errors, we quantified rates of allelic dropout in a pilot study to determine the number of replicates needed to reduce the probability of obtaining a false homozygote to less than 0.05 (Bellemain et al. 2005; Flagstad et al. 2004; Paetkau 2003). Calculation of error rates from this preliminary analysis revealed that three replicates were sufficient to minimize the risk of genotyping false homozygotes (Supplementary Table 2). In this pilot study, we also calculated the probability of identity (PID), or the probability of individuals having the same genotype by chance. In the absence of information on the kinship structure or level of genetic diversity in the focal horde, we used the PIDsibs estimator because it provides conservative estimates of PID based on the possibility that individuals in the population may be related (Evett and Weir 1998; Waits et al. 2001). We estimated the per-locus values of PIDsibs using the GIMLET version 1.3.3 program (Valière 2002). To determine the minimum number of loci needed to differentiate individuals, we ranked loci from highest to lowest PIDsibs and calculated cumulative scores across ordered loci until the PIDsibs value fell to < 0.01 (Supplementary Table 3). Our estimates of PID indicated that at least six loci are needed to reliably differentiate individuals, even when only the least informative loci are used. Therefore, assuming that data for some loci may be lost due to conflicts between PCR replicates, only samples that amplified for at least 9 loci in the first replicate were genotyped for the remaining two. From these three replicates, we constructed multi-locus consensus genotypes using GIMLET (Valière 2002).
Based on error rates calculated in the pilot study, we called genotypes as heterozygous when two alleles appeared in at least two independent replicates, whereas homozygotes were only accepted if the same allele appeared alone in all three replicates (Bonin et al. 2004). Samples that did not have consensus genotypes for at least seven loci, which is one more than the minimum required as per our PID estimation, were discarded from downstream analyses.
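The cumulative PIDsibs screening described above can be illustrated with a short sketch. It uses the sibling probability-of-identity formula given by Waits et al. (2001), PIDsibs = 0.25 + 0.5 Σp² + 0.5 (Σp²)² − 0.25 Σp⁴, where p are the allele frequencies at a locus. The allele frequencies below are hypothetical (six loci with eight equally frequent alleles), not the mandrill data, and the function names are illustrative; the study itself used GIMLET for this step.

```python
from itertools import accumulate
from operator import mul


def pid_sibs(freqs):
    """Per-locus probability of identity among siblings (Waits et al. 2001)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def loci_needed(allele_freqs_by_locus, threshold=0.01):
    """Number of loci needed for the cumulative PIDsibs to drop below `threshold`.

    Loci are ordered from highest to lowest PIDsibs (least informative first),
    so the answer is conservative. Returns (None, cumulative) if never reached.
    """
    per_locus = sorted(
        (pid_sibs(f) for f in allele_freqs_by_locus.values()), reverse=True
    )
    cumulative = list(accumulate(per_locus, mul))  # running product across loci
    for n, value in enumerate(cumulative, start=1):
        if value < threshold:
            return n, cumulative
    return None, cumulative


# Hypothetical panel: six loci, each with eight equally frequent alleles.
panel = {f"locus{i}": [1 / 8] * 8 for i in range(1, 7)}
n_loci, running_product = loci_needed(panel)
print(n_loci, [round(v, 4) for v in running_product])
```

Ordering loci from least to most informative means that any other subset of the same size would discriminate individuals at least as well.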

We performed tests of deviation from Hardy–Weinberg equilibrium (HWE) and linkage equilibrium (LE) using the program ARLEQUIN version 3.5 (Excoffier and Lischer 2015) and corrected for multiple hypothesis testing using the Holm-Bonferroni method (Gaetano 2018; Holm 1979). We also evaluated consensus genotypes for the presence of three common genotyping errors: non-amplification of specific alleles (null alleles), small allele bias, and errors due to stutter using the program MICROCHECKER version 2.2.3 (Van Oosterhout et al. 2004).
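For readers unfamiliar with the Holm-Bonferroni step-down procedure (Holm 1979), a minimal sketch follows. The p-values are hypothetical and the function name is illustrative; this is not the code used in the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank),
    and testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are also retained
    return reject


# Hypothetical per-locus p-values from HWE tests.
p_vals = [0.001, 0.04, 0.03, 0.005]
print(holm_bonferroni(p_vals))
```

In this example the two smallest p-values pass their thresholds (0.0125 and 0.0167), while 0.03 exceeds 0.025, so testing stops there.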

We identified duplicate genotypes in the dataset using a custom Python script that counted matching loci in all possible pairwise combinations of multi-locus genotypes. We considered two samples to belong to the same individual when they shared six or more matching genotypes with no more than two mismatching alleles (Paetkau 2003). Because genotypes from non-invasive samples tend to have missing data, we also considered any multi-locus pairs with fewer than six matching loci as duplicates if their shared loci had a cumulative PIDsibs < 0.01 (Waits et al. 2001). For downstream analyses requiring unique genotypes, the least informative genotype of the duplicated pair was removed from the dataset. In cases where missing data resulted in pairs of genotypes with fewer than six loci that amplified in both genotypes, it was impossible to determine whether the two originated from the same individual. In these ambiguous cases, the least informative genotype of the pair was also removed from the dataset.
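The pairwise matching rule can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' script: genotypes are stored as dictionaries mapping a locus name to a pair of allele sizes (or None when missing), all names and allele sizes are hypothetical, and the cumulative PIDsibs fallback for pairs with fewer than six shared loci is omitted.

```python
from collections import Counter
from itertools import combinations


def compare(g1, g2):
    """Count matching loci and mismatching alleles between two genotypes."""
    matching_loci = 0
    mismatching_alleles = 0
    for locus in g1.keys() & g2.keys():
        a, b = g1[locus], g2[locus]
        if a is None or b is None:
            continue  # locus missing in at least one sample
        # Multiset intersection: number of alleles shared at this locus (0 to 2).
        shared = sum((Counter(a) & Counter(b)).values())
        if shared == 2:
            matching_loci += 1
        else:
            mismatching_alleles += 2 - shared
    return matching_loci, mismatching_alleles


def same_individual(g1, g2, min_matching=6, max_mismatch=2):
    """Apply the rule: six or more matching loci and at most two mismatching alleles."""
    matching, mismatching = compare(g1, g2)
    return matching >= min_matching and mismatching <= max_mismatch


def find_duplicates(genotypes):
    """Return all pairs of sample names judged to come from the same individual."""
    return [
        (a, b)
        for a, b in combinations(sorted(genotypes), 2)
        if same_individual(genotypes[a], genotypes[b])
    ]


# Hypothetical samples typed at eight loci.
loci = [f"L{i}" for i in range(1, 9)]
samples = {
    "s1": {l: (100, 104) for l in loci},
    "s2": {**{l: (100, 104) for l in loci}, "L8": (100, 106)},  # one allele differs
    "s3": {l: (110, 112) for l in loci},
    "s4": {l: ((100, 104) if l in loci[:4] else None) for l in loci},  # sparse
}
print(find_duplicates(samples))
```

Here only s1 and s2 are flagged: s3 differs at every locus, and s4 shares too few typed loci for this rule to decide.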

Genetic estimation of Nc using single and multiple sampling periods

We estimated the Nc of the SEGC mandrill horde using several genetic models based on single and multiple sampling periods. All these Nc estimators assume that each multi-locus genotype can be "captured" one or more times during the same or different sampling periods and that capture heterogeneity may exist. Here, duplicate samples represent recaptures. These estimators also assume a closed population (Miller et al. 2005; White and Burnham 1999).

We estimated Nc from each individual sampling period (2016, 2017, and 2018) by applying two estimators from the CAPWIRE package (Miller et al. 2005) implemented in the program R (R Development Core Team, 2017). The two estimators are: the equal capture model (ECM), which assumes no capture heterogeneity in the dataset, and the two innate rates model (TIRM), which accounts for heterogeneity in capture probabilities. Both estimators calculate Nc on a maximum likelihood basis from a single sampling session, utilizing multiple captures of genotypes from that session (Miller et al. 2005). To examine the effect of sample size, we also pooled the samples from the three periods into a single dataset to estimate Nc, since the successive sampling periods were only one year apart and are likely to reflect the same cohort.
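To give an intuition for how a single-session maximum likelihood estimate arises from recaptures, the sketch below implements a simple equal-capture urn model: s samples are drawn with replacement from N equally catchable individuals and T distinct genotypes are observed, so the part of the likelihood that depends on N is N!/(N − T)! × N^(−s). This is only one common way to write an equal-capture likelihood and is not the CAPWIRE code; TIRM and confidence intervals are not covered, and the sample counts are hypothetical.

```python
from math import lgamma, log


def ecm_log_likelihood(n, n_samples, n_unique):
    """Log-likelihood of population size n, dropping terms that do not depend on n."""
    # log[ n! / (n - T)! ] - s * log(n)
    return lgamma(n + 1) - lgamma(n - n_unique + 1) - n_samples * log(n)


def ecm_estimate(n_samples, n_unique, n_max=100_000):
    """Grid-search the integer population size that maximizes the likelihood."""
    if n_unique >= n_samples:
        raise ValueError("no recaptures: the likelihood has no finite maximum")
    return max(
        range(n_unique, n_max + 1),
        key=lambda n: ecm_log_likelihood(n, n_samples, n_unique),
    )


# Hypothetical session: 100 genotyped samples resolving to 80 distinct individuals.
n_hat = ecm_estimate(n_samples=100, n_unique=80)
print("estimated Nc under equal capture:", n_hat)
```

The fewer recaptures a session contains, the flatter this likelihood becomes at large N, which is one reason small single-year samples tend to give imprecise estimates.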

We also compared estimates of Nc using the multi-sample estimators implemented in the program MARK version 9.0 (White and Burnham 1999). The program estimates Nc using several closed population models that each incorporate different capture probabilities: the Mo model, where capture probabilities are assumed to be constant; Mt, where capture probabilities vary with time; Mb, where there is a behavioral response to capture; and Mh2, where capture probabilities vary by individual animal. MARK also allows combinations of these factors (Mth, Mtb, Mtbh). For analyses carried out in the program, we first aggregated individual multi-locus genotypes observed during the three sampling periods and compiled a "capture" and "recapture" history using the GenCapture version 1.4.9 program (McKelvey and Schwartz 2005; Schwartz et al. 2006). To choose the best model for our data, we compared each model’s AICc (Akaike information criterion corrected for small sample size) and respective weighting values (w).
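The model comparison step relies on two standard quantities: AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), and the Akaike weight w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the difference between model i's AICc and the smallest AICc. The sketch below computes both; the log-likelihoods, parameter counts, and sample size are hypothetical, and MARK's own definition of the effective sample size n is not reproduced here.

```python
from math import exp


def aicc(log_likelihood, k, n):
    """Akaike information criterion corrected for small sample size."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a {model: AICc} mapping into normalized Akaike weights."""
    best = min(scores.values())
    relative = {m: exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical fits: (log-likelihood, number of parameters) per model, n = 30.
fits = {"Mo": (-50.0, 2), "Mt": (-49.5, 4), "Mh2": (-49.8, 4)}
scores = {m: aicc(ll, k, n=30) for m, (ll, k) in fits.items()}
weights = akaike_weights(scores)
for model in sorted(weights, key=weights.get, reverse=True):
    print(model, round(scores[model], 2), round(weights[model], 3))
```

Models whose AICc lies within about two units of the best model are usually treated as having comparable support, which is the sense in which two models can both "fit the data well".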

Estimation of the minimum number of samples and loci needed to estimate Ne

We used the program NeOGen (genetic Ne for Overlapping Generations) Ver. 1.3.0.6.a1 (Blower et al. 2019) to estimate the minimum number of samples and loci needed to provide an accurate and precise estimate of Ne. NeOGen estimates the number of samples and loci required to provide a reliable Ne estimate using species-specific demographic and genetic parameters (Blower et al. 2019) and the degree of linkage disequilibrium based on the LDNe algorithm (Waples and Do 2010). The model is applicable to iteroparous species with overlapping generations, as is the case for mandrills. Demographic and genetic data on wild populations of mandrills are unfortunately scarce. We therefore gathered available data on reproductive age and male mortality rates from captive populations of mandrills at the Centre International de Recherches Médicales de Franceville (CIRMF), Gabon (Setchell et al. 2005) and from expert opinion (Abernethy K. and Lehmann D., personal communication) (Table 1). As data on female mortality for mandrills were lacking, we used demographic data available from baboon populations (Bronikowski et al. 2016). We evaluated the power of Ne estimation using 13 or 10 loci and a maximum sample size of 400 genotypes, with confidence intervals for Ne assessed at every 100 genotypes. This simulated sample size is greater than the actual sample size in the present study, allowing us to determine the minimum number of samples needed to obtain an accurate estimate of Ne. Ten loci represent the average number of loci that were amplified in all samples, and 13 is the maximum number of loci used. Since the exact size of the mandrill horde is not known, we ran NeOGen using Nc values of 620, 845, and 1250. These values are drawn from Abernethy et al. (2002) and from D. Lehmann (personal communication, 2019).

Table 1 Mandrill (Mandrillus sphinx) input parameters used in the NeOGen software

Genetic estimation of effective population size (Ne) using single and combined sample period

We used the unique genotypes to provide estimates of effective population size (Ne). We first estimated Ne from the samples of each individual sampling period using the one-sample estimators available in the program NeESTIMATOR Version 2.01 (Do et al. 2014), namely: the linkage disequilibrium method between loci (LDNe), the excess heterozygosity method (HeNe), and the molecular coancestry method. We also applied the sibship structure approach using the maximum likelihood model implemented in the program COLONY Version 2.0.6.4 (Jones and Wang 2010). As a comparison, we also estimated Ne using genotypes pooled across all three yearly sampling periods. In all methods, we used an exclusion criterion for rare alleles (Pcrit) equal to 0.02 (alleles with frequency < Pcrit are excluded) (Do et al. 2014). Finally, Ne estimates were combined across years using an unweighted harmonic mean, as suggested by other researchers (Waples and Do 2010). To incorporate all estimates into the analyses, infinite estimates were converted to a value of 99,999 (Do et al. 2014).
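The combination step can be sketched as an unweighted harmonic mean in which infinite estimates are first replaced by 99,999. The yearly values below are hypothetical and the function name is illustrative.

```python
import math

INFINITE_NE = 99_999.0  # substitute for infinite estimates, as described in the text


def harmonic_mean_ne(estimates):
    """Unweighted harmonic mean of Ne estimates, with infinities capped."""
    capped = [INFINITE_NE if math.isinf(x) else float(x) for x in estimates]
    if any(x <= 0 for x in capped):
        raise ValueError("Ne estimates must be positive")
    return len(capped) / sum(1.0 / x for x in capped)


# Hypothetical yearly estimates, one of them infinite.
yearly = [100.0, 200.0, math.inf]
print(round(harmonic_mean_ne(yearly), 1))
```

Because the harmonic mean is dominated by the smallest values, a single infinite estimate raises the combined value only moderately rather than making it unbounded.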

Results

Microsatellite genotyping

From 927 samples collected in the field, a total of 329 samples or 35.5% (with 91, 103 and 135 samples respectively for each individual year from 2016 to 2018) amplified successfully with a minimum of seven out of 16 microsatellite loci. From each individual year period from 2016 to 2018, a total of 83, 93 and 98 individual genotypes were identified respectively after removal of within-year duplicates. After removal of between-year recaptures, we identified a total of 232 unique genotypes across all three years combined. All loci were consistently amplified with a success rate of at least 45%, except for the MaCh312 locus, which only amplified in 10% of samples (Supplementary Table 1). We detected evidence of null alleles in only two loci: MaCh868 and MaCh834. Both loci also showed evidence of significant deviation from HWE proportions after Holm-Bonferroni correction and were removed from all subsequent analyses. We also removed the MaCh312 locus due to insufficient data. In the individual year data, all loci appeared to be independent of each other. All remaining loci (n = 13) were highly polymorphic (Table 2), with an average allele number of 8.38 ± 1.74 and an overall mean observed and expected heterozygosity of 0.76 ± 0.08 and 0.77 ± 0.10, respectively.

Table 2 Summary statistics for the 16 microsatellite loci

Estimates of Nc from genetic methods based on single and multiple sampling periods

Estimates of Nc obtained for each individual year (2016, 2017, and 2018) and for combined data from across all three time periods using CAPWIRE are shown in Table 3. Overall, the TIRM model provided larger estimates, while the ECM model provided smaller point estimates with narrower 95% confidence intervals. The Nc estimates for TIRM from the combined 3-year data were larger and the confidence intervals narrower than those obtained using the individual period data, except for 2018, which had even smaller confidence intervals. The ECM estimates were similar to each other, with the exception of the one produced with the 2018 data, which was smaller. In addition, use of the likelihood ratio test (LRT) indicated that TIRM was a better fit to the data compared to ECM in the analyses of the 2018 data and when the data were combined. However, ECM was a better fit only for data from the individual periods of 2016 and 2017.

Table 3 Estimates of Nc from individual sample period data (2016 to 2018) and data from all three periods combined into a single sample design, using two single sample based genetic estimators

Comparison of the different models implemented in MARK shows that both the Mo and Mh2 models fit the data well based on the Delta AICc values (Table 4), implying that there may be heterogeneity in the detection probabilities. Nevertheless, the null model (Mo) and the heterogeneity model (Mh2) in MARK produced similar estimates and associated confidence intervals (Table 4).

Table 4 Population size estimate [Nc (95% CI)], and AICc scores corrected for sample size using the program MARK

The minimum number of samples needed to estimate effective population size (Ne)

The results of our simulation of the power of Ne estimation indicate that, if the census population size is 620, a minimum of 200 samples is required when 10 or 13 loci are used for estimation (Figure S1). When a population size of 845 is used, for 10 or 13 loci, a minimum of 300 or 200 samples are sufficient, respectively, to obtain an accurate Ne estimate (Fig. 1). The results of the analysis using Nc = 1250 showed that for 10 loci, 400 samples are required, while for 13 loci, 300 samples are sufficient to provide an accurate estimate of Ne (Figure S1). These observations show that fluctuations in the population size parameter can affect NeOGen results. Furthermore, they suggest that the strength of the Ne estimates determined here may be improved with additional samples or loci when the population size is larger than 620.

Fig. 1

Graph showing the results of NeOGen software simulations to estimate the number of samples and loci needed to obtain an accurate Ne, assuming a census size of 845. The power of the Ne estimate is evaluated at every 100 genotypes, with a maximum of 400, using 10 loci (a), and 13 loci (b). The x-axis shows the combination of the number of samples and loci. The y-axis shows the corresponding estimate of Ne (blue circles) with 95% CIs. All estimates of Ne are represented by two values in parentheses. The first value indicates the relevant estimate, and the second indicates the number of times the estimate was incalculable (i.e., negative, or close to infinity) in all replicates. Incalculable CIs are indicated by a red arrow and CIs with adequate power are in blue with a flat base. The precision of Ne for each combination is evaluated by the width of the CIs. The precision of the point estimates of Ne can be judged by their similarity to the shaded dashed "precision guideline," which is equal to the Ne estimated from all loci and all individuals in the same age cohorts as sampled for the sample/locus combinations

Estimates of Ne from genetic methods based on single and combined sampling periods

Estimates of effective population size (Ne) varied considerably between methods (Table 5). Overall, the estimates produced by the individual period samples were generally smaller than those provided by the three-year samples combined. Finite population Ne estimates based on individual period data ranged from 58.71 to 234.14 individuals for all methods. Results based on excess heterozygosity (HeNe) and the molecular coancestry model were inconsistent or yielded infinite estimates. In contrast, estimates from the linkage disequilibrium (LDNe) and sibship (COLONY) models appeared more consistent across sample periods, although the LDNe estimates using data from the 2017 individual period were comparatively large. Combining data from across all three sampling periods yielded larger estimates of Ne for both the LDNe and sibship models. In contrast, the HeNe method still yielded infinite estimates whereas the molecular coancestry model produced unrealistically low estimates. Given the most robust estimates of Ne from our models, Ne appears to range between 13.6 and 29.5% of Nc (Table 6).

Table 5 Estimates of Ne from individual sample period data (2016 to 2018) and data from all three periods combined into a single sample design, using four single sample based genetic estimators
Table 6 Ratio of population size estimates

Discussion

Census size estimates (Nc) of the mandrill population

We used a non-invasive genetic approach to provide measures of population size based on single and multiple sampling strategies. We found that both methods can be effective, given a sufficient sample size. Estimates from the TIRM model implemented in CAPWIRE were improved when genotypes from the three sampling sessions were pooled. Those estimates, along with those from the program MARK, were most similar to previous estimates determined by direct field observations (Abernethy et al. 2002; Lehmann D., personal communication, 2019). In accordance with past studies, our estimates revealed a larger group size than many other highly social primates from other regions, such as the northern yellow baboon (Papio cynocephalus) (Wallis 2020), the southern Chacma baboon (Papio ursinus) (Sithaldeen and Rylands 2020; Stone et al. 2012) and macaques (Macaca spp.) (Boonratana et al. 2020; Chetry et al. 2003). The only other primate with a larger estimated group size is the gelada monkey (Theropithecus gelada; Nc ≥ 1500 individuals; Beehner et al. 2007; Kifle et al. 2013).

The low Nc values obtained in our study from the single sample period data using ECM and TIRM in CAPWIRE (Miller et al. 2005) appear to underestimate the population size. In addition, the wide confidence intervals of these values show low precision around the point estimate. These results can be explained by the small number of samples used. Indeed, consistent with the results of other studies, an insufficient number of samples can produce unreliable estimates when using these one-sample models (Miller et al. 2005). In contrast, using a greater number of samples improves the population estimates and reduces the width of the confidence intervals (Miller et al. 2005).

When we used a larger sample size by combining data from all three years, the ECM produced a point estimate that appeared very similar to previous estimates from the same model based on single-year samples. In contrast, the TIRM estimate was much larger and had reasonably small confidence intervals. Other researchers have obtained similar results (Miller et al. 2005). It has been shown that, despite using a sufficient number of samples, ECM tended to produce lower and less credible estimates when there was heterogeneity in the probability of sample capture (Miller et al. 2005), which may also be the case in our study. These results have also been observed in other simulation and empirical studies, for example in population estimates for gorillas (Arandjelovic et al. 2010; Dou et al. 2016) and bats (Puechmaille and Petit 2007). In these studies, the authors used a sufficient number of samples and found that CAPWIRE performed better using TIRM rather than ECM when capture heterogeneity was suspected in the data. Thus, our results suggest that the insufficient number of samples obtained from individual years in our study leads to less precise estimates with wider confidence intervals and low point values of Nc, as previously demonstrated (Miller et al. 2005). In contrast, the use of a larger data set and the TIRM model that accounts for heterogeneity in capture probabilities between samples appears to produce a more robust estimate.

Interestingly, using either the null model (Mo, suggesting a constant capture probability) or the heterogeneity model (Mh2) in MARK (White and Burnham 1999) gave results that were comparable to those given by TIRM when applied to a large number of samples from all three sampling years. The comparison of the MARK and TIRM results thus shows that using the combined samples from the three sampling periods produced relatively robust Nc estimates of mandrills. These results also support the suggestion by other researchers that accounting for heterogeneity in capture probability can produce good results of Nc (Bellemain et al. 2005; Dou et al. 2016; White and Burnham 1999).

Our genetic estimates of 989 (95% CI 947–1399) and 992 (95% CI 708–1453) mandrills obtained with the TIRM and MARK estimators respectively are substantially larger than the initial maximum field estimates of up to 700 (Rogers et al. 1996) or 845 individuals (Abernethy et al. 2002) using observational data of the same horde. Recent unpublished observations suggest as many as 1,250 mandrills in the SEGC horde (Lehmann D. 2019, personal communication), and although this number is included within our confidence intervals, our point estimates show a somewhat smaller value.

Previous studies have compared genetic and standard field methods to estimate Nc in other species such as mountain gorillas (Guschanski et al. 2009), otters (Arrendal et al. 2007; Hájková et al. 2009) and giant pandas (Zhan et al. 2006). In these studies, the authors found that genetic estimators most often provide reliable results, whereas standard field methods tend to overestimate or underestimate true population sizes. The usefulness of standard traditional methods for estimating population size, such as cameras or direct counts, may indeed be limited when individuals form a large horde and live in closed forest habitats (Bata et al. 2017; Buckland 1980; Christman 2004; Frankham 1995; Leberg 2005). Studies carried out on mandrill populations have shown that this species is difficult to observe in nature due to their dense forest habitat and reclusive behavior (Abernethy et al. 2002; Hoshino et al. 1984; Jouventin 1975; Rogers et al. 1996).

As mentioned above, there are some discrepancies between our genetic estimates and the historical estimates from Rogers et al. (1996) and Abernethy et al. (2002). It is possible that past researchers did not observe the entire horde, as we have noted that the horde often divides into smaller sub-hordes to better occupy different habitats in search of new resources (Lehmann D. 2019, personal communication). Predation by panthers may also lead to subdivision of the horde, but such subdivision is expected to be of short duration, whereas subdivision due to foraging may last about one to two months before the larger horde re-forms (Lehmann D. 2019, personal communication). However, mandrill counts by Abernethy et al. (2002) took place over a 39-month period, from June 1996 to August 1999, and therefore should have captured the majority of individuals within the SEGC horde. The difference between our estimates and the historical ones is therefore more likely to be explained by growth of the focal horde. LNP contains favorable habitat, with minimal hunting pressure and seasonally stable resources. Given that more than 20 years have elapsed between the past studies and the present one, horde growth would be unsurprising. The recent unpublished counts by D. Lehmann (2019) also point to an increase in horde size since the late 1990s.

The apparent growth of the mandrill horde likely reflects the conservation efforts of the park's wildlife brigade and ecoguard patrols, as well as the park's recognition in 2007 as a World Heritage Site (https://papaco.org/gabon/). However, similar protection may not be provided in other areas of the mandrill’s range, and monitoring the population size of other hordes may prove essential for management. It remains to be seen whether direct counts are as accurate as genetic methods in other habitat types where mandrills are more difficult to observe. In such cases, genetic methods appear to offer a reliable alternative.

Genetic estimates of Ne in mandrills

In this study, we provide the first Ne estimates for the SEGC focal horde of mandrills, using a range of genetic methods (Do et al. 2014; Jones and Wang 2010). We compared estimates based on samples from each individual sampling period (2016–2018) with estimates based on data combined from all three periods. Our results indicated that both strategies can provide good estimates, but only if sufficient sample sizes are obtained. Comparing the estimators, we estimate the Ne of the SEGC horde to be between 135 (95% CI 108–176) and 292 (95% CI 239–370) individuals, based on the two best-performing estimators in this study: sibship structure and linkage disequilibrium (LDNe), respectively. Nevertheless, these estimates should be interpreted with caution, since our NeOGen analyses showed that our dataset may lack sufficient power if the census size is large.

Estimates produced using the individual period samples and those based on the combined dataset varied between methods. Excess heterozygosity (HeNe) and molecular coancestry methods gave unreliable results. However, the linkage disequilibrium (LDNe) and sibship methods gave similar, finite estimates using both single-period data and combined sample periods, although the value obtained from the COLONY sibship method was much lower when single-period samples were used.

These differences in Ne estimates obtained from different sampling designs (single-period versus combined samples) are consistent with previous studies that have reported variable estimates of Ne depending on the method used (Do et al. 2014; Wang 2009; Waples and Do 2010). These results reflect the limitations of the approaches used to estimate Ne, with genetic estimators generally losing performance with small numbers of samples and loci (Do et al. 2014; England et al. 2006, 2010; Luikart et al. 2010; Richards and Leberg 1996; Tallmon et al. 2004; Wang 2009; Waples 1989; Waples and Do 2010). The results provided by the HeNe and molecular coancestry methods are not surprising, as these methods have often produced poor results in simulation studies because of biases related to sample size (Do et al. 2014; Luikart et al. 2010). The downward bias observed using the sibship method could be due to the increase in related individuals or the sensitivity of the method to sample size, as previously demonstrated by other researchers (Wang 2009; Waples 1989; Waples and Do 2010). Indeed, the sibship method is based on the principle that the estimate of Ne increases when the number of non-related individuals increases (Jones and Wang 2010). Thus, the results produced by the sibship model suggest that the estimates obtained using the combined samples from the three sampling years are the most reliable.

Nevertheless, the results provided by NeOGen (Blower et al. 2019) revealed that more than 300 samples may be required to obtain an accurate Ne for a large population using 10 loci, which is the average number in our dataset. Somewhat fewer samples would be required with 13 loci. From these results, it appears that our Ne estimates would be improved by the use of either additional samples or additional loci, if the census size is as large as our analyses suggest. In addition, other studies have shown that using a large number of loci with high allelic richness and high Pcrit values (i.e., Pcrit > 1/(2N), where N is the number of samples) can minimize bias and thus improve estimates of these variables (Do et al. 2014; Waples 2006; Waples and Do 2010), which was likely the case for the LDNe and sibship methods.
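
To illustrate the Pcrit criterion, the following minimal Python sketch reads it as Pcrit > 1/(2N), that is, a cut-off just above the frequency of an allele seen in a single copy among N diploid individuals. It is not the implementation used by NeEstimator or any other program; the function names and the toy genotypes are hypothetical and serve only to show how such a threshold screens out rare alleles at one locus.

```python
from collections import Counter


def pcrit_threshold(n_samples):
    """Frequency of a single allele copy among n_samples diploid individuals."""
    return 1.0 / (2 * n_samples)


def alleles_retained(genotypes, pcrit):
    """Return {allele: frequency} for alleles whose frequency exceeds pcrit.

    genotypes is a list of (allele_a, allele_b) tuples, one per diploid
    individual, for a single locus.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {
        allele: count / total
        for allele, count in counts.items()
        if count / total > pcrit
    }


# Hypothetical microsatellite genotypes (allele sizes) for five individuals.
toy_genotypes = [(100, 100), (100, 102), (102, 102), (100, 104), (102, 100)]

threshold = pcrit_threshold(len(toy_genotypes))
retained = alleles_retained(toy_genotypes, threshold)
print(f"Pcrit threshold: {threshold}")
print(f"Alleles retained: {sorted(retained)}")
```

In this toy example allele 104 occurs once among ten allele copies, so its frequency equals the threshold and it is excluded, while alleles 100 and 102 are kept.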

Previous studies have indicated that levels of Ne that are less than 50 can be detrimental to a population, since small effective sizes can reduce adaptive capacity and cause severe inbreeding risk (Frankham et al. 2014; Madsen et al. 1999; Westemeier et al. 1998). Therefore, an Ne of 135 or 292 individuals in the SEGC mandrill horde is likely sustainable given the large size of this mandrill horde and its apparent growth over 22 years of study (1999–2018).

Here, we noted that Ne values appear to be between 13.6% and 29.4% of Nc, which is higher than reported in many other wildlife studies (Frankham 1995; Harpending and Cowan 1986; Kinnaird and O’Brien 1991; Palstra and Ruzzante 2008). Our analyses did not identify the exact factors that might influence Ne in this population, but factors such as the large number of individuals (Nc) and connectivity between hordes may be key. Although studies of gene flow between mandrill hordes have not yet been done, observational studies of the SEGC horde have reported that male mandrills leave the natal horde to be solitary before they reach 6 years of age. When these individuals reach adulthood (> 9 years), they return to the horde during the breeding season (Abernethy et al. 2002). It is not yet known whether these mature males return to their natal horde or emigrate to other populations. However, the field observations of Abernethy et al. (2002) suggest that mandrills may disperse into neighboring hordes and thus avoid inbreeding. In addition, other observations reveal that mandrills appear to move between habitat fragments by crossing the intervening savanna (Abernethy and Maisels 2019) and thus may exchange genes with other hordes to maintain a viable population.
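
The Ne/Nc percentages can be reproduced from the point estimates reported in the text. The short Python sketch below is illustrative only; the variable and function names are hypothetical. The quoted range of 13.6% to 29.4% is consistent with dividing the two Ne estimates by the MARK estimate of Nc (992), which is an inference from the reported numbers rather than a statement made in the text.

```python
# Point estimates quoted in the text.
ne_estimates = {"sibship": 135, "LDNe": 292}
nc_estimates = {"TIRM": 989, "MARK": 992}


def ne_nc_percent(ne, nc):
    """Express effective population size as a percentage of census size."""
    return 100.0 * ne / nc


# Ratio for every combination of Ne estimator and Nc estimator,
# rounded to one decimal place.
ratios = {
    (ne_method, nc_method): round(ne_nc_percent(ne, nc), 1)
    for ne_method, ne in ne_estimates.items()
    for nc_method, nc in nc_estimates.items()
}

for (ne_method, nc_method), pct in ratios.items():
    print(f"Ne ({ne_method}) / Nc ({nc_method}) = {pct}%")
```

Using the TIRM estimate of Nc instead shifts each percentage by about a tenth of a point, so the choice of Nc estimator has little effect on the ratio.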

The lack of demographic data on wild mandrills limits the extent to which we can understand the dynamics of this population. This study is also limited by the number of loci and genotypes available, which could affect the reliability of the Nc and Ne estimates. However, mark-release-recapture estimates of Nc and single-sample estimates of Ne from a larger pooled set of samples yielded meaningful results. Thus, our research shows that non-invasive sampling is a viable strategy for estimating horde size in mandrills. To our knowledge, this is the first study to provide a genetic estimate of population size for this species in the wild.

Conclusion

This study shows that population assessment of wild mandrills using a non-invasive sampling approach is feasible and likely to be effective in providing important data that would otherwise be difficult to obtain. While standard field methods are often limited when it is difficult to observe mandrills in the wild, the non-invasive genetic approach may become one of the most efficient and cost-effective ways to study the species in areas where populations are suspected to be declining. This study also shows the importance of combining a range of genetic estimators, because not all estimators perform equally well. However, a sufficient number of samples is required to obtain an accurate estimate, so it may be necessary to sample in multiple sessions. We recommend the use of non-invasive genetics as an effective tool to study wild mandrills, provided sufficient samples and loci are available. Studies on the reproductive system, assessment of bottlenecks, gene flow between populations, and population viability are needed to better understand the genetic status, management, and long-term conservation of mandrills.