Introduction

Amphibians are among the most threatened vertebrates. Many species from all continents are experiencing a demographic decline (Houlahan et al. 2000), due to multiple causes (Allentoft and O’Brien 2010). Important anthropogenic activities include land use, leading to habitat loss and fragmentation, pollution and indirectly the increase of UV-B irradiation (Weyrauch and Grubb 2006). Also, climate change and global warming affect the distribution of amphibians, influence breeding phenology or lead to pathogen outbreaks (Corn 2005; Rohr et al. 2008). Amphibians appear to be particularly sensitive to all these processes, making them good biological indicators of environmental quality (Blaustein and Wake 1990).

The causes and consequences of amphibian decline, and the utility of this taxon as a biological indicator, cannot be generalized at a global scale. Many factors interact, and their impact likely differs according to geographic areas and focal species (Beebee and Griffiths 2005). Studies at regional scale, where the major factors of habitat disturbance can be identified, the demographic dynamic of a species and its genetic impact can be reconstructed, and the possible causal influences can be inferred, are therefore crucial for understanding, and mitigating, amphibian declines. In this context, the Alpine environment is of particular interest and concern.

The Alpine environment is heavily affected by environmental change, including climate change and land use (Cannone et al. 2008; Vanham et al. 2009; Keiler et al. 2010; Huggel et al. 2010). In particular, temperatures in the European Alps increased in the last century twice as much as the global average (Brunetti et al. 2009). Consequences of climate warming, such as the upward shift of the tree-line (Leonelli et al. 2011), or the change in population genetic structure, have already been demonstrated or predicted in many Alpine plant species (Jay et al. 2012; Moradi et al. 2012). However, few case studies of recent demographic and genetic change are documented in animals (though see for example the decreasing litter size in the marmot, Tafani et al. 2013). Here we analyzed the genetic variation in an amphibian species sampled in the Italian Alps. Our main goal is to estimate the genetic impact, if any, of its recent demographic decline.

In studying influences on genetic patterns, it is always important to consider not only recent but also ancient events in a species’ history. For example, many species have undergone range fluctuations, colonization events, or demographic collapse due to climatic/habitat changes that occurred thousands of years ago. Examples include well-documented northward tree migration after the last glaciation (10,000–20,000 years ago), mid-Holocene hemlock decline, and elephant population contraction due to drying in tropical Africa 4000 years ago (Bhiry and Filion 1996; Hewitt 2000; Okello et al. 2008; Lee-Yaw et al. 2008). Such events have left genetic signatures, and may be the dominant drivers of modern genetic patterns in some species. In other species, it may be shown that the drivers are more recent, as in Iberian lynx (Casas-Marce et al. 2013). Determining whether and how ancient and recent environmental change has influenced population genetic patterns is a key unanswered evolutionary (Andrew et al. 2013) and ecological (Sutherland et al. 2013) question, and whether a species is newly rare or was always rare will also determine the relevant management interventions to be applied (Sgrò et al. 2011).

Recent statistical genetic methods offer the potential to give a more complete understanding of demographic and genetic histories (Andrew et al. 2013), especially when a direct comparison between genetic variation in modern and museum samples (e.g. Rubidge et al. 2012) is not possible. Specifically, improvements in likelihood and simulation-based methods allow comparison of alternative demographic models (e.g. stability, decline) and estimation of parameters regarding historical and contemporary population sizes and timing of major events. These methods are a major improvement on widely-used but simplistic tests of population equilibrium. Use of multiple complementary analyses, and comparing results among them, should help unravel recent and ancient historical fluctuations in effective population size.

The yellow-bellied toad, Bombina variegata, is mainly distributed across central western Europe, from Spain to the Carpathian Mountains (Sillero et al. 2014). Breeding sites are usually ephemeral, and include small puddles in meadows and river loops and occasionally farm ponds or water-filled wheel ruts (Gollmann et al. 1998; Di Cerbo and Ferri 2000; Sillero et al. 2014). Although the species is globally considered of Least Concern by the IUCN (IUCN 2014), extinctions or demographic reductions have been reported in the last decades across the distributional range. In particular, severe declines are documented in Romania, the Netherlands, and Italy (Goverse et al. 2007; Barbieri et al. 2004; Covaciu-Marcov et al. 2010). Only one population is now described in Luxembourg, and the species is probably extinct in Belgium and highly fragmented in France (Kuzmin et al. 2009). Urbanization and consequent loss of suitable habitat (e.g. abandonment of pastures, heavy use of unpaved forestry roads and drainage of natural breeding sites) are considered the major factors reducing the population sizes and increasing the fragmentation in this species. Additionally, chytridiomycosis has also been suggested to be an important cause of population decline, at least in the sister species Bombina pachypus (Stagni et al. 2004). As in many amphibians in natural conditions, B. variegata has small effective population size (Beebee and Griffiths 2005) and low dispersal ability (Smith and Green 2005; Hartel 2008), making the genetic and non-genetic risks associated with small numbers of highly isolated individuals even higher.

In Italy, B. variegata was common in the last century (De Betta 1857; Giacomelli 1887; Vandoni 1914), but it is significantly declining in many areas (Stagni et al. 2004). Anthropization of natural habitats, pollution and use of pesticides led to a population decrease in the last decades (Barbieri et al. 2004), fragmentation and local extinctions (Di Cerbo and Ferri 2000). A recent study used simulations, under various models of climate change, environmental alteration and solar irradiation, to predict that the yellow-bellied toad in Italy might lose between 13 and 75 % of its suitable natural habitat in the next 50 years (D’Amen et al. 2011).

Here we study the pattern of genetic variation at the mitochondrial cytochrome b gene and at 11 microsatellite markers in a restricted area in the Italian Alps, where recent extinctions and population declines have been confirmed (Caldonazzi et al. 2002). We typed 200 individuals from 9 sites to address two main questions: (1) Does the genetic pattern show evidence of demographic decline and fragmentation, and, if so, (2) can we directly infer that recent human-related factors are responsible for the genetic pattern? We address these questions using a set of complementary statistical methods suitable to estimate the effective population sizes and their temporal dynamic, the population structure, the individual genomic compositions and the pattern of isolation by distance, and to probabilistically compare alternative demographic models. Our results have specific implications for the conservation of B. variegata, and provide general guidelines for avoiding over-estimation of extinction risks when genetic data are analysed.

Materials and methods

Sample collection and DNA extraction

We collected 200 samples of B. variegata (toe clips from adults) from nine different localities, representing most of the known breeding populations in the Province of Trento (Northern Italy), from 2009 to 2011. Sampling sites and their abbreviations used throughout this paper are reported in Fig. 1. Different ecosystems were considered: samples from Spiz (SPI) and Monte Baldo (MBA) came from isolated mountain areas (about 1500 m above sea level, asl); samples from Zambana (ZAM) and Mezzolombardo (MEZ) were collected in the main valley of the Region (the Adige valley), close to areas devoted to agriculture (about 210 m asl); samples from Nago (NAG) and Loppio (LOP) came from sites close to the tourist area of Garda Lake (160 and 250 m asl, respectively); samples from Verla (VER), Pozzolago (POZ) and Prà (PRA) were collected from scarcely urbanized areas along the Avisio river (from 450 to 620 m asl), and in particular from agricultural ponds (VER) and river loops (POZ and PRA). Individual GPS coordinates of each sample were recorded. Toe clips were obtained and stored in 95 % ethanol; about 20 mg of tissue were used to perform DNA extraction using the protocol of the DNeasy Tissue kit (QIAGEN Inc, Hilden, Germany). All sampling procedures were approved by the Italian Ministry of Environment and the Wildlife Committee of the Autonomous Province of Trento (DPN/2D/2003/2267 and 4940-57/B-09-U265-LS-fd).

Fig. 1

Map of the nine sampling sites (indicated by red dots) in the Alpine region of Trentino Alto-Adige. Major lakes are shaded. The population codes used throughout the paper are reported in brackets. (Color figure online)

Marker genotyping

We initially sequenced a fragment (471 bp) of the mitochondrial DNA (mtDNA) cytochrome b gene to verify the haplotypic affiliation of the samples, with respect to the known maternal phylogeographic pattern in Europe. We used the primer pair L14850 and H15410 according to Tanaka et al. (1994). PCR amplifications were conducted in 20 μl (containing 1 μl of template DNA, 2 μl of 10× buffer, 0.1 μM of each pair of primers, 1 unit of Hot Master Taq polymerase and ultra-pure water) under the following conditions: 10 min at 94 °C, 35 cycles of 30 s at 94 °C, 45 s at 52 °C, 60 s at 65 °C, and a final extension step for 10 min at 65 °C. Sequences were edited using Finch TV 1.4.0 (an open source application developed by Geospiza Research Team, http://www.geospiza.com/Products/finchtv.shtml), assembled with Sequencher v.4.7 and aligned using ClustalX (Thompson et al. 1997) with default parameters.

The genetic variation level and structure at the local scale were then investigated by typing 11 autosomal microsatellites (Table S1, Online Resource) previously isolated in B. variegata or Bombina bombina (Nürnberger et al. 2003; Stuckas and Tiedemann 2006; Hauswaldt et al. 2007). PCR amplifications were conducted in four different multiplex reactions in a final volume of 20 μl containing: 1 μl of template DNA, 2 μl of 10× buffer, 0.05 μM of each pair of primers, 1 unit of Hot Master Taq polymerase (Applied) and ultra-pure water. The amplification protocol consisted of an initial denaturation step at 94 °C for 10 min, followed by 30 cycles of the series: 94 °C for 30 s, annealing temperature (see Table S1, Online Resource) for 30 s, 65 °C for 45 s; then, a final extension step at 65 °C for 10 min. Labeled PCR products were run on a four-capillary ABI 3130 Genetic Analyzer (Applied Biosystems) and scored with an internal lane standard (LIZ 500) using GeneMapper software.

Statistical analysis

Mitochondrial DNA

A phylogenetic tree was built using the maximum-likelihood algorithm implemented in MEGA5 (Tamura et al. 2011), using the Kimura two-parameter model (selected as the best model by JModelTest, Posada 2008) and 1000 bootstrap replicates. This analysis included the haplotypes from our study, 361 sequences available in Genbank for B. variegata (EF212448-EF212809), and two sequences used as outgroups from B. bombina and B. orientalis (JF898352, EU531278).

Microsatellites

Microsatellites were tested for the presence of null alleles, allele drop-out and scoring errors using MicroChecker (Van Oosterhout et al. 2004) and FreeNa (Chapuis and Estoup 2007). An outlier analysis was performed with the software Lositan (Antao et al. 2008), running 50,000 simulations, with FDR set to 0.1, confidence interval of 0.95, sample sizes from 10 to 80, and assuming either an infinite or a stepwise mutation model.

We used GENEPOP 3.4 (Raymond and Rousset 1995) to test for deviations from Hardy–Weinberg equilibrium for each locus and globally. We also tested genotypic Linkage Disequilibrium (LD) for each pair of loci. To evaluate overall genetic variation, the expected and observed heterozygosity (H e and H o ) and number of alleles (N a ) within each population were calculated using Arlequin v3.5 (Excoffier and Lischer 2010); FSTAT software (Goudet 1995) was used to calculate allelic richness (A r ). In addition, pairwise F st values between populations and their significance were computed with Arlequin v3.5 and the corresponding triangular matrix of distances was visualized using Principal Coordinates Analysis (PCoA) implemented in GenAlex v6.5 (Peakall and Smouse 2012). Pairwise distances were also computed using two indices of genetic differentiation that, unlike F st , do not depend on the level of variation within populations: G’ st (Hedrick 2005), and Jost’s D (Jost 2008).
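As an illustration of the diversity statistics named above, the following minimal sketch computes observed heterozygosity and the sample-size-corrected expected heterozygosity at a single locus. It is not the Arlequin implementation; the function names and the toy genotypes are hypothetical, and the n/(n − 1) correction over gene copies is an assumption about which estimator is wanted.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Sample-size-corrected gene diversity at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    The n/(n - 1) correction over gene copies is an assumption of this sketch.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)  # number of gene copies (2 x number of individuals)
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Hypothetical allele sizes (bp) for four individuals at one locus.
toy_locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
he = expected_heterozygosity(toy_locus)
ho = observed_heterozygosity(toy_locus)
```

Population-level values would be obtained by averaging these per-locus quantities over the 11 loci.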

Bayesian clustering analyses

STRUCTURE v2.3.4 (Pritchard et al. 2000; Hubisz et al. 2009) was used to determine the most plausible number K of genetically homogeneous groups and to estimate the genetic composition of each individual. We applied the LOCPRIOR with admixture model, which assumes that sampling locations are informative and allows for mixed ancestry of individuals. This model is more powerful in detecting weak genetic structure and reduces misassignments (Hubisz et al. 2009). We also tested the model where geography is not considered (i.e., LOCPRIOR set to off) to verify that the major features of the genetic structure can still be identified. Each run of STRUCTURE consisted of 1,000,000 iterations after a burn-in period of 250,000, and 10 runs were analysed for all K values between 1 and 9. The most probable K was selected by comparing the likelihood at different K values and using the approach of Evanno et al. (2005) based on the rate of change of the likelihood.
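The rate-of-change criterion can be sketched as follows. This is a simplified illustration, not the code used in the study: it takes the absolute second-order difference of the mean log-likelihoods across replicate runs and divides it by the standard deviation at that K, which is one common way of computing ΔK. The function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of ln P(D|K) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # Delta K is undefined when replicate runs are identical
                second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
                delta[k] = abs(second_diff) / sd
    return delta


# Hypothetical log-likelihoods from two replicate runs per K.
toy_runs = {1: [-5000.0, -5002.0], 2: [-4800.0, -4802.0], 3: [-4790.0, -4792.0]}
delta_k = evanno_delta_k(toy_runs)
```

Because the statistic needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is one reason to inspect the raw likelihood plot as well.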

Genetic vs. geographic distances

The correlation between genetic similarity and geographic distance was evaluated at both individual and population levels. At the individual level, we computed with the software SPAGeDi (Hardy and Vekemans 2002) the kinship coefficient estimator derived by Loiselle et al. (1995) for all pairs of individuals. These coefficients were then pooled in classes with similar number of comparisons, corresponding to different geographic distances. At the population level, we used a Mantel test to analyse the relationship between the linearized F st based distance (F st /(1 − F st )) and the logarithm of the linear geographic distance. Slope and the intercept of the regression line between these two variables were estimated following a reduced major axis regression (RMA) and a jackknife approach over all points (Jensen et al. 2005).
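The population-level test described above can be illustrated with a minimal permutation sketch. It is not the program used in the study: the function names, the one-tailed formulation, and the number of permutations are assumptions. In practice `gen` would hold the linearized F st values and `geo` the logarithm of the geographic distances.

```python
import math
import random


def linearize(fst):
    """Linearized genetic distance F st / (1 - F st)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, n_perm=9999, seed=1):
    """One-tailed Mantel test on two symmetric square matrices (lists of lists)."""
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    y = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], y)
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(idx)  # relabel populations in the genetic matrix only
        r = pearson([gen[idx[i]][idx[j]] for i, j in pairs], y)
        if r >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

Permuting population labels, rather than individual matrix cells, preserves the dependence among pairwise distances that share a population.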

Recent effective population size

Two methods were used to estimate the recent effective population size (N e ) of each population: LDNe (Waples and Do 2008; Do et al. 2014) and ONeSAMP (Tallmon et al. 2008). LDNe is based on the linkage disequilibrium among unlinked loci created by random drift and the estimated N e reflects the population size in the last few generations (Hare et al. 2011). As suggested by the authors (Waples and Do 2008), we excluded the alleles with frequencies smaller than 0.02 to avoid bias related to rare alleles. ONeSAMP implements an approximate Bayesian computation analysis (Beaumont et al. 2002; Bertorelle et al. 2010). Eight summary statistics are used by ONeSAMP to compare observed and simulated data sets; the inclusion of linkage disequilibrium among these statistics makes this method particularly sensitive to recent population sizes (Skrbinsek et al. 2012). The lower and upper limits of the uniform prior distribution of N e were set to 2 and 5000, respectively.

Demographic dynamic

We analysed the demographic dynamic of each population using five approaches: (1) the M-ratio test (Garza and Williamson 2001); (2) the heterozygosity excess test implemented in the software BOTTLENECK 1.2.02 (Piry et al. 1999); (3) a Bayesian analysis based on the coalescent framework and able to estimate the posterior distributions of the parameters of a contraction/expansion demographic model, as implemented in the software MSVAR v1.3 (Beaumont 1999, 2004); (4) a likelihood analysis based on the coalescent framework and specifically designed to infer population size contractions and simultaneously the parameters of the mutation model for microsatellites (as implemented in the software MIGRAINE, Leblois et al. 2014); (5) a model comparison based on the approximate Bayesian computation approach (Beaumont et al. 2002; Bertorelle et al. 2010), as implemented in the software DIYABC v1.0.4.46b (Cornuet et al. 2008, 2010). The first two approaches are simple statistical tests of the null hypothesis of demographic stability, while the last three approaches are model-based, and produce parameter estimates and/or model probabilities using most or all the information provided by the data.

Each method has different statistical properties, which depend on the number of markers, the specific feature of the bottleneck (e.g. age, initial population size, intensity, recovery or not) and possible violations of the model they assume (e.g., migration events among populations). Therefore, none can be considered superior to the others in all conditions (e.g. Swatdipong et al. 2010; Chikhi et al. 2010; Peery et al. 2012; Hoban et al. 2013a). We briefly describe these methods, and we will return to their properties in the discussion.

The M-ratio test is based on the frequency distribution of allelic sizes, which is expected to have gaps after a bottleneck due to stochastic loss of rare alleles. Statistical significance was established comparing the observed values with the empirical null distribution obtained simulating 10,000 times the genealogy expected under demographic stability with M_P_VAL (Garza and Williamson 2001). Simulations assume the two-phase mutation model, and require three parameters: the population-mutation parameter, θ = 4Neμ, the mean size of multi-repeat mutations, δ g , and the proportion of multistep events, p s . Different values of θ were tested, i.e. 1, 2, and 5; δ g and p s were fixed to 3.1 and 0.22 as estimated in a recent review by Peery et al. (2012).
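For reference, the statistic itself can be computed as in the sketch below: M is the number of alleles observed at a locus divided by the number of allelic size classes spanned by the observed range, averaged over loci. This is an illustration only and does not reproduce the M_P_VAL significance simulations; the function names and the allele sizes are hypothetical.

```python
def m_ratio(allele_sizes, repeat_len):
    """M for one locus: observed alleles / size classes in the observed range."""
    sizes = set(allele_sizes)
    k = len(sizes)  # number of distinct alleles
    r = (max(sizes) - min(sizes)) // repeat_len + 1  # possible size classes
    return k / r


def mean_m_ratio(loci):
    """Average M over loci; loci is a list of (allele_sizes, repeat_len) pairs."""
    return sum(m_ratio(sizes, rep) for sizes, rep in loci) / len(loci)


# Hypothetical dinucleotide locus: alleles 106 and 108 are missing from the
# 100-110 bp range, so 4 of the 6 possible size classes are occupied.
toy_m = m_ratio([100, 102, 104, 110], repeat_len=2)
```

Gaps in the size range lower M, which is why values well below 1 are read as a bottleneck signal.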

The heterozygosity excess test is based on the comparison between heterozygosity and number of alleles, which is predicted to deviate from the expectation after a bottleneck because the former decreases more slowly than the latter. Statistical significance (one-tailed) is computed using the Wilcoxon signed-rank test to compare observed and expected heterozygosities (Cornuet and Luikart 1996), where expected values are computed by simulations assuming again a two-phase mutation model, a variance among multiple steps equal to 12 (corresponding to δ g  = 3.1, see Peery et al. 2012) and p s  = 0.22.

The method implemented in MSVAR assumes that an ancestral population with effective size N 1 increased or decreased (linearly or exponentially) to its current size N 0 , starting T generations ago. The estimation algorithm is based on Markov Chain Monte Carlo simulations, and the simple Single-step Mutation Model (SMM) is assumed. Simulations were run for 4 × 10⁸ iterations; convergence and posterior distributions of the parameters were evaluated with Tracer v1.5 (Rambaut et al. 2014), after discarding the first 10 % of the chains (burn-in). For each population, three independent runs were performed assuming an exponential demographic change. The possible effect of this choice was tested assuming a linear change in an additional run of the program. Prior means for the ancestral and current population sizes were set equal to a log10-transformed value of 3 (1000 individuals), with a standard deviation equal to 1. The prior distributions are log-normal, and this setting allows the testing of population sizes from a few tens to hundreds of thousands of individuals. Three different prior distributions of the time since the demographic change were tested, with means equal to 2, 3, and 4, respectively (corresponding to 100, 1000, and 10,000 years) and standard deviations equal to 1. The prior distribution of the average mutation rate across loci was set to 1.27 × 10⁻³ per generation. This value corresponds to the direct measure of the microsatellite mutation rate available for amphibians, as estimated from 7906 allele transfers from parents to offspring in the tiger salamander (Bulut et al. 2009). All the other prior settings in the hierarchical model implemented in MSVAR are reported in Table S2, Online Resource, and follow standard choices used in other studies (e.g. Storz et al. 2002; Goossens et al. 2006).

Considering that MSVAR provides little information on the mutation rate (Girod et al. 2011), but this rate is necessary to convert the scaled parameters θ 0 = 4N 0 μ, θ 1 = 4N 1 μ, and t = T/2N 0 into the natural parameters N 0 , N 1 , and T, we estimated the natural parameters in two different ways: (a) using the posterior distribution of the mutation rate as estimated by MSVAR; and (b) using the posterior distribution of the scaled parameters and subsequently generating the distributions of the natural parameters N 0 , N 1 , and T using either the “amphibian specific” rate of 1.27 × 10⁻³ or the commonly used rate of 5.0 × 10⁻⁴ per generation (Garza and Williamson 2001; Storz et al. 2002). Time estimates are converted to years assuming a generation time of 3 years (Szymura 1998; Gollmann and Gollmann 2002). Estimates were made on each population separately, and also after pooling populations that are not significantly differentiated.
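The conversion in (b) follows directly from the definitions of the scaled parameters and can be sketched as below. The function name and the scaled values are hypothetical and serve only to show how the assumed mutation rate propagates into the natural parameters; they are not estimates from this study.

```python
def to_natural(theta0, theta1, t_scaled, mu, gen_time_years=3.0):
    """Convert scaled parameters to N0, N1 and time in years.

    theta0 = 4*N0*mu, theta1 = 4*N1*mu, t_scaled = T / (2*N0),
    with T in generations and mu the per-generation mutation rate.
    """
    n0 = theta0 / (4.0 * mu)
    n1 = theta1 / (4.0 * mu)
    t_generations = t_scaled * 2.0 * n0
    return n0, n1, t_generations * gen_time_years


# Hypothetical scaled values, converted under the two rates named in the text.
amphibian = to_natural(0.5, 10.0, 1.0, mu=1.27e-3)
generic = to_natural(0.5, 10.0, 1.0, mu=5.0e-4)
# generic is approximately (250, 5000, 1500 years)
```

Because N 0 , N 1 and T all scale with 1/μ, every natural-parameter estimate obtained with the lower rate is larger by the same factor, 1.27 × 10⁻³ / 5.0 × 10⁻⁴ = 2.54.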

The same scaled parameters estimated by MSVAR were also estimated with MIGRAINE, a computer package that implements a coalescent method based on importance sampling of gene genealogies (Leblois et al. 2014). Under this method, microsatellites are allowed to mutate under the generalized stepwise mutation model (GSM), which is more realistic for this type of marker and reduces the risk of false positives in bottleneck testing (Peery et al. 2012). Scaled parameters were then converted to natural parameters assuming the “amphibian specific” mutation rate (see above).

Lastly, the demographic dynamic was analysed by comparing three alternative scenarios with the ABC (approximate Bayesian computation) approach as implemented in DIYABC (Cornuet et al. 2010): constant effective population size, ancient bottleneck and recent bottleneck. The models assuming ancient or recent reductions were simulated to mimic the demographic effects possibly related to the post-glacial founding of the Alpine populations and the human-mediated processes affecting amphibians in the last century, respectively. Hereafter, we call these models Con (constant population size), AnD (ancient post-glacial decline), and ReD (recent, human-related, decline). Ten different settings and prior distributions were tested for each population to check the robustness of the results, for a total of 90 analyses (Table S3, Online Resource). The prior distribution for the mutation rate was set to either uniform from 1 × 10⁻⁵ to 5 × 10⁻³, or gamma shaped with shape parameter equal to 3.2, mean equal to 1.27 × 10⁻³, and range from 0.0001 to 0.01, thus covering a wide range of plausible values estimated empirically for microsatellites in different species.

Results

Mitochondrial sequences

Three polymorphic sites, and an average pairwise divergence of 0.043 % among individuals, were found in the 420 bp alignment of the cytb gene. Four different haplotypes were detected, three of which had never been observed before in this species. The maximum likelihood (ML) phylogenetic tree (Fig. S1, Online Resource) indicates that the samples we analyzed belong to the previously described “Balkano-Western” clade of the nominal form, Bombina variegata variegata (Hofman et al. 2007).

Microsatellite markers

All 200 samples from 9 populations were successfully genotyped at all 11 amplified loci. MicroChecker and FreeNa results did not suggest any significant presence of null alleles, scoring errors or allelic drop-out. No evidence of selection at any marker was found by the F st outlier method implemented in Lositan. Systematic deviation from Hardy–Weinberg and linkage equilibrium can be excluded: only 5 out of 99 (11 loci × 9 populations) Hardy–Weinberg tests were significant with P < 0.05, and only 2 out of 55 locus pairs showed significant genotypic linkage with P < 0.05, after controlling for false discovery rate (FDR) for multiple testing following Benjamini and Hochberg (1995).
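The false discovery rate control referred to here can be sketched with the standard step-up procedure: P values are ranked, and all tests up to the largest rank i with P(i) ≤ i·q/m are declared significant. The function name and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the test is significant at FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    significant = [False] * m
    for i in order[:cutoff]:
        significant[i] = True
    return significant


# Hypothetical P values from ten tests.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(toy_p, q=0.05)
```

In this toy set only the two smallest P values remain significant, although five are below the nominal 0.05 level.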

All loci were polymorphic and the number of alleles per locus ranged from 2 for F22 to 11 for Bv32 (Table S1, Online Resource). Genetic variation was relatively low in all populations (Table 1). Heterozygosity values were around 0.50, with lower values in SPI (H e  = 0.41) and NAG (H e  = 0.34). The allelic richness per locus was between 3 and 4 for most populations, again with SPI and NAG showing the lowest values (2.5 and 2.4, respectively).

Table 1 Genetic diversity at 11 microsatellite markers and effective population size estimates of 9 populations of B. variegata

Population differentiation

Significant genetic differentiation (after FDR correction) was found in 34 out of 36 pairwise F st comparisons. The only exceptions are the comparisons between two pairs of geographically adjacent populations (ZAM vs. MEZ and PRA vs. POZ). F st values (see Table S4, Online Resource) ranged between 0.05 and 0.15 in most cases, with higher values (up to 0.32) when the NAG site was involved. The matrix of distances is graphically visualized in Fig. 2 using the PCoA. Pairwise distances computed with G’ st and Jost’s D were linearly and highly correlated to F st (R² equal to 0.99 and 0.98, respectively), with regression coefficients of 1.35 and 0.92, respectively, and intercepts very close to 0 (0.016 and 0.012, respectively). The distance matrices based on these two additional measures of differentiation are reported in Table S4, Online Resource. The corresponding PCoA plots (not shown) are virtually identical to that reported in Fig. 2.

Fig. 2

Principal coordinate analysis of pairwise F st among populations and plots of proportion of ancestry of each sampled individual for five genetic clusters inferred using STRUCTURE under the LOCPRIOR model

Bayesian clustering analyses

The inspection of the likelihood plot for different K values (Fig. S2a, Online Resource), and the plot based on the rate of change of the likelihoods (Fig. S2b, Online Resource), suggest that the two most relevant partitions of the data are those with 2 and with 5 inferred groups. For K = 2, in fact, we observe the highest rate of likelihood change, and for K = 5, the likelihood plot reaches a plateau and a peak in the rate of likelihood change is also observed. Locations are informative (r statistic = 0.04), and when the geographic information is excluded from the model (i.e., LOCPRIOR set to off), K = 5 is the most supported partition, and the same major groups are identified (Fig. S3 and Fig. S4, Online Resource). We therefore discuss here the results obtained for K = 2 and K = 5 using the LOCPRIOR model (see Fig. S5, Online Resource, for all the plots for K = 2 to K = 7).

For K = 2, the inferred groups are predominant in central/northern and southern locations (Fig. S5, Online Resource), respectively. All the individuals in 4 populations can be entirely or almost entirely assigned to the central/northern (PRA, POZ and SPI) or the southern (NAG) groups. Individuals in the other 5 populations show admixed composition with very similar fractions of the two inferred groups within the same locality, suggesting shared ancestry rather than recent admixture (e.g. Jarvis et al. 2012).

For K = 5, the groups inferred by STRUCTURE roughly correspond to the groups graphically identified by the PCoA plot (Fig. 2): from South to North, we can easily identify MBA + LOP, NAG, SPI, ZAM + MEZ, VER + POZ + PRA. In the southern area, NAG is genetically distinct from MBA and LOP, but with a clear portion of shared ancestry with these neighboring localities. Some individuals in NAG also appear as recent hybrids, with ancestors both in NAG and in MBA or LOP. SPI appears as a genetic isolate in the central portion of the sampled area. In the North, two major groups can be identified: one including the two western samples located along the major Adige valley (ZAM and MEZ), and the other grouping the eastern samples at higher altitude along the Avisio side valley (VER, POZ, and PRA). Interestingly, all the individuals in VER, the sampling locality along the side Avisio valley that is closer to the main Adige valley, show large affinity with the southern localities of MBA and LOP.

Genetic vs. geographic distances

The relationship between linearized F st and the logarithm of geographic distance is positive, weak (R² = 0.07, Fig. S6, Online Resource), and statistically significant (Mantel test, P = 0.04). Estimated kinship coefficients are relatively high (1/16, as among first cousins) when individuals from localities separated by 5 km or less are compared, and very low otherwise (Fig. S7, Online Resource).

Recent effective population sizes

Point estimates of recent effective population sizes are low or very low (Table 1). The maximum value is around 170 individuals for the Loppio population using the LDNe method, but for the same population the estimated size is less than 30 when the ONeSAMP method is applied. All the other values range approximately between 10 and 50, with LDNe producing in most cases larger estimates than ONeSAMP. The confidence intervals have large upper limits in most LDNe estimates, but the posterior distributions of Ne produced by ONeSAMP have very small probabilities for N e  > 50.

Demographic dynamic

All the populations have M-ratio values (see Table 2) below the 0.68 threshold usually taken as evidence for a bottleneck (Garza and Williamson 2001). When M-ratios are tested controlling for false positives (Benjamini and Hochberg 1995), significant support for the bottleneck (P < 0.05) is found in all populations, the only exceptions being ZAM and MEZ when the largest value of θ (5) is assumed. The heterozygosity excess test indicates that heterozygosities are higher than predicted from the number of alleles, as expected after a bottleneck, but this difference is significant only for SPI.

Table 2 Tests of demographic bottleneck

The posterior distributions of ancestral and current population sizes, as estimated directly by MSVAR in each population, have very limited overlap, and, although rather large credible intervals were found, support a demographic decline in all populations (Fig. 3; Table S5a, Online Resource). Different populations show similar distributions, but considering the point estimates we note that the ratio between ancestral and current median sizes varies approximately between 7 and 45. NAG, MBA, LOP, and SPI show the most extreme reduction (>25 fold), and a less extreme decline is estimated for the other populations (<15 fold). Ancestral size distributions have peaks at around 1000–2000 individuals, and current size estimates vary between 35 and 150 animals in different populations. Only small differences from this pattern are observed assuming either an exponential or a linear decline (Table S5a, Online Resource).

Fig. 3

Posterior distribution of the effective population sizes (in log 10 units) for each population obtained with MSVAR assuming an exponential demographic change. Dashed lines represent current Ne, while dotted lines represent pre-bottleneck Ne. The solid line is the prior distribution for both current and ancient population sizes

The best supported value for the time when the decline started varies in different populations between 250 and 1500 years before present (BP) when the exponential decline was assumed (see Fig. 4a; Table S5a, Online Resource) and between 500 and 3000 years BP when the linear decline was assumed. Given the evident overlap between prior and posterior distributions, we checked the influence of the former on the latter by performing additional tests with different priors. The posterior distributions support a decline starting point between a few hundred and a few thousand years ago even when the prior mean was decreased or increased by a factor of 10 (see Fig. 4b, c).

Fig. 4
Posterior distributions in different populations (dashed lines) of the time since the change in effective population size estimated by MSVAR assuming the exponential change. Three different means of the prior distribution (solid lines) were tested: a 1000 years (log10 transformed value = 3); b 10,000 years (log10 transformed value = 4); c 100 years (log10 transformed value = 2)

All these general results of MSVAR are consistent across runs and when scaled, instead of natural, parameters are estimated, or when samples from pairs of populations not genetically differentiated were pooled to increase the sample size (Table S5a, b, Online Resource). On the contrary, population sizes and decline age estimates are approximately doubled if the “generic” mutation rate, instead of the “amphibian specific” rate, is used to convert scaled into natural parameters (Table S5c, Online Resource). Credible intervals and median values of the posterior distributions estimated in different MSVAR analyses are all reported in Tables S5, Online Resource.
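The dependence of natural parameters on the assumed mutation rate can be sketched as follows. This is an illustration only: it assumes the conventional diploid scaling θ = 4Neμ, which may differ from the exact scaling used internally by MSVAR, and the function names are hypothetical. The two rates are those reported in the caveats of the Discussion (1.27 × 10⁻³ and 5.0 × 10⁻⁴ per generation). Under this simple proportional rescaling the factor is about 2.5, of the same order as the approximate doubling reported above; the full analysis need not scale exactly this way.

```python
AMPHIBIAN_RATE = 1.27e-3  # per-generation rate used in the main analyses
GENERIC_RATE = 5.0e-4     # "generic" microsatellite rate

def ne_from_theta(theta, mu):
    """Effective size from a scaled parameter, ASSUMING theta = 4 * Ne * mu."""
    return theta / (4.0 * mu)

def rescale_estimate(estimate, mu_used, mu_new):
    """Rescale a natural-parameter estimate (a size, or a time in generations)
    obtained under mu_used to what it would be under mu_new, assuming the
    estimate is inversely proportional to the mutation rate."""
    return estimate * mu_used / mu_new

# Illustrative only: an estimate of 100 under the amphibian-specific rate
# corresponds to about 254 under the generic rate.
ratio = rescale_estimate(1.0, AMPHIBIAN_RATE, GENERIC_RATE)
```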

When population size and the time since the population started to decline are estimated with the method implemented in MIGRAINE, thus allowing for multiple steps in the mutation process, the general conclusions reflect those produced by MSVAR, with some differences in the parameter estimates (Table S6, Online Resource). Modern but especially ancestral sizes are larger, varying among different populations between 50 and 400, and 5000 and 15,000, respectively, and thus increasing the estimated intensity of the decline. The beginning of the decline, on the contrary, is almost the same as estimated by MSVAR, ranging between 600 and 3500 years.

The results of the ABC analysis are clearly affected by the prior settings, but, overall, the evidence against demographic stability (Con model) is strong and the model AnD (ancient post-glacial decline) appears the most plausible to explain the pattern of genetic variation (Fig. 5; Table S3, Online Resource). However, it is also important to note that in 9 out of 90 analyses performed under different prior settings (10 for each population), the posterior probability of ReD (recent decline, human related) was higher than the probability of AnD. This situation occurred only for three populations, MEZ (once), NAG (5 times) and SPI (3 times), and can be visualized by the overlap of probability ranges for AnD and ReD reported in Fig. 5. The results of the complete set of ABC analyses are reported in Table S3, Online Resource.

Fig. 5
Graphical representation of the posterior probabilities of three different demographic scenarios tested with the ABC approach. For each population, the black bars are proportional to the range of posterior probabilities obtained under 10 different prior settings. Con constant population size, ReD recent decline, associated with human activities, AnD ancient decline, associated with the post-Glacial colonization of the Alps. Details of the prior distributions, and model probabilities in each analysis, are reported in Supp. Table 2

Discussion

This study was motivated by the recent demographic decline and habitat change and fragmentation observed in multiple Alpine populations of the yellow-bellied toad (Caldonazzi et al. 2002). Considering the heightened susceptibility of this mountain environment to the ongoing increase of temperatures, our main goal was to understand whether or not recent demographic and habitat change had already produced negative genetic effects in terms of variation levels, within and between populations, and inbreeding. After a preliminary phylogeographic analysis based on cytochrome b sequences, we addressed this main question by typing 11 nuclear microsatellites in 200 individuals from 9 populations. In general, the data supported a genetic bottleneck occurring in the past, and we dedicated particular attention in the statistical analysis to estimating the timing of this event. We clearly show that ancient reductions of genetic variation likely occurred in this species, but a recent and recorded demographic decline, possibly associated with human activities, did not leave a significant genetic signature. Therefore the conservation outlook may be optimistic, due to possibly ancient purging effects and adaptation occurring over many generations at low density, and also the lack (as yet) of a genetic signature of human impacts.

Phylogeographic affiliation

The phylogenetic analysis of mitochondrial sequences showed that all samples included in this study fall in the Balkan-Western clade (Supp. Fig. 1). The level of variation in our sample was very low, with 89 % of the individuals sharing the same mtDNA sequence, and only 4 haplotypes in total. This result agrees with previous studies suggesting a severe reduction of variation in the Western areas of the distribution (Hofman et al. 2007; Fijarczyk et al. 2011). These data also support the hypothesis that the populations characterized by this clade originated in a Balkan refugium and expanded northwestward after the last glaciation, losing genetic variation during the colonization process, similar to results shown in the moor frog, Rana arvalis (Babik et al. 2004). Further statistical analyses at this marker were not possible, as only three polymorphic sites were observed.

Nuclear variation and estimates of contemporary Ne and kinship

Microsatellite markers confirmed that genetic variation levels are very low in at least two Alpine populations (NAG and SPI), and in general lower than observed in comparable populations of this species or the sister species B. bombina. When sample sizes are adjusted to be equal by resampling, and only the loci shared among studies are considered, the average number of alleles was about 27 and 40 % lower in the Alps than in a B. variegata and a B. bombina population in Northern Germany, respectively (Hauswaldt et al. 2007). The number of alleles and the heterozygosities are similar to the values found in endangered frog or toad species (e.g. Morgan et al. 2008; Beauclerc et al. 2010; Wang 2012; Igawa et al. 2013). If compared to large collections of heterozygosity values observed in both endangered and non-endangered species (e.g. Frankham et al. 2010; Hughes 2010), the average level of genetic variation in B. variegata populations should probably be considered “medium–low”. For example, the average value of heterozygosity observed in the Alpine toad populations, 0.47, corresponds to the 10th and 30th percentile in two lists of 221 non-endangered and 73 endangered or vulnerable bird species, respectively (Hughes 2010).

Low genetic variation in modern samples may be produced by different demographic scenarios, including low and constant census size, recent bottlenecks, and even a recent expansion to large size, when mutations have not yet had time to accumulate. As a first step, our analyses excluded the last scenario (expansion): specifically, contemporary effective population sizes were estimated using the linkage disequilibrium pattern among physically unlinked markers (LDNe) and the ABC-based ONeSAMP method. Most populations showed values smaller than 50 individuals, and some of them values smaller than 20. These values are lower than those estimated in other ranid species (Wilkinson et al. 2007; Phillipsen et al. 2011), and similar to the estimates obtained in endangered anuran species (Ficetola et al. 2010; Wang 2012). We conclude therefore that these yellow-bellied toad populations have today relatively low genetic variation and evolutionary potential, with high risk of local extinction due to demographic stochasticity, considering the very limited number of breeders. Inbreeding within populations or between individuals sampled at very short geographic distances is probably unavoidable in this condition, as suggested by the kinship coefficients estimated in this study, but the negative effects in terms of individual fitness are not easily predictable (see below) and should be directly evaluated.
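The logic of estimating contemporary Ne from linkage disequilibrium among unlinked markers can be sketched with the textbook approximation E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals. This is a simplified illustration under the assumptions of random mating and unlinked loci; it is not the bias-corrected estimator implemented in LDNe, and the function name and input values are hypothetical.

```python
def ne_from_ld(mean_r2, sample_size):
    """Rough Ne estimate from the mean squared correlation of alleles (r^2)
    between pairs of unlinked loci, using E[r^2] ~ 1/(3*Ne) + 1/S.
    Returns infinity when sampling noise alone can explain the observed r^2."""
    # Remove the contribution expected from finite sample size alone.
    drift_r2 = mean_r2 - 1.0 / sample_size
    if drift_r2 <= 0:
        return float("inf")
    return 1.0 / (3.0 * drift_r2)
```

An infinite estimate here corresponds to the situation, noted for several LDNe estimates in the Results, in which the upper confidence limit is very large because the drift signal cannot be separated from sampling noise.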

Genetic structure

Gene flow, which could counteract the loss of variation and inbreeding in small populations, is unlikely to occur in a fragmented landscape and especially in species with reduced movement capabilities such as frogs (e.g., Dolgener et al. 2012; Igawa et al. 2013). In particular, most of the sample sites in our study are separated by highly urbanized areas, and a previous mark-recapture field study in B. variegata showed that travel distances covered each year by adults or subadults rarely exceed 500 m (Hartel 2008). As expected, clear evidence of genetic substructure was found, and only two pairs of populations, separated by less than 7 km, were not genetically differentiated from each other. Genetic distances were substantial, which was observed using both classical Wright’s Fst and more recently developed metrics for multi-allelic markers, Jost’s D and Hedrick’s G’st. This result is not unexpected and it suggests that when genetic variation is medium to low, Wright’s Fst is suitable for any type of marker.
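The relationship among these three metrics can be illustrated with a minimal single-locus sketch based on population allele frequencies. It uses the standard definitions Gst = (Ht − Hs)/Ht, Jost’s D = [(Ht − Hs)/(1 − Hs)] · n/(n − 1) and Hedrick’s G’st = Gst (n − 1 + Hs)/[(n − 1)(1 − Hs)], with n populations, and it ignores the sample-size corrections applied by dedicated software. The function names and the allele frequencies in the usage note are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def differentiation(pop_freqs):
    """Return (Gst, Jost's D, Hedrick's G'st) for one locus.
    pop_freqs: list of allele-frequency lists, one per population,
    all of the same length and each summing to 1."""
    n = len(pop_freqs)
    # Mean within-population heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / n
    # Heterozygosity of the pooled (mean) allele frequencies.
    n_alleles = len(pop_freqs[0])
    mean_freqs = [sum(f[a] for f in pop_freqs) / n for a in range(n_alleles)]
    ht = expected_heterozygosity(mean_freqs)
    if ht == 0:
        return 0.0, 0.0, 0.0  # monomorphic locus: no differentiation
    gst = (ht - hs) / ht
    jost_d = (ht - hs) / (1.0 - hs) * n / (n - 1)
    g_prime = gst * (n - 1 + hs) / ((n - 1) * (1.0 - hs))
    return gst, jost_d, g_prime
```

With two populations that share no alleles but each carry four equally frequent alleles (Hs = 0.75), Gst is only about 0.14 while D and G’st both equal 1. When within-population variation is low, as in two populations fixed for different alleles (Hs = 0), all three metrics equal 1, which is why the metrics are expected to agree when variation is medium to low.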

Five major genetic groups were identified, with two of them corresponding to two single and highly divergent populations (NAG and SPI), and the others associated with geographically homogeneous areas. Genetic data also showed that kinship levels are high only at very short distances. Overall, these results indicate that gene flow among local small populations is limited, and rapidly decreases as the geographic distance increases.

The population of NAG showed the highest values of Fst (from 0.15 to 0.32). In this case, although one sampled area (LOP) is very close, gene flow is probably prevented because of habitat discontinuity due to urbanization in the touristic area of Garda Lake. Interestingly, NAG is also the only population where clear signals of recent admixture with the neighboring populations were found in some individuals. Future investigations should be performed to test the hypothesis of human-mediated translocation events.

Newly rare or always rare?

Small local population size can occur at demographic equilibrium, i.e. a natural and stable condition reached in patchy habitats by species with limited dispersal, or can result from recent demographic decline. Field studies in B. variegata suggest recent demographic decline, and genetic evidence suggests small effective sizes and fragmentation. But can we directly infer that recent decline is the cause of the low genetic variation? In other words, given that field studies indicate that the demographic equilibrium has been recently perturbed, can we also conclude that the genetic variation pattern has been recently perturbed? Answering this question is clearly relevant in terms of conservation actions, since the dual threat, demographic and genetic, should be considered a much higher concern, as a likely step further in the extinction vortex (Lankau and Strauss 2011). If, on the contrary, the recent demographic decline is not the cause of the current genetic pattern, milder measures of protection could be sufficient to favor demographic re-expansion, and prevent the beginning of genetic erosion. Therefore we dedicated a large effort to estimating the genetic impact of the recent demographic decline.

Five approaches were used. Two of them, the M-ratio and the heterozygosity excess tests, are classical statistical tests that assess whether simple properties of the observed data are compatible with what is expected under the null hypothesis of demographic stability. The genetic data appear mostly incompatible with demographic stability. In almost all the analyses and populations, the M-ratio strongly supports a demographic bottleneck. Considering that the power of this test is reasonably high when a bottleneck occurs in an isolated population between a few (Peery et al. 2012, Hoban et al. 2013a) and a few hundred (Garza and Williamson 2001; Swatdipong et al. 2010) generations, it might be inferred that our data are compatible with a recent decline. However, it has been shown by simulation that even relatively low migration rates (m = 0.001) can extend the time frame of the bottleneck signal based on the M-ratio to several thousand generations (Swatdipong et al. 2010). A significant excess of heterozygosity compared to the value expected from the number of alleles was observed only in three populations (one after the multiple test correction). Considering that the heterozygosity excess is a transitory event rarely extending more than 50 or approximately 0.5–4 Ne generations (Cornuet and Luikart 1996; Henry et al. 2009; Peery et al. 2012), and in general a shorter time compared to the gap in the allelic size distribution contributing to the M-ratio (Spear et al. 2006; Hundertmark and Van Daele 2009; Marshall et al. 2009), these results can be considered as statistically weak and unconvincing evidence of recent decline, with stronger support for ancient declines.
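For readers unfamiliar with the statistic, the M-ratio of Garza and Williamson (2001) is the number of alleles at a locus divided by the number of possible allelic states spanned by the observed size range, averaged over loci. Gaps in the allele size distribution, as left by the random loss of alleles during a bottleneck, lower the ratio. The following is a minimal sketch; the function names and allele sizes are hypothetical, and it assumes that allele sizes at a locus differ by whole multiples of the repeat length.

```python
def m_ratio(allele_sizes_bp, repeat_length):
    """M = k / (r + 1) for one locus, where k is the number of distinct alleles
    and r is the allele size range expressed in repeat units."""
    alleles = sorted(set(allele_sizes_bp))
    k = len(alleles)
    # Size range in repeat units (assumes sizes differ by whole repeats).
    r = (alleles[-1] - alleles[0]) // repeat_length
    return k / (r + 1)

def mean_m_ratio(loci):
    """Average M across loci; loci is a list of (allele_sizes_bp, repeat_length)."""
    values = [m_ratio(sizes, repeat) for sizes, repeat in loci]
    return sum(values) / len(values)
```

A locus with alleles of 100, 102, 104 and 106 bp (dinucleotide repeat) has no gaps and M = 1, whereas alleles of 100, 102 and 110 bp give M = 3/6 = 0.5, below the 0.68 threshold cited in the Results.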

The above inference based on the temporal power window of the M-ratio (older bottlenecks) and the heterozygosity excess test (younger ones) is speculative, though not uncommon in the literature (e.g. Spear et al. 2006; Lumibao and McLachlan 2014). More robust and direct evidence on the age and the intensity of the bottleneck can be obtained, as shown by simulation and empirical studies (Girod et al. 2011; Peery et al. 2012), when the demography is modeled and the whole information contained in the data is used (rather than one summary statistic). Here we used the model-based methods implemented in MSVAR, MIGRAINE, and DIYABC; all these methods clearly support the hypothesis of a non-recent demographic decline in all the populations. In particular, MSVAR and MIGRAINE indicated that the demographic decline most likely started not later than approximately 250 years ago, and not earlier than approximately 3500 years ago. In other words, these estimates point to a decline predating the currently documented human-induced changes, and postdating the most recent complete deglaciation and climate stability reached about 10,000 years ago (Cusinato and Bassetti 2007), when several plant and animal species had probably re-colonized the Alpine area of our study. These two values, 250 and 3500 years before present, correspond to the range of median values estimated in different populations, but considering that the support intervals are rather large and also that the main genetic impact in all these close Alpine populations is probably related to a shared demographic dynamic, we can prudently take them as an estimate of the temporal boundaries of a population size change. We prefer here not to speculate on which historical or environmental factor may have caused this decline, since the time interval is quite large and it may even reflect an average between the ages of two or more independent declines (Sharma et al. 2012). The clear inference is that the genetic impact of the population decline recently observed in field studies, if any, is limited and surpassed by the impact of a much older decline. A study in salamanders in North America also suggested a decline some thousands of years ago (Jordan et al. 2008).

Population sizes dropped by at least one order of magnitude. Some differences can be detected among populations, but the support intervals of the estimates are large: the only safe conclusion appears to be that the decline was more intense in the southernmost populations of Nago, Monte Baldo, Loppio, and Spiz. Only in these populations, in fact, was the estimated ratio between ancient and current size at least 25. Finally, when three explicit models were compared using the ABC approach, the highest posterior probabilities always favored, with the exception of a few analyses in Nago and Spiz, an ancient demographic decline, i.e. a decline occurring at least 200 years ago but probably in more ancient times. The fact that similar results were found in all the populations, especially regarding the time and strength of decline, suggests a range-wide rather than localized influence.

Caveats to our demographic dynamic inference

Statistical testing and parameter estimation of course rely on assumptions that, when violated, may produce biased results. Direction and magnitude of the bias are difficult to predict in different conditions and for different approaches, but some general notes regarding the robustness of our results can be drawn. (1) Population structure may produce false bottleneck signals in MSVAR (Chikhi et al. 2010) and probably in all coalescent-based methods (e.g. Wakeley 1999; Heller et al. 2013). Following the empirical suggestions by Chikhi et al. (2010) and Heller et al. (2013), we repeated the MSVAR analysis in two data sets created by sampling either 3 or 10 individuals per population. This approach is likely to reduce the power to detect bottlenecks occurring only in some populations, but can be used to exclude that population structure is solely responsible for the bottleneck inference. The ratio between the estimated ancient and modern population sizes was very close to 5 in both analyses, suggesting that a real decline occurred in the Alpine populations we considered. (2) Rare alleles may go undetected in small samples, thus producing gaps in the allelic size distribution and false signals of bottlenecks in the M-ratio. This effect is probably small in our case, since the M-ratio remains small and significant when genetically similar samples (MEZ + ZAM and PRA + POZ) are pooled, or when the whole data set is jointly analyzed. (3) When only few individuals contribute to the next generation, and the vast majority does not, i.e. when the variance in reproductive success is high, false signals of bottleneck may emerge in stable populations (Hoban et al. 2013b). Direct measures of the variance in reproductive success are not available for B. variegata, but we know that the vast majority of females reproduce and the number of eggs per clutch is relatively small (Barandun et al. 1997), and also that the likely mating system in this species probably allows many males to fertilize eggs (Sanderson et al. 1992; Vines 2003). It seems therefore unlikely that large variance in reproductive success is the cause of our results. (4) Wrong mutation models and rates, and a wrong generation time, obviously introduce bias in inference and estimates. Most of the methods applied here used a microsatellite-specific multistep mutation model, a mutation rate based on the only direct estimate known for an amphibian (1.27 × 10⁻³), and a generation time estimated for B. variegata (3 years). Estimates of population sizes and times since the beginning of the inferred decline increase by applying the slower “generic” rate commonly used for this type of marker in non-amphibian species (5.0 × 10⁻⁴), or by increasing the generation time to the value of 5 years estimated in some populations of the related species B. bombina (Vines et al. 2003; Fog et al. 2011). Nevertheless, the major conclusion of this study, i.e. that the recent population decline is not the main cause of the observed genetic variation pattern, remains robust, and this is true even when increasing the mutation rate to very large and uncommon values of 2 to 4 × 10⁻³ per generation (Peery et al. 2012). (5) Bottleneck detection and estimation are based on the assumption that only one demographic event occurred in the past. This is an oversimplification of the real history of a population, but it is unclear how multiple events (e.g. sequential bottlenecks) can affect these analyses (Goossens et al. 2006; Okello et al. 2008; Sharma et al. 2012). More simulation studies (Hoban et al. 2012) are required to better understand the behavior of these methods under complex demographic scenarios including “multi-events” models, necessarily requiring several parameters. Highly informative genomic data sets, such as large panels of SNPs or whole genomes, could be used in such situations.

Conservation issues and actions

Protecting B. variegata populations where demographic reductions have been documented, and possibly favoring a demographic increase, is important to prevent further erosion of genetic variation. As shown also in a recent simulation study, much of the variation can be preserved if quick action is implemented (Hoban et al. 2014). However, since the current level of genetic variation in most of the populations we analyzed is not extremely low, and the genetic impact of the recent decline, if any, appears limited, some optimism regarding the possibility of a complete recovery without risks of negative genetic consequences is justified. Our results do not clearly indicate a specific environmental situation where, in general, conservation efforts should be focused, and even some recent studies based on non-genetic data suggest that this question does not have a simple answer. Hartel and von Wehrden (2013), in fact, found that traditional farming practices produce a large number of suitable ponds and should be preserved, but Scheele et al. (2014) observed that the pasture ponds, compared to those in forest, tend to host individuals in worse body condition. However, our study does show that highest priority might be given to the populations of Spiz and Nago, since they showed the lowest values of diversity, clear evidence of extreme contraction of effective population size, and some (weak) evidence of the genetic impact of a recent decline. Spiz is one of the two highly isolated populations in our study located at high altitude, where the negative effects of global warming may additionally increase the risk of local extinction. In fact, if early breeding is commonly associated with increasing temperature (Blaustein et al. 2001), the increased probability of late frosts can have fatal consequences on early-bred spawn (Henle et al. 2008). Nago is located in a high tourism area, where anthropogenic disturbance may be impactful. Interestingly, some signature of the recent introduction of individuals from other areas has been found in Nago, and it would be useful to determine if this migration (likely due to human releases) could have inadvertently, but positively, reduced the risk of inbreeding in this highly homogeneous and genetically isolated population.