Introduction

Open habitats (savannas and grasslands) of the Neotropics occupy, semi-continuously, several ecoregions, mainly the Cerrado and Pampas, and dominate large areas of the Humid Chaco (Van Els et al., 2021). These formations also occur as smaller enclaves in the Caatinga, the Amazon forest, the (semideciduous) Atlantic forest, and the lowlands of northern South America, such as the Guianan savanna (Salgado-Labouriau, 1997; Behling and Hooghiemstra, 2001).

Several vertebrate species are associated with open habitats in the Neotropics; however, the evolutionary history of only a few of them has been studied. In this context, birds are the most studied group (Van Els et al., 2021; Ritter et al., 2021; Lima-Rezende et al., 2022), followed by reptiles (Quijada-Mascareñas et al., 2007; Santos et al., 2014). Among frogs, several species associated mainly with the Cerrado, Chaco, or Caatinga, and some species from the Diagonal of Open Formations (Caatinga, Cerrado, and Chaco combined), have been studied (e.g., Prado et al., 2012; Thomé et al., 2016; Oliveira et al., 2018; Brusquetti et al., 2019; Vasconcellos et al., 2019). Of these, only Dermatonotus muelleri (Boettger, 1885) inhabits open habitats (Oliveira et al., 2018); the other species are associated with gallery forests and/or their borders in the Cerrado (Prado et al., 2012; Vasconcellos et al., 2019) or occur mainly in the dry forests of the Caatinga and the Chaco (Thomé et al., 2016; Brusquetti et al., 2019).

The Cerrado ecoregion harbors the largest and most continuous savanna-like environments in the Neotropics (Salgado-Labouriau, 1997). The compartmentalization of the Cerrado landscape into plateaus (above 500 m altitude) and depressions (below 500 m altitude), caused by the uplift of the Central Brazilian Plateau (CBP, 2–7 Mya; Silva, 1997), has been suggested as one of the most important vicariant events that could have influenced the diversification of Cerrado species (Nogueira et al., 2011; Werneck, 2011). This event has been proposed as responsible for the diversification of some frogs and lizards (Werneck et al., 2012a; Recoder et al., 2014; Guarnizo et al., 2016; Oliveira et al., 2018), but through different mechanisms, including vicariance and dispersal. For example, a vicariant event was invoked to explain the diversification of the frog D. muelleri (Oliveira et al., 2018), with the CBP highlands splitting a widely distributed ancestral population. Under a dispersal scenario, the so-called plateau-depression hypothesis (Santos et al., 2014) proposes that the ancient plateaus harbor older lineages, whereas the depressions, formed more recently by erosion during the Quaternary, hold younger lineages resulting from recent colonization events (Silva, 1997).

Another hypothesis that has been invoked to explain diversification in open habitats is based on the Pleistocene refugia hypothesis (Haffer, 1969). This hypothesis suggests that Pleistocene climatic fluctuations would have induced several changes in South American vegetation cover. For example, during glacial periods the Amazon forest would have been invaded by tracts of savanna, isolating the forest in patches that would serve as refuges for forest-dependent taxa and leading to allopatric speciation. In contrast, during interglacial periods the forest would have expanded and reconnected, leading to the range expansion of newly diversified taxa (Haffer, 1969).

Pleistocene climatic changes probably also led to vegetation changes in the Cerrado and other ecoregions, like Humid Chaco and Pampas. The warmer and more humid climate during interglacial periods would have promoted the geographic expansion of forest species, causing the reduction and fragmentation of savanna and grassland formations (Mäder & Freitas, 2019; Giudicelli et al., 2021), thus isolating animals associated with these environments. This hypothesis, named “isolation by instability”, suggests diversification by reduced gene flow between populations isolated by unstable areas (Vasconcellos et al., 2019).

All these scenarios are expected to leave genetic signatures in the timing of split events and in historical demography. Under a vicariant scenario, relatively stable historical demography and main split events coinciding with the vicariant event are expected (Raposo do Amaral et al., 2013), whereas under a dispersal scenario, newly founded populations are expected to show lower genetic diversity and demographic growth from a small founding population (founder effect; Santos et al., 2014). Under a scenario driven by Pleistocene climatic changes, populations are expected to have split recently and to show signs of a bottleneck (Carnaval et al., 2009).

Approximate Bayesian computation (ABC) makes it possible to simulate data under specific models corresponding to these scenarios and to identify the model most likely to have occurred given the empirical data (Beaumont, 2019). ABC has become popular for analyzing complex problems in a variety of fields (Beaumont, 2019) because it bypasses exact likelihood calculations by using simulations and summary statistics, thus expanding the range of models for which statistical inference is feasible (Sunnåker et al., 2013). Briefly, an ABC analysis consists of formulating a model, fitting it to the data (parameter estimation), and comparing it with other models (Csilléry et al., 2010; Gelman et al., 2003). Furthermore, the growing number of ABC studies has led to the development of methods that improve the accuracy of model choice, such as regression adjustment and the selection of summary statistics (Beaumont, 2019).
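
To make the logic of ABC model choice concrete, the following is a minimal sketch of plain rejection ABC in Python (NumPy assumed available). It is an illustrative toy, not the regression-adjusted MNLOG procedure used later in this study; the function name, the SD scaling of the statistics, and the two toy models are assumptions made for the example. Each simulation is labeled with the model that generated it, the simulations whose summary statistics lie closest to the observed ones are accepted, and the share of accepted simulations from each model approximates its posterior probability under equal prior model probabilities.

```python
import numpy as np


def rejection_model_choice(observed, sim_stats, model_index, tolerance):
    """Plain rejection ABC for model choice (illustrative sketch).

    observed    : 1-D array of observed summary statistics
    sim_stats   : 2-D array, one row of summary statistics per simulation
    model_index : model label of each simulation
    tolerance   : fraction of simulations (closest to the data) to accept
    Returns the share of accepted simulations produced by each model.
    """
    sim_stats = np.asarray(sim_stats, dtype=float)
    if sim_stats.ndim == 1:
        sim_stats = sim_stats[:, None]
    model_index = np.asarray(model_index)
    # Scale each statistic by its SD across simulations so none dominates the distance
    scale = sim_stats.std(axis=0)
    scale[scale == 0] = 1.0
    diff = (sim_stats - np.asarray(observed, dtype=float)) / scale
    dist = np.sqrt((diff ** 2).sum(axis=1))
    n_keep = max(1, int(round(tolerance * len(dist))))
    accepted = model_index[np.argsort(dist)[:n_keep]]
    return {m.item(): float(np.mean(accepted == m)) for m in np.unique(model_index)}


# Toy example: two models that differ only in the mean of one statistic
rng = np.random.default_rng(1)
labels = rng.integers(1, 3, size=20_000)                   # model 1 or 2, equal prior odds
stats = rng.normal(np.where(labels == 1, 0.0, 3.0), 1.0)   # one statistic per simulation
posterior = rejection_model_choice([2.9], stats, labels, tolerance=0.01)
print(posterior)  # model 2 should receive almost all of the posterior mass
```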

Scinax fuscomarginatus (Lutz, 1925) is an interesting model for testing hypotheses about diversification in Neotropical open habitats because of its wide geographic distribution, primarily associated with savannas and grasslands. This small treefrog inhabits South American open areas east of the Andes and is one of the most widely distributed Neotropical anuran species (Brusquetti et al., 2014). It occupies open habitats of the Humid Chaco, Cerrado, Pantanal, Chiquitano dry forest, Mato Grosso seasonal forest, and Guianan savanna (Brusquetti et al., 2014).

We used mitochondrial and nuclear DNA from a wide sampling of S. fuscomarginatus to estimate genetic clustering and, using ABC, tested the diversification hypotheses that could explain the current genetic pattern. We tested hypotheses related to the uplift of the CBP (vicariance and plateau-depression hypotheses) and to the climatic changes of the Pleistocene glacial cycles (isolation by instability hypothesis).

Materials and Methods

Sampling, Laboratory Procedures and Molecular Methods

We produced a multilocus dataset including one mitochondrial and three nuclear gene fragments: the mtDNA consisted of 615 bp of the cytochrome c oxidase subunit I gene (COI), and the nuDNA of 395 bp of the ribosomal protein L3 gene (intron 5; RPL3), 380 bp of the β-fibrinogen gene (intron 7; BFIB), and 346 bp of the lactate dehydrogenase chain B gene (intron 3; MVZ 27–28).

We extracted total genomic DNA from muscle or liver samples preserved in 95–100% ethanol using the DNeasy extraction kit (Qiagen, Valencia, CA, USA), following the manufacturer’s protocol. We amplified gene fragments by polymerase chain reaction (PCR) using published primers and a commercial kit (Master Mix, Fermentas). For the mitochondrial gene we used a step-up reaction (UP) following Lyra et al. (2017). For the nuclear gene fragments we used an initial denaturation step of 3 min at 94 °C, followed by 35 cycles (45 cycles for difficult samples) of 30 s of denaturation at 95 °C, 30 s of annealing at 50–64.3 °C, and 45 s of extension at 72 °C, and a final extension step of 7 min at 72 °C (see Table S1 in Online Resource for details). We purified PCR products using ExoSAP (Fermentas) and sent them to Macrogen Inc. (Seoul, South Korea) for sequencing. We checked chromatograms and edited sequences in CodonCode Aligner v. 3.5.4 (CodonCode Corporation).

We separated the sequences of individuals with heterozygous indels using the Process Heterozygous Indels algorithm in CodonCode Aligner v. 3.5.4 (CodonCode Corporation) and used PHASE 2.1 (Stephens et al., 2001), implemented in DnaSP 5.1 (Librado & Rozas, 2009), to resolve the haplotypes of heterozygous individuals, discarding those with a posterior probability below 0.60. We aligned the sequences of each gene fragment separately with MUSCLE (Edgar, 2004) in MEGA 6 (Tamura et al., 2013). Within nuclear loci, we tested for recombination with the PHI test implemented in SplitsTree v. 4.2 (Huson & Bryant, 2006).

To reduce missing data, after the PHASE analyses we kept only samples with sequences for at least the mtDNA gene and two nuclear genes, except for two samples with only one nuclear gene (from northeastern Santa Cruz, Bolivia, and southern Mato Grosso do Sul, Brazil), which were kept to maintain geographic representation. The final matrix included 68 tissue samples of S. fuscomarginatus from 43 localities (Fig. 1), representing populations from the Cerrado, Humid Chaco, Chiquitano dry forest, Pantanal, Guianan savanna, and Mato Grosso seasonal forest (see Appendix S1 in Online Resource).
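
The inclusion rule can be written as a small filter. In this sketch the sample IDs, the locus labels, and the requirement that the two exceptions still carry COI are assumptions for illustration; the exceptions are simply passed in as a set of IDs.

```python
from dataclasses import dataclass, field

NUCLEAR_LOCI = ("RPL3", "BFIB", "MVZ27-28")


@dataclass
class Sample:
    sample_id: str
    loci: set = field(default_factory=set)  # loci with a usable sequence


def keep_sample(sample, exceptions=frozenset()):
    """Return True if a sample passes the missing-data rule: COI plus at least
    two nuclear loci, relaxed to one nuclear locus for IDs in `exceptions`."""
    if "COI" not in sample.loci:
        return False
    n_nuclear = sum(locus in sample.loci for locus in NUCLEAR_LOCI)
    min_nuclear = 1 if sample.sample_id in exceptions else 2
    return n_nuclear >= min_nuclear


samples = [
    Sample("S1", {"COI", "RPL3", "BFIB"}),
    Sample("S2", {"COI", "RPL3"}),
    Sample("S3", {"COI", "BFIB"}),               # kept only as a geographic exception
    Sample("S4", {"RPL3", "BFIB", "MVZ27-28"}),  # no mtDNA sequence
]
kept = [s.sample_id for s in samples if keep_sample(s, exceptions={"S3"})]
print(kept)  # ['S1', 'S3']
```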

Fig. 1

Sampled localities and population assignment of Scinax fuscomarginatus. Horizontal bars (right) correspond to each specimen analyzed in GENELAND; genetic clusters (groups) are represented by colors: group A = gray; group B = orange; group C = blue; group D = yellow. Numbers on the map and horizontal bars correspond to the map code in Appendix S1 in Online Resource. Blue lines represent main rivers, and black lines represent country borders. Ecoregions were modified from Olson et al. (2001).

Population Assignments

To assign individuals to genetic clusters, we followed Carstens et al. (2013), combining methods for cluster discovery with a subsequent validation step. First, we tested for a correlation between genetic and geographic distances (isolation by distance, IBD). For this, we calculated genetic distance matrices for each nuclear gene in MEGA and geographic distance matrices using the dist function in R version 4.2.1 (R Core Team, 2019). We performed a Mantel test between the two matrices for each nuclear gene fragment using the function mantel.rtest in the ‘ade4’ v. 1.7 R package (Dray & Dufour, 2007), with 10,000 Monte Carlo permutations. For all fragments, we found a significant association between genetic and geographic distances (see Results section), indicating non-random mating, which can reduce the efficiency of population assignment methods that do not incorporate the geographic location of individuals (e.g., STRUCTURE). To overcome the possible influence of isolation by distance, we used methods that consider the geographic distribution of individuals (GENELAND and BAPS) (see Thomé et al., 2016).
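
The permutation logic of the Mantel test can be sketched as follows. This is a from-scratch NumPy illustration (Pearson correlation of the upper triangles, one-sided permutation p-value), similar in spirit to, but not a reimplementation of, ade4::mantel.rtest; the toy coordinates and distances are invented, and the example uses 999 permutations rather than the 10,000 of the study to keep it fast.

```python
import numpy as np


def mantel_test(dist_a, dist_b, n_perm=9_999, seed=None):
    """Simple Mantel test between two square, symmetric distance matrices.

    The statistic is the Pearson correlation between the upper triangles; the
    one-sided p-value counts how often jointly permuting rows and columns of
    `dist_a` gives a correlation at least as large as the observed one.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n_ge = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        r_perm = np.corrcoef(a[np.ix_(p, p)][iu], b[iu])[0, 1]
        n_ge += int(r_perm >= r_obs)
    return r_obs, (n_ge + 1) / (n_perm + 1)


# Toy data: genetic distance increases with geographic distance plus noise
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(12, 2))
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
noise = np.abs(rng.normal(0, 0.02, size=geo.shape))
gen = 0.01 * geo + (noise + noise.T) / 2
r, p = mantel_test(gen, geo, n_perm=999, seed=1)
print(round(r, 3), p)
```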

We ran the GENELAND R package v. 4.9.2 (Guillot et al., 2005a, b) on the mitochondrial and nuclear datasets combined, using ten independent runs and allowing the number of populations to vary from 1 to 15. Each run consisted of 200 iterations of burn-in, followed by 1,000,000 sampled iterations, sampled every 1,000 iterations. We assumed a spatially explicit model with the exact coordinates of samples and an uncorrelated model for allele frequencies.

We ran BAPS (Corander et al., 2008) on nuclear and mitochondrial genes combined in a single dataset, using the coordinates of samples. Because the combined analysis of haploid and diploid sequence markers is not technically feasible in BAPS, we conducted a diploid analysis in which both phased nuDNA haplotypes of each individual were combined with its corresponding mtDNA sequence, which was duplicated to avoid a missing allele. We first ran the mixture model with k ranging from 1 to 10, three times for each k. We then used this result to run the admixture model (Corander & Marttinen, 2006) with 100 iterations, 200 reference individuals, and 10 iterations per individual. Admixed individuals were considered significant at P < 0.05.
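
The pseudo-diploid coding described above can be sketched as follows: each individual contributes two rows, each pairing one phased haplotype per nuclear locus with the same duplicated mtDNA sequence. The function name, the locus order (mtDNA first), and the simple concatenated-string output are assumptions for illustration, not the BAPS input format.

```python
def pseudo_diploid_rows(individual_id, mt_seq, nuc_haplotypes):
    """Return two rows for one individual.

    mt_seq         : the individual's (haploid) mtDNA sequence
    nuc_haplotypes : list of (haplotype_1, haplotype_2) tuples, one per
                     phased nuclear locus, in a fixed locus order
    The mtDNA sequence is duplicated so that neither row has a missing allele.
    """
    row_1 = mt_seq + "".join(h1 for h1, _ in nuc_haplotypes)
    row_2 = mt_seq + "".join(h2 for _, h2 in nuc_haplotypes)
    return [(individual_id, row_1), (individual_id, row_2)]


rows = pseudo_diploid_rows("ind1", "ACGT", [("AA", "AG"), ("CC", "CT")])
print(rows)  # [('ind1', 'ACGTAACC'), ('ind1', 'ACGTAGCT')]
```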

To validate the different population assignments obtained with GENELAND and BAPS, we used the multispecies coalescent model (Rannala & Yang, 2003; Edwards, 2009; Yang & Rannala, 2014) implemented in BPP v. 4.3 (Yang & Rannala, 2010) to estimate the number of populations and the tree topology. This method has been successfully applied to species delimitation from multilocus datasets, with the benefit that it accounts for gene tree conflict by modeling coalescent stochasticity when estimating the species tree used as a framework for delimitation (Liu et al., 2009; Yang & Rannala, 2010). Moreover, it has proven suitable for diagnosing population structure associated with population isolation rather than species limits (Freudenstein et al., 2017). First, for each population assignment (GENELAND and BAPS), we inferred a species tree from the four loci using the multilocus coalescent model implemented in *BEAST (Heled & Drummond, 2010) in BEAST v. 2.4.3 (Bouckaert et al., 2014), to be used as a guide tree for the BPP analyses. The nucleotide substitution model, the rate heterogeneity, and the proportion of invariant sites were inferred during the MCMC analysis with the ‘bModelTest’ v. 1.2.1 package (Bouckaert, 2015) implemented in BEAST, with the transition-transversion split option and empirical frequencies. We used a strict clock model for all loci. We ran 100 million generations, sampling every 10,000, with a Yule tree prior and constant population size. To assess the stationarity of the Markov chains, we examined the effective sample size (ESS) in TRACER v. 1.6 (Rambaut et al., 2013) and considered ESS > 200 adequate. The species tree was summarized with TREEANNOTATOR v. 2.4.3 (Drummond & Rambaut, 2007) as a maximum clade credibility tree with median node heights; the first 2,000 trees (20% of the samples) were discarded as burn-in.
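
Two quantities in this paragraph can be checked with a short sketch: the burn-in arithmetic (100 million generations sampled every 10,000 gives 10,000 trees, 20% of which is 2,000) and the effective sample size. The ESS estimator below, which truncates the autocorrelation sum at the first non-positive lag, is a common simple approximation and not necessarily the exact algorithm used by Tracer; the function names are hypothetical.

```python
import numpy as np


def samples_and_burnin(n_generations, sample_every, burnin_fraction):
    """Number of stored MCMC samples and how many are discarded as burn-in."""
    n_samples = n_generations // sample_every
    return n_samples, round(n_samples * burnin_fraction)


def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), summing lags until the
    first non-positive autocorrelation."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = x @ x / n
    if var == 0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = (x[:-lag] @ x[lag:]) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


print(samples_and_burnin(100_000_000, 10_000, 0.20))  # (10000, 2000)

# A strongly autocorrelated AR(1) chain has a much smaller ESS than its length
rng = np.random.default_rng(0)
chain = np.zeros(5_000)
for i in range(1, chain.size):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()
print(round(effective_sample_size(chain)))
```
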
Because the prior distributions of ancestral population size (θ) and root age (τ) can affect the posterior probabilities of the models in BPP (Yang & Rannala, 2010), we tested the population assignment hypotheses under a range of prior combinations that included relatively large and small ancestral population sizes combined with relatively deep and shallow divergence times, following Leaché and Fujita (2010). We performed A11 analyses (Yang & Rannala, 2014) for the joint estimation of the number of populations and the tree topology, using distinct combinations of θ and τ priors with algorithms 0 and 1. The θ prior (ancestral population size) varied from small, IG(3, 0.02), to large, IG(3, 0.2), and the τ prior (divergence time) from deep, IG(3, 0.01), to shallow, IG(3, 0.001), where IG(a, b) denotes an inverse-gamma distribution with shape a and scale b. We ran each analysis with automatic fine-tuning for 100,000 generations, discarding 8,000 generations as burn-in. For each prior combination, we ran two replicates under each of algorithms 0 and 1, totaling 16 analyses for each population assignment.
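
The prior design can be enumerated in a few lines. The sketch below lists the 2 × 2 combinations of θ and τ priors crossed with algorithms 0 and 1 and two replicates (16 runs per assignment) and reports the prior means using the standard inverse-gamma mean b/(a − 1); the dictionary keys are hypothetical labels, not BPP control-file syntax.

```python
from itertools import product


def inverse_gamma_mean(a, b):
    """Mean of an inverse-gamma IG(a, b) distribution (defined for a > 1)."""
    if a <= 1:
        raise ValueError("mean is undefined for a <= 1")
    return b / (a - 1)


theta_priors = {"small": (3, 0.02), "large": (3, 0.2)}
tau_priors = {"deep": (3, 0.01), "shallow": (3, 0.001)}

runs = [
    {"theta": theta_label, "tau": tau_label, "algorithm": alg, "replicate": rep,
     "theta_mean": inverse_gamma_mean(*theta), "tau_mean": inverse_gamma_mean(*tau)}
    for (theta_label, theta), (tau_label, tau), alg, rep in product(
        theta_priors.items(), tau_priors.items(), (0, 1), (1, 2))
]

print(len(runs))  # 16
for run in runs[::4]:  # one line per theta/tau combination
    print(run["theta"], run["tau"], run["theta_mean"], run["tau_mean"])
```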

Genetic Diversity

For the mitochondrial gene fragment, we calculated basic statistics for each population assigned in the previous step (number of haplotypes, haplotype diversity, nucleotide diversity, and the genetic differentiation index Fst) and performed neutrality tests (Tajima’s D, Ramos-Onsins and Rozas’s R2, and Fu’s Fs) in DnaSP 5.
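
For readers who want to see how these basic statistics are defined, the sketch below computes the number of haplotypes, haplotype diversity (Nei’s unbiased estimator), nucleotide diversity per site, the number of segregating sites, and Tajima’s D from a list of aligned sequences. It is a simplified illustration that assumes equal-length sequences without gaps or missing data, and it omits Fst, R2, Fu’s Fs, and the standard deviations reported by DnaSP.

```python
import math
from collections import Counter
from itertools import combinations


def basic_stats(seqs):
    """Haplotype and nucleotide diversity plus Tajima's D for aligned sequences
    (assumes equal lengths and no gaps or missing data)."""
    n, length = len(seqs), len(seqs[0])
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # k = mean number of pairwise differences; pi = k per site
    diffs = [sum(x != y for x, y in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    seg = sum(len(set(column)) > 1 for column in zip(*seqs))
    # Tajima (1989) constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    tajima_d = ((k - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
                if seg > 0 else float("nan"))
    return {"n": n, "H": len(counts), "Hd": hd, "pi": k / length, "S": seg, "D": tajima_d}


stats = basic_stats(["AAAA", "AAAA", "AAAT", "AATT"])
print(stats)
```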

Construction of Models to be Tested with Approximate Bayesian Computation (ABC)

All scenarios follow the GENELAND assignment and the species tree topology inferred by *BEAST (see Results section and Online Resource) and include four populations whose geographic limits mainly follow the boundaries between the lowlands and plateaus of the central Cerrado. Based on these population assignments, we discarded the hypothesis of an ancient divergence related to the center of the CBP (Oliveira et al., 2018). We constructed four alternative diversification models representing three main scenarios: a simple vicariant scenario, a colonization scenario, and a recent refugia scenario (Fig. 2).

The vicariant model (model 1) assumes constant population sizes after splits, with the first split corresponding to the CBP uplift (7–2 Mya). The plateau-depression model (model 2) represents the colonization scenario, with stable populations on plateaus and small populations founded in depressions (founder effect), with the first split also corresponding to the CBP uplift. Two isolation by instability models (models 3 and 4) represent the refugia scenario: in the first, all populations experience size reductions, denoting small refuges or micro-refuges (model 3); in the second, populations in the CBP highlands remain stable, reflecting a stable open-environment area in central Brazil proposed for Cerrado trees (Collevatti et al., 2012; de Oliveira Buzatti et al., 2017; de Buzatti et al., 2018), which roughly corresponds to that region (model 4). In both of the latter models, the most ancient split corresponds to the Last Interglacial (LIG; 150,000 years ago).

Fig. 2

Simulated scenarios of Scinax fuscomarginatus diversification. Genetic clusters (groups) are represented by colors. Divergence times are represented by t1, t2, and t3; these and the other priors are detailed in Materials and Methods. The best-supported model is highlighted in the red box.

Model Testing with Approximate Bayesian Computation

We used the program ms (Hudson, 2002) to simulate genetic data under the defined demographic models. Simulations used the same locus lengths and sample sizes as the empirical data and were run independently for each locus. The summary statistics computed for each marker were considered jointly in model choice.
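
As an illustration of how a model such as model 3 could be expressed for ms, the sketch below assembles an ms command line for four populations that merge backward in time (-ej) and whose present sizes are a fraction of the ancestral size N0 (-n), with the ancestral size restored at each merge (-en). The join order, sample sizes, θ, and times in the example are placeholders, not the *BEAST topology or the priors used here, and the function is hypothetical; times are in units of 4N0 generations, as in ms.

```python
def ms_split_model(sample_sizes, theta, splits, current_ratio, n_reps=1):
    """Build an ms command for a population-split model with a bottleneck.

    sample_sizes  : gene copies sampled per population (populations numbered from 1)
    theta         : 4 * N0 * mu for the locus
    splits        : list of (time, source, destination); backward in time, all
                    lineages of `source` move into `destination` at `time`
    current_ratio : present size of every population as a fraction of N0
    """
    n_pop = len(sample_sizes)
    args = ["ms", str(sum(sample_sizes)), str(n_reps), "-t", f"{theta:g}",
            "-I", str(n_pop), *map(str, sample_sizes)]
    for pop in range(1, n_pop + 1):
        args += ["-n", str(pop), f"{current_ratio:g}"]  # reduced present-day sizes
    for time, source, destination in sorted(splits):
        args += ["-ej", f"{time:g}", str(source), str(destination)]
        args += ["-en", f"{time:g}", str(destination), "1"]  # ancestor back to N0
    return " ".join(args)


cmd = ms_split_model([10, 8, 6, 2], theta=5.0,
                     splits=[(0.6, 2, 1), (0.1, 4, 2), (0.3, 3, 1)],
                     current_ratio=0.33)
print(cmd)
```

Each locus would be simulated with its own call, using its own θ and sample sizes.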

We conducted a preliminary analysis with 100,000 replicates to determine appropriate prior intervals for the effective population size (Ne), θmit = Neµ, and θnuc = 4Neµ. First, we performed a cross-validation test to assess the performance of different model-choice methods [regular rejection (REJ), multinomial logistic regression (MNLOG), and neural networks (NN)], tolerance values (0.001, 0.01, 0.05, and 0.1), and combinations of summary statistics, using vectors of three to five of the initially estimated statistics (pi, ss, D, thetaH, and H). We applied the function cv4postpr from the R package ‘abc’ v. 2.1 (Csilléry et al., 2012; http://cran.r-project.org/web/packages/abc/index.html), which uses 10 simulations from the prior distribution of each model as PODs (pseudo-observed data). Following Tsai and Carstens (2013), we measured performance as the probability of choosing the model under which the PODs were simulated. Then, we estimated new parameters with the function abc using the best-performing method, tolerance value, and vector of summary statistics.
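
The cross-validation idea can be sketched with plain rejection: simulations of known origin are used in turn as PODs, and the proportion of times the generating model is recovered measures performance. This is a simplified NumPy illustration, not the cv4postpr implementation (which also supports the mnlogistic and neuralnet methods); the function name and toy data are assumptions.

```python
import numpy as np


def cv_model_choice(sim_stats, model_index, n_val=10, tolerance=0.05, seed=0):
    """Cross-validate rejection-ABC model choice (illustrative sketch).

    For each model, `n_val` of its simulations are used in turn as pseudo-observed
    data (PODs); the POD is removed from the reference table, the `tolerance`
    fraction of closest simulations is accepted, and the model with most accepted
    simulations is chosen. Returns the model labels and a confusion matrix whose
    rows are the true models and columns the chosen models.
    """
    rng = np.random.default_rng(seed)
    stats = np.asarray(sim_stats, dtype=float)
    if stats.ndim == 1:
        stats = stats[:, None]
    labels = np.asarray(model_index)
    models = np.unique(labels)
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    confusion = np.zeros((models.size, models.size), dtype=int)
    for row, model in enumerate(models):
        for pod in rng.choice(np.flatnonzero(labels == model), size=n_val, replace=False):
            keep = np.arange(labels.size) != pod
            dist = np.sqrt((((stats[keep] - stats[pod]) / scale) ** 2).sum(axis=1))
            n_acc = max(1, int(round(tolerance * keep.sum())))
            accepted = labels[keep][np.argsort(dist)[:n_acc]]
            votes = [(accepted == m).sum() for m in models]
            confusion[row, int(np.argmax(votes))] += 1
    return models, confusion


rng = np.random.default_rng(2)
labels = np.repeat([1, 2], 2_000)
stats = rng.normal(np.where(labels == 1, 0.0, 4.0), 1.0)
models, confusion = cv_model_choice(stats, labels)
print(confusion)  # rows: true model; columns: chosen model
```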

For this preliminary analysis, we estimated a prior interval for the effective population size (Ne). Considering Ne = θ/µ, we used a mean COI mutation rate (µ) estimated for amphibians of 0.7% substitutions per lineage per million years, based on the estimates of Freilich et al. (2014) and Meng et al. (2014), and Watterson’s θ (theta-W) calculated for COI in DnaSP (3.88–28.86), which resulted in Ne values from 900,000 to 6,000,000 that were used as a uniform distribution. For θ of the mitochondrial and nuclear loci, we used broad intervals (0.01–150) drawn independently for each locus from a uniform distribution. We converted divergence times (t) into coalescent units of 4N generations using the formula: number of generations/Ne, assuming a generation time of one year (following Toledo and Haddad, 2005) and the Ne previously estimated from COI. The prior for the time of the most ancient split event in models 1 and 2 (T3) represents the CBP uplift, from the mid-Tertiary to the early Quaternary (final plateau uplift), 7 to 2 Mya (Silva, 1997). To represent this interval, we used a lognormal distribution whose logarithm has a mean of 14.9 and a standard deviation (SD) of 0.6, which in years corresponds to 1.1 to 7.9 Mya (90% central interval, 5th–95th percentiles) with a median of 2.9 Mya. For models 3 and 4, the prior for T3 represents the main interglacial periods of the Upper Pleistocene, the LIG and the Sangamon interglacial period together (from 150 to 60 Kya). The Sangamon interglacial period represents the last interglacial period; some definitions place it between 130 and 70 Kya, but more recent definitions restrict it to 130–115 Kya (Otvos, 2015; Gowan et al., 2021). However, after the Sangamon period, glacigenic deposits of the last Patagonian glaciation were dated from 60 Kya, whereas deposits of the Last Glacial Maximum (LGM) correspond to 25–20 Kya (Rutter et al., 2012).
To represent this time interval, we used a lognormal distribution whose logarithm has a mean of 12 and an SD of 0.6, which in years corresponds to 60 to 437 Kya (90% central interval, 5th–95th percentiles) with a median of 163 Kya. In all models, subsequent split events (T2 and T1) were placed between T3 and the present.
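
The prior bounds quoted above can be reproduced with a short calculation. In this sketch, the lognormal parameters are read as the mean and SD of log(time in years), and the COI rate of 0.7% per lineage per million years is assumed to apply per site and to be scaled by the 615-bp fragment length to obtain a per-locus rate; both readings are assumptions. Under them, the reported bounds of 1.1–7.9 Mya and 60–437 Kya correspond to the 5th and 95th percentiles of the two distributions, and the lower COI θW value gives an Ne of about 900,000.

```python
import math
from statistics import NormalDist


def lognormal_quantiles(mu, sigma, probs=(0.05, 0.5, 0.95)):
    """Quantiles of a lognormal variable whose logarithm is Normal(mu, sigma)."""
    std_normal = NormalDist()
    return [math.exp(mu + sigma * std_normal.inv_cdf(p)) for p in probs]


def ne_from_theta(theta_locus, rate_per_site_per_year, length_bp, generation_years=1.0):
    """Ne = theta / mu for a haploid mtDNA locus (theta = Ne * mu), with mu
    expressed per locus per generation."""
    mu_locus = rate_per_site_per_year * generation_years * length_bp
    return theta_locus / mu_locus


t3_cbp = lognormal_quantiles(14.9, 0.6)          # years; models 1 and 2
t3_pleistocene = lognormal_quantiles(12.0, 0.6)  # years; models 3 and 4
ne_low = ne_from_theta(3.88, 0.007 / 1e6, 615)   # 0.7% per Myr read as per site

print([round(t / 1e6, 2) for t in t3_cbp])          # about [1.1, 2.96, 7.94] Mya
print([round(t / 1e3, 1) for t in t3_pleistocene])  # about [60.7, 162.8, 436.7] Kya
print(round(ne_low))                                # about 901,000
```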

To set the priors related to the founder effect and bottleneck, we followed Perez et al. (2016). The intensity of the founder effect in the plateau-depression model (model 2) was sampled from a uniform distribution from 0 to 0.1 of the original population size. Exponential growth rates were defined so as to ensure a given size ratio between the founder and current populations. To do this, we used a beta distribution with shape parameters α = 1 and β = 15, which concentrates this prior on very low values (median of 0.0452 and central 95% interval from 0.00169 to 0.218), ensuring that the sampled current population size is several times larger than the founder population. The intensity of the bottleneck in the isolation by instability models (models 3 and 4) was sampled from a uniform distribution from 0.01 to 1 of the original population size. Summary statistics (pi, ss, D, thetaH, and H) for the empirical data were estimated with sample_stats.exe from an ms-format file generated in DnaSP.
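
Because the CDF of a Beta(1, β) distribution is 1 − (1 − x)^β, its quantiles have a closed form, which makes it easy to check the values reported for this prior. The sketch below is only a check of that arithmetic; the function name is hypothetical, and reading the prior as Beta(α = 1, β = 15) follows from the reported median and interval.

```python
def beta_one_b_quantile(p, b):
    """Quantile of Beta(1, b): solving 1 - (1 - x)**b = p gives x = 1 - (1 - p)**(1 / b)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** (1.0 / b)


quantiles = {p: beta_one_b_quantile(p, 15) for p in (0.025, 0.5, 0.975)}
print({p: round(q, 5) for p, q in quantiles.items()})
# {0.025: 0.00169, 0.5: 0.04516, 0.975: 0.21802}
```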

For the final analysis, we adjusted the priors for Ne and θ based on the preliminary results and kept the other parameters unchanged. We conducted 500,000 replicates for each model and performed a new cross-validation test to compare the performance of the ABC analysis on simulated data with and without transformation by principal component analysis (PCA). In the transformed dataset, we replaced the summary statistics with the first six principal components, which together account for 90% of the variance in the original dataset. We performed ABC model selection using the algorithm, tolerance level, dataset, and summary statistics selected in both cross-validation steps.
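
The PCA transformation can be sketched with NumPy: the statistics are standardized and projected onto principal components, and the smallest number of components whose cumulative explained variance reaches 90% is retained (six in this study). This is an illustrative implementation with hypothetical names; the observed statistics must be projected with the same centering, scaling, and loadings as the simulations, which is why these are returned.

```python
import numpy as np


def pca_reduce(stats, var_target=0.90):
    """Project standardized summary statistics onto the leading principal
    components that together explain at least `var_target` of the variance."""
    x = np.asarray(stats, dtype=float)
    mean, sd = x.mean(axis=0), x.std(axis=0)
    sd[sd == 0] = 1.0
    z = (x - mean) / sd
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = min(int(np.searchsorted(np.cumsum(explained), var_target)) + 1, s.size)
    loadings = vt[:k].T
    return z @ loadings, explained[:k], (mean, sd, loadings)


def project(observed, transform):
    """Apply the same standardization and loadings to observed statistics."""
    mean, sd, loadings = transform
    return ((np.asarray(observed, dtype=float) - mean) / sd) @ loadings


rng = np.random.default_rng(3)
base = rng.normal(size=1_000)
sims = np.column_stack([base, base + rng.normal(0, 0.05, 1_000),
                        base + rng.normal(0, 0.05, 1_000), rng.normal(size=1_000)])
scores, explained, transform = pca_reduce(sims)
print(scores.shape, explained.round(3))  # three correlated columns collapse into one PC
obs_scores = project(sims[0], transform)
```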

After model selection, we checked whether the best-supported model provided a good fit to the data using the function gfit of the R package ‘abc’. To validate the model selection results, we also performed a posterior predictive check (PPC): we estimated the parameters of the best-supported model from simulations and used these parameters to generate new summary statistics, checking whether the selected model could reproduce the observed values of the summary statistics.
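
The goodness-of-fit idea behind gfit can be sketched as follows: compute the median distance between the observed statistics and their closest simulated statistics, and compare it with the same quantity for pseudo-observed datasets taken from the model’s own simulations; an observed value larger than almost all of the null values indicates poor fit. This is a simplified illustration in the spirit of gfit, not its implementation, and the function name, tolerance, and toy data are assumptions.

```python
import numpy as np


def gof_pvalue(observed, sim_stats, tolerance=0.01, n_null=100, seed=0):
    """Goodness-of-fit p-value based on the median distance to the accepted simulations."""
    rng = np.random.default_rng(seed)
    sims = np.asarray(sim_stats, dtype=float)
    scale = sims.std(axis=0)
    scale[scale == 0] = 1.0

    def median_accepted_distance(target, pool):
        dist = np.sqrt((((pool - target) / scale) ** 2).sum(axis=1))
        n_acc = max(1, int(round(tolerance * len(pool))))
        return np.median(np.sort(dist)[:n_acc])

    d_obs = median_accepted_distance(np.asarray(observed, dtype=float), sims)
    null = []
    for idx in rng.choice(len(sims), size=n_null, replace=False):
        pool = np.delete(sims, idx, axis=0)  # leave the pseudo-observed row out
        null.append(median_accepted_distance(sims[idx], pool))
    return (1 + sum(d >= d_obs for d in null)) / (n_null + 1)


rng = np.random.default_rng(4)
sims = rng.normal(size=(5_000, 2))
print(gof_pvalue([0.0, 0.0], sims))  # data typical of the model: large p-value
print(gof_pvalue([6.0, 6.0], sims))  # data far from all simulations: small p-value
```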

Results

Population Assignments

We found a significant association between genetic and geographic distances for all nuclear genes (MVZ 27–28: r = 0.66, P < 0.001; BFIB: r = 0.12, P < 0.05; RPL3: r = 0.15, P < 0.01). Different population assignment methods produced different results: in GENELAND, all replicates yielded k = 4, whereas BAPS yielded k = 9. However, the BAPS and GENELAND assignments are compatible, in the sense that BAPS subdivided the populations found by GENELAND (see Fig. 1 and Figure S1 in Online Resource).

BPP supported the GENELAND population assignment with a posterior probability of 100%. With the BAPS assignment, BPP results were less consistent, with posterior probabilities from 96 to 99%; among the nine populations, the lowest posterior probabilities were found for the groups containing a single individual (B3, C1, and C3). Considering the BPP results and that the BAPS assignment is generally a subdivision of the GENELAND groups, we based subsequent analyses on the GENELAND results. The species trees inferred by BPP are not reported because of their low posterior probabilities, especially with the BAPS assignment (posterior probability of 0.0465 for the best species tree, considering all runs). With the GENELAND assignment, posterior probabilities were higher, but the highest was only 0.597, and in all runs three models were included in the 95% credible set.

In the GENELAND assignment (Figs. 1 and 3), group A clusters samples from the Cerrado lowlands, Mato Grosso seasonal forest, and Humid Chaco, from northeastern Brazil to northern Argentina; group B includes samples mainly from the highlands of the CBP, from central to southeastern Brazil; group C clusters the single sample from the Guianan savanna with samples from the western Cerrado and Chiquitano dry forest; and group D includes two samples from a single locality in a transitional area between the southern Cerrado and the western Atlantic forest.

In all populations resulting from the GENELAND analysis, we found high haplotype diversity (Hd) and low nucleotide diversity (pi), which suggests recent population expansion, consistent with the R2 index. Tajima’s D and Fu’s Fs were not significant (Table 1).

Fig. 3

Sampled localities and population assignment of Scinax fuscomarginatus. Genetic clusters (groups) are represented by colors following GENELAND results: group A = gray; group B = orange; group C = blue; group D = yellow. (a) Geographical distribution of highlands; (b) results of the GENELAND population assignment analysis, where colors represent the posterior probability of individual membership and darker colors correspond to higher posterior probabilities. The sample from Venezuela (group C) was not included, to keep the distribution of the remaining samples visible.

Table 1 Basic statistics of the Scinax fuscomarginatus mitochondrial fragment (COI). N, number of individuals; H, number of unique haplotypes; Hd, haplotype diversity ± standard deviation (sd); π, nucleotide diversity per site ± sd; D, Tajima’s D; R2, Ramos-Onsins and Rozas’s R2; Fs, Fu’s Fs

Model Testing with Approximate Bayesian Computation

The first cross-validation tests showed that the best-performing algorithm (highest probability of choosing the simulated model) was MNLOG with a tolerance of 0.05 and the vector of summary statistics containing ss, thetaH, and H. The second cross-validation test, which compared the performance of the ABC analysis on the final simulations with and without PCA transformation, showed no significant differences in performance. Following this result, we used both datasets to choose the best model.

With both the transformed and untransformed datasets, model 3 had the highest posterior probability (PP) (transformed dataset PP = 0.74; untransformed dataset PP = 0.82), with a Bayes factor classified as “substantial” relative to the best alternative model (model 4, isolation by instability with a refuge in the CBP) (Table 2). In the gfit test, the best-supported model provided a good fit to the observed data for both the transformed and untransformed datasets (see Figure S2 in Online Resource). Posterior predictive checks (PPC) also supported the adequacy of the results, with the observed values falling within the range of the simulated values (see Figure S3 in Online Resource).

Table 2 Posterior probabilities (PP) and Bayes Factor (BF, the ratio of the marginal probabilities of data under two competing models) of each diversification model for Scinax fuscomarginatus tested in ABC analysis. BF was compared with the best alternative model. Refuge 1, bottleneck effect in all populations; Refuge 2, bottleneck effect in lowland populations; transf., transformed datasets by PCA (Principal Component Analysis); untransf., untransformed datasets. Values in bold indicate the best model. Representation of the models is shown in Fig. 2

ABC supports a scenario of multiple refuges, both on plateaus and in lowlands (model 3: isolation by instability hypothesis), in which all populations experienced a reduction in size (bottleneck effect) after diverging in recent times (Pleistocene).

Estimated parameters show that all populations have decreased by about a third relative to the ancestral population size (Table 3). The medians of the most ancient split event (T3 = 173.05 Kya) and of the subsequent events (T2 = 75.6 Kya; T1 = 29.9 Kya) range from the Middle to the Upper Pleistocene (Table 3).

Table 3 Demographic parameters estimated under multinomial logistic regression (MNLOG) for Scinax fuscomarginatus. CI, confidence interval; Ne, effective population size in number of individuals; θnuc, ancestral population size for nuclear loci; θmt, ancestral population size for the mitochondrial locus; θr, current population size as a proportion of the ancestral population size; τ1, τ2, and τ3, divergence times in Mya

Discussion

Different methods produced different population assignments: according to GENELAND, S. fuscomarginatus is genetically structured in four geographically associated lineages, whereas BAPS estimated nine. Several studies that used more than one population assignment method observed an overestimation of genetic clusters by BAPS (Latch et al., 2006; Zachos et al., 2016; Pinho et al., 2019; Terrones et al., 2022). Tests with simulated and real datasets suggested that the overestimation of K by BAPS is more frequent in poorly differentiated populations (Latch et al., 2006) and in populations with few individuals (Terrones et al., 2022), with a tendency to identify clusters with few individuals (Zachos et al., 2016) or to spuriously assign individuals to clusters containing only one individual (singleton clusters) (Pinho et al., 2019). In our BAPS analysis, clusters ranged from more than ten individuals to a single individual, and all corresponded to subdivisions of clusters assigned by GENELAND. With BPP, the GENELAND individual assignments were fully supported. Based on the consistency of the BPP results and on the fact that BAPS subdivided the clusters estimated by GENELAND, we considered the BAPS results an overestimation.

According to the GENELAND assignment, S. fuscomarginatus is genetically structured in four geographically associated lineages, which follow the current distribution of open habitats in the highlands of the CBP (groups B and D) and in the Cerrado lowlands, Mato Grosso seasonal forest, Humid Chaco, Chiquitano dry forest, and Guianan savanna (groups A and C) (Figs. 1 and 3). Among the models tested with ABC, the most likely is the “isolation by instability” model, a scenario of multiple refugia in which all populations experienced a reduction in size (bottleneck) in relatively recent times, with the most ancient split event around the LIG (about 150 Kya). Bottleneck events are also consistent with the basic statistics of the mitochondrial fragment: high haplotype diversity combined with low nucleotide diversity may be associated with recent population expansion (Eizirik et al., 2001; Althoff and Pellmyr, 2002; Joseph et al., 2002; Stamatis et al., 2004), as expected when a period of small population size is followed by rapid growth. This pattern probably results from the accumulation of new mutations, producing a large number of closely related haplotypes, that is, many recently evolved haplotypes. We hypothesize that during interglacial periods, a warmer and wetter climate would have favored the geographic expansion of forests, resulting in the retraction and fragmentation of open habitats such as savannas and grasslands. Populations of S. fuscomarginatus, as well as other species associated with open habitats, would have remained isolated in relatively small stable areas with reduced gene flow among them.

Connections Between Open Areas North and South of Amazonia

Although our sampling focused on open areas south of Amazonia and testing connections through the Amazon forest is beyond the scope of this study, the clustering of samples from both sides of the forest deserves some discussion. With the exception of the disconnected Guianan savanna, which is part of the northern Amazonian savannas (Daly & Mitchell, 2000), the area occupied by samples of genetic groups A and C is a large and continuous savanna-like environment. A similar distribution has been described for birds (Mittermeier et al., 2010; Lima-Rezende et al., 2019, 2022; Ritter et al., 2021), snakes (Wüster et al., 2005; Quijada-Mascareñas et al., 2007), and plants (de Buzatti et al., 2018; Resende-Moreira et al., 2019). Rather than long-distance dispersal, recent connections through corridors in the Amazon forest have been suggested to explain this pattern (Lima-Rezende et al., 2019). Four corridors have been proposed: one along the Atlantic coast (Silva, 1995; Werneck et al., 2012b), one through the central Amazon forest (Haffer, 1967; Ledo & Colli, 2017; Ribeiro et al., 2016), and two through the western Amazon forest, one along the base of the Andes (Silva, 1995; Webb, 1978; Werneck et al., 2012b) and another through the Madeira River basin (Ribeiro et al., 2016). However, palynological studies (Colinvaux et al., 1996, 2000; Bush et al., 2004) and model-based reconstructions of past climate and vegetation (Cowling et al., 2001; Claussen et al., 2013; O'ishi & Abe-Ouchi, 2013; Hopcroft & Valdes, 2015) contradict the existence of open-habitat connections through the central Amazon forest, indicating a history of forest stability in the region.

Considering the geographic distribution of samples clustered in Group C (Fig. 1) and the evidence against a central Amazonian corridor discussed above, a plausible route for connections among S. fuscomarginatus populations is through western Amazonia. In four bird species, Lima-Rezende et al. (2022) found high levels of migration between Cerrado and Guianan savanna populations through the western Amazonian corridor. Currently, this region is a continuum of open formations from the Cerrado to the Madeira River basin, connected by the Chiquitano dry forest and the Beni Savanna (Fig. 1). A study of soil carbon isotopes suggested an expansion of savanna vegetation in this area from the early to middle Holocene (de Freitas et al., 2001), and grass and herb pollen records indicate a predominance of tropical grasses and a drier climate during the Last Glacial Maximum (LGM) (Van Der Hammen & Hooghiemstra, 2000). However, Arruda et al. (2018) found that savanna expansions were restricted to ecotonal areas and that the forest was replaced not by savanna but by another type of forest. Furthermore, other studies support the stability of the Amazon forest during the LGM (Costa et al., 2018; Kern et al., 2022) and even an increase in precipitation during the last glacial cycles (Wang et al., 2004, 2017; Cheng et al., 2013).

As mentioned, grass pollen records have been interpreted as direct evidence of dry conditions (Van Der Hammen & Hooghiemstra, 2000). However, grasses can occur in a variety of environments, especially floodplains (Kirschner & Hoorn, 2020). The present fluvial system of central Amazonia is characterized by continuous areas of terra firme (non-flooded) forest and well-defined river channels. It was established only recently, between 250 and 45 Kya, replacing widespread floodplains (Pupim et al., 2019; Hoorn et al., 2010). Central Amazonia would have been dominated by a broad várzea-like environment with fewer areas of non-flooded forest than today (Pupim et al., 2019). These widespread flooded habitats could have provided suitable habitat for species such as S. fuscomarginatus, which is associated with grass and herbaceous vegetation in floodplains (Prado et al., 2005). The predominance of várzea-like environments and the past dynamics of floodplain expansion and retraction during the late Quaternary could have promoted connections through the Amazon forest. Nevertheless, denser sampling, mainly in the Guianan savanna, is clearly required to adequately test this hypothesis, or any other related to gene flow among populations on both sides of the Amazon forest.

Interglacial Periods as Main Diversification Promoters in Scinax fuscomarginatus

Our ABC results suggest that Pleistocene interglacial periods had a greater influence than older Pliocene vicariant or dispersal events related to the final uplift of the CBP. However, the geographic distribution of the genetic structure suggests an influence of the CBP highlands, since the ranges of the most widely distributed groups (A and B) correspond to its limits (Fig. 3). During the late Miocene, the compartmentalization of the CBP may have been the main driver of genetic structure in some widely distributed frogs, such as D. muelleri and Physalaemus cuvieri Fitzinger, 1826 (Oliveira et al., 2018; Miranda et al., 2019). Other frog species, such as Boana lundii (Burmeister, 1856) and Boana raniceps (Cope, 1862), and several plant species, show genetic clustering that agrees with the boundaries of the CBP. In these cases, however, diversification is more recent and related to Pleistocene events (Vasconcellos et al., 2019; Camurugi et al., 2021; de Buzatti et al., 2018). Based on the pattern observed in many phylogenetically distant plant species, de Buzatti et al. (2018) suggested a vicariant event in this region. However, in our ABC analysis, the vicariant model (model 1), in which the oldest split coincides with the CBP uplift, had the lowest posterior probability (Table 2). The reliability of phylogeographic inferences based on demographic models depends on appropriate model selection and can be compromised by model misspecification (see Carstens et al., 2022). Beyond these issues, and regardless of the methods used in phylogeographic inference, genetic signatures of ancient events may have been erased by more recent ones (Avise, 2000). Moreover, based on the geographic distribution of the main genetic populations (groups A and B), we suggest that the current topography of the Cerrado may be shaping the genetic diversity of S. fuscomarginatus.
The final uplift of the CBP (late Tertiary to early Quaternary) was accompanied by the subsidence of peripheral areas, forming depressions that progressively expanded during Quaternary cycles of erosion (Cole, 1986; Brasil & Alvarenga, 1989). In addition to increasing topographic complexity, these depressions harbor seasonally dry forests or gallery forests (Santos et al., 2014), reinforcing the compartmentalization of savanna and grassland habitats. Hence, as suggested for B. raniceps (Camurugi et al., 2021), the initial diversification of S. fuscomarginatus may not have been influenced by the CBP uplift itself, but the genetic structure within the species seems to be maintained by the current landscape of the Cerrado.
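The comparison of models by posterior probability can be illustrated with the basic rejection step of ABC model choice. Simulations from each candidate model are ranked by their distance to the observed summary statistics, and each model's posterior probability is estimated as its share of the closest simulations. The sketch below assumes equal prior probabilities and equal numbers of simulations per model, and standardizes each statistic by its standard deviation. It omits the regression adjustments that many ABC implementations add. The function name, model labels, and reference table are hypothetical and are not the study's models, summary statistics, or software.

```python
import math
from collections import Counter
from statistics import pstdev

def abc_model_posteriors(observed, reference_table, n_accept):
    """Rejection-ABC estimate of posterior model probabilities.

    reference_table holds (model_label, summary_stats) pairs simulated under each
    candidate model, assuming equal prior probability and equal numbers of
    simulations per model.
    """
    # Scale each summary statistic by its standard deviation across simulations
    scales = []
    for k in range(len(observed)):
        sd = pstdev([stats[k] for _, stats in reference_table])
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats):
        return math.sqrt(sum(((s - o) / c) ** 2 for s, o, c in zip(stats, observed, scales)))

    # Keep the n_accept simulations closest to the observed data
    ranked = sorted(reference_table, key=lambda row: distance(row[1]))
    accepted = Counter(model for model, _ in ranked[:n_accept])
    models = sorted({model for model, _ in reference_table})
    return {m: accepted[m] / n_accept for m in models}

# Hypothetical reference table: two simulations per model, one summary statistic each
table = [("m1", (0.0,)), ("m1", (0.1,)), ("m2", (1.0,)),
         ("m2", (0.9,)), ("m3", (0.05,)), ("m3", (0.5,))]
print(abc_model_posteriors((0.0,), table, n_accept=3))  # m1: 2/3, m2: 0, m3: 1/3
```

In a real analysis, the reference table would contain many thousands of simulations per model, and n_accept would be a small fraction of them. A model that rarely produces data resembling the observations, such as m2 here, receives a low posterior probability.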

The oldest genetic divergence (T3) in our best-supported model (model 3) dates to about 160 Kya, corresponding to the LIG. We hypothesize that during the LIG, savanna- and grassland-like environments were fragmented, isolating S. fuscomarginatus populations and promoting genetic differentiation. Other open-habitat frogs with similar geographic distributions [Boana albopunctata (Spix, 1824), B. lundii, and B. raniceps] also show an influence of Pleistocene climatic oscillations. However, divergence between genetic populations within these species has been suggested to have occurred during cold and dry periods (glacial phases) from the early to middle Pleistocene (Prado et al., 2012; Vasconcellos et al., 2019; Camurugi et al., 2021), not during the warm and humid periods (interglacial phases) we propose for S. fuscomarginatus. We suggest that these differences in diversification timing could reflect specific habitat associations. The Cerrado treefrogs B. lundii and B. albopunctata are associated with gallery forests (Prado et al., 2012; Vasconcellos et al., 2019), and B. raniceps, although more widespread, is also mainly associated with gallery forests (Vaz-Silva et al., 2020). Expansions of forest formations during interglacial phases probably fragmented and isolated populations of S. fuscomarginatus, a species associated with savanna and grassland habitats (Toledo & Haddad, 2005; Brusquetti et al., 2014; Pupin et al., 2020). Conversely, the same expansions probably favored range expansions of the other species mentioned, which are mainly associated with gallery forests.

Studies on the population dynamics of plants associated with Neotropical savanna and grassland environments support range expansions during glacial phases, a pattern particularly clear in southern ecoregions such as the Pampas and Humid Chaco (Mäder & Freitas, 2019; Giudicelli et al., 2021). Palynological records also support cycles of expansion of forest plant species in southeastern Brazil during the LIG (de Oliveira et al., 2020), and expansions of savannas from the south as far as 20°S during the last glacial period (Behling, 2002). On the other hand, niche modeling studies based on the distribution of tree species (Caryocar brasiliense, Qualea multiflora, Q. parviflora, and Q. grandiflora) show evidence of geographic retraction during glacial phases (Ramos et al., 2007; Collevatti et al., 2012; de Oliveira Buzatti et al., 2017, 2018). However, these species are widely distributed across the phytophysiognomies of the Cerrado, not only those related to open habitats. The Cerrado ecoregion harbors a mosaic of formations, such as semideciduous and gallery forests, and savanna vegetation proper (Salgado-Labouriau, 1997; Behling & Hooghiemstra, 2001; Bueno et al., 2017). In turn, these savanna formations are dominated by grasses but vary in the density and size of trees and shrubs, ranging from grasslands with few or no tall woody plants to dense forest with a more or less closed canopy (Silva, 1997). As mentioned above, these tree species occur in all Cerrado phytophysiognomies except gallery and semideciduous forests (Oliveira-Filho & Ratter, 1995; Collevatti et al., 2012; de Oliveira Buzatti et al., 2017, 2018). This broad habitat range may influence analyses based on niche preferences (Collevatti et al., 2013) and may also include environments other than those where S. fuscomarginatus occurs.

Based on the evidence presented here, we suggest an important role for Pleistocene climatic dynamics and for the recent compartmentalization of the Cerrado in the diversification of a species highly associated with Neotropical savannas and grasslands. However, further work is needed to explore the evolutionary history of this species, and specifically to better understand trans-Amazonian connections. Future research could focus sampling efforts on populations north of the Amazon, and denser molecular sampling would improve our ability to study demographic history and gene flow dynamics.