Introduction

Disentangling the drivers of spatial genetic variation is a central theme in evolution. This is because genetic variation is a major factor in a species’ ability to adapt and persist in the face of climate and habitat changes (Manel et al. 2003; Barrett and Schluter 2008). Examining the relationship between genetic differentiation and different landscape variables while accounting for the geographic distance among populations has often been the primary approach used to study the factors and processes of a population’s genetic differentiation and its spatial structure. The pattern of isolation by distance (IBD) proposed by Wright (1943) is the simplest pattern that causes genetic differentiation among populations. According to this pattern, geographic isolation leads to gene-flow restrictions and drift, implying a positive relationship between genetic differentiation and geographic distance. However, this “pure geographic distance pattern” might not often be the best predictor of geographic variation in some organisms since it does not incorporate other factors that may affect gene flow, such as landscape features, range boundaries, or environmental conditions (McRae 2006; Storfer et al. 2007; Lee and Mitchell-Olds 2011). In this sense, IBD should be contrasted with more complex patterns (Jenkins et al. 2010), such as isolation by resistance (IBR; McRae 2006) and/or isolation by environment (IBE; Wang and Bradburd 2014). IBR posits that genetic differentiation is related to the resistance distance, which is obtained by incorporating landscape variables that can constrain the dispersal among populations, such as anthropogenic barriers, mountains, rivers, soil, and vegetation type, among others (McRae 2006). Likewise, IBE considers environmental heterogeneity and local adaptation to explain the patterns of genetic differentiation, but accounts for populations’ dependency on specific environmental conditions across the landscape (Wang and Bradburd 2014).

Herein we seek to understand how geographic and landscape features influence spatial genetic variation, population structure, and gene flow through the Neotropical highland forest enclaves embedded in the Caatinga ecoregion. These forest enclaves are viewed as an archipelago of moist forest islands surrounded by an extensive semiarid tropical sea of seasonally dry forest and shrub woodlands with an average temperature of 27 °C and an annual precipitation <800 mm (Andrade-Lima 1982; Tabarelli and Santos 2004). Locally referred to as “brejos de altitude”, these highland forest enclaves are found mostly in plateaus of 500–1100 m elevation (Andrade-Lima 1982; Tabarelli and Santos 2004), and are characterized by a humid tropical or subtropical climate with mild temperatures and annual rainfall >1200 mm (Andrade et al. 2017). A growing body of evidence supports the hypothesis that the current distribution of these forest enclaves reflects past biome dynamics, resulting from multiple episodes of forest expansions and contractions during the Pleistocene (Oliveira et al. 1999; Behling et al. 2000; Auler et al. 2004; Wang et al. 2004; Silveira et al. 2019). Distinct palaeoindicators and niche models suggest that moister climates have occurred throughout much of the Pleistocene (Oliveira et al. 1999; Behling et al. 2000; Auler et al. 2004; Wang et al. 2004; Silveira et al. 2019), which would have facilitated the connection between the Amazon and Atlantic rainforests across the Caatinga (Batalha-Filho et al. 2013; Ledo and Colli 2017). Paleopalynological records also suggested the expansion of humid forests through highlands (elevations higher than 400 m) during the Heinrich Stadial 1 Event (18.1–14.7 kya), which might have allowed migration routes that connected currently isolated forest enclaves (Pinaya et al. 2019). Finally, at the beginning of the Holocene, the drier and colder climate seems to have induced the contraction of forests, confining them to highland humid patches, while the current dry conditions of Caatinga dominate the surrounding areas (Oliveira et al. 1999; Behling et al. 2000; Auler et al. 2004; Wang et al. 2004; Ledru et al. 2016; Silveira et al. 2019).

Recent studies have emphasized the role of these complex past regional dynamics in shaping the current patterns of distribution and diversification of many organisms (Carnaval and Bates 2007; d’Horta et al. 2011; Batalha-Filho et al. 2014; Buainain et al. 2020; Bocalini et al. 2021). While some isolated populations in these enclaves have accumulated enough differentiation to be recognized as separate species (e.g., d’Horta et al. 2011; Batalha-Filho et al. 2014; Cabanne et al. 2014), others are considered isolated populations from geographically structured species with wider distributions across the Atlantic and/or Amazon rainforests (e.g., Buainain et al. 2020; Bocalini et al. 2021). Two forest-dependent passerines distributed across these forest patches at high elevations of the Caatinga in northeastern Brazil, the Ceara Gnateater (Conopophaga cearae) and the Ceara Leaftosser (Sclerurus cearensis), are ideally placed to evaluate the role of past biome dynamics in shaping current patterns of genetic variation (Fig. 1). While S. cearensis is restricted to the state of Ceará (del Hoyo et al. 2020b), C. cearae has a wider distribution, occurring in the states of Ceará, Pernambuco, and Paraíba, as well as in the Chapada Diamantina in the state of Bahia (del Hoyo et al. 2020a).

Fig. 1: Distribution of the genetic variation of Conopophaga cearae and Sclerurus cearensis from island-like forest enclaves of the Brazilian Atlantic Forest domain (green in the inset map of South America) within the Caatinga domain (beige in the inset map).
figure 1

a, b Maps showing the sampling sites of C. cearae and S. cearensis, respectively, in northeastern Brazil. The blue line in a represents the Sao Francisco River. The different symbols of sampling sites depict distinct enclaves. c, d Barplots representing the ancestry coefficients for the best K-value in sNMF; the colors indicate lineages C1, C2, and C3 for C. cearae and S1, S2, and S3 for S. cearensis; the symbols above each ancestry bar represent the sampling site for that individual according to the maps. The letters on the maps (a, b) indicate the abbreviation of Brazilian states: BA Bahia, SE Sergipe, AL Alagoas, PE Pernambuco, PB Paraíba, RN Rio Grande do Norte, CE Ceará, and PI Piauí. Photos of C. cearae (above) and S. cearensis (below) were provided by Sidnei S. Santos.

Here, we assess the population genetic variation of these two bird species using genome-wide single-nucleotide polymorphism (SNP) data generated from double-digest restriction-site-associated DNA sequencing (ddRADseq). By integrating landscape genomics, historical demography, and ecological niche modeling tools, we depict the spatiotemporal dynamics of bird populations distributed through the Caatinga moist forest enclaves. Specifically, we aim to test whether population differentiation across enclaves can be explained by three non-mutually exclusive patterns of spatial genetic variation: (1) the IBD pattern—positive relationship between the genetic and geographic distances, (2) the IBR pattern—positive relationship between the genetic distance and landscape resistance to lower altitudes and climatic suitability areas in the past, and (3) the IBE pattern—positive relationship between the genetic and environmental distances due to regional climatic heterogeneity. We also address the impact of the Late Quaternary climate change on the demography of these bird populations by using a statistical phylogeographic approach. If the hypothesis of a larger distribution of the forest enclaves during the last glacial maximum (LGM) followed by subsequent fragmentation during the Holocene (Silveira et al. 2019) is true, a signal of recent population size decline in both species is expected. By combining these approaches, we were able to illuminate the effect of Late Quaternary climatic oscillations on the demography and isolation of these forest enclaves.

Materials and methods

Sampling, RAD library preparation, and sequencing

We gathered samples from fieldwork during 2015 and 2018 and loans from Brazilian collections (Table S1). We obtained samples from 61 individuals of C. cearae and 50 individuals of S. cearensis across seven forest enclaves of both species in northeastern Brazil (Fig. 1 and Table S1). Genomic DNA was extracted from muscle tissues using the DNeasy Blood & Tissue kit (QIAGEN, Hilden, Germany). Samples were processed using the ddRAD library approach (Parchman et al. 2012; Peterson et al. 2012). The libraries were constructed following the protocols of Parchman et al. (2012) and Peterson et al. (2012) with modifications added by A. Brelsford, A. Mastretta-Yanes, J. Leuenberger, and R. Sermier. The modifications included the use of the restriction enzyme SbfI instead of EcoRI (recommended for large genomes or more individuals per library); dual-index barcoding to allow multiplexing >96 samples per library; Y-adapter for MseI from Peterson et al. (2012) to prevent amplification of MseI-MseI fragments; CutSmart buffer to simplify restriction and ligation mixes; purification post ligation using Agencourt AMPure beads (Beckman Coulter, California, USA); PCR using Q5 Hot Start Polymerase (New England BioLabs, Massachusetts, USA); modified PCR to decrease the number of cycles by increasing the number of replicates (four separate 10 µl reactions per restriction-ligation product, which were later combined) and starting DNA volume; addition of primers and dNTPs for a final thermal cycle to reduce the production of single-stranded or heteroduplex PCR products; and final cleanup and size selection with AMPure beads.

The DNA was double digested with SbfI and MseI. Next, specific adapters and unique barcodes were ligated to the digested DNA, and a purification step (to remove short fragments) was carried out using AMPure beads. After that, we performed PCR amplifications using Illumina PCR primers to amplify fragments that had both adapters and barcodes ligated onto the ends, followed by size selection of 400–500 bp fragments by gel extraction, and another purification step with the MinElute Gel Extraction kit (QIAGEN, Hilden, Germany) and magnetic AMPure beads. All libraries were quantified using the Qubit Fluorometer and subjected to paired-end sequencing using 150 bp reads on an Illumina platform at Macrogen Inc. (Seoul, Republic of Korea).

Processing RADseq data

The Sabre tool (https://github.com/najoshi/sabre) was used to demultiplex the barcoded reads into separate fastq files. The interactive toolkit ipyrad v0.7.29 (Eaton and Overcast 2020) was then used for processing the restriction-site-associated genomic datasets generated for each species (see Table S2 for ipyrad parameters). Specifically, we performed the following steps: filtering of low-quality reads, clustering within samples, joint estimation of heterozygosity and error rate, consensus base calling, clustering among samples, and formatting output files. During the processing, we used a de novo assembly method, minimum depth for both statistical and majority-rule base calling set to 10, strict filter for adapters, and trimming of the first five and the last five bases from all reads. A maximum of 10% missing data per individual was allowed for each species’ dataset. We used default values for the remaining parameters.

For the final dataset, sequences from two individuals of S. cearensis (USP/LGEMA13614 and UFBA1144) were discarded due to the low coverage, which resulted in only 4420 and 120 loci in the assembly, respectively, instead of 14,612 loci recovered for the remaining samples (see “Results”). For most analyses, we considered only one randomly chosen SNP per locus, except for the Dxy calculations, for which we used all sites (variant and invariants).

Genetic diversity, population structure, and migration tests

General measures of genetic diversity were obtained in DnaSP v.6.12 (Rozas et al. 2017) through calculations of haplotype (gene) diversity (Nei 1987), nucleotide diversity (Nei 1987), and average number of nucleotide differences (Tajima 1993). The genetic structure of each species across the forest enclaves was investigated by principal component analysis (PCA) using the glPca function from the package adegenet 2.1.2 (Jombart 2008) in R v4.0.6 (R Core Team 2018). We also evaluated population genetic structure using the sparse non-negative matrix factorization approach (sNMF) in the LEA package (Frichot and François 2015), which is based on likelihood methods, PCA, and NMF algorithms, showing robustness to departures from traditional population genetic model assumptions (Frichot et al. 2014). We tested the number of genetically distinct clusters (K) between 1 and 10. To test the accuracy of the results, two hundred replicates were run for each K under different alpha regularization parameter values (1, 10, 100, and 1000). The optimal value of K was determined using the cross-entropy criterion, which uses the imputation of masked genotypes to evaluate the capability of the algorithm to estimate ancestry coefficients (Frichot et al. 2014).

To examine the patterns of spatial population structure and connectivity, we used the estimated effective migration surface method (EEMS; Petkova et al. 2016). The EEMS approach is robust in untangling the contribution of pure IBD from that of historical barriers or heterogeneous landscapes in driving population differentiation (see Petkova et al. 2016). Similar to an IBR approach, EEMS delineates the process by which genetic differentiation accrues in landscapes lacking homogeneity, achieved through the integration of all potential migration routes between two specific points (Petkova et al. 2016). This analysis measures effective migration as representing historical rates of gene flow and relates it to expected pairwise genetic dissimilarities, resulting in a spatial representation of patterns of genetic variation, and the identification of migration corridors and/or barriers to gene flow (Petkova et al. 2016). The average genetic dissimilarity matrix was computed with the bed2diffs program (https://github.com/dipetkov/eems/tree/master/bed2diffs), after converting the VCF file into BED format using PLINK v1.9 (Chang et al. 2015). The habitat polygons were constructed based on the distribution of the sampling points at http://www.birdtheme.org/useful/v3largemap.html, an online Google Maps tool where it is possible to draw polylines and polygons around the areas of interest from geographic coordinates. For C. cearae, the area covered by the polygon comprised the Brazilian states of Ceará, Piauí, Bahia, Sergipe, Alagoas, Pernambuco, Paraíba, and Rio Grande do Norte, while for S. cearensis, the area of interest was only the state of Ceará. Analyses were run using the runeems_snps version 1 of EEMS for 30 × 106 iterations, with the first 5 × 106 excluded as burn-in, and deme sizes of 400 and 200 for C. cearae and S. cearensis, respectively. For each species, three independent analyses were performed to ensure convergence, which was verified by comparing the posterior trace plots generated by the R package reemsplots (https://github.com/dipetkov/eems/tree/master/plotting). This same package was used to visualize the final combined results.

Isolation by distance, isolation by resistance, and isolation by environment

We applied the generalized dissimilarity modeling (GDM; Ferrier et al. 2007) approach to quantify the effects of IBD, IBR, and IBE on population differentiation across forest enclaves for both species. This approach models biological variation as a function of environment and geography using distance matrices, and takes into account nonlinear relationships between response and predictor variables (Ferrier et al. 2007). We obtained the population differentiation between sampled localities by calculating pairwise Dxy genetic distances (Tables S3 and S4) for each species using the PopGenome package (Pfeifer et al. 2014) in R v4.0.4. For the IBD, we obtained a pairwise geographic distance matrix (in Km) between localities (Tables S5 and S6) using the Geographic Distance Matrix Generator 1.2.3 (available at http://biodiversityinformatics.amnh.org/open_source/gdmg/index.php).

In IBR, we considered lower altitudes (current resistance) and lower suitability areas derived by paleodistribution (historical resistance) models (see below) as the resistance for landscape dispersal. Both bird species are currently restricted to the moist forest enclaves at higher elevations (at least 500 m) encrusted in the dry vegetation of the Caatinga ecoregion. As paleoecological records suggest past connections between currently isolated forest enclaves through highlands during the Late Pleistocene (Pinaya et al. 2019; Piacsek et al. 2021), we considered higher elevations as indicating the conductance among localities. Elevation data were obtained from the EarthEnv database at a 1-km resolution (http://www.earthenv.org/; Amatulli et al. 2018). We also used the output of ecological niche modeling from the Bølling-Allerød Interstadial epoch (14.7–12.9 kya), considering higher-suitability areas as indicating conductance among localities. We used this temporal profile because of its largest paleodistribution for both species (see the ecological niche modeling results). Then, we used the circuit theory to produce the spatial resistance distances between localities using the Circuitscape v4.0.5 (McRae et al. 2008) to obtain resistance surfaces and pairwise resistance matrices between localities (Tables S7S10).

We obtained environmental data for IBE using the 19 bioclimatic variables of current conditions available in Paleoclim (Brown et al. 2018) derived from CHELSA 2.1 (Karger et al. 2017) at 2.5 arc-minutes resolution. We used the Raster package to extract the climate values per locality for each species. To minimize the collinearity among variables, we selected uncorrelated (Pearson’s r < 0.7) bioclimatic variables for each species based on the occurrence data: (a) C. cearae—mean diurnal range (bio2), isothermality (bio3), mean temperature of the warmest quarter (bio10), annual precipitation (bio12), precipitation of the driest month (bio14), precipitation seasonality (bio15), precipitation of the warmest quarter (bio18), and precipitation of the coldest quarter (bio19); (b) S. cearensis—isothermality (bio3), temperature seasonality (bio4), mean temperature of the wettest quarter (bio8), precipitation of the wettest month (bio13), precipitation seasonality (bio15), precipitation of the driest quarter (bio17), and precipitation of the coldest quarter (bio19). Bioclimatic variables used in this analysis are different from those used in the ecological niche modeling as the occurrence points are different in each analysis (here we must use localities with genetic sampling). The consideration of different occurrences can produce distinct environmental spaces and affect the collinearity among variables.

Finally, we used the gdm package (Ferrier et al. 2007; https://github.com/fitzLab-AL/GDM) in R v4.0.5 to calculate the generalized dissimilarity model. We considered the Dxy pairwise distance matrix as the response variable. We fit the model using the following predictor variables (bioformat = 3): bioclimatic variables (eight for C. cearae and seven for S. cearensis), geographic distance matrix (IBD), resistance matrix derived from elevation data (IBRtopo), and resistance matrix derived from past distribution of ecological niche modeling (Bølling-Allerød epoch; IBRpaleo). We selected the best predictors using a stepwise matrix permutation and backward elimination approach. At each step, the least important predictor was dropped (backward elimination) and the process continued until all nonsignificant predictors were removed. We used 500 permutations and defined the best model as the one that retained significant predictors (p ≤ 0.05). We also calculated the variance partitioning of significant predictors of the best model using gdm.partition.deviance function.

Demographic inferences

Significant breakthroughs in the development of methods based on population genomics have aided in the reconstruction of historical demography. Despite the popularity of some methods, there is some concern about the accuracy of the inferred demographic models because they are mostly based on a set of assumptions that are commonly ignored and/or violated in natural systems. For example, numerous studies have shown that population structure and gene flow could generate spurious signs of population size changes over time (expansion and/or bottleneck), even when the populations remained stationary through time (Wakeley 1999; Nielsen and Beaumont 2009; Chikhi et al. 2010, 2018; Mazet et al. 2016). Here, we used the Stairway plot v2.1.1 (Liu and Fu 2020), a model-flexible method based on SFS (site frequency spectrum), to investigate the impact of Quaternary climatic changes on the demographic history of both species. As both bird species are structured in three genetic clusters (see “Results”, Fig. 1), we used two different sampling approaches to reduce the potential conflicting effect of population structure (e.g., Heller et al. 2013; Chikhi et al. 2018; Wang et al. 2021; Violi et al. 2023): (i) all individuals across the whole set of populations were pooled (hereafter, species demographic analysis); and (ii) individuals from the different genetic clusters uncovered for both the species were analyzed separately (hereafter, cluster-based demographic analysis). In this case, the results should be interpreted by assuming that each cluster represents a single isolated population with an independent evolutionary history.

For species demographic analysis, we obtained the SFS for the whole individuals per species (full datasets). For cluster-based demographic analysis, individuals from each cluster (Fig. 1) were separated into different ipyrad runs using the same parameters as the full datasets and retaining only the loci present in all individuals. Due to potential issues with admixture affecting demographic inference (e.g., Heller et al. 2013), individuals of mixed ancestry (q-values lower than 0.8 for C. cearae and 0.75 for S. cearensis) (Fig. 1; samples from Maciço de Baturité of C. cearae and Serra de Uruburetama of S. cearensis), as revealed by the sNMF results, were removed in the cluster-based demographic analysis. The folded SFS of each cluster was obtained using the Python script easySFS (http://github.com/isaacovercast/easySFS). To run the Stairway plot, we assumed a mutation rate of 2.5 × 10−9 (Nadachowska-Brzyska et al. 2015) and a generation time of 2.03 years for C. cearae and 2.19 years for S. cearensis according to Bird et al. (2020). The estimation (median) and 95% pseudo-confidence interval (CI) were generated based on 200 replications.

For cluster-based demographic analysis, we also tested a set of alternative historical demographic scenarios (models). The three models included (model 1) constant population size (“null model”), in which the current effective population size (Ncur) was the only parameter estimated; (model 2) population decline, where we estimated the Ncur, ancestral population size (Nanc), and the time at which population decline began (Tdec); and (model 3) population growth, where Ncur, Nanc, and time when population growth started (Tgro) were estimated. For the last two scenarios, the range of the resize parameter was bounded by a factor of 1–5 and 0.5–0.9 to simulate population decline and growth, respectively, under a uniform prior distribution. We applied the same mutation rate and generation times used for the Stairway plot analysis. We used the coalescent-based modeling package fastsimcoal26 (Excoffier and Foll 2011; Excoffier et al. 2013) to evaluate each model based on the observed SFS. Fastsimcoal runs were performed with 40 replicates for each scenario, with 250,000 simulations per likelihood estimate, a stopping criterion of 0.001, and 10–40 expectation-conditional cycles (ECM). We obtained the 95% CI for each parameter by performing 100 parametric bootstraps of simulated SFS from the maximum likelihood estimates and re-estimating the parameters with 40 runs for each of the 100 simulated SFS. Model comparisons were performed based on their likelihoods using the Akaike information criteria (AIC; Akaike 1974).

Ecological niche modeling

To infer the effects of Quaternary climatic oscillations on species range dynamics, we modeled the distribution of both species in the present conditions and along different periods of the Late Quaternary. Considering the forest dependency of these two bird species, we expected larger distribution ranges during warmer and wetter periods, especially during the deglaciation (17–11 kya) at Heinrich, Bølling-Allerød, and Young Dryas interstadial epochs, and range contraction during warmer periods of the Holocene (last 8.3 kya). Species occurrences were obtained from the following online resources: Global Biodiversity Information Facility (GBIF, https://www.gbif.org/), speciesLink (https://splink.cria.org.br/), Sistema de Informação sobre a Biodiversidade Brasileira (SiBBr, https://www.sibbr.gov.br/), and our genetic sampling (Table S1). Occurrences were filtered by excluding points outside the known species range, which resulted in 79 and 36 points for C. cearae and S. cearensis, respectively (Tables S11 and S12). To reduce bias in the modeling, we used a thinning filter to exclude locations within a radius of up to 10 km using spThin package (Aiello-Lammens et al. 2015).

We generated ecological niche models (ENMs) under the maximum entropy approach (Phillips et al. 2006) along with bioclimatic variables available in Paleoclim (Brown et al. 2018) at a 2.5 arc-minutes resolution. As for ENM analysis, we used more records than genetic sampling points and we selected new uncorrelated (Pearson’s r < 0.7) bioclimatic variables for each species based on the occurrence data: (a) C. cearae—isothermality (bio3), temperature seasonality (bio4), mean temperature of the wettest quarter (bio8), mean temperature of the driest quarter (bio9), precipitation of the driest month (bio14), precipitation seasonality (bio15), and precipitation of the warmest quarter (bio18); (b) S. cearensis—temperature seasonality (bio4), mean temperature of the wettest quarter (bio8), precipitation of the driest month (bio14), precipitation of the warmest quarter (bio18), and precipitation of the coldest quarter (bio19). We used the following climatic models to uncover nine periods from the Late Pleistocene to the present (Brown et al. 2018): current conditions; late-Holocene, Meghalayan (4.2–0.3 kya); mid-Holocene, Northgrippian (8.326–4.2 kya); early-Holocene, Greenlandian (11.7–8.326 kya); Younger Dryas Stadial (12.9–11.7 kya); Bølling-Allerød (14.7–12.9 kya); Heinrich Stadial 1 (17.0–14.7 kya); Last Glacial Maximum (ca. 21 kya); and Last Interglacial (ca. 130 kya). For C. cearae, the layers were cropped to a geographic extent between the coordinates −44.71 (maximum longitude), −33.77 (minimum longitude), −20.00 (maximum latitude), and −2.20 (minimum latitude), and for S. cearensis, they were cropped to −44.71 (maximum longitude), −33.77 (minimum longitude), −10.00 (maximum latitude), and −2.20 (minimum latitude) using the raster package (https://rspatial.org/raster/).

The background environment for each species was obtained by randomly generating 1000 points through the minimum convex polygon. We ran and evaluated models using the ENM evaluate function from the ENMeval package (Muscarella et al. 2014). We tested regularization multiplier values from 1 to 5, with increments of 0.5, as well as the following combinations of feature classes: linear, linear-quadratic, hinge, and linear-quadratic-hinge. To avoid overfitting, we performed a spatial cross-validation with the checkerboard2 method, which splits our data into four evaluation sets. For each set, the model was trained in the remaining three sets and tested using the evaluation set. The best model was chosen based on the omission rate and area under the curve (AUC) values (i.e., we chose the model with the lowest omission rate and highest AUC). The final best-fitted models were generated using the maxent function in the dismo package (https://github.com/rspatial/dismo).

Results

Genetic diversity and population structure

From our ddRAD sequencing, we obtained 156,543,279 raw reads for 61 samples of C. cearae and 164,824,866 raw reads for 48 samples of S. cearensis. After the ipyrad filtering steps (see the parameters used in Table S2), we retained 1,690,372–3,201,267 reads per sample for C. cearae and 2,506,901–4,718,625 reads per sample for S. cearensis. The consensus base calling and clustering among samples considering a minimum depth coverage of 10 recovered 14,612 loci with 161,748 SNPs for C. cearae in a matrix with 6.47% of missing sites, and 17,789 loci with 61,301 SNPs for S. cearensis in a matrix with 5.12% of missing sites. For C. cearae and S. cearensis, we found an average heterozygosity of 0.010 and 0.006, respectively, with an error rate of 0.002. Using all samples from each species, the haplotype (gene) diversity, nucleotide diversity, and average number of nucleotide differences were 0.015, 0.00009, and 0.022 for C. cearae and 0.002, 0.00001, and 0.005 for S. cearensis, respectively. Regarding the Dxy distance, the values varied among pairs of localities for both species (Tables S3 and S4). In C. cearae, the values ranged from 0.00805 between Morro do Chapéu and Itatira to 0.00123 between Brejo da Madre de Deus and Cruz do Espírito Santo, while in S. cearensis they ranged from 0.00242 between Crato and Guaramiranga to 0.00001 between Uruburetama and Itapipoca (Tables S1, S3, and S4).

Population clustering using sNMF revealed that both species are subdivided into three distinct genetic clusters (K = 3; Fig. 1 and S1). For C. cearae, the clusters were (i) south of São Francisco River (C1); (ii) the northern region of Ceará state (C2), including individuals from Serra do Machado and Maciço de Baturité; and (iii) the northeastern region (C3), including Areia, Brejo da Madre de Deus, and Açude Cafundó (Fig. 1a, c). In S. cearensis, the clusters uncovered were (i) the western region of Ceará state (S1), including Chapada do Araripe and the north and south enclaves of Serra do Ibiapaba; (ii) the central region of Ceará state (S2), constituted by individuals from Serra do Machado; and (iii) the northeastern part of Ceará state (S3), composed of Maciço de Baturité and Serra do Maranguape (Fig. 1b, d). Additionally, samples of S. cearensis from the Serra de Uruburetama enclave showed a significant admixture among those three clusters (Fig. 1d). The population structure as revealed by sNMF is also evident in the PCA results (Fig. S2), which in turn indicates further population substructure within the three main clusters, corresponding to populations with mixed ancestry (Fig. 1 and S2).

The map of EEMS for C. cearae was generally congruent with the results inferred by sNMF, showing a clear barrier to gene flow between north and south forest enclaves, and a barrier separating east and west forest enclaves in the north (Fig. 2a). The only exception was within the cluster C2, in which the forest enclave of Maciço de Baturité appeared isolated from Serra do Machado (Fig. 2a). Interestingly, samples from Maciço de Baturité exhibited a moderate admixture with cluster C3 (Fig. 1c). For S. cearensis, the EEMS map showed a strong barrier to gene flow between the east and west enclaves, while Serra do Machado and Serra de Uruburetama appeared isolated from the remaining populations (Fig. 2b).

Fig. 2: Estimated effective migration surface (EEMS).
figure 2

a Conopophaga cearae and b Sclerurus cearensis. Posterior mean migration rates m (on the log10 scale) are color-coded in blue and orange to indicate areas with higher and lower migration rates, respectively, than the overall average rate (in white).

Isolation by distance, isolation by resistance, and isolation by environment

The overall GDM modeling explained 61.66% and 60.48% of the genetic differentiation for C. cearae and S. cearensis, respectively. However, three predictors for C. cearae (bio3, bio14, and IBRtopo) and one predictor for S. cearensis (IBRtopo) did not show significance on genetic differentiation, showing I-spline values equal to zero. In C. cearae, predictors for IBD (I-splines = 0.531), IBE (bio10 = 0.357, bio19 = 0.318, bio18 = 0.183, bio2 = 0.159, bio15 = 0.149, bio12 = 0.008), and IBR (paleo = 0.153) had unique contributions and explained 44.12% of the genetic differentiation between populations (Fig. S3). For S. cearensis, IBD (I-splines = 0.008), IBE (bio13 = 0.055, bio4 = 0.042, bio19 = 0.037, bio15 = 0.028, bio17 = 0.025, bio8 = 0.007), and IBR (paleo = 0.056) had unique contributions and explained 40.37% of the genetic differentiation between populations (Fig. S3). Notwithstanding, after permutations, the best model was the one where only geographic distance (IBD) contributed significantly (p < 0.001) to the genetic differentiation in C. cearae, and only ENM (IBRpaleo) showed significant contribution (p < 0.001) for S. cearensis (Fig. 3). Therefore, the geographic distance alone explained 38.14% of the variation for C. cearae, while IBRpaleo alone explained 18.23% of the variation for S. cearensis.

Fig. 3: Generalized dissimilarity fitted model for population differentiation (Dxy genomic distance) and significant landscape predictors.
figure 3

Conopophaga cearae (ac) and Sclerurus cearensis (df). a, d Observed dissimilarity as a function of GDM-predicted ecological distance, with each site pair represented as a point, and the line representing the GDM-predicted dissimilarity. b, e Observed dissimilarity as a function of GDM-predicted dissimilarity, with the line of equality provided. The plot of significant fitted I-splines from the GDM model for c geographic distance and f resistance matrix from the ENM raster of the Bølling-Allerød epoch.

Historical demography model testing

Demographic histories inferred in the Stairway plot showed signs of recent and synchronic declines of effective population size (Ne) in both the species (Fig. S4) and cluster-based demographic analyses (Fig. 4a, b). In species demographic analysis, population decline started around 1 kya (Fig. S4), and in the cluster-based demographic analysis, the more pronounced decline occurred in the last 10 kya (Holocene), which was preceded by population growth (Fig. 4a, b).

Fig. 4: Historical demography model testing.
figure 4

Population size changes over time estimated in Stairway plot for clusters of a Conopophaga cearae and b Sclerurus cearensis. The colors of the groups follow the sNMF clusters in Fig. 1 and the estimated effective population sizes (Ne) are plotted from the present to the past (in thousands of years). The thick lines indicate the median of 200 inferences and the shaded areas represent the 95% confidence interval. c Alternative demographic scenarios simulated in fastsimcoal2 and corresponding AIC (Akaike information criterion) scores evidencing the best model (smallest AIC) for each lineage of C. cearae (C1, C2, and C3) and S. cearensis (S1, S2, and S3) (see Fig. 1). The scenarios assume population stability, decline, or growth. The estimated parameters in each model are provided as abbreviations: Ncur (current population size), Nanc (ancestral population size), Tdec (time at which population decline began), and Tgro (time at which population growth began). Demographic parameters and AIC values are given in Table 1.

Historical alternative demographic scenarios (models) suggested population decline as the best-fitting model for each differentiated cluster for both species (Fig. 4c and Table 1), showing a synchronic population shrinking (Table 1). According to this scenario, clusters C1 and C2 of C. cearae showed a current population size (Ncur) of around 1 million genes, whereas the ancestral population size (Nanc) was estimated to be ~5.1 million genes (Table 1). For these clusters, the time when the population started to decline (Tdec) was estimated at c.a. 1 kya, during the Holocene (Table 1). For the cluster C3 of C. cearae, Ncur and Nanc were estimated at around 31 and 82 thousand genes, respectively, with population decline also starting at c.a. 1 kya, during the Holocene (Table 1). In S. cearensis, we recovered Ncur and Nanc of around 1 and 5.1 million genes, respectively, and Tdec of c.a. 1 kya, during the Holocene, for all clusters (Table 1).

Table 1 Fastsimcoal2 results for the three demographic scenarios tested for each species with the best composite likelihood, 95% confidence interval in parentheses, and Akaike information criterion (AIC) value.

Ecological niche modeling

Model selection by ENMeval fitted the hinge model with a regularization multiplier of five for both species (see Table S13 for parameters of the best-fitted models). ENM across all evaluated periods from the Late Quaternary indicated a potential range variation through time for both the species (Fig. 5) with periods of range contraction and range expansion. For the present conditions (Fig. 5a), our modeling is congruent with the known distribution of both species, but substantial changes in the potential distribution range were uncovered for the last 130 kya (Fig. 5). Overall, the projection to past conditions showed a general trend of potential range expansion during wetter periods, especially in the Bølling-Allerød Interstadial epoch (14.7–12.9 kya), and a potential range contraction during warmer periods of the Holocene (last 8.3 kya).

Fig. 5: Ecological niche models (ENMs) for Conopophaga cearae and Sclerurus cearensis across the Caatinga moist forest enclaves.
figure 5

ENMs were built for current conditions (a) and for different Late Quaternary periods: b late-Holocene (4.2–0.3 kya); c mid-Holocene (8.326–4.2 kya); d early-Holocene (11.7–8.326 kya); e Younger Dryas Stadial (12.9–11.7 kya); f Bølling-Allerød (14.7–12.9 kya); g Heinrich Stadial 1 (17.0–14.7 kya); h Last glacial maximum (ca. 21 kya); and i Last Interglacial (ca. 130 kya). Warmer colors indicate a higher probability of potential species occurrence, as depicted in the legend.

Discussion

Drivers of population geographic variation

Contemporary genetic variation of species harbors information that can help reconstruct their evolutionary history (Ellegren and Galtier 2016). This kind of inference can be further improved by explicitly comparing the demographic dynamics and population structure of co-distributed and ecologically similar species that evolved under common historical biogeographic drivers (e.g., Lourenço et al. 2018). By examining the spatial patterns of genetic variation in two ecologically related endemic bird species restricted to highland forest enclaves in northeastern Brazil, we found an overall similar pattern of genetic structure. Consistent with the strong dependency on forest habitats (Sick 1997; del Hoyo et al. 2020a, 2020b) and low vagility among fragmented forest landscapes, as previously demonstrated in their congeners (C. lineata and S. scansor) endemic to the Atlantic Forest (Marini 2010), we found high levels of population structure in both C. cearae and S. cearensis. This likely results from the effects of genetic drift as a consequence of the isolation of forest enclaves (Fig. 1 and S1). Also, genetic drift and the reduced gene flow associated with a small distribution range may be factors that can help explain the lower values of genetic diversity in S. cearensis when compared to C. cearae.

For both species, results from the generalized dissimilarity modeling showed that environmental conditions (IBE) did not predict the genetic differentiation of bird populations through forest enclaves. This result likely reflects a relative environmental homogeneity or a nonadaptive genetic variation among sampled localities for each species. On the other hand, we found a general trend toward an increase in genetic differentiation with the geographic distance between populations of C. cearae, suggesting that distance between forest enclaves plays a key role in shaping the current patterns of genetic structure. This result is in agreement with the ubiquity of the IBD pattern found in a myriad of organisms, including bird species (e.g., Cabanne et al. 2007; Maldonado-Coelho 2012; for a review, see Jenkins et al. 2010; Perez et al. 2018).

Our analysis further identified historical resistance distance (IBRpaleo) as a predictor of genetic divergence for S. cearensis. In contrast, the current IBRtopo did not play a significant role in population differentiation for both species, which agrees with paleoecological data (Pinaya et al. 2019; Piacsek et al. 2021) that suggest the role of an interplay between elevation and past climate conditions in driving montane forest expansion during the Late Quaternary.

Although the reasons for the unbalanced contribution of IBD and IBR between the two species are unclear, our findings clearly suggest that dispersal limitation due to geographical distance and historical habitat resistance together had a compound effect on the current patterns of population differentiation in both species. Also, the broader distribution of C. cearae might have contributed to the strongest IBD effect on genetic differentiation when compared with S. cearensis. We found additional support for this conclusion from the EEMS analysis (Fig. 2), which showed patterns of unevenly distributed gene flow for both species. For example, this analysis showed geographically close population pairs with very restricted or lack of migration (e.g., Serra do Machado and Serra do Baturité in C. cearae, Fig. 2a) and, on the contrary, geographically distant population pairs that present high migration rates (e.g., Serra do Ibiapaba and Chapada do Araripe in S. cearensis, Fig. 2b).

The differential habitat occupancy and biogeographic history of these species can also reflect the patterns observed here. Conopophaga cearae has a broader range across enclaves in the states of Bahia, Ceará, Rio Grande do Norte, Paraíba, Pernambuco, and Alagoas, whereas S. cearensis occurs only in the state of Ceará (Fig. 1). In this state, S. cearensis occupies more enclaves than C. cearae (Fig. 1). These distinct patterns of enclaves’ occupancy might reflect distinct historical dispersal or extinction rates between species during the Late Quaternary. Interestingly, S. cearensis is more conspicuous in dry forest and rainforest enclaves, whereas C. cearae is apparently less frequent in dry forest enclaves (personal observation). Additionally, species competition could also play a role in enclaves’ occupancy. The absence of C. cearae in the enclaves of the western state of Ceará (northern Serra da Ibiapaba) might be explained by the occurrence of another Conopophaga species in this region, the C. roberti, which occurs in western Amazonia and exhibits forest dependency (Greeney 2020). Likewise, the absence of S. cearensis in the enclaves of the state of Bahia is likely explained by the presence of S. scansor in Chapada Diamantina and Maracás. Further studies that relate species traits and habitat requirements with genetic variation should test these hypotheses.

The effective migration surfaces suggest that key geographical barriers have restricted gene flow (Fig. 2), such as the case of São Francisco River valley for C. cearae. This result is consistent with previous studies that have found this river as an important barrier for many organisms (e.g., lizards: Lanna et al. 2020; frogs: Bruschi et al. 2019; Thomé et al. 2021; birds: Capelli et al. 2020), including C. cearae (Batalha-Filho et al. 2014). Our results also suggest that past patterns of highland corridors during wetter periods of the Quaternary may have played an important role as dispersion routes between different forest enclaves (see Fig. S5) (Auler et al. 2004; Wang et al. 2004; Silveira et al. 2019; Piacsek et al. 2021). This is consistent with the role of IBRpaleo, which significantly predicted patterns of genetic differentiation among populations, particularly considering the historically suitable areas as a key variable for population migration during historically wetter periods. The Montane rainforest expansion during the Heinrich Stadial 1 Event (18–14 kya), as suggested by paleopollen and speleothem data (Pinaya et al. 2019), may have allowed the formation of dispersal corridors across the highlands (Pinaya et al. 2019; Piacsek et al. 2021). Thus, enclaves located in the same mountain chain or plateaus were more likely to exchange genes during the Quaternary than isolated enclaves in highlands surrounded by valleys or lowlands (Fig. S5). While further analysis with denser sampling within the same mountain chain or plateaus would provide the opportunity to test this hypothesis, our results emphasize the importance of accounting for current and past landscape features as drivers of intraspecific differentiation.

Demographic history through the Late Quaternary

Results from Stairway plot analyses revealed that the historical demographic profiles of both bird species were similar when the entire set of populations was assumed to behave as a single population or when it was assumed that the population was structured into three distinct genetic clusters. This suggests that sampling design has no bearing on the demographic reconstruction for both bird species. Importantly, a parallel temporal pattern of population size decline across all genetic clusters of both species is supported by the best-fit model inferred by fastsimcoal. This is consistent with the results of Stairway plot, suggesting that both species were impacted by a similar process leading to population decline during the last 10 to 1 kya (Figs. 3 and 4; Table 1). Although these results should be interpreted with caution, overall they are concordant with previous expectations of population size decline (Cabanne et al. 2008; Bocalini et al. 2021) due to reductions in the distribution of forest enclaves after the LGM (Silveira et al. 2019). The classical refuge model for the Atlantic Forest postulates refuge areas for enclaves in the Chapada Diamantina hills during the LGM, but did not recover stability areas for enclaves at the northern São Francisco River (Carnaval and Moritz 2008; Carnaval et al. 2009). However, more recent ENM-based analyses of past spatial dynamics suggest forest expansion for all enclaves into areas currently occupied by the Caatinga seasonally dry woodlands during the LGM (Silveira et al. 2019). These results concur with independent evidence provided by our ENM results (Fig. 5), which inferred a generalized range contraction for both species after warmer and moister conditions of the Bølling-Allerød interstadial epoch (14.7–12.9 kya; Köhler et al. 2011) that might have triggered population growth and connection of currently isolated enclaves. These findings along with abundant evidence from the literature, such as speleothems (Auler et al. 2004; Wang et al. 2004), paleopalynology (Oliveira et al. 1999; Behling et al. 2000; Bouimetarhan et al. 2018; Pinaya et al. 2019; Piacsek et al. 2021), and sediments (Jacob et al. 2007; Pessenda et al. 2010), support the hypothesis that populations of both passerine species experienced a recent decline with the increase of dry conditions during the Holocene (Fig. 4). Despite demographic analyses only depicting the last change of population size, the collection of data regarding historical dynamics of forest enclaves allowed us to hypothesize that these areas may have experienced cyclical episodes of expansion and contraction along the Quaternary climate changes (Oliveira et al. 1999; Behling et al. 2000; Auler et al. 2004; Wang et al. 2004; Silveira et al. 2019). For example, a comparative demographic analysis of squamates and amphibian species endemic to the Caatinga dry vegetation revealed signs of synchronic population expansions in the Late Pleistocene (Gehara et al. 2017), which is the opposite of the pattern observed here for our forest-dependent bird species. This strongly suggests that the Caatinga and forest enclaves might have had alternating episodes of expansion and retraction throughout the Quaternary climate changes.

Conclusions

This study provides important insights into how isolation and reduction of population sizes due to historical fragmentation of the habitat affected current patterns of genetic variation of forest enclaves from northeastern Brazil. For this, we explored the landscape genetic variation of two endemic forest-dependent bird species by combining the population genomics and spatiotemporal dynamics of the habitat within a comparative framework. We have shown that both current geographic distances and historical connectivity through highlands are important factors explaining the current patterns of genetic variation. Overall, our results suggest that levels of population differentiation and connectivity cannot be explained based purely on contemporary environmental conditions. Combining historical demographic analyses and niche modeling predictions in a historical framework, we provided strong evidence that climate fluctuations of the Quaternary promoted population differentiation and a high degree of temporal synchrony among population size changes in both species.

Our study highlights the importance of an integrative framework in disentangling the drivers of geographic variation for other multiple endemic species of these unique highland forest enclaves, adding new insights regarding the conservation value of this ecosystem for long-term biodiversity persistence, despite being intrinsically unstable. Moreover, recognizing the role of genetic isolation of populations distributed through the enclaves is of utmost importance for prioritizing measures of conservation that can prevent the fragmentation and destruction of this ecosystem in an era of accelerating deforestation and exploitation of natural resources.

Data archiving

Input files for population genetic analyses have been made available on Dryad (https://doi.org/10.5061/dryad.7d7wm37w1).