Introduction

During the Pleistocene and the beginning of the Holocene, many territories experienced climatic and ecological instability produced by the advance and retreat of glaciers, which triggered drastic changes in plant communities (Metcalfe et al. 2000; Metcalfe 2006). This instability has been recognized as a main factor influencing the distribution, population dynamics, and current patterns of genetic diversity of plant species (Ramírez-Barahona and Eguiarte 2013; Pedersen et al. 2015). For the Nearctic, paleoecological records and genetic patterns show the migration of temperate forests toward lower latitudes and elevations in response to the southward advance of the ice sheets (Roberts and Hamann 2015; Napier et al. 2019). In contrast, in the subtropical and tropical regions of Mexico, plants with a higher affinity for cooler climates became established and expanded geographically during glaciations, while tropical species were restricted to refugia where conditions allowed them to persist (Gugger et al. 2011; Ornelas et al. 2013; Ramírez-Barahona and Eguiarte 2013, 2014).

In recent years, efforts have also been made to understand the effect of glacial cycles on the genetic diversity and structure of plant species in Mexico, mainly in temperate and montane cloud forests (e.g., González-Rodríguez et al. 2004; Gugger et al. 2011; Gutiérrez-Rodríguez et al. 2011; Ornelas et al. 2013; Ruiz-Sánchez and Ornelas 2014; Pérez-Crespo et al. 2017; Rodríguez-Correa et al. 2017). The impact of these climatic fluctuations on the population history of plant species in arid zones has been less studied but was apparently dramatic (Nason et al. 2002; Garrick et al. 2009; Ruiz-Sánchez et al. 2012; Vásquez-Cruz and Sosa 2016; Angulo et al. 2017; Cornejo-Romero et al. 2017; Loera et al. 2017; Scheinvar et al. 2017; Ornelas et al. 2018, 2019). Available evidence indicates that, because of climate cycles, the distribution ranges of xerophytic species repeatedly expanded and contracted or fragmented, with concomitant lineage divergence and changes in historical demography. Areas where populations of such species survived during climatically adverse periods have been called ‘xerophilous refugia’. However, contrasting results have been obtained on whether the maximum range contraction occurred during glacial maxima or during interglacial periods. The first pattern has been considered consistent with the glacial refugia hypothesis (GRH; Hewitt 2000, 2004), while the second has been named the interglacial refugia hypothesis (IRH; Hewitt 2004; Ornelas et al. 2018). In general, the IRH has been supported by results of ecological niche modeling and phylogeographic analysis for several species in the Chihuahuan and Sonoran deserts, the Tehuacán-Cuicatlán Valley and the Oaxaca Central Valleys, which have shown marked southward range contractions or shifts during the last interglacial (LIG, ~ 120–140 Ka), followed by northward expansions during the last glacial maximum (LGM, ~ 22 Ka) and the Holocene. Genetic signatures of these processes include historical demographic evidence of population expansions and latitudinal gradients of genetic diversity, with northern populations being genetically depauperate in comparison to southern populations (Ruiz-Sánchez et al. 2012; Angulo et al. 2017; Scheinvar et al. 2017). However, the opposite pattern (i.e., expansion during the LIG and contraction during the LGM), consistent with the GRH, has also been found (Cornejo-Romero et al. 2017). Therefore, extensive research is still needed to better understand the biogeographic history of the arid zones of Mexico, considering that they constitute nearly 50% of the country's surface (Challenger 1998) and have a floristic diversity of around 6,000 plant species (Rzedowski et al. 1993; Cervantes 2005) with a large number of endemic species, representing a set of natural resources that offer multiple options for obtaining timber, food and other goods (Cervantes 2005).

Mesquites (genus Prosopis L.) are among the most important tree groups in arid and semiarid areas across Mexico. The genus probably originated in tropical Africa (Burkart 1976), given the existence of Prosopis africana (Guill. & Perr.) Taub. 1893, the most basal species in the phylogeny of Prosopis (Polhill and Raven 1981). For the Americas, the presence of 43 species of Prosopis has been documented, with South America, and particularly Argentina, representing the main diversity center with 34 and 29 species, respectively. In North America, 10 species are known, most of them part of the “Mexico-Texas” complex (Rzedowski 1988; Palacios 2006). In Mexico, Prosopis laevigata (Humb. & Bonpl. ex Willd.) M.C.Johnst. is the most widespread species, and the “typical” mesquite of the central region of the country (Burkart 1976; Rzedowski 1988). This species represents an excellent study system to analyze the population history of arid-adapted plant species in Mexico, given its wide distribution and high abundance. The geographic distribution of P. laevigata includes several physiographic regions in Mexico characterized by arid or semiarid climate, such as the Tamaulipas Plains, the Mexican Altiplano, parts of the Trans-Mexican Volcanic Belt, the Balsas Depression, the Tehuacán-Cuicatlán Valley and the Oaxaca Central Valleys (Rzedowski 1988; Palacios 2006). These physiographic regions are characterized by geological elements that could have acted as barriers or corridors for P. laevigata at different historical periods. At present, the distribution of the species is partially interrupted by areas with temperate vegetation characteristic of the highlands of the Trans-Mexican Volcanic Belt, and by the Sierra Madre Oriental, a mountain chain that separates the populations of the Mexican Altiplano from those of the Tamaulipas Plains. However, the populations show no morphological peculiarities associated with these distribution areas and are therefore considered the same species (Rzedowski 1988).

In this study, our main goal was to analyze the genetic diversity and structure of P. laevigata throughout its distribution in Mexico to gain insight into the evolutionary history of the species and contribute to the understanding of the historical biogeography of the Mexican arid zones. In particular, we aimed at i) determining if population differentiation in P. laevigata is associated with its distribution in different physiographic regions, and ii) testing if the historical demography and geographic distribution of the species during the LIG, LGM and middle-Holocene (MH) periods conform to the GRH or the IRH.

Materials and methods

Study system

Prosopis laevigata (Fabaceae, Mimosoideae) is a tree or shrub up to 12 m in height, with a trunk 0.3 to 1 m in diameter and thick, blackish-brown bark (Calderón and Rzedowski 2001). Prosopis species have been characterized as self-compatible (Simpson 1977; Masuelli and Balboa 1989; Genise et al. 1990) with percentages of self-fertilization between 65 and 85% (Galindo-Almanza 1992). The main pollination syndrome of the genus is entomophily, with hymenopterans such as Ashmeadiella, Chalicodoma (Megachilidae), Colletes (Colletidae) and Apis mellifera (Apidae) being the most frequent pollinators (Galindo-Almanza 1992). Mammalochory and hydrochory have been reported as the main seed dispersal syndromes in Prosopis species (Campos and Ojeda 1997; de Noir et al. 2002; Pasiecznik et al. 2002).

Sampling methods

We collected fresh leaves of 220 individuals from 21 populations of P. laevigata covering the whole distribution of the species (Table 1), including the Oaxaca Central Valleys (OCV), the Tehuacán-Cuicatlán Valley (TCV), the Balsas Depression (BD), the Trans-Mexican Volcanic Belt (TMBV), the Mexican Altiplano (MA) and the Tamaulipas Plains (TP). In each population, seven to 12 adult trees were sampled, with a distance of at least 50 m between individuals. The fresh leaves were kept on ice until storage at -70 °C. Additionally, voucher specimens were collected from each population and processed for deposit at the IEB Herbarium of the Instituto de Ecología A. C. (Pátzcuaro, Michoacán, México).

Table 1 Geographic data of the populations collected of Prosopis laevigata.

Laboratory procedures

Genomic DNA was extracted using a modified CTAB protocol (Otero-Arnaiz et al. 2005). We amplified six polymorphic microsatellite loci (Mo05, Mo07, Mo08, Mo09, Mo13 and Mo16) originally designed for Prosopis chilensis (Molina) Stuntz and Prosopis flexuosa DC. (Mottura et al. 2005). The PCR reactions were carried out using Platinum Master Mix (Thermo Fisher Scientific) in a final volume of 10 µL, including 5 µL of Platinum Master Mix, 0.5 µL of each primer (10 mM), 2.8 µL of distilled water, 0.2 µL of MgCl2 (2 mM) and 1 µL of DNA (20 ng/µL). The amplification program was as follows: an initial step of five min at 94 °C, followed by 35 cycles of denaturation at 94 °C for 90 s, annealing at 49 °C (Mo09) or 59 °C (Mo05, Mo07, Mo08, Mo13 and Mo16) for 90 s and extension at 72 °C for 90 s, and a final step at 72 °C for 10 min. The forward primer of each pair was fluorescently labeled. The PCR products were mixed with formamide and GeneScan LIZ-600 as size standard (Applied Biosystems). The samples were denatured at 95 °C for two minutes and then analyzed using an ABI-PRISM 3100-Avant (Applied Biosystems) sequencer. The fragments obtained were sized and scored using the PeakScanner software (Applied Biosystems).

For the phylogeographic analysis, we initially tested seven universal chloroplast DNA (cpDNA) microsatellite loci (Weising and Gardner 1999) and the cpDNA intergenic region trnS-trnT, without finding any variation in these markers. Finally, we tested the psbA3′-trnH region (Shaw et al. 2007), which showed polymorphism. The amplification of this region was carried out in reactions with a final volume of 25 µL, including 12.5 µL of Master Mix (Promega, Woods Hollow Road Madison, WI, USA), 0.8 µL each of forward and reverse primer (10 µM), 6 µL of distilled water, 1.5 µL of Bovine Serum Albumin (BSA), 1 µL of MgCl2 (2 mM) and 2.4 µL of DNA (20 ng/µL). The amplification protocol was as follows: an initial step of five min at 94 °C, followed by 35 cycles of denaturation at 94 °C for 90 s, annealing at 72 °C for 90 s and extension at 72 °C for 90 s, and a final step at 72 °C for 10 min. The PCR products were purified using the Qiaquick kit (QIAGEN, Valencia, CA, USA), following the manufacturer’s instructions. Between three and nine individuals per population were analyzed. We failed to obtain sequences from populations 11 and 14 due to lack of amplification. The purified products were sent to MACROGEN USA for sequencing. The sequences reported in the present study are available from the GenBank database (accession numbers MK618446-MK618463).

Population genetics analyses

In order to assess the presence of null alleles, we used the software MICROCHECKER (Van Oosterhout et al. 2004). Given the high number of alleles found for the microsatellite markers as well as the relatively small sample size per population, we evaluated the statistical power of our data for detecting genetic differentiation with POWSIM (Ryman and Palm 2006). Different values of effective population size were assayed (500, 2000 and 5000) to evaluate statistical power at FST values of 0.001, 0.0025, 0.005, 0.01 and 0.02. Statistical power was expressed as the proportion of significant results for 1,000 replicates. We established a limit of 0.80 as the minimum acceptable power level according to Cohen (1988).

The FreeNA software (Chapuis and Estoup 2007) was used to estimate the null allele frequency for every locus and population based on the expectation–maximization (EM) algorithm. Parameters of genetic diversity, such as observed and expected heterozygosity (HO, HE), average number of alleles per locus (A) and effective number of alleles per locus (NEA), for each population were estimated using the software Arlequin 3.5.1.2 (Excoffier and Lischer 2010). Also, the rarefied allelic richness (AR) and heterozygosity corrected by sample size (HEc, Nei 1978) were evaluated with 20,000 permutations with the software SPAGeDi (Hardy and Vekemans 2002). To identify geographic patterns in genetic diversity levels in P. laevigata, genetic diversity parameters were regressed against the latitude, longitude and elevation of the populations with the R software (R Core Team, 2016).
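
The diversity parameters named above follow standard textbook definitions, which can be sketched for a single locus in a single population as below. This is a minimal illustration and not the Arlequin or SPAGeDi implementation; the function name, the input format (a list of diploid genotypes) and the allele sizes in the usage note are assumptions made for the example.

```python
from collections import Counter


def diversity_stats(genotypes):
    """Single-locus diversity statistics for one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual (hypothetical input format, with no missing data).
    """
    n = len(genotypes)                          # number of individuals
    alleles = [a for pair in genotypes for a in pair]
    counts = Counter(alleles)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)    # expected homozygosity

    h_obs = sum(1 for a, b in genotypes if a != b) / n
    h_exp = 1.0 - homozygosity
    # Sample-size correction of Nei (1978): 2n / (2n - 1) * (1 - sum p^2)
    h_exp_corrected = (2 * n / (2 * n - 1)) * h_exp
    n_effective = 1.0 / homozygosity            # effective number of alleles

    return {
        "A": len(counts),
        "NEA": n_effective,
        "HO": h_obs,
        "HE": h_exp,
        "HEc": h_exp_corrected,
    }
```

For example, `diversity_stats([(100, 100), (100, 104), (104, 104), (100, 104)])` describes four individuals carrying two equally frequent alleles. Population-level values would then be averaged over loci before being regressed against latitude, longitude or elevation.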

Inferences about the inbreeding level in the populations were based on the Bayesian IIM (Individual Inbreeding Model) approach implemented in the software INest 2.0 (Chybicki and Burczyk 2009). This software allows the calculation of an unbiased inbreeding coefficient (FIS) for multilocus data in the presence of null alleles. The analysis evaluates the effects of null alleles (n), inbreeding (f), and genotyping errors (b) on homozygosity by fitting the full model (nfb) and comparing it with the alternative models (nf, nb, fb) using the deviance information criterion (DIC).

To evaluate the partitioning of genetic diversity within and between populations, we performed an analysis of molecular variance (AMOVA) with the software GenAlEx 6.5 (Peakall and Smouse 2012). The populations were assigned to the six physiographic regions (TP, MA, TMBV, BD, TCV, OCV) to test if the geological limits among them have constituted barriers to gene dispersal in P. laevigata. We used both FST and RST estimators based, respectively, on the infinite alleles and stepwise mutation models. Also, we carried out a Mantel test (Mantel 1967), to evaluate the existence of isolation by distance (IBD) among the populations of P. laevigata using the genetic distances (pairwise FST values) and the log geographic distances in GenAlEx 6.5.
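
The logic of the Mantel test described above can be sketched as a permutation test on two distance matrices. This is a generic NumPy illustration, not the GenAlEx implementation; the function name, the one-sided alternative and the default number of permutations are assumptions made for the example.

```python
import numpy as np


def mantel_test(gen_dist, geo_dist, n_perm=9999, seed=0):
    """One-sided Mantel test (hypothetical helper, not GenAlEx).

    gen_dist, geo_dist: square symmetric distance matrices with the
    populations in the same order (for example pairwise FST and log
    geographic distance). Returns (r, p_value).
    """
    gen = np.asarray(gen_dist, dtype=float)
    geo = np.asarray(geo_dist, dtype=float)
    upper = np.triu_indices_from(gen, k=1)      # each pair counted once
    r_obs = np.corrcoef(gen[upper], geo[upper])[0, 1]

    rng = np.random.default_rng(seed)
    n = gen.shape[0]
    n_as_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together to keep the matrix structure.
        gen_perm = gen[np.ix_(perm, perm)]
        r_perm = np.corrcoef(gen_perm[upper], geo[upper])[0, 1]
        if r_perm >= r_obs:
            n_as_extreme += 1

    p_value = (n_as_extreme + 1) / (n_perm + 1)
    return r_obs, p_value
```

Permuting population labels rather than individual matrix cells is what distinguishes the Mantel test from an ordinary correlation test, because the pairwise distances are not independent observations.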

The genetic structure of the populations was inferred with the program STRUCTURE 2.3.4 (Pritchard et al. 2000; Hubisz et al. 2009) which, through a model-based Bayesian algorithm, identifies groups of individuals by minimizing Hardy–Weinberg and linkage disequilibria. In this analysis, each individual is probabilistically assigned to a genetic cluster, and the number of genetic clusters (K) with the maximum posterior probability [LnP(D)] is identified (Pritchard et al. 2000). Based on preliminary runs with higher values of K, we tested K values from 1 to 5, with 20 independent runs for each K. Every run used 50,000 iterations as burn-in and 100,000 repetitions of the Markov chain after burn-in. The options of correlated allele frequencies and possible admixture among individuals in the populations were used. The most probable K value was determined using the maximum value of ∆K according to Evanno et al. (2005) through the online software Structure Harvester (Earl and vonHoldt 2012). As recommended by Janes et al. (2017), we conducted further analyses to test for substructure within the main genetic groups identified, using similar settings and procedures. Results of multiple runs for each K value were summarized using CLUMPP 1.1.2 (Jakobsson and Rosenberg 2007). Finally, we plotted the CLUMPP results with DISTRUCT 1.1 (Rosenberg 2004).
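
The ∆K criterion of Evanno et al. (2005) can be sketched as follows. This is an illustrative reimplementation and not the Structure Harvester code: it takes the absolute second-order difference of the mean LnP(D) across replicate runs and divides it by the standard deviation of LnP(D) at that K. The function name, the use of the sample standard deviation and the input format are assumptions.

```python
import numpy as np


def evanno_delta_k(lnp_by_k):
    """Delta K per K from replicate STRUCTURE runs (hypothetical helper).

    lnp_by_k: dict mapping K -> list of LnP(D) values, one per run.
    Delta K is undefined for the smallest and largest K tested, because
    the second-order difference needs both neighbours.
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    # Sample standard deviation across runs (assumption: ddof=1).
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}

    delta_k = {}
    for k in ks:
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta_k[k] = second_diff / sd[k]
    return delta_k
```

With K tested from 1 to 5, as in the text, this yields ∆K values only for K = 2 to 4, and the K with the largest ∆K is retained.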

CpDNA sequence analysis

The electropherograms obtained were analyzed and edited using the software BIOEDIT (Hall 1999). The edited sequences were aligned manually using MEGA 4 (Tamura et al. 2007). The analyses of genetic variation were carried out considering nucleotide substitutions only, while the haplotype-level analysis considered both nucleotide substitutions and insertions/deletions (indels). We calculated nucleotide diversity (π) and haplotype diversity (hS) using the software DnaSP 5.0 (Librado and Rozas 2009) and the rarefied haplotype richness with the software SPAGeDi (Hardy and Vekemans 2002). We built a statistical parsimony network with the TCS software (Clement et al. 2000), through the parsimony algorithm of Templeton et al. (1992), using a connection limit of 95% and treating gaps as a fifth state. The network was edited using TCS Beautifier (Múrias Dos Santos et al. 2016).

To evaluate the partitioning of haplotype diversity within and among populations and regions, we carried out an AMOVA in Arlequin 3.5 (Excoffier and Lischer 2010) with 10,000 permutations to determine the significance of the test. Populations were grouped according to the six physiographic regions, as previously described for the microsatellite analyses. The presence of phylogeographic structure in the populations was assessed through the comparison of population differentiation with unordered (GST) and ordered alleles (NST). If NST is significantly higher than GST it means that genealogically close haplotypes tend to occur together in the same populations (i.e., there is phylogeographic structure) (Pons and Petit 1996). This test was performed in SPAGeDi (Hardy and Vekemans 2002) with 20,000 permutations.

The demographic history of the populations was investigated by calculating Tajima’s D and Fu’s Fs (Tajima 1989; Fu 1997) with the software DnaSP 5.0 (Librado and Rozas 2009) with 10,000 permutations. Additionally, we performed a mismatch distribution analysis, which assesses the relative frequency of the number of nucleotide differences among all pairs of haplotypes and compares it to the unimodal distribution expected under a recent demographic expansion (Rogers and Harpending 1998; Excoffier et al. 2009). This analysis was carried out in Arlequin 3.5.1.3 (Excoffier and Lischer 2010). The null hypothesis of no difference between the observed distribution and the distribution expected under the population expansion model was tested using the sum of squared deviations (SSD).
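
As an illustration of the first neutrality statistic mentioned above, Tajima's D compares the mean number of pairwise differences with the number of segregating sites scaled by a sample-size constant. The sketch below uses the standard formulas of Tajima (1989). It is not the DnaSP implementation, and it assumes aligned sequences of equal length, no gaps or missing data, and that each variable site counts as one segregating site. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])

    # S: number of segregating (variable) sites.
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)
    if s == 0:
        return float("nan")                 # undefined without variation

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

A sample dominated by one common haplotype plus several singletons, such as `["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]`, gives a negative D, which is the direction expected after a demographic expansion. Significance would still have to be assessed by simulation, as done by the software cited in the text.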

As an additional historical demography test, we ran a Bayesian Skyline Plot (BSP; Drummond et al. 2005) by means of the software BEAST 1.8.1 implemented online (http://www.phylo.org), to evaluate the effective population size variation over time. We performed two independent runs of 100 million generations using the substitution model GTR + I based on empirical frequencies, a log-normal relaxed clock model and a Bayesian Skyline tree with constant size using 10 initial groups. The parameters and trees were sampled every 1000 iterations with a burn-in of 10%. The time axis was adjusted using rates of 1.0 × 10⁻⁹ and 3.0 × 10⁻⁹ substitutions per site per year (s/s/y), encompassing the range reported for chloroplast DNA in many angiosperms (Wolfe et al. 1987; Gutiérrez-Rodríguez et al. 2011; Ruiz-Sánchez and Ornelas 2014). The results of every run were analyzed with the software TRACER 1.5 (Drummond and Rambaut 2007) in order to ensure effective sample sizes (ESS) > 200.

Ecological niche modeling (ENM)

The ecological niche of P. laevigata was modeled under present climate conditions using the maximum entropy algorithm implemented in MAXENT 3.3.3e (Phillips et al. 2006). We used 129 presence records of P. laevigata obtained from the MEXU herbarium database, the CONABIO database (available at http://www.conabio.gob.mx/informacion/gis/; accessed September 10, 2020) and the GBIF database (available at https://doi.org/10.15468/00000000; accessed September 10, 2020), after excluding non-georeferenced data, duplicated records, and non-credible data. For each presence record, associated data for 19 bioclimatic variables were taken from WorldClim 2.1 (Fick and Hijmans 2017) for current climate conditions (1970–2000) with 30 arcsec resolution (available at http://worldclim.org/version2). A variance inflation factor (VIF, Brauner and Shacham 1998) analysis was applied to select the least redundant bioclimatic layers (variables with VIF > 0.8 were included). The variables finally used in the niche model were mean diurnal range, isothermality, temperature annual range, mean temperature of coldest quarter, precipitation seasonality, precipitation of wettest quarter, precipitation of driest quarter and precipitation of warmest quarter.
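
For readers unfamiliar with the statistic, the variance inflation factor of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The sketch below computes it with NumPy least squares. It is a generic illustration and not the procedure or software used in the study: the function name and the input layout (rows are presence records, columns are bioclimatic variables) are assumptions, and no selection threshold is applied.

```python
import numpy as np


def variance_inflation_factors(x):
    """VIF of each column of x (rows: records, columns: variables)."""
    x = np.asarray(x, dtype=float)
    n_rows, n_cols = x.shape
    vifs = []
    for j in range(n_cols):
        y = x[:, j]
        others = np.delete(x, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # 1 / (1 - R^2) simplifies to SS_total / SS_residual.
        vifs.append(ss_tot / ss_res if ss_res > 0 else float("inf"))
    return np.array(vifs)
```

Variables that are nearly linear combinations of the others receive very large values, while a variable unrelated to the rest receives a value close to 1.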

The model was implemented using the cross-validation resampling method with 50 replicates. As a threshold-independent method of model validation, we used the receiver operating characteristic (ROC) curve analysis. An area under the ROC curve (AUC) close to 1 indicates a good model, while a value of 0.5 indicates a model that is no better than random (Phillips et al. 2006). As an additional, threshold-dependent method of model validation, we used the true skill statistic (TSS; Allouche et al. 2006). The ecological niche model (ENM) of the present climate conditions was then projected to past climate scenarios: one for the Last Interglacial (LIG, ~ 120 to 140 Ka) (Otto-Bliesner et al. 2006) and two for the Last Glacial Maximum (LGM, ~ 22 Ka) provided by the Paleoclimate Modeling Intercomparison Project Phase II (Braconnot et al. 2007), namely the Community Climate System Model (CCSM4; Collins et al. 2006) and the Model for Interdisciplinary Research on Climate (MIROC-ESM; Hasumi 2007). The LIG and LGM climate models have a resolution of 2.5 arcmin. Additionally, we used the MIROC-ESM and CCSM4 climate scenarios for the mid-Holocene (~ 6 Ka) with a 30 arcsec resolution.
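
The two validation statistics named above can be sketched in a few lines. The AUC is computed here through its rank interpretation (the probability that a presence receives a higher score than a non-presence), and the TSS as sensitivity + specificity − 1 at a chosen threshold. This is a generic illustration and not the MAXENT output; the function names and the assumption of binary labels with both classes present are choices made for the example.

```python
def auc_rank(observed, scores):
    """AUC as the fraction of (presence, non-presence) pairs ranked
    correctly, counting ties as one half."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def true_skill_statistic(observed, scores, threshold):
    """TSS = sensitivity + specificity - 1 at a given threshold."""
    tp = fn = tn = fp = 0
    for o, s in zip(observed, scores):
        predicted = 1 if s >= threshold else 0
        if o == 1 and predicted == 1:
            tp += 1
        elif o == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0
```

The TSS ranges from −1 to 1, with values near 0 indicating performance no better than random, which complements the AUC because it depends on the threshold actually used to map suitable habitat.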

Results

Genetic structure

According to MICROCHECKER, all microsatellite loci showed null alleles, varying in frequency in the populations from 0.09 (Mo07) to 0.29 (Mo13). The results of the statistical power analysis showed that for the different effective population sizes, the six microsatellite markers have sufficient power to detect genetic differentiation of FST = 0.01 at least, which corresponds to low levels of genetic differentiation and high amounts of connectivity (Online Resource 1).

The results of the analysis with FreeNA showed a proportion of null alleles from 0 to 0.38 depending on the locus and the population; however, null alleles did not affect the estimation of genetic differentiation (FST = 0.152 and 0.155 with and without the ENA correction, respectively). In general, genetic diversity levels were moderately high. We found a total of 96 alleles, summed across the six microsatellite loci analyzed. The total number of alleles per locus ranged from nine (Mo05) to 27 (Mo16), with an average number of alleles per locus per population of A = 4.34. At all loci, we found a few alleles in high frequency and many rare alleles in low frequency (alleles with a frequency < 0.10). The rarefied allelic richness showed an average value of AR = 1.68 (Table 2).

Table 2 Parameters of genetic diversity at six microsatellite loci for 21 populations of Prosopis laevigata in Mexico

The average expected heterozygosity (HE) was 0.527, ranging from HE = 0.195 (population 21) to HE = 0.680 (population 4). The average observed heterozygosity (HO) was 0.393, varying between HO = 0.148 (population 21) and HO = 0.558 (population 17). The percentage of polymorphic loci was 100% in most populations, except for populations 1, 2, 19 (P = 83.33) and 21 (P = 33.33) (Table 2). The regression analysis between genetic diversity estimates and latitude indicated positive and significant relations for NEA (R² = 0.45, P = 0.008), A (R² = 0.47, P = 0.006), AR (R² = 0.20, P = 0.01), HO (R² = 0.38, P = 0.003), and HEc (R² = 0.29, P = 0.01), showing substantial evidence of increased genetic diversity in populations northwards. In addition, we found positive correlations of AR (R² = 0.19, P = 0.04) and HO (R² = 0.19, P = 0.04) with longitude, showing an increased value of these parameters in eastern populations (Fig. 1).

Fig. 1

Regression analysis between genetic diversity parameters of Prosopis laevigata populations and geographic variables: a mean effective number of alleles (NEA) per locus versus latitude; b mean number of alleles per locus (A) versus latitude; c rarefied allelic richness (AR) versus latitude; d observed heterozygosity (HO) versus latitude; e expected heterozygosity corrected by sample size (HEc) versus latitude; f rarefied allelic richness (AR) versus longitude; g expected heterozygosity corrected by sample size (HEc) versus longitude

The software INest revealed a significant contribution of null alleles to inbreeding values according to the DIC value for the nfb (8214.56) and fb (8286.39) models. The revised estimate of inbreeding (F = 0.177) was considerably lower than the estimate without correcting for null alleles (F = 0.323).

The AMOVA using the FST estimator revealed a moderate overall genetic structure (FST = 0.16; P < 0.001). We found slight differences between the six physiographic regions indicating that only 3% (FCT = 0.03; P < 0.001) of the genetic diversity is found among regions and that 13% (FSC = 0.13; P < 0.001) is found among populations within regions, while 84% is within populations. The value of the RST estimator was higher than the FST value (0.20; P < 0.001), with 0% (RCT = -0.01; P = 0.983) of the genetic diversity being due to differences among regions, and 20% (RSC = 0.2; P < 0.001) due to differences among populations within regions (Table 3). The Mantel test between log geographic distances and the pairwise FST values showed a non-significant relationship (r = 0.064; P = 0.257), indicating no isolation by distance among populations.
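
The three fixation indices reported for the hierarchical AMOVA are linked through the variance components, which offers a quick consistency check on the percentages above. The sketch below is a worked illustration and not the GenAlEx computation. The function name is hypothetical, and the inputs in the usage note are the rounded percentages from the text (3, 13 and 84) used in place of the actual variance components.

```python
def hierarchical_f_statistics(var_among_regions, var_among_pops, var_within_pops):
    """Fixation indices from the three AMOVA variance components.

    FCT = among regions / total
    FSC = among populations within regions / (that + within populations)
    FST = (among regions + among populations) / total
    These satisfy (1 - FST) = (1 - FSC) * (1 - FCT).
    """
    a, b, c = var_among_regions, var_among_pops, var_within_pops
    total = a + b + c
    f_ct = a / total
    f_sc = b / (b + c)
    f_st = (a + b) / total
    return f_ct, f_sc, f_st
```

Calling `hierarchical_f_statistics(3.0, 13.0, 84.0)` reproduces FCT = 0.03 and FST = 0.16 and gives FSC = 13/97, which rounds to the reported 0.13.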

Table 3 Analysis of molecular variance (AMOVA) using the infinite alleles (IAM) and the stepwise mutation models (SMM) for six microsatellite loci in 21 Prosopis laevigata populations from Mexico. The proportions of genetic variation among regions and within and between populations are shown

The ΔK statistic showed K = 2 as the most probable number of genetic groups in the data set (Fig. 2a, b; Online Resource 2a, b). Populations in the Balsas Depression (i.e., 1 and 2) were almost completely assigned to the light green cluster (proportion of ancestry = 0.97), whereas populations in the western Trans-Mexican Volcanic Belt (9, 10, 11, 12 and 13) had high proportions (> 0.7) of the dark green cluster. The other populations had mixed proportions of both clusters (Fig. 2a, b).

Fig. 2

Genetic structure in the populations of Prosopis laevigata, inferred with the STRUCTURE algorithm (Pritchard et al. 2000; Hubisz et al. 2009) from microsatellite data. a bar plot showing the assignment of all sampled individuals into two main genetic groups, b map showing the geographic distribution of these two main genetic groups, c bar plot showing the genetic assignment corresponding to the substructure within the dark green main genetic group, d map showing the geographic distribution of these three genetic subgroups, e bar plot showing the genetic assignment corresponding to the substructure within the light green genetic group, f map showing the geographic distribution of these three genetic subgroups. The numbering of populations follows Table 1. The gray shading indicates elevation with darker tones corresponding to higher elevations. SMO Sierra Madre Oriental; TP Tamaulipas plains; MA Mexican Altiplano; TMBV Trans-Mexican Volcanic Belt; BD Balsas Depression; TCV Tehuacán-Cuicatlán Valley; OCV Oaxaca Central Valleys

When testing for substructure within the two main genetic clusters, we found that K = 3 was the most probable number of genetic groups within the dark green group (Fig. 2c, d; Online Resource 2c, d). In this case, populations from the TP (18, 19, 20, 21), the east of the MA (17) and the TCV (4, 5) showed a high frequency of the light blue genetic group and the rest of the populations showed more or less equal proportions of the other genetic groups, except for populations 6 and 11 in which the dark blue genetic group predominated and population 13, which showed a high frequency of the purple genetic group (Fig. 2c, d). The analysis also suggested K = 3 as the optimal number of groups within the light green main genetic cluster (Fig. 2e, f; Online Resource 2e, f). Population 2 from the BD had an almost complete assignment (94%) to the brown genetic group, while the rest of the populations had in general less than 10% assignment to this group. Meanwhile, the yellow genetic group was predominant in the other population from the BD (1) and in population 6, while the red group was frequent in MA (14, 16) and TP populations (21) (Fig. 2f).

Chloroplast DNA and haplotype network

We sequenced 427 bp of the intergenic spacer region psbA3′-trnH in 109 individuals belonging to 19 populations of P. laevigata covering the whole distribution. A total of 18 haplotypes were found, which formed a star-shaped network with the most common haplotype (77 individuals) located at its center (Table 4, Fig. 3). This haplotype was found in all the populations sampled. Most of the other haplotypes were separated by one mutational step from the central haplotype, although some were separated from it by two to six mutational steps (Fig. 3).

Table 4 Genetic diversity estimates for the psbA3′-trnH region of the cpDNA in populations of Prosopis laevigata in Mexico
Fig. 3

a Geographic distribution of the haplotypes observed in the psbA3′-trnH region of the cpDNA in Prosopis laevigata. The numbering of populations follows Table 1. Note that no data are presented for populations 11 and 14 due to amplification problems. The gray shading indicates elevation with darker tones corresponding to higher elevations. b Haplotype network obtained with statistical parsimony in TCS (Clement et al. 2000). Each circle indicates a haplotype, with circle size proportional to the frequency of the haplotype. White dots indicate mutational steps

Most of the derived haplotypes were exclusive to a single population (private haplotypes), except for the haplotypes H2 (found in populations 1, 3, 10, 12, 15 and 17), H3 (found in populations 18, 20 and 21) and H4 (found in populations 18 and 21). Eight populations were monomorphic for the most common haplotype. On the other hand, the populations with the highest number of haplotypes were 1, 12, 20 and 21, located at the southwest, center and northeast of the distribution of the species (Table 4, Fig. 3).

The haplotype diversity (hS) ranged between 0 and 1 in the populations and was on average moderate (0.4). The nucleotide diversity (π) was in general low, ranging between 0 and 0.00392 with an average of 0.0013 (Table 4). The results of the AMOVA indicated an FST = 0.09 (P < 0.001), meaning that 9% of the genetic diversity is found among populations, while 91% of the genetic diversity is found within populations (Table 5). Differences among regions were low and non-significant (FCT = 0.043; P = 0.06) signaling a high genetic similarity of P. laevigata populations along its distribution. The GST value (0.090) also indicated a low level of genetic structure across the distribution of P. laevigata. The differentiation value considering distances among haplotypes (ordered alleles) was NST = 0.101 and was not significantly different from GST (P = 0.497), denoting a lack of phylogeographic structure in this species.

Table 5 Analysis of molecular variance for the psbA3′-trnH region of the cpDNA in Prosopis laevigata populations in Mexico. The proportions of genetic variation among regions and within and between populations are shown

The values of Tajima’s D and Fu’s Fs were not significant for individual populations, but both overall values were negative and significant (D = -2.018, P < 0.001; Fs = -10.868, P < 0.001), suggesting a recent demographic expansion in P. laevigata. The mismatch analysis showed a unimodal distribution, with a main peak at a single difference between pairs of haplotypes (Online Resource 3). The sum of squared deviations (SSD) did not indicate significant differences from the population expansion model (SSD = 0.058; P = 0.312), therefore supporting a recent demographic expansion. Likewise, the Bayesian Skyline Plot analysis showed a slight increase in the effective population size of the species from 70 Ka to the beginning of the LGM, followed by a more marked expansion from around 20 Ka to the beginning of the Holocene (10 Ka) (Fig. 4).

Fig. 4 Bayesian skyline plot of Prosopis laevigata effective population size through time in Mexico, based on psbA3′-trnH cpDNA sequences from 109 individuals. The thick solid blue line is the median estimate, and the area between the light blue lines shows the 95% highest posterior density (HPD) interval (Drummond et al. 2005)

Ecological niche model

The potential distribution map based on the ENM of P. laevigata for the present-day climate conditions corresponded well to the known distribution of the species (AUC = 0.954 and TSS = 0.825) (Fig. 5a). When the model was projected onto the LIG, we found a substantial reduction of the suitable area for P. laevigata, which appeared mainly restricted to the western portion of the Balsas Depression (BD), with scattered small predicted areas in the central BD, the westernmost part of the TMBV and the OCV (Fig. 5b). For the LGM, somewhat different results were obtained with the MIROC-ESM and CCSM4 models (Fig. 5c, d). In the warmer and drier MIROC-ESM model, a larger area of suitable habitat was predicted in comparison to the LIG, including most of the BD, a large portion of the Sierra Madre del Sur (SMS), the OCV and even the Central Depression of Chiapas; and to the north, some parts of the TMBV, the SMO and the MA (Fig. 5c). In contrast, the CCSM4 model suggested fragmented areas of suitable habitat in the westernmost and the central part of the BD, the OCV, the northwestern Pacific coastal plain and the MA (Fig. 5d). For the MH, differences between the MIROC-ESM and CCSM4 models were less marked (Fig. 5e, f), both showing suitable habitat for P. laevigata in the western and eastern parts of the BD and in the OCV, the TMBV, the MA (in the southeastern and central portions) and the western ridges of the SMO. The whole set of models suggests a refugial area for P. laevigata in the west of the BD during the LIG, an expansion to the east and the south in the LGM, and a northwards expansion in the MH and the present.
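For readers less familiar with the two model evaluation metrics cited above, the following sketch shows how the true skill statistic (TSS) and the area under the ROC curve (AUC) are conventionally defined. It is only an illustration under stated assumptions: the function names, the confusion-matrix counts and the suitability scores are hypothetical, and the code does not reproduce the evaluation procedure of the modeling software used here.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # proportion of presences predicted as suitable
    specificity = tn / (tn + fp)  # proportion of absences predicted as unsuitable
    return sensitivity + specificity - 1.0


def auc_from_scores(presence_scores, absence_scores):
    """AUC as the probability that a presence outranks an absence (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    # Hypothetical evaluation data, for illustration only.
    print(true_skill_statistic(tp=90, fn=10, tn=80, fp=20))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

TSS ranges from -1 to 1 and AUC from 0 to 1, with values near 1 indicating good discrimination between occupied and unoccupied sites, so the values reported for the present-day model fall toward the upper end of both scales.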

Fig. 5 Ecological niche models for Prosopis laevigata in Mexico: a contemporary climate conditions, b Last Interglacial (LIG) climatic conditions, c Last Glacial Maximum (LGM) under the MIROC-ESM model, d LGM under the CCSM4 model, e Mid-Holocene under the MIROC-ESM model, f Mid-Holocene under the CCSM4 model. The red area represents the potential distribution of the species in each period. SMO Sierra Madre Oriental; TP Tamaulipas Plains; MA Mexican Altiplano; TMBV Trans-Mexican Volcanic Belt; BD Balsas Depression; SMS Sierra Madre del Sur; TCV Tehuacán-Cuicatlán Valley; OCV Oaxaca Central Valleys

Discussion

In this study, we used nuclear and chloroplast DNA markers to assess genetic diversity and structure of Prosopis laevigata throughout its distribution in Mexico. Despite the ecological and economic importance of Prosopis species in the arid zones of North America, this is the first study focusing on the evolutionary history of a species of the genus in this geographic region. Overall, we found moderate levels of genetic diversity in P. laevigata (HE = 0.527), which are similar to or lower than the values previously reported for a few other Prosopis species from South America, also calculated from nuclear microsatellites (Prosopis alba Griseb., P. chilensis and P. flexuosa), with HE values of 0.67, 0.55 and 0.71, respectively (Bessega et al. 2016, 2019; Moncada et al. 2019). Genetic differentiation values are more difficult to compare given the different geographic scales and number of populations analyzed and, unfortunately, analyses of cpDNA diversity based on sequences have not been published yet for any other mesquite species, making comparisons impossible. Nevertheless, some of the shared biological characteristics of Prosopis species (pollination by bees, self-compatibility, animal-dispersed seeds) may explain to some extent the similarities in their patterns of genetic diversity (Duminil et al. 2007).

Interestingly, topographic elements associated with the limits of the physiographic regions, and particularly areas such as the highlands of the TMBV and the SMO that represent interruptions to the continuity of the Mexican arid zones and, therefore, were hypothesized to constitute barriers to gene flow among P. laevigata populations, were not associated with any significant genetic breaks in the species. Previously, it has been demonstrated that the TMBV impaired historical gene flow between northern and southern populations of trees such as Liquidambar styraciflua L. (Ruiz-Sanchez and Ornelas 2014) and even promoted species diversification in plants and animals (McCormack et al. 2011; Gándara and Sosa 2014). However, the lower elevation parts of the TMBV are probably permeable for some species of drier habitats, providing connectivity between populations to the south and the north.

In contrast, results of both the genetic and ENM analysis strongly suggest that the recent evolutionary history of P. laevigata was characterized by a significant population expansion event. Current genetic diversity and structure of the species seem to have been determined mostly by a range contraction during the LIG, when suitable habitat for the species possibly reached its minimum extent, and later expansion both northwards and southwards. Such recent demographic expansion in P. laevigata is supported by the cpDNA haplotype network, which showed a clear star-like shape with a high-frequency central haplotype that is present in all the populations and several closely related haplotypes at low frequency restricted to single populations. These patterns are typically observed in populations that have recently experienced rapid demographic growth (Hewitt 2004; Xun et al. 2016; Stefenon et al. 2019). All the historical demography analyses, including Tajima’s D, Fu’s Fs, mismatch distribution and skyline plot, also supported this inference. In particular, the skyline plot suggested that the effective population size of P. laevigata was steadily increasing before the LGM, showed a more marked increase between the LGM and the early Holocene, and then stabilized at about 5 Ka. In addition, nuclear microsatellites showed a high-frequency allele and several rare alleles at most of the loci analyzed, a pattern that may also be explained by a population expansion (Keinan et al. 2007; Hallatschek and Nelson 2008). In terms of genetic structure, moderate levels of differentiation were found for both cpDNA and nuclear markers, as well as a lack of phylogeographic structure in the cpDNA marker. This is also typical of expanding populations because during the range expansion process the ancestral alleles or haplotypes are spread into the newly colonized areas, resulting in low or moderate differentiation values among populations. In contrast, in populations with stable historical population sizes, genetic differentiation depends on the equilibrium between mutation, gene flow and genetic drift (Excoffier et al. 2009; Hofer et al. 2009).

This inferred range expansion in P. laevigata is congruent with paleoclimatic reconstructions for the Neotropics, as well as with previous reports of demographic history for Mexican arid-zone species (Ruiz-Sanchez et al. 2012; Angulo et al. 2017; Cornejo-Romero et al. 2017; Loera et al. 2017; Scheinvar et al. 2017; Ornelas et al. 2018) during the glacial/interglacial cycles of the Pleistocene. Available data indicate that during the LIG the climatic conditions were in general warmer and wetter than in the present (Scheinvar et al. 2017; Shadik et al. 2017). Afterward, a trend toward increased aridification has been inferred from various sources of evidence and multiple sites (Caballero et al. 2010; Ortega et al. 2010; Chávez-Lara et al. 2012; Caballero-Rodríguez et al. 2018). For example, Caballero et al. (2010) suggested that during the LGM arid conditions predominated in the central highlands of Mexico while the coasts were more humid. This was probably due to a moisture decline from ~ 40 Ka to 10 Ka BP resulting from the orographic shadow exerted by the eastern mountains of Mexico, which blocked trade winds from the Gulf of Mexico as a consequence of lower temperatures (Correa-Metrio et al. 2012). This trend apparently continued during the Holocene, since areas of Central Mexico that were wet during the LGM, such as the south of Guanajuato state, became dry by the end of the middle Holocene and into the late Holocene (Domínguez-Vázquez et al. 2019). A similar pattern of climate change probably occurred in northern Mexico according to the evidence provided by fossil packrat middens, which indicate the presence of Pinus L. and Juniperus L. woodlands at the end of the Pleistocene and the beginning of the Holocene that were rapidly replaced by columnar cacti and mesquite at around 5 Ka (Rhode 2002; Van Devender et al. 2007), suggesting an overall increase in aridity in these regions toward the present. However, these patterns are also thought to have been very variable in space and time during the end of the Pleistocene and the beginning of the Holocene.

In general, our results seem to support the interglacial refugia hypothesis (IRH). It is probable that during the LIG the distribution of P. laevigata was most restricted in extent and located mainly in the western part of the BD, with other smaller potential refugial areas in the central BD, the westernmost part of the TMBV and the OCV. These areas acted as sources for population expansion and migration both northwards and southwards with the aridification trends that occurred in Mexico toward the LGM and the present, and particularly with the combination of warming and aridification of the Pleistocene/Holocene transition. However, as is frequently observed (e.g., Ornelas et al. 2018; Guevara 2020; Peñaloza-Ramírez et al. 2020), the MIROC-ESM and CCSM4 scenarios for the LGM differed in the extent of the predicted suitable area for P. laevigata, even though both suggested the same trend. In contrast, the two models for the MH were much more concordant.

Interestingly, genetic diversity levels calculated from the nuclear microsatellites were positively correlated with latitude. Usually, areas within the geographic range of a species that have been recently colonized harbor less genetic diversity than areas that sustained stable populations over time (i.e., refugia) because of successive founder effects during the process of colonization (Hewitt 2000; Gugger et al. 2008). However, our available evidence favors the scenario in which the central-west portions of the current range of Prosopis laevigata acted as refugial areas during the LIG, while southern, eastern and northern areas were more recently colonized. Still, the latitudinal trend in genetic diversity might be explained by current differences in effective population size rather than a historical signature of the colonization process. Indeed, during the collection trips in the north, we observed very large populations with many thousands of individuals, while the current populations in the BD, the TCV and the OCV are small and scattered, to the point that it was difficult to sample ten individuals in some of them. This pattern also suggests that the ‘leading edge’ populations of P. laevigata may still be expanding while the ‘rear edge’ populations may be decreasing. Additionally, recent ecophysiological studies of P. laevigata in populations from the whole distribution have shown better performance in germination under water deficit, salinity and temperature, as well as photosynthetic performance under drought conditions, in northern populations relative to southern ones (Contreras-Negrete et al. unpublished data), probably leading to more effective regeneration of northern populations.

The range shift and expansion in the distribution of P. laevigata during these historical climate transitions are also congruent with the ecology of the species. The seed dispersal of Prosopis species is carried out mainly by mammals such as rabbits, hares and humans, and by water courses (Campos and Ojeda 1997; de Noir et al. 2002; Pasiecznik et al. 2002). Endozoochory has been identified as an efficient mechanism of long-distance dispersal, and the seed dispersal of some species of Prosopis by the megafauna in the last 12,000 years has been proposed (Janzen and Martin 1982), which could have allowed the rapid expansion of P. laevigata populations under favorable ecological conditions during the large-scale warming and aridification process of the Holocene. Mesquites are known to be effective colonizers of open areas, since these plants have high light requirements and show high germination rates and resistance to drought and other stressful conditions typical of arid zones (Zimmermann 1991; Golubov et al. 2001; Pasiecznik et al. 2002). These ecological characteristics are responsible for the rapid spread of some Prosopis species in contemporary times, and some of them have become invasive in arid zones of the Middle East, Africa and Australia (Pasiecznik et al. 2002).

Conclusion

In conclusion, we observe that the recent demographic history, as well as the current patterns of genetic diversity and structure found in P. laevigata, have been decisively shaped by the climatic fluctuations experienced by the species since the LIG, following a pattern consistent with the IRH for dryland species. Likewise, the spread of P. laevigata to the north and south of its current distribution since the LIG is consistent with the ecological characteristics of species of the genus, which have been recognized as effective colonizers, even under adverse environmental conditions. Given the widespread distribution of P. laevigata as well as its ecological importance, the present study provides relevant information on the ecological and environmental historical patterns that have shaped the biogeographic history of the arid zones of Mexico.

Data archiving statement

Sequence data have been deposited in GenBank (accession numbers MK618446-MK618463). Microsatellite data and the sequence alignment file have been deposited in the Dryad database (https://doi.org/10.5061/dryad.w3r2280pr).