Introduction

The future of food security depends on in situ conservation of agrobiodiversity and maintaining the evolutionary potential of domesticated species (Thomas et al. 2016). Such potential depends largely on the conservation of crop wild relatives (CWR); however, their conservation and genetic diversity are threatened by multiple factors, mainly habitat transformation and fragmentation, rapid environmental changes, introduction of invasive species, urbanization, introduction of genetically modified organisms, and widespread use of biological resources, such as logging and overharvesting (Goettsch et al. 2021). In addition, the proximity between wild plants and plantations of their domesticated forms can also cause a loss of genetic diversity in areas where they coexist (Slatkin 1987; Ellstrand et al. 1999). In this study, we analyzed genetic diversity and structure to identify the wild evolutionary units and possible gene flow within the wild-domesticated complex of papaya in Mexico, part of the center of origin, domestication, and diversification of the species.

In general, CWR and landraces exhibit high levels of genetic variation in their centers of origin (Gepts and Papa 2003). Domesticated plants and their CWR are the result of distinct evolutionary histories that, in many cases, have increased intraspecific diversity without losing the ability to interbreed (Brozynska et al. 2016; Dempewolf et al. 2017). In many crop species, gene flow through natural interbreeding and the generation of fertile offspring is possible (Harlan and de Wet 1971) if domesticated and wild plants are sexually compatible, grow in sympatry, share pollinators (or the same pollination syndromes), and if flowering coincides totally or partially (Papa and Gepts 2003; Papa 2005). Increasing crop plant density, however, changes the scenario, and consequences may include further reduction in genetic diversity, local extinction, and the development of aggressive weedy varieties (Bartsch et al. 1999; Ellstrand et al. 1999; Mardonovich et al. 2019). Modern varieties of many crops have been selected under international standards of genetic homogeneity and phenotypic stability; their diversity is therefore even lower, so a greater loss of genetic diversity in wild populations is expected if interbreeding occurs (Ehrlich and Raven 1969; Bartsch et al. 1999; Ellstrand et al. 1999; Papa and Gepts 2003). Moreover, establishing transgenic crops or introducing imported seeds in regions where the CWR occur raises the potential genetic and environmental risks of hybridization (Hails 2000; Snow 2002).

Wild populations of papaya (Carica papaya) are composed of fast-growing trees, with a simple stem and indeterminate growth (Teixeira Da Silva et al. 2007) that reach 10 m in height and can live up to 20 years (Jiménez et al. 2014) (Fig. 1a). Wild papayas bear either male or female flowers (i.e., dioecious; Fig. 1b, c). Naturally distributed in lowland tropical forests of Mexico and northern Central America, papaya is a pioneer species that colonizes newly created large clearings (Núñez-Farfán and Dirzo 1988; Paz and Vázquez-Yanes 1998; Aradhya et al. 1999; Carvalho and Renner 2012; Chávez-Pesqueira and Núñez-Farfán 2017). Wild papayas are within the pollen/seed dispersal range of papaya plantations in many parts of Mexico (Chávez-Pesqueira and Núñez-Farfán 2017). Due to agricultural practices, domesticated plants complete more generations in fewer years and show higher plant abundance, which translate into a higher availability of pollen and seeds (Papa and Gepts 2003; Papa 2005). There is evidence that wild and domesticated papaya plants share pollinators and floral visitors, such as moths, bees, and ants (Moo-Aldana et al. 2017; Badillo-Montaño et al. 2018; Pacheco‐Huh et al. 2021), facilitating interbreeding between these two groups with different evolutionary histories (Badillo-Montaño et al. 2019; Pacheco‐Huh et al. 2021). Pollen and seeds of domesticated plants can reach wild populations from different sources such as plantations, feral plants, and even from individuals with intermediate phenotypes, such as those that have been identified as natural populations in Costa Rica (Brown et al. 2012), as there is evidence that they facilitate gene flow from domesticated to wild papaya (Wu et al. 2017). Flowering and fruiting of wild plants occur throughout the year. Fruits are consumed and probably dispersed by birds and small mammals (Chávez-Pesqueira and Núñez-Farfán 2016).

Fig. 1

a Wild population of Carica papaya showing female individuals with immature fruits, at Yucatan, Mexico. b Female flower of wild C. papaya. c Male inflorescence of wild C. papaya. d Plantation of C. papaya showing immature fruits of the Maradol variety, at Yucatan, Mexico. e Hermaphrodite flower of a Maradol papaya. f Mature fruit of Maradol papaya

Characteristics associated with the papaya domestication syndrome are fruit gigantism (Carvalho and Renner 2012; Fig. 1d, f), loss of dormancy (Paz and Vázquez-Yanes 1998), reduced defense strategies against antagonists (Pacheco‐Huh et al. 2021), and changes in floral morphology and reproductive system, such as a tendency to hermaphroditism in domesticated varieties (Fig. 1e) (Chávez-Pesqueira and Núñez-Farfán 2017), which in turn can lead to changes in sex ratio, thus altering the effective population size (Chávez-Pesqueira et al. 2014). The genetic diversity of domesticated papaya is considerably lower than that of wild populations (Chávez-Pesqueira and Núñez-Farfán 2017). Mexico is the world’s leading exporter of papaya (FAO 2022). The Maradol papaya is the predominant improved variety grown in Mexico (Soriano-Melgar et al. 2016) and has displaced local varieties since its introduction around the 1990s, currently representing 95% of the country's production (Vázquez et al. 2010; SIAP 2018). However, although local varieties have lost their large-scale commercial importance, some are still grown in rural communities at low densities in home gardens and milpas, where they are associated with diverse uses (Moo-Aldana et al. 2017).

In the case of wild papaya plants, much of their natural distribution in Mexico is fragmented due to changes in land use, which has decreased intrapopulation genetic diversity in populations of forest fragments and increased interpopulation genetic differentiation compared with populations in the primeval forest (Chávez-Pesqueira et al. 2014). In addition, being strictly dioecious, isolated populations of wild papaya experience biased sex ratios and high levels of biparental inbreeding (Brown et al. 2012; Chávez-Pesqueira et al. 2014). Regions where wild populations coexist with their domesticated relatives, such as Mexico, are of particular interest for conservation. Here, we aim to determine the distribution of wild papaya, to assess the genetic diversity and structure of wild papaya populations in order to identify the species’ evolutionary units, and to analyze the possible effect of gene and transgene flow between domesticated and wild papaya plants in Mexico.

Materials and methods

Potential distribution map

Occurrence data

We collected occurrence records for wild C. papaya across its geographic range in Mexico. We examined, gathered, and cleaned the data from the following sources: (1) direct observations during several surveys across Mexico (Chávez-Pesqueira et al. 2014; Chávez-Pesqueira and Núñez-Farfán 2016; unpublished field observations); and (2) a careful review of specimens from seven herbaria (six from Mexico and one from the United States): Herbario-Fibroteca U Najil Tikin Xiw of Centro de Investigación Científica de Yucatán, A.C. (CICY), Herbario Nacional of the Universidad Nacional Autónoma de México (MEXU), Herbario Alfredo Barrera Marín of Universidad Autónoma de Yucatán (UADY), New York Botanical Garden (NYBG), Herbario of Escuela Nacional de Ciencias Biológicas (ENCB), Instituto de Ecología, A.C. (XAL), and Estación de Biología Los Tuxtlas, UNAM (Universidad Nacional Autónoma de México), distinguishing specimens with wild characteristics (small fruits with scarce pulp). Of the 816 papaya specimens we examined in herbaria, only 69 were validated as wild plants. Of these, only 26 had precise and reliable geographic information and were used, along with locations of wild populations collected in this and previous studies (Chávez-Pesqueira et al. 2014; Chávez-Pesqueira and Núñez-Farfán 2016), for a total of 106 records (Table S1). After cleaning the geographic inconsistencies, we thinned the dataset following a distance-based approach (excluding duplicated records within a grid of 1 × 1 km) with the NTBOX package version 0.4.6.1 (Osorio-Olvera et al. 2020) in R version 3.5.0 (R Core Team 2021). Finally, for modeling and evaluation purposes, we split the thinned database into two sets (i.e., training and testing) of nearly identical sizes following the checkerboard partition method implemented in the package ENMeval (version 0.3.0; Muscarella et al. 2018). We show the localities used in this study in Fig. 2 (n = 106).
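
As an illustration only, the grid-based thinning and checkerboard partition described above can be sketched in Python. This is not the NTBOX or ENMeval implementation: the cell size in degrees (1/120°, i.e., 30 arc-seconds, roughly 1 km), the 0.5° checkerboard block size, and the example coordinates are all assumptions made for the sketch.

```python
import math

# Assumption: 30 arc-seconds (1/120 degree) approximates a 1 x 1 km cell.
CELL_DEG = 1.0 / 120.0

def cell_index(lon, lat, cell=CELL_DEG):
    """Return the (column, row) of the grid cell that contains a point."""
    return (math.floor(lon / cell), math.floor(lat / cell))

def thin_records(records, cell=CELL_DEG):
    """Keep only the first record encountered in each grid cell."""
    seen = set()
    kept = []
    for lon, lat in records:
        idx = cell_index(lon, lat, cell)
        if idx not in seen:
            seen.add(idx)
            kept.append((lon, lat))
    return kept

def checkerboard_split(records, block_deg=0.5):
    """Split records into two sets by the parity of their checkerboard block.

    block_deg is a hypothetical block size, not a value from the study.
    """
    train, test = [], []
    for lon, lat in records:
        col, row = cell_index(lon, lat, block_deg)
        (train if (col + row) % 2 == 0 else test).append((lon, lat))
    return train, test

# Hypothetical (longitude, latitude) pairs, not records from Table S1.
points = [
    (-89.6200, 20.9700),
    (-89.6201, 20.9701),   # falls in the same ~1 km cell as the first point
    (-95.0700, 18.5800),
    (-104.3200, 19.1100),
]
thinned = thin_records(points)
train, test = checkerboard_split(thinned)
print(len(thinned), len(train), len(test))
```

A parity-based split of this kind keeps neighbouring blocks in different sets, which is the general idea behind evaluating a model on spatially separated records.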

Fig. 2

Distribution of 106 records (orange dots) of wild Carica papaya in Mexico. (Color figure online)

Climatic variables

The environmental layers used in this study correspond to bioclimatic variables summarizing annual, seasonal, and extreme tendencies derived from monthly temperature and precipitation data for Mexico between 1910 and 2009 at 30″ resolution (~ 1 km) (Cuervo-Robayo et al. 2014). We excluded four variables (i.e., mean temperature of the most humid quarter, mean temperature of the least humid quarter, precipitation of the warmest quarter, and precipitation of the coldest quarter) due to artifacts resulting from the combination of temperature and precipitation data (Escobar et al. 2014). To avoid model overfitting due to multicollinearity among variables, we selected a sub-set of uncorrelated and biologically meaningful variables. For this, we first assessed variables’ contribution in exploratory runs of a Maxent model (Phillips et al. 2006) (Table S2) by measuring the variable contribution percentage, permutation importance, and model gain through jackknife tests. Then, we calculated pairwise Pearson’s correlation coefficients from the occurrences. From these analyses, we excluded highly correlated variables (| r | > 0.8) and retained important variables with the greatest biological relevance for further modeling (Wegier 2013; Simoes et al. 2020). Thus, we used five variables: annual mean temperature (Bio1), temperature seasonality (Bio4), minimum temperature of the coldest month (Bio6), temperature annual range (Bio7), and precipitation of the driest month (Bio14).
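
The correlation filter described above can be illustrated with a minimal Python sketch. It assumes a simple greedy rule (walk the variables in order of importance and drop any variable whose |r| with an already retained variable exceeds 0.8); the study also weighed biological relevance, which is not captured here. The variable names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_uncorrelated(values, ranking, threshold=0.8):
    """Retain variables in priority order, skipping those with |r| > threshold
    against any variable that has already been retained."""
    kept = []
    for name in ranking:
        if all(abs(pearson(values[name], values[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values extracted at five occurrence points.
values = {
    "var_a": [24.1, 25.3, 26.0, 22.8, 27.2],
    "var_b": [14.1, 15.3, 16.0, 12.8, 17.2],  # var_a shifted by 10: r is ~1
    "var_c": [12.0, 3.0, 25.0, 8.0, 5.0],
}
# Hypothetical importance ranking (most important first).
ranking = ["var_a", "var_b", "var_c"]
selected = select_uncorrelated(values, ranking)
print(selected)
```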

The calibration area, or the ‘M’ element of the BAM diagram, refers to areas that have been accessible to the species via dispersal over relevant time periods (Barve et al. 2011; Peterson and Soberón 2012). We considered the geographic extent of Mexico as the calibration area.

Ecological niche modeling

We modeled the ecological niche of wild C. papaya using the Maxent algorithm version 3.3.3k (Phillips et al. 2006). Maxent is a machine-learning algorithm that uses the maximum entropy principle to identify a target probability distribution subject to a set of constraints related to the environmental characteristics of occurrences and a sample of the calibration area (Phillips et al. 2004, 2006). To select an optimal combination of Maxent parameters (regularization multipliers [RM] and feature classes [FC], see below), we developed a series of candidate models with the R package ENMeval (Muscarella et al. 2018); thus, we assessed five FC combinations: L, LQ, LQH, LQHP, and LQHPT (where L: linear; Q: quadratic; H: hinge; P: product; and T: threshold) and tested RM values ranging from 0.5 to 4 in increments of 0.5. We selected the best combination of model parameters according to performance and complexity criteria, namely: area under the ROC curve (AUC) > 0.9, omission rates < 0.10, and the corrected Akaike information criterion (AICc), retaining models with ΔAICc ≤ 2. Then, a model with the selected RM and FC settings was run in Maxent (i.e., tenfold cross-validation models, 20,000 background points, and 1,000 maximum iterations). We selected the average Cloglog output for continuous environmental suitability visualization and reclassified it into a binary map (i.e., presence or absence of suitable conditions) in ArcGIS version 10.2.1 (ESRI 2015) by applying a threshold value balancing a low omission error and the proportional predicted area. Finally, to assess model performance and significance, we evaluated the AUC ratio of the partial receiver operating characteristic curve (pROC; 1,000 replicates, and E = 0.05), calculated the omission rate, and performed binomial tests with the testing datasets obtained previously (see the occurrence data section above) in NTBOX (Osorio-Olvera et al. 2020).
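
The candidate-model grid and the selection thresholds described above can be written out in a short Python sketch. It only enumerates the FC × RM combinations and applies the three stated criteria; it does not fit any Maxent model, and the metric values passed to the filter are hypothetical.

```python
from itertools import product

# Feature-class combinations and regularization multipliers from the text.
FEATURE_CLASSES = ["L", "LQ", "LQH", "LQHP", "LQHPT"]
REG_MULTIPLIERS = [0.5 * i for i in range(1, 9)]  # 0.5, 1.0, ..., 4.0

# Every (FC, RM) pair is one candidate model: 5 x 8 combinations.
candidates = list(product(FEATURE_CLASSES, REG_MULTIPLIERS))

def passes_criteria(auc, omission_rate, delta_aicc):
    """Apply the stated thresholds: AUC > 0.9, omission < 0.10, delta AICc <= 2."""
    return auc > 0.9 and omission_rate < 0.10 and delta_aicc <= 2

print(len(candidates))
# Hypothetical metrics for two candidates, for illustration only.
print(passes_criteria(auc=0.95, omission_rate=0.02, delta_aicc=0.0))
print(passes_criteria(auc=0.95, omission_rate=0.02, delta_aicc=3.1))
```

Enumerating the grid this way yields 40 combinations, which is consistent with the number of candidate models reported in the Results.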

Sampling for genetic analyses

Samples from 556 papaya individuals (447 wild and 109 domesticated) were obtained from 33 collection sites. For wild papaya, samples were obtained from 24 collection sites throughout the natural distribution of the species in Mexico. Samples for domesticated papaya were obtained from nine plantations of Maradol papaya from two different regions of Mexico (four plantations on the Pacific coast in the states of Colima, Jalisco and Guerrero, and five plantations in the state of Yucatan) (Fig. 4, Table S3). Genetic data were obtained from six microsatellite primers (Ocampo et al. 2006, Table S4) for all 556 individuals, and one cpDNA region, psbA-trnH, for 411 individuals (368 wild and 42 domesticated). For wild individuals, genetic data were obtained from field collections on the western coast of Mexico and integrated with previous data from southeastern Mexico (Chávez-Pesqueira and Núñez-Farfán 2016) (Table S3) to represent the entire natural distribution of papaya in Mexico. DNA was extracted from leaf tissue using the CTAB protocol (Doyle and Doyle 1987), and the amplification of both markers (microsatellites and the psbA-trnH region) followed Chávez-Pesqueira and Núñez-Farfán (2016).

Data analyses for nuclear microsatellites

The genetic structure of wild individuals was inferred with two approaches, using Geneland v.4.0.9 (Guillot et al. 2008) and STRUCTURE v.2.3.4 (Pritchard et al. 2000). With Geneland, we carried out ten independent runs; each run consisted of 2,000,000 Markov chain Monte Carlo (MCMC) iterations with a thinning value of 100 and a coordinate uncertainty of 0.5 (which takes into account the presence of individuals with identical coordinates) (Salamon et al. 2020). Uncorrelated and null allele model options were chosen. Runs were recalculated with a burn-in of 1,000 iterations, and the average posterior probability was used to choose the best-suited run. For post-processing of the output files, the spatial domain was defined as X = 300 and Y = 300 with a burn-in of 1,000. The results of three individual runs were compared to assess spatial group correspondence. With STRUCTURE, ten iterations were run under an admixture model assuming uncorrelated frequencies with a range of K values from one to eight (structure proposed by Geneland). Each run had a burn-in period of 500,000 followed by 1,000,000 MCMC replicates. To obtain the optimum K value, the method proposed by Evanno et al. (2005) was used.
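
For readers unfamiliar with the ΔK statistic of Evanno et al. (2005), the following Python sketch shows one common way to compute it from replicate STRUCTURE runs: the absolute second-order difference of the mean ln P(D|K), divided by the standard deviation of ln P(D|K) across replicates. The log-likelihood values are hypothetical, and the sketch is not the software used in this study.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of ln P(D|K) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in lnp_by_k and k + 1 in lnp_by_k:
            second_diff = abs(
                mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
            )
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result

# Hypothetical ln P(D|K) values from three replicate runs per K.
lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-850.0, -852.0, -848.0],
    4: [-845.0, -846.0, -844.0],
    5: [-843.0, -845.0, -841.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbours, it is undefined for the smallest and largest K tested, so it cannot by itself support K = 1.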

We evaluated pairwise genetic differentiation of populations with the RST statistic, which assumes a stepwise mutation model (Slatkin 1995). The partitioning of genetic variability within and among population groups was then tested by analysis of molecular variance (AMOVA; Excoffier et al. 1992) among the genetic clusters derived from Geneland using GenAlEx v.6.5 (Peakall and Smouse 2012).

Standard genetic diversity indices, such as the percentage of polymorphic loci (%P), number of alleles (Na), observed heterozygosity (HO), and expected heterozygosity (HE), were estimated for each wild genetic cluster derived from Geneland and for the 109 domesticated individuals (which formed a single genetic group) using Arlequin v.3.5 (Excoffier and Lischer 2010).
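
As a worked illustration of these indices, the Python sketch below computes Na, HO, and HE for a single diploid locus. It assumes the small-sample (unbiased) form of expected heterozygosity, HE = [2n/(2n − 1)](1 − Σp²); whether this matches the exact estimator settings used in the study is an assumption. The genotypes are hypothetical allele sizes.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Return (Na, HO, HE) for one locus.

    genotypes is a list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over the 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n
    sum_p2 = sum((c / copies) ** 2 for c in counts.values())
    # Expected heterozygosity with a small-sample correction (assumption).
    he = (copies / (copies - 1)) * (1 - sum_p2)
    return len(counts), ho, he

# Hypothetical microsatellite genotypes (allele sizes in bp) for four individuals.
genotypes = [(150, 150), (150, 154), (154, 158), (158, 158)]
na, ho, he = locus_diversity(genotypes)
print(na, ho, he)
```

A large gap between HE and HO, as in this toy example, indicates fewer heterozygotes than expected under random mating.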

Gene flow estimators

Gene flow between domesticated and wild populations was evaluated using two methods. (1) RST was used to indirectly calculate the number of migrants per generation (Nm) between pairs of populations: Nm = [(1/RST) − 1]/4. (2) BayesAss v.3.0.4 (Wilson and Rannala 2003) was used to obtain recent migration rates (m). This software uses genetic assignments to estimate short-term dispersal rates (over the past two generations). We performed five runs (each with a different starting seed value) of 10,000,000 generations, with a burn-in of 1,000,000 generations, and sampled the chain every 2,000 generations.
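
The indirect estimator in method (1) is a one-line calculation; a minimal Python sketch is given below. The function name and the input value are hypothetical and serve only to show the arithmetic.

```python
def migrants_per_generation(rst):
    """Indirect estimate Nm = [(1 / RST) - 1] / 4 for a differentiation value in (0, 1)."""
    if not 0.0 < rst < 1.0:
        raise ValueError("RST must lie strictly between 0 and 1")
    return (1.0 / rst - 1.0) / 4.0

# Hypothetical differentiation value chosen to show the Nm = 1 boundary.
nm = migrants_per_generation(0.2)
print(nm)
```

Under this formula, Nm falls below 1 whenever the differentiation statistic exceeds 0.2, which is why pairs with strong differentiation yield Nm < 1.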

Data analyses of cpDNA

368 cpDNA sequences were used to assess the genetic structure of wild individuals using Geneland v.4.0.9 (Guillot et al. 2008), considering the polymorphic sites detected with DnaSP v.6.12.03 (Rozas et al. 2017). To make the structure consistent between different runs, we removed from the analysis those individuals that had identical sequences to the domesticated group, resulting in 296 individuals. Options were as described for the microsatellite markers but using 4,000,000 Markov chain Monte Carlo (MCMC) iterations.

We used Arlequin v.3.5 (Excoffier and Lischer 2010) to obtain the number of polymorphic sites and number of haplotypes, and haplotype and nucleotide diversity for 368 wild and 42 domesticated papaya individuals.

We evaluated pairwise genetic differentiation of populations using the NST estimator. NST is analogous to FST, but considers the average number of differences between pairs of sequences, either of the same or different subpopulations (Lynch and Crease 1990). We also calculated the number of migrants per generation (Nm) between pairs of populations. Estimators of genetic differentiation and Nm were obtained using DnaSP v.6.12.03 (Rozas et al. 2017).

Haplotype networks were constructed to examine genealogical relationships between haplotypes using statistical parsimony with TCS 1.2.1 (Clement et al. 2000). Gaps in the sequences generated by insertions and deletions were considered as a fifth state. For this analysis, the domesticated populations of papaya were not considered.

Transgene monitoring

For the transgene monitoring, 935 individuals of papaya were used (Table S5). For this analysis we included samples from the wild-domesticated complex of papaya, including local varieties from home gardens, to cover the broadest representation of the species in Mexico. Of the 935 individuals, 320 were wild individuals obtained from 22 of the collection sites of wild papaya used in the genetic analyses described above (it was not possible to use all 24 collection sites, given that individuals from two sites did not amplify with the primers) and from six new collection sites in the states of Tabasco and Guerrero. We used 615 domesticated samples obtained from different sources (plantations, markets and commercial seeds): 240 obtained from plantations (mostly from the Maradol variety, but one from the Tainung variety), from markets (Maradol variety), and from commercial seeds (from the Amarilla, Criolla and Maradol varieties), and 135 from samples collected in home gardens (mainly Maradol variety, but some local varieties like Amarilla, Amameyada and Criolla) (Table S5). DNA extraction followed the CTAB protocol with some modifications (Doyle and Doyle 1987). Three primers obtained from JRC GMO MATRIX (European Commission 2014) were used to detect possible transgenic events in Mexico (Table S6). Primers P35S and Tnos allow the detection of all transgenic papaya lines registered in the European Commission database, and the primer PRSV-CP identifies the coat protein of Papaya Ringspot Virus (PRSV). For the detection of possible transgenic events, PCR reactions were performed for 187 pools of DNA containing 1 μl of DNA (10 ng/μl) from five individuals. In total, 64 pools from wild individuals, 48 from plantation individuals (domesticated), 27 pools from individuals from home gardens (domesticated), and 48 from individuals from commercially sold seeds (domesticated) were analyzed.
PCR reactions were composed of: 1 × buffer, 1.5 mM MgCl2, 0.6 mM dNTPs, 1 μM primer (F and R), 1 U Taq polymerase (Promega GoTaq Flexi DNA) and 5 μl of DNA. The reaction conditions for primers P35S and Tnos were: initial denaturation at 94 °C for 3 min, 35 cycles of 30 s at 94 °C, 40 s at 55 °C, and 40 s at 72 °C, and a final extension at 72 °C for 10 min. The reaction conditions for the PRSV-CP and CHY primers were: initial denaturation at 94 °C for 10 min, 35 cycles of 30 s at 94 °C, 30 s at 60 °C, and 30 s at 72 °C, and a final extension at 72 °C for 3 min.
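
The pooling scheme for the transgene screening can be cross-checked with simple arithmetic. The Python sketch below uses only the pool counts and the pool size of five individuals stated in this section; the dictionary keys are labels chosen for the sketch.

```python
POOL_SIZE = 5  # individuals per DNA pool, as stated in the text

# Number of pools per sample source, as reported in this section.
pools = {
    "wild": 64,
    "plantation": 48,
    "home garden": 27,
    "commercial seed": 48,
}

# Individuals represented by each group of pools.
individuals = {source: n * POOL_SIZE for source, n in pools.items()}
total_pools = sum(pools.values())
total_individuals = sum(individuals.values())

print(individuals)
print(total_pools, total_individuals)
```

The totals recover the 187 pools and 935 individuals reported for the screening, with 320 wild and 135 home-garden individuals.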

Results

Potential distribution map

We generated 40 candidate models and selected the parameter combinations that obtained the best performance and complexity estimates (i.e., FC = LQ, RM = 0.5; Supplementary Materials 2). With these settings, the resulting niche model was statistically significant (p < 0.0001***) according to both significance evaluations (binomial test and partial ROC analysis; AUC ratio = 1.79). The model exhibited good predictive performance (omission error = 0.018) and indicates climatic suitability in the coasts of the Pacific, the Gulf of Mexico, and the Yucatán Peninsula (Fig. 3). The most important variables were annual mean temperature (Bio1) and minimum temperature of the coldest month (Bio6), as the former decreases the model gain the most when it is omitted and the latter results in the highest gain when used in isolation.

Fig. 3

a Climatic suitability map of wild Carica papaya in Mexico. The red color shows the areas with the highest climatic suitability for the species, while the light blue shows the areas with low climatic suitability; white color shows the areas with zero climatic suitability. b Binary map of the presence (green)/absence (white) of wild C. papaya in Mexico. (Color figure online)

Population structure and nuclear microsatellite diversity

Six genetic clusters (hereafter named evolutionary units) were detected by posterior cluster membership for the wild plants (Geneland K = 6, Fig. 4a). The North-Gulf unit included the collection sites Cielo, Huasteca and Tamazunchale; the Isthmus-Gulf unit included Matias Romero, Acayucan, Los Tuxtlas, Palenque and Villa Guadalupe; the South-Pacific unit included Poza Rica, Ventanilla, Santiago Astata and Marquelia; the Yucatan Peninsula unit included Oxtankah, Chichen Itzá, Dzibilchaltún, Río Lagartos and Cancún; the Western Yucatan Peninsula unit included Mamantel and Caoba; and the North-Pacific unit included Coast of Michoacán, Tecpan, Copala, Tepic and Bucerías. With STRUCTURE, a lower number of genetic clusters was recognized (K = 4, Fig. 4b). All domesticated papaya individuals formed a single group (hereafter named domesticated group). For the six evolutionary units inferred by Geneland we obtained an overall RST value of 0.057 (p = 0.001), with pairwise values ranging from 0.006 to 0.092 (Fig. 5a). When including the domesticated group, the overall RST was 0.103 (p = 0.001), and RST values between pairs of wild populations and the domesticated group ranged from 0.006 to 0.568. Hierarchical partition of molecular variance (AMOVA) for the six wild units revealed that the highest proportion of variance was among individuals (72%) and the lowest proportions were within individuals (23%) and among units (6%).

Fig. 4

The six and four evolutionary units inferred from Geneland (a) and STRUCTURE (b) using nuclear microsatellites from wild Carica papaya in Mexico. In (a), blue triangles: North-Gulf unit; purple triangles: Isthmus-Gulf unit; orange triangles: South-Pacific unit; pink triangles: Yucatan Peninsula unit; lilac triangles: Western Yucatan Peninsula unit; green triangles: North-Pacific unit. Black squares correspond to the cultivated plants. In (b), STRUCTURE plot showing the four evolutionary units (K = 4, selected with ΔK) for wild and domesticated C. papaya. The colored bar at the top corresponds to the evolutionary units found in Geneland. (Color figure online)

Fig. 5

Heatmap of pairwise genetic differentiation (RST and NST) values (lower part) and number of migrants per generation (Nm) (upper part) among wild evolutionary units and domesticated papaya using two molecular markers: (a) genetic differentiation (RST) and number of migrants per generation (Nm), using six nuclear microsatellites for six wild evolutionary units and domesticated papaya; (b) genetic differentiation (NST), and number of migrants per generation (Nm), for the psbA-trnH marker in three genetic groups and domesticated papaya. The colors indicate RST, NST or Nm values ranging from lower values (green) to higher values (red). IG, Isthmus-Gulf unit; NG, North-Gulf unit; YP, Yucatan Peninsula; WYP, Western Yucatan Peninsula unit; SP, South-Pacific unit; NP, North-Pacific unit; Dom, domesticated group. (Color figure online)

The number of migrants per generation (Nm) among pairs of wild units was in all cases above 1 (Fig. 5a). This estimator was lower between the wild units and the domesticated group (Nm < 1), except for the North-Pacific unit (Nm = 2.473). Recent migration rates (m) estimated with BayesAss were low overall (Fig. 6), with higher rates among geographically close evolutionary units (Yucatan Peninsula and Western Yucatan Peninsula) and lower rates between geographically distant units (North-Pacific and any other evolutionary unit). Rates between the domesticated group and the wild evolutionary units were low (m < 0.01, values = 0.0033 ± 0.0033 to 0.0084 ± 0.0056), whereas rates among the wild units reached higher values (m values = 0.0032 ± 0.0031 to 0.1307 ± 0.0461). Nevertheless, gene flow was higher in the direction from the domesticated group (source population) to the wild units (recipient population) (Fig. 4b).

Fig. 6

Recent migration rates (m ± s.d.) between source and recipient populations of wild papaya in Mexico. The arrows show the direction of gene flow between evolutionary units. Only m values higher than 0.01 are shown; the thicker arrows indicate higher values of m. The NP unit does not show connections due to low values of m (m < 0.01). The color of each circle represents the inferred evolutionary unit of Geneland using nuclear microsatellites. Letters inside the circles represent the abbreviation of the evolutionary units: IG, Isthmus-Gulf unit; NG, North-Gulf unit; YP, Yucatan Peninsula; WYP, Western Yucatan Peninsula unit; SP, South-Pacific unit; NP, North-Pacific unit. (Color figure online)

All loci were polymorphic, and moderately high values of neutral genetic diversity were obtained for wild papaya in Mexico (HO = 0.645 and HE = 0.836; ranges for HO values = 0.485 to 0.771 and HE values = 0.624 to 0.816) (Table 1). Most of the wild evolutionary units showed high values of genetic diversity (HE > 0.7), except the North-Pacific unit. In contrast, the domesticated group presented extremely low values of genetic diversity (HO = 0.018 and HE = 0.106).

Table 1 Genetic diversity estimators for six nuclear microsatellites in six evolutionary units and the domesticated group of papaya in Mexico

Population structure and chloroplast DNA diversity

Using a total of 296 wild individuals, three genetic clusters were detected by posterior cluster membership (Geneland K = 3; Fig. 7). Group 1 included the following collection sites: Acayucan, Los Tuxtlas, Villa Guadalupe, Poza Rica, Matias Romero, Ventanilla, Santiago Astata, Marquelia, Caobas, Dzibilchaltún, Chichen Itzá, Río Lagartos, Cancún, Oxtankah, Copala, Tecpan and Tepic; Group 2 included Cielo, Huasteca and Tamazunchale; and Group 3 included Bucerías, Costa de Michoacán, Palenque and Mamantel. Overall, the wild groups showed high haplotype diversity (Hd = 0.934; range of Hd values = 0.772–0.918). However, nucleotide diversity was low (π = 0.009 to 0.016). In contrast, the domesticated group presented low values of haplotype and nucleotide diversity (Hd = 0.094; π = 0.000) (Table 2).

Fig. 7

Evolutionary units inferred from Geneland using the psbA-trnH chloroplast region for wild papaya in its distribution in Mexico. Blue triangles: Group 1; yellow triangles: Group 2; purple triangles: Group 3.

Table 2 Genetic diversity estimators for the psbA-trnH marker in three wild genetic groups and the domesticated group in Mexico

Genetic differentiation among the three wild genetic groups inferred by Geneland resulted in an NST of 0.409. The overall NST value (including wild and domesticated groups) was 0.446. Between pairs of wild groups, NST values ranged from 0.127 to 0.522 (Fig. 5b), while between the wild groups and the domesticated one, NST values ranged from 0.108 to 0.644. The number of migrants per generation (Nm) among pairs of wild groups was low in some cases, but above 1 between Groups 1 and 3 (Nm values = 0.23 to 1.71) (Fig. 5b). However, this estimator was moderate between the wild groups and the domesticated group (Nm < 1), except between Group 3 and the domesticated group (Nm = 2.06).

The haplotype network showed the distribution and frequency of 68 haplotypes for the 24 collection sites (Fig. 8). The most frequent haplotypes (H1, H2, H3) were present in all wild units (Fig. 8a), except that Group 3 does not include H3 (Fig. 8b). In addition, the evolutionary units detected showed exclusive haplotypes (Fig. 8). The domesticated group was not considered in the haplotype network because of the almost null genetic variation found.

Fig. 8

Haplotype network (68 haplotypes) for the psbA-trnH chloroplast region of wild Carica papaya in its distribution in Mexico. The size of the circles is proportional to the frequency of each haplotype. Black dots represent unsampled haplotypes. Each section, size and color of haplotypes (pies) are proportional to the representation of the evolutionary units inferred from Geneland using (a) nuclear microsatellites (as in Fig. 4a), and (b) psbA-trnH chloroplast region (as in Fig. 7)

Evaluation of transgene monitoring

Of the 187 DNA pools examined, none amplified for the transgene regions (Table 3). To corroborate the absence of transgenes in papaya, individual samples were randomly chosen from each DNA pool to analyze the P35S region, with negative results for all samples.

Table 3 Pools of Carica papaya DNA analyzed and number of positive samples for the three transgenes analyzed

Discussion

Our study includes the analysis of the Mexican distribution of Carica papaya, an area proposed as the center of origin of the species, the center of distribution of its genetic diversity, and the site where its domestication began and continues to the present day. Our main objective was to identify the true evolutionary units of wild papaya and, for conservation purposes, to provide necessary information on the areas where the processes that originate and maintain its variation occur, under the premise that "you cannot conserve what you do not know". Our search for collection sites was exhaustive in order to identify the limits of the evolutionary units, using two different molecular markers (nuclear microsatellites and chloroplast sequences) which, due to their differences, provide complementary histories. The discovery of the evolutionary units will allow changes in current management and conservation perspectives for the species and will help to understand and manage the potential threats faced by the evolutionary units of wild papaya.

Wild papaya distribution

To obtain the natural distribution of wild papaya, and unlike other studies (Fuentes and Santamaría 2014; Espinosa et al. 2018; Hernández-Salinas et al. 2021), we used data exclusively from wild individuals, by distinguishing wild from cultivated specimens in seven national and international herbaria and by performing extensive field sampling. Thus, we are certain that only data from wild specimens were used to construct the potential distribution map. Wild papaya shows a wide distribution in Mexico, covering lowland tropical and subtropical vegetation areas ranging from southern Sinaloa on the west coast and from El Cielo, Tamaulipas, on the east coast, to southern Mexico, where it is widely distributed in the Yucatan Peninsula (Figs. 2, 3). No wild specimens from the states of Sinaloa, Nayarit and Michoacán were found in herbaria, which highlights the importance of the discovery and field collection of wild populations in these areas, since the distribution of the species in these regions was previously unknown. Alarmingly, the current rate of deforestation and habitat fragmentation in the natural distribution of papaya is high enough to endanger the existence of plant species (Novick et al. 2003). This, coupled with the lack of information on the state of the CWR, underscores the importance of studying and conserving wild populations of important crop species.

Genetic diversity, structure and gene flow in wild populations of Carica papaya in Mexico

Genetic structure analyses using microsatellite markers and chloroplast DNA recognized six and three genetic groups, respectively. This indicates that wild papaya populations in Mexico behave as very large evolutionary units. The identification of evolutionary units is very useful because it makes it possible to delimit the set of individuals or populations with similar genetic characteristics that have diverged for a long time, which in turn helps to generate better conservation strategies that include the ecological and evolutionary processes of the species (Vogler and Desalle 1994; Moritz 1999). The greater area and dispersion that characterize large evolutionary units allow them to resist certain changes, such as environmental impacts and anthropogenic effects, and therefore their risk of local extinction decreases (Moritz 1999; Funk et al. 2012; Casacci et al. 2014). However, their vulnerability to evolutionary processes deserves attention, since these processes can arise at any site on the periphery, or in any individual within the area, and spread silently, affecting a larger area than expected, which is why in-depth studies such as this one are needed.

Genetic diversity was, in general, high for wild populations of papaya in Mexico for both markers. Overall, we found values of genetic diversity similar to those of previous studies in Mexico (Chávez-Pesqueira et al. 2014; Chávez-Pesqueira and Núñez-Farfán 2016), but higher than those reported for wild papaya in Central America and the Caribbean (D’eeckenbrugge et al. 2007; Ocampo et al. 2007; Brown et al. 2012; Mardonovich et al. 2019). High haplotype diversity values were also obtained with the chloroplast marker, suggesting Mexico as an important genetic reservoir for the species, as genetic diversity has been maintained at high values over time. However, the difference in the number of groups identified with Geneland with the two types of markers could be related to recent anthropogenic effects. At the local level, Chávez-Pesqueira et al. (2014) found that populations located in a fragmented forest in Los Tuxtlas, Mexico, showed lower genetic diversity and higher genetic differentiation compared to populations from continuous forests. The increasing fragmentation of the natural habitat of wild papaya represents a worrisome scenario for the maintenance of genetic partitioning of the wild form of the species. On the other hand, cpDNA data suggest that gene flow through seeds was probably important for the maintenance of genetic connectivity in the past, and that recent fragmentation has probably generated barriers for seed dispersers (Chávez-Pesqueira and Núñez-Farfán 2016).
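For readers less familiar with the statistic, haplotype diversity is commonly computed with Nei's unbiased estimator, h = n/(n - 1) × (1 - Σ p_i²), where n is the number of sequences and p_i the frequency of haplotype i. The following is a minimal sketch under that assumption; the function name and the example sample are hypothetical, and this section does not state which estimator or software produced the values reported here.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: one label per sampled sequence (e.g. "H1", "H2").
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability that two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for small sample size.
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample, NOT data from this study.
sample = ["H1", "H1", "H2", "H3"]
print(round(haplotype_diversity(sample), 3))
```

Values close to 1 indicate that two randomly drawn sequences almost always carry different haplotypes, whereas a value of 0 indicates a single fixed haplotype.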

The haplotype network showed diversification processes and gene flow events (Fig. 8a, b). Most units show unique haplotypes separated by few mutational steps. Haplotype networks allow us to observe the spatial distribution of diversity and possible gene flow events, such as those from the Yucatan Peninsula to the North Pacific (H24, H57) and vice versa (H50 and H58; Fig. 8). Therefore, gene flow between populations appears to be a frequent evolutionary process that occurs naturally.

Gene flow between wild and domesticated papaya

The obtained values of Nm (for both markers) and m (for nuclear markers) revealed high gene flow among the wild evolutionary units (Figs. 5, 6), as well as the existence of gene flow between Maradol papaya plantations and the wild evolutionary units (Fig. 5). However, we found that the direction is mainly from domesticated to wild plants, and that the highest value of gene flow with the nuclear markers was between the domesticated group and the North-Pacific unit (Nm = 2.47; Fig. 5a), which presented the lowest values of genetic diversity (Table 1). This unit is embedded in one of the regions with the highest presence of papaya plantations in Mexico (SADER 2021), which could explain the higher gene flow values. Similarly, with the cpDNA markers, we found higher gene flow from plantations to group 3 (Nm = 2.06; Fig. 5b), which is also the group with the lowest nucleotide diversity. In both cases, we found the lowest levels of genetic differentiation with the domesticated group (Fig. 5), compared to the other wild units, which are more genetically differentiated from it.
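As a rough aid to interpreting Nm, Wright's island model relates the number of migrants per generation to genetic differentiation as F_ST ≈ 1/(4Nm + 1) for biparentally inherited diploid nuclear markers. The sketch below applies that classical approximation. It is an assumption made for illustration and not necessarily the estimator behind Figs. 5 and 6; the function names are hypothetical, and the constant `k` is exposed because uniparentally inherited haploid markers such as cpDNA are usually treated with a different value.

```python
def nm_from_fst(fst, k=4.0):
    """Migrants per generation implied by F_ST under Wright's island model.

    Assumption: equilibrium island model; k = 4 for diploid nuclear loci.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (k * fst)


def fst_from_nm(nm, k=4.0):
    """Inverse relation: F_ST implied by a given Nm under the same model."""
    if nm < 0.0:
        raise ValueError("nm must be non-negative")
    return 1.0 / (k * nm + 1.0)


# Nm value quoted in the text for the domesticated group and the
# North-Pacific unit; the conversion itself is only an approximation.
print(round(fst_from_nm(2.47), 3))
```

Under this approximation, an Nm of 2.47 corresponds to an F_ST of roughly 0.09, which is consistent with the low differentiation described for this pair of groups.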

The low levels of genetic diversity in domesticated papaya (Tables 1, 2), together with the evidence of gene flow between domesticated and wild papaya (Fig. 5), highlight the importance of conserving wild genetic diversity, since gene flow from domesticated plants could affect the fitness of wild units and modify the ecological and biological functionality of wild plants by homogenizing their diversity. Commercial plantations in Mexico cover large areas with thousands of plants per unit area, which increases the probability of interbreeding with wild plants (Ellstrand et al. 1999). Furthermore, we recommend that future research include native varieties, as it is highly likely that they also maintain gene flow with commercial plantations and wild populations.

The low diversity of the improved Maradol variety of papaya stems from its origin, since only two parental lines were used to obtain it, and several rounds of artificial selection were carried out to gain genetic stability and fix desirable traits (Rodríguez and Corrales 1967; Rodríguez 2008). In this study, we confirmed that the genetic diversity of plantations from two different regions of Mexico (Pacific and Yucatan Peninsula) is practically nil for both molecular markers (Tables 1, 2). It is known that the domestication process implies a genetic bottleneck that generally reduces the diversity found in wild plants (Yamasaki et al. 2005). In the case of improved varieties, such as Maradol, subsequent bottlenecks occur that further decrease genetic diversity, increasing their vulnerability to future environmental changes (Van De Wouw et al. 2010).

With the nuclear markers, the nine papaya plantations clustered into a single genetic group (Fig. 4b), indicating that plantations have a very similar genetic constitution. Likewise, with the cpDNA marker we found that most domesticated individuals carry the H1 haplotype, which is the most represented in wild plants and is found throughout the natural distribution, supporting that genetic diversity in plantations is limited. We suggest that the limited gene flow from wild to domesticated plants is explained mainly by farmers' management practices, since they generally use commercial seeds. As a result, the commercial cultivation of Maradol papaya creates a significant barrier to gene flow for wild plants. Wild pollen that lands on plantations represents a loss because it does not reach natural populations, reducing the likelihood of outcrossing. However, this study did not consider papaya plants grown in backyards or home gardens, which could represent important sources of domesticated diversity. Furthermore, plants in home gardens could act as connectors for gene flow, and even facilitate the subsequent introgression of domesticated alleles into wild populations, as wild and domesticated plants share pollinators and floral visitors (Moo-Aldana et al. 2017; Badillo-Montaño et al. 2018; Pacheco‐Huh et al. 2021).

In papaya, gene flow from domesticated to wild individuals can cause diverse consequences at different levels, e.g., (1) morphological consequences, such as increased size in fruits (Paz and Vázquez-Yanes 1998; Chávez-Pesqueira and Núñez-Farfán 2016; Fig. 1) and flowers (pers. obs.). Moreover, as individuals of Maradol papaya are mainly hermaphrodites, this characteristic can be transmitted to wild plants (pers. obs.) affecting pollination, since hermaphrodite flowers self-pollinate before anthesis (Manshardt et al. 2016). (2) Physiological consequences, such as decreasing the importance of specific environmental conditions for breaking dormancy and seed germination (Paz and Vázquez-Yanes 1998), affecting the presence of the species in natural seed banks. Moreover, in contrast to Maradol plants, wild plants are tolerant of heat deficit and water stress (Estrella-Maldonado et al. 2021), hence gene flow may lessen this feature in wild plants. Similarly, wild plants are more tolerant to antagonistic damage (Pacheco‐Huh et al. 2021), therefore gene flow could increase the vulnerability of wild plants. (3) Ecological consequences, such as modifications to ecological interactions; wild plants interact with a wide range of organisms compared to Maradol plants (Badillo-Montaño et al. 2019; Pacheco-Huh et al. 2021). Therefore, gene flow from domesticated to wild plants could have an impact on the entire chain of ecological interactions, particularly specialized ones. (4) Evolutionary consequences, such as changes in allele frequencies and a tendency to homogenize genetic diversity, reducing diversity in wild units and increasing their vulnerability to environmental changes.

Transgene flow monitoring

Transgenic papaya was released in Hawaii in the late twentieth century (Tecson Mendoza et al. 2008; Silva-Rosales et al. 2010; Manshardt 2014) to combat the disease caused by PRSV. Gene flow from transgenic varieties of papaya to other varieties of the species has been reported (Gonsalves et al. 2012; Gonsalves 2014; Manshardt et al. 2016), showing gene flow from the hermaphrodite transgenic variety "Rainbow" to non-hermaphrodite varieties of papaya. Currently, the countries where the importation of transgenic papaya from Hawaii is allowed are the United States, Canada and Japan (Chávez-Pesqueira and Núñez-Farfán 2017). In Mexico, there have been attempts to create and release transgenic papaya (Cabrera-Ponce et al. 1995; de la Fuente et al. 1997; Guzmán-González et al. 2006; Silva-Rosales et al. 2010). However, Mexican farmers have opposed the introduction of transgenic papaya, opting instead to improve their management practices to deal with PRSV (Silva-Rosales et al. 2010). Moreover, without transgenic papaya, Mexico is the world’s leading exporter of papaya (FAO 2022).

Transgene flow can be assessed with molecular markers to analyze introgression from domesticated plants to their wild relatives, as well as other risks to genetic diversity within the wild-to-domesticated complex (Wegier et al. 2011; Hernández-Terán et al. 2017). Transgene flow is a cause for concern because the goal of CWR conservation is to maintain the processes that favor their continued diversification and potential in new scenarios (Tobón-Niedfeldt et al. 2022), and transgene introduction into wild populations has been shown to modify native ecological interactions (Vázquez-Barrios et al. 2021). In the extensive sampling conducted in this research, which covered commercial plantations, local varieties and wild papaya individuals, transgenes were absent (Table 3). However, if the country's protection goals include maintaining the genetic diversity of the wild-to-domesticated papaya complex without transgenes, legal and accidental releases must be prevented.

Conservation and management for Carica papaya

There is great concern about the conservation of many CWR. It has been estimated that up to 75% of wild species could be under some kind of threat (Dempewolf et al. 2014), mainly due to anthropogenic actions (Sala et al. 2000; Wright 2010; Haddad et al. 2015; Newbold et al. 2015) that affect the ecological processes that promote and maintain the genetic diversity of wild populations. In the case of wild papaya, although it has been recognized as a high priority species for conservation (Castañeda-Álvarez et al. 2016), assessments corresponding to the IUCN red list, NOM-059-SEMARNAT-2001, and other strategies to safeguard evolutionary resilience (Mastretta-Yanes et al. 2018; Tobón-Niedfeldt et al. 2022) are still needed. Our results are of great importance for global and local agrobiodiversity conservation plans, and show the importance of considering the wild-to-domesticated complex of species in conservation strategies. Although the crop's vulnerability is growing due to its low genetic variation, Mexico's cultivated areas and its exports have increased in recent years (FAO 2022). The likelihood that gene flow from domesticated to wild papaya plants will continue should be addressed with comprehensive measures, governmental tools, awareness among markets and consumers, and consideration of the communities that still conserve the diversity of native varieties.

The integral conservation of agrobiodiversity depends on various governmental bodies in Mexico, which should be informed of these results in order to improve decision-making. In the same way, local farmers should be able to make better decisions by knowing and valuing the current state of their genetic resources. Crops that remain in their centers of origin, diversity and domestication face complexities that should be evaluated and recognized. Every part of the wild-to-domesticated complex involves a rich and unique evolutionary history that silently offers us future opportunities but just as silently is lost unseen.

For papaya, we recommend the continuous evaluation of the wild evolutionary units to detect possible reductions in genetic diversity, as well as systematic transgene monitoring in the wild-to-domesticated complex to report any case of transgene introduction. We suggest paying special attention to the wild evolutionary units that remain in areas of high papaya production, since gene flow from plantations is more likely to reduce their genetic diversity.