Introduction

Papaya (Carica papaya L.) is one of the most widely grown fruit plants in the countries of Tropical America (Oliveira et al. 1994). It is a trioecious species; i.e., the plant can be found in three sex forms: female, male, and hermaphrodite (Storey 1953). In papaya, sex is determined by a pair of newly evolved sex chromosomes with the XX genotype in female plants, XY genotype in male plants, and XYh genotype in hermaphrodite plants (Liu et al. 2004).

In an effort to understand the origins of hermaphroditism and the evolution of sex chromosomes, researchers sequenced the hermaphrodite-specific (HSY) region of the Yh chromosome and its counterpart on the X chromosome (Wang et al. 2012), as well as the male-specific (MSY) region of the Y chromosome, and concluded that hermaphrodite plants (XYh) are likely the result of domestication of a wild dioecious population in the North Pacific region of Costa Rica (Van Buren et al. 2015). The HSY and MSY regions share 99.6% sequence identity (Van Buren et al. 2015), and the divergent 0.4% is responsible for the traits that distinguish a male from a hermaphrodite plant; i.e., these regions carry genes responsible for carpel abortion and development of long peduncles in the male inflorescence (Liao et al. 2017).

Identifying the sex of papaya plants at the seedling stage is extremely important, as it allows earlier fruit production and savings in financial and natural resources such as water and nutrients. Because sexing is currently performed only after flowering, at approximately 90 days post-transplanting, three to four seedlings must be planted per hole to attain a higher number of hermaphrodite plants and consequently higher yields per planted area.

To enable early sex identification of papaya plants, sex-specific markers of several types have been developed and used successfully: SCAR (sequence-characterized amplified region) (Parasnis et al. 2000; Deputy et al. 2002; Urasaki et al. 2002; Chaves-Bedoya and Nunez 2007), RAPD (random amplified polymorphic DNA) (Sondur et al. 1996; Deputy et al. 2002; Lemos et al. 2002; Urasaki et al. 2002), SSR (simple sequence repeats) (Parasnis et al. 1999; Santos et al. 2003; Eustice et al. 2008; Na et al. 2012), and AFLP (amplified fragment length polymorphism) (Ma et al. 2004).

Studies on sex determination in stages prior to flowering are very relevant for papaya breeding programs in that they allow for an earlier identification of the desirable sex types. In this regard, hermaphrodite plants are most desired because of the great demand for commercial crops, as they are higher yielding, and also because of the consumer preference for pyriform fruits produced from hermaphrodite flowers (Liao et al. 2017). Moreover, hermaphrodite plants carry the advantage of having perfect flowers, which makes self-fertilization possible for the development of lines and production of stable hybrids.

By contrast, in dioecious genotypes, self-pollination and the generation of lines are not possible, given the nature of their reproductive system, in which allogamy necessarily occurs. As a consequence, dioecious plants have a high level of heterozygosity, which results in highly heterogeneous hybrids (Silva et al. 2007a). However, dioecious populations have generally undergone little or no breeding improvement (especially in Brazil), and genotypes of these populations may carry genes of interest to the breeder, e.g., genes that confer adaptation, rusticity, and resistance to biotic and abiotic stress. Therefore, the sex reversal of a dioecious genotype to a hermaphrodite with good morpho-agronomic attributes brings great advantage and speed to breeding programs: in addition to broadening the genetic base and allowing introgression of genes of interest, it also enables self-fertilization of the material and the production of stable lines and hybrids.

Seasonal floral sex reversal is common in the papaya crop: male plants may bear hermaphrodite flowers and vice versa, whereas female flowers are considered stable (Storey 1958). These floral reversals, known as ‘summer sterility’ (XYh → XY) or ‘male papaya’ (translated from the Portuguese mamão macho) (XY → XYh) (Damasceno Júnior et al. 2008), are attributed to genetic and environmental factors (Hofmeyr 1939; Awada 1958; Arkle Junior and Nakasone 1984; Silva et al. 2007b; Damasceno Júnior et al. 2008), as well as epigenetic alterations (Negrutiu et al. 2001; Vyskot and Hobza 2004; Aryal and Ming 2014; Liao et al. 2017). Nevertheless, these reversals are seasonal and do not change the genetic constitution of the plant.

Although it occurs at a low frequency, natural sex reversal of a male plant to hermaphrodite is a spontaneous phenomenon (Liao et al. 2017), as was seen in the dioecious genotype Cariflora (Ramos et al. 2011), for which only male and female genetic material was known at the time. Natural sex reversal immediately provides breeders with material that would otherwise take several generations to obtain through introgression of the allele responsible for hermaphroditism in breeding programs (Silva et al. 2007a; Ramos et al. 2014).

Recently, two locus-specific molecular markers (PMSM1 and PMSM2) were developed based on the 8,396 bp sequence of a specific retrotransposon from the MSY region; these markers can detect sex reversal of a male plant to hermaphrodite (Liao et al. 2017). The two loci are amplified only in male plants and in hermaphrodites originating from sex reversal (XY to XYh), with no amplification in female plants or in hermaphrodite plants that did not arise by sex reversal. These loci were validated in eleven gynodioecious varieties and four dioecious varieties, efficiently distinguishing gynodioecious from dioecious cultivars and clearly identifying male-to-hermaphrodite sex-reversed plants.

Half-sib progeny of papaya were evaluated for resistance to citrus black spot and phoma leaf spot and proved promising by virtue of their resistance to both incidence and severity of the former disease (Vivas et al. 2012, 2014). These dioecious progeny were introduced into the recurrent selection program as a source of resistance-conferring alleles. Among these progeny, a hermaphrodite genotype was identified in progeny M3 among the female plants (named M3 hermaphrodite), raising questions regarding its origin. Confirming that a hermaphrodite genotype originated from this resistant progeny would be of great importance to breeding programs, as it would enable the development of lines and their use in producing agronomically superior hybrids carrying alleles for resistance to the main diseases attacking the papaya crop. Accordingly, SSR markers from the HSY and autosomal regions, as well as locus-specific molecular markers, were used in the present study to investigate the origin of a hermaphrodite plant (M3) identified in a dioecious progeny and to determine whether sex reversal or contamination occurred.

Results

Two genetic-diversity parameters, observed heterozygosity (Ho) and inbreeding coefficient (f), were estimated for the 15 C. papaya genotypes (females and hermaphrodites) based on the 15 loci from the HSY region, the 15 autosomal loci, and both regions together (Table 1). For the loci from the HSY region, a total of 40 alleles were generated, with an average of 2.67 alleles per locus; Ho ranged from 0.053, averaging 0.23; and f varied from 0.047 to 1, averaging 0.57. In the autosomal region, 42 alleles were generated, with an average of 2.8 alleles per locus; Ho varied from 0 to 0.47, averaging 0.29; and f ranged from 0.15 to 1, averaging 0.48. In the analysis of both regions, i.e., the 30 loci together, 43 alleles were observed, with an average of 2.87 alleles per locus; Ho varied from 0.03 to 0.47, averaging 0.26; and f ranged from 0.18 to 0.94, averaging 0.52.
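As an illustration of how these two parameters relate, the following minimal sketch computes Ho, the expected heterozygosity (He), and f for a single locus. It assumes the common moment estimator f = 1 − Ho/He; the exact estimator implemented in PowerMarker may differ (for example, in bias corrections). The function names and the genotype calls are hypothetical and are not data from this study.

```python
from collections import Counter


def parse_genotype(code):
    """Split a two-character genotype code such as '12' into its two alleles."""
    if len(code) != 2:
        raise ValueError(f"expected a two-character code, got {code!r}")
    return code[0], code[1]


def locus_stats(codes):
    """Return (Ho, He, f) for one locus from a list of genotype codes.

    Assumption: f = 1 - Ho/He (moment estimator), not necessarily
    the estimator used by PowerMarker.
    """
    genotypes = [parse_genotype(c) for c in codes]
    n = len(genotypes)
    # Observed heterozygosity: proportion of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies from the 2n allele copies at this locus.
    counts = Counter(allele for g in genotypes for allele in g)
    he = 1.0 - sum((k / (2 * n)) ** 2 for k in counts.values())
    # f is undefined for a monomorphic locus (He = 0).
    f = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, f


# Hypothetical calls for six plants at one locus.
example = ["11", "11", "22", "12", "22", "11"]
ho, he, f = locus_stats(example)
print(f"Ho={ho:.3f} He={he:.3f} f={f:.3f}")
```

Under this estimator, a locus at which every plant is homozygous but more than one allele is present gives f = 1, which is the pattern expected for inbred lines.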

Table 1 Observed heterozygosity (Ho) and inbreeding coefficient (f) estimated for the 15 C. papaya genotypes (females and hermaphrodites) from the analysis of 15 loci from the HSY region, 15 loci from the autosomal region, and the 30 loci together

The clustering based on loci from the HSY region (Fig. 1A) shows that group I contained the genotypes Criola de Costa Rica and Golden, and group II was composed of genotype Maradol Mexican Origin. Group III comprised most of the genotypes: Hermaphrodite M3, all genotypes of the dioecious progeny, and four controls (two dioecious genotypes, Cariflora and Cimarron, and two hermaphrodite genotypes, Maradol GL and Sekati).

Fig. 1 Dendrogram of genetic dissimilarity among 15 C. papaya genotypes (females and hermaphrodites) from the germplasm bank of UENF/Caliman obtained by the UPGMA clustering method. A: based on the analysis of 15 microsatellite loci from the HSY region (cophenetic correlation coefficient: 0.73; stress: 19.46; distortion: 3.78); B: based on the analysis of 15 autosomal microsatellite loci (cophenetic correlation coefficient: 0.73; stress: 18.34; distortion: 3.36)

In the cluster analysis of the autosomal loci (Fig. 1B), the genotypes Cimarron, Sekati, Maradol GL, Maradol Mexican Origin, and Hermaphrodite M3 formed group I; the genotypes of the dioecious progeny formed group II; and Criola de Costa Rica, Cariflora, and Golden composed group III.

When the loci from the HSY region were analyzed together with the autosomal loci (Fig. 2), group I was formed by genotype Criola de Costa Rica and cultivar ‘Golden’. Group II contained most of the genotypes, namely the genotypes of the dioecious progeny (M3-7, M3-4, M3-8, M3-5, and M3-9) plus dioecious genotype Cariflora. Lastly, group III comprised hermaphrodite genotype M3, closely linked to Sekati, Maradol GL, Maradol MX, and Cimarron (dioecious control).

Fig. 2 Dendrogram of genetic dissimilarity among 15 C. papaya genotypes from the germplasm bank of UENF/Caliman obtained by the UPGMA clustering method, based on the 30 microsatellite loci (cophenetic correlation coefficient: 0.81; distortion: 1.95; stress: 13.98)

The genotypes were also evaluated by principal coordinate analysis (PCoA), shown as a scatter plot in Fig. 3. Together, the first two coordinates explained 50.05% of the total data variation, with 20.93% of this variation explained by coordinate 1 and 36.19% by coordinate 2. Quadrants I and II contain the hermaphrodite genotype under investigation, all hermaphrodite controls, and dioecious controls Cimarron and Criola de Costa Rica, whereas quadrants III and IV hold all genotypes of the dioecious progeny and dioecious genotype Cariflora, used as control.

Fig. 3 Distance projection onto the two-dimensional plane using two coordinates among 15 C. papaya genotypes (females and hermaphrodites), considering the 30 loci analyzed

Figure 4 illustrates the amplification patterns obtained for the 16 genotypes analyzed using three markers: the male-specific marker PMSM1; locus 71E, a positive control that amplifies an autosomal fragment in all three sex types; and the male/hermaphrodite-specific marker W11, which amplifies a fragment in male and hermaphrodite plants but not in female plants.

Fig. 4 Amplification of male-specific fragments using the PMSM1 marker. Marker 71E was used as a positive control that amplifies an autosomal fragment in all three sex types. Specific marker W11 amplifies a fragment in male and hermaphrodite plants, but not in female plants. M: 100 bp marker; 1: M3-5; 2: M3-7; 3: M3-8; 4: M3-4; 5: Hermaphrodite M3; 6: M3-3; 7: M3-2; 8: M3-9; 9: Cariflora; 10: Cimarron; 11: Maradol_MX; 12: Sekati; 13: Maradol_GL; 14: Criola de Costa Rica; 15: Golden; 16: male

As can be observed in the electrophoretic profile of the obtained fragments (Fig. 4), all genotypes showed amplification for autosomal marker 71E, used as a positive control. As expected, the male/hermaphrodite-specific marker (W11) did not amplify in the female genotypes, only in the hermaphrodite and male genotypes. However, no amplification was observed for hermaphrodite genotype Maradol GL (well 13), indicating a false-negative result. For the PMSM1 marker, amplification was detected only in the male genotype, suggesting that the investigated hermaphrodite material M3 (well 5, red) did not undergo sex reversal from a male to a hermaphrodite plant.
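The way the three band patterns are read together can be summarized in a short decision sketch. It encodes only the marker behavior described in the text (71E amplifies in all sex types; W11 in males and hermaphrodites; PMSM1 in males and male-to-hermaphrodite sex-reversed plants). The function name and the wording of the returned labels are hypothetical, and the sketch is not a substitute for flower phenotyping, which is still needed to tell a male from a sex-reversed hermaphrodite and to catch W11 false negatives.

```python
def interpret_lane(band_71e, band_w11, band_pmsm1):
    """Interpret presence (True) or absence (False) of three bands for one sample.

    Hypothetical helper: the rules follow the marker descriptions in the text,
    not a published scoring protocol.
    """
    if not band_71e:
        # The autosomal control should amplify in every sex type.
        return "no 71E control band: reaction failed, repeat the PCR"
    if band_pmsm1:
        # PMSM1 targets a male-specific insertion in the MSY region.
        return "MSY insertion present: male or male-to-hermaphrodite sex-reversed plant"
    if band_w11:
        # W11 amplifies in males and hermaphrodites; PMSM1 is absent here.
        return "hermaphrodite without sex reversal"
    # W11 is known to give false negatives, so absence alone is not conclusive.
    return "female, or W11 false negative: confirm with flower phenotype"


# Band patterns as described for two wells in Fig. 4.
print(interpret_lane(True, True, False))   # pattern reported for hermaphrodite M3
print(interpret_lane(True, False, False))  # pattern reported for Maradol GL
```

With the pattern reported for Maradol GL, the sketch cannot separate a female from a W11 false negative, which is why the known hermaphrodite phenotype of that control is what identifies the result as a false negative.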

Discussion

In the analysis of descriptive parameters of genetic diversity for the HSY region, no amplification was expected from SSR loci from the HSY region for the female genotypes. However, the markers are located at borders A and B, which are common regions for the X and Yh chromosomes. In other words, they have similar nucleotide sequences (Na et al. 2012), which explains the high inbreeding coefficient observed in the hermaphrodite genotypes despite the differences between X and Yh chromosomes.

With respect to the autosomal region, low inbreeding and higher heterozygosity were observed in the dioecious genotypes, corroborating the results reported by Ramos et al. (2014) after analyzing this parameter in hermaphrodite and dioecious papaya genotypes. It should be stressed that hermaphrodite M3 exhibited an amplification profile similar to the dioecious material in the analysis of autosomal loci, which confirms the hypothesis that genomic alterations might have occurred in the sex chromosome as a result of sex reversal. Genotypes Golden and Maradol MX showed an inbreeding coefficient equal to one, as was observed in lineages in the experiment led by Oliveira et al. (2010). Considering the analysis of the 30 loci together, hermaphrodite M3 showed low inbreeding and higher heterozygosity, and the same was observed for dioecious control Cimarron.

Based on the cluster analysis, under the hypothesis of sex reversal, greater genetic proximity between hermaphrodite genotype M3 and the hermaphrodite controls was expected for the loci from the HSY region, since studies have shown that this region contains genes controlling the determination of hermaphroditism and maleness (the latter in the counterpart on the Y chromosome, the MSY region). In the analysis of autosomal loci, still under the hypothesis of sex reversal, a shorter distance would be expected between hermaphrodite genotype M3 and the dioecious genotypes, considering that gene modifications or chromosomal rearrangements in those regions would likely not alter sex type, because the genes involved in this trait are located in specific regions of the sex chromosomes. On the other hand, under the hypothesis of pollen contamination, similarity would be expected between hermaphrodite genotype M3 and both the hermaphrodite genotypes and the female genotypes from the dioecious progeny, with M3 clustering between the dioecious progeny and the hermaphrodite control, given the contribution of both gametes to embryo formation and consequently to the formation of the plant. Therefore, the grouping of hermaphrodite M3 near the dioecious genotypes for the loci from the HSY region and near the hermaphrodites for the autosomal regions was not the expected pattern under either hypothesis (sex reversal or pollen contamination).

Despite being dioecious materials, genotypes Criola de Costa Rica and Cimarron were exceptions in the analysis of distance projection onto the two-dimensional plane using two coordinates, because the analyzed genomic regions present greater genetic similarity between these two genotypes and the hermaphrodite controls. This is explained by the greater proximity seen when the genetic distances were plotted on the plane, corroborating the clustering (Fig. 2) observed for the 30 loci.

The cluster analysis of the genotypes was not clearly conclusive as to whether hermaphrodite genotype M3 originated from sex reversal or contamination. Although there was evidence of occurrence of pollen contamination when the 30 loci were analyzed together, the expected results for this hypothesis were not observed when the loci from the HSY region were analyzed separately from the autosomal loci. Thus, the use of locus-specific marker PMSM1 in the second stage of the study is justified, given its specificity in the identification of male-to-hermaphrodite sex-reversed mutants, as proposed by Liao et al. (2017).

As reported by Liao et al. (2017), marker PMSM1 was designed based on the 8,396 bp male-specific insertion sequence in the MSY region and thus does not amplify in female or hermaphrodite plants. This marker is effective in identifying male-to-hermaphrodite sex-reversed plants, although there is only a 0.4% difference between the HSY and MSY regions. The lack of amplification observed for marker PMSM1 in hermaphrodite M3 (well 5, red) therefore indicates that no mutation or natural sex reversal occurred, but rather a probable contamination. The likely source is pollen from genotype Maradol, which was used as control in the assay for evaluation of resistance to phoma-spot and black-spot from which the dioecious progeny evaluated in the present study was selected (Vivas et al. 2012).

The lack of amplification observed for marker W11 in genotype Maradol GL, a hermaphrodite control (Fig. 4, well 13), was a false-negative result; i.e., absence of bands in a known hermaphrodite genotype. Similar findings were reported by Deputy et al. (2002) and Oliveira et al. (2007), who observed false-negative results in studies with marker W11.

Molecular analysis revealed a probable pollen contamination between the resistant dioecious progeny and genotype Maradol, which gave rise to hermaphrodite genotype M3. Although the hypothesis of sex reversal was discarded based on the analysis results, hermaphrodite genotype M3 is being evaluated under field conditions for its morpho-agronomic traits and its resistance to black-spot and phoma-spot, because this genotype can be an interesting choice if its superiority for agronomic traits and resistance to these diseases is confirmed. A resistant hermaphrodite genotype can be used in new hybridizations to generate stable hybrids, emerging as an alternative for sustainable production of the papaya crop with lower environmental impact by virtue of decreased use of fungicides. Further, it will also contribute to fruit quality, since pesticide residues can be a health hazard when consumed in large quantities (Nunes and Ribeiro 1999), in addition to being an obstacle to the export of papaya (Gomes Oliveira et al. 1994).

Based on the results of the molecular analysis performed in the present study, the hermaphrodite genotype identified in the dioecious progeny resistant to phoma-spot and black-spot is not a mutation or the result of natural sex reversal, which suggests a probable pollen contamination. Hermaphrodite genotype M3 is being evaluated for its morpho-agronomic traits and its resistance to black-spot and phoma-spot, since a resistant hermaphrodite genotype can be used in new hybridizations to generate hybrids that are agronomically superior and genetically stable.

Materials and methods

Plant material and genomic DNA extraction

The dioecious progenies identified by Vivas et al. (2012, 2014) were introduced into the recurrent selection program as a source of alleles that confer resistance to black-spot and phoma-spot. In the seed production field, which was set up to produce the base population of the recurrent selection program, a hermaphrodite genotype (called hermaphrodite M3) was identified among the female plants of the M3 progeny, which raised doubts about its origin. To characterize this hermaphrodite M3 genotype by molecular analysis and investigate its origin, leaf tissue samples were collected from all plants of the M3 progeny after the flowering stage, with sex phenotyping used as a field-level control.

In the first stage of the study, 15 genotypes (females and hermaphrodites) were subjected to molecular analysis, comprising seven female genotypes originating from dioecious progeny M3 (M3-2, M3-3, M3-4, M3-5, M3-7, M3-8, and M3-9) resistant to phoma-spot and black-spot (Vivas et al. 2012, 2014); one hermaphrodite genotype (M3; Fig. 5) identified in the said progeny; three female control genotypes (Cariflora, Cimarron, Criola de Costa Rica); and four hermaphrodite control genotypes (Sekati, Maradol Mexican Origin, Maradol GL, and ‘Golden’). In the second stage of the assay, one male genotype of the dioecious progeny (M3) was included as control. All genotypes belong to the active germplasm bank (AGP) of the papaya breeding program of UENF/Caliman, kept by the company Caliman Agrícola S.A., located in the municipality of Linhares, ES, Brazil (19°06′–19°18′S and 39°45′–40°19′W).

Fig. 5 Hermaphrodite plant M3 evaluated using molecular markers (genotype belonging to the germplasm bank of UENF/Caliman, kept by the company Caliman Agrícola S.A., located in Linhares, ES, Brazil)

For genomic DNA extraction, leaf samples from the 16 genotypes were collected at the germplasm bank and placed in labeled foil envelopes, which were immersed and transported in liquid N2 and stored in an ultrafreezer at −80 °C. DNA extraction was performed following the CTAB protocol (Doyle and Doyle 1987), with modifications suggested by Daher et al. (2002).

To evaluate the concentration and quality of DNA, samples were stained with a 1:1 mixture of Blue Juice and Gel Red and evaluated on 1% agarose gel, using lambda (λ) DNA at a concentration of 100 ng/μL as the quantification standard. Images were captured by the MiniBis Pro gel documentation system (Bio Imaging Systems). Next, samples were quantified with Gel quant software to estimate the quantity of DNA obtained. Subsequently, working samples were standardized to a concentration of 5 ng/μL and kept at −20 °C.

Amplification of microsatellite primers and locus-specific markers

Sixty-six primers were initially screened and, of these, 30 were selected for the amplification reactions because they exhibited polymorphism (Table 2): 15 SSR markers from the HSY region (Yh chromosome) and 15 SSR markers covering autosomal regions of the papaya genome, previously reported by Na et al. (2012) and Eustice et al. (2008), respectively. Reactions were performed in a final volume of 13 µL containing 1× Taq Buffer (NH4)2SO4, 0.2 mM dNTP, 1.9 mM MgCl2, 0.2 mM primer (F + R), 10 ng DNA, and 0.6 U Taq DNA polymerase.

Table 2 List of the polymorphic loci used in the molecular analysis of the genotypes under investigation, with respective sequences, linkage groups (LG), and annealing temperatures (AT)

In the second stage, we used male-specific marker PMSM1, which amplifies a 585 bp fragment in male plants and XY-to-XYh sex-reversed plants (Liao et al. 2017); male/hermaphrodite-specific marker W11, with 792 bp (Deputy et al. 2002); and the 1000 bp autosomal DNA marker 71E, used as a control because it amplifies in all sex types (Table 2). Multiplex PCR reactions were carried out in a final volume of 13 µL containing 1× Taq Buffer (NH4)2SO4, 0.2 mM dNTP, 1.9 mM MgCl2, 0.2 mM of each primer (F + R), 10 ng DNA, and 0.6 U Taq DNA polymerase.

DNA amplifications were performed in a gradient thermocycler (Applied Biosystems) according to the following program: one cycle at 95 °C for 5 min for initial DNA denaturation followed by 35 cycles at 94 °C for 1 min for denaturation, 1 min at the ideal temperature (°C) for primer annealing (specific for each primer) (Table 2), and 3 min at 72 °C for primer extension. Subsequently, a 7-min cycle at 72 °C was run for the final extension followed by a reduction to 4 °C at the end of amplification. After being stained with the mixture of Gel Red and Blue Juice (1:1), amplification products were separated by 4% agarose gel electrophoresis, subjected to 0.30 A for a period of 3–5 h. Thereafter, the images were captured by the MiniBis Pro gel documentation system (Bio Imaging Systems). Polymorphic loci were subjected to statistical analyses.
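The cycling program described above can be written down as a small data structure, which makes the programmed hold time easy to check. This is only a sketch: the class and function names are hypothetical, the annealing temperature of 55 °C in the example is a placeholder (the actual value is primer-specific and given in Table 2), and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


def build_program(annealing_temp_c, cycles=35):
    """Describe the PCR program from the text for one primer pair."""
    return {
        "initial": Step("initial denaturation", 95.0, 5.0),
        "cycle": [
            Step("denaturation", 94.0, 1.0),
            Step("annealing", annealing_temp_c, 1.0),  # primer-specific (Table 2)
            Step("extension", 72.0, 3.0),
        ],
        "cycles": cycles,
        "final": Step("final extension", 72.0, 7.0),
        "hold_c": 4.0,  # final hold after amplification
    }


def hold_time_minutes(program):
    """Sum the programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.minutes for step in program["cycle"])
    return (program["initial"].minutes
            + program["cycles"] * per_cycle
            + program["final"].minutes)


# 55.0 is a placeholder annealing temperature, not a value from Table 2.
program = build_program(annealing_temp_c=55.0)
print(f"Programmed hold time: {hold_time_minutes(program):.0f} min")
```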

Statistical analysis

Data obtained from the amplification of microsatellite loci were converted to a numeric code for each allele in the locus. For example, if the locus presented three alleles, the representation was 11, 22, and 33 for the homozygous forms (A1A1, A2A2, and A3A3) and 12, 13, and 23 for the heterozygous forms (A1A2, A1A3, and A2A3). These coded data were used to obtain the genetic dissimilarity matrix by the Weighted Index, using Genes software (Cruz 2013). Cluster analysis by the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method was performed in R software using the package cluster/rect.hclust (R Development Core Team 2011); a two-dimensional scatter plot based on the Principal Coordinate Analysis (PCoA) method was generated using GenAlex 6 software (Peakall and Smouse 2006); and the observed heterozygosity (Ho) and inbreeding coefficient (f) were estimated using PowerMarker software version 3.25 (Liu and Muse 2005).
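To make the pipeline from coded genotypes to a dendrogram concrete, the sketch below computes a pairwise dissimilarity from two-digit genotype codes and then applies UPGMA. Two caveats apply. First, the dissimilarity used here is a simple unweighted shared-allele distance, swapped in for illustration; it is not the Weighted Index implemented in Genes software. Second, the sample names and genotype codes are hypothetical, and all function names are assumptions of this sketch.

```python
from itertools import combinations


def shared_allele_distance(g1, g2):
    """Mean over loci of 1 - (shared alleles / 2) for two multilocus genotypes.

    Unweighted stand-in for the Weighted Index used in the study.
    """
    total = 0.0
    for code_a, code_b in zip(g1, g2):
        remaining = list(code_b)
        shared = 0
        for allele in code_a:
            if allele in remaining:
                remaining.remove(allele)  # each allele copy is matched once
                shared += 1
        total += 1.0 - shared / 2.0
    return total / len(g1)


def distance_table(samples):
    """Map frozenset({a, b}) to the distance between samples a and b."""
    return {frozenset((a, b)): shared_allele_distance(samples[a], samples[b])
            for a, b in combinations(samples, 2)}


def upgma(names, dist):
    """Return the list of merges (cluster_a, cluster_b, height) in UPGMA order."""
    sizes = {name: 1 for name in names}
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))]
        new = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Size-weighted average keeps every original pair equally weighted.
        for other in sizes:
            d[frozenset((new, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((a, b, height))
    return merges


# Hypothetical genotype codes at three loci for four plants.
samples = {
    "A": ["11", "22", "12"],
    "B": ["11", "22", "11"],
    "C": ["22", "11", "12"],
    "D": ["22", "12", "12"],
}
for a, b, height in upgma(list(samples), distance_table(samples)):
    print(f"merge {a} + {b} at dissimilarity {height:.3f}")
```

In R, the equivalent clustering step would be `hclust` with `method = "average"` applied to the dissimilarity matrix exported from Genes.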