Introduction

The genus Carica is monotypic and only includes papaya (Carica papaya L.). Papaya belongs to the small family Caricaceae, which consists of six genera and 35 species (Badillo 2000). Most of them are diploid (2n = 18), and a relatively small genome of 372 Mb was found in C. papaya (Arumuganathan and Earle 1991; Parasnis et al. 1999; Kim et al. 2002; Liu et al. 2004). There are three distinct types of C. papaya plants: (1) dioecious papayas have male and female flowers on separate trees, (2) gynodioecious papayas bear female flowers on some trees and bisexual (hermaphrodite) flowers on others, and (3) trioecious papayas have male, female, and hermaphrodite flowers on different plants.

Many landraces and cultivars present hermaphrodite plants, bearing perfect flowers and producing fruits shaped from long-cylindrical to ellipsoidal, which are preferred for commercial production. The economic importance of papaya resides largely in the fruit production, and Brazil is the world's major producer (FAOSTAT 2007).

Papaya exhibits considerable phenotypic variation for many morphological and horticultural traits, including fruit size, fruit shape, flesh color, flavor and sweetness, length of juvenile period, plant stature, stamen carpellody, and carpel abortion (Kim et al. 2002; Ocampo Pérez et al. 2006). Nevertheless, low levels of genetic variation for resistance to major fungal and virus diseases were observed in the genus Carica (Nishijima 1994). On the other hand, resistance to several diseases that affect papaya has been identified in the Vasconcellea gene pool, including resistance to PRSV-P (V. cundinamarcensis, V. candicans, V. stipulata, V. cauliflora and V. quercifolia; Horovitz and Jimenez 1967; Alvizo and Rojkind 1987), Asperisporium caricae (V. cundinamarcensis; Drew et al. 1998), Fusarium and Meloidogyne (V. weberbaueri; Scheldeman et al. 2003), phytoplasma (V. parviflora; Drew et al. 1998), and Phytophthora (V. goudotiana; Drew et al. 1998).

In recent years, molecular markers have been widely used in practical plant breeding to assess the genetic variability available in germplasm banks, manage and develop core collections, target crosses, classify germplasm into interest groups and identify duplicate accessions (Rafalski and Tingey 1993; Manifesto et al. 2001; Martins-Lopes et al. 2007). Additionally, available molecular markers linked to important genes have proved to be useful for early selection of different desirable traits (Deputy et al. 2002; Dillon et al. 2005). These molecular markers have also been used to develop genetic maps and to analyze qualitative and quantitative inheritance (Luo et al. 2001; Jansen 2005; Milczarski et al. 2007). Other applications, such as marker-assisted backcrossing, can reduce the number of generations needed to obtain a genotype with 98% or 99% genetic similarity to the recurrent parent for a fixed sample size (Liang et al. 2004; Oliveira et al. 2005). The first study of genetic diversity in papaya was made by Tan and Weinheimer (1976) using isozymes. Genetic relationships between papaya and related wild species have been investigated using isozymes (Morshidi 1998), Random Amplification of Polymorphic DNA (RAPD) (Sondur et al. 1996; Stiles et al. 1993; Jobin-Décor et al. 1997), Restriction Fragment Length Polymorphism (RFLP) (Aradhya et al. 1999), Amplified Fragment Length Polymorphism (AFLP) markers (Kim et al. 2002; Van Droogenbroeck et al. 2002), PCR-RFLP (Van Droogenbroeck et al. 2004), SSR (Kyndt et al. 2005; Kyndt et al. 2006) as well as Inter-Simple Sequence Repeats (ISSR) (Carrasco et al. 2009).

A large number of different molecular techniques are at present available, and each of them differs in its informational content. Although in principle all types of markers would be suitable for our purpose, microsatellites (Simple Sequence Repeats (SSR)) are especially useful for diversity studies (Baumung et al. 2004).

Multi-locus approaches, such as RAPD, AFLP, and ISSR, may be convenient but have some technical and analytical drawbacks, such as dominance. Multi-locus data are typically analyzed as pairwise comparison of complex patterns that only have meaning relative to others in the same study, thus results are to a limited extent comparable among studies. By contrast, single-locus markers, such as SSR, are usually characterized by co-dominance and thus are more flexible and supply more robust and comparable data (Brondani et al. 1998; Rallo et al. 2000; Karp 2002).
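The difference between dominant and co-dominant scoring can be illustrated with a minimal sketch. The allele sizes and function names below are hypothetical and serve only to show why a dominant band cannot separate a homozygote from a heterozygote, whereas a co-dominant SSR profile can.

```python
def codominant_score(genotype):
    """Return both alleles (fragment sizes in bp), as resolved for a single SSR locus."""
    return tuple(sorted(genotype))


def dominant_score(genotype, band_allele):
    """Return 1 if the band is present and 0 if absent, as scored for a dominant marker."""
    return 1 if band_allele in genotype else 0


# Hypothetical genotypes at one locus (allele sizes in bp).
homozygote = (210, 210)
heterozygote = (210, 216)

# Co-dominant scoring keeps the two genotypes distinct.
print(codominant_score(homozygote), codominant_score(heterozygote))

# Dominant scoring gives the same result (band present) for both genotypes.
print(dominant_score(homozygote, 210), dominant_score(heterozygote, 210))
```

Because the heterozygote is hidden under dominant scoring, allele frequencies and heterozygosity have to be inferred under extra assumptions, which is one reason single-locus data are easier to compare among studies.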

Despite the present availability of molecular marker techniques, papaya has only recently received attention at the molecular level, especially through the development of a large number of SSR markers with different motifs, by data mining of DNA databases (Oliveira et al. 2008a) and by screening of sequence data from bacterial artificial chromosome (BAC) ends and complementary DNA (Eustice et al. 2008). The utility of SSR as genetic markers to investigate relationships among plants has been clearly established (Zhou et al. 2003). Microsatellites have also been extensively exploited for fingerprinting, phylogenetic studies, and genetic and QTL mapping in a wide range of species. The genetic analysis based on SSR made it possible to investigate the occurrence and variability of simple sequence repeats at the whole genome level in germplasm accessions of papaya. The objective of this investigation was to test the suitability of SSR for genomic analysis in C. papaya.

Material and Methods

Plant Material

Thirty papaya accessions and eighteen landraces collected from Muritiba, Bahia-Brazil, were used to screen for SSR polymorphisms (Table 1). These papaya plants were maintained in the Papaya Germplasm Bank (PGM) at Embrapa Mandioca e Fruticultura Tropical (CNPMF), in Cruz das Almas, BA, Brazil. Among the 30 papaya accessions, two were cultivars, 12 were improved (but not released) breeding lines, and 16 were unimproved germplasm. All landraces were unimproved germplasm (Table 1).

Table 1 Germplasm accessions (code CMF) of Carica papaya and landraces (code M) used to analyze levels of microsatellite polymorphism including origin of variety and mating system

DNA Extraction

Young papaya leaves were harvested and stored at −80°C for long-term storage. Genomic DNA was extracted based on the procedure described by Doyle and Doyle (1990). DNA quantification was carried out in an agarose gel (1.0% w/v) by comparing the fluorescent intensity of the sample stained with ethidium bromide (1.0 mg/mL), relative to a dilution series of Lambda DNA (Invitrogen, Carlsbad, CA) as standard of known concentration.

PCR Amplification

A set of 100 SSR primers developed by Oliveira et al. (2008a) was tested for amplification and polymorphism. Each PCR reaction was prepared as follows: 20 ng DNA template, 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 0.3 mM of each primer, 1.5 mM MgCl2, 0.2 mM dNTPs, and 0.5 U Taq DNA Polymerase (Invitrogen, Carlsbad, CA) in a total volume of 20 µL.

PCR cycling consisted of 94°C for 4 min, followed by 35 cycles of 94°C for 40 s, annealing at 55°C, 60°C, or 62°C (according to each SSR primer; Table 2) for 40 s, and 72°C for 1 min, with a final extension at 72°C for 2 min, on a PTC-100 thermal cycler (MJ Research, Inc., Watertown, MA). After cycling, fragments with size differences shorter than 10 base pairs (bp) were electrophoresed on a 6% (w/v) denaturing polyacrylamide gel in a Hoefer SQ3 DNA sequencer gel electrophoresis unit (Pharmacia Biotech Inc., San Francisco, CA) at 70 W for 2.5 h. The gels were stained with silver nitrate, according to Creste et al. (2001). Previous analysis in polyacrylamide gel showed that some primers produce fragments with size differences longer than 10 bp. These loci were electrophoresed on a 3% Agarose 1000 gel (Invitrogen, Carlsbad, CA) at 130 V for 3.5 h. The 50-bp ladder (New England Biolabs, Inc., Beverly, MA) was used as a molecular weight standard to estimate the size of microsatellite alleles.
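As a quick check on the cycling program described above, the following sketch encodes the steps and sums the programmed hold times. The variable and function names are illustrative only, and the total excludes ramp time between temperatures, which depends on the instrument.

```python
# Cycling program as described in the text; all times are in seconds.
INITIAL_DENATURATION_S = 4 * 60          # 94 C for 4 min
CYCLES = 35
CYCLE_STEPS_S = {
    "denaturation_94C": 40,
    "annealing": 40,                     # 55, 60, or 62 C depending on the primer (Table 2)
    "extension_72C": 60,
}
FINAL_EXTENSION_S = 2 * 60               # 72 C for 2 min


def programmed_hold_seconds(cycles=CYCLES):
    """Sum of programmed hold times; ramping between steps is not included."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + cycles * per_cycle + FINAL_EXTENSION_S


total = programmed_hold_seconds()
print(f"{total} s ({total / 60:.1f} min)")   # 5260 s (87.7 min)
```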

Table 2 Characteristics of 81 microsatellite loci developed for Carica papaya

Data Analysis

Genetic variability was measured as allelic richness determined by the total number of detected alleles and the number of alleles per locus (N_A), observed heterozygosity (H_O), expected heterozygosity (H_E) and polymorphism information content (PIC). We defined rare alleles as those whose individual frequency is lower than 1% in the investigated materials. Common alleles are those that occur with a frequency between 1% and 20%, while those whose frequency is higher than 20% are classified as most frequent alleles. Genetic distances between individuals were estimated by shared allele distance (SAD). All these analyses were carried out using the software POWERMARKER version 3.25 (Liu and Muse 2005). The matrix of genetic distances was used to construct the neighbor-joining tree, using the MEGA 4.1 package (Tamura et al. 2007). To assess confidence in the nodes of the tree, bootstrap values were obtained from 1,000 replicates by re-sampling microsatellite loci.
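The per-locus statistics named above can be sketched as follows. This is a minimal illustration, not the POWERMARKER implementation: it assumes the simple estimator H_E = 1 − Σp² (without any sample-size or inbreeding correction) and the commonly used PIC formula 1 − Σp² − ΣΣ 2p_i²p_j². The genotype data and function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) tuples for one locus; None marks missing data."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Proportion of scored individuals carrying two different alleles."""
    scored = [g for g in genotypes if g is not None]
    return sum(1 for a, b in scored if a != b) / len(scored)


def expected_heterozygosity(freqs):
    """Gene diversity, assumed here to be the uncorrected 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content from allele frequencies."""
    p = list(freqs.values())
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - sum(x * x for x in p) - cross


def classify_allele(freq):
    """Frequency classes as defined in the text (<1%, 1% to 20%, >20%)."""
    if freq < 0.01:
        return "rare"
    if freq <= 0.20:
        return "common"
    return "most frequent"


# Hypothetical locus scored in four individuals (allele sizes in bp).
locus = [(200, 200), (200, 204), (204, 204), (200, 200)]
freqs = allele_frequencies(locus)
print(freqs)
print(observed_heterozygosity(locus), expected_heterozygosity(freqs), pic(freqs))
print({allele: classify_allele(p) for allele, p in freqs.items()})
```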

Results

Polymorphism of Microsatellites

All 100 SSR loci were screened for patterns of amplification using two individuals of the PGM-CNPMF. Eighty-one primer pairs were selected according to strength, clarity of banding patterns, and successfully amplified PCR product of high quality, besides the polymorphic amplification product in the expected size. Characteristics of the 81 primer pairs and optimal conditions for their amplification are given in Table 2. Subsequently, all primer pairs that amplified a specific band were used for the genotyping of 30 germplasm accessions and 18 landraces. In total, 59 primer pairs amplified a polymorphic and easily scorable PCR product, while 22 pairs amplified a monomorphic one.

Considering the 59 SSR loci analyzed in the present study and a total of 48 genotyped individuals obtained from partial outcrossing and selfing germplasm, the SSR markers detected a total of 237 alleles. The least and the most variable loci displayed 2 (CP06, CP09, CP11, CP20, CP22, CP23, CP24, CP27, CP28, CP29, CP33, CP36 and CP38) and 11 (CP16) alleles, respectively (Table 3). The average allele number per locus was 4.02 (monomorphic loci excluded). Table 3 summarizes the locus specific descriptive statistics for the 59 SSR markers.

Table 3 Allelic composition, allele size variation (bp), expected and observed heterozygosity, and polymorphic information content (PIC) of the 59 polymorphic SSR loci in 48 individuals of Carica papaya

Expected heterozygosity under HWE was nominally larger than the observed heterozygosity for all loci, except at CP31 and CP48. Only loci CP21, CP33, and CP38 were found to be in HWE. The observed heterozygosity ranged from 0.00 (CP09, CP14, CP22, CP27, CP47, CP52, CP55, CP57, CP59, CP63, CP68, CP69, CP72, CP73, CP80, CP83, CP89, CP94, CP95, CP97, and CP100) to 0.85 (CP31), and gene diversity (H_E) from 0.08 (CP22) to 0.82 (CP16). The CP16 locus showed the highest polymorphism information content (PIC = 0.81) and the CP22 locus the lowest (PIC = 0.08), with an average PIC of 0.53.

When looking at SSR classes and motifs, the compound SSRs showed higher allele numbers (average 4.27 per locus) and PIC values (average 0.55 per marker) followed by dinucleotide (average alleles, 4.08 per locus; PIC value average 0.52 per marker) and trinucleotide SSRs (average alleles, 3.2 per locus; PIC value, average 0.47 per marker). Among dinucleotide SSRs, AG/TC or GA/CT repeat motifs exhibited more informativeness (average alleles, 4.56 per locus and PIC value, average 0.59 per marker) as compared with TA/AT repeat motifs (average alleles, 4.0 per locus and PIC value, average 0.50 per marker) and to GT/CA or TG/AC repeat motifs (average alleles, 3.5 per locus and PIC value, average 0.52 per marker) (Table 3).

Allelic Composition

The allelic composition revealed that rare alleles were represented by < 1.3% of the total number of alleles detected. Of the 237 alleles detected, three were rare, 115 common, and 119 most frequent alleles. Rare alleles were detected from CP16, CP30, and CP53. Common alleles were detected at 46 SSR loci, with an average of 1.95 alleles. In contrast, all SSR loci detected 1 to 3 most frequent alleles in the individuals (Table 3).

This study detected unique alleles within groups (Table 3). The germplasm accessions contained the largest number of unique alleles (46), while 14 unique alleles differentiated landraces from the papaya germplasm. Of the total alleles detected, papaya germplasm and landraces shared 74.7% of alleles (177).
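The allele counts reported in this section and in the previous one can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative. Note that the reported average of 1.95 common alleles matches a division by all 59 polymorphic loci, which is how it is computed here.

```python
total_alleles = 237
polymorphic_loci = 59

rare, common, most_frequent = 3, 115, 119
unique_accessions, unique_landraces, shared = 46, 14, 177

# Both partitions of the allele set should add up to the total.
assert rare + common + most_frequent == total_alleles
assert unique_accessions + unique_landraces + shared == total_alleles

mean_alleles_per_locus = round(total_alleles / polymorphic_loci, 2)
rare_percent = round(100 * rare / total_alleles, 2)
shared_percent = round(100 * shared / total_alleles, 1)
mean_common_per_locus = round(common / polymorphic_loci, 2)

print(mean_alleles_per_locus)   # 4.02
print(rare_percent)             # 1.27
print(shared_percent)           # 74.7
print(mean_common_per_locus)    # 1.95
```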

Genetic Diversity

Based on the unique DNA fingerprint profiles of each genotype obtained by the polymorphic markers, a dendrogram was constructed to understand the relationships among the germplasm and landraces surveyed. The neighbor-joining cluster analysis based on shared-allele distance could successfully differentiate all the papaya accessions (group A) and landraces (group B) included in this study (Fig. 1) with the high bootstrap value of 95%. Clearly, four subgroups could be seen in germplasm accessions while landraces presented just two.
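The shared-allele distance underlying the tree can be sketched as one minus the average proportion of alleles shared between two diploid individuals across loci. This is an assumption about the general form of the measure, not a reproduction of the POWERMARKER code, and the individual names and genotypes are hypothetical.

```python
from itertools import combinations


def shared_allele_distance(ind_a, ind_b):
    """Each individual is a list of diploid genotypes, one (allele1, allele2) tuple per locus.

    Loci with missing data (None) in either individual are skipped.
    """
    shared = 0.0
    loci = 0
    for ga, gb in zip(ind_a, ind_b):
        if ga is None or gb is None:
            continue
        remaining = list(gb)
        matches = 0
        for allele in ga:
            if allele in remaining:
                remaining.remove(allele)   # each allele copy can be matched only once
                matches += 1
        shared += matches / 2              # proportion of the two allele copies shared
        loci += 1
    if loci == 0:
        raise ValueError("no locus scored in both individuals")
    return 1.0 - shared / loci


def distance_matrix(individuals):
    """individuals: dict mapping a name to its list of genotypes; returns {(name1, name2): d}."""
    return {(a, b): shared_allele_distance(individuals[a], individuals[b])
            for a, b in combinations(individuals, 2)}


# Hypothetical data: three individuals scored at two loci.
individuals = {
    "acc1": [(200, 200), (150, 152)],
    "acc2": [(200, 204), (150, 152)],
    "landrace1": [(204, 204), (160, 160)],
}
for pair, d in distance_matrix(individuals).items():
    print(pair, d)
```

A matrix of this kind is the input that a neighbor-joining routine would take to build the dendrogram.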

Fig. 1 Neighbor-joining tree based on the shared allele distance of 59 SSR markers using 30 germplasm accessions at PGM-CNPMF (code CMF) and 18 landraces (code M). A1, A2, A3, and A4 are subgroups of the A group clustered by germplasm accessions; B1 and B2 are subgroups of B group that contained only papaya landraces. Distance bootstrap values are given together with subgroups in parenthesis

Subgroup A1 comprised all gynodioecious accessions, being one cultivar (CM128), eight improved germplasm (CMF068, CMF082, CMF108, CMF115, CMF125, CMF138, CMF142 and CMF143) and five unimproved germplasm (CMF054, CMF058, CMF101, CMF102 and CMF129). Subgroup A2 had two improved (CMF008 and CMF038) and three unimproved germplasm (CMF011, CMF017 and CMF188), whereas subgroup A3 was formed by one improved (CMF123) and six unimproved germplasm (CMF134, CMF135, CMF157, CMF165, CMF189 and CMF191). The other cultivar analyzed (CMF024) was clustered with three germplasm accessions in subgroup A4. Subgroups A1, A2, A3 and A4 were clustered with relatively low bootstrap support values (36%, 25%, 28%, and 30%, respectively), but subgroups B1 and B2 had moderate bootstrap support values (48% and 38%, respectively).

The genotypes belonging to landraces collected in the State of Bahia (Brazil) were grouped in two clusters (B1 and B2), with 14 and four individuals, respectively. No specific grouping was observed for the improved and unimproved cultivars, which were dispersed across all subgroups.

Discussion

Microsatellite Informativeness

SSR markers have been extensively used for DNA fingerprinting and elucidating genetic relationships within plant species (Paniego et al. 2002; Stajner et al. 2005). The ability to distinguish between closely related genotypes is a function of the high heterozygosity values of SSR markers. The set of SSR markers characterized in this study proved to be useful for a broad genetic analysis of C. papaya.

Previous studies have shown that papaya contains abundant SSRs (Santos et al. 2003; Ocampo Pérez et al. 2006; Eustice et al. 2008; Oliveira et al. 2008a), but the characterization of SSR markers in C. papaya germplasm is not yet well studied. Here, we observed relatively high levels of multiallelism at all 59 SSR loci analyzed. The mean number of alleles per locus (4.02, for 59 loci) and expected heterozygosity (mean of 0.59) were higher than those reported by Ocampo Pérez et al. (2006), who used 26 polymorphic markers and observed 3.8 alleles per locus and H_O and H_E values of 0.42 and 0.57, respectively, but intermediate relative to samples collected in Guadeloupe, Venezuela, Colombia, Barbados and Costa Rica, which showed 6.6 alleles per locus for 15 loci, and H_E values between 0.37 and 0.69 (Ocampo Pérez et al. 2007).

The number of alleles per locus reported in our study is most likely a minimum value. The lower number of alleles can be due to the relatively few samples analyzed and to the different mating systems of these genotypes. Half of the accessions were gynodioecious, a mating system that is almost exclusively inbreeding and tends to decrease the allele number per locus. The number of alleles should increase when all germplasm accessions of the PGM of CNPMF are sampled.

Except CP11 and CP40, all SSR described here are derived from a genomic sequence and showed considerable polymorphism. This is in agreement with Eustice et al. (2008), who found a relatively high level of polymorphism in the genomic rather than the genic region, using seven papaya accessions (SunUp, Kapoho, 2H94, UH918, Kaek Dum, UH928, and AU9) to screen SSR polymorphisms. As in other crops, the selective pressures from breeding can significantly reduce genetic diversity in the target genes or genic regions, while levels of genetic diversity in the genomic region remain high.

The broad ranges of observed heterozygosity (0.00 to 0.85) and expected heterozygosity (0.08 to 0.82) result from the broad variation in number of alleles per locus and allele frequency distribution within genotypes. The zero values of H_O obtained at twenty-one loci could be explained by the low number of alleles at these loci (2 to 6 alleles) as well as their combination in the homozygous state. Loci with smaller numbers of alleles or with a skewed frequency distribution, such as CP06, CP09, CP11, CP20, CP22, CP23, CP24, CP27, CP28, CP29, CP33, CP36, and CP38, tend to have lower heterozygosity values and, consequently, a lower probability of paternity exclusion when studying natural populations.

Ninety-five percent of the SSR markers surveyed for heterozygosity (56 of 59) deviated significantly from Hardy–Weinberg equilibrium. In all cases, the deviation was in the direction of reduced heterozygosity, which may be due to inbreeding, the presence of null alleles, natural or artificial selection favoring homozygosity at particular loci, and population bottlenecks. Such results have been commonly observed in surveys of other species (Carrasco et al. 2009). A possible reason for high levels of inbreeding in C. papaya is the elimination of the male plants in trioecious accessions (male, female, and hermaphrodite plants) and self-compatibility, which stimulates crossing among related individuals and increases the degree of selfing.
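One common way to express a heterozygote deficit of this kind is the fixation index F = 1 − H_O/H_E, which is 0 when observed and expected heterozygosity agree and 1 when no heterozygotes are observed. The sketch below is illustrative only: the fixation index is not a statistic reported in this study, the function name is hypothetical, and the example values are the H_O and H_E cited above from Ocampo Pérez et al. (2006).

```python
def fixation_index(h_obs, h_exp):
    """F = 1 - H_O / H_E; positive values indicate a deficit of heterozygotes."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Values cited in the text from Ocampo Perez et al. (2006): H_O = 0.42, H_E = 0.57.
print(round(fixation_index(0.42, 0.57), 3))

# A locus with no observed heterozygotes gives the maximum deficit.
print(fixation_index(0.0, 0.5))
```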

In relation to polymorphism, microsatellites of papaya showed enrichment for di- and trinucleotide SSR repeat motifs (Eustice et al. 2008; Oliveira et al. 2008a). Overall, AT/TA is the predominant dinucleotide motif, and AAT/TTA is the predominant trinucleotide motif. Previous surveys of microsatellite abundance in plant genomes have shown AT as the most frequently occurring dinucleotide repeat motif, followed by AG/TC and GT/CA (Condit and Hubbel 1991; Powell et al. 1996; Yonemaru et al. 2009). Compound SSRs, mainly from different dinucleotide repeats, and dinucleotide repeats were more informative (higher average number of alleles per locus and higher PIC). Among dinucleotides, although the AT/TA-rich motif is predominant, AG/TC or GA/CT repeat motifs were more informative. The same pattern was observed in other species (Ferguson et al. 2004; Moretzsohn et al. 2005).

Diversity Structuring

Our study based on SSR analysis clearly revealed genetic diversity in C. papaya germplasm samples and some landraces cultivated by farmers in the State of Bahia. The neighbor-joining tree broadly separated germplasm accessions from landraces with high bootstrap support (95%). However, the clusters within each group were poorly supported by bootstrap values (Fig. 1). A low bootstrap value means that a grouping is sensitive to the combination of loci that is evaluated, implying that more data may alter the grouping.
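The idea of bootstrap support obtained by re-sampling loci can be sketched as follows. This is a simplified illustration, not the MEGA procedure: instead of rebuilding a neighbor-joining tree in each replicate, it applies a simple separation test (every within-group distance smaller than every between-group distance) as a stand-in for recovering a clade. All names and data are hypothetical.

```python
import random


def mean_distance_matrix(per_locus, locus_indices):
    """Average the per-locus distance matrices over the sampled loci.

    per_locus[l][i][j] is the distance between individuals i and j at locus l.
    """
    n = len(per_locus[0])
    m = len(locus_indices)
    return [[sum(per_locus[l][i][j] for l in locus_indices) / m for j in range(n)]
            for i in range(n)]


def groups_separate(dist, group_a, group_b):
    """Stand-in criterion: all within-group distances are below all between-group distances."""
    within = [dist[i][j] for g in (group_a, group_b) for i in g for j in g if i < j]
    between = [dist[i][j] for i in group_a for j in group_b]
    return max(within, default=0.0) < min(between)


def bootstrap_support(per_locus, group_a, group_b, replicates=1000, seed=0):
    """Percentage of replicates (loci drawn with replacement) in which the groups separate."""
    rng = random.Random(seed)
    n_loci = len(per_locus)
    hits = 0
    for _ in range(replicates):
        sample = [rng.randrange(n_loci) for _ in range(n_loci)]
        if groups_separate(mean_distance_matrix(per_locus, sample), group_a, group_b):
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical data: 4 individuals, 3 loci that all separate {0, 1} from {2, 3}.
consistent_locus = [[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [1, 1, 0, 0],
                    [1, 1, 0, 0]]
per_locus = [consistent_locus] * 3
print(bootstrap_support(per_locus, [0, 1], [2, 3], replicates=200))
```

When every locus agrees on the split, each replicate recovers it and support is 100%. When only some loci carry the signal, the outcome depends on which loci are drawn, which is why weakly supported subgroups may change as more loci are added.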

The main cluster (group A) is composed of germplasm accessions of C. papaya subdivided into four smaller clusters. The smaller one (group B) consists only of landraces, which form two subgroups. The grouping of genotypes representing landraces in different clusters is in agreement with their origin, and is important as a first indication of the genetic background of most of the germplasm of PGM at CNPMF.

The genetic variability in landraces grown by small farmers in the region of Recôncavo of Bahia, where the landraces were obtained, was maintained over generations. This variability is quite different from the one in the PGM, because papaya accessions from the Recôncavo of Bahia have not been deposited in the germplasm bank. The additional genetic variability will be used in breeding programmes to enhance the diversity of breeding populations for selection gains in the future.

Ocampo Pérez et al. (2007) analysed genotypes from Costa Rica, Colombia, Venezuela, Guadeloupe and Antillean islands using SSR markers and PCO analysis, and found that, with few exceptions, they were clustered according to their geographic origin. In our study, although there are few accessions belonging to different countries, no high correlation between the clustering pattern and the geographical location was observed. The Brazilian accessions were clustered in all subgroups. The accessions from South Africa and Hawaii grouped into subgroup A1 and those from Thailand into subgroup A3, whereas accessions from Costa Rica and Malaysia grouped into subgroups A2 and A4, and those from Taiwan into subgroups A1 and A2 (Fig. 1).

The 30 germplasm accessions and 18 landraces used for characterization of SSR polymorphism allowed the detection of considerable genetic variability, as shown by the six diversity subgroups (Fig. 1). The average similarity based on shared allele distance between accessions was 0.44, which is very similar to the value of 0.48 obtained by Ocampo Pérez et al. (2007), using Dice distance. Previous reports using dominant markers, such as RAPD (Stiles et al. 1993) and AFLP (Kim et al. 2002), showed values of 0.78 (Jaccard distance) and 0.88 (Dice distance), respectively. Thus, the SSR markers revealed more polymorphism than the other nuclear DNA markers used so far in papaya (Ocampo Pérez et al. 2007).

Most of the observed variation occurred among accessions. All genotypes present in subgroups A1 and A4 are gynodioecious, whereas those in B1 and B2 are dioecious. In addition, the A2 and A3 subgroups are composed of gynodioecious and dioecious genotypes. Gynodioecious genotypes tend to show lower genetic variability within accessions than dioecious ones, because self-compatibility increases inbreeding.

Many aspects, such as the breeding system, seed and pollen dispersal, plant longevity and agricultural practices, influence the genetic diversity, including the proportion of variation distributed within and between populations (Hamrick and Godt 1996). Kim et al. (2002) analysed the genetic relationships among C. papaya cultivars, breeding lines, unimproved germplasm, and related species using AFLP markers and suggested limited genetic variation in papaya, with smaller genetic diversity within the same gene pools, such as gynodioecious and dioecious cultivars. According to Carrasco et al. (2009), the genetic diversity of Vasconcellea pubescens, a species of the Caricaceae family, was remarkably low using ISSR (Inter-Simple Sequence Repeat) markers. Most of the genetic diversity was found within groups (65%) when samples from the South and the North of Chile were analyzed together. In addition, samples from the South and the North analyzed separately showed that 82% and 60% of genetic diversity were within groups, respectively.

Efforts are made to maintain the genetic diversity available for papaya breeding through the stocks in germplasm banks and to study natural variation. Although artificial by their nature and subjected to several forces, including genetic drift and human-driven artificial selection, germplasm collections play an important role in the way the banks are structured. In this study, we found a great depth of allelic diversity among landraces and germplasm, and this diversity is not distributed randomly among the genotypes but rather structured into two groups. Thus, many landraces will be incorporated into the PGM at CNPMF to increase the stored variability for future use in conservation and breeding programs.

SSR in Genetic Studies of C. papaya

SSRs are powerful molecular biology tools that have been used for a wide variety of applications. Although good estimators of population genetic parameters have been presented for dominant markers (Lynch and Milligan 1994; Zhivotovsky 1999), the codominant nature of SSR allows more precise calculations of heterozygosity, Hardy-Weinberg equilibrium, differentiation and gene flow (Lowe et al. 2004). Polymorphisms based on SSR are more powerful for estimating genetic parameters of populations and for understanding detailed patterns of gene flow and parentage composition (Dow and Ashley 1996; Collevatti et al. 1999; Ren et al. 2008). In addition, SSR markers are more useful for genetic mapping (Oliveira et al. 2008b; Wang et al. 2009) and ancestry studies (Guilford et al. 1997; Gianfranceschi et al. 1998).

The papaya microsatellite loci published here provide an abundant set of genetic markers for detailed studies of population genetic structure, hybridization among populations, paternity, migration, phylogeography of the genus Carica, creation of linkage and physical maps, and location of genes of interest, particularly those functioning as QTL, encoding important agricultural traits such as disease resistance, yield, fruit type, and fruit size. Specifically, we are now using these markers to understand the genetic variability of 160 papaya germplasm accessions collected from different countries, and that are part of the PGM at CNPMF.