Introduction

Watermelon, Citrullus lanatus (Thunb.) Matsum. & Nakai, is an important vegetable fruit crop. Total yield in 2009 was over 100 million tons, accounting for 10.7% of total vegetable production (FAO Statistics 2009, http://www.fao.org/). China is currently the top watermelon producer, with 68.2 million tons in 2009, which was 67.7% of world production. Due to its economic and nutraceutical importance, watermelon is gaining popularity throughout the world (Padmavathi et al. 2010). Each year, a large number of varieties are released and commercialized (Wehner 2002). However, genetic diversity within watermelon cultivars is low. For example, Levi et al. (2001b) found high genetic similarity values (92–99%) among watermelon cultivars, which makes them especially vulnerable to the many diseases and insect pests of watermelon. Therefore, in China, there is increasing interest from both growers and watermelon breeders in introducing new germplasm from different geographical regions to enhance watermelon genetic diversity.

Watermelon belongs to the genus Citrullus (family Cucurbitaceae) and is the only cultivated species in this genus (Bisognin 2002). It is believed to have originated in Africa (Simmonds 1979) but is now widely spread throughout the world. The primary gene pool for watermelon breeding includes three major forms: C. lanatus var. lanatus (C. l. var. lanatus hereinafter), C. l. var. citroides and C. colocynthis (Jeffrey 2001). Citrullus lanatus var. lanatus includes both wild and cultivated types (Jeffrey 2001). Citrullus lanatus var. citroides, the cow-melons (Tsamma and citron types) are useful as rootstocks for their disease resistances (Dane and Lang 2004; Thies et al. 2010). Citrullus colocynthis, the bitter apple grown for medicinal purposes, is considered a likely ancestor of watermelon (Burkill 1985; Navot and Zamir 1987; Jarret et al. 1997). All of them are valuable genetic resources in watermelon improvement.

Assessment of genetic variation is important not only for watermelon breeding but also for efficient management and protection of germplasm resources (Krishna et al. 2004). Meanwhile, cultivar identification is necessary for protecting breeders’ rights. Detection and utilization of genetic variation, together with cultivar identification, are therefore important tasks for watermelon breeders.

Traditional evaluation of genetic variation and germplasm management is based on a set of morphological descriptors, which, however, are limited in number and often influenced by the environment. In addition, many modern varieties and hybrids are phenotypically less distinct, making morphological evaluation more difficult. It is usually difficult to classify more than 200 accessions at one time based solely on their morphological characteristics. Therefore, reliable methods need to be developed for the assessment of genetic diversity and for cultivar identification (Naito et al. 2008).

Over the years, a number of molecular marker systems have been used for watermelon genetic diversity and cultivar identification studies including isozymes (Navot and Zamir 1987), random amplification of polymorphic DNA (RAPD) (Williams et al. 1990), amplified fragment length polymorphism (AFLP) (Vos et al. 1995), sequence-related amplified polymorphism (SRAP) (Li and Quiros 2001), and high frequency oligonucleotides targeting active genes (HFO-TAG) (Levi et al. 2010). Analysis using isozymes (Navot and Zamir 1987) and RAPD markers (Levi et al. 2000; Harris et al. 2008) revealed low DNA polymorphism among cultivated watermelons. For example, of 1,309 RAPD markers, only 75 were polymorphic (5.9%) between two watermelon breeding lines (Harris et al. 2008). Other marker systems such as AFLP (Che et al. 2003; Levi et al. 2004), SRAP (Levi et al. 2006; Zhang et al. 2008), EST-PCR (Levi et al. 2009) and HFO-TAG (Levi et al. 2010) produced slightly higher polymorphism among watermelon cultivars than the RAPD markers, but all in general were not breeder friendly. Results from these genetic assays were hardly comparable due to lack of a common core set of reference genotypes and the use of different marker systems. Therefore, it is desirable to develop a more robust, efficient and cost effective marker system that allows integration and amalgamation of data sets gathered at different times and in different laboratories.

Among all marker systems, the single nucleotide polymorphism (SNP) (Coryell et al. 1999) is considered the best marker for marker-based studies. However, at present, the high cost associated with SNP genotyping prevents its wide use by most watermelon breeding programs. It is especially impractical for small laboratories in developing countries; a gap exists between SNP discovery and reduction to practice (Appleby et al. 2009). At present, the marker of choice is the microsatellite or simple sequence repeat (SSR) (Tautz 1989). SSRs offer a number of advantages, such as their co-dominant and multi-allelic nature and high reproducibility (Oliveira et al. 2006; Ibrahim et al. 2010), which make them very suitable for accession discrimination and assessment of genetic variation (Ebana et al. 2008).

In watermelon, SSRs have recently been used for germplasm characterization (Jarret et al. 1997; Guerra-Sanz 2002; Mujaju et al. 2010), evaluation of genetic diversity (Padmavathi et al. 2010), germplasm management (Zhang et al. 2010), cultivar identification (Levi et al. 2009), and genetic mapping (Levi et al. 2006). For cultivar identification, evaluation of genetic diversity or whole genome genetic mapping studies, full coverage of the genome by representative SSRs is needed. In practice, however, it is neither practical nor economical to test hundreds of SSRs (Varshney et al. 2008). Identification of a core set of representative markers with adequate resolving power and genome coverage might be a method of choice for this purpose.

Compared with many other crops (Xie et al. 2010; Huang et al. 2009; Ebana et al. 2008; Federico et al. 2008), genome-wide SSRs in watermelon are limited. However, this situation is changing rapidly due to technology development, especially in whole plant genome sequencing. In 2008, the National Engineering Research Center for Vegetables (NERCV) of Beijing, China and its collaborators initiated the International Watermelon Genome Project to sequence the watermelon genome. Using the next-generation Illumina Genome Analyzer (GA) sequencing technology, an elite East Asian ecotype watermelon inbred line, 97103 (C. l. var. lanatus), was sequenced and a draft genome assembly is near completion (Xu et al. manuscript in preparation). To explore genetic diversity for watermelon improvement, we also re-sequenced the genomes of 16 additional watermelon genotypes, including three plant introduction (PI) lines of C. l. var. citroides, two PIs and ten cultivars of C. l. var. lanatus, and one PI of C. colocynthis. This effort has led to genome-wide discovery of SSRs and SNPs. So far, 3,889,080 SNPs and 13,744 SSRs have been detected. Using these markers, a high-density watermelon genetic map was developed. These resources provide us with unprecedented opportunities to explore the watermelon genome for breeding purposes.

The objective of this study was to develop a core set of microsatellite markers that can be used for evaluation of genetic diversities of watermelon collections. We first built a dendrogram of the 17 watermelon genotypes with 3.9 million SNPs. Using this dendrogram as a calibration, we identified a core set of 23 SSRs from 704 SSRs that have been mapped in our high-resolution genetic map. The utility of this core set of SSRs was further demonstrated in 100 commercial cultivars and elite lines.

Materials and methods

Plant materials

Two sets of materials were employed in this study. The first set included 17 Citrullus genotypes with diverse genetic backgrounds and horticultural traits, which have been used in whole genome re-sequencing in the senior author’s lab. The characteristics of these 17 lines are summarized in Table 1. They were chosen as representatives of the watermelon collection with maximum diversity.

Table 1 Characteristics of 17 watermelon lines used for whole genome re-sequencing and genetic diversity analysis

The second set included 100 commercial cultivars and elite lines which were used to assess the potential utility of the core set SSR markers developed in this study. Twenty of these lines were closely related based on their pedigrees. The names and seed sources of these 100 accessions are provided in Table 2.

Table 2 List of watermelon cultivars or breeding lines genotyped by SSR markers

SNP discovery through whole genome re-sequencing in watermelon

We conducted whole genome sequencing and assembly of an elite East Asian ecotype watermelon inbred line 97103 using Illumina GA (Illumina, Inc., San Diego, USA) at Beijing Genome Institute (BGI)-Shenzhen. The whole genome was sequenced at 107.4× coverage of the 97103 haploid genome (430 Mbp), and the assembly contained 184 N90 scaffolds (Xu et al. manuscript in preparation). This 97103 watermelon draft genome assembly was used as the reference for SNP discovery.

An additional 16 genotypes (Table 1) were re-sequenced at 6.3–18.8× depth each using the Illumina GA platform. All high quality (HQ) GA reads of these 16 watermelon lines were aligned against the 97103 draft reference genome using SOAPaligner (Li et al. 2009a) with the following criteria: if an original read could not be aligned to the reference sequence, the first nucleotide at the 5′ end and two nucleotides at the 3′ end were removed and the modified read was realigned to the reference again. If there was no alignment, two more nucleotides at 3′ end were removed from the sequence. This procedure was repeated until alignment was achieved or until the read length was shorter than 27 bp. The average sequencing depth and coverage was calculated using the alignment results.
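The iterative trimming rule described above can be illustrated with a minimal sketch. It is not the SOAPaligner implementation: `try_align` is a hypothetical stand-in for the aligner, and `toy_exact_aligner` is an assumed exact-match helper used only to make the example runnable.

```python
from typing import Callable, Optional, Tuple

MIN_READ_LENGTH = 27  # trimming stops once the read is shorter than this (from the text)


def align_with_trimming(
    read: str, try_align: Callable[[str], Optional[int]]
) -> Optional[Tuple[int, str]]:
    """Return (reference position, read as aligned), or None if nothing aligns.

    `try_align` is a hypothetical callable: it returns a reference position
    when the read aligns and None otherwise.
    """
    hit = try_align(read)
    if hit is not None:
        return hit, read
    # First retry: remove 1 nt at the 5' end and 2 nt at the 3' end.
    read = read[1:-2]
    while len(read) >= MIN_READ_LENGTH:
        hit = try_align(read)
        if hit is not None:
            return hit, read
        # Later retries: remove 2 more nt from the 3' end.
        read = read[:-2]
    return None


def toy_exact_aligner(reference: str) -> Callable[[str], Optional[int]]:
    """Assumed toy aligner: exact substring search against the reference."""
    def try_align(read: str) -> Optional[int]:
        pos = reference.find(read)
        return pos if pos >= 0 else None
    return try_align


reference = "ACGT" * 20
aligner = toy_exact_aligner(reference)
# One mismatching base at the 5' end and two at the 3' end.
noisy_read = "T" + reference[10:45] + "GG"
result = align_with_trimming(noisy_read, aligner)
unalignable = align_with_trimming("A" * 40, aligner)
```

In this toy case the full read fails to align, while the read with one 5′ base and two 3′ bases removed aligns; the all-A read is trimmed down to below 27 bp and discarded.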

For SNP detection among the 17 watermelon lines, the following three steps were taken. (1) The computer program SOAPSNP (Li et al. 2009b) was used to calculate the likelihood of each individual’s genotype based on the alignment results. A Bayesian model was applied to calculate the probability of each individual’s genotype given the observed data; (2) All the individual likelihood files were integrated to produce a pseudo-genome for each site in all 17 samples by maximum likelihood estimation (MLE). Sites meeting the criteria for copy number, sequencing depth, quality score and minor allele count were kept for the subsequent rank sum test adjustment. SNPs passing the rank sum test (P ≥ 0.005) were fixed as members of the HQ SNP set; (3) Base types were allocated back to each individual based on the genotypes of the HQ SNPs and each individual likelihood file. The genotype with the highest likelihood was chosen as the consensus genotype in each individual. Overall, of the total 64.2 Gb of genomic DNA sequence generated, 84.4% was aligned to the reference genome. The average sequencing depth was 10.2× and 95.8% of the reference genome was covered by the re-sequenced reads. In total, 3,889,080 SNPs were detected across the genomic sequences among the 17 genotypes. The details of the re-sequencing data for the 16 lines are shown in Table 3.

Table 3 Polymorphisms of 16 Citrullus genotypes as compared with the watermelon line 97103 based on whole genome re-sequencing

Strategies to develop a core set of SSR markers in watermelon

To identify a core set of SSR markers that can be used for evaluation of genetic diversity in the majority of watermelon collections, we first developed a dendrogram of the 17 re-sequenced watermelon lines based on the nearly 3.9 million SNPs, which was then used as a model for validating the efficacy of the SSR markers selected. Bioinformatic analyses of the 97103 draft genome identified 13,744 SSRs, 3,759 insertion and deletion (InDel) polymorphisms, and 584 structural variation (SV) markers. These markers were used for the construction of a high-density watermelon linkage map using 103 F8 recombinant inbred lines (RILs) derived from 97103 × PI 296341-FR (Ren et al. manuscript in preparation). Line 97103 belongs to C. l. var. lanatus and PI 296341-FR is an accession of C. l. var. citroides. The high-resolution map includes 704 SSRs, 219 InDels and 36 SVs spanning ~800 cM. The 704 SSRs were the starting marker set for developing the core set of SSRs. Our objective was to sequentially eliminate the SSR markers with low power of discrimination to arrive at a final combination of markers that can be used for genetic diversity analysis.

The 704 SSRs that were polymorphic between 97103 and PI 296341-FR were further tested for polymorphism between two elite cultivated watermelon lines, 97103 and Sugarlee (Crall and Alstrom 1983). 97103 is an East Asian ecotype whereas Sugarlee is a North American ecotype. The underlying reasoning was that watermelon has a narrow genetic base (Levi et al. 2001a, b). Consequently, markers that were polymorphic between the two cultivated elite lines would likely be more useful for genetic diversity studies of other cultivated lines. As a result, 78 SSRs that were polymorphic between 97103 and Sugarlee were identified.

Next, an unweighted pair-group method with arithmetic mean (UPGMA) dendrogram (Sneath and Sokal 1973) was built using the 78 SSRs for the 17 watermelon genotypes (Table 1) to evaluate their ability to differentiate these watermelon lines. Since the 78 SSRs provided the same resolution as the SNP data in resolving these 17 lines, they were used as candidates for selection of the core SSR set. Dendrograms were built by sequentially decreasing the number and varying the combinations of SSRs. A final core set was determined based on the following criteria: (1) the SSR-based UPGMA dendrogram resembled the SNP-based dendrogram; (2) the polymorphism information content (PIC) value was over 0.45; (3) the SSRs were evenly distributed across the watermelon genome, with at least one marker from each linkage group; (4) the marker was easy to use in PCR assays (clear PCR product under standard reaction conditions); (5) each SSR detected a single locus. Eventually, 23 SSRs meeting these criteria were selected. Information (marker name, primer sequences, PIC) on all 23 SSR markers is listed in Table 4. Lastly, the utility of this core set of 23 SSRs was validated in 100 commercial cultivars and elite lines.
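Criteria (2) to (5) lend themselves to a simple per-marker filter, sketched below. The record fields (`name`, `lg`, `pic`, `single_locus`, `clear_pcr`) and the marker names are hypothetical, and the threshold is applied as "at least 0.45" on the assumption that the reported minimum PIC of 0.45 passes. Criterion (1), agreement with the SNP-based dendrogram, still has to be checked by rebuilding the dendrogram for each candidate combination and is not covered here.

```python
from collections import defaultdict


def filter_candidates(markers, min_pic=0.45, n_linkage_groups=11):
    """Keep markers passing the per-marker criteria and report uncovered linkage groups.

    `markers` is a list of dicts with hypothetical keys:
    name, lg (linkage group number), pic, single_locus, clear_pcr.
    """
    kept = [
        m for m in markers
        if m["pic"] >= min_pic and m["single_locus"] and m["clear_pcr"]
    ]
    by_lg = defaultdict(list)
    for m in kept:
        by_lg[m["lg"]].append(m["name"])
    # Linkage groups left without any marker violate criterion (3).
    missing = [lg for lg in range(1, n_linkage_groups + 1) if lg not in by_lg]
    return kept, missing


# Hypothetical records, for illustration only.
example_markers = [
    {"name": "M1", "lg": 1, "pic": 0.66, "single_locus": True, "clear_pcr": True},
    {"name": "M2", "lg": 1, "pic": 0.30, "single_locus": True, "clear_pcr": True},
    {"name": "M3", "lg": 2, "pic": 0.50, "single_locus": False, "clear_pcr": True},
]
kept, missing = filter_candidates(example_markers, n_linkage_groups=2)
```

Here only `M1` survives, and linkage group 2 is flagged as uncovered, which would prompt a search for another candidate on that group.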

Table 4 Information of 23 core set SSR markers for watermelon genetic diversity analysis

SSR marker analysis

All tested watermelon lines were grown in the greenhouse at 25/18°C day/night temperatures. For each line, young tender leaves were collected from three 15-day old seedling plants and pooled, and stored at −80°C for DNA isolation following Levi and Thomas (1999).

Each 15 μl PCR reaction mixture contained 20 ng template DNA, 4 μM each of the left and right primers, 2.5 mM MgCl2, 2 mM each of dNTPs, 0.5 U Taq DNA polymerase in 1× PCR buffer (Takara Company, China). The PCR reaction started with 94°C for 5 min followed by 35 cycles of 94°C for 20 s, 55°C for 20 s, and 72°C for 90 s with a final extension at 72°C for 8 min. The PCR products were analyzed using 6% polyacrylamide gel electrophoresis in 1× TBE buffer. The gel was stained with silver staining using the SILVER SEQUENCE DNA Sequencing System (Promega, Madison, WI, USA).

Data analysis

For genetic diversity analysis of SNP data, individual SNPs were used to calculate genetic distances between watermelon lines. The p-distance between two individuals i and j (Tamura et al. 2004) was defined as follows.

$$ D_{ij} = \frac{1}{L}\sum\limits_{l = 1}^{L} d_{ij}^{(l)} $$

where L is the length of the region in which HQ SNPs were identified. For A/C alleles at position l,

$$ d_{ij}^{(l)} = \begin{cases} 0, & \text{if genotypes of the two individuals are AA and AA} \\ 0.5, & \text{if genotypes of the two individuals are AA and AC} \\ 0.5, & \text{if genotypes of the two individuals are AC and AC} \\ 1, & \text{if genotypes of the two individuals are AA and CC} \end{cases} $$

The neighbor-joining method (Saitou and Nei 1987) was then used to construct the dendrogram on the basis of the distance matrix calculated by the software MEGA5 (Tamura et al. 2011). Bootstrap values were calculated in 1,000 replicates.
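As an illustration of the p-distance defined above, the following minimal sketch scores each SNP site with the case table given in the text and divides the sum by the region length L. Treating AC versus CC the same as AA versus AC (0.5) is an assumption made by symmetry; the function names are illustrative, and this is not the MEGA5 implementation.

```python
def site_distance(g1: str, g2: str) -> float:
    """Per-site distance for a biallelic SNP, following the case table in the text."""
    a, b = sorted(g1), sorted(g2)
    heterozygous_1 = a[0] != a[1]
    heterozygous_2 = b[0] != b[1]
    if heterozygous_1 or heterozygous_2:
        # AA vs AC and AC vs AC are both 0.5 in the text;
        # AC vs CC is assumed to be 0.5 by symmetry.
        return 0.5
    # Both homozygous: identical genotypes score 0, opposite ones score 1.
    return 0.0 if a == b else 1.0


def p_distance(genotypes_i, genotypes_j, region_length: int) -> float:
    """Sum of per-site distances over SNP sites divided by the region length L."""
    if len(genotypes_i) != len(genotypes_j):
        raise ValueError("both individuals need genotypes at the same sites")
    total = sum(site_distance(x, y) for x, y in zip(genotypes_i, genotypes_j))
    return total / region_length


# Four SNP sites in a hypothetical 10-bp region.
line_i = ["AA", "AA", "AC", "AA"]
line_j = ["AA", "AC", "AC", "CC"]
d = p_distance(line_i, line_j, region_length=10)  # (0 + 0.5 + 0.5 + 1) / 10
```

Sites without an HQ SNP contribute zero to the sum, so only the SNP genotypes need to be listed while L still counts every position in the region.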

Polymorphic SSR markers were scored for the presence or absence of the corresponding bands among the tested accessions. Stutter and background bands were excluded. The scores ‘1’ and ‘0’ indicated the presence and absence of the bands, respectively. The polymorphism information content (PIC) of each marker was calculated according to Smith et al. (1997) as follows: PIC = 1 − ΣPi², where Pi is the band frequency of the ith allele.
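The PIC formula can be sketched in a few lines. The input format (one allele call per accession at a single locus) and the allele labels are assumptions made for illustration.

```python
from collections import Counter


def pic(allele_calls) -> float:
    """PIC = 1 - sum of squared allele frequencies at one locus."""
    counts = Counter(allele_calls)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical locus scored in four accessions with two equally frequent alleles.
two_alleles = pic(["a", "a", "b", "b"])      # 1 - (0.25 + 0.25)
monomorphic = pic(["a", "a", "a", "a"])      # a single allele carries no information
four_alleles = pic(["a", "b", "c", "d"])     # 1 - 4 * 0.0625
```

A monomorphic locus gives 0, and the value rises toward 1 as more alleles occur at similar frequencies, which is why a PIC threshold favors multi-allelic markers.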

Genetic similarities between the 17 re-sequenced genotypes were calculated using Nei and Li’s similarity coefficient (Nei 1977) with the Freetree software (Pavlicek et al. 1999). The dendrogram was constructed with the same software using UPGMA. The robustness of the nodes was inferred by bootstrap analysis with 1,000 replicates.
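For binary band scores, Nei and Li's coefficient is twice the number of shared bands divided by the total number of bands in the two accessions. The sketch below is a minimal illustration with made-up band profiles; it is not the Freetree implementation, and the function names are illustrative.

```python
def nei_li_similarity(x, y) -> float:
    """Nei and Li's coefficient: 2 * shared bands / (bands in x + bands in y)."""
    if len(x) != len(y):
        raise ValueError("band profiles must have the same length")
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("at least one band must be present")
    return 2 * shared / total


def similarity_matrix(profiles):
    """Pairwise similarity matrix for a list of 0/1 band profiles."""
    return [[nei_li_similarity(p, q) for q in profiles] for p in profiles]


# Hypothetical band profiles (1 = band present, 0 = band absent).
line_a = [1, 1, 0, 1]
line_b = [1, 0, 0, 1]
s_ab = nei_li_similarity(line_a, line_b)  # 2 * 2 / (3 + 2)
matrix = similarity_matrix([line_a, line_b])
```

Shared absences (0/0) do not count toward similarity, which suits band data where the absence of a band is less informative than its presence.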

The dendrogram of all 117 watermelon lines based on the 23 SSR markers was constructed using the software Powermarker v3.0 (Liu and Muse 2005; http://statgen.ncsu.edu/powermarker/) and visualized with Dendroscope (Huson et al. 2007; http://www.dendroscope.org).

Results and discussion

Polymorphisms among 17 Citrullus genotypes based on whole genome re-sequencing

The whole genomes of 16 watermelon lines were re-sequenced with the Illumina GA platform at 6.3–18.8× genome coverage. Alignment of these genome sequences with the 97103 reference draft genome revealed DNA-level sequence polymorphisms, which included SNPs, InDels and other structural changes. Details of the polymorphisms of each genotype as compared with 97103 are summarized in Table 3. Among these re-sequenced lines, JX-2 had the lowest number of polymorphisms relative to 97103 (46,790), whereas PI 482276 had the highest (186,244; Table 3). In total, 3,889,080 SNPs were detected across the genomic sequences among the 17 genotypes.

SNP-based genetic diversity among 17 Citrullus genotypes

Based on 3,889,080 SNPs across the 17 re-sequenced watermelon genotypes, a UPGMA dendrogram was built (Fig. 1a), which clearly separated the 17 Citrullus genotypes into two branches. The first branch was C. colocynthis, and the second one belonged to C. lanatus, which further split into two sister clades of var. citroides and var. lanatus. The var. lanatus subclade had three clusters. The two wild accessions (PI 595203, PI 482271) formed the first and the second cluster, respectively. The third cluster had the cultivated types, which had two sub-clusters: the East Asian ecotype and the American ecotype. The East Asian ecotype sub-cluster included 97103, JX-2, JLM, JXF, RZ-901 and XHBFGM, while the American ecotype sub-cluster was composed of Black Diamond, Calhoun Gray, Sugarlee, RZ-900 and Sy-904304 (Fig. 1a).

Fig. 1

Dendrograms of 17 watermelon lines based on 3,889,080 SNP loci (a), 78 SSR markers (b), and 23 core set SSR markers (c). Significant bootstrap values from 1,000 resampling replicates are shown at the nodes of each dendrogram

This SNP-based dendrogram was consistent with previous classifications of these 17 lines based on morphological and parentage data (Wehner 1999a, b, 2002). For example, Calhoun Gray, Black Diamond and Sugarlee all belong to the American ecotype although they have different fruit characteristics. Calhoun Gray, derived from Calhoun Sweet × Charleston Gray, produces elongated fruit resembling that of Charleston Gray. Black Diamond produces globular fruit with dark green skin, similar to that of Calhoun Sweet. Calhoun Gray and Black Diamond were closely clustered, likely due to their common parent Calhoun Sweet in the pedigrees. Calhoun Gray and Sugarlee (derived from Texas W5, Summit, Charleston Gray, Fairfax, Crimson Sweet, and Graybell) also shared a common parent, Charleston Gray. These three cultivars of the American ecotype showed close genetic similarity based on SNP clustering (Fig. 1a). Similarly, the East Asian ecotype cultivars 97103 and JX-2 appeared to be closely related. Both were derived from the same parent with mid-sized globular fruit and pink flesh. These results also showed that the cultivated watermelons are genetically distant from C. l. var. citroides and C. colocynthis. The genetic distance between C. lanatus and C. colocynthis was greater than that between C. l. var. citroides and C. colocynthis, which was consistent with earlier findings (Levi et al. 2001a).

Evaluation of genetic diversity among 17 Citrullus genotypes using 78 SSRs

From our whole genome mapping effort (Ren et al. manuscript in preparation), we identified 704 polymorphic SSRs between the two parental lines, 97103 of C. l. var. lanatus and PI 296341-FR of C. l. var. citroides. These 704 SSRs were subjected to polymorphism screening between two elite cultivated watermelon lines, 97103 (East Asian ecotype) and Sugarlee (American ecotype), and 78 polymorphic SSRs were identified.

Genetic diversity was evaluated with the 78 SSRs among the 17 Citrullus genotypes for which whole genome re-sequencing data were available (Table 1). In total, 285 alleles were detected, ranging from 2 to 7 alleles per locus with an average of 3.7. The PIC value ranged from 0.11 to 0.82 with an average of 0.60.

Of the 285 alleles, 57, 22 and 32 were unique to var. lanatus, var. citroides and C. colocynthis, respectively. Forty-eight alleles were shared between var. lanatus and var. citroides, 22 between var. lanatus and C. colocynthis, and 21 between C. colocynthis and var. citroides. Of the 57 polymorphic alleles in var. lanatus, 10 were present in the wild types and 47 in the cultivated watermelons. Among the 47 cultivar-specific alleles, none existed only in East Asian ecotype lines or only in American ecotype lines, suggesting a narrow genetic background of cultivated watermelons.

Cluster analysis using UPGMA based on similarity coefficients confirmed the genotypic relationships among the 17 Citrullus lines (Fig. 1b). This 78 SSR-based UPGMA dendrogram was the same as the one developed from 3.9 million SNPs, suggesting that these SSRs were highly representative of the watermelon genome, had high resolving power and were as effective for genetic analysis as the original 3.9 million SNPs.

Development and validation of a core set watermelon SSRs for genetic diversity analysis

Using the procedure and criteria described earlier (“Materials and methods” section), we were able to identify 23 SSRs as the core set for genetic diversity analysis in watermelon. Detailed information on the 23 SSR markers is summarized in Table 4. The 23 SSRs detected 97 alleles among the 17 watermelon lines (Table 1), ranging from 2 to 7 alleles per marker with an average of 4.2 alleles per locus. The PIC value ranged from 0.45 to 0.82 with an average of 0.66. These markers were largely evenly distributed across the genome. Each linkage group had at least two markers except Linkage Group 3, which had only one marker (Table 4).

The 23 SSRs in the core set seem to be highly representative and informative, as evidenced by the following two observations. First, the correlation coefficient between the genetic similarity matrices of the core set (23 SSRs) and the base set (78 SSRs) was highly significant (r = 0.96, P < 0.005). Second, the dendrogram based on the core set of 23 SSRs among the 17 watermelon lines (Table 1; Fig. 1c) was identical to that created with the 78 SSR markers (Fig. 1b), which in turn was largely consistent with the dendrogram based on 3.9 million SNPs. In all three dendrograms (Fig. 1), the majority of the branches were supported by high bootstrap values.
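The text does not state how r and P were obtained. One common way to compare two similarity matrices is a Mantel-type test: Pearson's r over the off-diagonal entries, with significance estimated by permuting the row and column order of one matrix. The sketch below assumes that approach; the function names and the toy matrix are illustrative and are not the study's data.

```python
import math
import random


def upper_triangle(m):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (r, one-sided permutation p-value) for two symmetric matrices."""
    rng = random.Random(seed)
    n = len(m1)
    y = upper_triangle(m2)
    observed = pearson(upper_triangle(m1), y)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of m1 together to keep it symmetric.
        permuted = [[m1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy symmetric similarity matrix for four accessions (made-up values).
toy = [
    [1.00, 0.90, 0.40, 0.30],
    [0.90, 1.00, 0.50, 0.35],
    [0.40, 0.50, 1.00, 0.80],
    [0.30, 0.35, 0.80, 1.00],
]
r, p = mantel(toy, toy, permutations=199)
```

Comparing a matrix with itself gives r = 1 by construction; with real data, the core-set and base-set matrices for the same 17 lines would be passed as `m1` and `m2`.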

We attempted to further reduce the number of SSRs in the core set by selecting only one marker with a high PIC value from each linkage group (11 markers in total, with an average PIC of 0.66) to construct a UPGMA dendrogram. Although the resulting dendrogram of the 17 genotypes (not shown) was largely consistent with their classification based on morphological and parentage data, Calhoun Gray, a typical American ecotype, was placed in a clade with watermelon lines of the East Asian ecotype, which was not correct. Bootstrap values of some nodes were also very low. This suggested that the core set of 23 SSR markers was suitable and sufficient to discriminate the 17 Citrullus genotypes at the species level.

The high consistency of the clusters based on the three sets of marker data (3.9 million SNPs, 78 SSR base set and 23 SSR core set) clearly supports our idea of establishing a smaller set of SSRs that captures the majority of the genetic diversity in Citrullus. Of the 11 linkage groups, each LG had at least two SSRs except LG3, which had only one marker (BVWS00048, Table 4). On our high-resolution genetic map, 74 SSR loci were mapped in LG3, but only one SSR showed polymorphism between the two elite lines 97103 and Sugarlee. The reason for this is unknown and may be worth further investigation.

Evaluating the utility of core set SSRs

In order to assess the utility of the core set SSR markers, we conducted clustering analysis with the 23 SSRs on 100 additional commercial cultivars and elite breeding lines of watermelon (Table 2). A UPGMA dendrogram was constructed for all 117 entries, which is shown in Fig. 2. The resulting dendrogram had six distinct clusters, which belonged, respectively, to C. colocynthis (blue color); C. l. var. citroides (red color); wild watermelon C. l. var. lanatus (black color); American ecotype or its derivatives of C. l. var. lanatus (25 entries, green color); East Asian ecotype or its derivatives (51 accessions, pink color); and lastly a hybrid or intermediate ecotype (yellow color). The 35 accessions in the hybrid/intermediate ecotype were mostly the offspring of crosses between the American and East Asian ecotypes. This observation suggested that the accessions belonging to the American and East Asian ecotypes had relatively wide genetic distances from each other.

Fig. 2

UPGMA phenogram of 117 watermelon accessions (17 from Table 1 and 100 from Table 2) based on the 23 core set SSR markers. The 117 test accessions fell into six distinct clusters. Color codes: blue, C. colocynthis; red, C. lanatus var. citroides; black, wild type of C. lanatus var. lanatus; green, American ecotype or its derivatives of C. lanatus var. lanatus; yellow, the intermediate ecotype; pink, East Asian ecotype or its derivatives of C. lanatus var. lanatus. (Color figure online)

Some morphologically very similar watermelon lines could be differentiated with this core set of SSR markers. For example, GS1 (97103) and FP3 (98R) had a coefficient of similarity of 90%, although their morphology and growth habits were almost identical. On the other hand, some accessions, such as FP29 (Fu Feng), FP41 (Fu948-2) and FP43 (T2), shared all the alleles detected by the 23 core set SSR markers. These accessions also exhibited similar morphology and growth habits and have previously been speculated to be synonymous. In total, nine such accessions (7.7%) showed no genetic differences detectable by this core set of SSRs, all of which happened to be elite lines developed by NERCV.

The genetic variation between East Asian and American ecotypes of watermelon has rarely been explored previously. SNP and core set SSR data from the present study clearly demonstrated a high level of genetic variation between the two ecotypes. Our SNP analysis revealed that the SNP frequency among lines of the East Asian ecotype was 0.25–0.36 SNPs per kilobase (kbp), compared with 0.44–0.47 SNPs per kbp among American ecotype lines (data not shown). The coefficient of similarity among East Asian ecotype lines was 65–90%, compared with 63–88% among American ecotype genotypes. These observations indicated that American ecotype watermelon lines had a higher level of genetic diversity than the East Asian ecotype ones. This may be explained by the origin of the accessions tested. The American ecotype lines were collected (Zhang et al. 2010) from different places, while the East Asian ecotype lines were derived from limited parental materials with a high degree of inbreeding over a long period of time. Some pre-selected lines from a few crosses clustered exclusively with the local ones, possibly because breeders sought to maintain the quality and characteristics of the old varieties. The results also indicated the possibility of using the American ecotype watermelons to enhance genetic diversity in Chinese breeding programs.

Conclusion

From whole genome sequencing and high resolution genetic mapping, we identified a core set of 23 highly informative and representative, evenly distributed SSR markers in watermelon, which should have wide use in germplasm management, variety identification, parental material selection and genetic pool assessment in watermelon breeding.