Introduction

Common almond [Prunus dulcis (Mill.) D. A. Webb., Syn. Prunus amygdalus Batsch.] is one of the most economically important nut tree crops, grown for its kernels (seeds). It has a small genome (2n=16, 0.54–0.67 pg of DNA/2C) (Dickson et al. 1992), about twice the size of that of Arabidopsis thaliana (0.30 pg of DNA/2C) (Arumuganathan and Earle 1991). Almond originated in Central Asia, where it has probably been cultivated for millennia, and was introduced into Xin-Jiang, Northwest China, from Iran around the first century BC (Zhu 1983). Nearly 40 almond cultivars have been described in the region to date, including important ones such as ‘Yingzui’, ‘Zhipi’, ‘Shuangguo’, ‘Make’, ‘Wanfeng’, ‘Shuangren’, ‘Hanfeng’ and ‘Kexi’. Studying the genetic diversity of existing Chinese almond cultivars is therefore important for their improvement in Chinese almond breeding programs.

Genomic simple sequence repeat (SSR) markers have been developed in almost every cultivated fruit species of Prunus, including apricot (Lopes et al. 2002; Messina et al. 2004), peach (Cipriani et al. 1999; Testolin et al. 2000, 2004; Sosinski et al. 2000; Aranzana et al. 2002, 2003; Bliss et al. 2002; Dirlewanger et al. 2002, 2004; Georgi et al. 2002; Wang et al. 2002), sour cherry (Downey and Iezzoni 2000), Japanese plum (Mnejja et al. 2004), and sweet cherry (Clarke and Tobutt 2003). Among these, peach genomic SSR markers have been used successfully for molecular identification and genetic similarity analysis of genotypes in other Prunus species such as apricot (Hormaza 2002; Zhebentyayeva et al. 2003) and sweet cherry (Dirlewanger et al. 2002; Wunsch and Hormaza 2002; Schueler et al. 2003). Peach genomic SSRs (derived from genomic libraries) (Martinez-Gomez et al. 2003; Testolin et al. 2004) and SSRs derived from almond and peach expressed sequence tag (EST) sequences (Xu et al. 2004) have been used in almond genetic studies. However, insertions and deletions (indels) and substitutions at SSR loci can produce allelic size homoplasy, as shown in A. thaliana (Symonds and Lloyd 2003) and maize (Matsuoka et al. 2002). Characterizing SSR mutational patterns is therefore important for interpreting and using SSR data correctly.

In a previous report, we developed EST-SSR markers for a phylogenetic analysis of almond trees from China and the Mediterranean region (Xu et al. 2004). In that study, however, only amplification and allele size differences on PAGE were recorded; the mutational patterns underlying allele size variation at these SSR loci were not investigated. In the current work, the genetic diversity of Chinese almond cultivars was assessed in comparison with international almond cultivars from other parts of the world using 16 SSR markers. In addition, alleles from six of these SSR loci were sequenced to document their mutational patterns.

Materials and methods

Plant materials and DNA extraction

The 43 accessions of almond and peach used in this study are listed in Table 1. Young leaves were collected in the field two weeks after flowering, transported to the laboratory on ice and stored at –70°C until use. Genomic DNA was extracted from young leaves using the CTAB method (Xu et al. 2004).

Table 1 Plant materials used in this study

SSR marker selection

Eight single-locus genomic SSR markers, all transferable among Prunus species, and eight single-locus EST-SSR markers, all of which produced well-scored fragments on silver-stained PAGE in almond, were selected for PCR amplification. The loci BPPCT004, BPPCT007, BPPCT010, BPPCT014 and BPPCT026 (Dirlewanger et al. 2002) were from a CT-enriched genomic library of peach, Pchgms1 (Sosinski et al. 2000) was from a small-insert genomic library of peach, and Pchgms26 (Wang et al. 2002) and Pchgms31 (Georgi et al. 2002) were from a large-insert peach BAC library. The loci ASSR4, ASSR17, ASSR46, ASSR54 and ASSR63, derived from almond EST sequences, were previously reported (Xu et al. 2004). Three new EST-SSR markers, ASSR70, ASSR71 and ASSR72, were developed in this study and named following the convention of Xu et al. (2004). The locus ASSR70, with a (CT)24 repeat, was located in the 5′-UTR of an almond MADS-box gene sequence (GenBank Acc. No. BU574658); the loci ASSR71, with a (GA)15 repeat, and ASSR72, with a (TC)19 repeat, were located in the 5′-UTRs of two peach EST sequences (GenBank Acc. No. DQ102369 and BU043090, respectively). The forward (F) and reverse (R) PCR primer sequences flanking the three loci were as follows: ASSR70 (F=5′-ACTGTTACTGAGCCATGAAGAAGA-3′, R=5′-GAGAAACAAAGAGACCCCAAGAAG-3′), ASSR71 (F=5′-AACTTTGTCTGCCTCTCATCTTA-3′, R=5′-AGCAGCCCATTTCTTCTCTTG-3′) and ASSR72 (F=5′-AAGTGGGATTGGTAGGGAGGAAG-3′, R=5′-CTACGGAGCCAGTTGAGAAAAG-3′), with expected product sizes of 146 bp, 168 bp and 206 bp and annealing temperatures of 60°C, 55°C and 60°C, respectively.

PCR amplification and detection of fragments

PCR amplification and detection of SSR alleles were performed as described by Xu et al. (2004). To ensure precise and reproducible fragment scoring, each DNA sample was amplified and analyzed at least twice.

Cloning and sequencing of SSR alleles

Alleles of 6 of the 16 SSR loci were cloned and sequenced: 2 genomic SSR loci, BPPCT010 with (AG)4GG(AG)10 and BPPCT026 with (AG)8GG(AG)6 (Dirlewanger et al. 2002), and 4 EST-SSR loci, ASSR4 with (TAAAAA)3, ASSR17 with (AG)15, ASSR63 with (CAT)5 and ASSR72 with (TC)15 (Xu et al. 2004), the repeat motifs being those of the originally identified sequences.

Alleles from the six SSR loci were excised from dried PAGE gels and used as templates for a new round of PCR amplification. Each re-amplified allele was cloned directly into the pGEM®-T Easy Vector (Promega, USA) according to the manufacturer’s instructions and transformed into Escherichia coli DH5α cells. DNA sequencing was carried out using the CEQ8000™ Genetic Analysis System and the Dye Terminator Cycle Sequencing Chemistry Protocol (Beckman Coulter, USA). To obtain reliable sequences, at least three clones per allele were sequenced. Nucleotide sequences were aligned using Clustal W in the DNASTAR software package (Madison, WI, USA), with manual adjustment.

Data analysis

The number of alleles, observed heterozygosity (Ho) and power of discrimination (PD) were calculated for each locus. Ho was calculated as the number of individuals heterozygous at a given locus divided by the total number of individuals scored at that locus. PD was calculated as $PD_i = 1 - \sum_j G_{ij}^2$ (Kloosterman et al. 1993), where $G_{ij}$ is the frequency of the jth genotype at the ith locus and the sum runs over all genotypes observed at that locus. Genetic similarity (GS) between each pair of the 38 almond cultivars was calculated from allele presence/absence across the 16 SSR loci using the Jaccard similarity coefficient (Sneath and Sokal 1973). A dendrogram was constructed from these similarity coefficients, using data from the 16 SSR loci (6 of which had been used previously; Xu et al. 2004) and the unweighted pair group method with arithmetic averages (UPGMA; Sneath and Sokal 1973). All analyses were performed with the NTSYS-pc 2.10 software package (Rohlf 2002).
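The two per-locus statistics can be illustrated with a short Python sketch. This is an illustrative re-implementation, not the NTSYS-pc procedure used in the study; it assumes each individual's genotype is recorded as a pair of allele sizes in bp, with a single scored band entered as a homozygote (a, a) and missing data as None. The function names and the example genotypes are hypothetical.

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Ho: heterozygous individuals divided by individuals scored at one locus."""
    scored = [g for g in genotypes if g is not None]  # drop missing data
    heterozygous = sum(1 for a, b in scored if a != b)
    return heterozygous / len(scored)

def power_of_discrimination(genotypes):
    """PD = 1 - sum_j G_j**2, where G_j is the frequency of genotype j at the locus."""
    # Sort each pair so that (143, 149) and (149, 143) count as the same genotype.
    scored = [tuple(sorted(g)) for g in genotypes if g is not None]
    counts = Counter(scored)
    n = len(scored)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical genotypes (allele sizes in bp) at one locus for five individuals.
locus = [(143, 149), (143, 143), (149, 143), (151, 155), None]
print(observed_heterozygosity(locus))  # 0.75
print(power_of_discrimination(locus))  # 0.625
```

Note that PD sums squared frequencies over genotypes rather than alleles, so a locus with many alleles but one predominant genotype still has a low PD.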

Results

PCR amplification of SSR loci

Consistent with the diploid constitution of common almond, at most two reproducible alleles per locus were detected on silver-stained PAGE in each of the 38 almond individuals analyzed with the 8 EST-SSR loci, and these loci also yielded good amplification products within the expected size range in the three peach individuals. Similarly, the eight genomic SSR markers typically gave reproducible amplification of one or two fragments in all almond individuals, with size ranges similar to those reported for these loci in peach by Sosinski et al. (2000), Dirlewanger et al. (2002) and Georgi et al. (2002). Across all 16 loci, allele sizes in almond differed from those in peach. Every allele present in the two hybrids between peach and almond, ‘Dulcis Pioneer’ and ‘Liquan’, also occurred in at least one of the almond or peach individuals.

Size polymorphism and discrimination

The eight EST-SSR markers produced a total of 62 alleles, with an average of 7.8 alleles and Ho = 0.678 per locus. The eight genomic SSR markers produced a total of 61 alleles, with an average of 7.6 alleles and Ho = 0.628 per locus (Table 2). Among the EST-SSR markers, locus ASSR17 was the most informative (12 alleles); among the genomic SSR markers, BPPCT007, BPPCT010 and BPPCT026 were the most informative (11 alleles each).

Table 2 Number of alleles, observed heterozygosity (Ho) and power of discrimination (PD) of EST-SSR and genomic SSR markers among almond cultivars

The number of genotypes per locus averaged 13.3 for the EST-SSR markers and 12.9 for the genomic SSR markers. All EST-SSR markers except ASSR4, and four genomic SSR markers (BPPCT007, BPPCT010, BPPCT026 and Pchgms1), had PD values higher than 0.800 among the 38 cultivars. The average PD value of the eight EST-SSR markers (0.842) was slightly higher than that of the eight genomic SSR markers (0.833). Together, the four SSR markers ASSR17, ASSR63, BPPCT010 and Pchgms31 distinguished all individuals except three pairs of international cultivars: ‘Mission’ and ‘Texas’, ‘Jeffries’ and ‘Non Pareil’, and ‘Ferraduel’ and ‘Ferragnès’.
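Checking which cultivars a given marker subset fails to separate amounts to grouping cultivars by their multilocus genotype. A minimal Python sketch, assuming complete data and genotypes stored as allele-size pairs; the cultivar labels cv1 to cv3, the loci L1 and L2 and their allele sizes are hypothetical placeholders, not data from Table 2.

```python
from collections import defaultdict

def indistinguishable_groups(profiles, loci):
    """Return groups of cultivars that share the same genotype at every locus in `loci`."""
    groups = defaultdict(list)
    for cultivar, genotypes in profiles.items():
        # Multilocus genotype key; alleles sorted so (100, 104) equals (104, 100).
        key = tuple(tuple(sorted(genotypes[locus])) for locus in loci)
        groups[key].append(cultivar)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical profiles: allele sizes (bp) at two loci.
profiles = {
    "cv1": {"L1": (100, 104), "L2": (200, 200)},
    "cv2": {"L1": (104, 100), "L2": (200, 200)},
    "cv3": {"L1": (100, 104), "L2": (200, 202)},
}
print(indistinguishable_groups(profiles, ["L1"]))        # [['cv1', 'cv2', 'cv3']]
print(indistinguishable_groups(profiles, ["L1", "L2"]))  # [['cv1', 'cv2']]
```

Adding a locus can only split groups, never merge them; an empty list would mean the chosen subset separates every cultivar.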

An average of 6.3 alleles and Ho = 0.696 per locus was found among the Chinese cultivars across the 16 SSR loci (Table 2), whereas the international cultivars had 5.6 alleles and Ho = 0.583.

Jaccard similarity coefficients for the 16 SSR loci were used to estimate genetic similarities (GS) between all pairs of the 38 almond cultivars (data not shown). Among the Chinese cultivars, pairwise GS values ranged from 0.083 (‘Wanfeng’ and ‘Dabadan’) to 0.966 (‘Duoguo’ and ‘Zhipi’). GS values between each Chinese cultivar and all 15 international cultivars ranged from 0.069 (‘Wanfeng’ versus the international cultivars) to 0.251 (‘Ayuehunzibadan’ versus the international cultivars), with an average of 0.174.
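The Jaccard coefficient treats each allele (a locus plus fragment size) as a band scored present or absent and ignores shared absences, so GS = a/(a + b + c), where a is the number of alleles shared by two cultivars and b and c are the numbers present in only one of them. A minimal Python sketch, assuming genotypes stored as allele-size pairs per locus; the function names and example profiles are hypothetical.

```python
def allele_bands(profile):
    """Encode a cultivar as the set of (locus, allele size) bands it carries."""
    return {(locus, allele) for locus, pair in profile.items() for allele in pair}

def jaccard_gs(profile_a, profile_b):
    """Jaccard similarity a / (a + b + c); shared absences are not counted."""
    bands_a, bands_b = allele_bands(profile_a), allele_bands(profile_b)
    return len(bands_a & bands_b) / len(bands_a | bands_b)

# Hypothetical profiles (allele sizes in bp); a homozygote contributes one band.
cv1 = {"L1": (100, 104), "L2": (200, 200)}
cv2 = {"L1": (100, 100), "L2": (200, 202)}
print(jaccard_gs(cv1, cv2))  # 0.5
```

Because homozygotes contribute a single band, this band-based coding weights loci slightly differently from a genotype-based distance.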

UPGMA cluster analysis based on GS values for all 43 samples (38 almond individuals, 2 hybrids between peach and almond, and 3 peach individuals) was used to construct a dendrogram (Fig. 1) with a cophenetic correlation coefficient of 0.854, indicating a good fit between the dendrogram and the similarity matrix. The two hybrids between almond and peach appeared genetically more similar to the three peach individuals than to the almond individuals, and together with them formed an out-group. All the Chinese cultivars clustered together.
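UPGMA corresponds to average linkage in SciPy, and the cophenetic correlation measures how well the distances implied by the tree match the original pairwise distances. A minimal sketch using scipy.cluster.hierarchy rather than NTSYS-pc, which the study used; it assumes distances are taken as 1 − GS, and the 3 × 3 similarity matrix is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def upgma_from_similarity(similarity):
    """Cluster a square GS matrix by UPGMA; return the linkage and cophenetic r."""
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)  # self-distance must be exactly zero
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")  # 'average' linkage is UPGMA
    r, _ = cophenet(tree, condensed)  # Pearson r: original vs cophenetic distances
    return tree, r

# Hypothetical GS values for three samples.
gs = [[1.0, 0.9, 0.2],
      [0.9, 1.0, 0.3],
      [0.2, 0.3, 1.0]]
tree, r = upgma_from_similarity(gs)
print(tree[:, 2])  # merge heights, approximately 0.1 and 0.75
print(round(r, 3))
```

Here the third sample joins the first pair at the average of its two distances, (0.8 + 0.7)/2 = 0.75. Cophenetic correlations of about 0.8 or above are conventionally read as a good fit.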

Fig. 1 UPGMA dendrogram of genetic relationships among Chinese and international almond cultivars, two hybrids of peach and almond, and three peach cultivars, based on analysis with eight EST-SSR and eight genomic SSR markers. Names of the plant materials are as given in Table 1

Allelic sequence variation of SSR loci

Substantial sequence variation was found among the 117 allele sequences from the 6 SSR loci in almond cultivars (see Supplementary Data). At the locus ASSR4, originally interpreted as an imperfect AAAAAT repeat motif in the almond EST sequence BI203129 (Xu et al. 2004), allelic differences arose from changes in the copy number of a virtually perfect TAAAAA repeat motif and from two single-base substitutions, A/G and T/C, in the flanking regions. Similarly, SSR polymorphism at the locus ASSR63 resulted from expansion of the CAT repeat motif relative to the reference sequence (GenBank Acc. No. CA854147); the only flanking-region change was a single-base insertion in one allele from the cultivar ‘Shuangren’.

At the four SSR loci of AG/CT repeat type, ASSR17, ASSR72, BPPCT010 and BPPCT026, a striking feature of allelic variation was the occurrence of insertions, interruptions, substitutions or composite repeat motifs within the SSR regions. The locus ASSR17 was originally described as a perfect AG repeat in the almond sequence (GenBank Acc. No. CA853978), whereas DNA sequencing in this work revealed nearly contiguous TG and AG repeat motifs in 13 of the 20 alleles sequenced. A G/T substitution interrupting the originally perfect AG repeat was observed in 7 of the 20 alleles, and allelic size variation appeared to result solely from variation in the number of TG and AG repeats.

Similarly, the locus ASSR72 was identified as a perfect TC repeat in a peach sequence (GenBank Acc. No. BU043090), whereas in almond cultivars a GA doublet created the composite repeat motif (GATC)1–3(TC)13–23. Of the 19 alleles sequenced, 17 had two GATC repeats; the expansion from two to three GATC repeats, seen in two alleles from ‘Jeffries’ and ‘Tianrentaobadan’, co-occurred with A/C substitutions in the flanking regions. Size variation at this locus arose solely from changes in the number of GATC and TC repeats.
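Reading composite structures such as (GATC)2(TC)13 from an allele sequence can be approximated with a greedy left-to-right scan. A minimal sketch, assuming the SSR region has already been trimmed from its flanks and that the candidate motifs are supplied by the user; `repeat_structure` is a hypothetical helper, not the alignment procedure (Clustal W with manual adjustment) used in this study, and the example sequences are constructed to match the notation rather than taken from the sequenced alleles.

```python
def repeat_structure(seq, motifs):
    """Greedily describe an SSR region as motif runs plus interrupting bases.

    At each position the motif covering the longest tandem run wins (ties go to
    the motif listed first); bases matching no motif are written out as-is.
    """
    parts = []
    i = 0
    while i < len(seq):
        best_motif, best_copies, best_len = None, 0, 0
        for motif in motifs:
            copies = 0
            while seq.startswith(motif, i + copies * len(motif)):
                copies += 1
            if copies * len(motif) > best_len:
                best_motif, best_copies, best_len = motif, copies, copies * len(motif)
        if best_motif is None:
            parts.append(seq[i])  # interrupting base, e.g. the G of a GG insertion
            i += 1
        else:
            parts.append(f"({best_motif}){best_copies}")
            i += best_len
    return "".join(parts)

print(repeat_structure("GATC" * 2 + "TC" * 13, ["GATC", "TC"]))  # (GATC)2(TC)13
print(repeat_structure("AG" * 4 + "GG" + "AG" * 10, ["AG"]))      # (AG)4GG(AG)10
```

A greedy scan is simple but can mis-phase overlapping motifs; it is meant only to make the repeat notation concrete.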

Interruptions in the AG repeat regions were observed at the two loci BPPCT010 and BPPCT026, which were originally identified in peach genomic sequences (Dirlewanger et al. 2002): GG or GGGG interruptions after the fourth or sixth AG from the 5′ end of the repeat region at BPPCT010, and G or GGG interruptions before the second AG from the 3′ end of the repeat region at BPPCT026. All allelic size variation at these two loci was attributable to changes in the number of motif repeats and in the interruptions.

Size homoplasy, i.e. sequence variation that does not change allele size, was observed at the ASSR17, ASSR72, BPPCT010 and BPPCT026 loci. This homoplasy was, however, present in only a small number of SSR alleles in some almond cultivars, for example the ASSR17 alleles of 143 bp in ‘Jianzuihe’ and ‘Ferraduel’ and of 149 bp in ‘Baibadan’ and ‘Ayuehunzibadan’, the ASSR72 alleles of 159 bp in ‘Jeffries’ and ‘Wanfeng’ and of 163 bp in ‘Supernova’ and ‘Tianrentaobadan’, and the BPPCT026 alleles of 96 bp in ‘Hanfeng’ and ‘Shuangguo’. The BPPCT010 locus displayed more size homoplasy, affecting the alleles of 96, 102, 108, 114 and 120 bp.
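Because size homoplasy means identical length but different sequence, it can be flagged by grouping sequenced alleles by length. A minimal Python sketch, assuming every sequence spans the same interval so that its length equals the allele size; the function name, allele labels and sequences are hypothetical.

```python
from collections import defaultdict

def homoplasic_sizes(alleles):
    """Map each size (bp) carried by more than one distinct sequence to those sequences."""
    by_size = defaultdict(set)
    for sequence in alleles.values():
        by_size[len(sequence)].add(sequence.upper())
    return {size: sorted(seqs) for size, seqs in by_size.items() if len(seqs) > 1}

# Hypothetical repeat regions: a1 and a2 have the same length but differ by one substitution.
alleles = {
    "a1": "AGAGAGTGAG",
    "a2": "AGAGAGAGAG",
    "a3": "AGAGAGAGAGAG",
}
print(homoplasic_sizes(alleles))  # {10: ['AGAGAGAGAG', 'AGAGAGTGAG']}
```

Sizing on PAGE would score a1 and a2 as the same allele, which is exactly the information this check recovers from sequence data.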

Discussion

SSRs as efficient markers for genetic diversity analysis of almond

In almond, the utility of SSR markers for studying genetic diversity has been demonstrated (Martinez-Gomez et al. 2003; Xu et al. 2004). This was further corroborated in the present work, in which three EST-SSR markers (ASSR70, ASSR71 and ASSR72) and seven genomic SSR markers previously reported in peach (BPPCT004, BPPCT007, BPPCT010, BPPCT014, BPPCT026, Pchgms26 (F1, R1) and Pchgms31) were newly applied to the analysis of almond genetic diversity.

The peach genomic SSR markers yielded amplification products containing the original repeat motif in all almond individuals analyzed, showing both a very high level of polymorphism and efficient discrimination of cultivars (Table 2). These results indicate that genomic SSR markers previously reported in other Prunus species can be excellent genetic markers for almond.

To date, the genomic SSR markers, together with the EST-SSR markers developed in our laboratory, remain insufficient to distinguish the almond individuals ‘Mission’ and ‘Texas’, ‘Jeffries’ and ‘Non Pareil’, and ‘Ferraduel’ and ‘Ferragnès’ by fragment size alone. ‘Mission’, originally known as ‘Texas Prolific’ or ‘Texas’, is believed to be a seedling of the French cultivar Languedoc 302, and the two accessions are therefore genetically very similar. Interestingly, the four alleles of ‘Mission’ and ‘Texas’ had different structures in the SSR region of the locus BPPCT026. ‘Jeffries’ is a naturally occurring somaclonal mutant taken from a ‘Non Pareil’ tree and was identical to ‘Non Pareil’ at all loci in the present study. ‘Ferraduel’ and ‘Ferragnès’ are very closely related, having been developed in the same breeding program (Socias i Company 1998), and appeared genetically identical in this study. It might be possible to discriminate between ‘Jeffries’ and ‘Non Pareil’ and between ‘Ferraduel’ and ‘Ferragnès’ through DNA sequence analysis of other SSR loci.

Xu et al. (2004) identified 178 almond and 497 peach SSRs in 1,482 almond and 4,985 peach contigs and singletons assembled from 3,863 almond and 10,185 peach ESTs, respectively, suggesting that at least hundreds of additional SSR markers from almond and peach ESTs remain to be explored for genetic diversity analysis of almond.
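SSR mining in EST contigs of the kind cited above can be sketched with regular expressions. This is a simplified perfect-repeat finder: the minimum copy numbers in MIN_COPIES are hypothetical illustration values, not the search criteria used by Xu et al. (2004), and imperfect or compound repeats are not merged.

```python
import re

# Hypothetical minimum copy numbers per motif length (1 = mono, 2 = di, ...).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """False if the motif is itself a tandem copy of a shorter unit (e.g. 'AA', 'AGAG')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(sequence, min_copies=MIN_COPIES):
    """Return (0-based start, motif, copies) for perfect tandem repeats in `sequence`."""
    sequence = sequence.upper()
    hits = []
    for k, n_min in sorted(min_copies.items()):
        # A k-mer followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. 'AGAG' runs already reported as 'AG'
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment containing an (AG)8 and a (CAT)5 repeat.
est = "CCGT" + "AG" * 8 + "TTCG" + "CAT" * 5 + "GG"
print(find_ssrs(est))  # [(4, 'AG', 8), (24, 'CAT', 5)]
```

The number of SSRs reported for a set of contigs depends strongly on the thresholds chosen, so counts from different studies are comparable only under the same criteria.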

Features of mutations at SSR loci in almond

Dinucleotide repeats are the most abundant SSRs in vertebrates as well as in plants such as rice, soybean and wheat (Gao et al. 2004). We have reported that AG/CT repeats were the most frequent in almond and peach (Xu et al. 2004). Here we analyzed sequence variation in 117 alleles from six SSR loci, in particular 98 alleles from four loci with AG/CT repeats in almond.

In maize, a large-scale analysis of SSR loci showed that allelic divergence was most frequently due to indels in the flanking regions (Matsuoka et al. 2002), whereas no insertions or deletions were found in the flanking regions of 84 sequenced alleles in A. thaliana (Symonds and Lloyd 2003). DNA sequence analysis revealed similarly invariant flanking regions in almond: the 98 alleles from the four SSR loci with AG/CT repeats, ASSR17, ASSR72, BPPCT010 and BPPCT026, had no indels in their flanking regions. By contrast, the repeat regions of these four loci showed substantial size variability, involving insertions and new repeat motifs in addition to varying numbers of AG/CT repeats: insertions of GG/GGGG and G/GGG in most alleles of BPPCT010 and BPPCT026, respectively, and new TG and GATC repeat motifs at ASSR17 and ASSR72, respectively.

Symonds and Lloyd (2003) found that in A. thaliana, interruptions in the repeat regions of most SSR loci were associated with shortening of the original repeat length. In almond, substitutions within the repeat regions were observed in some alleles, splitting long repeats into shorter ones, for example G/T substitutions in three alleles of ASSR17 and G/C substitutions in three alleles of BPPCT026. Indels in the repeat regions and the appearance of new repeat motifs also reduced the lengths of some SSR regions. These changes may reflect mutational forces that act on SSR regions and hinder their further expansion.

Interestingly, expansion of the GATC repeat motif from two to three copies at the locus ASSR72 occurred together with single point mutations (A to C) in the flanking region. A similar pattern was observed at the locus ASSR4, where two point mutations in the flanking regions (one C to A and the other C to T) co-occurred with expansion from two to three TAAAAA repeats. These results suggest that point mutations in flanking regions may be related to the birth of new SSR repeat motifs, consistent with the model in which a length-independent mutation process operates on short SSR loci (Dieringer and Schlotterer 2003). The two composite SSR loci, ASSR17 and ASSR72, further suggest that an SSR motif with more repeats may provide a more efficient substrate for rapid mutation than motifs with fewer repeats (Symonds and Lloyd 2003).

In this work, homoplasic SSR alleles arose from base substitutions, interruptions and/or composite repeat motifs; the actual degree of polymorphism in almond would therefore probably be underestimated, and genetic relationships misjudged, if based only on SSR allele size variation. Further work is needed to interpret SSR data correctly and to understand the molecular mechanisms underlying SSR allelic variation in almond and other Prunus species.