Introduction

Alfalfa (Medicago sativa L.) is one of the most important forage legumes and is widely planted for hay and pasture worldwide. Alfalfa exists at two ploidy levels, diploid (2n = 2x = 16) and tetraploid (2n = 4x = 32), with a basic chromosome number of eight. Most alfalfa cultivars are autotetraploid and were developed by phenotypic selection. Although many traits, such as disease resistance, pest resistance, and winter survival, have been successfully improved by phenotypic selection, the process is time consuming. Molecular breeding approaches such as marker-assisted selection could increase the efficiency of cultivar development in terms of gain per unit cost and time.

Efficient and robust molecular markers are essential for molecular breeding. Simple sequence repeats (SSRs), or microsatellites, are tandem repeats of 1–6 bp motifs found in both coding and noncoding regions [1–3]. Because of their co-dominant inheritance, abundance in genomes, and high reproducibility, SSR markers have been developed and extensively used in molecular genetic studies of many species. In alfalfa, SSR markers have been broadly applied to studies of population structure and diversity [4], genetic mapping [5, 6], and comparative mapping [7]. In an autotetraploid species like alfalfa, up to four different alleles can be scored from one SSR marker in a single genotype. Compared with dominant markers and bi-allelic markers such as single nucleotide polymorphisms (SNPs), multi-allelic SSR markers are particularly informative and are superior for genetic linkage mapping and QTL analysis in autotetraploid species [8]. Almost all SSR markers applied in alfalfa were derived from Medicago truncatula, a closely related model species, except for 61 polymorphic genomic SSRs developed by He et al. [9]. SSR markers derived from M. truncatula expressed sequence tags (ESTs), termed EST–SSRs, showed a high level of transferability to alfalfa and other closely related species [10]. However, unlike the annual M. truncatula, alfalfa is perennial, and its genome, about 800–1,000 Mbp, is roughly twice the size of that of M. truncatula [11]. Therefore, SSR markers developed directly from alfalfa could provide more informative markers for genetic studies and breeding applications.
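To make the informativeness argument concrete, the short Python sketch below counts how many distinct genotypes one locus can take in an autotetraploid when allele dosage is known (multisets of four alleles drawn from k alleles), and how many band-presence patterns remain visible when only presence or absence of each allele is scored. It only counts combinations and says nothing about allele frequencies or inheritance mode; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def tetraploid_genotypes(k):
    """Distinct tetraploid genotypes (with dosage) at a locus with k alleles.

    A genotype is a multiset of 4 alleles, so the count is C(k + 3, 4).
    """
    return comb(k + 3, 4)


def band_phenotypes(k):
    """Distinct band-presence patterns when dosage is not observed.

    A tetraploid carries 1 to min(4, k) different alleles, each seen as one band.
    """
    return sum(comb(k, i) for i in range(1, min(4, k) + 1))


if __name__ == "__main__":
    # Cross-check the closed form against brute-force enumeration for k = 3.
    assert tetraploid_genotypes(3) == len(list(combinations_with_replacement(range(3), 4)))
    for k in (2, 4, 6):  # k = 2 mimics a bi-allelic SNP
        print(k, tetraploid_genotypes(k), band_phenotypes(k))
```

A bi-allelic locus (k = 2) allows only 5 dosage classes, whereas a locus with 6 alleles already allows 126, which is one way to see why multi-allelic SSRs carry more information per locus in a tetraploid.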

The growing number of alfalfa ESTs provides a resource for developing EST–SSR markers. As of 3 May 2010, a total of 12,371 M. sativa ESTs were available at the National Center for Biotechnology Information (NCBI). The objectives of this study were: (1) to analyze the frequency and distribution of SSRs in alfalfa ESTs, (2) to develop and characterize alfalfa EST–SSR markers, and (3) to use these markers to assess intraspecific genetic diversity and to evaluate their cross-species/genera transferability.

Materials and methods

Plant materials and DNA extraction

A total of 28 accessions from M. sativa ssp. sativa, M. sativa ssp. falcata, M. sativa ssp. coerulea, M. sativa ssp. varia, M. sativa ssp. hemicycla, and M. sativa ssp. glomerata were used to validate the EST–SSR markers developed in this study (Table 1). One accession each of Medicago minima, Medicago lupulina, Trifolium repens, and Melilotus albus was used to assess cross-species/genera transferability (Table 1); T. repens and M. albus belong to genera other than Medicago but to the same tribe as M. sativa. Seeds were obtained from the Plant Genetic Resource Conservation Unit of USDA-ARS (Utah, USA) and the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (Table 1). All accessions were grown in the greenhouse of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences. Total genomic DNA was extracted from young leaves of five plants per accession using the CTAB method [12]. DNA quality was checked on 1 % agarose gels and DNA quantity with a Unico UV-2000 spectrophotometer (Unico, USA). The working concentration of DNA was adjusted to 50 ng/μL.
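Adjusting each DNA sample to the 50 ng/μL working concentration is a simple C1V1 = C2V2 dilution. The Python sketch below assumes a hypothetical stock of 320 ng/μL (as might be read from the spectrophotometer) and a hypothetical final volume of 100 μL; neither number comes from this study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=50.0, final_ul=100.0):
    """Return (stock volume, water volume) in uL for a C1*V1 = C2*V2 dilution."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


# Hypothetical stock concentration (ng/uL); replace with the measured value.
stock_ul, water_ul = dilution_volumes(320.0)
print(f"{stock_ul:.3f} uL stock + {water_ul:.3f} uL water")
```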

Table 1 Germplasm accessions used in this study

Data mining for EST–SSR

A total of 12,371 M. sativa EST sequences were retrieved from the NCBI dbEST database (http://www.ncbi.nlm.nih.gov/dbest/) on 3 May 2010. Poly(A) and poly(T) stretches at the 5′ or 3′ ends of the raw sequences were removed with the EST-trimmer software (http://www.pgrc.ipk-gatersleben.de/misa/download/esttrimmer.pl). The trimmed ESTs were then assembled into contigs using the CAP3 assembler [13] with a minimum overlap of 40 bp and a minimum identity of 90 %. Potential SSRs were identified and localized using MISA (http://www.pgrc.ipk-gatersleben.de/misa). The minimum numbers of repeat units required to call an SSR were 6, 5, 4, 4, and 4 for di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively. Mononucleotide repeats were ignored because it is difficult to distinguish true mononucleotide repeats from polyadenylation products and homopolymer errors generated during sequencing.
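MISA itself is a Perl script; the Python sketch below only illustrates how the repeat thresholds above act on a single sequence. It finds perfect repeats with a regular expression and skips motifs that are repeats of a shorter unit (which also excludes mononucleotide runs). Unlike MISA, it does not merge nearby SSRs into compound SSRs, and `find_ssrs` is a hypothetical helper.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
# Mononucleotide repeats (length 1) are intentionally not searched.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """True if motif is a repeat of a shorter unit, e.g. 'ATAT' or 'AAA'."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Positions are 0-based. Each motif length is scanned independently.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One k-mer followed by at least (min_rep - 1) further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # e.g. 'AGAG' at k = 4 is already reported as 'AG'
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TT" + "AG" * 7 + "CC" + "AAG" * 5))
```

A run such as (AG)5 falls below the dinucleotide threshold of six units and is not reported, which is why the chosen thresholds directly shape the SSR counts in the Results.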

Primer pairs were designed using BatchPrimer3 (http://probes.pw.usda.gov/cgi-bin/batchprimer3/batchprimer3.cgi) with the following parameters: (1) primer length of 18–24 nt, with 20 nt as the optimum; (2) PCR product size of 100–350 bp; (3) annealing temperature of 57–63 °C, with 60 °C as the optimum; and (4) GC content of 45–55 %, with 50 % as the optimum.
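The length, GC, and product-size windows above can be expressed as a simple filter. The Python sketch below is a hypothetical post-hoc check, not part of BatchPrimer3; it deliberately leaves temperature out, because BatchPrimer3 (built on Primer3) computes it with its own thermodynamic model.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def meets_design_window(forward, reverse, product_bp,
                        length=(18, 24), gc=(45.0, 55.0), product=(100, 350)):
    """Check primer length, GC content and product size against the stated ranges.

    Melting/annealing temperature is not checked here.
    """
    for primer in (forward, reverse):
        if not length[0] <= len(primer) <= length[1]:
            return False
        if not gc[0] <= gc_percent(primer) <= gc[1]:
            return False
    return product[0] <= product_bp <= product[1]


# Hypothetical primer pair and product size, for illustration only.
print(meets_design_window("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA", 210))
```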

Amplification and detection of SSR alleles

PCR amplification of genomic DNA was carried out in a 25 μL reaction volume in an ABI 2700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA). Each reaction contained 2.5 μL 10× PCR buffer (100 mM Tris–HCl pH 8.8 at 25 °C, 500 mM KCl, 0.8 % (v/v) Nonidet), 0.5 μL 10 mM dNTPs, 1 U Taq DNA polymerase (Sangon, Shanghai, China), 0.5 μL 10 μM of each primer, 2.0 μL 25 mM MgCl2, and 50 ng template DNA. The PCR profile was: initial denaturation at 95 °C for 8 min; 10 cycles of 95 °C for 1 min, 60 °C for 30 s, and 72 °C for 45 s; 20 cycles of 95 °C for 45 s, 55 °C for 30 s, and 72 °C for 45 s; and a final extension at 72 °C for 6 min. Fluorescently labeled primers were synthesized by Sangon (Shanghai, China). Amplification products were detected by fluorescence on an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Fragment sizes were determined against an internal size standard (LIZ500, ABI, USA), and the data were analyzed with GeneMapper software (http://www.appliedbiosystems.com.cn/).
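The final concentration of each component in the 25 μL reaction follows from C2 = C1 × V1 / V2. The Python sketch below works this out for the volumes listed above, assuming that "10 mM dNTPs" means 10 mM of each dNTP (if it is the total, the per-dNTP value is a quarter of the result).

```python
REACTION_UL = 25.0

# Component: (stock concentration, unit, volume added in uL), from the text.
# Assumption: "10 mM dNTPs" is read as 10 mM of each dNTP.
COMPONENTS = {
    "PCR buffer": (10.0, "x", 2.5),
    "dNTPs (each)": (10.0, "mM", 0.5),
    "each primer": (10.0, "uM", 0.5),
    "MgCl2": (25.0, "mM", 2.0),
}


def final_concentration(stock, volume_ul, total_ul=REACTION_UL):
    """C2 = C1 * V1 / V2 for one component of the reaction."""
    return stock * volume_ul / total_ul


for name, (stock, unit, vol) in COMPONENTS.items():
    print(f"{name}: {final_concentration(stock, vol):g} {unit}")

# 50 ng template at the 50 ng/uL working concentration is 1 uL of DNA.
print(f"template: {50.0 / 50.0:g} uL; Taq: {1.0 / REACTION_UL:g} U/uL")
```

This gives 1× buffer, 0.2 mM dNTPs, 0.2 μM of each primer, and 2.0 mM MgCl2 in the final reaction.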

Allele frequency and diversity analysis

EST–SSR bands were scored as present (1) or absent (0). Expected heterozygosity (He) was calculated using GenAlEx 6 [14]. Polymorphism information content (PIC) was calculated with PIC_CALC 0.6 (http://hi.baidu.com/luansheng1229/item/306815126d58e3a4feded5a4) following Botstein et al. [15]. A dendrogram was constructed from Jaccard’s similarity coefficients by the unweighted pair group method with arithmetic mean (UPGMA), using the SAHN module of NTSYS-pc [16].
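For reference, the textbook allele-frequency forms of He and of Botstein's PIC are sketched below in Python. How GenAlEx 6 and PIC_CALC 0.6 derive allele frequencies from presence/absence band data in a tetraploid is not reproduced here, and the frequencies in the example are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies of one marker."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (Botstein et al.)."""
    correction = sum(2.0 * pi ** 2 * pj ** 2 for pi, pj in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction


# Hypothetical allele frequencies for one marker (they must sum to 1).
freqs = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(freqs), 4), round(pic_botstein(freqs), 4))
```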

Results

Frequency and distribution of alfalfa EST–SSRs

After pre-treatment of the 12,371 alfalfa ESTs retrieved from NCBI, 11,732 ESTs with an average length of 545 bp were obtained, representing approximately 5.96 Mb of alfalfa EST sequence. Assembly generated 4,913 putative unigenes with a mean length of 604 bp, comprising 1,478 contigs and 3,435 singletons. Of the 11,732 ESTs, 716 contained SSRs. Of these 716 SSR-containing ESTs, 54 (7.5 %) contained two or more SSRs and 39 (5.4 %) contained compound SSRs. In total, 774 SSRs were identified in the 716 ESTs, an average of one SSR per 7.7 kb. Di-, tri-, tetra-, penta-, and hexanucleotide SSRs accounted for 26.1, 48.8, 11.5, 9.7, and 3.9 % of the 774 EST–SSRs, respectively (Table 2). As shown in Table 3, most SSRs (81.4 %) were 12–20 bp long, followed by those 21–30 bp long (141 SSRs, 18.2 %). A total of 68 SSR motif types were identified: 3, 10, 19, 18, and 18 types of di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively (data not shown). The most abundant motif was AG/CT (17.2 %), followed by AAG/CTT (15.1 %), ACC/GGT (7.2 %), ATC/ATG (7.1 %), AAC/GTT (6.9 %), AC/GT (6.6 %), AGC/CTG (6.3 %), and AAAAC/GTTTT (5.8 %). The remaining motifs together accounted for 27.8 %. Notably, no CG/GC repeats were found.
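Paired motif names such as AG/CT follow the usual convention of grouping a motif with its cyclic rotations and its reverse complement, so that AG, GA, CT, and TC count as one type. The Python sketch below shows one way to compute such a class label, assuming the alphabetically smallest rotation is used as the representative; the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def min_rotation(motif):
    """Alphabetically smallest cyclic rotation, e.g. 'TCA' -> 'ATC'."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif):
    """Group a motif with its rotations and reverse complement, e.g. 'TC' -> 'AG/CT'."""
    motif = motif.upper()
    a = min_rotation(motif)
    b = min_rotation(reverse_complement(motif))
    return f"{min(a, b)}/{max(a, b)}"


def all_classes(k):
    """All motif classes of length k, excluding motifs built from a shorter unit."""
    classes = set()
    for letters in product("ACGT", repeat=k):
        m = "".join(letters)
        if any(m == m[:d] * (k // d) for d in range(1, k) if k % d == 0):
            continue
        classes.add(motif_class(m))
    return classes


print(motif_class("TC"), motif_class("GAT"), sorted(all_classes(2)))
```

Enumerated this way there are 4 possible dinucleotide classes and 10 trinucleotide classes. If the motif types above were grouped the same way, the 3 dinucleotide types observed are every class except CG/CG (written CG/GC above), and all 10 possible trinucleotide classes occur in the alfalfa ESTs.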

Table 2 Summary of EST–SSR searching results
Table 3 Length distribution of EST–SSRs based on the number of repeat units

Polymorphic analysis and transferability of EST–SSR markers

A total of 100 primer pairs were designed. The remaining SSR-containing ESTs were not suitable for primer design because the sequences flanking the SSRs were too short (generally < 40 nucleotides) or did not meet the primer design criteria. The 100 EST–SSR primer pairs were used to screen a panel of 28 alfalfa accessions (Table 1). Thirty primer pairs produced clear amplicons of the expected size; their details are given in Table 4. The remaining 70 primer pairs either produced no amplification products, produced several faint bands indicative of non-specific amplification, or produced amplicons larger or smaller than expected.

Table 4 Characterization of 30 Medicago sativa EST–SSR markers

Of the 30 EST–SSR markers, 29 were polymorphic among the 28 alfalfa accessions, and a total of 198 alleles were scored from these 29 markers (Table 4). The number of alleles per marker ranged from two (MsEST109) to 21 (MsEST66), with an average of 6.8. PIC values ranged from 0.195 (MsEST84) to 0.896 (MsEST109), with an average of 0.608. Expected heterozygosity (He) ranged from 0.068 (MsEST79) to 0.442 (MsEST43), with an average of 0.207 (Table 4).

To assess cross-species/genera transferability, the 30 alfalfa EST–SSR markers were also used to screen the accessions of M. minima, M. lupulina, T. repens, and M. albus. Transferability was 100 % in M. minima, 83.3 % in M. lupulina, 70 % in M. albus, and 63.3 % in T. repens (Table 5).

Table 5 Cross-species/genera transferability of 30 M. sativa EST–SSR markers

Genetic diversity analysis

Jaccard’s similarity coefficients among the 28 alfalfa accessions were calculated from the 29 polymorphic EST–SSR markers. The lowest genetic similarity (0.238) was observed between M. sativa ssp. coerulea (6P1639) and M. sativa ssp. varia (6P2111). A dendrogram was constructed from Jaccard’s similarity coefficients based on all 198 polymorphic bands (Fig. 1). The cophenetic correlation coefficient was 0.92, indicating a very good fit between the dendrogram and the original similarity matrix. All 23 M. sativa ssp. sativa accessions clustered together and were clearly separated from the accessions of the other five subspecies (Fig. 1). Within M. sativa ssp. sativa, geographically close accessions did not form clear clusters, suggesting only a weak association between genetic similarity and geographic distance (Fig. 1). ZXY01 (a cultivar from China) and PI 445767 (a cultivar from Australia) were similar to each other but distinct from the others. Sixteen accessions formed the largest cluster, which included almost all accessions from Europe, four from Africa, two from Asia, two from South America, and one from North America. The other five accessions (one each from the USA, Russia, and Argentina, and two from Asia, namely Kazakhstan and Saudi Arabia) were separated from each other. With respect to ploidy, the two diploid accessions (6P1639 and 8P4818) were clearly distinct from the tetraploid accessions.
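NTSYS-pc is a stand-alone package, so the Python sketch below is only a minimal, self-contained illustration of the same three steps on hypothetical 0/1 band data: Jaccard similarity, UPGMA (average-linkage) clustering on the distance 1 − similarity, and the cophenetic correlation, i.e. the Pearson correlation between the original pairwise values and the values implied by the dendrogram. Clustering on 1 − similarity gives the same UPGMA tree and the same cophenetic correlation as clustering the similarities directly. Ties and tree drawing are ignored, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Shared bands / bands present in either accession (0/1 profiles)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0


def upgma(dist, n):
    """UPGMA on {frozenset({i, j}): distance} for items 0..n-1.

    Returns the merge list [(cluster_a, cluster_b, height)] and the cophenetic
    distance for every pair of original items.
    """
    members = {i: [i] for i in range(n)}
    d = dict(dist)
    merges, coph = [], {}
    next_id = n
    while len(members) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        height = d[pair]
        for i in members[a]:
            for j in members[b]:
                coph[frozenset((i, j))] = height
        size_a, size_b = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        new_d = {key: v for key, v in d.items() if a not in key and b not in key}
        for c in members:
            # size-weighted mean of old distances = average over all member pairs
            new_d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        members[next_id] = merged
        merges.append((a, b, height))
        next_id += 1
        d = new_d
    return merges, coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical band profiles (rows = accessions, columns = scored bands).
bands = [
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0, 1, 1],
]
pairs = list(combinations(range(len(bands)), 2))
dist = {frozenset(p): 1.0 - jaccard_similarity(bands[p[0]], bands[p[1]]) for p in pairs}
merges, coph = upgma(dist, len(bands))
r = pearson([dist[frozenset(p)] for p in pairs], [coph[frozenset(p)] for p in pairs])
print(merges)
print(f"cophenetic correlation: {r:.2f}")
```

A cophenetic correlation close to 1, such as the 0.92 reported here, means the tree distorts the original pairwise similarities only slightly.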

Fig. 1 The UPGMA tree of the 28 alfalfa accessions based on 29 EST–SSR markers

Discussion

In the present study, 716 (6.1 %) of the 11,732 alfalfa ESTs contained SSRs, a higher proportion than that reported for M. truncatula ESTs (3.0 %) [10]. A total of 774 EST–SSRs were identified in the 716 SSR-containing ESTs, a frequency of about one EST–SSR per 7.7 kb. This frequency is similar to those reported in Coffea (7.73 kb) [17], peanut (7.3 kb) [18], sweet potato (7.1 kb) [19], and cassava (7.0 kb) [20], and higher than that in lotus (one per 13 kb) [21]. Higher frequencies have been found in rubber tree (one per 2.25 kb) [22], pepper (3.8 kb) [23], and tea (3.5 kb) [24]. However, SSR abundance and frequency are difficult to compare directly across studies, because the estimates depend on the SSR search criteria, the size of the dataset, the mining tools, and the redundancy of the EST sequences [25].
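The two abundance figures quoted in this paragraph can be reproduced from the counts in the Results, taking the 5.96 Mb of EST sequence as the denominator for density. The Python sketch below is only this arithmetic. Note that "kb per SSR" is an inverse measure: a smaller value (for example 2.25 kb in rubber tree) means SSRs occur more often, not less.

```python
def ssr_summary(n_ests, n_ssr_ests, n_ssrs, total_kb):
    """Two common EST-SSR abundance measures."""
    return {
        # share of ESTs carrying at least one SSR
        "percent_ests_with_ssr": 100.0 * n_ssr_ests / n_ests,
        # average sequence length per SSR: smaller means SSRs are denser
        "kb_per_ssr": total_kb / n_ssrs,
    }


summary = ssr_summary(n_ests=11_732, n_ssr_ests=716, n_ssrs=774, total_kb=5_960)
print({k: round(v, 1) for k, v in summary.items()})
```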

In previous studies, di- and trinucleotide repeats were generally the dominant motif types in many species. Dinucleotide repeats were the dominant EST–SSR type in cassava [20], rubber tree [22], tea [24], coffee [26], and physic nut [27], whereas trinucleotide repeats were the most abundant in peanut [18], sweet potato [19], chickpea [28], and castor bean [29]. In the present study, trinucleotide repeats (48.8 %) were the most abundant, followed by dinucleotide repeats (26.1 %). AG/CT (17.2 %) and AAG/CTT (15.1 %) were the most frequent di- and trinucleotide motif types, respectively, consistent with results reported in many dicotyledonous species [20–22, 28, 29]. No CG/GC dinucleotide SSRs were detected, in agreement with other reports [10, 22, 26, 30].

A total of 29 polymorphic EST–SSR markers were scored across the 28 alfalfa accessions. The PIC values across the 28 accessions ranged from 0.195 to 0.896, and most of the markers (72.4 %), indicating a high level of polymorphism. EST–SSRs derived from M. truncatula showed a high level of transferability to related species, including M. sativa [10]. Similarly, the alfalfa-derived EST–SSRs developed here were highly transferable, with transferability rates of 63.3–100 % across the four related species tested.

Conclusion

In this study, we developed EST–SSR markers for alfalfa from the M. sativa EST database. The alfalfa EST–SSR markers showed a high level of polymorphism and were highly transferable to related species and genera. As alfalfa EST resources continue to grow, more genome-wide EST–SSR markers could be developed. These EST–SSR markers will facilitate marker–trait association studies, QTL mapping, and genetic diversity analysis.