Introduction

The genomes of all eukaryotes contain iterations of 1- to 6-bp nucleotide motifs. This class of DNA sequences is known as microsatellites (Litt and Luty 1989) or simple sequence repeats (SSRs; Tautz et al. 1986). SSRs are abundant across genomes, show high levels of polymorphism, and are usually considered as evolutionarily neutral DNA sequences. These are randomly or nearly randomly distributed across the euchromatic genome (reviewed in Li et al. 2002). SSRs are ubiquitously interspersed in eukaryotic genomes and can be frequently used as highly variable and multi-allelic PCR-based genetic markers (Brown et al. 1996). In wheat, Röder et al. (1998) developed the first molecular map consisting of 279 loci of genomic SSRs. These SSR markers have been extensively applied to molecular genetic studies and breeding research in wheat (Peng et al. 1999, 2000a, 2000b, 2000c, 2003; Peng 2000). Recently, many more SSRs were mapped and high density microsatellite consensus maps (Somers et al. 2004) or chromosome deletion bin maps (Sourdille et al. 2004) have become available. However, development of SSR markers is expensive, labor intensive and time consuming, especially if they are developed from genomic libraries (Röder et al. 1998; Varshney et al. 2002).

Recently, SSRs have been identified in genes and expressed sequence tags (ESTs; reviewed in Li et al. 2004). ESTs are single-pass sequence segments of expressed genes (Adams et al. 1991). They are derived from cDNA libraries made from multiple tissues under various treatments and used to identify as many genes as possible in an organism. ESTs have been developed for numerous organisms and sequences are deposited in GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html) and in specific species databases. Among plants, wheat currently has the largest number of ESTs. There are over 500,000 wheat ESTs deposited in the database as of 15 July 2004 (http://wheat.pw.usda.gov/wEST). This collection includes 90,016 hexaploid wheat ESTs developed by a group of United States researchers funded by the National Science Foundation (http://wheat.pw.usda.gov/NSF/project/). Out of these ESTs, 8,241 were physically mapped to chromosomes or chromosome bins using wheat aneuploid stocks and Southern hybridization (Hossain et al. 2004; Lazo et al. 2004; Linkiewicz et al. 2004; Miftahudin et al. 2004; Munkvold et al. 2004; Peng et al. 2004; Qi et al. 2004; Randhawa et al. 2004; Conley et al. 2004; http://wheat.pw.usda.gov/NSF/project/).

SSRs have been identified in the ESTs (designated eSSR hereafter) of wheat (Eujayl et al. 2002; Kantety et al. 2002; Varshney et al. 2002; Gao et al. 2003; Nicot et al. 2004), and other cereals (Hackauf and Wehling 2002; Kantety et al. 2002; Varshney et al. 2002; Gao et al. 2003; Thiel et al. 2003). The frequency of eSSRs was 1/6.2 kb–1/17.4 kb in wheat, 1/3.9 kb–1/11.8 kb in rice, and 1/7.5 kb–1/28.3 kb in maize (Varshney et al. 2002; Gao et al. 2003). The proportion of SSR-containing ESTs (designated SSRESTs hereafter) in cereals varies from 1.5% in maize to 6.6% in rye (Hackauf and Wehling 2002; Kantety et al. 2002). However, the frequency of eSSRs was variable in different studies even for the same species. Moreover, information on the physical location of eSSRs was not available in previous studies. The objectives of this study were (1) to analyze the frequency of eSSRs in the wheat genome, (2) to construct consensus chromosome bin maps of eSSRs in wheat, and (3) to develop and evaluate eSSR markers for application in breeding and comparative genomics.

Materials and methods

Identification of SSRs in expressed sequence tags

The Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html) was used to identify eSSRs in a wheat EST database consisting of about half a million ESTs (http://wheat.pw.usda.gov/cgi-bin/ace/search/wEST). In the present study, eSSRs were considered to contain motifs one to six nucleotides in size. The minimum number of repeat units was defined as ten for mononucleotides, six for dinucleotides, and five for all higher-order motifs (tri-, tetra-, penta-, and hexanucleotides). Frequency of eSSRs is expressed as the number of kilobase pairs of EST sequence per SSR. Percentage of SSRESTs refers to the number of SSRESTs per 100 ESTs examined. Percentage of multi-SSR ESTs refers to the number of ESTs containing more than one SSR per 100 SSRESTs. Percentage of compound eSSRs refers to the number of eSSRs present in compound formation (≥2 SSRs interrupted by ≤100 bases in an identified SSREST) per 100 identified eSSRs.
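The search criteria above can be illustrated with a minimal Python sketch. This is not the MISA script itself. It is a simplified regular-expression search for perfect repeats that applies the same thresholds; the function names, the example sequence, and the handling of overlapping hits are assumptions made for illustration only.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text:
# 10 for mono-, 6 for di-, 5 for tri- to hexanucleotide motifs.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
MAX_GAP = 100  # SSRs separated by <= 100 bases are treated as compound


def find_ssrs(seq):
    """Return a sorted list of (start, end, motif, n_units) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "CCGCCG"); the shorter motif size reports them.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif,
                         (m.end() - m.start()) // size))
    return sorted(hits)


def count_compound(hits):
    """Count SSRs lying within MAX_GAP bases of a neighbouring SSR."""
    compound = set()
    for i in range(len(hits) - 1):
        if hits[i + 1][0] - hits[i][1] <= MAX_GAP:
            compound.update((i, i + 1))
    return len(compound)


# Hypothetical EST fragment containing a (CCG)6 and an (AG)7 repeat.
est = "GATTACA" + "CCG" * 6 + "TTGACC" + "AG" * 7 + "CATG"
hits = find_ssrs(est)
print(hits)
print(count_compound(hits))
```

In this toy sequence the two repeats are six bases apart, so both would be counted as part of a compound formation under the ≤100-base rule.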

Chromosome localization of SSR-containing ESTs

Eight thousand two hundred and forty-one ESTs were physically mapped to chromosome bins of wheat using cytogenetic stocks (Hossain et al. 2004; Lazo et al. 2004; Linkiewicz et al. 2004; Miftahudin et al. 2004; Munkvold et al. 2004; Peng et al. 2004; Qi et al. 2004; Randhawa et al. 2004; Conley et al. 2004; http://wheat.pw.usda.gov/NSF/project/progress.html). A bin is a chromosome segment flanked by two deletion breakpoints (Endo and Gill 1996). The MISA software (http://pgrc.ipk-gatersleben.de/misa/misa.html) was used to identify SSRESTs in this set of wheat ESTs. In order to construct a reliable consensus physical map of eSSRs and detect patterns of genomic distribution of eSSRs, we first verified all the mapping data on the project webpage (http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus_rev.cgi) by careful inspection of the online mapping images and included only those that we were able to confirm. The chromosome locations of SSRESTs are graphically presented in consensus bin maps for the seven homoeologous chromosome groups.

Development of eSSR markers and evaluation for use in mapping and comparative genomics

Three SSRESTs were chosen from each arm of the seven consensus maps and an additional six from 1S for primer design. The software program Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) was used to design primer pairs flanking eSSRs in the selected 48 SSRESTs. A set of 59 aneuploid stocks including 21 nulli-tetrasomic and 38 ditelosomic lines of Chinese Spring (CS) background (Sears 1969) was used to map eSSR marker loci to chromosome arms. A set of 12 Triticeae cultivars or accessions including seven common wheat cultivars/accessions, two tetraploid accessions (T. dicoccoides, T. durum), two barley accessions and one rye cultivar (Table 1) was used to test the polymorphism of wheat eSSR markers and their transferability among Triticeae species. PCR procedures were according to Peng et al. (1999), and PCR products were detected using 3% agarose gels with EtBr staining or 5% sequencing gels with silver staining.
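The logic of assigning a band to a chromosome arm with aneuploid stocks can be sketched as follows. This is an illustrative simplification, not the scoring procedure actually used: the line labels (`N1A`, `Dt1AS`, and so on) and the function name are hypothetical, and the sketch assumes that a ditelosomic line carries only the named arm of the chromosome in question.

```python
def assign_arm(chrom, present):
    """Assign a PCR band to a chromosome arm from aneuploid-stock scores.

    `present` maps hypothetical line labels to True/False band presence:
      N<chrom>    nulli-tetrasomic line lacking the whole chromosome
      Dt<chrom>S  ditelosomic line carrying only the short arm
      Dt<chrom>L  ditelosomic line carrying only the long arm
    Lines that were not scored may be omitted.
    """
    if present[f"N{chrom}"]:
        return None  # band still present without the chromosome: not located here
    dt_short = present.get(f"Dt{chrom}S")
    dt_long = present.get(f"Dt{chrom}L")
    # Band missing when only the short arm is present, or seen when only
    # the long arm is present: locus is on the long arm.
    if dt_short is False or dt_long is True:
        return f"{chrom}L"
    # The mirror-image evidence places the locus on the short arm.
    if dt_short is True or dt_long is False:
        return f"{chrom}S"
    return chrom  # chromosome known, arm unresolved


print(assign_arm("1A", {"N1A": False, "Dt1AS": False}))
```

A band that is absent from both the nullisomic line and the short-arm ditelosomic line is placed on the long arm, and a band with no informative ditelosomic score is reported at the chromosome level only.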

Table 1 Triticeae materials used for polymorphism test of wheat EST-derived simple sequence repeat (eSSR) markers

Statistical analysis

Distributions of SSRESTs among the three genomes, seven homoeologous groups, 21 chromosomes, and between the long and short arms of each chromosome were compared with those of the original mapped ESTs (Hossain et al. 2004; Lazo et al. 2004; Linkiewicz et al. 2004; Miftahudin et al. 2004; Munkvold et al. 2004; Peng et al. 2004; Qi et al. 2004; Randhawa et al. 2004; Conley et al. 2004; http://wheat.pw.usda.gov/NSF/project/progress.html). Correction factors were calculated as the percentage of ESTs in the specific genome, homoeologous group, chromosome, or chromosome arm out of the total mapped ESTs. The reference values were obtained by multiplying the corresponding correction factors by the total number of SSRESTs mapped to genomes, homoeologous groups, chromosomes, or chromosome arms. The distribution difference was tested by χ² analysis. The software program STATISTICA (StatSoft 1996) was used to conduct the χ² analysis.
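The comparison described above can be sketched in a few lines of Python. The counts below are invented purely for illustration and are not taken from Table 2; the function names are likewise hypothetical. For the three-genome comparison there are two degrees of freedom, in which case the χ² tail probability has the closed form exp(−χ²/2); other comparisons would need a general χ² distribution function from a statistics package.

```python
import math


def reference_counts(mapped_ests, total_ssrests):
    """Scale mapped-EST proportions (correction factors) to the SSREST total."""
    total_ests = sum(mapped_ests.values())
    return {k: total_ssrests * n / total_ests for k, n in mapped_ests.items()}


def chi_square(observed, reference):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    stat = sum((observed[k] - reference[k]) ** 2 / reference[k]
               for k in observed)
    return stat, len(observed) - 1


def p_value_df2(stat):
    """Upper-tail probability of chi-square with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Hypothetical counts of mapped EST loci and SSREST loci per genome.
mapped_ests = {"A": 5000, "B": 6000, "D": 5000}
observed = {"A": 300, "B": 400, "D": 300}

reference = reference_counts(mapped_ests, sum(observed.values()))
stat, df = chi_square(observed, reference)
print(reference, round(stat, 3), df, round(p_value_df2(stat), 3))
```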

Polymorphism information content (PIC) is a measure of the polymorphism-detecting capacity of markers. PIC refers to the value of a marker for detecting polymorphism within a population, and depends on the number and frequency distribution of detectable alleles. We used a simplified formula (Anderson et al. 1993) to evaluate the PIC of eSSR markers, assuming that the inbred wheat lines are homozygous. The calculation of PIC for the ith marker is:

$$\text{PIC}_{i} = 1 - \sum_{j=1}^{n} p_{ij}^{2}$$

where $p_{ij}$ is the frequency of the jth pattern for the ith marker and the summation extends over n patterns.
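As a worked illustration of the PIC formula, the following Python sketch computes PIC for one marker from the banding pattern scored in each line. The pattern labels and the seven-line example are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def pic(patterns):
    """PIC = 1 - sum of squared pattern frequencies for one marker."""
    counts = Counter(patterns)
    n = len(patterns)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical marker scored on seven lines, giving three distinct patterns
# with frequencies 3/7, 2/7 and 2/7.
example = ["a", "a", "a", "b", "b", "c", "c"]
print(round(pic(example), 3))
```

For this example the squared frequencies sum to 17/49, so PIC = 32/49. A marker showing the same pattern in every line has a PIC of 0.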

Results

Frequency of eSSRs in wheat

We examined nearly 500,000 wheat gene segments (ESTs) comprising 238.05 Mb DNA sequence and detected a total of 43,598 SSRs. Taking into account those ESTs which contained more than one eSSR, the total number of SSRESTs was 36,520 (7.41%). The frequency of occurrence for eSSRs was one eSSR in every 5.46 kb of EST sequences. Out of the total SSRESTs, 4,647 (12.72%) contained two or more eSSRs. Among the total eSSRs, 5,133 (11.77%) appeared to have compound formation.
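The summary statistics in this paragraph follow directly from the reported counts, as the short Python check below shows. Only figures stated in the text are used; the variable names are chosen for illustration.

```python
# Counts reported in the text.
total_mb = 238.05   # total EST sequence length in Mb
n_essr = 43598      # SSRs detected
n_ssrest = 36520    # ESTs containing at least one SSR
n_multi = 4647      # SSRESTs containing two or more SSRs
n_compound = 5133   # SSRs present in compound formation

kb_per_essr = total_mb * 1000 / n_essr       # kb of EST sequence per SSR
pct_multi = 100 * n_multi / n_ssrest         # multi-SSR ESTs per 100 SSRESTs
pct_compound = 100 * n_compound / n_essr     # compound eSSRs per 100 eSSRs

print(round(kb_per_essr, 2), round(pct_multi, 2), round(pct_compound, 2))
```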

Distribution of wheat eSSRs based on motif sizes

Figure 1 shows the distribution pattern of eSSRs in terms of motif size. Among the eSSRs detected, the trinucleotides are the most frequent, with a frequency of 59.7%, followed by mono- (19.7%) and dinucleotides (16.7%). The tetranucleotide eSSRs have a much lower frequency (2.7%) than those with mono-, di-, and trinucleotide motifs. The frequencies of eSSRs with motif sizes of five or six are less than 1%.

Fig. 1

Frequency distribution of EST-derived simple sequence repeats (SSRs) based on motif size

Distribution of wheat eSSRs based on motif types

The eSSRs identified in the present study could be basically characterized by 16 types of motifs (Fig. 2). In general, the eSSRs were found to be unevenly distributed across motif types. The motif CCG/CGG had the highest frequency of 20.5% followed by the motifs A/T (13.6%), AAC/GTT (11.7%), AG/CT (8.7%), AGC/CGT (7.8%), C/G (6.2%), AGG/CCT (5.9%), and AC/GT (5.3%). The other eight types of motifs possessed a frequency of <5%.

Fig. 2

Frequency distribution of EST-derived SSRs based on motif sequence type

Physical mapping of wheat SSRESTs

Among 8,241 wheat ESTs physically mapped by the ten United States laboratories including ours as of February 2004 (http://wheat.pw.usda.gov/NSF/project/), 341 were SSRESTs. A total of 1,010 EST loci derived from these 341 SSRESTs were mapped to specific bins, arms or chromosomes of the seven homoeologous groups in wheat. Consensus physical (chromosome bin) maps showing the chromosomal locations of SSRESTs, the corresponding cDNA libraries and the SSR sequences they contain were constructed and are presented in Figs. 3 and 4.

Fig. 3

Consensus bin maps of SSRESTs for wheat homoeologous groups 1, 2, 3, 5, 6, and 7. The accession numbers of SSRESTs mapped to specific bins are shown on the left side, and those mapped to combined bins are shown on the right side of the chromosomes. Vertical lines on the right side span the range of combined bins. Letters after EST accession numbers refer to the code for cDNA libraries: A wheat pre-anthesis spike, B wheat etiolated seedling root, C Secale cereale anther, D Triticum monococcum early reproductive apex, E wheat unstressed seedling shoot, F wheat vernalized crown, G wheat 5–15 DAP spike, H wheat endosperm, I Aegilops speltoides anther, J wheat 20–45 DAP spike, K wheat heat-stressed flag leaf, L wheat Fusarium graminearum-infected spike, M wheat cold-stressed seedling, N wheat drought-stressed leaf, O wheat heat-stressed spike, P durum wheat whole plant, Q wheat salt-stressed crown, R wheat unstressed root tip, S wheat heat-stressed seedling, T wheat salt-stressed sheath, U wheat drought-stressed seedling, and V wheat dormant embryo. Following the code is the eSSR motif sequence. COM Compound formation, ≥2 SSRs interrupted by ≤100 bases in an identified SSREST. SSRESTs in italics are those which were mapped only to specific chromosomes, and not to chromosome arms or bins

Fig. 4

Chromosome bin map of SSRESTs for wheat homoeologous group 4. The consensus chromosome 4 integrated 4B and 4D. SSRESTs in bold and underlined letters detected structure changes between 4A and 4B/4D (bold-font SSRESTs were mapped to 4AS and 4L arms, the underlined-font SSRESTs were mapped to 4AL and 4S arms). BE518074 was the only SSREST mapped to similar arms, 4AS and 4S. All other notations are the same as in Fig. 3

Due to the significant structural changes in group 4 chromosomes, bin maps for chromosome 4A and a consensus chromosome 4 incorporating 4B and 4D are presented separately (Fig. 4). Among 15 SSRESTs mapped to chromosomes 4A, 4B and/or 4D, 14 detected structural changes between 4A and 4B/4D. The pericentric inversion in 4A established arm homoeologies as 4AS = 4BL = 4DL and 4AL = 4BS = 4DS. This result further confirms the structural changes of group 4 chromosomes previously inferred from chromosome morphology and C-banding patterns (Endo and Gill 1984; Gill et al. 1991) and from molecular mapping (Mickelson-Young et al. 1995).

The mapped SSRESTs covered the whole genome of hexaploid wheat. In general, however, there were more SSRESTs in the distal regions than in the proximal regions. The distribution patterns of SSRESTs or eSSRs were variable among chromosome groups and between chromosome arms (Figs. 3, 4).

Distribution of SSRESTs among genomes, chromosomes and between chromosome arms

The number of SSREST loci mapped to the B genome was 17.7% and 16.6% higher than that mapped to the A and D genomes, respectively (Table 2). In comparison to the reference number of SSRESTs based on the original mapped ESTs, the number of SSRESTs in the B genome was in slight excess, while the number of SSRESTs in the A and D genomes was slightly lower. However, the deviation was not significant (P = 0.823) as indicated by the χ² test (Table 2).

Table 2 Distribution of SSR-containing ESTs among genomes, chromosomes and between chromosome arms [T = A + B + D; O observed, R reference value of SSREST number which was based on the original number of mapped ESTs (Peng et al. 2004; Qi et al. 2004; http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus_rev.cgi)]

As shown in Table 2, the number of SSREST loci mapped to homoeologous groups 1, 2, 3, 5 and 7 exceeded the reference values based on the original mapped ESTs, while the reverse was true for groups 4 and 6. The χ² test indicated that the deviation was nearly significant (P = 0.053).

The number of SSREST loci mapped to chromosomes 1B, 3B, 4A, 6B and 7D exceeded the reference values based on the original mapped ESTs. The number of SSREST loci was similar to or lower than the reference value for the other 16 chromosomes. However, the deviation was not significant (P = 0.963) as indicated by the χ² test (Table 2).

For 16 of the 21 chromosomes, the number of SSREST loci mapped to the long arm exceeded that mapped to the short arm. For the remaining five chromosomes, the short arm had the same or a larger number of SSRESTs than the long arm. Furthermore, the χ² test indicated that the deviation from reference values based on the original mapped ESTs was significant (P < 0.05) only for chromosomes 5B, 5D, and 7B (Table 2).

Experimental evaluation of eSSRs

To evaluate the usefulness of the identified eSSRs, primers were designed for 48 selected SSRESTs, and the amplified loci were mapped to chromosome arms using wheat cytogenetic stocks and a PCR approach. The Electronic Supplementary Material lists 266 eSSR loci derived from the 48 eSSR primer pairs and mapped to specific chromosome arms of wheat. The number of mapped loci ranged from 1 to 28 and an average of 5.5 loci were mapped for each primer pair (Table 3; Electronic Supplementary Material). Out of the 48 eSSR markers, 43 (∼90%) amplified at least one band from the expected chromosome arm, based on mapping information of the original SSRESTs (Figs. 3, 4, Table 3). Five did not amplify loci from the expected chromosomes, possibly due to the imperfect primer design or other unknown reasons. The high coincidence in chromosome location between SSRESTs and eSSR markers is encouraging for the application of EST-derived markers in genetic studies.

As shown in Table 3, the number of eSSR loci (266) was clearly larger than the number of loci (165) detected via the RFLP approach (designated eRFLP hereafter). For 25 of the 48 SSRESTs, the number of eSSR loci exceeded that of the corresponding eRFLP loci. For 17 of the 48 SSRESTs, the number of eSSR loci was smaller than that of the corresponding eRFLP loci. The number of eSSR loci was equal to that of eRFLP loci for only 6 of the 48 SSRESTs. However, most (79.2%) of the excess eSSR loci relative to eRFLP loci were derived from 6 (12.5%) SSRESTs, i.e., BE494877, BE495462, BE495116, BE497184, BF478437 and BG262868. The other 42 (87.5%) SSRESTs contributed only 20.8% of the difference in locus number between eSSR and eRFLP (Table 3). Thus, only a small number of SSRESTs contributed substantially to the difference in number between mapped eSSR and eRFLP loci.

Table 3 Chromosome location coincidence between EST-derived simple sequence repeat (eSSR) loci and eRFLP loci (detected via the RFLP approach) of SSR-containing ESTs in wheat

The eSSRs were also evaluated for polymorphism and transferability to other cultivars or other Triticeae species. Figure 5 is an example of a PCR profile of wheat eSSR marker Xcwem20 on 12 accessions/genotypes of Triticeae species listed in Table 1. As a measure of the informativeness of eSSR markers, PICs were obtained for 15 eSSR markers with loci covering all the seven homoeologous chromosome groups, by testing a set of seven hexaploid wheat lines used in our wheat genomics and breeding programs (Table 1). Each band in each accession was considered as an allele of a genetic locus for each eSSR marker. As shown in Table 4, the PIC of eSSR markers ranges from 0.29 to 0.76 and has a mean value of 0.45±0.16 which is lower than 0.54±0.19 for genomic SSR markers (Röder et al. 1995) and higher than 0.30±0.27 for RFLP markers (Anderson et al. 1993). The average number of alleles detected by genomic SSR markers was 4.6 (Röder et al. 1995). In the present study, the average number of alleles detected by eSSR markers reached 20.1 (Table 4) which is significantly larger than that of genomic SSR markers.

Fig. 5

eSSR fingerprint of 12 Triticeae genotypes generated with Xcwem20. Lanes A, B, C, D, E, F, G, H, I, J, K and L contain amplified bands from Chinese Spring, PI294994, PI262660, CI2401, CO970547, Above, Glupro, Langdon, Hermon H52, Mona, Wadi Qilt 23–39 and Turkey 67, respectively. The details of these genotypes are shown in Table 1. Lane M is a 50-bp DNA ladder (Invitrogen cat. no. 10416–014)

Table 4 Polymorphism information content (PIC) and interspecies transferability of wheat eSSR markers

In other studies, less stringent PCR conditions had to be used to amplify bands from related species using wheat genomic SSRs (Röder et al. 1995). We were able to amplify products for all the 15 wheat eSSR markers from tetraploid wheat, barley and rye DNA samples using the same PCR conditions used for hexaploid wheat DNA (Fig. 5, Table 4).

Discussion

Microsatellites in genes

Microsatellites (SSRs) are usually considered evolutionarily neutral DNA markers (Tachida and Iizuka 1992; Awadalla and Ritland 1997; Schlötterer and Wiehe 1999) and selectively neutral sequences randomly or nearly randomly distributed over the euchromatic genome (Schlötterer and Wiehe 1999; Schlötterer 2000). However, reports have repeatedly shown that SSRs also exist within genes or functional sequences (reviewed in Li et al. 2004). Expansion of a trinucleotide repeat in the coding region of a novel human gene is sometimes associated with neuropathological diseases, e.g., Huntington’s disease (Huntington’s Disease Collaborative Research Group 1993). The waxy gene in rice contains a (CT)n SSR in the 5′-untranslated region (UTR) whose length polymorphism is associated with amylose content (Ayres et al. 1997). Bao et al. (2002) also found that (CT)n SSRs in starch-synthesizing genes are related to starch physicochemical properties in waxy rice. Recently, SSRs were identified in expressed genes or ESTs of several crop plants including wheat (Eujayl et al. 2002; Hackauf and Wehling 2002; Kantety et al. 2002; Morgante et al. 2002; Schloss et al. 2002; Varshney et al. 2002; Gao et al. 2003; Thiel et al. 2003; Nicot et al. 2004). In the present study, we identified 43,598 SSRs within nearly half a million wheat ESTs or expressed gene segments. This result further corroborates that SSRs do exist in the expressed portions of wheat genomes or genes, and must have been involved in the expression and/or function of genes. We gave the designation eSSRs to SSRs found in ESTs.

Frequency of eSSRs in wheat

We obtained a frequency of 8.85% eSSRs or the occurrence of one SSR in every 5.46-kb sequence for the 238.05-Mb wheat EST database. Through analysis of small size genomic and EST sequences in wheat, Arabidopsis, rice, soybean and maize, Morgante et al. (2002) found that the frequency of SSR was significantly higher in ESTs than in genomic DNA across all species. Therefore the frequency of genomic SSRs in wheat may be expected to be lower than 1/5.46 kb if a similar searching stringency is applied.

Kantety et al. (2002) and Gao et al. (2003) reported a lower frequency of 3.2% and one SSR every 17.42 kb ESTs in wheat, respectively. The lower frequency may be partially due to a more stringent criterion for defining microsatellites. For example, Kantety et al. (2002) used a criterion of a minimum length of 18 (di- and tri-) bases. Gao et al. (2003) used a criterion of a minimum length of 18 (tri-) and 20 (di-) bases. Varshney et al. (2002) adopted a similar stringency as in our study and obtained an average SSR frequency of 1/6.0 kb for five cereal crops and 1/6.2 kb for wheat, which is close to our observed frequency (1/5.46 kb).

The frequency variations of eSSRs among different studies are partly due to the sampling error, exclusion of some EST libraries, and stringency of limits set for SSR extraction. A relatively low stringency applied in the in silico analyses may partially explain the relatively high SSR frequencies in the present study and Varshney et al. (2002). It is necessary to note that the database containing nearly half a million ESTs or 238.05 Mb EST sequence length used in the present study is the largest compared to the previous reports of 38,238 ESTs used by Kantety et al. (2002), 37.5 Mb explored by Varshney et al. (2002), 21.4 Mb searched by Gao et al. (2003), and 170,746 ESTs examined by Nicot et al. (2004). The frequency of eSSRs identified via the whole database of wheat ESTs in the present study may be close to the actual frequency of eSSRs in the wheat genome.

Importance of tri-nucleotide eSSRs in wheat

The majority of genomic SSRs (48–67%) found in many species are di-nucleotides (Wang et al. 1994; Schug et al. 1998). In contrast to the tri-nucleotide SSRs, di- and tetra-nucleotide SSRs are much less frequent in coding regions than in non-coding regions (reviewed in Li et al. 2002). The tri-nucleotide SSRs are reported as the most abundant class of SSRs in the EST sequences of all the crop species including wheat (Kantety et al. 2002; Morgante et al. 2002; Varshney et al. 2002; Gao et al. 2003), which was verified in this study. Suppression of non-trinucleotide SSRs in coding regions could reduce frameshift mutations, and this may be a reason for the predominance of tri-nucleotide SSR (Metzgar et al. 2000). Morgante et al. (2002) suggested that mutation pressure and positive selection for specific single amino acid stretches are possible reasons for the doubled frequency of tri-nucleotide repeats relative to mono- and dinucleotide repeats in the genes of plant species.

In our study, the eSSRs with CCG/CGG motif are the most frequently observed among all types of motifs (Fig. 2) which agrees with the findings of Nicot et al. (2004). Among the trimeric repeats, the motif CCG is the most common in several cereals including wheat with a frequency of 32% (Varshney et al. 2002). Gao et al. (2003) showed that CCG/CGG is the most frequent motif followed by AAC/TTG. The CCG motif was also one of the most frequent motif sequences in Kantety et al. (2002). The abundance of CCG repeats is a specific feature of monocot genomes, which may be due to the increased G + C content (Morgante et al. 2002). In contrast, AAT/ATT motifs had the lowest frequency (<1%) in the present study (Fig. 2). This may be due to the fact that TAA-based variants code for stop codons that have a direct effect on protein synthesis (Chin et al. 1996).

Genomic distribution patterns of eSSRs

Genetically mapped molecular markers including RFLPs (Liu and Tsunewaki 1991), genomic SSRs (Röder et al. 1998), and AFLPs (Peng et al. 2000c) were shown to be non-randomly distributed among genomes of wheat with a larger number in the B genome, and this non-randomness was attributed to higher polymorphism in the B genome. Qi et al. (2004) found that 11% more EST loci were mapped in the B genome than in the D and A genomes. A larger number of SSRESTs was also observed in the B genome (Table 2).

Wheat ESTs were not always evenly distributed among genomes, homoeologous groups, chromosomes, and between the arms of each chromosome (Hossain et al. 2004; Lazo et al. 2004; Linkiewicz et al. 2004; Miftahudin et al. 2004; Munkvold et al. 2004; Peng et al. 2004; Qi et al. 2004; Randhawa et al. 2004; Conley et al. 2004; http://wheat.pw.usda.gov/NSF/project/). Deviations of SSREST distribution patterns from those of the original mapped ESTs for genomes, homoeologous groups, chromosomes, and most of the chromosome arm pairs (long vs short arm) were not significant (Table 2). Therefore, eSSRs as represented by SSRESTs basically follow the distribution patterns of the original mapped ESTs in the wheat genome (Hossain et al. 2004; Lazo et al. 2004; Linkiewicz et al. 2004; Miftahudin et al. 2004; Munkvold et al. 2004; Peng et al. 2004; Qi et al. 2004; Randhawa et al. 2004; Conley et al. 2004; http://wheat.pw.usda.gov/NSF/project/).

Potential of eSSR markers in cereal breeding and comparative genomics

EST-derived SSR or eSSR markers constitute a novel source of markers that are physically associated with coding regions of the genome and thus enhance the role of genetic markers in germplasm evaluation by enabling the variation in expressed genes to be assayed. However, to date a limited number of eSSR markers has been mapped in wheat. This includes 11 in Holton et al. (2002), and 112 in Nicot et al. (2004) of which 5 are publicly available. Here we present 341 wheat eSSRs present in 1,010 loci physically mapped on specific chromosomes or bins (Figs. 3, 4). Forty-eight of these were converted into eSSR markers which amplified 266 loci physically mapped to chromosome arms (Table 3; Electronic Supplementary Material). The observed high coincidence of chromosome location between SSRESTs and the derived eSSR markers (Table 3) suggests that the mapping information presented for SSRESTs (Figs. 3, 4) can be used to derive additional eSSR markers for various applications.

In previous studies, an average of <2 loci per SSR was genetically mapped for both genomic SSR and eSSR markers (Röder et al. 1998; Nicot et al. 2004). In comparison, we physically mapped an average of 5.5 loci per eSSR marker (Electronic Supplementary Material). This higher number of loci per eSSR may be explained by the difference in mapping approaches used, i.e., genetic mapping in the previous studies (Röder et al. 1998; Nicot et al. 2004) and physical deletion mapping in the current study (Table 3; Electronic Supplementary Material). The high frequency of inter-chromosome duplication observed for wheat ESTs (Peng et al. 2004) also may be one of the reasons for the high number of mapped eSSR loci per marker.

In general, eSSR markers detected more loci than eRFLP did. Although a large portion (25/48) of the eSSR markers amplified more loci than the corresponding eRFLP markers detected, a substantial part (17/48) of the eSSR markers amplified fewer loci than the corresponding eRFLP markers detected (Table 3). Furthermore, the unmapped eRFLP bands/loci due to the technical weakness of the RFLP method also seemed to play an important role (>31.7%) in the number differences between eSSR and eRFLP loci as shown in Table 3. Therefore, a combination of eSSR and eRFLP approaches is necessary in genomic analyses of wheat SSRESTs.

We observed an interesting phenomenon. For seven eSSR markers, Xcwem1, 5, 14, 35, 39, 42 and 49, a single band was observed to disappear in two nullisomic-tetrasomic and the corresponding ditelosomic lines. For example, the Xcwem1c band (134 bp) disappeared in both nulli-1A/Dt1AS and nulli-1D/Dt1DS. Hence, the same band was physically mapped to two different chromosome arms, 1AL and 1DL (Electronic Supplementary Material). Xcwem35 and Xcwem39 detected single bands residing in 4AL and 4DS (Electronic Supplementary Material) because of the arm homoeologies, 4AS = 4BL = 4DL and 4AL = 4BS = 4DS, as shown in Fig. 4 and previous studies (Endo and Gill 1984; Gill et al. 1991; Mickelson-Young et al. 1995). The possibility that more than one chromosome or chromosome arm is missing in the nulli-tetrasomic and ditelosomic lines used in this study can be ruled out, since these lines were cytogenetically characterized before use in this study. Further studies are underway to elucidate the nature of this phenomenon.

The PIC value of markers indicates the usefulness of DNA markers for gene mapping, molecular breeding and germplasm evaluation. We obtained an average PIC of 0.45±0.16 for 15 eSSR markers on seven hexaploid wheat lines. This value is lower than that reported for genomic SSR markers (0.54±0.19, Röder et al. 1995), but higher than that for RFLP markers (0.30±0.26, Anderson et al. 1993). This agrees with the observations of Eujayl et al. (2002) and Nicot et al. (2004). The eSSRs are located in genes or coding regions of the genome. Thus the lower PIC of eSSRs relative to genomic SSRs is perhaps due to the high conservation of genes/coding regions among cultivars within a species. Artificial selection pressure in the breeding process has also played a role in the reduction of polymorphism or PIC values among cultivars. Nevertheless, compared to genomic SSRs, eSSR markers possess additional advantages in that (1) they are cheaper to develop using huge publicly available EST databases, (2) they exist within genes and are therefore perfect genetic markers for target genes, and (3) they appear to be more readily transferable among species (Holton et al. 2002; Table 4).

Genomic SSRs are frequently not transferable to closely related species (Röder et al. 1995; Sourdille et al. 2001), and thus not suitable for comparative genomics study. Holton et al. (2002) reported a relatively high level of transferability (26–55%) for eSSRs between wheat and barley. Tall fescue eSSR markers were also transferable across several grass species including rice, ryegrass and wheat (Saha et al. 2004). All the 15 eSSR markers we tested amplified PCR products in tetraploid wheat (T. dicoccoides, T. durum), barley (H. spontaneum, H. vulgare), and rye (S. cereale). The majority of the 15 eSSR markers could also detect polymorphism between the two tetraploid wheat accessions and between the two barley accessions (Table 4). The transferability of eSSR markers across related Triticeae species makes them a valuable source for comparative genomics studies.