Introduction

Bread wheat (Triticum aestivum L. em. Thell.) is one of the most important crops in the world. It is an allohexaploid (2n=6x=42) containing three distinct but related genomes, A, B and D, each with seven chromosomes. It has a large genome of 16×10⁹ bp (Bennett and Smith 1976), of which more than 80% is repetitive DNA. Detailed RFLP genetic maps (Nelson et al. 1995a, 1995b, 1995c; Marino et al. 1996) and physical maps (Gill et al. 1993; Kota et al. 1993; Hohmann et al. 1994; Delaney et al. 1995a, 1995b; Mickelson-Young et al. 1995) for all seven homologous groups are now available. RFLP analysis, however, is limited by its labor-intensiveness and by low polymorphism in wheat. In contrast, PCR-based molecular markers such as microsatellites, or simple sequence repeats (SSRs), are easy to use and exhibit a higher degree of polymorphism. To date, a total of 450 microsatellite markers have been added to wheat genetic maps by different research groups (Röder et al. 1998; Stephenson et al. 1998; Pestsova et al. 2000; Gupta et al. 2002). However, traditional SSR markers have some disadvantages. First, genomic SSR markers were mostly derived from intergenic regions with no known gene function. Second, the procedures for developing those markers are complex: they include isolating and sequencing clones containing putative SSR motifs, and subsequently designing and testing the flanking primers. Recent large-scale sequencing projects have produced a large number of single-pass sequences of complementary DNAs (cDNAs) from different plant species (http://www.ncbi.nlm.nih.gov; http://www.graingenes.org). The number of expressed sequence tags (ESTs) deposited in GenBank for wheat, maize, rice and soybean has risen to 416,000, 197,000, 113,000 and 308,000 sequences, respectively (released 10/1/03, http://www.ncbi.nih.nlm.gov/dbEST). Studies on the distribution of microsatellites in ESTs (EST-SSRs) have been carried out in both eukaryotic (Cardle et al. 2000; Tóth et al. 2000; Kantety et al. 2002) and prokaryotic genomes (Gur-Arie et al. 2000). The estimated frequency of EST-SSRs is higher in coding than in non-coding sequences (Temnykh et al. 2001; Morgante et al. 2002), suggesting that a significant proportion of ESTs can be used as polymorphic SSR markers. Furthermore, if a relatively stringent threshold for sequence similarity is used, EST-SSR mining across plant species could lead to the development of anchor markers with putative function in related plant species. Thus, traditional marker-assisted selection (MAS) could be replaced by direct gene selection for targeted traits.

As of 4 February 2003, 6,636 ESTs had been mapped in wheat by RFLP analysis using mapping populations and deletion lines (http://wheat.pw.usda.gov/NSF/progress_mapping.html). However, genetic maps built from EST-SSRs have not been reported. We present here a genetic map containing 101 EST-SSR loci based on three mapping populations of hexaploid wheat, of which 74 loci represent genes identified by sequence similarity.

Materials and methods

Microsatellite marker development

As reported previously (Gao et al. 2003), a total of 71,495 wheat EST sequences were obtained from the wheat NSF project homepage (http://wheat.pw.usda.gov/NSF/progress_est.html). ESTs containing SSRs at least 18 bp long, with repeat units of 1–6 bp, were extracted, and primers were designed using the program Primer 3 (http://www.basic.northwestern.edu/biotools/Primer3.html).
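The extraction criterion above can be illustrated with a minimal Python sketch. It is not the pipeline of Gao et al. (2003): it assumes perfect (uninterrupted) repeats only, counts whole repeat units, and the function names and the example sequence are hypothetical.

```python
import re

MIN_SSR_LENGTH = 18   # bp, the threshold stated in the text
MAX_MOTIF_LENGTH = 6  # repeat units of 1-6 bp


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'AGAG')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(sequence, min_length=MIN_SSR_LENGTH):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs of >= min_length bp."""
    sequence = sequence.upper()
    hits = []
    for motif_len in range(1, MAX_MOTIF_LENGTH + 1):
        # Smallest number of whole repeat units that reaches min_length (ceiling division).
        min_repeats = -(-min_length // motif_len)
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip e.g. 'AGAG' x 5, which is already reported as 'AG' x 10.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical EST fragment carrying one dinucleotide and one trinucleotide SSR.
example_est = "GATTACA" + "AG" * 10 + "CCGTA" + "CTT" * 6 + "GGA"
print(find_ssrs(example_est))
```

Each hit gives the 0-based start position, the repeat unit and the number of repeats; primers would then be designed in the flanking sequence.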

Plant materials, DNA extraction and PCR amplification

Three mapping populations were used in this study. The main one was the ITMI mapping population (WOpop), generated by single-seed descent (F7) from the cross of the synthetic hexaploid wheat W-7984 with the T. aestivum variety ‘Opata 85’. The second was a recombinant inbred line (RIL) population (WSpop) generated from a cross of the released Chinese variety ‘Wenmai 6’ and the landrace ‘Shanhongmai’. The third was a doubled haploid (DH) population (LHpop) generated from a cross of the released Chinese varieties ‘Lumai 14’ and ‘Hanxuan 10’. Ninety to 100 lines were selected randomly from each population, and genomic DNA was extracted from 3-week-old leaves using an SDS-phenol-chloroform method (Devos et al. 1992). Conditions for PCR reactions were as described by Röder et al. (1998), except that the annealing temperatures were adjusted for the different primer pairs.

Genetic mapping

Reference maps consisting of 519 anchor markers (mainly RFLPs) for WOpop, 320 (mainly SSRs and AFLPs) for LHpop (unpublished data) and 197 (mainly SSRs and AFLPs) for WSpop (unpublished data) were prepared using MAPMAKER/Exp v3.0b (Lander et al. 1987). New microsatellite markers were integrated into the skeleton maps according to the procedures described by Gupta et al. (2002) at a LOD score of 2.5. Centimorgan units were calculated using the Kosambi mapping function (Kosambi 1944). Mapped wheat microsatellite loci were designated as Cwm for “wheat microsatellites derived from cDNAs”. To construct a consensus linkage map from the three individual maps, anchor markers (mainly SSRs) that were mapped in both WOpop and LHpop, or in both WOpop and WSpop, were chosen as standard markers. The positions of the EST-SSR loci in LHpop and WSpop were then assigned approximately to the WOpop linkage groups.
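For reference, the Kosambi function converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The short Python sketch below shows the conversion in centimorgans; the function names are illustrative and it is not the MAPMAKER/Exp implementation.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse function: recombination fraction for a map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


for r in (0.01, 0.10, 0.20, 0.30):
    print("r = %.2f  ->  %.2f cM" % (r, kosambi_cm(r)))
```

For small r the distance in cM is close to 100r; the correction for interference grows as r approaches 0.5.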

Homology searching

ESTs containing mapped microsatellites were searched against the GenBank nonredundant (nr) database using the TBLASTX or BLASTN algorithms (http://www.ncbi.nlm.nih.gov/BLAST). Sequences with an expected (E) value <10⁻⁷ by TBLASTX or <10⁻¹⁵ by BLASTN were assigned putative functions, and the corresponding loci were named after their homologues. For example, Ivrv-A1a-SSR and Ivrv-A1b-SSR, two alleles of gene Ivrv, were mapped to the A genome by EST-SSR markers. If the E value exceeded these thresholds, the marker kept the Cwm designation, followed by the number of the originally designed primer pair.
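The naming rule can be written as a small decision function. The Python sketch below is illustrative only: the function names, the dictionary layout and the example E values are hypothetical, and strict inequalities are used as in the text above.

```python
# E-value cut-offs stated in the text, keyed by search algorithm.
E_VALUE_THRESHOLDS = {"TBLASTX": 1e-7, "BLASTN": 1e-15}


def has_putative_function(best_e_values):
    """best_e_values maps an algorithm name to the best E value found with it."""
    return any(e < E_VALUE_THRESHOLDS[alg] for alg, e in best_e_values.items())


def locus_name(primer_pair_number, best_e_values, homologue_based_name):
    """Use the homologue-based name if a hit passes its threshold, else Cwm + primer number."""
    if has_putative_function(best_e_values):
        return homologue_based_name
    return "Cwm%d" % primer_pair_number


# Hypothetical E values, for illustration only.
print(locus_name(224, {"TBLASTX": 1e-3, "BLASTN": 1e-5}, "Example-SSR"))
print(locus_name(17, {"TBLASTX": 1e-35}, "Example-SSR"))
```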

Results

Marker evaluation: functionality and polymorphism

A total of 1,228 ESTs containing microsatellites were mined from the 71,495 ESTs. From these 1,228 EST-SSRs, 597 primer pairs were designed, of which 478 (80%) amplified products successfully from DNA of the parents of the mapping populations (WOpop, LHpop, and WSpop). The other 20% of the primer pairs either amplified products larger than expected (fragments above 500 bp) or, in most cases, produced no products. Ninety-two, 58 and 29 primer pairs showed polymorphism between the parents of WOpop, LHpop and WSpop, respectively. Thirty-one polymorphic primer sets were shared by the parents of WOpop and LHpop/WSpop, and five were shared by the parents of LHpop and WSpop. Therefore, 29.9% (143/478) of the tested primer pairs yielded unique polymorphic EST-SSRs.
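The count of 143 unique polymorphic primer pairs follows from the per-population numbers once the shared pairs are subtracted. The check below is a sketch in Python; it assumes that each shared primer pair is counted in only one of the two shared categories, which the text does not state explicitly.

```python
tested_primer_pairs = 478  # primer pairs that amplified successfully
polymorphic = {"WOpop": 92, "LHpop": 58, "WSpop": 29}
shared_wo_with_lh_or_ws = 31  # shared between WOpop and LHpop/WSpop
shared_lh_ws = 5              # shared between LHpop and WSpop

# Remove each shared pair once so it is not counted in two populations.
unique_polymorphic = sum(polymorphic.values()) - shared_wo_with_lh_or_ws - shared_lh_ws
unique_rate = 100.0 * unique_polymorphic / tested_primer_pairs
wopop_rate = 100.0 * polymorphic["WOpop"] / tested_primer_pairs

print(unique_polymorphic, round(unique_rate, 1), round(wopop_rate, 1))
```

The same arithmetic gives the 19.2% polymorphism rate between the WOpop parents (92/478) that is cited in the Discussion.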

Mapping and distribution of microsatellite loci

Of the 143 polymorphic EST-SSR markers based on the three mapping populations, 88 primer sets demonstrated reproducible amplification and were used for genetic mapping. A total of 101 microsatellite loci amplified by the 88 primer pairs were integrated into the three reference maps: 67 on WOpop, 25 on LHpop, and 9 on WSpop (Fig. 1). The genetic length of the WOpop map reached 4,641.2 cM, and the map contained 65 EST-SSR loci together with 519 anchor markers. The average distance between loci was 7.9 cM. Two loci, Atp-D1-SSR and Cwm224, were not included in the calculation because they were assigned to intervals of chromosome 1D and chromosome 6D of the RFLP map, respectively (Fig. 1). Ten of the 88 primer sets amplified more than one locus; the highest number was produced by Cwm231, with four loci mapped to non-homologous groups.

Fig. 1 Molecular linkage map of bread wheat by EST-SSR markers using three mapping populations: W7984×Opata85 (WOpop), Lumai×Hanxuan (LHpop) and Wenmai6×Shanhongmai (WSpop). For WOpop, markers with a LOD>2.5 were integrated into the RFLP framework; the other markers were placed to the most probable interval. The approximate positions of markers mapped on LHpop and WSpop were assigned to the right of the chromosomes on WOpop correspondingly. Estimated centromere locations were shown in black lines

Of the 101 loci, 24 were mapped on the D genome, whereas 40 and 37 were mapped on the A and B genomes, respectively (Fig. 1). The distribution of microsatellites among the seven homologous groups was not random. Thirty-one loci were mapped to the three chromosomes of homologous group 1, whereas only four loci were mapped on chromosomes of group 4 and no locus was mapped on chromosome 4B.

Putative functions of the mapped genes

Table 1 lists the loci for which sequence homology was determined. The data indicate that 74 (73.3%) of the 101 loci corresponded to genes of known function. Sequence similarity searches revealed storage proteins, regulatory factors and structural genes, as well as genes involved in such diverse processes as DNA synthesis, cell cycle regulation, carbon metabolism, fatty acid metabolism, membrane transport and signal transduction. Except for glutenins and gliadins, which had been mapped previously as RFLP markers, all the putative genes represented by EST-SSRs were mapped for the first time in wheat (Fig. 1).

Table 1 Description of EST-derived wheat microsatellites mapped on the three populations: W-7984×Opata85 (WOpop), Lumai×Hanxuan (LHpop), and Wenmai6×Shanhongmai (WSpop). Loci were named after their GenBank homologues if the alignment on tBlastX or BlastN searching gave an E value ≤10⁻⁷ (tBlastX) or ≤10⁻¹⁵ (BlastN) respectively; otherwise, the locus was named Cwm followed by the primer pair numbers. Loci on WOpop were assigned to chromosome arms, while the loci on LHpop and WSpop were allocated to possible positions on WOpop, and then assigned to the appropriate chromosome

Discussion

We present here the first genetic map of the bread wheat genome based on microsatellites derived from ESTs. Compared with the traditional method for developing microsatellites, mining microsatellites from ESTs can save considerable time and cost. In this study, 80% of the primer pairs successfully amplified products, a rate much higher than that reported by Röder et al. (1998) from wheat genomic DNA (30%) and that by Stack et al. (2000) from wheat ESTs (50%). Most of the primer pairs produced clear and strong amplification products. Under similar PCR conditions (same thermocycler, Taq polymerase and buffer), there were no obvious differences in amplification between SSR markers designed from genomic sequences and those designed from ESTs. However, it appears that EST-SSRs show fewer alleles than SSRs designed from genomic DNA. Functional constraints on ORFs may account for the lower percentage of polymorphism (19.2%) between the parents of WOpop, as compared with the polymorphism rate of 33% reported by Gupta et al. (2002) using SSRs designed from genomic DNA. On the other hand, we were able to map 65 markers (73.9% of all mapped primer sets) with putative functions to 20 chromosomes. Such results will be valuable for targeted trait selection in crop breeding. For instance, EST-SSRs associated with gliadin or glutenin will be helpful for evaluating bread-making quality, whereas markers related to stress-responsive genes may facilitate selection for tolerance to biotic and abiotic stresses.

This study presents a starting point for the construction of a candidate gene map of the bread wheat genome using EST-SSRs. Since 478 functional EST-SSR markers were developed from 71,000 wheat ESTs in the present study, we estimate that approximately 2,600 EST-SSR markers could be derived from the 400,000 wheat ESTs publicly available. Combined with the cSNP markers generated from ESTs, construction of a high-resolution and marker-dense transcriptional map of bread wheat is feasible in the near future.
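The projection above is a simple proportional scaling. The Python sketch below reproduces it under two assumptions that are not tested here: that the yield of functional markers per EST stays constant, and that redundancy among the public ESTs can be ignored.

```python
functional_markers = 478  # primer pairs that amplified in this study
ests_screened = 71_495    # wheat ESTs mined (Materials and methods)
public_ests = 400_000     # rounded figure used in the text

# Functional markers obtained per EST screened, scaled to the public collection.
markers_per_est = functional_markers / ests_screened
projected_markers = markers_per_est * public_ests

print(round(projected_markers))
```

Plain scaling gives a figure of roughly 2,700, the same order as the rounded estimate quoted in the text.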

The EST-SSR markers presented in this study have many advantages over those from anonymous genomic DNA. Most of the EST-SSR loci (74) represented genes based on a stringent sequence similarity threshold. For example, at an E value of 10⁻³⁵, a Dof protein was identified and mapped to chromosome 1BS. The Dof proteins are a large family of transcription factors, recently discovered and present only in plants (Papi et al. 2002). A number of Dof proteins are being characterized in maize (Vicente-Carbajosa et al. 1997; Yanagisawa and Sheen 1998; Yanagisawa 2000), barley (Mena et al. 1998), pumpkin (Kisu et al. 1998; Shimofurutani et al. 1998), tobacco (Baumann et al. 1999) and Arabidopsis (Gualberti et al. 2002), and these genes appear to have distinct functions in different plant taxa. So far, only the Dof gene DAG1 has been convincingly demonstrated to affect seed germination in Arabidopsis (Papi et al. 2000). There has been no description of this gene family in wheat until now. With the EST-SSR approach, we were able to map a Dof homolog on chromosome 1B, enabling further research to determine the functions of this gene in wheat. Homeodomain-leucine zipper (HD-zip) proteins, another large family of transcription factors, are apparently unique to plants (Johannesson 2000). The HD-zip protein Oshox1 showed repressor function in rice but conferred transcriptional activation in yeast (Meijer et al. 2000). We have mapped one member of this protein family to chromosome 2BL based on sequence similarity to a rice HD-zip protein (E value 10⁻²⁸). Based on high sequence similarity to its rice counterpart, an actin-depolymerizing factor was located on bread wheat chromosome 3BS. Studies have shown that this actin-depolymerizing factor is involved in pollen actin reorganization (Lopez et al. 1996) and affects pollen tube elongation (Chen et al. 2002). These examples illustrate the power of using homology searches against rice to place genes with putative function onto a wheat map, which will facilitate functional assignment of wheat ESTs.

Consensus gene maps are important tools for comparative genetics. With the availability of the whole genome sequences of rice and Arabidopsis, as well as the abundant EST data from many plant taxa, it is possible to reveal more about the relationships between monocots and dicots by sequence alignment, since gene content and gene order among different plant species are highly conserved (Bennetzen and Freeling 1993; Gale and Devos 1998). This implies that alignment of common markers on maps of one member of the grass family allows tentative assignment of functionality to the other genes (Gale and Devos 1998). Alignment of sequence data across organisms will become an increasingly important aspect of future gene discovery and development strategies. Colinearity combined with homology information offers an attractive approach to comparative genomics.