Introduction

Microsatellites, which are genetic markers based on the PCR amplification of simple sequence repeats, have been found to vary in the polymorphism they detect depending on their length, genomic location and the sequence of the repeat motif they contain (Hearne et al. 1992). Microsatellite markers for plant species generally have been made to target di-, tri- or tetra-nucleotide repeats with either equal GC/AT content, more AT content than GC content or more GC content than AT content, though less frequently in this last case (Powell et al. 1996). The source sequences for microsatellites have included simple sequence repeat-enriched library clones, expressed sequence tags and bacterial artificial chromosome end sequences (Edwards et al. 1996; Rota et al. 2005; Mun et al. 2006). The most valuable microsatellite markers in many settings are those that detect a high amount of variability that are easy to amplify and are single copy rather than being multiple banding (Morgante and Olivieri 1993; Powell et al. 1996). Therefore it is important to evaluate the different sources and motifs of microsatellites for their productivity in each crop in terms of producing polymorphic, amplifiable markers.

Legume crops are of primary importance to world agriculture due to their role in biological nitrogen fixation, wide environmental adaptability and utility in rotations with cereals (Graham and Vance 2003). Among the grain legumes, soybean, peanuts and common beans are the most important commercial crops, with soybean and peanuts as important for protein and oil production and common bean the most important for direct human consumption (Broughton et al. 2003).

Microsatellites have been developed for various legume crops but from different sources. In soybean, many microsatellites are available and most have been made through the development of ATA-enriched microsatellite libraries (Cregan et al. 1999; Song et al. 2004) with a large number also developed through end sequencing of bacterial artificial chromosome clones (Shultz et al. 2007). Similarly, in chickpea, TAA-based microsatellites have proven to be highly polymorphic (Huttel et al. 1999; Udupa et al. 1999; Udupa and Baum 2001; Udupa et al. 2004).

In common bean, on the other hand, microsatellites have mostly been developed through SSR enriched libraries for GA or CA repeat containing sequences (Gaitán-Solís et al. 2002; Métais et al. 2002; Yaish and Pérez de la Vega 2003; Buso et al. 2006; Campos et al. 2007; Benchimol et al. 2007) and through screening of EST collections or gene sequences (Yu et al. 1999, 2000; Blair et al. 2003; Hanai et al. 2007). The latter SSR type have generally been of lower polymorphism than genomic microsatellites developed for the crop and therefore are most useful in inter-genepool crosses but less so in intra-genepool crosses (Blair et al. 2006a). In peanuts, low polymorphism of microsatellite markers has been a serious problem especially within the cultivated accessions (Hopkins et al. 1999), but has been overcome by screening of restriction digested genomic clones for various motifs including AT-rich sequences (Ferguson et al. 2004). A very complete study of microsatellite frequency in the model legume Medicago truncatula has shown that AT and ATA motifs along with GA and CA motifs are frequent in genomic sequences from gene-rich BACs and BAC ends in this species (Mun et al. 2006). In summary, the screening of AT-rich motifs such as ATA and AT have proven to be useful for developing large sets of microsatellites in individual legume species but has not been widely exploited for common bean, although results from Métais et al. (2002) show the promise of the ATA motif in detecting high polymorphism within snap beans.

In plant breeding applications, microsatellites are commonly used for genetic mapping of commercial crops and for diversity assessments of germplasm accessions (Morgante and Olivieri 1993). Highly polymorphic microsatellites tend to be more useful in these studies because they can be used in any genetic cross combination or in closely related samples of genotypes from a local collection and in this sense, microsatellites can be complementary to SNP markers which tend to be bi-allelic and genepool specific (Rafalski 2002). In common beans, we have used GA-motif microsatellites along with EST-based microsatellites to construct a genome-wide, simple-sequence framework map based on inter genepool crosses (Blair et al. 2003) and the same markers have been useful for evaluation of a broad range of diversity (Blair et al. 2006a) and for evaluating race structure (Díaz and Blair 2006; Blair et al. 2007). However, additional microsatellite markers are needed for saturation mapping, for constructing genetic maps in very narrow genetic crosses such as within a given race of common beans and for evaluating within accession variability. For these activities, highly polymorphic markers such as the AT-rich microsatellites available for some other legumes would be desirable for common beans.

The objective of this research, therefore, was to develop a set of microsatellite markers for common bean based on AT-rich motifs and ATA trinucleodite repeats and to evaluate their utility for genetic analysis of the crop. To do this, we build on the work of Métais et al. (2002) who developed an initial set of 15 ATA-based microsatellites. We specifically concentrated on characterizing microsatellites selected from ATA versus CAC/CA/GA enriched libraries and evaluated the genetic diversity detected by the different types of markers and their distribution on a common bean genetic map.

Materials and methods

Plant materials and DNA extraction

One population of recombinant inbred lines and a set of 18 diverse genotypes were used for this study: The population was based on the cross DOR364 × G19833 as described in Blair et al. (2003). The genotypes for the polymorphism survey consisted in 18 genotypes evaluated as survey I by Blair et al. (2006a), which included 7 genotypes from the Andean gene pool and 11 from the Mesoamerican gene pool, with a total of 15 cultivated and 3 wild accessions, these last genotypes representing Argentinean, Colombian and Mexican wild diversity. Among the cultivated genotypes, three were advanced breeding lines from CIAT (BAT477, BAT881 and DOR364) with the remainder germplasm accessions (G11360, G11350, G21657, G21078, G21242, G14519, G19833, G3513, G21212, G24404, G24390, G19892) or locally bred varieties (eg., Carioca and ICA Radical Cerinza, varieties from Brazil and Colombia, respectively) which have been used to develop mapping populations for common bean including recombinant inbred lines BAT881 × G21212 in Frei et al. (2005) and an advanced backcross population from Radical Cerinza × G24404 in Blair et al. (2006b). Germplasm accessions meanwhile, represented the Nueva Granada and Peru races within the Andean gene pool and the Durango, Jalisco and Mesoamerica races within the Mesoamerican gene pool. The diversity panel included the parents of the DOR364 × G19833 mapping population, which was useful to determine parental polymorphism before genetic mapping. Total genomic DNA for these genotypes was isolated using a CTAB extraction method as described in Afanador and Hadley (1993) from bulked-leaf tissues of eight greenhouse-grown plants per line, so as to capture within accession diversity as part of the parental screening. Any markers detecting heterozygosity/heterogeneity in the parents of the DOR364 × G19833 population were not used for genetic mapping.

Microsatellite markers

The construction of the SSR-enriched libraries and microsatellite development was as described in Métais et al. (2002) with two types of markers developed: (1) those selected based on the motif ATA (AT-rich) and (2) those selected based on the motifs CAC, CA and GA (non-AT-rich). Microsatellites were named according to the convention shown in Table 1 and were amplified for the polymorphism survey using standard PCR conditions as described in Blair et al. (2003) with a hot start of 92°C for 3 min; then 30 cycles of 92°C denaturing for 30 s; 60°C annealing for 30 s and 72°C extension for 45 s; followed by a 5 min final extension at 72°C. The PCR reactions were carried out in a 12 μL final volume containing 25 ng of genomic DNA, 0.15 μM of each of the forward and reverse primers, 10 mM Tris–HCl (pH 7.2), 50 mM KCl, 1.5 MgCl2, 200 mM of total dNTP and 1 unit of Taq polymerase (Promega, Madison WI). Amplifying primers were mapped using a touchdown profile with 1°C drop per cycle in annealing temperatures for 9 cycles followed by 35 cycles at 52°C. After amplification, the PCR reactions were concentrated in a vacuum centrifuge for 30 min upon which a volume of 4 μL of formamide, containing 0.4% bromophenol blue and 0.25% w/v xylene cyanol FF, was added to each PCR reaction and the mixture was denatured at 92°C for 2 min. The mixtures were then loaded onto 4% denaturing polyacrylamide (29:1 acrylamide: bis-acrylamide) gels that contained 5 M urea and 0.5× TBE and run in Sequi-Gen GT electrophoresis units (Biorad, Hercules, CA, USA) at approximately 120 W and 45°C. Detection of PCR amplification products was via silver staining according to Blair et al. (2003) and the allele sizes were estimated based on 10 and 25 bp MW ladders.

Table 1 Design and diversity assessment of 85 common bean microsatellite loci

Data analysis

Genetic diversity detected by the individual markers was evaluated by estimating the polymorphism information content (PIC) from the allele assignments. The PIC value refers to the relative value of each marker with respect to the amount of polymorphism it exhibits. PIC values as well as the distribution and range of allele sizes were calculated using the software program Powermarker (Liu and Muse 2005). For the genetic mapping exercise, each RIL was scored for the parental allele it contained for each microsatellite. Segregation data was used to place the microsatellites on the established genetic map for DOR364 × G19833 described in Blair et al. (2003). Linkage analysis was conducted with the Kosambi mapping function using the software application Mapmaker 2.0 for Windows (Lander et al. 1987) and linkage group orientation was drawn as for the genetic map of Freyre et al. (1998). The microsatellites were placed to the established linkage groups with the ‘try’ and ‘compare’ commands with a minimum LOD of 3.0.

Results

Marker characterization

A total of 85 common bean microsatellites were developed in this study (Table 1). Of these, 74 were based on AT-rich sequences and 11 were based on non-AT-rich sequences. The AT-rich microsatellites included 71 that were newly developed from an ATA-enriched library developed for this study and three AT-rich microsatellites from gene-based sequences (HRG, ME1 and BN). The 11 non-AT-rich microsatellites included two microsatellites from CAC selection, two from CA selection, and six from GA selection along with a GC-rich gene-based microsatellite (PG2). The four gene-based microsatellites are listed after the 81 microsatellites from the motif-enriched libraries in Table 1. The development of the non-AT-rich microsatellites is described in Métais et al. (2002) and complements that work, while the AT-rich microsatellites were developed from a further 1632 ATA-enriched clones. These were screened by hybridization using a 32P labeled (ATA)8 oligonucleotide probe resulting in 698 clones with a clear hybridization signal for which clear sequences were obtained from 288 clones.

After analysis, 271 sequences contained ATA repeats while 17 clones did not present microsatellites. Sequence alignment was used to determine possible contigs that grouped the 271 positive clones into 84 clusters, which were used to define the 71 markers described above. From these results, we deduce a redundancy of (187/271) 69% for the ATA library.

The microsatellites from the ATA-enriched library had 8–72 repeat motif units; however the mean number of repeats across the ATA-based markers (15.5) was closer to the smaller of these values. The microsatellite with the highest number of repeats was ATA20 (72) while ATA13, ATA5 and ATA30 also had large numbers of repeats (68, 42 and 37, respectively), however, the last two with some imperfect repeats. ATA172 and ATA69, each with 24 repeats, were the highest repeat number microsatellites among the remaining markers while all the other microsatellites in this category had fewer than 21 repeats. The microsatellites developed for this motif may be biased against long-repeat lengths, since ATA repeats generally are difficult to sequence as was evidenced in the sequencing success rate of 41.2% for the original ATA containing clones.

A total of 68 of the ATA-targeted microsatellites had a single type of repeat while four were complex microsatellites having two or more repeat motifs. In these cases, ATA motifs were paired with GAA and ATG tri-nucleotide motifs (for ATA2 and ATA268, respectively) or with TA di-nucleotide repeats (for ATA76 and ATA268). While searching for ATA motif microsatellites, we also identified two markers that contained AT-rich sequences but not the precise ATA trinucleotide motif that had been enriched for; these were the TAAAA-based marker ATA271 and the AT-based marker ATA39. Another marker, ATA5, targeted ATN repeats, that is to say it was AT-rich having imperfect repeats at the third position with N equal to A, G or T nucleotides. In addition, 18 other microsatellites were imperfect and had interrupted repeats in parts of their ATA-based sequences, of which ATA220 contained a motif where the repeats began with (TTA)6, continued with (TAT)4 and ended up with (ATT)5; while ATA289 began with (TAT)2 and ended with (AAT)12.

Of the non-AT-rich microsatellites, the two CA-selected markers had 12 and 13 repeats, respectively; the two CAC-selected microsatellites had long GA motifs (Table 1); and the GA-selected microsatellites had either GT or GA motifs with an average repeat number of 31.7. It is not known why selection for CAC motif produced GA microsatellites but this could have been due to hybridization stringency used in the procedure by Métais et al. (2002). Four gene-based sequences were selected to add additional AT-rich microsatellites, including two based on dinucleotide (AT)n motifs and one based on the trinucleotide (TAT)9, as well as a CG-rich marker PG2 based on the trinucleotide (CGG)8. The average number of repeats in these microsatellites was 12.8.

When all the AT-rich microsatellites (those with AT, ATA or ATG motifs) were compared to all the non-AT-rich microsatellites (those with CA, GT, GA or CGG motifs), we observed that the average number of repeats was lower for the AT-rich microsatellites (15.5 vs. 25.4 repeat units) although this was mainly due to the GA microsatellites that tended to be longer in repeat number (33.6) than the AT (15.7), ATA (15.6), or GT (12.3) motif markers.

Diversity assessment

PCR amplification and banding patterns of the AT-rich and non-AT-rich markers across the diversity panel were comparable, however, AT-rich microsatellites tended to be slightly more difficult to amplify but more diverse in terms of allele numbers. Figure 1 shows the amplification pattern of three of the ATA based microsatellites in the parental survey of 18 genotypes. A total of seven microsatellites (ATA13, ATA39, ATA82, ATA118, ATA122. ATA154 and GA11), which included the two with the largest number of ATA repeats produced uneven PCR amplification and these were not considered in the diversity analysis.

Fig. 1
figure 1

Amplification pattern of three polymorphic AT-rich microsatellites on the genotype survey of 18 genotypes

While a majority of the microsatellites produced single bands, multi-copy banding was observed for 11 markers as listed in Table 1 and these were also not evaluated for polymorphism information content. Several markers (ATA30, ATA108, ATA150, ATA241) produced double banding patterns, which were interpretable for polymorphism information content. Stutter bands were more common for the AT-rich microsatellites, but in these cases the top band was generally stronger than the rest of the bands and was used for allele calling.

The average number of alleles found for each class of microsatellites varied with the motif. AT motif microsatellites were the most variable showing on average 12 alleles across the 18 genotypes used in the parental survey. GA motif microsatellites were also very variable showing 8 alleles per locus on average, followed by ATA motif microsatellites with 5.8 alleles per locus on average. Meanwhile, the GT motif microsatellites were highly monomorphic, averaging only 1.5 alleles per locus.

Polymorphism information content was related to average number of alleles and again the AT motif microsatellites had the highest value (0.892), compared to GA (0.706) and ATA (0.600) motif microsatellites. The CA motif microsatellites had very low PIC values of 0.140. The highest PIC values were observed for the marker ATA76 (0.858) and the gene-based microsatellites HRG (0.905) and ME1 (0.878) that had AT motifs, as well as the complex, GA and GT motif marker GA16 (0.874). The next highest PIC values were found for the ATA motif markers, ATA7, ATA9, ATA20, ATA24, ATA150, ATA170, ATA180, ATA234, ATA241 and ATA296, all with PIC values above 0.800 and 7–11 alleles per locus. One other GA motif marker, CAC1 showed a large number of alleles (12) and a high PIC value (0.827). The remaining markers showed varying degrees of polymorphism except for four of the ATA motif markers (ATA73, ATA121, ATA157 and ATA236) and two of the GT motif markers (CA3 and GA4) that were monomorphic. Across all the microsatellites tested, there was an average of 5.89 alleles detected and an average polymorphism information content of 0.590, which was higher than the values reported in Blair et al. (2006a) for a mix of GA motif and cDNA-based microsatellites (PIC value = 0.534); in Benchimol et al. (2007) for mostly GA- and CA-based microsatellites (PIC value = 0.45); and in Métais et al. (2002) who analyzed more closely related germplasm and had PIC values averaging 0.440 and allele number averaging 5.3.

Correlations between the number of alleles and the PIC values were high across the range of all microsatellites (r = 0.878, P = .0000) and for both AT-rich (r = 0.877, P = .0000) and non-AT-rich (r = 0.897, P = .0000) microsatellites individually. Meanwhile the number of alleles and PIC values were not significantly correlated with the number of repeat motifs within the microsatellites (r = −0.273 and r = −0.113, respectively). However, the allele size range between largest and smallest allele was correlated with the number of alleles present for a locus (r = 0.623, P = 0.0000) and the PIC value (r = 0.434, P = 0.001). The correlations of allele size range with number of alleles and PIC values were also significant when considering AT-rich (r = 0.599 and r = 0.386, respectively) and non-AT-rich (r = 0.903 and r = 0.826) microsatellites, separately.

Figure 2 shows that the distribution of allele size ranges for the markers developed from the AT-rich versus the non AT-rich microsatellites. The average allele size that ranges across all the markers was 28.2 nt. Interestingly, a binomial distribution can be observed for the AT-rich microsatellites, whereas the non-AT-rich microsatellites show a skewed distribution towards low allele size ranges. The size differences between the largest and the smallest alleles were 24.4% wider on average for the AT-rich microsatellites (28.0 bp) compared to non-AT-rich microsatellites (22.5 bp). For most of the markers (71.2%), the expected allele size based on the Fin de Bagnols source (Métais et al. 2002) was within the range between the largest and the smallest allele sizes for each marker.

Fig. 2
figure 2

Distribution of allele size ranges for AT-rich (hatched bars) and non-AT-rich (solid bars) microsatellites on the panel of 18 common bean genotypes

Genetic mapping of microsatellites

Overall polymorphism rate was 74.6% across the parents of the DOR364 × G19833 population with a total of 50 microsatellites amplified for genetic mapping. The polymorphism rate was higher in the AT-rich microsatellites (77.5%) compared to the non-AT-rich microsatellites (55.5%). Both were within the range of polymorphism for the same population as analyzed for other genomic microsatellites analyzed by Blair et al. (2006a). From this effort, a total of 45 loci could be placed on the DOR364 × G19833 genetic map (Fig. 3). All microsatellites were placed at a LOD > 3.0 and integrated with the markers mapped by Blair et al. (2003). New microsatellite loci were found on each of the 11 linkage groups of the species, however the AT-rich microsatellite loci tended to be located at relatively few genomic locations.

Fig. 3
figure 3

Genetic map for the DOR364 × G19833 (chromosome designations b01b11) population showing positions of 45 microsatellite loci. Other genetic marker positions as reported in Blair et al. (2003)

While each linkage group was found to have at least one new microsatellite locus; linkage groups, b02, b09 and b11 had larger numbers of microsatellite loci placed on them, (8, 6 and 7 markers, respectively) when compared to the other linkage groups where 4 or fewer loci each mapped. Clustering of the ATA microsatellite loci occurred on four linkage groups, especially on linkage group b02 where markers were mapped to two small regions in the middle of the linkage group (indicated as cluster A), as well as in regions on linkage groups b08 and b10. The marker ATA154 clustered with a group of GA motif markers from Blair et al. (2003) on linkage group b03 (indicated as cluster B); while the markers ATA150a and ATA173 clustered with another group of GA-based BM markers on linkage group b07 (indicated as cluster C).

In contrast, the microsatellite loci were better distributed on linkage groups b01, b04 and b09; while linkage group b11 had a cluster of three ATA microsatellites in the middle of the linkage group and more evenly distributed markers at the extremes. While the majority of the mapped microsatellites were mapped as single loci, duplicate loci were mapped for ATA108 on linkage groups b01 and b11; for ATA150 on linkage groups b07 and b10; and for ATA241 on linkage groups b10 and b11. Mapping of the other double banding marker, ATA30 was not possible due to monomorphism in the population.

Discussion

In this study, we first determined the utility of AT-rich microsatellites in terms of detecting allele variability and polymorphism information content. In general terms, we found the AT-rich microsatellites to be very useful markers, similar to results from several other legumes where these motifs have been targeted (Cregan et al. 1999; Huttel et al. 1999; Udupa et al. 1999; Udupa and Baum 2001; Udupa et al. 2004; Ferguson et al. 2004; Song et al. 2004). We also found that AT motif microsatellites tend to be more polymorphic than GA-based microsatellites, that ATA microsatellites are approximately equal or higher in polymorphism with GA motif markers and that CA microsatellites are of low polymorphism in common beans based on results from this study and comparisons to previous work with predominantly GA-, CA- and cDNA-based markers from Blair et al. (2006a, b) and Benchimol et al. (2007). These results are similar to those for soybean and chickpea, where AT and ATA based microsatellites were highly productive for marker development (Huttel et al. 1999; Song et al. 2004), and with the results from peanuts, where AT markers had more alleles than ATA-, GA- or GT-based markers (Ferguson et al. 2004). AT and ATA motif SSRs are also abundant and highly polymorphic in rice (Temnykh et al. 2001). While AT and ATA markers are arguably the most valuable markers within a species, it is notable that AT-rich microsatellites are not likely to be conserved as SSR loci between different genera of legumes such as chickpea, lentil and pea (Choumane et al. 2004).

Earlier efforts for microsatellite development in plants tended to target GA and CA motif SSRs because of their utility in human populations (Hearne et al. 1992) and because of their ease of detection in hybridization or enrichment-based screening protocols. Concurrently, ATA motif microsatellites tended to be more difficult to develop due to the low-melting temperature of ATA primers used to develop enriched libraries (Métais et al. 2002) and ATA probes used to hybridize with non-enriched libraries (Cregan et al. 1999). AT motif microsatellites likewise have been difficult to develop since AT sequences are palindromic and DNA with long stretches of dinucleotide AT repeats are difficult to clone and to sequence (Powell et al. 1996). Despite these difficulties, both ATA and AT motif microsatellites have been developed for peanuts, chickpea and soybean through hybridization based selection using non-enriched libraries (Cregan et al. 1999; Huttel et al. 1999; Ferguson et al. 2004). Our study also shows that it is possible to create libraries that are enriched for ATA based microsatellites and that this works well for marker development in common beans, although a certain amount of ATA markers are difficult to amplify. These results build on those of Métais et al. (2002) who developed the first ATA microsatellites, Gaitán-Solís et al. (2002) who attempted ATA enrichment but without success and Caixeta et al. (2005) who discovered a limited number of ATA motif microsatellites based on subcloning of common bean BACs. As was found for other crops, sequencing of positive clones for ATA-motifs can also yield compound microsatellites including some with AT-motifs. Interestingly, AT motifs were common among microsatellites uncovered in the sequencing of a common bean RFLP library (Murray et al. 2002; Blair et al. 2003) and were the most prevalent motif in a previous search of common bean Genbank sequences by Yu et al. (1999) surpassing GA, ATA or other trinucleotide repeats although they were rather rare in the EST screening performed by Hanai et al. (2007) where AT and ATA motifs were 15.3 and 2.5% of the totals identified among 240 SSRs found for over 3,000 unigenes.

Another finding of our study was that while AT-rich microsatellites are comparable or better than GA motif microsatellites developed from enriched libraries there was variability in terms of the diversity uncovered by each marker. As in the study by Ferguson et al. (2004), the diversity at AT-rich microsatellite loci does not appear to be related to size of the repeat array found in the SSR loci in common bean. This could be due to the frequency of imperfect SSR arrays we found for ATA motif microsatellites. Interruptions in repeat arrays have been suggested to serve as anchors that prevent strand slippage that is thought to be the generator of diversity at microsatellite loci (Chambers and MacAlvoy 2000).

In general, the average number of repeats found in this study was similar to the number of repeats in previously developed microsatellites from enriched libraries, averaging 15.8 across all the newly developed markers compared to 15.4 and 16.0 for microsatellites from the GA enriched libraries made by Gaitán-Solís et al. (2002) and Buso et al. (2006), respectively. However, some very long ATA, AT and GA microsatellites (over 30 repeats) were identified in this study and some of these are highly polymorphic. Hanai et al. (2007) meanwhile found many compound microsatellites in their GA and CA motif library enrichment with repeat numbers ranging from 2 to 20 and averaging 6.8 for any single SSR stretch.

In addition to the imperfect repeat loci identified, we also found a few compound repeat loci with some of these were fairly diverse (ATA76) in terms of allele number while others were not (ATA2 and ATA268). The most polymorphic of these loci was unique in having two adjacent repeat arrays, namely (ATA)12 and (TA)8 separated by 29 base pairs. A similar ATA and AT compound locus was carefully analyzed by Udupa and Baum (2001) in 114 chickpea accessions with the finding that it was also highly variable and that the variability was due to expansion and contraction of the number of repeats. They also found, in contrast to our study, that the threshold number of repeat arrays for the two individual motifs within the locus was reciprocally correlated so that the overall size of the locus did not expand beyond a given limit. This difference may have been due to the chickpea locus having a longer repeat array than our common bean locus. For ATA motif microsatellites in common bean, variability at the long repeat marker ATA20 was studied by Métais et al. (2002) and found to be due to increases and decreases in repeat number rather than to the changes in the regions flanking the repeats.

A valuable characteristic of the AT-rich microsatellites, especially not only the AT-based markers but also the ATA markers, was that their high level of polymorphism allowed a large proportion of the markers to be genetically mapped. A total of 45 new loci could be mapped on the DOR364 × G19833 integrated map. The polymorphism of AT-rich markers in this population was over 10% higher than that for GA-based microsatellites as reported in Blair et al. (2006a, b). With the genetic mapping of microsatellites previously conducted in the species by Yu et al. (2000), Blair et al. (2003) and Grisi et al. (2007), this brings to over 160, the number of mapped loci for common bean. An interesting observation was that an intermediate number of new loci were mapped to the middle and ends of linkage group b01, and to linkage group b09, two regions that suffer from low diversity in other crosses and was difficult to map in Blair et al. (2003) or Grisi et al. (2007).

As in the earlier study, we found that a certain number of microsatellites detected double loci although the majority was single-copy markers. Duplicate loci could have resulted from segmental duplication events or similarity in primer sequences across more than one locus. Some AT-rich microsatellites have been found to be associated with repetitive elements that could be dispersed throughout the genome (Ramsay et al. 1999; Temnykh et al. 2001), which might explain why we identified duplicate loci for some of the new microsatellites and also why we obtained a group of microsatellites that produced multiple bands and that therefore were not useful as single copy markers. Other discoveries from the genetic mapping of ATA microsatellites was that their loci cluster rather than being randomly distributed and that a greater number of loci were found on linkage groups b02, b09 and b11 compared to other linkage groups. This same phenomenon of clustering was observed for GA-based microsatellites from enriched libraries that were mapped by Blair et al. (2003), however these were found on different linkage groups, such as b03, b04, although clustering on linkage group b02 was in common with this study. Indeed our mapping shows that some ATA motif markers fall into clusters of GA based BM markers especially on linkage groups b03 and b07.

Furthermore, the ATA-based microsatellites tended to be at interstitial locations rather than at distal locations on the linkage groups. This may reflect a bias of ATA motif sequences against telomeres or towards proximal locations. Clustering of genomic microsatellites has been associated with retrotransposons and may reflect their distribution (Ramsay et al. 1999) and appears to be more common with microsatellites from enriched libraries (Ramsay et al. 2000; Tang et al. 2002). In a few cases, microsatellites from enriched genomic libraries or those associated with retrotransposons have been found to be clustered in or around centromeric regions (Areshchenkova and Ganal 1999; Jones et al. 2002; Ramsay et al. 2000). The centromeres in common bean have not been accurately mapped and therefore it is impossible to tell if the ATA microsatellite clusters are actually located near them although progress in the physical mapping of common bean may allow this in the future (Pedrosa et al. 2003). Clustering is usually not evident with microsatellites derived from gene sequences (Scott et al. 2000; Cordeiro et al. 2001) or from BAC ends (Mun et al. 2006).

In conclusion, the targeting of ATA motifs for enrichment was a valuable approach to marker development in common bean. Along with the AT motif microsatellites and additional GA markers, this study provides a set of well characterized genetic markers that can be used for diversity assessment or genetic mapping, especially in populations derived from narrow crosses. The increase in microsatellites based on the ATA tri-nucleotide motif is useful for diversity assessment, since these markers are often easier for allele calling than GA microsatellites due to the larger size difference in repeat units of the trinucleotide motif markers which allow them to be used in automated multiplex fingerprinting. Future work will concentrate on mapping the ATA motif microsatellites in additional populations and obtaining ATA or AT motif microsatellites from BAC end sequences, given that other sources for this motif such as ESTs are not very productive (this laboratory, unpublished results). It is important likewise, to know how often AT and ATA or other motifs occur in non-enriched genomic libraries. Overall our aim has been to produce informative microsatellite markers for whole genome analysis, gene tagging, marker-assisted selection and diversity evaluation.