Common bean (Phaseolus vulgaris L.) is one of the most important economic crops cultivated in Asia, Africa, Western Europe and Latin America. World production stands at about 20 million tons (El-Aal et al. 2011). The species is an annual diploid (2n = 2x = 22) with a genome size of 650 Mb (Arumugamathan and Earle 1991). Common bean was domesticated in two centers of origin (Andes and Mesoamerica), which generated two major gene pools (Andean and Mesoamerican) (Singh et al. 1991a). The existence of the two gene pools has been confirmed by phaseolin seed proteins (Gepts et al. 1986), allozymes (Santalla et al. 2002; Singh et al. 1991a), morphological traits (Singh et al. 1991b) and DNA markers (Beebe et al. 2000; Blair et al. 2006; Zhang et al. 2008). China is considered to be a secondary center of diversity, and the beans there have been classified into the two gene pools based on phenotypic traits, phaseolin and simple sequence repeat (SSR) markers (Zhang et al. 2008). Other secondary centers of diversity are found in Africa (Asfaw et al. 2009), Europe (Santalla et al. 2002) and parts of lowland Latin America (Durán et al. 2005).

Molecular markers have proven to be powerful tools for studying genetic variation, dissecting quantitative trait loci, genetic mapping and molecular breeding in bean genomics research. Of the molecular markers, SSR (microsatellite) markers are more widely distributed than other markers in the common bean genome (Blair et al. 2006, 2008, 2009a, b; Córdoba et al. 2010). Microsatellites show a high degree of polymorphism and are excellent for use in genetic mapping or gene tagging studies (Matus and Hayes 2002; Mitchell et al. 1997). A large number of SSR markers have been developed for common bean (Blair et al. 2003, 2008, 2011, 2012; Blair and Hurtado 2013; Córdoba et al. 2010; Gaitán-Solís et al. 2002; García et al. 2011; Hanai et al. 2007, 2010; Yaish and Pérez de la Vega 2003; Yu et al. 2000), but many are based on gene sequences, and the markers are still not evenly distributed across the common bean genome. Therefore, it is necessary to develop new genomic SSR markers to fill some gaps in the genetic map of the species.

A range of techniques can be used to obtain partial genomic sequences, and the development of the 454 sequencing platform provides individual laboratories with access to genomic sequencing. The 454 sequencer can efficiently obtain genomic or expressed sequence reads (Kalavacharla et al. 2011). Compared to other sequencing technologies, the 454 pyrosequencing technique has the benefit of providing large amounts of sequence data at low cost, accurate results, high sensitivity, automated reads and no need for fluorescence-labeled primers. Microsatellites are well suited to detecting polymorphisms in panels of diverse germplasm. The level of polymorphism sometimes depends on the repeat motif length and sequence and on the position of the repeat in gene-coding or non-coding fragments (Eujay et al. 2002; Temnykh et al. 2000, 2001; Thoquet et al. 2002). Polymorphism information content (PIC) analysis can be used to assess markers for appropriate selection for genetic mapping and phylogenetic analysis (Anderson et al. 1993). In common bean, microsatellite markers are useful for constructing genetic maps (Yu et al. 2000; Blair et al. 2003, 2008; García et al. 2011), and can be regarded as more polymorphic than earlier markers used to identify genetic diversity (Blair et al. 2006), such as RAPD (Freyre et al. 1996) and RFLP (Becerra-Velazquez and Gepts 1994).

The objective of our research, therefore, was to generate abundant bean DNA sequences through genome sequencing of a standard Chinese landrace, Hong Yundou, and to develop polymorphic SSR markers from these sequences. Secondary objectives were to genetically map the new markers and determine their effectiveness in detecting diversity.

The principal genotype used for sequencing was Hong Yundou, a well-known common bean landrace originally from Hebei, China, characterized as a dry bean by high yield, high quality and high resistance to anthracnose. In addition, 131 genotypes were used for diversity assessment (Table S1), most of which were landraces from China (127 genotypes), while four were from the International Center for Tropical Agriculture. Genomic DNA was isolated using a Plant Genomic DNA Extraction Kit (Tiangen Biotech) and diluted to the appropriate concentration for sequencing or marker amplification.

The DNA samples were sequenced on a Roche 454 (GS FLX Titanium System) high-throughput pyrosequencer according to the manufacturer’s instructions. Perfect di- and tri-nucleotide SSR motifs were searched for in the new DNA sequences using the SSR identification software SSRIT (Simple Sequence Repeat Identification Tool) on the GRAMENE website (http://www.gramene.org/db/markers/ssr-tool). Primer pairs were designed using the software Batch Primer (You et al. 2008). The parameters for primer design were as follows: expected PCR product length between 100 and 300 bp, primer length between 18 and 22 nt, annealing temperature from 50 to 60 °C, and GC content from 40 to 60 %. The markers were named with the prefix BMg, standing for bean microsatellites from genomic sequence.
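The search for perfect repeats and the primer constraints described above can be illustrated with a minimal Python sketch. It is not SSRIT or Batch Primer: the function names, the regular-expression approach and the minimum repeat counts (six for di-nucleotide and five for tri-nucleotide motifs) are assumptions made purely for illustration. Only the primer length and GC content limits are taken from the text.

```python
import re

# Minimum number of repeat units per motif length. These thresholds are
# illustrative assumptions, not the settings used in the study.
MIN_REPEATS = {2: 6, 3: 5}


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Capture one motif of the given length, then require it to recur
        # back-to-back at least (min_rep - 1) more times.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


def primer_passes(primer, min_len=18, max_len=22, min_gc=40.0, max_gc=60.0):
    """Check a primer against the length and GC content limits given in the text."""
    primer = primer.upper()
    if not min_len <= len(primer) <= max_len:
        return False
    gc_percent = 100.0 * sum(primer.count(base) for base in "GC") / len(primer)
    return min_gc <= gc_percent <= max_gc
```

Applied to a read containing (AG)10 and (AAG)6, `find_perfect_ssrs` reports one hit per locus with its 0-based start position. Annealing temperature and product size are not checked here because they depend on the melting-temperature model and the primer pair positions, which the text does not specify.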

All microsatellite markers were amplified in a 15-μl PCR mix containing 20 ng DNA, 0.2 μM of each forward and reverse primer (Invitrogen, USA), 0.25 mM dNTP, 1.5 μl 10 × PCR buffer with 1.5 mM MgCl2, and one unit of Taq DNA polymerase; reactions were run on a T100 thermal cycler (Bio-Rad Research, USA). PCR conditions were as follows: 94 °C for 5 min, then 35 cycles of 94 °C for 45 s, 50–60 °C for 45 s, and 72 °C for 45 s, followed by a final extension at 72 °C for 7 min. PCR products were resolved on 6 % polyacrylamide gels in 0.5 × TBE buffer. After electrophoresis, the gels were silver stained and the bands were visualized under white light.

The PIC of the SSR markers, the number of alleles per locus (Na), and the observed and expected heterozygosity (Ho and He) were analyzed using POPGENE software (Yeh et al. 1997). The software PowerMarker v3.25 was used to construct a simple-matching dendrogram from Nei’s genetic distance matrix by the unweighted pair group method with arithmetic averages (UPGMA) (Liu and Muse 2005). The tree was viewed and plotted with MEGA 4.0 (Tamura et al. 2007). MapManager software with the Kosambi mapping function (Manly et al. 2001) was used to map the new markers to linkage groups, which were named according to the chromosome conventions in Blair et al. (2003). The linkage map was established with a minimum P value of 0.05.
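As a worked illustration of the diversity statistics named above, the following Python sketch computes expected heterozygosity and PIC from the alleles scored at one locus. It assumes the commonly used definitions, He = 1 - sum(p_i^2) and PIC = 1 - sum(p_i^2) - sum over allele pairs of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i. Whether the software used in the study applies exactly these estimators (for example, with or without a small-sample correction) is not stated in the text, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return the frequency of each distinct allele in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), assuming no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed definition)."""
    homozygous_term = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous_term - cross_term


# Hypothetical locus scored as band sizes (bp) in eight inbred accessions.
example_calls = [150, 150, 150, 150, 156, 156, 162, 162]
example_freqs = allele_frequencies(example_calls)
```

Under these definitions PIC is never larger than He, which is consistent with the lower average PIC than average He reported for the new markers.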

The main advantage of the 454-FLX system is the generation of longer reads, with average lengths of 250 bases that can be coupled as mate-paired ends (Daniel et al. 2008). In this study, a total of 667,221 reads accounting for 217 Mb of raw data were generated. The length of the reads varied from 40 to 682 bp, with a median length of 420 bp (Table S2). A total of 13,306 perfect microsatellites were detected in the DNA sequences. The microsatellites included 1,530 (11.5 %) di-nucleotide motifs, 4,168 (31.3 %) tri-nucleotide motifs, 1,212 (9.1 %) tetra-nucleotide motifs, 3,169 (23.8 %) penta-nucleotide motifs and 3,227 (24.3 %) hexa-nucleotide motifs. Among the di-nucleotide motifs, the AT/TA type was more abundant (60.1 %) than the AG/TC (24.8 %) and AC/TG types (15.1 %), which was similar to the previous results of Lagercrantz et al. (1993). No CG/GC types were found in the common bean 454 sequences. Our research identified ten tri-nucleotide motif types and supports the proposal that the AAG and AAT motifs are the most abundant, accounting for 27.3 and 23 % of occurrences, respectively, while the CCG motif was the least abundant (3.0 %). Among the tetra-nucleotide and penta-nucleotide SSRs, AT-rich motifs were the most common, with 29.2 % AAAT and 14.5 % AAAAT motifs, respectively. Hexa-nucleotide motif types included AAGCCC (13.9 %), CTTGGG (11.5 %), AATGTC (7.6 %) and TTGACT (6.3 %).
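The motif-class percentages above follow directly from the reported counts, and the arithmetic can be cross-checked with a few lines of Python. The variable names are illustrative, and treating 217 Mb as 217 million bases is an assumption made here to derive a mean read length.

```python
# Counts of perfect microsatellites per motif class, as reported in the text.
motif_counts = {
    "di": 1530,
    "tri": 4168,
    "tetra": 1212,
    "penta": 3169,
    "hexa": 3227,
}

total_ssrs = sum(motif_counts.values())

# Share of each motif class, in percent, rounded to one decimal place.
motif_shares = {
    motif: round(100.0 * count / total_ssrs, 1)
    for motif, count in motif_counts.items()
}

# Mean read length implied by the raw data volume (assumes 1 Mb = 1e6 bases).
total_reads = 667_221
raw_bases = 217_000_000
mean_read_length = raw_bases / total_reads

for motif, share in motif_shares.items():
    print(f"{motif:>5}: {motif_counts[motif]:>5} loci, {share} %")
print(f"total: {total_ssrs} loci, mean read length {mean_read_length:.0f} bp")
```

The five classes sum to the 13,306 loci reported, and the implied mean read length of about 325 bp is lower than the reported median of 420 bp.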

In this study, we used 454 sequencing to obtain 217 Mb of sequence data for a Chinese landrace of common bean, which is an important source of diversity for breeding efforts in China (Zhang et al. 2008). The amount of sequence data is equivalent to about one-third of the common bean nuclear genome, given that the chromosome sizes are considered to add up to around 650 Mb in total (Arumugamathan and Earle 1991). Data-mining of 454 sequences has become common for the discovery of markers of various types. Here we concentrated on finding SSR loci and developing SSR markers from the dataset. An SSR search was justified by the novel source of DNA sequence for this effort, especially since few microsatellites have been found for beans specifically from China, a secondary center of diversity.

The types of motifs identified in the 454 sequences were slightly different from those in previous studies of common bean SSRs, with tri-nucleotide, tetra-nucleotide and penta-nucleotide loci common. In another study of SSRs from genomic DNA, Blair et al. (2009b) identified dinucleotide repeats as the most abundant class of microsatellite in the common bean library they screened. However, that study used small-insert libraries derived from frequently cutting restriction enzymes, and differences may exist among genomic regions in terms of SSR loci. Alternatively, different SSR search tools may detect different motifs at different frequencies. For the tri-nucleotide SSRs, early studies indicated AAT, AAC and TGC as the most common tri-nucleotide repeats in plants (Wang et al. 1994). This observation agrees with the results we obtained and also with Blair et al. (2008), who reported that ATA or AAT repeats were frequent in common bean genomic sequences. Meanwhile, in the Medicago truncatula genome, AT, AAT, AG and AC motifs were the most common for genomic sequences (Mun et al. 2006). In the only other study that has used 454 pyrosequencing to detect SSRs in common bean, Kalavacharla et al. (2011) found that tri-nucleotide repeats were more common than di-nucleotide repeats in expressed sequences from two genotypes, Sierra and BAT93. Repeats with motifs of 4, 5 or 6 bp were very uncommon in these expressed sequence tags (ESTs), as was also found by Blair et al. (2009a, 2011) for cDNA clones that were sequenced from the genotypes DOR364 and G19833.

To sample these microsatellite loci, we developed a total of 90 SSR primer pairs based on genomic sequences. Among these primers, tri-nucleotide repeats were the main type of loci selected, with 81 in this category and only 8 di-nucleotide repeat loci. For the 90 SSR markers, the number of alleles across 131 accessions ranged from two to seven, with an average of 3.56 alleles per locus. The SSR markers had PIC values in the range of 0.11–0.73, with an average of 0.44. Some di-nucleotide repeats such as BMg112 with (AG)10 and BMg720 with (TC)13 had high PIC values of 0.58 and 0.72, respectively, while some tri-nucleotide repeats such as BMg1581 with (AAG)6 and BMg180 with (TTA)6 had lower PIC values of 0.11 and 0.18, respectively. Furthermore, the expected heterozygosity values ranged from 0.10 to 0.76 (0.52 on average), and the observed heterozygosity values of all markers were zero (Table S3). The observed heterozygosity was lower than the expected heterozygosity; the discrepancy can be attributed to forces such as inbreeding. The average PIC values, reflecting the allelic diversity and frequencies among the sampled individuals, observed for these genomic markers were high (0.52); these data could be influenced by the number and genetic relationships of the individuals used to assess SSR genetic information. Our results confirm that genomic SSR markers are generally more polymorphic than EST-SSR markers, as found by Blair et al. (2006, 2009a, b). Higher levels of polymorphism for genomic markers have been detected previously by Blair et al. (2006), Gaitán-Solís et al. (2002) and García et al. (2011). Here, we report an average of 3.56 alleles among the 131 genotypes for the genomic SSRs that were developed and tested, which is slightly lower than previous estimates considering the number of genotypes evaluated.

To determine the utility of the new SSR markers in polymorphism detection and gene pool identification, a set of 131 common bean accessions was used to analyze genetic diversity, including a total of seven control accessions from Zhang et al. (2008). The controls comprised four accessions (Zhu Shadou, Peng Dou, Hong Yundou3 and Hong Yundou) belonging to the Andean gene pool and three accessions (Baijing Dou, Si Jidou and Jing Midou) belonging to the Mesoamerican gene pool as checks (Table S1). These accessions represented high-quality traits and possessed useful hereditary features such as resistance to biotic and abiotic stresses. Seeds of these accessions were obtained directly from the National Genebank of China. Most of the genotypes represented landraces or improved varieties from China although four were from Argentina. Four former gene-based markers (BMd20, BMd26, BMd45 and PV-GAAT001) (Zhang et al. 2008) were selected for comparison, to evaluate the effectiveness of the 90 new SSR markers in phylogenetic analysis. The phylogenetic tree showed that the 90 genomic SSR markers were sufficient to classify the 131 common bean accessions into Andean and Mesoamerican gene pools, and the classification coincided almost exactly with that from the four formerly mapped gene-based markers (Fig. 1). Hence, the new SSR markers complemented the former SSR markers, showing a good pattern of amplification and separating the Andean and Mesoamerican gene pools in common bean. The new BMg markers were highly efficient in differentiating gene pools and assigning each of the control genotypes evaluated here to the correct division within the species. The 90 new genomic SSR markers are a valuable resource for researching other germplasm collections in terms of their diversity and gene pool proportions.

Fig. 1

Phylogenetic tree of 131 common bean accessions as listed in Supplementary Table S3. a Dendrograms based on four mapped markers; b dendrograms based on the 90 genomic SSR markers. The trees were constructed by MEGA 4.0 with UPGMA (unweighted pair group method with arithmetic averages)

To investigate the location of the 90 new markers on chromosomes, we used mapping data for 172 individuals of an F2 segregating population derived from a cross between Hong Yundou and Jingdou. Linkage analysis showed that 85 primer sets were located on chromosomes (Fig. 2). The number of SSR markers per chromosome across the eleven chromosomes ranged from four (chromosome Pv4) to 14 (chromosome Pv1), with an average of eight, and the maximum distance between adjacent markers was 35.5 cM (BMg585 and BMg2098 on chromosome Pv9). Overall, the majority of new markers tended to cluster at certain points on the linkage map. For example, five markers (BMg118, BMg618, BMg679, BMg724 and BMg105) formed a cluster on chromosome Pv1 within a 6.1-cM interval.
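Map distances such as those above are obtained by converting recombination fractions with a mapping function; the Kosambi function is the one named in the methods. The Python sketch below shows the standard Kosambi conversion in both directions. It is an illustration only, not the MapManager implementation, and the function names are hypothetical.

```python
import math


def kosambi_cm(recombination_fraction):
    """Convert a recombination fraction r (0 <= r < 0.5) to Kosambi centimorgans."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (cM) = 25 * ln((1 + 2r) / (1 - 2r))
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def recombination_fraction(distance_cm):
    """Inverse Kosambi function: r = 0.5 * tanh(d / 50), with d in cM."""
    if distance_cm < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(distance_cm / 50.0)


# Recombination fractions implied by two distances quoted in the text.
largest_gap_r = recombination_fraction(35.5)  # BMg585 to BMg2098 on Pv9
cluster_span_r = recombination_fraction(6.1)  # five-marker cluster on Pv1
```

For short intervals such as the 6.1-cM cluster, the recombination fraction and the map distance are nearly proportional; for the 35.5-cM gap the correction for interference becomes noticeable, with an implied recombination fraction of roughly 0.3.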

Fig. 2

Molecular linkage map of common bean by SSR markers using the mapping population Hong Yundou × Jingdou. For mapping, markers were integrated into the known primer framework at a P value of 0.05. Markers (BMg-) in red are the 85 SSRs newly developed by genomic sequencing; those in black are the previously mapped primers. (Color figure online)

Our study aimed to evaluate the map locations of as many polymorphic BMg markers as possible. This was done to integrate the new markers with previously mapped markers from Blair et al. (2003, 2008) and to determine the level of clustering of the new genomic SSRs. We found that, as in these previous studies, the new markers were clustered together in specific parts of the genome. One explanation for the uneven coverage of the genetic map could be the intermediate sequencing coverage of the full genome, which could have biased the development of SSR markers and led to clustered groups of markers. On the other hand, clustering could be due to the preferential isolation or sequencing of unmethylated genomic DNA, which would tend to cluster the resulting genomic SSRs together in the gene-rich regions of the genome. Genomic microsatellite clustering appears to be more common in enriched libraries (Ramsay et al. 2000; Tang et al. 2002), and may also reflect the distribution of retrotransposons (Ramsay et al. 1999). Blair et al. (2008) showed that the microsatellites tended to be clustered rather than distributed randomly, reflecting a bias of sequences towards proximal locations. In summary, we conclude that the distribution of SSR markers may be associated with the distribution of sequences obtained from genomic sequencing.

In conclusion, the 454-FLX second-generation sequencing technology was a convenient and effective tool for generating new genetic markers, and specifically SSR markers. Here, we developed and characterized a set of 90 polymorphic microsatellites from the many SSR loci detected in the 454-derived sequences, and most of these markers were also mapped onto the chromosomes of common bean. These SSR markers will prove useful in further genetic mapping, genetic diversity analysis and molecular breeding.