Introduction

Common bean (Phaseolus vulgaris L.) is one of the principal grain legumes of eastern and southern Africa, occupying more than 4 million ha annually and providing food for more than 100 million people (Wortmann et al. 1998). It is the second most important source of dietary protein and the third most important source of calories for lower income African households after cassava and maize (Broughton et al. 2003). Total production in sub-Saharan Africa is around 3.5 million metric tons, with 62% of production in the East African countries of Burundi, DR Congo, Ethiopia, Kenya, Rwanda, Tanzania and Uganda, making this the most important region for the crop within the continent. Although the crop is cultivated mainly for home consumption in much of East Africa, it is also rapidly evolving into a cash crop in certain countries, with Ethiopia earning about US$ 6.25 million (equivalent to 60 million ET Birr) from bean exports in 2005 (Teshale Assefa, personal communication 2007). Regional trade is significant for some trading partners, for example from Ethiopia, Tanzania and Uganda into Kenya.

Cultivated common beans originated in Latin America from two recognized centers of domestication about 7,000–8,000 years ago (Gepts and Debouck 1991). The multiple regions of domestication endowed the crop with relatively high diversity that is broadly classified into two genepools: Mesoamerican and Andean (Gepts et al. 1986; Singh et al. 1991a, b, c). The two genepools further differentiate into different races, such as Mesoamerica, Durango, Jalisco and Guatemala in the Mesoamerican genepool and Nueva Granada, Peru and Chile in the Andean genepool (Singh et al. 1991b; Beebe et al. 2000). The genepool and race differences have been validated using various marker systems including seed size, phaseolin (seed storage protein) patterns, plant morphology, isozymes, RFLP, RAPD, AFLP and microsatellite markers (Singh et al. 1991a, b, c; Becerra and Gepts 1994; Beebe et al. 2000, 2001; Islam et al. 2004; Blair et al. 2006, 2009).

Common beans are believed to have been introduced together with maize into the east coast of Africa by Portuguese and Spanish traders in the sixteenth and seventeenth century (Greenway 1945; Gentry 1969). Since then farmers have developed farming practices adapted to local conditions by preservation and exploitation of useful alleles which have resulted in a range of morphologically diverse landraces (Wortmann et al. 1998; Sperling 2001). Moreover, with recent efforts to improve on-farm level productivity by many national bean-breeding programs in Africa, new germplasm sources have been continually introduced to African farming systems from different parts of the world since the 1980s (CIAT 2005). The existence of both genepools (Andean and Mesoamerican) in Africa has furthermore been documented (Martin and Adams 1987) and probably is a result of original introductions and subsequent imports of novel germplasm. Given the wide range of landraces on the continent, Africa can be considered to be a secondary center of diversity for common beans (Allen and Edje 1990; Wortmann et al. 1998; Sperling 2001).

Despite recognition of the genetic diversity of common beans in eastern and southern Africa, it is not clear whether the observed variation in and between landraces is the result of the original differences between the various introductions or of a continuous process of natural hybridization and selection by farmers and by the environment. Gene flow within and between genepools and races via spontaneous out-crossing in farmers’ fields or crossing programs in formal breeding could result in intermediate phenotypes that do not correspond well to any single race or genepool division (Beebe et al. 2001; Islam et al. 2004; Díaz and Blair 2006; Blair et al. 2007). Understanding the pattern of population-genetic structure and diversity of bean landraces and cultivars (hereafter accessions) and their relationships with the Andean and Mesoamerican genepools can therefore provide information on gene flow and be of great importance for future common bean breeding in the region. However, to date, diversity assessments in the region have mainly been limited to agro-morphological traits and no comprehensive marker evaluation of bean landraces has been conducted.

This study, therefore, aims to examine the genetic diversity and relationships among and within accessions from two East African countries (Ethiopia and Kenya) in relation to the Andean and Mesoamerican genepools using microsatellite marker analyses combined with morphological evaluation. Ethiopian and Kenyan accessions were selected for analysis because these countries are among the ten largest common bean producers in sub-Saharan Africa (Hillocks et al. 2006) and because bean production in both countries is very diverse in terms of agro-ecology, social settings and production systems (from monocrop to relay-cropping or intercropping). Throughout these countries, the crop is grown mainly by farmers with low external inputs and landraces remain the dominant source of seed for planting. However, popular modern varieties have also been released in recent years, suggesting that landrace diversity may be under threat of being replaced by modern varieties in the near future, which increases the urgency of this germplasm characterization.

Materials and methods

Plant materials

A total of 192 accessions collected from a range of common bean production ecologies in Ethiopia and Kenya together with four control genotypes for the Andean and Mesoamerican genepools were grown in a greenhouse at CIAT for DNA extractions and analysis. The East African accessions were selected on the basis of their origin from the CIAT genetic resource unit collection (http://www.ciat.cgiar.org/urg/beans.htm) with 99 genotypes from Ethiopia and 89 genotypes from Kenya and the majority being landraces and only a few being commercial cultivars (supplementary information). The control genotypes were selected on the basis of their use in previous studies with microsatellite markers (Blair et al. 2006; Díaz and Blair 2006; Blair et al. 2007). These were: ‘Calima’ (G4494) a variety from Colombia, and ‘Chauca Chuga’ (G19833), a landrace from Peru as the Andean control genotypes; as well as DOR364 (or ‘Dorado’), a variety from CIAT/El Salvador, and ICA Pijao, a variety from Colombia as Mesoamerican controls. The seeds of these genotypes were also provided by the CIAT germplasm bank.

Morphological measurements

Morphological variables were measured on plants raised in both the greenhouse and the field at CIAT headquarters in Palmira (1,000 m altitude, mean growing temperature 24°C, Mollisol soil), Colombia in 2007. In the greenhouse, plants were grown from August to October 2007 in plastic pots carefully packed with 5 kg of field soil from Palmira mixed with river sand on a 2:1 w/w (weight-by-weight) ratio. A total of four plants were grown for each accession across two pots. For the field evaluation, 20 seeds of each accession were planted in November 2007 in a single 2-m long row with inter- and intra-row spacing of 60 and 10 cm, respectively. The plants were provided with optimum conditions for crop growth and development both under greenhouse and field conditions. The assessed morphological variables were bracteole size (small, medium or large), bracteole shape (cordate, ovate, lanceolate or triangular), outer base of the standard (banner petal), corolla type (smooth or striped), flower color (white, light purple or dark purple), growth habit (determinate bush, indeterminate bush, indeterminate prostrate or indeterminate climbing bean) and stem pigmentation (absent, light red or dark red). All variables were evaluated on four comparably aged plants per accession according to CIAT (1987) and Singh et al. (1991b). Additionally, primary and secondary seed colors and seed size were recorded after harvest for field grown seed.

Microsatellite marker evaluation

For molecular level diversity assessment, total genomic DNA for each accession was isolated from a bulked leaf tissue sample of 1-week old, paper-germinated plants from six randomly selected seeds using a CTAB extraction method as described by Afanador et al. (1993). Since these are CIAT genebank accessions that generally were purified from original landraces with each seed type within a landrace receiving a separate entry, we assumed that we were dealing with mainly single genotypes but that any heterozygosity would be captured within the six-plant sample. The DNA quality was evaluated on 1% agarose gels and quantified with QUANTITY ONE v. 4.0.3 software (Bio-Rad Lab., Hercules, CA). DNA was then diluted to 5 ng/μL for further use in the genotyping experiments. Microsatellite marker evaluation involved a total of 38 fluorescently labeled microsatellites selected to represent both cDNA-based and genomic markers and for high polymorphism information content (Blair et al. 2006, 2009). The genomic microsatellites included BM139, 140, 141, 143, 151, 156, 165, 172, 175, 183, 187, 188A, 188B, 2001, 205; BMd12, 36, AG1 and GATs54, 91 (Gaitán et al. 2002). The gene-based microsatellites included BMd1, 2, 8, 15, 16, 17, 18, 20, 45, 46, 47, 51 (Blair et al. 2003), and PV-ctt001a, PV-ctt001, PV-ag003, PV-ag001, PV-at001, PV-at003 (Yu et al. 2000). PCR amplifications were carried out on an MJ Research Inc. PTC-100 thermo-cycler using 96-well plates as described by Blair et al. (2006) with a 13-μL final reaction volume that included 3 μL of genomic DNA, 1.5 μL of each primer at concentrations of 0.16 μM, 0.78 μL of Mg buffer at a concentration of 1.5 mM, 0.72 μL of 1× PCR buffer (10 mM pH 7.2 Tris–HCl, 50 mM of KCl), 0.13 μL of dNTP at concentration of 0.2 mM and 0.15 μL of 1.0 unit Taq polymerase and 5.22 μL ddH2O. The PCR products of different size and contrasting fluorescent labels were pooled and diluted with sterile deionized water to equalize signal strength.
The DNA fragments from pooled PCR amplifications were then separated by capillary electrophoresis using an ABI3730 DNA analyzer (Applied Biosystems, Foster City, CA). The fragment analysis data from ABI3730 system were analyzed and allele sizes scored with GENEMAPPER version 3.7 software (Applied Biosystems). The observed allele size was then adjusted for the discrete allele size using AlleloBin software (http://test1.icrisat.org/gt-bt/download_allelobin.htm) and allele sizes for the control genotypes (Calima, G19833, DOR364 and ICA Pijao) were confirmed to be of the same sizes as in Blair et al. (2006, 2007) and Díaz and Blair (2006).
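
The reaction recipe above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that "each primer" refers to one forward and one reverse primer; under that reading the listed components add up to the stated 13-μL final volume. The `master_mix` helper and its 10% pipetting overage are hypothetical conveniences and are not part of the published protocol.

```python
# Per-reaction volumes in microlitres, taken from the text.
# Assumption: "1.5 uL of each primer" means one forward and one reverse primer.
components_ul = {
    "genomic DNA": 3.00,
    "forward primer": 1.50,
    "reverse primer": 1.50,
    "Mg buffer": 0.78,
    "PCR buffer": 0.72,
    "dNTP": 0.13,
    "Taq polymerase": 0.15,
    "ddH2O": 5.22,
}

# Sum of all components; rounding removes floating-point noise.
total_ul = round(sum(components_ul.values()), 2)
print(f"total reaction volume: {total_ul} uL")


def master_mix(n_reactions, overage=0.10):
    """Scale every non-DNA component for n_reactions plus an overage.

    The overage fraction is a hypothetical default, not a value from the text.
    """
    scale = n_reactions * (1 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in components_ul.items()
        if name != "genomic DNA"  # template DNA is added per well
    }


if __name__ == "__main__":
    for name, volume in master_mix(96).items():
        print(f"{name}: {volume} uL")
```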

Genetic diversity analysis

The pattern of genetic diversity within and among accessions and across the countries of collection (Ethiopia vs. Kenya) was assessed for both morphological and molecular data using several software programs. Morphological marker data from greenhouse and field trials were averaged and subjected to frequency distribution analyses in SAS statistical package version 9.1.3 (SAS Institute 2003). The morphological traits were scored for each genotype based on presence and absence. These data were used to generate a binary matrix of presence and absence which was used for principal coordinate analysis (PCoA) and for creating the matrix of average taxonomic distance (i.e., DIST coefficient in the procedure) between accessions, respectively, in the SIMQUAL and SIMINT subprograms of NTSYS-pc, version 2.10 (Rohlf 2002). Genetic relationships within and among accessions from the two East African countries based on genotypic data were assayed with a neighbor-joining method in DARWIN 5.0 software (Perrier et al. 2003; Perrier and Jacquemoud-Collet 2006). Genetic distance matrices were generated using the Peakall et al. (1995) method of calculating individual by individual genetic distances from co-dominant markers. Accordingly, for each SSR marker, with i-th, j-th, k-th and l-th different alleles, a set of squared distances was defined as $d^2(ii, ii) = 0$, $d^2(ij, ij) = 0$, $d^2(ii, ij) = 1$, $d^2(ij, ik) = 1$, $d^2(ij, kl) = 2$, $d^2(ii, jk) = 3$, and $d^2(ii, jj) = 4$. Genetic distance matrices for each locus were summed across loci assuming statistical independence. The genetic distance values were then subjected to PCoA as implemented by GENALEX version 6.1 software (Peakall and Smouse 2007). Patterns revealed by the first three coordinates of each accession were plotted using the Graph module and the G3D procedure of the software program SAS.
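
The squared-distance rules listed above can be expressed compactly in code. The sketch below is an illustration only, not the GENALEX implementation. It assumes diploid genotypes stored as pairs of integer allele sizes with no missing data, and all function names are hypothetical.

```python
from itertools import combinations


def locus_distance_sq(g1, g2):
    """Squared distance between two diploid genotypes at one codominant locus.

    Each genotype is a pair of allele sizes, e.g. (152, 156).
    """
    a, b = sorted(g1), sorted(g2)
    if a == b:
        return 0  # (ii, ii) or (ij, ij)
    hom_a, hom_b = a[0] == a[1], b[0] == b[1]
    shared = len(set(a) & set(b))
    if hom_a and hom_b:
        return 4  # (ii, jj)
    if hom_a or hom_b:
        return 1 if shared else 3  # (ii, ij) versus (ii, jk)
    return 1 if shared else 2  # (ij, ik) versus (ij, kl)


def genotype_distance_sq(ind1, ind2):
    """Sum the per-locus squared distances across loci (loci treated as independent)."""
    return sum(locus_distance_sq(g1, g2) for g1, g2 in zip(ind1, ind2))


def distance_matrix(individuals):
    """Symmetric matrix of pairwise squared distances between individuals."""
    n = len(individuals)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = genotype_distance_sq(individuals[i], individuals[j])
    return matrix


if __name__ == "__main__":
    # Three made-up accessions scored at two loci (allele sizes in base pairs).
    toy = [
        [(152, 152), (200, 200)],
        [(152, 156), (200, 200)],
        [(160, 160), (204, 208)],
    ]
    for row in distance_matrix(toy):
        print(row)
```

For the toy data the matrix is `[[0, 1, 7], [1, 0, 6], [7, 6, 0]]`: the first two accessions differ by a single allele at one locus, while the third shares no alleles with either of them.
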
Genetic diversity parameters such as number of alleles ($N_A$), number of effective alleles ($N_E$), number of private alleles ($N_{PA}$), observed heterozygosity ($H_O$), standardized allelic richness ($A_R$), gene diversity ($G_D$), Shannon’s information index ($I$), fixation index ($F$) and percentage of polymorphic loci were estimated with FSTAT version 2.9.3 (Goudet 2001) for each pre-determined group based on origin of the accession (by country) and genepool assignment as differentiated by neighbor-joining analysis and PCoA.
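
Several of these parameters have simple textbook definitions in terms of allele frequencies at a locus. The sketch below uses those uncorrected definitions for a single locus. It is not the FSTAT code, which may apply sample-size corrections, and it omits rarefaction-based allelic richness and private alleles. The function name and dictionary keys are hypothetical.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Uncorrected single-locus diversity statistics.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    gene_diversity = 1.0 - sum_sq  # expected heterozygosity
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {
        "N_A": len(counts),                          # number of alleles
        "N_E": 1.0 / sum_sq,                         # effective number of alleles
        "G_D": gene_diversity,
        "I": -sum(p * math.log(p) for p in freqs),   # Shannon's information index
        "H_O": h_obs,                                # observed heterozygosity
        "F": 1.0 - h_obs / gene_diversity if gene_diversity > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Four made-up individuals at one microsatellite locus.
    toy = [(152, 152), (152, 152), (156, 156), (152, 156)]
    for name, value in locus_diversity(toy).items():
        print(f"{name}: {value:.4f}")
```

In the toy example only one of four individuals is heterozygous, so the fixation index is well above zero, the pattern expected for a predominantly selfing crop.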

Molecular analysis of variance and population structure analysis

Partitioning of total genetic variation into within and among genepool diversity and country of origin was performed with a molecular analysis of variance (AMOVA) procedure in GENALEX. To infer the pattern of population structure, both population level and individual-based clustering approaches were employed. Global $F_{ST}$ and pairwise $F_{ST}$ were estimated using Weir and Cockerham’s θ (Weir and Cockerham 1984). $F_{ST}$ values and significance of estimates were calculated with FSTAT. Other parameters such as gene flow, Nei’s unbiased genetic distance and identity were computed to assess the degree of population differentiation using GENALEX. The Bayesian genotypic clustering method INSTRUCT (Gao et al. 2007) was used to validate population-based approaches and to infer population structure among the genotypes. INSTRUCT is an extension of the Bayesian clustering approach of STRUCTURE (Pritchard et al. 2000) that accounts for inbreeding or selfing rate in population inference. It quantifies the contribution of two forms of non-random mating: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes) when determining the pattern of existing genetic variation (Gao et al. 2007). INSTRUCT was run for K = 2 to K = 6 in mode 2 for joint inference of population selfing rate and population sub-structure for five independent chains, each chain with 200,000 iteration steps, 100,000 burn-ins, and a thinning interval of ten steps, assuming different starting points. Graphical representations of population assignments from INSTRUCT were produced from the program DISTRUCT (Rosenberg 2002).
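
The run settings above imply a fixed number of retained posterior samples per chain. The sketch below only does that bookkeeping, under two assumptions that the text does not state explicitly: that the 200,000 iteration steps include the burn-in, and that five chains were run for each value of K. The class and field names are hypothetical and do not correspond to INSTRUCT options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainSettings:
    """MCMC settings as described in the text (field names are hypothetical)."""
    iterations: int = 200_000  # assumed to include the burn-in
    burn_in: int = 100_000
    thinning: int = 10

    def retained_samples(self):
        """Samples kept per chain after discarding burn-in and thinning."""
        return (self.iterations - self.burn_in) // self.thinning


K_VALUES = range(2, 7)  # K = 2 to K = 6
CHAINS_PER_K = 5        # assumption: five chains for each K

settings = ChainSettings()
total_chains = len(K_VALUES) * CHAINS_PER_K

if __name__ == "__main__":
    print("retained samples per chain:", settings.retained_samples())
    print("total chains:", total_chains)
```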

Results

Morphological diversity

Significant variation was observed for most morphological traits measured on the East African accessions with two to four character states found per trait as shown in Table 1. Significant differentiation of accessions from the two countries was observed for bracteole size, growth habit and seed size whereas for base of the standard, bracteole shape, flower color and stem anthocyanin pigmentation the difference was non-significant. In these cases, the majority of the accessions from both Ethiopia and Kenya had smooth outer base of the standard and no stem anthocyanin pigmentation.

Table 1 Frequency distribution for morphological traits evaluated for East African accessions in relation to genepool control genotypes

A greater proportion of the accessions from Ethiopia had larger bracteole size, cordate or ovate bracteole shape, white flower color and smaller seed size, characteristics typical of the Mesoamerican genepool; and a larger proportion of accessions from Kenya had predominantly medium to large bracteole size, lanceolate or triangular bracteole shape and medium to larger seed size typical of the Andean genepool. The accessions from both countries showed a range of growth habits and seed colors; however, type III growth habit was prevalent in Ethiopian accessions and type I and II growth habits were prevalent in Kenyan accessions.

The dominant primary seed colors throughout the Ethiopian accessions were white, red and tan/brown whereas in Kenyan accessions purple, cream, yellow and red-seeded genotypes were common. A majority of the accessions in both countries were of a single primary color and had no secondary seed color; however, among those with secondary seed colors, red and cream mottled seed types were more prevalent in Ethiopia and Kenya, respectively.

Analysis of the morphological variables showed grouping of Andean and Mesoamerican genotypes combined with probable introgression between the genepools as shown by the PCoA in Fig. 1. In this graph, the first and second dimensions (Dim-1 and Dim-2) explained 21.0 and 10.6% of the total variation in the data set, respectively. Together, the first two dimensions explained 31.69% of the total variation; and overall the PCoA separated the Mesoamerican control genotypes from the Andean control genotypes with concomitant clustering of some accessions into their respective genepools. Many accessions, meanwhile, occupied intermediate positions between the two genepools and their control genotypes, probably due to introgression and/or shared morphological markers such as seed color and growth habit.

Fig. 1 Principal coordinates analysis of the 192 Ethiopian and Kenyan accessions based on nine morphological traits. Filled triangles to the left indicate placement of Andean control genotypes and filled triangles to the right indicate Mesoamerican control genotypes

Genetic associations among accessions

Genotyping results with the fluorescent microsatellite markers were also used to cluster the accessions; genetic associations among accessions from Ethiopia and Kenya with respect to the Andean and Mesoamerican control genotypes are shown in Figs. 2 and 3. In these graphs, distinct clusters were apparent, with the SSR markers unambiguously assigning accessions to the Andean and Mesoamerican genepools both in the neighbor-joining dendrograms (Fig. 2) and in the 3D plot of the PCoA based on pairwise genetic distances (Fig. 3). Within each country, accessions from the same collection site were often in different clusters and likewise accessions from different collection sites were clustered together (Fig. 2I, III), indicating the possibility of gene flow between sites and regions within Ethiopia and within Kenya. When comparing across countries in the overall analysis (Fig. 2II), accessions from the same country of origin tended to cluster together, especially among the Andean genotypes, indicating distinct germplasm at the national level and perhaps some cross-border gene flow between the countries.

Fig. 2 Neighbor-joining dendrograms depicting genetic relationships between common bean accessions from Kenya and Ethiopia with respect to Andean and Mesoamerican control genotypes. I Ethiopian accessions, II global accessions (full set of the study materials) and III Kenyan accessions. Different line shading represents different collection sites within each of the countries (I, III) and country of origin (II). Downward facing arrows indicate Andean controls and upward facing arrows indicate Mesoamerican controls. A Andean, M Mesoamerican, Int introgression as explained in the text. Numbers along branches indicate bootstrap support (shown only for values greater than 50)

Fig. 3 Principal coordinate analysis based on microsatellite markers showing spatial distribution of Ethiopian and Kenyan accessions compared to Andean and Mesoamerican control genotypes. The three dimensions explain 56.37% (Dim1), 12.01% (Dim2) and 11.30% (Dim3) of the variation, respectively, and together 79.68% of the total variation present in the data set

Results of the PCoA were in agreement with those of the neighbor-joining dendrograms, with two major groups detected: one clearly representing the Andean genepool and the other the Mesoamerican genepool. The division of the accessions into two major groups showed that there was correspondence between the grouping of East African bean landraces and the respective genepools in the primary centers of diversity. However, the further differentiation into recognized bean races belonging to these genepools was not apparent, although some sub-grouping was observed in the analysis. The overall variation explained by the principal coordinate analysis was 79.7% with dimensions 1, 2 and 3 explaining 56.4, 12.0 and 11.3%, respectively.
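
The percentage of variation attributed to each PCoA dimension is the share of the corresponding eigenvalue in the total. The sketch below illustrates this with classical PCoA on a squared-distance matrix, using plain power iteration so that it runs without external libraries. It is a generic illustration and not the GENALEX procedure, whose standardization options may give different percentages. It also assumes roughly Euclidean distances (no large negative eigenvalues), and all function names are hypothetical.

```python
import math


def double_center(dist_sq):
    """Gower-centre a symmetric squared-distance matrix: B = -1/2 * J * D2 * J."""
    n = len(dist_sq)
    row_mean = [sum(row) / n for row in dist_sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (dist_sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def leading_eigenvalues(matrix, k, iterations=200, tol=1e-9):
    """Largest k eigenvalues of a symmetric positive semi-definite matrix,
    found by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    eigenvalues = []
    for axis in range(k):
        vec = [math.sin(i + axis + 1.0) for i in range(n)]  # arbitrary non-constant start
        value = 0.0
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < tol:  # nothing left to extract
                value = 0.0
                break
            vec = [x / norm for x in nxt]
            value = norm
        if value < tol:
            break
        eigenvalues.append(value)
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return eigenvalues


def percent_explained(dist_sq, n_axes=3):
    """Percentage of total variation carried by the first n_axes PCoA dimensions."""
    gower = double_center(dist_sq)
    total = sum(gower[i][i] for i in range(len(gower)))  # trace = sum of eigenvalues
    return [100.0 * value / total for value in leading_eigenvalues(gower, n_axes)]


if __name__ == "__main__":
    # Four made-up points on a 4 x 2 rectangle; squared Euclidean distances.
    points = [(0, 0), (4, 0), (0, 2), (4, 2)]
    d2 = [[(ax - bx) ** 2 + (ay - by) ** 2 for bx, by in points] for ax, ay in points]
    print([round(p, 2) for p in percent_explained(d2)])
```

For the rectangle, the long side carries four times the spread of the short side, so the two dimensions explain 80% and 20% of the variation and no third dimension is returned.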

Genetic diversity within and among accessions and country of collection

All of the microsatellite markers used in this study were polymorphic. The proportions of polymorphic loci were 89.5% in the control genotypes and 100% in both Ethiopian and Kenyan accessions (Table 2). A total of 389 alleles were detected among the 192 bean accessions with an average of 10.24 alleles per marker. The number of alleles per marker ranged from 2 in BMd46 to 35 in PV-at001, with the mean number of effective alleles per locus not significantly different between the two East African collections but slightly lower in Kenyan (3.39) than in Ethiopian (3.72) accessions. Meanwhile, the mean number of private alleles per population was slightly higher for Kenyan (2.42) versus Ethiopian (1.25) accessions although allele richness was not significantly different between countries of origin and genepools.

Table 2 Mean SSR diversity for 38 microsatellite loci in Ethiopian, Kenyan and genepool control genotypes

AMOVA results showed that 66% of allelic diversity was attributed to individuals within genepool (P < 0.001) while only 34% was distributed among genepools. No significant variation for molecular diversity was observed between countries of collection, denoting shared alleles among them. However, Ethiopian accessions had a slightly higher level of gene diversity compared to Kenyan accessions. Within the country of origin and between the genepools, accessions within the Mesoamerican group of East African accessions had slightly higher gene diversity than those within the Andean group. Similarly, Shannon’s information index was slightly higher for Ethiopian than for Kenyan accessions and for Mesoamerican genepool accessions compared to Andean genepool representatives. The observed heterozygosity and probable out-crossing values were low for all the study materials, reflecting the inbreeding nature of the common bean crop. However, the heterozygosity and out-crossing values were slightly higher for the East African accessions (0.11–0.15) compared to the control genotypes (0.04), which might be explained by these genebank accessions resulting from the collection of varietal mixtures which are common in many farmer fields in the region. Observed heterozygosity was higher among Ethiopian accessions than among Kenyan accessions overall, and for the Mesoamerican genepool genotypes in Ethiopia and Andean genepool genotypes in Kenya.

Population differentiation and structure

Genetic differentiation in the East African bean landraces and cultivars was also analyzed with POPGENE (Yeh et al. 1997) and $F_{ST}$ values among pairs of populations were found to range from 0.037 to 0.632 with an overall average of 0.273 (Table 3). Population differentiation was higher between genepools ($F_{ST}$ = 0.189, P < 0.001) than between countries of origin ($F_{ST}$ = 0.06, P < 0.001). However, for the comparison between the countries of origin, Andean genepool accessions were more highly differentiated ($F_{ST}$ = 0.331, P < 0.001) than Mesoamerican genepool accessions ($F_{ST}$ = 0.04, P < 0.001). Correspondingly, some level of gene flow (Nm = 3.927) existed between the two neighboring East African countries, which was higher for Mesoamerican representatives (Nm = 6.421) than for Andean representatives (Nm = 3.940). Average Nei’s unbiased genetic distance was high between genepools (0.665) but low between countries of origin (0.195). Within genepool, the Mesoamerican representatives presented lower genetic distances than the Andean genepool representatives in each country. Genetic identity was fairly high between the two countries (0.823); however, it was low between genepools (0.204–0.507) and intermediate within genepools (0.673–0.916).
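
Two of the statistics in this paragraph are linked by simple formulas. Nei's genetic distance is the negative natural logarithm of genetic identity, and gene flow is often approximated from differentiation with the island-model relation Nm = (1/F_ST - 1)/4. The sketch below assumes these standard relations; the text does not state which gene-flow estimator the software applied, and values recomputed from the rounded figures quoted here need not match the reported Nm exactly. Function names are hypothetical.

```python
import math


def gene_flow_nm(fst):
    """Island-model approximation of gene flow: Nm = (1 / F_ST - 1) / 4."""
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def nei_distance(identity):
    """Nei's genetic distance from genetic identity: D = -ln(I)."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must lie in (0, 1]")
    return -math.log(identity)


if __name__ == "__main__":
    # Between-country figures quoted in the text (rounded as published).
    print(f"Nm from F_ST = 0.06: {gene_flow_nm(0.06):.3f}")
    print(f"D from identity 0.823: {nei_distance(0.823):.3f}")
```

With the between-country identity of 0.823, the distance evaluates to about 0.195, in line with the value reported above. The rounded $F_{ST}$ of 0.06 gives Nm of roughly 3.92, close to the reported 3.927.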

Table 3 Pairwise genetic differentiation, gene flow, unbiased Nei’s genetic distance and identity among and between genepools and countries of origin in East African landraces and cultivars

Population structure analysis with INSTRUCT confirmed the existence of the two genepools for the East African Highland common bean accessions (Fig. 4). The analysis for K = 2 populations showed individual genotypes from the two countries distributed between the two genepools, which was congruent with the neighbor-joining analysis and PCoA that clearly separated the Mesoamerican and Andean genepools. At K = 3, the Mesoamerican genepool genotypes further separated into two sub-groups with a low level of admixture, while the Andean genepool genotypes did not show any separation. At K = 4, the Mesoamerican accessions further subdivided into three groups but no meaningful interpretation of population structure could be made. At K = 5, the Andean group separated according to country of origin with very little admixture between Ethiopian and Kenyan Andeans, supporting the earlier analysis that depicted distinct germplasm at the national level. At K = 6, the Andean groups further differentiated into three groups, principally in Kenya where two subgroups were highly admixed; while the Mesoamerican genepool maintained the same sub-grouping as observed at K = 4. INSTRUCT software predicted K = 6 as the optimum population structure in the study material, therefore no further population subdivisions were modeled. The morphological characteristics predominant in each sub-population at K = 6 are given in Table 4.

Fig. 4 Population structure for 192 common bean accessions from the East African Highlands compared to Andean and Mesoamerican control genotypes at K = 2 to K = 6. Predetermined group names indicated below figure are AC Andean control genotypes, AE Andean genotypes from Ethiopia, AK Andean genotypes from Kenya, MC Mesoamerican control genotypes, ME Mesoamerican genotypes from Ethiopia and MK Mesoamerican genotypes from Kenya

Table 4 Some characteristics of sub-populations identified at K = 6 population structure level for the East African common bean landraces

Discussion

The level of polymorphism in landraces and cultivars from Ethiopia and Kenya was found to be considerable, especially with microsatellite marker analysis. Our results identified common beans from this region as distinguishable into both Andean and Mesoamerican genepools as described by various authors (Gepts et al. 1986; Singh et al. 1991a, b, c; Becerra and Gepts 1994; Islam et al. 2002; Blair et al. 2006, 2009). The conservation of the genepool separation typical of the primary centers of diversity has been observed before for bean in southern Africa (Martin and Adams 1987) and is also a hallmark of bean diversity in other secondary centers of diversity outside of the Americas, such as Southwest Europe (Rodiño et al. 2006) and China (Zhang et al. 2008).

The separation of East African bean landraces into the two recognized genepools was stronger with SSR markers than with morphological markers, indicating the success of this marker type in detecting genepools in common beans. Similar results were obtained in previous studies of accessions from primary and secondary centers of diversity analyzed with SSRs (Blair et al. 2006, 2009; Zhang et al. 2008). Despite the limitations of morphological analysis, the distance matrix obtained using SSR markers was significantly correlated with that obtained with morphological markers (r = 0.49731, P < 0.001) based on the MXCOMP procedure of NTSYS-pc and testing using the normalized Mantel Z-statistic (Rohlf 2002). However, this positive and significant correlation might be misleading, as the distance matrices of morphological markers did not produce patterns of population structure completely congruent with those of the SSR-based genetic distance matrices, which were of better resolution, indicating an under-estimate of genetic relationships with morphological markers. Discrepancy of clustering based on morphological and molecular markers has been attributed to hybridization or mutation that leads to divergent morphological or molecular profiles (Singh et al. 1991b). In addition, lower heritability or similarity of character states can also lead to poor separation based on morphological characteristics. Therefore, the use of informative molecular markers as a prior clustering criterion to improve the resolution power of morphological markers in common bean germplasm characterization is valid, as was suggested by Singh et al. (1991a).
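
The matrix comparison reported above rests on the normalized Mantel statistic, which is the Pearson correlation between the off-diagonal elements of two distance matrices, with significance judged by permuting the rows and columns of one matrix. The sketch below illustrates the idea in plain Python. It is not the MXCOMP procedure of NTSYS-pc, and the function names, the permutation count and the toy data are hypothetical.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Off-diagonal upper-triangle values, optionally after reordering individuals."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Normalized Mantel statistic r and a one-sided permutation p-value."""
    rng = random.Random(seed)
    flat_a = upper_triangle(dist_a)
    r_obs = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(len(dist_a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel individuals in the second matrix only
        if pearson(flat_a, upper_triangle(dist_b, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Made-up example: two distance matrices built from the same five positions,
    # the second one rescaled, so the matrices are perfectly correlated.
    positions = [0.0, 1.0, 3.0, 7.0, 12.0]
    dist_a = [[abs(p - q) for q in positions] for p in positions]
    dist_b = [[2.0 * abs(p - q) + (1.0 if p != q else 0.0) for q in positions]
              for p in positions]
    r, p_value = mantel(dist_a, dist_b)
    print(f"r = {r:.3f}, p = {p_value:.3f}")
```

A significant Mantel correlation only indicates that pairwise distances tend to rise and fall together; as noted above, it does not guarantee that the two marker types recover the same clusters.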

Microsatellite analysis showed generally low levels of introgression between the genepools compared to morphological analysis where character states were shared between the accessions belonging to each of the genepools. PCoA of morphological traits showed many intermediate genotypes while the same analysis for the SSR markers showed very few genotypes that were intermediate between the Andean and Mesoamerican clusters. Furthermore, in the neighbor-joining dendrograms only one genotype from Ethiopia (G18863) was intermediate between the genepools. In line with this, low gene flow and high genetic differentiation between genepools were observed in the present analysis. However, within genepool, gene flow was higher both between countries and within each country, especially for Mesoamerican representatives.

These results suggest that the genetic divergence in East African bean landraces could be due to the original differences in introduced germplasm from the primary centers of origin combined with spontaneous out-crossing in farmers’ fields and further farmer selection for adaptation to production niches and uses. For example, early flowering was common in many of the Andean accessions compared to the Mesoamerican accessions in both greenhouse and field evaluation (data not shown). This lack of flowering synchronization could make inter-genepool hybridization a less likely phenomenon for the East African highland germplasm, even if varietal mixtures of both Andean and Mesoamerican phenotypes are a common farming practice in many parts of the region.

Another conclusion from the molecular analysis was that there was low gene flow between the two countries compared to gene flow within each country. This led to recognizably distinct germplasm at the national level especially for Andean genepool accessions. The molecular diversity was also reflected in the diversity for seed color, size and shape (data not shown) and the fact that the countries share very few seed types in the Andean genepool and only a few seed types in the Mesoamerican genepool. This might highlight different informal or formal institutional introductions and weak trans-national bean seed exchange or social and commercial networks especially for landrace varieties. One exception to this may be the case of small red-seeded beans in cross-border grain trade from southern Ethiopia to certain parts of Kenya. Apart from gene flow, an additional reason for the divergence in germplasm between Ethiopia and Kenya may be the existence of different farmers’ selection preferences in each country in accordance with ecological adaptation, cooking value and market orientation. In fact, in Ethiopia, the small white and small red color classes are the preferred bean seed classes for export and local consumption, respectively; whereas in Kenya, large-seeded, red mottled seed types have high market preference (Wortmann et al. 1998).

Despite the distinct germplasm at the national level and at the genepool level, further race structure was not apparent in the East African common bean landraces. While all the population-genetic analyses employed in this study showed good congruence in the division of the landraces into two genepools, the number of sub-groupings varied depending on the analysis conducted. For example, INSTRUCT analysis suggested six sub-populations within the landraces with three of these corresponding to Andean genepool groupings and the other three to Mesoamerican genepool groupings; however, these six populations were not evident in the PCoA for the SSR markers, where only three groupings, two in the Mesoamerican genepool and one in the Andean genepool, were found. Some of the genotypes clustered together in the neighbor-joining dendrograms were assigned to different groups with INSTRUCT.

Andean diversity was found to be relatively high but difficult to subdivide: the Andean genepool control genotypes Calima and G19833, which represent the Nueva Granada and Peru races, respectively (Blair et al. 2007), clustered together in both the principal coordinate and population structure analyses. The close placement of the two Andean control genotypes and their overlap with other accessions from Ethiopia and Kenya might indicate that the East African Andean genotypes, especially those from Kenya, form part of a race Nueva Granada/race Peru complex. This was evident in the distinction of the Andean groups in Table 4. One group (A2) consisted of medium to large-seeded genotypes with small to medium-sized bracteoles and a range of growth habits. Another group (A1) was made up only of Kenyan genotypes with medium to large seeds, cylindrical or kidney seed shape, small to medium bracteole size, and ovate, lanceolate or triangular bracteole shape, which corresponds to the race Nueva Granada descriptors of Singh et al. (1991b). The final Andean group (A3) consisted of genotypes, mostly from Ethiopia, with small to medium bracteole size, lanceolate or triangular bracteole shape, medium to large seed size, predominantly type I or II growth habit, oval or rounded seed shape, and cream-spotted or tan seed color.

Within the main Mesoamerican genepool grouping, most accessions clustered into three subgroups. The M1 group included the Mesoamerican control genotypes DOR364 and ICA Pijao, which are designated as race Mesoamerica in Díaz and Blair (2006), indicating that East African Mesoamerican landraces are probably represented by this race. This group possessed smaller seed size, larger cordate bracteoles and type II or III growth habit, corresponding to the race Mesoamerica description of Singh et al. (1991b). Another grouping under this genepool, M2, was represented by 19 landraces from Ethiopia and nine from Kenya, all with large bracteoles and small to medium red, white or black seed; its considerable admixture with the M1 group suggests that it represents another sub-grouping of race Mesoamerica. All small red-seeded genotypes, including the dominant ‘Red Wolayta’ from Ethiopia, were included in this group, indicating that small red-seeded beans in Ethiopia have a narrow genetic background compared to white and black beans that were distributed in the two other Mesoamerican sub-populations. Meanwhile, the majority of the small white-seeded genotypes from Ethiopia were placed in the third Mesoamerican sub-population, M3, which included 40 accessions from Ethiopia and five from Kenya, almost all with small to medium seed size and indeterminate prostrate (III) or indeterminate climbing (IV) growth habit, characteristic of the Durango–Jalisco race complex (Singh et al. 1991b; Díaz and Blair 2006). Hence, we suspect that this race complex is represented in East Africa, but further analysis would be needed to confirm this.
A comparison of the East African beans to Latin American germplasm from the Caribbean, Central America, Mexico or Brazil, as likely sources of the germplasm sent to East Africa, would also be valuable, as would a comparison to European germplasm from the former colonial powers that probably served as transit points for this diversity.

In this regard, many of the small red beans preferred in Ethiopia are typical of Central America (Singh et al. 1991a) and could have arrived through trade via Spain. Meanwhile, red mottled beans preferred in Kenya are typical of the Caribbean and could have followed the same route. Durán et al. (2005) characterized a large set of landraces of this seed type and found separation of the genepools based on morphological characteristics and RAPD markers with most of the large-seeded genotypes coming from the Eastern Caribbean. Rodiño et al. (2003, 2006) found small white beans in the Iberian Peninsula, and both studies observed inter-genepool introgression that may have produced new seed types.

In conclusion, our study found that population structure for the East African common bean landraces was based mainly on genepool origin and that introgression or gene flow was moderate. Given that beans in this region are often cultivated in marginal, risk-prone production ecologies (Wortmann et al. 1998), it will be interesting to correlate genetic diversity with drought tolerance and adaptation potential in future association mapping work. The results presented here also pave the way for rational use of East African germplasm and strategic crossing plans that could be used to identify transgressive segregation based on distinct germplasm at the national or regional level. In this regard, further phenotyping could identify the genotypes that would be the most valuable gene sources in future breeding programs in the region. Finally, the results also suggest that a considerable amount of common bean genetic diversity is present in East Africa motivating renewed conservation efforts for the region.