Introduction

Phaseolus vulgaris L., of Central American origin (Gioia et al. 2019), is an important legume of the Fabaceae family (Mayo-Prieto et al. 2019). It is commonly known as the common bean, dry bean, string bean, field bean, French bean, or kidney bean (Musango et al. 2016). It is a diploid (2n = 2x = 22), predominantly self-pollinating crop with a low frequency of outcrossing (Burle et al. 2010). It has two distinct gene pools, the Mesoamerican and the Andean (Musango et al. 2016). The gene pools differ in agronomic traits such as seed size and shape as well as growth habit (Lei et al. 2020). P. vulgaris is an important field crop in South Africa (Muedi et al. 2015). The major South African provinces for small-scale P. vulgaris production are Eastern Cape, KwaZulu-Natal, and Mpumalanga (Fourie 2002).

P. vulgaris is grown worldwide for its edible leaves, green immature pods and dry seeds (Gioia et al. 2019). It is a very nutritious crop because of its high protein content and high quantity of fiber, and it provides vital nutrients and complex carbohydrates (Guidoti et al. 2018). It provides a cheap source of protein to people in developing countries (Jannat et al. 2019). Landraces are varieties of plants domesticated from the wild through natural and artificial selection (Abdollahi et al. 2016). P. vulgaris landraces are characterized by seed size, colour, and pattern (Gioia et al. 2019). Landraces help small- and large-scale farmers and agricultural programs to adapt to new challenges such as climate change (Padilla-Chacón et al. 2019).

Landraces of P. vulgaris vary in their vegetative and reproductive traits. Germination percentage among the landraces ranges from 84.0 to 93.8% (Kalauni et al. 2019). Their growth habit is either climbing, semi-climbing, erect, or bushy (Abdollahi et al. 2016; Loko et al. 2018). The colour of the stems is either green, green with pink pigmentation, or green with purple pigmentation (Loko et al. 2018). In Portugal and Bulgaria, stem length among P. vulgaris landraces ranges from 19.5 cm to 123.4 cm (Stoilova et al. 2005). The flowers of P. vulgaris landraces are white, red, pink, or purple to light purple (Ekbic and Hasancaoglu 2018), or white with lilac edges or red stripes (Okii et al. 2014).

In Turkey, P. vulgaris landraces range from 41 to 55 in days to flowering (Ekbic and Hasancaoglu 2018). Seed colours vary from white, cream, brown, yellow, green, yellowish green, red, black, purple, to a bicolour. The seeds of P. vulgaris are either narrower (5.26 mm) or wider (10.04 mm) in India (Dutta et al. 2016). The seeds are also either longer (16.7 mm) or shorter (10.0 mm), thinner (4.2 mm), or thicker (8.2 mm) in Iran (Marzooghian et al. 2013). The seeds among the P. vulgaris landraces in Turkey are either lighter (29.82 g) or heavier (55.35 g) (Yeken et al. 2018).

Molecular markers are used to reveal variation among P. vulgaris landraces at the DNA level, providing a more reliable tool for germplasm characterization (Bilir et al. 2019). Simple Sequence Repeats (SSRs), also known as microsatellites, are short stretches of repeated DNA, usually of one to six nucleotides (Mishra et al. 2014). They are commonly composed of mononucleotide (A), dinucleotide (AT), trinucleotide (ATC) and tetranucleotide (AGGT) repeats (Córdoba et al. 2010). They are frequently used in P. vulgaris because of their high levels of polymorphism and reproducibility (Gioia et al. 2019).

Phaseolus vulgaris landraces from Turkey have high polymorphism, where the number of alleles ranges from 6 to 29 with a mean of 14.8 alleles per locus (Bilir et al. 2019). The observed heterozygosity (Ho) ranges from 0.000 to 0.099 with the mean value of 0.006 across all markers for P. vulgaris landraces in Italy (Gioia et al. 2019). The polymorphic information content (PIC) values range from 0.055 to 0.721 over 13 loci and seven SSR loci have a PIC greater than 0.5 with the mean value of 0.0492 (Wang et al. 2012).

Many P. vulgaris landraces are grown by rural communities in South Africa, but few studies have reported their morphological and molecular diversity. Diversity studies have mainly been limited to morpho-agronomic traits, and no comprehensive marker evaluation of P. vulgaris has been documented in South Africa. Thus, this study aimed to determine variation in morpho-agronomic traits and genetic diversity among P. vulgaris landraces using SSR markers. A genetic diversity study among P. vulgaris landraces will help to identify genes for future breeding programs.

Materials and methods

Seed sourcing and experimental design

Seeds of Phaseolus vulgaris landraces were collected from rural communities of KwaZulu-Natal [Durban (29.8587° S, 31.0218° E), Empangeni (28.7532° S, 31.8935° E), Eshowe (28.8947° S, 31.4628° E), Mtubatuba (28.4059° S, 32.2143° E), and Port Shepstone (30.7277° S, 30.4473° E)]; Limpopo [Polokwane (23.8962° S, 29.4486° E)]; Mpumalanga [Bushbuckridge (24.8398° S, 31.0464° E), KwaNdebele (25.2542° S, 28.4230° E) and Nelspruit (25.4753° S, 30.9694° E)]; and Gauteng [Benoni (26.1511° S, 28.3696° E)] provinces. Table 1 describes the 38 landraces used in this study, whose names were built from three components in order: area of collection, percentage of seed coat colour, and seed shape. The study was conducted at the University of Zululand, KwaDlangezwa campus, Orchard Unit farm (28.8524° S, 31.8491° E). The landraces were sown from August to November over two seasons. P. vulgaris landraces were planted in a randomized complete block design with three replications. The experimental field was 50 m in length and 5 m in width. Plots were 140 cm in length, 140 cm in width, and 50 cm apart. Each landrace was sown in four rows, each 120 cm long, with an inter-plant spacing of 10 cm and inter-row spacing of 10 cm. Ten seeds were planted in each row.

Table 1 Components for naming Phaseolus vulgaris landraces

Plant material and morphological description

A total of 38 P. vulgaris landraces were used in the current study (Table 1). Qualitative and quantitative characteristics of both vegetative and reproductive traits were recorded on five randomly selected plants per plot. Plants in the inner rows were tagged and used for measurements to eliminate border effects.

The germination percentage was recorded at 14 days after planting using the following formula: GP (%) = (number of germinated seeds/total number of seeds sown) × 100 (Abdel-Haleem and El-Shaleny 2015). Other vegetative traits were measured at 33 days after planting (before flowering) to avoid interference with the flowering period. However, the plant height and the number of branches were determined at harvest (101 days after planting). Growth habit and stem colour were determined for each landrace. The plant height (cm) from the scar of the cotyledonous leaves to the stem apex was measured using a ruler. The stem diameter (mm) was measured between the scar of the cotyledonous leaves and the first set of true leaves, using Vernier calipers. The number of branches was counted manually.
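The germination formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the seed counts are hypothetical and are not taken from the study's data.

```python
def germination_percentage(germinated: int, sown: int) -> float:
    """GP (%) = (number of germinated seeds / total number of seeds sown) x 100."""
    if sown <= 0:
        raise ValueError("sown must be a positive number of seeds")
    if not 0 <= germinated <= sown:
        raise ValueError("germinated must lie between 0 and sown")
    # Multiply first so that whole-number percentages stay exact in floating point.
    return 100 * germinated / sown


# Hypothetical plot: four rows of ten seeds (40 sown), 36 germinated by day 14.
gp = germination_percentage(36, 40)
print(f"GP = {gp:.1f}%")
```

The input checks guard against counting errors such as recording more germinated seeds than were sown.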

The colour of leaves and leaf veins was determined for each landrace. The number of leaves per plant was determined by direct counting. The chlorophyll content (mg cm⁻²) was measured using a CCM-200 plus chlorophyll content meter with a measurement area of 0.71 cm² on two points of each lobe of the second leaf from the apex. An average for all points was recorded as the final value for each plant (Pereyra et al. 2014). The leaf area (LA) [Area (mm²) = length (mm) × width (mm)] of the middle leaf lobe was measured using a ruler.

The colour of the flowers, pods, seeds, and seed shape were determined among the P. vulgaris landraces. The days of first flowering and 50% flowering were recorded (from the date of sowing to the date on which approximately 50% tillers produced flowers) for landraces. The number of pods per plant was determined by direct counting. Vernier calipers were used to measure the pod length (mm) from the tip to the highest point on the pod, as well as the pod width (mm). The number of seeds per pod and plant was determined by direct counting. Vernier calipers were used to measure the seed length (mm), breadth (mm), and thickness (mm). A Mettler PC 2000 weighing scale was used to determine the hundred and total seed mass (g).

DNA extraction protocol

DNA was extracted from young leaves of Phaseolus vulgaris using the Quick-DNA™ plant/seed kit according to the instructions provided by the manufacturer (QIAGEN 2016). The DNA was extracted by Inqaba Biotechnical Industries (Pty) Ltd, Pretoria, South Africa. The dried leaf samples were finely cut and 150 mg of each sample was added to a ZR BashingBead™ Lysis Tube (2.0 mm). A 750 µl volume of BashingBead™ buffer was added to the tube, which was then capped tightly. The tube was secured in a bead beater fitted with a 2 ml tube holder assembly and processed at maximum speed for 5 min. The ZR BashingBead™ Lysis Tube was centrifuged in a microcentrifuge at 10,000 × g for 1 min. A 400 µl aliquot of the supernatant was transferred to a Zymo-spin™ III-F filter in a collection tube. An amount of 1200 µl of Genomic lysis buffer was added to the filtrate in the collection tube and mixed well.

An 800 µl aliquot of the mixture was transferred to a Zymo-spin™ IICR column in a collection tube and centrifuged at 10,000 × g for 1 min. The flow-through was discarded from the collection tube. Another 800 µl of the mixture was transferred to the Zymo-spin™ IICR column in a collection tube and centrifuged at 10,000 × g for 1 min. Next, 200 µl of DNA pre-wash buffer was added to the Zymo-spin™ IICR column in a new collection tube and centrifuged at 10,000 × g for 1 min. Then 500 µl of gDNA wash buffer was added to the Zymo-spin™ IICR column and centrifuged at 10,000 × g for 1 min. The Zymo-spin™ IICR column was transferred to a clean 1.5 ml microcentrifuge tube and centrifuged at 10,000 × g for 30 s to elute the DNA. A Zymo-spin™ III-HRC filter was placed in a clean collection tube and 600 µl of prep solution was added. The filter was then centrifuged at 8000 × g for 3 min. The eluted DNA was transferred to the prepared Zymo-spin™ III-HRC spin filter in a clean 1.5 ml microcentrifuge tube and centrifuged at exactly 16,000 × g for 3 min.

Simple Sequence Repeat (SSR) amplification

The polymerase chain reaction (PCR) amplifications were performed in an Eppendorf Mastercycler® using 50 ng/µl of DNA template in two separate 10 µl reactions. The reactions contained 4 µl of DNA template, 0.8 µl of deoxyribonucleotide triphosphates (dNTPs) (2.5 mM), 1.0 µl of 10 × buffer and 0.06 µl of Taq polymerase (Inqaba Biotec). In the first reaction, 1.0 µl of MgCl2 (50 mM), 1.0 µl of forward and reverse primers (5 µM) and 1.14 µl of ultra-pure water were included. In the second reaction, 1.2 µl of MgCl2 (50 mM) and 1.5 µl of both forward and reverse primers were added to make up the master mix. Forward primers were labelled with M13 FAM (blue), T7 565 (red), or pGEX5 550 (yellow) fluorescent dyes. The PCR conditions consisted of initial denaturation at 94 °C for 2 min, followed by nine cycles of denaturation at 93 °C for 15 s, annealing at 65 °C for 20 s, and extension at 72 °C for 30 s.

The annealing temperature decreased by 1 °C in each of these cycles, followed by a final 30 cycles at 55 °C and a final elongation step at 72 °C for 5 min. The PCR products were separated by capillary electrophoresis on an ABI3500 genetic analyzer. Allele size was determined for each SSR locus using GeneMarker HID version 2.9.5.
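The reaction composition and the touchdown profile described above can be checked with a short Python sketch. It assumes that "1.0 µl of forward and reverse primers" means 1.0 µl of each primer, and that the touchdown phase starts at 65 °C and drops 1 °C per cycle over the nine cycles. Both are readings of the text, not confirmed protocol details, and the variable and function names are illustrative.

```python
import math

# Volumes (µl) of the first 10 µl reaction as listed in the text.
# Assumption: forward and reverse primers are added at 1.0 µl each.
reaction_1 = {
    "DNA template": 4.0,
    "dNTPs (2.5 mM)": 0.8,
    "10x buffer": 1.0,
    "Taq polymerase": 0.06,
    "MgCl2 (50 mM)": 1.0,
    "forward primer (5 µM)": 1.0,
    "reverse primer (5 µM)": 1.0,
    "ultra-pure water": 1.14,
}
total_1 = sum(reaction_1.values())
print(f"Reaction 1 total: {total_1:.2f} µl")


def touchdown_annealing(start_c=65, step_c=1, touchdown_cycles=9,
                        final_c=55, final_cycles=30):
    """Return the annealing temperature (°C) of every cycle, in order."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles


schedule = touchdown_annealing()
print("Touchdown phase:", schedule[:9])
print("Total cycles:", len(schedule))
```

Under the stated assumption the first reaction sums to the nominal 10 µl. The touchdown phase in this reading ends at 57 °C before the 30 cycles at 55 °C.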

Data analysis

Morphological data for quantitative characteristics were analyzed using GenStat Release version 12.1. The means of the different traits were compared using Tukey’s test at the 95% confidence level (P ≤ 0.05). Variability in quantitative traits among landraces was evaluated by principal component analysis (PCA) with biplots and by agglomerative hierarchical clustering (dendrogram), both performed in XLSTAT (2019.1).
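Agglomerative hierarchical clustering starts from a matrix of pairwise distances between landraces, and the Results report that the morpho-agronomic dendrogram was based on Euclidean distance. The Python sketch below shows one way such a matrix could be computed. It is not the XLSTAT procedure: the landrace labels and trait values are invented, and standardizing each trait to z-scores before measuring distance is an assumption made here so that traits in different units contribute comparably.

```python
import math
from statistics import mean, stdev

# Hypothetical trait table: (plant height in cm, hundred-seed mass in g).
traits = {
    "landrace_A": (45.0, 30.0),
    "landrace_B": (50.0, 32.0),
    "landrace_C": (120.0, 55.0),
}


def standardize(table):
    """Convert each trait column to z-scores (assumes no column is constant)."""
    columns = list(zip(*table.values()))
    stats = [(mean(col), stdev(col)) for col in columns]
    return {
        name: tuple((x - m) / s for x, (m, s) in zip(values, stats))
        for name, values in table.items()
    }


def distance_matrix(table):
    """Pairwise Euclidean distances between all rows of the table."""
    names = list(table)
    return {a: {b: math.dist(table[a], table[b]) for b in names} for a in names}


dist = distance_matrix(standardize(traits))
for a, row in dist.items():
    print(a, {b: round(d, 2) for b, d in row.items()})
```

In this invented example, landrace_A and landrace_B are closer to each other than either is to landrace_C, so a clustering step would join them first.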

Genetic parameters, namely allele number and frequency, gene diversity, heterozygosity, and the polymorphism information content (PIC), were calculated in PowerMarker software v 3.25. To clarify the genetic differentiation between landraces, Nei’s genetic distance was calculated. Population structure was analysed with the Bayesian model-based clustering approach implemented in STRUCTURE v 2.3.4, which detects population genetic structure using a defined number of pre-set populations K, where each K is characterized by a set of allele frequencies at each locus. The Evanno test was used to identify the best-fitting number of populations within the sample.
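Gene diversity and PIC are both functions of the allele frequencies at a locus. The Python sketch below uses the standard textbook forms: gene diversity as 1 − Σp² and PIC as defined by Botstein et al. (1980). It is not PowerMarker's code, and PowerMarker's estimators may apply sample-size or inbreeding corrections that are omitted here. The allele frequencies are illustrative only.

```python
import math


def _check(freqs):
    """Allele frequencies at one locus must sum to 1."""
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies must sum to 1")


def gene_diversity(freqs):
    """Uncorrected gene diversity: 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    _check(freqs)
    squares = [p * p for p in freqs]
    cross = sum(
        2 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - sum(squares) - cross


# A monomorphic locus (major allele frequency 1.00) carries no information.
print(gene_diversity([1.0]), pic([1.0]))
# Two equally frequent alleles.
print(gene_diversity([0.5, 0.5]), pic([0.5, 0.5]))
```

The first case shows why a marker with a major allele frequency of 1.00 has both gene diversity and PIC equal to zero.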

The STRUCTURE program was set as follows: the analysis was run with 10 simulations per K value from K = 1 to 10, using a burn-in period of 5000 followed by 50,000 replicates after burn-in. The most likely value of K was detected by ∆K (Evanno et al. 2005) using Structure Harvester (Earl and Vonholdt 2011), online (http://tayloro.biology.ucla.edu/struct_harvest/). Bar plots were generated from the mean results of runs for the most likely K value using STRUCTURE v 2.3.4. The principal coordinate analysis (PCoA) was performed using GenAlEx v 6.4 software. The dendrogram was constructed using the Unweighted Pair Group Method with Arithmetic mean (UPGMA) in PowerMarker and then drawn with the Mega software to display genetic relationships among the P. vulgaris landraces.
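The ∆K statistic compares the second-order rate of change of the log-likelihood L(K) with its spread across replicate runs: ∆K = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]. The Python sketch below computes it from the mean L(K) of replicate runs, which is one common way of applying the Evanno et al. (2005) definition. It is not the Structure Harvester code, and the log-likelihood values are invented for illustration. ∆K is undefined for the smallest and largest K tested because both neighbours are needed.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """lnp_runs maps K to a list of log-likelihoods from replicate runs.

    Returns a dict mapping each interior K to its Delta K value.
    """
    means = {k: mean(v) for k, v in lnp_runs.items()}
    sds = {k: stdev(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        # Delta K needs both neighbours and a non-zero spread at K.
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result


# Invented log-likelihoods for three replicate runs at each K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -780.0, -790.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "most likely K:", best_k)
```

In this invented data set the likelihood rises steeply from K = 1 to K = 2 and then flattens, so ∆K peaks at K = 2.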

Results

Morpho-agronomic variation

A dendrogram for morpho-agronomic traits based on Euclidean distance grouped the landraces into four clusters (Fig. 1). Cluster I was divided into sub-clusters IA and IB. Sub-cluster IA was composed of D-100By-Cl, D-50C50Gy-K, D-90C10LR-Cl, E-100Bk-Cl, E-50LR50C-Cl, E-90LB10M-Cu, Em-100YG-Cl, and PS-90LB10B-Cl. These landraces were associated with greater germination percentage, earlier flower formation, and shorter seeds (Tables 2 and 3). Sub-cluster IB consisted of Br-100LB-Cl, D-50M50LB-Cl, D-90M10LB-Cl, D-50P50LB-Cl, E-50M50C-K, E-90M10C-Cl, E-50YG-Cl, E-100YG-Cl, Em-50LB50M-Cl, M-90LB10M-Cl, N-100DP-K, N-100LP-K, PS-90DB10LB-Cl, PS-50M50LB-Cl, PS-90LB10M-Cl, and PS-90M10LB-Cl. Cluster II was composed of P-50M50C-O, KN-50B50M-Cl, and B-50B50M-Cl. These landraces were associated with greater plant height, stem diameter, leaf area, and pod length, and with longer, wider, thicker, and heavier seeds (Tables 2 and 3). They also had similar seed coat colours, although KN-50B50M-Cl and B-50B50M-Cl differed in colour intensity from P-50M50C-O (Table 1).

Fig. 1 Dendrogram grouping of Phaseolus vulgaris landraces based on Euclidean distances. Numbers 1–38 correspond to the landraces described in Table 1

Table 2 Variation in germination percentage (14 days after planting (DAP)) and vegetative traits (33 DAP) among Phaseolus vulgaris landraces
Table 3 Variation in reproductive traits among Phaseolus vulgaris landraces

Cluster III consisted of D-100C-Cl, D-90LB10B-Cu, D-50RB50LB-Cl, D-100YG-Cl, Em-50Bk50C-Cu, Em-100LB-Cl, KN-100 W-Cl, and PS-100YG-Cl. These landraces were associated with narrower leaves, numerous seeds per pod and per plant, and lighter 100-seed mass (Tables 2 and 3). Cluster IV was composed of the out-groups Phaseolus coccineus and Phaseolus lunatus. The out-groups were associated with greater stem diameter, leaf area, chlorophyll content, pod length and seed length, width, and thickness as well as numerous leaves, numerous branches, heavy 100-seed mass, and fewer seeds per pod (Tables 2 and 3).

The relationship between landraces was further illustrated by a biplot, where almost all traits correlated positively with PC1, except for leaf area, germination percentage, number of seeds per pod, and number of seeds per plant (Fig. 2). The biplot further clustered the landraces with similar morphological traits into three different groups. Group I was composed of the out-groups P. coccineus and P. lunatus. Group II included landraces D-50M50LB-Cl and D-50P50LB-Cl from Durban, N-100DP-K and N-100LP-K from Nelspruit, KN-50B50M-Cl from KwaNdebele, and P-50M50C-O from Polokwane. Landraces D-50M50LB-Cl, KN-50B50M-Cl and P-50M50C-O had similar seed coats, which differed in colour intensity and area of origin. N-100DP-K and N-100LP-K also had similar seed coat colours, but differed only in colour intensity (Table 1). All the remaining landraces formed Group III.

Fig. 2 Biplot based on the first two principal components (PC) for morpho-agronomic traits and Phaseolus vulgaris landraces. Landraces are explained in Table 1. Traits: GP germination percentage, SD stem diameter, PH plant height, LA leaf area, CC chlorophyll content, NB number of branches, DFF days to first flowering, 50% F 50% flowering, NP number of pods, PL pod length, PW pod width, NSP number of seeds per pod, NSPl number of seeds per plant, SL seed length, SW seed width, ST seed thickness, TSM total seed mass, HSM 100-seed mass

Allele number and major allele frequency of SSRs

The seven simple sequence repeat markers produced reliable results when applied to the P. vulgaris landrace samples and the out-groups P. coccineus and P. lunatus. Reliability was based on clear, consistent amplification of well-defined expected alleles. The seven analysed SSR loci produced a total of 51 alleles with a mean of 3.64 alleles per marker (Table 4). The number of alleles ranged from one to six: the reverse marker PV-atcc001 and the forward and reverse markers of PV-ccct001 had the fewest alleles (one), while the forward markers of PV-ag001 and PV-ggc001 produced the most (six). The major allele frequency (MAF) ranged from 0.48 to 1.00 with a mean of 0.75 (Table 4). Reverse marker PV-ctt001 had the lowest major allele frequency (MAF = 0.48), whereas the reverse marker of PV-atcc001 as well as the forward and reverse markers of PV-ccct001 had the highest (MAF = 1.00).
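The reported mean of 3.64 alleles per marker can be reconciled with the total of 51 alleles by a small arithmetic check in Python. The check assumes that the forward and reverse primers of each of the seven markers were scored as separate entries, giving 14 entries. This reading is inferred from the text's references to forward and reverse markers and is not confirmed by Table 4.

```python
total_alleles = 51
n_markers = 7

# Assumption: forward and reverse primers are scored as separate entries.
primer_entries = n_markers * 2

mean_per_entry = round(total_alleles / primer_entries, 2)
mean_per_marker = round(total_alleles / n_markers, 2)

print("Mean per primer entry:", mean_per_entry)
print("Mean per marker pair:", mean_per_marker)
```

Only the per-entry figure matches the reported mean, which suggests the average was taken over 14 forward and reverse entries rather than over seven loci.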

Table 4 Genetic variability within Phaseolus vulgaris landraces for seven SSR markers

Genetic diversity and distance, observed heterozygosity and polymorphic information content between Phaseolus vulgaris landraces

The genetic diversity ranged from 0.00 to 0.65 with an overall mean of 0.36 (Table 4). The reverse marker PV-ctt001 had the highest genetic diversity (GD = 0.65), whereas the reverse marker PV-atcc001 and the forward and reverse markers of PV-ccct001 had the lowest (GD = 0.00). Almost all markers showed an observed heterozygosity of zero, except for the reverse markers PV-ggc001 (Ho = 0.03) and PV-ag001 (Ho = 0.05) (Table 4). The highest polymorphism (PIC = 0.58) was recorded in the reverse marker PV-ctt001, while the lowest (PIC = 0.00) was found in the reverse marker PV-atcc001 and the forward and reverse markers of PV-ccct001 (Table 4).

The genetic distance varied from 0.00 to 0.79 (Table 5). The Eshowe landrace E-50M50C-K had the closest distance (GD = 0.00) with landraces E-50M50LB-Cl, E-90M10C-Cl, Em-50M50LB-Cl, and KN-50B50M-Cl. The genetic distance between E-90LB10M-Cu, E-100Bk-Cl, and Em-100YG-Cl was also zero. The Durban landrace D-100C-Cl showed the farthest genetic distance (GD = 0.79) with landraces E-50LR50C-K, and PS-90DB10LB-Cl. The KwaNdebele landrace KN-100 W-Cl and the Empangeni landrace also had the farthest genetic distance.

Table 5 Nei’s genetic distance of Phaseolus vulgaris landraces using seven SSR markers

Population structure and genetic relationship

Population structure among Phaseolus vulgaris

The Evanno test showed a sharp maximum for Delta K at K = 2 in the plots of L (K) and Delta K (Fig. 3), thus clustering the P. vulgaris landraces into two sub-populations. The population structure grouped the South African landraces into sub-populations and admixtures, as shown for K = 2 and K = 3 (Fig. 4). The structure analysis clustered the 38 landraces into two sub-populations (K2.1 (red) and K2.2 (green)) based on their SSR genotypes at K = 2. K2.1 (red) contained Durban landraces D-100By-Cl, D-90C10LR-Cl, D-50M50LB-Cl, D-50P50LB-Cl, D-50RB50LB-Cl, and D-100YG-Cl, Eshowe landraces E-90LB10M-Cu, E-50M50C-K, E-90M10C-Cl, E-50YG-Cl, and E-100YG-Cl, Empangeni landraces Em-50M50LB-Cl and Em-100YG-Cl, KwaNdebele landrace KN-50B50M-Cl, Nelspruit landrace N-100LP-K, as well as the Port Shepstone landrace PS-100YG-Cl. K2.2 (green) included D-100C-Cl from Durban, Em-100LB-Cl from Empangeni, KN-100 W-Cl from KwaNdebele, and Br-100LB-Cl from Bushbuckridge as well as the out-groups P. coccineus and P. lunatus.

Fig. 3 The Evanno test showing plot parameters of L (K) and Delta against the likely subpopulations of the 38 landraces

Fig. 4 Population structure for 38 Phaseolus vulgaris landraces from selected provinces of South Africa revealed by SSR analysis. K = 2 above; K2.1 (red), K2.2 (green), K = 3 below; K3.1 (red), K3.2 (green), K3.3 (blue). Numbers 1–38 correspond to the landraces described in Table 1

The following P. vulgaris landraces were found in the admixtures: PS-90LB10M-Cl and PS-50DB50LB-Cl from Port Shepstone were shared in between K2.1 and K2.2 ((98% red and 2% green) and (95% red and 5% green), respectively). E-50LR50C-K from Eshowe was shared between K2.1 and K2.2 (90% red and 10% green), Port Shepstone landrace PS-90DB10LB-Cl (80% red and 20% green) as well as N-100DP-K from Nelspruit (75% red and 25% green). Again, the landraces PS-90LB10B-Cl from Port Shepstone and M-90LB10M-Cl from Mtubatuba both were shared between K2.1 and K2.2 (70% red and 30% green). However, Port Shepstone landrace PS-50M50LB-Cl and Durban landrace D-90LB10B-Cu were shared in between K2.2 and K2.1 (98% green and 2% red). D-50C50Gy-K from Durban and B-50B50M-Cl from Benoni were shared between K2.2 and K2.1 (95% green and 5% red), and P-50M50C-O from Polokwane (90% green and 10% red). PS-90M10LB-Cl from Port Shepstone, Em-50Bk50C-Cu from Empangeni, and D-90M10LB-Cl from Durban were shared between K2.2 and K2.1 ((80% green and 20% red), (70% green and 30% red) and (60% green and 40% red), respectively), and the Eshowe landrace E-100Bk-Cl (50% green and 50% red).

The further clustering of the population at K = 3 resulted in the separation of South African landraces into three sub-populations. The first group (K3.1 (red)) included Benoni landrace B-50B50M-Cl as well as the out-groups P. coccineus and P. lunatus. The second group K3.2 (green) composed of D-100C-Cl and Em-100LB-Cl from Durban and Empangeni. The Durban landraces D-50RB50LB-Cl, D-100By-Cl, D-50M50LB-Cl, D-90C10LR-Cl, D-50P50LB-Cl and D-100YG-Cl, Eshowe landraces E-90LB10M-Cu, E-50M50C-K, E-90M10C-Cl, E-50YG-Cl, and E-100YG-Cl, Empangeni landraces Em-50M50LB-Cl, and Em-100YG-Cl, KN-50B50M-Cl from KwaNdebele, Nelspruit landrace N-100LP-K, as well as PS-100YG-Cl from Port Shepstone formed group K3.3 (blue).

The majority of the landraces were admixtures, where the landraces Br-100LB-Cl from Bushbuckridge, and D-90LB10B-Cu from Durban were shared between K3.2 and K3.1 (98% green and 2% red), and KN-100 W-Cl from KwaNdebele (95% green and 5% red). However, landrace PS-90LB10M-Cl was shared in between K3.3 and K3.1 (98% blue and 2% red) as well as landraces N-100DP-K and PS-90DB10LB-Cl (60% blue and 40% red), PS-90M10LB-Cl (95% blue 5% red), and M-90LB10M-Cl (50% blue and 50% red). The landrace Em-50Bk50C-Cu was shared in between K3.2 and K3.3 (52% green and 48% blue). The following landraces were shared between K3.1, K3.2, and K3.3: PS-50DB50LB-Cl from Port Shepstone had 98% blue, 1% green, and 1% red, and E-50LR50C-K had 85% blue, 10% red, and 5% green. Again, PS-90LB10B-Cl had 60% blue, 35% green and 5% red, E-100Bk-Cl had 50% blue, 45% green and 5% red, P-50M50C-O had 50% red, 30% red and 20% blue, PS-50M50LB-Cl had 55% green, 40% red and 5% blue, D-50C50Gy-K had 50% red, 44% green and 6% blue, and landrace D-90M10LB-Cl had 40% green, 38% blue and 22% red.

Principal coordinate analysis of Phaseolus vulgaris landraces revealed by SSR markers

In the principal coordinate analysis (PCoA), landraces were grouped based on the genotypic distance, where different landraces were colour-coded according to their area of origin and the two outgroups (Fig. 5). The first two components of the principal coordinates accounted for 44.97% of the total variation. In the upper portion of the first quadrant, landraces D-50RB50LB-Cl from Durban, N-100P-K, and N-100DP-K from Nelspruit, PS-90LB10M-Cl from Port Shepstone as well as the admixtures formed by D-100By-Cl, and D-100YG-Cl from Durban, E-50YG-Cl from Eshowe, N-100LP-K from Nelspruit, and Port Shepstone landrace PS-100YG-Cl were clustered closer together. In the lower portion of the quadrant, the following landraces were closely associated: E-90LB10M-Cu, E-50LR50C-K and E-100YG-Cl from Eshowe, D-90C10LR-Cl from Durban, and Em-100YG-Cl from Empangeni.

Fig. 5 Principal coordinate analysis (PCoA) of Phaseolus vulgaris landraces from SSR markers based on the genotypic distance. The landraces were divided into twelve populations based on their area of origin: diamond red- landraces from Benoni; square green- landraces from Bushbuckridge; triangle navy blue- landraces from Durban; circle yellow- landraces from Eshowe; diamond purple- landraces from Empangeni; square light blue- landraces from KwaNdebele; triangle maroon- landraces from Mtubatuba; circle dark green- landraces from Nelspruit; diamond navy blue- landraces from Polokwane; square yellowish-green- landraces from Port Shepstone; triangle dark purple and circle bluish-green represent the out-groups P. coccineus and P. lunatus, respectively

In the second quadrant, landraces were scattered. Landraces D-90LB10B-Cu from Durban, KN-100 W-Cl from KwaNdebele, and Br-100LB-Cl from Bushbuckridge were grouped closely. The Empangeni landrace Em-100LB-Cl and D-100C-Cl from Durban were closely associated. The out-groups P. coccineus and P. lunatus, and Em-50Bk50C-Cu from Empangeni were closely clustered, whereas PS-90LB10B-Cl from Port Shepstone was further apart from all the other landraces in the quadrant. In the third quadrant, PS-50DB50LB-Cl from Port Shepstone was located in the upper portion. KN-50B50M-Cl from KwaNdebele was associated with D-50M50LB-Cl and D-50P50LB-Cl from Durban, E-50M50C-K and E-90M10C-Cl from Eshowe, Em-50M50LB-Cl from Empangeni, and PS-90DB10LB-Cl from Port Shepstone.

The landraces were scattered in the fourth quadrant, where D-90M10LB-Cl from Durban was located in the upper portion. D-50C50Gy-K from Durban and P-50M50C-O from Polokwane were closely associated. B-50B50M-Cl from Benoni and PS-90M10LB-Cl from Port Shepstone were also associated. However, PS-50M50LB-Cl from Port Shepstone, E-100Bk-Cl from Eshowe, and M-90LB10M-Cl from Mtubatuba were further apart.

The phylogenetic relationship between Phaseolus vulgaris landraces

The phylogenetic relationship was further illustrated by a dendrogram constructed with the unweighted pair group method with arithmetic mean (UPGMA) based on Nei’s genetic distance (Fig. 6). The dendrogram divided the landraces into seven clusters. The out-groups P. coccineus and P. lunatus each formed a singleton, in Clusters I and II, respectively. Cluster III contained landraces KN-100 W-Cl, Br-100LB-Cl, D-100C-Cl, and D-90LB10B-Cu. Cluster IV contained D-50C50Gy-K and PS-50M50LB-Cl, while D-90M10LB-Cl formed its own cluster (Cluster V). Cluster VI consisted of PS-90M10LB-Cl, B-50B50M-Cl, M-90LB10M-Cl, and P-50M50C-O. Cluster VII contained all the remaining South African landraces.

Fig. 6 Unweighted Pair Group Method with Arithmetic mean (UPGMA) dendrogram based on Nei’s genetic distance of Phaseolus vulgaris landraces using SSR markers

Discussion

Variation in morpho-agronomic traits

The number of groups in the dendrogram (four) (Fig. 1) and the biplot (three) (Fig. 2) differed although both evaluated morpho-agronomic variation, because the dendrogram was based on Euclidean distances whereas the biplot was based on the principal components. However, in both the dendrogram (Cluster IV) and the biplot (Group I), the out-groups Phaseolus coccineus and Phaseolus lunatus formed their own group. The association of these out-groups was possibly due to their indeterminate climbing growth habit associated with taller plants, numerous branches, and longer, wider, thicker, and heavier seeds, and their delay in the days to flowering (Tables 2 and 3).

The grouping of D-50M50LB-Cl, P-50M50C-O, and KN-50B50M-Cl in the biplot might be due to their similar seed coat colours (which differed in colour intensity) as well as their longer, wider, and thicker seeds and leaves with high chlorophyll content. This is also true for N-100DP-K, N-100LP-K, and D-50P50LB-Cl (Fig. 2). Similarly, P. vulgaris landraces from Bulgaria and Portugal with similar seed coat colour but different shape, colour intensity, and area of origin clustered together (Stoilova et al. 2013). In both the dendrogram (Cluster III) and biplot (Group III), landraces D-100C-Cl, D-90LB10B-Cu, D-50RB50LB-Cl, Em-50Bk50C-Cu, Em-100LB-Cl, KN-100 W-Cl, and PS-100YG-Cl were perhaps grouped according to their similarity in area of origin and morpho-agronomic traits (Figs. 1 and 2). The same clustering might also be due to their numerous and longer pods, which yielded many small, light seeds (Table 3). Correspondingly, landraces from Ethiopia with numerous pods that contain small and numerous seeds clustered together (Bareke 2019).

The association of landraces in Sub-cluster IB of the dendrogram (Fig. 1) could have resulted because most of these landraces were taller, had more numerous and broader leaves, and had longer, wider, thicker, and heavier seeds than the others (Tables 2 and 3). Plant height and seed traits are considered highly heritable (Musango et al. 2016), thus these landraces might be valuable in plant breeding programs. In the current study, landraces with the same seed coat colour but different origins clustered together as follows: B-50B50M-Cl and KN-50B50M-Cl in Cluster II; D-50M50LB-Cl, Em-50M50LB-Cl, and PS-50M50LB-Cl in Sub-cluster IB of the dendrogram (Fig. 1); and D-100YG-Cl and PS-100YG-Cl in Group III of the biplot (Fig. 2). Comparable studies on P. vulgaris landraces from Poland, Bulgaria and Portugal showed landraces with the same seed coat colour but from different environments being clustered together (Stoilova et al. 2013; Boros et al. 2014).

Genetic diversity among the landraces

A total of 51 alleles, with an average of 3.64 alleles per locus and a range of one to six, were detected by seven Simple Sequence Repeat (SSR) markers among the South African P. vulgaris landraces (Table 4). The 13 SSR markers used among P. vulgaris landraces in Turkey had a higher average (14.8) and range (6–29) of alleles than those of the current study (Bilir et al. 2019). The differences in allele numbers between the two countries could be due to diversity in the structure, motif, length, and genomic content of the SSR loci (Blair et al. 2006). The production of numerous (six) alleles per locus by forward and reverse markers of PV-ag001 and PV-ggc001, and also forward and reverse markers of PV-ctt001 in this study, probably means that these SSR markers detected a high degree of polymorphism (Burle et al. 2010). The major allele frequency, which ranged from 0.48 to 1.00 with an average of 0.75 in the current study (Table 4), was higher than the range (0.17–0.81) and average (0.46) reported for the major alleles of P. vulgaris landraces in Southern Italy (Scarano et al. 2014).

Genetic diversity, which ranged from 0.00 to 0.65 among the P. vulgaris landraces in South Africa (Table 4), was within the range of 0.00 to 0.96 found among P. vulgaris landraces from Brazil (Burle et al. 2010). The observed heterozygosity, which ranged from 0.00 to 0.05 over seven SSR loci in the current study (Table 4), was within the range of 0.00 to 0.099 identified in 58 SSR loci among landraces in Italy (Gioia et al. 2019). The differences between the two studies were probably caused by the unequal numbers (7 and 58) of SSR loci analysed. The low heterozygosity values in the current study probably arose because P. vulgaris is a naturally self-pollinating plant, so most loci were probably homozygous (Nkhata et al. 2020). The polymorphic information content (PIC) values show how informative specific markers are in diversity research (Nkhata et al. 2020). The PIC, which ranged from 0.00 to 0.58 among the P. vulgaris landraces in the current study (Table 4), was within the range of 0.00 to 0.96 recorded among the Brazilian landraces (Burle et al. 2010). This variation in PIC between South Africa and Brazil could have resulted from high mutation rates, which lead to variability at SSR loci (Blair et al. 2006).

The reverse marker PV-ctt001 had the highest genetic diversity (GD = 0.65) and PIC (0.58), followed by the forward markers PV-ctt001 (GD = 0.65 and PIC = 0.58), PV-ag001 (GD = 0.58 and PIC = 0.53), and PV-at003 (GD = 0.54 and PIC = 0.44). This probably means that these SSR markers are highly polymorphic among P. vulgaris landraces in South Africa and could be ideal for genetic mapping and for characterizing genetic diversity in future seed breeding and conservation (Burle et al. 2010). The existence of variability among the 36 P. vulgaris landraces and the two out-groups (P. coccineus and P. lunatus) was revealed by a genetic distance that ranged from 0.00 to 0.79 (Table 5). Landraces with similar seed coat colour but different colour intensity and areas of origin (E-50M50C-K, E-50M50LB-Cl, E-90M10C-Cl, Em-50M50LB-Cl, and KN-50B50M-Cl) were the closest in genetic distance and therefore had the highest degree of similarity. The high similarity could be due to the similar mature seed coat colours, which are probably controlled by the same gene for seed colour (Bassett 2003).

The highest degree of similarity among landraces E-90LB10M-Cu, E-100Bk-Cl, and Em-100YG-Cl could have resulted from a similar area of origin, as Eshowe and Empangeni are geographically close to each other and are both located on the north coast of KwaZulu-Natal province. These results were similar to those for the Turkish genotypes Mus and Bitlis, which were geographically close to one another and showed a close genetic distance (Bilir et al. 2019). The farthest genetic distance and lowest degree of similarity among landraces D-100C-Cl, E-50LR50C-K, and PS-90DB10LB-Cl from Durban, Eshowe, and Port Shepstone, respectively, were probably due to the low rates of gene flow detected by the SSR markers among these KwaZulu-Natal landraces (Musango et al. 2016). The low similarity between KN-100W-Cl and PS-90DB10LB-Cl, reflected in their large genetic distance, could have resulted from major differences in the area of origin: Port Shepstone is in the moist, coastal area of KwaZulu-Natal province, whereas KwaNdebele is in the dry, inland region of Mpumalanga province.
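To make the notion of pairwise genetic distance concrete, the following Python sketch computes one member of the family of Nei distances. The study refers to Nei's genetic distance without specifying the variant, so the use of the D_A form of Nei et al. (1983) here is an assumption, as are the function name `nei_da_distance` and the allele frequencies.

```python
from math import sqrt


def nei_da_distance(freqs_x, freqs_y):
    """Nei et al. (1983) D_A distance between two samples (assumed variant).

    Each argument is a list with one dict per locus mapping allele -> frequency.
    D_A = 1 - (1 / r) * sum over loci and alleles of sqrt(x_ij * y_ij),
    where r is the number of loci.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("both samples need the same, non-zero number of loci")
    shared = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        for allele, freq_x in locus_x.items():
            # Alleles absent from the second sample contribute zero.
            shared += sqrt(freq_x * locus_y.get(allele, 0.0))
    return 1.0 - shared / len(freqs_x)


# Hypothetical homozygous landraces scored at four loci; they differ at one.
landrace_x = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}, {"d": 1.0}]
landrace_y = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}, {"e": 1.0}]
distance = nei_da_distance(landrace_x, landrace_y)  # 0.25
```

Under this assumed form, a distance of 0.00 means that two landraces share the same alleles at every scored locus, and larger values indicate a larger share of loci with different alleles.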

Genetic relationships among the landraces

The population structure (Fig. 4), principal coordinate analysis (PCoA; Fig. 5), and dendrogram (Fig. 6) grouped some landraces differently because the analyses rest on different information: allele frequencies at each locus, genotypic distance, and the unweighted pair group method with arithmetic mean (UPGMA) applied to Nei’s genetic distance, respectively. The population structure for K = 2 (Fig. 4) and the highest delta K value, which occurred at K = 2 (Fig. 3), indicated that the landraces (the P. vulgaris landraces and the two out-groups, P. coccineus and P. lunatus) could be divided into two subpopulations, with admixed landraces between the subpopulations. The results were similar to the population structure of P. vulgaris germplasm in Malawi, where delta K was the highest at K = 2 (Nkhata et al. 2020). At K = 3, where the population was modelled to evaluate more genetic variation within the subpopulations and the admixtures, the 38 landraces were grouped into two subpopulations based on the Bayesian genotype clustering approach. This might have resulted from the domestication of P. vulgaris from two gene pools, namely, Mesoamerican and Andean (Musango et al. 2016). The population structure showed an overlap among landraces, as several landraces from the Mesoamerican gene pool were identified as carrying some seed traits or genes from the Andean gene pool (Fig. 4). This may have occurred as a result of the use of Andean landraces as dominant donor parents in certain breeding programs, resulting in certain genes being shared between the two gene pools (Almeida et al. 2020).
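The choice of K from the delta K statistic can be sketched as follows. This is a minimal Python sketch that assumes the delta K approach of Evanno et al. (2005) in the simplified form computed from the mean log-likelihood L(K) across replicate runs; the function name `delta_k` and all log-likelihood values are hypothetical and are not the values behind Fig. 3.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return {K: delta K} from replicate log-likelihoods L(K).

    `log_likelihoods` maps each K to a list of L(K) values, one per run.
    delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)),
    so it is only defined for K values that have both neighbours.
    """
    averages = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in averages or k + 1 not in averages:
            continue
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue  # delta K is undefined when replicate runs are identical
        second_difference = abs(
            averages[k + 1] - 2 * averages[k] + averages[k - 1]
        )
        result[k] = second_difference / spread
    return result


# Hypothetical L(K) values from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -802.0, -798.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -790.0, -780.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)  # K = 2 for these hypothetical values
```

In this hypothetical case the log-likelihood rises sharply from K = 1 to K = 2 and then levels off, so delta K peaks at K = 2, which is the pattern that supports a division into two subpopulations.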

Landraces D-50RB50LB-Cl, N-100DP-K, and PS-90LB10M-Cl, as well as the admixtures (D-100By-Cl, D-100YG-Cl, E-50YG-Cl, N-100LP-K, and PS-100YG-Cl), in the upper portion of the first quadrant of the PCoA (Fig. 5) had the closest distance and were clustered in Cluster VII of the dendrogram (Fig. 6), where the admixtures were closely associated. This highest degree of similarity was probably due to their similar vegetative and reproductive traits, such as taller plants, thicker stems, numerous leaves, fewer days to flowering, longer and wider pods, and numerous seeds per plant (Tables 2 and 3). The results were comparable to those of a study conducted in Zimbabwe, where P. vulgaris landraces from different gene pools were clustered together due to similar morphological and agronomic traits (Musango et al. 2016).
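The way a UPGMA dendrogram joins the closest landraces first can be illustrated with a small pure-Python sketch. The function name `upgma`, the labels, and the distance matrix are hypothetical; the sketch records the average distance at which clusters are joined and does not reproduce the software or settings used to build Fig. 6.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster by UPGMA; return a list of (members_a, members_b, distance).

    `matrix` is a symmetric list of lists of pairwise distances that follows
    the order of `labels`. Each merge joins the two closest clusters, and the
    distance to a merged cluster is the size-weighted average of the
    distances to its two parts.
    """
    clusters = {i: (label,) for i, label in enumerate(labels)}
    # Keys are (smaller_id, larger_id) so each pair is stored once.
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            updated[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        merges.append((clusters[a], clusters[b], d))
        # Drop every distance that involves the two merged clusters.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        merged = clusters.pop(a) + clusters.pop(b)
        for c, value in updated.items():
            dist[(c, next_id)] = value
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical distances among four landraces labelled L1 to L4.
names = ["L1", "L2", "L3", "L4"]
distances = [
    [0.0, 0.1, 0.5, 0.7],
    [0.1, 0.0, 0.5, 0.7],
    [0.5, 0.5, 0.0, 0.3],
    [0.7, 0.7, 0.3, 0.0],
]
tree = upgma(names, distances)
```

With these hypothetical distances, L1 and L2 are joined first, then L3 and L4, and the two pairs are joined last, which mirrors how the closest landraces end up in the same cluster of a dendrogram.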

The lower portion of the first quadrant of the PCoA, which also fell in Cluster VII of the dendrogram, was composed of E-50LR50C-K, D-90C10LR-Cl, and the admixtures Em-100YG-Cl, E-90LB10M-Cu, and E-100YG-Cl. This clustering possibly resulted from high rates of gene flow among the populations, which might reflect their similar geographical origin, as Durban, Eshowe, and Empangeni are all coastal areas of KwaZulu-Natal province. According to the clustering analysis of P. vulgaris from Turkey, the populations that demonstrated high similarity and high gene flow were geographically close (Bilir et al. 2019). These landraces showed large variation in seed coat but had a high degree of similarity, which probably emerged from gene introgression due to random bee pollination in fields in the same geographical areas (Musango et al. 2016) or through natural cross-pollination (Nkhata et al. 2020).

The wide separation of the landraces in Clusters I and II from the majority in Cluster VII of the dendrogram (Fig. 6) indicated the highest degree of dissimilarity, which probably resulted from differences in seed coat colour, seed shape (cuboidal to cylindrical), and possibly the different gene pools. These variations can be attributed to large genetic differences between the two groups as a result of parental race differences: Andean origin and Mesoamerican origin based on seed weight (Gioia et al. 2019). This was also true for the scattering of P. vulgaris landraces in the fourth quadrant of the PCoA (Fig. 5).

Landraces with different seed coat colour and shape (Br-100LB-Cl, KN-100W-Cl, and D-90LB10B-Cu) and from different origins (Bushbuckridge, KwaNdebele, and Durban, respectively) were genetically close based on their close distance in the PCoA (Fig. 5) and their position in Cluster III of the dendrogram (Fig. 6). This might have resulted from a shared gene pool (Mesoamerican), as suggested by their medium-sized seeds (100-seed mass), and was probably influenced by their similar morpho-agronomic traits (Table 3). These results were similar to those of the study of P. vulgaris landraces from Zimbabwe (Musango et al. 2016). The grouping of Em-100LB-Cl and D-100C-Cl in the PCoA (Fig. 5) was probably due to their similar seed coat colour (but different intensity), seed shape (cylindrical), and similar geographical location (coastal areas of KwaZulu-Natal). The close similarity between these landraces was also supported by Cluster II of the dendrogram (Fig. 6). Em-100LB-Cl and D-100C-Cl probably shared similar seed coat colour genes (gene c/c) responsible for the lighter or paler brown colour of the seed coats (McClean et al. 2002).

The close genetic relationship of landraces from different geographical areas (PS-50DB50LB-Cl, PS-90DB10B-Cl, E-50M50C-K, E-90M10C-Cl, E-50M50LB-Cl, D-50P50LB-Cl, Em-50M50LB-Cl, and KN-50B50M-Cl) in the third quadrant of the PCoA (Fig. 5) and Cluster VII of the dendrogram (Fig. 6) probably resulted from the exchange or introduction of planting materials (seeds) between farmers in different provinces (Nkhata et al. 2020). The shared ancestry of these landraces was probably due to inter-gene pool crossing in breeding or natural hybridization (Scarano et al. 2014). These results might indicate that landraces such as E-50M50LB-Cl, Em-50M50LB-Cl, KN-50B50M-Cl, E-50M50C-K, and E-90M10C-Cl were sown from the same parental seed or from parents with similar seed coat colour.

The out-groups P. coccineus and P. lunatus were characterized as the most dissimilar landraces, followed by E-100Bk-Cl and Em-50Bk50C-Cu, in the PCoA (Fig. 5). The results were also supported by the dendrogram, as the out-groups formed their own clusters, Cluster I for P. coccineus and Cluster II for P. lunatus, while E-100Bk-Cl and Em-50Bk50C-Cu were grouped in Cluster VII (Fig. 6). The out-groups might be dissimilar to the rest of the landraces because of their taller climbing plants, thicker stems, numerous leaves and branches, and longer, thicker, and wider seeds with heavier mass (Tables 2 and 3). Landraces E-100Bk-Cl and Em-50Bk50C-Cu were grouped together in the dendrogram but had the farthest distance from each other in the PCoA. The grouping was possibly due to their similar morpho-agronomic traits (Tables 2 and 3), and they may also share the same seed coat genotype ([Cr] Z J G B V Rk) that expresses the black seed coat (Bassett 2003). The farthest distance, which shows the high degree of dissimilarity between the two landraces, probably resulted from a difference in gene pools: E-100Bk-Cl might belong to the Mesoamerican gene pool based on its medium-sized seeds (100-seed mass), whereas Em-50Bk50C-Cu might belong to the Andean gene pool because of its large seeds (Table 3; Gioia et al. 2019).

Conclusion

In conclusion, the selection of vigorously growing and high-yielding landraces for future large-scale farming and breeding is enhanced by grouping landraces in a biplot and dendrogram based on similarities in their seed coat colour, morpho-agronomic attributes, and area of origin. Landraces B-50B50M-Cl, D-90M10LB-Cl, D-90LB10C-Cl, D-100YG-Cl, N-100DP-K, N-100LP-K, and PS-90DB10LB-Cl are potential candidates for selection because of their vigorously growing shoots and leaves with high chlorophyll content, numerous branches and leaves, and longer and wider pods with numerous, longer, thicker, and heavier seeds; they can also adapt to a new environment and mature faster than the other P. vulgaris landraces in the current study. The majority of the simple sequence repeat markers revealed lower genetic diversity than that reported in other studies, implying a limited number of rare variants among the P. vulgaris landraces of various origins. The results also showed that the reverse and forward markers PV-ctt001, as well as the forward markers PV-ag001 and PV-at003, had higher genetic diversity in P. vulgaris, making them excellent for future breeding and conservation. The population structure of the current study showed an overlap among landraces, as several landraces from the Mesoamerican gene pool were identified as carrying some seed traits or genes from the Andean gene pool (many landraces were represented as admixtures). This was also supported by the PCoA and the dendrogram. For the South African landraces, it can be concluded that the morpho-agronomic traits do not fully reflect the underlying genetic relationships. The P. vulgaris landraces could further be tested in various locations to assess their morpho-agronomic traits and their adaptation to biotic constraints, and could also be evaluated by mitochondrial DNA analysis to screen for ancestral origin.