Background

Common bean (Phaseolus vulgaris L.; 2n = 2x = 22) is a herbaceous annual plant grown worldwide for its edible seeds and pods, consumed both as dry beans and as green beans. As an important component of sustainable agriculture, it is regarded as the ‘Grain of Hope’ [39, 56]. Genetic analyses indicate that common bean originated in Mesoamerica (Mexico, Central America, and Colombia) and presumably the Andes (southern Peru, Bolivia, and Argentina) [20]. Six races have been identified within these two gene pools: three Andean races (Chile, Peru and Nueva Granada) and three Mesoamerican races (Durango, Mesoamerica and Jalisco), all of which have been distinguished biochemically and morphologically [52, 53]. In addition to these six races, a Guatemala race, which includes climbing beans, has been identified in Central America [3]. All these races have been analysed with various molecular markers, including isozymes, RAPDs and SSRs, to gain insight into their genetic diversity. Simple sequence repeats (SSRs), which are assayed by PCR, are short tandemly repeated DNA motifs (2–6 nucleotides long) found many times in the genome. Among the available markers, SSRs have been used repeatedly for genetic diversity and population structure studies in several crop plants and their wild relatives, e.g., rice [50], Chinese bread wheat [38], common bean [11, 33, 34, 58], wild rhubarb [14] and kewda [40]. Owing to their high mutation rate and variability [38, 47], SSRs are suitable for genetic mapping and for evaluating diversity in common bean [31, 35, 37].

SSR markers are also highly polymorphic and multi-allelic, can amplify up to 25 alleles per locus, and are abundant and widely distributed in both coding and non-coding regions of higher eukaryotic genomes. Gaitán-Solís et al. [16] were the first to develop SSR markers for common bean to evaluate the genetic diversity of cultivated and wild species. SSR-based genetic maps have been constructed in common bean to evaluate intraspecific genetic diversity [5, 16, 57]. SSRs are divided into genic and genomic SSRs. Genic SSRs (microsatellites) are found within, or closely associated with, gene sequences, whereas genomic SSRs are associated with regions of the genome that do not contain genes [5, 54]. Genic SSRs are well conserved, comprise diverse types of repeat sequences, and have been used frequently for diversity analysis [5].

Phaseolin-based characterisation of common bean germplasm has been used to determine which gene pools are represented in collected material. Phaseolin is the major protein in common bean and is encoded by a small family of 6–10 genes that are tightly linked on linkage group D7 [10, 21, 44]. Studies of variation at the phaseolin locus by polyacrylamide gel electrophoresis revealed multiple domestication events and a reduction in the genetic diversity of common bean [17, 19, 30]. Two types of phaseolin protein, S and T, have been identified [10, 19]. DNA sequence analysis showed high sequence similarity between the S and T phaseolin types [2, 29]. The T phaseolin gene family comprises α and β sub-families. Members of the α sub-family carry tandem repeats of 15 bp in the fourth exon and 27 bp in the sixth exon, whereas these repeats are absent from the β sub-family of T phaseolin [29]. Cultivated material is well characterized both morphologically and at the molecular level, whereas wild material needs further investigation through assessment of morphological characters, phaseolin type and molecular markers [5]. The common bean genome became available in 2014, making it possible to compare and understand the similarities and differences that make these species unique and important for human nutrition around the globe [25]. In-depth knowledge of the diversity of a crop is a prerequisite for setting up an effective breeding program. This goal can be achieved by making use of diverse landraces, which are repositories of novel genes for quality traits. Evaluating the population structure and genetic diversity of a crop also provides the basis for association mapping and for mining alleles of novel genes [54].

In the present study, we used SSR markers to elucidate the genetic diversity and population structure of Phaseolus vulgaris genotypes collected from the foothills of the Himalayan region of Jammu and Kashmir and Ladakh, in order to understand the variability among them. In addition, phaseolin-based diversity was studied to discriminate the collected material according to gene pool.

Materials and methods

Collection of plant material

The plant material in the present study comprised seeds of 102 genotypes collected from the North-West Himalayan regions of Jammu and Kashmir and Ladakh, India (Supplementary Table 1). The samples were collected from local farmers, grown in field trials in the experimental fields at SKUAST-K, Shalimar campus (34.1485° N, 74.8696° E) to purify the material, and then subjected to analysis.

Genomic DNA isolation, SSR genotyping and scoring of bands

Fresh leaves were harvested from 2-week-old seedlings grown in cups for genomic DNA isolation. Genomic DNA was isolated following the method of Doyle and Doyle [12]. The isolated DNA was purified, dissolved in 1X TE buffer and stored at 4 °C. DNA was quantified by running samples alongside known standards on a 0.8% agarose gel stained with ethidium bromide, and the results were visualized in a gel documentation system (Syngene, Genius).

For molecular characterization, a set of 11 SSRs (detailed in Supplementary Table 2) was used, with one SSR selected from each linkage group. PCR amplification was carried out in a 5 μl reaction mixture containing 50 ng of DNA template, 2.5 μl of 2X KAPA Taq ReadyMix (Cat. No. KK1024; Kapa Biosystems, Sigma-aldrich.com), 0.37 μl of 0.74 pM of each primer (forward and reverse) and 1.06 μl of sterile water. Amplification was performed in a gradient thermal cycler (Applied Biosystems, Thermo Scientific). The program consisted of an initial denaturation step of 4 min, followed by 35 cycles of denaturation (94 °C for 30 s), annealing (47–55 °C for 30 s) and extension (72 °C for 30 s), with a final extension at 72 °C for 7 min. The amplified products were resolved on a 3% agarose gel run at 125 V for 1 h; poorly resolved SSRs were further resolved by silver-stained polyacrylamide gel electrophoresis (PAGE). The gels were visualized in a gel documentation system with the built-in ALPHA SA software for scoring of bands. Bands within the expected size range were scored as alleles according to their fragment size (bp) relative to a 100 bp molecular weight marker (Invitrogen; Cat. No. 10488-053).
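
As a quick consistency check on the reaction setup and cycling program described above, the sketch below totals the listed reagent volumes and the programmed hold times. It is illustrative arithmetic only: the variable names are invented for this example, the template volume is not stated in the text and is inferred here as the remainder of the 5 μl reaction, and ramp times between temperatures are ignored.

```python
# Illustrative arithmetic only; names are hypothetical, values come from the text.
REACTION_UL = 5.0   # total reaction volume
READYMIX_UL = 2.5   # 2X KAPA Taq ReadyMix
PRIMER_UL = 0.37    # each primer (forward and reverse)
WATER_UL = 1.06     # sterile water

listed_ul = READYMIX_UL + 2 * PRIMER_UL + WATER_UL
# Assumption: the volume not accounted for is taken up by the DNA template.
template_ul = REACTION_UL - listed_ul

INITIAL_DENATURATION_S = 4 * 60
CYCLES = 35
STEP_S = 30         # denaturation, annealing and extension are 30 s each
FINAL_EXTENSION_S = 7 * 60

# Hold times only; ramping between temperatures is not included.
program_s = INITIAL_DENATURATION_S + CYCLES * 3 * STEP_S + FINAL_EXTENSION_S

print(f"listed reagents: {listed_ul:.2f} ul, remainder for template: {template_ul:.2f} ul")
print(f"programmed hold time: {program_s / 60:.1f} min")
```

Under these assumptions the listed reagents sum to 4.30 μl, which would leave 0.70 μl for the template, and the programmed holds add up to 63.5 min per run.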

Data analysis based on SSR markers

Eleven SSRs were used to estimate various diversity parameters among the 102 common bean genotypes with PowerMarker [32] and GenAlEx 6.51 [45]. An unweighted neighbor-joining tree was constructed in DARwin [46] from a dissimilarity matrix based on the shared allele index. Nei’s coefficient, with bootstrap resampling across markers and individuals from allele frequencies, was used to calculate the genetic distance between accessions [41]. Principal coordinate analysis (PCoA) was also performed in DARwin [46]. Population structure was inferred with STRUCTURE version 2.3.4 [48], using a burn-in of 50,000 iterations followed by 50,000 MCMC iterations, with 10 replicate runs for each K (1–10). The STRUCTURE output was further analyzed in STRUCTURE HARVESTER to identify the best value of K [13, 15]. In addition, STRUCTURE was used to estimate the level of genetic differentiation, measured as Wright’s fixation index (FST) [55]. Analysis of molecular variance (AMOVA) was performed in GenAlEx 6.51 [45].
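
The ΔK criterion used to choose K can be sketched as follows. This is a minimal illustration of the published formula, ΔK = mean(|L(K+1) − 2L(K) + L(K−1)|) / sd(L(K)), and not the STRUCTURE HARVESTER code itself; the function name, the pairing of replicate runs by position, and the log-likelihood values are all assumptions made for the example, and other implementations may compute the second difference from per-K means instead.

```python
from statistics import mean, stdev

def delta_k(ln_prob):
    """Return {K: deltaK} from {K: [lnP(D) of each replicate run]}.

    deltaK(K) = mean(|L(K+1) - 2 L(K) + L(K-1)|) / sd(L(K)),
    defined only for K values that have both neighbours.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue
        # Replicates are paired by position here, which is an arbitrary choice.
        second_diffs = [
            abs(up - 2 * mid + down)
            for down, mid, up in zip(ln_prob[k - 1], ln_prob[k], ln_prob[k + 1])
        ]
        sd = stdev(ln_prob[k])
        if sd > 0:  # deltaK is undefined when all replicates are identical
            result[k] = mean(second_diffs) / sd
    return result

# Made-up log-likelihoods (3 replicates per K), NOT the values from this study.
example = {
    1: [-5000, -5002, -4998],
    2: [-4700, -4703, -4697],
    3: [-4400, -4402, -4398],
    4: [-4390, -4380, -4400],
    5: [-4385, -4370, -4400],
}
scores = delta_k(example)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 1))
```

In this toy data set the likelihood plateaus after K = 3, so ΔK peaks there. Because the statistic needs both neighbours, it cannot be evaluated at the smallest or largest K tested.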

Phaseolin marker analysis

To classify the common bean genotypes by gene pool (Mesoamerican or Andean), the phaseolin locus was amplified in 81 genotypes. The phaseolin primers, forward 5′-AGCATATTCTAGAGGCCTCC-3′ and reverse 5′-GCTCAGTTCCTCAATCTGTTC-3′, were taken from Kami et al. [29]. The amplification procedure was the same as that described above for SSR genotyping. The phaseolin marker was scored as per Kami et al. [29].

Results

SSRs based genotyping

All 102 genotypes were genotyped with the 11 SSRs. On average, 30 alleles per SSR were amplified among the 102 common bean genotypes, with a maximum of 47 alleles for BM184 and a minimum of 13 alleles for BM98 (Table 1). The major allele frequency varied from 0.076 (BM-210) to 0.469 (BMC-234), with a mean of 0.216 (Table 1). Gene diversity (expected heterozygosity) ranged from 0.751 (BMC-234) to 0.967 (BM-210), with an average of 0.904. The mean polymorphic information content (PIC) was 0.899, with a minimum of 0.738 (BMC-234) and a maximum of 0.952 (BM-210) (Table 1). Further, the 11 SSRs showed no heterozygosity in more than 50% of the population. The mean inbreeding coefficient was 0.985, ranging from 0.900 to 1.00 (Table 1). The average fixation index per SSR was 0.094, ranging from 0.037 to 0.256 (Table 1).

Table 1 Detail of diversity indices obtained after subjecting Phaseolus vulgaris to SSR marker analysis
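
For readers unfamiliar with the indices in Table 1, the sketch below shows how gene diversity and PIC are commonly computed from the allele frequencies at one locus. It uses the textbook formulas (gene diversity = 1 − Σp², PIC = 1 − Σp² − ΣΣ 2p_i²p_j²) without any sample-size correction; whether the software used in this study applies such corrections is not stated here, and the function names and allele frequencies are hypothetical.

```python
from itertools import combinations

def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term

# Hypothetical loci with equally frequent alleles (not data from this study).
four_alleles = [0.25] * 4
thirty_alleles = [1 / 30] * 30

for name, freqs in [("4 alleles", four_alleles), ("30 alleles", thirty_alleles)]:
    he, p = gene_diversity(freqs), pic(freqs)
    print(f"{name}: gene diversity = {he:.4f}, PIC = {p:.4f}, gap = {he - p:.4f}")
```

PIC is always slightly lower than gene diversity, and the gap shrinks as the number of alleles grows, which is one reason the two indices can be nearly equal for highly multi-allelic SSRs.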

Genetic diversity and population structure analysis

The dendrogram illustrates the relationships among the 102 common bean genotypes. The genotypes were classified into three major clusters based on genetic variation (Fig. 1a). Cluster I is divided into two sub-clusters and includes 34 genotypes, cluster II is divided into sub-clusters and includes 25 genotypes, and cluster III is divided into further sub-clusters and includes 43 genotypes, as detailed in Supplementary Table 1. The PCoA also divided the genotypes into three groups (Fig. 1b).

Fig. 1

a Dendrogram illustrating genetic relationship among 102 genotypes using 11 SSR primers, b principal coordinate analysis of 102 genotypes constructed by DARwin software

STRUCTURE analysis revealed three populations with slight admixture among genotypes, as shown in the population structure plot (Fig. 2b). The most probable number of sub-populations (K) was determined by choosing the K with the highest ΔK value among the numbers of clusters tested in STRUCTURE (Fig. 2a). Individuals were assigned to a sub-population on the basis of a membership probability ≥ 80%. Subpopulation I consisted of 30 genotypes (29.4%), subpopulation II of 49 (48.0%) and subpopulation III of 22 (21.6%), with only 1 genotype classified as admixed. Genetic differentiation between the three sub-populations ranged from 0.09 to 0.17, indicating that the three population groups were significantly different from each other (Table 2). Further, expected heterozygosity ranged from 0.84 to 0.88 and FST values from 0.08 to 0.14 for the three subpopulations (Table 3).
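
The assignment rule described above can be sketched in a few lines. The function name, genotype labels and membership coefficients below are hypothetical and serve only to illustrate the ≥ 80% threshold; the group counts are the ones reported in the text.

```python
def assign(q_row, threshold=0.80):
    """Return the 1-based index of the sub-population whose membership
    coefficient meets the threshold, or None for an admixed individual."""
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    return best + 1 if q_row[best] >= threshold else None

# Hypothetical membership coefficients for K = 3 (not data from this study).
q_matrix = {
    "G1": [0.95, 0.03, 0.02],
    "G2": [0.10, 0.85, 0.05],
    "G3": [0.45, 0.40, 0.15],
}
groups = {name: assign(q) for name, q in q_matrix.items()}
print(groups)

# Group sizes reported in the text, expressed as shares of all genotypes.
counts = {"I": 30, "II": 49, "III": 22, "admixed": 1}
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```

A stricter threshold would move more individuals into the admixed class, so the group sizes depend on this choice as well as on the STRUCTURE run itself.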

Fig. 2

a Graphical representation of the optimal number of groups in the program STRUCTURE, inferred using the criterion of Evanno et al. [15]. b STRUCTURE plot of membership coefficients for all the common bean accessions in the study sample, sorted in the same order and classified according to successive preset K values ranging from 1 to 10. The groups are identified for K = 3

Fig. 3

Results from the phaseolin PCR assay on an 8% polyacrylamide gel. M denotes the ladder in the first and last wells; genotypes 1–51 are in the second to second-last wells. T and S denote the T and S types of phaseolin; - denotes no amplification

Table 2 Pair wise population differentiation based on FST values between three common bean sub-populations identified by STRUCTURE software
Table 3 Heterozygosity and FST value calculated for three common bean sub populations by STRUCTURE software

Analysis of molecular variance

The three common bean populations identified by STRUCTURE analysis were also subjected to analysis of molecular variance (AMOVA) to estimate the percentage of variation among populations, among individuals and within individuals. Of the total genetic variance, 8% was attributed to differences among the STRUCTURE-based populations, 90% to differences among individuals and 2% to differences within individuals (Table 4).

Table 4 Analysis of molecular variance (AMOVA) by GenAlEx software
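
To show how an AMOVA partition relates to F-statistics, the sketch below applies the standard hierarchical relations to the rounded percentages quoted in the text (8%, 90% and 2%). It is an illustration under stated assumptions, not a re-analysis: the function name is hypothetical, the inputs are rounded shares in place of the actual variance components of Table 4, and the outputs need not match the per-locus or STRUCTURE-derived estimates reported elsewhere in the paper.

```python
def f_statistics(among_pops, among_indiv, within_indiv):
    """F-statistics from a three-level variance partition.

    FST = among-population share of the total variance
    FIS = among-individual share of the within-population variance
    FIT = share of the total variance not found within individuals
    """
    total = among_pops + among_indiv + within_indiv
    fst = among_pops / total
    fis = among_indiv / (among_indiv + within_indiv)
    fit = (among_pops + among_indiv) / total
    return fst, fis, fit

# Rounded percentages from the text, used in place of variance components.
fst, fis, fit = f_statistics(8.0, 90.0, 2.0)
print(f"FST = {fst:.3f}, FIS = {fis:.3f}, FIT = {fit:.3f}")

# The three indices satisfy (1 - FIT) = (1 - FIS) * (1 - FST).
assert abs((1 - fit) - (1 - fis) * (1 - fst)) < 1e-12
```

With these inputs the among-population share gives an FST of 0.08, while the very small within-individual share gives an FIS close to 1, the pattern expected for a largely self-pollinating crop.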

Phaseolin gene based classification of germplasm

Amplification of the phaseolin locus, scored as the presence of either the “S” or the “T” type phaseolin band, was observed in 81 common bean genotypes. These 81 genotypes were accordingly classified as being of Mesoamerican or Andean origin. Forty genotypes (39.22% of the 102 genotypes studied) with the “S” type band were classified as Mesoamerican, whereas 41 genotypes (40.20%) with the “T” type band were classified as Andean (Fig. 3).

Discussion

Common bean is widely cultivated in the Himalayan regions of Jammu and Kashmir. Knowledge of genetic diversity is crucial for the nutritional improvement of crop plants to combat food and nutritional insecurity. Well-designed breeding programs need germplasm with a high level of genetic diversity, so evaluating the diversity of genotypes is important for their use in breeding programs [39]. Two main strategies have been used to assess the genetic makeup of crop plants: morphological analysis and molecular marker analysis. Morphological markers are easily affected by environmental and other factors, which makes them an imprecise means of characterizing crop plants. Molecular markers such as SSRs, on the other hand, are accurate and reliable tools for assessing genetic variability and have been used frequently in common bean and other legumes [7, 18, 22, 24–27, 49].

In the current study, SSR markers were successfully used to assess the genetic diversity among 102 common bean genotypes collected from the Himalayan regions of Jammu and Kashmir and Ladakh. These SSRs were chosen from the many markers available in the public domain and were selected so that they cover all the linkage groups [5, 42]. We observed an average of 30 alleles per SSR, a result similar to that of Blair et al. [6]. The high number of alleles per SSR may reflect the use of genomic SSRs in this study, as genomic SSRs can resolve variation within gene pools [6]. SSR-based polymorphism information content (PIC) can be used to screen appropriate markers for constructing genetic maps, association mapping and phylogenetic analysis [1]. PIC values reflect the quality of a marker and its capability to detect genetic variability based on preliminary studies [9]. Markers that behave as biallelic, whether dominant (e.g., ISSR) or co-dominant (e.g., SSR), give very low PIC values [48]. In the present study, PIC values ranged from 0.738 to 0.952, with an average of 0.899. This high level of polymorphism is attributable to the large diversity among the genotypes and to the selection of highly polymorphic markers based on earlier studies. Metais et al. [36] found PIC values ranging from 0.05 to 0.83 in an SSR analysis of 20 common bean genotypes. In addition, Gomez et al. [23] analysed 60 common bean genotypes with SSR markers and found PIC values ranging from 0.03 to 0.70, suggesting that PIC values depend on the number and diversity of the genotypes analysed. Lower PIC values are obtained from closely related genotypes and higher values from genetically distant genotypes. Other frequently used parameters for assessing genetic variability are gene diversity and heterozygosity.

The values of PIC and gene diversity were nearly similar, which could be due to the large number of alleles per SSR. Variation in gene diversity and heterozygosity has been observed in earlier common bean diversity studies [23, 36, 42, 43, 51]. The use of germplasm from different geographical locations, different marker systems and different scoring patterns might lead to differences in these parameters. The structure of genetic variation within and among populations is commonly described by Wright’s F-statistics (FIS, FST) [28]. FST is directly related to the variance in allele frequency among populations and, conversely, to the degree of resemblance among individuals within populations. In our study, FST over all loci across the common bean germplasm was 0.094, indicating a low degree of genetic differentiation among subpopulations. The estimates of the within-subpopulation inbreeding coefficient (FIS) were considerably higher (Table 1). High FIS implies a considerable degree of inbreeding [28].

Grouping a population by geographical location is important for studying the evolution of a species. The unweighted neighbor-joining tree constructed in DARwin divided the collected germplasm into three clusters, which grouped the genotypes according to place of collection. The PCoA also divided the germplasm into three clusters. Population structure analysis is based on a Bayesian method, and the number of populations to which individuals are assigned was chosen on the basis of ΔK. In our study, population structure analysis divided the genotypes into three populations (K = 3). Thus, the germplasm was classified into three groups by cluster analysis, PCoA and structure analysis, with slight differences among the methods owing to their different algorithms. The use of different and larger sets of molecular markers and germplasm might result in variation in the inferred population structure of a species. Partial reproductive isolation and lower genetic drift might also have contributed to the variation in diversity and population structure. Earlier studies of population structure have divided common bean into 2–6 sub-populations [8, 32, 42, 43, 51, 58]. Microsatellites have previously been used to analyse diversity in common bean breeding lines from Canada [57], in wild accessions and related species [15], in snap beans [36] and in dry bean landraces from Europe and Nicaragua [23]. The AMOVA results were also in accordance with the other results.

Common bean was divided into two major ecogeographical gene pools, Mesoamerican and Andean, 7000–8000 years ago. However, in the course of evolution, common bean from Ecuador and northern Peru formed an intermediate (I) type gene pool between these two [29]. This intermediate gene pool was also confirmed by phaseolin marker analysis of common bean germplasm [29]. These gene pools are characterized by partial reproductive isolation, which is seen in both wild and domesticated common bean genotypes [20, 30]. The phaseolin marker can reveal the origin and domestication of common bean germplasm and helps in classifying germplasm by origin [19, 29]. The presence of the S type allele in a genotype indicates origin from the Mesoamerican gene pool, whereas the presence of the T type allele indicates origin from the Andean gene pool. In our study, we were able to assign the common bean germplasm collected from the Himalayan region to two gene pools, Mesoamerican and Andean. We did not find any genotype of the I type gene pool, possibly because the I type is a relict population that represents only a small fraction of the genetic diversity of the ancestral population [4].

Conclusion

The use of SSR markers to assess the genetic diversity and population structure of common bean from different north-western Himalayan regions has revealed significant levels of genetic variation, and this germplasm will serve as an important genetic resource. The study further identified the gene pool of 81 of the genotypes collected in the present investigation. The insights provided by this study will serve as a foothold for formulating strategies to conserve these landraces. Identifying the gene pool of each landrace will help breeders to understand their evolution. It will also help in designing crossing programmes between and among genotypes of the same or different gene pools to develop mapping populations for QTL/marker gene identification, which in turn can lead to the development of improved common bean varieties.