Introduction

Many indigenous food legumes remain underexploited. Mucuna pruriens, described as a self-pollinated species, is a tropical legume classified within the phaseoloid clade of Leguminosae, which also includes soybean, common bean, mung bean and their relatives. It is reported to be native to China and eastern India (Buckles 1995) and has a chromosome number of 2n = 2x = 22 (Sastrapradja et al. 1972; Bairiganjan and Patnaik 1989) and an estimated genome size of 1361 Mbp. The plant exhibits a climbing habit, trifoliolate leaves, hairy aerial parts and a long inflorescence of white or purple flowers. The pods contain 4–6 seeds, are mostly green or brown in colour and are thickly covered with soft or stiff orange bristles, of which the latter cause intense allergic irritation to human skin.

The proximate nutritional composition, total protein content and in vitro protein digestibility of M. pruriens seeds are comparable to those of other edible legumes (Bressani 2002). It is therefore used as a minor food crop by many Indian and African tribes, and its use as livestock feed, common in the early 1900s in the USA and elsewhere, still prevails (Jorge et al. 2007). From an agronomic standpoint, it is known to produce a seed yield of 2000 kg/ha (Buckles 1995), perform well under dry farming and low soil fertility conditions (Siddhuraju et al. 2000), show resistance to a wide range of diseases (Eilittä et al. 2002), exhibit allelopathic properties (Fujii et al. 1991) and lower nematode populations effectively (Carsky and Ndikawa 1998). Its positive impact as a green manure cover crop is documented in earlier studies (Tarawali et al. 1999; Jorge et al. 2007).

Seeds of M. pruriens contain high levels (1–6%) of L-Dopa (L-3,4-dihydroxyphenylalanine; Kavitha and Thangamani 2014), a precursor of dopamine used in the treatment of Parkinson’s disease (Haq 1983). However, the toxic properties of the drug and the resultant side effects, such as nausea, anorexia and vomiting in human beings and intestinal ailments in livestock, render the plant, particularly the seeds, inedible (Szabo and Tebbett 2002). Efforts are therefore underway to breed varieties with safe levels of L-Dopa that are fit for consumption. On the other hand, plants with elevated levels of L-Dopa may be desirable to the pharmaceutical industry as a high-value medicinal crop.

Genetic linkage maps have emerged as valuable resources that provide a framework for a number of plant breeding applications such as marker-assisted selection (MAS), map-based cloning, and physical and comparative mapping (Staub et al. 1996). They have proved useful for precisely detecting chromosomal locations and for studying the individual and interactive effects of genes underlying complex traits in several important legume species such as Lotus japonicus (Hayashi et al. 2001), Medicago truncatula (Thoquet et al. 2002), M. sativa (Julier et al. 2003), Phaseolus vulgaris (Yuste-Lisbona et al. 2012) and Pisum sativum (Sun et al. 2014). Recently, genetic maps have been successfully developed even for lesser-known legume species such as azuki bean (Han et al. 2005), bambara groundnut (Ahmad 2012), lima bean (Bonifácio et al. 2012) and yardlong bean (Kongjaimun et al. 2012).

However, in M. pruriens, apart from a few studies involving molecular markers (Capo-Chichi et al. 2001; Padmesh et al. 2006; Sathyanarayana et al. 2010, 2011, 2012; Leelambika et al. 2010) and a lone published linkage map based on amplified fragment length polymorphism (AFLP) markers (Capo-Chichi et al. 2004), little effort has been made towards the development and deployment of genomic resources to augment breeding programmes.

Therefore, in the present study, an F2 mapping population derived from a cross between two botanical varieties of M. pruriens and a set of AFLP markers have been used to construct the first intraspecific genetic linkage map. Localization of the genetic loci controlling three qualitative and four quantitative traits, six of which are economically important, is also attempted.

Materials and methods

Plant material

The parents of the mapping population were selected based on the information in the germplasm catalogue developed by the present authors. The female parent belonged to botanical variety utilis (velvet bean), a cultivated variety. The male parent was a wild genotype belonging to variety pruriens (itching bean). The crossing experiments were carried out during May–June 2009 following the regular procedures of hand emasculation, dusting of donor pollen and bagging for protected pollination. True F1 hybrids from this interbotanical varietal cross were screened based on the dominant flower colour of the male parent. Two hundred and seventy random seeds obtained after selfing a single F1 plant were sown in the field, and an F2 population of 200 surviving individuals was obtained. All F2 plants were grown in an open field in the Mucuna Field Gene Bank (figure 1) established at Sir M. Visvesvaraya Institute of Technology, Bengaluru (12.58 N, 77.38 E; sandy loamy soil; 867 mm rainfall per year; channel irrigation) during January–June 2010.

Figure 1 Depiction of (a) phenotypic characters of the flower, pod and seeds of the F1 hybrid and F2 segregating phenotypes, (b) 200 F2 segregating individuals raised in the field.

DNA isolation and AFLP marker analysis

Genomic DNA was extracted from the young leaf tissue (2–3 weeks old) of the F2 individuals using a modified CTAB protocol (Doyle and Doyle 1990). AFLP marker analysis was performed following the method described earlier by the authors (Sathyanarayana et al. 2011). The details of the adapters and oligonucleotides used as primers are provided in table 1.

Table 1 Sequence information on the AFLP adapters and primers used.

The AFLP fragments were scored as dominant markers, i.e. presence versus absence of bands, and each marker segregating in the F2 generation was therefore assigned to one of the parental alleles for map construction. Markers originating from each parent were scored at each locus according to the standard A, B and H coding system of Mapmaker (Lander et al. 1987). Missing data were recorded as ‘–’. The AFLP markers were designated by the primer pair used followed by a number reflecting the fragment position on the gel (e.g. AACCTA4 = EcoRI + AAC/MseI + CTA, band 4). The marker numbers are in ascending order of molecular weight. A total of 805 AFLP markers were generated using 12 primer combinations, of which 347 markers were polymorphic. Chi-square tests of goodness-of-fit were performed on the segregation data for all the markers.
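The goodness-of-fit test mentioned above can be sketched as follows. This is a minimal illustration, assuming a dominant marker is expected to segregate 3:1 (band present : band absent) in the F2; the function name and the band counts are hypothetical and are not taken from the study's data.

```python
import math

def chi_square_3_to_1(present, absent):
    """Chi-square goodness-of-fit statistic and p-value for a 3:1 ratio (1 df)."""
    total = present + absent
    expected_present = 0.75 * total
    expected_absent = 0.25 * total
    chi2 = ((present - expected_present) ** 2 / expected_present
            + (absent - expected_absent) ** 2 / expected_absent)
    # With 1 degree of freedom, the chi-square upper-tail probability
    # has the closed form erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical band counts for two markers scored on 200 F2 plants.
chi2_ok, p_ok = chi_square_3_to_1(present=155, absent=45)
chi2_bad, p_bad = chi_square_3_to_1(present=120, absent=80)

print(f"marker 1: chi2 = {chi2_ok:.2f}, p = {p_ok:.3f}")
print(f"marker 2: chi2 = {chi2_bad:.2f}, p = {p_bad:.2e}")
```

In this sketch the first marker is compatible with a 3:1 ratio, while the second would be flagged as distorted at any of the usual significance levels.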

Phenotyping of the traits

Fully matured pods were harvested from each F2 plant for collection of seeds. Phenotyping data on the length (mm), width (mm) and thickness (mm) of seeds and on 100-seed weight (g) were recorded as per the method described earlier. The data were recorded in five replicates on random seeds and then averaged. Qualitative traits including flower colour, fruit colour, presence of trichomes on the pod, trichome colour and pod itchiness were recorded visually. Flower colour was scored as presence or absence of purple pigment, pod itchiness was scored as presence or absence of itchiness, and trichome colour was scored by ranking the colours (ranks 0–3).

Linkage mapping

Each marker was tested for its segregation ratio. The changes in the map distances between adjacent markers caused by the removal of markers with skewed segregation were analysed. Highly distorted markers were then excluded, and the linkage map was constructed from 129 AFLP markers using Antmap ver. 1.2 (Iwata and Ninomiya 2004), followed by generation of a pictorial depiction of the linkage groups using MapChart 2.2 (Voorrips 2002).

Initial grouping of the markers into linkage groups was performed using the nearest-neighbouring locus method, with recombination as the grouping criterion and a grouping threshold of 0.3. Segregation distortion of markers was determined by chi-square goodness-of-fit analysis, while locus ordering was based on the log-likelihood (LL) criterion. Marker distances were calculated using the Kosambi mapping function (Kosambi 1944). A bootstrap test was also performed for the linkage groups with about 1000 iterations, and the ant colony optimization in Antmap was run with 5000 iterations for better linkage grouping.
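For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The short sketch below implements this formula and its inverse; the function names are illustrative and are not part of Antmap.

```python
import math

def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_r(d_cm):
    """Recombination fraction for a Kosambi map distance given in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

# Distance corresponding to the grouping threshold of 0.3 used in the text.
threshold_cm = kosambi_cm(0.3)
print(f"r = 0.30 corresponds to {threshold_cm:.2f} cM")

# Round trip: converting the distance back recovers the recombination fraction.
print(f"{threshold_cm:.2f} cM corresponds to r = {kosambi_r(threshold_cm):.2f}")
```

Under this function, the recombination threshold of 0.3 corresponds to a distance of roughly 35 cM between neighbouring loci.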

Mapping quantitative trait loci (QTL)

QTL analysis was performed using WinQTL Cartographer 2.5 (Wang et al. 2012) at three levels, namely single marker analysis (SMA), simple interval mapping (SIM) and composite interval mapping (CIM). In SMA, the test of association between trait values and the marker genotypes was carried out at significance levels of 5, 1, 0.1 and 0.01%, indicated by *, **, *** and ****, respectively. In SIM, the association between the trait values and the expected contribution of a hypothetical QTL was calculated at multiple analysis points between each pair of adjacent marker loci. A manual threshold value of LR 14 (LOD 3.0) was tested through 1000 permutations, scanning the LOD curves with a walk speed of 1 cM at the 0.05 significance level. As in SIM, CIM evaluated the possibility of a target QTL at multiple analysis points across each marker interval, but included the effect of one or more background markers at each point. A statistical model similar to that of SIM was used for CIM.

A LOD threshold of 3.0 and above was used to declare putative QTLs, since the empirically computed LOD thresholds for the traits were slightly lower than 3.0. The phenotypic variance explained by each QTL (R²) was calculated inclusive of the additive and dominance effects of the markers in the latter two methods of mapping.
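The correspondence between the likelihood ratio (LR) threshold of 14 and a LOD of about 3.0 follows from the standard relation LOD = LR / (2 ln 10). The sketch below illustrates the conversion; the helper names are hypothetical and are not functions of WinQTL Cartographer.

```python
import math

# LR is on the natural-log scale (2 * ln of the likelihood ratio),
# LOD is log10 of the likelihood ratio, hence the factor 2 * ln(10).
LR_PER_LOD = 2.0 * math.log(10.0)

def lr_to_lod(lr):
    """Convert a likelihood ratio test statistic to a LOD score."""
    return lr / LR_PER_LOD

def lod_to_lr(lod):
    """Convert a LOD score to a likelihood ratio test statistic."""
    return lod * LR_PER_LOD

print(f"LR 14 corresponds to LOD {lr_to_lod(14.0):.2f}")
print(f"LOD 3.0 corresponds to LR {lod_to_lr(3.0):.2f}")
```

The conversion shows that an LR of 14 is marginally above a LOD of 3.0, so the two thresholds quoted in the text are effectively equivalent.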

Results

Mapping population

The summary of phenotyping data for the parents, F1 hybrid and F2 population is given in table 2. Of the 270 selfed seeds produced from an F1 hybrid, 200 survived after planting in the field and were used as the F2 mapping population for the present work. The F1 hybrid as well as some of the F2 segregants behaved unexpectedly and produced variations that were not present in either parent (figure 1). For instance, the F1 hybrid flowered very late in comparison to the parents; likewise, the flowering and maturity dates of a few of the F2 individuals were as late as those of the F1 hybrid, and in a few cases even later (table 2). In addition, the F2 plants produced black-velvety pods and brown to light brown seeds in both mottled and nonmottled forms, all of which were absent in the parents.

Table 2 Comparison of morpho-agronomic characters between the parents, F1 hybrid and F2 population.

Inheritance of traits

The parental varieties differed with regard to seven qualitative traits, namely flower colour, fruit colour, pod itchiness, pod trichome colour, seed shape, seed colour and seed pattern. The F2 plants exhibited monogenic (3:1) segregation for flower colour (χ² = 0.41 at the 0.05 probability level), while pod itchiness showed polygenic inheritance (χ² = 5.78 at the 0.05 probability level). The pod trichome colour also did not fit Mendelian segregation ratios.

Correlation between the traits

The values for the mean, variance, standard deviation, coefficient of variation and range of the quantitative seed traits recorded on the F2 population are provided in table 3. Pearson correlation analysis revealed significant correlations among all the seed traits (table 4). The highest correlation coefficient was observed between seed length and seed width (0.620).
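As a brief illustration of the correlation analysis, Pearson's coefficient between two seed traits can be computed as below. The measurements are hypothetical values chosen only to make the sketch runnable; they are not the study's data, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical seed length and seed width (mm) for five F2 plants.
seed_length = [12.1, 13.4, 11.8, 14.0, 12.9]
seed_width = [8.0, 8.9, 8.1, 9.2, 8.4]

r_length_width = pearson_r(seed_length, seed_width)
print(f"r(length, width) = {r_length_width:.3f}")
```

Applying the same function to every pair of the four seed traits would yield a matrix of the kind summarized in table 4.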

Table 3 Statistical measures on four quantitative seed traits in F2 population.
Table 4 Matrix of Pearson’s correlation coefficient for the four quantitative seed traits.

Linkage mapping

Twelve AFLP primer combinations (table 5) used for genotyping 200 F2 individuals produced 805 scorable markers, of which 347 were polymorphic. After excluding all the distorted markers, the genetic linkage map included 129 AFLP markers distributed over 13 linkage groups spanning a total distance of 618.88 cM, with an average marker interval of 4.79 cM (table 6; figure 2). Based on the number of markers and the distance covered, the linkage groups can be classified into three main classes: the first with two linkage groups containing 14–30 markers and spanning the greatest distances, 120.5–189.96 cM; the second with 10 linkage groups containing 7–11 markers and covering 3.52–56.51 cM; and the third with one linkage group containing four markers and covering 7.01 cM. Segregation distortion at the 0.001 probability level was observed for 218 of the 347 markers. These markers led to small linkage groups with 1–2 markers and to large linkage groups with wide marker intervals, and were excluded from the analysis.
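The summary figures above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers reported in the text; it assumes that the average marker interval was obtained by dividing the total map length by the number of mapped markers, which is not stated explicitly.

```python
# Counts and distances reported in the text.
total_markers = 805
polymorphic_markers = 347
distorted_markers = 218
map_length_cm = 618.88

# Markers retained for mapping after removing the distorted ones.
mapped_markers = polymorphic_markers - distorted_markers

polymorphism_pct = 100.0 * polymorphic_markers / total_markers
distortion_pct = 100.0 * distorted_markers / polymorphic_markers

# Assumption: average interval = total map length / number of mapped markers.
mean_interval_cm = map_length_cm / mapped_markers

print(f"mapped markers: {mapped_markers}")
print(f"polymorphism:   {polymorphism_pct:.1f}%")
print(f"distortion:     {distortion_pct:.1f}%")
print(f"mean interval:  {mean_interval_cm:.2f} cM")
```

Under that assumption the mean interval comes out within 0.01 cM of the reported 4.79 cM, and the proportion of distorted markers matches the 62.8% quoted in the Discussion.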

Figure 2 Location of QTLs for three qualitative and three quantitative traits based on combined results of SIM and CIM. The scale on the left side is the genetic distance in centimorgan (cM); distances and marker names are given on either side of each linkage group. QTLs are shown on the right side as vertical bars, with trait names in different colours for different traits: SW (seed width), black; ST (seed thickness), dark green; HSW (100-seed weight), blue; FC (flower colour), pink; PI (pod itchiness), orange; TC (trichome colour), red. The maps were drawn with the MapChart 2.2 program. The vertical bar shows the LOD support interval and the line LOD interval of the QTL.

Table 5 Number of polymorphic markers generated between the parents using 12 AFLP primer combinations.
Table 6 Summary of information on genetic linkage map.

QTL mapping

Mapping of both qualitative and quantitative traits was carried out using three methods of increasing accuracy and reliability. First, single marker analysis was done for all seven traits. A statistical measure, pr(F), which supports the model of single marker associations, was calculated, and the extent of association of each marker with the trait was determined on this basis (table 7).

Second, simple interval mapping was done at a LOD of 3.0 with a walk speed of 1.0 cM across the linkage groups. Unlike SMA, SIM located hypothetical QTLs for the studied traits by testing the association between marker and trait at multiple points, with the additive and dominance effects of genes/markers considered when calculating the percentage of phenotypic variance explained, denoted R². The summary of SIM for all the traits across different groups of phenotypic variance is detailed in table 8. One QTL was identified for 100-seed weight, explaining 6.82% of the phenotypic variance, and five QTLs were identified for qualitative traits, with trichome colour showing the highest percentage of phenotypic variance explained (25.96%).

Third, composite interval mapping was also done at a LOD of 3.0 with a walk speed of 1.0 cM across the linkage groups. Like simple interval mapping, CIM located putative QTLs at multiple analysis points across each intermarker interval, but included the effect of one or more background markers. This helps in one of two ways, depending on whether the background marker and the target interval are linked. If they are not, inclusion of the background marker makes the analysis more sensitive to the presence of a QTL in the target interval. If they are linked, it may help to separate the target QTL from other linked QTLs on the far side of the background marker. As a result, this method identified QTLs more reliably and gave a clearer picture of the phenotypic variance explained.
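The idea behind single marker analysis can be illustrated with a one-way analysis of variance of trait values grouped by marker score. This is a simplified sketch and not the procedure implemented in WinQTL Cartographer: the function name and data are hypothetical, the marker is assumed to be dominant (band present = 1, absent = 0), and the R² here is the plain ANOVA proportion of variance rather than the interval mapping estimate.

```python
def single_marker_f(trait_values, marker_scores):
    """F statistic and R^2 for a trait grouped by marker score (one-way ANOVA)."""
    groups = {}
    for value, score in zip(trait_values, marker_scores):
        groups.setdefault(score, []).append(value)

    n = len(trait_values)
    k = len(groups)
    grand_mean = sum(trait_values) / n

    # Variation between marker classes and within marker classes.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)

    # Assumes at least two classes and non-zero within-class variation.
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared

# Hypothetical 100-seed weights (g) and band scores for eight F2 plants.
weights = [102.0, 98.5, 105.1, 101.2, 88.3, 91.0, 86.7, 90.4]
bands = [1, 1, 1, 1, 0, 0, 0, 0]

f_stat, r_squared = single_marker_f(weights, bands)
print(f"F = {f_stat:.2f}, R^2 = {r_squared:.3f}")
```

A large F value, judged against the F distribution with (k − 1, n − k) degrees of freedom, indicates that the trait mean differs between marker classes, which is the kind of association the pr(F) measure summarizes.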

Table 7 Overview of strength of single marker associations with the traits investigated.
Table 8 QTLs illustrating the per cent phenotypic variance of traits at different LOD thresholds based on SIM and CIM.

In this model too, the additive and dominance effects of genes/markers are considered when calculating the percentage of phenotypic variance explained, R². The summary of CIM for all the traits across the different groups of phenotypic variance is detailed in table 8. Two QTLs were identified for the quantitative traits, with a maximum phenotypic variance of 14.77% explained for seed thickness, and six QTLs were identified for qualitative traits, with pod itchiness showing the highest percentage of phenotypic variance explained (21.12%).

A close comparison of the simple and composite interval mapping results shows that CIM identified QTLs with sharp peaks, that some of the QTLs identified by SIM disappeared in CIM, and that some new QTLs not detected by SIM were identified. In our case, CIM identified QTLs with comparatively lower %R² than SIM, but the resolution of the peaks was much sharper. Interestingly, both SIM and CIM identified three similar QTLs for pod traits with slightly varying %R², confirming the localization of QTLs for these traits. The highest %R² values of 25.96 and 25.79 were recorded for the QTLs identified for trichome colour and pod itchiness, respectively.

The map positions for both qualitative and quantitative traits based on combined results of SIM and CIM are shown in figure 2.

Discussion

Little has been done to develop genomic resources for underutilized plant species in general and M. pruriens in particular. A lone linkage map based on the US core collection (Capo-Chichi et al. 2004), published earlier, demonstrated the prospects of good genome coverage for linkage studies with AFLP markers, in addition to the segregation of pod colour and pod pubescence in an F2 population. Beyond this, no effort has been made to date towards trait-based mapping or QTL studies on this crop, in that work or in any other from any part of the world. Therefore, we present here the first report on the map positions for floral, pod and seed traits in M. pruriens, with genome coverage comparable to that of the earlier published map.

The F1 intraspecific hybrid as well as the F2 segregating population behaved unexpectedly in this study and threw up a few variations that were not present in either parent. This is plausible because the two genotypes used for developing the mapping population belong to different botanical varieties and, owing to the large genetic variation between them, might have behaved like an interspecific hybrid. Further, this must have resulted in transgressive segregation giving rise to novel phenotypes in F2 individuals, which is a common observation in progenies of interspecific matings (Lotsy 1916; Brainerd 1924; Darlington and Mather 1949; Vega and Frey 1980). It is necessary, however, to raise different subsets of F2 individuals from the same F1 hybrid in multiple environments to confirm this.

A high percentage of segregation distortion (62.8%) was observed for the markers that were polymorphic between the parents of the F2 population. The fact that markers with segregation distortion are clustered around particular regions in the linkage groups indicates that segregation distortion in the F2 population is most likely caused by genetic factors. At the same time, an even distribution across a few groups is also observed, indicating statistical bias (Plomion et al. 1995). Moreover, because statistical bias arising from genotyping and/or scoring errors was eliminated beforehand in this study, by analysing the effect of each marker on the position of and distance between adjacent markers in the different linkage groups and by completely excluding the skewed markers from the analysis, these distortions are largely attributed to genetic factors.

The linkage map realized in this work serves as a workable genetic map for M. pruriens. Such a map should have a mean genetic distance of 20 and 10 cM between adjacent markers for mapping of major genes and QTLs, respectively (Xu and Zhu 1994; Fang et al. 2000). With a maximum marker interval of 8.61 cM and an average marker interval of 4.79 cM, the present map essentially meets these prerequisites. However, a saturated map is still needed to facilitate the detection of closely linked or functional markers for marker-assisted selection and map-based cloning of genes of interest.

In this study, relatively higher centimorgan distances were observed in LGs 1, 5, 10, 12 and 13. The literature suggests that these might have been caused by higher (but <50.0%) total recombination percentages arising from a higher expected number of double crossovers in addition to single crossover events (Haldane 1919). Nonetheless, this remains to be verified. Also, the existence of minor linkage groups and several unlinked markers indicates that there are many large gaps with few markers. This might be due to the relatively limited population size, coupled with low map saturation. However, these groups may merge into larger ones as more markers are assigned to the map. Deployment of superior, codominant marker types such as SSRs and SNPs, besides additional AFLP markers generated from primer sets other than EcoRI/MseI, will be valuable for this purpose.

All the seed traits and pod trichome showed the expected segregation in the F2 population, but trichome colour did not segregate in a Mendelian ratio. Intermediates were also observed for this trait in both the F1 and the F2 population, implying the codominant nature of the genes involved. The range of variability observed in the population for both trichome colour and its effect, itching intensity, together with the three QTLs identified, suggests that the trait is polygenic. The two overlapping QTLs identified for these two traits suggest that they are also highly correlated. The trichomes in M. pruriens contain the chemical substances mucunain and serotonin, which are known to be responsible for the itching, blisters and dermatitis caused by contact with them (Sastry and Kavathekar 1990). The fact that the intensity of trichome colour is directly proportional to the severity of pod itchiness points to interactions between the genes or biochemical pathways responsible for both traits underlying these QTLs. Further investigations are needed on the molecular and biochemical basis of mucunain and serotonin variation, along with dense coverage of the linkage groups with genomewide markers, to unveil the precise loci and/or genes linked to these traits.

Among the quantitative traits, the highest correlation was observed between seed length (SL) and seed width (SW) (0.62). Good correlations were also observed between SL and 100-seed weight (HSW) (0.586), SW and HSW (0.607), and seed thickness (ST) and HSW (0.582). Correlation coefficients above 0.5 between the seed size traits (seed length, width and thickness) and seed mass suggest a close genetic relationship among them, implying their usefulness in breeding programmes.

In conclusion, this is the first report on the development of a workable linkage map and the identification of QTLs for floral, pod and seed characteristics in M. pruriens, which form the basis for future genetic studies in this promising species. However, low genome coverage and the use of a dominant marker type (AFLP) are the major limitations. In addition, variation among the F2 plants, which might have included environmental variation, is not accounted for. Thus, further experiments that grow different F2 subsets in varying environments or develop a RIL population, together with saturation of the linkage map with codominant markers such as SSRs and SNPs, will reinforce the findings of this work and help make it suitable for molecular breeding applications.