8.1 Introduction—The Most Important Molecular Marker Types for Plant Science

Molecular markers can be considered landmarks in the genome. A molecular marker is either a DNA fragment or a DNA sequence associated with a specific genomic location. In the past, protein markers were also used. To be useful, a molecular marker needs to show differences among the genotypes under investigation, in either sequence or fragment length. The most important marker types used in plant sciences are briefly described below.

Restriction fragment length polymorphism (RFLP) markers were invented by A. J. Jeffreys in the late 1970s (Zagorski 2006) and were first applied in human genetics. In the late 1980s, this technique was introduced to study genome architecture in major crops such as wheat (Sharp et al. 1989), tomato, and potato (Bonierbale et al. 1988). RFLP marker detection involves digestion of genomic DNA with restriction enzymes, labeling of specific DNA fragments (usually with the radioactive isotope 32P), and then using the labeled fragments one by one as probes in Southern blot analyses (Williams 1989). RFLP markers usually detect both alleles in a heterozygous sample. These markers were a breakthrough for genetic fingerprinting, but the technique is time consuming, needs large amounts of high-quality DNA, and involves handling of a radioactive substance, making the analysis laborious and expensive. Therefore, this technology is no longer used, but many published RFLP markers linked to important pest and disease resistances are still relevant to research and breeding.

The advent of the polymerase chain reaction (PCR) opened the path for a number of different marker types. Random amplified polymorphic DNA (RAPD) markers rapidly became popular, as they are easy and cheap to use on virtually any organism. RAPD applies combinations of short random primers in PCRs to amplify different genomic regions. The obtained DNA fragments are resolved according to their size by agarose or polyacrylamide electrophoresis. The banding patterns can vary among genotypes, resulting in potentially polymorphic markers. The drawbacks of RAPD markers are that they require relatively high-quality DNA, are dominant, and have low reproducibility. Consequently, they are difficult to compare among experiments and among laboratories. In addition, the multiple bands produced in PCR by RAPD primer pairs make identification of alleles difficult. Converting RAPD markers to more robust sequence characterized amplified region (SCAR) markers enhances reproducibility and allele identification (Bhagyawant 2016). To develop SCAR markers, polymorphic RAPD fragments are cloned and sequenced, and primers are designed to specifically amplify these fragments. Generally, each primer pair is designed to amplify a single RAPD band (Paran and Michelmore 1993). Polymorphism of SCAR markers is scored either as presence or absence of the amplified band, or as length polymorphism in the case of co-dominant SCAR markers.

Amplified fragment length polymorphism (AFLP) uses genomic DNA that is digested with restriction enzymes and then ligated to adapters of known sequence (Vos et al. 1995). Primers complementary to the ligated adapters are used to amplify DNA fragments. Complexity reduction is achieved by adding one or a few specific bases at the 3′ end of the primers so that only a subset of the restriction fragments is amplified. Presence and absence of specific fragments are scored after separation of the fragments according to their size on a gel or on a fragment analyzer. DNA bands are visualized either through autoradiography or fluorescence methods. AFLP can produce a large number of markers and is at least partly amenable to automation. Polymorphic bands can be converted to SCAR markers (see above). Today, AFLP markers are no longer widely used.

Microsatellite markers (SSR markers) contain simple sequence repeats (SSRs) with repeat units of 1–8 base pairs (Hamada and Kakunaga 1982). SSR motifs are hot spots for mutations, where DNA polymerase adds or eliminates one or more repeat units during DNA replication. SSR motifs can be present in coding and non-coding sequences. They can be amplified by PCR using sequence-specific primers that flank the repeat. Size polymorphisms of specific SSR fragments among different genotypes are scored after size fractionation by electrophoresis or on fragment analyzers. SSR markers are abundant in the genome and are generally co-dominant; however, the degree of their polymorphism varies among species and populations.

Single-nucleotide polymorphism (SNP) markers consist of single-nucleotide changes observed by comparing the DNA of different genotypes. They are very abundant in the genome, but their development generally requires sequence information. As improvements in sequencing technology made sequence information readily available at low cost, SNP markers have become the marker type of choice.

Strictly speaking, markers like AFLP or RFLP that do not require sequence information also score SNPs (besides indels and structural variations), but only if the SNPs are located in restriction enzyme recognition sites. Likewise, the Diversity Array Technology (DArT) also scores SNPs (and indels and structural variants) by testing for the presence or amount of specific DNA restriction fragments in a representation derived from the total genomic DNA of different individuals or populations (Jaccoud et al. 2001). The DArT technology is fast and cost-effective, but produces dominant markers.
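The dependence of restriction-based markers on SNPs inside recognition sites can be illustrated with a small in-silico digest. The following is only a toy sketch: the three allele sequences are invented, the function name `digest` is hypothetical, and the only real-world detail assumed is the EcoRI recognition site GAATTC with cleavage after the first G.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return the fragment lengths obtained by cutting seq at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)      # cleavage position within the site
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Invented alleles: allele_b carries a SNP inside the site, allele_c outside it.
allele_a = "ACGTACGTAC" + "GAATTC" + "TTGACCATGGCA"
allele_b = "ACGTACGTAC" + "GAATCC" + "TTGACCATGGCA"
allele_c = "ACGTACGTAC" + "GAATTC" + "TTGACCATGACA"

for name, allele in [("a", allele_a), ("b", allele_b), ("c", allele_c)]:
    print(name, digest(allele))
```

Only the SNP in allele_b changes the fragment pattern; the SNP in allele_c leaves the fragment lengths unchanged and would go undetected by a fragment-length assay.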

There are various methods to obtain SNP markers. Comparing genome or transcriptome sequences among individuals is a relatively simple method to produce SNP information, but cost-effective genotyping of these SNPs in a large population needs specialized technologies based on fluorescence detection in PCR format (Tapp et al. 2000; Semagn et al. 2014), or arrays (Elbasyoni et al. 2018). Smaller SNP sets can be analyzed in large populations by cleaved amplified polymorphic sequence (CAPS) markers (Thiel et al. 2004), high-resolution melting (Liew et al. 2004), mass spectrometry (Storm and Darnhofer-Patel 2003), or other methods. Depending on the number of loci to be genotyped, establishing the SNP resource for a population may be a costly investment. Multiplexing SNP assays is a common way to improve cost efficiency (Fan et al. 2003).

SNP genotyping technologies such as genotyping by sequencing (GBS; Elshire et al. 2011) and the single-primer enrichment technology (SPET) have become very popular. They are based on sequencing a fraction of the genome. Complexity reduction is achieved either by restriction enzyme digestion or by multiplex PCR amplification of target sequences. Both methods can be scaled to obtain a greater or lesser number of SNPs. DNA barcoding allows many samples to be pooled in one sequencing reaction, which leads to dramatic cost reductions (Elshire et al. 2011). GBS and SPET combine SNP discovery and genotyping in one step, but the reproducibility and precision of these methods are inferior to array-based genotyping (Elbasyoni et al. 2018). Both GBS and SPET are patented technologies (patent numbers WO2013009175A1 and US20130231253A1, respectively).
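The pooling of barcoded samples in one sequencing reaction can be sketched as a simple demultiplexing step. The barcodes, sample names, and reads below are invented, and the function `demultiplex` is a hypothetical helper; it assumes exact barcode matches at the start of each read and that no barcode is a prefix of another, whereas production pipelines typically also tolerate sequencing errors.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode."""
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read for inspection.
            unassigned.append(read)
    return by_sample, unassigned

# Invented example data.
barcodes = {"ACGT": "line_1", "TGCA": "line_2"}
reads = ["ACGTCAGCTTAG", "TGCACAGCATAG", "ACGTCAGCTTAG", "GGGGCAGCTTAG"]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample)
print(unassigned)
```

After this step, the reads of each sample can be aligned and genotyped separately, although they were sequenced together.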

8.2 Molecular Markers in Plant Breeding

Plant breeding consists of crossing the best parents and subsequently identifying and recovering the progeny that outperform the parents (Moose and Mumm 2008). Genetic gain is determined by (i) the phenotypic variation present in the breeding population, (ii) the probability that a trait phenotype will be transmitted from parents to offspring (heritability), (iii) the proportion of the population that is selected as parents for the next generation (selection intensity), and (iv) the time necessary to complete a cycle of selection. All four key factors for genetic gain can be positively impacted by using molecular markers.
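The interplay of these four factors is often summarized in a textbook form of the breeder's equation, gain per year = i × h² × σP / L, where i is the standardized selection intensity, h² the heritability, σP the phenotypic standard deviation, and L the cycle length in years. This formula is not spelled out in the text above and is used here as an assumption, as is truncation selection on a normally distributed trait; the function names and example values are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(proportion_selected):
    """Standardized selection differential for truncation selection
    on a normally distributed trait."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(threshold) / proportion_selected

def genetic_gain_per_year(proportion_selected, heritability,
                          phenotypic_sd, years_per_cycle):
    """Expected response to selection per year (assumed breeder's equation)."""
    i = selection_intensity(proportion_selected)
    return i * heritability * phenotypic_sd / years_per_cycle

# Invented values: select the best 10%, h2 = 0.3, sd = 10 trait units.
slow = genetic_gain_per_year(0.10, 0.3, 10.0, years_per_cycle=2.0)
fast = genetic_gain_per_year(0.10, 0.3, 10.0, years_per_cycle=1.0)
print(round(slow, 2), round(fast, 2))
```

Under these assumptions, halving the cycle time, for example through marker-based selection in off-season nurseries, doubles the expected gain per year.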

One factor affecting genetic gain is the available phenotypic variation in the population. Measurements of phenotypic diversity are affected by the environment. Exotic material may not be adapted to the selection environment and thus may not show its real potential for breeding. Phenotypic diversity is positively associated with genetic diversity. Molecular markers are cheap and efficient tools to characterize genetic diversity in populations. They help to understand population structure, indicate the presence of heterotic groups in a germplasm set or in breeding populations, and facilitate the exploitation of heterosis for producing hybrids and improved populations (Van Inghelandt et al. 2010; Barata and Carena 2006).

Heritability depends on the number of genes affecting a trait, the magnitude of their effects, and the type of gene action associated with the phenotype (Moose and Mumm 2008). Molecular marker technologies facilitate the definition of loci associated with a trait of interest. For traits with low heritability such as yield, molecular markers associated with loci influencing the trait often account for a greater proportion of additive genetic effects than the phenotype alone. Knowledge of the genetic architecture underlying the trait can be exploited to add or eliminate specific alleles that contribute to the breeding value. If linkage drag or epistasis among loci with antagonistic effects on a trait limits the genetic gain, information on loci associated with the traits can be used to break these undesirable allelic relationships (Moose and Mumm 2008).

Selection intensity in conventional plant breeding relies on phenotypic selection. Environmental variability, genotype by environment interaction, and evaluation errors add complexity to phenotypic selection. Multi-environment evaluation improves selection accuracy, but is time consuming and expensive. Some traits require destructive sampling or exposure of the population to diseases and pests, which affects the recovery of the desired genotypes. Pest and disease resistance screening in natural environments is particularly challenging, as it depends on the presence and activity of the pathogen (and its vector) or the pest. Molecular markers can make selection more precise and increase the selection intensity.

Some traits, including those associated with yield or stress resistance, appear at late developmental stages and can only be measured on mature plants. Therefore, large testing populations have to be cultivated up to maturity for selection. Molecular markers associated with traits of interest can allow selection for these traits at early stages, reducing the time and costs required for plant cultivation and testing. Markers linked to traits of interest make selection independent of the environment, allow for selection in off-season nurseries, and permit multiple selection rounds in a year, thereby shortening the time required to complete a selection cycle. The advantages of molecular marker-based selection have been recognized by plant breeders, and the technology is now applied to a broad range of crops, including legumes (Varshney et al. 2018).

8.3 Molecular Markers of Mungbean—A Brief History

Before molecular markers became available, genetic studies relied on morphological traits, such as flower color in pea (Mendel et al. 1993). Morphological traits that are controlled by a single gene can be used as genetic markers, but their number is limited, and without progeny tests, it is impossible to distinguish heterozygous from homozygous individuals. With the advent of isoenzyme markers, the first genetic maps were constructed (Mahmoud et al. 1984). Protein markers were developed based on differences in mobility of certain proteins among different accessions. Mobility differences of seed proteins were used for cultivar detection (Mohanty et al. 2011). Pattnaik and Kole (2002) found protein markers that were polymorphic between genotypes resistant and susceptible to mungbean yellow mosaic virus (MYMV). The phylogenetic relationship between Vigna species was also addressed with protein markers (Kole and Panigrahi 2001). The limited number of polymorphic protein markers favored the development of more polymorphic DNA-based markers.

About one decade after the first RFLP marker studies on major crop species, the technology was also applied to “orphaned” crops such as mungbean, and in the early 1990s, the first reports using restriction fragment length polymorphism (RFLP) markers for mapping traits in mungbean were published (Fatokun et al. 1992; Young et al. 1992, 1993). A genetic map based on RFLPs comprising 171 loci on 14 linkage groups was produced (Menancio-Hautea et al. 1993). Reports on molecular markers in mungbean became more frequent with the advent of random amplified polymorphic DNA (RAPD) markers. They were used to assess genetic diversity in germplasm (Santalla et al. 1998) and in cultivars (Lakhanpaul et al. 2000) and to map resistance to the most important diseases and pests such as mungbean yellow mosaic disease (Selvi et al. 2006) and bruchid beetles (Chen et al. 2007). The more reproducible AFLP markers allowed diversity studies (Singh et al. 2013) and trait mapping (Chaitieng et al. 2002; Srinives et al. 2010) to be refined, but none of the markers developed with this technology was reported to be used in breeding.

Then, large numbers of SSR markers were assembled for mungbean from sequence data of various Vigna species (Somta et al. 2009), or were generated from mungbean genomic sequences (Tangphatsornruang et al. 2009) or transcriptome sequencing data (Gupta et al. 2014; Chen et al. 2015a). A genetic map resolving the 11 linkage groups of mungbean was constructed with 150 SSR markers (Kajonphol et al. 2017), and QTLs for resistances to Cercospora leaf spot (Chankaew et al. 2011) and powdery mildew (Kasettranan et al. 2010), for nutritional traits such as phytic acid content (Sompong et al. 2012), and for domestication-related traits (Isemura et al. 2012) were mapped. SSR markers were also used to assess genetic diversity in a large germplasm collection to establish a mini-core collection (Schafleitner et al. 2015).

Large-scale single-nucleotide polymorphism (SNP) marker detection was started in mungbean by transcriptome sequencing (Moe et al. 2011) and by comparing reads obtained by Illumina HiSeq sequencing of the genomes of two mungbean cultivars (Van et al. 2013). Soon after, the whole genome sequence of mungbean cultivar VC1973A became available (Kang et al. 2014), paving the way for genotyping-by-sequencing approaches in this crop (Kang et al. 2014; Schafleitner et al. 2016). Current re-sequencing projects producing huge numbers of markers are likely to provide insight into genome rearrangements in mungbean.

8.3.1 The First Molecular Markers for Mungbean Breeding: Markers Associated with Bruchid Resistance

The first application of molecular markers in mungbean was a study targeting bruchid resistance loci. Promising bruchid resistance was discovered in wild mungbean Vigna radiata ssp. sublobata TC1966 (Fujii et al. 1989). At that time, the markers of choice were RFLPs. It was thought that RFLP may be a suitable marker system, especially for crops with relatively small genome size such as mungbean, to map genes and guide chromosome walking for gene cloning (Steinmetz et al. 1981). Young et al. (1992) analyzed 58 F2 progeny derived from V. radiata ssp. sublobata TC1966 and a susceptible V. radiata line with 153 RFLP markers and succeeded in defining an RFLP marker 3.6 cM distant from the bruchid resistance locus. One F2 individual was identified carrying the bruchid resistance gene within a tightly linked double crossover. Such an individual would be highly valuable in developing resistant mungbean lines with reduced linkage drag. Later, Miyagi et al. (2004) succeeded in converting RFLP probes associated with bruchid resistance to PCR-based markers. They screened mungbean BAC libraries from resistant and susceptible lines with RFLP probes associated with bruchid resistance and identified SSR motifs and sequence tagged sites (STS) on these BACs. This experiment yielded the PCR-based markers STSbr1 and STSbr2, which co-segregated with an RFLP marker associated with bruchid resistance (Miyagi et al. 2004). STSbr1 was validated on Indian genotypes to be associated with bruchid resistance (Sarkar et al. 2011), while STSbr2 was associated with one of two bruchid resistance loci in V. radiata V2709 (Sun et al. 2008).

In the 1990s, the low-cost and easy-to-use RAPD markers were adopted for mungbean. Conversion of these markers to more robust SCAR markers made this marker type more useful. Chen et al. (2007) identified ten RAPD markers associated with bruchid resistance in a bulked segregant analysis in recombinant inbred lines (RILs) derived from a cross between the bruchid-resistant V. radiata ssp. sublobata line TC1966 and the mungbean yellow mosaic disease-resistant V. radiata elite cultivar NM92. Three pools were established: one pool consisted of 22 bruchid-resistant F12 RILs (0% infestation), and two pools were made of 20 susceptible RILs with 80–90% damage and 18 susceptible RILs with 90–100% damage. Of the ten RAPD markers found to be associated with bruchid resistance in this experiment, the four most closely linked ones were cloned, sequenced, and converted to SCAR and cleaved amplified polymorphism (CAP) markers. After the mungbean reference sequence (Kang et al. 2014) became available, the CAP fragment derived from RAPD marker OPW02a4 was mapped to a location around position 6 Mb on chromosome 5. QTL analysis using a mix of RAPD, SCAR, CAP, AFLP, and SSR markers (489 markers in total) in the same population mapped bruchid resistance to linkage groups 7 and 9. Linkage group 9 was tagged with marker DMB158, which later was mapped to chromosome 5 (Schafleitner et al. 2016). QTL mapping using more than 9000 SNPs in the population V. radiata ssp. sublobata TC1966 x V. radiata NM94 and more than 6000 SNPs in the cross of the two V. radiata lines V2802 x NM94 corroborated the presence of a bruchid resistance locus on chromosome 5 in both resistant lines, TC1966 and V2802. The markers are currently used in pyramiding bruchid resistance with disease resistances and good agronomic performance in mungbean breeding lines (Ramakrishnan Nair, personal communication).

The example of bruchid resistance shows that molecular markers are a suitable tool for mungbean crop improvement. However, specifically for bruchid resistance, the evolution of marker technologies and access to large numbers of markers did not significantly improve the localization of the major resistance gene. The first RFLP markers associated with this resistance were mapped to a location similar to that of the SSR and SNP markers in more recent experiments. The SSR marker DMB158 mapped nearest to the bruchid resistance locus (Chotechung et al. 2011; Chen et al. 2013). Fine-mapping the bruchid resistance locus confirmed the localization of this marker on chromosome 5 and resulted in two candidate genes, VrPGIP1 and VrPGIP2, conferring the resistance (Chotechung et al. 2016; Kaewwongwal et al. 2017). The SNP markers linked to the resistance gene are not better than the SSR marker. This demonstrates that the trait, the biological material, and the assay conditions are far more important for mapping traits than the marker system. Major resistance genes can also be tagged with simple methods, as long as the phenotyping data are sound. Including more markers and using more modern marker technologies do not necessarily improve the mapping result.

Bruchid resistance markers obtained by genotyping by sequencing were mapped to a reference sequence. The order of these markers on genetic maps differed from the order suggested by mapping the SNPs to the reference sequence (Schafleitner et al. 2016). This may be due to translocations, which caused differences in marker order in experimental populations compared to the reference sequence. Structural variations are important sources of phenotypic diversity. They are defined as genomic variations that involve segments of DNA larger than 1 kb in length and consist of insertions, deletions, inversions, translocations, and copy number variations (Feuk et al. 2006). Genotyping with markers does not capture all structural variations (Springer et al. 2011). Whole genome re-sequencing is likely to improve the knowledge about structural variations.

8.3.2 Markers for Diversity Analysis in Mungbean

Mungbean is an autogamous (cleistogamous) species. Current cultivars have a narrow genetic base, because only a limited number of genotypes were used for breeding (Kumar et al. 2003). Breeding improved varieties therefore needs access to new genetic diversity. Due to the replacement or disappearance of wild relatives and local cultivars, alleles that could be of high interest for future breeding are continuously lost. Therefore, germplasm collections of mungbean landraces and wild relatives are an important reservoir of new genetic diversity for breeding. The genetic diversity and population structure of germplasm accessions in gene banks need to be characterized to improve management of the collections by identifying redundant accessions, to produce germplasm subsets with certain properties, and to identify genotypes of interest for breeding (de Vicente et al. 2006).

Before the advent of molecular markers, diversity analysis depended on geographic information on the site of origin of an accession, pedigree data, morphological or agronomic traits, or on biochemical data. Geographic origin together with morphological traits, either discrete ones like bean color or continuous ones like plant height or seed size, has been used to classify mungbean germplasm and analyze the diversity of collections (Bisht et al. 1998). The joint analysis of discrete and continuous variables has higher potential than analysis of either discrete or continuous data alone (Gonçalves et al. 2008). However, morphological data are susceptible to environmental variability. Measuring morphological traits in large collections is usually done over several seasons, bearing the risk that environmental variability causes variation in traits and subsequent errors in diversity analyses. In addition, morphological differences are usually determined by a small number of genes and may not be representative of the genetic diversity of the entire genome (Carroll 2008). In contrast, DNA markers are likely to reveal the genetic relationship among genotypes most accurately (reviewed by Crawford 1990). Marker genotypes are independent of the environment, and they are stable over different developmental stages of the plants. Small samples of plant tissue are sufficient for genotyping, and it is not necessary to grow plants to maturity, as is required for morphological characterization, making genotyping a cheap option for diversity characterization. DNA markers likely provide information on homologous loci among genotypes, while morphological characteristics may be under the control of multiple genes, masking allelic relationships. DNA markers are also far more abundant than morphological markers, increasing the power to discriminate between genotypes. Finally, scoring DNA markers is generally easier than measuring morphological parameters. Modern marker technologies are also amenable to automation, further facilitating the approach.

Several marker technologies have been used to characterize mungbean germplasm. Santalla et al. (1998) used RAPD markers to analyze genetic diversity in a small panel consisting of mungbean germplasm and three individuals of other Vigna species. Sixty random decamer primers were tested, and 28 of them proved informative. The resulting phylogenetic tree showed three main clusters, which included V. radiata landraces, Vigna mungo, and Vigna luteola, respectively. Studies on the genetic diversity of Indian mungbean cultivars were also performed with RAPD markers (Lakhanpaul et al. 2000; Datta et al. 2012). These studies had in common that a relatively large number of primer combinations had to be tested and only about half of the combinations yielded RAPD patterns useful for diversity analysis. RAPD markers can be readily applied to any organism, without previous sequence information. RAPD markers are generally abundant and evenly distributed over the genome. The main weakness of RAPD markers is their low reproducibility (Schierwater and Ender 1993). Hence, these markers are difficult to use across laboratories and experiments. The scoring of the bands can be complex and is subject to different interpretations when done by different persons. High-quality DNA is critical for these assays, adding costs to the experiment. Altogether, these properties make RAPD markers a poor tool for analyzing genetic diversity in large genebank collections.

Chattopadhyay et al. (2005) applied a combination of RAPD and inter-simple sequence repeat (ISSR) markers to study genetic diversity in selected mungbean genotypes. ISSRs are regions in the genome flanked by microsatellite sequences. These regions are amplified in PCRs using a primer that contains a microsatellite motif at the 3′ end (Gupta et al. 1994). ISSR markers do not need any previous sequence information, are easy to use, and are inexpensive. However, ISSRs, like RAPDs, may be affected by low reproducibility, and the multiple bands obtained may be derived from non-homologous loci and be difficult to analyze.

A few reports describe the use of AFLP markers in mungbean diversity analysis (Bhat et al. 2005; Singh et al. 2013). Singh et al. (2013) compared phylograms obtained with ISSR and AFLP markers. They found that AFLP markers were more efficient than ISSR markers in assessing genetic diversity, as they yielded more polymorphic markers. The comparison of the Jaccard similarity matrices obtained with both marker systems showed only low correlation, and the clustering of genotypes within groups was not similar when AFLP- and ISSR-derived dendrograms were compared. It was hypothesized that the two marker technologies targeted different genomic regions and yielded different numbers of markers, which led to the different phylogenetic clustering of the accessions. Advice on designing an ISSR experiment and recommendations on using ISSR markers in genetic variation studies are given in Ng and Tan (2015). In general, however, the easy-to-use SSR and SNP markers have widely replaced other marker systems, including ISSR.
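For dominant markers scored as band presence (1) or absence (0), the Jaccard similarity between two accessions is the number of bands shared by both divided by the number of bands present in at least one of them. The sketch below illustrates this calculation on invented band profiles; the accession names and the helper functions are hypothetical, and returning 0.0 for two profiles without any band is an arbitrary convention chosen for this example.

```python
from itertools import combinations

def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence band profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0

def similarity_matrix(profiles):
    """Pairwise Jaccard similarities, keyed by accession name pairs."""
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }

# Invented band profiles for five marker bands.
profiles = {
    "acc_1": [1, 1, 0, 1, 0],
    "acc_2": [1, 0, 0, 1, 1],
    "acc_3": [0, 0, 1, 0, 1],
}
matrix = similarity_matrix(profiles)
print(matrix)
```

Shared absences do not count toward the similarity, which is why this coefficient is commonly chosen for dominant marker data, where the absence of a band can have several causes.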

Most diversity studies in mungbean have been accomplished with microsatellite (SSR) markers. These co-dominant markers are abundant in the genome, are easy and cheap to use, and are amenable to multiplexing and automation (Hayden et al. 2008). Originally, SSR markers were developed from repeat-enriched libraries (Edwards et al. 1996), a labor-intensive technology. With readily available sequence information from transcriptomes and genomes, however, microsatellites became much easier to access (Chen et al. 2015a). Specialized software tools to mine DNA sequences for microsatellite motifs and design primers to amplify microsatellite loci are widely available (da Maia et al. 2008; Wang and Wang 2016). Still, microsatellite markers need to be well chosen to obtain allelic bands in genotyping. Backward and forward mutations (homoplasy) may occur at microsatellite loci and cause underestimation of the genetic diversity (Spooner et al. 2005). In mungbean, SSR markers have been developed using the 5′-anchored polymerase chain reaction technique (Kumar et al. 2002), from genome shotgun sequences (Tangphatsornruang et al. 2009), and from transcriptome sequences of V. radiata (Chen et al. 2015a; Gupta et al. 2014), or have been transferred from other Vigna species (Isemura et al. 2012).
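The principle of mining sequences for microsatellite motifs can be sketched with a regular expression that looks for a short motif repeated in tandem. This is a simplified illustration and not the algorithm of any of the cited tools: the function name `find_ssrs`, the motif length range of 2–6 bp, the single threshold of five repeats, and the test sequence are all assumptions made for this example.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    # Shortest possible motif, followed by at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Invented sequence containing an (AG)6 and a (CAT)5 repeat.
sequence = "TTGAC" + "AG" * 6 + "CTTACG" + "CAT" * 5 + "GG"
print(find_ssrs(sequence))
```

Primers for an SSR marker would then be designed in the unique sequence flanking each reported repeat, a step that is left out of this sketch.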

The first comprehensive study on mungbean diversity used a set of 19 SSR markers derived from adzuki bean (Vigna angularis) on 615 cultivated and wild mungbean accessions (Sangiri et al. 2008). The marker set was selected based on the marker location in the adzuki bean genome to contain at least one marker per linkage group. More alleles were detected in wild than in cultivated accessions, illustrating the lower diversity in the cultivated germplasm set. The study revealed that Australia and New Guinea represent a distinct center of diversity for wild mungbean, while cultivated mungbean has greatest diversity in South Asia. Soon after, the diversity and population structure of mungbean were analyzed with 15 different SSR markers in 692 mungbean accessions held by the National Agrobiodiversity Center of the Rural Development Administration, Korea. Mungbean germplasm obtained from 27 countries was grouped into seven phylogenetic clades and into two distinct genetic groups (Gwag et al. 2010). In total, 157 mungbean germplasm accessions were genotyped with EST-SSRs (Chen et al. 2015b).

A combination of morphological data and microsatellite markers was used to define a 300 accession mini-core collection that represents a large proportion of the overall diversity of the whole World Vegetable Center mungbean collection of more than 6700 accessions (Schafleitner et al. 2015). In the first step, geographic stratification was performed, and by cluster analysis of eight phenotypic descriptors, a phylogenetic tree was produced. From this tree, 20% of the accessions were randomly selected from each cluster as a core collection containing about 1400 genotypes. The core collection was subsequently genotyped with 20 microsatellite markers, and a mini-core set was selected to represent all detected 122 alleles (Schafleitner et al. 2015). The collection was small enough to be submitted to multilocation trials in various regions in Asia and Africa to discover new traits for mungbean breeding, and it is expected that it is large enough to map traits in genome-wide association studies.
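Selecting a small set of accessions that together carry all detected alleles can be framed as a set-cover problem. The sketch below uses a simple greedy heuristic on invented SSR data; it is an assumption made for illustration and not necessarily the procedure used by Schafleitner et al. (2015). The accession names, allele labels, and function name are hypothetical.

```python
def greedy_mini_core(alleles_by_accession):
    """Greedily pick accessions until every observed allele is represented."""
    remaining = set().union(*alleles_by_accession.values())
    selected = []
    while remaining:
        # Accession adding the most uncovered alleles; ties go to the first name.
        best = max(
            sorted(alleles_by_accession),
            key=lambda acc: len(alleles_by_accession[acc] & remaining),
        )
        selected.append(best)
        remaining -= alleles_by_accession[best]
    return selected

# Invented data: alleles are labeled as marker_fragmentsize.
alleles_by_accession = {
    "acc_A": {"m1_150", "m1_154", "m2_200"},
    "acc_B": {"m1_150", "m2_200"},
    "acc_C": {"m1_154", "m2_204"},
    "acc_D": {"m2_204", "m3_98"},
    "acc_E": {"m3_98", "m3_102"},
}
mini_core = greedy_mini_core(alleles_by_accession)
print(mini_core)
```

A greedy choice does not guarantee the smallest possible set, but it keeps every allele represented, which is the property described for the mini-core above.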

Other marker types such as single-strand conformation polymorphism, cleaved amplified polymorphic sequence, and SCAR markers that were used for diversity analysis in other crops (Spooner et al. 2005) have not been reported for similar work in mungbean, while single-nucleotide polymorphism (SNP) markers have been applied to analyze several germplasm collections.

The first SNPs for mungbean were reported for pairs of mungbean lines by Moe et al. (2011) from transcriptome sequences, followed by Van et al. (2013) from shotgun Illumina sequences. Availability of the mungbean whole genome sequence (Kang et al. 2014) strongly improved the access to SNPs for this species. Re-sequencing of selected lines yielded large numbers of SNPs (Liu et al. 2016), and genotyping by sequencing (Elshire et al. 2011) was applied to mungbean populations (Kang et al. 2014; Schafleitner et al. 2016). Germplasm accessions of mungbean have been genotyped with SNPs in the USDA mungbean collection, the Australian mungbean diversity panel (Noble et al. 2018), and the World Vegetable Center mini-core (Breria et al. 2019).

The SNP-based diversity analysis of 94 cultivated mungbean genotypes from the USDA collection, originating from 27 countries, was done using a small set of SNP markers (Islam and Blair 2018). From a total of 42 known SNPs (Van et al. 2013), 18 were successfully converted to polymorphic KASP markers. The population could be divided into two subpopulations and one admixture group.

The Australian diversity panel was submitted to GBS. The germplasm set consisted of 466 cultivated and 16 wild accessions. In total, more than 22,000 polymorphic genome-wide SNPs were identified and used to analyze the genetic diversity, population structure, and linkage disequilibrium (Noble et al. 2018). As expected, polymorphism was lower in the cultivated than in the wild accessions. Linkage disequilibrium decay amounted to about 100 kb in cultivated lines and about 60 kb in wild mungbean. Structure analysis identified four distinct subgroups, which broadly corresponded to geographic origin and seed characteristics (Noble et al. 2018).
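Linkage disequilibrium between two biallelic SNPs is commonly measured as r², and LD decay describes how this value drops with increasing physical distance between marker pairs. The following sketch computes r² for SNPs scored in homozygous lines, so that each line contributes one haplotype coded 0 or 1. The genotype vectors are invented, the function name is hypothetical, and this is not claimed to be the procedure used by Noble et al. (2018).

```python
def ld_r2(locus_a, locus_b):
    """Squared allelic correlation r2 between two biallelic loci (coded 0/1)."""
    n = len(locus_a)
    p_a = sum(locus_a) / n                       # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n                       # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")

# Invented genotypes of eight inbred lines at three SNPs.
snp_1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp_2 = [0, 0, 1, 1, 0, 1, 1, 0]   # always inherited together with snp_1
snp_3 = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of snp_1
print(ld_r2(snp_1, snp_2), ld_r2(snp_1, snp_3))
```

Plotting such r² values against the distance between the two SNPs, and reading off where the curve falls below a chosen threshold, gives an estimate of the LD decay distance.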

Genotyping using GBS of the World Vegetable Center mini-core produced more than 24,000 markers for a germplasm panel consisting of V. radiata and V. mungo and 8000 polymorphic markers for 296 V. radiata accessions. From this set, 5447 polymorphic SNPs were used for germplasm characterization and structure analysis, identifying two major populations, one of them falling into three subpopulations, in the World Vegetable Center germplasm set. The mini-core and the genotyping data are currently used to map a number of morpho-agronomic traits.

8.3.3 Molecular Markers for Cultivar Identification and Hybridity Tests

Molecular fingerprinting of varieties and determining the purity of seed are components of quality seed production. Testing seed purity with molecular markers is common for many crops and is considered to be quicker and more cost-effective than grow-out tests (Yashitola et al. 2002). This, however, may not be true for all cases. For example, much of the mungbean seed production and much of its growing area are located in developing countries where wages are low and where there is little access to infrastructure for low-cost genotyping. Therefore, grow-out tests may still be cheaper than genotyping for seed quality assessment. Ali et al. (2010) reported seed quality assessment of Bangladeshi mungbean varieties based on quantifying seed other than mungbean and inert matter in seed lots, seed moisture content, 1000-seed weight, and germination tests. Molecular markers have been applied to produce molecular fingerprints of varieties (Tantasawat et al. 2010; Lestari et al. 2014; Reflinur et al. 2017), but reports on systematic use of markers for seed quality monitoring in mungbean are not available.

Monitoring the success of crosses by hybridity tests with molecular markers is a common practice (Solanki et al. 2010). In mungbean, SSR markers are being used to monitor crosses between mungbean germplasm and breeding lines, as well as in wide crosses between cultivated mungbean and wild relatives (Ramakrishnan Nair, personal communication). One or a few polymorphic SSR markers that are generally easy to define and cheap to apply are sufficient for this task.

8.3.4 Developing Markers Linked to Traits of Interest

Disease-resistant cultivars are the cheapest, simplest, and most environmentally safe way to manage disease. Likewise, improving abiotic stress tolerance of crops can stabilize yields and prevent crop failure. Disease resistance and abiotic stress tolerance are often sourced from landraces and wild relatives. Introgression of biotic and abiotic stress tolerance traits from unadapted material into elite cultivars is a frequent breeding task. As outlined above, using molecular markers can improve introgression of these traits into elite breeding material.

Developing markers associated with traits of interest includes the following steps:

  1. Establish the genetic resources for trait mapping, for example, a mapping population or a germplasm panel segregating for the trait of interest

  2. Phenotype the population and generate trait value data, e.g., on resistance or susceptibility to a pathogen or pest, or on tolerance to an abiotic stress factor

  3. Develop markers and genotype the experimental population

  4. Associate phenotypes to specific marker genotypes using appropriate statistical methods

  5. Validate the candidate markers in different genetic backgrounds and produce user-friendly markers for marker-assisted selection.

Once the genetic resources are phenotyped, a marker system has to be chosen to genotype the population or germplasm panel. Today, the most popular markers are SNPs. Genotyping by sequencing has been successfully used in mungbean to generate a large number of SNP markers for bi-parental populations and germplasm panels (Schafleitner et al. 2016; Noble et al. 2018). Ongoing whole genome re-sequencing efforts benefit from the available whole genome reference sequence of mungbean (Kang et al. 2014) and are likely to provide large numbers of SNPs for this species.

Several methods are available to associate phenotypic traits with genotypes. Bulked segregant analysis uses bulked DNA samples generated from individuals of a segregating population from a single cross (Michelmore et al. 1991). Each bulk contains DNA from individuals that are identical for a particular trait, such as resistance or susceptibility to a disease or pest, but are arbitrary at all unlinked regions. The two bulks are therefore genetically dissimilar in the selected region but seemingly heterozygous at all other regions. The bulks are screened for genetic differences using suitable markers to identify loci that carry contrasting alleles in the homozygous state in the two bulks. Bulked segregant analysis is a rapid and simple method to determine association of markers with single-gene or oligogenic traits, but is generally not suitable for multigenic traits. In mungbean, bulked segregant analysis has been used to map bruchid resistance (Cheng et al. 2005, 2007; Sun et al. 2008), mungbean yellow mosaic disease resistance (Selvi et al. 2006; Dhole and Reddy 2013; Karthikeyan et al. 2012), and iron deficiency tolerance (Toojinda et al. 2001). Markers associated with the respective traits were obtained, but no application of these markers in breeding was reported.
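The comparison of the two bulks can be illustrated with a short Python sketch. It assumes that allele counts (for example, sequencing reads) are available for each marker in a resistant and a susceptible bulk; the marker names, the counts, and the frequency-difference threshold of 0.8 are hypothetical and serve only to show the principle.

```python
# Illustrative bulked segregant analysis: compare allele frequencies between bulks.
# Marker names, counts, and the 0.8 threshold are hypothetical.

def allele_freq(ref_count, alt_count):
    """Frequency of the alternative allele in one bulk."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no observations for this marker")
    return alt_count / total

def candidate_markers(resistant_bulk, susceptible_bulk, min_delta=0.8):
    """Return markers whose allele frequency differs strongly between the bulks."""
    hits = []
    for marker, (r_ref, r_alt) in resistant_bulk.items():
        s_ref, s_alt = susceptible_bulk[marker]
        delta = allele_freq(r_ref, r_alt) - allele_freq(s_ref, s_alt)
        if abs(delta) >= min_delta:
            hits.append((marker, round(delta, 2)))
    return hits

# (reference count, alternative count) per marker and bulk
resistant = {"m1": (1, 19), "m2": (10, 10), "m3": (11, 9)}
susceptible = {"m1": (18, 2), "m2": (9, 11), "m3": (10, 10)}

print(candidate_markers(resistant, susceptible))  # only m1 contrasts between bulks
```

Markers m2 and m3 show intermediate frequencies in both bulks, as expected for unlinked regions, while m1 carries contrasting alleles and would be a candidate for linkage to the trait.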

Quantitative traits typically are tagged by QTL analysis (Liu 2017). For this task, first, the molecular markers are mapped, on either genetic or physical maps. Then, associations between the trait(s) of interest and the marker genotypes are tested using statistical methods. A large number of reports describe QTL studies on disease resistances, quality, and domestication traits in mungbean. The first reported QTL study on mungbean found associations between seed weight and RFLP marker genotypes (Fatokun et al. 1992). Humphry et al. (2005) investigated the relationships between hard-seededness and seed weight to support breeding of hard- and large-seeded genotypes. A large number of QTL analyses followed, targeting a wide range of morphological, agronomical, and nutritional traits. Disease and pest resistances were probably the traits that were most frequently targeted by such analyses. Results of many of these studies are summarized in the chapter “Genomic Approaches to Biotic Stresses” by Laosatit et al. in this book.
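The association step can be sketched in its simplest form as a single-marker analysis, in which the trait means of the two homozygous marker classes are compared at each marker. This is a simplification of the interval mapping methods normally used in QTL studies; the marker names, genotype calls, and seed weights below are hypothetical.

```python
# Single-marker analysis sketch: test each marker for a trait mean difference.
# All data are hypothetical; real QTL studies use dedicated mapping software.
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in trait means of two genotype classes."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def single_marker_scan(genotypes, trait):
    """genotypes: marker -> list of 'A'/'B' calls per line; trait: one value per line."""
    results = {}
    for marker, calls in genotypes.items():
        a = [y for g, y in zip(calls, trait) if g == "A"]
        b = [y for g, y in zip(calls, trait) if g == "B"]
        results[marker] = welch_t(a, b)
    return results

# Hypothetical 100-seed weights (g) of eight recombinant inbred lines
seed_weight = [6.1, 6.3, 5.9, 6.2, 4.1, 4.0, 4.3, 3.9]
calls = {
    "mk1": ["A", "A", "A", "A", "B", "B", "B", "B"],  # co-segregates with the trait
    "mk2": ["A", "B", "A", "B", "A", "B", "A", "B"],  # unlinked
}

for marker, t in single_marker_scan(calls, seed_weight).items():
    print(marker, round(t, 2))
```

A large absolute t value, as for mk1 here, indicates a marker whose genotype classes differ in trait mean; significance thresholds would have to account for the number of markers tested.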

Genome-wide association studies (GWAS) are a quantitative method to test whether a genomic variant (marker genotype) is associated with a trait of interest using a germplasm panel as experimental population. The method assumes that a specific property such as disease resistance, abiotic stress tolerance, or a nutritional trait shared by a subset of the germplasm panel is reflected by a specific marker genotype also shared by these individuals. The markers have to be in linkage disequilibrium with the genes conferring the trait. In comparison to QTL studies on bi-parental populations, GWAS have the advantage of working on germplasm panels and do not need specific mapping populations. Therefore, GWAS can analyze the function of all alleles and haplotypes present in the germplasm set under investigation, while QTL studies on bi-parental populations only take into account the alleles present in the mapping parents. The resolution of GWAS is generally higher than that of QTL analyses in bi-parental populations. Resolution depends on the number of recombination events that separate the investigated genotypes from each other. Bi-parental populations generally have undergone only a low number of recombination events until analysis, while germplasm panels have a long history of evolution and therefore individuals are usually separated from each other by many recombination events. One of the major drawbacks of GWAS compared to QTL mapping in bi-parental populations is that population structure influences the outcome of the study, although including population structure in the GWAS model can mitigate this effect. Furthermore, GWAS generally require a larger number of markers than mapping in bi-parental populations. The marker number needed depends on the linkage disequilibrium decay distance in the germplasm panel and is specific for the species and kind of population under investigation. However, modern genotyping technologies provide easy access to large numbers of molecular markers, helping to overcome this drawback.
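The relation between linkage disequilibrium and marker number can be made concrete with a small sketch. It computes the commonly used r² statistic between two loci from phased alleles and gives a rough lower bound for the number of evenly spaced markers needed when one marker per decay distance is required. The haplotype data are hypothetical, and the genome size of 500 Mb is a placeholder for illustration, not a measured value for mungbean; the decay distances of 100 kb and 60 kb are those reported above for the Australian panel.

```python
# Sketch: linkage disequilibrium (r^2) and a rough marker-number estimate.
# Haplotypes and genome size are hypothetical placeholders.
import math

def r_squared(locus_a, locus_b):
    """r^2 between two polymorphic loci from phased 0/1 alleles (one entry per haplotype)."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def min_markers(genome_size_bp, ld_decay_bp):
    """Lower bound: one evenly spaced marker per LD decay distance."""
    return math.ceil(genome_size_bp / ld_decay_bp)

snp1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp2 = [0, 0, 1, 1, 0, 1, 1, 0]  # always inherited together with snp1
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of snp1

print(r_squared(snp1, snp2), r_squared(snp1, snp3))

genome_size = 500_000_000  # placeholder genome size in bp
print(min_markers(genome_size, 100_000))  # cultivated panel, about 100 kb decay
print(min_markers(genome_size, 60_000))   # wild accessions, about 60 kb decay
```

The faster linkage disequilibrium decays, the more markers are needed to ensure that at least one marker is in disequilibrium with each causal locus. In practice, markers are not evenly spaced, so the real requirement is higher than this bound.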

To date, only a few genome-wide association studies have been undertaken in mungbean. This is probably due to the lack of high-quality phenotyping data for densely genotyped germplasm panels. A pilot GWAS on seed color in the Australian diversity panel revealed five genomic regions associated with this trait (Noble et al. 2018). Ongoing phenotyping efforts for mungbean diversity panels will likely lead to a broader application of GWAS to identify marker-trait associations. As the resolution of GWAS often goes down to the gene level (Liu and Yan 2019), these studies do more than provide markers associated with a trait of interest. They are likely to improve the knowledge of the genetic basis of traits, including the causative genes or alleles and their interactions (Hansen 2006).

The bottlenecks for successful GWAS are, as mentioned above, the availability of high-quality phenotypic data, the complexity of the trait of interest, and the size of the germplasm panel used for determining marker-trait associations. Even among associations that are statistically highly significant, false positives may still occur. The large number of statistical inferences, inaccurate genotyping, and too small population sizes make the results prone to errors (Liu and Yan 2019). As in QTL studies in bi-parental populations, the loci found to be associated in these studies need to be thoroughly validated before drawing conclusions on gene functions or using any of the markers apparently linked to a trait of interest in breeding. Candidate genes and alleles that are also found to be associated in a different population are more likely to be linked with the trait of interest. In addition, candidate gene knock-out or overexpression studies are suitable methods for validation.

Both QTL analysis and GWAS are appropriate to tag loci conferring a trait with markers, but both approaches are poor tools to analyze complex traits, where a large number of loci contribute to a trait, such as yield and many abiotic stress tolerances (Heffner et al. 2009). Genomic selection has become a powerful tool to use molecular markers for selection without associating specific markers with traits. Instead, phenotyping data and high-density genotyping information are used to calculate genomic breeding values for individuals (Heffner et al. 2009). In this way, thousands of loci along the whole genome are included in the analysis, which reflects the contribution of thousands of genes to complex traits. The method has been applied in animal breeding with great success (Hayes et al. 2009), is increasingly used in plant breeding (Voss-Fels et al. 2018), and also has potential in legume breeding (Mousavi-Derazmahalleh et al. 2019).
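The principle of calculating genomic breeding values from all markers simultaneously can be sketched with ridge regression, one of several statistical models used for genomic prediction. The example is a minimal illustration and not a method taken from the cited studies: the marker coding (-1/1 for the two homozygous classes), the training data, and the shrinkage parameter are hypothetical.

```python
# Genomic prediction sketch with ridge regression (all data hypothetical).

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_marker_effects(genotypes, phenotypes, lam=1.0):
    """Estimate all marker effects jointly: (X'X + lam*I) beta = X'(y - mean(y))."""
    k = len(genotypes[0])
    y_mean = sum(phenotypes) / len(phenotypes)
    y = [p - y_mean for p in phenotypes]
    xtx = [[sum(row[i] * row[j] for row in genotypes) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(genotypes, y)) for i in range(k)]
    return y_mean, solve(xtx, xty)

def gebv(genotype, y_mean, effects):
    """Genomic estimated breeding value: mean plus the sum of marker effects."""
    return y_mean + sum(g * e for g, e in zip(genotype, effects))

# Training set: six phenotyped lines genotyped at four markers
train_x = [[1, 1, 1, -1], [1, -1, -1, 1], [-1, 1, 1, 1],
           [-1, -1, -1, -1], [1, 1, -1, -1], [-1, -1, 1, 1]]
train_y = [11.5, 10.5, 9.5, 8.5, 11.5, 8.5]

y_mean, effects = ridge_marker_effects(train_x, train_y)

# Rank unphenotyped candidates by their predicted breeding value
candidates = {"c1": [1, 1, 1, 1], "c2": [-1, -1, -1, -1], "c3": [1, -1, 1, -1]}
ranking = sorted(candidates, key=lambda c: gebv(candidates[c], y_mean, effects),
                 reverse=True)
print(ranking)
```

No marker is declared significant at any point. All markers contribute to the prediction, which is why the approach suits traits controlled by many loci of small effect. Real applications involve thousands of markers and cross-validation of the prediction accuracy.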

8.3.5 Marker-Assisted Selection

In section “Molecular Markers in Plant Breeding,” some advantages were highlighted for selecting based on molecular markers that are tightly linked to traits instead of using trait values directly. The main advantages of marker-assisted selection are:

  1. MAS makes selection for traits that are difficult to measure easier

  2. It allows for selection of traits that are expressed during late developmental stages already at the seedling stage

  3. It eliminates environmental variability from the selection

  4. It makes selection of disease-resistant or abiotic stress-tolerant individuals independent of the presence of the biotic or abiotic stimuli (pathogens, pests, vectors, heat, etc.) required for selection

  5. It helps to maintain recessive alleles during backcrossing

  6. It facilitates pyramiding multiple traits, especially pyramiding multiple loci for the same trait.

Marker-assisted selection is designed to maintain introgressed loci in the population, while marker-assisted backcrossing (MABC) helps to introgress loci, generally from unadapted material, into an elite background. The introgressed fragment, in addition to the target gene, may contain genes that reduce the agronomic performance of the line. This effect is called linkage drag. Therefore, fragments that are as small as possible and contain as little genetic material from the donor line as possible in addition to the target gene are preferred. Selection with markers can help to reduce the linkage drag and accelerate the reestablishment of the recurrent parent genome. Overall, the efficiency of MABC depends on the kind of the introgressed gene, the recurrent and the donor parent, and the population size (Frisch and Melchinger 2005).

MABC combines foreground selection, using markers associated with the trait of interest, with background selection, using markers that pinpoint offspring with maximal recovery of the recurrent parent genotype. The foreground selection monitors the presence of the introgressed fragment in the progeny. Marker-assisted foreground selection with co-dominant markers such as SSRs or SNPs that are tightly associated with the trait of interest is particularly practical for traits that are not expressed in the heterozygous state or are difficult to score. Introgression of a small donor segment requires recombination on both sides of the target locus. This event can best be monitored with a marker pair tightly flanking the target gene, and not with a single linked marker. The literature recommends that markers flank the target gene at a distance of at most 5 cM, but new marker technologies that allow for much greater marker densities enable the choice of more tightly linked markers to further reduce linkage drag.

Recombinant selection in foreground selection involves identifying backcross progeny with recombination events as near as possible to the target locus, to reduce the size of the donor chromosome segment containing the target locus and reduce linkage drag. As selection is applied on the target locus, there will be less recombination around the donor fragment than for unlinked regions (Hospital 2001). As double crossover events occurring on both sides of the introgressed fragment are rare, the donor segment can remain very large, even after many backcross generations. The population size for backcrossing has to be adjusted to the crossover probability. The probability of a double crossover can be calculated as the product of the probabilities of a single recombination on each side of the introgressed fragment. For close markers, however, the probability of double crossovers is much lower than the probabilities of single crossovers (Young and Tanksley 1989). Producing a very large number of backcross plants would be necessary to achieve recombination on both sides of the gene in one cross. However, instead of working with a very large population, it is advantageous to select a single recombinant on one side in the first backcross generation and then to select the recombinant on the other side in a second backcross generation (Young and Tanksley 1989). In summary, the distances between the flanking markers and the target gene, the population size during backcrossing, and the number of backcrosses are critical for reducing linkage drag (Hospital 2001).
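The advantage of the two-step strategy can be illustrated with a rough calculation. The sketch below assumes flanking markers at 5 cM on each side of the target gene, treats 1 cM as a recombination frequency of 1%, ignores crossover interference, and does not account for the additional requirement that the plant carries the target allele. These are simplifying assumptions, so the numbers indicate orders of magnitude only.

```python
# Population size needed to recover recombinants next to the target locus.
# Assumptions: 1 cM ~ 1% recombination, no interference, 95% success probability.
import math

def min_population(p_event, confidence=0.95):
    """Smallest n with P(at least one plant shows the event) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_event))

r_left = 0.05   # recombination frequency between left marker and target gene
r_right = 0.05  # recombination frequency between right marker and target gene

# One step: recombination on both sides in the same backcross generation
p_double = r_left * r_right
one_step = min_population(p_double)

# Two steps: one side in the first backcross, the other side in the second
two_step = min_population(r_left) + min_population(r_right)

print("one generation:", one_step, "plants")
print("two generations:", two_step, "plants in total")
```

Under these assumptions, the one-step approach requires roughly ten times more plants than the two-step approach, at the cost of one additional backcross generation.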

Background selection in MABC involves selecting backcross progeny with the greatest proportion of recovered recurrent parent genome, using markers that are unlinked to the target locus and can be used to select against the donor genome. Background selection aims at accelerating the recovery of the recurrent parent genome. Without markers, the recurrent parent genome can be reconstituted to about 97% within four backcross generations (Frisch et al. 1999), but selection for the introgressed trait affects the recovery and increases the number of required backcrosses by at least one cycle (Frisch et al. 1999). It was proposed to start with a large backcross population to increase the chance of identifying an individual that has recovered the recurrent parent genome to the greatest possible extent, and to reduce the population size in the following generations. Simulation studies estimating the trade-offs between population size, MABC efficiency, and costs are available and suggest steps to optimize MABC (Ribaut et al. 2002).
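The recovery figure for backcrossing without markers follows from the expectation that each backcross halves the remaining share of donor genome. A minimal sketch, assuming no selection and ignoring linkage to the target locus:

```python
# Expected recurrent parent genome fraction after n backcrosses without selection.
# The F1 carries 50% recurrent parent genome; each backcross halves the donor share.

def expected_recurrent_fraction(n_backcrosses):
    return 1 - 0.5 ** (n_backcrosses + 1)

for n in range(1, 7):
    print(f"BC{n}: {expected_recurrent_fraction(n):.1%}")
```

These values are averages over the progeny. Individual plants deviate from the expectation, and background selection exploits this variation by choosing plants that recovered more of the recurrent parent genome than average.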

For mungbean, marker-assisted backcrossing efforts have not yet been reported, but the availability of markers associated with important traits makes it likely that this technology will also be used in mungbean improvement.

8.3.6 Pyramiding Multiple Traits in Breeding Lines

Pyramiding is the process of combining several genes in a single genotype. Conventional breeding also applies gene pyramiding, but it is usually laborious and time consuming to check the results of this approach by phenotypic tests. For example, to improve agronomic properties, breeders combine multiple disease resistances in elite lines. Checking resistance to multiple diseases is laborious and requires multiple testing environments. The efficiency of this process can be enhanced by marker-assisted selection. To increase the durability of disease resistances, breeders pyramid various resistance genes from different sources (Hanson et al. 2016). Using conventional phenotypic selection, identification of stacked resistance genes is only possible when pathogen races that can detect specific resistance genes are available. In contrast, molecular markers greatly facilitate gene pyramiding, as they can be designed to be specific for each single resistance gene. Early selection by molecular markers also helps to keep the breeding populations small during gene pyramiding. However, in mungbean, no marker-assisted gene pyramiding efforts have been reported so far.

8.3.7 Genomic Selection

Access to marker resources opens up new methods for selecting favorable genotypes, if sufficient phenotypic data of the organism under investigation are available. As outlined in section “Developing markers linked to traits of interest,” genomic selection is gaining momentum in crop breeding. It is thought that genomic selection is particularly advantageous for selecting favorable genotypes for complex, multigenic traits. However, the technology requires datasets from different environments and over a number of generations. These datasets are not yet available for mungbean, so it will still take some time before this technology can be applied to this crop.

8.3.8 Constraints to Successful Marker-Assisted Selection

Great investments in marker-assisted selection, primarily in the private sector, have resulted in several improved varieties for a range of crops including cereals, oil seed crops, cotton, legumes, and vegetables. Naturally, for minor crops such as mungbean, there has been much less investment in generating breeding tools, including genomic resources for breeding. But the available whole genome sequence for mungbean, germplasm panels displaying the diversity of the crop, and coordinated breeding activities such as the Australian Centre for International Agricultural Research (ACIAR)-funded International Mungbean Breeding Network (https://www.aciar.gov.au/project/CIM/2014/079) make marker-assisted breeding accessible for mungbean as well. In particular, disease and pest resistances are likely to be tackled by marker-assisted breeding in the very near future. Marker-assisted breeding for complex traits such as abiotic stress tolerance in mungbean will probably take longer, as it requires putting in place the phenotypic datasets needed to make use of molecular breeding for complex traits.

Cost savings compared to classical breeding are often mentioned as an advantage of marker-assisted selection. Nevertheless, for some breeding programs, the investment required for marker-assisted selection may still be an issue. Some early studies reported cases where marker-assisted selection was less cost-effective than phenotypic selection (Bohn et al. 2001; Dreher et al. 2003). In the meantime, the costs for genotyping have dropped, but the investment for molecular breeding may still be relatively high for small programs working on minor crops. Especially in developing countries, breeders may not have easy access to cost-effective genotyping infrastructure, and low labor costs may make field evaluations cheaper than laboratory work that requires relatively expensive consumables. However, the accelerated release of an improved crop variety may translate into more rapid profits. Therefore, if the additional income generated by improved varieties along the mungbean value chain over time is considered, calculations analyzing the costs and benefits of marker-assisted selection in plant breeding will most probably show that this technique is worthwhile for mungbean as well.

Lack of access to molecular markers does not limit marker-assisted breeding in mungbean anymore, as technologies to obtain large numbers of markers at low cost are available. High-quality phenotypic data are being produced and expertise to combine phenotypic and genotypic datasets is available in mungbean breeding teams. Therefore, the first improved mungbean varieties produced by marker-assisted selection are in sight.