
1.1 Introduction

1.1.1 Economic Importance of the Crop

Soybean [Glycine max (L.) Merr.] is one of the world's most important oilseed crops and a major source of protein and oil for both humans and animals. It also serves as a raw material for several human health and industrial applications. Besides edible oil (18–22%), the seed contains around 38–45% protein, together with carbohydrates, ash, minerals and antioxidants, which are major components with potential nutraceutical applications. Hence, soybean has been gaining wide attention in industries such as food, feed, wellness and pharmaceuticals, owing to its unique composition of minerals, isoflavones, tocopherols, etc. Ecologically, soybean contributes to biological nitrogen fixation and thereby improves soil fertility. Considering its diverse uses, the crop is aptly named the “miracle bean”. Although the crop is cultivated globally, the United States of America, Brazil, Argentina, China and India are the major producers. Because the crop supplies raw material to multiple sectors, sustainable soybean production is imperative for ensuring global food security.

1.1.2 Reduction in Yield and Quality Due to Abiotic Stresses

Soybean production worldwide faces multiple abiotic stressors, including drought, elevated temperature, freezing conditions, flooding, soil salinity, and soil acidity with its consequent mineral toxicity or nutrient deficiency. The more frequent weather extremes anticipated under global climate change are a further serious concern for sustainable soybean production. Millions of acres of soybean are reported to be lost every year to abiotic factors. These losses demand strategies to increase soybean yield, or at least maintain yield stability, under a multitude of abiotic stresses. Therefore, genomic design of soybean for climate resilience and sustainable production with higher yield potential and nutritional value is essential.

1.1.3 Growing Importance in the Face of Climate Change and Increasing Population

Its multitude of biochemical characteristics and good-quality oil make soybean a desirable oilseed crop, and demand for it is rising worldwide. Nevertheless, the need to double food production by 2050 to feed a growing population will put severe pressure on the production of oilseed crops, even more so under changing climatic conditions (Deshmukh et al. 2014). Temperature, precipitation and solar radiation are the main drivers of crop growth and development, so climate change and extreme weather conditions negatively impact crop yield. Breeders are therefore expected to develop soybean genotypes that are not only high yielding and nutritionally superior but also able to tolerate extreme weather conditions.

1.1.4 Limitations of Traditional Breeding and Rationale of Genome Designing

Conventional plant breeding strategies such as single pod descent, backcross breeding, pedigree breeding and bulk population breeding have undoubtedly contributed to improved soybean yield and tolerance to various abiotic stresses. Nevertheless, these strategies are time consuming and require screening of very large plant populations, which consumes land, labour and water resources. Moreover, breeding for complex traits that are governed by multiple genes is strongly influenced by the environment. Further, the complexity of multiple abiotic stresses affecting the standing crop under climate change has instilled a new sense of urgency in accelerating the rate of genetic gain in molecular breeding programs. Hence, conventional breeding efforts notwithstanding, it is imperative to integrate genome designing-based breeding approaches to enhance the production potential of soybean. To advance soybean breeding, it is indispensable to exploit molecular breeding techniques such as marker-assisted breeding, recombinant DNA technology, genome editing and multiple “omics” approaches to improve soybean quality and yield. These limitations of traditional breeding strategies thus warrant the large-scale application of genomics in the improvement of soybean for abiotic stress tolerance.

1.2 Abiotic Stresses and Related Traits in Soybean

1.2.1 Root Characters

Given the water-deficit and flooding stresses associated with climate change, root characteristics are important for soybean to tide over abiotic stress. Studies of soybean root architecture indicate that a narrow root angle to the soil surface is preferred, as it enhances the development of lateral roots in the upper root regions where penetration of sunlight is ample. Other root traits, such as the number of forks and crossings, are important for good soil penetration, coupled with appreciable root length density (RLD) resulting from enhanced root surface area and root volume. These root features are important for the absorption of soil moisture under stress conditions. Nonetheless, deeper soybean roots have not yielded the desired results when the soil is shallow, when there is no water at depth, or under mild water stress (Vadez et al. 2015).

1.2.2 Drought Tolerance

An estimated 40% reduction in soybean production worldwide is attributed to reduced water supply, and such losses are expected to worsen with the more frequent droughts and water shortages anticipated under future climate change. Expanding irrigation is not a viable approach given the limited resources of many developing countries. This makes the development of drought-tolerant varieties an urgent research priority. Drought reduces the economic yield of soybean by about 40% (Specht et al. 1999); however, depending on the intensity of water-deficit stress and the growth stage at which it occurs, yield losses can be as high as 80%. Phenotyping for drought resistance assumes significance in this context, wherein physiological and biochemical aspects of dehydration avoidance and dehydration tolerance are measured. Breeding for drought tolerance depends on phenotyping methods that are reliable, relatively fast and economical. Generally, dehydration avoidance is measured by assessing plant water status in terms of visual symptoms of leaf senescence and relative water content, and by analyzing other constitutive traits such as root architecture attributes.
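Relative water content, named above as an indicator of plant water status, is commonly calculated from the fresh, turgid and dry weights of a leaf sample. The sketch below uses that standard formula, which the chapter itself does not spell out; the function name and the sample weights are hypothetical and serve only as illustration.

```python
def relative_water_content(fresh_g: float, turgid_g: float, dry_g: float) -> float:
    """Return leaf relative water content (%) as (FW - DW) / (TW - DW) * 100."""
    if turgid_g <= dry_g:
        raise ValueError("turgid weight must exceed dry weight")
    return (fresh_g - dry_g) / (turgid_g - dry_g) * 100.0


# Hypothetical leaf-disc weights in grams (illustrative values, not measured data)
well_watered = relative_water_content(fresh_g=0.90, turgid_g=1.00, dry_g=0.20)
stressed = relative_water_content(fresh_g=0.62, turgid_g=1.00, dry_g=0.20)
print(f"well watered: {well_watered:.1f}%  stressed: {stressed:.1f}%")
```

A lower value for the stressed sample indicates a larger water deficit in the tissue, which is why the measure suits rapid screening of many genotypes.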

1.2.3 Flooding and Submergence Tolerance

Waterlogging/flooding is the most deleterious abiotic stress after drought. Flooding affects plant health and yield in about 16% of the soybean production area and causes severe economic losses. In the US alone, flooding stress in soybean causes an annual loss of approximately $1.5 billion (Boyer 1982, 1983; Oosterhuis et al. 1990; Rosenzweig et al. 2002; Bailey-Serres et al. 2012; Ahmed et al. 2013). Flooding stress may arise from submergence or waterlogging, although submergence is rare in soybean (Oosterhuis et al. 1990; VanToai et al. 1994; Linkemer et al. 1998). Waterlogging reduces root and shoot growth, atmospheric nitrogen fixation, photosynthetic potential, stomatal conductance and nutrient uptake; consequently, it severely reduces soybean yield and may cause plant death under severe conditions (Sullivan et al. 2001; Shannon et al. 2005; VanToai et al. 2012; Rhine et al. 2012; Wu et al. 2017a).

1.2.4 Heat Tolerance

Yield reduction in soybean due to extreme temperature conditions has been estimated at around 40% (Specht et al. 1999). Heat stress during the vegetative stage affects the growth of soybean. Soybean is highly sensitive to elevated temperatures (>35 °C) during the reproductive stages: heat stress in the early stages causes flower and pod abortion, whereas prolonged heat stress during pod filling leads to a severe reduction in seed size and seed vigour (Boyer 1982; Chebrolu et al. 2016). Improving the heat tolerance of soybean varieties is therefore crucial for improving yield.

1.2.5 Cold Tolerance

To expand soybean cultivation beyond its traditional strongholds, it is essential to impart cold tolerance so that cultivars can adapt to low-temperature conditions. The effects of low temperature on soybean include poor germination, reduced seedling vigour, flower abortion and poor grain filling at the reproductive stages (Yamamoto and Narikawa 1966). Northern latitudes are characterized by short growing seasons, and hence efforts are required to develop soybean varieties with traits such as good emergence and early seedling vigour. Seedling emergence and early seedling weight are the traits typically evaluated in soybean germplasm. Genetic dissection of these traits and their introgression into cultivated varieties through marker-assisted breeding programmes is a viable approach to enable the growth of soybean in northern regions.

1.2.6 Salinity Tolerance

Salinity stress severely affects soybean yield. High salinity seriously damages the crop throughout its life cycle, and even low salt levels can cause a significant reduction in yield (Abel and Mackenzie 1964; Pitman and Läuchli 2002). Agronomic traits significantly reduced under high salinity include plant height, leaf size, biomass, number of pods plant−1, number of internodes plant−1, number of branches plant−1, weight plant−1 and 100-seed weight (Shao et al. 1986; Shao et al. 1993; Parida and Das 2005; Blanco et al. 2007; Bustingorri and Lavado 2011; El-Sabagh et al. 2015). Salt stress during the nodulation stage greatly reduces the efficiency of biological nitrogen fixation, as a severe reduction in the number and biomass of root nodules has been documented (Singleton and Bohlool 1984; Rabie and Kumazawa 1988; Yang and Blanchar 1993; Delgado et al. 1994; Elsheikh and Wood 1995). Soybean germplasm displays a spectrum of salt tolerance (Yang et al. 1993; Pitman and Läuchli 2002; Lenis et al. 2011).

1.3 Genetic Resources of Resistance/Tolerance Genes

Wild and cultivated soybean species display diverse morphological, cytological and genetic features and offer a wide array of genetic sources of resistance to multiple biotic and abiotic stresses. Wild species thus form an important component of the gene pool for the exploration of useful genes and alleles conferring abiotic stress tolerance. The annual and perennial soybean species are only distantly related. Wild perennial Glycine species offer immense potential for soybean improvement. The genus Glycine has two subgenera, Glycine Willd. (perennial) and Soja (Moench) F.J. Herm (annual). The subgenus Soja includes two species: the cultivated soybean [G. max (L.) Merr.] and its wild annual progenitor G. soja Sieb. & Zucc. (Ratnaparkhe et al. 2010). The subgenus Glycine comprises 30 wild perennial species. The genetic resources of soybean may be categorized into four plausible gene pools (GP).

1.3.1 Soybean GP-1

Soybean gene pool 1 (GP-1) comprises all biological species that can be crossed with one another to yield vigorous hybrids with normal meiotic chromosome pairing and full seed fertility. All soybean (G. max) germplasm and the wild soybean, G. soja, constitute GP-1, with the qualification that seed sterility can be associated with chromosomal structural changes such as inversions and translocations. Gene segregation is normal and gene exchange is generally easy.

1.3.2 Soybean GP-2

GP-2 includes those species that can hybridize with GP-1 with relative ease and whose resultant F1 plants exhibit at least some seed fertility (Harlan and de Wet 1971). Glycine max has no GP-2 because no known species exhibits such a relationship with soybean. Although such species may exist in Southeast Asia, where the genus Glycine may have originated, extensive germplasm exploration is needed to validate this suggestion.

1.3.3 Soybean GP-3

GP-3 is a potential genetic resource of soybean even though hybrids between GP-1 and GP-3 are lethal. Gene transfer between GP-1 and GP-3 is not possible without resorting to in vitro culture-based techniques such as embryo rescue (Harlan and de Wet 1971). GP-3 includes the 26 wild perennial species of the subgenus Glycine, which are indigenous to Australia and remain geographically isolated from G. max and G. soja. Three species, G. argyrea, G. canescens and G. tomentella, have been successfully hybridized with soybean, with embryo culture-based rescue techniques ensuring the survival of the F1 hybrids. However, little progress has been made beyond the amphidiploid stage, with the exception of Singh et al. (1998a, b), suggesting that only these three species belong to GP-3.

1.3.4 Soybean GP-4

GP-4 is considered the extreme outer limit of the genetic resources of soybean. Several pre- and post-hybridization barriers arrest embryo development, resulting in premature embryo abortion. Bridge crosses within the genus Glycine could circumvent the problems of seedling lethality, seed inviability and inviable F1 plants (Singh et al. 2007). The utility of GP-4 is nonetheless restricted, as only a few wild perennial Glycine species have been hybridized with soybean. The majority of species thus belong to soybean GP-4, as they either have not been hybridized with GP-1 or, when hybridized, did not produce viable F1 plants (Singh et al. 1987). Although the wild perennial species exhibit resistance to several biotic and abiotic stresses, the transfer of useful genes into soybean has not been accomplished. Breeders and geneticists therefore currently have access only to the primary gene pool for expanding the germplasm base.

1.4 Classical Genetics and Traditional Breeding of Abiotic Stress Tolerance in Soybean

1.4.1 Classical Mapping Efforts

Plant breeders have worked continuously on the genetic improvement of soybean. The crop encounters various biotic and abiotic stresses, and improving its tolerance to these stresses, along with seed composition traits, is therefore pertinent. Improving the agronomic performance of the crop would ensure higher productivity and greater consumption of soybean and soy products, leading to greater economic benefits. Plant breeders have traditionally used hybridization and meticulous selection to obtain better-performing genotypes, resulting in the development of many soybean varieties. Classical genetics and traditional breeding approaches have been used to develop tolerance to drought, waterlogging, salinity and other abiotic stresses. Table 1.1 lists the soybean lines and resources used for the genetic improvement of abiotic stress tolerance.

Table 1.1 Potential genetic resources for abiotic stress tolerance

1.4.2 Breeding Objectives

Designing highly productive genotypes for water-limited conditions is an important breeding objective. It requires the introgression of physiological traits that define plant water relations and hydraulic processes into a common genetic background (Satpute et al. 2021). Water deficit is the outcome of a complex interplay of several factors, including low soil moisture, extreme temperatures and other edaphic factors. Breeding promising soybean genotypes through the transfer of gene(s) conferring drought tolerance is an effective approach to alleviating the ill effects of drought. Under drought stress, soybean plants suffer oxidative injury, membrane damage, cellular ion leakage, protein denaturation, and reduced photosynthetic and CO2 uptake rates, which in turn reduce biomass accumulation and yield. Among the various physiological processes, photosynthesis in particular is severely down-regulated under drought, with wide ramifications for the economic yield of the crop.

Breeding approaches to develop drought-tolerant soybean involve diverse strategies, namely recurrent selection and evaluation of segregating populations under imposed drought stress, use of secondary traits for efficient selection, molecular breeding, and genomics-based and transgenic technologies. Advanced phenotyping-based breeding approaches are a prerequisite and are being adopted systematically by developing early-generation biparental, backcross or multi-parent intercross populations from identified drought-tolerant candidate parental lines and widely adapted high-yielding varieties. The populations are advanced through the F2 generation by mass selection, in which bulks are subjected to chemical desiccation with 0.2% potassium iodide (Blum et al. 1991; Bhatia et al. 2014; Satpute et al. 2019), followed by selection. Mass selection for large seeds following chemical desiccation significantly improved seed weight and grain yield under chemical desiccation stress compared with a control in which seed selection was performed without chemical desiccation (Blum 2011). After two cycles of selection, candidate genotypes are intensively evaluated for multiple drought tolerance-related traits in advanced generations using a three-tier selection scheme followed by multi-trait indexing. Figure 1.1 shows the scheme involved in developing drought-tolerant soybean (Satpute et al. 2021). Development of soybean varieties with enhanced tolerance to drought, heat, salinity and cold has become a high research priority in major breeding programs worldwide (Fig. 1.2).

Fig. 1.1 Workflow for developing drought tolerant parental lines and varieties specially for target population of environments (TPE) of drought conditions occurring at seed filling growth stage

Fig. 1.2 a and b Screening of soybean germplasm under field condition for tolerance to water logging

1.5 Genetic Diversity Analysis

1.5.1 Phenotype and Genotype Based Diversity Analysis

During the past three decades, genetic diversity studies in soybean have been dominated by phenotype-based diversity analysis, cytogenetics and molecular studies, including variation in isozymes and seed proteins and the use of restriction fragment length polymorphism (RFLP), random amplification of polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers. Geographic differentiation and genetic diversity in Chinese cultivated soybean were studied using the coefficient of parentage (Cui et al. 2000a), morphological traits (Dong et al. 2004), SSR markers (Li et al. 2008a; Wang et al. 2008a; Li et al. 2010; Wang et al. 2015) and SNP markers (Kajiya-Kanegae et al. 2021; Saleem et al. 2021). Diversity analysis of Asian soybean landraces and North American cultivars revealed a lower level of diversity in the American pools than in the Asian pools, based on phenotypic characterization (Cui et al. 2000a, 2001) or coefficient of parentage analysis (Cui et al. 2000b). This low diversity was further confirmed by DNA sequence-based analyses showing successive genetic bottlenecks between wild and cultivated soybeans and between Asian landraces and North American cultivars (Hyten et al. 2006). Genetic diversity studies in soybean have been discussed in detail by Carter et al. (2004). A comparison of Chinese and American soybean accessions using high-density SNPs, through population structure and cluster analyses, revealed that the genetic basis of Chinese soybeans is entirely distinct from that of US soybeans (Liu et al. 2017).

1.5.2 Relationship with Other Cultivated Species and Wild Relatives

Comprehensive biosystematic analyses of relationships among all species in the genus Glycine reveal that annual (subgenus Soja) and perennial (subgenus Glycine) soybean species are only distantly related (Doyle et al. 2003), having diverged from a common ancestor around 5 MYA (Innes et al. 2008). As stated above in the genetic resources section, several attempts to hybridize species of the subgenus Soja with those of the subgenus Glycine were unsuccessful. Although pod development was initiated, the pods resulting from interspecific hybridization eventually aborted and abscised (Ladizinsky et al. 1979a, b). Later, the intersubgeneric F1 hybrids of G. max × G. clandestina, G. max × G. tomentella and G. max × G. canescens were successfully obtained either by embryo rescue (Newell and Hymowitz 1982; Singh and Hymowitz 1985; Singh et al. 1987) or by using transplanted endosperm as a nurse layer (Broué et al. 1982). In general, cultivated soybean hybridizes only imperfectly with members of the subgenus Glycine. The progeny of such intersubgeneric hybrids were obtained with great difficulty and were completely sterile. Studies have shown that cultivated soybean does not hybridize with any of its wild relatives in other genera of the tribe (Hymowitz et al. 1995; Hymowitz 2004). The wild soybean (G. soja) has accumulated rich genetic diversity in the process of evolution and adaptation (Kofsky et al. 2018). This adaptive evolutionary process has resulted in wide diversification of traits in wild soybean, including flower, pubescence, seed and hilum color, disease and insect resistance, and physiological and biochemical traits (content of protein, oil, carbohydrates and their constituents) (Boerma and Specht 2004).

1.6 Association Mapping Studies

1.6.1 Extent of Linkage Disequilibrium

Linkage disequilibrium (LD) is the non-random association of alleles at different loci within a population. Variation in LD, either at the genome scale or in a particular genomic region, is influenced by various factors such as mutation, domestication, level of inbreeding and selection, confounding effects, population admixture and population substructure (Rafalski and Morgante 2004). A strong correlation between inter-locus distance and LD is expected if recombination rates do not vary across the genome, particularly in a population of constant size. Soybean, a self-pollinated species with a largely homozygous genetic background and therefore low effective recombination, exhibits slow LD decay (i.e., longer regions remain in LD).
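A common pairwise measure of LD is the squared allele-frequency correlation r2 between two biallelic loci. The following minimal sketch assumes fully homozygous inbred lines, so that each line contributes one haplotype and alleles can be coded 0 or 1; the function name and this coding are illustrative assumptions rather than a method described in the chapter.

```python
from typing import Sequence


def ld_r2(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Squared allele-frequency correlation (r^2) between two biallelic loci.

    Each element is the allele (0 or 1) carried by one inbred line at that locus.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be scored on the same, non-empty set of lines")
    p_a = sum(locus_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical genotypes for four inbred lines
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))   # alleles always co-occur
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))   # alleles combine at random
```

Plotting r2 against the physical distance between marker pairs gives the LD decay curve; slow decay means that fewer markers are needed to tag a region, at the cost of lower mapping resolution.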

1.6.2 Genome-Wide LD Studies

SNPs are the markers of choice owing to their abundance in the genome and are hence useful in genetic diversity studies and in determining genetic relatedness among individuals. To investigate the frequency of SNPs in the soybean genome, ~28.7 kbp of coding region, 37.9 kbp of non-coding perigenic region and 9.7 kbp of random non-coding genomic regions were evaluated in 25 diverse soybean genotypes (Zhu et al. 2003). This study showed that the nucleotide diversity (θ) in coding and in non-coding perigenic DNA was 0.00053 and 0.00111, respectively. The combined nucleotide diversity of the whole sequence was 0.00097. Squared allele frequency correlations (r2) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. A haplotype map of soybean (GmHapMap) was constructed using whole-genome sequence data from 1007 Glycine max accessions, yielding 14.9 million variants as well as 4.3 million tag SNPs (Torkamaneh et al. 2021). A lower level of genome-wide genetic diversity was observed in soybean than in other major crops. Genome-wide LD investigations in soybean have facilitated the identification of molecular markers and key genes associated with various abiotic stresses.

1.6.3 Genome Wide LD Studies for Drought Tolerance

Quantitative trait locus (QTL) mapping using bi-parental populations is limited by the restricted allelic diversity of the parental genotypes and by low mapping resolution. The allelic diversity among mapping populations can be increased to some extent by using multi-parental crosses (Deshmukh et al. 2014). The genome-wide association study (GWAS) approach provides opportunities to explore the tremendous allelic diversity present in soybean germplasm. Because millions of crossing events and natural mutations have been fixed in the germplasm during evolution, the mapping resolution of GWAS is comparatively higher. Recent advances in sequencing have played an important role in enabling genome-wide association studies (Abdurakhmonov and Abdukarimov 2008). GWAS is now routinely used in soybean and other plant species; however, relatively few studies have been reported for abiotic stresses. GWAS for quantitative traits such as drought tolerance is prone to confounding by population structure. GWAS models such as the mixed linear model (MLM) and compressed mixed linear model (CMLM) have been developed to account for population structure and kinship and thereby reduce spurious allelic associations (Deshmukh et al. 2014). Recent developments in statistical tools, together with larger sets of genotypes and high-throughput genotyping approaches, are expected to improve the power of GWAS.
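The confounding effect of population structure can be illustrated with a toy calculation. The sketch below is not an MLM or CMLM: it is a simplified fixed-effect analogue in which subpopulation membership is removed by centring marker and trait values within each subpopulation before estimating the marker effect, and it ignores kinship altogether. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (marker effect estimate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("marker shows no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def centre_within_groups(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """Subtract each subpopulation's mean, i.e. fit subpopulation as a fixed effect."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group] for value, group in zip(values, groups)]


# Hypothetical panel: two subpopulations that differ in allele frequency AND trait mean,
# while the marker has no effect on the trait within either subpopulation.
groups = ["A"] * 4 + ["B"] * 4
marker = [0, 0, 0, 1, 1, 1, 1, 0]
trait = [10, 11, 9, 10, 20, 21, 19, 20]

naive = slope(marker, trait)
adjusted = slope(centre_within_groups(marker, groups), centre_within_groups(trait, groups))
print(f"naive effect: {naive:.2f}, structure-adjusted effect: {adjusted:.2f}")
```

The naive estimate reflects only the difference between subpopulations, which is the kind of spurious association that structure-aware models are designed to suppress. A full MLM additionally models relatedness among individuals as a random effect.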

Dhanapal et al. (2015) analyzed a population of 373 genotypes in four environments for carbon isotope ratio (δ13C), an important physiological trait linked with water use efficiency (WUE). They found 39 associated SNPs tagging 21 different loci involved in drought tolerance. Similarly, Kaler et al. (2017) reported 54 SNPs associated with δ13C and 47 SNPs associated with δ18O. These SNPs tagged 46 and 21 putative loci for δ13C and δ18O, respectively. Several markers and loci have been reported for various drought-related traits, viz., chlorophyll fluorescence (Hao et al. 2012; Herritt et al. 2018), canopy temperature (Kaler et al. 2017), delayed canopy wilting (Steketee et al. 2020) and drought susceptibility index (Chen et al. 2020) (Table 1.2). A GWAS for drought tolerance was also reported using 259 Chinese cultivars. This investigation was based on a total of 4,616 SNPs and identified 15 SNP-trait associations, among which three SNPs were suggestively associated with two of the drought-tolerance indices (Liu et al. 2020a).
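Indices such as the drought susceptibility index convert paired yields under stress and control conditions into a single phenotype per genotype that can be used in GWAS. The sketch below uses one widely used formulation, in which each genotype's relative yield loss is divided by the mean relative yield loss across the panel; whether the cited studies used exactly this form is not stated here, and the function name and yield values are hypothetical.

```python
from typing import Dict


def drought_susceptibility_index(
    stress_yield: Dict[str, float], control_yield: Dict[str, float]
) -> Dict[str, float]:
    """Per-genotype index: (1 - Ys/Yc) / (1 - mean(Ys)/mean(Yc))."""
    genotypes = list(control_yield)
    mean_stress = sum(stress_yield[g] for g in genotypes) / len(genotypes)
    mean_control = sum(control_yield[g] for g in genotypes) / len(genotypes)
    stress_intensity = 1.0 - mean_stress / mean_control
    if stress_intensity == 0:
        raise ValueError("no average yield reduction; index is undefined")
    return {
        g: (1.0 - stress_yield[g] / control_yield[g]) / stress_intensity
        for g in genotypes
    }


# Hypothetical plot yields (g per plot) for two genotypes
control = {"G1": 30.0, "G2": 30.0}
stress = {"G1": 24.0, "G2": 12.0}
for genotype, dsi in drought_susceptibility_index(stress, control).items():
    print(genotype, round(dsi, 2))
```

Values below 1 indicate a genotype that loses proportionally less yield than the panel average, values above 1 a more susceptible one.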

Table 1.2 Details of genome wide association studies (GWAS) carried out for abiotic stress tolerance

1.6.4 Genome-Wide Association Mapping for Flooding Tolerance

Genome-wide association mapping has advantages over bi-parental QTL mapping because it exploits historical and evolutionary recombination (Zhu et al. 2008). Yu et al. (2020) conducted GWAS in a panel of 347 soybean genotypes to identify SNPs associated with seed-flooding tolerance-related traits, viz., germination rate (GR), normal seedling rate (NSR) and electrical conductivity (EC). Using 60,109 SNPs, three major quantitative trait nucleotides (QTNs), viz., QTN13, qNSR-10 and qEC-7-2, were identified as linked to these traits. Further, QTN13 was consistently identified for all three traits and in multiple environments. Wu et al. (2020a) applied GWAS in a panel of 384 soybean lines, using 42,291 SNP markers and several models, viz., general linear model (GLM), mixed linear model (MLM), compressed mixed linear model (CMLM) and enriched compressed mixed linear model (ECMLM), to dissect flooding tolerance. This resulted in the identification of 14 SNPs associated with flooding tolerance across all environments and models (Table 1.2).

1.6.5 Genome-Wide Association Mapping for Salt Tolerance

Seed germination under salt stress was used for an association mapping study by Kan et al. (2015). Under salt stress, three loci significantly associated with three traits, namely the ratio of imbibition rate, the ratio of germination index and the ratio of germination rate, were identified and mapped to chromosomes Gm08, Gm09 and Gm18. In another GWAS, Zeng et al. (2017a) used 283 diverse lines of worldwide soybean accessions and identified eight genetic loci (mapped to chromosomes Gm02, Gm07, Gm08, Gm10, Gm13, Gm14, Gm16 and Gm20) associated with leaf chloride and leaf chlorophyll concentrations. Huang et al. (2018) used a diverse set of 192 soybean germplasm accessions and identified six genomic regions (on Gm02, Gm03, Gm05, Gm06, Gm08 and Gm18) associated with salt tolerance based on visual leaf scorch score. The study by Do et al. (2019), using two GWAS populations for association mapping of salt tolerance, confirmed the major locus on chromosome Gm03 and identified three novel loci on Gm01, Gm08 and Gm18. Several SNPs were found to be significantly associated with the traits leaf scorching score (LSS), chlorophyll content ratio (CCR), leaf sodium content (LSC) and leaf chloride content (LCC) (Do et al. 2019). Zhang et al. (2019) identified genomic regions associated with salt tolerance at the germination stage and showed that 18 significant SNPs were located on chromosomes Gm08 and Gm18. Seventeen of the 18 significant SNPs were located in a major QTL, qST-8, which was identified by linkage mapping in recombinant inbred lines (RILs) (Zhang et al. 2019). Although GWAS for salinity stress are relatively few in soybean, besides confirming major genetic determinants identified by linkage mapping, they have provided information on tolerance loci from new germplasm sources, which is useful for QTL pyramiding (Table 1.2).

1.7 Molecular Mapping of Tolerance Genes and QTLs

1.7.1 Brief History of Molecular Mapping in Soybean

The first reported use of molecular markers in soybean was the application of restriction fragment length polymorphism (RFLP) to assess the genetic diversity of the soybean nuclear genome (Apuya et al. 1988). Subsequently, RFLP markers were used extensively for genetic diversity analysis (Keim et al. 1989; Skorupska et al. 1993; Lorenzen et al. 1995) and linkage mapping (Keim et al. 1990; Diers et al. 1992; Lark et al. 1993; Akkaya et al. 1995; Shoemaker and Specht 1995; Mansur et al. 1996; Keim et al. 1997; Cregan et al. 1999; Ferreira et al. 2000; Yamanaka et al. 2001; Lightfoot et al. 2005) until SSR and SNP markers became popular (Hyten et al. 2010a; Lee et al. 2015; Sun et al. 2019a; Ratnaparkhe et al. 2020; Kumawat et al. 2021).

1.7.2 Evolution of Marker Types and Genetic Diversity Studies

Various marker technologies, including RFLPs, RAPDs, AFLPs, SSRs and SNPs, have been used for genetic mapping and diversity studies in soybean. Apuya et al. (1988) analyzed 300 randomly chosen RFLP probes in genomic DNA of the genetically distant cultivars Minsoy and Noir 1. RAPDs were also used extensively by soybean geneticists, mainly for germplasm classification (Thompson et al. 1998; Brown-Guedira et al. 2000; Li and Nelson 2002). Soybean linkage maps were constructed using SSR and AFLP markers (Morgante et al. 1994; Keim et al. 1997; Matthews et al. 2001). Interestingly, the first reports of SSR allelic variation and its use as a marker system in a plant species came from soybean (Akkaya et al. 1992; Morgante and Oliveri 1993). SSR polymorphism revealed a high level of allelic variation in cultivated and wild soybean genotypes (Morgante et al. 1994; Maughan et al. 1995; Rongwen et al. 1995; Li et al. 2010). Akkaya et al. (1995) developed SSRs in soybean and integrated them into the linkage map. Subsequently, Cregan et al. (1999) mapped 606 SSR loci to develop an integrated soybean linkage map, which was later improved by the addition of 420 SSRs (Song et al. 2004). Hisano et al. (2007) used EST sequences to map a total of 668 EST-derived SSR marker loci on the soybean linkage map. Further, the availability of BAC-end sequences facilitated the development of additional SSRs, leading to the integration of the physical and genetic maps (Shoemaker et al. 2006; Shultz et al. 2007). Utilizing the whole genome sequence of soybean, Song et al. (2010) developed an SSR database (BARCSOYSSR_1.0). This genome-wide database provides informative SSRs at any genomic position required for fine mapping as well as for MAS. Choi et al. (2007) identified SNPs by resequencing sequence-tagged sites (STSs) developed from EST sequences. In a total of 2.44 Mbp of aligned sequence, 5,551 SNPs were discovered, including 4,712 single-base changes and 839 InDels, resulting in an average nucleotide diversity of θ = 0.000997. Exploiting these SNPs, a total of 1,141 genes were placed on the genetic map by virtue of a SNP segregating in one or more RIL mapping populations, thereby constructing a transcript map of soybean. Recent advances in whole genome sequencing and high-throughput genotyping have enabled large-scale genetic diversity studies of soybean germplasm collections.
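
As a rough check on the nucleotide diversity quoted above, the following sketch computes Watterson's estimator, θ = S / (a_n · L), where S is the number of polymorphic sites, L the aligned length, and a_n the harmonic correction for n sampled sequences. The text does not state which estimator or how many genotypes were used, so the function name and the sample size of six are assumptions made purely for illustration.

```python
def watterson_theta(segregating_sites, aligned_bp, n_sequences):
    """Per-site Watterson's theta: S / (a_n * L)."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    # a_n = 1 + 1/2 + ... + 1/(n - 1)
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * aligned_bp)


# Values from the text: 5,551 SNPs in 2.44 Mbp of aligned sequence.
# n_sequences = 6 is an assumption; the text does not give the sample size.
theta = watterson_theta(5551, 2.44e6, 6)
print(f"theta = {theta:.6f}")
```

Under these assumptions the estimate lands close to the reported θ = 0.000997; a different sample size would shift it through a_n.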

1.7.3 Mapping Populations

Various mapping populations in soybean have been developed independently according to the interests and needs of individual researchers, i.e., the degree of polymorphism required and the specific agronomic traits under analysis. F2 populations and recombinant inbred lines (RILs) have been employed for the construction of linkage maps in soybean. While interspecific mapping populations have contributed enormously to the saturation of the soybean linkage map, intraspecific linkage maps have also been developed. Recently, nested association mapping (NAM) populations have been used for genetic mapping in soybean (Diers et al. 2018; Beche et al. 2020).

1.7.4 QTL Mapping Studies

Molecular markers, especially DNA-based markers, have been used extensively to identify the genomic locations of major QTLs governing different traits in soybean. RILs, which are developed through several generations of selfing (typically up to F6 or F7), are widely used for QTL mapping. RILs help in dissecting QTLs, and the effect of a single QTL or a few QTLs can be estimated depending on the population size. More than a thousand QTLs governing over 100 agronomically and physiologically important traits have been characterized or mapped in soybean (Grant et al. 2010). Information on the QTLs mapped in soybean is available in the SoyBase database (http://soybase.org). Recently, the advent of SNP-based genetic markers has facilitated the QTL analysis of many agronomic traits of soybean (https://soybase.org, http://soykb.org). Developments in whole genome sequencing and the spread of high-throughput technologies have greatly facilitated genetic mapping in soybean, yielding millions of SNP markers (Schmutz et al. 2010).
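
The reason RILs are advanced to F6 or F7 can be illustrated with a short calculation. Under strict selfing without selection, the expected heterozygosity at a locus halves each generation, so lines become nearly fixed. The sketch below assumes single-seed descent from an F1 that is heterozygous at the locus of interest; the function name is illustrative.

```python
def expected_heterozygosity(generation, f2_heterozygosity=0.5):
    """Expected heterozygous fraction in generation F_n under strict selfing."""
    if generation < 2:
        raise ValueError("generation must be F2 or later")
    # Heterozygosity halves with every selfing generation after F2.
    return f2_heterozygosity * 0.5 ** (generation - 2)


for gen in (2, 6, 7):
    print(f"F{gen}: {expected_heterozygosity(gen):.4%} heterozygous")
```

By F6 the expected residual heterozygosity is about 3%, and by F7 about 1.6%, which is why phenotypes of such lines can be replicated across environments.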

QTL mapping and molecular marker development have advanced the dissection of several agronomic traits and the study of the genetic basis of tolerance to drought and waterlogging along with improved yield. In the pursuit of genotyping tools for investigating mapping populations, Hyten et al. (2008) developed a multiplex assay designated soybean oligo pool all-1 (SoyOPA-1). This custom-made 384-SNP GoldenGate assay was developed using SNPs discovered through resequencing of five diverse soybean accessions. Later, Hyten et al. (2010a) sequenced a total of 3,268 SNP-containing robust STSs in six diverse genotypes, resulting in the identification of 13,042 SNPs with an average of 3.5 SNPs per polymorphic STS. These SNPs, along with the 5,551 SNPs discovered by Choi et al. (2007), were used to design two Illumina custom 1,536-SNP GoldenGate assays designated SoyOPA-2 and SoyOPA-3. A set of 1,536 SNPs (from the 3,456 SNPs in the three SoyOPAs), designated the Universal Soy Linkage Panel 1.0 (USLP 1.0), ensured sufficient polymorphic markers at the genome scale for use in QTL mapping applications. Hyten et al. (2010b) sequenced a reduced representation library of soybean to identify SNPs using high-throughput sequencing methods. A total of 1,536 SNPs were selected to create an Illumina GoldenGate assay (SoyOPA-4). SoyOPA-4 produced 1,254 successful GoldenGate assays, suggesting an assay conversion rate of 81.6% for the predicted SNPs. Chaisan et al. (2010) used ESTs derived from 18 genotypes for EST clustering and SNP identification, resulting in a total of 3,219 EST contigs and 26,735 SNPs. Confirmation of the in silico identified SNPs by Sanger sequencing yielded an accuracy rate of 15.7% between the two cultivars Williams 82 and Harosoy. SNP markers that can be used for mapping complex traits as well as for molecular breeding applications have been developed in recent investigations (Song et al. 2012; Li et al. 2019; Song et al. 2020).
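
The panel sizes and conversion rate quoted above can be verified with simple arithmetic. The sketch below uses only figures from the text; the variable and function names are illustrative.

```python
# SNPs per assay, as given in the text.
soyopa_sizes = {"SoyOPA-1": 384, "SoyOPA-2": 1536, "SoyOPA-3": 1536}
total_snps = sum(soyopa_sizes.values())


def conversion_rate(successful, designed):
    """Percentage of designed SNP assays that genotyped successfully."""
    return 100.0 * successful / designed


# SoyOPA-4: 1,254 successful assays out of 1,536 designed.
soyopa4_rate = conversion_rate(1254, 1536)
print(total_snps, round(soyopa4_rate, 1))
```

The sum reproduces the 3,456 SNPs across the three SoyOPAs, and the ratio reproduces the 81.6% conversion rate reported for SoyOPA-4.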

1.7.5 QTL Mapping Software

QTL mapping in soybean has progressed swiftly over the last three decades or so; nonetheless, a large fraction of QTLs remains unutilized in breeding programs because of issues such as low accuracy and false positives. QTL accuracy can be improved by adopting appropriate QTL mapping methods and statistical models such as single marker analysis (SMA), simple interval mapping (SIM), composite interval mapping (CIM), multiple interval mapping (MIM), and Bayesian interval mapping (BIM). Various QTL mapping software packages and QTL network tools have been developed to perform these analyses. Meta-QTL analysis compiles QTL data from multiple reports onto a single map to enable precise identification of QTL regions (Deshmukh et al. 2012; Sosnowski et al. 2012). Meta-QTL analysis was used effectively by Hwang et al. (2015) to identify QTLs linked to canopy wilting using five different RIL populations. Among the QTLs identified, one QTL on chromosome 8 in the 93705 KS4895 × Jackson population co-segregated with a previously known wilting QTL identified in a Kefeng1 × Nannong 1138-2 population. Advances in statistical approaches and software have resulted in an exponential increase in soybean genetic mapping studies aimed at understanding plant responses to abiotic stresses such as drought, waterlogging and high temperature.
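
Of the methods listed above, single marker analysis is the simplest to illustrate. The sketch below regresses a phenotype on genotype codes at one marker and reports a LOD score, LOD = (n/2) · log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker effect. The function name, genotype coding (0/1 for the two parental alleles in a RIL population), and example data are hypothetical and not taken from any cited study.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD score from a simple linear regression of phenotype on marker genotype."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching lists with at least three lines")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: mean only
    if sxx == 0 or rss0 == 0:
        return 0.0  # monomorphic marker or no phenotypic variation
    rss1 = rss0 - sxy ** 2 / sxx  # model with marker effect
    if rss1 <= 0:
        return math.inf  # marker explains all variation
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical wilting scores for eight RILs and two markers.
scores = [2.0, 2.5, 1.8, 2.2, 3.9, 4.1, 3.6, 4.4]
linked_marker = [0, 0, 0, 0, 1, 1, 1, 1]
unlinked_marker = [0, 1, 0, 1, 0, 1, 0, 1]
lod_linked = single_marker_lod(linked_marker, scores)
lod_unlinked = single_marker_lod(unlinked_marker, scores)
print(f"linked: {lod_linked:.2f}, unlinked: {lod_unlinked:.2f}")
```

Interval-based methods such as SIM and CIM extend this idea by testing positions between markers and, in CIM, by including background markers as cofactors; dedicated software should be used for real analyses.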

1.8 Marker-Assisted Breeding for Resistance/Tolerance Traits

Marker-assisted selection (MAS) is an indirect selection method in which a linked molecular marker is used to transfer important agronomic traits from one genotype to another. Marker-assisted backcrossing (MABC) is an important approach employed in soybean for transferring a trait of interest. High-throughput genotyping technologies have greatly assisted molecular marker identification and QTL mapping for different traits in soybean. Molecular breeding approaches such as marker-assisted backcrossing and marker-assisted recurrent selection have aided the introgression of traits of economic or agronomic interest in soybean. In past decades, several studies have focused on the genetic and molecular mechanisms of drought, flooding and salt tolerance, and several QTLs associated with these abiotic stresses have been identified.
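
To see why markers are useful in backcrossing, it helps to recall the expected recovery of the recurrent parent genome without any marker-based background selection: on average 1 − (1/2)^(t+1) after t backcrosses. The sketch below tabulates this standard expectation; it is a textbook approximation that ignores linkage drag around the target locus, and the function name is illustrative.

```python
def recurrent_parent_fraction(backcross_generation):
    """Expected recurrent-parent genome fraction in BC_t without background selection."""
    if backcross_generation < 0:
        raise ValueError("backcross generation must be non-negative")
    return 1.0 - 0.5 ** (backcross_generation + 1)


for t in range(1, 5):
    print(f"BC{t}: {recurrent_parent_fraction(t):.2%}")
```

Background selection with markers aims to identify individuals that exceed these averages, which can shorten the number of backcross generations needed.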

1.8.1 QTL Mapping for Drought Tolerance

Drought tolerance is a complex trait governed by polygenes/QTLs at multiple genomic locations, and introgression of minor QTLs from a donor into a recipient cultivar is not an easy task. QTL mapping has identified a total of 10 genomic regions associated with canopy wilting under drought stress (Table 1.3). The majority of these QTLs (9/10) have donor alleles conferring slow wilting from PI 416937, Jackson, or both (Charlson et al. 2009; Abdel-Haleem et al. 2012; Hwang et al. 2015). Molecular markers associated with these QTLs could be explored for use in MAS to introgress the slow canopy wilting phenotype from the donor into elite backgrounds. However, transferring these QTLs is challenging owing to the complex, quantitative nature of the trait and its sensitivity to prevailing environmental factors. Most minor QTLs were found to be unstable across environments and populations. For instance, even a major QTL on chromosome 12 (R2 = 0.27), identified in all five environments in Benning × PI 416937 (Abdel-Haleem et al. 2012), was not detected in any population or site-year by Hwang et al. (2015). Accordingly, each individual QTL must be confirmed in more advanced generations. It also suggests that molecular stacking of all confirmed QTLs in the genetic background of an elite cultivar is needed to develop drought tolerance in soybean (Valliyodan et al. 2016). Ren et al. (2020) identified 23 QTLs linked to drought tolerance, of which seven were located on chromosomes 2, 6, 7, 17, and 19 and five on chromosomes 2, 6, 13, 17, and 19.

Table 1.3 Overview of QTLs identified for abiotic stress (Drought, water logging) tolerance in soybean depicting parents, mapping population, associated trait, chromosome and markers and phenotypic variance explained

1.8.2 QTL Mapping for Root System Architecture and Canopy Characteristics

Mapping of genomic regions controlling root system architecture (RSA) and canopy characteristics is critical for developing soybean that can be cultivated in water-limited environments (Song et al. 2016a). In an interspecific RIL population derived from the cross G. max (V71-370) × G. soja (PI407162), four significant QTLs associated with different root architectural traits were identified on chromosomes Gm06 and Gm07 (Prince et al. 2015a). In another study, Manavalan et al. (2015) identified a major QTL on chromosome Gm08 controlling tap root length, lateral root number and shoot length. Six transcription factors and two key cell wall expansion-related genes were identified as candidate genes in the confidence interval of this QTL. Recently, Dhanpal et al. (2021) conducted the first genome-wide association study reporting genetic loci for RSA traits in field-grown soybean and identified key candidate genes.

1.8.3 QTL Mapping for Flooding Tolerance

Several studies have focused on understanding the genetic and molecular mechanisms of flooding tolerance in soybean and on identifying the underlying major QTLs (http://www.soybase.org). VanToai et al. (2001) identified one QTL associated with flooding tolerance, linked to the molecular marker Sat_064 on chromosome 18. However, Reyna et al. (2003) could not find this QTL (Sat_064) associated with waterlogging tolerance in near-isogenic line (NIL) populations, owing to the different genetic backgrounds or the locations/soil types of the studies. Cornelious et al. (2005) reported five QTLs associated with flooding tolerance in two RIL populations, linked to marker Satt485 on chromosome 3, marker Satt599 on chromosome 5, and three markers (Satt160, Satt269, and Satt252) on chromosome 13 (Table 1.3). Githiri et al. (2006) identified seven QTLs associated with yield under flooding stress, resulting in a proposed QTL near Satt100. Wang et al. (2008b) mapped three QTLs, Satt531-A941V (chromosome 1), Satt648-K418_2V (chromosome 5), and Satt038-Satt275 (chromosome 18), associated with soybean flooding tolerance. Sayama et al. (2009) detected four putative QTLs associated with flooding tolerance, viz. Sft1, Sft2, Sft3, and Sft4, which were mapped to chromosomes 2, 4, 8, and 12, respectively. Two new QTLs associated with both flooding injury score and flooding yield index were mapped on chromosomes 11 and 13 (Nguyen et al. 2012). However, these QTLs were discovered using bi-parental populations, which have restricted mapping resolution due to limited recombination events. Later, several novel QTLs associated with root system architecture, waterlogging tolerance and yield in soybean were identified (Ye et al. 2018; Wu et al. 2017b; Wu et al. 2020; Sharmin et al. 2020).

1.8.4 QTL Mapping for Salt Tolerance

Dissecting the genetic mechanism of salt tolerance at various stages of crop growth is critical for the breeding of salt-tolerant soybeans (Munns and Tester 2008; Deinlein et al. 2014). The genetic architecture of salt tolerance in soybean has been dissected in several studies through bi-parental mapping and genome-wide association studies. An overview of the salt tolerance QTLs identified in soybean through bi-parental mapping is given in Table 1.4. In an F2:5 population derived from a cross of the salt-tolerant cultivar S-100 and the salt-sensitive cultivar Tokyo, Lee et al. (2004) mapped a major locus on Gm03, explaining 29% and 35% of the phenotypic variation under greenhouse and field conditions, respectively. Chen et al. (2008) identified four QTLs for salt tolerance at the seedling stage on Gm03, Gm07, Gm09, and Gm18. Subsequently, several studies confirmed the major locus on Gm03 in different genetic backgrounds using bi-parental mapping populations, including interspecific G. max × G. soja mapping populations (Hamwieh and Xu 2008; Hamwieh et al. 2011; Ha et al. 2013; Qi et al. 2014; Guan et al. 2014a; Zeng et al. 2017; Do et al. 2018; Shi et al. 2018). Zeng et al. (2017b) identified two new QTLs for leaf chloride content on Gm13 and Gm15 using KCl and NaCl treatments. Do et al. (2018) identified a QTL for salt tolerance on Gm13, linked with leaf sodium content.

Table 1.4 Overview of QTLs identified for salt tolerance in soybean

To identify salt tolerance QTLs at the germination stage, Zhang et al. (2019) used a RIL population and mapped 25 QTLs associated with four different salt tolerance indices. Of these 25 QTLs, nine were located in an overlapping region on Gm08 (named qST-8; Zhang et al. 2019). A wild soybean (Glycine soja) accession, JWS156-1, with high saline and alkaline salt tolerance was identified, and a significant QTL for alkaline salt tolerance was detected on Gm17 (Tuyen et al. 2010). This QTL was different from the QTL for saline tolerance previously found on Gm03 in this genotype. The study demonstrated that saline and sodic stress tolerances are controlled by different genes in soybean. DNA markers associated with these QTLs can be used for marker-assisted pyramiding of tolerance genes in soybean for both saline and sodic stresses. Bi-parental linkage mapping has successfully mapped two major loci and several minor loci for salt tolerance; however, it can detect only the alleles carried by the parents (Korte and Farlow 2013). Nevertheless, salt tolerance loci identified by linkage mapping are highly useful for marker-assisted selection and gene cloning.

DNA markers tightly linked to the salt tolerance QTLs and to the characterized genes can be used in the selection of salt-tolerant lines. The major QTLs on Gm03 and Gm08 are stable, having been identified in several studies, and are therefore highly useful for MAS. Marker-assisted pyramiding of the identified major and minor QTLs may provide higher salt tolerance than a single QTL. NILs developed by marker-assisted selection for the major QTL on Gm03 showed higher salt tolerance (Guan et al. 2014b; Do et al. 2016) and higher grain yield under saline field conditions (Do et al. 2016; Liu et al. 2016). The salinity tolerance of the tolerant NILs (NIL-T) was associated with the maintenance of seed size under salt stress and could be attributed to the ability to regulate Na+ and Cl− in both vegetative and reproductive tissues (Liu et al. 2016). Haplotype-based markers for the identified salt tolerance QTLs were successfully developed and used to identify new tolerant germplasm (Patil et al. 2016; Kumawat et al. 2020).
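
The practical cost of pyramiding several QTLs can be estimated with a simple probability calculation. Assuming unlinked loci segregating in an F2, the fraction of plants homozygous for the favorable allele at all k loci is (1/4)^k, and the population size needed to recover at least one such plant with a given probability follows from the binomial distribution. The function names and the 95% threshold below are illustrative assumptions, not values from the cited studies.

```python
import math


def homozygous_fraction(n_loci):
    """Expected F2 fraction homozygous for the favorable allele at all unlinked loci."""
    return 0.25 ** n_loci


def min_population_size(n_loci, probability=0.95):
    """Smallest F2 size giving at least one target genotype with the stated probability."""
    f = homozygous_fraction(n_loci)
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


for k in (1, 2, 3):
    print(f"{k} QTL(s): fraction {homozygous_fraction(k):.4f}, "
          f"minimum F2 size {min_population_size(k)}")
```

The required population grows quickly with the number of loci, which is one reason stepwise pyramiding with marker checks at each generation is often preferred.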

1.9 Map-Based Cloning of Tolerance Genes

1.9.1 Strategies: Landing and Walking

Availability of genomic clone libraries with large DNA inserts is an essential requirement for plant genome analysis, primarily for physical mapping, gene isolation, and gene structure and function analysis. Bacterial artificial chromosome (BAC) vectors have been widely used for generating genomic DNA libraries in economically important crop plants, including soybean. Development of BAC libraries is considered a critical step towards physical mapping and positional cloning of important genes.

1.9.2 Libraries: BAC/YAC Libraries

Several BAC libraries have been developed from different soybean genotypes and wild species. These libraries were developed with different objectives, including general genomic research as well as the cloning of specific biotic and abiotic stress tolerance loci. They have provided a good resource for positional cloning of agronomically and biologically important QTL genes carried by the representative genotypes. BAC libraries have also been constructed for several wild species including G. soja, G. syndetika, G. canescens, G. stenophita, G. cyrtoloba, G. tomentella, G. falcata, and the polyploid G. dolichocarpa. All BAC libraries are publicly available to soybean researchers. Physical map generation in soybean began with the development of early genetic maps whose markers were evenly distributed across the whole genome. Yeast artificial chromosomes (YACs) were initially developed for use in chromosome walking and in situ hybridization (Zhu et al. 1996). BAC libraries covering the whole soybean genome were generated by early genomic researchers (Marek and Shoemaker 1997; Danesh et al. 1998; Tomkins et al. 1999; Salimath and Bhattacharyya 1999; Meksem et al. 2000). BAC libraries encompassing a variety of genotypes led to the development of early physical contigs (Marek and Shoemaker 1997). Efforts were made to develop a physical map of the soybean genome using BAC- and BIBAC-based libraries (Wu et al. 2004). A physical map of soybean cultivar Williams 82 was generated from 67,968 BAC clones from a BstYI library and 40,320 clones from a HindIII library (http://soybeanphysicalmap.org/). Furthermore, SSR markers derived from BAC-end sequences (BES) were mapped and integrated into the physical map to improve its quality (Shoemaker et al. 2008). Six-dimensional BAC clone pools were employed to demonstrate the anchoring of genetic markers to the soybean BAC clones (Wu et al. 2008). In parallel, soybean unigene sets from NCBI were computationally anchored to Williams 82 BES, anchoring an additional 305 contigs and complementing the 1,184 contigs anchored through 6-D pool screening (Wu et al. 2008). Thus, the physical framework was established by associating contigs with molecular markers, which in turn was achieved by fingerprinting the BAC clones through overgo hybridization, RFLP hybridization and SSR amplification (Song et al. 2004). The updated soybean physical map is available in the Soybean Breeders Toolbox (SBT) on the SoyBase website (http://www.soybase.org) for the benefit of the research community. Later, physical maps of soybean and related wild species were used for comparative and functional genomics studies (Innes et al. 2008; Ha et al. 2012; Ashfield et al. 2012).

1.10 Genomics-Aided Breeding for Tolerance Traits

1.10.1 Details of Genome Sequencing

The soybean genome sequencing project was accomplished by the US Department of Energy-Joint Genome Institute (DOE-JGI) Community Sequencing Program (CSP) (Schmutz et al. 2010). Peptides from other flowering plants and the TIGR legume EST database were aligned with the soybean genome data to identify gene-rich regions. The resulting regions were fed into gene prediction algorithms to find putative genic regions. The homologous regions were integrated with EST sequences using the PASA program (Haas et al. 2003). The genome sequence data and gene annotation of soybean are housed in the Phytozome database (Goodstein et al. 2012) (http://www.phytozome.net/). It provides access to genes and gene families through either keyword-based search or sequence similarity-based programs such as BLAST and BLAT (BLAST-like Alignment Tool). Sequence analysis based on shared functional domains or consensus sequence similarity enables the study of the evolutionary history of each gene family and the identification of closely related gene families. The genome browser (GBrowse) in the database displays EST alignments and VISTA tracks, which help in assessing the extent of nucleotide conservation among related plant genera. BioMart, an open-source data retrieval tool, allows the research community to download complete datasets from Phytozome.

1.10.2 Application of Structural and Functional Genomics in Genomics-Assisted Breeding

New sequencing technologies have the potential to rapidly change the molecular research landscape in soybean (Lam et al. 2010; Libault et al. 2010; Li et al. 2013; Chung et al. 2014). Several research projects, including genome re-sequencing, gene expression, and whole transcript profiling, have provided large-scale datasets for comparative and functional genomics studies (Valliyodan et al. 2016, 2019; Kim et al. 2019; Kajiya-Kanegae et al. 2021). Structural variations play important roles in driving genome evolution and gene structure variation, which in turn contribute to agronomic trait variation. Liu et al. (2020) selected 26 accessions and performed de novo genome assembly for each accession. Through comparative genome analysis, a total of 14,604,953 SNPs, 12,716,823 InDels, 27,531 copy number variations and 723,862 presence/absence variations were identified.

In addition to structural variation analysis, expression studies of single genes and of global gene expression patterns are an integral part of any crop improvement program. Gene expression patterns are investigated using global expression analysis techniques such as high-density expression arrays, Serial Analysis of Gene Expression and other functional genomics approaches. Microarrays have been used in soybean gene expression studies for the functional analysis of key genes (Maguire et al. 2002; Thibaud-Nissen et al. 2003; Vodkin et al. 2004). Functional genomics studies have also been conducted to identify the roles of microRNAs. MicroRNAs (miRNAs) are key regulators of gene expression and play important roles in many aspects of plant biology. Turner et al. (2012) identified a number of novel miRNAs and previously unknown family members of conserved miRNAs in the soybean genome sequence. They classified all known soybean miRNAs based on their phylogenetic conservation (conserved, legume-specific and soybean-specific miRNAs) and examined their genome organization, family characteristics and target diversity. Comparative and functional genomics have been applied extensively in soybean to identify genes associated with key agronomic and physiological traits and to understand genome structure (Ma et al. 2010; Livingstone et al. 2010; Kim et al. 2010; Deshmukh et al. 2014; Ratnaparkhe et al. 2013; Valliyodan et al. 2016; Li et al. 2017; Zhou et al. 2019; Kim et al. 2019; Lin et al. 2019; Ferreira-Neto et al. 2019; Schmutz et al. 2019; Chaudhary et al. 2019; Paganon et al. 2020; Liu et al. 2020a; Valliyodan et al. 2021).

1.10.3 Transcriptomic Approaches to Develop Drought Tolerance

Characterization of the genetic elements defining root traits and the related transcriptional responses to drought has gained considerable interest in soybean (Thao et al. 2013). Initial exploration of the genetic toolbox for drought tolerance in soybean showed strong upregulation of around 3,000 root-derived genes and of the metabolite coumestrol (Tripathi et al. 2016). In another study, a complex response of root tissues subjected to drought stress was identified, involving multiple biochemical pathways (Stolf-Moreira et al. 2010). In addition, early transcriptional responses of soybean roots to drought stress were investigated in detail by Neto et al. (2013). Further, the molecular basis of canopy wilting tolerance was studied through whole transcriptome sequencing of leaf tissues of contrasting soybean genotypes (Prince et al. 2015b). Among the various differentially expressed genes, a gene encoding UDP glucuronosyl transferase was specific to the drought-tolerant line PI 567690. Comparison of the root transcriptome profiles of genotypes DT2008 and Williams 82 indicated that the drought tolerance of DT2008 roots could be ascribed to the expression of a higher number of root-derived genes during early dehydration than during prolonged dehydration. Differential expression of genes involved in osmoprotectant biosynthesis and of transcription factors, among others, also conferred drought tolerance (Ha et al. 2015). Root-specific transcriptome changes were observed in soybean lines subjected to drought stress. This analysis identified several transcription factors that were differentially regulated during drought stress, paving the way for the development of a transcription factor-cis element network (Song et al. 2016b).

To gain further molecular insights into the aquaporin (AQP) protein family, 23 soybean tonoplast intrinsic protein (TIP) genes, which belong to a plant-specific group of AQPs, were analyzed (Song et al. 2016b). The analysis identified 81 SNPs and many InDels in the coding regions of TIP genes, and their functional validation provided key information on the roles of AQPs in soybean under various abiotic stresses (Song et al. 2016b). Similarly, exploration of AQPs in Glycine soja yielded 62 GsAQP genes. Comparative expression and protein–protein interaction analyses of AQPs in cultivated and wild soybean helped identify GmTIP2;1 as a novel candidate gene conferring salt and water stress tolerance (Zhang et al. 2017). A comprehensive list of investigations exploring drought tolerance mechanisms in soybean using transcriptomic approaches is presented in Table 1.5.

Table 1.5 Differentially expressed genes related to abiotic stress tolerance in soybean

1.10.4 Applications of Structural and Functional Genomics

Plants have evolved an integrated strategy for adapting to drought stress that includes signal perception and transduction, regulation of gene expression, and biochemical and physiological responses. An effective and direct way to endure drought stress is to reduce water loss by closing stomata. The stomatal aperture is modulated by multiple factors including environmental signals, biotic/abiotic stress, CO2 concentration, light and plant hormones. Several hormones are involved in stomatal regulation, among which the stress hormone abscisic acid (ABA) plays the main role. During signal transduction and the adaptive response, the expression of a large number of drought-responsive genes changes. Chen et al. (2020) identified drought-tolerant soybean genotypes and new candidate genes for breeding. A total of 422 SNPs and 302 genes were correlated with drought-associated traits through GWAS. In addition, thirteen genes associated with the number of nodes on the main stem were identified. qRT-PCR showed that the expression levels of Glyma.03G018000 and Glyma.03G018900 were significantly increased in drought-tolerant varieties. This study provides important drought-tolerant genotypes, traits, SNPs and potential genes that may be useful for soybean breeding.
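
Relative expression from qRT-PCR data of the kind mentioned above is commonly summarized as a fold change using the 2^(−ΔΔCt) approach. The sketch below illustrates that calculation; the text does not state which quantification method Chen et al. (2020) used, and the function name, reference gene, and Ct values are hypothetical. The formula assumes amplification efficiency close to 100% for both target and reference genes.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene (test vs control) by the 2^(-ddCt) approach."""
    delta_test = ct_target_test - ct_ref_test  # normalize to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_test - delta_ctrl)


# Hypothetical Ct values: target gene in a tolerant line (test) vs a sensitive line (control).
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(f"fold change = {fold_change:.1f}")
```

A fold change above 1 indicates higher expression in the test sample; statistical significance still requires biological replicates.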

1.10.4.1 Reverse Genetics Approaches

Recent advances in gene isolation, plant transformation, and genetic engineering are being used extensively to alter metabolic pathways in plants through tailor-made modifications of single or multiple genes. Many of these modifications are directed toward increasing the nutritional value of plant-derived foods and feeds. These methodologies rest on a rapidly growing body of molecular findings and on the understanding and prediction of metabolic fluxes and network pathways. The application of recombinant DNA and related techniques to plants has opened up the potential to improve agronomic characters, drought tolerance, heat tolerance and salt stress resistance.

RNAi Technology

In functional genomics, RNA interference (RNAi) is a promising gene regulatory approach that plays a substantial role in crop improvement by permitting down-regulation of gene expression through small interfering RNA molecules without affecting the expression of other genes. The discovery and study of the RNA interference phenomenon, in which double-stranded RNA (dsRNA) elicits degradation of a target mRNA containing a homologous sequence, led to the development of more effective dsRNA-mediated gene silencing methods. RNAi is a relatively simple, quick and efficient method of silencing gene expression in a range of organisms. Silencing results from the degradation of dsRNA into short RNA fragments that bind to a specific nuclease complex, which directs ribonuclease activity against homologous mRNA. Specific gene silencing has been shown to be related to two ancient processes, co-suppression in plants and quelling in fungi, and has also been associated with regulatory processes such as transposon silencing, antiviral defense mechanisms, gene regulation, and chromosomal modification (Agrawal et al. 2003). The insertion of a functional intron region as a spacer fragment additionally increases the efficiency of gene silencing induction, owing to the generation of an intron-spliced hairpin RNA (ihpRNA) (Wesley et al. 2003). In plants, biotic stress is caused by living organisms, especially viruses, bacteria, fungi, insects, arachnids, nematodes, and weeds. These organisms account for about a 40% loss in the overall yield of six major food and cash crops. RNAi technology has opened up new prospects for crop protection against biotic stresses.

Plants under natural field conditions are constantly exposed to various abiotic factors such as high salinity, temperature variation, flood, drought, and heavy metals, which hinder proper growth and development. These factors are also among the major causes of huge crop losses globally. Changing climatic conditions and the demands of a rapidly expanding population create an urgent need to develop more stress-tolerant cultivars. RNAi technology can be used to develop transgenic cultivars that cope with different abiotic stresses. Functional genomics studies have revealed novel genetic determinants involved in stress adaptation in plants, which can be used to attain stress tolerance.

Receptor for activated C-kinase 1 (RACK-1) is a highly conserved scaffold protein that plays a significant role in plant growth and development. Transgenic rice plants in which the RACK-1 gene was down-regulated by RNAi, a reverse genetics approach, showed more tolerance to drought stress than non-transgenic rice plants (Li et al. 2009). Likewise, RNAi-mediated disruption of rice farnesyltransferase/squalene synthase (SQS) using maize squalene synthase resulted in enhanced drought tolerance at the vegetative and reproductive stages (Manavalan et al. 2012).

miRNAs regulate stress tolerance and development in plants by negatively affecting gene expression at the post-transcriptional level. Wang et al. (2011a) showed that miRNAs are involved in the very early stages of seed germination and that miRNA-mediated regulation of gene expression occurs in imbibed maize seed. Wang et al. (2011b) used high-throughput sequencing of small RNAs from Medicago truncatula and reported 32 known members of 10 miRNA families, together with 8 new miRNAs or new members of known miRNA families, that were responsive to drought stress. These findings suggest the importance of miRNAs in the response of plants to abiotic stress in general and drought stress in particular.

The OsTZF1 gene is a member of the CCCH-type zinc finger gene family in rice (Oryza sativa). Drought, high-salt stress, and hydrogen peroxide can induce the expression of OsTZF1. Expression of OsTZF1 was also induced by abscisic acid, salicylic acid, and methyl jasmonate. Transgenic plants overexpressing OsTZF1 showed enhanced tolerance to high salt and drought stresses, whereas transgenic rice plants in which OsTZF1 was silenced using RNAi technology showed less tolerance. This suggests a role for the OsTZF1 gene in abiotic stress tolerance (Jan et al. 2013). Dehydrin proteins play a significant role in protecting plants from osmotic damage. Various research results suggest that overexpression of the dehydrin gene WZY2 provides greater tolerance to osmotic stress. A study by Yu et al. (2019) suggests that RNAi-mediated silencing of the WZY2 gene in Arabidopsis thaliana makes the plant intolerant to osmotic stress.

Several researchers have focused on functional genomics studies of drought-responsive genes (Le et al. 2012; Barbosa et al. 2013; Hua et al. 2018; Wang et al. 2018a; Wei et al. 2019). Drought-responsive genes comprise regulatory genes, which encode a wide range of transcription factors (TFs), and effector genes, which encode chaperones, enzymes, ion and water channels, and similar proteins. Several groups of TFs, such as ABA-responsive element-binding (AREB), dehydration-responsive element-binding (DREB), MYB, bZIP, NAC, and WRKY, respond to drought stress and act in an ABA-dependent or ABA-independent manner. Transcription factors are being used to develop genetically modified plants that are more tolerant to abiotic stresses. DREB and AREB TFs introduced into soybean improved drought tolerance under controlled conditions. Transgenic soybean lines containing AtDREB1A showed a higher survival rate after a severe water deficit and important physiological responses to water deprivation, such as higher stomatal conductance and the maintenance of photosynthesis and photosynthetic efficiency (Polizel et al. 2011; de Pavia Rolla et al. 2014). The higher survival rate of the DREB plants was attributed to lower water use resulting from lower transpiration rates under well-watered conditions. In addition to the physiological studies, molecular analysis revealed that drought-response genes were highly expressed in DREB1A plants subjected to severe water deficit (Polizel et al. 2011). Mizoi et al. (2012) identified GmDREB2A and showed that its heterologous expression in Arabidopsis induced stress-inducible genes such as RD29A, RD29B, HsfA3, and HSP70 and improved stress tolerance. These findings indicate that plants overexpressing AtDREB2A and DREB2A-like proteins have increased tolerance to abiotic stresses. In soybean, plants overexpressing the AREB1 gene were drought tolerant and exhibited no leaf damage. They showed better growth and physiological performance under water deficit than the wild type (Barbosa et al. 2013).

Another transcription factor family, WRKY, plays important roles in the response to various abiotic stresses (Zhou et al. 2008). Previous studies have shown that soybean GmWRKY54 can improve stress tolerance in transgenic Arabidopsis. Transgenic soybean plants were subsequently generated to investigate the biological mechanisms of GmWRKY54 in the response to drought stress (Wei et al. 2019). This study demonstrated that expression of GmWRKY54, driven by either a constitutive promoter (pCm) or a drought-induced promoter (RD29a), confers drought tolerance. Recently, candidate biomarker genes have also been identified for screening drought-tolerant genotypes (Hua et al. 2018). Using a GeneChip Soybean Genome Array, Hua et al. (2018) identified 697 differentially expressed genes (DEGs), which are mainly involved in metabolic and hormone signaling pathways. Ten DEGs were validated in a sample of 20 soybean cultivars varying in their level of drought tolerance. This research provided a new set of transcriptomic data and biomarkers for early diagnosis of drought damage and for molecular breeding of drought tolerance in soybean.

Major advances have also been made in structural and functional genomics studies of salt tolerance (Roy et al. 2014; Wang et al. 2018b; Zhang et al. 2019; Li et al. 2020a, b). Several loci for salt tolerance have been mapped in soybean, and candidate genes for two major loci have been cloned (Guan et al. 2014b; Qi et al. 2014; Do et al. 2016; Zhang et al. 2019). A major and consistent salt tolerance locus on Gm03 was fine mapped, and its candidate gene was cloned and characterized as a sodium transporter (Guan et al. 2014b; Qi et al. 2014; Do et al. 2016; Patil et al. 2016). Qi et al. (2014) fine mapped this QTL and identified the underlying gene in a salt-tolerant wild soybean accession, W05. The candidate gene, named GmCHX1, is the counterpart of Glyma03g32900 in Williams 82 and a homolog of the Na+/H+ antiporter gene family. Genomic sequence analysis of GmCHX1 in W05 and Williams 82 revealed that Williams 82 carries a ~3.8-kb Ty1/copia retrotransposon inserted into exon 3, which is absent from the counterpart Glysoja01g005509 in W05 (Qi et al. 2014). In another study, Guan et al. (2014b) resolved this QTL in the salt-tolerant variety Tiefeng 8 and identified the same gene, Glyma03g32900 (named GmSALT3), with a similar insertion of a 3.78-kb copia retrotransposon in exon 3 in the salt-sensitive parent. Subsequently, Do et al. (2016) characterized this locus in the salt-tolerant cultivar FT-Abhayra and identified Glyma03g32900 (named Ncl) as the causal gene. Insertion of a ~3.8-kb Ty1/copia-type retrotransposon was responsible for the loss of gene function and for salt sensitivity. The association between functional alleles of Glyma03g32900 and salt tolerance was confirmed in near-isogenic lines (Guan et al. 2014b; Do et al. 2016). Overexpression of Glyma03g32900 by genetic transformation in the sensitive genotype Kariyutaka improved salt tolerance (Do et al. 2016). Another major salt tolerance locus, qST-8, was fine mapped, and a candidate gene, Glyma.08g102000 (named GmCDF1), belonging to the cation diffusion facilitator (CDF) family, was identified (Zhang et al. 2019). RNA interference-mediated downregulation of GmCDF1 in soybean hairy roots resulted in tolerance to salt stress (Zhang et al. 2019).

1.11 Recent Concepts and Strategies Developed

Genomics-assisted breeding, genomic selection (GS), genome sequencing, marker-assisted selection (MAS), genetic engineering approaches, and other genomics tools have been used to improve soybean yield and quality. Genomic selection is a simple, reliable, and powerful approach that enables breeders to select superior genotypes rapidly. Marker-assisted selection is advantageous for screening traits that are difficult to phenotype and for identifying recessive alleles. Recent advances in genomic tools and next-generation sequencing techniques make it easier to develop new varieties with superior traits. Genomic approaches, together with bioinformatics tools, allow a large leap forward in plant breeding. Genomic designing overcomes the limitations of traditional breeding methods and accelerates the development of climate-smart soybean crops. Developing abiotic stress-tolerant soybean varieties has become more convenient with the availability of a complete genome sequence of soybean. Recently, gene editing tools such as modified meganucleases, hybrid DNA/RNA oligonucleotides, zinc finger nucleases, TAL effector nucleases and modified CRISPR/Cas9 have been used for developing abiotic stress tolerance (Bao et al. 2021). Each of these tools can precisely target one specific DNA sequence within a genome and create a double-stranded DNA break. Repair of such breaks can lead to gene knockouts or to gene replacement by homologous recombination. Genome rearrangements can also be engineered. The creation and use of such genome rearrangements, gene knockouts and gene replacements by soybean researchers is gaining significant momentum (Carrijo et al. 2021).

1.11.1 Genome Editing—A Magic Bullet

Genome editing is at the dawn of its golden age. It is described as the ability to modify and manipulate DNA sequences in living cells with high precision (Segal and Meckler 2003). The ability to remove, insert or edit DNA sequences easily and accurately has attracted the interest of the scientific community across a wide range of biotechnology areas, including medicine, environmental studies and agriculture. Targetable nucleases enable scientists to target and modify, in theory, any gene in any organism. In the past few years, growing molecular understanding, aided by advanced computational technology and by instrumentation with multiplexing and higher precision, has led to rapid progress in the development of sequence-specific DNA nucleases. Such nucleases, including zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems, have been used in plant species such as Arabidopsis (Zhang et al. 2010; Li et al. 2013), tobacco (Nekrasov et al. 2013; Zhang et al. 2013), rice (Li et al. 2012; Shan et al. 2013, 2014), barley (Wendt et al. 2013), soybean (Sun et al. 2015; Curtin et al. 2011), Brachypodium (Shan et al. 2013) and maize (Shukla et al. 2009).

All the nucleases used in genome editing consist of DNA-binding domains combined with non-specific nuclease domains that generate double-strand breaks (DSBs). The DSBs are mainly repaired by the non-homologous end-joining (NHEJ) or homologous recombination (HR) pathway (Chen and Gao 2013); HR-based repair is also referred to as homology-directed repair (HDR). NHEJ simply re-joins the broken DNA ends in an error-prone fashion and often results in small deletions or insertions. In the HR pathway, DSBs are repaired accurately using a homologous donor DNA as the template. So far, most genome editing has used the NHEJ pathway to knock out genes, and only a few examples of gene insertion by HDR have been reported (Hyun 2020). One reason may be that the tissues to which DNA is delivered are largely composed of determinate cells, in which HDR is not the preferred repair mechanism.
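Why error-prone NHEJ repair is so useful for gene knockouts can be illustrated with a minimal Python sketch. It relies on the general fact that an insertion or deletion whose length is not a multiple of three shifts the reading frame of a coding sequence. The function names and the short coding sequence are hypothetical and serve only as an illustration.

```python
def indel_effect(indel_length):
    """Classify an indel in a coding sequence by its effect on the reading frame.

    Positive values are insertions, negative values are deletions.
    """
    if indel_length == 0:
        return "no change"
    return "in-frame" if abs(indel_length) % 3 == 0 else "frameshift"


def delete_bases(cds, start, length):
    """Return the coding sequence with `length` bases removed from index `start`."""
    return cds[:start] + cds[start + length:]


def codons(cds):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy coding sequence: ATG GCT GAA ACC TGA
toy_cds = "ATGGCTGAAACCTGA"

# A 3-bp deletion removes one codon but keeps the downstream frame intact.
in_frame = codons(delete_bases(toy_cds, 3, 3))

# A 1-bp deletion changes every downstream codon and loses the stop codon.
shifted = codons(delete_bases(toy_cds, 4, 1))
```

In this toy case the 1-bp deletion alters all codons after the break site, which is the usual route by which NHEJ-derived indels abolish gene function.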

1.11.1.1 ZFNs (Zinc Finger Nucleases)

Zinc finger nucleases were the first of the “genome editing” nucleases to appear, at the end of the twentieth century. Zinc fingers are the class of protein domain most commonly found as a DNA-binding domain in eukaryotes. A zinc finger nuclease (ZFN) is made up of two domains: a DNA-binding domain of repeated zinc fingers, which are among the most abundant DNA-binding motifs in eukaryotic genomes and can in principle be engineered to recognize any sequence, and a nuclease domain derived from the FokI restriction enzyme (Bitinaite et al. 1998). Each zinc finger is a module of ~30 amino acids that interacts with a nucleotide triplet. Zinc fingers have been designed to recognize all 64 possible trinucleotide combinations, and by stringing different zinc finger modules together one can create ZFNs that recognize a specific sequence of DNA triplets (Segal et al. 2003). Each ZFN typically recognizes 3–6 nucleotide triplets. Because the FokI nuclease functions only as a dimer, a pair of ZFNs is required to target any specific locus: one ZFN recognizes the sequence upstream and the other recognizes the sequence downstream of the site to be modified (Szczepek et al. 2007).
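The modular, triplet-based recognition described above can be sketched in a few lines of Python. The sketch enumerates the 64 possible triplets and splits a half-site into the 3-bp units that individual zinc fingers would be assigned to. The half-site sequence and function name are hypothetical, and the sketch ignores finger orientation and the context effects between neighbouring fingers that matter in real ZFN design.

```python
from itertools import product

BASES = "ACGT"

# Every DNA triplet that a single zinc finger module could be asked to recognize.
ALL_TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]


def split_into_triplets(half_site):
    """Split a ZFN half-site into consecutive 3-bp units, one per zinc finger."""
    half_site = half_site.upper()
    if len(half_site) % 3 != 0:
        raise ValueError("half-site length must be a multiple of 3")
    if set(half_site) - set(BASES):
        raise ValueError("half-site must contain only A, C, G and T")
    return [half_site[i:i + 3] for i in range(0, len(half_site), 3)]


# Hypothetical 9-bp half-site, i.e. a three-finger ZFN.
example_units = split_into_triplets("GCGTGGGCG")
```

A complete design would need two such half-sites, one on each side of the intended cut, because the FokI domain cleaves only as a dimer.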

1.11.1.2 TALENs (Transcription Activator-Like Effector Nucleases)

Transcription activator-like effector nucleases (TALENs) have made a huge impact on genome engineering (Bedell et al. 2012). Like ZFNs, TALENs contain the FokI nuclease domain fused to a DNA-binding domain, which can be exploited for targeted cleavage. This DNA-binding domain, known as a transcription activator-like effector (TALE), is derived from plant-pathogenic Xanthomonas bacteria and consists of repeats of 33–35 amino acids, each of which recognizes a single base pair of DNA (Joung and Sander 2013). TALE specificity is determined by two hypervariable amino acids at positions 12 and 13 of each repeat, known as the repeat-variable di-residue (RVD). TALE repeats use four RVDs, NN, NI, HD and NG, which recognize guanine, adenine, cytosine and thymine, respectively (Deng et al. 2012).
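Because each TALE repeat reads one base through its RVD, the repeat array for a target can be written down directly from the DNA sequence. The following minimal Python sketch applies the four-RVD code given above; the function name and the example target are hypothetical, and the sketch does not model other design constraints on TALEN target sites.

```python
# RVD code as stated in the text: NN -> G, NI -> A, HD -> C, NG -> T.
RVD_FOR_BASE = {"G": "NN", "A": "NI", "C": "HD", "T": "NG"}


def rvds_for_target(target):
    """Return the RVD of each TALE repeat for a DNA target, one repeat per base."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError("unsupported bases: " + ", ".join(sorted(unknown)))
    return [RVD_FOR_BASE[base] for base in target]


# Hypothetical 4-bp target, far shorter than a real TALEN binding site.
example_rvds = rvds_for_target("GATC")
```

The one-repeat-per-base relationship is what makes TALE arrays simpler to design than zinc finger arrays, which read three bases per module.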

Although TALENs are effective tools for genome editing, there are some limitations on potential target sites, such as the need for a T at position 1 (Doyle et al. 2012), and some TALENs fail to cause mutations at the desired location despite engineering of the nuclease and DNA-binding domains. The most recently developed genome editing technology, the CRISPR/Cas system, provides a complementary approach to ZFNs and TALENs, as it requires only a PAM (NGG) motif immediately 3′ of the recognition sequence.

1.11.1.3 CRISPR/Cas System

Research into the defence mechanisms of bacteria brought CRISPR to the attention of the scientific community. CRISPR was first discovered in 1987 (Ishino et al. 1987), and the CRISPR-Cas system is now known to be an adaptive immune system of prokaryotes. It has since been the focus of intensive research that has provided compelling insights into its function, as well as the promise of new molecular techniques. CRISPR immunity is divided into three stages: adaptation, expression and interference. During the adaptation stage, new spacer sequences are incorporated into the CRISPR locus. During the expression stage, the CRISPR locus is transcribed and processed to generate the mature CRISPR RNA (crRNA). Finally, in the interference stage, the invading nucleic acid is destroyed by an effector complex containing Cas proteins and guided by the processed crRNA.

The most commonly used RNA-guided nuclease (RGN) in genome editing is the Cas9 nuclease from the type II CRISPR/Cas system of Streptococcus pyogenes (Jinek et al. 2012). In this system, two components enable targeted DNA cleavage: the Cas9 protein and an RNA complex consisting of a CRISPR RNA (crRNA), which contains 20 nucleotides matching the target site, and a trans-activating CRISPR RNA (tracrRNA). For genome engineering purposes, the system can be simplified by fusing the crRNA and tracrRNA into a single-guide RNA (sgRNA) (Jinek et al. 2012). A further essential targeting component is the protospacer adjacent motif (PAM; 5′-NGG-3′), which lies immediately downstream (3′) of the 20-nucleotide target sequence in the DNA and is recognized by Cas9. A CRISPR/Cas9 target site therefore spans 23 bp: the 20-bp protospacer plus the 3-bp PAM.
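The targeting rule for S. pyogenes Cas9, a 20-nt protospacer followed directly on its 3′ side by an NGG PAM, is simple enough to scan for computationally. The following is a minimal Python sketch under that assumption; the function names and the test sequence are hypothetical, and the sketch does not score off-target risk or guide efficiency, which real design tools also consider.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, protospacer_len=20):
    """List candidate SpCas9 sites (protospacer + NGG PAM) on both strands.

    Returns tuples of (strand, start, protospacer, pam). The start index is
    0-based and counted along the strand being scanned, so for the "-" strand
    it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Hypothetical 23-bp sequence: 20 A's followed by a TGG PAM.
example_sites = find_spcas9_sites("A" * 20 + "TGG")
```

Retargeting Cas9 then amounts to choosing one of the reported protospacers and cloning it into the guide RNA, with no change to the protein.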

In contrast to ZFNs and TALENs, which require recoding of proteins using large DNA segments (500–1500 bp) for each new target site, CRISPR-Cas9 can easily be retargeted to any genomic sequence by changing the 20-bp protospacer of the guide RNA, which is done by subcloning this nucleotide sequence into the guide RNA plasmid backbone. The Cas9 protein component remains unchanged. This ease of use is a significant advantage of CRISPR-Cas9 over ZFNs and TALENs, especially when generating a large set of vectors to target numerous sites (Mali et al. 2013). Another potential advantage of CRISPR-Cas9 is the ability to multiplex, i.e., to use multiple guide RNAs in parallel to target multiple sites simultaneously in the same cell (Cong et al. 2013; Mali et al. 2013). With respect to site selection, CRISPR-Cas9 compares favourably with ZFNs and TALENs. With the most flexible version of the S. pyogenes CRISPR-Cas system, site selection is limited to 23-bp sequences on either strand that end in an NGG motif (the PAM for S. pyogenes Cas9), which occurs on average once every 8 bp (Cong et al. 2013).
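The figure of one NGG motif every 8 bp can be reproduced with a short calculation, sketched here in Python. It assumes, purely for illustration, that the four bases are equally frequent and independent; real genomes deviate from this, so the actual spacing in a given genome will differ. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"


def pam_density():
    """Expected NGG PAMs per position, counting both strands, for uniform bases."""
    trimers = ["".join(p) for p in product(BASES, repeat=3)]
    # NGG read on the given strand.
    forward = sum(1 for t in trimers if t[1:] == "GG")
    # CCN on the given strand corresponds to NGG on the opposite strand.
    reverse = sum(1 for t in trimers if t[:2] == "CC")
    return Fraction(forward + reverse, len(trimers))


density = pam_density()          # PAMs per base pair
average_spacing = 1 / density    # base pairs per PAM
```

Each strand contributes 4 of the 64 possible trimers, giving a combined density of 1/8 per position, which matches the once-every-8-bp estimate quoted above.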

Targeted plant genome editing using sequence-specific nucleases has great potential for crop improvement, both to meet increasing global food demand and to support sustainable, productive agriculture. Soon after CRISPR/Cas9 was first used to edit the genomes of bacteria and animals (Hwang et al. 2013; Mali et al. 2013), its efficacy was validated in the model plant systems Arabidopsis, rice and tobacco (Feng et al. 2013; Nekrasov et al. 2013; Xie and Yang, 2013).

1.12 Genetic Engineering for Tolerance Traits

Genetic modification of soybean with various genes has improved salt and drought tolerance traits (Table 1.6). For example, ectopic expression of the AtABF3 gene conferred drought tolerance in soybean (Kim et al. 2018). Several genes and TFs have also been ectopically expressed in model plants to study their functional significance. Overexpression of the soybean calmodulin gene GmCaM4 in Arabidopsis enhanced tolerance to salinity through upregulation of AtMYB2-regulated genes such as P5CS1 (Δ1-pyrroline-5-carboxylate synthetase-1) (Yoo et al. 2005). Similarly, the soybean S-phase kinase-associated protein 1 (SKP1) gene GmSK1 was overexpressed in Nicotiana tabacum cv. Samsun, which improved tolerance to salinity and drought stress (Chen et al. 2018). Pitman and Läuchli (2002) suggested that genetic modification for enhanced salt tolerance is an important approach. In dry regions, irrigation of moderately salt-tolerant crops with brackish water is feasible and would help to increase crop production. Identification of orthologs and their functional analysis will provide opportunities to improve salt tolerance in soybean through genetic engineering. Based on knowledge of the monovalent cation/proton antiporter (CPA) family in Arabidopsis, several genes have been identified and functionally characterized for their involvement in salt tolerance in soybean. Jia et al. (2017) demonstrated that GsCHX19.3, a member of the cation/H+ exchanger superfamily from wild soybean, provides tolerance to high salinity and carbonate alkaline stress. GsCHX19.3 mediates K+ uptake and Na+ excretion under carbonate alkaline stress when overexpressed in Arabidopsis. Sun et al. (2019a) found that a Na+/H+ exchanger, GmNHX1, was upregulated under salt stress in the soybean genotype Jidou 7. Overexpression of GmNHX1 in Arabidopsis enhances salt tolerance by maintaining the K+/Na+ ratio in the root (Sun et al. 2019b). Similarly, overexpression of GmNAC15, a member of the NAC transcription factor family in soybean, enhances salt tolerance in soybean hairy roots (Ming et al. 2018).

Table 1.6 Genetic engineering studies related to abiotic stress tolerance

Jia et al. (2020) characterized GmCHX20a, a paralog of the salt tolerance gene GmCHX1, and found that ectopic expression of GmCHX20a in soybean hairy roots and Arabidopsis increased both salt sensitivity and osmotic tolerance. It was suggested that GmCHX20a and GmCHX1 together address osmotic stress and ionic stress at different times during exposure to salinity stress (Jia et al. 2020). Higher expression of GmCHX20a increased salt sensitivity and osmotic tolerance in the early stage of salinity stress, whereas higher expression of GmCHX1 protected plants through Na+ exclusion in the later stage. Jin et al. (2019) characterized GsPRX9, a class III peroxidase that was significantly upregulated under salt stress. Under salt stress, soybean hairy roots overexpressing GsPRX9 had higher root fresh weight, primary root length, peroxidase and superoxide dismutase activities, and glutathione level, but lower H2O2 content, than control roots. This suggests that overexpression of the GsPRX9 gene enhances salt tolerance and activates the antioxidant response in soybean. These examples provide insight into the mechanism of salt tolerance in soybean and into the genes that play important roles in maintaining ion ratios and antioxidant properties in the plant, which can be used for genetic engineering of salt tolerance in soybean. To improve salt tolerance through genetic engineering, negative regulators of salt tolerance could be downregulated by gene editing, and positive regulators could be overexpressed through genetic transformation.

The availability of a large number of salinity-tolerant genotypes makes it possible to develop salt-tolerant soybean cultivars. Further, genetic characterization of trait inheritance and QTL identification have made it feasible to introgress single or multiple salinity tolerance QTLs into a desirable genetic background through DNA marker-assisted backcrossing and marker-assisted recurrent selection (Lee et al. 2009). In some studies, progeny lines were identified that showed higher tolerance than the tolerant parental genotypes, indicating that higher tolerance is achievable when positive alleles from tolerant and susceptible parents come together (Hamwieh et al. 2011; Do et al. 2018). Therefore, identification of positive alleles from both types of parents is desirable for QTL pyramiding for higher salt tolerance. It is also possible to identify different positive loci from two different tolerant genotypes to raise the threshold of stress tolerance, and in such cases QTL mapping may be performed in populations derived from tolerant × tolerant parents. Functional characterization of positive regulators of salinity stress tolerance, such as GmCHX1, GsCHX19.3, GmNAC15 and GmNHX1, has made it feasible to genetically engineer target soybean cultivars in a short period of time. However, the identification of negative regulators of salinity tolerance indicates that the target genetic background should be carefully characterized, so that the effects of these negative loci can be overcome when introgression or modification of positive genes and alleles is planned.

1.13 Prospects and Limitations of Genomic Designing for Soybean

Genomic designing approaches have enabled the improvement of soybean at a faster pace than traditional approaches. Introgression of genes and QTLs has become much easier with advances in genomics. Marker-based QTL mapping is a powerful method for recognizing regions of the genome that co-segregate with a given trait, and QTL mapping for abiotic stress tolerance can be used to raise tolerance against drought (Carpentieri-Pipolo et al. 2012; Zhang et al. 2012), salt (Hamwieh et al. 2011; Ha et al. 2013; Tuyen et al. 2013), flood (Guzman et al. 2007; Li et al. 2008b), and heavy metal stress (Sharma et al. 2011) in soybean. QTL mapping is more efficient than traditional mapping approaches because it does not require large numbers of progenies or many generations of segregating populations. Genome-wide association study (GWAS) is an excellent approach for exploring the allelic diversity present in natural accessions of soybean. Furthermore, the mapping resolution of GWAS is higher than that of QTL mapping because of the millions of crossing events accumulated in the germplasm in the course of evolution (Deshmukh et al. 2014). GWAS has a great advantage in dissecting complex genetic architecture (Korte and Farlow 2013). Genomics-assisted breeding in soybean helps in selecting superior genotypes, which in turn improves the quality and yield of soybean crops on a large scale.

Although genomic designing approaches have many benefits, being less time-consuming, more reliable, and easier than traditional methods, they also have limitations. For instance, the resolution of QTL mapping is not very high due to biased mapping of QTL. The method can also map only the allelic diversity that segregates in a biparental population (Borevitz and Nordborg 2003). It is hard to isolate definitive candidate genes from a single QTL mapping experiment. Moreover, genes identified by QTL mapping experiments are limited to those that segregate in the cross considered (Brachi et al. 2010). Genome-wide association study can overcome these limitations of QTL mapping, although it has limitations of its own, such as the risk of many false positives resulting from population structure, unpredictable power to detect QTL, and background LD that can confound the results. The main drawback of MAS is linkage drag, which can be minimized by marker-assisted backcrossing (MAB), and the limitations of GS are high cost and low accuracy (Staub et al. 1996; Deshmukh et al. 2014). Genome editing and other genomic methods are undoubtedly a milestone that addresses many new challenges in science; however, they raise major ethical issues and can have negative side effects. In the future, advances in genomic designing tools and methodologies may overcome the above-mentioned limitations (Bao et al. 2021; Carrijo et al. 2021).

1.14 Bioinformatic Resources for Soybean Improvement

Bioinformatics plays an indispensable role in the modern genomics era. It is the science of collecting and storing complex biological data and of developing algorithmic tools to analyze and understand them. Several databases and bioinformatics tools are available for various purposes.

1.14.1 Gene and Genome Databases

Arabidopsis was the first plant species and the third multicellular organism to be completely sequenced and published (Kaul et al. 2000). Later, with the advancement of next-generation sequencing, several plant genomes were sequenced, and most of them are available in public databases. Biological databases are stores of biological information and are mainly of two types, primary and secondary. Primary databases store the sequence information, and secondary databases use this genome sequence information to perform downstream analyses such as functional annotation. The most important databases where genome and gene sequences can be submitted and retrieved are NCBI, Phytozome, and Ensembl. SoyKB and SoyBase are secondary databases that are specific to soybean. Most of these databases were built for easy retrieval of specific genomic sequences, annotated genes, and putative gene functions; they also hold marker information, QTL, and transcriptomic data and support other downstream analyses. These databases play an important role in the identification of homologous genes using information from functionally characterized genes.

1.14.2 Comparative Genome Databases

Genome sequencing of a large number of plant species and whole-genome resequencing of different cultivars of a crop open new scope for comparative genomics. Several studies have been published on comprehensive gene family analysis and gene duplication among plant species. Such studies are very important for the evolutionary fingerprinting of plant species. Whole-genome resequencing, on the other hand, helps to explore genomic variants within a species. Comparative analysis of genomic variants can help in the dissection of biochemical pathways. Variant information for around 20,000 soybean accessions, generated with the SoySNP50K chip (Song et al. 2013), is available in the SoyBase (Grant et al. 2010; Brown et al. 2020; https://www.soybase.org/) and SoyKB (Joshi et al. 2017; http://soykb.org/) databases. These single nucleotide variant data can be downloaded from SoyBase and SoyKB using the Plant Introduction (PI) ID, genomic coordinate, or SNP ID. The variant information can then be used for studies such as genome-wide association study, genomic selection, and superior haplotype identification. Comparative genomic analysis also provides information on evolution, polyploidization, copy number variation, and presence-absence variation (PAV). Ha et al. (2019) developed a database, Soybean-VCF2Genomes, to identify the closest accession in the soybean germplasm collection.

1.14.3 Gene Expression Databases

Transcriptional data provide information about gene interactions under diverse biological conditions, the role of genes in biochemical pathways, and their function. Microarray and expressed sequence tag (EST) data were dominant for over a decade, until advances in NGS replaced these conventional techniques. NGS-based RNA sequencing (RNA-seq) sequences the mRNA of whole tissues and generates large amounts of data on gene expression under various environmental conditions, which can play an important role in predicting gene function. Methods for gene expression analysis include microarrays, GeneChips, EST, serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS), and RNA-seq (Chaudhary et al. 2015). RNA-seq data for various environmental and stress conditions are available at public sites such as NCBI (https://www.ncbi.nlm.nih.gov/sra/), EMBL-ENA (https://www.ebi.ac.uk/ena/browser/home) and DDBJ (https://www.ddbj.nig.ac.jp/index-e.html). These databases provide raw RNA-seq data as sequence read archive (SRA) files, which can be analyzed using various publicly available RNA-seq pipelines. Other databases, such as BAR (http://bar.utoronto.ca/), SoyKB, and SoyBase, provide publicly available analyzed data in the form of gene expression profiles across tissues and conditions. Several studies have used publicly available RNA-seq data to identify key genes related to specific conditions (Machado et al. 2019). RNA-seq data related to biotic and abiotic stress are also available. In the future, meta-transcriptomic analysis of such data should improve the understanding of precise gene function, gene-environment interaction, and complex biological pathways.

1.14.4 Protein or Metabolome Databases

Proteins are the most important biomolecules, as they directly control biological pathways and act as functional units. Several hundred different proteins are present in soybean seed, but the major ones are glycinin (11S legumin type) and conglycinin (7S vicilin type), which together comprise 65–80% of total protein content and 25–35% of seed content (Hammond et al. 2003). Soybean seed also contains anti-nutritional proteins such as Kunitz trypsin inhibitors, lectin, the P34 allergen, and urease, as well as transporter proteins, the oil storage proteins oleosins, sucrose-binding proteins, and many others. Many studies in soybean and other crops have examined protein expression in different tissues at various time intervals under stress conditions. Techniques such as 2D gel electrophoresis, HPLC, UPLC, LC-MS, and GC-MS have been used to identify proteins and metabolites under different environmental conditions. Metabolite information is available in the Kyoto Encyclopedia of Genes and Genomes (KEGG: https://www.genome.jp/kegg/), Arabidopsis acyl-lipid metabolism (http://aralip.plantbiology.msu.edu/pathways/pathways), BRENDA (https://www.brenda-enzymes.org/index.php), MassBank (http://www.massbank.jp/), Medicinal Plant Genomics Resource (http://medicinalplantgenomics.msu.edu/), MetabolomeXchange (http://www.metabolomexchange.org/site/), Plant Metabolic Network (PMN: https://plantcyc.org/), Plant/Eukaryotic and Microbial Systems Resource (PMR: http://metnetweb.gdcb.iastate.edu/PMR/), PRIMe (http://prime.psc.riken.jp/?action=metabolites_index), and MetaboLights (https://www.ebi.ac.uk/metabolights/index). SoyMetDB is a metabolomic database for soybean that provides a one-stop web resource for integrating, mining and visualizing soybean metabolomic data, including the identification and expression of various metabolites across different experiments and time courses (Joshi et al. 2017). These databases give information on the biochemical and physiological properties of metabolites.

1.14.5 Integration of Data from Multiple Sources

Advances in genomics, proteomics, ionomics, metabolomics, and phenomics generate large amounts of data that can be integrated to identify targets precisely. Several studies have successfully identified targets by integrating two or more techniques. Genome-wide association studies (GWAS) combined with transcriptomic data have been used successfully to identify candidate genes governing particular traits. A computational approach, “camoco”, has been developed that integrates GWAS with gene co-expression networks (Schaefer et al. 2018). The integrated use of GWAS and RNA-seq data identified 7 promising candidate genes for drought tolerance in maize from the 62 loci identified by GWAS (Guo et al. 2020). Similar studies are available in Brassica for yield (Lu et al. 2017) and in linseed for seed fatty acid metabolism (Xie et al. 2019). In a recent study, GWAS, digital phenotyping and transcriptomics were integrated to identify drought resistance genes in cotton (Li et al. 2020). Further, the integration of whole-genome resequencing (WGRS), transcriptome, and metabolite data at different seed development stages has been used to dissect seed component-related traits (Chaudhary et al. 2015). SoyBase provides genetics, genomics, and USDA germplasm information. Loci information for nearly 100 traits from QTL mapping and GWAS studies is available on SoyBase (Grant et al. 2010). SoyKB is a web-based database that provides genomics, transcriptomics, metabolomics, and molecular breeding data (Joshi et al. 2017). A recently developed integrated database of WGRS and transcriptomics, SoyTD (http://artemis.cyverse.org/soykb_dev/SoyTD/), gives information on natural variation and expression of soybean transporter genes (Deshmukh et al. 2020). Lai et al. (2020) developed a comprehensive framework consisting of bioinformatics big data mining, meta-analysis, and a gene prioritization algorithm. A set of 36,705 test genes collected from multidimensional data platforms was analysed, and candidate genes for flooding tolerance were identified. In the future, integration of more databases should help in accurately understanding complex biochemical pathways and in identifying candidate genes for a specific trait.
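The basic logic of combining GWAS with expression data can be sketched as a simple set intersection: keep the genes that lie near an associated locus and are also differentially expressed. The Python sketch below illustrates only this idea. It is not the algorithm used by camoco or by any of the studies cited above, and all gene names, coordinates, loci and function names are hypothetical.

```python
def genes_near_loci(genes, loci, window=0):
    """Return names of genes lying within `window` bp of any GWAS locus.

    genes: dict mapping gene name -> (chromosome, start, end)
    loci:  list of (chromosome, position) for associated markers
    """
    hits = set()
    for name, (chrom, start, end) in genes.items():
        for locus_chrom, pos in loci:
            if chrom == locus_chrom and start - window <= pos <= end + window:
                hits.add(name)
                break
    return hits


def prioritize(genes, loci, degs, window=0):
    """Candidate genes supported by both GWAS position and differential expression."""
    return sorted(genes_near_loci(genes, loci, window) & set(degs))


# Hypothetical annotation, GWAS hits and differentially expressed genes (DEGs).
toy_genes = {
    "geneA": ("Gm03", 1000, 2000),
    "geneB": ("Gm03", 50000, 52000),
    "geneC": ("Gm08", 1500, 2500),
}
toy_loci = [("Gm03", 1500), ("Gm08", 3000)]
toy_degs = {"geneA", "geneB"}

candidates = prioritize(toy_genes, toy_loci, toy_degs, window=1000)
```

In this toy case geneC is close to a locus but not differentially expressed, and geneB is differentially expressed but far from any locus, so only geneA is retained. Published pipelines add statistical weighting and linkage disequilibrium information on top of this basic filter.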

1.15 Future Perspectives

In just the past few years we have witnessed tremendous progress in soybean comparative and functional genomics and an explosive expansion of new resources. We have seen large-scale whole-genome sequencing, development of high-density genetic maps using high-throughput approaches, construction of physical and transcript maps, development of high-density cDNA and oligo arrays, and advances in functional genomics studies. These resources and research outcomes have shed much light on the structure, organization and evolution of the soybean genome and on key genes associated with biotic and abiotic stresses and other traits. With the availability of the whole-genome sequence of soybean, emerging functional genomic data and large-scale resequencing data, genome-wide comparisons are becoming possible. These approaches will allow researchers to decipher the evolutionary history and genomic complexity of soybean. We will be able to further apply genomic approaches to elucidate the key genes or functional components that control complex agronomic and physiological traits.