1.1 Introduction

Soybean is a self-pollinated plant that belongs to the family Fabaceae and the genus Glycine. The genus Glycine is divided into two subgenera, Glycine and Soja. The subgenus Soja includes two widely recognized species: the cultivated soybean Glycine max and the wild soybean Glycine soja. Soybean is an economically important legume crop that is rich in seed protein (40%) and oil (20%) and provides starch, dietary fiber, protein, lipids, and essential minerals for humans as well as livestock (Chaudhary et al. 2015). It is widely grown as a grain legume and oilseed crop around the world, with major producers including the United States, Brazil, Argentina, China, and India. The US is the leading producer, with 35% (119.5 million metric tons) of global soybean production (340.9 million metric tons) (SoyStats 2018 www.soystats.com).
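The production share quoted above can be checked with a short calculation. The following is a minimal sketch in Python that uses only the two SoyStats 2018 figures given in the text; the function and constant names are illustrative assumptions and do not come from any cited source.

```python
# Figures quoted in the text (SoyStats 2018), in million metric tons.
US_PRODUCTION_MMT = 119.5
GLOBAL_PRODUCTION_MMT = 340.9


def production_share(part: float, total: float) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Hypothetical variable name; the value rounds to the 35% stated in the text.
us_share = production_share(US_PRODUCTION_MMT, GLOBAL_PRODUCTION_MMT)
print(f"US share of global soybean production: {us_share:.1f}%")
```

The same helper could be reused for other producers if their production figures were available.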

Soybean is in high demand not only for food and feed but also as a potential future feedstock for fuel and biodegradable plastics (Candeia et al. 2009; Song et al. 2011). Furthermore, soybean is used in industrial and pharmaceutical applications as well as in the production of biodiesel (Goldberg 2008). Because of these diverse uses, soybean has become a highly desirable crop and demand for it is increasing rapidly. However, the growing global population will require food production to double by the year 2050, and at the current rate of yield improvement only ∼55% of the required production can be achieved. Producing sufficient yield is expected to become even more difficult under a changing climate (Ray et al. 2013; Deshmukh et al. 2014). Climate change and extreme weather conditions have a negative impact on crop yield, because solar radiation, temperature, and precipitation are the main drivers of crop growth. Additionally, climate change influences plant diseases and pest infestations, as well as the supply of and demand for irrigation water (Rosenzweig et al. 2001). Therefore, emphasis must be placed on producing high-yielding soybeans with high nutritional value that are environmentally stable and resistant to extreme weather conditions.

Plant breeding has undoubtedly improved soybean yield and resistance to biotic and abiotic stresses to meet the current level of demand, but the main challenge is to continue increasing production under the current scenario of climate change. In general, breeding for complex traits is challenging because they are controlled by multiple genes and are greatly influenced by the environment. Conventional breeding procedures such as backcrossing, single pod descent, pedigree breeding, and bulk population breeding are used to develop improved varieties of soybean (Poehlman et al. 1995). To facilitate breeding advances, it is necessary to employ modern breeding techniques such as marker-assisted breeding, recombinant DNA technology, genome editing, and “omics” (genomics, transcriptomics, proteomics, metabolomics, ionomics) to improve soybean quality and yield. In addition, concerns about environmental stress due to climate change and the demand for an ample supply have instilled a new urgency into accelerating the rates of genetic gain in breeding programs. Therefore, alongside conventional breeding efforts, it is essential to integrate next-generation molecular and omics approaches to produce high-yielding soybeans with enhanced adaptation to various environmental stresses.

1.2 Prioritizing Climate-Smart Traits

1.2.1 Flowering Time and Maturity

Plants perceive various environmental signals, such as photoperiod, temperature, and stresses, to time flowering and thus control seed production. In soybean, flowering time and maturity are important agronomic traits that are useful for developing cultivars with wider geographical adaptation. Soybean is a short-day plant: short days induce flowering, while long days delay it. Photoperiod and in-season temperature are the primary factors that dictate the region to which a soybean variety is adapted. Soybean can grow across a wide range of latitudes, from 50°N to 35°S (Watanabe et al. 2012). This adaptability results from natural variation in many genes controlling flowering time and maturity. Studying the mechanisms that control flowering and maturity in soybean can provide a theoretical basis and genetic materials for soybean breeding, especially under the climate change scenario. Understanding the regulatory mechanisms of flowering time and maturity allows the growth cycle of soybean to be modified, by manipulating these two traits, to overcome or avoid different stresses.

1.2.1.1 Overview of Flowering and Maturity Regulating Genes in Soybean

Flowering time (days to R1) and maturity (days to R8) in soybean have been reported to be highly correlated traits (Mansur et al. 1996). Photoperiod insensitivity, flowering time, and maturity were found to be controlled by the same genes or by tightly clustered genes in the same chromosomal region (Tasma et al. 2001).

In 1927, a major gene locus was found to control maturity (Owen 1927). Subsequent research found that the E1 locus is largely responsible for the variation in flowering time among cultivars (Bernard 1971; Abe et al. 2003). To date, ten genes related to flowering and maturity have been reported, including nine E genes (E1–E9) and one J gene (Bernard 1971; Buzzell and Voldeng 1980; Bonato and Vello 1999; Cober et al. 2010; Kong et al. 2014). Six E genes, E1, E3, E4, E7, E8, and E9, participate specifically in the photoperiod response (Cober et al. 1996; Cober and Voldeng 2001; Cober et al. 2010), with E1, E3, E4, E7, and E8 as recessive loci (Watanabe et al. 2012; Kong et al. 2014). Introgression of these early flowering alleles results in earlier flowering under long days and improved adaptation to short summers at high latitudes. The J locus was identified in the progeny of crosses between standard cultivars and late flowering cultivars with a long-juvenile habit; its recessive allele causes late flowering under short days (Ray et al. 1995). In general, the “delayed juvenile” trait is useful for adaptation to low latitudes and to spring sowings at lower latitudes (Tomkins and Shipe 1997).

1.2.1.2 Cloning Genes Underlying the Flowering and Maturity Traits

Efforts have been made to clone the genes underlying these loci in order to understand the mechanisms of flowering and maturity in soybean. The E1 gene was isolated by map-based cloning and encodes a B3-like protein, which belongs to a family of plant-specific transcription factors. E1 from soybean shows high similarity to its counterparts in other legumes, such as Medicago truncatula and Lotus corniculatus. However, the E1 gene does not exist in the model plants Arabidopsis and rice. The E2 gene was identified as the homolog of GIGANTEA (GI), a unique plant-specific nuclear clock-associated protein that contributes to the maintenance of circadian period length and amplitude and regulates flowering time and hypocotyl growth in response to day length (Watanabe et al. 2011). E2 can enhance the photoperiod response of soybean and is closely related to the early flowering phenotype and light adaptability of soybean. E3 and E4 encode the phytochrome (phy) family photoreceptors PHYA3 and PHYA2, respectively (Liu et al. 2008; Tsubokura et al. 2013). Soybean contains four PHYA genes that form two pairs of homologs, and E3 and E4 belong to different homolog pairs. The homolog of E4, PHYA1, is apparently functional, whereas the homolog of E3 carries a deletion and is probably a pseudogene (Watanabe et al. 2009). E9 was identified as FT2a, an ortholog of Arabidopsis FLOWERING LOCUS T, through fine-mapping, sequencing, and expression analysis. The recessive allele of E9 delays flowering because of lower transcript abundance caused by allele-specific transcriptional repression.
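The locus-to-gene relationships described above can be collected into a small lookup table. The following Python sketch records only what this section states; the dictionary layout, the function name, and the fallback message are illustrative assumptions, not part of any published resource.

```python
# Cloned flowering/maturity loci and their gene identities, as described
# in the text above. Loci not described as cloned in this section
# (for example E7, E8, and J) are deliberately left out.
CLONED_E_LOCI = {
    "E1": "B3-like protein (plant-specific transcription factor family)",
    "E2": "homolog of GIGANTEA (GI)",
    "E3": "phytochrome A photoreceptor PHYA3",
    "E4": "phytochrome A photoreceptor PHYA2",
    "E9": "FT2a, ortholog of Arabidopsis FLOWERING LOCUS T",
}

# Hypothetical fallback message for loci without a description here.
NOT_DESCRIBED = "not described as cloned in this section"


def describe_locus(name: str) -> str:
    """Return the gene identity recorded for a locus, ignoring case."""
    return CLONED_E_LOCI.get(name.strip().upper(), NOT_DESCRIBED)


for locus in ("E1", "e3", "E7"):
    print(f"{locus}: {describe_locus(locus)}")
```

A table of this kind is only a convenience for cross-referencing; it carries no information beyond the cited studies.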

1.2.1.3 Application of Classification of Maturity Group (MG) in Soybean

Understanding the mechanisms underlying diversity and adaptation in soybean flowering time and maturity is very important when breeding for high productivity across diverse latitudes. Many soybean cultivars with different maturity have been bred to adapt to various ecological environments. For convenience in planning breeding programs, 13 MGs, from 000, 00, 0, and I, II, through X, were defined based on photoperiod and yield trials in North America (Zhang et al. 2007). Maturity group zones represent defined areas where a cultivar is best adapted. However, the classification of maturity groups is still not internationally unified. As described above, flowering and maturity in soybean are largely controlled by major genes. Therefore, flowering time and maturity can be adjusted through breeding and genetic engineering. Although photoperiod remains constant, climatic conditions, management practices, and soybean genetics have changed over the past decades. Maturity group adaptation zones therefore need to be understood, applied, and adjusted for the benefit of breeding (Mourtzinis and Conley 2017).
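Because the MG labels mix zeros and Roman numerals, they do not sort correctly as plain strings. The following Python sketch shows one way to order them, assuming the conventional sequence from the earliest group (000) to the latest (X); the function names are hypothetical.

```python
# The 13 North American maturity groups, ordered from earliest (000)
# to latest (X). This ordering is the assumption the helpers rely on.
MATURITY_GROUPS = [
    "000", "00", "0",
    "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
]


def mg_rank(mg: str) -> int:
    """Return the position of a maturity group in the ordered list."""
    label = mg.strip().upper()
    if label not in MATURITY_GROUPS:
        raise ValueError(f"unknown maturity group: {mg!r}")
    return MATURITY_GROUPS.index(label)


def is_earlier(mg_a: str, mg_b: str) -> bool:
    """True if mg_a is an earlier-maturing group than mg_b."""
    return mg_rank(mg_a) < mg_rank(mg_b)


print(len(MATURITY_GROUPS))     # number of groups
print(is_earlier("III", "VI"))  # MG III matures earlier than MG VI
print(sorted(["VI", "00", "III"], key=mg_rank))
```

Sorting with `key=mg_rank` keeps group labels in agronomic order rather than alphabetical order.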

1.2.2 Seed Composition

Soybean is a major source of oil and protein and accounts for 56% of total oilseed production in the world (Wilson 2008). Seed quality is determined by seed composition, including protein, oil, sugars, and minerals. Soybean seeds contain 40% protein, 20% oil, 15% soluble carbohydrate, and 15% fiber on a dry weight basis. Protein and oil are the most abundant and valuable components of soybean seed.

1.2.2.1 Oil

Soybean seed contains up to about 230 g kg−1 of oil on a dry weight basis, and the oil consists of 16% saturated, 23% monounsaturated, and 58% polyunsaturated fatty acids (Bellaloui et al. 2015). The major unsaturated fatty acids in soybean are the polyunsaturated alpha-linolenic acid (7–10%) and linoleic acid (51%), and the monounsaturated oleic acid (23%) (Poth 2000; Ivanov et al. 2010). This makes soybean oil valuable for a healthy human diet. However, soybean oil has approximately 24% monounsaturated fatty acids (C18:1), significantly less than competing oils such as canola (61%) and olive (40%) (Terés et al. 2008). Oleic acid (C18:1), a monounsaturated omega-9 fatty acid, typically makes up 55–83% of total oil content in olive. Monounsaturated fats are resistant to high heat, making extra virgin olive oil a healthy choice for cooking.

1.2.2.1.1 Genetic Regulation of Seed Oil Production

The oil concentration in soybean seeds is a quantitative trait governed by a number of genes mostly with small effects and under influence of the environment. A negative relationship between seed oil and protein was well documented, which makes it difficult for breeders to develop high-oil soybean genotypes while retaining a high level of protein (Wilcox and Shibles 2001; Hyten et al. 2004b). There are >130 quantitative trait loci (QTLs) reported to be associated with oil content in soybean (Qi et al. 2011), since the first documented report to detect oil QTL (Diers et al. 1992). Among these oil QTLs, only a few have been detected in multiple genetic backgrounds or environments, and none have been widely used in marker-assisted selection (MAS) for high oil in soybean breeding programs. This could be due to several factors affecting the usefulness of QTL, including large confidence intervals, QTL × environment, and QTL × genetic background interactions, which all impede the use of QTL in breeding programs (Qi et al. 2011).

In addition to QTLs, some transcription factors have been reported to modify seed oil content in soybean, such as LEC1, LEC2, ABI3, and FUS3, which are master regulators of seed development and thus regulate oil content (Mendoza et al. 2005). Moreover, overexpression of GmDOF4 and GmDOF11 increased lipid content in seeds of transgenic Arabidopsis plants via direct activation of lipid biosynthesis genes and repression of storage protein genes (Wang et al. 2007). The transcription factor GmbZIP123 also elevated lipid content in seeds of transgenic Arabidopsis plants by activating Suc-transporter genes and cell-wall-invertase genes for sugar translocation and sugar breakdown, respectively (Song et al. 2013).

1.2.2.1.2 Metabolic Engineering of Fatty Acid Composition

Most domesticated oilseed crops have been successfully modified through either breeding or genetic engineering to optimize the ratio of endogenous fatty acids in the storage oil for specific end uses (Drexler et al. 2003). For example, suppression of the oleate Δ12-desaturase gene in soybean, sunflower, cotton, and canola has resulted in oils with a high C18:1 content, which have greater oxidative stability and improved performance in high-temperature cooking applications. Oils with a high C18:1 ratio are also desired by the chemical industry, as C18:1 can be used in a variety of applications including detergents, soaps, lubricants, cosmetics, and emulsifying agents, and as a source of C9 monomers for plastics (Metzger and Bornscheuer 2006). Buhr et al. (2002) described the development of transgenic soybean events in which the expression of FAD2-1 and FatB was simultaneously downregulated in a seed-specific fashion, thereby generating soybean oil with a reduced content of C16:0 (<5%) and a significantly increased C18:1 content (>85%). Recently, naturally occurring mutant alleles of FAD2-1A and FAD2-1B were identified in soybean plant introduction (PI) collections (Pham et al. 2011). Conventionally bred soybean lines homozygous for both the mutant FAD2-1A and FAD2-1B alleles were developed by marker-assisted backcrossing, and the C18:1 content of these lines increased from 20% to an average of 82–86%. On the other hand, upregulating the endoplasmic reticulum oleoyl and linoleoyl desaturases in oilseeds could lead to a substantial increase in polyunsaturated fatty acids (PUFAs) in the oil. For example, overexpression of a fungal bifunctional Δ12 and Δ15 desaturase in soybean resulted in over 70% C18:3 in somatic embryo oils, compared to 19% in the wild type (Damude et al. 2006).

1.2.2.2 Protein

The total protein content of soybean seed is very important, because approximately 60% of the value comes from soybean meal (Pettersson and Pontoppidan 2013). Poultry and livestock need a minimum of 47.5% protein content in soybean meal for proper growth and development. Soybean seed composition, especially seed storage protein, is a complex trait controlled by a network of genes and by interaction with the environment. In addition, increasing seed storage protein is difficult due to its strong negative correlation with oil content and seed yield (Bandillo et al. 2015; Chaudhary et al. 2015).

1.2.2.2.1 Seed Protein Composition

Soybean, like many other seeds, has two major storage proteins, glycinin (11S legumin type) and β-conglycinin (7S vicilin type), which dominate the proteome (Herman and Larkins 1999). The soybean seed proteome also includes many moderately abundant proteins that are bioactive and allergenic, such as the Kunitz and Bowman–Birk trypsin inhibitors, lectin, P34 allergen, sucrose-binding protein, urease, and oleosins, together with several thousand low abundance proteins (Herman and Burks 2011). The specific mix of proteins and the abundance of each protein within the proteome determine the amino acid composition of seed protein (Herman 2014). The development of soybean cultivars with enhanced protein and amino acid content would further increase the economic value of the crop and help to enrich the entire value chain from farmers to processors to end users.

1.2.2.2.2 Genetic Regulation of Protein Content

Over the past decades, considerable resources, including genomic and transcript data, single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) maps, and proteomics, have helped to elucidate the genetic regulation of soybean protein content. With advances in genetic map construction and the availability of a well-annotated reference genome (Schmutz et al. 2010), resources for association mapping (Song et al. 2013), and whole-genome resequencing (WGR) data (Zhou et al. 2015; Valliyodan et al. 2016), a large number of QTLs for seed protein content have been identified.

Currently, >160 QTLs have been identified as associated with seed protein content in soybean. Among these, a major QTL for seed protein and oil content has been consistently mapped on Chr. 20, and it has received considerable attention due to its high additive effect and stability (Diers et al. 1992; Hwang et al. 2014). Due to the large environmental effects, only two QTLs, one on Chr. 15 (cqPro-15) and the other on Chr. 20 (cqPro-20), are designated as officially confirmed QTLs based on error rate (lower than 0.01) (http://soybase.org/). However, the presence of the QTL for higher protein on Chr. 20 was negatively correlated with seed yield (Nichols et al. 2006), which suggests a tradeoff when increasing protein content. Introgression of this QTL into elite backgrounds would increase the value of soybean to compensate for the yield drag.

1.2.3 Abiotic Stress Tolerance

Plants face a constant threat from various abiotic stresses, including drought, waterlogging, heat, cold, and nutrient deficiency. Climate change increases the occurrence of extreme weather patterns, including irregular precipitation and extreme temperatures, in agricultural areas worldwide, which causes a significant reduction in crop production and threatens food security (Lesk et al. 2016). Yield losses of major crops under irregular weather patterns keep increasing, despite the progressive increase in yield through breeding and management practices since the 1960s (Boyer et al. 2013; Lobell and Tebaldi 2014). To achieve sustainability in agriculture, it is crucial to develop crops with tolerance to abiotic stresses. During evolution, plants have developed tolerance traits to overcome these stresses. Incorporating these tolerance traits into current elite germplasm is key to maintaining sustainable crop production.

1.2.3.1 Drought Tolerance

Drought is the major abiotic stress that threatens crop production. Climate change is anticipated to intensify the occurrence of irregular precipitation patterns worldwide, which will further negatively affect crop production and food security. Soybean, as one of the most important crops with multiple uses, also suffers from drought stress. In crops, drought resistance translates to traits that enhance yield stability rather than those that increase survivability under drought (Blum 2009; Passioura 2010; Sinclair 2011; Passioura 2012; Valliyodan et al. 2016). These traits are correlated with yield under drought and carry no yield penalty under nonstress conditions. The success of soybean improvement under drought and heat stress depends on the discovery and utilization of genetic variation present in the germplasm. Identification of genetic diversity for traits related to drought and heat tolerance has helped identify genetic resources in soybean. This section summarizes advances in drought tolerance in soybean by highlighting the traits that contribute to it, including root system architecture (RSA), water use efficiency, canopy wilting, and sustained N-fixation under drought.

1.2.3.1.1 Root System Architectures and Anatomy

Kramer (1969) stated an essential characteristic of drought tolerance: “deep, wide-spreading, much-branched root system”. Root systems are usually involved in both drought avoidance and tolerance during water deficits due to the constitutive and plastic characteristics of roots. RSA refers to the shape of the roots and the physical space they occupy; a deeper and wider root system can avoid tissue dehydration through its ability to acquire more water. RSA is also highly plastic and responds rapidly to environmental changes such as water deficit. When plants perceive water deficit stress, roots tend to keep growing and penetrate into deeper soil layers (Hoogenboom et al. 1987; Creelman et al. 1990; Wu et al. 1994). The ability of plants to develop deeper rooting systems under drought stress depends on the tolerance of the roots to water deficit stress. In legumes, including soybean, some lines were observed to reach significantly greater rooting depth than others under drought stress (Garay and Wilhelm 1983; Sponchiado et al. 1989). Genetic diversity of RSA has evolved through geographic adaptation of plants. Deep rooting, which is a complex trait affected by growth angle and root length (Araki et al. 2002), plays a crucial role in water uptake from deeper soils to avoid drought under water deficit conditions. Root angle determines the horizontal and vertical distribution of roots in the soil. It is recognized as an adaptive trait for drought avoidance in crops (Mace et al. 2012; Christopher et al. 2013; Uga et al. 2013). In addition to deep rooting, drought stress also induces plastic responses of root systems, including an increased number of fibrous roots, decreased lateral root diameter, and fluctuations in root biomass (Nielsen et al. 1997; Osmont et al. 2007; Meister et al. 2014; Salazar-Henao et al. 2016). Alterations in root anatomy, such as aerenchyma formation in maize (Lynch 2011; Burton et al. 2013), reduce the energy inputs required and allow improved soil penetration and exploration to compensate for water deficit (Addington et al. 2006; Maseda and Fernández 2006).

In soybean, improved RSA was shown to alleviate drought stress by increasing exploration for water and nutrients (Hoogenboom et al. 1987). Natural variation in RSA was reported in soybean and was suggested as a resource for improving drought tolerance (Carter 1989). In field evaluation, the drought-tolerant Japanese landrace PI 416937 displayed the ability to exploit the upper soil horizon with a large network of fibrous roots and was found to have greater lateral root system spread than Forrest (Hudak and Patterson 1996). Recently, upon screening a core set (400 lines) of the USDA Germplasm Collection, several soybean accessions were identified as having promising RSA in terms of extensive fibrous rooting, root length, or large root angle. Genetic diversity was also observed in the root anatomy of soybean, and changes in root anatomy affect water movement through root systems (Rincon et al. 2003). Examination of the root anatomy of 41 soybean lines revealed variation in the number of metaxylem elements in roots. The number of metaxylem elements was found to be correlated with drought tolerance in soybean: plants developed more metaxylem under drought conditions, and drought-tolerant lines developed markedly more than drought-sensitive lines (Prince et al. 2017). QTL mapping has been conducted in soybean, and many QTLs associated with RSA and drought tolerance have been mapped (Abdel-Haleem et al. 2011; Manavalan et al. 2015; Prince et al. 2015). This information is being used in molecular-assisted breeding to incorporate these RSA traits into elite varieties for improved drought tolerance.

1.2.3.1.2 Water Use Efficiency and Canopy Wilting

Improving water use efficiency is another promising strategy to overcome drought stress. The essential factors in improving water use efficiency are conserving water in plants and reducing unnecessary transpiration losses (Turner et al. 2001; Turner 2003). In soybean, researchers have looked for traits associated with water use efficiency that would allow large-scale screening for drought tolerance. In 1990, a phenotype of slow canopy wilting under drought and heat stresses was observed in PI 416937, a Japanese drought-tolerant landrace in maturity group (MG) VI (Sloane et al. 1990). Field evaluation revealed the seemingly contradictory finding that PI 416937 actually used significantly less water than the drought-sensitive checks (Hudak and Patterson 1996). Further research on the water conservation of PI 416937 revealed that, compared with drought-sensitive genotypes, this line can limit its transpiration rate when vapor pressure deficit (VPD) is above 2.0 kPa (Fletcher et al. 2007; Tanaka et al. 2010; Ries et al. 2012). PI 416937 offered breeding resources for improving drought tolerance in late maturity groups of soybean. Recently, two additional slow canopy wilting landraces in MG III (PI 567690 and PI 567731) were identified after the evaluation of a core set of 250 soybean germplasm lines for canopy wilting and drought tolerance (Pathan et al. 2014). These two lines share with PI 416937 a similar physiological mechanism of limiting transpiration rate under high VPD and offer breeding materials for early maturity groups in soybean (Pathan et al. 2014). Moreover, QTL mapping studies performed using drought-tolerant soybean genotypes have identified genomic loci governing physiological traits such as slow wilting. For instance, 10 QTLs associated with slow canopy wilting have been mapped, and DNA markers were subsequently developed for MAS in breeding (Abdel-Haleem et al. 2012; Hwang et al. 2015; Hwang et al. 2016). The complexity of the canopy wilting trait also indicates that stacking all confirmed QTLs by MAS or genomic selection is necessary to recover the drought tolerance shown by the original drought-tolerant exotic PIs.

1.2.3.1.3 Sustained N-Fixation Under Drought

N-fixation is highly sensitive to drought stress (Djekoun and Planchon 1991; Adams et al. 2016). Soybean plants transport ureides from nodules to shoots, which makes them more sensitive to drought stress than other legumes that transport amides (Sinclair and Serraj 1995). In soybean, ureides accumulate in leaves and nodules during drought stress and impose negative feedback that inhibits nitrogenase activity for N-fixation (Serraj et al. 1999; Vadez and Sinclair 2000; Ladrera et al. 2007). Shoot ureide concentration was found to be associated with drought tolerance and has been used as another indicator of drought tolerance (Desilva et al. 1996; Serraj and Sinclair 1996a, b; Purcell et al. 1998). In the 1990s, Sinclair et al. (2000) screened 3081 soybean PIs for sustained nitrogen fixation and identified eight PIs with the best performance in sustained N-fixation under drought and promising drought tolerance in terms of yield stability. All of these soybean genotypes have been used in breeding programs to develop drought-tolerant elite lines (Devi et al. 2014). Recently, a large number of QTLs associated with shoot ureide and nitrogen concentration were mapped in both biparental populations and genome-wide association studies (GWAS) in diverse lines (Hwang et al. 2013; Ray et al. 2013; Dhanapal et al. 2015), which indicates the complexity of N-fixation under drought and suggests that genomic selection may be better suited to improving such complex traits.

1.2.3.2 Waterlogging Tolerance

Recent climate change data with a predicted 30% increase in heavy precipitations by 2030 show that flooding stress will be more severe in the future. Soybean is sensitive to waterlogging, resulting in significant yield reduction, ranging from 46 to 56% (Scott et al. 1989; Oosterhuis et al. 1990; Linkemer et al. 1998). Waterlogging triggers root damage which affects water and nutrient uptake and subsequently causes a reduction in nodulation, impaired photosynthesis, and plant death due to diseases which ultimately results in yield loss (Oosterhuis et al. 1990; Vantoai et al. 1994).

Lack of cellular oxygen in roots is believed to be a major component of waterlogging stress in soybean. Aerenchyma formation is also thought to be important for the internal aeration of soybean roots, mitigating root damage during cell anoxia. RSA and plasticity could be another tolerance strategy that compensates for root damage during waterlogging and accelerates root recovery afterwards. Previously, a positive correlation was found between total root length and waterlogging tolerance in soybean germplasm lines, and a waterlogging-tolerant soybean line tended to generate more adventitious/aerial roots than a sensitive line (Kim et al. 2015). Favorable RSA and plasticity in soybean can lead to less waterlogging damage and faster plant recovery after waterlogging stress, as revealed by the study of a major waterlogging tolerance QTL in soybean (Ye et al. 2018).

Genetic diversity of flooding tolerance has evolved through geographic adaptation of soybean plants. Shannon et al. (2005) screened a core set of 350 soybean germplasm lines for flooding tolerance at the early reproductive stage. The flooding-susceptible lines lost approximately twice as much yield as the flooding-tolerant lines, and the exotic PIs showed much better tolerance than cultivars. Several cultivated germplasm lines (Glycine max) were identified as potential donor sources for breeding for flooding tolerance, including Archer, Misuzudaizu, PI 408105A, PI 561271, PI 567651, PI 567343, VND2, Nam-Vang, and ATF15-1. In the past few years, many QTL mapping studies have been performed, and several QTLs associated with waterlogging tolerance have been identified (Scott et al. 1989; VanToai et al. 2001; Cornelious et al. 2005; Shannon et al. 2005; Rhine et al. 2010; Vantoai 2010; Nguyen et al. 2012; Ye et al. 2018). Among these, a major QTL mapped on chromosome 3 was confirmed in a near-isogenic background and was reported to increase yield by up to 40% in the field (Ye et al. 2018).

1.2.3.3 Salt Tolerance

Salt stress is one of the major abiotic factors affecting crop growth and production. In general, soybean is sensitive to salt stress (Munns and Tester 2008). Soybean yield could be reduced by 50% when the electrical conductivity of the saturation extract of soil was 9 millimhos/cm (Abel and Mackenzie 1964; Papiernik et al. 2005). Absorption and accumulation of Na+ and Cl− ions in high concentrations cause toxicity in soybean plants and result in plant death as the salt concentration in the soil increases (Pathan et al. 2007; Phang et al. 2008). Improvement of salt tolerance in soybean is necessary to ensure food security for the world. The success of such improvement depends largely on the discovery and utilization of genetic variation present in the germplasm and on the characterization of salt tolerance genes and mechanisms.

A large amount of work has been done to investigate salt tolerance in soybean. Based on the responses of soybean plants to salt stress, several phenotyping indices were developed to evaluate salt tolerance levels of soybean plants and the most commonly used indices are based on visual rating, including leaf scorch score, salt tolerance rating, and survival rate. With these phenotyping indices, rich genetic variations of salt tolerance were observed in soybean and salt-tolerant soybean lines were identified over the years (Parker et al. 1983; Yang and Blanchar 1993; Luo et al. 2005; Hamwieh and Xu 2008; Lee et al. 2009b; Chen et al. 2013; Ha et al. 2013; Qi et al. 2014; Do et al. 2016; Xu et al. 2016). The existing genetic diversity in salt tolerance in soybean offers genetic resources to breed salt-tolerant varieties.

QTL mapping for salt tolerance has focused mainly on the soybean seedling stage. The first QTL mapping was performed in an F2:5 population derived from the cross of S-100 (salt tolerant) and Tokyo (salt sensitive), and two QTLs were mapped on linkage groups L and N. The QTL on linkage group N (Chr. 3) showed the major effect on salt tolerance, with a phenotypic contribution of 60% (Lee et al. 2004). Later, this major QTL on Chr. 3 was confirmed by several research groups using different soybean mapping populations, and a few minor QTLs were also mapped (Lee et al. 2004; Chen et al. 2008a; Hamwieh and Xu 2008; Hamwieh et al. 2011; Ha et al. 2013). The underlying gene was cloned and found to encode an ion transporter involved in Na+ and Cl− exclusion and homeostasis regulation (Guan et al. 2014; Qi et al. 2014; Do et al. 2016; Liu et al. 2016). This gene is heavily used to improve salt tolerance in current elite soybean germplasm, and other new major tolerance resources are needed to enhance the genetic diversity of salt tolerance in soybean.

1.2.3.4 Heat Tolerance

Extreme temperatures cause about a 40% reduction in soybean yield (Specht et al. 1999). Heat stress during the vegetative stage affects the growth of soybean. Under heat stress, soybeans in reproductive stages showed increased flower and pod abortion, and in the later pod-filling stages, prolonged stress resulted in fewer and smaller seeds with reduced seed vigor (Boyer 1983; Chebrolu et al. 2016). Reproduction of soybean is sensitive to high temperatures (>35 °C); therefore, improving the heat tolerance of soybean varieties is crucial to improving yield (Salem et al. 2007).

Heat stress during reproductive stages such as flowering and seed development significantly decreases soybean yield (Kebede et al. 2012; Redden et al. 2014; Siebers et al. 2015). Increased flower and pod abortion and reduced seed germinability were observed in soybean plants subjected to extreme heat (Boyer 1983; Salem et al. 2007; Chebrolu et al. 2016). Flattened and collapsed pollen grains were observed in soybean under heat stress, which resulted in lower pollen viability and fertilization rates (Salem et al. 2007). Soybean plants exposed to high temperature (38/28 °C) showed a 22.7% reduction in pollen germination and consequently about a 35% reduction in pod set, and anatomical changes in pollen were observed under heat stress (Djanaguiraman et al. 2013). Evaluation of 44 soybean genotypes from MGs III-IV for heat tolerance allowed them to be categorized into heat-tolerant, heat-intermediate, and heat-sensitive groups based on pollen viability. Among the 44 genotypes, 13 were identified as most heat tolerant and can be used in breeding programs for heat tolerance at the reproductive stages. Seed development is much more vulnerable to heat stress than vegetative tissues are. Heat stress at the pod-filling stages of soybean results in seed with less vigor, poor germination, and increased incidence of pathogen infection (Hatfield et al. 2011). Genetic diversity and significant differences in germinability under heat stress between heat-tolerant and heat-sensitive lines were reported (Chebrolu et al. 2016). Germination of seeds from the heat-sensitive genotype was reduced by 50% under 36/24 °C and was completely inhibited under 42/26 °C compared with normal conditions (28/22 °C). In contrast, seed germinability of the heat-tolerant genotypes was unaffected under 36/24 °C and was reduced by 75% under the 42/26 °C treatment compared with normal conditions (Chebrolu et al. 2016).
These identified heat-tolerant lines are good targets for gene discovery for heat tolerance and soybean breeding programs.

1.2.3.5 Cold Tolerance

Cold tolerance is also an important trait for developing climate-smart soybean. The decreased seed yields caused by low temperatures have been attributed to effects at two stages: poor germination and seedling vigor during the early growth stage, and abortion of flowers and inadequate grain filling at the reproductive stages (Yamamoto and Narikawa 1966). Cold tolerance is a key trait for expanding the soybean production area, because cultivars must adapt to low temperatures in spring and to sudden cold shock at the reproductive stages during summer in northern regions such as Canada and northern Europe. To increase yield in these northern areas with short growing seasons, efforts need to be made to develop varieties showing good emergence and early seedling vigor. Emergence tests and early seedling weight have been used to evaluate soybean germplasm and revealed genetic variation in these two traits among the germplasm lines (Littlejohns and Tanner 1976). Further efforts are needed to characterize the genetic elements controlling these traits and to utilize them in soybean breeding programs, especially for the northern areas.

Low temperatures at the reproductive stages in soybean result in reduced pod and seed formation (Saito et al. 1970; Lawn and Hume 1985; Gass et al. 1996). Cold tolerance at the reproductive stages is usually evaluated by quantifying pods or seeds (Saito et al. 1970; Hume and Jackson 1981; Lawn and Hume 1985; Gass et al. 1996; Kurosaki and Yumoto 2003) or by direct measurement of seed yield (Funatsuki et al. 2003). The genetic loci cAPX1, T, Ln, P1, and Dt1 were characterized as controlling cold tolerance at the reproductive stages in soybean. The soybean maturity loci were also thought to be involved in the regulation of cold tolerance (Funatsuki and Ohnishi 2009; Toda et al. 2011). These genetic resources provide the potential to improve the cold tolerance of soybeans, and knowledge of the loci involved and their allelic status in breeding lines would facilitate the use of molecular markers to assist in the development of cold-tolerant varieties at the maturity.

1.2.4 Biotic Stress Tolerance

1.2.4.1 Insect Resistance

Increased temperatures resulting from climate change could influence soybean insect pest populations in several complicated and dynamic ways. Insects are cold-blooded organisms, and their body temperature is approximately the same as that of the environment. For this reason, changes in temperature can affect insect physiology and development directly, or indirectly through the physiology or existence of hosts, and can impact insect behavior, distribution, development, survival, and reproduction. The precise impacts of increased temperatures on insects are somewhat uncertain because these changes may favor some insect populations and inhibit others. A decrease in pest insect populations would more likely occur when insects are closely associated with a specific set of host crops. For example, soybean aphid (Aphis glycines Matsumura) feeds on soybeans but requires the presence of its overwintering host, buckthorn (Rhamnus cathartica). Most researchers seem to agree that warmer temperatures will favor insects with a shorter reproductive cycle because of their ability to adapt faster. Entomologists predict additional generations of important insect pests as a result of increased temperatures. With a 2 °C temperature increase, insects might experience one to five additional life cycles per season and therefore damage more crops (Yamamura and Kiritani 1998). With these changes in climate, insect pests may spread to new geographical regions, resulting in greater insect diversity, larger populations, and more outbreaks. Higher average temperatures might allow soybean to be grown in regions further north, and it is likely that some insects will follow the expanded crop areas. Based on evidence from the fossil record, the diversity of insect species and the intensity of their feeding have increased historically with increasing temperature (Bale et al. 2002).
Insects that spend a long part of their lives in the soil may be more gradually affected by temperature changes than those that live above ground, because soil provides insulation that buffers temperature changes more than air does (Bale et al. 2002). These soil-borne pests include multiple nematode species (Heterodera glycines, Meloidogyne incognita, and Rotylenchulus reniformis) and insects whose larvae live in the soil, such as bean leaf beetle (Cerotoma trifurcata Forster), multiple wireworms (Melanotus spp., Agriotes mancus Say, and Limonius dubitans LeConte), and white grubs (Phyllophaga spp., Cyclocephala spp., and Popillia japonica Newman).

More frequent and intensive precipitation events forecasted with climate change may negatively affect many insect populations. The same environmental factors that affect insect pests can affect their insect predators as well as the disease organisms that infect the pests, resulting in an increased attack on insect populations. Fungal pathogens of insects are favored by high humidity and their incidence would be increased by climate changes. Moreover, higher humidity and CO2 effects on insects can be potentially important considerations in a global climate change setting (Coviella and Trumble 1999; Hunter 2001; Hamilton et al. 2005).

With changes in climate, soybean growers will face many challenges related to insect management. Insects will broaden their geographic range, new pest types will emerge, and reproduction rates and overwintering survival will increase. Decreased winter mortality due to warmer winters is likely to increase insect populations. Warmer temperatures could result in more extensive insecticide use to keep insect populations below economic damage thresholds. Extensive applications of insecticides can in turn result in pest outbreaks, with further negative environmental and economic impacts. Additionally, some classes of pesticides, like pyrethroids and spinosad, have been shown to be less effective in controlling insects at higher temperatures (Musser and Shelton 2005). Furthermore, the probability that insects develop insecticide resistance will increase as more spray applications are required. Agricultural practices are also likely to be affected by climate change. For example, crop rotation as an insect management strategy could be less effective with earlier insect arrival or increased overwintering of insects. The most successful action plan for soybean growers is to use integrated pest management practices, such as field monitoring, pest forecasting, record keeping, and choosing economically and environmentally sound control measures, to track insect population development. Records of insect and crop management kept over time can be used to evaluate the economic and environmental impact of pest control.

1.2.4.2 Disease Resistance

Many historical and contemporary diseases are emerging as threats to modern agriculture and food security along with climate change. Expression of disease symptoms depends on the interaction between three key components: a susceptible host plant, a widespread and virulent pathogen, and an environment that supports infection or alters host susceptibility (Scholthof 2007). Alteration of any of these components can dramatically change the outcome and spread of disease in a given pathosystem. Changes in climate are known to modify disease symptoms in soybean and are involved in new disease emergence (Morgan et al. 2003; Eastburn et al. 2010; Matthiesen et al. 2016; Willbur et al. 2018). Many new soybean pathogens have recently emerged or spread as a direct or indirect consequence of environmental changes around the world (Chang et al. 2015; Murithi et al. 2015; Chang-Sidorchuk et al. 2016; Barbieri et al. 2017; Plasencia-Márquez et al. 2017). In 2004, Phakopsora pachyrhizi, the cause of soybean rust, was confirmed in Louisiana, the first report in the continental United States, and within a decade it spread through most US soybean-growing states (Schneider et al. 2005). Diseases such as charcoal root rot, caused by Macrophomina phaseolina, are expected to spread to new geographical regions under current climate change scenarios (Sarr et al. 2014).

Plant defense is constantly evolving in response to two disease-causing components: evolving pathogen populations and changes in environmental conditions (Whitham et al. 2016). Climate change alters the susceptibility of the host by inducing signals that modulate gene transcription, cell biology, and physiology. Host specialization is able to limit the distribution and saturation of pathogens through genetic resistance. Many resistance genes (R-genes) have been mapped in the soybean genome, such as Rpg genes for bacterial blight (Ashfield et al. 1998), Rpp genes for soybean rust (Kelly et al. 2015), and Rps genes for Phytophthora root and stem rot (Han et al. 2008). Unfortunately, because of the nature of pathogens and their short life cycles, the pathogen genome adapts to a new environment faster than the plant genome does. Some alarming reports have emerged recently describing new virulent pathogen isolates that defeat R-genes in soybean (Khatabi et al. 2012). In this case, partial host resistance and R-genes should be combined in high-yielding cultivars to sustain disease resistance. Effects of climate change on soybean resistance against pathogens have received little attention to date. The major climate change factors affecting soybean disease severity and spread include warmer temperatures and higher humidity, an increase in atmospheric carbon dioxide (CO2), heavy and unseasonal rains, and drought. More frequent and extreme precipitation events could result in extended periods with conditions favorable for pathogen propagation. Recently, sudden death syndrome (SDS) caused by Fusarium virguliforme was reported to be impaired by prolonged flooding and anaerobic conditions (Abdelsamad et al. 2017). Some soybean diseases, such as infection by Pythium spp. and Fusarium spp., are favored by cool temperatures and wet soil conditions, whereas other pathogens cause more severe symptoms in hot and dry conditions, like Macrophomina phaseolina, or in hot and wet conditions, like Phytophthora sojae.
The aggressiveness of many isolates of Pythium spp. and Phytopythium spp., which cause seed decay, damping-off, and root rot in soybean, increased as temperature increased from 15 to 25 °C (Radmer et al. 2017). Another study reports that, depending on the Pythium species, isolates can be more virulent on soybean at lower or at higher temperatures (Matthiesen et al. 2016). The response of soybean to elevated CO2 and ozone has been studied extensively (Ainsworth et al. 2002; Morgan et al. 2003; Eastburn et al. 2010). Under ambient atmospheric conditions, soybean pathogens can cause annual losses of 424 million metric tons worldwide (Wrather and Koenning 2006; Allen et al. 2017). The effects of elevated CO2 and ozone were evaluated on three economically important soybean diseases: these atmospheric treatments significantly reduced the severity of downy mildew caused by Peronospora manshurica, mildly increased the severity of brown spot caused by Septoria glycines, and had no effect on the incidence of SDS. In addition, higher precipitation and higher daily temperatures in the late spring were associated with increased severity of downy mildew and brown spot (Eastburn et al. 2010). Systemic infection of soybean plants by Soybean mosaic virus (SMV) was reduced when plants were exposed to elevated levels of O3 (Bilgin et al. 2008). The specific impacts of climate change on soybean diseases are therefore difficult to predict. It is likely that increased temperatures will result in a northward expansion of the range of some diseases and in higher survival of pathogen populations. The interaction between soybean disease and climate change cannot be left unaddressed. More research is underway to protect the soybean crop in the future with the help of agencies within the United States Department of Agriculture, industry, soybean check-off boards, and universities.

1.2.5 Nutrient Use Efficiency

Climate change is mostly associated with changes in temperature and rainfall regimes. The effects of extreme temperature and irregular rainfall on nutrient uptake by plants have not been studied as extensively as might be expected. Schlesinger and Lichter (2001) showed that demand for nitrogen (N) increases with increased CO2 in the atmosphere. The increased N demand cannot be fulfilled by natural soil processes, which creates a deficiency known as progressive nitrogen limitation (PNL). The PNL that occurs with increased CO2 can be alleviated by supplying N fertilizer (Schlesinger and Lichter 2001). Elevated CO2 combined with sufficient N supply was found to help increase the productivity of crop plants. In soybean, elevated CO2 was found to be correlated with an increase in symbiotic N, but N uptake from soil or fertilizer was unaffected (Li et al. 2017). The increase in symbiotic N appears to result from increased N-fixation efficiency rather than from increased nodule formation or nodule biomass. Besides higher N uptake efficiency, mobilization of N from root to shoot and to reproductive tissue is important for productivity. With increased N supply, yield increases mostly because of an increased number of seeds in soybean (Kinugasa et al. 2012). No change in N feeding in soybean has been reported with N supply. Nitrogen fertilizer use under elevated CO2 not only reduces natural N fixation by symbiosis but also raises concerns about the effective utilization of limited fertilizer resources. The manufacture of N fertilizer depends largely on natural gas, a resource estimated to last for another 50 years (Fixen 2009). Although alternative sources have been developed, they will be costlier. A continuous supply of N will also add to the greenhouse effect by increasing the release of N2O and CO2 from fertilized soil.

Similarly, phosphorus (P) deficiency under changing environmental conditions causes plant stress, which leads to changes in strategies to adapt to P stress. In soybean, such changes include modifications of root morphology and architecture, enhancement of root symbiosis, and induction of root exudates. A number of studies have examined the molecular regulation of P stress in soybean, and more than 200 genes have been identified in the roots and shoots of soybean seedlings (Hwang et al. 2009; Li et al. 2011; Sha et al. 2012). For example, the gene GmEXPB2 (Glycine max β-expansin) is known to enhance P responsiveness and utilization efficiency by modifying root system architecture (Valdes‐Lopez et al. 2008; Wang et al. 2010; Sha et al. 2016). Additionally, the soybean cultivar BX10 is considered a P-efficient genotype, and a number of early or late P-starvation responsive genes and miRNAs were identified from BX10 based on transcriptional expression profiles and deep sequencing (Dong et al. 2004). Recently, Sha et al. (2016) identified 37 and 33 unique proteins from soybean root and shoot under high or low P condition, respectively. However, only four of the identified proteins were common to root and shoot, which indicates that the molecular regulation of, and response to, P stress differ among tissues in soybean.

Despite the availability of genetic and molecular information about P and N uptake, a comprehensive understanding of the molecular mechanisms underlying responses to various nutrient stresses in soybean is still needed because of the influence of changing climatic conditions.

1.3 Genetic Resources of Climate-Smart Genes

The cultivated soybean (G. max) was domesticated 6000–9000 years ago from its wild relative G. soja in East Asia (Carter et al. 2004). Based on morphological, cytogenetic, and chloroplast sequence identity, several centers of domestication, including areas of Japan and China, have also been proposed. However, recent whole-genome resequencing and molecular studies indicated the Yellow River region of China and southern China as the domestication centers (Guo et al. 2010; Lam et al. 2010; Chung et al. 2014; Zhou et al. 2015). In soybean, the genetic diversity centers are recognized as a primary source of genetic variability. Based on genetic diversity among accessions, China is considered the primary diversity center, while Korea and Japan, followed by countries of South Asia (India, Indonesia, and Vietnam) and Russia, are considered secondary diversity centers. More recently, soybean was introduced to non-Asian countries from the primary and secondary diversity centers.

The soybean was first introduced to North America in 1765 (Hymowitz and Harlan 1983). Extensive breeding of soybean was undertaken in China and the USA using genetic stock of Chinese origin (Cui et al. 2001). In the USA, around 400 soybean cultivars were released for cultivation; these were derived from ~80 ancestral lines (Gizlice et al. 1994), most of them introduced from China (Li and Nelson 2001). Even in China, soybean breeding programs used elite cultivars from North America to broaden the genetic base of modern soybean cultivars (Gai et al. 2015). The Chinese Gene Bank maintains around 28,580 soybean accessions. Collection of soybean accessions in the USA started in the 1920s, but it was the initiation of the soybean collection at the United States Department of Agriculture (USDA) in 1949 that led to systematic preservation of soybean (Carter et al. 2004). USDA collected 14,330 germplasm accessions, of which 5216 (36.40%) are from China (Gai et al. 2015).

The primary gene pool (GP1) of soybean consists of G. max cultivars, landraces, and genotypes of its related species G. soja. Members of GP1 form a biological species: they can be crossed within the gene pool to produce F1 progenies with normal meiotic pairing, normal gene segregation, and complete seed fertility. Where seed sterility does occur, it can be attributed to chromosomal inversions and translocations. Although soybean has relatively low diversity, considerable genetic resources have been maintained and characterized. Species from the secondary gene pool (GP2) can produce F1 hybrids with some fertility when crossed with GP1 (Harlan and de Wet 1971). In Glycine, no species is currently assigned to GP2. However, efforts are underway to explore GP2 species in the regions of soybean's origin.

Crossing GP1 with the tertiary gene pool (GP3) results in sterile or lethal hybrids, and gene transfer requires rescue techniques. The tertiary gene pool of Glycine consists of 26 wild perennial species indigenous to Australia, geographically distant from G. soja and G. max. Among the GP3 species, three (G. argyrea, G. canescens, and G. tomentella) were successfully hybridized with G. max; the F1 hybrids were rescued and found to be sterile. However, a cross between G. max and G. tomentella followed by embryo rescue and further backcrossing led to the development of BC2 lines, which showed yield increases of nearly 500 kg/ha over the G. max parent (Akpertey et al. 2018).

1.4 Brief on Diversity Analysis

1.4.1 Phenotype-Based Diversity Analysis in Soybean Varieties

The diversity present in crop plants is of great importance for developing elite lines through plant breeding. Genetic diversity can be estimated through phenotypic information, pedigree data, and molecular genotyping using DNA or protein markers. Analysis of soybean genetic diversity based on pedigree information revealed that the genetic base of North American cultivars is narrower than that of Asian cultivars. In North America, 90% of genes in 258 cultivars were contributed by 26 ancestors (Gizlice et al. 1994), while 90% of genes in 651 Chinese cultivars were contributed by 339 ancestors (Cui et al. 2000). The Chinese national soybean collection of 20,000 accessions was evaluated for 15 phenotypic traits and represents a highly valuable resource for breeding (Dong et al. 2004). Evaluation of 25 seed, leaf, and stem traits of North American and Chinese soybean cultivars also revealed a narrow genetic base in North American cultivars (Cui et al. 2001).
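The pedigree-based estimates above rest on tracing each cultivar back to its founding ancestors. A minimal sketch of that bookkeeping is given below, assuming the usual simplification that each parent contributes exactly half of a progeny's genes (no selection or drift). The function name, the pedigree dictionary, and the cultivar and ancestor names are hypothetical and are not taken from the cited studies.

```python
def ancestral_contributions(cultivar, pedigree):
    """Expected fraction of a cultivar's genes traced to each founding ancestor.

    pedigree maps a line name to a tuple of its parents; a line that is
    absent from the mapping is treated as a founder (ancestor).
    Assumption: each parent contributes an equal share (1/2 for two parents).
    The pedigree must be acyclic.
    """
    parents = pedigree.get(cultivar)
    if not parents:
        return {cultivar: 1.0}
    contributions = {}
    for parent in parents:
        for ancestor, share in ancestral_contributions(parent, pedigree).items():
            contributions[ancestor] = (
                contributions.get(ancestor, 0.0) + share / len(parents)
            )
    return contributions


# Hypothetical pedigree: founders A, B, and D; derived lines C1, C2, and C3.
toy_pedigree = {
    "C1": ("A", "B"),
    "C2": ("C1", "A"),
    "C3": ("C2", "D"),
}

toy_result = ancestral_contributions("C3", toy_pedigree)
for ancestor in sorted(toy_result):
    print(ancestor, toy_result[ancestor])
```

Averaging such contributions over all cultivars in a region, and counting how many ancestors are needed to reach 90% of the total, gives the kind of summary statistic quoted above.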

1.4.2 Genotype-Based Diversity Analysis: Molecular Markers

The assessment of the molecular genetic diversity of soybean cultivars began in the 1980s with the application of restriction fragment length polymorphism (RFLP) technology (Roth and Lark 1984; Apuya et al. 1988). Subsequently, other molecular markers such as random amplified polymorphic DNA (RAPD), microsatellites or SSRs, amplified fragment length polymorphism (AFLP), and SNPs were employed to assess the genetic diversity of cultivated and wild soybeans. Li and Nelson (2001) found higher genetic diversity in Chinese soybean accessions than in Japanese and Korean soybean accessions using RAPD markers. Di- and tri-nucleotide SSR-based genotyping detected six to eight alleles among a group of 38 G. max and five G. soja accessions, showing the highly polymorphic nature of SSRs in soybean (Akkaya et al. 1992). Maughan et al. (1995) used SSRs to genotype 62 G. max lines and 32 wild soybeans of Asian origin and identified 5–21 alleles at five SSR loci. These studies, along with subsequent experiments, suggested that SSRs are reliable markers for analyzing the limited diversity found in soybean accessions (Rongwen et al. 1995; Song et al. 1998). A study based on AFLP markers showed that Japanese cultivars are more distinct from North American soybeans than Chinese accessions are, suggesting the use of Japanese cultivars to broaden the genetic base of North American soybean (Ude et al. 2003). Song et al. (2015) studied approximately 19,700 soybean accessions from the USDA Soybean Germplasm Collection. The collection for this study included more than 1100 wild soybeans from China, Korea, Japan, and Russia, and more than 18,000 cultivated soybeans from China, Korea, Japan, and 84 other countries. The study reported a number of loci in soybean and, using the genotyping data, identified major candidate regions associated with seed weight on seven chromosomes.
Furthermore, 106 soybean accessions representing 7 wild lines, 43 landraces, and 56 elite lines from different countries of origin and domestication were studied using whole-genome resequencing. On the basis of population structure and phylogenetic analyses, the study supported the common hypothesis that soybean was domesticated in China and then introduced to the US and other parts of the world (Valliyodan et al. 2016). Recently, Liu et al. (2017) analyzed the diversity among 277 Chinese accessions and 300 US accessions of soybean using 5361 SNP markers. Population structure and cluster analysis showed that the Chinese soybeans are more diverse than the American soybean accessions.

1.4.3 Relationship with Other Cultivated Species and Wild Relatives

The soybean [Glycine max (L.) Merr.; 2n = 40] belongs to the family Fabaceae, the tribe Phaseoleae, the subtribe Glycininae, and the genus Glycine Willd. The subtribe Glycininae consists of about 16 genera; among them, the genus Glycine bears morphological and cytogenetic characters distinct from those of the other genera in the subtribe (Lackey 1977). The genus Glycine consists of two subgenera, namely, subgenus Glycine Willd. and subgenus Soja. The subgenus Soja consists of three annual species: the cultivated soybean G. max, its immediate wild ancestral species G. soja, and a weedy form of soybean, G. gracilis, indigenous to eastern Asia (Lackey 1981). The subgenus Glycine comprises 15–16 perennial species found in Australia and Japan. The subgenus Glycine is considered a secondary gene pool for the cultivated soybean, and it contains useful agronomic traits.

The subgenus Soja (annual type) and the subgenus Glycine (perennial type) are significantly distinct from each other (Doyle et al. 2003). Attempts to hybridize between the subgenera were unsuccessful (Ahmad et al. 1977; Hood and Allen 1980). Molecular phylogenetic studies confirmed significant diversification between species from these subgenera (Karasawa 1953; Ahmad et al. 1977; Cao et al. 1996; Doyle et al. 2003; Ratnaparkhe et al. 2011). Within the subgenus Soja, the cultivated G. max hybridized with G. soja and G. gracilis, and fertile seeds were produced (Karasawa 1953; Hadley and Hymowitz 1973; Ahmad et al. 1977). Cytological (Wang 1986; Xu 1990) and molecular genetic studies showed a close evolutionary relationship between these species (Hui et al. 1996; Powell et al. 1996; Wu et al. 2001).

In a recent study, a total of 106 soybean genomes representing landraces, elite lines, and wild soybean genotypes were resequenced. This approach led to the identification of 10 million high-quality SNPs. In addition, 159 putative domestication sweeps were found, comprising 54.34 Mbp (4.9%) and 4414 genes; 146 regions were involved in artificial selection during domestication. This study provides valuable genomic information for understanding soybean genome structure and its relationship with the wild-type soybean (Valliyodan et al. 2016).

1.4.4 Relationship with Geographical Distribution

Most of the species from subgenus Glycine are found in Australia and the South Pacific Islands (Hymowitz and Singh 1987; Shimamoto 2000). However, two species, G. tabacina and G. tomentella, are also found in parts of the Philippines, Japan, and China, including Fujian and Taiwan, apart from Australia and the associated areas (Hymowitz and Singh 1987; Zhuang 1999). The species G. soja is found in China and in adjacent areas such as Russia, Korea, and Japan (Hymowitz et al. 1998; Zhuang 1999; Shimamoto 2000). G. gracilis exhibits several morphological characteristics intermediate between those of G. max and G. soja. It is recognized as a hybrid between G. max and G. soja (Hymowitz 1970), and hence it is found in areas where the cultivated G. max and its wild ancestor G. soja have a sympatric distribution, which includes the northeastern part of China (Hymowitz 1970).

1.5 Population Structures of Soybean in Nature

Linkage disequilibrium (LD) describes the co-inheritance of an allele of one SNP with an allele of another SNP within a population. The term LD has been used to describe changes in genetic variation within a population over time. The concept is related to chromosomal linkage, in which markers that lie physically close together on a chromosome tend to be inherited together across generations. LD, the nonrandom association of alleles at different genomic loci, is affected by several factors and has been of great interest to geneticists. Variation in LD across the genome or at a particular genomic region is affected by domestication, mutation, the level of inbreeding and selection, confounding effects, population admixture, and population substructure (Rafalski and Morgante 2004). The extent of LD also depends on the recombination rate: LD in a population decreases through recombination, which can restore equilibrium between loci over time. A strong correlation is expected between inter-locus distance and LD if recombination rates do not vary across the genome, particularly at constant population size.
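As a hedged illustration of how LD between two biallelic loci is commonly quantified, the sketch below computes the standard measures D, D′, and r² from haplotype counts. The function name, the allele labels (A/a at the first locus, B/b at the second), and the toy counts are assumptions made for illustration, not data from the studies cited in this chapter. For inbred soybean lines, each homozygous line can be treated as a single haplotype.

```python
def ld_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / n              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n      # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n      # frequency of allele B at locus 2

    # D: deviation of the observed haplotype frequency from the value
    # expected if the two loci were independent.
    d = p_AB - p_A * p_B

    # D': D scaled by its maximum possible magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2: squared correlation between alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Toy counts (illustrative only): partial, complete, and no association.
for counts in [(40, 10, 10, 40), (50, 0, 0, 50), (25, 25, 25, 25)]:
    d, d_prime, r2 = ld_from_counts(*counts)
    print(counts, round(d, 4), round(d_prime, 4), round(r2, 4))
```

An r² of 1 means that the allele at one SNP fully predicts the allele at the other, whereas an r² of 0 corresponds to linkage equilibrium.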

LD decay is influenced by the recombination frequency between two loci and by the number of generations of recombination. In some situations, LD between SNP alleles on different haplotypes (linked in repulsion phase) is not easily detectable and is difficult to define. In such situations, LD falls to a very low level because of the independent segregation of haplotypes. Self-fertilizing crop plants like soybean usually show slower LD decay (a longer region is in LD) because recombination is ineffective at breaking down LD in a homozygous genetic background. In contrast, rapid LD decay (a shorter region in LD) is common in outcrossing crop species like maize. However, vegetatively propagated species, such as potato and sugarcane, show relatively slow LD decay in spite of the outcrossing nature of these crops (Raboin et al. 2008).

Cultivated soybean has been reported to have outcrossing rates of <1%, while G. soja has an outcrossing rate as high as 13% (Fujita et al. 1997). G. soja shows faster LD decay than G. max because of its increased recombination rate (Flint-Garcia et al. 2003). The current soybean germplasm is the outcome of several cycles of selection and effective recombination, leading to increased LD throughout the entire genome. The landraces that resulted from domestication might have an increased level of LD. Loci governing domestication traits have extended LD, mostly because of selection during domestication. The association of LD decay with domestication-related genes is of great importance for understanding the domestication process and the population genetics of these traits. Similar effects may result from natural selection against several devastating stresses (Vuong et al. 2015). Recent studies performed with next-generation sequencing technology have provided a better understanding of LD decay in soybean. Sonah et al. (2015) characterized LD decay for the entire set of soybean chromosomes using data on 47,000 SNPs obtained with the genotyping-by-sequencing method. The results suggested variation in LD decay within and across chromosomes. Within chromosomes, the study found much slower LD decay (longer-range LD) in the centromeric and pericentromeric regions than in the gene-rich regions (Sonah et al. 2015). Similar findings have been reported in several other studies (Bastien et al. 2014; Iquira et al. 2015).
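The link between outcrossing rate and LD decay can be made concrete with a simple worked model. The sketch below is illustrative only and rests on textbook approximations that are not taken from the studies cited above: under random mating, LD decays as D_t = D_0(1 − c)^t for recombination fraction c; under partial selfing at rate s, the equilibrium inbreeding coefficient is F = s/(2 − s), and the effective recombination fraction is approximated as c(1 − F). The outcrossing rates of 1% and 13% follow the values quoted in the text, while the starting LD, the recombination fraction, and the number of generations are arbitrary assumptions.

```python
def inbreeding_coefficient(outcrossing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s), with s = selfing rate."""
    s = 1.0 - outcrossing_rate
    return s / (2.0 - s)


def effective_recombination(c, outcrossing_rate):
    """Approximate effective recombination fraction c * (1 - F) under partial selfing."""
    return c * (1.0 - inbreeding_coefficient(outcrossing_rate))


def ld_after(d0, c, outcrossing_rate, generations):
    """LD remaining after a number of generations: D_t = D_0 * (1 - c_eff) ** t."""
    c_eff = effective_recombination(c, outcrossing_rate)
    return d0 * (1.0 - c_eff) ** generations


# Assumed values for illustration: D_0 = 0.25, c = 0.01, 100 generations.
for label, rate in [("G. max (1% outcrossing)", 0.01),
                    ("G. soja (13% outcrossing)", 0.13),
                    ("fully outcrossing", 1.0)]:
    print(label, round(ld_after(0.25, 0.01, rate, 100), 4))
```

Under these assumptions, the predominantly selfing population retains most of its initial LD, while LD erodes faster as the outcrossing rate rises, which matches the qualitative contrast between G. max and G. soja described above.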

1.6 Association Mapping Studies

Traditionally, QTLs were mapped in plants using biparental crosses. The population derived from biparental crosses lacks allelic diversity as it deals with genetic variation within the parental lines. This critical limitation of the QTL mapping approach can be overcome by the use of association mapping of unrelated genotypes that have accumulated a large number of crossing-over events since their last common progenitor. Association mapping helps to identify loci controlling phenotypic variations and assists identification of genes underlying observed variation. With the availability of the whole genome of crop plants, genome-wide association studies (GWAS) gained more importance. Such GWAS are useful to identify candidate loci associated with many traits in animals and plants (Appels et al. 2013; Korte and Farlow 2013). The GWAS analysis followed by candidate gene identification proved to be successful in plant species such as Arabidopsis (Verslues et al. 2014), maize (Li et al. 2013a), and rice (Zhao et al. 2011).

In soybean, high-throughput genotyping techniques have made it possible to obtain the required number of markers on several hundred lines, either through SNP genotyping (Song et al. 2013) or through the genotyping-by-sequencing (GBS) approach (Sonah et al. 2013). One of the important efforts in soybean improvement is the development of the Illumina Infinium BeadChip (SoySNP50K iSelect BeadChip), which contains over 50,000 SNPs. Song et al. (2013) validated the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars, and 96 wild soybean accessions and reported 47,337 polymorphic SNPs, of which 40,841 (86%) had minor allele frequencies ≥10% among the landraces, elite cultivars, and wild soybean accessions. The SoySNP50K iSelect SNP BeadChip is being used in several studies to characterize soybean genetic diversity and linkage disequilibrium and to construct high-resolution linkage maps for soybean improvement (Hwang et al. 2014; Wen et al. 2014; Zhang et al. 2015). A GBS approach was used to identify >47,000 SNPs on 304 short-season soybean lines. A subset of 139 lines representing the diversity was phenotypically characterized for eight traits under six environments. Marker coverage was sufficient to detect significant associations with genes known to control flower, hilum, and pubescence color, maturity, plant height, seed weight, and seed oil and protein (Sonah et al. 2015). A soybean germplasm panel of 189 accessions from 10 countries was used for association mapping of Phytophthora sojae (P. sojae) resistance. These accessions were evaluated for disease resistance by inoculation with P. sojae races 1, 3, 7, 17, and 25. Five accessions were resistant to all the races. The genome-wide analysis identified 32 significantly associated SNPs, which were clustered around genomic loci associated with resistance. Among these SNPs, one was found near the gene Glyma.14g087500, which encodes a subtilisin protease (Qin et al. 2017). Kaler et al. (2017) evaluated 373 maturity group IV soybean genotypes grown in four different environments for canopy wilting. Over 31,260 SNPs were used for association mapping; among them, 61 significant environment-specific SNP-canopy wilting associations were identified, and 21 canopy wilting SNPs were detected in more than one environment. Based on the significant SNPs, the slowest and fastest wilting genotypes were identified. Several of these SNPs were located within or very close to candidate genes that had been reported to be involved in transpiration or water transport. Mao et al. (2017) performed association mapping on 91 soybean cultivars from different maturity groups using 172 SSRs and 5107 SNPs. Large-effect loci were found on Gm 11, Gm 16, and Gm 20, as reported in previous studies. Most of the loci associated with flowering time were sensitive to photo-thermal conditions. Further, within the associated loci, three candidate loci were identified; among them, Gm04_4497001 was found to be a key locus interacting with other loci to regulate flowering time in soybean. A set of 185 soybean accessions was evaluated to identify the QTLs associated with seed protein and oil contents. Using specific length amplified fragment sequencing (SLAF-seq) technology, a total of 12,072 SNPs were detected. Among them, 31 SNPs located on 12 chromosomes were correlated with protein and oil content in seeds. Two SNP markers were related to seed oil content and three SNPs were correlated with seed protein content during 2015 and 2016 (Li et al. 2018a).
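Marker sets such as those described above are commonly filtered on minor allele frequency (MAF) before association analysis. The following is a minimal sketch of that filter, assuming genotypes coded as counts of the alternate allele (0, 1, 2) with `None` for missing calls; the coding, function names, and marker names are illustrative assumptions and do not reproduce the pipeline of any cited study. The last line simply rechecks the reported share of SoySNP50K SNPs with MAF ≥10%.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP from alternate-allele counts (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(markers, threshold=0.10):
    """Keep markers (name -> genotype list) whose MAF is at least the threshold."""
    return {name: g for name, g in markers.items()
            if minor_allele_frequency(g) >= threshold}


# Hypothetical markers scored on 20 inbred lines.
markers = {
    "snp_common": [0] * 12 + [2] * 8,       # MAF = 0.40
    "snp_rare": [0] * 19 + [2],             # MAF = 0.05
    "snp_missing": [None, 0, 2, 2] + [0] * 16,
}
print(sorted(filter_by_maf(markers)))

# Share of polymorphic SoySNP50K SNPs with MAF >= 10% (Song et al. 2013).
print(f"{40841 / 47337:.0%}")
```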

1.7 Brief Account of Molecular Mapping of CS Genes and QTLs

Due to the increasing urgency to develop climate-smart soybean with enhanced yield, breeding strategies have progressed rapidly in the past decade. With the advent of molecular genetic techniques, many breeding programs have implemented molecular markers for soybean improvement with regard to seed oil and protein enhancement and to drought, flooding, and disease resistance.

Because molecular markers identify genetic variants for different traits quickly and accurately, they are important in developing genetic linkage maps, germplasm evaluation, phylogenetic and evolutionary analysis, selection of desired alleles, and mapping of genes/QTLs. The first linkage map in crops was constructed in tomato using RFLPs in 1986 with only 57 loci (Bernatzky and Tanksley 1986). RFLPs were followed by the development of simpler and less expensive DNA markers, namely RAPDs (Williams et al. 1990), AFLPs (Vos et al. 1995), and SSRs (Akkaya et al. 1992), which enabled the selection of desirable lines based on genotype instead of phenotype. Since SSR markers are less abundant in the genome, SNP markers became popular; they enabled the development of highly dense linkage maps and facilitated QTL analysis for nearly every agronomic trait in soybean (https://soybase.org, http://soykb.org). Furthermore, marker-assisted breeding became more applicable to soybean with the availability of sequencing data (Kim et al. 2010; Lam et al. 2010; Schmutz et al. 2010). This revolutionary change in sequencing further facilitated the development of thousands of SSR and millions of SNP markers. Moreover, a high-density consensus soybean map was developed with 5500 markers including 3792 SNPs (Hyten et al. 2010b). Later on, Song et al. (2013) developed an Illumina Infinium BeadChip containing 52,041 SNPs and used it to evaluate the entire USDA soybean germplasm collection.

QTL analysis plays a significant role in identifying genetic regions responsible for phenotypic variation, and it requires a large segregating population (biparental mapping population) such as an F2 population or recombinant inbred lines (RILs). In general, QTL mapping uses a large number of RILs, which are established through several generations of inbreeding (typically up to F6 or F7) (Takuno et al. 2012). RILs are helpful for QTL detection, but the estimated effect of a single QTL depends on population size. Moreover, the results are highly population specific for multigenic traits. On the other hand, in an F2 population, plants that are homozygous for the unfavorable allele are eliminated, and plants heterozygous or homozygous for the favorable allele are advanced for inbred development. In this way, the frequencies of favorable alleles increase during inbred development, and the probability of fixation of all or the majority of favorable alleles increases (Bernardo 2010). Owing to the popularity of QTL mapping, over 2000 QTLs have been mapped in soybean (Table 1.1).
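The reason RILs are typically advanced to F6 or F7 can be shown with a short calculation. Assuming strict selfing without selection, the expected proportion of heterozygous loci (among loci that differ between the two parents) halves in each generation, starting from 1 in the F1. The function name below is hypothetical.

```python
def expected_heterozygosity(generation):
    """Expected fraction of heterozygous loci in generation F_n (n >= 1)
    when an F1 is advanced by selfing, considering only loci that are
    polymorphic between the parents."""
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


for n in (2, 6, 7):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

By F6 to F7 only about 2 to 3% of such loci are still expected to segregate, so the lines can be treated as nearly homozygous and phenotyped repeatedly across environments.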

Table 1.1 Details of significant QTL mapping studies performed to identify genomic loci for various traits in soybean

Yield improvement together with improved quality and increased resistance to biotic and abiotic stresses is the major objective of soybean breeding. The maturity group based on latitude (MG 000 to MG X), growth habit (determinate or indeterminate), and seed size (large or small) are the most important factors to be considered in a soybean breeding program (Pathan and Sleper 2008). A number of studies have aimed to improve the yield potential of soybean by addressing insects and diseases, which cause major yield losses. Soybean cyst nematode (SCN) is one of the most destructive pests in the USA. Many QTLs associated with different races of SCN have been reported, and the QTLs rhg1 (located on LG G) and Rhg4 (located on LG A2) have been confirmed across different populations, times, and locations and are commonly utilized in MAS for SCN screening (Concibido et al. 2004; Vuong et al. 2010). Recently, several efficient and high-throughput SNP markers for SCN resistance have been developed for rhg1 and Rhg4 (Vuong et al. 2010; Kadam et al. 2016). A number of QTLs have also been identified for other diseases and pests such as Sclerotinia stem rot (Arahana et al. 2001), sudden death syndrome (SDS) (Iqbal et al. 2001), brown stem rot (Bachman et al. 2001; Patzoldt et al. 2005), and root-knot nematode (Li et al. 2001; Ha et al. 2007). QTL mapping and marker development have progressed not only for insect and pest resistance but also for tolerance to several climatic stresses (drought, flooding, and salinity) and for high-quality seeds (protein and oil content) with improved yield. A number of important QTL studies on soybean seed protein and oil content have reported QTLs across different environments and genetic backgrounds, for example, QTLs for seed oil, protein, and seed size (Hyten et al. 2004a) and fine-mapping of a soybean protein QTL on chromosome (Chr.) 20 (Nichols et al. 2006). Eskandari et al. (2013) identified a QTL for oil content on Chr. 9, which also had a significant positive effect on seed protein composition. For the improvement of soybean meal, Pathan et al. (2013) detected QTLs for seed protein, oil, and seed weight on Chrs. 5 and 6 across genetic backgrounds and environments using both SSR and SNP markers. In a recent study, QTL analysis was performed for seed protein, oil, and sucrose using 3K-SNPs. A total of five, nine, and four QTLs were identified for protein, oil, and sucrose content, respectively. The major QTLs for protein and oil were mapped on Chr. 20, while a novel and major QTL for sucrose content was mapped on Chr. 8 (qSuc_08) (Patil et al. 2018). Additionally, notable success has been achieved in mapping QTLs/genes for abiotic stresses such as drought (Mian et al. 1998; Bhatnagar et al. 2005; Molnar et al. 2012), salinity (Lee et al. 2004; Hamwieh and Xu 2008; Do et al. 2018), and flooding tolerance (VanToai et al. 2001; Reyna et al. 2003; Githiri et al. 2006; Nguyen et al. 2012).

Although QTL mapping has advanced quickly in the past few years, a large number of mapped QTLs cannot be utilized in breeding because of false-positive QTLs and low accuracy. However, the accuracy can be improved by adopting suitable QTL mapping methods and effective statistical analyses such as single marker analysis (SMA), simple interval mapping (SIM), composite interval mapping (CIM), multiple interval mapping (MIM), and Bayesian interval mapping (BIM). Also, a number of QTL mapping software packages have been developed, such as Mapmaker/QTL, QTL Cartographer, PLABQTL, PGRI, MapQTL, QGene, Map Manager, QTLMAPPER, QTLSTA, IciMapping, and QTLNetwork.
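Of the methods listed above, single marker analysis is the simplest to illustrate. The sketch below regresses a phenotype on the genotype at one marker (coded as the number of copies of one parental allele) and reports the slope and the proportion of phenotypic variance explained. It is a bare least-squares illustration with hypothetical data, it performs no significance testing, and it does not reproduce the behavior of any of the software packages named above.

```python
def single_marker_analysis(genotypes, phenotypes):
    """Least-squares regression of phenotype on marker genotype (0/1/2).

    Returns (effect_per_allele_copy, r_squared)."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching lists with at least three lines")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p)
              for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker and phenotype must both vary")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical RILs: genotype 0 or 2, phenotype e.g. seed protein (%).
geno = [0, 0, 0, 2, 2, 2]
pheno = [38.0, 39.0, 40.0, 42.0, 43.0, 44.0]
effect, r2 = single_marker_analysis(geno, pheno)
print(f"effect per allele copy = {effect:.2f}, R2 = {r2:.2f}")
```

Interval-based methods (SIM, CIM, MIM) extend this idea by testing positions between markers and by accounting for other QTLs, which is one way they reduce the false positives mentioned above.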

The developments in sequencing technologies, statistical approaches, and software have resulted in exponential growth in soybean studies aimed at understanding the plant’s response to extreme climatic conditions such as drought and flood, as well as to pest and disease stress. Consequently, advancements in molecular techniques, statistical models, and software have led to QTL analysis and then to candidate gene identification for biotic and abiotic stress resistance and yield-related traits. However, breeding for stress tolerance traits such as drought tolerance is one of the most challenging goals in soybean because of the negative correlation between mean performance (the average of stress and nonstress yield) and stress index (the difference between stress and nonstress yield); breeding for stress tolerance can therefore lead to a loss in yield (Miladinović et al. 2015). It is thus necessary for soybean breeders to utilize interdisciplinary approaches and tools to develop climate-smart soybean.
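The two quantities defined in the paragraph above can be computed directly from paired yield records. The following sketch uses the definitions exactly as stated there (average and difference of nonstress and stress yield) and adds a plain Pearson correlation so that the relationship between them can be examined in a breeder's own data. The yield values are invented for illustration only and imply nothing about real genotypes.

```python
def mean_performance(y_nonstress, y_stress):
    """Average of nonstress and stress yield."""
    return (y_nonstress + y_stress) / 2


def stress_index(y_nonstress, y_stress):
    """Difference between nonstress and stress yield."""
    return y_nonstress - y_stress


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (nonstress, stress) yields in t/ha for four genotypes.
yields = [(4.0, 2.0), (3.6, 2.4), (3.2, 2.6), (3.0, 2.8)]
mp = [mean_performance(n, s) for n, s in yields]
si = [stress_index(n, s) for n, s in yields]
print(mp, si, round(pearson(mp, si), 3))
```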

1.8 Map-Based Cloning of CS Genes

Map-based cloning is a strategy to identify or isolate genes underlying a trait based on their map positions on chromosomes. Robust phenotyping is a prerequisite for successful map-based cloning of genes. A general strategy for map-based cloning involves (a) development of a segregating population for the trait of interest; (b) phenotypic and genotypic analysis of the segregants; (c) high-resolution mapping; (d) physical mapping of the locus containing the gene of interest; and (e) identification and isolation of the gene. An F2 segregating population can be used for map-based cloning; however, the use of RILs and NILs as mapping populations is more powerful than the use of an F2 population.

The availability of physical and genetic maps of soybean and of the genome sequence for the American cultivar “Williams 82” (Schmutz et al. 2010) greatly accelerated the identification of QTLs and genes controlling agronomically important traits. In soybean, several genes controlling climate-smart traits were identified through the map-based cloning approach. Soybean cyst nematode is a major constraint to soybean production. Many reports are available on the identification and mapping of QTLs for resistance to SCN from different germplasm sources. The QTLs on chromosomes 18 (rhg1) and 8 (Rhg4) are the two major QTLs that have been consistently mapped and reported from different soybean germplasm. A map-based cloning approach revealed that the gene at the major QTL locus Rhg4 (for resistance to Heterodera glycines 4) provides resistance to this pathogen (Liu et al. 2012). Further gene silencing, mutation studies, and complementation tests confirmed that the gene confers resistance. The gene was found to encode a serine hydroxymethyltransferase, an enzyme that is structurally conserved and ubiquitous across kingdoms. Both QTLs and major (Rps) genes conferring resistance to P. sojae have been identified in soybean (Polzin et al. 1994; Bhattacharyya et al. 2005; Sandhu et al. 2005; Gordon et al. 2006). Five Rps genes, including the important Rps1-k, which confers resistance to most races of Phytophthora sojae (Kasuga et al. 1997; Song et al. 2004; Gao et al. 2005; Gao and Bhattacharyya 2008), are mapped to the Rps1 locus. Rps1-k, which encodes an intracellular coiled-coil class of NBS-LRR resistance proteins, was cloned by map-based cloning (Gao et al. 2005). Several maturity loci, designated as E loci (E1 to E8), controlling flowering time, duration of the reproductive phase (DRP), yield, branching (Kumudini et al. 2007; Sayama et al. 2010; Yamada et al. 2012), and chilling resistance (Funatsuki et al. 2005; Takahashi et al. 2005) have been characterized by classical genetics approaches.
The E4 gene, encoding the phytochrome A2 protein, was identified through a candidate gene approach based on the QTL position on the map (Liu et al. 2008). The E3 gene, encoding a copy of phytochrome A, GmPhyA3, was cloned by positional cloning using a residual heterozygous line (RHL) (Watanabe et al. 2009). Further, a similar strategy was used to clone the soybean maturity locus E2, an ortholog of GIGANTEA (GI), GmGIa, using the progeny of an RHL population (Watanabe et al. 2011).

1.9 Marker-Assisted Breeding for CS Traits

Molecular breeding and genetic engineering approaches have been successfully employed to develop climate-smart soybeans (Fig. 1.1). Marker-assisted selection (MAS) is an indirect selection method in which a linked marker is used to transfer important agronomic traits from one genotype to another. Marker-assisted backcrossing is an important strategy in soybean for transferring a trait of interest (Concibido et al. 2003; Orf et al. 2004; Lee et al. 2006). High-throughput genotyping technologies have enhanced the process of marker identification and QTL mapping for different traits in soybean. Molecular breeding approaches such as marker-assisted backcrossing (MABC) and marker-assisted recurrent selection (MARS) have aided the introgression of traits of interest in soybean. The soybean cyst nematode-resistant line LDX01-1-65 (PI636464) was developed using MABC. The G. soja accession PI468916, which has poor agronomic traits, contained two SCN-resistance QTLs. MABC was used to introgress these QTLs into a recurrent parent, A81-356022. Closely linked markers were employed to select the QTLs during each backcross. Four rounds of backcrossing were carried out, and a BC4F1 plant that was heterozygous for both QTLs was selected. A population of BC4F3 lines was derived, and the genotypic combination for each QTL was determined using the markers (Diers et al. 2005). Similarly, SBR-resistant soybean lines were developed using MABC by introgressing the key Rpp genes Rpp1, Rpp2, Rpp3, and Rpp4 (King et al. 2016). NILs were developed for individual Rpp genes by making backcrosses to the soybean cultivar G00-3213. A marker linked to each Rpp gene was used for screening progenies during the backcrossing process.
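The four rounds of backcrossing described above can be related to the expected recovery of the recurrent parent genome. Without any marker-based background selection, and ignoring linkage drag around the introgressed QTLs, the expected proportion of recurrent parent genome after n backcrosses is 1 - (1/2)^(n+1). The sketch below evaluates this textbook expectation; the function name is hypothetical.

```python
def expected_recurrent_parent_fraction(n_backcrosses):
    """Expected recurrent parent genome fraction after n backcrosses
    (n = 0 corresponds to the F1), assuming no background selection."""
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in range(1, 5):
    frac = expected_recurrent_parent_fraction(n)
    print(f"BC{n}: {frac:.2%}")
```

A BC4 plant is therefore expected to carry about 97% of the recurrent parent genome on average; individual plants deviate from this mean, which is why marker-based background selection can shorten a backcross program.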

The glyphosate-tolerant soybean cultivar Benning (released as ‘H7242 RR’) was developed in less than five years through background selection to recover the recurrent parent genome in a backcross program (Orf et al. 2004). Glyphosate tolerance conferred by a single transgene facilitated phenotypic selection of plants, while SSR markers aided in the identification of tolerant plants with a high proportion of the Benning genome during subsequent backcross generations. In another study, markers linked to three different QTLs from PI 229358 were employed to develop insect-resistant NILs of Benning. These lines provided an opportunity to characterize the individual and combined effects of insect resistance QTLs (Zhu et al. 2008).

Molecular markers linked to a gene controlling a particular trait can be used to transfer that trait from one genetic background to another or to pyramid genes (Walker et al. 2002; Walker et al. 2010). Gene pyramiding involves combining favorable alleles controlling the same trait from more than two parental lines (Melchinger 1990; Huang et al. 1997). Marker-assisted gene pyramiding has been successfully carried out to develop durable resistance to several pathogens causing diseases in soybean (Walker et al. 2010). Resistance controlled by a single R-gene is likely to be broken by novel biotypes of a pathogen if it is transferred as the sole resistance gene into a cultivar. The range of resistance can be increased by gene pyramiding (Nelson 1978; Melchinger 1990; Saghai Maroof et al. 2008). Pyramiding a major R-gene with additional resistance alleles using phenotypic assays is difficult, whereas molecular markers linked to the individual genes aid in the selection of plants with multiple resistance genes and in combining them in a single genetic background. SNPs associated with southern root-knot nematode resistance allow easy selection of the resistance alleles at major and minor QTLs, facilitating the pyramiding of multiple genes for an increased level of resistance (Ha et al. 2007). The Rsv1, Rsv3, and Rsv4 genes provide resistance to all strains of soybean mosaic virus (SMV), and pyramiding all these genes provides comprehensive SMV resistance (Saghai Maroof et al. 2008; Shi et al. 2009). In another study, Wang et al. (2017) pyramided three SMV resistance genes, RSC4, RSC8, and RSC14Q, from different cultivars. Ten SSRs linked to the resistance genes were used for pyramiding. Five homozygous pyramided F7 plants showed resistance to 21 SMV strains along with desirable agronomic traits. Similarly, markers were used to combine an insect resistance allele from the Japanese soybean accession PI 229358 with a gene for the Bt protein cry1Ac, which is toxic to lepidopteran pests of soybean. Yamanaka et al. (2015) developed seven pyramided lines of soybean carrying multiple resistance genes (Rpp) to provide a broad-spectrum and higher level of resistance to Asian soybean rust. Higher resistance was found in the pyramided lines Oy49-4 (Rpp2 + Rpp3 + Rpp4), No6-12-B (Rpp4 + Rpp5), and No6-12-1 (Rpp2 + Rpp4 + Rpp5) compared to the genotypes from which the resistance genes were derived. Brzostowski and Diers (2017) stacked resistance alleles from different soybean accessions, PI88788, PI468916, and PI567516C, to develop resistance to virulent soybean cyst nematode isolates.
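A simple probability calculation shows why markers matter for pyramiding. Assuming unlinked loci, no selection, and Mendelian segregation, an F2 plant is homozygous for the favorable allele at one locus with probability 1/4, and a fully inbred line (such as an advanced RIL) with probability 1/2; for k loci these probabilities are raised to the power k. The sketch below also gives the smallest population in which at least one such plant is expected with a chosen confidence. All names and the 95% confidence level are illustrative assumptions and are not taken from the cited pyramiding studies.

```python
import math


def prob_all_homozygous(n_loci, population="F2"):
    """Probability that one plant is homozygous favorable at all loci
    (unlinked loci, no selection). 'RIL' assumes complete inbreeding."""
    per_locus = {"F2": 0.25, "RIL": 0.5}[population]
    return per_locus ** n_loci


def min_population_size(p, confidence=0.95):
    """Smallest n such that P(at least one target plant) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for pop in ("F2", "RIL"):
    p = prob_all_homozygous(3, pop)
    print(pop, p, min_population_size(p))
```

For three genes, roughly 190 F2 plants are needed under these assumptions, against about two dozen fully inbred lines; marker genotyping identifies the target plants in either case without a separate phenotypic assay for each gene.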

1.10 Genomic Resources

Recently, high-resolution genome information has begun to be adopted for germplasm characterization, genetic dissection of agronomic traits, and prediction of breeding value. Along with the development of next-generation sequencing techniques, sequencing and resequencing of plant genomes have expanded dramatically. A complete soybean reference genome was published in 2010 (Schmutz et al. 2010). Several soybean resequencing projects were then initiated to develop soybean genomic resources for more than 1000 soybean accessions (Lam et al. 2010; Zhou et al. 2015; Valliyodan et al. 2016, 2017; unpublished data at the University of Missouri). Genomic resources generated from the recent resequencing of diverse germplasm sets have provided powerful tools for characterizing soybean genetic diversity and have built a strong foundation for trait/gene discovery to accelerate future breeding of elite cultivars. Comparative genome analysis has also benefited greatly from advances in genome information. Following the release of the soybean reference genome “Williams 82” in 2010, high-quality reference genomes for other legume crops, including pigeon pea (Varshney et al. 2012), chickpea (Jain et al. 2013; Varshney et al. 2013; Parween et al. 2015), common bean (Schmutz et al. 2014), and peanut (Bertioli et al. 2015), have been released. Changes in genome structure and genome synteny among legume species can be identified through comparative genome analysis. Knowledge of genes or genomic regions for abiotic stress tolerance cloned in one legume species can be extended to other legume crops by comparing genome structures and synteny.

1.11 Genomics-Assisted Breeding for CS Traits

In recent years, rapid progress in genetics, genomics, and soybean genome sequence information has resulted in the identification of SNPs, copy number variation, and structural variation in soybean germplasm (Kim et al. 2010; Lam et al. 2010; Schmutz et al. 2010) (Table 1.2). The growth of next-generation sequencing (NGS) and low sequencing costs have revolutionized soybean research, and NGS approaches have been widely utilized in de novo sequencing, whole-genome resequencing (WGR), genotyping-by-sequencing (GBS), and transcriptomic analysis. These developments have made a significant impact on molecular breeding strategies through the development of markers such as SSRs (Hwang et al. 2009), SNPs (Kim et al. 2010; Lam et al. 2010; Chung et al. 2014; Zhou et al. 2015; Valliyodan et al. 2016), insertion/deletion (INDEL) markers (Song et al. 2015), and specific-locus amplified fragment (SLAF) markers (Zhang et al. 2016b). Furthermore, technical advances and the availability of millions of SNPs have facilitated the development of high-density array-based genotyping chips such as the Illumina Infinium array (SoySNP50K iSelect BeadChip) for ∼50,000 SNPs (Song et al. 2013), the SoySNP6K Infinium BeadChip (Akond et al. 2013), and the Axiom SoyaSNP array for approximately 180,000 SNPs (Lee et al. 2015), which are being used for the genotyping of soybean lines (Table 1.3).

Table 1.2 Details of whole-genome sequencing efforts in soybean
Table 1.3 List of significant studies to identify SNP markers using various genotyping platforms in soybean

Furthermore, GBS is one of the popular sequencing-based genotyping approaches. It has significantly reduced labor and time and improved precision in the identification of key genes compared to conventional PCR-based genotyping methods, and it is being utilized in several crop species, including soybean (Poland and Rife 2012; Sonah et al. 2013). Additionally, GBS allows the detection of new variants in the population of interest, which can be utilized in future breeding programs. In soybean, a number of studies have explored sequencing-based QTL analyses (Xu et al. 2013; Bastien et al. 2014; Li et al. 2014). Sonah et al. (2015) identified 47,702 SNPs including 2744 InDels using the GBS approach in a diverse set of 304 short-season soybean lines. The study further characterized a subset of 139 lines for eight agronomic traits (flower, hilum, and pubescence color, maturity, plant height, seed weight, seed oil, and protein) and identified associated loci. In another study, Qi et al. (2014) utilized sequencing-based QTL mapping to map a salt tolerance locus in the wild soybean accession W05; the locus was mapped to a 978-kb region on chromosome 3 using 2757 bin markers. Furthermore, Patil et al. (2016) developed Kompetitive allele-specific polymerase chain reaction (KASP) assays for detecting salt tolerance and seed composition traits based on whole-genome sequencing analysis of 106 soybean accessions (Patil et al. 2016; Patil et al. 2017). Moreover, another cost-effective sequencing-based approach, SLAF sequencing, was utilized to study QTLs for low-phosphate stress in soybean. In this study, the genetic map was generated using 6159 SLAF markers, and 85 QTLs related to low-phosphate stress were identified (Zhang et al. 2016b).

Even with the advances in genomics-based technologies, the utilization of these approaches is still restricted because it requires the computational expertise and significant time for data analysis. However, the increasing number of software packages and computational pipeline development will aid the genomics-assisted breeding to develop climate resilient crops.

1.12 Brief on Genetic Engineering for CS Traits

1.12.1 Achievements of Transgenic Approaches

Genetically modified crops have significantly increased yield and production by protecting crops from diseases, pests, and abiotic stress factors. The main advantages of climate-smart transgenic plants include reduced crop loss and reduced use of chemical products in agriculture (Job 2002; James 2011). Since the commercialization of biotech crops in 1996, traits such as herbicide tolerance and insect resistance have gained more interest due to their economic impact. The development of genetically modified soybean with superior yield, tolerance to diseases and pests, tolerance to abiotic stress, improved nutritional quality, and suitability for biofuel production is underway (Lu et al. 2007).

1.12.1.1 Insect Resistance

Insecticidal crystal proteins (δ-endotoxins) produced by the entomopathogenic bacterium Bacillus thuringiensis (Bt) act as biopesticides to control lepidopteran, dipteran, and coleopteran larvae (Tabashnik 1994; Peferoen 1997; Hongyu et al. 2000). To date, several plant species including soybean have been transformed with Bt genes to impart insect resistance. Transgenic soybean expressing a Bt cry gene showed resistance to many insect pests in laboratory bioassays and under field conditions (Parrott et al. 1994; Dufourmantel et al. 2005; Macrae et al. 2005; McPherson and MacRae 2009). Miklos et al. (2007) expressed a synthetic cry1A in soybean, which showed resistance to the lepidopteran pests Helicoverpa zea and Anticarsia gemmatalis. A transgenic soybean accumulating high levels of a synthetic cry1Ac protein caused complete A. gemmatalis larval mortality and also significantly reduced other pests (Stewart et al. 1996; Walker et al. 2000). Similarly, expression of the synthetic cry1Ac gene conferred a high level of toxicity to A. gemmatalis without affecting crop yield (Homrich et al. 2008).

The strategy of pyramiding Cry1Ac with native genes was adopted to increase plant resistance to insects. Several QTLs from soybean lines showing antixenosis and antibiosis resistance (Cregan et al. 1999; Rector et al. 2000) were used to develop transgenic soybean lines by combining the QTLs with synthetic Cry1Ac (Walker et al. 2004). The pyramided Bt lines were found to be significantly more resistant to lepidopteran pests. The success of Bt crops has led to the use of genes encoding other insect-resistance proteins such as lectins, plant defense proteins, insect chitinases, α-amylase inhibitors, and defensins (Hudson et al. 2013). Insect-resistant transgenic plants are climate friendly and have the potential to drastically reduce the use of chemical pesticides.

1.12.1.2 Disease Resistance

Viruses and fungi are the most common pathogens affecting soybean, and hence they are targeted in the development of disease-resistant soybeans. Resistance to viruses in different plant species has been achieved using pathogen-derived viral coat proteins. When coat proteins are expressed in planta, they interfere with viral assembly and thereby control viral spread. The same approach was used in soybean to develop virus-resistant plants. Soybean resistant to bean pod mottle virus (BPMV) was developed by introducing a BPMV coat protein (Di et al. 1996). Another study developed soybean resistant to BPMV by transformation with a BPMV capsid polyprotein. The transgenic lines showed complete resistance to virus infection with no visible symptoms (Reddy et al. 2001). Similarly, efforts were made to develop soybean lines with pathogen-derived resistance to soybean mosaic virus (SMV), a devastating virus that causes yield losses of up to 90%. Transgenic plants were produced containing an SMV-derived coat protein gene and the 3’UTR (Wang et al. 2001). Coat protein expression was detected in the transgenic lines, and a few lines showed high resistance to infection with SMV. Similarly, a sense coat protein gene derived from soybean dwarf virus (SbDV) was used to develop SbDV-resistant soybean plants (Tougou et al. 2006). The resistance was achieved by overexpression of SbDV-CP mRNA, and the transgenic soybean lines remained symptomless after infection with SbDV.

Sclerotinia stem rot (SSR), caused by the fungus Sclerotinia sclerotiorum, is one of the important diseases affecting soybeans. This fungus was found to be associated with oxalic acid (OA): treatment of plants with OA increased symptoms, while OA metabolism resulted in tolerance to the fungus. Transgenic soybean overexpressing oxalate decarboxylase (OXDC) showed reduced disease progression that correlated with transgene expression levels (Cunha et al. 2010). Single-chain variable fragment (scFv) antibodies are used as an alternative technology to control fungal infection, since plants can express and assemble antibody fragments. Such an antibody approach was used in soybean to control Fusarium virguliforme, which causes sudden death syndrome (SDS) (Brar and Bhattacharyya 2012). An antibody gene encoding scFv anti-FvTox1 was used to create transgenic lines targeting the pathogenic toxin Tox1, which reduced disease development.

1.12.1.3 Abiotic Stress Resistance

Drought is one of the most important abiotic stress factors affecting crop productivity and yield. To understand the genetic basis of drought tolerance, research has focused on physiological responses such as water use efficiency, nitrogen fixation, leaf wilting, and root growth. The overexpression of single target genes showed potential for enhancing drought tolerance in the Arabidopsis and tobacco model systems; however, this knowledge has not been translated to many crop plants. In soybean, overexpression of the molecular chaperone binding protein (BiP) showed decreased leaf water potential, leaf wilting, and stomatal closure under drought (Valente et al. 2008). Further, the transgenic plants exhibited decreased rates of transpiration and photosynthesis and delayed leaf senescence. Soybean overexpressing the Δ1-pyrroline-5-carboxylate reductase (P5CR) gene from Arabidopsis showed high free proline accumulation, resulting in increased tolerance to drought and heat stresses (De Ronde et al. 2004a, b; Kocsy et al. 2005). DREBs, which belong to the ethylene-responsive factor (ERF) family of transcription factors, play an important role in providing tolerance to abiotic stresses. Polizel et al. (2011) transformed a drought-sensitive soybean cultivar, BR16, with the AtDREB1A gene from Arabidopsis under the drought-inducible promoter rd29A. The transformed plants showed increased chlorophyll, higher stomatal conductance, and enhanced transpiration and photosynthetic rates. The overexpression of GmFDL19, a bZIP transcription factor, in soybean caused early flowering and enhanced tolerance to drought and salt stress in transgenic soybean plants. The expression of GmFDL19 was found to be induced by abscisic acid (ABA), polyethylene glycol (PEG 6000), and high salt stresses (Li et al. 2017).
Researchers at the University of Litoral developed genetically modified soybeans by transformation with a gene isolated from sunflower (HAHB-4). The transgenic lines were found to tolerate water stress (drought) and saline soils. The Argentinean government approved the HB4 technology for soybean. The HB4 technology improved yields by 13% during severe drought in Argentina, and the improvement reached up to 30% in field trials (Mira and Nación 2015; Patiño 2018) (Fig. 1.1).

Fig. 1.1 Schematic representation of breeding approach for the development of climate-smart soybeans

Aquaporins are a class of transporters involved in the transport of water and other small solutes such as ammonia, urea, glycerol, boric acid, silicic acid, H2O2, and CO2 (Tyerman et al. 2002; Maurel et al. 2008; Bienert and Chaumont 2014). Aquaporins belong to the major intrinsic protein (MIP) superfamily. Their structure resembles an hourglass (Törnroth-Horsefield et al. 2006) with six transmembrane (TM) α-helices (helix 1 to helix 6) and five loops that penetrate into the lipid bilayer to make a passage for water movement (Fig. 1.2). Considering their importance, aquaporin-encoding genes have been identified in different crop plants including soybean (Deshmukh et al. 2013; Zhang et al. 2013; Deokar and Tar’an 2016; Deshmukh et al. 2016; Song et al. 2016; Shivaraj et al. 2017a, b; Sonah et al. 2017). Several studies have demonstrated the role of aquaporins in improving climate-smart traits in different crop plants (Table 1.4). Recently, functionally important aquaporin-encoding genes were identified and characterized from the soybean genome (Table 1.5). Expression analysis of GmTIP2;3 showed higher levels in the root, stem, and pod. Its accumulation increased significantly in response to osmotic stresses, including polyethylene glycol and abscisic acid treatments. In addition, heterologous expression in yeast showed that GmTIP2;3 could increase tolerance to osmotic stress in yeast cells (Zhang et al. 2016a). Overexpression of GmPIP1;6 in soybean resulted in enhanced leaf gas exchange under normal conditions compared to wild-type plants. Under salt stress, the transgenic plants showed increased growth and yield relative to the wild type in field conditions (Zhou et al. 2014). GmPIP2;9 overexpression in soybean increased tolerance to drought stress in both solution and soil cultures. Under drought stress, GmPIP2;9 overexpression lines showed increased net CO2 assimilation, transpiration rate, and stomatal conductance compared to wild-type plants.
Additionally, field-grown overexpression plants produced significantly more pods and larger seeds than wild-type plants (Lu et al. 2018).

Fig. 1.2 Schematic diagram of the 2D structure and 3D structure of aquaporin GmNIP2-1 identified in soybean showing six transmembrane alpha-helices and the five inter-helical loops. Modifications in NPA-spacing or Ar/R selectivity filter positions in aquaporins have been shown to change the transport activity and the solute specificity
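
The NPA-spacing mentioned in the caption of Fig. 1.2 can be illustrated with a short sketch. It assumes that "NPA" refers to the conserved Asn-Pro-Ala tripeptide written in one-letter amino acid code, and that spacing is counted as the number of residues between the first two motifs. The function names and the toy sequence are hypothetical and are not taken from any real soybean aquaporin.

```python
import re


def find_npa_motifs(protein_seq):
    """Return the 1-based start position of every 'NPA' tripeptide."""
    seq = protein_seq.upper()
    return [match.start() + 1 for match in re.finditer("NPA", seq)]


def npa_spacing(protein_seq):
    """Count the residues lying between the first two NPA motifs.

    Returns None when fewer than two motifs are present.
    """
    positions = find_npa_motifs(protein_seq)
    if len(positions) < 2:
        return None
    # Residues strictly between the end of motif 1 and the start of motif 2.
    return positions[1] - (positions[0] + 3)


# Invented toy sequence for illustration only (NOT a real aquaporin).
toy_seq = "MKSGHL" + "NPA" + "VTFG" + "A" * 10 + "NPA" + "RSLG"

motif_positions = find_npa_motifs(toy_seq)
spacing = npa_spacing(toy_seq)
```

In practice the same two functions could be applied to protein sequences downloaded from a genome database to compare spacing among aquaporin family members; a real analysis would also need to handle degenerate NPA variants, which this sketch ignores.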

Table 1.4 Studies demonstrating the role of aquaporins in abiotic stress tolerance in the plants through heterologous expression assays
Table 1.5 Functional characterization of aquaporin family members in soybean

1.12.1.4 Herbicide Resistance

Weeds are unwanted plants that reduce yield by competing with crops for water, sunlight, and nutrients, and they can be considered another form of biotic stress. Farmers use different strategies such as manual removal, plowing, and application of herbicides to control weeds. Herbicides are chemicals that kill weeds or hinder their growth. Broad-spectrum herbicides kill many types of plants that they come in contact with, while narrow-spectrum herbicides are toxic to a specific group of plant species (Zimdahl 2018). It is important to develop crop varieties that can withstand the toxic effect of herbicides and maximize the benefit of herbicide application. Herbicide-resistant plants provide both economic and ecological benefits. Since the development of glyphosate-tolerant soybeans, soil tillage has diminished by 23% (Givens et al. 2009). Reduced tilling preserves soil nutrients and organic matter, thereby reducing soil erosion. Reduced tillage practices also decrease fuel consumption and carbon dioxide emissions. The largest reduction in carbon dioxide emissions has come from the adoption of genetically modified herbicide-tolerant soybean (Brookes and Barfoot 2015).

Glyphosate, a broadly used herbicide, inhibits 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), an enzyme required for the production of several essential aromatic amino acids. Hence, research was undertaken by Monsanto to develop a glyphosate-resistant soybean called Roundup Ready®. These transgenic soybeans express a glyphosate-tolerant EPSPS from Agrobacterium sp. strain CP4, which provides tolerance to glyphosate (Padgette et al. 1995). Bayer CropScience released a herbicide-tolerant soybean called Liberty Link®. Liberty Link® soybean was developed to express the phosphinothricin-N-acetyltransferase (PAT) gene from the bacterium Streptomyces viridochromogenes. PAT acetylates and thereby inactivates phosphinothricin, a glutamine synthetase inhibitor that is the active ingredient of the contact herbicide glufosinate ammonium, imparting resistance to the plants. Similarly, Pioneer has developed a genetically modified soybean product tolerant to two classes of herbicides, glyphosate and acetolactate synthase (ALS)-inhibiting herbicides. These plants express glyphosate acetyltransferase and a modified soybean acetolactate synthase (GMHRA) protein (Mathesius et al. 2009). The glyphosate acetyltransferase confers tolerance to glyphosate-containing herbicides by acetylating glyphosate, which makes it non-phytotoxic, while the GMHRA protein imparts tolerance to the ALS-inhibiting herbicides.

Monsanto has also developed transgenic soybeans that are resistant to the herbicide dicamba, which controls broadleaf weeds (Behrens et al. 2007). Soybeans transformed with a bacterial dicamba monooxygenase (DMO) gene were found to inactivate dicamba, making them tolerant to this herbicide. Syngenta and Bayer CropScience are developing HPPD-inhibitor-tolerant soybean plants. The event consists of a stack of a gene conferring tolerance to 4-hydroxyphenylpyruvate dioxygenase (HPPD)-inhibiting herbicides and a gene for glufosinate tolerance. Inhibition of HPPD arrests the degradation of tyrosine to plastoquinones, which are important for carotenoid biosynthesis, photosynthesis, and tocopherol production (van Almsick 2009). The stacked herbicide-tolerant lines will enable the use of multiple herbicides and will act as an important tool to fight the increasing pressure from resistant weeds.

1.12.1.5 Increased Oil Content

Over the past decade, there has been a growing demand for soybean oil, both for edible consumption and for soy-based biodiesel production. The interest in soybean oil has led to novel metabolic engineering strategies to increase the oil content of soybean seeds. In general, increasing oil content results in decreased protein content, and vice versa. Attempts to increase oil content have targeted enzymes and substrate pools in the Kennedy pathway, which is involved in the production of triacylglycerols (TAGs). The overexpression of a fungal gene encoding diacylglycerol acyltransferase (DGAT2) in soybean seeds (Lardizabal et al. 2008) enhanced the conversion of diacylglycerols (DAGs) to TAGs. Transgenic soybeans grown at different locations showed a 1.5% increase in total seed oil without affecting seed protein content. In another study (Rao and Hildebrand 2009), overexpression of the yeast sphingolipid compensation (SLC1) protein in soybean led to the conversion of lysophosphatidic acid to phosphatidic acid, which is the precursor of DAG in the Kennedy pathway. Stable overexpression transgenic lines showed a 1.5% increase in seed oil content.

1.13 Recent Concepts and Strategies Development

1.13.1 Gene Editing

Genetic variation in the plant genome is the source of genetic diversity and the basis of crop improvement. Until recently, natural and induced mutations were the only sources of new alleles that plant breeders could exploit for crop improvement. However, these mutations are distributed randomly in the genome and are not always useful. Recent advances in genomics, molecular biology, and genetic engineering have improved our ability to induce precise changes in the plant genome. Gene editing (also known as genome editing or genome engineering) describes a suite of techniques that enable precise and targeted modifications (deletions, insertions, gene/base replacement) of the host plant genome (Butler et al. 2018). The reagents (endonucleases) used for gene editing include transcription activator-like effector nucleases (TALENs), zinc-finger nucleases (ZFNs), and the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system.

In particular, gene editing using CRISPR has emerged as a simple yet powerful technology because it is highly efficient, simple to design, cost-effective, and amenable to multiplexing (Jacobs et al. 2015). The CRISPR system requires two core components to make a site-specific change: the first is Cas9 (CRISPR-associated protein 9), a large protein with endonuclease activity, and the second is the guide RNA (gRNA), an RNA molecule of approximately 100 nucleotides. Cas9 and the gRNA interact to form a complex that identifies the DNA sequence in the genome complementary to the gRNA. When these components are introduced into a plant cell via Agrobacterium-mediated transformation, the Cas9 complex recognizes the target site and makes a double-stranded break (DSB). When a DSB is created in a eukaryotic cell, DNA repair mechanisms are activated; repair by non-homologous end joining (NHEJ) often results in a deletion or insertion at the repair site. Insertions and deletions at the target gene site can cause frame-shift mutations (gene knockout) (Jacobs et al. 2015).
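
The target recognition and frame-shift logic described above can be sketched in a few lines of Python. The sketch rests on assumptions that are not stated in the text: a 20-nucleotide protospacer followed immediately by an NGG protospacer-adjacent motif (PAM), which is the requirement of the widely used Streptococcus pyogenes Cas9. The function names and the toy DNA sequence are hypothetical, and no off-target or efficiency scoring is attempted.

```python
import re

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_cas9_targets(dna, protospacer_len=20):
    """List (strand, protospacer, pam) for every site followed by an NGG PAM.

    Assumption: SpCas9-style targeting (protospacer directly 5' of NGG).
    Both strands are scanned; sequences are reported 5'->3'.
    """
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for match in re.finditer(pattern, seq):
            hits.append((strand, match.group(1), match.group(2)))
    return hits


def causes_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Invented toy sequence for illustration only (not a soybean gene).
toy_dna = "TTTT" + "ACGT" * 5 + "AGG" + "TTTT"
targets = find_cas9_targets(toy_dna)
```

For the toy sequence, the only candidate site lies on the forward strand. The `causes_frameshift` helper reflects why small NHEJ indels frequently produce knockouts: a 1- or 2-bp indel disrupts the reading frame, whereas a 3-bp indel removes or adds a codon without shifting it.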

Cermak et al. (2017) developed a comprehensive toolkit that enables targeted, specific modification of monocot and dicot genomes using a variety of genome engineering approaches. In addition to creating target-specific mutations, CRISPR-Cas9 offers the unique advantage of repairing a stretch of DNA sequence using homology-directed repair (HDR) and/or base editing. In this process, the CRISPR-Cas9-induced double-strand break can be used to create a knock-in, rather than a target gene knockout, when a donor template is provided (Gaj et al. 2016; Curtin et al. 2018). The precise insertion of a donor template after a double-strand break can be used to fix a mutation. However, one significant remaining challenge in plant genome engineering is achieving high-frequency gene editing by HDR. Recently, several groups have engineered Cas9 variants (dCas9, nCas9) for programmable editing of DNA bases (reviewed by Eid et al. 2018). This advanced technology has provided an opportunity to edit single bases and to accelerate the functional characterization of novel genes and trait discovery to cope with abiotic stresses.

In soybean, CRISPR and TALEN technologies have been successfully used to knock out genes involved in important agronomic and seed composition traits (Haun et al. 2014; Du et al. 2016; Curtin et al. 2018). Haun et al. (2014) used TALENs to improve soybean oil quality by targeted mutagenesis of the fatty acid desaturase 2 gene family. Recently, Curtin et al. (2018) used both CRISPR/Cas9 and TALEN reagents to generate heritable mutations in small RNA processing genes involved in drought tolerance in soybean. They created a bi-allelic double mutant for the soybean paralogs GmDrb2a and GmDrb2b, as well as mutants of Dicer-like2 genes. Notably, the study showed that Gmdrb2ab mutant plants were significantly more sensitive to drought stress than wild-type soybean plants, suggesting a functional role of these genes in water stress. Moreover, it is now possible to use a hairy root transformation system (Agrobacterium rhizogenes) to evaluate the efficacy and efficiency of multiple targets in a high-throughput setting (Cermak et al. 2017).

1.13.1.1 Concerns and Compliances About Gene Editing and Genetically Modified Crops

Modern biotechnology has provided a wide range of options to improve nutrition, climate resilience, and productivity. However, the technology and popularity of genetically modified (GM) crops have created social and ethical controversies among consumers, farmers, researchers, and policymakers. GM crops are widely accepted for cultivation in the US and other parts of the world, both as food crops (e.g., soybean, corn, canola) and as nonfood crops (e.g., cotton). On the other hand, Europe has concerns about GM crop cultivation (Maghari and Ardekani 2011). It is important to understand, however, that there is a big distinction between gene editing in crops and classically defined GM crops. GM refers to the insertion of a foreign gene from an external source such as viruses, bacteria, animals, or plants (usually unrelated species). In gene editing, by contrast, a new gene is not transferred into the target crop plant; rather, a genome editing tool is used to alter the function of a preexisting gene inside the plant genome. More importantly, gene editing is similar to the widely accepted ‘mutation breeding’, but it is much faster and more precise. In the US, agencies other than the USDA, such as the Environmental Protection Agency (EPA) and the Food and Drug Administration (FDA), also play a major role in regulation, and these agencies are considering the issue. It is increasingly clear that the USDA will not regulate genome-edited plants for cultivation (Waltz 2018). In contrast, gene-edited plants are now subject to stringent GM regulation in the European Union (https://www.nature.com/articles/d41586-018-05814-6).

1.13.2 Nanotechnology

Recent advancements in nanotechnology have opened up novel applications in agriculture, and scientific data indicate its potential to positively impact the development of climate-resilient crops (Srilatha 2011; Fraceto et al. 2016). Nanoparticles can be synthesized from metals or metal oxides through physical or chemical processes, and they are being studied to assess their potential in plant growth and development and in protection from biotic and abiotic stresses. Nanotechnology has been successfully implemented in fertilizer applications, wastewater treatment, nanosensors, etc. (Srilatha 2011). Similarly, this technology opens a large scope for diverse applications in crop biotechnology, and the potential benefits could be exploited to mitigate abiotic stress and boost agricultural productivity (Saxena et al. 2016).

Researchers have used silicon nanoparticles (SiNPs) to enhance abiotic stress tolerance via increased nutrient uptake, enhancement of antioxidant enzyme activity, and the formation of a thin layer in the apoplast, which helps the plant to resist various stresses (Liang et al. 2007; Saxena et al. 2016). One study showed that SiNPs (Na2Sio2) are absorbed faster, without toxic effects, in maize plants and promote plant growth. In another study, Sedghi et al. (2013) demonstrated that nano zinc oxide has the potential to increase the seed germination rate in soybean under water stress. Further, they concluded that application of these nanoparticles under drought conditions decreased seed residual fresh and dry weight, suggesting that the nanoparticles promote the use of seed reserves for seedling growth and enhance drought tolerance (Sedghi et al. 2013).

1.14 Brief Account on the Role of Bioinformatics as a Tool

Bioinformatics plays an important role in the curation, deposition, and organization of data to understand biological phenomena. The importance of bioinformatics tools rose sharply with the expansion of high-throughput molecular biology and genomic techniques. Currently, several repositories specific to soybean genome and expression data are available (Livingstone et al. 2016).

1.14.1 Gene and Genome Databases

1.14.1.1 Phytozome (http://www.phytozome.net/soybean)

The JGI (Joint Genome Institute) released the soybean genome sequence. This genome was sequenced by whole-genome shotgun sequencing, and subsequently, the genome sequence was assembled (Jaffe et al. 2003). Phytozome houses the whole-genome sequence of soybean and also provides tools to explore the genome through a browsing interface. The data can be downloaded using the BioMart tool. Users can perform BLAST analysis against many plant genomes including soybean. Each gene has been annotated with KOG, PFAM, PANTHER, KEGG, RefSeq, UniProt, TAIR, and JGI assignments.

1.14.1.2 SoyBase (http://soybase.org/index.php)

The USDA-ARS initiated a central repository, SoyBase, for genetics data and related resources. It is a map-based database containing different classes of data such as markers, maps, QTLs, and loci. SoyBase and the Soybean Breeder’s Toolbox database include tools to browse the genetic map, physical map, and sequence map of soybean. There are also many search pages and BLAST analysis tools to collect and analyze the data.

1.14.1.3 SoyGD (http://soybeangenome.siu.edu/)

The Soybean Genome Database (SoyGD) provides genomic information with respect to the genetic map representing linkage groups based on loci and markers (Shultz et al. 2006). This browser allows users to visualize the physical and genetic maps of soybean. The search interface also enables users to locate a specific region of interest based on region or landmark.
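
The idea of locating a region of interest "based on region or landmark" can be illustrated with a minimal sketch of a marker lookup on a genetic map. This is not the SoyGD interface or data model; the `Marker` record, both functions, and all marker names and positions are hypothetical and serve only to show the two kinds of query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str            # hypothetical marker identifier
    linkage_group: str   # linkage group label
    position_cm: float   # genetic position in centimorgans


def markers_in_region(markers, linkage_group, start_cm, end_cm):
    """Return markers on one linkage group within [start_cm, end_cm], sorted by position."""
    hits = [
        m for m in markers
        if m.linkage_group == linkage_group and start_cm <= m.position_cm <= end_cm
    ]
    return sorted(hits, key=lambda m: m.position_cm)


def markers_near_landmark(markers, landmark_name, window_cm):
    """Return markers within window_cm of a named landmark marker (landmark included)."""
    landmark = next(m for m in markers if m.name == landmark_name)
    return markers_in_region(
        markers,
        landmark.linkage_group,
        landmark.position_cm - window_cm,
        landmark.position_cm + window_cm,
    )


# Invented records for illustration only.
toy_map = [
    Marker("mkC", "G", 40.0),
    Marker("mkA", "G", 10.0),
    Marker("mkB", "G", 25.5),
    Marker("mkD", "A1", 12.0),
]

by_region = [m.name for m in markers_in_region(toy_map, "G", 5.0, 30.0)]
by_landmark = [m.name for m in markers_near_landmark(toy_map, "mkB", 15.0)]
```

A region query takes explicit map coordinates, whereas a landmark query first resolves a named marker to its position and then searches a window around it, which is why the second function is written in terms of the first.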

1.14.2 Soybean Omics Databases

1.14.2.1 Gene Networks in Seed Development (http://estdb.biology.ucla.edu/seed/)

This database, a collaborative effort between the Goldberg laboratory at UCLA and the Harada laboratory at UCD, stores information about all the genes involved in soybean seed development. Soybean and Arabidopsis Affymetrix GeneChips, Laser Capture Microdissection (LCM), and next-generation high-throughput sequencing technologies were employed to profile the mRNA sets from different seed regions at distinct developmental stages. Tools are available to browse and analyze gene expression data based on different seed developmental stages.

1.14.2.2 The Soybean Genomics and Microarray Database (SGMD) (http://psi081.ba.ars.usda.gov/SGMD/Default.htm)

The SGMD contains EST and microarray data for analyzing the interaction of soybean with its major pest, the soybean cyst nematode. The database contains more than 50 million rows of DNA microarray data and around 20,000 ESTs (Alkharouf and Matthews 2004). Analytical tools are available to present the results with statistical measures.

1.14.2.3 SoyXpress (http://soyxpress.agrenv.mcgill.ca/)

SoyXpress links gene expression data from Affymetrix chips with related information such as transcriptome data, Gene Ontology terms, and KEGG (Kyoto Encyclopedia of Genes and Genomes) metabolic pathways. It also contains many search interfaces, which enable users to browse by GenBank accession number, EST ID, Affymetrix probe ID, SwissProt protein ID, GO term, or EC enzyme number (Cheng and Strömvik 2008).

1.14.2.4 The Soybean Proteome Database (SPD, http://proteome.dc.affrc.go.jp/soybean/)

The SPD contains proteome data and their annotations from several organelles under flooding, salt, and drought stress. It comprises data generated by two-dimensional polyacrylamide gel electrophoresis and gel-free proteomics techniques.

1.14.2.5 Soybean Knowledge Base (SoyKB) (http://soykb.org/)

It is a comprehensive soybean genomics resource. SoyKB contains integrated data of genomics, transcriptomics, proteomics, and metabolomics along with gene function and biological pathway annotations. It contains information on SNPs, genes, microRNAs, and metabolites.