Genome-wide association analysis for yield-related traits at the R6 stage in a Chinese soybean mini core collection

Li, Xiangnan; Zhou, Yang; Bu, Yuanpeng; Wang, Xinfang; Zhang, Yumei; Guo, Na; Zhao, Jinming; Xing, Han

doi:10.1007/s13258-021-01109-9

Genome-wide association analysis for yield-related traits at the R6 stage in a Chinese soybean mini core collection

Research Article
Published: 06 May 2021

Volume 43, pages 897–912, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Genes & Genomics Aims and scope Submit manuscript

Genome-wide association analysis for yield-related traits at the R6 stage in a Chinese soybean mini core collection

Download PDF

Xiangnan Li¹,
Yang Zhou¹,
Yuanpeng Bu¹,
Xinfang Wang¹,
Yumei Zhang¹,
Na Guo¹,
Jinming Zhao¹ &
…
Han Xing¹

693 Accesses
10 Citations
Explore all metrics

Abstract

Background

Soybean (Glycine max (L.) Merr.) is an economically important crop for vegetable oil and protein production, and yield is a critical trait for grain/vegetable uses of soybean. However, our knowledge of the genes controlling the vegetable soybean yield remains limited.

Objective

To better understand the genetic basis of the vegetable soybean yield.

Methods

The 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), kernel percent (KP) and moisture content of fresh seeds (MCFS) at the R6 stage are four yield-related traits for vegetable soybean. We investigated a soybean mini core collection composed of 224 germplasm accessions for four yield-related traits in two consecutive years. Based on 1514 single nucleotide polymorphisms (SNPs), genome-wide association studies (GWAS) were conducted using a mixed linear model (MLM).

Results

Extensive phenotypic variation existed in the soybean mini core collection and significant positive correlations were shown among most of traits. A total of 16 SNP markers for PFW, SFW, KP and MCFS were detected in all environments via GWAS. Nine SNP markers were repeatedly identified in two environments. Among these markers, eight were located in or near regions where yield-related QTLs have been reported in previous studies, and one was a novel genetic locus identified in this study. In addition, we conducted candidate gene analysis to the large-effect SNP markers, a total of twelve genes were proposed as potential candidate genes of soybean yield at the R6 stage.

Conclusion

These results will be beneficial for understanding the genetic basis of soybean yield at the R6 stage and facilitating the pyramiding of favourable alleles for future high-yield breeding by marker-assisted selection in vegetable soybean.

Genome-wide association study of four yield-related traits at the R6 stage in soybean

Article Open access 29 March 2019

Genetic dissection of yield-related traits via genome-wide association analysis across multiple environments in wild soybean (Glycine soja Sieb. and Zucc.)

Article 06 January 2020

Identification of major genomic regions for soybean seed weight by genome-wide association study

Article 28 June 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Soybean [Glycine max (L.) Merr.] is an important crop that is grown worldwide as a source of edible oil and high-quality protein (Graham and Vance 2003). Soybean provides approximately 68% of protein meal and 57% of vegetable oil production worldwide (USDA-FAS, http://www.fas.usda.gov/). In accordance with the use of soybean cultivars, it can be divided into grain soybean and vegetable soybean. Grain soybean is harvested in the late maturation stage, when seeds are dehydrated, and is mainly employed for the production of oil, soybean milk, tofu and other bean products. Vegetable soybean is harvested at the R6 stage, when the pods are green and fully filled with the seeds, and is mainly used as a bean vegetable (Fehr and Caviness 1977; Young et al. 2000). With a long soybean breeding history, yield has been considered as one of the preferable traits. It is a very complex trait and depends upon a combination of different plant traits. Seed weight, the seed number per pod and the pod number per plant are three important yield component traits that directly determine yield (Zhou et al. 2009; Hao et al. 2012). Seed shape, plant architecture, the growing period and biomass yield are important yield-related traits that have been proven to influence yield indirectly in several crops (Li et al. 2019a, b; Zhang et al. 2018; Salas Fernandez et al. 2009; Zhang et al. 2015; Raboanatahiry et al. 2018). In addition, vegetable soybean has a high moisture content of approximately 57.0–79.0% in fresh seeds, and a significant positive correlation between moisture content and yield has been observed in soybean landrace populations (Li et al. 2019a, b). Thus, moisture content is also an important yield-related trait in vegetable soybean. In maize, high heritability (h²) of moisture content has been shown, and several quantitative trait loci (QTLs) that are simultaneously related to yield and moisture content have been detected (Austin et al. 2000). Many yield-related QTLs have been shown to be clustered with yield QTLs in several crops (Chen et al. 2007; Shi et al. 2009; Cai et al. 2014; Zhang et al. 2017). Therefore, understanding the genetic basis of yield-related traits will help us to improve yield in vegetable soybean.

Yield-related traits are complex traits that are controlled by multiple genes and influenced by environmental factors. In soybean, QTL analyses have been conducted to elucidate the genetic bases of yield-related traits in segregating populations derived from biparental crosses. More than 300 QTLs have been detected across the whole genome of soybean, most of which have been repeatedly identified in different genetic backgrounds (Kim et al. 2010; Liu et al. 2011; Han et al. 2012; Sun et al. 2012). These studies demonstrated that linkage mapping is an efficient approach for revealing the genetic architecture underlying complex quantitative traits. Linkage mapping has been conducted in cross-populations such as recombinant inbred lines (RILs). In these mapping populations, homozygous alternative alleles are segregated, and each allele is present in 50% of the recombinant lines; thus, high power to detect QTLs can be observed (Huang and Han 2014). However, the establishment of the RIL population is time consuming and labour intensive.

Genome-wide association studies (GWAS), also referred to as linkage disequilibrium (LD) mapping, are an effective method for studying the genetic architecture of complex traits. In contrast to linkage mapping, GWAS is a time- and labour-saving approach with a high resolution. It can be performed in natural populations with complex genetic backgrounds that have accumulated a large number of historic recombination events and can couple events with linked DNA polymorphisms to reveal associations with phenotypic diversity within a relatively small region of the genome (Rafalski 2010; Zhao et al. 2015). In addition, GWAS has the ability to reveal elite alleles at genetic loci in germplasm. GWAS has been successfully used in soybean. For example, 118 SNPs and 11 SSRs were identified for soybean flowering time under various photothermal conditions in 91 soybean cultivars from different maturity groups, four of which were large-effect markers located on Gm 11, Gm 16 and Gm 20 and were detected in eight environments (Mao et al. 2017). Based on SNP markers and SNP-based haplotype analysis, a total of 11, 17 and 59 SNP-based haplotypes have been shown to be associated with seed yield, 100-seed weight and plant height, respectively (Contreras et al. 2017). For seed size and shape, a total of 21 significant loci were identified in a soybean landrace population, which explained 8.12–14.25% of the observed phenotypic variance. The new locus SW9-1, on chromosome 9, is a reliable locus related to seed size traits that exerts a relatively high phenotypic effect (13.67) on 100-seed weight (Li et al. 2019a, b). For the vegetable soybean yield, Li et al. (2019a, b) detected 35 significant SNPs and proposed five potential candidate genes for soybean yield at the R6 stage (Li et al. 2019a, b). However, the genetic basis of yield and yield-related traits in soybean remains largely ambiguous.

In this study, we genotyped a soybean mini core collection composed of 224 germplasm accessions using 1514 SNP markers and surveyed four yield-related traits in two consecutive years. The objectives of this study were to (1) reveal the genetic basis of yield-related traits in soybean at the R6 stage and (2) evaluate marker-based allelic effects for application in vegetable soybean breeding programmes.

Materials and methods

Plant materials

Two hundred and twenty-four accessions obtained from the Chinese soybean mini core collection were selected from 23,587 germplasm accessions conserved in the Chinese National Soybean GenBank (CNSGB). The 224 accessions consisted of 196 landraces and 28 cultivars which had been reported by Huang (Table S1) (Huang et al. 2016). The population preserved approximately 63.5% of the soybean genetic diversity present in China and is an excellent population for identifying QTLs and revealing the genetic bases of target traits by association mapping (Huang et al. 2016; Qiu et al. 2009).

Field trials and trait measurement

The 224 accessions were planted in double-row plots in a randomized complete block design with three replications on the Experimental Farms of Nanjing Agricultural University located in two environments: Jiangpu (32.04° N 118.63° E, 2015 and 2016) in Nanjing, China. The rows were 4 m in length and 0.5 m in width, and there were 40 plants in a row. At the R6 stage, fifty fresh pods from normally developed plants were harvested for trait investigation in the autumn of 2015 and 2016.

Four yield-related traits, 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), the kernel percent (KP) and the moisture content of fresh seeds (MCFS), were investigated. First, PFW (g) could be calculated by weighing the fresh pods with an electronic balance. Second, SFW (g) was measured in fresh seeds whose pod husks were stripped. Third, KP (%) was the ratio of the fresh seed weight to the fresh pod weight. Finally, the fresh seeds were killed at 110 °C for 30 min and dried at 65 °C in a dryer until a constant weight was reached. Then, the MCFS (%) could be calculated.

Statistical analysis

Analysis of variance (ANOVA) of the phenotype data across multiple environments was performed via the PROC GLM process in SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA). The model for the phenotypic traits was as follows: \({y}_{ijk}=\mu +{\alpha }_{i}+{\beta }_{j}+{\gamma }_{kj}+{(\alpha \beta )}_{ij}+{\varepsilon }_{ijk}\), where µ is the overall mean; α_i is the genetic effect of the ith genotype; β_j is the effect of the jth environment; γ_kj is the block effect within the jth environment; (αβ)_ij is the interaction effect between the ith genotype and the jth environment; and ε_ijk is the random error following N(0, α²_ε). Broad-sense heritability was calculated as follows:\({h}^{2}={\alpha }_{g}^{2}/({\alpha }_{g}^{2}+{\alpha }_{ge}^{2}/n+{\alpha }_{\varepsilon }^{2}/rn)\), where σ²_g is the genotypic variance; σ²_ge is the genotype by environment interaction variance; α²_ε is the error variance; n is the number of environments; and r is the number of replications. All of the above variance values can be calculated by using the REML method via the SAS VARCOMP procedure.

The best linear unbiased prediction (BLUP) is the best statistical method for predicting breeding values of target traits under multi-environments. The BLUPs for PFW, SFW, KP and MCFS in each accession were calculated using PROC MIXED in SAS. Then the breeding values from the mixed BLUP model were used as phenotypic data for association mapping.

Genotyping and quality control

1645 SNPs were used to genotype the population and had been described by Huang et al. (2016). After excluding SNP markers with > 15% missing data, a minor allele frequency (MAF) < 5% and heterozygosity > 15% (Liu and Muse 2005), a total of 1514 high-quality SNPs were selected for association mapping (Table S2).

Population structure and linkage disequilibrium

Population structure analysis was performed using the Bayesian model-based clustering method in the program STRUCTURE v2.2 (Pritchard et al. 2000).Pairwise kinship (K) was calculated in TASSEL 5. Based on the Q and K, the Chinese soybean mini core collection was divided into two subgroups (Huang et al. 2016). The pairwise LD between SNPs was calculated as the squared correlation coefficient (r²) of alleles using TASSEL 5 software (Bradbury et al. 2007). The LD estimation had been reported by Huang and the average LD of the population was 544.01 kb (Huang et al. 2016).

Association mapping

To minimize the effects of environmental variation, the best linear unbiased predictions (BLUPs) of individual lines were calculated for each trait using the R package lme4 (Bates et al. 2012). A total of 1514 SNPs (MAF > 0.05) and four yield-related traits (PFW, SFW, KP and MCFS) for all genotypes from the association panel across two years were used to perform marker-trait association analysis. GWAS was performed with a mixed linear model (MLM) using TASSEL 5, which takes both familial relatedness and population structure into account. The significance threshold of marker-trait association was set to a P value < 10⁻³ (− Log₁₀ P > 3.00).

Prediction of candidate genes

To identify candidate genes underlying the association signals, we selected significant SNP markers which identified in multiple environments and associated with large-effect QTLs to search candidate genes in their candidate regions (LD ± 544. 01 kb). The functional annotations and tissue expression of genes located in the candidate regions were obtained from Soybase (https://www.soybase.org/) and Phytozome (http://phytozome.net). The soybean reference genome was Wm82.a1.v1.1. Based on the soybean genomic annotations and expression data, potential candidate genes were predicted. The heatmap profile of the candidate genes were drawn with R software.

Results

Phenotypic analysis of four yield-related traits

A total of 224 soybean accessions were planted in two consecutive years from 2015 to 2016, and four yield-related traits were investigated. Abundant phenotypic variation was observed for all of these traits in the 224 soybean accessions (Table 1). The 100-pod fresh weight (PFW) showed 9.26 or 7.81 fold variation, ranging from 35.85 to 332.08 g and 41.71 to 326.04 g during 2015 and 2016, respectively. The 100-seed fresh weight (SFW) showed an 8.35 or 7.17 fold variation, ranging from 8.66 to 72.35 g and 8.24 to 59.13 g in 2015 and 2016, respectively. The remaining two traits of the kernel percent (KP) and the moisture content of fresh seeds (MCFS) displayed 1.46 to 1.64 or 1.26 to 1.39 fold variation, respectively. The analysis of the coefficient of variation (CV) for these traits showed similar results, suggesting that this population presented a high latent capacity for increases in yield. The skewness ranged from − 0.18 to 1.56, and kurtosis ranged from − 0.50 to 7.29 for the four traits in 2015 and 2016, respectively. Following normalization, the phenotypic data of KP (2015 and 2016) and MCFS (2016) were normally distributed, and the other two traits did not fit a normal distribution (Fig. 1; Table 1).

Table 1 Descriptive statistics of soybean yield-related traits over 2 years

Full size table

Variance analysis of the four yield-related traits showed that the effects of the genotype (G), the environment (E) and the interaction between the genotype and environment (G × E) were all at very significant level (P < 0.0001) (Table 2). The broad-sense heritability (h²) of the four traits was calculated (Table 1). PFW, SFW and MCFS showed a high heritability (≥ 80%), indicating that genetic effects play a predominant role in the phenotype variation of these traits. KP showed a low heritability (54%) and had a great phenotype variation between years. In addition, there was a significant positive correlation between PFW, SFW and MCFS (r_p 0.198–0.946, P < 0.01), suggesting that MCFS is also an important factor influencing soybean yield development at the R6 stage (Table 3).

Table 2 ANOVA results for the effect of year, genotype (G) and their interaction for 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), kernel percent (KP) and moisture content of fresh seeds (MCFS)

Full size table

Table 3 Correlation coefficients among four yield-related traits

Full size table

Association mapping and candidate genes for soybean yield-related traits

Through GWAS, a total of 45 associations for the four yield-related traits (− Log₁₀(P) > 3) were identified under the MLM conditions, which accounted for both population structure and familial relatedness. The 45 associations included 16 SNPs located on 10 of 20 soybean chromosomes (Chr) (Figs. 2, 3, 4, 5). Six, five and seven SNPs associated with PFW were identified in 2015, 2016 and the BLUP across 2 years, respectively, representing 8 genomic regions covering six soybean chromosomes. The contribution of individual SNPs to the observed phenotypic variation was 5.13–11.42%. Four SNP markers, Map-0676 on Chr 04, Q-07-0082152 on Chr 07, and Map-2049 and Map-2055 on Chr 11, were repeatedly detected in all environments (Table 4). For SFW, four, six and six significant SNPs were detected in 2015, 2016 and the BLUP across 2 years, respectively, which represented 8 genomic regions covering seven chromosomes. Each SNP could explain 5.25–9.78% of the observed phenotypic variation. Two SNP markers, Map-2049 and Map-2055, were repeatedly detected in all environments (Table 5). Four, one and two SNP markers were detected for KP using the BLUP across two years and within the individual years, respectively. These SNPs were located in 5 genomic regions across the four chromosomes and explained 3.05–3.68% of the phenotypic variation for KP. No SNPs were repeatedly identified in all environments. Only the SNPs Map-2976 and Q-05-0211601 were repeatedly detected in individual year and the BLUP (Table 6). For MCFS, two, one and one SNP markers were detected in 2015, 2016 and the BLUP across 2 years, respectively, which were located in the same genomic region on chromosome 5. The phenotypic variation interpretation rate of a single SNP was 5.79–8.69%. There was only one SNP marker (Q-05-0194474 on Chr 5) that was repeatedly detected in all environments (Table 7).

Table 4 SNPs associated with 100-pod fresh weight (PFW)

Full size table

Table 5 SNPs associated with 100-seed fresh weight (SFW)

Full size table

Table 6 SNPs associated with kernel percent (KP)

Full size table

Table 7 SNPs associated with moisture content of fresh seed (MCFS)

Full size table

To identify the beneficial allele of each SNP associated with the target trait, the average phenotypic difference was calculated from the soybean accessions that possessed different SNP alleles. On average, the cultivars carrying the favorable allele (allele 1) exhibited higher phenotypic values than those with the unfavorable allele (allele 2) and the entire association panel. For PFW, the phenotypic difference between the cultivars with allele 1 and the cultivars with allele 2 reached 7.10–25.84 g. The phenotypic difference between the cultivars with allele 1 and the entire association panel was 0.71–5.83 g (Table 4). For SFW, the phenotypic values of the cultivars with allele 1 were 1.33–4.05 g higher than those of the cultivars with allele 2 and were 0.27–1.53 g higher than those of the entire association panel (Table 5). The difference in KP between the cultivars carrying allele 1 and allele 2 was 0.45–1.38%. The mean KP of the cultivars with allele 1 was 0.13–1.22% higher than that of the entire association panel (Table 6). With respect to MCFS, the cultivars with allele 1 presented approximately 2.13–2.87% and 0.58–1.34% higher phenotypic values than those with allele 2 and the entire association panel (Table 7).

In the current study, we identified a series of pleiotropic SNPs associated with multiple traits. Six common SNP markers (Map-0676, Q-07-0082152, Map-2049, Map-2055, Map-2565 and Q-18-0006557) were associated with both PFW and SFW, and one SNP marker, Q-05-0194474, was associated with both SFW and MCFS (Table 8). This was consistent with the significant positive correlations among PFW, SFW and MCFS. This result suggested that the positive correlations between PFW and SFW, SFW and MCFS mainly occurred at the molecular level via coincident quantitative trait nucleotides (QTNs). Thus, these coincident QTNs are critical for breeding vegetable soybean cultivars with a high yield.

Table 8 SNPs significantly associated with the four yield-related traits and previously reported QTLs at similar genome regions

Full size table

A total of nine significant SNP markers were repeatedly detected in at least two environments (2015, 2016, and the BLUP across the two years) (Table 8). Among these markers, eight SNP markers were located in or near regions where QTLs have been previously mapped by linkage analysis. These reported QTLs were mainly related to seed yield, seed weight, seed shape, pod wall width and pod maturity. In this study, only one SNP marker (Map-2565) was identified as a new locus associated with soybean yield at the R6 stage.

To shed light on the potential candidate genes involved in the development of soybean yield at the R6 stage, two SNP markers, Map-0676 and Q-05-0194474, which were repeatedly detected in multiple environments and associated with large-effect QTLs were selected to conduct the candidate gene analysis. The PFW and SFW marker Map-0676 (R² 7.64–11.42%) was located on chromosome 4, the PFW and SFW of the materials carrying the favorable allele (AA) at Map-0676 were 6.77 g and 1.63 g higher than the materials carrying the unfavorable allele (GG), respectively (Tables 4, 5). The MCFS marker Q-05-0194474 (R² 5.79–8.69%) located on chromosome 5, the MCFS of the materials carrying the favorable allele (AA) at Q-05-0194474 was 2.13% higher than the materials carrying the unfavorable allele (GG) (Table 7). Based on gene annotation and tissue expression, we selected 12 potential candidate genes for soybean yield at the R6 stage in the candidate regions of Map-0676 (42369904–43457924) and Q-05-0194474 (37953252–39041272) (Table 9). The candidate genes encode a plant intracellular ras-group-related leucine-rich repeat protein (PIRL9), a RAB GTPase homolog E1B (RABE1b), a FUS3-complementing gene protein (FC2), a seed imbibition protein (SIP2), a protein-l-isoaspartate methyltransferase (PIMT1), a BTB and TAZ domain protein (BT2), a pectin lyase-like superfamily protein (PPME1), a retinoblastoma-related protein (RBR1), a HCP-like superfamily protein, a phosphatidylethanolamine-binding protein (MFT) and two relative of early flowering proteins (REF6), respectively. The 12 genes showed significant difference in expression during the development of soybean pod (Fig. 6). The combined results suggested that these genes would be as the potential candidate genes, which might participate in the development of soybean yield at the R6 stage.

Table 9 Candidate genes for the four yield-related traits

Full size table

Discussion

The large phenotype variations observed within the yield-related traits allowed us to identify the best genes with the largest effects (Table 1). The KP (2015 and 2016) and MCFS (2016) traits were normally distributed, but the other two traits were not normally distributed, the different results might be caused by the small sample size used in this study. Heritabilities ranged from 80 to 96% for PFW, SFW and MCFS. Heritability for KP was only 54%. The low heritability of KP might due to the significant genotype-by-environment interaction justified by ANOVA (Table 2). In previous study, a low heritability (51%) for KP and a low correlation coefficient (0.057–0.13) between different environments were observed in maize (Ma et al. 2021), which was similar with the present study.

Soybean yield-related traits constitute complex quantitative traits that are governed by multiple genetic loci, each of which mostly displays minor effects (Zhang et al. 2016). Because of the environmental influence, loci with minor effects are often poor repeatable. Up to date, numerous soybean yield-related trait QTLs have been reported in different independent experiments, but specific and in-depth analyses of these loci are still lacking. Using a mini core collection with abundant genetic variations, a total of 45 significant yield-related associations were identified, which involved 16 SNP markers. Nine SNP markers were repeatedly detected in different environments, indicating that these SNPs were essential for yield-related traits. One of these SNPs was newly associated with soybean yield, and eight were located in or near known QTL regions (Table 8). Five stable SNP markers (Map-0676 on Chr 04, Q-07-0082152 on Chr 07, Map-2049 and Map-2055 on Chr 11, and Q-18-0006557 on Chr 18) that were simultaneously associated with PFW and SFW, overlapped with ten previously reported QTLs, including five seed yield-related QTLs (seed yield 1-2, seed weight 47-3, seed weight 49-15, seed weight 7-2, seed weight 10-8) (Mansur et al. 1993; Li et al. 2010; Teng et al. 2009; Orf et al. 1999; Specht et al. 2001), four seed and pod shape-related QTLs (seed length 1-15, seed length 5-3, seed thickness 1-5, pod wall width 1-7) (Salas et al. 2006; Hu et al. 2013; Tae-Hwan et al. 2014; Guo et al. 2011), and one pod maturity-related QTL (pod maturity 33-1) (Sun et al. 2013). Two stable SNP markers (Q-05-0211601 on Chr 05 and Map-2976 on Chr 16) that associated with KP were located in or near the reported QTL regions containing three yield-related QTLs (seed weight 7-3, seed weight 2-6, seed yield 29-2), one seed shape-related QTL (seed height 1-4) and one pod maturity-related QTL (pod maturity 9-1) (Orf et al. 1999; Salas et al. 2006; Mian et al. 1996; Eskandari et al. 2013). The last stable SNP marker, Q-05-0194474 on Chr 05, was concurrently related to MCFS and SFW; this marker was situated in a yield-related QTL region (seed weight 10-1) located 60 kb away from a seed shape QTL (seed thickness 1-3) (Specht et al. 2001; Jun et al. 2014).

In this study, a total of 12 potential candidate genes associated with soybean yield at the R6 stage were predicted within the candidate regions of two large-effect SNP markers (Table 9 and Fig. 6). Among these 12 genes, 10 genes are proposed as the candidate genes for PFW and SFW. Glyma04g36190 encodes a plant intracellular Ras-group-related leucine-rich repeat protein (PIRL9). In Arabidopsis thaliana, PIRL1 and PIRL9 are essential for differentiation of microspores into pollen, the T-DNA knockout double-mutants stopped growing as microspores, failed to initiate vacuolar fission, aborted, and underwent cytoplasmic degeneration (Forsthoefel et al. 2010). Glyma04g36430 encodes protein-l-isoaspartate methyltransferase 1(PIMT1). In Oryza sativa, PIMT1 is involved in the repair of damaged proteins containing abnormal isoaspartyl (isoAsp) which accumulated as seeds age, the overexpression of OsPIMT1 reduced the accumulation of isoAsp-containing protein in embryos and increased embryo viability (Wei et al. 2015). Glyma04g36500 encodes a BTB and TAZ domain protein (BT2). In Arabidopsis thaliana, BT proteins perform crucial roles in gametophyte development and BT2 seems to be the predominant gene in this process (Robert et al. 2009). The expression levels of Glyma04g36190, Glyma04g36430 and Glyma04g36500 showed significant increase in the late stage of seed development (seed 21–42 days), it suggested that these genes might participate in the soybean seed development, and thereby influence the soybean yield. Glyma04g36250 is a homologous to AT4G20360 which encodes a RAB GTPase homolog E1B (RABE1b). In Arabidopsis thaliana, RABE1b plays an important role in cell division during the embryonic development (Li et al. 2011). Glyma04g36360 is a homologous to AT4G24740, which encodes a FUS3-complementing protein. FUS3 is an important component of the regulatory pathway controlling late embryogenesis and seed maturation (Reidt et al. 2001). Glyma04g36410 encodes a seed imbibition protein (SIP2). SIP2 is an alkaline-galactosidase with a distinct substrate specificity for raffinose (Peters et al. 2010). In plants, the raffinose family of oligosaccharides have been proposed to play roles in the acquisition of desiccation tolerance in seeds (Ooms et al. 1993; Brenac et al. 1997). Glyma04g36520 encodes a pectin lyase-like superfamily protein. In Japanese apricot, the pectin lyase-like superfamily protein (PPME1) have relationships with the pistil abortion which would affect flowers development and seriously decrease the yield in production (Ting et al. 2012). In soybean development, the genes Glyma04g36250, Glyma04g36360, Glyma04g36410 and Glyma04g36520 display a similar expression pattern: they are highly expressed in the early pod development stage. It indicates that they may affect the early pod development by influencing cell division, embryonic development, seed imbibition and pistil fertility. Glyma04g36620 and Glyma04g36630 are homologous to AT3G48430, which encodes a relative of early flowering protein (REF6). In Arabidopsis thaliana, REF6 may participate in the regulation of flowering time and the formation of cotyledon boundary (Noh et al. 2004; Cui et al. 2016). Tissue expression analysis showed the expression of Glyma04g36620 and Glyma04g36630 were highly expressed in early pod and seed development, but were lowly expressed in late pod and seed development. It suggested that the dynamic changes of REF6 genes expression might involved in the regulation of soybean pod and seed development. Glyma04g36700 encodes a retinoblastoma-related protein (RBR1). In Oryza sativa, the RBR1 co-suppression mutants showed low seed setting rate or complete sterility (Zhang 2012). There was a high-level expression of Glyma04g36700 in root, flower, young leaf and pod. It hinted that Glyma04g36700 might play a positive role in soybean yield formation. For MCFS, there were only two genes, Glyma05g33770 and Glyma05g34030, predicted as the potential candidate genes. Glyma05g33770 encodes a HCP-like superfamily protein which might involved in the regulation of cytosolic calcium, and it was upregulated and then downregulated under flooding and drought stress (Xin and Setsuko 2017). The maturity of soybean seed need to go through the seed expanding and dhydration stage. Glyma05g33770 was upregulated in seed expanding stage (seed 10–25 days), and downregulated in seed dhydration stage (seed 28–42 days). It suggested that it might be related to the change of moisture content during seed development. Glyma05g34030 encodes a phosphatidylethanolamine-binding protein which shares 83.2% amino acid sequence identity with MOTHER OF FT AND TFL1 (MFT) in Arabidopsis thaliana, it might regulate seed germination via the ABA and GA signaling pathways (Xi and Yu 2010). In previous study, Glyma05g34030 was predicted as the candidate gene of MCFS and displayed a significantly different expression between varieties with different moisture content (Li et al. 2019a, b). Glyma05g34030 was highly expressed during the seed development, it suggested that Glyma05g34030 might related to the MCFS of soybean seed development. In this study, we postulate that the twelve genes are potential candidate genes for PFW, SFW and MCFS. However, further evidence is needed to functionally validate this hypothesis.

Conclusion

In this study, we identified 45 significant associations for PFW, SFW, KP and MCFS in all environments via GWAS, which included 16 SNP markers. Nine SNP markers were repeatedly detected in different environments, among which eight were located in or near yield-related QTLs that were previously reported, and one was a new locus associated with soybean yield. We were particularly interested in the large-effect markers Map-0676 and Q-05-0194474, which explained 7.64%-11.42% and 5.79%-8.69% of the observed phenotypic variance, respectively. According to genetic annotation and tissue expression analysis, a total of twelve genes, including Glyma04g36190, Glyma04g36250, Glyma04g36360, Glyma04g36410, Glyma04g36430, Glyma04g36500, Glyma04g36520, Glyma04g36620, Glyma04g36630, Glyma04g36700, Glyma05g33770 and Glyma05g34030, were proposed as potential candidate genes of soybean yield at the R6 stage, which requires further investigation for verification. These results provide useful insights for the yield improvement of vegetable soybean.

Availability of data and material

The datasets supporting the results of this article are included within the article and its additional files.

References

Austin D, Lee M, Veldboom L, Hallauer A (2000) Genetic mapping in maize with hybrid progeny across testers and generations: grain yield and grain moisture. Crop Sci 40(1):30–39
Article Google Scholar
Bates D, Maechler M, Bolker B (2012) lme4: linear mixed-effects models using S4 classes. R package version 0.999999-2.
Bradbury P, Zhang Z, Kroon D, Casstevens T, Ramdoss Y, Buckler E (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23(19):2633–2635
Article CAS PubMed Google Scholar
Brenac P, Horbowicz M, Downer S, Dickerman AM et al (1997) Raffinose accumulation related to desic-cation tolerance during maize seed development and maturation. J Plant Physiol 150:481–488
Article CAS Google Scholar
Cai D, Xiao Y, Yang W, Ye W, Wang B, Muhammad Y et al (2014) Association mapping of six yield-related traits in rapeseed (Brassica napus L.). Theor Appl Genet 127:85–96
Article CAS PubMed Google Scholar
Chen W, Zhang Y, Liu X, Chen B, Tu J, Fu T (2007) Detection of QTL for six yield-related traits in oil seed rape (Brassica napus) using DH and immortalized F2 populations. Theor Appl Genet 115:849–858
Article CAS PubMed Google Scholar
Contreras R, Mora F, Mar O, Higashi W, Schuster I (2017) A genome-wide association study for agronomic traits in soybean using SNP markers and SNP-based haplotype analysis. PLoS ONE 12(2):e0171105
Article CAS Google Scholar
Cui X, Lu F, Qiu Q, Zhou B et al (2016) REF6 recognizes a specific DNA sequence to demethylate H3K27me3 and regulate organ boundary formation in Arabidopsis. Nat Genet 48:694–699
Article CAS PubMed Google Scholar
Eskandari M, Cober E, Rajcan I (2013) Genetic control of soybean seed oil: II. QTL and genes that increase oil concentration without decreasing protein or with increased seed yield. Theor Appl Genet 126(6):1677–1687
Article CAS PubMed Google Scholar
Fehr W, Caviness C (1977) Stages of soybean development. Spec Rep 87
Forsthoefel N, Dao T, Vernon D (2010) PIRL1 and PIRL9 encoding members of a novel plant-specific family of leucine-rich repeat proteins, are essential for differentiation of microspores into pollen. Planta 232(5):1101–1114
Article CAS PubMed Google Scholar
Graham P, Vance C (2003) Legumes: importance and constraints to greater use. Plant Physiol 131(3):872–877
Article CAS PubMed PubMed Central Google Scholar
Guo G, Sun R, Hou M, Guo Y et al (2011) Quantitative trait locus (QTL) analysis of pod related traits in different environments in soybean. Afr J Biotechnol 10(56):11848–11854
Google Scholar
Han Y, Li D, Zhu D, Li H et al (2012) QTL analysis of soybean seed weight across multi-genetic backgrounds and environments. Theor Appl Genet 125(4):671–683
Article CAS PubMed Google Scholar
Hao D, Chen H, Yin Z, Cui S et al (2012) Identification of single nucleotide polymorphisms and haplotypes associated with yield and yield components in soybean (Glycine max) landraces across multiple environments. Theor Appl Genet 124(3):447–458
Article CAS PubMed Google Scholar
Hu Z, Zhang H, Kan G, Ma D et al (2013) Determination of the genetic architecture of seed size and shape via linkage and association analysis in soybean (Glycine max L. Merr.). Genetica 141(4–6):247–254
Article CAS PubMed Google Scholar
Huang X, Han B (2014) Natural variations and genome-wide association studies in crop plants. Annu Rev Plant Biol 65:531–551
Article CAS PubMed Google Scholar
Huang J, Guo N, Li Y, Sun J et al (2016) Phenotypic evaluation and genetic disection of resistance to Phytophthora sojae in the Chinese soybean mini core collection. BMC Genet 17(1):85
Article PubMed PubMed Central CAS Google Scholar
Jun T, Freewalt K, Michel A, Mian R (2014) Identification of novel QTL for leaf traits in soybean. Plant Breed 133(1):61–66
Article CAS Google Scholar
Kim H, Kim Y, Kim S, Son B et al (2010) Analysis of quantitative trait loci (QTLs) for seed size and fatty acid composition using recombinant inbred lines in soybean. J Life Sci 20(8):1186–1192
Article CAS Google Scholar
Li D, Sun M, Han Y, Teng W, Li W (2010) Identification of QTL underlying soluble pigment content in soybean stems related to resistance to soybean white mold (Sclerotinia sclerotiorum). Euphytica 172(1):49–57
Article Google Scholar
Li Y, Lu X, Shen S, Zhang C (2011) Study on involvement of the GTPase RABE1b in embryonic development of Arabidopsis thaliana. Agric Sci Technol 12(12):1795–1798
CAS Google Scholar
Li J, Zhao J, Li Y, Gao Y et al (2019a) Identification of a novel seed size associated locus SW9-1 in soybean. Crop J 7(04):548–559
Article Google Scholar
Li X, Zhang X, Zhu L, Bu Y et al (2019b) Genome-wide association study of four yield-related traits at the R6 stage in soybean. BMC Genet 20(1):39
Article PubMed PubMed Central Google Scholar
Liu K, Muse S (2005) PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 21(9):2128–2129
Article CAS PubMed Google Scholar
Liu W, Kim M, Van K, Lee Y et al (2011) QTL identification of yield-related traits and their association with flowering and maturity in soybean. J Crop Sci Biotechnol 14(1):65–70
Article Google Scholar
Ma J, Wang L, Cao Y, Li H (2021) Genome-wide association studies for kernel ratio in maize. J Plant Genet Resour 22(02):448–454 (in Chinese)
Mansur L, Lark K, Kross H, Oliveira A (1993) Interval mapping of quantitative trait loci for reproductive, morphological, and seed traits of soybean (Glycine max L.). Theor Appl Genet 86(8):907–913
Article CAS PubMed Google Scholar
Mao T, Li J, Wen Z, Wu T et al (2017) Association mapping of loci controlling genetic and environmental interaction of soybean flowering time under various photo-thermal conditions. BMC Genom 18(1):415
Article CAS Google Scholar
Mian M, Bailey M, Tamulonis J, Shipe E et al (1996) Molecular markers associated with seed weight in two soybean populations. Theor Appl Genet 93(7):1011–1016
Article CAS PubMed Google Scholar
Noh B, Lee S, Kim H et al (2004) Divergent roles of a pair of homologous jumonji/zinc-finger–class transcription factor proteins in the regulation of Arabidopsis flowering time. Plant Cell 16(10):2601–2613
Article CAS PubMed PubMed Central Google Scholar
Ooms J, Leon K, Koornneef M, Karssen C (1993) Acquisition of desiccation tolerance and longevity in seeds of Arabidopsis thaliana. Plant Physiol 102:1185–1191
Article CAS PubMed PubMed Central Google Scholar
Orf J, Chase K, Jarvik T, Mansur L et al (1999) Genetics of soybean agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Sci 39(6):1642–1651
Article Google Scholar
Peters S, Egert A, Stieger B et al (2010) Functional identification of Arabidopsis ATSIP2 (At3g57520) as an alkaline α-galactosidase with a substrate specificity for raffinose and an apparent sink-specific expression pattern. IEEE Mtt-s International Microwave Symposium Digest
Pritchard J, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945–959
Article CAS PubMed PubMed Central Google Scholar
Qiu L, Li Y, Guan R, Liu Z, Wang L, Chang R (2009) Establishment, representative testing and research progress of soybean core collection and mini core collection. Crop J 35(4):571–579
Google Scholar
Raboanatahiry N, Chao H, Dalin H, Pu S et al (2018) QTL alignment for seed yield and yield related traits in brassica napus. Front Plant Sci 9:1127
Article PubMed PubMed Central Google Scholar
Rafalski J (2010) Association genetics in crop improvement. Curr Opin Plant Biol 13(2):174–180
Article CAS PubMed Google Scholar
Reidt W, Ellerstrm M, Klle K et al (2001) FUS3-dependent gene regulation during late embryogenesis. J Plant Physiol 158(4):411–418
Article CAS Google Scholar
Robert H, Quint A, Brand D, Vivian-Smith A, Offringa R (2009) BTB and TAZ domain scaffold proteins perform a crucial function in Arabidopsis development. Plant J 58(1):109–121
Article CAS PubMed Google Scholar
Salas P, Oyarzo-Llaipen J, Wang D, Chase K, Mansur L (2006) Genetic mapping of seed shape in three populations of recombinant inbred lines of soybean (Glycine max L. Merr.). Theor Appl Genet 113(8):1459–1466
Article CAS PubMed Google Scholar
Salas Fernandez M, Becraft P, Yin Y, Lübberstedt T (2009) From dwarves to giants? Plant height manipulation for biomass yield. Trends Plant Sci 14:454–461
Article CAS PubMed Google Scholar
Shi J, Li R, Qiu D, Jiang C et al (2009) Unraveling the complex trait of crop yield with quantitative trait loci mapping in Brassica napus. Genetics 182:851–861
Article CAS PubMed PubMed Central Google Scholar
Specht J, Chase K, Macrander M, Graef G et al (2001) Soybean response to water: a QTL analysis of drought tolerance. Crop Sci 41(2):493–509
Article CAS Google Scholar
Sun Y, Pan J, Shi X, Du X, Wu Q, Qi Z et al (2012) Multi-environment mapping and meta-analysis of 100-seed weight in soybean. Mol Biol Rep 39(10):9435–9443
Article CAS PubMed Google Scholar
Sun S, Kim M, Lee Y, Li B, Lee S (2013) QTLs for resistance to phomopsis seed decay are associated with days to maturity in soybean (Glycine max). Theor Appl Genet 126(8):2029–2038
Article CAS PubMed Google Scholar
Tae-Hwan J, Freewalt K, Michal A, Mian R (2014) Identification of novel QTL for leaf traits in soybean. Plant Breed 133(1):61–66
Article CAS Google Scholar
Teng W, Han Y, Du Y, Sun D et al (2009) QTL analyses of seed weight during the development of soybean (Glycine max L. Merr.). Heredity 102(4):372
Article CAS PubMed Google Scholar
Ting S, Zhihong G, Liangju W et al (2012) Identification of differentially-expressed genes associated with pistil abortion in Japanese apricot by genome-wide transcriptional analysis. PLoS ONE 7(10):e47810
Article CAS Google Scholar
Wei Y, Xu H, Diao L et al (2015) Protein repair l-isoaspartyl methyltransferase 1 (PIMT1) in rice improves seed longevity by preserving embryo vigor and viability. Plant Mol Biol 89(4–5):475–492
Article CAS PubMed Google Scholar
Xi W, Yu H (2010) MOTHER OF FT AND TFL1 regulates seed germination through a negative feedback loop modulating ABA signaling in Arabidopsis. Plant Signal Behav 22(6):1733–1748
CAS Google Scholar
Xin W, Setsuko K (2017) Proteomic analysis of calcium effects on soybean root tip under flooding and drought stresses. Plant Cell Physiol 8:8
Google Scholar
Young G, Mebrahtu T, Johnson J (2000) Acceptability of green soybeans as a vegetable entity. Plant Foods Hum Nutr 55(4):323–333
Article CAS PubMed Google Scholar
Zhang Q (2012) Functional study of the OsRBR1 in rice. Huanan Agricultural University (in Chinese)
Zhang J, Song Q, Cregan P, Nelson R et al (2015) Genome-wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine max) germplasm. BMC Genomics 16(1):217
Article CAS PubMed PubMed Central Google Scholar
Zhang J, Song Q, Cregan P, Jiang G (2016) Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max). Theor Appl Genet 129(1):117–130
Article CAS PubMed Google Scholar
Zhang W, Hu D, Raman R, Guo S et al (2017) Investigation of the genetic diversity and quantitative trait loci accounting for important agronomic and seed quality traits in Brassica carinata. Front Plant Sci 8:615
Article PubMed PubMed Central Google Scholar
Zhang X, Wang W, Guo N, Zhang Y, Bu Y, Zhao J (2018) Combining QTL-seq and linkage mapping to fine map a wild soybean allele characteristic of greater plant height. BMC Genom 19(1):226
Article CAS Google Scholar
Zhao X, Han Y, Li Y, Liu D, Sun M, Zhao Y (2015) Loci and candidate gene identification for resistance to Sclerotinia sclerotiorum in soybean (Glycine max L. Merr.) via association and linkage maps. Plant J Cell Mol Biol 82(2):245–255
Article CAS Google Scholar
Zhou R, Chen H, Wang X, Zhang X et al (2009) QTL analysis of yield, yield components, and lodging in soybean. Acta Agron Sin 35(5):821–830 (in Chinese)
Article CAS Google Scholar

Download references

Acknowledgements

This work was supported by the Key R & D project of Jiangsu Province (BE2019376), National Key R&D Program of China (2017YFD0101500), Genetically Modified Organisms Breeding Major Projects (2016ZX08004002), Modern Agro-industry Technology Research System of China (CARS-04-PS10), the Fundamental Research Funds for the Central Universities (KYT201801), Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT_17R55) and Jiangsu Collaborative Innovation Center for Modern Crop Production, Science and Technology Innovative Team in Fujian Academy of Agricultural Sciences (STIT2017-2-6), Cyrus Tang Innovation Center for Seed Industry.

Author information

Authors and Affiliations

National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory for Crop Genetics and Germplasm Enhancement, College of Agriculture, Nanjing Agricultural University, Weigang No. 1, Nanjing, 210095, Jiangsu, China
Xiangnan Li, Yang Zhou, Yuanpeng Bu, Xinfang Wang, Yumei Zhang, Na Guo, Jinming Zhao & Han Xing

Authors

Xiangnan Li
View author publications
You can also search for this author in PubMed Google Scholar
Yang Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Yuanpeng Bu
View author publications
You can also search for this author in PubMed Google Scholar
Xinfang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yumei Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Na Guo
View author publications
You can also search for this author in PubMed Google Scholar
Jinming Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Han Xing
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors have read and approved the manuscript. HX and JMZ conceived and coordinated the work. NG and JMZ conducted planting soybean materials. XNL, YZ, YPB and XFW conducted phenotyping experiments. XNL conducted association mapping experiments and wrote the manuscript.

Corresponding authors

Correspondence to Jinming Zhao or Han Xing.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 35 KB)

Supplementary file2 (DOCX 233 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, X., Zhou, Y., Bu, Y. et al. Genome-wide association analysis for yield-related traits at the R6 stage in a Chinese soybean mini core collection. Genes Genom 43, 897–912 (2021). https://doi.org/10.1007/s13258-021-01109-9

Download citation

Received: 05 February 2021
Accepted: 26 April 2021
Published: 06 May 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s13258-021-01109-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Genome-wide association analysis for yield-related traits at the R6 stage in a Chinese soybean mini core collection

Abstract

Background

Objective

Methods

Results

Conclusion

Similar content being viewed by others

Genome-wide association study of four yield-related traits at the R6 stage in soybean

Genetic dissection of yield-related traits via genome-wide association analysis across multiple environments in wild soybean (Glycine soja Sieb. and Zucc.)

Identification of major genomic regions for soybean seed weight by genome-wide association study

Introduction

Materials and methods

Plant materials

Field trials and trait measurement

Statistical analysis

Genotyping and quality control

Population structure and linkage disequilibrium

Association mapping

Prediction of candidate genes

Results

Phenotypic analysis of four yield-related traits

Association mapping and candidate genes for soybean yield-related traits

Discussion

Conclusion

Availability of data and material

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Conflict of interest

Consent for publication

Ethics approval and consent to participate

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 35 KB)

Supplementary file2 (DOCX 233 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation