Introduction

Soybean [Glycine max (L.) Merr.] is an important crop that is grown worldwide as a source of edible oil and high-quality protein (Graham and Vance 2003). Soybean provides approximately 68% of protein meal and 57% of vegetable oil production worldwide (USDA-FAS, http://www.fas.usda.gov/). According to their use, soybean cultivars can be divided into grain soybean and vegetable soybean. Grain soybean is harvested at the late maturation stage, when the seeds are dehydrated, and is mainly used to produce oil, soybean milk, tofu and other bean products. Vegetable soybean is harvested at the R6 stage, when the pods are green and fully filled with seeds, and is mainly used as a bean vegetable (Fehr and Caviness 1977; Young et al. 2000). Throughout the long history of soybean breeding, yield has been one of the most important target traits. It is a very complex trait that depends on a combination of different plant traits. Seed weight, the seed number per pod and the pod number per plant are three important yield component traits that directly determine yield (Zhou et al. 2009; Hao et al. 2012). Seed shape, plant architecture, the growing period and biomass yield are important yield-related traits that have been shown to influence yield indirectly in several crops (Li et al. 2019a, b; Zhang et al. 2018; Salas Fernandez et al. 2009; Zhang et al. 2015; Raboanatahiry et al. 2018). In addition, vegetable soybean has a high moisture content of approximately 57.0–79.0% in fresh seeds, and a significant positive correlation between moisture content and yield has been observed in soybean landrace populations (Li et al. 2019a, b). Thus, moisture content is also an important yield-related trait in vegetable soybean. In maize, high heritability (h2) of moisture content has been shown, and several quantitative trait loci (QTLs) that are simultaneously related to yield and moisture content have been detected (Austin et al. 2000). Many yield-related QTLs have been shown to be clustered with yield QTLs in several crops (Chen et al. 2007; Shi et al. 2009; Cai et al. 2014; Zhang et al. 2017). Therefore, understanding the genetic basis of yield-related traits will help us to improve yield in vegetable soybean.

Yield-related traits are complex traits that are controlled by multiple genes and influenced by environmental factors. In soybean, QTL analyses have been conducted to elucidate the genetic bases of yield-related traits in segregating populations derived from biparental crosses. More than 300 QTLs have been detected across the whole genome of soybean, most of which have been repeatedly identified in different genetic backgrounds (Kim et al. 2010; Liu et al. 2011; Han et al. 2012; Sun et al. 2012). These studies demonstrated that linkage mapping is an efficient approach for revealing the genetic architecture underlying complex quantitative traits. Linkage mapping has been conducted in cross-populations such as recombinant inbred lines (RILs). In these mapping populations, homozygous alternative alleles are segregated, and each allele is present in 50% of the recombinant lines; thus, high power to detect QTLs can be observed (Huang and Han 2014). However, the establishment of the RIL population is time consuming and labour intensive.

Genome-wide association studies (GWAS), also referred to as linkage disequilibrium (LD) mapping, are an effective approach for studying the genetic architecture of complex traits. In contrast to linkage mapping, GWAS is a time- and labour-saving approach with a high resolution. It can be performed in natural populations with complex genetic backgrounds that have accumulated a large number of historic recombination events and can couple these events with linked DNA polymorphisms to reveal associations with phenotypic diversity within a relatively small region of the genome (Rafalski 2010; Zhao et al. 2015). In addition, GWAS can reveal elite alleles at genetic loci in germplasm. GWAS has been successfully used in soybean. For example, 118 SNPs and 11 SSRs were identified for soybean flowering time under various photothermal conditions in 91 soybean cultivars from different maturity groups, four of which were large-effect markers located on Gm 11, Gm 16 and Gm 20 and were detected in eight environments (Mao et al. 2017). Based on SNP markers and SNP-based haplotype analysis, a total of 11, 17 and 59 SNP-based haplotypes have been shown to be associated with seed yield, 100-seed weight and plant height, respectively (Contreras et al. 2017). For seed size and shape, a total of 21 significant loci were identified in a soybean landrace population, which explained 8.12–14.25% of the observed phenotypic variance. The new locus SW9-1, on chromosome 9, is a reliable locus related to seed size traits that exerts a relatively high phenotypic effect (13.67) on 100-seed weight (Li et al. 2019a, b). For vegetable soybean yield, Li et al. (2019a, b) detected 35 significant SNPs and proposed five potential candidate genes for soybean yield at the R6 stage. However, the genetic basis of yield and yield-related traits in soybean remains largely unclear.

In this study, we genotyped a soybean mini core collection composed of 224 germplasm accessions using 1514 SNP markers and surveyed four yield-related traits in two consecutive years. The objectives of this study were to (1) reveal the genetic basis of yield-related traits in soybean at the R6 stage and (2) evaluate marker-based allelic effects for application in vegetable soybean breeding programmes.

Materials and methods

Plant materials

Two hundred and twenty-four accessions obtained from the Chinese soybean mini core collection were selected from 23,587 germplasm accessions conserved in the Chinese National Soybean GenBank (CNSGB). The 224 accessions consisted of 196 landraces and 28 cultivars, as reported by Huang et al. (2016) (Table S1). The population preserves approximately 63.5% of the soybean genetic diversity present in China and is an excellent population for identifying QTLs and revealing the genetic bases of target traits by association mapping (Huang et al. 2016; Qiu et al. 2009).

Field trials and trait measurement

The 224 accessions were planted in double-row plots in a randomized complete block design with three replications on the Experimental Farm of Nanjing Agricultural University at Jiangpu (32.04° N, 118.63° E), Nanjing, China, in two environments (2015 and 2016). The rows were 4 m in length and 0.5 m in width, and there were 40 plants in a row. At the R6 stage, fifty fresh pods from normally developed plants were harvested for trait investigation in the autumn of 2015 and 2016.

Four yield-related traits, 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), the kernel percent (KP) and the moisture content of fresh seeds (MCFS), were investigated. First, PFW (g) was determined by weighing the fresh pods with an electronic balance. Second, SFW (g) was measured on the fresh seeds after the pod husks had been removed. Third, KP (%) was calculated as the ratio of the fresh seed weight to the fresh pod weight. Finally, the fresh seeds were heated at 110 °C for 30 min and then dried at 65 °C in a dryer until a constant weight was reached, and MCFS (%) was calculated from the fresh and dry seed weights.
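
The trait definitions above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' procedure: the function names are hypothetical, and both the scaling of a pod sample to a 100-pod basis and the wet-basis moisture formula (water lost on drying divided by fresh weight) are assumptions, since the text does not spell them out. The sample weights are invented.

```python
def per_hundred(total_weight_g: float, n_units: int) -> float:
    """Scale a sample weight to a 100-unit basis (e.g. 50 pods -> 100-pod weight)."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return total_weight_g / n_units * 100.0


def kernel_percent(seed_fresh_g: float, pod_fresh_g: float) -> float:
    """KP (%): fresh seed weight as a percentage of fresh pod weight."""
    return seed_fresh_g / pod_fresh_g * 100.0


def moisture_content(seed_fresh_g: float, seed_dry_g: float) -> float:
    """MCFS (%), assumed wet basis: (fresh - dry) / fresh * 100."""
    return (seed_fresh_g - seed_dry_g) / seed_fresh_g * 100.0


# Hypothetical sample: 50 pods weighing 90 g in total, containing 45 g of
# fresh seed that dries down to 13.5 g.
pfw = per_hundred(90.0, 50)          # about 180 g per 100 pods
kp = kernel_percent(45.0, 90.0)      # about 50 %
mcfs = moisture_content(45.0, 13.5)  # about 70 %
print(round(pfw, 2), round(kp, 2), round(mcfs, 2))
```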

Statistical analysis

Analysis of variance (ANOVA) of the phenotype data across multiple environments was performed using PROC GLM in SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA). The model for the phenotypic traits was as follows: \({y}_{ijk}=\mu +{\alpha }_{i}+{\beta }_{j}+{\gamma }_{kj}+{(\alpha \beta )}_{ij}+{\varepsilon }_{ijk}\), where \(\mu\) is the overall mean; \({\alpha }_{i}\) is the genetic effect of the ith genotype; \({\beta }_{j}\) is the effect of the jth environment; \({\gamma }_{kj}\) is the effect of the kth block within the jth environment; \({(\alpha \beta )}_{ij}\) is the interaction effect between the ith genotype and the jth environment; and \({\varepsilon }_{ijk}\) is the random error following \(N(0,{\sigma }_{\varepsilon }^{2})\). Broad-sense heritability was calculated as follows: \({h}^{2}={\sigma }_{g}^{2}/({\sigma }_{g}^{2}+{\sigma }_{ge}^{2}/n+{\sigma }_{\varepsilon }^{2}/(rn))\), where \({\sigma }_{g}^{2}\) is the genotypic variance; \({\sigma }_{ge}^{2}\) is the genotype by environment interaction variance; \({\sigma }_{\varepsilon }^{2}\) is the error variance; n is the number of environments; and r is the number of replications. All of the above variance components were estimated by the REML method using the SAS VARCOMP procedure.
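
As a worked illustration of the heritability formula above, the following Python sketch plugs variance components into the expression. The function name and the variance values are hypothetical; only n = 2 environments and r = 3 replications are taken from the experimental design described in this paper.

```python
def broad_sense_heritability(var_g: float, var_ge: float, var_e: float,
                             n_env: int, n_rep: int) -> float:
    """h2 = var_g / (var_g + var_ge / n + var_e / (r * n))."""
    denominator = var_g + var_ge / n_env + var_e / (n_rep * n_env)
    return var_g / denominator


# Hypothetical variance components (not values from this study):
# 40 / (40 + 10/2 + 30/6) = 40 / 50
h2 = broad_sense_heritability(var_g=40.0, var_ge=10.0, var_e=30.0,
                              n_env=2, n_rep=3)
print(round(h2, 3))
```

The formula shows why a large genotype by environment variance lowers heritability: it enters the denominator divided only by the number of environments, whereas the error variance is further divided by the number of replications.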

The best linear unbiased prediction (BLUP) is a widely used statistical method for predicting the breeding values of target traits across multiple environments. The BLUPs for PFW, SFW, KP and MCFS in each accession were calculated using PROC MIXED in SAS. The breeding values from the mixed BLUP model were then used as phenotypic data for association mapping.
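
To give an intuition for what a BLUP does, the sketch below uses a deliberately simplified setting: a balanced one-way random model with known variance components, in which the BLUP of a genotype effect is its deviation from the grand mean shrunk by the factor var_g / (var_g + var_e / m), where m is the number of observations per genotype. This is an assumption made for illustration and not the multi-environment model fitted with PROC MIXED in this study; the function name and the data are hypothetical.

```python
from statistics import mean


def blup_genotype_effects(obs_by_genotype, var_g, var_e):
    """BLUPs of genotype effects for a balanced one-way random model.

    obs_by_genotype maps a genotype name to its list of observations.
    var_g and var_e are assumed known (e.g. previously estimated by REML).
    """
    sizes = {len(values) for values in obs_by_genotype.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    m = sizes.pop()
    grand_mean = mean(x for values in obs_by_genotype.values() for x in values)
    shrinkage = var_g / (var_g + var_e / m)
    return {name: shrinkage * (mean(values) - grand_mean)
            for name, values in obs_by_genotype.items()}


# Hypothetical data: three accessions, two observations each.
data = {"A": [10.0, 12.0], "B": [20.0, 22.0], "C": [13.0, 13.0]}
effects = blup_genotype_effects(data, var_g=3.0, var_e=2.0)
print(effects)  # deviations from the grand mean (15), shrunk by 0.75
```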

Genotyping and quality control

A total of 1645 SNPs, described previously by Huang et al. (2016), were used to genotype the population. After excluding SNP markers with > 15% missing data, a minor allele frequency (MAF) < 5% or heterozygosity > 15% (Liu and Muse 2005), a total of 1514 high-quality SNPs were retained for association mapping (Table S2).
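
The three exclusion criteria above can be expressed as a small filter. The sketch below is a minimal illustration, assuming biallelic SNPs with each genotype call stored as a two-letter string (for example "AA" or "AG") and missing calls stored as None; this data layout and the function name are hypothetical and do not reflect the software actually used for filtering.

```python
from collections import Counter


def snp_passes_qc(calls, max_missing=0.15, min_maf=0.05, max_het=0.15):
    """Return True if a SNP passes the missing-data, MAF and heterozygosity filters."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    missing_rate = 1 - len(observed) / len(calls)
    het_rate = sum(1 for c in observed if c[0] != c[1]) / len(observed)
    allele_counts = Counter(allele for c in observed for allele in c)
    if len(allele_counts) < 2:
        maf = 0.0  # monomorphic among the observed calls
    else:
        # frequency of the second most common allele
        maf = sorted(allele_counts.values())[-2] / sum(allele_counts.values())
    return missing_rate <= max_missing and maf >= min_maf and het_rate <= max_het


# Hypothetical SNPs scored on 100 accessions.
good = ["AA"] * 60 + ["GG"] * 35 + ["AG"] * 3 + [None] * 2
rare = ["AA"] * 97 + ["GG"] * 3              # MAF = 3%
gappy = ["AA"] * 40 + ["GG"] * 40 + [None] * 20  # 20% missing
print(snp_passes_qc(good), snp_passes_qc(rare), snp_passes_qc(gappy))
```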

Population structure and linkage disequilibrium

Population structure analysis was performed using the Bayesian model-based clustering method in the program STRUCTURE v2.2 (Pritchard et al. 2000). Pairwise kinship (K) was calculated in TASSEL 5. Based on the population structure (Q) and K, the Chinese soybean mini core collection was divided into two subgroups (Huang et al. 2016). The pairwise LD between SNPs was calculated as the squared correlation coefficient (r²) of alleles using TASSEL 5 software (Bradbury et al. 2007). The LD estimation was reported previously by Huang et al. (2016), and the average LD decay distance of the population was 544.01 kb.
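
For readers unfamiliar with the r² statistic, the following sketch computes it for a pair of biallelic SNPs. It assumes homozygous lines coded 0 or 1 at each SNP (missing calls as None), in which case r² equals D² / (pA(1 − pA)pB(1 − pB)), with D the difference between the observed and expected frequency of the joint "1/1" class. This is a simplified illustration with a hypothetical function name; TASSEL's handling of heterozygous and missing calls may differ.

```python
def ld_r2(snp_a, snp_b):
    """r^2 between two biallelic SNPs coded 0/1 per homozygous line."""
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines with both SNPs scored")
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denominator


# Hypothetical genotypes for four lines.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # complete LD
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # no LD
```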

Association mapping

To minimize the effects of environmental variation, the best linear unbiased predictions (BLUPs) of individual lines were calculated for each trait using the R package lme4 (Bates et al. 2012). A total of 1514 SNPs (MAF > 0.05) and four yield-related traits (PFW, SFW, KP and MCFS) for all genotypes from the association panel across two years were used to perform marker-trait association analysis. GWAS was performed with a mixed linear model (MLM) in TASSEL 5, which takes both familial relatedness and population structure into account. The significance threshold for marker-trait associations was set to \(P<{10}^{-3}\) (\(-{\mathrm{log}}_{10}P>3.00\)).
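
For orientation, a mixed linear model of this kind is commonly written in the following general form. This is the textbook formulation, given here as an assumption about the structure of the model rather than the exact parameterization implemented in TASSEL for this study:

\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta }+\mathbf{Q}\mathbf{v}+\mathbf{Z}\mathbf{u}+\mathbf{e},\qquad \mathrm{Var}(\mathbf{u})={\sigma }_{u}^{2}\mathbf{K},\qquad \mathrm{Var}(\mathbf{e})={\sigma }_{e}^{2}\mathbf{I}\]

Here \(\mathbf{y}\) is the vector of phenotypic values (single-year values or BLUPs), \(\boldsymbol{\beta }\) contains the fixed effects including the SNP being tested, \(\mathbf{v}\) contains the fixed effects of population structure (Q), \(\mathbf{u}\) is the vector of random polygenic effects whose covariance is proportional to the kinship matrix \(\mathbf{K}\), and \(\mathbf{e}\) is the residual. The scaling of \(\mathbf{K}\) differs between software packages, so the variance symbols above should be read as notation introduced for this illustration.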

Prediction of candidate genes

To identify candidate genes underlying the association signals, we selected significant SNP markers that were identified in multiple environments and associated with large-effect QTLs, and searched for candidate genes in their candidate regions (SNP position ± 544.01 kb, the average LD decay distance). The functional annotations and tissue expression of genes located in the candidate regions were obtained from SoyBase (https://www.soybase.org/) and Phytozome (http://phytozome.net). The soybean reference genome was Wm82.a1.v1.1. Based on the soybean genomic annotations and expression data, potential candidate genes were predicted. The heatmap profile of the candidate genes was drawn with R software.
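
The window definition above amounts to simple interval arithmetic, sketched below. The SNP positions used in the example are not stated in the text; they are inferred here as the midpoints of the candidate intervals reported in the Results for Map-0676 and Q-05-0194474 and should be treated as assumptions. The function names and the gene coordinates are hypothetical.

```python
LD_DECAY_BP = 544_010  # 544.01 kb, the average LD decay distance


def candidate_region(snp_pos_bp, flank_bp=LD_DECAY_BP):
    """Return (start, end) of the window SNP position +/- flank, in bp."""
    return max(1, snp_pos_bp - flank_bp), snp_pos_bp + flank_bp


def genes_in_region(genes, region):
    """Names of genes (name, start, end) that overlap the region."""
    start, end = region
    return [name for name, g_start, g_end in genes
            if g_start <= end and g_end >= start]


# Positions inferred from the reported intervals (assumption, see lead-in).
map_0676 = candidate_region(42_913_914)
q_05_0194474 = candidate_region(38_497_262)
print(map_0676, q_05_0194474)

# Hypothetical gene models on the same chromosome as Map-0676.
genes = [("gene_inside", 42_400_000, 42_405_000),
         ("gene_outside", 44_000_000, 44_002_000),
         ("gene_on_edge", 43_457_000, 43_460_000)]
print(genes_in_region(genes, map_0676))
```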

Results

Phenotypic analysis of four yield-related traits

A total of 224 soybean accessions were planted in two consecutive years (2015 and 2016), and four yield-related traits were investigated. Abundant phenotypic variation was observed for all of these traits in the 224 soybean accessions (Table 1). The 100-pod fresh weight (PFW) showed 9.26- and 7.81-fold variation, ranging from 35.85 to 332.08 g and from 41.71 to 326.04 g in 2015 and 2016, respectively. The 100-seed fresh weight (SFW) showed 8.35- and 7.17-fold variation, ranging from 8.66 to 72.35 g and from 8.24 to 59.13 g in 2015 and 2016, respectively. The remaining two traits, the kernel percent (KP) and the moisture content of fresh seeds (MCFS), displayed 1.46- to 1.64-fold and 1.26- to 1.39-fold variation, respectively. The analysis of the coefficient of variation (CV) for these traits showed similar results, suggesting that this population has considerable potential for yield improvement. Across the four traits and the two years, the skewness ranged from −0.18 to 1.56 and the kurtosis ranged from −0.50 to 7.29. Following normalization, the phenotypic data of KP (2015 and 2016) and MCFS (2016) were normally distributed, whereas the other two traits did not fit a normal distribution (Fig. 1; Table 1).

Table 1 Descriptive statistics of soybean yield-related traits over 2 years
Fig. 1 Frequency distributions of 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), kernel percent (KP) and moisture content of fresh seed (MCFS) in 2015 and 2016

Variance analysis of the four yield-related traits showed that the effects of the genotype (G), the environment (E) and the interaction between the genotype and environment (G × E) were all highly significant (P < 0.0001) (Table 2). The broad-sense heritability (h2) of the four traits was calculated (Table 1). PFW, SFW and MCFS showed high heritability (≥ 80%), indicating that genetic effects play a predominant role in the phenotypic variation of these traits. KP showed low heritability (54%) and varied greatly between years. In addition, there were significant positive correlations among PFW, SFW and MCFS (rp = 0.198–0.946, P < 0.01), suggesting that MCFS is also an important factor influencing soybean yield at the R6 stage (Table 3).

Table 2 ANOVA results for the effect of year, genotype (G) and their interaction for 100-pod fresh weight (PFW), 100-seed fresh weight (SFW), kernel percent (KP) and moisture content of fresh seeds (MCFS)
Table 3 Correlation coefficients among four yield-related traits

Association mapping and candidate genes for soybean yield-related traits

Through GWAS, a total of 45 associations for the four yield-related traits (−log10(P) > 3) were identified using the MLM, which accounted for both population structure and familial relatedness. The 45 associations involved 16 SNPs located on 10 of the 20 soybean chromosomes (Chr) (Figs. 2, 3, 4, 5). Six, five and seven SNPs associated with PFW were identified in 2015, 2016 and the BLUP across 2 years, respectively, representing 8 genomic regions on six soybean chromosomes. The contribution of individual SNPs to the observed phenotypic variation was 5.13–11.42%. Four SNP markers, Map-0676 on Chr 04, Q-07-0082152 on Chr 07, and Map-2049 and Map-2055 on Chr 11, were repeatedly detected in all environments (Table 4). For SFW, four, six and six significant SNPs were detected in 2015, 2016 and the BLUP across 2 years, respectively, which represented 8 genomic regions on seven chromosomes. Each SNP explained 5.25–9.78% of the observed phenotypic variation. Two SNP markers, Map-2049 and Map-2055, were repeatedly detected in all environments (Table 5). Four, one and two SNP markers were detected for KP using the BLUP across two years and within the individual years, respectively. These SNPs were located in 5 genomic regions on four chromosomes and explained 3.05–3.68% of the phenotypic variation for KP. No SNPs were repeatedly identified in all environments. Only the SNPs Map-2976 and Q-05-0211601 were detected both in an individual year and in the BLUP (Table 6). For MCFS, two, one and one SNP markers were detected in 2015, 2016 and the BLUP across 2 years, respectively, all of which were located in the same genomic region on chromosome 5. Each SNP explained 5.79–8.69% of the phenotypic variation. Only one SNP marker (Q-05-0194474 on Chr 05) was repeatedly detected in all environments (Table 7).

Fig. 2 Manhattan plots and QQ plots of estimated −log10(P) from association analysis of PFW. a 2015 PFW: 100-pod fresh weight in 2015; b 2016 PFW: 100-pod fresh weight in 2016; c BLUP PFW: 100-pod fresh weight in the BLUP across 2 years

Fig. 3 Manhattan plots and QQ plots of estimated −log10(P) from association analysis of SFW. a 2015 SFW: 100-seed fresh weight in 2015; b 2016 SFW: 100-seed fresh weight in 2016; c BLUP SFW: 100-seed fresh weight in the BLUP across 2 years

Fig. 4 Manhattan plots and QQ plots of estimated −log10(P) from association analysis of KP. a 2015 KP: kernel percent in 2015; b 2016 KP: kernel percent in 2016; c BLUP KP: kernel percent in the BLUP across 2 years

Fig. 5 Manhattan plots and QQ plots of estimated −log10(P) from association analysis of MCFS. a 2015 MCFS: moisture content of fresh seed in 2015; b 2016 MCFS: moisture content of fresh seed in 2016; c BLUP MCFS: moisture content of fresh seed in the BLUP across 2 years

Table 4 SNPs associated with 100-pod fresh weight (PFW)
Table 5 SNPs associated with 100-seed fresh weight (SFW)
Table 6 SNPs associated with kernel percent (KP)
Table 7 SNPs associated with moisture content of fresh seed (MCFS)

To identify the beneficial allele of each SNP associated with the target trait, the average phenotypic difference was calculated between the soybean accessions that possessed different SNP alleles. On average, the cultivars carrying the favorable allele (allele 1) exhibited higher phenotypic values than those with the unfavorable allele (allele 2) and than the entire association panel. For PFW, the phenotypic difference between the cultivars with allele 1 and the cultivars with allele 2 reached 7.10–25.84 g. The phenotypic difference between the cultivars with allele 1 and the entire association panel was 0.71–5.83 g (Table 4). For SFW, the phenotypic values of the cultivars with allele 1 were 1.33–4.05 g higher than those of the cultivars with allele 2 and were 0.27–1.53 g higher than those of the entire association panel (Table 5). The difference in KP between the cultivars carrying allele 1 and allele 2 was 0.45–1.38%. The mean KP of the cultivars with allele 1 was 0.13–1.22% higher than that of the entire association panel (Table 6). With respect to MCFS, the cultivars with allele 1 presented approximately 2.13–2.87% and 0.58–1.34% higher phenotypic values than those with allele 2 and the entire association panel, respectively (Table 7).
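
The allelic comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: phenotypes and genotypes are held in dictionaries keyed by accession name, accessions with a missing genotype are excluded from the allele means but kept in the panel mean, and the allele with the higher mean is taken as the favorable one. The function name and all data are hypothetical.

```python
from statistics import mean


def allele_effects(phenotypes, genotypes):
    """Compare the mean phenotype of the two allele classes at one SNP."""
    by_allele = {}
    for accession, value in phenotypes.items():
        allele = genotypes.get(accession)
        if allele is not None:
            by_allele.setdefault(allele, []).append(value)
    if len(by_allele) != 2:
        raise ValueError("expected exactly two allele classes")
    allele_means = {allele: mean(values) for allele, values in by_allele.items()}
    favorable = max(allele_means, key=allele_means.get)
    unfavorable = min(allele_means, key=allele_means.get)
    panel_mean = mean(phenotypes.values())
    return {
        "favorable": favorable,
        "diff_between_alleles": allele_means[favorable] - allele_means[unfavorable],
        "diff_vs_panel": allele_means[favorable] - panel_mean,
    }


# Hypothetical PFW values (g) for five accessions at one SNP.
pfw = {"acc1": 120.0, "acc2": 130.0, "acc3": 100.0, "acc4": 110.0, "acc5": 115.0}
snp = {"acc1": "AA", "acc2": "AA", "acc3": "GG", "acc4": "GG", "acc5": None}
print(allele_effects(pfw, snp))
```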

In the current study, we identified a series of pleiotropic SNPs associated with multiple traits. Six common SNP markers (Map-0676, Q-07-0082152, Map-2049, Map-2055, Map-2565 and Q-18-0006557) were associated with both PFW and SFW, and one SNP marker, Q-05-0194474, was associated with both SFW and MCFS (Table 8). This was consistent with the significant positive correlations among PFW, SFW and MCFS. This result suggests that the positive correlations between PFW and SFW and between SFW and MCFS arise mainly at the molecular level via coincident quantitative trait nucleotides (QTNs). Thus, these coincident QTNs are valuable for breeding high-yielding vegetable soybean cultivars.

Table 8 SNPs significantly associated with the four yield-related traits and previously reported QTLs at similar genome regions

A total of nine significant SNP markers were repeatedly detected in at least two environments (2015, 2016, and the BLUP across the two years) (Table 8). Among these markers, eight SNP markers were located in or near regions where QTLs have been previously mapped by linkage analysis. These reported QTLs were mainly related to seed yield, seed weight, seed shape, pod wall width and pod maturity. In this study, only one SNP marker (Map-2565) was identified as a new locus associated with soybean yield at the R6 stage.

To shed light on the potential candidate genes involved in the development of soybean yield at the R6 stage, two SNP markers, Map-0676 and Q-05-0194474, which were repeatedly detected in multiple environments and associated with large-effect QTLs, were selected for candidate gene analysis. The PFW and SFW marker Map-0676 (R² = 7.64–11.42%) was located on chromosome 4; the PFW and SFW of the materials carrying the favorable allele (AA) at Map-0676 were 6.77 g and 1.63 g higher, respectively, than those of the materials carrying the unfavorable allele (GG) (Tables 4, 5). The MCFS marker Q-05-0194474 (R² = 5.79–8.69%) was located on chromosome 5; the MCFS of the materials carrying the favorable allele (AA) at Q-05-0194474 was 2.13% higher than that of the materials carrying the unfavorable allele (GG) (Table 7). Based on gene annotation and tissue expression, we selected 12 potential candidate genes for soybean yield at the R6 stage in the candidate regions of Map-0676 (42369904–43457924) and Q-05-0194474 (37953252–39041272) (Table 9). The candidate genes encode a plant intracellular Ras-group-related leucine-rich repeat protein (PIRL9), a RAB GTPase homolog E1B (RABE1b), a FUS3-complementing gene protein (FC2), a seed imbibition protein (SIP2), a protein-l-isoaspartate methyltransferase (PIMT1), a BTB and TAZ domain protein (BT2), a pectin lyase-like superfamily protein (PPME1), a retinoblastoma-related protein (RBR1), an HCP-like superfamily protein, a phosphatidylethanolamine-binding protein (MFT) and two relative of early flowering proteins (REF6). The 12 genes showed significant differences in expression during soybean pod development (Fig. 6). Taken together, these results suggest that these genes are potential candidate genes that might participate in the development of soybean yield at the R6 stage.

Table 9 Candidate genes for the four yield-related traits
Fig. 6 Heatmap profiles of the candidate genes in tissues

Discussion

The large phenotypic variation observed within the yield-related traits provided a good basis for detecting loci with large effects (Table 1). The KP (2015 and 2016) and MCFS (2016) traits were normally distributed, but the other two traits were not; this difference might be caused by the small sample size used in this study. Heritabilities ranged from 80 to 96% for PFW, SFW and MCFS, whereas the heritability of KP was only 54%. The low heritability of KP might be due to the significant genotype-by-environment interaction revealed by ANOVA (Table 2). In a previous study, a low heritability (51%) for KP and a low correlation coefficient (0.057–0.13) between different environments were observed in maize (Ma et al. 2021), which is similar to the present study.

Soybean yield-related traits are complex quantitative traits that are governed by multiple genetic loci, most of which display minor effects (Zhang et al. 2016). Because of environmental influences, loci with minor effects are often poorly repeatable. To date, numerous QTLs for soybean yield-related traits have been reported in independent experiments, but specific and in-depth analyses of these loci are still lacking. Using a mini core collection with abundant genetic variation, we identified a total of 45 significant yield-related associations, which involved 16 SNP markers. Nine SNP markers were repeatedly detected in different environments, indicating that these SNPs are important for yield-related traits. One of these SNPs was newly associated with soybean yield, and eight were located in or near known QTL regions (Table 8). Five stable SNP markers (Map-0676 on Chr 04, Q-07-0082152 on Chr 07, Map-2049 and Map-2055 on Chr 11, and Q-18-0006557 on Chr 18) that were simultaneously associated with PFW and SFW overlapped with ten previously reported QTLs, including five seed yield-related QTLs (seed yield 1-2, seed weight 47-3, seed weight 49-15, seed weight 7-2, seed weight 10-8) (Mansur et al. 1993; Li et al. 2010; Teng et al. 2009; Orf et al. 1999; Specht et al. 2001), four seed and pod shape-related QTLs (seed length 1-15, seed length 5-3, seed thickness 1-5, pod wall width 1-7) (Salas et al. 2006; Hu et al. 2013; Tae-Hwan et al. 2014; Guo et al. 2011), and one pod maturity-related QTL (pod maturity 33-1) (Sun et al. 2013). Two stable SNP markers (Q-05-0211601 on Chr 05 and Map-2976 on Chr 16) that were associated with KP were located in or near reported QTL regions containing three yield-related QTLs (seed weight 7-3, seed weight 2-6, seed yield 29-2), one seed shape-related QTL (seed height 1-4) and one pod maturity-related QTL (pod maturity 9-1) (Orf et al. 1999; Salas et al. 2006; Mian et al. 1996; Eskandari et al. 2013). The last stable SNP marker, Q-05-0194474 on Chr 05, was concurrently related to MCFS and SFW; this marker was situated in a yield-related QTL region (seed weight 10-1) located 60 kb away from a seed shape QTL (seed thickness 1-3) (Specht et al. 2001; Jun et al. 2014).

In this study, a total of 12 potential candidate genes associated with soybean yield at the R6 stage were predicted within the candidate regions of two large-effect SNP markers (Table 9 and Fig. 6). Among these 12 genes, 10 are proposed as candidate genes for PFW and SFW. Glyma04g36190 encodes a plant intracellular Ras-group-related leucine-rich repeat protein (PIRL9). In Arabidopsis thaliana, PIRL1 and PIRL9 are essential for the differentiation of microspores into pollen; T-DNA knockout double mutants stopped growing as microspores, failed to initiate vacuolar fission, aborted, and underwent cytoplasmic degeneration (Forsthoefel et al. 2010). Glyma04g36430 encodes protein-l-isoaspartate methyltransferase 1 (PIMT1). In Oryza sativa, PIMT1 is involved in the repair of damaged proteins containing abnormal isoaspartyl (isoAsp) residues, which accumulate as seeds age; the overexpression of OsPIMT1 reduced the accumulation of isoAsp-containing protein in embryos and increased embryo viability (Wei et al. 2015). Glyma04g36500 encodes a BTB and TAZ domain protein (BT2). In Arabidopsis thaliana, BT proteins perform crucial roles in gametophyte development, and BT2 seems to be the predominant gene in this process (Robert et al. 2009). The expression levels of Glyma04g36190, Glyma04g36430 and Glyma04g36500 increased significantly in the late stage of seed development (seed 21–42 days), suggesting that these genes might participate in soybean seed development and thereby influence soybean yield. Glyma04g36250 is homologous to AT4G20360, which encodes a RAB GTPase homolog E1B (RABE1b). In Arabidopsis thaliana, RABE1b plays an important role in cell division during embryonic development (Li et al. 2011). Glyma04g36360 is homologous to AT4G24740, which encodes a FUS3-complementing protein. FUS3 is an important component of the regulatory pathway controlling late embryogenesis and seed maturation (Reidt et al. 2001). Glyma04g36410 encodes a seed imbibition protein (SIP2). SIP2 is an alkaline α-galactosidase with a distinct substrate specificity for raffinose (Peters et al. 2010). In plants, the raffinose family of oligosaccharides has been proposed to play roles in the acquisition of desiccation tolerance in seeds (Ooms et al. 1993; Brenac et al. 1997). Glyma04g36520 encodes a pectin lyase-like superfamily protein. In Japanese apricot, the pectin lyase-like superfamily protein (PPME1) is related to pistil abortion, which affects flower development and seriously decreases yield in production (Ting et al. 2012). During soybean development, the genes Glyma04g36250, Glyma04g36360, Glyma04g36410 and Glyma04g36520 display a similar expression pattern: they are highly expressed in the early pod development stage. This indicates that they may affect early pod development by influencing cell division, embryonic development, seed imbibition and pistil fertility. Glyma04g36620 and Glyma04g36630 are homologous to AT3G48430, which encodes a relative of early flowering protein (REF6). In Arabidopsis thaliana, REF6 may participate in the regulation of flowering time and the formation of the cotyledon boundary (Noh et al. 2004; Cui et al. 2016). Tissue expression analysis showed that Glyma04g36620 and Glyma04g36630 were highly expressed in early pod and seed development but were expressed at low levels in late pod and seed development. This suggests that dynamic changes in the expression of the REF6 genes might be involved in the regulation of soybean pod and seed development. Glyma04g36700 encodes a retinoblastoma-related protein (RBR1). In Oryza sativa, RBR1 co-suppression mutants showed a low seed setting rate or complete sterility (Zhang 2012). Glyma04g36700 was expressed at a high level in the root, flower, young leaf and pod, which hints that Glyma04g36700 might play a positive role in soybean yield formation.

For MCFS, only two genes, Glyma05g33770 and Glyma05g34030, were predicted as potential candidate genes. Glyma05g33770 encodes an HCP-like superfamily protein that might be involved in the regulation of cytosolic calcium, and it was upregulated and then downregulated under flooding and drought stress (Xin and Setsuko 2017). Maturing soybean seeds pass through a seed expansion stage and a dehydration stage. Glyma05g33770 was upregulated in the seed expansion stage (seed 10–25 days) and downregulated in the seed dehydration stage (seed 28–42 days), suggesting that it might be related to the change in moisture content during seed development. Glyma05g34030 encodes a phosphatidylethanolamine-binding protein that shares 83.2% amino acid sequence identity with MOTHER OF FT AND TFL1 (MFT) in Arabidopsis thaliana; it might regulate seed germination via the ABA and GA signaling pathways (Xi and Yu 2010). In a previous study, Glyma05g34030 was predicted as a candidate gene for MCFS and displayed significantly different expression between varieties with different moisture contents (Li et al. 2019a, b). Glyma05g34030 was highly expressed during seed development, suggesting that it might be related to MCFS during soybean seed development. In this study, we postulate that the twelve genes are potential candidate genes for PFW, SFW and MCFS. However, further evidence is needed to functionally validate this hypothesis.

Conclusion

In this study, we identified 45 significant associations for PFW, SFW, KP and MCFS across all environments via GWAS, which involved 16 SNP markers. Nine SNP markers were repeatedly detected in different environments, among which eight were located in or near previously reported yield-related QTLs, and one was a new locus associated with soybean yield. We were particularly interested in the large-effect markers Map-0676 and Q-05-0194474, which explained 7.64–11.42% and 5.79–8.69% of the observed phenotypic variance, respectively. According to genetic annotation and tissue expression analysis, a total of twelve genes, including Glyma04g36190, Glyma04g36250, Glyma04g36360, Glyma04g36410, Glyma04g36430, Glyma04g36500, Glyma04g36520, Glyma04g36620, Glyma04g36630, Glyma04g36700, Glyma05g33770 and Glyma05g34030, were proposed as potential candidate genes for soybean yield at the R6 stage; these genes require further investigation for verification. These results provide useful insights for the yield improvement of vegetable soybean.