Introduction

The aims of soybean breeding are to increase seed yield and to improve seed quality. Seed yield per unit area is the product of the number of plants per unit area, the number of seeds per plant and the 100-seed weight. The 100-seed weight is affected by seed size, which is measured as seed length, width and thickness. Seed size is a major breeding target, not only as a component of seed yield (Liang et al. 2005) but also as a morphological quality trait (Wilson 1995). In addition, the commercial value of soybean in international trade depends partly on seed appearance (Cui and Xuan 2007). Given this economic importance, the genetic basis and developmental mechanism of seed size merit in-depth study, both to improve understanding of seed development and to facilitate soybean breeding.
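As a minimal illustration of the multiplicative relationship described above, the sketch below computes seed yield per unit area from its three components. The stand density and number of seeds per plant are hypothetical values chosen for illustration. Only the two 100-seed weights (13 g for LSZZH and 20 g for N493, given under Mapping population) come from this study. Treating the components as independent is a simplification, because yield components often compensate for one another in the field.

```python
def seed_yield_g_per_m2(plants_per_m2: float, seeds_per_plant: float,
                        hundred_seed_weight_g: float) -> float:
    """Seed yield per unit area as the product of its three components.

    The 100-seed weight is divided by 100 to give the weight of a single seed.
    """
    weight_per_seed_g = hundred_seed_weight_g / 100.0
    return plants_per_m2 * seeds_per_plant * weight_per_seed_g


# Hypothetical stand (30 plants/m^2, 100 seeds/plant) combined with the parental
# 100-seed weights reported in this study.
for parent, hsw in [("LSZZH", 13.0), ("N493", 20.0)]:
    print(parent, round(seed_yield_g_per_m2(30, 100, hsw), 1), "g/m^2")
```

With the other two components fixed, yield scales in direct proportion to 100-seed weight, so in this toy calculation the 20 g parent yields 20/13 times as much as the 13 g parent.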

During the past decade, many attempts have been made to dissect the genetic mechanism of seed size in rice, peanut, maize and barley (Zheng et al. 1985; Ayoub et al. 2002; Salas et al. 2006; Rabiei et al. 2004), with most progress made in rice. Grain length in rice is a quantitative trait controlled either by multiple major genes (Kazuyoshi and Ayumi 1980) or by polygenes with additive effects (Shi and Shen 1994). More recently, molecular quantitative genetic approaches have been used to detect quantitative trait loci (QTL) for grain size traits (Lin et al. 1995; Rabiei et al. 2004; Wan et al. 2008). Many QTL for grain weight and size have been detected in rice (Sakamoto and Matsuoka 2008), including three recently identified loci: GS3 (Fan et al. 2006), GW2 (Song et al. 2007) and qSW5 (Shomura et al. 2008; Wan et al. 2008). Although the functions of the genes underlying these three QTL have not yet been clarified, cloning such genes will provide an opportunity to characterize the regulatory mechanisms of grain development and a potential tool for improving grain size.

Although there has been substantial research on the classical inheritance of seed size and seed weight in soybean (Kim et al. 2000; Johnson et al. 2001; Orf et al. 1999; Mansur et al. 1993, 1996; Leroy et al. 1991), few studies have addressed the molecular inheritance of seed size and shape traits. Nelson and Wang (1989) showed that seed size and seed shape are stably inherited, and Cober et al. (1997) reported that seed shape is independent of seed size, a result recently also obtained in wheat by Gegas et al. (2010). Heritability estimates are 59–79% for seed shape and 19–56% for seed size (Cober et al. 1997). Liang et al. (2005) analyzed an incomplete diallel cross of eight varieties, together with its F1 and F2 populations, and showed that seed weight and seed length (SL) are mainly controlled by cytoplasmic effects, whereas seed width (SW) and seed thickness (ST) are mainly controlled by maternal effects. These studies addressed the collective effects of genes, so the positions and effects of individual QTL or genes remain unclear, and QTL mapping is therefore needed (Mansur et al. 1993). Salas et al. (2006) mapped QTL for seed shape traits in three densely mapped recombinant inbred populations and detected a total of 19 significant QTL on 10 linkage groups; only one QTL was stable across populations and environments, and six were stable in at least two populations in both environments. Li et al. (2008) detected three QTL for SL on linkage groups F, J and M, and Liang et al. (2008) mapped seven QTL for SL, three for SW and three for ST. However, environment-by-QTL and cytoplasm-by-QTL interactions have not been reported.

The objectives of this study were to identify QTL in soybean for SL, SW and ST and for their corresponding ratios (length-to-width, SLW; length-to-thickness, SLT; and width-to-thickness, SWT), including main-effect QTL (M-QTL), environment-by-QTL interactions and cytoplasm-by-QTL interactions, and to characterize the gene action of the identified QTL.

Materials and methods

Mapping population

Direct and reciprocal crosses were made between Lishuizhongzihuang (LSZZH) and Nannong 493-1 (N493) in 2005 because the two parents differ highly significantly in 100-seed weight (13 g for LSZZH and 20 g for N493; P < 0.01). Two hundred and forty-four F2 plants of the direct cross and 260 F2 plants of the reciprocal cross were produced in 2006. All F2 plants of both crosses were selfed to develop the F2:3, F2:4 and F2:5 families used in this study.

Trait evaluation

All F2:3, F2:4 and F2:5 families of both crosses were planted in three-row plots in a completely randomized design and evaluated at the Jiangpu Experimental Station of Nanjing Agricultural University in 2007, 2008 and 2009, respectively. Plots were 1.5 m wide and 2 m long. Five plants were randomly selected from the middle row of each plot, and 20 seeds were randomly sampled from them. The seed size traits (SL, SW and ST) were measured with a digital vernier caliper, and the seed shape traits (SLW, SLT and SWT) were calculated as the corresponding ratios. Each trait value was averaged over the 20 seeds.
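A sketch of how plot-level values could be derived from the caliper readings follows. It assumes that each of the 20 sampled seeds contributes one length, width and thickness measurement in millimetres, and that the ratio traits are computed seed by seed before averaging. The text does not say whether ratios were formed per seed or from plot means. The function name `plot_seed_traits` is hypothetical.

```python
from statistics import fmean


def plot_seed_traits(seeds):
    """Plot-level seed size and shape traits.

    `seeds` holds (length, width, thickness) caliper readings in mm, one tuple
    per sampled seed. Assumption: ratios are computed seed by seed and then
    averaged, in the same way as the size traits.
    """
    sl = [s[0] for s in seeds]
    sw = [s[1] for s in seeds]
    st = [s[2] for s in seeds]
    return {
        "SL": fmean(sl),
        "SW": fmean(sw),
        "ST": fmean(st),
        "SLW": fmean(length / width for length, width in zip(sl, sw)),
        "SLT": fmean(length / thick for length, thick in zip(sl, st)),
        "SWT": fmean(width / thick for width, thick in zip(sw, st)),
    }


# Two toy seeds; a real plot would contribute 20.
print(plot_seed_traits([(8.0, 7.0, 6.0), (9.0, 7.5, 6.0)]))
```

The two averaging orders give slightly different ratio values. For the toy seeds above, the mean of the per-seed SLW ratios is about 1.1714, whereas SL/SW computed from the plot means (8.5/7.25) is about 1.1724.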

Simple and partial correlation analysis

Simple and partial correlations among the above traits were calculated with the SAS 9.20 statistical package. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
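The distinction between simple and partial correlations can be sketched in Python with NumPy. The sketch assumes that each partial correlation controls for all remaining traits and computes it from the inverse of the simple correlation matrix. The text does not state which traits were held constant. In SAS, partial correlations are obtained with the PARTIAL statement of PROC CORR. The toy data are simulated and are not from this study. Significance tests are not included.

```python
import numpy as np


def simple_and_partial(data: np.ndarray):
    """Return (simple, partial) correlation matrices for the columns of `data`.

    Partial correlations control for all remaining columns and are obtained
    from the inverse P of the simple correlation matrix:
    r_ij.rest = -P_ij / sqrt(P_ii * P_jj).
    """
    simple = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(simple)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return simple, partial


# Toy check: two traits that share a common driver z are correlated,
# but their partial correlation given z is close to zero.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
simple, partial = simple_and_partial(np.column_stack([x, y, z]))
print(round(simple[0, 1], 2), round(partial[0, 1], 2))
```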

DNA extraction, PCR reaction and polymorphism detection

Genomic DNA was extracted from approximately 0.3 g of fresh leaves collected from each F2 plant of both crosses in 2006, using the cetyltrimethylammonium bromide (CTAB) method as described by Lipp et al. (1999). To screen for polymorphisms between the soybean cultivars LSZZH and N493, 972 simple sequence repeat (SSR) primer pairs were examined. Primer sequences were obtained from the soybean database SoyBase (http://www.ncbi.nlm.nih.gov). PCR was performed in a 15 μl volume containing 5 μl template DNA (20 ng/μl), 3 μl primers, 1.5 μl 10× PCR buffer (containing 15 mM Mg2+), 0.2 μl dNTPs (10 mM), 0.15 μl Taq polymerase (5 U/μl) and 5.15 μl ddH2O. The PCR profile consisted of an initial denaturation at 95°C for 2 min; 35 cycles of 94°C for 30 s, 47–55°C (depending on the primer pair) for 45 s and 72°C for 1 min; and a final extension at 72°C for 10 min. PCR products were separated on 8% non-denaturing polyacrylamide gels (acrylamide:bisacrylamide, 29:1) and silver-stained as described by Santos et al. (1993).
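The reaction recipe can be checked with the dilution relation C1V1 = C2V2. The sketch below is an arithmetic check rather than a protocol. It confirms that the listed volumes sum to 15 μl and derives the final concentrations implied by the stated stocks. Primers are omitted because their stock concentration is not given. The text also does not state whether the 10 mM dNTP stock refers to each nucleotide or to the total.

```python
from math import isclose

REACTION_UL = 15.0

# Volumes (µl) added per reaction, as listed in the text.
volumes_ul = {
    "template DNA": 5.0,
    "primers": 3.0,
    "10x PCR buffer": 1.5,
    "dNTP": 0.2,
    "Taq": 0.15,
    "ddH2O": 5.15,
}


def final_conc(stock: float, volume_ul: float, total_ul: float = REACTION_UL) -> float:
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    return stock * volume_ul / total_ul


# The listed volumes should add up to the 15 µl reaction.
assert isclose(sum(volumes_ul.values()), REACTION_UL)

summary = {
    "DNA per reaction (ng)": 20.0 * volumes_ul["template DNA"],
    "PCR buffer (x)": final_conc(10.0, volumes_ul["10x PCR buffer"]),
    "Mg2+ (mM)": final_conc(15.0, volumes_ul["10x PCR buffer"]),  # supplied in the buffer
    "dNTP (mM)": final_conc(10.0, volumes_ul["dNTP"]),
    "Taq per reaction (U)": 5.0 * volumes_ul["Taq"],
}

for name, value in summary.items():
    print(f"{name}: {value:.3g}")
```

The recipe corresponds to 1× buffer, 1.5 mM Mg2+, about 0.13 mM dNTPs, 100 ng of template DNA and 0.75 U of Taq per reaction.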

Genetic linkage map construction and QTL mapping

JoinMap 4.0 (Van Ooijen 2006) was used to construct a linkage map with a maximum recombination fraction of 0.40 and a minimum LOD score of 3.0. Recombination frequencies were converted to map distances in centimorgans (cM) using the Kosambi mapping function (Kosambi 1944). Linkage groups were assigned to chromosomes based on the publicly available linkage map (Song et al. 2004) and the soybean genome sequence (Schmutz et al. 2010). Unassigned groups are denoted XA, where A is a serial number (e.g., X3).
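For reference, the Kosambi function converts a recombination fraction r into a map distance d (in Morgans) as d = (1/4) ln[(1 + 2r)/(1 − 2r)], with inverse r = (1/2) tanh(2d). A minimal Python sketch follows; the function names are illustrative.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    # d (Morgans) = 0.25 * ln[(1 + 2r) / (1 - 2r)]; multiply by 100 for cM.
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)


print(round(kosambi_cm(0.40), 2))
print(round(kosambi_r(kosambi_cm(0.20)), 6))
```

At the maximum recombination fraction used for grouping (0.40), the Kosambi distance is 25 ln 9, or about 54.9 cM. For small r, the distance in cM is close to 100r.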

QTL analysis was performed using a multi-QTL joint analysis in the framework of penalized maximum likelihood (Zhang and Xu 2005; Dou et al. 2010), together with composite interval mapping (CIM) in Windows QTL Cartographer v2.5 (Wang et al. 2007). In the joint analysis, the mean phenotypic value \( y_{ijk} \) of the ith family in the jth family population (F2:3, F2:4 or F2:5) of the kth cross (\( i = 1, \ldots ,n_{k} \); \( j = 1,2,3 \); \( k = 1,2 \)) was described by the following model:

$$ y_{ijk} = \mu + x_{c} c + E + G + GE + GC + \varepsilon_{i} \tag{1} $$

where μ is the population mean; c is the cytoplasmic effect, with the dummy variable \( x_{c} = 1 \) for the direct cross and \( x_{c} = -1 \) for the reciprocal cross; E is the environmental effect; G is the genetic effect; GE is the environment-by-QTL interaction effect; GC is the cytoplasm-by-QTL interaction effect; \( \varepsilon_{i} \) is the residual error, assumed to follow an \( N(0,\sigma^{2}) \) distribution; and

$$ \begin{gathered} E = \sum\limits_{j = 1}^{R - 1} {x_{ij} } r_{j} \hfill \\ G = \sum\limits_{l = 1}^{m} {(x_{i(R - 1 + 2l - 1)} } a_{l} + x_{i(R - 1 + 2l)} d_{l} ) \hfill \\ GE = \sum\limits_{j = 1}^{R - 1} {\sum\limits_{l = 1}^{m} {\left[ {x_{ij} x_{i(R - 1 + 2l - 1)} (ae)_{jl} + x_{ij} x_{i(R - 1 + 2l)} (de)_{jl} } \right]} } \hfill \\ GC = \sum\limits_{l = 1}^{m} {\left[ {x_{c} x_{i(R - 1 + 2l - 1)} (ac)_{l} + x_{c} x_{i(R - 1 + 2l)} (dc)_{l} } \right]} \hfill \\ \end{gathered} $$

where R and m are the numbers of environments and QTL, respectively; a is the additive effect; d is the dominance effect; ae and de are the additive-by-environment and dominance-by-environment interaction effects; ac and dc are the additive-by-cytoplasm and dominance-by-cytoplasm interaction effects; and x denotes the dummy variables for the various effects. In the pseudomarker approach, multi-marker analysis is used with a slight modification: virtual markers are inserted into all marker intervals longer than 5 cM. Because genotypic information was incomplete in the real data, missing marker genotypes were handled by multiple imputation (Sen and Churchill 2001), which requires a separate analysis of each imputed dataset. Although 10–20 imputed datasets may suffice (Sen and Churchill 2001; Xu and Jia 2007), we generated 50 in this study. For each imputed dataset, the sampled complete genotypes were used to construct the design matrix for the QTL effects in model (1), and all effects in model (1) were estimated simultaneously by the penalized maximum likelihood method of Zhang and Xu (2005). For each candidate QTL, the number of imputed datasets in which its LOD statistic exceeded 2.5 was counted, and the QTL was declared true if this number exceeded 10% of the 50 imputed datasets. LOD scores between 2.0 and 2.5 were taken to indicate suggestive QTL, following Lander and Kruglyak (1995). The QTL position was calculated as an average weighted by the total genetic variance of the detected QTL.
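The counting rule for declaring a QTL across imputed datasets can be sketched as follows. This is a simplified illustration and does not implement the penalized maximum likelihood procedure itself. It assumes that, for one candidate QTL, each imputed dataset yields a LOD statistic, a position estimate and the genetic variance attributed to the QTL. It also interprets the weighted position as an average over the datasets passing the LOD threshold, weighted by those variances. The class and function names are hypothetical, and the suggestive band (LOD 2.0 to 2.5) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ImputationResult:
    """Hypothetical per-imputation output for one candidate QTL."""
    lod: float
    position_cm: float
    genetic_var: float


def summarize_candidate(results, n_imputations=50, lod_threshold=2.5, min_fraction=0.10):
    """Declare a QTL if its LOD exceeds the threshold in more than
    `min_fraction` of all imputed datasets, and return a variance-weighted position.

    `results` lists only the imputed datasets in which the candidate was fitted,
    so the fraction is taken over `n_imputations`, not over len(results).
    """
    passing = [r for r in results if r.lod > lod_threshold]
    fraction = len(passing) / n_imputations
    total_var = sum(r.genetic_var for r in passing)
    position = (sum(r.position_cm * r.genetic_var for r in passing) / total_var
                if total_var > 0 else None)
    return fraction > min_fraction, fraction, position


hits = ([ImputationResult(3.0, 10.0, 1.0), ImputationResult(3.2, 12.0, 3.0)]
        + [ImputationResult(2.7, 11.0, 2.0) for _ in range(4)])
print(summarize_candidate(hits))
```

Under this rule, six of 50 datasets (12%) passing LOD 2.5 would declare the QTL. Exactly five (10%) would not, because the fraction must exceed 10%.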

For CIM of the jth family population of the kth cross, the standard model (Model 6), with forward stepwise regression and backward elimination, was used to search for QTL and to identify cofactors, with a walk speed of 1 cM. The window size was set at 10 cM, and the five most significant background markers were used as cofactors to control the genetic background for each trait. A LOD threshold of 2.5 was used to declare a putative QTL in a given genomic region. QTL confidence intervals (90–95%) were defined as the map interval corresponding to a one-LOD decline on either side of the peak. For all traits, the additive and dominance effects of each QTL were calculated. The proportion of phenotypic variation explained (PVE) by each QTL, which is analogous to the heritability of a single QTL, was also calculated.
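The one-LOD-drop support interval described above can be sketched for a single LOD profile scanned at the 1 cM walk speed. The sketch assumes one peak per profile. It takes the interval endpoints at the outermost grid positions whose LOD stays within one unit of the peak, without interpolation. The software's own interval calculation may differ in detail. The function name is hypothetical.

```python
def lod_support_interval(positions_cm, lods, drop=1.0, threshold=2.5):
    """One-LOD-drop support interval around the highest peak of a LOD profile.

    Returns (left, peak, right) map positions, or None if the peak does not
    reach the LOD threshold. Endpoints are the outermost grid points whose LOD
    is still within `drop` of the peak; no interpolation is attempted.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    if lods[peak] < threshold:
        return None
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]


positions = list(range(11))  # 0-10 cM at a 1 cM walk speed
lods = [0.5, 1.0, 1.8, 2.6, 3.4, 4.0, 3.5, 2.9, 2.2, 1.5, 0.8]
print(lod_support_interval(positions, lods))
```

For this toy profile, which peaks at LOD 4.0 at 5 cM, the LOD is 3.4 at 4 cM and 3.5 at 6 cM but falls below 3.0 beyond them, so the function returns (4, 5, 6).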

Results

Construction of genetic linkage map in soybean

A total of 972 SSR primer pairs covering the whole genome were used to screen for polymorphisms between the soybean cultivars LSZZH and N493. Of these, 150 primer pairs (15.4%) revealed polymorphic bands between the two parents and among the F2 plants. Using JoinMap 4.0, 113 SSR markers were mapped onto 34 linkage groups, all but four of which were assigned to 19 chromosomes. The linkage maps had a total length of 1,557.85 cM, with an average marker spacing of 13.79 cM.
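The summary statistics above can be reproduced from the reported counts. The 13.79 cM average spacing corresponds to total map length divided by the number of mapped markers. Dividing instead by the number of marker intervals (113 markers minus 34 groups, i.e., 79 intervals) would give a larger value. The helper below is an illustrative sketch.

```python
def map_summary(n_primers, n_polymorphic, n_markers, n_groups, total_cm):
    """Polymorphism rate and two ways of expressing average marker spacing."""
    return {
        "polymorphic_pct": 100.0 * n_polymorphic / n_primers,
        "cm_per_marker": total_cm / n_markers,
        # Each linkage group with k markers has k - 1 intervals.
        "cm_per_interval": total_cm / (n_markers - n_groups),
    }


stats = map_summary(972, 150, 113, 34, 1557.85)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

The reported counts give 15.4% polymorphic primers, 13.79 cM per marker and about 19.72 cM per interval.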

Phenotypic evaluations

The mean, standard deviation, range, skewness and kurtosis of each trait were calculated for the parents and for the F2:3, F2:4 and F2:5 populations over three growing seasons (Table 1). The parental means differed significantly for each trait (P < 0.05). In all six datasets, each trait showed continuous variation and, judged by its skewness and kurtosis, an approximately normal distribution, as is typical of quantitative traits.
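Skewness and kurtosis are the usual checks for approximate normality. A minimal sketch of the moment-based estimators follows. Both statistics are near zero for normally distributed data; kurtosis is reported here as excess kurtosis. Statistical packages such as SAS PROC UNIVARIATE apply small-sample bias adjustments by default, so for small samples the results of this sketch can differ slightly from packaged output.

```python
from statistics import fmean


def skewness_kurtosis(values):
    """Moment estimators of skewness (g1) and excess kurtosis (g2).

    Both are approximately 0 for normally distributed data; no small-sample
    bias correction is applied.
    """
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetric: skewness 0
print(skewness_kurtosis([1, 1, 1, 1, 10]))   # long right tail: positive skewness
```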

Table 1 Phenotypic variation of seed traits in 504 F2:3, F2:4 and F2:5 families from the soybean cross of Lishuizhongzihuang (P1) by Nannong 493-1 (P2)

Simple and partial correlations among the seed traits, based on F2:3, F2:4 and F2:5 family means from 2007 to 2009, are given in the upper right and lower left triangles of Table 2, respectively. In the simple correlation analysis, every pair of seed size traits (SL, SW and ST) was significantly positively correlated, whereas ST was significantly negatively correlated with each seed shape trait (SLW, SLT and SWT). In the partial correlation analysis, significant positive correlations were observed for most trait pairs. The exceptions were SL with ST and with SWT, SW with SLW and with SLT, and SWT with ST and with SLW.

Table 2 Simple and partial correlation coefficients for seed traits in soybean

Mapping M-QTL

A total of 121 M-QTL (21 for SL, 21 for SW, 22 for ST, 24 for SLW, 18 for SLT and 15 for SWT) were detected by the multi-QTL joint analysis of the six datasets (Supplement Table A1). By comparison, CIM identified 67 M-QTL (10 for SL, 12 for SW, 12 for ST, 19 for SLW, 8 for SLT and 7 for SWT) in the direct and reciprocal cross populations of the F2:3, F2:4 and F2:5 families. The 52 M-QTL detected by both methods are summarized in Table 3 and Fig. 1, including marker interval, position, LOD score, additive and dominance effects and PVE.

Table 3 Common main-effect QTL identified by multi-QTL joint analysis and composite interval mapping
Fig. 1

Chromosome locations of QTL associated with seed length (SL), width (SW), thickness (ST) and their corresponding ratios (length-to-width, SLW; length-to-thickness, SLT; and width-to-thickness, SWT) in the F2:3, F2:4 and F2:5 populations from the direct and reciprocal crosses of Lishuizhongzihuang × Nannong 493-1 soybeans (Glycine max (L.) Merr.), drawn with MapChart 2.2 (Voorrips 2002). Positions of loci are given in centimorgans (cM). Fifty-two main-effect QTL, 4 environmental interaction QTL and 28 nucleo-cytoplasmic interaction QTL are shown for SL, SW, ST, SLW, SLT and SWT.

Eight common SL M-QTL, with heritabilities of 4.51–14.94%, were identified by both the joint analysis and CIM and mapped to chromosomes 6, 10, 16, 17, 18 and 19. Seven of these had negative additive effects (−0.081 to −0.037), indicating that N493 contributed the alleles that increase SL. Two M-QTL, qSL-10-2 and qSL-18, were detected by CIM in more than one environment, and the heritabilities of qSL-10-2 and qSL-10-3 exceeded 10%. In addition, qSL-2-2 and qSL-14-1, which were detected by CIM but not by the joint analysis, had heritabilities of 16.10 and 24.87%, respectively.

Nine common SW M-QTL, located on chromosomes 2, 3, 5, 6, 7, 10, 14, 18 and 20, were identified by both the joint analysis and CIM, with heritabilities ranging from 5.61 to 28.21%. Seven of these had negative additive effects (−0.054 to −0.011), indicating that N493 contributed the alleles that increase SW. Two QTL, qSW-6-1 and qSW-20, were detected by CIM in more than one environment, and qSW-6-1 was mapped in all three family populations (F2:3, F2:4 and F2:5). The heritabilities of qSW-2-1, qSW-6-1, qSW-14-2 and qSW-20 exceeded 10%.

Eight common ST M-QTL, located on chromosomes 3, 5, 6, 7, 18 and 20, were identified by both the joint analysis and CIM, with heritabilities ranging from 5.43 to 23.69%. Six of these had negative additive effects (−0.047 to −0.002), indicating that N493 contributed the alleles that increase ST. One QTL, qST-5-2, was confirmed by CIM in more than one environment. The heritabilities of qST-5-2 and qST-7-2 exceeded 10%.

Sixteen common SLW M-QTL, located on chromosomes 1, 2, 5, 6, 10, 13, 16, 17, 18, 19 and X3, were identified by both the joint analysis and CIM, with heritabilities ranging from 2.21 to 30.30%. Of these, 11 had negative additive effects (−0.054 to −0.032), indicating that N493 contributed the alleles that increase SLW. Ten QTL (qSLW-1-1, qSLW-2, qSLW-5-1, qSLW-5-2, qSLW-6, qSLW-10-2, qSLW-13-1, qSLW-17-4, qSLW-17-5 and qSLW-18) were detected by CIM in more than one environment, and three of them (qSLW-10-2, qSLW-13-1 and qSLW-17-5) were mapped in all three family populations (F2:3, F2:4 and F2:5). The heritabilities of qSLW-5-1, qSLW-10-2 and qSLW-16-1 exceeded 10%.

Five common SLT M-QTL, located on chromosomes 10, 13, 20 and X3, with heritabilities ranging from 4.25 to 12.78%, were detected by both the joint analysis and CIM. Three of these had negative additive effects (−0.088 to −0.054), indicating that N493 contributed the alleles that increase SLT. Five QTL, qSLT-10-1, qSLT-10-2, qSLT-13, qSLT-20 and qSLT-X3, were detected by CIM in more than one environment. Only qSLT-20 had a heritability greater than 10%.

Six common SWT M-QTL, located on chromosomes 3, 6, 13, 15 and 20, with heritabilities ranging from 4.85 to 48.92%, were identified by both the joint analysis and CIM. Three of these had negative additive effects (−0.065 to −0.061), indicating that N493 contributed the alleles that increase SWT. One QTL, qSWT-20-1, was mapped in all three family populations. Both qSWT-3-2 and qSWT-20-1 had heritabilities greater than 10%, and the heritability of qSWT-3-2 reached 48.92%.

In summary, five M-QTL (qSW-6-1, qSLW-10-2, qSLW-13-1, qSLW-17-5 and qSWT-20-1) were detected in all three family populations. A further five M-QTL (qSL-14-1, qSW-14-2, qST-5-2, qSLW-10-2 and qSWT-3-2) had heritabilities greater than 20%.

Cytoplasmic effect and cytoplasm-by-QTL interaction

Significant cytoplasmic effects, estimated at −0.151 to −0.063, were found for all traits except SLW. A total of 92 cytoplasm-by-QTL interactions (10, 11, 11, 23, 18 and 19 for SL, SW, ST, SLW, SLT and SWT, respectively) were detected (Supplement Table A2), of which 28 coincided with M-QTL detected by CIM (Table 4). Of these 28, three QTL (qSL-10-3, qSLW-10-2 and qSLW-X3) were detected in both the direct and reciprocal crosses but had different effects and heritabilities in the two crosses. The remainder were detected in only the direct or only the reciprocal cross.
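Under the ±1 coding of \( x_{c} \) in model (1), the G and GC terms for a QTL combine into cross-specific effects. This offers one possible reading of why the same QTL can show different effects in the direct and reciprocal crosses, or appear in only one of them. The sketch below illustrates this with hypothetical effect sizes and assumes that the remaining model terms are held fixed. Only the value −0.151, the lower end of the reported range of cytoplasmic effects, comes from this study. The function names are illustrative.

```python
def cross_specific_effects(a, d, ac, dc):
    """Effective additive and dominance effects of one QTL in each cross.

    Under model (1), the G and GC terms for a QTL combine to a + x_c*ac and
    d + x_c*dc, with x_c = +1 for the direct cross and -1 for the reciprocal cross.
    """
    return {cross: (a + x_c * ac, d + x_c * dc)
            for cross, x_c in (("direct", 1), ("reciprocal", -1))}


def cytoplasmic_contrast(c):
    """Direct-minus-reciprocal difference contributed by the x_c*c term alone."""
    return (1 * c) - (-1 * c)


# Hypothetical QTL whose additive effect is masked in the reciprocal cross.
print(cross_specific_effects(a=0.03, d=0.01, ac=0.03, dc=0.0))
print(cytoplasmic_contrast(-0.151))
```

With a = 0.03 and ac = 0.03, the additive effect is 0.06 in the direct cross but 0 in the reciprocal cross, so such a QTL could be detected in only one cross. Likewise, when all other terms are equal, a cytoplasmic effect of −0.151 corresponds to a direct-minus-reciprocal difference of −0.302.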

Table 4 Cytoplasm-by-QTL interactions that are consistent with main-effect QTL detected by composite interval mapping

Environmental effect and environment-by-QTL interaction

Significant environmental effects were observed for all traits (Supplement Table A4). Eight environment-by-QTL interactions, two each for SL, SW, ST and SWT, were identified (Supplement Table A3). Four of these coincided with M-QTL identified by CIM (Table 5), and each of the four was mapped in only one environment.

Table 5 Environment-by-QTL interactions that are consistent with main-effect QTL detected by composite interval mapping

Discussion

In QTL mapping of quantitative traits, CIM is widely used because it addresses the problem of multiple QTL residing on the same linkage group. However, it analyzes only one population at a time. To compensate for this shortcoming, multi-QTL joint analysis (MJA) is often used. Applying MJA jointly to the F2:3, F2:4 and F2:5 populations has several advantages over applying CIM to each population separately. First, environment-related effects can be identified, such as the six environmental effects and eight environment-by-QTL interactions found in this study. Second, cytoplasm-related effects can be detected, such as the five cytoplasmic effects and 92 cytoplasm-by-QTL interactions found here. Third, MJA can validate CIM results: the 52 M-QTL detected by CIM in the F2:3, F2:4 and F2:5 populations were confirmed by MJA. Finally, MJA can detect QTL missed by CIM, such as the 69 QTL found only by MJA. We therefore adopted a multi-environment, multi-marker, multi-cross (direct and reciprocal) and multi-population joint analysis to dissect the genetic architecture of seed size and shape traits.

We compared the QTL detected here with those reported by other research groups. Comparison was difficult because few markers were shared between their maps and ours, but several loci detected here had been reported previously. Four QTL (qSL-19, qSW-19-2, qST-19 and qST-7-1) were also detected by Salas et al. (2006), one (qSL-7-2) by Li et al. (2008) and four (qST-10-1, qSLW-10-1, qSLT-10-1 and qSLW-18) by Liang et al. (2008). One nucleo-cytoplasmic interaction QTL, qST-10c-2, associated with marker satt331, was at the same position as the M-QTL qST_O_1a detected by Liang et al. (2008). In addition, some QTL detected here may correspond to previously reported QTL because they lie on the same chromosomes. Examples include QTL identified on chromosomes 2, 3, 4, 5, 6, 7, 13, 15 and 19 by Salas et al. (2006), on chromosomes 7, 13 and 16 by Li et al. (2008), and on chromosomes 1, 5, 10, 13, 14, 16, 17 and 20 by Liang et al. (2008).

QTL clusters have previously been reported in rice (Cai and Morishima 2002), wheat (Quarrie et al. 2006), cotton (Shappley et al. 1998; Qin et al. 2008), rat (Stoll et al. 2000) and sorghum (Lin et al. 1995). Numerous disease resistance loci are also clustered in various regions of the soybean genome, for example, on linkage groups D1b and F (Rector et al. 1999; Hayes et al. 2000). Clustering was also evident in our results. The common QTL for seed size and shape traits were distributed on 16 chromosomes, but most were clustered in eight chromosomal intervals (Fig. 1). These intervals, on chromosomes 3, 5, 6, 10, 13, 17, 18 and 20, were each involved in the control of two or more of the traits. The QTL for SL, SW, ST and SWT clustered in the satt640-satt422 interval on chromosome 6, where their additive and dominance effects acted in the same direction and the novel alleles came from the same parent, N493. The QTL for SW and ST in the satt449-sat_356 interval on chromosome 5 likewise had additive and dominance effects in the same direction, again with novel alleles from N493. By contrast, the QTL for ST and SLW in the same satt449-sat_356 interval had additive and dominance effects in opposite directions, which may explain the significant negative correlation between ST and SLW. The QTL for SL, SW, SLW and SLT clustered in the satt331-satt592 and satt592-sat_274 intervals on chromosome 10, consistent with the clustering of QTL between satt331 and sat_038 reported by Liang et al. (2008). The QTL for SLW, SLT and SWT clustered in the satt649-satt269 interval on chromosome 13, and the QTL for SL, SW, ST and SLW clustered in the satt309-satt688 interval on chromosome 18. The causes of QTL clustering have been discussed by Qin et al. (2008).
Given the high correlations between some traits, a combination of tight linkage and pleiotropy may best explain these clusters (Qin et al. 2008). In addition, Xiong et al. (1999) and Bres-Patry et al. (2001) performed QTL analyses of domestication-related traits using crosses between O. sativa and O. rufipogon and between O. sativa and weedy-type rice, respectively, and both studies found clusters of domestication-related QTL. The present study did not examine whether any of the soybean QTL detected here are related to domestication, and this question requires further investigation.

To date, several studies have addressed the mechanism of seed development, but most have focused on the model plants Arabidopsis and rice (Sundaresan 2005; Song et al. 2007; Shomura et al. 2008). In rice, loss of GW2 function increases cell number, resulting in a larger (wider) spikelet hull. It also accelerates the grain milk-filling rate, which increases grain width, weight and yield (Song et al. 2007). A deletion in qSW5 increases seed width and significantly increases sink size, owing to an increase in cell number in the outer glume of the rice flower (Shomura et al. 2008). No comparable reports are yet available for soybean, and this topic also warrants future study.