Introduction

Bread wheat (Triticum aestivum L.) is among the most important food crops worldwide. It has been estimated that a 50% genetic gain in yield, or an annual gain of ~2%, is needed to meet predicted global demand over the next 20 years (Lopes et al. 2012). However, annual yield gains have been only 0.6–0.7% in past decades (Sharma et al. 2012; Gao et al. 2017), achieved mainly through conventional breeding. It is therefore urgent to raise grain yield (GY) potential by better dissecting the genetic basis of yield and related traits in wheat.

Grain filling in wheat relies on two major carbon sources: direct photosynthetic assimilation in green leaves and reserve carbohydrates stored in stems and leaf sheaths (Ehdaie et al. 2008). When photosynthesis is depressed by leaf senescence or by drought and heat stress, grain filling becomes more dependent on remobilized reserves (Bidinger et al. 1977; Kobata et al. 1992; Blum et al. 1994). Water-soluble carbohydrates (WSC) accumulate in stems and leaf sheaths during the vegetative and early reproductive stages and are remobilized and transported to the grains during later grain filling (Pheloung and Siddique 1991; Wardlaw and Willenbrink 2000). Stem-reserved WSC can account for 10–20% of wheat GY under well-watered conditions and 30–50% under terminal drought (Aggarwal and Sinha 1984; Wardlaw and Willenbrink 2000; Foulkes et al. 2010; Ovenden et al. 2017). Increasing stem WSC content (SWSCC) is therefore a potentially valuable approach to improving GY (Shearman et al. 2005; Ruuska et al. 2008; Sadras and Lawson 2011; Xiao et al. 2012; Gao et al. 2017).

In addition to supporting GY, WSC play important roles in coping with abiotic stresses caused by water deficit and/or high temperature (Livingston et al. 2009). SWSCC is higher in drought-tolerant cultivars than in sensitive ones (Foulkes et al. 2002; Goggin and Setter 2004), and increasing the genetic capacity for WSC accumulation has been proposed as a way to improve drought tolerance in wheat (Ovenden et al. 2017). WSC components such as fructose, glucose and sucrose also act as signaling molecules in plant immunity, regulating defense genes under biotic stress (Bolouri-Moghaddam and van den Ende 2013; Trouvelot et al. 2014). WSC are thus involved in a complex network of plant growth, development, and responses to diverse biotic and abiotic stresses (Rolland et al. 2006; Trouvelot et al. 2014), and selection for higher SWSCC has potential in breeding for improved adaptation across a range of environmental stresses (Rebetzke et al. 2008).

Many studies have shown that genotypic differences in SWSCC are repeatable across diverse environments, with high broad-sense heritability (H²) of 0.7–0.9 (Zhang et al. 2014; Dong et al. 2016a, b). This indicates that variation in SWSCC is largely genetically determined, although SWSCC is also significantly affected by environmental factors such as drought and heat stress (Ovenden et al. 2017). Quantitative trait locus (QTL) mapping studies of SWSCC have been reported in barley (Teulat et al. 2001), rice (Nagata et al. 2002; Wang et al. 2017; Phung et al. 2019), maize (Thévenot et al. 2005; Bian et al. 2015), perennial ryegrass (Turner et al. 2006) and sorghum (Brenton et al. 2016). In wheat, diverse bi-parental populations have been used to identify QTL or genomic regions associated with SWSCC (Snape et al. 2007; Yang et al. 2007; Rebetzke et al. 2008; Dong et al. 2016b). However, these studies did not provide a genome-wide view of the complex genetic architecture, because family-based populations have limited diversity. In addition, the limited number of recombination events constrains mapping resolution in family-based QTL mapping (Korte and Farlow 2013). QTL with moderate or small effects may also be missed, and the Beavis effect can inflate estimates of QTL effects, especially when population size is small (Xu 2003). Moreover, the relatively long genetic distances between linked markers and causal genes limit the usefulness of such markers in marker-assisted selection (MAS; Platten et al. 2019).

Genome-wide association studies (GWAS) of SWSCC have been performed at flowering, mid-grain filling and maturity (Zhang et al. 2014; Li et al. 2015), but they could not reveal the detailed genetic architecture because only 209 SSR markers were used. Dong et al. (2016a) conducted GWAS of SWSCC at 14 days post-anthesis (DPA) using 18,207 markers from the wheat 90 K SNP array (Wang et al. 2014), but large gaps remained in the genetic map, and marker coverage of the D genome was particularly sparse (Liu et al. 2017). A more precise dissection of the genetic architecture underlying stem carbohydrate metabolism is therefore needed. In addition, genes controlling SWSCC are expressed dynamically at different growth stages (Veenstra et al. 2017; Yáñez et al. 2017; Hou et al. 2018), so knowledge of the genetic basis of SWSCC over time would be more valuable for breeding. In maize, QTL for stalk sugar contents at different growth stages were analyzed by Bian et al. (2015). Although QTL mapping of wheat SWSCC has been performed at different stages (Snape et al. 2007; Yang et al. 2007; Rebetzke et al. 2008; Zhang et al. 2014; Li et al. 2015; Dong et al. 2016a, b), knowledge of the dynamic expression patterns of genes associated with WSC remains limited. It is therefore important to gain deeper insight into the stage-specific genetic control of wheat SWSCC.

In the present study, a diversity panel of 166 winter wheat accessions was grown in four well-watered environments, and SWSCC at three time-points during grain filling and thousand-kernel weight (TKW) were measured. GWAS of SWSCC and TKW was performed using a high-density physical map constructed from markers on the wheat 660 K and 90 K SNP arrays. The objectives were to (1) evaluate the relationship between SWSCC and TKW; (2) identify loci and candidate genes associated with SWSCC and TKW, providing insights into the genetic basis of SWSCC in wheat; and (3) develop high-throughput kompetitive allele-specific PCR (KASP) markers for MAS targeting WSC and TKW improvement.

Materials and methods

Plant materials and field trials

A diversity panel of 166 representative wheat accessions, chosen from more than 400 cultivars, was used for GWAS of SWSCC and TKW; it included 144 accessions from the Yellow and Huai Valleys Winter Wheat Zone (YHVWWZ) of China and 22 from five other countries (Liu et al. 2017; Zhai et al. 2018; Li et al. 2019; Table S1). Of these, 130 Chinese cultivars released from 1947 to 2016 (Table S1) were used to investigate genetic progress in SWSCC and TKW. These cultivars were divided into five groups by release period: 9 cultivars released during 1947–1979, 13 in the 1980s, 36 in the 1990s, 59 in the 2000s and 13 in the 2010s. In addition, cultivars released after 1990 from five major wheat-producing provinces in China, namely Anhui (9 cultivars), Hebei (13), Henan (44), Shandong (21) and Shaanxi (17), were used to investigate inter-province differences in SWSCC and TKW (Table S1).

All accessions were grown in four environments: Dezhou (37° 27′ N, 116° 18′ E; Shandong Province) and Gaoyi (37° 37′ N, 114° 34′ E; Hebei Province) in the 2016–2017 cropping season, and Luohe (33° 36′ N, 113° 58′ E; Henan Province) and Xinxiang (35° 18′ N, 113° 51′ E; Henan Province) in 2017–2018. These environments are designated 17DZ, 17GY, 18LH and 18XX, respectively. Trials at each location were well-watered, with two flood irrigations at the jointing and flowering stages. All locations experienced warm temperatures during late grain filling (Fig. S1). The accessions were planted in a randomized complete block design with three replications. Each plot consisted of two 2-m rows spaced 20 cm apart, with about 50 seeds sown per row. Field management followed local practices, and fungicides were applied to control diseases (powdery mildew, stripe rust and leaf rust). All accessions are available from the National Gene Bank of China, Chinese Academy of Agricultural Sciences.

Phenotypic evaluation

SWSCC was assayed at 10DPA, 20DPA and 30DPA using a near-infrared spectroscopy (NIRS) method following Wang et al. (2011), and the corresponding phenotypic data are referred to as WSC10, WSC20 and WSC30, respectively. About ten main culms were randomly sampled from each plot; the leaf blades were removed and the spikes were cut off at the spike collar. Fresh samples from each plot were placed in a labeled paper bag, heated at 105 °C for 30 min and then oven-dried at 80 °C for 24 h. The dried samples were cut into 3–5 mm pieces. Before NIRS assays, the cut samples were re-dried at 80 °C to constant weight and brought to room temperature in vacuum bags. The detailed NIRS procedure was reported in Dong et al. (2016b), and SWSCC is expressed as a percentage of dry weight. Three technical replicate assays were performed for each sample at 10DPA and 20DPA, and five for samples at 30DPA. The mean of these assays for each sample, after discarding extreme outliers, was used in subsequent statistical analyses. TKW was measured using an automatic seed character analyzer with SC-G V2.1.2.3 software (Wanshen Detection Technology Co., Ltd., Hangzhou, China, http://www.wseen.com/).

Statistical analysis

Analysis of variance (ANOVA) was performed using SAS 9.2 (SAS Institute Inc., Cary, NC, USA). Mean squares for each source of variation were used to estimate the variance components for genotype (\( \sigma_{\text{G}}^{2} \)), genotype × environment interaction (\( \sigma_{\text{GE}}^{2} \)) and residual error (\( \sigma_{\varepsilon}^{2} \)), and broad-sense heritability was estimated as \( H^{2} = \sigma_{\text{G}}^{2} / (\sigma_{\text{G}}^{2} + \sigma_{\text{GE}}^{2}/e + \sigma_{\varepsilon}^{2}/(re)) \), where e and r are the numbers of environments and of replicates per environment, respectively (Holland et al. 2003; Yin et al. 2015). Best linear unbiased estimates (BLUE) of phenotypes across environments were obtained with the linear model described in Yin et al. (2015), as implemented in the ANOVA function of QTL IciMapping v4.1 (Li et al. 2007). Correlation analyses and t-tests were performed using SAS 9.2.
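The heritability calculation can be sketched in Python. The expected mean squares in the docstring are an assumption for illustration: genotype and G × E are treated as random in a balanced multi-environment trial. The study itself took its components from SAS ANOVA output, and the mean squares in the example are hypothetical.

```python
def variance_components(ms_g, ms_ge, ms_err, e, r):
    """Variance components from ANOVA mean squares.

    Assumes the usual expected mean squares for a balanced multi-environment
    RCBD with genotype and G x E random:
        E[MS_G]   = s2_err + r*s2_ge + r*e*s2_g
        E[MS_GE]  = s2_err + r*s2_ge
        E[MS_err] = s2_err
    """
    s2_g = (ms_g - ms_ge) / (r * e)
    s2_ge = (ms_ge - ms_err) / r
    return s2_g, s2_ge, ms_err


def broad_sense_h2(s2_g, s2_ge, s2_err, e, r):
    """Entry-mean heritability: s2_g / (s2_g + s2_ge/e + s2_err/(r*e))."""
    return s2_g / (s2_g + s2_ge / e + s2_err / (r * e))


# Hypothetical mean squares for e = 4 environments and r = 3 replicates
e, r = 4, 3
s2_g, s2_ge, s2_err = variance_components(ms_g=10.0, ms_ge=2.0, ms_err=1.0, e=e, r=r)
h2 = broad_sense_h2(s2_g, s2_ge, s2_err, e, r)
print(round(h2, 3))  # 0.8
```

Under these expected mean squares the denominator equals MS_G/(re), so H² reduces to 1 − MS_GE/MS_G. The sketch does not handle negative component estimates, which can occur with real data.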

Genotyping and physical map construction

Genomic DNA was extracted from young leaves using a modified CTAB method (Murray and Thompson 1980). All accessions were genotyped with the Affymetrix 660 K wheat SNP array (630,517 SNPs; Cui et al. 2017) and the Illumina 90 K wheat SNP array (81,587 SNPs; Wang et al. 2014) by CapitalBio Technology Co., Ltd. (http://www.capitalbiotech.com/). Minor allele frequency (MAF), polymorphism information content (PIC) and genetic diversity were computed with PowerMarker v3.25 (Liu and Muse 2005, http://statgen.ncsu.edu/powermarker/). Heterozygous genotypes were treated as missing data, and markers with MAF < 5% or more than 20% missing data were excluded to avoid spurious marker-trait associations (MTAs) in subsequent association mapping. Flanking sequences of the SNP markers were BLASTed against the Chinese Spring (CS) reference genome (IWGSC RefSeq v1.0, http://www.wheatgenome.org/; IWGSC 2018), and physical positions were assigned from the best BLAST hits. For 1212 markers with hits on multiple chromosomes, positions were assigned according to the consensus 660 K-SNP (Cui et al. 2017) and 90 K-SNP (Wang et al. 2014) genetic linkage maps. High-quality markers from the two SNP arrays were integrated into a common physical map for the association study.
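The marker filtering above can be sketched as follows, together with the standard biallelic formulas for gene diversity and PIC. The genotype coding (0/2 homozygous, 1 heterozygous, None missing) and the function names are hypothetical. PowerMarker may apply sample-size corrections that this sketch omits.

```python
from typing import List, Optional

# Calls per accession at one SNP: 0 or 2 = homozygous for allele A or B,
# 1 = heterozygous, None = missing (coding assumed for this sketch).
Calls = List[Optional[int]]


def recode_hets(calls: Calls) -> Calls:
    """Treat heterozygous calls as missing, as described in the text."""
    return [None if g == 1 else g for g in calls]


def allele_freq_and_missing(calls: Calls):
    """Frequency of allele A among called (homozygous) lines, and missing rate."""
    called = [g for g in calls if g is not None]
    missing_rate = 1 - len(called) / len(calls)
    p = sum(1 for g in called if g == 0) / len(called) if called else 0.0
    return p, missing_rate


def passes_qc(calls: Calls, min_maf: float = 0.05, max_missing: float = 0.20) -> bool:
    """Keep a marker only if MAF >= 5% and missing data <= 20%."""
    p, missing_rate = allele_freq_and_missing(recode_hets(calls))
    return min(p, 1 - p) >= min_maf and missing_rate <= max_missing


def gene_diversity(p: float) -> float:
    """Uncorrected gene diversity, 1 - sum(p_i^2), for a biallelic SNP."""
    return 1 - (p ** 2 + (1 - p) ** 2)


def pic(p: float) -> float:
    """Standard biallelic PIC: 1 - sum(p_i^2) - 2 p^2 q^2."""
    q = 1 - p
    return 1 - (p ** 2 + q ** 2) - 2 * p ** 2 * q ** 2


# Hypothetical marker: 10 A-lines, 8 B-lines, one heterozygote, one missing call
example = [0] * 10 + [2] * 8 + [1, None]
print(passes_qc(example), round(pic(0.5), 3))
```

For homozygous lines, each called accession carries one allele, so the allele frequency is simply the fraction of lines with the A genotype.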

Linkage disequilibrium and population structure

Linkage disequilibrium (LD) and population structure of the 166 accessions were analyzed in a previous study of the same population (Liu et al. 2017). Briefly, 12,324 SNPs evenly distributed across the 21 wheat chromosomes were used to calculate LD with the full-matrix and sliding-window methods implemented in TASSEL v5.0 (Bradbury et al. 2007). The average LD decay distance across the whole genome was about 8 Mb, and LD decayed within 6, 4 and 11 Mb in the A, B and D genomes, respectively (Fig. S2; Liu et al. 2017). A total of 2000 evenly distributed polymorphic SNPs were used to analyze population structure and estimate the Q matrix with Structure v2.3.4 (Pritchard et al. 2000). A neighbor-joining tree and a principal component analysis, both performed in TASSEL v5.0, were used to verify the population stratification. The panel showed clear stratification into three subgroups (Fig. S3, Table S1; Liu et al. 2017).
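For inbred lines, the pairwise LD statistic r² between two biallelic SNPs equals the squared correlation of their allele indicators across accessions. The sketch below computes r² from hypothetical 0/2 genotype vectors. It illustrates the statistic only and is not TASSEL's implementation. The derivation of decay distances from pairwise r² values is described in Liu et al. (2017).

```python
def ld_r2(snp1, snp2):
    """r^2 between two biallelic SNPs scored 0/2 in homozygous lines (None = missing)."""
    # Use only accessions called at both SNPs
    pairs = [(a, b) for a, b in zip(snp1, snp2) if a is not None and b is not None]
    n = len(pairs)
    x = [1 if a == 0 else 0 for a, _ in pairs]  # indicator for allele A at SNP 1
    y = [1 if b == 0 else 0 for _, b in pairs]  # indicator for allele A at SNP 2
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(xi * yi for xi, yi in zip(x, y)) / n  # frequency of the A-A haplotype
    d = pxy - px * py  # disequilibrium coefficient D
    denom = px * (1 - px) * py * (1 - py)
    return d * d / denom if denom > 0 else float("nan")


print(ld_r2([0, 0, 2, 2], [0, 0, 2, 2]), ld_r2([0, 0, 2, 2], [0, 2, 0, 2]))
```

Monomorphic pairs return NaN because r² is undefined when either SNP has no variation.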

Genome-wide association study

The mean of the three replicates in each environment and the BLUE values across environments were used for GWAS of each trait. To control background variation and eliminate spurious MTAs, marker-trait associations were tested with a Q + K mixed linear model (MLM; Yu et al. 2006; Zhang et al. 2010) implemented in TASSEL v5.0 (Bradbury et al. 2007). The Q matrix estimated by Structure v2.3.4 (Pritchard et al. 2000), which describes population structure, was fitted as a fixed effect. The kinship matrix (K matrix) computed by TASSEL v5.0, which reflects relationships among individuals, was used as the variance–covariance structure of the random individual effects (Zhang et al. 2010). Because the Q + K MLM can sacrifice true positives while controlling false positives, the fixed and random model circulating probability unification (FarmCPU) method was also used for GWAS of SWSCC, since it has been shown to have higher statistical power than MLM (Liu et al. 2016). FarmCPU was run in R (version 3.5.1, https://www.r-project.org/; FarmCPU package at http://zzlab.net/FarmCPU/FarmCPU_functions.txt).
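For reference, the Q + K mixed model is shown below in the general form given by Yu et al. (2006). This is the generic model, not a transcription of TASSEL's internal parameterization. In this form, y is the vector of phenotypes (environment means or BLUE values), β is other fixed effects, and α is the effect of the SNP being tested. v is the vector of subgroup effects, with Q taken from the Structure output. u is the vector of random polygenic effects, whose covariance is set by the kinship matrix K. S and Z are incidence matrices, and R is a diagonal matrix.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
           + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\operatorname{Var}(\mathbf{u}) = 2\mathbf{K}\,\sigma_{g}^{2},
\quad
\operatorname{Var}(\mathbf{e}) = \mathbf{R}\,\sigma_{e}^{2}
```

Each SNP is tested for H₀: α = 0 while Q v and u absorb structure and relatedness. Fitting v as fixed and u as random is what the text above describes as Q + K.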

Several multiple-testing corrections were tried, including the Bonferroni-Holm method (Holm 1979) and the Benjamini and Hochberg false discovery rate (FDR) procedure (Benjamini and Hochberg 1995). However, few SNPs could be declared significant under some trait-by-environment combinations. This was possibly owing to the extensive LD in wheat (Chao et al. 2010; Hao et al. 2011; Chen et al. 2012; Liu et al. 2017) and/or the complex genetic architecture underlying WSC. A threshold of P = 1.0 × 10⁻³ (−log10(P) = 3.0) was therefore adopted for calling significant MTAs, as in several other association studies of complex traits in hexaploid wheat (Liu et al. 2017; Muqaddasi et al. 2019; Rahimi et al. 2019).
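The sketch below shows why these corrections are so stringent at this marker density. It implements Holm and Benjamini–Hochberg adjusted P values in plain Python, with the same intent as R's `p.adjust` methods "holm" and "BH". It then computes a Bonferroni-type 5% cutoff for the 373,106 markers reported in the Results. The function names and example P values are illustrative.

```python
import math


def holm_adjust(pvals):
    """Holm step-down adjusted P values (family-wise error control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending P
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending P
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        adj = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, adj)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(pvals), bh_adjust(pvals))

# Bonferroni-type 5% cutoff for the number of markers used in this study
bonferroni_cutoff = 0.05 / 373_106
print(f"{bonferroni_cutoff:.2e}", round(-math.log10(bonferroni_cutoff), 2))
```

The Bonferroni cutoff lands near −log10(P) ≈ 6.9, far stricter than the 3.0 threshold adopted. Under extensive LD, many markers are highly correlated, so such a cutoff overcorrects for the effective number of independent tests.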

Adjacent associated markers were grouped into one locus if the distance between them was smaller than the average LD decay distance for the corresponding chromosome reported by Liu et al. (2017). The most significant marker across environments was taken as the representative of each locus, and its effect and R² (proportion of phenotypic variance explained) were estimated with TASSEL v5.0 (Bradbury et al. 2007). To further reduce false positives among WSC-associated loci, only loci detected in at least two environments by either MLM or FarmCPU were considered stable. Genome-wide MTAs were visualized in Manhattan plots, with −log10(P) for each SNP on the Y-axis and its genomic coordinate on the X-axis. Quantile–quantile (QQ) plots of observed versus expected −log10(P) values were used to assess the association models. Manhattan and QQ plots were drawn with the CMplot code (https://github.com/YinLiLin/R-CMplot) in R (version 3.5.1).
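The locus-grouping and stability rules above can be sketched as follows. Hits are pooled over environments (and over MLM and FarmCPU runs). The marker names, positions and environments in the example are illustrative assumptions. The 4 Mb decay distance is the B-genome average quoted earlier, used here as a placeholder for both chromosomes.

```python
from collections import defaultdict


def group_into_loci(hits, ld_decay_mb, min_envs=2):
    """Merge significant markers into loci and flag stable loci.

    hits: (marker, environment, chromosome, position_mb, p_value) tuples
          pooled over environments and GWAS methods.
    ld_decay_mb: LD-decay distance (Mb) for each chromosome.
    """
    by_chrom = defaultdict(list)
    for hit in hits:
        by_chrom[hit[2]].append(hit)

    groups = []
    for chrom, chrom_hits in sorted(by_chrom.items()):
        chrom_hits.sort(key=lambda h: h[3])
        group = [chrom_hits[0]]
        for hit in chrom_hits[1:]:
            # Adjacent markers closer than the LD-decay distance join the current locus
            if hit[3] - group[-1][3] < ld_decay_mb[chrom]:
                group.append(hit)
            else:
                groups.append(group)
                group = [hit]
        groups.append(group)

    summary = []
    for group in groups:
        best = min(group, key=lambda h: h[4])  # most significant marker represents the locus
        envs = {h[1] for h in group}
        summary.append({
            "chrom": group[0][2],
            "start_mb": group[0][3],
            "end_mb": group[-1][3],
            "representative": best[0],
            "environments": sorted(envs),
            "stable": len(envs) >= min_envs,  # detected in >= 2 environments
        })
    return summary


hits = [
    ("m1", "17DZ", "1B", 681.0, 1e-4),
    ("m2", "18XX", "1B", 684.0, 5e-5),
    ("m3", "17GY", "1B", 700.0, 2e-4),
    ("m4", "17DZ", "5B", 692.4, 3e-4),
]
for locus in group_into_loci(hits, {"1B": 4.0, "5B": 4.0}):
    print(locus)
```

Grouping is chained, so a run of markers each within the decay distance of its neighbor forms one locus even if its ends are farther apart. This is one reading of "adjacent associated markers".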

Analyses of allele frequencies and effects of identified QTL in subgroups

Allele frequencies and effects of the identified QTL were analyzed in the three subgroups of the population (Table S1) using the representative marker of each locus. The mean BLUE values of the two genotype classes at each locus were used to compare effects among subgroups by t-test.
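A sketch of the subgroup comparison above. Here an allele "effect" is taken as the difference between the mean BLUE of the two homozygous classes. This definition, the 0/2 coding and the toy data are assumptions, and the study's effect estimates may be scaled differently. Welch's form of the t statistic is used as an assumption. Only the statistic is computed; a P value could be obtained with `scipy.stats.ttest_ind(x, y, equal_var=False)`.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def subgroup_summary(genotypes, blues, subgroups):
    """Per-subgroup frequency of allele 0 and allele effect at one representative SNP."""
    classes = defaultdict(lambda: {0: [], 2: []})
    for g, y, s in zip(genotypes, blues, subgroups):
        if g in (0, 2):  # skip missing calls
            classes[s][g].append(y)
    out = {}
    for s, cls in sorted(classes.items()):
        n0, n2 = len(cls[0]), len(cls[2])
        freq0 = n0 / (n0 + n2)
        # Effect (assumed definition): mean BLUE of allele-0 lines minus allele-2 lines
        effect = mean(cls[0]) - mean(cls[2]) if n0 and n2 else None
        out[s] = {"freq_allele0": freq0, "effect": effect}
    return out


def welch_t(x, y):
    """Welch t statistic for two samples (each needs at least two values)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


genotypes = [0, 0, 2, 2, 0, 2, 2, 2]
blues = [15, 16, 13, 14, 17, 14, 13, 12]
subgroups = [1, 1, 1, 1, 2, 2, 2, 2]
print(subgroup_summary(genotypes, blues, subgroups))
print(round(welch_t([15, 16], [13, 14]), 3))
```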

Comparison of two models for GWAS

To compare the GWAS performance of MLM and FarmCPU, a linear model (LM) was fitted with the representative markers of the identified QTL as independent variables and the observed phenotypes as the dependent variable. The coefficients of determination were calculated with the lm function in R 3.5.1 (https://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html). Dot plots with observed phenotypes on the X-axis and predicted values on the Y-axis were drawn to show the fitting results.
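The LM fit used to compare MLM and FarmCPU amounts to ordinary least squares of phenotype on the representative markers. The stdlib-only sketch below, a stand-in for R's `lm`, solves the normal equations and returns fitted values and R². The marker coding and data are hypothetical. Collinear markers would make the system singular.

```python
def solve(a, b):
    """Solve a square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lm_r2(markers, y):
    """OLS of y on marker codes (with intercept); returns (fitted values, R^2)."""
    X = [[1.0] + [float(v) for v in row] for row in markers]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return fitted, 1 - ss_res / ss_tot


markers = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]  # exactly 1 + 2*x1 + 3*x2, so R^2 should be 1
fitted, r2 = lm_r2(markers, y)
print(round(r2, 6))
```

Plotting `y` against `fitted` gives the kind of dot plot described in the text.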

Comparison of identified WSC loci with reported QTL or genes

The WSC loci detected in this study were compared with WSC- and GY-related QTL or genes from the literature, based on the physical positions (CS RefSeq v1.0; IWGSC 2018) of their flanking or associated markers. Two QTL were considered the same locus if the physical distance between them was smaller than the average LD decay distance for the corresponding chromosome (Liu et al. 2017).

Putative candidate gene analyses

Genes located within the physical intervals of WSC-associated loci were screened using the annotations of the wheat reference genome (CS RefSeq v1.0; IWGSC 2018), and those related to sugar metabolism or transport were regarded as candidate genes. In addition, the sequences and physical positions of known genes involved in WSC synthesis, degradation and remobilization were obtained from NCBI (https://www.ncbi.nlm.nih.gov/) and IWGSC (http://www.wheatgenome.org/). The positions of these known genes and their homoeologs were compared with the WSC loci to identify further candidates.
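The annotation-based screen can be sketched as an interval-overlap test plus a keyword filter. The keyword list, gene identifiers and annotations below are hypothetical placeholders, not the criteria or genes used in the study.

```python
# Hypothetical keyword list for sugar metabolism/transport annotations
SUGAR_KEYWORDS = ("sucrose", "fructan", "invertase", "hexokinase", "sugar transporter")


def screen_candidates(genes, loci, keywords=SUGAR_KEYWORDS):
    """Return (locus, gene, annotation) for annotated genes overlapping a locus interval.

    genes: (gene_id, chromosome, start_mb, end_mb, annotation) tuples
    loci:  (locus_id, chromosome, start_mb, end_mb) tuples
    """
    hits = []
    for locus_id, chrom, start, end in loci:
        for gene_id, g_chrom, g_start, g_end, annotation in genes:
            overlaps = g_chrom == chrom and g_start <= end and g_end >= start
            if overlaps and any(k in annotation.lower() for k in keywords):
                hits.append((locus_id, gene_id, annotation))
    return hits


genes = [
    ("geneA", "1B", 682.10, 682.11, "Sucrose synthase"),
    ("geneB", "1B", 683.00, 683.01, "Unknown protein"),
    ("geneC", "5B", 690.00, 690.01, "Cell wall invertase"),
]
loci = [("L1", "1B", 681.04, 684.00)]
print(screen_candidates(genes, loci))
```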

KASP marker development

High-throughput KASP markers for potentially important loci were developed from the corresponding representative SNPs. Flanking sequences of the SNPs were BLASTed against the wheat reference genome (CS RefSeq v1.0; IWGSC 2018), and chromosome-specific KASP primers were designed based on alignments of homologous sequences. Allele-specific primers carrying FAM (5′ GAAGGTGACCAAGTTCATGCT 3′) and HEX (5′ GAAGGTCGGAGTCAACGGATT 3′) tails were designed with the target SNP at the 3′ end. A common reverse primer was designed for chromosome-specific amplification of a product shorter than 200 bp. The KASP assay mixture contained 40 µL of common primer (100 µM), 16 µL of each tailed primer (100 µM) and 60 µL of ddH2O. Each reaction contained 2.5 µL of 2 × KASP master mix (LGC Genomics, https://www.biosearchtech.com/), 0.056 µL of the assay mixture and 2.5 µL of DNA (30–50 ng/µL). PCR was performed in 384-well plates as follows. Denaturation was at 95 °C for 15 min. This was followed by 9 touchdown cycles of 95 °C for 20 s and annealing/extension for 1 min, starting at 65 °C and decreasing by 1 °C per cycle. A further 32 cycles of 95 °C for 10 s and 57 °C for 1 min completed the program. Concordance between KASP genotypes and the original array-based genotypes was assessed, and t-tests were conducted to confirm the effectiveness of the KASP markers.
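The primer-mix volumes above imply the concentrations computed below. The assay mix contains about 30 µM common primer and 12 µM of each tailed primer, giving roughly 0.34 and 0.13 µM in a 5.056 µL reaction. These values are derived arithmetic, not numbers reported in the study. The touchdown schedule is also enumerated to show that nine 1 °C steps from 65 °C end at 57 °C, the annealing temperature of the subsequent 32 cycles. Function names are illustrative.

```python
def assay_mix(common_ul=40.0, tailed_ul_each=16.0, water_ul=60.0, stock_um=100.0):
    """Primer concentrations (µM) in the assay mix from the stated volumes."""
    total = common_ul + 2 * tailed_ul_each + water_ul  # two tailed primers (FAM and HEX)
    return {
        "total_ul": total,
        "common_um": stock_um * common_ul / total,
        "tailed_um_each": stock_um * tailed_ul_each / total,
    }


def final_conc(mix_conc_um, mix_ul=0.056, master_ul=2.5, dna_ul=2.5):
    """Dilution of an assay-mix component into the full reaction volume."""
    return mix_conc_um * mix_ul / (mix_ul + master_ul + dna_ul)


def touchdown_annealing(start_c=65, cycles=9, step_c=1):
    """Annealing temperature for each touchdown cycle."""
    return [start_c - step_c * i for i in range(cycles)]


mix = assay_mix()
print(mix["total_ul"], round(mix["common_um"], 1), round(mix["tailed_um_each"], 2))
print(round(final_conc(mix["common_um"]), 2), round(final_conc(mix["tailed_um_each"]), 2))
print(touchdown_annealing())
```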

Results

Phenotypic variation

Continuous variation was observed among the 166 wheat accessions for WSC10, WSC20, WSC30 and TKW (Table S1, Fig. S4). Across the four environments, BLUE values ranged from 9.41 to 18.81% (mean 14.71%) for WSC10, 8.49 to 17.87% (mean 13.62%) for WSC20, 1.04 to 9.71% (mean 4.14%) for WSC30 and 26.9 to 56.6 g (mean 43.1 g) for TKW (Tables S1, S2). Correlations among environments ranged from 0.56 to 0.75 for WSC10, 0.51 to 0.71 for WSC20, 0.50 to 0.60 for WSC30 and 0.66 to 0.87 for TKW (all P < 0.0001; Table S3). ANOVA revealed significant effects of genotype, environment and genotype × environment interaction on SWSCC at all three time-points (Table 1). The H² values for WSC10, WSC20, WSC30 and TKW across the four environments were 0.90, 0.87, 0.85 and 0.93, respectively (Table 1), indicating that most of the phenotypic variation was genetically determined.

Table 1 Analysis of variance for stem WSC contents and thousand-kernel weight in the 166 wheat accessions

WSC10 was significantly correlated with WSC20, with correlation coefficients (r) ranging from 0.27 (P < 0.001) to 0.61 (P < 0.0001) across individual environments and BLUE values. WSC20 was significantly correlated with WSC30 (r = 0.43–0.61, P < 0.0001), whereas correlations between WSC10 and WSC30 were much weaker, ranging from 0.04 (not significant) to 0.31 (P < 0.001) (Table 2). Correlations between WSC10 and TKW (r = 0.33–0.63, P < 0.0001) were similar to those between WSC20 and TKW (r = 0.32–0.59, P < 0.0001), whereas r between WSC30 and TKW ranged from 0.15 (not significant) to 0.42 (P < 0.0001) (Table 2).

Table 2 Correlation analyses for stem WSC contents and thousand-kernel weight of the 166 wheat accessions in different environments

Marker coverage and genetic diversity

A total of 373,106 high-quality SNPs were used in GWAS, 359,760 (96.42%) from the 660 K array and 13,346 (3.58%) from the 90 K array (Table S4). The A, B and D genomes accounted for 39.8, 49.3 and 10.8% of the markers, respectively. Chromosome 3B had the most markers (46,708) and chromosome 4D the fewest (2375). The markers covered the whole genome (14,061.15 Mb) with an average density of 0.038 Mb per marker; per-chromosome densities ranged from 0.018 Mb per marker (3B) to 0.214 Mb per marker (4D). The B genome had the highest marker density (0.028 Mb per marker), genetic diversity (0.367) and PIC (0.293), compared with the A (0.033, 0.357 and 0.286) and D (0.098, 0.334 and 0.270) genomes. Marker numbers, densities, genetic diversity and PIC for each chromosome are given in Table S4, and a density map of marker distribution along the chromosomes is shown in Fig. S5.

Marker-trait associations and stable WSC-associated loci

Significant MTAs for WSC10, WSC20 and WSC30 analyzed using the MLM and FarmCPU methods are listed in Tables S5–S10. In total, 1095, 652 and 597 significant markers corresponding to 168, 93 and 14 loci were detected for WSC10, WSC20 and WSC30, respectively, using MLM in TASSEL; and 813, 1344 and 1415 significant markers corresponding to 134, 112 and 23 loci were identified for the three traits using FarmCPU. Manhattan plots for WSC contents analyzed by MLM and FarmCPU using BLUE values are shown in Fig. 1, and Manhattan and QQ plots for each trait in each environment analyzed by both methods are shown in Figs. S6 and S7.

Fig. 1

Manhattan plots for stem WSC contents analyzed by the Q + K mixed linear model (MLM) and FarmCPU. A threshold of P = 1.0 × 10⁻³ (−log10(P) = 3.0) was used for calling significant marker-trait associations (MTAs). WSC10, WSC20 and WSC30 indicate stem WSC contents at 10 days post-anthesis (10DPA), 20DPA and 30DPA, respectively. BLUE indicates the best linear unbiased estimates for each trait across the four environments in this study

WSC loci detected in at least three of the four environments by either MLM or FarmCPU are summarized in Table 3. All stable WSC loci, meaning those detected in at least two environments by either method, are reported in Table S11. There were 36, 24 and 19 stable loci for WSC10, WSC20 and WSC30, respectively (Table S11). Overall, 62 stable loci for SWSCC across the three time-points were detected on all chromosomes except 5D, and 16 of them were associated with SWSCC at two or more time-points (Table S11, Fig. 2). Of the stable loci, 6 (17%), 7 (29%) and 8 (42%) for WSC10, WSC20 and WSC30, respectively, were detected by both MLM and FarmCPU.

Table 3 WSC-associated loci detected in at least three environments by either MLM or FarmCPU
Fig. 2

The Venn diagram of stable loci associated with stem WSC contents. Loci detected in at least two environments by either mixed linear model (MLM) or FarmCPU were considered stable (Table S11). WSC10, WSC20 and WSC30 indicate the stem WSC contents at 10 days post-anthesis (10DPA), 20DPA and 30DPA, respectively

Allele frequencies and QTL effects in subgroups

For each WSC QTL, allele frequencies and effects were investigated in the three subgroups of the population (Table S12). For most loci, allele frequencies and effects were similar among subgroups, but a few loci showed large differences between subgroups. For instance, the allele frequency at a WSC10 locus on chromosome 5B (692.45 Mb, AX-111100349) was 0.37 in the entire panel but 0.11, 0.57 and 0.49 in subgroups 1, 2 and 3, respectively (Table S12). The effect of a WSC10 locus on 1B (681.04 Mb, AX-110000934) was 0.57% in the entire population, whereas the effects were 2.48, 0.28 and 0.05% in subgroups 1, 2 and 3, respectively (Table S12). These results indicate that the allele frequencies and effects of a few loci may be strongly affected by population stratification, and their effects need further confirmation in future studies.

Pleiotropic loci for SWSCC and TKW

Based on the physical positions of significant markers, 13 stable TKW-associated loci co-localized with WSC loci on chromosomes 1A, 1B (2), 1D, 2A, 2B, 3A (2), 4A, 4B, 5B and 6B (2). These loci explained 7.69–16.38% of the phenotypic variation in TKW and 8.17–15.53% in SWSCC (Tables 4, S11). Because high TKW is a major objective in wheat breeding, alleles associated with increased TKW and SWSCC were considered favorable. Favorable allele frequencies (FAF) at five loci on 1A, 1B, 2B and 3A (2) were 0.86–0.93 (Table 4), indicating that these favorable alleles have already been widely selected in traditional breeding programs. By contrast, FAF at the other eight loci were 0.45–0.70 (Table 4), leaving more room for selection in future breeding.

Table 4 Thirteen TKW-associated loci simultaneously associated with stem WSC contents

Cumulative effect of increasing-effect alleles on SWSCC and TKW

To investigate the combined effects of alleles on SWSCC, the number of WSC-increasing alleles in each accession was counted. Accessions carried 6–32, 5–22 and 2–16 WSC-increasing alleles for WSC10, WSC20 and WSC30, respectively (Table S1, Fig. S8). SWSCC was significantly correlated with the number of WSC-increasing alleles, with r = 0.84, 0.68 and 0.63 (P < 0.001) for WSC10, WSC20 and WSC30, respectively. Linear regressions based on BLUE values further showed significant linear relationships between SWSCC and the number of WSC-increasing alleles. The slopes were 0.31 (coefficient of determination, r² = 0.95), 0.44 (r² = 0.95) and 0.40 (r² = 0.95) for WSC10, WSC20 and WSC30, respectively (Fig. 3).
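A sketch of the allele counting and regression above. Fig. 3 plots means for each allele count, so this version regresses the mean BLUE of each allele-count class on the count. It is an assumption whether the reported slopes were fitted to class means or to individual accessions. The 0/2 coding, function names and toy data are also assumptions.

```python
from collections import defaultdict
from statistics import mean


def count_increasing_alleles(genotypes, increasing_alleles):
    """Number of representative SNPs at which an accession carries the increasing allele."""
    return sum(1 for g, inc in zip(genotypes, increasing_alleles) if g is not None and g == inc)


def linear_fit(x, y):
    """Simple least-squares regression: returns (slope, intercept, r^2)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    return slope, ybar - slope * xbar, sxy ** 2 / (sxx * syy)


def regress_on_class_means(counts, blues):
    """Regress the mean BLUE of each allele-count class on the allele count."""
    by_n = defaultdict(list)
    for n, y in zip(counts, blues):
        by_n[n].append(y)
    ns = sorted(by_n)
    return linear_fit(ns, [mean(by_n[n]) for n in ns])


# Toy accession: increasing allele at SNPs 1 and 4, missing at SNP 3
print(count_increasing_alleles([0, 2, None, 0], [0, 0, 2, 0]))
print(regress_on_class_means([1, 1, 2, 3], [1.0, 3.0, 4.0, 6.0]))
```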

Fig. 3

Cumulative effects of increasing-effect alleles on stem WSC contents. WSC10, WSC20 and WSC30 indicate stem WSC contents at 10, 20 and 30 days post-anthesis, respectively. “n”, number of increasing-effect alleles. Dots and bars represent means and standard deviations, respectively. Linear regressions were performed to investigate the relationships between stem WSC contents and the number of increasing-effect alleles in the 166 wheat accessions. Calculations were based on representative markers of the 36, 24 and 19 stable loci associated with WSC10, WSC20 and WSC30, respectively (Table S11). Best linear unbiased estimates (BLUE) for each trait across the four environments were used

Accessions carried 0–13 increasing-effect alleles at the 13 loci with pleiotropic effects on SWSCC and TKW (Table S1, Fig. S8). WSC10, WSC20, WSC30 and TKW all increased with the number of increasing-effect pleiotropic alleles (Fig. 4). The linear regression slopes were 0.51 (r² = 0.88), 0.49 (r² = 0.91), 0.30 (r² = 0.82) and 1.56 (r² = 0.96), respectively.

Fig. 4

Cumulative effects of increasing-effect alleles on stem WSC contents and thousand-kernel weight based on the 13 pleiotropic loci. Calculations were based on representative markers for the 13 pleiotropic loci simultaneously affecting stem WSC contents and TKW (Table 4). Best linear unbiased estimates (BLUE) of phenotypes across the four environments were used. WSC10, WSC20 and WSC30 indicate stem WSC contents at 10, 20 and 30 days post-anthesis, respectively. TKW, thousand-kernel weight

Pyramiding WSC-increasing alleles improved SWSCC and TKW over past decades

To explore the role of WSC-associated loci in improving GY, genetic progress in SWSCC and TKW was investigated. Mean WSC10, WSC20 and WSC30 increased from 13.02%, 12.55% and 3.45% in cultivars released before 1980 (9 cultivars) to 16.27%, 15.41% and 4.63% in those released after 2010 (13 cultivars), respectively (Fig. 5a–c). Over the same period, TKW increased from 41.1 to 46.4 g (Fig. 5d). The numbers of increasing-effect alleles rose from 15, 14, 7 and 7 to 26, 18, 10 and 11 for WSC10, WSC20, WSC30 and TKW, respectively (Fig. 5e–h).

Fig. 5

Genetic progress in stem WSC contents and thousand-kernel weight over past decades. Genetic progress was assessed in five groups from a panel of 130 Chinese wheat cultivars released from 1947 to 2016 (Table S1). WSC10, WSC20 and WSC30 indicate the stem WSC contents at 10 days post-anthesis (10DPA), 20DPA and 30DPA, respectively. TKW, thousand-kernel weight. P1, P2, P3, P4 and P5 indicate 1947–1979, the 1980s, the 1990s, the 2000s and 2010–2016, respectively. Violin plots a to d show phenotypic changes in WSC10, WSC20, WSC30 and TKW, respectively; e to h show changes in the numbers of increasing-effect alleles for WSC10, WSC20, WSC30 and TKW, respectively. Each violin plot shows the median (small white dot), the interquartile range from the first to the third quartile (thick black vertical band) and a density estimate of the corresponding observations (thin vertical curves). NFA indicates the number of increasing-effect alleles in a cultivar

We also investigated changes in the frequencies of favorable alleles at the 13 pleiotropic loci (Table 4). FAF increased in recent decades at all these loci except the one on 4AL represented by marker AX-109832317 (Fig. 6). Differences in SWSCC and TKW among five major wheat-producing provinces in China were also investigated. Cultivars from Henan, the top wheat-producing province in China, had the highest WSC10 and TKW, as well as the largest numbers of increasing-effect WSC alleles (Fig. S9).

Fig. 6

Frequency changes in increasing-effect alleles in past decades based on representative markers of the 13 pleiotropic loci. The 13 pleiotropic loci were associated with both WSC and thousand-kernel weight (Table 4). The analysis was based on 130 wheat cultivars in five groups released from 1947 to 2016 (Table S1). P1, P2, P3, P4 and P5 indicate 1947–1979, the 1980s, the 1990s, the 2000s and 2010–2016, respectively. The name of each representative marker is indicated in the corresponding chart

KASP marker development

Among the 13 loci with pleiotropic effects on both WSC and TKW, the favorable alleles at five loci on 1AL, 1BL, 2BS and 3AL (two loci) were almost fixed in this panel of wheat accessions (FAF 0.86–0.93), whereas the other eight loci had FAFs of 0.45–0.70, leaving more room for improvement by MAS; these eight loci were therefore targeted for KASP marker development (Table 4). Five KASP markers were successfully developed for the representative SNP markers of pleiotropic loci on chromosomes 2A (physical interval 27.51–34.67 Mb), 4A (728.51–739.58 Mb), 4B (298.15–299.53 Mb), 5B (689.91–696.19 Mb) and 6B (674.84–677.48 Mb) (Table 4; Fig. 7). Primer sequences and information on the corresponding loci are listed in Table S13. KASP genotypes agreed with the chip-based genotypes at rates of 0.96–0.98, and t-tests in different environments confirmed the effectiveness of the KASP markers (Fig. S10).
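The agreement between KASP and chip genotypes can be expressed as the fraction of accessions with identical calls on both platforms. The sketch below assumes genotypes stored as strings keyed by accession name and excludes failed calls rather than counting them as mismatches; how missing data were handled in this study is not stated, and the accession names are hypothetical.

```python
def concordance(kasp_calls, chip_calls):
    """Fraction of accessions with identical calls on both platforms.

    Both arguments map accession name -> genotype string (e.g. "AA", "GG");
    None marks a failed or missing call. Only accessions called on both
    platforms are compared. Returns (concordance, number compared).
    """
    shared = [acc for acc in kasp_calls
              if acc in chip_calls
              and kasp_calls[acc] is not None
              and chip_calls[acc] is not None]
    if not shared:
        raise ValueError("no accessions genotyped on both platforms")
    agree = sum(kasp_calls[acc] == chip_calls[acc] for acc in shared)
    return agree / len(shared), len(shared)

# Hypothetical calls for one marker.
kasp = {"acc1": "AA", "acc2": "GG", "acc3": "AA", "acc4": None, "acc5": "GG"}
chip = {"acc1": "AA", "acc2": "GG", "acc3": "GG", "acc4": "AA",
        "acc5": "GG", "acc6": "AA"}
print(concordance(kasp, chip))  # (0.75, 4)
```

Reporting the number of accessions compared alongside the rate makes clear how many failed calls were excluded.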

Fig. 7

Genotype-calling screenshots of the five KASP markers. The KASP markers were developed from representative SNP markers for five loci with pleiotropic effects on both stem WSC contents (SWSCC) and thousand-kernel weight (TKW) (Tables 4, S11). The name of each KASP marker is shown on the corresponding screenshot. The two homozygous genotypes are indicated beside the corresponding clusters of dots, and the favorable genotypes, which showed increasing effects on SWSCC and TKW (Fig. S10), are shown in red font (color figure online)

Discussion

Comparison of GWAS results by MLM and FarmCPU

In the present study, 19 and 20 stable WSC loci identified by MLM and FarmCPU, respectively, were located at positions similar to those of previously reported WSC-related QTL or genes (Table S11), and 6, 7 and 8 loci for WSC10, WSC20 and WSC30, respectively, were detected by both models (Table S11). These results support the reliability of the GWAS results from both models. However, the two models also gave some different results. The WSC20 loci on chromosomes 1D (414.59–420.74 Mb), 3A (737.44–744.29 Mb) and 6B (674.84–677.48 Mb) were detected by FarmCPU in three or more environments but were not stably identified by MLM, whereas the WSC10 loci on 1B (236.96–237.38 Mb) and 4A (735.30–737.58 Mb) were detected only by MLM (Table S11).

Liu et al. (2016) noted that although the Q + K MLM generally controls false positives well, it may produce false negatives in some cases, whereas FarmCPU controls false positives while avoiding model over-fitting by iteratively using fixed-effect and random-effect models. In the present study, the QQ plots showed that population structure and kinship were over-corrected by MLM in some environments, especially for WSC20 and WSC30 (Figs. S6, S7), suggesting that MLM may have produced some false negatives. In addition, when only WSC loci detected in at least three environments were considered, FarmCPU identified more loci than MLM (Table 3).

To further compare the results from MLM and FarmCPU, linear model (LM) fitting was conducted using the representative markers identified by each model. For WSC10, MLM and FarmCPU performed similarly, with coefficients of determination (r²) of 0.76 and 0.72, respectively (Fig. S11). For WSC20 and WSC30, however, FarmCPU performed better (r² = 0.56 and 0.58, respectively) than MLM (r² = 0.43 and 0.34, respectively) (Fig. S11). LM fitting using all representative markers identified by MLM and FarmCPU together (Table S11) gave the best results, with r² = 0.86, 0.62 and 0.58 for WSC10, WSC20 and WSC30, respectively (Fig. S11). These results indicate that the detection power of MLM and FarmCPU may vary among traits and that the two models are complementary in detecting QTL.
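A minimal sketch of the LM fitting step, assuming an ordinary least-squares regression of a phenotype (for example, one value per accession) on the representative SNPs coded 0/1, with r² computed from the residual and total sums of squares. The 0/1 coding, the function name and the simulated data are assumptions for illustration, not the study's pipeline.

```python
import numpy as np

def lm_r2(y, markers):
    """r^2 of an OLS fit of phenotype y on a marker matrix (intercept included).

    y:       phenotype vector, shape (n,), e.g. one value per accession (assumed).
    markers: genotype matrix, shape (n, m); representative SNPs coded 0/1
             for homozygous lines (the coding is an assumption).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(markers, dtype=float)])
    # lstsq tolerates collinear columns, e.g. a SNP present in both marker sets
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Simulated example (hypothetical): 166 accessions, 3 "MLM" and 3 "FarmCPU" markers.
rng = np.random.default_rng(0)
mlm = rng.integers(0, 2, size=(166, 3))
farmcpu = rng.integers(0, 2, size=(166, 3))
y = mlm @ [1.0, 0.5, 0.0] + farmcpu @ [0.0, 0.8, 0.3] + rng.normal(0, 1, 166)

r2_mlm = lm_r2(y, mlm)
r2_farmcpu = lm_r2(y, farmcpu)
r2_both = lm_r2(y, np.hstack([mlm, farmcpu]))
print(f"MLM {r2_mlm:.2f}  FarmCPU {r2_farmcpu:.2f}  combined {r2_both:.2f}")
```

Because adding predictors to an ordinary least-squares model cannot lower the in-sample r², a fit on the combined marker set is expected to match or exceed either set alone; adjusted r² or cross-validation would help separate genuine gains from this effect.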

Although many loci were detected by both FarmCPU and MLM (Table S11), the Manhattan plots from the two models differed markedly (Figs. S6, S7): FarmCPU plots showed isolated significant SNPs, whereas MLM plots showed clusters of SNPs forming peaks. This difference may be attributed to the different methodologies of the two models (Liu et al. 2016); FarmCPU includes associated markers as covariates (pseudo-QTNs) in its fixed-effect model, which tends to absorb the signals of nearby SNPs in linkage disequilibrium with them.

Comparison of identified WSC loci with reported QTL or genes

Based on the physical positions (CS RefSeq v1.0; IWGSC 2018) of markers or genes, the WSC loci identified in this study were compared with previously reported QTL or genes. Of the 62 WSC loci, 26 were located at positions similar to those of previously reported WSC-related QTL or genes (Table S11), including 15 (41.7%), 9 (37.5%) and 8 (42.1%) of the stable loci for WSC10, WSC20 and WSC30, respectively. This overlap supports the reliability of our findings. The remaining 36 potentially new WSC loci, comprising 21, 15 and 11 loci for WSC10, WSC20 and WSC30, respectively (Table S11), provide a basis for a more comprehensive understanding of the complex genetic architecture underlying SWSCC.
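The percentages and new-locus counts above can be reproduced from the per-time-point totals. This short check uses only numbers stated in the text; it also shows why the per-time-point sums (79 stable, 32 known) exceed the 62 unique loci (26 known): a locus detected at more than one time-point is counted once for each.

```python
# Stable loci per time-point and those co-located with reported QTL/genes,
# as stated in the text (Table S11).
STABLE = {"WSC10": 36, "WSC20": 24, "WSC30": 19}
KNOWN = {"WSC10": 15, "WSC20": 9, "WSC30": 8}

def summarize(stable, known):
    """Per time-point share of known loci (%, 1 d.p.) and number of new loci."""
    return {
        trait: (round(100 * known[trait] / stable[trait], 1),
                stable[trait] - known[trait])
        for trait in stable
    }

summary = summarize(STABLE, KNOWN)
for trait, (pct, new) in summary.items():
    print(f"{trait}: {pct}% known, {new} new")

# Sums exceed the 62 unique loci because multi-time-point loci are counted repeatedly.
print(sum(STABLE.values()), sum(KNOWN.values()))  # 79 32
```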

Some previously reported loci were associated with WSC content in specific parts of the stem, or with SWSCC under particular conditions. For instance, a locus on 2DS (Xcfd53, 23.02 Mb) was associated with WSC in the uppermost internode at 14DPA under drought stress (Zhang et al. 2014), whereas in the present study a locus at a similar position (29.17 Mb) was associated with WSC content of the whole stem (Table S11). A locus on 3AL (Xbarc314, 712.49 Mb) was associated with WSC in the lower internodes at the grain filling stage under simulated terminal drought stress (Zhang et al. 2014), which coincides with the WSC10- and WSC20-associated locus (711.30–726.13 Mb) identified in the present study (Table S11).

Although previous studies used diverse biparental and natural populations from around the world, many WSC-associated loci identified in this study were located at positions similar to previously reported ones, based on the physical positions (CS RefSeq v1.0; IWGSC 2018) of QTL-flanking or associated markers (Table S11). This indicates that (a) linkage mapping and GWAS are complementary in identifying genes, and (b) many major genes controlling carbohydrate metabolism might be shared across different wheat production regions. Given the high marker density in the present study and the consistency of many loci between the present and previous studies, the loci and associated SNP markers detected here appear reliable and hence valuable for further genetic research on WSC metabolism and for MAS in breeding.

Putative candidate genes related to stem WSC content

Fructans are major components of WSC (Ruuska et al. 2006), and genes involved in fructan metabolism play important roles in controlling SWSCC. In regions adjacent to the WSC loci on chromosomes 2A (763.02–769.51 Mb), 4A (735.30–737.58 Mb) and 7D (2.23–17.34 Mb), known genes involved in fructan synthesis and hydrolysis were identified, including 1-SST, 6-SFT, 1-FFT and 6-FEH (Table S11; McIntyre et al. 2011). SWSCC was significantly associated with 1-FFT-A1 (Yue et al. 2017) and 1-SST-D1 (Dong et al. 2016b), and gene-specific markers for these genes have been reported. In addition to genes directly involved in carbohydrate metabolism, transcription factors also play important roles in modulating metabolic pathways. TaMYB13 was identified as a transcriptional activator of fructosyltransferase genes, and expression levels of TaMYB13 were positively correlated with the mRNA levels of 1-SST and 6-SFT in wheat stems (Xue et al. 2011; Kooiker et al. 2013). Based on genomic DNA sequences, we found that TaMYB13-1 (ID: TraesCS3A02G535100, 746.63 Mb) was located near the WSC20 locus on 3A (737.44–744.29 Mb).

Stem-reserved WSC can be remobilized and transported to sink organs such as seeds and roots, and sugar transporters are critical in these processes (Halford et al. 2011). Sucrose transporters (SUTs) and hexose and sucrose transporters are the major sugar transporter families (Eom et al. 2015). TaSUT1 genes on homoeologous group 4 chromosomes were the predominant sucrose transporters in wheat stems and leaf sheaths, and TKW was significantly and positively correlated with the expression levels of TaSUT1 genes (Ahmed et al. 2018). A TaSUT gene (TraesCS2A02G505000, 733.56 Mb) was identified near the WSC30 locus on chromosome 2A (734.16–734.26 Mb) (Table S11). In addition, two potentially new sugar transporter genes belonging to the hexose and sucrose transporter family (TraesCS6B01G421500, 692.23 Mb; TraesCS7D01G521400, 618.68 Mb) were identified within the intervals of WSC loci on 6B (692.03–702.39 Mb) and 7D (616.95–621.14 Mb) (Table S11), suggesting that these genes might have important roles in WSC remobilization.

Sucrose synthase (SUS) controls carbon flow into starch biosynthesis, and plant dry matter accumulation is positively correlated with SUS activity (Kato 1995). In wheat, TaSus2-2B was significantly associated with TKW (Jiang et al. 2011). Based on gene sequences, TaSus2 (171.03 Mb) was located within a WSC30 locus on 2B (168.78–184.77 Mb) (Table S11). A diverse family of glycosyltransferases participates in glycan and glycoside biosynthesis during plant growth and development (Lao et al. 2014), and glycoside hydrolases are involved in the hydrolysis of complex sugars (Bourne and Henrissat 2001). In the present study, six potentially new glycosyltransferase candidate genes were located within the intervals of WSC loci on chromosomes 4B, 5B, 6D, 7A, 7B and 7D, and two glycoside hydrolase genes were within the intervals of WSC loci on 6B and 7A (Table S11). These candidate genes provide a basis for further exploration of the genetic mechanism of WSC metabolism and might be useful in breeding programs targeting increased TKW.

Relationship between WSC and GY

In the present study, TKW was significantly and positively correlated with SWSCC, especially with WSC10 and WSC20 (Table 2), consistent with previous findings on the relationship between WSC and TKW (Zhang et al. 2015; del Pozo et al. 2016). In addition, cultivars carrying favorable WSC alleles have been reported to have higher TKW than those without favorable alleles under well-watered, drought and heat stress conditions (Zhang et al. 2014). Similarly, in the present study the 13 pleiotropic loci simultaneously affecting SWSCC and TKW (Table 4) showed clear cumulative effects on both traits (Fig. 4). Besides these 13 pleiotropic loci, 20 other WSC loci were located at positions similar to those of reported GY-related QTL, including 12, 5, 2 and 1 loci related to TKW, spike number, GY and grain filling rate, respectively (Table S11). Together, these results indicate that WSC contributes positively to TKW.

In many studies, significant positive correlations (r = 0.47–0.80) were observed between SWSCC and GY under diverse conditions (Foulkes et al. 2007; Snape et al. 2007; Xue et al. 2008; Gao et al. 2017), whereas other studies found low or non-significant correlations (Ruuska et al. 2006; Zhang et al. 2015; del Pozo et al. 2016). Using 384 wheat cultivars, del Pozo et al. (2016) found that WSC content at anthesis was negatively correlated with spikes per square meter but positively correlated with kernels per spike and TKW under both water stress and full irrigation; because these opposing effects on yield components partly offset each other, the correlations between WSC and GY were low or non-significant.

By comparing the WSC loci with reported GY-related QTL, we found clues that might explain the complex relationship between WSC and GY at the genetic level. A pleiotropic locus simultaneously affecting WSC20, WSC30 and TKW was identified on chromosome 2AS (27.33–34.67 Mb) in this study (Table S11). At a similar position, a QTL for TKW and grain filling rate (Xgwm359, 28.20 Mb; Wang et al. 2009) and a QTL for GY (30.3–31.9 Mb; Li et al. 2019) have been reported. This suggests that the underlying gene (or genes) might control grain filling rate by affecting WSC content in stems, thereby affecting TKW and GY.

Plant density and flowering date can also affect SWSCC (Rebetzke et al. 2008; del Pozo et al. 2016). In the present study, five WSC loci were located at positions similar to those of reported plant density-related QTL (Table S11), suggesting that SWSCC might be affected by plant density, which may also help explain the complex relationship between SWSCC and GY. The WSC10-associated locus on 5A (697.77–699.48 Mb) harbors the vernalization gene Vrn2-5A (698.2 Mb; Yan et al. 2004). To investigate the potential effect of vernalization on stem WSC, we repeated the GWAS with the Vrn2-5A genotype included as a fixed-effect covariate in the Q + K MLM. Of the 36 loci, 24 (66.7%) were still detected (Table S14), including the WSC locus co-located with Vrn2-5A, suggesting that the effect of this WSC locus is independent of Vrn2-5A.
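The idea of conditioning a marker test on a known gene can be illustrated with a much simpler model than the one used in the study. The sketch below fits ordinary least squares with the gene genotype as an extra fixed-effect column; it deliberately omits the kinship (K) random effect and population-structure (Q) terms of the Q + K MLM, and all data and names are simulated or hypothetical. It shows that a SNP whose signal comes only from linkage with the gene loses its effect after adjustment, whereas a SNP with its own effect retains it.

```python
import numpy as np

def marker_effect(y, snp, covariates=None):
    """OLS effect and t-statistic of one SNP, optionally adjusting for fixed covariates.

    Simplified illustration only: no kinship (K) random effect or
    population-structure (Q) terms, unlike the Q + K MLM.
    """
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y), np.asarray(snp, dtype=float)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        if cov.ndim == 1:
            cov = cov[:, None]
        cols.extend(cov.T)  # one column per covariate
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = float(resid @ resid) / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1]), float(beta[1] / se)

# Simulated data (hypothetical): a SNP in strong LD with a Vrn-like gene.
rng = np.random.default_rng(42)
n = 1000
vrn = rng.integers(0, 2, n)
snp = np.where(rng.random(n) < 0.9, vrn, 1 - vrn)  # 90% concordant with vrn

# Case 1: the trait depends only on the gene.
y1 = 2.0 * vrn + rng.normal(0, 1, n)
print(marker_effect(y1, snp))       # sizeable effect, driven by LD
print(marker_effect(y1, snp, vrn))  # effect near zero after adjustment

# Case 2: the SNP has its own effect in addition to the gene.
y2 = 1.5 * snp + 2.0 * vrn + rng.normal(0, 1, n)
print(marker_effect(y2, snp, vrn))  # effect near 1.5 remains
```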

Pyramiding WSC-increasing alleles by MAS could improve GY

The significant positive correlations between SWSCC and TKW (Table 2) are consistent with previous reports (Ehdaie et al. 2008; Li et al. 2015; Gao et al. 2017). Increased SWSCC has contributed to genetic gains in GY in the UK, Australia and China (Shearman et al. 2005; Sadras and Lawson 2011; Xiao et al. 2012; Gao et al. 2017). Consistent with this, in the present study the improvement in TKW from pre-1980 to post-2010 cultivars was accompanied by increases in WSC10, WSC20 and WSC30 (Fig. 5a–d) and in the numbers of favorable alleles (Fig. 5e–h). Linear regressions also showed that pyramiding favorable alleles was effective in improving both SWSCC and TKW (Fig. 4). Furthermore, cultivars from Henan and Shandong, two high-yield wheat-producing provinces in China, also had higher SWSCC and more favorable WSC alleles (Fig. S9). Together, these results indicate that improving stem WSC content is a promising way to improve GY.
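The cumulative (pyramiding) effect can be summarized by counting the increasing-effect alleles each cultivar carries (NFA) and regressing the trait on that count. A minimal standard-library sketch, assuming one allele call per locus for homozygous cultivars; the locus labels, allele calls and trait values are hypothetical.

```python
from statistics import mean

def count_nfa(calls, favorable):
    """Number of loci at which a cultivar carries the increasing-effect allele.

    calls:     locus -> allele call for one cultivar (None = missing)
    favorable: locus -> increasing-effect allele
    Missing calls are not counted.
    """
    return sum(
        1 for locus, allele in calls.items()
        if allele is not None and favorable.get(locus) == allele
    )

def cumulative_effect(nfa, trait):
    """Slope, intercept and r^2 of a simple linear regression of trait on NFA."""
    mx, my = mean(nfa), mean(trait)
    sxx = sum((x - mx) ** 2 for x in nfa)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nfa, trait))
    syy = sum((y - my) ** 2 for y in trait)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

favorable = {"2A": "A", "4B": "A", "5B": "T"}  # hypothetical loci and alleles
cultivar = {"2A": "A", "4B": "G", "5B": "T"}
print(count_nfa(cultivar, favorable))  # 2

# Hypothetical NFA values and TKW (g) for four cultivars.
print(cumulative_effect([1, 2, 3, 4], [40, 42, 44, 46]))  # (2.0, 38.0, 1.0)
```

The slope estimates the average trait gain per additional favorable allele, which is the quantity a pyramiding strategy relies on.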

In addition, representative markers for pleiotropic loci on chromosomes 1BS (BS00023084_51), 2AS (AX-109293110), 4BS (AX-110974144), 5BL (AX-111044647), 6BS (AX-94910312) and 6BL (AX-111494281) had FAFs of 0.67–0.92 in cultivars released in the 2010s (Fig. 6), indicating that there is still room to further improve SWSCC and TKW by pyramiding more increasing-effect alleles. High-throughput KASP markers for five pleiotropic loci were developed and validated in different environments (Fig. S10). These markers should be useful for MAS targeting improved SWSCC and TKW. Moreover, given the important role of WSC in coping with diverse stresses (Rebetzke et al. 2008; Livingston et al. 2009; Trouvelot et al. 2014; Ovenden et al. 2017), these markers may also be valuable for developing stress-resistant cultivars by MAS.

The present study was conducted under well-watered field conditions with the same field management as used in the YHVWWZ of China, so the findings should be useful for wheat breeding in this zone and in similar environments. Cultivars Luyuan 502, Huaimai 20, Jinmai 61, Luomai 21, Yumai 13, Zhongmai 875, Zhongmai 895, Zhou 8425B, Zhoumai 22, Zhoumai 30, Zhoumai 31, Zhoumai 32 and Zimai 12 have high SWSCC (WSC10 and WSC20) and TKW and carry high numbers of increasing-effect alleles (Tables 5, S11). These cultivars could be used as elite germplasm for improving WSC and TKW.

Table 5 Elite germplasms for improving stem WSC contents and grain yield

Advantages and disadvantages of this study

In the present study, GWAS was conducted to identify QTL associated with SWSCC using a high density of SNP markers. High marker density increases the number of haplotypes that can be distinguished within trait-associated loci by altering the observed LD pattern, and it enables the discovery of marker-trait associations (MTAs) in low-recombination regions of chromosomes (Kim and Yoo 2016; Andrade et al. 2019). Previously, Dong et al. (2016a) performed GWAS on SWSCC with 18,207 markers from the 90 K SNP array using the same panel of wheat accessions, but identified only 11 significant MTAs in two or more environments because of large gaps in marker coverage, particularly in the D genome (Liu et al. 2017). To resolve this problem, the present GWAS for WSC content was performed using 373,106 SNPs from the 660 K and 90 K SNP arrays. The integrated physical map had a much higher marker density, 0.038 Mb per marker across the whole genome, than the 90 K genetic map used by Dong et al. (2016a), with 0.772 Mb per marker (14,061 Mb/18,207 markers). Consequently, 36, 24 and 19 stable loci were detected for WSC10, WSC20 and WSC30, respectively (Table S11). This indicates that the high-density physical map constructed here offers a clear advantage for GWAS on complex traits such as SWSCC.
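Both density figures follow from dividing genome length by marker number. The check below assumes that the genome length stated for the 90 K map (14,061 Mb) also applies to the 373,106-SNP physical map and that every SNP is counted; if only physically mapped SNPs were used, the second figure would differ slightly.

```python
def mb_per_marker(genome_mb, n_markers):
    """Average genome length per marker (Mb), a simple measure of marker spacing."""
    return genome_mb / n_markers

GENOME_MB = 14_061  # genome length stated in the text for the 90 K map

print(round(mb_per_marker(GENOME_MB, 18_207), 3))   # 0.772
print(round(mb_per_marker(GENOME_MB, 373_106), 3))  # 0.038
```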

Because of the relatively small population size, a stringent threshold such as the Bonferroni correction (Holm 1979) could not be applied in this GWAS. Nevertheless, the 166 wheat accessions used in this study were selected from more than 400 cultivars and are good representatives of wheat germplasm from the YHVWWZ of China. This population was previously used for GWAS on yield- and quality-related traits, and many important loci were identified and validated (Liu et al. 2017; Zhai et al. 2018; Li et al. 2019). In the present study, 26 of the 62 WSC loci were located at positions similar to those of previously reported QTL or genes (Table S11), which also supports the reliability of the results. Furthermore, based on the identified WSC loci, we observed a clear pyramiding of favorable WSC alleles in wheat cultivars released over past decades, consistent with the improvement in SWSCC and TKW (Fig. 5). These observations suggest that the population is suitable for GWAS of complex traits. In addition, because the study was conducted under well-watered conditions following local practices in the YHVWWZ, the findings are most relevant to breeding in this zone and similar environments.
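To illustrate why a Bonferroni threshold is stringent at this marker density, the sketch below computes the per-test cutoff for a family-wise error rate of 0.05, assuming, for illustration only, that all 373,106 SNPs count as independent tests. Because SNPs in linkage disequilibrium are correlated, this overstates the effective number of tests. Holm's (1979) step-down procedure applies the same cutoff to the smallest P-value and relaxes it for subsequent ones.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff and its -log10 value under Bonferroni correction."""
    p_cut = alpha / n_tests
    return p_cut, -math.log10(p_cut)

p_cut, neg_log10 = bonferroni_threshold(0.05, 373_106)
print(f"P < {p_cut:.2e} (-log10 P > {neg_log10:.2f})")  # P < 1.34e-07 (-log10 P > 6.87)
```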

Conclusion

TKW was significantly correlated with WSC10 and WSC20, indicating the important contribution of WSC to GY. We identified 62 stable loci for WSC at three time-points during grain filling, with 36, 24 and 19 loci for WSC10, WSC20 and WSC30, respectively. Of these loci, 36 are potentially new, 16 affected SWSCC at two or more time-points, and 13 showed pleiotropic effects on both WSC and TKW. Linear regression analyses showed clear cumulative effects of increasing-effect alleles on SWSCC and TKW. Over past decades, SWSCC and TKW of wheat cultivars in the YHVWWZ of China improved significantly, in association with the pyramiding of WSC-increasing alleles. The present study provides a genome-wide view of the genetic architecture of WSC and new insight into the relationship between WSC and GY. The identified WSC loci, especially the 13 pleiotropic loci for both WSC and TKW, are valuable targets for further dissection of the genetic basis of SWSCC and TKW, and the five WSC- and TKW-associated KASP markers should be valuable tools for improving WSC and GY by MAS.