Introduction

Wheat (Triticum aestivum L.) is a crucial cereal crop for global food security (Curtis and Halford 2014). Global wheat production reached 751.1 million tons in 2016 (USDA 2017). Wheat is now cultivated and consumed more widely, and many products such as breads, biscuits, noodles, cakes, and beer have been developed from it (Curtis and Halford 2014). Further improvement of the yield potential and quality traits of wheat is therefore necessary to meet current and impending challenges.

Many important agronomic traits, such as plant height (PH) and grain protein content (GPC), are controlled by multiple genes and are often affected by environmental conditions (Groos et al. 2003; Prasad et al. 2003). Quantitative trait locus (QTL) mapping is generally used to detect the corresponding loci. For example, loci for GPC have been mapped in different populations (Distelfeld et al. 2008; Echeverry-Solarte et al. 2015; Li et al. 2012a; Nigro et al. 2014; Prasad et al. 2003; Terasawa et al. 2016). However, the traditional QTL mapping approach locates genomic regions only at low resolution and is limited to bi-parental populations (Sukumaran et al. 2015).

Genome-wide association study (GWAS) tests for correlations between genotype and phenotype in a set of individuals consisting of germplasm collections or natural populations (Marcotuli et al. 2015). It is a powerful tool widely used in breeding programs because it can be more efficient and less laborious in analyzing complex traits under varied environmental conditions. For example, in rice, a high-density SNP (single nucleotide polymorphism) array was used to identify grain length and width loci (Chen et al. 2014). In soybean, GWAS was carried out for oil content using re-sequenced lines (Zhou et al. 2015). In maize, many beneficial alleles for kernel starch, protein, and oil were identified by GWAS (Cook et al. 2012). Likewise, GWAS has been applied successfully in wheat to traits such as rust resistance (Gao et al. 2016; Maccaferri et al. 2015; Naruoka et al. 2015; Yu et al. 2011; Zegeye et al. 2014), Fusarium head blight resistance (Kollers et al. 2013), arabinoxylan content (Marcotuli et al. 2015), grain morphology (Breseghello and Sorrells 2006; Rasheed et al. 2014), and grain yield and related traits (Ain et al. 2015; Lopes et al. 2013; Sukumaran et al. 2015; Zanke et al. 2014a, b, 2015).

In this study, the main wheat cultivars and excellent germplasm collections were selected to conduct GWAS using the high-density wheat 90 K Illumina iSelect SNP Array (Wang et al. 2014). SNPs associated with yield- and quality-related traits such as PH, GPC, thousand kernel weight (TKW), and SDS sedimentation volume (SSV) were identified, which could ultimately facilitate breeding by whole genome selection.

Materials and methods

Genetic resources

A total of 192 bread wheat lines were used in this study, including 25 synthetic hexaploid wheat lines, 80 landraces, and 87 cultivars (Table S1). Plants were grown at two locations in Sichuan Province in two growing seasons, 2014–2015 and 2015–2016. The locations were Shuangliu (N content 1.604 g kg−1) and Shifang (N content 1.680 g kg−1). High and low nitrogen treatments were also applied in the 2015–2016 growing season in Shifang: the high nitrogen plot received nitrogen (60 kg N ha−1) after seeding, whereas the low nitrogen plot received none. Each line was sown in a single row 1.2 m long, with rows 0.2 m apart, following a randomized complete block design (RCBD) with two replications per location, and was harvested by hand. Three plants of each line per replication were selected at random for phenotypic analysis.

Genotyping and SNP calling

Total DNA was extracted from fresh leaves of wheat seedlings using the cetyl trimethyl ammonium bromide (CTAB) method (Murray and Thompson 1980) and genotyped with the wheat 90 K Illumina iSelect SNP Array at Compass Biotechnology Co., Ltd. SNPs were filtered according to the following criteria: minor allele frequency (MAF) greater than 0.05, sample call rate greater than 0.8, SNP call rate greater than 0.9, and Hardy-Weinberg equilibrium (HWE) value less than 0.000001.
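The MAF and call-rate criteria above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele with None for a missing call, and it assumes that samples are filtered before SNPs; both choices, the function names, and the default arguments are illustrative and not taken from the genotyping pipeline used here. The HWE criterion is left out for brevity.

```python
def call_rate(calls):
    """Fraction of non-missing calls in a list (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from genotypes coded 0/1/2 (copies of one allele), ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1 - freq)


def filter_genotypes(matrix, maf_min=0.05, sample_cr_min=0.8, snp_cr_min=0.9):
    """matrix[i][j] is the genotype of sample i at SNP j.

    Assumed order: drop low-call-rate samples first, then evaluate each SNP
    on the remaining samples. Returns (kept sample indices, kept SNP indices).
    """
    samples = [i for i, row in enumerate(matrix) if call_rate(row) > sample_cr_min]
    snps = []
    for j in range(len(matrix[0])):
        column = [matrix[i][j] for i in samples]
        if call_rate(column) > snp_cr_min and minor_allele_frequency(column) > maf_min:
            snps.append(j)
    return samples, snps
```

In practice these filters are usually applied with an established tool; the sketch only makes the thresholds concrete.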

Population genetics and linkage disequilibrium analysis

A phylogenetic tree was constructed from pruned SNPs by the neighbor-joining method and displayed in MEGA 6.06 (http://www.megasoftware.net/archived_version_active_download) (Tamura et al. 2013). Population structure (Q) was evaluated from the filtered SNPs with ADMIXTURE (http://www.genetics.ucla.edu/software/admixture/download.html) (Alexander et al. 2009). A principal component analysis (PCA) of the filtered SNPs was performed with GAPIT (http://zzlab.net/GAPIT/) (Lipka et al. 2012), and the first two PCA values were plotted in two dimensions. Linkage disequilibrium was estimated with PLINK (http://www.cog-genomics.org/plink2) (Chang et al. 2015) as the squared allele frequency correlation (R²) between pruned SNPs.
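As an illustration of the linkage disequilibrium statistic, the following minimal sketch computes R² for one pair of SNPs as the squared Pearson correlation of their allele counts. The 0/1/2 genotype coding, the use of None for missing calls, and the function name are assumptions made for this example; it is not the PLINK implementation.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Samples with a missing call (None) at either SNP are skipped.
    Returns NaN when either SNP is monomorphic among the compared samples.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov * cov / (var_x * var_y)
```

Two SNPs that always co-vary give a value of 1, and two SNPs that vary independently in the sample give a value near 0.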

Phenotyping

PH was measured after harvest as the average height (cm) from the ground-level base of the plant to the tip of the spike, excluding the awns. TKW was determined from the weight of more than 200 random kernels, with two technical repeats, using SC-E software (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China). Three grams of grain from each line was then ground to whole meal using a Cyclotec mill fitted with a 1-mm sieve (Foss Tecator AB, Höganäs, Sweden). GPC was determined twice for each line with a Kjeltec 2200 system (FOSS). SSV was measured in duplicate according to AACC method 56-61A (https://doi.org/10.1094/AACCIntMethod-56-61.02) (AACC International 1999).
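The TKW calculation reduces to simple arithmetic, sketched below under the assumption that each technical repeat yields a kernel count and a sample weight in grams. The function names and the example figures in the usage note are hypothetical and are not measurements from this study.

```python
def thousand_kernel_weight(sample_weight_g, kernel_count):
    """Scale the weight of a counted kernel sample to 1000 kernels (grams)."""
    if kernel_count <= 0:
        raise ValueError("kernel_count must be positive")
    return sample_weight_g / kernel_count * 1000


def mean_tkw(repeats):
    """Average TKW over technical repeats given as (weight_g, kernel_count) pairs."""
    values = [thousand_kernel_weight(w, n) for w, n in repeats]
    return sum(values) / len(values)
```

For instance, a hypothetical sample of 225 kernels weighing 9.0 g corresponds to a TKW of 40 g.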

Genome-wide association study

A compressed mixed linear model (CMLM), which takes population stratification and kinship as covariates, was used for the association analysis to minimize false positives (Lipka et al. 2012). The first three PCA values, derived from pruned SNPs, were also used as covariates in the model to adjust for population stratification. A threshold P-value of 0.001 (−log10 P = 3) was used to declare significant SNPs.
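The significance rule can be written out as a short sketch. It assumes the association P-values are available as a mapping from SNP identifier to P-value and that the comparison is strict (P < 0.001); the function name and the data layout are illustrative and are not part of GAPIT.

```python
import math


def significant_snps(p_values, threshold=1e-3):
    """Return {snp_id: -log10(P)} for SNPs whose P-value is below the threshold.

    A threshold of 1e-3 corresponds to -log10(P) = 3 on a Manhattan plot.
    The strict inequality is an assumption of this sketch.
    """
    return {snp: -math.log10(p) for snp, p in p_values.items() if p < threshold}
```

The returned −log10 P values are the quantities plotted on the vertical axis of a Manhattan plot.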

RNA sequencing and analysis

Twelve lines were selected for RNA-Seq analysis. Immature grain tissue was collected from plants at 14 and 21 days after flowering. RNA sequencing and data analysis were performed by Guangzhou Gene denovo Biotechnology Co., Ltd.

Multiple alignment and phylogenetic analysis

The gene structure was analyzed using the online software GSDS 2.0 (Hu et al. 2015). The amino acid sequences were aligned using ClustalW (http://www.clustal.org/download/current) (Larkin et al. 2007). Homologous amino acid sequences from Triticum aestivum, Arabidopsis thaliana, Hordeum vulgare subsp. vulgare, Brachypodium distachyon, Oryza sativa ssp. japonica, Setaria italica, Zea mays, Sorghum bicolor, Glycine max, and Homo sapiens were used for phylogenetic tree construction. A neighbor-joining tree was constructed and displayed in MEGA 6.06 with the p-distance model, pairwise deletion, and 1000 bootstrap replicates.
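The p-distance with pairwise deletion used for tree construction can be illustrated with a minimal sketch. It assumes two sequences taken from the same alignment, with "-" marking gaps and "?" marking missing data; the function name and the toy sequences in the usage note are hypothetical, and this is not the MEGA implementation.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped only for this pair when either
    sequence has a gap or missing-data symbol at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared
```

For the toy pair "MKT-LH" and "MRTALR", five sites are comparable and two differ, so the distance is 0.4. Neighbor joining then operates on the matrix of such distances for all sequence pairs.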

Results

Genotyping and population genetics analysis

A total of 192 bread wheat samples, including synthetic hexaploid wheat lines, landraces, and cultivars, were used in this study. SNP calls for the wheat lines were generated with the wheat 90 K Illumina iSelect SNP Array. The call rate was 0.977813. After quality control, 13,154 polymorphic SNPs were retained.

The phylogenetic relationships among the 192 wheat lines were analyzed using the filtered SNPs. The lines separated into three parts corresponding to synthetic hexaploid wheat lines (red line), landraces (green line), and cultivars (blue line), although there were a few exceptions (Fig. 1a). For instance, in the phylogenetic tree, three landraces were close to the cultivars, one cultivar and three synthetic hexaploid wheat lines fell together with the landraces, and one landrace and one synthetic hexaploid wheat line grouped with the cultivars. Although there was no obvious population structure among the wheat lines (Fig. 1b), selection, domestication, and improvement breeding may have caused a hidden population structure (Neumann et al. 2010). The samples were therefore divided into three groups, synthetic hexaploid wheat lines, landraces, and cultivars, and this classification was supported by the principal component analysis (Fig. 1c).

Fig. 1

Population genetic analysis of 192 wheat lines. a A neighbor-joining tree of the 192 wheat lines. b Population structure analysis of 192 wheat lines. The numbers of clusters (K) were set from 1 to 20. c PCA of the 192 wheat lines. Individuals from the same group were represented by the same color. Synthetic hexaploid wheat lines (red), landraces (green), and cultivars (blue)

Phenotyping

PH, TKW, and SSV were assayed in four environments and GPC in five environments, and each trait showed broad variation (Fig. S1). Mean PH increased from the synthetic hexaploid wheat lines to the landraces (118.1 to 132.7 cm) but decreased strongly from the landraces to the cultivars (81.6 cm), and the shortest and tallest lines differed by more than 100 cm (Fig. S2a, Table S1). In contrast to PH, mean TKW first decreased from 45.0 g in the synthetic hexaploid wheat lines to 31.6 g in the landraces and then returned to nearly the previous level in the cultivars (44.6 g) (Fig. S2b, Table S1). Mean GPC, however, declined gradually from the synthetic hexaploid wheat lines (14.4%) to the cultivars (11.7%) (Fig. S2c, Table S1). In contrast to GPC, mean SSV increased from the landraces to the cultivars (Fig. S2d), with the highest values in the cultivars (Table S1).

GWAS

The filtered SNPs were used to perform GWAS with the CMLM program, which greatly reduces false positive rates (Lipka et al. 2012). The GWAS results for PH are presented in Fig. 2a and Fig. S3. Fifty-seven SNPs significantly associated with PH were observed in 50 genomic regions on 19 of the 21 chromosomes (Table S2). Six of the 50 regions contained two or more significantly associated SNPs. Nine SNPs in two of these six regions mapped within qPh-6B on chromosome 6B, a QTL reported in a previous study (Zhang et al. 2017), but defined a narrower interval (Table 1). Ten of the SNPs were detected in more than two environments. Four of them formed a cluster (a haplotype) in the 75.0–77.2 Mb segment of chromosome 2A. This haplotype was detected in all environments and showed the strongest association with PH (Fig. 2b, Table S2). In addition, 27, 30, and 34 SNPs were significantly associated with GPC, TKW, and SSV, respectively (Fig. S3, Tables S3–S5).

Fig. 2

GWAS results for PH in the E2 environment. a Manhattan plot for PH in the E2 environment. The dashed horizontal line depicts the significance threshold. b A region of strong linkage disequilibrium was observed on chromosome 2A from 75.0 to 77.2 Mb. The four significant SNPs are indicated in bold font. c Haplotypes of the significant SNPs among wheat lines. The stacked bar chart indicates the proportion of each wheat group in haplotypes I and II. The bar plot displays the average PH of each haplotype. SH synthetic hexaploid wheat lines, L landraces, C cultivars, PH plant height. Error bars indicate standard error

Table 1 SNP clusters associated with PH

Genome scan and RNA-Seq analysis

The twelve lines used for RNA-Seq analysis are listed in Supplementary Table S1. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of the differentially expressed genes was performed to deduce the biological pathways that might be active in the wheat lines. The results showed that protein processing in endoplasmic reticulum was mostly associated with GPC, and starch/sucrose metabolism with TKW (Fig. S4).

A 1-Mb flanking region around each associated SNP in the wheat genome was scanned to identify genes associated with GPC, TKW, and SSV. To determine whether the genes found were indeed associated with these traits, the differential expression data from RNA-Seq were analyzed. The results showed that 102 GPC-associated, 52 TKW-associated, and 92 SSV-associated candidate genes were present and transcribed (Table S6). KEGG analysis showed that two down-regulated genes, Traes_2AS_DFDA79E58 and Traes_7BL_10754D81F, were linked to the proteasome pathway (ko03050), and one up-regulated gene, Traes_7AL_1A21E8798, was linked to the pathway of protein processing in endoplasmic reticulum (ko04141) (Table S6).
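The flanking-region scan can be sketched as a simple interval overlap query. The sketch assumes that the window extends 1 Mb on each side of the SNP, which the text does not specify, and that SNP positions and gene coordinates are available as plain tuples; the function name, the tuple layout, and any identifiers passed to it are hypothetical.

```python
def genes_near_snps(snps, genes, flank_bp=1_000_000):
    """Find annotated genes overlapping a window around each associated SNP.

    snps:  list of (snp_id, chromosome, position)
    genes: list of (gene_id, chromosome, start, end)
    Returns {snp_id: [gene_id, ...]}. The window spans position +/- flank_bp,
    an assumption of this sketch.
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        low, high = max(0, pos - flank_bp), pos + flank_bp
        hits[snp_id] = [
            gene_id
            for gene_id, gene_chrom, start, end in genes
            if gene_chrom == chrom and start <= high and end >= low
        ]
    return hits
```

The resulting gene lists would then be intersected with the differentially expressed genes from RNA-Seq to obtain candidate genes.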

TaUBP24

PH is an important trait affecting wheat yield (Cui et al. 2011). Its variation was highly associated with a group of polymorphic SNPs detected in a 2.2 Mb (75.0–77.2 Mb) genome region (Fig. 2b). The 192 wheat lines were classified into two haplotypes based on the genotyping results. Haplotype I included 23 synthetic hexaploid wheat lines (115.57 ± 14.69 cm), seven landraces (104.94 ± 21.16 cm), and 86 cultivars (81.53 ± 8.07 cm), and haplotype II included two synthetic hexaploid wheat lines (147.53 ± 4.01 cm), 73 landraces (135.32 ± 10.99 cm), and only one cultivar (86.80 cm). The mean PH of haplotype I (89.69 ± 17.63 cm) was significantly lower than that of haplotype II (135.00 ± 12.31 cm) (Fig. 2c, Table S7).
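As a consistency check on the figures above, the haplotype means can be recomputed from the group sizes and group means as size-weighted averages. The sketch below uses only the numbers reported in this paragraph; the function and variable names are illustrative. The standard deviations cannot be recovered this way, since that would require the per-line values in Table S7.

```python
def pooled_mean(groups):
    """Size-weighted mean over groups given as (number_of_lines, mean_height_cm)."""
    total = sum(n for n, _ in groups)
    return total, sum(n * mean for n, mean in groups) / total


# (number of lines, mean PH in cm) for synthetic hexaploids, landraces, cultivars
haplotype_i = [(23, 115.57), (7, 104.94), (86, 81.53)]
haplotype_ii = [(2, 147.53), (73, 135.32), (1, 86.80)]

n_i, mean_i = pooled_mean(haplotype_i)
n_ii, mean_ii = pooled_mean(haplotype_ii)
print(n_i, round(mean_i, 2))    # 116 lines, 89.69 cm
print(n_ii, round(mean_ii, 2))  # 76 lines, 135.0 cm
```

The two haplotypes together account for all 192 lines, and the weighted means agree with the reported haplotype means of 89.69 cm and 135.00 cm.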

Furthermore, the last two of the four significant SNPs belonged to the same gene and were located in its fourth intron and seventh exon. This gene was homologous to At4g30890 (UBP24, ubiquitin-specific protease 24) and was designated TaUBP24 (GenBank accession number: MF179095). It contains seven exons, and its CDS is 2247 bp long (Fig. 3a). The SNP in the seventh exon is nonsynonymous and causes an amino acid change from His to Arg (Fig. 3b). The amino acid sequence of TaUBP24 was aligned with reported sequences from Brachypodium distachyon, Oryza sativa ssp. japonica, and Arabidopsis thaliana. All the sequences were checked for the presence of the important active sites (Asn-Cys-His-Asp) (Hu et al. 2002), which are essential for enzyme activity (Fig. 3b).

Fig. 3

Analysis of TaUBP24. a Gene structure of TaUBP24. b Comparison of the amino acid sequences of UBP24 from Triticum aestivum (TaUBP24), Brachypodium distachyon (BdUBP24), Oryza sativa ssp. japonica (OsUBP24), and Arabidopsis thaliana (AtUBP24). The residues in the red solid line frames are the active sites required for the catalytic core domain of all UBP24 proteins. The arrow indicates the nonsynonymous SNP locus

For phylogenetic analysis, UBP24 amino acid sequences from Triticum aestivum, Arabidopsis thaliana, Hordeum vulgare subsp. vulgare, Brachypodium distachyon, Oryza sativa ssp. japonica, Setaria italica, Zea mays, Sorghum bicolor, and Glycine max were analyzed. The homologous sequence from Homo sapiens was used as an outgroup. The neighbor-joining tree was constructed with the p-distance model, pairwise deletion, and 1000 bootstrap replicates. Monocots and dicots were well separated into two clades (Fig. 4).

Fig. 4

Phylogenetic analysis of UBP24. Two clades were present, corresponding to monocots and dicots. TaUBP24 is indicated in red font. Triticum aestivum (TaUBP24), Arabidopsis thaliana (AtUBP24, BAH19813), Hordeum vulgare subsp. vulgare (HvUBP24, BAJ92714), Brachypodium distachyon (BdUBP24, XP_010234436), Oryza sativa ssp. japonica (OsUBP24, XP_015646906), Setaria italica (SiUBP24, XP_004958591), Zea mays (ZmUBP24, XP_020396334), Sorghum bicolor (SbUBP24, OQU90420), Glycine max (GmUBP24, XP_003539934), and Homo sapiens (HsUSP10, NP_001259004)

For marker-assisted selection, two primer pairs of complementary type were developed to distinguish the two haplotypes (Table S7), based on the nonsynonymous SNP in the seventh exon of TaUBP24 (Fig. 2c). The TaUBP24-L primer pair amplified a PCR product in haplotype I but not in haplotype II, whereas the TaUBP24-H primer pair amplified a product in haplotype II but not in haplotype I (Fig. S5). Thus, the PCR-based molecular markers reliably discriminated the two haplotypes.

Discussion

The association study approach was first proposed by Risch and Merikangas (1996). The first application of GWAS in plants analyzed the growth habit of sea beet (Hansen et al. 2001). With the development of high-throughput sequencing technologies, GWAS has been widely used for dissecting the genetic basis of complex traits in crops such as rice, maize, soybean, and wheat (Hao et al. 2012; Huang et al. 2010; Neumann et al. 2010; Tian et al. 2011). Compared with QTL mapping, GWAS overcomes the limitation of bi-parental populations, improves mapping resolution, often to the gene level, and reduces cost and time because it can detect genome-wide variation in large samples (Brachi et al. 2011).

GPC and SSV are two important traits affecting grain quality (Ormoli et al. 2015; Prasad et al. 2003). In this study, GWAS and RNA-Seq were carried out together, and several genes associated with GPC, TKW, and SSV were identified. Among these genes, Traes_2AS_DFDA79E58 and Traes_7BL_10754D81F, which are linked to the proteasome pathway, possibly take part in proteasome-mediated protein degradation; their down-regulation may consequently affect grain quality (Guo et al. 2015). Traes_7AL_1A21E8798, an up-regulated gene involved in the pathway of protein processing in endoplasmic reticulum, might participate in protein accumulation within the endoplasmic reticulum and thereby increase GPC (Tosi et al. 2009).

PH is a crucial trait affecting grain yield (Cui et al. 2011). Since the “Green Revolution” period, the dwarfing genes Rht-B1b and Rht-D1b have been introduced into crops to resist lodging, which resulted in a rapid increase in grain yield (Gale et al. 1975; Hedden 2003; Li et al. 2012b). Many other Rht (reduced height) genes have been detected, for example Rht_NM9, Rht4, Rht5, Rht8, Rht9, Rht12, Rht13, Rht14, Rht16, Rht18, Rht22, and Rht23 (Chen et al. 2015; Ellis et al. 2007; Ellis et al. 2005; Haque et al. 2011; Lu et al. 2015; Peng et al. 2011). In this study, two groups of strong GWAS signals were detected. One mapped to the previously identified plant height QTL qPh-6B (Zhang et al. 2017), and the other was located on chromosome 2A, encompassing the gene TaUBP24.

Abscisic acid (ABA) and gibberellic acid (GA) are mutually antagonistic phytohormones that have been shown to affect PH (Yaish et al. 2010). The dwarfing gene OsAP2-39 has been confirmed to increase the endogenous ABA level by up-regulating the ABA biosynthetic gene OsNCED-1, which in turn up-regulates expression of the EUI protein to deactivate gibberellins in rice (Yaish et al. 2010). In Arabidopsis thaliana, the UBP24 mutant was hypersensitive to ABA during both the post-germinative stage and seedling growth. UBP24 has deubiquitinating enzyme activity, which is essential for its function. UBP24 acts genetically upstream of ABI2, and the phosphatase activity of protein phosphatase 2Cs (PP2Cs, including ABI1 and ABI2) was lower in the UBP24 mutant than in the wild type in the presence of ABA. These results support an important role for UBP24 as a negative regulator of ABA signaling in plants (Zhao et al. 2016). To sum up, TaUBP24, the wheat homolog of Arabidopsis thaliana UBP24, might indirectly up-regulate GA in this pathway and consequently affect PH.

In our study, two significant SNPs located in TaUBP24, together with two other SNPs, defined two haplotypes among the 192 wheat lines. PH differed significantly between haplotypes I and II. In addition, the synthetic hexaploid wheat lines were significantly taller than the cultivars in both haplotypes (Table S8). A possible reason is that synthetic hexaploid wheat was obtained by crossing durum wheat with Aegilops tauschii and has not been subjected to artificial selection (Zegeye et al. 2014). PH is a complex quantitative trait controlled by multiple genes (Zhang et al. 2017). The effect of TaUBP24 on PH was highlighted in the cultivars by selection breeding.

In conclusion, several candidate genes associated with GPC, TKW, and SSV were found using GWAS and RNA-Seq. TaUBP24, encoding a ubiquitin-specific protease, was identified for the first time as being associated with PH. The functional markers of TaUBP24 could be used directly in marker-assisted selection to improve wheat quality and yield.