Introduction

Wheat is the most important food crops in the world besides to rice and maize and is the largest crop in global trade volume (Borrill et al. 2015). It provides rich protein, dietary fiber and energy for human beings (Ling et al. 2013). Therefore, maintaining high and stable yield of wheat is essential to ensure global food security (Boyer and Westgate 2004). Global wheat production increased from 69.9 million tonnes in 2012 to 76.13 million tonnes with a growth rate of about 1% per year, which is far for achieving the goal of doubling the yield in 2050 (Ray et al. 2013; FAO 2020). Wheat yield is a complex quantitative trait, which is controlled by many low effect genes. Although wheat breeders have developed a large number of genetic and gene resources in the past few decades, due to the lack of integration of existing genetic resources related to yield-related traits, it is difficult to effectively transfer these genetic information into wheat breeding programs to improve wheat yield (Quraishi et al. 2017).

Wheat yield is affected by many factors, such as grain weight, tiller number, grain number per spike, harvest index, etc. Additionally, growth stages including heading stage, flowering stage and maturity stage are also closely related to the yield and environmental adaptability of wheat (Chen et al. 2012). Early mapping analysis of quantitative trait loci (QTL) based on bi-parental populations has accelerated the breeding process of improving wheat yield and other quantitative traits by marker-assisted selection (MAS) (Gupta et al. 2020). However, QTL results based on bi-parental populations are heavily dependent on the genetic background of the population and environment, which greatly limits the wide adaptability and stability of these QTL in wheat breeding programs (Khahani et al. 2020; Daware et al. 2017).

Meta-QTL analysis is an effective method to integrate QTL data and narrow QTL interval by integrating different QTL in different trials to obtain reliable consistent and stable meta-QTL (MQTL) (Welcker et al. 2011). The integrated MQTL are not affected by the genetic background, population type and planting environment in the previous independent experiments and can be directly used for quantitative trait improvement (Arcade et al. 2004; Sosnowski et al. 2012). This method has been widely used in plant genetic breeding and has achieved good results in the QTL-integration of different quantitative traits in multiple crops, such as yield-related traits and combined insect resistance in maize (Wang et al. 2013, 2020b; Badji et al. 2018), drought tolerance and yield-related traits in rice (Khowaja et al. 2009; Raza et al. 2019; Khahani et al. 2020), agronomic and quality traits in cotton (Said et al. 2015). Similarly, meta-QTL studies of various traits in barley and wheat have also been reported, such as abiotic stress tolerance in barley (Zhang et al. 2017), root-related traits (Soriano and Alvaro 2019), drought resistance (Kumar et al. 2020), tan spot resistance (Liu et al. 2020a, b), stem rust resistance (Yu et al. 2014), leaf rust resistance (Soriano and Royo 2015), pre-harvest sprouting resistance and Fusarium head blight resistance in wheat (Tai et al. 2021; Venske et al. 2019; Cai et al. 2019; Zheng et al. 2020), etc. In these wheat meta-QTL studies, the meta-QTL analysis of root-related traits, leaf rust resistance and stem rust resistance included the initial QTL from both durum wheat and bread wheat. There were at least three meta-QTL studies for yield and related traits in wheat, and multiple consistent MQTL and candidate genes were found (Zhang et al. 2010; Quraishi et al. 2017; Liu et al. 2020a, b). However, due to the relatively small number of QTL mapping studies (59, 27 and 24) and initial QTL (541, 376 and 381), the results have certain limitations.

With the advent of DNA sequencing technology, high-throughput genotyping based on SNP array or next-generation sequencing (NGS) provides convenience for genome-wide association studies (GWAS) of complex quantitative traits. This association analysis method based on natural population has been applied to QTL and gene mapping of rice, barley, wheat and other crops (Wang et al. 2015; Fan et al. 2016; Yang et al. 2020) and has also achieved very good results in QTL mapping for wheat yield and yield-related traits (Edae et al. 2014; Sun et al. 2017; Sukumaran et al. 2015). In addition, several important QTL identified by GWAS have been further confirmed by linkage mapping studies (Chen et al. 2019; Wu et al. 2021). All of these indicate that the combination of meta-QTL and GWAS can effectively integrate the original QTL results from different studies, so as to mine the key genomic regions and candidate genes that affect important quantitative traits. At the same time, the release of hexaploid wheat Chinese spring high-quality reference genome (International Wheat Genome Sequencing Consortium, 2018) provides an unprecedented opportunity to use these public resources to reveal the molecular mechanisms affecting important agronomic traits of wheat at the physical map level (Quraishi et al. 2017).

The objective of this study was to conduct a meta-QTL analysis of wheat yield-related QTL published in recent years, and to further integrate the GWAS and transcriptome evidences to discover the genomic regions and important candidate genes that affect wheat yield. This work will help to better understand the genetic determinants of wheat yield and lay a foundation for the identification, transfer and aggregation of these important QTL or candidate genes in wheat breeding.

Materials and methods

Scan of initial QTL for meta-QTL analysis

A detailed screening was carried out on recent published papers about yield QTL mapping studies in wheat (including bread wheat and durum wheat) from 1999 to 2020, and a total of 119 studies were found that could provide the initial QTL information required for meta-QTL analysis. The basic information of these studies is listed in Table S1, and some of them were also used in previous meta-QTL analysis (Zhang et al. 2010; Quraishi et al. 2017; Liu et al. 2020a, b). The initial QTL were mainly associated with yield-related traits and growth stages. Of which, yield and yield-related traits mainly included the number of spikelets (sterile/fertile/total), the number of florets, the number of grain per spike, the weight of grain per spike, spike length, spike compactness, tiller number (single plant/unit area), yield (single plant/unit area), thousand grain weight, grain number (single plant/unit area), grain filling duration, grain filling rate, biomass and harvest coefficient, etc. The growth stages included heading date, flowering date and maturity date.

For each initial QTL, the necessary information was collected as: (1) associated trait; (2) type of QTL mapping population (F2, DH, RIL and Backcross); (3) size of population; (4) LOD value; (5) R2 or PVE (phenotypic variance explained) value; (6) flanking or closely linked marker. To find more available initial QTL, in most cases, the LOD threshold in the original study was followed, though some cases, it was less than 3. The QTL that were significantly associated with traits but with R2 values less than 10% in individual studies were also retained. For a very few QTL that the LOD and R2 values were missing in the previous studies, which was assumed as 3 and 10%, respectively, following the common practice (Venske et al. 2019; Khahani et al. 2020). Additionally, the confidence intervals (CI, 95%) of each initial QTL were recalculated according to its population type and size, using the standard formula as following: (1) F2 and backcross population, CI = 530/(Number of lines × R2); (2) Recombinant Inbred Line (RIL) population, CI = 163/(Number of lines × R2); (3) Double-haploid population, CI = 287/(Number of lines × R2). Where Number of lines was the size of the mapping population used for QTL analysis, and R2 was the phenotype interpretation rate of QTL (Darvasi and Soller 1997; Guo et al. 2006). The details of these initial QTL are listed in Table S2.

Construction of consensus genetic map

Seven genetic maps containing multiple markers, which widely used in multiple QTL mapping studies, were used to construct a reference genetic map, including "Wheat, Consensus SSR, 2004," "Wheat, Composite, 2004" and "Wheat, Synthetic × Opata, BARC" downloaded from the GrainGenes website (https://wheat.pw.usda.gov/GG3/), "Wheat consensus map version 4.0" downloaded from its website (https://www.diversityarrays.com), and three SNP genetic maps derived from the 9 K iSelect Beadchip Assay and iSelect 90 K SNP Assay based on the Illumina platform, and genotyping by sequencing (GBS) (Venske et al. 2019; Cavanagh et al. 2013; Wang et al. 2014; Saintenac et al. 2013). R package LRmerge was used to construct the reference map for this meta-QTL study with the optimized "synthetic" method, as it could produce genetic maps across multiple populations as described by Venske et al. (2019). The basic principle of this method is to collapse co-segregating markers into "bins" to ensure the ordering of most markers in the linkage maps is preserved. By deleting the smallest groups "bins" in the maps, it can effectively solve the position conflicts caused by the inconsistent order of markers in different maps.

96 independent genetic maps were extracted from the 119 independent QTL studies investigated, which derived from 93 mapping populations including 8 durum and 85 bread wheat populations. Brief information of these genetic maps is listed in Table S3. BioMercator v4.2.3 delivers a graphical interface that allows the projection of single maps from different QTL studies onto a reference map (Sosnowski et al. 2012). All individual genetic maps (marker name, location) and related QTL statistics (LOD, R2, CI) and the reference map synthesized from 7 genetic maps were used as input files, through the iterative map compilation tool implemented in BioMercator v4.2.3, all single maps were integrated into the reference map and the consensus map was constructed.

QTL projection and meta-QTL analysis

BioMercator first integrates independent genetic maps into a comprehensive map and secondly recalculates the marker position as well as those of the initial QTL, based on a most likely consensus QTL distribution through meta-analysis algorithms. In this study, different methods were used to project the initial QTL onto the consensus map, according to the sources of the initial QTL. QTL with individual genetic map information were projected based on the original map position as the input QTL file, while QTL without genetic map information were projected based on their positions in the consensus map. As for the initial QTL where genetic map information was missing or difficult to extract, QTL were projected onto the consensus map according to the shared common markers. The following criteria were used to project the initial QTL onto the consensus map: (1) If the peak marker of initial QTL was in the consensus map, the peak marker position was directly used; (2) If there were two flanking markers available, their middle position was used as the QTL peak position; (3) If there was only one flanking marker available, priority was given to a nearby marker instead, if no such marker found, only the flanking marker was used for QTL projection. The necessary information of these initial QTL from different sources was input into the BioMercator software in different ways. QTL with genetic map information were input in pairs with corresponding genetic maps, while QTL without genetic map information were input in pairs with consensus map.

Secondly, meta-QTL analysis was conducted following the standard process by BioMercator V4.2.3, as detailedly described by Arcade et al. (2004), Veyrieras et al. (2007) and Sosnowski et al. (2012). The best Meta-analysis model was screened by the multiple statistical methods added in the new version of BioMercator, such as AIC (Akaike information content), AICc (AIC correction), AIC3 (AIC 3 candidate models), BIC (Bayesian information criterion) and AWE (average weight of evidence); thus, more than 4 meta-QTL can be supported in a single linkage map. The input files for meta-QTL analysis with BioMercator V4.2.3 including the consensus genetic map and QTL information were placed as Table S2 and Table S3.

Mapping of meta-QTL on the genome and verification by GWAS

All obtained meta-QTL (MQTL) were then mapped to the wheat reference genome. The markers on both sides of the MQTL confidence interval were manually searched, and their flanking or primer sequences were obtained from URGI Wheat (http://wheat-urgi.versailles.inra.fr), GrainGenes (https://wheat.pw.usda.gov/GG3/), DArT (https://www.diversityarrays.com) and Illumina company website (https://www.illumina.com). The obtained flanking sequences and primer sequences were blast aligned to the wheat Chinese Spring reference genome sequence to obtain the physical location information of these markers, based on the local BLASTN program. In addition, the physical locations of some SSR, SNP and DArT markers provided in the previous researches were also used as reference (Cabral et al. 2018; Wang et al. 2014). For the markers, their physical locations were not found, the physical locations of the MQTL were anchored by manual screening.

The data on yield-related traits of 10 genome-wide association studies published from 2014 to 2020 were collected and used to verify the accuracy of these MQTL regions. The detailed information of these GWAS studies is listed in Table 1. The phenotypic data of these studies were collected from 7 different countries, with population sizes ranging from 123 to 688, including 3 spring wheat populations, 6 winter wheat populations and one mixed population of spring and winter wheat. Similar to anchoring the physical position of MQTL, the physical position of the MTA (Maker-Trait-Association) in these studies was obtained by BLASTN of the flanking sequence.

Table 1 The GWAS (Genome-Wide Association Study) studies on yield and yield-related traits used in this study

Homology-based candidate gene mining and expression pattern analysis

Considering the leading position of rice in gene function study, the strategy of wheat-rice orthologous comparison was used to mine the key candidate genes in the MQTL region. The basic information of all functionally verified yield-related genes published in rice was downloaded from the China Rice Data Center (https://www.ricedata.cn/), and their protein sequences were extracted using TBtools (Chen et al. 2020). Using the protein sequence of rice gene as the seed sequence, a BLASTP was conducted to all protein sequences of the wheat reference genome to find their orthologous genes in wheat. The genes located in the MQTL region were considered to be important candidate genes affecting wheat yield and yield-related traits.

Analyzing the expression patterns of orthologous genes between different species was an important way to determine their functional conservation (Tian et al. 2020). The transcriptomic data of multiple tissues in wheat deposited in the expression Visualization and Integration Platform (expVIP, http://www.wheat-expression.com) was downloaded to explore the tissue expression characteristics of candidate genes (Borrill et al. 2016), which including the expression data of 18 tissues during the whole growth period of wheat (Ramírez-González et al. 2018). The recently reported complete transcriptome data including endosperm, embryo and seed coat were used to analyze the expression patterns of candidate genes during grain development (Xiang et al. 2019). Expression levels of candidate genes were evaluated by transcripts per million (TPM) values and displayed using the heat map of log2 (TPM + 1). Additionally, STRING database (search tool for the retrival of interacting genes/proteins, https://string-db.org) were used to predict the protein–protein interaction (PPI).

Plant materials

To verify the contribution of candidate genes to yield and yield-related traits, 94 wheat accessions containing 3 foreign materials and 91 accessions from 3 major winter production regions in China were planted in field during three winter cropping seasons (October to early June of 2016–2017, 2017–2018 and 2018–2019)(Table S7), on the experimental farm of the Institute of Water Saving Agriculture in Arid Areas of China, Northwest A&F University, Yangling, Shaanxi, China (34°7’N, 108°4’E). The detailed field trials were as described in Yang et al. (2020).

Yield-related traits measurement

After harvest, the sun-dried grains were used for measuring thousand grain weight, grain yield per square meter, and the grain size traits including grain length and grain width were measured by image analysis provided with SC-E software (Hangzhou WanShen Detection Technology Co., Ltd., Hangzhou, China). All trait measurements were repeated at least 3 times.

Dominant haplotype analysis and molecular marker development

Based on the variations revealed by genotyping with the Affymetrix wheat 660 K SNP array, the polymorphism SNP loci on the candidate genes were searched (Sun et al. 2020). The CAPS marker of TraesCS4A02G460100 was designed with the SNP primer design service on Triticeae Multi-omics Center (http://202.194.139.32) as Hha I-F/R (Hha I-F: TCTGAATGCAGGCTGACAAG; Hha I-R: AAACAAGGAACGATGGCAAC). Genotyping of these wheat accessions was performed by one round of PCR and direct enzyme digestion of the PCR product. The PCR cycling conditions were as an initial denaturation of 2 min at 94 °C, followed by 37 cycles of denaturation at 94 °C for 30 s, annealing at 60 °C for 30 s, extension at 72 °C for 10 s, and a final extension at 72 °C for 25 min. After three hours of digestion with Hha I enzyme, the products were separated on 2% agarose gels, and DM2000 DNA marker (CoWin Biosciences Co., Ltd., Taizhou, China) was used to determine the fragment size.

Results

Characteristics of yield-related QTL studies in wheat

The characteristics of these 119 previous QTL studies were systematically analyzed (Table S1). These QTL studies based on bi-parental populations were mainly published on 2006 to 2015, while relatively few before 2005 and after 2015, which is closely related to the development of genotyping technology (Fig. 1a). Among the 130 mapping populations used in the 119 studies, 119 (91.54%) were permanent populations, including 85 recombinant inbred line (RIL) populations and 34 DH (doubled haploid) populations, respectively (Fig. 1b, Table S1), as these lines of the permanent mapping populations were genetically stable and could be used for phenotyping the yield and yield-related traits for years under different environment conditions.

Fig. 1
figure 1

The information of QTL for wheat yield and yield-related traits in previous QTL mapping studies used for meta-QTL analysis. a The time distribution of previous QTL mapping studies. b The population type of pervious QTL mapping studies. c The proportion of QTL for different yield and yield-related traits. d The distribution of QTL on chromosomes. The QTL of yield-related traits and development stage were marked in (c) and (d)

A total of 2230 QTL for yield and yield-related traits in wheat were found from the 130 populations of these independent studies, including 2027 QTL (more than 80%) directly related to yield-related traits, and 203 QTL for growth period-related traits, which contribute to yield indirectly (Table S2). Many traits represent the same or similar trait but in different methods, such as 1000 grain weight, 50 grain weight and 200 grain weight for grain weight, while yield per plant, yield per square meter and yield per tiller for yield. After manual screening, these traits were mainly divided into 12 traits, including 9 yield-related traits (Grain weight, GW; Grain number, GN; Grain yield, GY; Tiller number, TN; Spike length, SL; Spikelet number, SLN; Grain filling rate, GFR; Biomass, BY; Harvest index, HI) and 3 traits of growth period (Days to maturity, DTM; Days to heading, DTH; Flowering date, FD) and some other traits (including spike compactness, threshing, etc.). The QTL of GW, GN, TN and GY accounted for 63% of all the initial QTL, which was closely related to their roles as important components of grain yield (Fig. 1c). Then, the QTL of SL, SLN and GFR also accounted for a large proportion, as they were important factors in determining wheat grain yield. The distribution of QTL on chromosomes was not even, with about 78.03% (1740/2230) on A and B sub-genomes. Chromosome 5A, 2D and 7A contained 187, 165 and 136 QTL, respectively, accounting for 21.88% (488 / 2230) of the total, while chromosome 1D only found 38 QTL (Fig. 1d, Table S2).

Construction of a high-density consensus genetic map

After combining the seven widely used genetic maps with R package LRmerge, a reference genetic map including SSR, DArT, SNP and a few genes was obtained for downstream meta-QTL analysis. Then, 96 individual genetic maps were projected onto the reference map, and finally a high-quality consensus map was constructed, which contained 572, 862 markers with a total length of 4567.2 cM, and average length of each chromosome of 217.49 cM, which was consistent with that by Venske et al. (2019) (Fig. 2). These markers were unevenly distributed on chromosomes, and chromosome 2B contained the most 47,062 markers and constituted the longest linkage group of 316.13 cM. The marker density at the fore-end of chromosome was significantly higher than that at the end. This was mainly due to the independent genetic map used to construct the consensus map was composed of different numbers and types of markers, but overall, this was the best consensus map that could be built with a lot of marker information.

Fig. 2
figure 2

Marker distribution on the consensus genetic map used for meta-QTL analysis. From red to green, the marker density in chromosome, the genetic length of chromosome, and the number of markers in chromosome decreases from high to low (color figure online)

Identification of meta-QTL of yield and yield-related traits

Here, 2230 initial QTL from 119 independent QTL studies were mapped to the consensus map (Fig. 1d). After meta-QTL analysis, these initial QTL were constituted into 145 MQTL (6.5%, 145/2230), and each MQTL contained at least two initial QTL (Table S4). Of which, 96.55% (140/145) of MQTL were composed of three or more QTL, and 44.83% (65/145) of MQTL were composed of 11 to 50 QTL (Fig. 3a). Six MQTL contained more than 50 QTL, including MQTL-5A-2 (52), MQTL-5A-3 (57), MQTL-2A-4 (60), MQTL-1B-3 (60), MQTL-4B-2 (69) and MQTL-2D-2 (82) (Table S4). These MQTL composed of QTL identified from different bi-parental populations were more reliable and stable for wheat yield improvement. All 145 MQTL were distributed unevenly on different chromosomes. Chromosome 5B, 7B and 7D contained 9 MQTL each, while only 4 MQTL on chromosome 3D (Fig. 3b). The distribution of MQTL on chromosome was not consistent with that of initial QTL (Fig. 1d). To evaluate the reliability of MQTL on different chromosomes, the average number of initial QTL contained in MQTL on each chromosome was calculated (Fig. 3b). Although the number of MQTL on chromosomes 2B, 2D and 5A was not high, they contained more QTL from different populations, indicating that they may have more extensive adaptability in wheat yield improvement. The average confidence interval (CI, 95%) of MQTL was 2.92-fold less than that of initial QTL, and there were significant differences among different chromosomes (Fig. 3c). The average CI of MQTL on chromosomes 4B and 5A decreased by 6.46 and 6.47 times, respectively, followed by 5.35 and 4.66 times on chromosomes 2D and 2A.

Fig. 3
figure 3

Basic information of MQTL obtained in meta-QTL analysis. a Number of MQTL harboring different number of QTL. b The number of MQTL and average number of initial QTL projected on a single MQTL in different chromosomes. From red to green, the number of QTL contained in MQTL decreases from high to low. c The reduction degree of QTL confidence interval (CI, 95%) after meta-QTL analysis. The orange and gray bars represent the average CI length (cM) of MQTL and initial QTL on chromosome, respectively, and the broken line represents the reduction folds of the QTL CI length (color figure online)

All MQTL were associated with at least two different yield-related traits due to the multi-gene and multi-trait effects on yield formation (Table S4, Table S5). Among the 145 MQTL, 130 MQTL contained QTL of GW, and 90 MQTL contained three or more QTL of GW. Similarly, 106 and 87 MQTL contained QTL of GN and TN, respectively. A total of 93 MQTL were directly related to GY, with 84 MQTL contained both QTL of GY and GW, 74 MQTL contained both QTL of GY and GN, 63 MQTL contained both QTL of GY and TN, 69 MQTL contained QTL of GY, GN and GW, and 53 MQTL contained QTL of all four traits. In addition, 49, 50, 28 and 25 MQTL of GY contained QTL of SL, SLN, DTH and GFR, respectively.

Verifying the MQTL by previous GWAS studies

To determine the reliability of meta-QTL analysis, GWAS results on yield and yield-related traits published in recent years were used to verify the MQTL (Table 1). Of the 145 MQTL, 142 were mapped to the physical map of wheat reference genome, and 112 MQTL were mapped into physical region less than 20 Mb, accounting for 77.24% of the total MQTL (Table S4). Considering the relatively long linkage disequilibrium decay distance of wheat (about 5 Mb), the MTAs obtained from GWAS near MQTL in 5 Mb physical region were considered to be co-located with MQTL. Eighty-nine of 142 MQTL were verified in at least one GWAS research (Fig. 4). Among them, 75, 47 and 15 MQTL were verified in GWAS with winter wheat, spring wheat populations and the mixed populations of spring wheat and winter wheat. In addition, 29 MQTL were verified in both GWAS researches with spring wheat and winter wheat. Eleven MQTL were detected at least 4 times in 10 GWAS researches, of them MQTL-1A-1 was detected 6 times, followed by MQTL-7A-1 with 5 times. It’s worth noting that some MQTL contained 30 or more initial QTL were detected many times in the GWAS researches, such as MQTL-2B-1, MQTL-2D-2 and MQTL-5A-3. Furthermore, multiple MQTL clusters or nested MQTL were observed, such as MQTL-2D-1 (2D: 2.69–8.98 Mb) & MQTL-2D-2 (2D: 5.44–10.32 Mb) and MQTL-2D-3 (2D: 24.98–28.76 Mb) & MQTL-2D-4 (2D: 28.88–36.42 Mb). Finally, 76 core MQTL verified in GWAS were screened out by excluding MQTL with few QTL (less than 3) and physical interval longer than 25 Mb (Fig. 5, Table 2). These core MQTL had good collinearity between physical map and genetic map, and they were all clustered at both ends of the chromosomes, which were the gene intensive regions.

Fig. 4
figure 4

Validation of MQTL by MTAs on wheat yield and yield-related traits from GWAS with 10 different natural populations. The innermost ring represents the clustering of QTL numbers that make up different MQTL. From red to blue, the number of initial QTL contained in the MQTL decreases from high to low. The color squares in the outer rings represent the co-location of MQTL with marker-trait association (MTA) obtained from GWAS with different natural populations (color figure online)

Fig. 5
figure 5

The chromosome distribution of the core MQTL for yield and yield-related traits by integration of meta-QTL and GWAS. The circles from inside to outside represent the genetic map, R2 values of initial QTL, the distribution of high-confidence genes and the physical map, respectively

Table 2 The 76 core meta-QTL for yield and yield-related traits in wheat

Some chromosome regions were identified in multiple studies to be associated with specific traits. As the key determinants of yield formation and their complex interrelationships, many MQTL show influence on the three yield component traits of grain number, grain weight and tiller number, such as MQTL-1B-2, MQTL-1B-3, MQTL-3A-4, MQTL-3A-5, MQTL-5A-2, MQLT-5A-3 and MQTL-5A-7. In general, the MQTL affecting the three traits of yield component were mainly concentrated in the terminal regions of chromosomes 1B, 3A and 5A. While the MQTL affecting spike length and spikelet number were mainly concentrated in the terminal regions of chromosomes 1B, 2D, 4A and 5A, and the MQTL affecting days to heading and flowering date were distributed on chromosomes 2B, 3B, 5A, 6B and 7A (Fig. 6).

Fig. 6
figure 6

Distribution of 76 core MQTL on chromosomes affecting multiple traits. Different yield-related traits are marked with different colored squares. The axis on the left represents the physical distance (MB) (color figure online)

Homology-based candidate gene mining within MQTL regions

Many cloned important genes related to wheat yield were found in MQTL regions, including two copies of TaPpd in MQTL-2A-5 and MQTL-2D-4 (Beales et al. 2007), TaVrn1 in MQTL-5A-3 (Yan et al. 2003), TaVrn2 in MQTL-5A-5 (Yan et al. 2004), TaVrn3 in MQTL-7B-2 (Yan et al. 2006), TaRht-B1(Rht1) in MQTL-4B-4 (Peng et al. 1999). All the MQTL where these well-known genes located significantly affected the grain weight and grain number (Table 2). In addition to affecting grain weight and grain number, the three MQTL including TaVrn1, TaVrn2 and TaVrn3 were also related to spikelet number, heading date and flowering date. TaRht1 was co-located with QTL of grain weight, grain number and tiller number, finally contributed to wheat yield. In addition, multiple genes related to grain weight were found in the MQTL regions, such as TaGS-D1 in MQTL-7D-1 (Zhang et al. 2014), TaCKX2 in MQTL-3A-3 (Zhang et al. 2011), TaTGW6 in MQTL-3B-7 (Hanif et al. 2016), TaCWI in MQTL-4A-2 (Jiang et al. 2015), TaGS in MQTL-4A-8 (Bernard et al. 2008), TaCWI in MQTL-5B-7 (Jiang et al. 2015) and TaGS1a in MQTL-6A-4 (Guo et al. 2013). Furthermore, some homologues of cloned yield-related genes were also found in the MQTL, such as the A sub-genome copy of TaGS and TaNAM-B1 in MQTL-7A-1 and MQTL-6A-3, respectively.

To further explore the candidate genes affecting wheat yield and yield-related traits, a detailed search on the cloned genes in rice was conducted, and 398 functional genes affecting yield-related traits in rice were obtained. Base on BLASTP, the orthologous wheat genes of these rice genes affecting yield were obtained. Among them, 237 genes were found in 115 MQTL regions, with an average of 2 per MQTL (Table S6). The candidate genes in 97 MQTL regions had similar effects on the yield and yield-related traits of both wheat and rice. For example, TraesCS1A02G045300 (MQTL-1A-3) and OsMKP1 affected GW, SLN and GN; TraesCS4A02G388400 (MQTL-4A-6) and OsFIE1 affected GW and GN, and TraesCS1A02G031200 (MQTL-1A-2) and its homologous gene affected TN (Table S6). It suggested that the functions of these candidate genes were relatively conserved in rice and wheat.

These genes have been reported to affect yield and yield-related traits in rice through a variety of pathways, such as regulating the content and sensitivity of multiple plant regulators, regulating photoperiod response, affecting photosynthesis, nitrogen use efficiency and flower organ formation. For example, OsGA20ox1, the orthologous of TraesCS4A02G319100 (MQTL-4A-2) and TraesCS5B02G560300 (MQTL-5B-7), affected GN and GW in rice by regulating gibberellin (GA) content (Wu et al. 2016). A MADS-box transcription factor gene OsMADS50, the orthologous of TraesCS4D02G341700 (MQTL-4D-5) and TraesCS5A02G515500 (MQTL-5A-7), regulated rice yield by affecting flowering time and tiller number (Ryu et al. 2010). A nitrate reductase gene OsNR2, the orthologous of TraesCS6A02G326200 (MQTL-6A-5) affected rice yield by regulating nitrate uptake and nitrogen use efficiency (Gao et al. 2019). In general, these candidate genes found were with high confidence, as the functions of their orthologous on affecting yield traits in rice have been investigated intensively.

The expression characteristics of these candidate genes in several tissues during the critical stage of yield formation were further analyzed, and their expression patterns could be divided into two classes (Fig. 7, Fig. S1). Genes in Class I were mostly expressed in the stem and root tissues at the tillering stage, while genes in Class II were mainly expressed in the spike and spike organs. Genes in Class I mostly affected TN, such as TraesCS1A02G091300 (MQTL-1A-4), TraesCS5A02G516000 (MQTL-5A-7) and TraesCS5A02G000200 (MQTL-5A-1); while some of them were also highly expressed in flower organs and developing grains, and affected grain number and grain weight, such as TraesCS1B02G059100 (MQTL-1B-3), TraesCS5A02G000200 (MQTL-5A-1), etc. Most of the genes in Class II had effects on the spike traits of GN and SLN, such as TraesCS3A02G377600 (MQTL-3A-4), TraesCS5A02G511300 (MQTL-5A-7) and TraesCS1B02G069000 (MQTL-1B-3), and some of them were highly expressed in developing grains, which directly affect grain weight, such as TraesCS6A02G287300 (MQTL-6A-4) and TraesCS4A02G388400 (MQTL-4A-6) (Table S6, Fig. S1). Although most of these genes have multiple effects on yield-related traits, some representative candidate genes that have a greater impact on a few important yield-related traits are listed in Table 3.

Fig. 7
figure 7

Expression characteristics of 237 candidate genes in 18 tissues. All transcriptome data was downloaded from expVIP (http://www.wheat-expression.com), and Log2 (TPM + 1) value was used to characterize the expression level. From sky blue to pink, the expression level increases from low to high. The detailed IDs of the candidate genes are shown in Fig. S1 (color figure online)

Table 3 Representative candidate genes with significant contribution to yield-related traits of wheat

A novel candidate gene affecting grain weight by regulating grain size

The association analysis found that SNPs of a novel candidate gene encoding Cytochrome P450 (TraesCS4A02G460100) in MQTL-4A-8 contributed to yield and yield-related traits (Unpublished data). Therefore, a CAPS marker was designed on this A/G locus on its 3’UTR region. The 257 bp PCR product with the G allele could be cut into two fragments of 169 and 88 bp by Hha I, while the PCR product containing the A allele couldn’t. Results showed that this CAPS marker could accurately distinguish this SNP alleles (A/G), and the PCR product and digested fragments were consistent with expectations (Fig. 8c). Therefore, two haplotypes, named Hap 1-A and Hap 2-G of TraesCS4A02G460100 among the wheat accessions could be revealed. The accessions with Hap 2-G had significantly higher grain width, grain length and thousand grain weight than these with Hap 1-A, especially for the grain width, which was extremely significant in all three environments (Fig. 8a, b).

Fig. 8
figure 8

Functional verification of a novel candidate gene (TraesCS4A02G460100) in MQTL-4A-8 affecting grain size and grain weight. a Grain morphology of wheat accessions with different haplotypes. From left to right are ‘Lassit’ (Hap 1-A) and ‘Luomai 21’ (Hap 2-G), respectively. b The differences in grain width and thousand grain weight between the two haplotypes of 94 wheat accessions. E1, E2 and E3 represent the three environments of 2016–2017, 2017–2018 and 2018–2019, respectively. *, ** and *** represent the significant levels of P < 0.05, P < 0.01 and P < 0.001, respectively. c The genotyping results of wheat accessions ‘Lassit’ and ‘Luomai 21’ using a CAPS marker for TraesCS4A02G460100. d The protein–protein interaction of TraesCS4A02G460100 predicted by STRING. e The tissue expression patterns of TraesCS4A02G460100 and its interacting genes. Wheat tissue expression data were downloaded from the wheat-expression database (http://www.wheat-expression.com), and the heat map was displayed using log2 (TPM + 1). SA: shoot apical meristem, RA: root apical meristem, SP: spike, LF: leaf, AN: anther, SO: stigma & ovary, GR: grain, EN: endosperm, EM: embryo proper. The expression data of separate tissues in developing grains were obtained from Xiang et al. (2019). TEN: transition stage endosperm (7 day after flowering), LESC: leaf early stage seed coat (10 day after flowering), LLEN: leaf late stage endosperm (20 day after flowering)

Based on STRING service, the PPI (protein–protein interaction) analysis showed that TraesCS4A02G460100 interacted directly with 10 genes involved in GA synthesis including KOs, KO-likes and GA20oxs (Fig. 8d). These genes including KO-like-2B.1, KO-7D, KO-7A, GA20ox2-3B and TraesCS4A02G460100 were specifically and highly expressed in developing grain and were grouped into one category (Fig. 8e). Further sequence alignment confirmed that TraesCS4A02G460100 (TaKAO-4A) was a copy of wheat KAO genes on chromosome 4A, which encoded an ent-kaurenoic acid oxidase that catalyzed ent-Kaurene to produce GA precursors GA12 on the upstream of the GA synthesis pathway (Pearce et al. 2015). To verify the role of TaKAO-4A in wheat grain development, the expression patterns of these genes involved in GA biosynthesis and signal transduction during grain development were further analyzed using a set of systematic transcriptome data during grain development (Xiang et al. 2019). The GA biosynthetic genes, such as TaKAO-4A, TaKO-7A and GA20ox2-3B, were highly expressed in endosperm, while TaSYP-6A and TaGID2-3D, which involved in GA perception and signaling, were expressed in seed coat at higher levels (Fig. 8e). All of these suggested that TraesCS4A02G460100 (TaKAO-4A) played an important role in the GA biosynthesis to regulate grain size and thus affected grain weight. Interestingly, the TaGA20ox1-4A was also found in our MQTL (MQTL-4A-2) and was regarded as one of the core candidate genes, which verified the reliability of our meta-QTL, GWAS and homology alignment integration strategy in screening important candidate genes.

Discussion

Characteristics of QTL and MQTL associated with wheat yield

In recent 20 years, a large number of QTL mapping data for wheat yield and yield-related traits provided convenience for revealing the genetic basis of wheat yield formation (Quraishi et al. 2017). In this study, a total of 2027 QTL related to yield and yield-related traits and 203 QTL related to growth period from 119 independent studies were used for meta-QTL analysis. Much more initial QTL than previous studies were used for meta-QTL analysis to ensure more comprehensive and accurate anchoring of genetic loci (Zhang et al. 2010; Quraishi et al. 2017; Liu et al. 2020a, b). These initial QTL were unevenly distributed on chromosomes, and more QTL were found in A and B sub-genomes, which was consistent with previous studies (Zhang et al. 2010).

Meta-QTL analysis can eliminate the influence of genetic background, population type and planting environment on QTL, and effectively integrate QTL data in different backgrounds (Welcker et al. 2011). The number of initial QTL used for meta-QTL analysis was significantly and positively correlated with the accuracy of the statistical results. The more initial QTL were used, the better the results of meta-analysis were (Quraishi et al. 2017). In this study, 44.83% of MQTL were composed of more than 11 initial QTL, and 6 of them were composed of more than 50 QTL, which was much higher than the previous studies (Zhang et al. 2010; Quraishi et al. 2017; Liu et al. 2020a, b). The distribution of MQTL and initial QTL on different chromosomes was obviously inconsistent, which was mainly due to the different number of initial QTL contained in MQTL. MQTL containing initial QTL identified from different bi-parental populations were more reliable and stable for wheat yield improvement. In addition, based on the physical location, the consistency of the previous meta-QTL analysis with this study were compared in detail and showed that 17 of 18 MQTL of grain yield previously discovered were identified in this study, and 15 of them were classified as core MQTL, which all confirmed the reliability of this meta-QTL analysis (Quraishi et al. 2017). All these core MQTL could lay the foundation for further cloning and functional studies of those wheat genes and their utilization in wheat yield improvement.

Another advantage of meta-QTL analysis is that it can effectively reduce the confidence interval (CI) of QTL by aggregating QTL information from different genetic backgrounds, thus reducing the difficulty of transferring and aggregating important QTL regions in wheat breeding, and improving the accuracy of candidate gene prediction (Liu et al. 2020a, b). The CI of MQTL was 2.92 times narrower than that of initial QTL, better than 2.44 (12.7 cM / 5.2 cM) of Liu et al. (2020a, b). Interestingly, the larger the number of initial QTL contained in a MQTL, the greater the reduction of CI, which indicated that large-scale meta-QTL analysis could effectively reduce the CI of QTL, especially when multiple QTL from different studies were located at similar positions. There were 69 MQTL for GY, GW and GN, and 52 MQTL for GY, GW, GN and TN, which confirmed the significant effects of GW, GN and TN on GY, and indicated that there might be important candidate genes that could comprehensively improve yield by adjusting the three yield factors in these regions.

Validation of MQTL in GWAS of different natural populations

Compared with QTL mapping, genome-wide association study (GWAS) based on high-throughput sequencing or array technology is another high-precision method for identifying genomic regions of quantitative traits (Yang et al. 2020). Here, the GWAS results were used to verify the meta-QTL results for the first time. More than 60% of MQTL (62.68%, 89/142) were co-located with MTAs from GWAS, which indicated that the impact of these genomic regions on yield may be less limited by genetic background. Furthermore, the contribution of wheat genomic regions to yield varied greatly with the environment. Therefore, breeding strategies vary according to the environment. There were 47 and 75 MQTL verified by GWAS studies of spring wheat and winter wheat populations, respectively. These different MQTL regions may be more effective in improving yield for corresponding wheat regions and can be used as an important target of wheat breeding in these different wheat planting areas. MQTL were mainly distributed in the gene rich regions of chromosomes, which was consistent with the study in Rice (Khahani et al. 2020). In addition, the comparison of the core MQTL identified in this study with that in the two recent important GWAS studies based on wheat materials collected from China and other regions of the world (including ICARDA in Syria, CIMMYT in Mexico and AWCC in Australia), revealed a large number of MQTL were co-located with those GWAS results (Li et al. 2019; Ogbonnaya et al. 2017). More than 40% (31/76) of MQTL were verified in these two studies, which confirmed that the selected 10 GWAS researches were widely representative and diverse. The identification of these MQTL provided a basis for accurately mining candidate genes affecting yield (Veyrieras et al. 2007).

Candidate genes in MQTL and their roles in yield formation

Several well-known important genes, including TaPpd (Beales et al. 2007), Tavrn1 to Tavrn 3 (Yan et al. 2003, 2004, 2006), TaRht1 (Peng et al. 1999), TaGS (Bernard et al. 2008), etc., have been identified accurately in MQTL. In addition to affecting grain weight and grain number, the three MQTL including TaVrn1, TaVrn2 and TaVrn3 were also related to spikelet number, heading date and flowering date. This confirmed that these genes regulated the development of young spikes and grain filling by affecting the process of wheat heading and flowering, and finally affected the grain yield. Earlier study confirmed that materials containing TaRht1 showed an increase in grain number per spike and a decrease in thousand grain weight (Flintham et al. 1997). In this study, TaRht1 was found to be co-located with QTL of grain weight, grain number and tiller number. Another dwarf gene, TaRht12, showed an increase in grain number per spike and effective tiller number, and a decrease in thousand grain weight (Chen et al. 2013). These two Rht genes affect the GA signal transduction and biosynthesis, respectively. Both the GA biosynthesis defective and GA signaling defective mutants show the phenotype of increased tillers, and the proper use of GA biosynthesis inhibitor Paclobutrazol (PAC) in wheat can increase tiller number (Lo et al. 2008; Silverstone et al. 1997; Assuero et al. 2012). All these confirmed that TaRht1 affects wheat yield components by regulating GA biosynthesis, thereby affecting wheat yield.

Considering the close evolutionary relationship between the genomes of Gramineae species (Gaut 2002), the analysis of homology relationship between wheat and model crop rice could broaden our understanding of genes in wheat. Meanwhile, the functional studies of a large number of genes in rice provided great convenience for the study of related crops including wheat (Yang et al. 2020). In addition, several important genes affecting rice yield have been confirmed to have similar functions in wheat, such as TaGS-D1, TaCKX2, TaTGW6, TaCWI, etc., which indicate that it is feasible to screen important candidate genes based on interspecific homology analysis (Zhang et al. 2011, 2014; Hanif et al. 2016; Jiang et al. 2015). Here, 237 candidate genes homologous to yield-related genes in rice were found within the MQTL intervals, most of them affected the same traits in wheat and rice (Table S6). The functions of these genes were relatively conservative in rice and wheat and could be used as primary gene resources for gene manipulation and directional improvement of wheat yield-related traits.

Some genes in rice have been proved to affect yield-related traits such as TN, GN and branch number per spike by regulating the sensitivity or content of plant growth regulators such as GA, IAA, brassinolide (BR) and cytokinin, thus affecting the final yield. Thirty-five orthologous candidate genes of these genes in wheat were found in the MQTL regions determining the corresponding traits. For example, TraesCS4A02G319100 of MQTL-4A-2 (OsGA20ox1) (Wu et al. 2016) and TraesCS3D02G106100 of MQTL-3D-3 (OsD2) (Liu et al. 2016) affected GN and GW by regulating IAA content, GA content and BR content, respectively. Some gene deletion mutants in rice showed decreased photosynthetic capacity and inhibited chlorophyll synthesis. The orthologues of these rice genes in wheat were also found in MQTL, such as TraesCS4A02G388400 of MQTL-4A-6 (OsFIE1) (Cheng et al. 2019) and TraesCS4A02G010000 of MQTL-4A-1 (OsGUDK) (Ramegowda et al. 2014). Some candidate genes were found to regulate TN, GW, GN and other yield-related traits by regulating plant nitrogen transport and utilization, such as TraesCS6A02G326200 of MQTL-6A-5 (OsNR2) and TraesCS6D02G020700 of MQTL-6D-1 (OsNRT2) (Gao et al. 2019; Fan et al. 2016). The orthologous genes of CKI and Hd3a were found in MQTL, which was proved to be related to the flowering and heading dates (Kwon et al. 2015; Galbiati et al. 2016). In addition, several orthologous genes related to grain size were also found in the MQTL region. A recent review showed that the genes affecting wheat yield were mainly concentrated in five aspects, including transcription factors that affect spike development, genes involved in signal transduction of growth regulators, genes involved in cell division and proliferation, flower regulators that affect the structure of inflorescence and genes involved in carbohydrate metabolism (Nadolska-Orczyk et al. 2017). All five types of candidate genes were found in MQTL, and their functions in rice have been confirmed.

Previous studies have shown that GA can directly regulate grain development (Tiwari et al. 2011). In this study, candidate genes involved in GA synthesis pathway including TaKAO-4A (TraesCS4A02G460100) and TaGA20ox1-4A (TraesCS4A02G319100) were found in the MQTL regions. A CAPS marker was developed on the TaKAO-4A gene, and its two haplotypes showed significant differences in grain width and grain weight in a three-year field trial of 94 wheat accessions. As an important upstream gene of GA synthesis, TaKAO-4A plays a key role in regulating GA content (Pearce et al. 2015). Additionally, the expression patterns of GA biosynthetic and signaling genes in separate tissues of the developing grain revealed that GA biosynthetic genes, such as TaKAO-4A, TaGA20ox2-3B and TaKO-7A, were mainly high-expressed in endosperm, while GA signaling gene TaGID2-3D was predominantly expressed in seed coat. TaGID2-3D, which encoded an important component GID2 of SCFGID2, was the key gene for the ubiquitination degradation of DALLA and the initiation of GA reaction. All of those indicated that the main synthesis site of GA in developing grains was in endosperm, while the signal transduction mainly occurs in outer layer, which implied the transportation of GA in inner and outer tissues of grains. Considering that this period is one of the rapid horizontal expansion stages of developing grain, it is obvious that bioactive GA may avoid limiting endosperm growth by promoting the cell expansion of seed coat. In addition, the Rht1 dwarf mutant also exhibited reduced sensitivity to GA and reduced grain size (Flintham et al. 1997). TaGW2-6A, another important gene that affected grain size, had also been reported to regulate grain size through the GA synthesis pathway (Li et al. 2017). All these proved that GA content was important to regulate grain size. In general, the contribution of TaKAO-4A to grain size, especially grain width, was verified in natural populations, and a convenient and efficient CAPS marker was developed, which could be directly used in wheat molecular marker-assisted breeding.

Finally, based on their orthologues’ functions in rice, expression patterns and existing knowledge of these candidate genes, a schematic diagram of the major candidate genes affected wheat yield formation were preliminarily drawn (Fig. 9). Similar to the summary of Nadolska-Orczyk et al. (2017), the candidate genes that play a role in photoperiod response, grain development, multiple plant growth regulator pathways, carbon and nitrogen metabolism and spike and flower organ development were all found in the MQTL intervals. The results showed that photoperiod genes were mainly expressed in flower organs, affecting TN and grain filling by regulating multiple key growth stages; grain development genes were highly expressed in the process of grain development, affecting grain size, GN and grain filling; and the participation included GA, IAA, BR, JA and other growth regulator genes regulate plant development by regulating the response and sensitivity to different hormones, and have an impact on growth period, plant height, TN, GW, GN, etc., while many genes affecting carbon and nitrogen metabolism are mainly expressed in the main sources such as roots, stems, leaves and other transport organs to regulate nitrogen absorption and utilization efficiency, so as to increase the inflow of source and then affect the yield (Fig. 9). Finally, the genes on spike and flower organ formation were mainly expressed in spike development and floral organ, and affected the number of spikelets and grains by affecting the formation and fertility of spikelets. In general, based on homology alignment and expression pattern analysis, a large number of high-confidence candidate genes affecting wheat yield were found in MQTL region.

Fig. 9
figure 9

Contributions of yield-related traits to yield and possible factors affecting these traits. The inner circle represents the contributions of the 11 yield-related traits used in this study to yield. The size of each character circle represents the number of MQTL identified. The line length represents the number of MQTL co-located for each two traits and the shorter the line, the more MQTL co-located with the two traits. The line thickness represents the number of initial QTL contained in MQTL and the thicker the line, the more QTL the MQTL contains. The outer ring represents the main pathway or process affecting these yield-related traits. Several representative candidate genes within the MQTL regions and their tissue expression characteristics are marked near to each pathway. Wheat tissue expression data were downloaded from the wheat-expression database (http://www.wheat-expression.com), and the heat map was displayed using log2 (TPM + 1). SA: shoot apical meristem, RA: root apical meristem, SP: spike, LF: leaf, AN: anther, SO: stigma & ovary, GR: grain, EN: endosperm, EM: embryo proper. The wheat candidate genes are named according to their orthologous genes in rice. TaD2-3D: TraesCS3D02G106100, TaRLCK102-4A: TraesCS4A02G016800, TaTAD1-4D: TraesCS4D02G341200, TaPUP7-1D: TraesCS1D02G393900, TaMADS50-5A: TraesCS5A02G515500, TaCKI-5B: TraesCS5B02G433300, TaHd3a-7A: TraesCS7A02G115400, TaPho1-5A: TraesCS5A02G395200, TaFLO2-2A: TraesCS2A02G517100, TaNF-YB9-7D: TraesCS7D02G216600, TaNR2-6B-1: TraesCS6B02G024900, TaNR2-6B-2: TraesCS6A02G326200, TaNPF6.1-3B: TraesCS3B02G095900, TaGRF6-4B: TraesCS4B02G060000, TaFIE1-4A: TraesCS4A02G388400, TaHGW-7B: TraesCS7B02G018300, TaJar1-1A: TraesCS1A02G425100