Introduction

Wheat provides approximately 20% of the calories in the human diet worldwide, making it the most important crop for food and nutritional security (Shiferaw et al. 2013; Yadav et al. 2019). Global annual wheat production has risen steadily over the last five decades, reaching 750 million metric tons in 2020 (https://knoema.com/atlas/topics/Agriculture/Crops-Production-Quantity-tonnes/Wheat-production); consequently, no shortage of wheat has been experienced in recent years. However, to feed the estimated population of 9–10 billion in 2050, production must increase by ~ 50% over the next three decades (https://www.openaccessgovernment.org/demand-for-wheat/83189/) (Shiferaw et al. 2013; Yadav et al. 2019). This will require an annual growth of ~ 2% in production, which will be possible only through an increase in average yield, since there is little or no scope for expanding the area under wheat cultivation. Such an increase in yield will, in turn, require a more detailed understanding of the genetic architecture of grain yield and associated traits (Gupta et al. 2020).

Grain yield is widely known to be a complex quantitative trait controlled by a large number of QTLs/genes. Major yield-contributing traits include grain number, grain weight, grain morphology-related traits, tiller number, spike-related traits, harvest index, plant height, and heading date (Gupta et al. 2020; Hu et al. 2020). Markers associated with these traits have therefore often been recommended for marker-assisted selection (MAS) in wheat breeding programmes (Zhou et al. 2007), although their actual use has been minimal owing to the non-availability of major and robust QTLs and of closely linked or functional markers (Misztal 2006; Collard and Mackill 2008; Cobb et al. 2019). Meta-QTL analysis has been shown to provide more robust and reliable QTLs, including ‘QTL hotspots’ (Goffinet and Gerber 2000; Salvi and Tuberosa 2015), and its precision has been further improved through the development of new algorithms (Arcade et al. 2004; Veyrieras et al. 2007; de Oliveira et al. 2014).

Meta-QTLs (MQTLs) in wheat have already been identified for several traits, including (i) pre-harvest sprouting tolerance (Tyagi and Gupta 2012); (ii) resistance to a number of diseases, including Fusarium head blight (Venske et al. 2019; Saini et al. 2021a); (iii) tolerance to heat and drought stresses (Acuña Galindo et al. 2015; Kumar et al. 2020, 2021); and (iv) nitrogen use efficiency (NUE) and root system architecture (Saini et al. 2021b). The first report of meta-QTL analysis for a yield component trait (ear emergence time) appeared in 2007 (Hanocq et al. 2007). A number of studies have since been conducted for different yield-related traits, including grain size and shape (Gegas et al. 2010); yield, grain weight per spike, grain number, spikelets per spike, thousand-grain weight, and plant height (Zhang et al. 2010); yield, baking quality, and grain protein content (Quraishi et al. 2017); and tiller number (Bilgrami et al. 2020). Another recent meta-QTL study involved QTLs associated with spike-related traits, grain number, grain yield, grain weight, harvest index, grain filling rate, etc.; however, the data for this study were collected from only 24 studies conducted under irrigated, heat, and drought conditions (Liu et al. 2020). The continued discovery of additional QTLs and GWAS-based marker-trait associations (MTAs), together with the availability of improved algorithms, calls for periodic development of more robust MQTLs and ortho-MQTLs.

The present study is yet another effort to identify meta-QTLs for grain yield (GY) and the following related traits: (i) grain weight (GWei), (ii) grain morphology-related traits (GMRTs), (iii) grain number (GN), (iv) spike-related traits (SRTs), (v) plant height (PH), (vi) tiller number (TN), (vii) harvest index (HI), (viii) biomass yield (BY), (ix) days to heading/flowering/maturity (DTH/F/M), and (x) grain filling duration (GFD). For this purpose, we used data from 230 studies published during 1999–2020 (Table S1). The MQTLs identified were also compared with available results of genome-wide association studies (GWAS) and transcriptomics to further confirm their reliability and robustness. The MQTLs were further used to identify key candidate genes (CGs) that influence grain yield and its component traits in wheat. Ortho-MQTLs were also identified using synteny and collinearity between wheat genomic regions carrying MQTLs and the corresponding genomic regions of rice, maize, and barley (Kumar et al. 2009; Mayer et al. 2011; Hirsch et al. 2014). We believe that this comprehensive study should prove useful for molecular breeding as well as for basic research on the structural genes and regulatory elements (including fine mapping and cloning of QTLs) involved in grain yield and associated traits, both in wheat and in other cereals.

Materials and methods

Collection of data on QTLs

The literature on QTL mapping of grain yield and its component traits was collected from PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and Google Scholar (https://scholar.google.com/) using appropriate keywords. For each QTL, the following data were collected: (i) QTL name (wherever available); (ii) closely linked flanking markers; (iii) position of the peak and the associated confidence interval (CI); (iv) type and size of the mapping population used; (v) LOD score; and (vi) phenotypic variation explained (PVE) or R² value. In cases where the peak position was missing, the mid-point between the two flanking markers was treated as the peak. When an actual LOD score for an individual QTL was not available but a test statistic was given, the LOD score was calculated from that statistic; if no such information was available, a threshold LOD score of 3.0 was used.
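The two substitution rules above can be sketched as follows. This is a minimal illustration, not the authors' script; it assumes the reported test statistic is a likelihood-ratio test (LRT) statistic, for which the standard conversion is LOD = LRT / (2 ln 10) ≈ LRT / 4.605. The function names are hypothetical.

```python
import math
from typing import Optional

DEFAULT_LOD = 3.0  # threshold LOD used when neither a LOD nor a test statistic is reported


def peak_position(peak_cm: Optional[float], left_cm: float, right_cm: float) -> float:
    """Return the reported peak (cM), or the midpoint of the flanking markers if it is missing."""
    if peak_cm is not None:
        return peak_cm
    return (left_cm + right_cm) / 2.0


def lod_score(lod: Optional[float] = None, lrt: Optional[float] = None) -> float:
    """Return the reported LOD; otherwise convert an LRT statistic (assumed); otherwise the default."""
    if lod is not None:
        return lod
    if lrt is not None:
        # LRT = 2 * ln(10) * LOD, hence LOD = LRT / (2 * ln(10))
        return lrt / (2.0 * math.log(10.0))
    return DEFAULT_LOD


print(peak_position(None, 12.0, 18.0))  # 15.0
print(round(lod_score(lrt=23.0), 2))    # 4.99
print(lod_score())                      # 3.0
```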

In some cases, QTL names were not available; such QTLs were named following the standard nomenclature (the letter “Q” followed by the abbreviated name of the trait, the institute involved, and the chromosome involved). Different QTLs on the same chromosome were distinguished by Arabic numerals following the chromosome designation. Some QTLs were reported under related trait names that did not exactly match any of the names used in the present study; such QTLs were accommodated under the following six traits (the remaining four traits did not require this): (i) ‘grain weight (GWei)’ included thousand-grain weight, 50-grain weight, mean grain weight, hundred-grain weight, single grain weight, grain weight per plant, and test weight; (ii) ‘grain morphology-related traits (GMRTs)’ included grain length, grain width, grain length–width ratio, grain thickness, grain thickness–length ratio, grain area, grain diameter, grain volume/weight, etc.; (iii) ‘grain number (GN)’ included average grain number per spike, grain number per spike, grain number per square meter, grain number per spikelet, grains per spikelet, grains per fertile spikelet, etc.; (iv) ‘spike-related traits (SRTs)’ included spike length, spikes per plant, spikes per square meter, spike compactness, spike formation rate, spike layer uniformity, basal sterile spikelet number, top sterile spikelet number, fertile florets per spike, spikelets per spike, etc.; (v) ‘tiller number (TN)’ included effective tiller number, tiller number per plant, and tiller number per square meter; and (vi) ‘biomass yield (BY)’ included total biomass, tiller biomass, and plant biomass.
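The renaming and trait-grouping rules can be expressed as a lookup table plus a name builder. This sketch is illustrative only: the synonym table holds a hypothetical subset of the groupings listed above, and the separators in the generated name (a dot after the trait and a hyphen before the chromosome, giving e.g. a hypothetical QGw.abc-1A.2) follow the common wheat convention and are an assumption here.

```python
# Hypothetical subset of the trait synonym table described above.
TRAIT_GROUPS = {
    "thousand-grain weight": "GWei",
    "test weight": "GWei",
    "grain length": "GMRTs",
    "grain width": "GMRTs",
    "grain number per spike": "GN",
    "spike length": "SRTs",
    "spikelets per spike": "SRTs",
    "effective tiller number": "TN",
    "plant biomass": "BY",
}


def harmonize_trait(reported: str) -> str:
    """Map a reported trait name to its trait group; unknown names are returned unchanged."""
    cleaned = reported.strip()
    return TRAIT_GROUPS.get(cleaned.lower(), cleaned)


def qtl_name(trait_abbr: str, lab: str, chromosome: str, index: int) -> str:
    """Build 'Q' + trait + '.' + lab + '-' + chromosome + '.' + running number (assumed separators)."""
    return f"Q{trait_abbr}.{lab}-{chromosome}.{index}"


print(harmonize_trait("Thousand-grain weight"))  # GWei
print(qtl_name("Gw", "abc", "1A", 2))             # QGw.abc-1A.2
```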

Construction of consensus linkage map

A consensus map was developed using the following seven available linkage maps involving different types of markers, which have been widely used for QTL mapping: (i) ‘Wheat_Composite_2004’ with 4403 marker loci (involving SSR, RFLP, AFLP, and some gene-specific loci) available at the GrainGenes database (http://wheat.pw.usda.gov); (ii) ‘Wheat, Consensus SSR, 2004’ with 1235 marker loci (Somers et al. 2004); (iii) an integrated map for durum wheat with 3669 markers (Marone et al. 2013); and (iv–vii) four SNP maps developed using the following SNP arrays: ‘Illumina 9 K iSelect Beadchip Array’ (Cavanagh et al. 2013), ‘Illumina iSelect 90 K SNP Array’ (Wang et al. 2014), ‘Wheat 55 K SNP array’ (Winfield et al. 2016), and the ‘Axiom® Wheat 660 K SNP array’ (Cui et al. 2017). Marker information or maps from several other independent studies were also used for developing the consensus map.

The consensus map was developed using the R package LPMerge (Endelman and Plomion 2014) in two steps. In the first step, the number of consensus bins, the number of markers, and the initial number of ordinal conflicts among the component maps are calculated; where markers are inconsistently ordered, the package resolves the conflicts by removing a minimum set of ordinal constraints. In the second step, one to four candidate consensus maps are produced (K = 1 to 4, where K is the maximum interval size), and one of these can be selected using the associated statistics, namely the root-mean-square error (RMSE; summarized as mean and standard deviation, sd) between each individual map and the consensus map. The consensus map with the lowest RMSE was accepted as the final map for further analysis.
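The final selection step can be illustrated as follows. LPMerge itself computes and reports these statistics; the sketch below only mirrors the selection logic, assuming each map is a dictionary of marker positions in cM and that RMSE is taken over the markers each component map shares with the consensus map. The toy maps are hypothetical.

```python
import math
from statistics import mean, stdev


def rmse(consensus: dict, component: dict) -> float:
    """Root-mean-square difference (cM) over markers shared by both maps."""
    shared = [m for m in component if m in consensus]
    if not shared:
        raise ValueError("maps share no markers")
    return math.sqrt(sum((consensus[m] - component[m]) ** 2 for m in shared) / len(shared))


def summarize(consensus: dict, components: list) -> tuple:
    """Mean and sd of the RMSE between one consensus map and each component map."""
    errors = [rmse(consensus, c) for c in components]
    return mean(errors), stdev(errors)


def pick_best(candidates: dict, components: list):
    """candidates: K -> consensus map. Return the K with the lowest mean RMSE, plus all scores."""
    scores = {k: summarize(cmap, components) for k, cmap in candidates.items()}
    best_k = min(scores, key=lambda k: scores[k][0])
    return best_k, scores


components = [
    {"m1": 0.0, "m2": 10.0, "m3": 20.0},
    {"m1": 0.0, "m2": 12.0, "m3": 22.0},
]
candidates = {
    1: {"m1": 0.0, "m2": 11.0, "m3": 21.0},
    2: {"m1": 0.0, "m2": 15.0, "m3": 30.0},
}
best_k, scores = pick_best(candidates, components)
print(best_k)  # 1
```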

QTL projection and meta-QTL analysis

Only major QTLs, each explaining at least 10% of the phenotypic variation for the target trait, were projected onto the consensus map using BioMercator V4.2 (Sosnowski et al. 2012), based on the confidence interval (CI, 95%), peak position, LOD score, and PVE of each QTL. The CI for each QTL was estimated using the following population-specific equations: (i) for recombinant inbred lines (RILs): CI = 163/(population size × R²); (ii) for F2 and backcross populations: CI = 530/(population size × R²); and (iii) for doubled haploids (DHs): CI = 287/(population size × R²); here, 163, 530, and 287 are population-specific constants obtained from simulations (Darvasi and Soller 1997; Guo et al. 2006; Venske et al. 2019), and R² is the proportion of phenotypic variance explained by the QTL.
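The three CI equations share one form and differ only in the constant, so a single function can compute them. A minimal sketch, assuming R² is supplied as a proportion (0 to 1) rather than a percentage; the population-type labels are hypothetical.

```python
# Population-specific constants from the equations above.
CI_CONSTANTS = {"RIL": 163.0, "F2": 530.0, "BC": 530.0, "DH": 287.0}


def qtl_ci(pop_type: str, pop_size: int, r2: float) -> float:
    """95% confidence interval (cM) = constant / (population size x R2)."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("R2 must be a proportion in (0, 1]; divide percentages by 100")
    if pop_size <= 0:
        raise ValueError("population size must be positive")
    return CI_CONSTANTS[pop_type] / (pop_size * r2)


print(round(qtl_ci("RIL", 200, 0.15), 2))  # 5.43
print(round(qtl_ci("DH", 150, 0.20), 2))   # 9.57
```

Passing PVE as a percentage (e.g. 15 instead of 0.15) would shrink the CI a hundredfold, which is why the sketch rejects values above 1.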

Following projection, meta-QTL analysis was performed for each chromosome individually using the Veyrieras two-step algorithm available in BioMercator V4.2. In the first step, five model-selection criteria were computed: the Akaike information criterion (AIC), corrected AIC (AICc), AIC model 3 (AIC3), the Bayesian information criterion (BIC), and the average weight of evidence (AWE); the model with the lowest values for at least three of these five criteria was selected as the best meta-QTL model. In the second step, MQTLs were generated according to the best model. A genomic region with at least two QTLs was accepted as an MQTL. The LOD score and PVE of each MQTL were the means of the LOD and PVE values of the QTLs involved. Among the identified MQTLs, some promising MQTLs (termed breeder’s MQTLs) were selected using the following criteria: CI < 2 cM, PVE > 20%, and LOD > 14. We believe that the markers associated with breeder’s MQTLs are the most suitable for MAS.
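The model-selection rule and the breeder’s-MQTL filter can be sketched as below. This assumes the five criterion values for each candidate model have already been obtained (e.g., from BioMercator output); the criterion values and MQTL records shown are hypothetical, and requiring the winning model to minimise at least three of the five criteria is one reading of the rule stated above.

```python
from statistics import mean

CRITERIA = ("AIC", "AICc", "AIC3", "BIC", "AWE")


def best_model(scores: dict, min_votes: int = 3):
    """scores: model -> {criterion: value}. Return the model that minimises at least
    `min_votes` of the five criteria, or None if no model does."""
    votes = {model: 0 for model in scores}
    for crit in CRITERIA:
        winner = min(scores, key=lambda model: scores[model][crit])
        votes[winner] += 1
    top = max(votes, key=votes.get)
    return top if votes[top] >= min_votes else None


def summarize_mqtl(qtls: list) -> tuple:
    """MQTL LOD and PVE as the means of its member QTLs (at least two QTLs required)."""
    if len(qtls) < 2:
        raise ValueError("an MQTL needs at least two QTLs")
    return mean(q["lod"] for q in qtls), mean(q["pve"] for q in qtls)


def is_breeders_mqtl(ci_cm: float, pve: float, lod: float) -> bool:
    """Breeder's MQTL: CI < 2 cM, PVE > 20% and LOD > 14."""
    return ci_cm < 2.0 and pve > 20.0 and lod > 14.0


scores = {
    "3 MQTLs": {"AIC": 410.2, "AICc": 412.0, "AIC3": 418.5, "BIC": 440.1, "AWE": 470.3},
    "4 MQTLs": {"AIC": 401.7, "AICc": 405.9, "AIC3": 412.6, "BIC": 443.8, "AWE": 481.0},
}
print(best_model(scores))  # 4 MQTLs
```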

The physical position of each MQTL was obtained using the nucleotide sequences of the markers flanking it. Sequences of SSR, ISSR, STS, AFLP, and RAPD markers were retrieved from published reports or from databases such as GrainGenes (https://wheat.pw.usda.gov/GG3), while those of SNPs were obtained from CerealsDB (https://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/indexNEW.php) for array-based SNPs and from URGI (https://wheat-urgi.versailles.inra.fr/) for GBS-SNPs. These sequences were used for BLASTN searches against the wheat reference genome sequence (RefSeq v1.0) available at EnsemblPlants.
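Converting flanking-marker BLASTN hits into an MQTL physical interval can be sketched as follows. This assumes BLAST+ tabular output in the default 12-column layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), that the subject ID equals the chromosome name in the reference used, and that the best hit is the one with the highest bit score on the MQTL's chromosome; marker names and coordinates are hypothetical.

```python
def best_hits(blast_lines, chromosome: str) -> dict:
    """Return marker -> start coordinate of its highest-scoring hit on `chromosome`."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        marker, subject = fields[0], fields[1]
        sstart, send, bitscore = int(fields[8]), int(fields[9]), float(fields[11])
        if subject != chromosome:
            continue  # ignore hits on other (e.g., homoeologous) chromosomes
        if marker not in best or bitscore > best[marker][1]:
            best[marker] = (min(sstart, send), bitscore)  # handles minus-strand hits
    return {marker: pos for marker, (pos, _) in best.items()}


def mqtl_interval(left_marker: str, right_marker: str, hits: dict) -> tuple:
    """Physical interval (bp) spanned by the two flanking markers."""
    a, b = hits[left_marker], hits[right_marker]
    return min(a, b), max(a, b)


def tsv_row(*fields) -> str:
    return "\t".join(str(f) for f in fields)


lines = [
    tsv_row("mkrL", "2A", 100, 120, 0, 0, 1, 120, 35_200_100, 35_200_219, 1e-50, 222.0),
    tsv_row("mkrL", "2D", 97, 120, 3, 0, 1, 120, 30_100_000, 30_100_119, 1e-45, 200.0),
    tsv_row("mkrR", "2A", 99, 150, 1, 0, 1, 150, 41_750_300, 41_750_151, 1e-60, 270.0),
]
hits = best_hits(lines, "2A")
print(mqtl_interval("mkrL", "mkrR", hits))  # (35200100, 41750151)
```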

Determination of homoeology among MQTLs

Since wheat is a hexaploid with three related sub-genomes carrying sets of triplicate genes on homoeologous chromosomes, the MQTLs were also examined for homoeologous relationships, i.e., MQTLs occurring at approximately corresponding physical positions on homoeologous chromosomes. This analysis involved the following steps: (i) CGs identified from the MQTL regions were subjected to BLAST analysis against the wheat reference genome sequence available at EnsemblPlants to identify the corresponding homoeologues on chromosomes of the three sub-genomes; (ii) the homoeologues were extracted with their physical positions from the database; and (iii) the physical positions of these homoeologues were compared with the physical coordinates of the MQTLs. MQTLs located on homoeologous chromosomes of the three sub-genomes and carrying similar genes were accepted as homoeologous MQTLs. A circular map (circos) was constructed using the integrative toolkit TBtools (Chen et al. 2020b) to visualize the homoeologous relationships among the 21 wheat chromosomes based on MQTLs associated with grain yield and related traits.
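The comparison in step (iii) amounts to checking which MQTL intervals contain the members of each homoeologous gene set. The sketch below assumes the homoeologue positions have already been retrieved (steps i and ii) and counts, for each pair of MQTLs on different chromosomes, how many homoeologue sets they share; a pair on chromosomes of the same group (e.g., 6A and 6B) would indicate homoeologous MQTLs, while a pair across groups would indicate partial homoeology. All names and coordinates are hypothetical.

```python
from itertools import combinations


def containing_mqtls(chrom: str, pos: int, mqtls: dict) -> list:
    """Names of MQTLs whose physical interval contains position `pos` on `chrom`."""
    return [name for name, (c, start, end) in mqtls.items() if c == chrom and start <= pos <= end]


def shared_homoeologues(homoeologue_sets: list, mqtls: dict) -> dict:
    """Count, for each pair of MQTLs on different chromosomes, the homoeologue sets they share.
    Each set is a list of (chromosome, position) tuples for the copies of one gene."""
    pairs = {}
    for gene_set in homoeologue_sets:
        hit = set()
        for chrom, pos in gene_set:
            hit.update(containing_mqtls(chrom, pos, mqtls))
        for a, b in combinations(sorted(hit), 2):
            if mqtls[a][0] != mqtls[b][0]:
                pairs[(a, b)] = pairs.get((a, b), 0) + 1
    return pairs


def same_group(chrom_a: str, chrom_b: str) -> bool:
    """True for chromosomes of one homoeologous group, e.g. '6A' and '6B'."""
    return chrom_a[:-1] == chrom_b[:-1]


mqtls = {
    "MQTL6A.x": ("6A", 100, 200),
    "MQTL6B.y": ("6B", 150, 300),
    "MQTL2A.z": ("2A", 0, 50),
}
sets = [
    [("6A", 120), ("6B", 160), ("6D", 400)],
    [("6A", 180), ("6B", 250)],
    [("2A", 10), ("6B", 290)],
]
result = shared_homoeologues(sets, mqtls)
print(result)
```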

Comparison of MQTLs with GWAS-based MTAs

Marker-trait associations (MTAs) reported in 20 earlier GWA studies (published during 2017–2021) on grain yield and associated traits were also compared with the MQTLs detected in the present study. These 20 GWA studies used an equal number of association panels, each consisting exclusively of durum wheat, spring wheat, or winter wheat, or of a mix of spring and winter wheat. Details of the population size, SNP markers, traits, and MTAs reported in these GWA studies are available in Table S2. The physical positions of MTAs were obtained from the source papers or from databases (such as CerealsDB and the URGI-JBrowse wheat genome browser) and compared chromosome-wise with the physical coordinates of the MQTLs; an individual GWAS-MTA falling within the genomic region of one or more MQTLs was considered co-located.
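The co-location rule above is a simple interval-containment test. A minimal sketch, assuming MTA and MQTL coordinates refer to the same reference assembly; because overlapping MQTLs are allowed, one MTA may be counted for more than one MQTL. Identifiers and positions are hypothetical.

```python
from collections import defaultdict


def colocalize(mtas: list, mqtls: dict) -> dict:
    """mtas: list of (mta_id, chromosome, position_bp);
    mqtls: name -> (chromosome, start_bp, end_bp).
    Return MQTL name -> list of MTA ids falling inside its interval."""
    hits = defaultdict(list)
    for mta_id, chrom, pos in mtas:
        for name, (c, start, end) in mqtls.items():
            if c == chrom and start <= pos <= end:
                hits[name].append(mta_id)
    return dict(hits)


mqtls = {
    "MQTL1B.a": ("1B", 5_000_000, 9_000_000),
    "MQTL1B.b": ("1B", 8_000_000, 12_000_000),
}
mtas = [
    ("snp1", "1B", 6_500_000),
    ("snp2", "1B", 8_500_000),  # inside both overlapping MQTLs
    ("snp3", "1D", 8_500_000),  # different chromosome: not co-located
]
print(colocalize(mtas, mqtls))  # {'MQTL1B.a': ['snp1', 'snp2'], 'MQTL1B.b': ['snp2']}
```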

Identification of ortho-MQTLs

To identify ortho-MQTLs, information on MQTLs for corresponding traits in other cereals was retrieved from the following reports: (i) rice (Khahani et al. 2020), (ii) barley (Khahani et al. 2019), and (iii) maize (Chen et al. 2017; Wang et al. 2013, 2016; Zhao et al. 2018a, b). In each case, the set of wheat CGs underlying an individual MQTL (each showing an expression of at least two transcripts per million) was used to search for syntenic and collinear regions in the genomes of rice, barley, and maize. The following steps were involved: (i) CGs detected in the regions of the most stable and robust MQTLs were subjected to BLAST analysis against the rice, barley, and maize genome databases at EnsemblPlants to identify the corresponding orthologues; (ii) the orthologues from each species were obtained with their physical positions from the database; (iii) the physical positions of CGs underlying wheat MQTLs were compared with those of the orthologous genes underlying rice, barley, and maize MQTLs; and (iv) MQTLs harbouring similar genes at known syntenic genomic positions in wheat and the other cereals were accepted as ortho-MQTLs.
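Steps (i) to (iv) can be sketched as a positional lookup: keep wheat CGs above the expression threshold, map each to its orthologues, and record which MQTLs of each species contain those orthologues. This is a simplified illustration that assumes orthologue positions are already available; it checks only positional correspondence and does not test collinearity of gene order, which the synteny analysis described above also considers. All identifiers are hypothetical.

```python
def ortho_mqtls(wheat_cgs: dict, orthologues: dict, other_mqtls: dict, min_tpm: float = 2.0) -> dict:
    """wheat_cgs: wheat MQTL -> list of (gene_id, max_tpm)
    orthologues: wheat gene_id -> list of (species, chromosome, position)
    other_mqtls: species -> {mqtl_name: (chromosome, start, end)}
    Return wheat MQTL -> {species: set of matching MQTL names}."""
    result = {}
    for wheat_mqtl, genes in wheat_cgs.items():
        matches = {}
        for gene_id, tpm in genes:
            if tpm < min_tpm:
                continue  # expression filter (>= 2 TPM)
            for species, chrom, pos in orthologues.get(gene_id, []):
                for name, (c, start, end) in other_mqtls.get(species, {}).items():
                    if c == chrom and start <= pos <= end:
                        matches.setdefault(species, set()).add(name)
        if matches:
            result[wheat_mqtl] = matches
    return result


wheat_cgs = {"MQTL_w1": [("TaGeneA", 8.4), ("TaGeneB", 0.7)], "MQTL_w2": [("TaGeneC", 3.1)]}
orthologues = {
    "TaGeneA": [("rice", "4", 21_000_000), ("maize", "5", 180_000_000)],
    "TaGeneB": [("barley", "2H", 60_000_000)],
    "TaGeneC": [("rice", "9", 5_000_000)],
}
other_mqtls = {
    "rice": {"riceMQTL_a": ("4", 20_000_000, 23_000_000)},
    "maize": {"maizeMQTL_b": ("5", 175_000_000, 185_000_000)},
    "barley": {"barleyMQTL_c": ("2H", 55_000_000, 65_000_000)},
}
res = ortho_mqtls(wheat_cgs, orthologues, other_mqtls)
print(res)
```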

Candidate genes (CGs) and their expression analysis

MQTLs with a physical CI of ≤ 2 Mb were used directly for the identification of CGs. For the remaining MQTLs with longer CIs, only a 2 Mb genomic region (1 Mb on either side of the MQTL peak) was examined for the occurrence of CGs. Peak physical positions of the MQTLs were calculated using the following formula:

$$\text{peak position (bp)} = \text{start position (bp)} + \frac{\text{end position (bp)} - \text{start position (bp)}}{\text{end position (cM)} - \text{start position (cM)}} \times \frac{\text{CI (95\%)}}{2}$$
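The formula above, together with the 2 Mb rule, can be sketched as follows. A minimal illustration, assuming that start and end refer to the MQTL's CI boundaries in both cM and bp; the coordinates are hypothetical.

```python
def peak_bp(start_bp: int, end_bp: int, start_cm: float, end_cm: float, ci_cm: float) -> float:
    """Peak (bp) = start (bp) + [(end_bp - start_bp) / (end_cM - start_cM)] x CI / 2."""
    if end_cm == start_cm:
        raise ValueError("start and end genetic positions must differ")
    bp_per_cm = (end_bp - start_bp) / (end_cm - start_cm)
    return start_bp + bp_per_cm * ci_cm / 2.0


def cg_search_window(start_bp: int, end_bp: int, start_cm: float, end_cm: float,
                     ci_cm: float, max_len: int = 2_000_000) -> tuple:
    """Use the whole MQTL if it spans <= 2 Mb; otherwise 1 Mb on either side of the peak."""
    if end_bp - start_bp <= max_len:
        return start_bp, end_bp
    peak = int(round(peak_bp(start_bp, end_bp, start_cm, end_cm, ci_cm)))
    half = max_len // 2
    return max(0, peak - half), peak + half


print(peak_bp(10_000_000, 30_000_000, 50.0, 54.0, 4.0))           # 20000000.0
print(cg_search_window(10_000_000, 30_000_000, 50.0, 54.0, 4.0))  # (19000000, 21000000)
print(cg_search_window(5_000_000, 6_500_000, 12.0, 12.8, 0.8))    # (5000000, 6500000)
```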

The ‘BioMart’ tool of the EnsemblPlants database (https://plants.ensembl.org/index.html) was used to search for CGs and for gene ontology (GO) analysis of each CG obtained. The available annotation of each CG was used to select the best CGs within each MQTL. For this purpose, an in silico expression analysis of each CG was conducted using the ‘Wheat Expression Browser-expVIP’ (Expression Visualization and Integration Platform) (http://www.wheat-expression.com) (Ramírez-González et al. 2018). The expression datasets used included (i) ‘developmental time-course of Chinese Spring’ (Choulet et al. 2014); (ii) ‘transcriptomes of synthetic hexaploid wheat’ (Li et al. 2014); (iii) ‘grain tissue-specific developmental time-course’ (Gillies et al. 2012; Pearce et al. 2015; Pfeifer et al. 2014); (iv) ‘leaves and roots of Chinese Spring wheat at seedling stage’ (Clavijo et al. 2017); (v) ‘seven leaf stage’ (Ramírez-González et al. 2018); and (vi) ‘gene expression during flag leaf senescence’ (Borrill et al. 2019). Following the criterion suggested by Wagner et al. (2013), only CGs showing an expression of at least 2 transcripts per million (TPM) were considered. Heat maps of expression data were constructed using the software ‘Morpheus’ (https://software.broadinstitute.org/morpheus/).
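The 2 TPM cut-off can be applied to exported expression values as shown below. This sketch assumes a gene is retained if it reaches the threshold in at least one sample across the datasets listed above; the text does not specify whether the threshold applies to any single sample or to a summary value, and the gene IDs and values here are hypothetical.

```python
def expressed_cgs(tpm_table: dict, threshold: float = 2.0) -> list:
    """tpm_table: gene_id -> {sample_name: TPM}.
    Return sorted gene IDs whose TPM is >= threshold in at least one sample."""
    return sorted(
        gene for gene, samples in tpm_table.items()
        if samples and max(samples.values()) >= threshold
    )


tpm_table = {
    "geneA": {"grain_10dpa": 14.2, "root_seedling": 0.3},
    "geneB": {"grain_10dpa": 1.1, "flag_leaf": 1.9},
    "geneC": {"spike": 2.0},
    "geneD": {},
}
print(expressed_cgs(tpm_table))  # ['geneA', 'geneC']
```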

Known wheat genes within MQTLs

A search was also made for all known wheat genes for the traits used in the present study. Sequences of these known wheat genes were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/) using their accession IDs available in the corresponding studies. BLASTN searches were then carried out against the genomic database of wheat to find the physical positions of these genes in the genome. These physical coordinates of the genes were compared with MQTL genomic regions to identify genes that may correspond to individual MQTLs.

Known genes from other cereals and their homology with genes carried by wheat MQTLs

Information on rice, barley, and maize genes associated with grain yield and related traits was collected from the literature. Amino acid sequences of these genes were retrieved from NCBI and used for BLASTP searches to identify the corresponding wheat proteins (available in EnsemblPlants), using an E-value of < 10⁻¹⁰, ≥ 60% coverage, and > 60% identity as thresholds. Physical positions of the corresponding wheat genes were then compared with those of the wheat MQTLs to detect MQTL regions carrying homologues of known genes from other cereals.
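The BLASTP thresholds can be applied to tabular output as sketched below. This assumes BLAST+ was run with `-outfmt "6 qseqid sseqid pident evalue qcovs"` so that query coverage is reported on each line, and reads "60% coverage" as at least 60% query coverage; the protein IDs are hypothetical.

```python
def passes(pident: float, evalue: float, qcov: float,
           max_evalue: float = 1e-10, min_cov: float = 60.0, min_ident: float = 60.0) -> bool:
    """E-value < 1e-10, query coverage >= 60% (assumed reading) and identity > 60%."""
    return evalue < max_evalue and qcov >= min_cov and pident > min_ident


def wheat_homologues(blast_lines) -> dict:
    """Return query protein -> list of wheat proteins passing all three thresholds."""
    kept = {}
    for line in blast_lines:
        query, subject, pident, evalue, qcovs = line.rstrip("\n").split("\t")
        if passes(float(pident), float(evalue), float(qcovs)):
            kept.setdefault(query, []).append(subject)
    return kept


lines = [
    "OsGeneX\tTaProt1\t78.5\t1e-120\t92",
    "OsGeneX\tTaProt2\t55.0\t1e-80\t90",  # identity too low
    "ZmGeneY\tTaProt3\t71.2\t3e-08\t75",  # E-value too high
    "HvGeneZ\tTaProt4\t64.0\t2e-40\t58",  # coverage too low
]
print(wheat_homologues(lines))  # {'OsGeneX': ['TaProt1']}
```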

Results

Traits and their associated QTLs

A total of 8998 QTLs were available from 230 studies (including 19 studies on durum wheat) published during 1999–2020, which involved 190 mapping populations (DH, RILs, F2, and backcross) ranging in size from 32 to 547 lines (for details, see Table S1). The QTL data for different traits and their distribution across sub-genomes and chromosomes are summarized in Table S3 and Fig. 1.

Fig. 1

Frequency distributions of QTLs using four different criteria: (a) total QTLs and major QTLs on the three sub-genomes (A, B and D); (b) major QTLs on the 21 individual chromosomes; (c) major QTLs with different LOD scores; (d) QTLs with different values of phenotypic variation explained (PVE)

The number of QTLs for different traits ranged from 50 for TN to 768 for SRTs (Fig. 1a, Table S3); 2852 major QTLs (PVE ≥ 10%) were selected for meta-QTL analysis. These selected QTLs were distributed on all 21 wheat chromosomes, ranging from 39 QTLs on 1D to 210 on 2B (Fig. 1b). The number of QTLs also differed among the three sub-genomes [1084 (38.0%) on sub-genome A, 1114 (39.1%) on sub-genome B, and 653 (22.9%) on sub-genome D (Table S3)]. LOD scores ranged from 1.7 to 130.5 (Chen et al. 2020a, b), with 45.7% of QTLs having a LOD score of 3 to 5 (Fig. 1c). PVE for individual QTLs ranged from 10 to 98.7% (average 17.8%), with the largest class of QTLs (49.9%) having a PVE < 15% (Fig. 1d). Data on most QTLs are also available in the recently developed wheat QTL database (WheatQTLdb; http://wheatqtldb.net/) (Singh et al. 2021).

High-density consensus map of wheat

The integrated consensus map contained 233,856 markers of the following types: SNP, DArT, SSR (including EST-SSR), AFLP, RAPD, STS, SRAP, ISSR, and KASP markers. In addition, the following important genes were also included on the consensus map: Vrn, Ppd, Rht, and Glu loci (Table S4). The consensus map was 11,638.7 cM long, with the genetic lengths of individual chromosomes ranging from 281.3 cM (4D) to 763.1 cM (4A) (Fig. 2; Tables S4, S5). Mean marker densities of individual linkage groups ranged from 12.8 to 48.3 markers per cM for sub-genome A, from 16.7 to 33.5 for sub-genome B, and from 7.7 to 18.1 for sub-genome D. Markers were not uniformly distributed along individual chromosomes, with the two ends of a chromosome carrying markers at different densities (Tables S4, S5).

Fig. 2

Distribution of the markers on the consensus map used for meta-QTL analysis in the current study. The number of loci mapped on each wheat chromosome is also given

QTL projection and meta-QTL analysis

Only 1842 of the 2852 selected major QTLs could be projected onto the consensus map; the remaining 1010 QTLs had low R² values and/or large CIs and therefore could not be projected. Meta-QTL analysis of the projected QTLs yielded 141 MQTLs, derived from 1828 QTLs; the remaining 14 QTLs were singletons that did not fall in any MQTL (Table 1). The number of MQTLs differed among the three sub-genomes (38 in sub-genome A, 54 in sub-genome B, and 49 in sub-genome D) and among the 21 chromosomes (from 3 on chromosome 6A to 10 each on chromosomes 4D and 6B) (Figs. 3, 4a). The MQTLs were named following standard practice, i.e., the chromosome designation followed by an Arabic numeral to distinguish MQTLs on the same chromosome (e.g., MQTL1A.1 to MQTL1A.7). The number of QTLs per MQTL ranged from 2 (in many MQTLs) to 71 (MQTL5A.2). Sixty-three MQTLs (each based on at least 10 QTLs) were considered the most stable and robust, since each was based on QTLs identified in more than one environment and in more than one mapping population (Fig. 4b). The density of MQTLs did not match the density of QTLs on individual chromosomes; for instance, the numbers of MQTLs on chromosomes 4A, 5A, and 7A were low relative to the corresponding QTL densities.

Table 1 MQTLs for grain yield and yield-related traits identified in the present study
Fig. 3

Distribution of MQTLs on all the 21 wheat chromosomes. MQTLs in orange colour were co-located with marker-trait associations (MTAs) known from earlier genome-wide association studies (GWAS); those in purple colour are breeder’s MQTLs (highlighted), while those in black colour are GWAS-verified breeder’s MQTLs (highlighted)

Fig. 4

Basic information on the MQTLs identified in the present study. (a) Number of MQTLs on different wheat chromosomes, (b) number of MQTLs harbouring different numbers of initial QTLs, (c) MQTLs associated with grain yield and its different component traits, (d) a comparison of mean CI for initial QTLs and MQTLs

MQTLs also harboured some known genes, for example: (i) MQTL5A.3, 5B.3, and 5B.7 included Vrn loci; (ii) MQTL2A.1 and MQTL2D.6 carried Ppd loci; (iii) MQTL4B.5 and MQTL4B.6 included Rht1 in their overlapping region; and (iv) MQTL4D.5 harboured Rht2 (Table S6). Individual MQTLs explained 10.7 to 49.2% of the phenotypic variation for the different traits, and their LOD scores ranged from 3.1 to 62.7. Generally, an individual MQTL was associated with a minimum of two and a maximum of seven traits (Table 1). Among the 141 MQTLs, 102 included QTLs for GWei, 118 included QTLs for SRTs, and 88 included QTLs for GMRTs. A total of 70 MQTLs were directly associated with GY, including 60 MQTLs for GY and GWei, 45 for GY and GN, 51 for GY and GMRTs, 61 for GY and SRTs, 44 for GY and PH, 37 for GY, GWei, and GN, and 10 for seven major traits (GY, GWei, GN, GMRTs, SRTs, PH, and TN). More details are given in Table S7 and Fig. 4c.

The CI of MQTLs ranged from 0.01 to 13.4 cM (mean 1.4 cM), with 85 MQTLs each having a CI of < 1 cM (Fig. 4d; Tables 1, S6). The reduction in CI relative to the initial QTLs differed among chromosomes; the mean CI of MQTLs on chromosomes 4A and 2A was reduced 25.5-fold and 24.2-fold, respectively, followed by 23.1-fold and 19.1-fold on chromosomes 5A and 2B. The physical length of the CI ranged from 0.01 Mb (MQTL4B.3 and MQTL5B.2) to 661.9 Mb (MQTL6B.10), with a mean of 31.5 Mb. Clusters of MQTLs were also found on chromosomes 1A, 1B, 2B, 4B, 5B, 6B, and 6D (Fig. 5). In some cases, genetic and physical positions did not correspond; for instance, MQTL3D.4 and MQTL3D.5 had different genetic positions but occupied the same physical position. Similarly, MQTL6B.4, 6B.5, and 6B.6 had different genetic positions but the same physical positions (2.2–10 Mb) (Table 1). Five MQTL clusters (with 362 QTLs), one each on chromosomes 1B, 3B, 4D, 5B, and 6B, included QTLs for almost every trait.

Fig. 5

Diagrammatic representation of MQTL clusters detected on chromosomes 1A, 1B, 2B, 4B, 5B, 6B and 6D; only desired parts of the chromosomes are shown for better visualization; bars on the left with different colours represent individual MQTLs within a cluster on each chromosome

Homoeology among MQTLs

Homoeology among 117 of the 141 MQTLs located on the three sub-genomes and seven homoeologous groups of wheat chromosomes was also worked out using the CGs occurring in each MQTL region (each MQTL may contain hundreds of CGs, not all of which may be involved in homoeology). Of these 117 MQTLs, 46 involved all three sub-genomes, 66 involved only two sub-genomes, and 5 involved more than one sub-genome but different homoeologous groups. The total number of genes in the MQTLs and the number showing homoeology are given in Table S8. The largest number of CGs was conserved between MQTLs on chromosomes 6A and 6B, followed by those on chromosomes 2A and 2D, as shown in a circos map (Fig. 6). Partial homoeology among MQTLs belonging to different homoeologous groups was also observed in some cases, for example: (i) MQTL2A.3 exhibited partial homoeology with MQTL6B.10; (ii) both MQTL2D.8 and MQTL5A.3 showed partial homoeology with MQTL6A.2 and MQTL6B.10; and (iii) MQTL4A.3, 4A.4, and 4A.5 each showed partial homoeology with MQTL7D.1.

Fig. 6

Circular map showing homoeologous relationships among 21 wheat chromosomes based on MQTLs associated with grain yield and related traits

Comparison of MQTLs with GWAS-MTAs

A comparison of the physical coordinates of MQTLs with those of GWAS-MTAs for individual traits identified 77 MQTLs (Fig. 3) that co-localized with 792 GWAS-MTAs (Table S9), including MTAs from durum wheat, spring wheat, winter wheat, and mixed wheat populations. The number of co-localized MTAs per MQTL varied, with 28 MQTLs each matching at least 10 GWAS-MTAs; of these, MQTL1B.1 co-localized with 95 MTAs, followed by MQTL2B.1 with 88 and MQTL7A.1 with 80. However, the possibility that some of these MTAs are false positives cannot be ruled out.

Ortho-MQTLs involving MQTLs from wheat, barley, rice and maize

For identification of ortho-MQTLs, 27 wheat MQTLs, each based on > 20 QTLs, were initially selected. Three of these 27 MQTLs had no corresponding MQTLs in any other cereal, so ortho-MQTLs could be identified for only 24 stable and robust wheat MQTLs. These 24 ortho-MQTLs included 5 involving only wheat and maize, 11 involving wheat, rice, and maize, 2 involving wheat, maize, and barley, and 6 involving all four cereals (Table 2, Fig. 7). The chromosomes of the three other cereals involved in ortho-MQTLs were as follows: all 10 maize chromosomes; 4 barley chromosomes (2H, 4H, 5H, and 7H), each with one ortho-MQTL; and 9 of the 12 rice chromosomes (ranging from 1 ortho-MQTL on chromosome 2 to 8 on chromosome 4). The remaining three rice chromosomes (1, 9, and 10) were not involved in any ortho-MQTL. The number of MQTLs involved in an individual ortho-MQTL region ranged from 1 to 16 (for more details, see Tables 2, S10).

Table 2 Ortho-MQTLs involving MQTLs from wheat, barley, rice and maize based on the synteny
Fig. 7

Syntenic regions of five ortho-MQTLs among the wheat, maize, rice, and barley. The chromosome number, genomic position, and common genes among the wheat, maize, rice, and barley are indicated. More details are presented in Table 2

Candidate genes: GO terms and expression patterns

A search for CGs in MQTL genomic regions identified 2953 putative CGs, which were reduced to 2298 unique CGs after eliminating duplicate CGs in overlapping MQTLs and CGs with no available information on molecular function and gene ontology (GO) terms. Many CGs encoded proteins with similar functions, including (i) 257 CGs for proteins with a leucine-rich repeat domain; (ii) 101 for zinc finger proteins (RING/FYVE/PHD-type); (iii) 85 for proteins with an F-box-like domain; (iv) 75 for cytochrome P450 proteins; (v) 53 for serine-threonine/tyrosine-protein kinases; and (vi) 41 for UDP-glucuronosyl/UDP-glucosyltransferases, etc. (Fig. S1). In some MQTL regions, clusters of genes belonging to specific gene superfamilies were also found, including (i) the kinase-like domain superfamily, (ii) the F-box-like domain superfamily, and (iii) UDP-glucuronosyl/UDP-glucosyltransferases, etc.

Gene ontology (GO) analysis gave a number of GO terms; some of the crucial and most abundant terms covered all three categories, as follows: (i) ‘biological processes’ (e.g., protein ubiquitination, phosphorylation, protein phosphorylation, oxidation–reduction processes, etc.); (ii) ‘molecular functions’ (e.g., protein binding, DNA binding, ATP binding, ADP binding, heme binding, metal ion binding, oxidoreductase activity, transmembrane transporter activity, etc.); and (iii) ‘cellular components’ (e.g., cell membrane and its components).

The in silico expression analysis of these CGs identified 1,202 CGs, each showing an expression of at least 2 transcripts per million (TPM) (highlighted in yellow in Table S11). Expression was examined in the following plant organs/tissues at specific developmental stages: grains, spikes, leaves, shoots, roots, etc. (for details, see Fig. 8; Table S11). These 1,202 CGs mainly encoded proteins belonging to five major classes: (i) transcription factors; (ii) proteins involved in the metabolism and/or signalling of the growth regulators gibberellins, cytokinins, and brassinosteroids; (iii) proteins involved in cell division and proliferation; (iv) floral regulators; and (v) proteins involved in the regulation of carbohydrate metabolism (Table S12). The expression patterns of some CGs expressed in spikes and grains are shown in Fig. 8.

Fig. 8

Heatmap showing differential expression of CGs underlying the breeder’s MQTLs

We also selected 162 high-confidence CGs, the majority of which showed expression of more than 5 TPM in different tissues (Table 3); 143 of these 162 CGs could also be analysed for GO enrichment. The most significantly enriched GO terms for biological processes were metabolic processes (10 genes) and cellular processes (4 genes). For molecular functions, the most significantly enriched terms were binding (43 genes) and catalytic activity (27 genes). For cellular components, the genes were enriched only for ‘cellular anatomical entity’ (46 genes), mainly including integral component of membrane, plasma and proximal membrane, cytoplasm, and nucleus (Fig. S2). Most of these CGs showed the highest expression in the spike, spike organs, and grains (including grain tissues such as endosperm, embryo, aleurone layer, seed coat, and transfer cells) at the reproductive stage and are therefore presumed to affect SRTs, GWei, GN, and GMRTs; the remaining CGs showed the highest expression in root, leaf, and stem tissues at the vegetative stage and are therefore believed to affect TN, HI, and BY (Table 3).

Table 3 High-confidence CGs occurring in genomic regions of wheat MQTLs identified in the present study

Known wheat genes associated with MQTLs

Several known yield-related genes in wheat also co-localized with wheat MQTLs (Table S13). These 18 genes are as follows: TaSnRK2.3-1B (MQTL1B.7), TaCwi-A1 (MQTL2A.2), TaCYP78A5-2D (MQTL2D.8), FRIZZY PANICLE (MQTL2D.5), Btr1-A (MQTL3A.3), TaPSTOL and TaSnRK2.9-5A (MQTL5A.3), DEP1-5A (MQTL5A.4), TaCWI5D (MQTL5D.4), DEP1-5D (MQTL5D.5), TaSPL21-6A, TaGW2-6A, TaPRR1-6A, and TaBT1-6A (MQTL6A.2), TaBT1-6B, TaPRR1-6B, and TaSPL21-6B (MQTL6B.10), and TaGS3 (MQTL7A.1). These genes encode a variety of proteins spanning several pathways, for instance, cell wall invertase, sucrose non-fermenting 1-related protein kinase, E3 ubiquitin ligase, APETALA2/AP2/ERF transcription factor, cytochrome P450 protein, and phosphatidylethanolamine-binding protein. Similar proteins/products are also encoded by many other CGs identified in the present study.

Wheat MQTLs having homology with known genes of other cereals

Known alien genes for yield and related traits from other cereals, including rice, barley, and maize, were also used to identify wheat homologues in MQTL regions. These alien genes included 24 (50%) of the 48 available rice genes, 3 of the 7 available barley genes, and 8 of the 13 available maize genes. In some cases, the same wheat MQTL carried homologues of more than one alien gene; for example, (i) MQTL2A.2 carried homologues of the rice genes An-1, GIF4, GW2, and OsPK2; (ii) MQTL2D.8 carried homologues of the rice genes D11, GIF1, OsPK2, and FZP; and (iii) MQTL3A.3 carried homologues of the barley gene Vrs4 and the maize genes ramosa2 and vt2 (Table 4). In summary, 33 wheat MQTLs contained 50 homologues of 35 yield genes from rice, barley, and maize.

Table 4 Wheat homologues of yield-related genes from rice, barley, and maize in wheat MQTL regions

Discussion

Over the last two decades or more, a large number of studies have been conducted on QTL mapping for grain yield and its component traits in wheat (Table S1). Most QTLs identified in these studies are each associated with a long CI and low PVE, making them of limited use for marker-assisted breeding. Moreover, these QTLs generally need validation before they are used in breeding programmes involving parents other than those of the mapping populations in which they were detected. In contrast, MQTLs are robust, each with a narrow CI and relatively high PVE, which increases their utility not only in crop improvement programmes but also in basic studies involving cloning and characterization of QTLs/genes for the traits of interest.

In recent years, meta-QTL analysis has been conducted for a variety of traits in all major crops. As mentioned earlier, meta-QTL analysis in wheat has been conducted for several traits (Griffiths et al. 2009, 2012; Gegas et al. 2010; Zhang et al. 2010; Quraishi et al. 2017; Bilgrami et al. 2020; Liu et al. 2020). However, information on MQTLs soon becomes outdated because new QTL mapping studies for individual traits appear regularly, so MQTL analyses need to be repeated periodically to obtain improved MQTLs. The present study is one such study, conducted to improve upon known MQTLs for grain yield and related traits in wheat. Two earlier meta-QTL studies for yield included one involving 1,162 QTLs, which identified 71 MQTLs in durum wheat (Maccaferri et al. 2019), and another involving 381 QTLs, which identified 81 MQTLs in hexaploid wheat (Liu et al. 2020). In contrast, the present study used a total of 2,852 QTLs (all major QTLs with PVE > 10%, selected from 8,998 available QTLs) from both tetraploid and hexaploid wheat, leading to the identification of as many as 141 MQTLs. To our knowledge, the present study is therefore the most comprehensive study so far for the identification of MQTLs for yield and related traits in wheat.

The results of the present study, along with those of earlier studies, suggest that the precision of meta-QTL analysis depends at least partly on the number of QTLs available for the analysis (Quraishi et al. 2017; Soriano et al. 2021). In the present study, 63 MQTLs (44.7%) were each based on ≥ 10 QTLs, and 16 of these were each based on > 30 QTLs; these numbers of QTLs per MQTL are higher than those in each of the earlier studies (Quraishi et al. 2017; Maccaferri et al. 2019; Liu et al. 2020). Also, 24 MQTLs in the present study had genetic positions nearly overlapping those of MQTLs reported in two recent studies (Bilgrami et al. 2020; Liu et al. 2020). On critical evaluation of these 24 MQTLs, we selected 15 MQTLs that can be used with a higher level of confidence for molecular breeding and for future studies on cloning and characterization of QTLs/genes (Table S6).

Another interesting feature of the present study is an up to ~ ninefold reduction in CI length, which implies that the associated markers are closely linked to the MQTLs, making it easier to transfer MQTLs during breeding and further improving CG prediction. By comparison, the only earlier study reporting such a reduction achieved a mere 2.4-fold reduction (12.7 cM/5.2 cM; Liu et al. 2020). At the other extreme, one MQTL (MQTL6B.10) had an unusually large CI (50.7 cM), much longer than even the CIs of its contributing QTLs (Table 1). No definitive explanation is available for this, although one possibility is an unusually low recombination frequency in this MQTL region. This MQTL merits further study to determine whether or not the region is associated with very low recombination.
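The fold reductions quoted above are ratios of CI lengths. Assuming the fold reduction is computed as the mean CI of the initial QTLs divided by the mean CI of the resulting MQTLs (the form implied by 12.7 cM/5.2 cM), the comparison works out as follows; the last ratio uses a hypothetical 45 cM CI narrowed to 5 cM purely to show what a ninefold reduction means.

$$
\text{fold reduction} = \frac{\overline{\mathrm{CI}}_{\mathrm{QTL}}}{\overline{\mathrm{CI}}_{\mathrm{MQTL}}},
\qquad
\frac{12.7\ \mathrm{cM}}{5.2\ \mathrm{cM}} \approx 2.4,
\qquad
\frac{45\ \mathrm{cM}}{5\ \mathrm{cM}} = 9
$$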

Different MQTLs identified in the present study also exhibited homoeologous relationships, suggesting the occurrence of sets of duplicate or triplicate MQTLs. These homoeologous relationships were determined using the physical positions of CGs belonging to different MQTLs, although a variety of experimental and computational methods for identifying homoeology are available (Glover et al. 2016, 2021). Large-scale similarity was also observed among MQTLs located on the same chromosome or on chromosomes belonging to different homoeologous groups, suggesting the occurrence of orthologues as well as paralogues. Differences were also observed in the number of CGs associated with individual MQTLs located on the three homoeologous chromosomes of a group. This uneven distribution of genes across the three homoeologues of a group may be due to a variety of evolutionary events, including gene duplication, gene loss, and chromosomal translocation (Clavijo et al. 2017); such differences have also been observed in comparative genomics studies, including analyses of microsynteny among wheat chromosomes.
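The text does not detail how CG positions were used to call homoeologous MQTLs. One simple way to operationalize the idea, assuming each CG has already been assigned to a homoeologous gene triad (its A, B, and D copies), is to flag MQTL pairs whose CGs share at least one triad. The function, MQTL labels, gene IDs, and triad IDs below are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def homoeologous_mqtl_pairs(
    mqtl_cgs: Dict[str, List[str]],
    triad_of: Dict[str, str],
    min_shared: int = 1,
) -> List[Tuple[str, str, int]]:
    """Return (MQTL, MQTL, shared_triads) for pairs sharing >= min_shared triads.

    mqtl_cgs: MQTL label -> CG identifiers located within that MQTL
    triad_of: CG identifier -> homoeologous triad identifier (hypothetical input)
    """
    # Collapse each MQTL's CGs to the set of triads they belong to.
    triads: Dict[str, Set[str]] = {
        mqtl: {triad_of[g] for g in genes if g in triad_of}
        for mqtl, genes in mqtl_cgs.items()
    }
    pairs = []
    for a, b in combinations(sorted(triads), 2):
        shared = len(triads[a] & triads[b])
        if shared >= min_shared:
            pairs.append((a, b, shared))
    return pairs


# Hypothetical MQTLs on chromosomes 2A, 2B and 2D with made-up CG and triad IDs.
mqtl_cgs = {
    "mqtl_2A": ["gA1", "gA2"],
    "mqtl_2B": ["gB1", "gB3"],
    "mqtl_2D": ["gD2"],
}
triad_of = {"gA1": "T1", "gB1": "T1", "gA2": "T2", "gD2": "T2", "gB3": "T3"}

print(homoeologous_mqtl_pairs(mqtl_cgs, triad_of))
# [('mqtl_2A', 'mqtl_2B', 1), ('mqtl_2A', 'mqtl_2D', 1)]
```

Raising `min_shared` makes the call more conservative; counting shared triads per pair also reflects the uneven CG numbers across homoeologues noted above.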

Co-localization of > 50% of MQTLs (77/141) with GWAS-MTAs is another interesting feature of the present study. Earlier studies comparing MQTLs with GWAS-MTAs include three of our own (Saini et al. 2021a, 2021b; Kumar et al. 2021) and only one other (Aduragbemi and Soriano 2021). MQTLs matching GWAS-MTAs can be used with a higher level of confidence not only for MAS but also for cloning of important genes associated with MQTLs.

Breeder’s MQTLs and clustered MQTLs (hotspots)

A set of 13 selected MQTLs was also designated as breeder’s MQTLs. Each of these breeder’s MQTLs is characterized by a relatively narrow CI, high PVE, and high LOD score (Fig. 3, Table 1), so they can be used for MAS with a higher level of confidence. Sequences of the markers associated with these breeder’s MQTLs are provided in Table S15. Of these, MQTL3D.1 had a PVE of 49.2% and can therefore be considered a mega breeder’s MQTL for three traits (GWei, GMRTs, and PH).

Another important feature of the present study is the availability of clusters of MQTLs, one cluster each on seven different chromosomes (Fig. 5). These MQTL clusters may be treated as ‘hotspots’ and can be utilized with a high level of confidence for MAS in breeding programmes and for future basic research. The term ‘hotspot’ for a specific genomic region controlling a trait has often been used for clusters of QTLs, sometimes occurring within an MQTL, as in rice for QTLs controlling thermotolerance (Raza et al. 2020) and drought tolerance (Selamat and Nadarajah 2021) and in chickpea for drought tolerance (Kale et al. 2015). The drought-tolerance hotspot in chickpea has also been used to develop a drought-tolerant chickpea cultivar, BGM 10216, with a 16% yield gain over the parent cultivar Pusa 372 (Bharadwaj et al. 2021). However, to our knowledge, hotspots representing clusters of MQTLs, as reported in the present study, have not been described earlier; they should prove useful in wheat breeding programmes.

Ortho-MQTLs for cereals

In the present study, 24 ortho-MQTLs were also identified; these should represent conserved genomic regions and may therefore be recommended for use across the cereals. The conserved nature of these ortho-MQTLs also suggests that they may be associated with regulatory elements, each influencing many genes (Quraishi et al. 2011; Jin et al. 2015; Khahani et al. 2020). Six of these 24 ortho-MQTLs covered all four crops, revealing a high level of conservation among wheat, barley, maize, and rice. The remaining 18 ortho-MQTLs were conserved across only three or two cereals, suggesting more limited conservation.

The absence of ortho-MQTLs for three MQTLs (viz., MQTL1A.5, MQTL2A.2, and MQTL3A.4) may be attributed to their occurrence in dispensable regions of the pan-genome that are not shared with other cereals, or to structural variations (SVs), including insertions, deletions, rearrangements, and duplications, that occurred during cereal evolution and disrupted gene colinearity in the target regions. Another possibility is a lack of sampled QTLs from the corresponding syntenic regions in rice, barley, and maize. Ortho-MQTLs involving these MQTLs may become available in the future as more MQTL studies are conducted in these three cereals.

Overall, ortho-MQTL analysis revealed large conserved regions among the cereals using information on synteny and collinearity; these regions contain many characterized and uncharacterized genes that are believed to be associated with the traits in question. The ortho-MQTLs identified in the present study may serve as a useful resource for further studies aimed at identifying novel conserved genes that can provide functional markers, and for developing so-called conserved orthologous set (COS) markers for use in cereal breeding programmes. The success of this approach is apparent from at least two earlier studies. One involved the identification of the conserved gene glutamate synthase (GOGAT) associated with an ortho-MQTL for nitrogen use efficiency (Quraishi et al. 2011). The other involved the identification of two genes (GRMZM2G178190 and GRMZM2G366919) associated with ortho-MQTLs for grain iron/zinc; these genes were considered the best candidate genes for grain iron and zinc in maize and were further characterized as natural resistance-associated macrophage protein (NRAMP) genes (Jin et al. 2015).

Candidate genes for MQTLs

Identification of 2,298 CGs within the MQTL regions and the study of their spatio-temporal expression in different plant parts at different developmental stages is another important part of the present study. In this study, 1,202 CGs had ≥ 2 TPM expression, of which 28 CGs had > 10 TPM expression (Table S11). Many of these CGs, which lie within MQTLs for yield and related traits, are known to be involved in controlling grain yield and associated traits in cereals (Nadolska-Orczyk et al. 2017; Daba et al. 2020) (Table S12). CGs with similar functions were also found in more than one MQTL region (Fig. S1). Some CGs encoding unknown or uncharacterized proteins also showed significant expression in different plant tissues and may be utilized in future research (Table S11). In some MQTL regions, clusters of CGs belonging to specific gene superfamilies were also found; examples include genes encoding proteins of the kinase-like domain superfamily, the F-box-like domain superfamily, and UDP-glucuronosyl/UDP-glucosyltransferases (Table S11), which are known to be involved in controlling yield and related traits. Such gene clusters are quite common in plant genomes and are known to encode proteins involved in many enzymatic pathways (Yi et al. 2007; Medema et al. 2015). Members of a gene cluster are often located within a few thousand base pairs of one another in a small genomic region and encode similar products or proteins, thus sharing a generalized function. Association of these genes/gene families with grain yield and its component traits has been reported in several earlier studies (Ma et al. 2017; Nadolska-Orczyk et al. 2017; Gunupuru et al. 2018; Sakuma et al. 2018; Niño-González et al. 2019; Gautam et al. 2019; Daba et al. 2020; Jia et al. 2020; Li and Wei 2020).

The above CGs may be further cloned and characterized and then exploited through biotechnological approaches, including transgenesis and gene editing. Genes encoding expansin illustrate this potential: in an earlier study, transgenic wheat plants with enhanced expression of an expansin gene in developing seeds minimized the trade-off between grain number and grain weight, showing 12.3% higher grain weight than the control, which translated into an 11.3% increase in grain yield under field conditions (Calderini et al. 2020). In the present study, we also identified many putative CGs, including genes for expansin proteins, in some MQTL regions (Tables 3 and S11). In the future, targeted transgenic and gene editing approaches using these potential CGs may allow improvement of grain yield in wheat. However, where gene clusters regulate the target trait, a transgenic approach using a single gene may not be as effective as MAS, in which flanking markers can target a much larger region encompassing all the genes of a cluster. Alternatively, a multi-transgene cassette containing multiple genes could be constructed and introduced into plant cells for the genetic improvement of grain yield and associated traits. Some of the MQTLs also included known genes, such as the Vrn, Ppd, and Rht genes, which are widely known to regulate plant phenology (Vrn and Ppd) and plant height (Rht), ultimately influencing grain yield and its component traits in wheat (Kamran et al. 2014; Gupta et al. 2020).

Wheat homologues of yield-related alien genes from other cereals

Identification of 50 wheat homologues of 35 alien yield-related genes from other cereals was also an important part of the present study. The following wheat homologues have already been cloned and characterized: TaGW2 (Su et al. 2011), TaCwi-A1 (Ma et al. 2012), TaGS-D1 (Zhang et al. 2014), DEP1 (Vavilova et al. 2017), TaCKX family genes (Ogonowska et al. 2019), TaSPL14 (Cao et al. 2021), FRIZZY PANICLE (FZP) (Dobrovolskaya et al. 2015), and TaTAR2.1 (Shao et al. 2017). Among the remaining genes, the following alien genes can be used for the identification of wheat orthologues using comparative genomics, followed by the development of functional markers: (i) rice genes: An-1, Bsg1, D11, D2, LP, PGL1, qGL3, SMG1, OsOTUB1, OsLG3, OsDHHC1, OsY37, qWS8, OsALMT7, GS9, GSN1, OsPS1-F, and OsPK2; (ii) barley genes: vrs4 and COM1; and (iii) maize genes: FASCIATED EAR2, ramosa2, ZmFrk1, bs1, KNR6, and BIF1 (Tables 4 and S14). The success of this approach is apparent from the identification of a wheat homologue of the rice gene OsGRF4 in an MQTL region (Avni et al. 2018). This suggests that integrating an MQTL study with a well-annotated genome can rapidly lead to the detection of CGs underlying the traits of interest. However, this strategy cannot identify novel genes whose functions are not yet known in any species; ortho-MQTL analysis may prove quite useful in such cases (as discussed above).

Conclusion

The present study is an effort towards a better understanding of the genetic architecture of grain yield and its component traits in wheat through the identification of MQTLs, ortho-MQTLs, and CGs (Fig. 9). The study integrated the available information on QTLs identified in earlier studies with the available genomic and transcriptomic resources of wheat. As many as 141 MQTLs, most of them with a narrow CI and 77 of them matching known GWAS-MTAs, and 1,202 putative CGs were identified. Thirteen of these 141 MQTLs are described as breeder’s MQTLs; we recommend these breeder’s MQTLs for use in MAS for grain yield improvement in wheat.

Fig. 9 Analysis pipeline and outcomes of the present study

The ortho-MQTL analysis suggested that MQTLs for yield-related traits may be transferable to other cereals, which may assist breeding programmes in these crops. Using a comparative genomics approach, several wheat homologues of alien genes from rice, barley, and maize were also detected in the MQTL regions. As many as 162 of the 1,202 putative CGs are also recommended for future basic studies, including cloning and functional characterization. In vivo validation of any of these loci, particularly the CGs, may be accomplished through one or more of the following approaches: gene cloning, reverse genetics (e.g., gene silencing), transcriptomics, and proteomics. The information on molecular markers associated with MQTLs and on CGs occupying MQTL regions may also prove useful in breeding for grain yield improvement in wheat, especially with the development of multiplexed SNP detection platforms.