Introduction

Bread wheat (Triticum aestivum L.) is a major staple crop worldwide, providing food for ~36% of the world’s population and ~20% of the calories in the human diet (http://faostat.fao.org; Maulana et al. 2018). In this crop, as in other cereals, grain size is known to be associated with various characteristics of flour (e.g., hydrolytic enzyme activity), which in turn control baking quality and end-use suitability, protein content and grain yield (Evers 2000; Breseghello and Sorrells 2007; Gegas et al. 2010). Larger grains also have a positive effect on seedling vigour, market preference, grain yield and flour yield characteristics (Chastain et al. 1995; Gan and Stobbe 1996). Grain size and shape (including length, width, perimeter of the grain, etc.) have not received the desired attention for the improvement of yield in current wheat breeding programmes (Kovach et al. 2007). Nevertheless, significant phenotypic and genetic variation for grain size and grain weight does occur in different Triticum species and can be exploited for improvement of grain morphology, indirectly leading to an improvement in grain yield (Gegas et al. 2010; Jing et al. 2007; Rasheed et al. 2014). The genetics of grain characteristics in wheat has also not received as much attention as it has in rice, where > 30 genes for grain characteristics have been cloned and characterized (Zheng et al. 2015; Hu et al. 2015; Yu et al. 2017; Jiang et al. 2019; Yang et al. 2019).

The genetic studies already undertaken in wheat and other cereals suggest that grain shape and size are complex quantitative traits (with their component traits), each controlled by a number of major and minor genes. For the study of genetics of such complex traits, interval mapping and GWAS are two important approaches, each with its own merits and demerits. In recent years, GWAS has been preferred, since it utilizes a wide range of genetic variation generated through hundreds of cycles of recombination. A beginning of GWAS for the study of genetic architecture of quantitative traits in animals (mainly humans) and plant systems was made in the early years of the present century, when the relative merits of LD-based association mapping (AM) over linkage-based interval mapping were documented (for reviews, see Huang et al. 2013; Gupta et al. 2014, 2019; Cortes et al. 2021). The major breakthrough, however, took place when a mixed linear model involving population structure and relatedness was proposed (Yu et al. 2006) and the software TASSEL was made available (Bradbury et al. 2008). This was soon followed by further improvement of GWAS through a series of models proposed during 2010–2020, a process that still continues.

A number of studies involving interval mapping and GWAS are available for genetics of traits related to grain shape and size (Breseghello and Sorrells 2007; Sun et al. 2009; Okamoto et al. 2013; Rasheed et al. 2014; Li et al. 2015; Yan et al. 2017; Kumari et al. 2018; Ma et al. 2019; Wang et al. 2019; Yu et al. 2019; Alemu et al. 2020; Sun et al. 2020; Xin et al. 2020). In most of these genetic studies, a limited number of mapping populations and association panels have been used (Breseghello and Sorrells 2007; Sun et al. 2009; Rasheed et al. 2014). Also, a majority of QTLs that have been reported for grain weight are not suitable for breeding purposes, since these are not stable across environments, explain limited phenotypic variability, and are involved in epistatic interactions (Campbell et al. 2003; Gupta et al. 2007; Prashant et al. 2012; Patil et al. 2013; Cabral et al. 2018). This warrants further studies for detection of additional MTAs using novel germplasm, which has not so far been used for the study of grain morphology traits in wheat.

Most early GWA studies including the above studies, conducted in a number of crops, involved only single locus, single trait analysis and did not involve analysis of epistatic interactions. Although occurrence of epistasis was initially recognized in early years of the last century (Bateson 1909), its importance and methods for detection and estimation, particularly for quantitative traits, have largely been recognized in the present century (Holland 2001; Lu et al. 2011; Ritchie and Van Steen 2018). It is now known that both MTAs/QTLs with main effect and those with no main effect are generally involved in epistatic interactions (Niel et al. 2015; Ritchie and Van Steen 2018; Slim et al. 2020). In some GWA studies also, epistatic interactions have been examined, and significant epistatic interactions have been reported for several traits in different crops including wheat (Mackay 2014; Moellers et al. 2017; Sehgal et al. 2017, 2020).

Keeping the above in view, the present study was planned, which involved the use of the following four improved mixed linear models: (1) Compressed Mixed Linear Model (CMLM; Zhang et al. 2010), (2) Fixed and random model Circulating Probability Unification (FarmCPU; Liu et al. 2016), (3) Settlement of MLM Under Progressively Exclusive Relationship (SUPER; Wang et al. 2014), and (4) Multi-trait analysis matrix variate linear mixed model (mvLMM; Korte et al. 2012). These models allowed multi-locus and multi-trait analysis and also allowed identification and measurement of epistatic interactions. It is hoped that the results of the present study will prove useful for developing wheat cultivars, through MAS/MARS, with improved grain/flour quality, high market value and grain yield.

Materials and methods

GWAS panel and genotyping details

Originally, 330 spring wheat genotypes belonging to spring wheat reference set (SWRS) were obtained from CIMMYT, Mexico. However, in the present study on GWAS a subset comprising only 225 diverse wheat genotypes was used. The geographical distribution of these genotypes is shown in Fig. 1 and pedigree information for each genotype is available in Table S1. Genotyping data was retrieved from the original data on the entire set of 330 SWRS genotypes that were genotyped using DArT-seq (outsourced by CIMMYT to Diversity Array Technology Pvt. Ltd, Australia, under their “Seed for Discovery” project). The genotypic data for 10,904 SNPs out of 17,937 SNP markers for the whole set of 330 genotypes was available for the subset of 225 genotypes used during the present study, as also in a previous study (Kumar et al. 2018).

Fig. 1

Country of origin of the wheat genotypes comprising association mapping panel used during the present study

The GWAS panel of 225 genotypes was raised during the rabi season in a simple lattice design with two replications at the Research Farm of the Department of Genetics and Plant Breeding, Ch. Charan Singh University, Meerut (location coordinates: 28.984644°N and 77.705956°E) over two consecutive years representing two different environments (E1; 2017–18 and E2; 2018–19). Each genotype was represented by a plot of 3 rows of 1.5 m each, with a row-to-row distance of 0.25 m. The total number of blocks was 15, with each block containing 45 rows, i.e., three rows of each genotype. Normal cultural practices including fertilizer application (i.e., 200 kg/ha fertilizer; N:P:K = 8:8:8) and irrigation were followed.

Data on grain morphology traits

The data on six grain morphology traits were collected using 24 randomly picked grains for each of the 225 genotypes per replication, using SmartGrain software ver. 1.2 (Tanabata et al. 2012). SmartGrain uses all the grains within a digital image to record data on grain morphology; it detects grain outlines and then estimates the grain size parameters, viz. grain cross-sectional area (GCSA), grain perimeter (GP), grain length (GL), grain width (GWid) and grain length–width ratio (GLWR) (Fig. 2). Thousand grain weight (TGW) in grams (g) was also measured by weighing 1000 grains of each of the 225 genotypes in each replication. Another parameter, grain form-density (GFD), was calculated from three parameters (TGW, GL and GWid) using the following formula: \(\text{GFD} = \frac{\text{TGW}}{\text{GL} \times \text{GWid}}\), which captures variation in grain weight that is not accounted for by differences in grain length and width (Giura and Saulescu 1996). For each trait, average values obtained for all 24 grains were utilized for further analysis.
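The grain form-density calculation can be sketched in a few lines of Python. This is only an illustration of the formula given above, not the procedure actually used in the study: the function name, the input values and the use of millimetres for GL and GWid are assumptions made for the example.

```python
from statistics import mean


def grain_form_density(tgw_g, grain_lengths_mm, grain_widths_mm):
    """Return GFD = TGW / (mean GL x mean GWid) for one genotype in one replication.

    tgw_g            -- thousand grain weight in grams
    grain_lengths_mm -- per-grain lengths (units assumed to be mm)
    grain_widths_mm  -- per-grain widths (units assumed to be mm)
    """
    if not grain_lengths_mm or len(grain_lengths_mm) != len(grain_widths_mm):
        raise ValueError("need one length and one width for every grain")
    gl = mean(grain_lengths_mm)    # average over the sampled grains
    gwid = mean(grain_widths_mm)
    return tgw_g / (gl * gwid)


# Hypothetical measurements for 24 grains of a single genotype.
lengths = [6.0, 6.2, 6.4] * 8
widths = [3.0, 3.2, 3.4] * 8
gfd = grain_form_density(40.0, lengths, widths)
print(round(gfd, 3))
```

Averaging the 24 grains before taking the ratio follows the statement that per-trait averages were used for analysis; computing GFD per grain and then averaging would be a different (also plausible) choice.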

Fig. 2

Images of seeds showing method of recording phenotypic data: (1) 24 seeds (left panel) from one genotype, (2) single whole wheat seed (middle panel), and (3) flowchart for estimation of seed morphology (right panel) using SmartGrain software. Grain morphological traits in the middle panel: (1) GCSA = grain cross-sectional area; shaded region of grain; (2) GP = grain perimeter in red line; (3) GL = grain length as yellow line, (4) GWid = grain width as green line

Statistical analysis

Violin plots were prepared to depict the distribution of phenotypic data for all six traits in each of the two individual environments (E1 and E2) and also using BLUP values (B). The BLUP values were generated using the ‘lme4’ package in R (Bates et al. 2015). Pearson’s correlation coefficients were estimated using the R package PerformanceAnalytics (Peterson et al. 2018), and ANOVA was conducted using AMMI (additive main effects and multiplicative interactions) available in the R package agricolae (Mendiburu and Yaseen 2020). Broad sense heritability (H2) was calculated in Microsoft Excel 2010 as the ratio between genotypic variance (\(\sigma_{g}^{2}\)) and phenotypic variance (\(\sigma_{p}^{2}\)) using the formula \(H^{2} = \frac{\sigma_{g}^{2}}{\sigma_{p}^{2}} \times 100\) (Allard 1999).
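The heritability ratio can be illustrated with a short Python sketch. The text does not state how the variance components were partitioned, so the entry-mean decomposition of phenotypic variance used in `phenotypic_variance` is an assumption (a commonly used form for multi-environment trials), and all numbers are hypothetical.

```python
def phenotypic_variance(var_g, var_ge, var_e, n_env, n_rep):
    """Phenotypic variance on an entry-mean basis.

    ASSUMPTION: sigma2_p = sigma2_g + sigma2_ge / e + sigma2_e / (e * r);
    the study itself only states H2 = sigma2_g / sigma2_p x 100.
    """
    return var_g + var_ge / n_env + var_e / (n_env * n_rep)


def broad_sense_heritability(var_g, var_p):
    """Return H2 (%) as the ratio of genotypic to phenotypic variance."""
    if var_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return 100.0 * var_g / var_p


# Hypothetical variance components; two environments and two replications
# mirror the layout of the field trial described above.
var_p = phenotypic_variance(var_g=4.0, var_ge=1.0, var_e=2.0, n_env=2, n_rep=2)
h2 = broad_sense_heritability(4.0, var_p)
print(var_p, h2)
```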

Population structure using principal component (PC) analysis, and kinship matrix

The Q matrix based on PCs and the kinship matrix based on marker-derived relatedness among genotypes were automatically generated using the default set of parameters in the GWAS models (VanRaden 2008; Lipka et al. 2012). These were used to perform GWAS using CMLM, FarmCPU and SUPER available in the package ‘Genomic Association and Prediction Integrated Tool’ (GAPIT version 2.0; Tang et al. 2016) in R; mvLMM for multi-trait analysis was performed using the software GEMMA (Zhou and Stephens 2012).
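For readers unfamiliar with marker-based kinship, the following minimal Python sketch computes a genomic relationship matrix in the manner of VanRaden (2008). It is not the GAPIT implementation: the 0/1/2 allele-count coding, the absence of missing data, the function name and the toy genotypes are all assumptions made for illustration.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix, G = Z Z' / (2 * sum(p * (1 - p))).

    genotypes -- list of individuals, each a list of allele counts (0, 1 or 2)
                 per SNP; ASSUMES complete data and a common SNP order.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at every SNP.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    # Centre each marker by twice its allele frequency.
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Toy panel: four genotypes scored at three SNPs (hypothetical data).
toy = [
    [0, 2, 1],
    [0, 2, 1],
    [2, 0, 1],
    [2, 0, 2],
]
kin = vanraden_kinship(toy)
for row in kin:
    print([round(v, 3) for v in row])
```

In this toy panel the first two genotypes are identical, so their off-diagonal kinship equals their diagonal value, while genotypes with opposite alleles receive negative values.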

Marker-trait association (MTAs)

Main effect MTAs were identified using three different models, namely CMLM, FarmCPU and SUPER, available in GAPIT version 2.0 in R (Tang et al. 2016). The matrix variate linear mixed model (mvLMM) for multi-trait analysis was used for the identification of SNPs associated with two or more traits (Furlotte and Eskin 2015). The epistatic interactions were identified using the R package SNPassoc, where the function interactionPval was executed for computing the statistical significance of paired SNP–SNP interactions (Gonzalez et al. 2007). Main effect MTAs and those involved in epistatic interactions, identified by different methods, were independently subjected to FDR (− log p value = 5.0) and Bonferroni (− log p value = 6.0) corrections to address the problem of multiple testing. Very few MTAs were obtained using Bonferroni correction; therefore, MTAs obtained after applying FDR were used for further analysis. MTAs obtained after Bonferroni correction were also used for identification of candidate genes (CGs) (see later). The chromosomal locations and positions of unmapped markers associated with traits were identified by BLASTn searches of the nucleotide sequences (SNP tags) against IWGSC RefSeq v1.0 at Ensembl Plants, retaining hits with 100% identity (https://plants.ensembl.org/Triticum_aestivum/Info/Index).
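The difference in stringency between the two corrections can be seen in a small Python sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule used here is an assumption; the P-values and the significance level are hypothetical, and the sketch is not meant to reproduce the thresholds quoted above.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests with p <= alpha / m (Bonferroni)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Indices of tests declared significant by the BH step-up rule.

    ASSUMPTION: BH is used as a stand-in for the unspecified FDR procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the BH line.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P-values for ten marker-trait tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_significant(pvals))
print(benjamini_hochberg_significant(pvals))
```

With these toy values Bonferroni retains fewer tests than the FDR rule, which is the same pattern as reported in the study, where few MTAs survived Bonferroni correction.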

Comparison of MTAs with known QTLs

The MTAs obtained in the present study were compared with known QTLs/MTAs for each trait. For this purpose, physical positions of known MTAs were obtained using the data available in Ensembl Plants (version 50; https://plants.ensembl.org/Triticum_aestivum/Info/Index).

Identification of putative CGs and their in silico expression analysis

For identification of CGs, only the MTAs that qualified Bonferroni correction were utilized. For this purpose, SNP tags associated with MTAs were aligned with the reference genome IWGSC RefSeq v1.0 (IWGSC 2018), available at Ensembl Plants (http://www.ensembl.org/info/docs/tools/vep/index.html), and a 200 kb window (100 kb on either side of the SNP) was examined for identification of CGs. The gene ontology (GO) annotation information of all these CGs was extracted from the IWGSC website (http://www.wheatgenome.org/). The RNA-seq expression data from the Wheat Expression Browser was used for in silico gene expression analysis (http://www.wheat-expression.com/). The expression of each CG in different parts of the seed and at different developmental stages was examined and presented in the form of a heatmap.
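The 200 kb window search can be sketched as a simple interval-overlap query. In the Python example below, the SNP position is the start coordinate reported for SNP_3202 on chromosome 3A, whereas the gene identifiers and gene coordinates are hypothetical placeholders; counting genes that only partly overlap the window is also an assumption, since the text does not say how boundary cases were handled.

```python
def genes_in_window(snp_chrom, snp_pos, genes, flank_bp=100_000):
    """Return IDs of genes overlapping the window snp_pos +/- flank_bp.

    genes -- iterable of (gene_id, chrom, start, end) with 1-based coordinates.
    ASSUMPTION: any overlap with the window counts as a hit.
    """
    lo = max(1, snp_pos - flank_bp)
    hi = snp_pos + flank_bp
    return [
        gene_id
        for gene_id, chrom, start, end in genes
        if chrom == snp_chrom and start <= hi and end >= lo
    ]


# Hypothetical annotation records (IDs and coordinates are made up).
annotation = [
    ("geneA", "3A", 593_900_000, 593_905_000),  # inside the window
    ("geneB", "3A", 594_085_000, 594_095_000),  # straddles the right edge
    ("geneC", "3A", 594_200_000, 594_204_000),  # beyond the window
    ("geneD", "3B", 593_990_000, 593_995_000),  # different chromosome
]
hits = genes_in_window("3A", 593_990_302, annotation)
print(hits)
```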

Homology-based mining of genes

Protein sequences of known rice genes, for the traits of interest, were retrieved from NCBI database (https://www.ncbi.nlm.nih.gov/), and used against wheat genome sequences (IWGSC RefSeq v1.0) available at Ensembl Plants. Orthologous wheat genome sequences were identified using BLASTx search. From these wheat genome sequences, the sequences which were located within the CGs were used for further study. The rice genes and their functions were identified using oryzabase database of rice (https://shigen.nig.ac.jp/rice/oryzabase/).

Results

Frequency distribution, ANOVA, heritability and correlations

The frequency distributions of phenotypic data for all the six traits based on BLUP values (Fig. 3) and for each of the two environments are presented in the form of violin plots (Fig. S1). The results of ANOVA for all the six traits are presented in Table 1. Genotypic differences were significant for all the six traits, but G × E interactions were significant for only four (GCSA, GP, GL, and GFD) of the six traits. Broad sense heritability (H2) ranged from 41% (GWid) to 89% (GCSA) (Table 1). All 21 pairwise Pearson’s correlations involving the six grain morphology traits and TGW for all three sets of data (BLUP and the two environments) are provided in Fig. 4 and Figs. S2 & S3. Significant correlations included 12 positive and 7 negative correlations in E1 (Fig. S2), 8 positive and 4 negative correlations in E2 (Fig. S3), and 11 positive and 7 negative correlations using BLUP values (Fig. 4). Significant positive correlations were also available between TGW and GCSA in all the three sets of data (Fig. 4, Figs. S2 & S3).
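The count of 21 correlations follows from taking all unordered pairs among seven variables (six grain morphology traits plus TGW). The Python sketch below enumerates those pairs and shows a plain Pearson coefficient; it is an illustration only, the study used the R package named in the Methods, and the numeric values are hypothetical.

```python
from itertools import combinations
from math import sqrt

TRAITS = ["GCSA", "GP", "GL", "GWid", "GLWR", "GFD", "TGW"]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(values_by_trait):
    """Map every unordered trait pair to its Pearson coefficient."""
    return {
        (a, b): pearson_r(values_by_trait[a], values_by_trait[b])
        for a, b in combinations(values_by_trait, 2)
    }


trait_pairs = list(combinations(TRAITS, 2))
print(len(trait_pairs))  # 21 unordered pairs among 7 variables

# Hypothetical genotype means for three of the traits.
toy = {
    "GL": [6.0, 6.5, 7.0, 7.2],
    "GWid": [3.0, 3.1, 3.4, 3.3],
    "GLWR": [2.00, 2.10, 2.06, 2.18],
}
corr = pairwise_correlations(toy)
```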

Fig. 3

Violin plots showing the frequency distribution of BLUP values for six traits of grain morphology. Shaded regions of the violin plots represent the frequency distribution of data, in each case, the vertical solid bar indicates range of average values, and median is shown as white circle, depicting the lower, medium and upper quartile

Table 1 Analysis of variance (ANOVA) and broad sense heritability (H2) of wheat genotypes tested in two different environments (E1:2017–18 and E2:2018–19) with two replications in each environment

Fig. 4

Pairwise Pearson’s correlation coefficients among the six grain morphology traits and 1000-grain weight estimated using BLUP values of each trait

Population structure using principal component analysis (PCA)

Population structure was worked out using principal component analysis (PCA), where the first three PCs produced a 3D scatter plot showing distribution of genotypes into sub-groups. The PCs divided the whole association panel into three sub-groups with variation within sub-groups ranging from 3.44% (PC3) to 6% (PC1) (Malik et al. 2021). These three sub-groups represented the population structure and were used for development of the Q matrix for GWAS.

MTAs for two individual environments and BLUP values

The results of GWAS for individual traits involving two environments and BLUP and those for multi-traits will be described separately.

MTAs for two environments

Using data from two environments (E1 and E2), 824 MTAs for individual traits were identified using three GWAS models (CMLM, FarmCPU and SUPER; Table S2). After FDR correction, only 21 significant MTAs were identified. The numbers of trait-wise MTAs identified using each of the three models were as follows: (1) CMLM: 11 MTAs for three traits (Table 2 and Fig. 5), (2) FarmCPU: 9 MTAs for five traits (Table 3 and Fig. 6) and (3) SUPER: only 1 MTA for GL (SNP_3202: 3A). After Bonferroni correction, only one significant MTA (SNP_3202) for GL in E2 was available, and it was detected by each of the three models (CMLM, FarmCPU and SUPER); this MTA is located on chromosome 3A at 593,990,302–593,990,370 bp. Only MTAs obtained using FDR and Bonferroni correction were used for further study.

Table 2 Summary of significant MTAs detected in the two environments (E1, 2017–18 and E2, 2018–19) and BLUP values after false discovery rate (FDR) correction using CMLM

Fig. 5

Manhattan plots (left side) showing significant marker-trait associations (MTAs) on 21 chromosomes and ump (represents unmapped MTAs) using CMLM with corrected P-value after FDR (blue line) and Bonferroni (red line) corrections in each case. Q-Q plots (right side) represent distribution of observed and expected P values for the same traits using environment 1 & 2 (E1 and E2) and BLUP values

Table 3 A list of significant MTAs that qualify FDR correction, detected in each of the two environments (E1-2017–18 and E2-2018–19) using FarmCPU

Fig. 6

Manhattan plots (left side) showing significant marker-trait associations (MTAs) on 21 chromosomes and ump (represents unmapped MTAs) using FarmCPU with corrected P-value < 0.05 after FDR (blue line) and Bonferroni (red line) corrections in each case. Q-Q plots (right side) represent distribution of observed and expected P-values for the same traits using environment 1 & 2 (E1 and E2) and BLUP values

MTAs for BLUP data

Using BLUP values, 234 significant MTAs were identified; the distribution of trait-wise MTAs for three models were as follows: (1) CMLM: 65 MTAs for six traits; this number was reduced to 11 MTAs after FDR; only four MTAs qualified Bonferroni multiple correction for three traits (Table 2 & Fig. 5), (2) FarmCPU: 93 MTAs for six traits; only 10 MTAs after FDR for five traits, and only two MTAs after Bonferroni correction for two traits (Table 3 & Fig. 6), (3) SUPER: 76 MTAs were found for six traits, but no MTA was detected after FDR or Bonferroni corrections for any trait. Two MTAs were also common in each of the following pairs involving BLUP, environments and models: BLUP-FarmCPU, E1-FarmCPU and E2-FarmCPU; these MTAs were also associated with GCSA and GFD, respectively (Table 3).

MTAs using multi-trait analysis

Multi-trait analysis, using mvLMM, gave 46 MTAs in 16 trait combinations (2–4 correlated traits) in both environments (E1 and E2) after FDR correction (Table 4), but after Bonferroni correction only four significant MTAs (out of 514 identified MTAs) for one trait combination (GL-GWid-GLWR-GFD) were available in each of the two environments. These four MTAs involved only two SNPs, one each on 2B and 6B (SNP_9744 on 2B at 800,127,644–800,127,712 bp and SNP_912 on 6B at 674,558,317–674,558,364 bp; Table 4).

Table 4 A summary of significant MTAs for 2–4 traits that qualify FDR correction and detected in both the environments (E1-2017–18 and E2-2018–19) using mvLMM

MTAs involved in epistasis

MTAs involving 360 pairs of epistatic interactions involving all the six traits were also identified; after FDR correction, only MTAs for 29 epistatic interactions involving 47 markers (11 markers were common among the interactions) were available. The trait-wise interactions included a number of interactions for individual traits ranging from a minimum of one for GP to a maximum of 12 interactions for GFD (Table S3). After Bonferroni corrections, only three epistatic interactions remained significant, which involved two of the six traits (Table S3).

Comparison of MTAs with QTLs reported earlier

A total of 116 significant MTAs (21 CMLM + 19 FarmCPU + 1 SUPER + 46 mvLMM + 29 epistasis) were available for all the six traits after applying FDR. These significant MTAs were compared with the previously reported QTLs/MTAs for the same traits. Only two MTAs, one each for GP and GWid, were detected in the flanking regions of the reported QTLs/MTAs. However, three MTAs were identified in the vicinity (< 50 Mb) of the reported QTL regions (Table 5).

Table 5 MTAs for six traits identified during the present study that correspond to known QTLs (either overlapping known QTLs or located in the vicinity of known QTLs) in wheat

Important MTAs for marker assisted selection (MAS)

MTAs identified in the present study were subjected to further scrutiny in order to identify the most important MTAs, which could be recommended for MAS. MTAs were selected if they fulfilled at least one of the following criteria: (1) highest R2, (2) lowest P-value, (3) availability in more than one model, (4) stability (identified in all the environments), (5) availability in earlier studies (including both interval mapping and GWAS), and (6) qualified Bonferroni correction, including those involved in epistatic interactions. Using these criteria, only 14 MTAs were short-listed, all of which qualified Bonferroni correction; these included 4 MTAs from CMLM, 1 MTA from FarmCPU, 2 MTAs from mvLMM, 6 MTAs from epistasis and 1 common MTA from CMLM, FarmCPU and SUPER (Table 6).

Table 6 Summary of most important MTAs for MAS

CGs based on GWAS-MTAs

Using GWAS-MTAs, which qualified Bonferroni correction, 25 CGs were available for 9 MTAs (all six traits were represented). Of these 25 CGs, two CGs were available for a solitary common MTA identified using each of the three models (CMLM, FarmCPU and SUPER); six CGs were also available for one MTA detected using mvLMM and 17 CGs were available from 6 MTAs involved in epistatic interactions. However no hit was detected for two MTAs (one each from mvLMM, and epistatic interaction; Table S4). Similarly, 30 CGs were identified using six MTAs obtained using BLUP values for the four traits (Table S4). CGs so detected were found to be involved in different biological (including grain related traits) and molecular functions. Some CGs encoded important proteins such as sucrose synthase (TraesCS7B02G482200: Table S4). These protein domains were associated either directly or indirectly with grain traits in wheat.

Expression analysis of CGs

In silico expression analysis suggested that CGs differed in their level of expression in different tissues of the seed (e.g. embryo, endosperm, aleurone, seed coat) and at different stages of development (Fig. 7). It is apparent from the figure that most genes have tissue-specific expression and that many more genes have high expression only in embryo proper and seed coat.

Fig. 7

In-silico expression analysis of CGs in different tissues in wheat; columns represent 10 different development stages/tissues of grain and rows represent 26 different proteins encoded by candidate genes (CGs)

Wheat orthologues of rice genes for grain morphology

Wheat orthologs of known rice genes for grain morphology traits were also identified (Table S5). For this purpose, eight rice genes were selected on the basis of their functions relevant to the traits of our interest and location within the regions of corresponding wheat CGs. Each of these rice genes is known to play an important role in seed morphology and reproductive organ development. The results of this exercise are summarised in Table 7. Among these eight rice genes, the following two genes deserve special attention. (1) The wheat gene TraesCS7B02G409000 (associated with GLWR of wheat) corresponded to the rice gene OsGW2, which is known to play an important role in grain width/weight in rice. This gene is highly expressed in seed development stages and the encoded protein is localized in the nucleus and cytoplasm. (2) The wheat gene TraesCS2B02G487800 (associated with GLWR) corresponded to the rice gene OsVP1, which is known to play an important role in seed maturation and dormancy.

Table 7 List of selected homology-based mining of candidate genes

Discussion

Grain morphology is an important complex quantitative trait, which includes a number of component traits including six traits that were utilized for GWAS in the present study. It is known that a long tiny primitive grain of the progenitors of wheat was transformed during domestication into the present-day much bigger round grain of wheat (Gegas et al. 2010). Among the three progenitors of wheat sub-genomes, the progenitor of sub-genome A, namely T. urartu, is known to be the major determinant of grain shape (Feldman et al. 2012). Among other alien species, the sub-genome D progenitor, namely Ae. tauschii is also known to affect grain morphology of wheat (Röder et al. 2008; Okamoto et al. 2013; Rasheed et al. 2014).

The objective of the present study was to investigate the genetic architecture of six grain morphology traits utilizing a set of 225 wheat genotypes comprising an association panel. The genotypes in the above panel were also a part of the association panels used in three of our earlier studies (Kumar et al. 2018; Gahlaut et al. 2019, 2021). Several studies involving interval mapping or association mapping have been conducted earlier to study the genetic architecture of grain weight in wheat (Ain et al. 2015; Sun et al. 2017; Kumar et al. 2018; Singh et al. 2021). Some of these studies also involved grain morphology traits, including the following traits used in the present study: GCSA, GWid, GLWR, GP, GL and GFD (Ramya et al. 2010; Gegas et al. 2010; Prashant et al. 2012; Williams et al. 2013; Okamoto et al. 2013; Williams and Sorrells 2014; Tyagi et al. 2014; Rasheed et al. 2014; Zhang et al. 2015; Wu et al. 2016; Arora et al. 2017; Yan et al. 2017; Kumari et al. 2018; Yoshioka et al. 2019; Alemu et al. 2020; Gao et al. 2021; Schierenbeck et al. 2021). The present study is yet another study, which added a number of novel MTAs to the ever-growing list of markers associated with grain morphology traits. In the present study, three sets of data (E1, E2 and BLUP) were utilized. The majority of MTAs were unique to one of these three sets, with only 4 MTAs common to two of the three sets. This suggested that the majority of MTAs were environment specific and that there were strong QTL × environment interactions, making it difficult to use the same MTAs for MAS under different environments. This is in agreement with the results of some earlier studies (Gahlaut et al. 2020; Alemu et al. 2021; Thudi et al. 2021).

The four models and another model for epistatic interactions were used in the present study. Software for these models (except mvLMM) is available in the package GAPIT version 2.0, made available in 2016 (Tang et al. 2016), although now GAPIT version 3 is also available (Wang and Zhang 2021); GAPIT version 1.0 was developed earlier (Lipka et al. 2012). Each of these four models has its own merits over the initial MLM proposed by Ed Buckler’s group at Cornell University (Yu et al. 2006). For instance, CMLM overcomes the computational problem associated with large datasets by decreasing the effective sample size by clustering individuals into groups. FarmCPU eliminates confounding problems arising due to kinship, population structure, multiple testing, etc. through the iterative use of a Fixed Effect Model (FEM) and a Random Effect Model (REM). SUPER solves the computational burden problem, and overcomes the limitation that the number of SNPs should be less than the number of individuals, by extracting a small subset of SNPs, thus increasing the statistical power without reducing the number of SNPs. The mvLMM allows use of two or more correlated traits for multi-trait analysis.

In the present study, different models gave different results, suggesting that different models differ in their superiority over TASSEL. Only a solitary MTA (SNP_3202) was identified using three different models (Table 2). Similar results have been reported in several recent GWAS studies involving multiple models (Alemu et al. 2021; Thudi et al. 2021). This feature alone shows the utility of multiple models for GWAS. The results of the present study also support the general conclusion that, till recently, FarmCPU has been the best model for GWAS. However, FarmCPU is a bin method that relies on the unrealistic assumption that quantitative trait nucleotides (QTNs) are evenly distributed throughout the genome. FarmCPU also uses REM, which is computationally expensive. In the recently developed tool named BLINK, REM is replaced with FEM in order to eliminate the requirement that QTNs are evenly distributed throughout the genome, which further improved the statistical power of FarmCPU, in addition to reducing computing time, so that a large dataset with one million individuals and one-half million markers can be analyzed within three hours, instead of one week using FarmCPU (Huang et al. 2018). Therefore, we anticipate that BLINK will be increasingly used in future, replacing FarmCPU, particularly when large association panels with millions of markers are used for GWAS.

Correlations between different traits including GCSA, GP, GL and GWid also suggested that there may be QTLs, each of which contributes to more than one trait involved in grain size (Fig. 4, Figs. S2 & S3; Okamoto et al. 2013; Kumari et al. 2018; Yoshioka et al. 2019; Alemu et al. 2020; Mérida-García et al. 2020; Gao et al. 2021). Significant negative correlation of GFD with all the traits except GLWR is also in agreement with earlier reports (Dholakia et al. 2003; Kumari et al. 2018). A significant negative correlation between GLWR and GWid observed in the present study in both the environments (Figs. S2 & S3) may be the result of positive selection for round grains (Okamoto et al. 2013; Kumari et al. 2018; Gao et al. 2021).

The association panel used during the present study exhibited a low level of population structure containing only three sub-populations, as was also reported in two of our earlier GWA studies, where association panels included the genotypes used during the present study (Kumar et al. 2018; Gahlaut et al. 2019, 2021). However, the number of sub-populations in some other previous GWA studies ranged from three (Wang et al. 2017; Rahimi et al. 2019) to five (Qaseem et al. 2018; Jamil et al. 2019). Low level of population structure, as observed in the present study and in our earlier GWA studies is a desirable feature for conducting GWAS.

The availability of a large number of MTAs identified during the present study suggested that multiple loci are involved in controlling each of the six traits (Fig. 3 and Table 1). A comparison of the results of the present study with those reported earlier suggested that 14 of the novel 112 MTAs identified in the present study (Tables 5, 6) may be useful for MAS. The multi-locus analysis (allowed by FarmCPU) and multi-trait analysis also allowed identification of MTAs, each controlling more than one trait. This feature of mvLMM provides an additional advantage over multi-trait mixed model (MTMM) initially developed by Korte et al. (2012), which permits detection of MTAs for not more than two traits at a time (Jaiswal et al. 2016; Kumar et al. 2018). Although multi-trait analysis using MTMM and involving only two traits has been used in earlier GWA studies, more recently mvLMM permitting identification of MTAs involving more than two traits has been used in several studies (Furlotte and Eskin 2015; Kumar et al. 2018; Deng et al. 2018; Gao et al. 2021; Chen et al. 2021; Malik et al. 2021). Availability of 46 MTAs, each involving more than one trait (Table 4), suggest a significant role of pleiotropy or close linkage in controlling the grain morphology traits in wheat. These MTAs included four significant MTAs for GL-GWid-GLWR-GFD that qualified Bonferroni correction (Table 4). These four MTAs were common among both the environments and should prove useful for simultaneous improvement of more than one trait using MAS. Earlier reports on multi-trait association analysis reported 6 to 38 MTAs involving up to five traits for a variety of grain quality and yield traits in wheat (Kumar et al. 2018; Gao et al. 2021; Malik et al. 2021). However, specifically for grain morphology traits, only six MTAs involving two to four traits were reported by Gao et al. (2021), which agrees with the results of the present investigation.

The availability of 29 first-order epistatic interactions that passed FDR correction (Table S3) also suggested that epistatic interactions were not uncommon. However, epistatic MTAs have been sparingly used in MAS for crop improvement (Reif et al. 2011; Kao et al. 1999; Langer et al. 2014; Jaiswal et al. 2016; Sehgal et al. 2017; Kumar et al. 2018). Therefore, on the basis of the present study, it is recommended that MTAs involved in epistatic interactions, which contribute significantly to the available genetic variation for the traits under study, should also be used for MAS. Epistatic interactions detected in wheat using GWAS have been reported in several earlier studies involving a number of traits, which include the following: (1) flowering time (Reif et al. 2011; Langer et al. 2014), (2) stem rust resistance (Yu et al. 2011), and (3) agronomic traits (Sehgal et al. 2017). Epistatic interactions were also identified earlier in studies from our own laboratory, where 63 epistatic interactions for 13 different yield traits were identified in one study and 73 epistatic interactions for three micronutrients were identified in the other study (Jaiswal et al. 2016; Kumar et al. 2018). Similarly, epistatic interactions were also detected using bi-parental populations through interval mapping (Li et al. 2011; Xu et al. 2012; Rouse et al. 2014; Boeven et al. 2020). Therefore, the results on epistatic interactions from earlier studies and the present study may be useful for MAS to supplement conventional wheat breeding.
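The two significance criteria referred to in this Discussion (Bonferroni correction for the multi-trait MTAs and FDR correction for the epistatic interactions) differ in stringency, and the difference can be illustrated with a minimal sketch. The sketch assumes the Benjamini-Hochberg procedure for FDR control, which is only one of several possible procedures and is not confirmed as the one used in the present study; the p-values, the significance level and the function names are hypothetical and serve only as an illustration.

```python
# Illustrative sketch only: the p-values below are made up, and the
# Benjamini-Hochberg procedure is an ASSUMPTION about how FDR might be
# controlled; it is not taken from the present study.

def bonferroni_pass(p_values, alpha=0.05):
    """Flag tests whose p-value is at or below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_pass(p_values, alpha=0.05):
    """Flag tests that pass the Benjamini-Hochberg FDR procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            passed[idx] = True
    return passed


# Hypothetical p-values for five marker-trait tests.
p_values = [0.001, 0.012, 0.025, 0.041, 0.20]

print(bonferroni_pass(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg_pass(p_values))  # [True, True, True, False, False]
```

With these hypothetical values, only one test passes the Bonferroni threshold, whereas three pass the FDR criterion, which shows why associations that pass Bonferroni correction can be regarded as the more conservative set.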

MTAs were also used for identification of CGs for each individual trait; this gave 55 CGs (Table S4). The in silico expression analysis for 26 of the above CGs showed variable expression in grain and related tissues. The CGs that showed very high expression in one or more of the grain tissues encode the following proteins: (i) Zinc finger, ZPR1-type, (ii) Neprosin, (iii) Protein kinase domain, (iv) WD40 repeat, (v) Helicase, (vi) Protein kinase domain.1, (vii) F-box like domain superfamily, (viii) K homology domain, type 1 superfamily, (ix) B3 DNA binding domain, (x) HhH-GPD domain, (xi) DNA glycosylase, (xii) Glycosyltransferase 61, (xiii) Protein JASON, (xiv) Zinc finger, RING/FYVE/PHD-type1, and (xv) F-box like domain superfamily.1 (Fig. 7). The CGs encoding these proteins are potential targets for future functional genomics research aimed at determining their role in controlling the grain morphology traits examined during the present study.

Some other important CGs include the following: (1) the gene TaFBA1, which encodes an F-box protein; its function was confirmed through overexpression in tobacco, suggesting its involvement in plant growth and development, including seed germination (An et al. 2019); (2) a sucrose synthase gene; sucrose synthase is a key enzyme of starch biosynthesis and thus affects starch content, which in turn affects grain morphology and weight (Dai et al. 2009); and (3) a gene encoding glycoside hydrolase (a superfamily protein), which is involved in different processes such as seed development and endosperm cell wall degradation during germination (Dong et al. 2019).

Another exercise in the present study involved identification of eight important CG-based wheat genomic regions carrying orthologues of rice genes for grain morphology. These CGs were associated with reproduction and regulation of seed morphology and should be the target of future studies in wheat (Table 7). For example, the gene TraesCS3A02G344400, which is orthologous to the rice gene OsRLCK47, is associated with GL of wheat. Shubha et al. (2008) functionally characterized RLCKs from several crops including rice, where they were shown to play roles in development (panicle and seed development) and stress responses. Similarly, TraesCS7B02G482200 (associated with GLWR of wheat) is an orthologue of the gene OsSUS5, which plays an important role in seed development and was shown to be expressed in sink tissues such as root, flower and immature seed (Cho et al. 2011).

Conclusions

The MTAs for six grain morphology traits identified in the present study may be useful for MAS for the development of wheat cultivars that combine high grain yield with high quality. These MTAs may be validated and may also be used for post-GWAS or joint linkage and association mapping (JLAM; Gupta et al. 2019; Gahlaut et al. 2019). Information on the CGs may also be useful for the development of CG-based functional markers, which in turn may be used for MAS to facilitate breeding for improvement of grain morphology traits. The CGs identified in the present study may also be used for CG-based association mapping and functional genomics in future research.