Introduction

Bread wheat is the third most important food crop globally (after rice and maize) in terms of production and consumption. It is consumed as a staple food by more than 40% of the world population and is the primary source of calories for millions of people worldwide. However, the crop is deficient in major micronutrients like Fe, Zn and β-carotenoid, which are present only as minor constituents of wheat grain. Over three billion people, including one third of the children in developing countries, suffer from micronutrient malnutrition or hidden hunger (Chattha et al. 2017). Deficiency of these micronutrients also manifests as metabolic disorders like anaemia (due to Fe deficiency), night blindness and xerophthalmia, cardiovascular diseases and a variety of cancers and neurological disorders (due to β-carotenoid deficiency; Colasuonno et al. 2017). There are also reports of poor pregnancy outcomes like impaired or stunted growth in children due to Zn deficiency. It is estimated that > 60% of the world population suffers from Fe deficiency and > 30% suffers from Zn deficiency (White and Broadley 2009). Similarly, 190 million pre-school children and 19.1 million pregnant women around the world suffer from β-carotenoid deficiency (WHO 2016).

Grain protein content (GPC) is another important trait, which has an impact on the nutritional value of the grain and also on the technological property of the flour. Protein content and essential amino acids also affect the key functions of the human body including development and maintenance of muscles. Protein energy malnutrition (PEM) has been noticed among 161 million children globally (http://www.worldhunger.org/2015-world-hunger-and-poverty-facts-and-statistics/).

Biofortification through genetic manipulation is known to be one of the best options for nutritional improvement of crops (Welch and Graham 2004; Ortiz-Monasterio et al. 2007). The most important work on genetic improvement of micronutrient content (biofortification) in staple food crops (rice, wheat, maize, cassava, sweet-potato, pearl-millet and bean) has been conducted under the HarvestPlus project launched in 2004 by the International Agricultural Research Consortium (IARC) of CGIAR. During HarvestPlus Phase I (2003–2008), which involved screening of 3000 wheat accessions, contents of Zn and Fe were found to be in the range of 20–115 ppm and 23–88 ppm, respectively, with the highest levels found in landraces (https://biofortconf.ifpri.info). In recent years, biofortified varieties of some crops have also been released in ~ 30 countries, including India, for various micronutrients (http://www.harvestplus.org).

The variability for micronutrients (including Fe, Zn and provitamin A) in wheat is largely genetic in nature with a complex polygenic control (Shi et al. 2008; Joshi et al. 2010; Velu et al. 2012; Srinivasa et al. 2014). This makes the improvement in these traits difficult through conventional breeding (Velu et al. 2012). Marker-assisted backcrossing (MABC) and marker-assisted recurrent selection (MARS) are good options. These methods would require determination of marker-trait associations (MTAs) through linkage-based interval mapping and LD-based genome-wide association studies (GWAS). This will also help in understanding the details of genetic architecture of the traits (Tiwari et al. 2009). A number of interval mapping studies have already been conducted to identify QTLs for micronutrients in wheat (Pozniak et al. 2007; Peleg et al. 2009; Tiwari et al. 2009; Xu et al. 2012; Roshanzamir et al. 2013; Zhao et al. 2013; Tiwari et al. 2016; Sharma et al. 2018). GWAS has also been conducted for micronutrients in different crops like rice (Norton et al. 2014), pea (Diapari et al. 2015), maize (Suwarno et al. 2015), barley (Leplat et al. 2016), chickpea (Diapari et al. 2014) and cassava (Esuma et al. 2016; Rabbi et al. 2017). Some reports involving GWAS for Zn, Fe and carotenoid contents are also available in wheat (Gorafi et al. 2016; Manickavelu et al. 2017; Colasuonno et al. 2017). However, a majority of the GWA studies conducted so far are based on a single locus and a single trait, which gives information of limited utility. Epistasis is also often ignored during GWAS, although there are a few GWA studies where epistasis was examined (Jaiswal et al. 2016; Sehgal et al. 2017). Recently, the multi-locus mixed model (MLMM) and the multi-trait mixed model (MTMM) have become available for GWAS; these overcome the above limitations of genetic analysis (Segura et al. 2012; Korte et al. 2012; Jaiswal et al. 2016; Thoen et al. 2017). The multivariate linear mixed model (mvLMM) has also been used so that more than two correlated traits may be examined in multi-trait analysis (Zhou and Stephens 2012; Furlotte and Eskin 2015).

SNPs are the most abundant class of markers associated with sequence variability in the genome and thus have the potential to provide the highest map resolution. Therefore, SNPs have become the markers of choice for GWAS (Jones et al. 2007). In the present study, GWAS was conducted for micronutrients and other yield-related quantitative traits in a set of wheat genotypes. The study involved single locus single trait (SLST) analysis, MLMM, MTMM and mvLMM; both main-effect and epistatic MTAs were identified. The results of this study should prove useful for developing wheat cultivars with improved nutritional value using MAS/MARS.

Materials and methods

Association mapping panel and genotyping

The association mapping panel comprised 246 wheat genotypes of a spring wheat reference set (SWRS) procured from CIMMYT gene bank, Mexico. A set of 17,937 SNP markers generated using DArT-seq, at Diversity Array Technology Pvt. Ltd. Australia under the “Seed for Discovery” project of CIMMYT Mexico, was used for genotyping of all the 246 accessions of bread wheat (Table S1). The markers were mapped on all the 21 chromosomes using DArT PL’s consensus map of wheat based on > 100 crosses. The map (version 4) has 110,000 markers including ~ 5000 original DArT markers, the remaining being DArTseq markers. Only 8637 SNPs could be placed on genetic map involving all the 21 chromosomes; 2973 belonged to A sub-genome, 4505 belonged to the B sub-genome and 1159 belonged to the D sub-genome.

Field trials and experimental data

The above association panel was raised in a simple lattice design with two replications at two different locations, during rabi season of 2013–2014 at Powerkheda (Location coordinates: 22°40′50.01N 77°44′59.18E), Madhya Pradesh, India and during 2014–2015 at Meerut (Location coordinates: 28.9845°N, 77.7064°E), Uttar Pradesh, India, using normal field management practices (i.e., 200 kg/ha fertilizer; N:P:K = 8:8:8). Each genotype was raised in a plot of 3 rows of 1.5 m each, with a row-to-row distance of 0.25 m. Phenotypic data were recorded for four nutritional and four yield-related traits. The nutritional traits included the following: (1) grain protein content (GPC) as per cent of grain weight, (2) grain β-carotenoid content (GBCC; µg/g), (3) grain iron content (GFeC; ppm), and (4) grain zinc content (GZnC; ppm). Similarly, the yield traits included the following: (1) tiller number per plant (TNPP), (2) grain number per spike (GNPS), (3) thousand grain weight (TGW) in g, and (4) grain yield per plot (GYPP) in kg/ha.

For GPC and GBCC, 50–60 g of seed of each sample was used. GPC (%) was adjusted to 12% moisture content using the following formula and the value obtained was used for further analysis.

$$ \text{Adjusted GPC}\,(\%) = \frac{100 - \text{Base M.C.}\,(12\%)}{100 - \text{Actual M.C.}\,(\%)} \times \text{Measured GPC}\,(\%) $$

where M.C. is moisture content.
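The moisture adjustment above can be sketched in a few lines of Python. This is only an illustration of the formula; the function name `adjust_gpc` and the sample values are hypothetical and are not taken from the study.

```python
def adjust_gpc(measured_gpc, actual_mc, base_mc=12.0):
    """Adjust a measured GPC (%) to a base moisture content (%).

    Implements: (100 - base M.C.) / (100 - actual M.C.) * measured GPC.
    The name and the default base_mc=12.0 mirror the formula in the text;
    they are illustrative, not the authors' code.
    """
    if not 0.0 <= actual_mc < 100.0:
        raise ValueError("actual moisture content must be in [0, 100)")
    return (100.0 - base_mc) / (100.0 - actual_mc) * measured_gpc


# Hypothetical sample: 13.0% protein measured at 10% grain moisture.
adjusted = adjust_gpc(13.0, 10.0)
print(f"Adjusted GPC: {adjusted:.2f}%")
```

A sample measured at exactly 12% moisture is returned unchanged, which is a quick sanity check on the direction of the correction.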

For estimating Zn and Fe content, 10–12 g (90–100 kernels) of seed of each genotype was used. GPC and GBCC data were recorded using the Infratec (1241) Grain Analyzer available at CCSU, Meerut. Data on Zn and Fe contents were recorded using X-ray fluorescence (EDXRF spectrometer X-Supreme 8000; Paltridge et al. 2012) available at the Department of Genetics and Plant Breeding of Banaras Hindu University (BHU). The data on yield-related traits were recorded using traditional methods. For each trait, the data were recorded for all replications. Only means of data over replications were used for further analysis.

Statistical analysis (Descriptive statistics, Pearson’s correlation coefficients, analysis of variance (ANOVA) and heritability)

The estimates of descriptive statistics including mean, range, standard error, coefficient of variation (CV as %) and Pearson’s correlation coefficients were obtained using SPSS v. 17.0. ANOVA was conducted using the additive main effects and multiplicative interactions (AMMI) model through the Agricolae package of the R program. Broad sense heritability (H²) estimates were calculated from the phenotypic variance (σ²_p) and the genotypic variance (σ²_g) according to Allard (1999) using MS Excel 2010.
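As an illustration of the heritability calculation, the following Python sketch computes H² as the ratio of genotypic to phenotypic variance, expressed in per cent. The text does not say how the variance components were derived from the ANOVA, so the helper `variance_components` uses a common entry-mean formulation as an assumption; the mean squares in the example are hypothetical.

```python
def variance_components(ms_genotype, ms_error, n_reps):
    """Estimate (var_g, var_p) from ANOVA mean squares.

    Assumption (not stated in the text): var_g = (MS_g - MS_e) / r and
    var_p = var_g + MS_e / r, i.e. heritability on an entry-mean basis.
    """
    var_g = max((ms_genotype - ms_error) / n_reps, 0.0)
    var_p = var_g + ms_error / n_reps
    return var_g, var_p


def broad_sense_heritability(var_g, var_p):
    """Broad sense heritability in per cent: H2 = 100 * var_g / var_p."""
    if var_p <= 0.0:
        raise ValueError("phenotypic variance must be positive")
    return 100.0 * var_g / var_p


# Hypothetical mean squares for one trait with two replications.
var_g, var_p = variance_components(ms_genotype=30.0, ms_error=6.0, n_reps=2)
print(f"H2 = {broad_sense_heritability(var_g, var_p):.1f}%")
```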

Population structure analysis

Model-based cluster analysis of association mapping panel was conducted to infer the level of population structure in the association panel using the software STRUCTURE version 2.2 (Pritchard et al. 2000), which was performed using 42 SNPs, one from each of the 42 chromosome arms.

The number of assumed sub-populations (K) was set from 2 to 20, and the process was repeated three to five times. For each run, burn-in and Markov Chain Monte Carlo (MCMC) iterations were set to 50,000 and 100,000, respectively, and a model “without admixture and correlated allele frequencies” was used. The number of sub-populations was determined following the delta K (ΔK) method (Evanno et al. 2005). Assignment of genotypes to sub-populations was done on the basis of their affiliation probabilities. A genotype was assigned to a specific sub-population if it had ≥ 80% affiliation probability with that sub-population; genotypes with < 80% affiliation probability with each sub-population were treated as “admixtures”.
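The two steps described above, choosing K by ΔK and assigning genotypes by affiliation probability, can be sketched as follows. This is a simplified illustration and not the STRUCTURE workflow itself: ΔK is computed here as the absolute second difference of the mean Ln P(D) divided by the standard deviation at K, and all input values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Simplified Evanno delta K from replicate Ln P(D) values.

    lnp_by_k maps K -> list of Ln P(D) over replicate runs (hypothetical
    input layout). Delta K is defined only for K with both neighbours.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicate runs are identical
        result[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return result


def assign(q_values, threshold=0.80):
    """Assign a genotype to the sub-population with >= 80% affiliation."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return f"G{best + 1}" if q_values[best] >= threshold else "admixture"


# Hypothetical Ln P(D) values from three replicate runs per K.
runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-890.0, -892.0, -888.0],
    5: [-885.0, -886.0, -884.0],
}
dk = delta_k(runs)
print("best K:", max(dk, key=dk.get))
print(assign([0.85, 0.10, 0.05]), assign([0.50, 0.30, 0.20]))
```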

Linkage disequilibrium (LD) analysis

LD analysis (in terms of r²) was performed for each of the 21 wheat chromosomes with the associated mapped SNPs using a window size of 50 with the help of the software TASSEL v. 5.0. Genome-wide threshold LD was calculated using unlinked markers following Breseghello and Sorrells (2006).
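To make the two quantities concrete, the sketch below computes r² for a pair of biallelic markers as the squared Pearson correlation of their 0/1 codes, and takes a percentile of the r² values among unlinked marker pairs as a background threshold. Using the 95th percentile is an assumption based on a common reading of the Breseghello and Sorrells approach; the sketch omits any data transformation, is not the TASSEL implementation, and uses hypothetical genotype codes.

```python
def r_squared(a, b):
    """Squared correlation between two markers coded 0/1 (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic marker: r2 is undefined
    return sxy * sxy / (sxx * syy)


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Hypothetical r2 values among unlinked marker pairs.
unlinked_r2 = [0.01, 0.02, 0.05, 0.08, 0.12, 0.15, 0.18, 0.20, 0.21, 0.30]
threshold = percentile(unlinked_r2, 95)
print(f"background LD threshold (assumed 95th percentile): {threshold:.3f}")
print(r_squared([0, 0, 1, 1, None], [0, 0, 1, 1, 1]))
```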

Marker-trait associations (MTAs)

Phenotypic data for the micronutrient contents, GPC, and yield-related traits for the 246 genotypes at each location and the corresponding genotypic data of SNP markers were used for single locus single trait (SLST) analysis to identify MTAs. TASSEL v. 3.0 was used for this purpose using both the General Linear Model (GLM) and the Mixed Linear Model (MLM). For GLM, population structure (the Q model) without familial relatedness (the K model) was used, whereas for MLM, both population structure and familial relatedness (Q + K model) were used (Yu et al. 2006). Familial relatedness (kinship matrix) was calculated using the genotypic data for all the 17,937 markers.
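For orientation, the Q + K mixed model is usually written in the following general form. The symbols are the conventional ones for this model and are given here as an assumption about notation; they are not defined in the present text.

$$ y = X\beta + S\alpha + Qv + Zu + e $$

where y is the vector of phenotypic observations, β contains fixed effects other than marker or population structure, α is the marker (SNP) effect, v contains the sub-population effects (Q matrix), u is the vector of random polygenic background effects whose covariance is proportional to the kinship matrix K, and e is the residual. Under this notation, the GLM (Q model) corresponds to dropping the Zu term.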

GWA analysis was also conducted using MLMM, MTMM and mvLMM. These analyses were performed using the relevant R packages (Segura et al. 2012; Korte et al. 2012; Furlotte and Eskin 2015). For MLMM, the background genome was considered as cofactors (as in composite interval mapping) using stepwise mixed-model regression with forward inclusion and backward elimination (Segura et al. 2012; Jaiswal et al. 2016). For MTMM, all pairs of phenotypic traits showing significant correlation were used. Similarly, for mvLMM, more than two phenotypic traits showing significant correlation with each other were used. In all cases, P ≤ 0.001 was used as the cutoff for identifying significant MTAs; this stringent criterion was used to deal with the problem of multiple testing. Each significant MTA was then subjected to Bonferroni correction (using an additional αGWAS ≤ 0.05, thus making the overall significance threshold P ≤ 0.00005) to eliminate false positives in the case of single locus single trait and MTMM analyses. However, for MLMM and mvLMM, a separate Bonferroni correction was not needed, since the packages used already had a provision for it.
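The threshold arithmetic described above can be checked with a short Python sketch. The first function reproduces the overall cutoff stated in the text (0.001 × 0.05 = 0.00005). The second shows the textbook Bonferroni threshold, α divided by the number of tests, purely for comparison; applying it with 17,937 markers is an assumption and is not the procedure reported here. Marker names and P values are hypothetical.

```python
def combined_threshold(p_cutoff=0.001, alpha_gwas=0.05):
    """Overall cutoff as described in the text: P cutoff times alpha_GWAS."""
    return p_cutoff * alpha_gwas


def bonferroni_threshold(alpha, n_tests):
    """Textbook Bonferroni threshold (shown for comparison only)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, threshold):
    """Keep the markers whose P value is at or below the threshold."""
    return {m: p for m, p in p_values.items() if p <= threshold}


# Hypothetical marker P values.
p_values = {"M_a": 2.0e-6, "M_b": 4.0e-5, "M_c": 8.0e-4, "M_d": 0.02}

overall = combined_threshold()
print(f"overall threshold: {overall:.5f}")
print("pass P <= 0.001:", sorted(significant(p_values, 0.001)))
print("pass overall threshold:", sorted(significant(p_values, overall)))
print(f"alpha / m with m = 17937: {bonferroni_threshold(0.05, 17937):.2e}")
```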

Analysis of epistasis

Analysis for epistatic interactions was carried out using SNPassoc package (González et al. 2007) of R program, where the function interactionPval was executed for computing the statistical significance (P value) of SNP–SNP interaction.

Results

Phenotypic data and correlations

The eight phenotypic traits were placed in the following two groups for presentation of results: (1) nutritional traits, and (2) yield-related traits. Violin plots representing the frequency distributions of data for all the eight traits at each of the two locations are presented in Fig. 1a, b; the data for each trait gave a good fit to normal distribution. The ranges of mean values were as follows: GPC, 9.99–18.87 (%); GBCC, 2.94–6.555 (mg/kg); GFeC, 24.50–44.30 (ppm); GZnC, 17.75–49.70 (ppm); TNPP, 3.00–14.67; GNPS, 10.50–58.00; TGW, 12.43–59.36 g; GYPP, 42.50–520.00 kg/ha. The details of descriptive statistics and CV are presented in Table S2.

Fig. 1 a, b Violin plots showing the frequency distribution of 4 nutritional and 4 yield-related traits at two locations. Shaded regions of the violin plots represent the frequency distribution of data; in each case, the vertical solid bar indicates range of average values, and median is shown as a minute white circle within the solid bar, with a horizontal bar, depicting the lower, medium and upper quartile

Correlations in majority of the pairs of traits at both the locations were highly significant at P ≤ 0.01 (Table 1). As many as 14 of the possible 28 pairs of traits, each having a correlation above 0.25, were selected for MTMM analysis; four of these correlations were consistent at both the locations. Of the remaining ten correlations, seven were available in data from Meerut and only three were available in data from Powerkheda. Correlations involving more than two traits were also available (e.g. GZnC and GYPP); in such cases, mvLMM approach was also used for determining MTAs involving more than two correlated traits.
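The pair selection described above can be illustrated with a small Python sketch: eight traits give 8 × 7 / 2 = 28 possible pairs, and pairs whose correlation exceeds 0.25 are retained. Comparing the absolute value of r with the cutoff is an assumption (the text only says "above 0.25"), and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt

TRAITS = ["GPC", "GBCC", "GFeC", "GZnC", "TNPP", "GNPS", "TGW", "GYPP"]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def correlated_pairs(data, cutoff=0.25):
    """Return (trait_a, trait_b, r) for pairs with |r| above the cutoff."""
    selected = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > cutoff:  # assumption: magnitude, not signed value
            selected.append((a, b, r))
    return selected


print("possible pairs:", len(list(combinations(TRAITS, 2))))

# Hypothetical genotype means for three traits.
toy = {
    "GPC": [12.1, 13.4, 11.8, 14.0, 12.9],
    "GFeC": [30.2, 35.1, 29.0, 37.3, 33.0],
    "TGW": [40.0, 38.5, 41.2, 36.9, 39.0],
}
for a, b, r in correlated_pairs(toy):
    print(f"{a}/{b}: r = {r:.2f}")
```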

Table 1 Estimates of Pearson correlation coefficients for different traits of the association panel at two locations

ANOVA and heritability

The combined ANOVA revealed highly significant variation for each trait with the following sources of variation: genotypes, environments and G × E interaction. Estimates of broad sense heritability (H²) of nutritional traits ranged from 6.65% (GPC) to 62.05% (GFeC) and that for yield traits ranged from 52% (TNPP) to 89.66% (GYPP; Table 2).

Table 2 Analysis of variance (ANOVA) and heritability (H²) of yield traits, GPC and micronutrient contents in SWRS genotypes tested for 2 years/environments and four replications (two in each environment)

Population structure and LD analysis

Model-based cluster analysis revealed that the AM panel used in the present study is structured and comprises four subpopulations, viz. G1, G2, G3 and G4. The four sub-populations included 30 (G1), 42 (G2), 49 (G3) and 125 (G4; admixture) genotypes, respectively (Fig. 2). The information generated by population structure analysis was used for analysing marker-trait associations to reduce the number of false positives.

Fig. 2 AM panel showing structuring of four subpopulations in different colours, viz. G1 (red colour, 30 genotypes), G2 (green colour, 42 genotypes), G3 (blue colour, 49 genotypes) and G4 (a single bar with two or more colours, admixture of 125 genotypes)

LD between pairs of markers was estimated for each of the 21 chromosomes. The measures of LD decay for all the 21 chromosomes are summarized in Fig. S1. The number of SNP pairs with significant LD (P ≤ 0.01) on individual chromosomes ranged from 328 (on chromosome 4D) to 6955 (on chromosome 1A). Genome-wide threshold LD was r² = 0.21 and the genome-wide mean genetic distance showing no LD decay was 3.0 cM. The genetic distance showing LD ranged from a minimum of 2 cM in certain regions of chromosomes 2B, 3A, 3B and 6B to a maximum of 20 cM in a region on chromosome 3D (Fig. S1).

MTAs identified using single locus single trait and MLMM

MTAs for the two groups of traits, nutritional traits and yield-related traits, are described separately. Results of MTMM and mvLMM are described for both the groups together, since these analyses involved two or more traits, sometimes belonging to both the groups. The most important MTAs are depicted in Fig. 3.

Fig. 3 Distribution of significant MTAs on different chromosomes identified using SLST, MLMM and MTMM. Star symbols shaded in different colours indicate the traits (GNPS, GYPP, GPC, GBCC, GZnC) associated with SNPs identified using SLST. Similarly, marker names in different colours indicate the MTAs identified using MLMM, and different combinations of coloured squares indicate MTAs identified using MTMM (GPC/GFeC). MTAs marked with #, @ and & are those detected in both the environments using SLST, MLMM and MTMM, respectively. MTAs marked with R are those which were reported earlier

Nutritional traits

Using SLST, a total of 584 significant MTAs for the four nutritional traits at the two locations were identified; only 10 of these MTAs passed Bonferroni correction, of which one belonged to sub-genome A, six belonged to sub-genome B, and none belonged to sub-genome D; the associated markers for the remaining three MTAs were not assigned to chromosomes (Tables S3, S4 and Fig. S2). All 10 of these MTAs belonged to the Meerut location only.

Using MLMM, 271 MTAs (after Bonferroni correction) were identified for the same four nutritional traits. Of these, 164 MTAs belonged to Powerkheda, 106 belonged to Meerut and only one belonged to both the locations (Table S3). Only 144 of the 271 MTAs were mapped on chromosomes, with 58 on sub-genome A, 66 on sub-genome B and 20 on sub-genome D (Tables S3, S5 and Fig. S2).

Yield-related traits

For the four yield-related traits, SLST gave a total of 3251 MTAs involving both the locations (P < 0.001). However, only 126 of these MTAs passed Bonferroni correction, of which 96 MTAs were detected at Powerkheda, 8 were detected at Meerut and 22 at both the locations (Tables S3, S4 and Fig. S2). Similarly, using MLMM, 316 MTAs (after Bonferroni correction) were identified for these traits (149 MTAs at Powerkheda, 161 at Meerut and 6 at both the locations). Of the 316 associated markers, only 209 markers were mapped on different chromosomes, with 101 on sub-genome A, 92 on sub-genome B and 16 on sub-genome D (Tables S3, S5 and Fig. S2).

MTAs/QTLs identified using MTMM and mvLMM

Using MTMM, 1253 MTAs were identified for 14 pairs of correlated traits at both the locations (no MTA was available for GPC/GBCC, TGW/GPC and TGW/GBCC), but only 28 of these MTAs (6 for GPC/GFeC, Powerkheda and Meerut; 1 for GYPP/TGW, Meerut; 21 for GPC/GZnC, Meerut) for three pairs of traits qualified after Bonferroni correction. Only 18 MTAs could be mapped on A and B sub-genomes (3 belong to sub-genome A; 15 belong to sub-genome B), none was found on sub-genome D (Table S6 and S7; Fig. 2).

Using mvLMM, 33 MTAs were identified at both the locations for eight pairs of correlated traits. Seven MTAs for three pairs of traits were identified for Powerkheda location; five of these seven MTAs were mapped, one on sub-genome A (3A) and four on sub-genome B (one each on 2B, 3B and two on 6B). The remaining 26 MTAs involving five pairs of traits were identified at Meerut location; of these, 4 MTAs belonged to sub-genome A; 11 belonged to sub-genome B and only 2 belonged to sub-genome D. The remaining nine MTAs involved SNPs that were not mapped (Table 3).

Table 3 A summary of significant (P ≤ 0.001) MTAs detected using mvLMM at Powerkheda and Meerut

There were also MTAs that were each identified by more than one approach, and MTAs that were common to more than one trait. There were 22 MTAs, involving three yield traits, each identified by both SLST and MLMM (13 MTAs for GYPP, 6 for GNPS and 3 for TGW). There were another 12 MTAs involving three pairs of correlated traits (with 5 traits) that were detected by both MTMM and MLMM, thus placing a higher level of confidence in these markers (Table S8). Similarly, mvLMM detected one MTA (involving marker M5415) that was associated with a combination of four traits (GYPP/GNPS/TGW/GZnC); this marker was also detected by MLMM for GYPP. Thus, there were several individual MTAs, each involving more than one trait; these MTAs may or may not represent pleiotropic genes/QTLs. These MTAs included the following: (1) 1 MTA detected using SLST was common between two traits; (2) 32 MTAs detected using MLMM involved 8 traits (Table S8). All these markers are depicted in Fig. 3.

MTAs involved in epistatic interactions

For six of the eight traits (excluding GFeC and TNPP), 73 epistatic interactions involving 146 markers, spread over 18 of the 21 chromosomes (all except 4D, 5D and 6D), were identified (43 at Powerkheda and 30 at Meerut). The interactions for individual traits ranged from only one interaction for TGW to a maximum of 27 interactions for GYPP (Table S9).

Relative importance of MTAs

The MTAs identified by all the methods were scrutinized in order to identify the most important MTAs, which could be recommended for marker-assisted selection (MAS). The criteria used for this purpose were the following: (1) lowest P value, (2) credibility (if identified by more than one method) and (3) whether or not the same marker was detected in earlier studies (including both interval mapping and GWAS) on the same chromosome. The MTAs and QTL detected at only one of the two locations were due to environmental effect (G × E) and were treated as location specific (see Tables S3, S4, S5 and S6). The following 6 MTAs fulfilled the last two criteria: M3205, M802, M10371, M1673 and M4208 (all five for Fe and protein content) and M14494 (for grain yield). One MTA (M15816; for Fe and protein content) was found to be novel, being reported for the first time, and fulfilled criterion (2); the remaining MTA (M15616; for GYPP) fulfilled the first criterion. There were also markers which were highly significant (lowest P value), some of these confirming the markers identified in earlier studies (Table 4 and Fig. 3). A solitary MTA (M5415 for GYPP using MLMM and for GYPP/GNPS/TGW/GZnC using mvLMM; Table 3) fulfilled only criterion (2); this MTA is not included in Table 4.

Table 4 Summary of most important QTLs

Discussion

An improvement in the content of major micronutrients in cereal grains, popularly described as biofortification, is an important area of research in order to address the problem of nutritional security (Neeraja et al. 2017). This requires an understanding of the genetic architecture that is associated with the content of each micronutrient and protein. The same is true for yield-related traits. Also, since screening a large segregating population for contents of individual micronutrients in a breeding program can be labour intensive and cost-ineffective, marker-assisted selection (MAS) is a desirable option, if information about marker-trait associations (MTAs) is available (Loladze 2014; Myers et al. 2014).

Results of the present study support earlier reports that adequate variability for different micronutrients (Zn, Fe and β-carotene), protein content and yield traits is available in the global wheat germplasm (Tiwari et al. 2009; Srinivasa et al. 2014; Li et al. 2016). However, these traits are quantitative in nature, making generation of knowledge about genetic architecture for improvement of these traits relatively difficult (Shi et al. 2008; Srinivasa et al. 2014). The variability in the concentration of Fe (24.50–44.30 ppm), Zn (17.75–49 ppm) and β-carotene (0.5–6.5 mg/kg) in the material used in the present study is apparently adequate. However, there were clear location effects, with plants accumulating more Fe and Zn in wheat grain at Powerkheda than at Meerut (Table S2), although the experiments were conducted with uniform nutrient management practices at both the locations. In contrast, the micronutrient concentration in some landraces of wheat is relatively high, particularly in Afghan landraces (Fe: 55.14–122.2 ppm and Zn: 15.56–87.29 ppm; White and Broadley 2009; Ortiz-Monasterio et al. 2007; Manickavelu et al. 2017). A diverse collection of 132 wheat cultivars that are available at CIMMYT also has a wide range of micronutrient concentration (Fe: 28.8–56.50 ppm; Zn: 25.2–53.3 ppm; Graham et al. 1999; Xu et al. 2011). The variability in the concentration of β-carotene was also adequate in the association panel used, although concentrations up to 14 mg/kg have been reported in einkorn wheat, which can be used as a good source for improvement of β-carotene concentration in the current high yielding wheat varieties (Hentschel et al. 2002; Hidalgo et al. 2006; Zhou et al. 2005; Leenhardt et al. 2006).

The present study is an effort firstly to evaluate the different methods available for GWAS, secondly to supplement the knowledge about the genetic architecture of micronutrient contents and some yield-related traits and finally to provide MTAs for these traits for MAS. The study was based on an association panel of 246 genotypes with large range of variation, in contrast to limited variation sampled in biparental mapping populations, which have been more frequently used for QTL analysis.

For conducting GWAS analysis, mixed linear model is generally used, which involves use of information about population structure (Q matrix) and familial relatedness/kinship (K matrix). In the present study, population structure was examined using 42 markers, located on 42 different arms of 21 wheat chromosomes. Such markers showing no linkage are generally used in GWAS studies to avoid the confounding due to linkage between markers. A similar strategy was also followed in two earlier independent studies, where 40 markers (Ogbonnaya et al. 2017) and 49 markers (Mulki et al. 2013) showing no linkage and covering the entire wheat genome were used for population structure. In order to find out the adequacy of 42 markers for population structure, we also conducted population structure analysis using 84 markers; the results of population structure based on 84 markers were no different from the results obtained using 42 markers suggesting that 42 markers were adequate for determining the population structure.

Similarly, familial relatedness or kinship analysis accounts for relationships among the genotypes that are used for GWAS, since the results of GWAS may be confounded if familial relationship that may arise due to selection and/or genetic drift is not taken into account (Bernardo et al. 1996; Yu et al. 2006; Zhang et al. 2007). It has been shown that the use of kinship estimates in mixed models markedly reduces the number of false positive associations (Korte and Farlow 2013). In the present study, all available 17,937 SNP markers were used for estimating familial relatedness.

MTAs using different approaches (SLST, MLMM, MTMM & mvLMM)

Although single locus single trait (SLST) analysis has several limitations that have been widely discussed (Gupta et al. 2014; Jaiswal et al. 2016), this analysis was conducted in the present study mainly for the purpose of comparing its results with those of several other recently developed and improved approaches that were used in parallel during the present study. These new approaches included MLMM, MTMM and mvLMM that were proposed during recent years (Segura et al. 2012; Korte et al. 2012; Zhou and Stephens 2014; Furlotte and Eskin 2015; Jaiswal et al. 2016; Thoen et al. 2017) and were used in several recent studies including the present study. These three newer approaches for GWAS take into consideration the genetic background and epistatic interaction (addressed in MLMM) and pleiotropy involving QTL/genes, each of which may influence more than one trait (addressed in MTMM and mvLMM).

For identification of significant MTAs using different approaches, one of the major problems is multiple testing, which results in a large number of false positives. In order to overcome this problem, we used two measures: one was the use of a stringent P value ≤ 0.001 (instead of a P value of 0.05 or 0.01), and the other was application of Bonferroni correction, which has been specially designed for this purpose. These criteria (including a P value ≤ 0.001) were also used in several earlier studies to identify significant MTAs in wheat (Ain et al. 2015; Wang et al. 2017) and other crops (Mogga et al. 2018). The false positives remaining at P ≤ 0.001 were further addressed through Bonferroni correction (αGWAS ≤ 0.05), which is the most conservative approach (overall P ≤ 0.00005) to deal with the problem of multiple testing, so that one does not expect any false positives after application of Bonferroni correction, although this may lead to some false negatives, which is not as serious a problem as that of false positives. In MLMM and mvLMM, which are the other two methods used in the present study, Bonferroni correction is built into the software used, so that no separate Bonferroni correction was applied to the MTAs obtained.

In the present study, 3835 MTAs were obtained using SLST and 990 MTAs were obtained using MTMM. However, only 136 MTAs from SLST and 28 MTAs from MTMM qualified after Bonferroni correction. MLMM and mvLMM gave 587 and 33 MTAs respectively (Bonferroni corrections were inherent in these approaches). However, it is widely recognized that Bonferroni correction leads to overcorrection, giving many false negatives, and is therefore a trade-off; consequently, many genuine MTAs must have been lost due to Bonferroni correction in each of the approaches used during the present study. However, additional MTAs identified using epistatic interactions and a comparison with the results of earlier studies may partially address the problem of false negatives. In the SLST results, there were a number of MTAs which did not pass Bonferroni correction but were perhaps genuine MTAs, because they confirmed earlier reports (5 for GZnC to 7 for GPC), thus supporting our conclusion that Bonferroni correction leads to overcorrection, resulting in many false negatives. For instance, after application of Bonferroni correction to the SLST results, no MTAs were available for two traits (TNPP, GFeC), although six MTAs for TNPP were reported in two earlier studies, and eight MTAs for GFeC were reported in three earlier studies. The identification of 35 MTAs for TNPP and 33 for GFeC through MLMM also suggests that SLST results are confounded due to genetic background that was ignored. Similar results with respect to Bonferroni correction were also available using MTMM involving 11 pairs of correlated traits, where Bonferroni correction reduced the number of MTAs from 1253 to 28 MTAs, and that too for only 3 of the 11 pairs of correlated traits, once again suggesting overcorrection leading to false negatives.
Similarly, a total of 97 MTAs were detected using epistatic interaction for 6 traits (GNPS; 18, TGW; 2, GYPP; 31, GPC; 16, GBCC; 17, GZnC; 13); these MTAs were not detected using MLMM (Table S5 and S9). Similarly, 666 MTAs were also detected using epistasis for eight trait combinations (GYPP/GNPS/TGW/GZnC; 126, GNPS; 56, TGW/GBCC/GZnC; 42, GYPP/GNPS/TGW/GPC/GBCC/GZnC; 146, GNPS/TGW/GPC/GBCC/GZnC; 92, TGW/GPC/GBCC/GZnC; 58 and GPC/GBCC/GFeC/GZnC; 58). These MTAs were also not detected using mvLMM for the same traits combinations. Similarly, 666 additional MTAs which were also detected using epistasis but for different trait combinations were also perhaps false negatives and hence were not detected for the same trait combinations when analysed using mvLMM. We can thus conclude that the commonly used SLST approach with or without Bonferroni correction is a very inefficient approach for GWAS. In SLST, MTMM and mvLMM, the results are also confounded due to the genetic background. This issue is addressed in MLMM and therefore, MLMM is certainly an improvement over SLST.

MTAs that each control two or more correlated traits also deserve attention, although it is not easy to determine whether such multi-trait MTAs are due to pleiotropy or to close linkage. Nevertheless, identification of such MTAs will help not only in the study of the genetic architecture of correlated traits, but also in the simultaneous improvement of two or more correlated traits through MAS.

Although MTMM gives useful information involving identification of common MTAs for pairs of correlated traits, its major limitation is that MTAs can be identified for only two correlated traits at a time. The mvLMM approach was developed to overcome this limitation and to allow multi-trait analysis involving more than two correlated traits. MTAs detected using mvLMM involved 33 SNPs associated with eight combinations of traits (Table 3). The results of MTMM and mvLMM together suggest wide occurrence of pleiotropy or close linkage. The MTMM approach for identification of multi-trait MTAs using GWAS has also been used in two earlier studies in plants, one in wheat (Jaiswal et al. 2016) and the other in Arabidopsis (Thoen et al. 2017). It has, however, been utilized in several studies in animal systems including humans and cattle (Heid and Winkler 2017; Bolormaa et al. 2014; Crispim et al. 2015; Pausch et al. 2016). For mvLMM, the present study is perhaps the first study in plants, although the approach was earlier used in mouse (Zhou and Stephens 2014) and humans (Furlotte and Eskin 2015).

MTAs identified using more than one approach, i.e. SLST and MLMM, MLMM and MTMM, or SLST and MTMM, can be accepted with a higher level of confidence. During the present study, 22 MTAs involving three traits (GNPS, TGW and GYPP) were common between SLST and MLMM, and 12 MTAs involving three pairs of traits (TGW/GNPS, GPC/GZnC and GYPP/TGW) were common between MLMM and MTMM (Table S8). These MTAs can be used for MAS with a higher level of confidence.
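Identifying MTAs shared between approaches amounts to intersecting the result sets of the individual analyses. The following Python sketch illustrates this under the assumption that each MTA is recorded as a (marker, trait) pair; the marker names and result sets are hypothetical and do not reproduce Table S8.

```python
def shared_mtas(results_by_method, method_a, method_b):
    """Return the MTAs reported by both methods, sorted for stable output.

    results_by_method: dict mapping method name -> set of (marker, trait) pairs.
    """
    return sorted(results_by_method[method_a] & results_by_method[method_b])


# Hypothetical results (illustration only; not data from the present study).
example_results = {
    "SLST": {("snp_1", "TGW"), ("snp_2", "GNPS"), ("snp_3", "GYPP")},
    "MLMM": {("snp_1", "TGW"), ("snp_3", "GYPP"), ("snp_4", "GPC")},
}

if __name__ == "__main__":
    for marker, trait in shared_mtas(example_results, "SLST", "MLMM"):
        print(f"{marker} associated with {trait} in both SLST and MLMM")
```

Treating the marker and the trait together as the key ensures that a marker counts as shared only when both approaches associate it with the same trait.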

Some novel MTAs

A large number of MTAs identified using three different methods of analysis during the present study were novel. As many as 222 novel MTAs were identified using SLST and MLMM taken together; of these, 70 MTAs for nutritional traits were identified using MLMM only (none through SLST) and 152 MTAs for yield-related traits were identified using SLST and MLMM (Tables S4, S5). This suggests that many MTAs remain to be identified. These MTAs are new additions to the already available MTAs for a variety of traits in wheat (Tables S4, S5 and S7).

Epistasis is often overlooked in GWAS

Knowledge of epistasis is also necessary to understand the genetic architecture of a trait (Boone et al. 2007; Phillips 2008; Jaiswal et al. 2016). If this information is not available, genomic information may be under-utilized for crop improvement (Wang et al. 2011; Jaiswal et al. 2016). However, such epistatic interactions have seldom been examined during GWAS despite their widespread occurrence (Reif et al. 2011; Kao et al. 1999; Langer et al. 2014; Jaiswal et al. 2016). In the present study, 73 important epistatic interactions were detected for six traits. Six of these epistatic interactions (one for each of the six traits) were highly significant (on the basis of the lowest P value) and included the following: (1) (M3424-7A × M15107-3B) for GNPS, (2) (M519-6B × M71-6B) for TGW, (3) (M10613-2A × M12763-3B) for GYPP, (4) (M61-5A × M3743-7B) for GPC, (5) (M2908-2A × M5863-3B) for GBCC and (6) (M8512-7A × M9308-1D) for GZnC (Table S9).
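What a two-locus epistatic interaction means in quantitative terms can be illustrated with a minimal Python sketch. It assumes homozygous lines with each marker coded 0 or 1 and computes the interaction contrast among the four two-locus genotype class means, i.e. the deviation of the double-alternate class from the value expected if the two marker effects were purely additive. The genotype codes and phenotype values are hypothetical, and the sketch is not the method used for the epistasis analysis in the present study.

```python
def class_means(genotypes_a, genotypes_b, phenotypes):
    """Mean phenotype for each two-locus genotype class (a, b), a and b in {0, 1}."""
    sums, counts = {}, {}
    for a, b, y in zip(genotypes_a, genotypes_b, phenotypes):
        sums[(a, b)] = sums.get((a, b), 0.0) + y
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def interaction_contrast(genotypes_a, genotypes_b, phenotypes):
    """Effect of marker A in the b=1 background minus its effect in the b=0 background.

    A value of zero indicates purely additive marker effects; a non-zero value
    indicates that the effect of one marker depends on the other (epistasis).
    All four genotype classes must be present in the data.
    """
    m = class_means(genotypes_a, genotypes_b, phenotypes)
    return (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])


# Hypothetical data for eight lines (illustration only).
example_a = [0, 0, 1, 1, 0, 0, 1, 1]
example_b = [0, 0, 0, 0, 1, 1, 1, 1]
example_y = [10.0, 12.0, 13.0, 15.0, 12.0, 14.0, 20.0, 22.0]

if __name__ == "__main__":
    print("Class means:", class_means(example_a, example_b, example_y))
    print("Interaction contrast:", interaction_contrast(example_a, example_b, example_y))
```

In practice such a contrast would be tested for significance within a model that also accounts for population structure and kinship, which this sketch deliberately omits.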

Epistatic interactions in wheat using GWAS have been reported in several earlier studies for flowering time (Reif et al. 2011; Langer et al. 2014) and stem rust resistance (Yu et al. 2011). In a recent study from our own laboratory, 63 epistatic interactions involving 13 different traits in wheat were also reported using GWAS (Jaiswal et al. 2016). Interval mapping involving biparental populations has also been used for detection of epistatic interactions (Li et al. 2011; Xu et al. 2012). Pairs of markers involved in each of the reported epistatic interactions, including those detected in the present study, may be useful for molecular breeding in wheat.

G × E interactions

In the analysis of variance (ANOVA), G × E interactions were found to be significant for all traits except GPC (Table 2). Specific QTL × E interaction effects were not estimated; however, the location and year effects were apparent from the availability of QTL that were location- or year-specific. Such location-specific QTL can be used for breeding varieties for specific locations. Also, since one of the two locations used in the present study (i.e. Powerkheda) is known to experience drought and high temperature, the QTL identified at Powerkheda (and absent at Meerut) may be useful while breeding for tolerance against abiotic stresses (drought and high temperature).

Markers for marker-assisted selection (MAS): biofortification

Selection of superior individual plants for enhanced micronutrient content in a segregating population is time-consuming and labour-intensive. Keeping this in view, the goal of the present study was to find markers to be used for MAS or MARS for improvement of not only the nutritional traits but also the yield-related traits. Although a large number of MTAs were detected in the present study, nine major MTAs for three traits were considered to be the most important (using the criteria mentioned in Results). Three of these nine markers can be used for GYPP, and each of the other six markers can be used for selecting two traits together (GPC and GFeC; see Tables 3, 4 for details). For other traits, M14605 is important for GZnC; M167 and M3471 for GBCC; M401 and M4501 for TGW; M13270 for GNPS; and M15558 for TNPP (Tables S4, S5 and Fig. 3).

Conclusions

Molecular breeding involving MAS has now become a component of conventional plant breeding, particularly for improvement of complex quantitative traits (QTs). A prerequisite for molecular breeding is the availability of markers associated with the targeted QTs. The two approaches that have been extensively used for this purpose are interval mapping and association mapping (GWAS), each having its merits and demerits. While markers detected through interval mapping have already been put to use in many crops including wheat, there are not many documented examples of the use of the results of GWAS for this purpose. One classical example is the improvement of provitamin A in maize (Suwarno et al. 2015). However, interval mapping has the serious limitation of using only the limited genetic variability available between the two parents of a biparental population. This limitation is addressed in GWAS through the utilization of almost complete genetic variability, although GWAS does have other limitations, which are being addressed (Gupta et al. 2014). The present GWA study made use of an association panel including diverse global spring wheat germplasm, with a particular emphasis on the genetics of the contents of micronutrients like Fe, Zn and β-carotene. A large number of MTAs were detected, which can be used by wheat breeders after due prioritization and validation. The present study is one of the first few studies where four different approaches were used for GWAS; the use of these multiple approaches resulted in more reliable and useful information. The results of the present study also demonstrated that GWAS is a powerful tool for genetic dissection of complex traits, if newer approaches are utilized. The results obtained in this study should prove useful not only in molecular breeding, but also for further studies, thus giving direction to future research in the field of association mapping.