Introduction

Global production of cereals has increased around threefold since 1960 (FAO 2012), an increase correlated with greater application of nitrogen (N) fertiliser. Global fertiliser demand is still growing as the demand for grain increases (FAO 2011). Thus, to enhance worldwide cereal production sustainably, it is necessary to increase production per unit of N fertiliser.

Nitrogen use efficiency (NUE) is defined as grain yield divided by the available nitrogen. In bread wheat (Triticum aestivum L.), genetic progress on NUE-related traits has been assessed in various studies (Ortiz-Monasterio et al. 1997; Guarda et al. 2004; Muurinen et al. 2006; Cormier et al. 2013) and was mainly driven by selection on yield at a constant and high N level. This genetic progress should be at least maintained and preferably accelerated to deal with political, economic, and environmental concerns (Rothstein 2007; Pathak et al. 2011). Several promising ways to improve NUE have been proposed, such as focusing on root architecture (Hirel et al. 2007; Foulkes et al. 2009; Kant et al. 2011) or on senescence and remobilisation (Gaju et al. 2011; Distelfeld et al. 2014). Although encouraging results have been obtained (Knyazikhin et al. 2013), phenotyping for NUE is still tedious as there are currently no high-throughput methods available (Manske et al. 2001; Tester and Langridge 2010). Moreover, G × N interactions have been observed for various agronomic traits (e.g. Le Gouis et al. 2000; Barraclough et al. 2010; Cormier et al. 2013), meaning that varieties may have to be tested in several N regimes. Thus, in a global context of fertiliser reduction, the ability to identify stable quantitative trait loci (QTL) controlling NUE-related traits and to implement this knowledge in breeding programs may determine part of the future genetic gain. Various studies have already identified interesting QTL linked to N metabolism and response to N using biparental populations (e.g. An et al. 2006; Laperche et al. 2007; Habash et al. 2007; Guo et al. 2012; Xu et al. 2013). Originally developed in animal and human genetics, genome-wide association study (GWAS) is now used in numerous studies in crop species. Although GWAS has provided useful results in dissecting complex traits in wheat such as yield and its components (e.g. Crossa et al. 2007; Neumann et al. 2011) and yield response to nitrogen (Bordes et al. 2013), to our knowledge, this study is the first GWAS on NUE and NUE-related traits in small grain cereals.

GWAS overcomes the two main limitations of biparental designs: limited allelic diversity and poor mapping resolution due to the limited number of recombination events during the creation of the population (Korte and Farlow 2013). However, the use of linkage disequilibrium (LD) to identify marker-trait associations at the whole-genome level also has specific limitations. False positive associations (Type I error) can easily arise from population structure. In addition, although the accumulation of recombination allows high-resolution mapping, it also decreases LD between the causal mutation and markers, which in turn decreases the power of detection for a given number of markers. To deal with these trade-offs, independent markers can be used to assess the relative kinship in the panel, and this information is then used to control Type I error. The power issue can be addressed by increasing the number of markers, which is now possible at relatively low cost with wheat single nucleotide polymorphism (SNP) chips (Wang et al. 2014).

In GWAS, results are mostly shown using simple Manhattan plots and there is no widely adopted method to define associated chromosomal regions. Indeed, in a panel, the link between linkage disequilibrium and genetic or physical distance is much more complex than in a biparental population, where methods such as the one-LOD support interval or bootstrapping are commonly used to assess QTL confidence intervals (e.g. Lander and Botstein 1989; Mangin et al. 1994; Visscher et al. 1996). Moreover, in strong LD regions, pairwise correlation between significant markers can approach the genotyping accuracy rate. Thus, even with methods such as stepwise logistic regression to test whether a marker in a given set is necessary or sufficient to explain the association signals, finding the one likely to be closest to the causal mutation is nearly impossible (McCarthy and Hirschhorn 2008). In addition, in high LD regions, the tested marker is correlated with many other SNP that can contribute to the estimation of the kinship, which reduces the power of detection (Rincent et al. 2014). Thus, the most significant quantitative trait nucleotide (QTN) may not be the closest to the causal mutation. In low LD regions, it is possible that only one SNP is significant, and there is no simple way to define a region from the relationship of P value (P) with genetic/physical distance. In any case, the P value depends on the QTL effect, which biases the P value support method of constructing a “confidence interval” (Mangin et al. 1994). Thus, authors often fix a more or less arbitrary window around QTN peaks based on mean LD decay, for example, 1 Mb in maize for Tian et al. (2011), 200 kb in rice for Zhao et al. (2011), or 5 cM in wheat for Le Gouis et al. (2012). The method chosen to define an associated chromosomal region influences GWAS reliability, and this issue remains under-investigated.

Using 214 European elite varieties, 28 NUE-related traits, and 23,603 SNP, this study aimed to (1) estimate the power of such an elite panel to perform GWAS with respect to the method used to define associated chromosomal regions and false positive rate, (2) identify stable chromosomal regions involved in NUE-related traits and assess their transferability to the field, and (3) analyse colocalisations for NUE components and NUE-related traits to estimate pleiotropic effects associated with QTL-based selection.

Materials and methods

Phenotypic data

Phenotypic data are described in Cormier et al. (2013). Briefly, 225 European elite varieties were evaluated in eight environments defined as a combination of year, site, and nitrogen supply (two seasons, three sites, and two nitrogen supplies). The high N treatment corresponded to common agricultural practices. The low N treatment corresponded to a mean yield reduction of 20 % (Suppl data 1). Other crop inputs, including weed, disease and pest control, and potassium, phosphate and sulphur fertilisers, were applied at sufficient levels to prevent them from limiting yield. Plant growth regulators were applied to limit lodging in all environments. In each environment, 28 traits were measured or calculated (Table 1). From adjusted means by trial, overall adjusted means by variety were computed using a simple linear model with environment and genotype as fixed effects. These values were used in the GWAS. Generalised broad-sense heritabilities (\(H_{G}^{2}\)) were calculated using the formula proposed by Cullis et al. (2006) from the previous linear model with genotype as a random effect.

Table 1 Description of measured and calculated traits assessed in all environments for which adjusted means by varieties were calculated on a 214-lines wheat association panel

Genotyping and consensus map

Of the 225 varieties present in field trials, 214 were genotyped. SNP data consisted of a subset of SNP from an Illumina 90 K chip (Wang et al. 2014) together with SNP developed by Biogemma. Heterozygous loci were considered as missing data. Loci with a minor allele frequency below 0.05 or with available data for fewer than 150 varieties were not used. In total, we used 23,603 mapped SNP in this study.
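The filtering rules above can be illustrated with a minimal sketch. The genotype coding (0 and 2 for the two homozygotes, 1 for a heterozygote, None for a missing call) and the function name `filter_snps` are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def filter_snps(genotypes, min_maf=0.05, min_called=150):
    """Return the indices of SNP that pass the call-count and MAF filters.

    genotypes: one list per SNP, with one call per variety.
    Hypothetical coding: 0 or 2 = homozygotes, 1 = heterozygote, None = missing.
    """
    kept = []
    for idx, row in enumerate(genotypes):
        # Heterozygous calls are treated as missing data.
        calls = [g for g in row if g in (0, 2)]
        # Discard loci with data for fewer than min_called varieties.
        if len(calls) < min_called:
            continue
        freq = sum(1 for g in calls if g == 2) / len(calls)
        maf = min(freq, 1.0 - freq)
        # Discard loci with a minor allele frequency below the threshold.
        if maf < min_maf:
            continue
        kept.append(idx)
    return kept
```

With the thresholds stated in the text, the call would be `filter_snps(genotypes, min_maf=0.05, min_called=150)`.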

We built a consensus map with the Biomercator software (Arcade et al. 2004). We used the map published by Le Gouis et al. (2012), based on Somers et al. (2004), as a reference. This map contains SSR and DArT markers, and the location of several major genes (Vrn, Ppd, Rht). SNP were projected onto it from unpublished maps containing 535 markers in common with this reference map. The Strudel software was used to check map alignments (Bayer et al. 2011) and mapping errors were corrected.

Linkage disequilibrium

We used the \(r^{2}\) estimator (Hill and Robertson 1968) to assess linkage disequilibrium (LD). LD was calculated for every pair of markers mapped on the same chromosome, and then \(r^{2}\) was plotted against map distance. For every chromosome, LD decay (cM) was estimated at the point where a curvilinear function proposed by Hill and Weir (1998) intersects the threshold of the critical LD. Critical LD was the 95th percentile of the unlinked-\(r^{2}\) assessed on 100,000 randomly chosen pairs of unlinked loci (mapped on different chromosomes), which were square root transformed to approximate a normally distributed random variable (Breseghello and Sorrells 2006).
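As an illustration of the two quantities described above, the following sketch computes \(r^{2}\) between two biallelic loci and a critical LD value. It assumes alleles coded 0/1 in inbred lines and assumes that the 95th percentile is taken parametrically (from a normal distribution fitted to the square-root-transformed values, then squared); the function names are hypothetical and the fit of the Hill and Weir (1998) decay curve is not covered.

```python
import math
from statistics import NormalDist, mean, stdev


def r_squared(locus_a, locus_b):
    """Squared allelic correlation between two loci coded 0/1 (None = missing)."""
    pairs = [(x, y) for x, y in zip(locus_a, locus_b)
             if x is not None and y is not None]
    n = len(pairs)
    p_a = sum(x for x, _ in pairs) / n
    p_b = sum(y for _, y in pairs) / n
    p_ab = sum(1 for x, y in pairs if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    d = p_ab - p_a * p_b  # disequilibrium coefficient
    return d * d / denom


def critical_ld(unlinked_r2, quantile=0.95):
    """Critical LD from r2 values of unlinked pairs (assumed parametric percentile)."""
    roots = [math.sqrt(v) for v in unlinked_r2]
    fitted = NormalDist(mean(roots), stdev(roots))
    # Back-transform the percentile to the r2 scale.
    return fitted.inv_cdf(quantile) ** 2
```

In practice `unlinked_r2` would hold the values for the randomly drawn pairs of loci mapped on different chromosomes.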

Association mapping study

Following Patterson et al. (2006), we did not find any structure in this 214-varieties panel. Indeed, the largest eigenvalue was not significant (P = 0.043). Thus, we tested SNP-trait association using a mixed model K (Yu et al. 2006) written in R using the ASReml-R package (Butler et al. 2009) and expressed as:

$$y = 1\mu + S\alpha + Zu + \varepsilon$$

where y is a vector of estimated genetic values, 1 is a vector of 1’s, μ is the intercept, α is the additive effect of the tested SNP, u is a vector of random polygenic effects assumed to be normally distributed \(N(0,\sigma_{y}^{2} K)\) with K a matrix of relative kinship, S and Z are incidence matrices, ε is a vector of residual effects.

K was estimated as \(1_{(n \times n)} - R_{dist}\), where \(R_{dist}\) is the modified Rogers’ distance (Rogers 1972) matrix based on 3,461 SNP spread over the genome and with less than 0.1 missing data, and \(1_{(n \times n)}\) is a matrix of 1’s of the same size as the \(R_{dist}\) matrix (n = 214).
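A minimal sketch of this kinship estimate is given below. It assumes fully homozygous lines with alleles coded 0/1, in which case the modified Rogers' distance between two lines reduces to the square root of the mean squared allele difference over shared loci; the function name `rogers_kinship` is hypothetical.

```python
import math


def rogers_kinship(genotypes):
    """Kinship as 1 minus the modified Rogers' distance between lines.

    genotypes: one list per variety, alleles coded 0/1, None if missing.
    Assumes biallelic SNP in homozygous lines.
    """
    n = len(genotypes)
    kinship = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            squared, loci = 0.0, 0
            for x, y in zip(genotypes[i], genotypes[j]):
                if x is None or y is None:
                    continue  # use only loci called in both lines
                squared += (x - y) ** 2
                loci += 1
            if loci == 0:
                raise ValueError("no shared loci between two lines")
            distance = math.sqrt(squared / loci)
            kinship[i][j] = kinship[j][i] = 1.0 - distance
    return kinship
```

Identical lines obtain a kinship of 1 and lines differing at every locus obtain 0.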

To summarise, we tested 23,603 SNP on 28 traits using the adjusted means of 214 European elite varieties. As there is no widely adopted method to define QTL boundaries from GWAS results, we proceeded as follows. First, for each trait, we computed LD between every pair of significantly associated SNP (quantitative trait nucleotides, QTN). LD blocks were defined as groups of QTN belonging to the same LD cluster (clustering by average distance) using a cutoff of (1 − critical LD). We defined the initial QTL boundaries as the minimum and maximum map positions of the QTN belonging to the same LD block. Then, as previously described, we assessed LD between every pair of mapped SNP within a window covering 10 % of the chromosome length and centred on each QTL, and used the resulting LD decay to extend the initial boundaries. This second step aimed to take into account possible LD with the causal mutation at the initial QTL boundaries (for details, see Suppl data 2). We only defined QTL for LD blocks containing SNP mapped on the same chromosome. For each trait, QTL with overlapping boundaries were considered the same if the alleles increasing the trait value at each were themselves positively correlated.
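The first step of this procedure can be sketched as follows, under stated assumptions: the distance between two QTN is taken as \(1 - r^{2}\), clusters are merged by average linkage until the smallest average distance exceeds (1 − critical LD), and all QTN passed in are on the same chromosome. The function names are hypothetical, and the second step (extension of the boundaries by the local LD decay) is not shown.

```python
def ld_blocks(r2, critical_ld):
    """Group QTN into LD blocks by average-linkage clustering on 1 - r2.

    r2: symmetric matrix (list of lists) of pairwise r2 between QTN.
    """
    cutoff = 1.0 - critical_ld
    clusters = [[i] for i in range(len(r2))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all between-cluster pairs.
                dists = [1.0 - r2[i][j]
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        if best[0] > cutoff:
            break  # remaining clusters are not in LD with each other
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(block) for block in clusters]


def initial_boundaries(blocks, positions):
    """Minimum and maximum map position (cM) of the QTN in each LD block."""
    return [(min(positions[i] for i in block),
             max(positions[i] for i in block)) for block in blocks]
```

For example, with a critical LD of 0.23, two QTN with \(r^{2}\) = 0.9 fall in the same block, whereas two QTN with \(r^{2}\) = 0.05 do not.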

Phenotype simulation and power

The statistical power provided by the panel was evaluated through simulation studies where −log10(P) thresholds, narrow-sense heritability and variance explained by a SNP were the three modulated parameters. We set the −log10(P) threshold at 3, 4, 5, and 6; narrow-sense heritability (\(h^{2}\)) at 0.3, 0.6, and 0.9; and variance explained by the SNP (π) at 0.010, 0.030, 0.050, 0.075, 0.100, 0.150, and 0.200.

Phenotypes were simulated as follows:

$$y_{i} = g_{i} + a_{ij} + \varepsilon_{i}$$
(1)

where \(y_{i}\) is the simulated phenotype of variety i, \(g_{i}\) is the additive genetic background effect of variety i, \(a_{ij}\) is the additive effect of the allele carried by variety i at the quantitative trait nucleotide (QTN) j, and \(\varepsilon_{i}\) is a residual error term sampled from a normal distribution \(N(0, \sigma_{\varepsilon}^{2})\).

First, k = 100 SNP were chosen to simulate the genetic background effect. This selection was made by forming k-means clusters based on the genotyping incidence matrix and selecting the SNP nearest the centroid of each cluster (Lorenz et al. 2010). Thus, if \(g_{i}\) is the genetic background effect of variety i:

$$g_{i} = \sum\limits_{k = 1}^{100} a'_{ik} ,\quad a'_{ik} \in \{ 0, 1\}$$
(2)

with \(a'_{ik}\) the effect of the variety i allele at the locus k.

Narrow-sense heritability (\(h^{2}\)) is defined by:

$$h^{2} = \frac{{\sigma_{g}^{2} + \sigma_{j}^{2} }}{{\sigma_{T}^{2} }}$$
(3)

where \(\sigma_{j}^{2}\) is the genetic variance related to QTN j (different from the k background loci), \(\sigma_{g}^{2}\) is the variance related to the genetic background, and \(\sigma_{T}^{2}\) is the total variance.

The variance explained by QTN j (π) is defined by:

$$\pi = \frac{{\sigma_{j}^{2} }}{{\sigma_{T}^{2} }}$$
(4)

Total variance (\(\sigma_{T}^{2}\)) is deduced from Eqs. (3) and (4), as \(h^{2}\) and π are fixed in each simulation study:

$$\sigma_{T}^{2} = \frac{{\sigma_{g}^{2} }}{{h^{2} - \pi }}$$
(5)

Given the percentage of variance explained by QTN j (π), its additive effect (\(a_{j}\)) is calculated following Falconer and Mackay (1996) as:

$$a_{j} = \sqrt {\frac{{\pi \times \sigma_{T}^{2} }}{{p_{j} \left( {1 - p_{j} } \right)}}}$$
(6)

with \(p_{j}\) the allele frequency of the reference allele at locus j. Thus, if the allele of variety i at QTN j was the reference allele, \(a_{ij}\) from Eq. (1) was equal to \(a_{j}\); otherwise \(a_{ij}\) was equal to \(-a_{j}\).

Finally, the variance of the residual error term (\(\sigma_{\varepsilon}^{2}\)) was computed as:

$$\sigma_{\varepsilon}^{2} = \left( {1 - h^{2} } \right) \times \sigma_{T}^{2}$$
(7)
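Equations (1) and (5) to (7) can be chained as in the sketch below, which is only an illustration under stated assumptions: the background variance is supplied by the caller, alleles at the QTN are coded 1 for the reference allele and 0 otherwise, and the function names are hypothetical.

```python
import math
import random


def simulation_parameters(var_g, h2, pi, p_j):
    """Derive total variance, QTN effect and residual variance.

    var_g: variance of the genetic background effect.
    h2: narrow-sense heritability; pi: variance explained by the QTN.
    p_j: frequency of the reference allele at the QTN.
    """
    if not 0 < pi < h2 < 1:
        raise ValueError("expected 0 < pi < h2 < 1")
    var_t = var_g / (h2 - pi)                        # Eq. (5)
    a_j = math.sqrt(pi * var_t / (p_j * (1 - p_j)))  # Eq. (6)
    var_e = (1 - h2) * var_t                         # Eq. (7)
    return var_t, a_j, var_e


def simulate_phenotypes(background, qtn_alleles, a_j, var_e, rng):
    """Eq. (1): background effect + QTN effect + residual, per variety."""
    sd_e = math.sqrt(var_e)
    return [g + (a_j if allele == 1 else -a_j) + rng.gauss(0.0, sd_e)
            for g, allele in zip(background, qtn_alleles)]


# Toy usage with arbitrary values (not taken from the study).
var_t, a_j, var_e = simulation_parameters(var_g=11.0, h2=0.6, pi=0.05, p_j=0.5)
phenotypes = simulate_phenotypes([48.0, 52.0, 50.0], [1, 0, 1],
                                 a_j, var_e, random.Random(1))
```

Passing a seeded `random.Random` instance keeps a simulation run reproducible.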

In total, 400 SNP were randomly chosen to play in turn the role of the QTN j with j ≠ k (QTN ≠ genetic background effect) for each pair of \(h^{2}\) and π parameter values. The statistical model used to detect associations between SNP and simulated phenotypes was the previously described model K. In the same way, QTL were defined following the two steps already described. Detection power was estimated as the ratio of the number of times a true QTN was located in the computed QTL to the total number of tests. The SNP selected as being the true QTN j was not tested per se.

Prediction

The percentage of total variance explained by each significant SNP was first assessed for each trait using a simple regression of the overall adjusted mean on the SNP (\(r_{snp}^{2}\)). Then, for each trait, the predicted values of varieties were estimated by summing the allele effects assessed in GWAS at associated loci. To avoid redundancy, only one SNP per LD block was kept: the one that explained the most variance.

This model was first used to predict overall adjusted means. It was then used to predict adjusted means in each of the eight individual environments. Consequently, we computed two types of squared correlations (\(r^{2}\)): the correlation between predicted values and overall adjusted means (\(r_{adj}^{2}\)), and the correlation between predicted values and each of the eight individual environments (\(r_{env}^{2}\)).

To assess the transferability of GWAS results to field trials, we calculated a prediction similarity [mean(\(r_{env}^{2}\))/\(r_{adj}^{2}\)] that we plotted as a function of trait heritability.
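The prediction and the similarity ratio can be sketched as below. The coding is an assumption made for illustration: each selected SNP is coded 0/1 per variety and its GWAS effect is attached to allele 1; the intercept is omitted because it does not change a correlation. Function names are hypothetical.

```python
import math


def predict(genotypes, effects):
    """Sum the allele effects of the selected SNP for each variety."""
    return [sum(e * g for e, g in zip(effects, row)) for row in genotypes]


def squared_correlation(x, y):
    """Squared Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy / math.sqrt(sxx * syy)) ** 2


def prediction_similarity(predicted, overall_means, env_means):
    """mean(r2_env) / r2_adj, with env_means a list of per-environment lists."""
    r2_adj = squared_correlation(predicted, overall_means)
    r2_env = [squared_correlation(predicted, env) for env in env_means]
    return (sum(r2_env) / len(r2_env)) / r2_adj
```

A similarity close to 1 indicates that predictions fit individual environments about as well as they fit the overall adjusted means.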

Colocalisation and network approach

To assess the impact of genetic correlation and pleiotropy, we analysed colocalisations through a network approach. QTL colocalisation between two traits was statistically tested using the probability of a hypergeometric law (“sampling without replacement”; Larsen and Marx 1985) with the total cumulative length of QTL for trait i and trait j and the total map length as parameters of the hypergeometric distribution. The cumulative length of QTL shared by trait i and j was the parameter of the probability. A fairly stringent threshold of P = 0.001 was set as the criterion of significance.
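One possible reading of this test is sketched below. It assumes that lengths are rounded to whole map units (for example cM) so that they can be treated as counts, and that the reported probability is the upper tail P(X ≥ shared length); both points are assumptions, as is the function name.

```python
from math import comb


def colocalisation_p(map_length, length_i, length_j, shared):
    """Upper-tail hypergeometric probability of the observed QTL overlap.

    map_length: total map length (population size).
    length_i: cumulative QTL length for trait i (successes in the population).
    length_j: cumulative QTL length for trait j (number of draws).
    shared: cumulative length shared by the QTL of both traits.
    """
    total = comb(map_length, length_j)
    upper = min(length_i, length_j)
    tail = sum(comb(length_i, k) * comb(map_length - length_i, length_j - k)
               for k in range(shared, upper + 1))
    return tail / total
```

A pair of traits would then be declared significantly colocalised when the returned probability falls below 0.001.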

On the basis of significant colocalisations, inter-trait relationships were then studied through a network approach using traits as nodes and the percentage of QTL of one trait overlapping QTL of another trait as edges. Betweenness centrality was computed on each node following the method of Opsahl et al. (2010) with α = 0.5 to take equally into account the number of edges and the edge weights in the calculation. To statistically test the betweenness centrality values of traits, this network was then permuted 500 times to assess the empirical distribution of betweenness centrality, and thus determine the statistical law underlying this distribution.

Results

Genetic map and linkage disequilibrium

The consensus genetic map obtained had a total length of 3,167 cM. To finely map QTL, LD has to decay rapidly and SNP density has to be high to ensure that at least one SNP is linked to the causal mutation. While the diversity level is similar in the A and B genomes, it is greatly reduced in the D genome (Cadalen et al. 1997), contributing to its higher levels of LD. Indeed, mean LD decay on genomes A, B, and D was, respectively, 0.52, 0.70, and 2.14 cM. LD decay is the estimated distance beyond which two SNP are not genetically linked, meaning that their LD (\(r^{2}\)) is below the critical LD. Critical LD was estimated from a sample of 100,000 pairs of unlinked SNP, which revealed a mean unlinked-\(r^{2}\) of 0.016 and a critical LD (95th percentile) of 0.23.

A rapid LD decay predicts a good mapping resolution in GWAS. However, as previously mentioned, it can decrease power if SNP density is not sufficient. SNP density ranged from 0.7 SNP per cM for chromosome 4D to 14.6 SNP per cM for chromosome 7A (Table 2). On genomes A and B, SNP density seemed sufficient with respect to LD decay. On genome D, the lower SNP density may be compensated for by the higher LD, but QTL will be less precisely defined.

Table 2 SNP used in association: number of mapped SNP, coverage on the consensus map, SNP density and LD decay at a critical LD \(r^{2}\) = 0.23. Critical LD was assessed as in Breseghello and Sorrells (2006)

Power assessment

Choosing a P value threshold requires balancing the control of Type I error (false positive) against Type II error (false negative). Considering the power simulations and the expectation of small-effect QTN, a −log10(P) threshold of 3 was adopted as the criterion for significant marker-trait associations. Indeed, a more stringent threshold inflated Type II error and thus severely reduced the power of detection, notably for QTN explaining less than 10 % of the variance (Fig. 1). At a QTN heritability of 5 % and a narrow-sense heritability of 0.6, power dropped from 55 % to 7 % when the −log10(P) threshold increased from 3 to 6 (Fig. 1).

Fig. 1

Influence of trait heritability and −log10(P value) threshold on the relation between locus heritability and power of detection in a 214-line wheat association panel. Red, green, blue, and violet correspond to −log10(P) thresholds of 3, 4, 5, and 6, respectively. Squares, triangles, and circles represent narrow-sense heritabilities of 0.9, 0.6, and 0.3, respectively

At a −log10(P) threshold of 3, when the genetic variance explained by the locus was greater than 10 %, trait heritability did not affect power, and Type II error was reduced. In general, the variance explained by the QTN influenced the power of the study more than trait narrow-sense heritability did. It should be noted that with a weakly stringent threshold of 3, the power to detect an association for a QTN explaining 5 % of the total genetic variance was 48, 55, and 60 % for a trait narrow-sense heritability of 0.3, 0.6, and 0.9, respectively.

GWAS results

Overall, 1,010 SNP were significantly associated (QTN) with at least one of the 28 studied traits. Considering QTN, LD blocks and LD around associated regions, 333 QTL were mapped with a mean size of 3.2 cM. Ninety percent (between the 5th and 95th percentile) of QTL had a size within 0.1–14 cM, indicating that the method used to define QTL is mostly efficient. In a few cases, the assessment of LD decay in the chromosomal region containing QTN may not fit correctly, and QTL boundaries must be used with caution.

In agreement with SNP density and the genetic diversity, the number of QTL on genome D (42) was smaller than on genome A (142) and B (149). Homeologous group 2 maximised the number of QTL with 73 QTL. The number of QTL by trait ranged from 6 for NutE to 21 for %N_S (Table 3).

Table 3 QTL detected on a wheat association panel for 28 traits. QTL boundaries were defined as the minimum and maximum genetic position of QTN belonging to the same LD block (for LD blocks containing SNP mapped on the same chromosome) extended by the LD decay assessed on a window covering 10 % of the chromosome length centred on the mean genetic position. See Table 1 for trait abbreviations

Predictions

First, we assessed the variance explained by each significant SNP (QTN). Then, we predicted overall adjusted means and each of the eight environments’ adjusted means. On average, QTN explained 8.81 ± 4.79 % of the overall adjusted means (\(r_{snp}^{2}\)). On overall adjusted means, the best prediction (\(r_{adj}^{2}\)) was made on HI (Table 4). Using 20 SNP, we were able to explain 61.4 % of the genetic variation. Using 15 SNP on NUE, we were able to explain 55.7 % of the overall adjusted mean variation (Fig. 2) and 29.7 ± 4.9 % of the individual environment’s variation (Table 4). On the environments’ data (\(r_{env}^{2}\)), flowering date was the best predicted trait with 55.3 % of the variation explained on average.

Table 4 Summary of GWAS results: predictions made by SNP (\(r_{snp}^{2}\)) and using the sum of SNP effects on both overall adjusted means (\(r_{adj}^{2}\)) and on eight individual environments (\(r_{env}^{2}\)). To avoid redundancy, for each LD block, the SNP which maximised the genetic variance explained was selected

Fig. 2

Prediction of NUE values as a function of overall adjusted mean for 214 wheat lines. Predictions were made by summing the effects of 15 significantly associated SNP. The following regression function is also plotted: y = 0.86x + 2.66 (\(r^{2}\) = 0.56; P < 0.001)

Differences between predictions made on overall adjusted means (\(r_{adj}^{2}\)) and predictions on individual environment values (\(r_{env}^{2}\)) resulted from genotype × environment interactions and were thus linked to trait broad-sense heritability. In fact, the transferability of our GWAS results to environmental values was an exponential function of trait broad-sense heritability (Fig. 3). This means that GWAS results rapidly lost their ability to predict phenotypic values as broad-sense heritability decreased.

Fig. 3

Prediction similarity (\(r_{env}^{2}/r_{adj}^{2}\)) between predictions made on overall adjusted means (\(r_{adj}^{2}\)) and those made on individual environment values (\(r_{env}^{2}\)) as a function of generalised heritability (\(H_{G}^{2}\)) of 28 traits. Means (diamond), standard deviations (whisker). Mean(\(r_{env}^{2}/r_{adj}^{2}\)) = −0.39\({\text{e}}^{H_{G}^{2}}\) (\(r^{2}\) = 0.88; P < 0.001)

Colocalisation network

Altogether, the QTL covered 20 % (646/3,167) of the genetic map. There were colocalisations for 39 % of the QTL identified. Major regions of colocalisation were on chromosomes 1B, 2B, and 7A (Suppl data 3). Considering NUE and its two components, N uptake and N utilisation, there was no common QTL between NupEMat and NUE, but two NutE QTL (out of six) colocalised with NUE QTL and acted in the same way on both traits. NUE QTL (9/14) which colocalised with NutE_Prot QTL had opposite effect on these traits. By comparing QTL for the N uptake efficiency at flowering time (NupEFlo) and at maturity (NupEMat), we found that only one QTL was in common between these two traits.

Figure 4 provides a visual representation of the frequencies of QTL colocalisations. Using a permutation procedure with 500 permutations, we found that the empirical betweenness centrality followed a gamma distribution (shape = 2.169, rate = 0.079; Suppl data 6). This distribution was used to test trait betweenness centrality. Four traits had a significantly (P < 0.05) high betweenness centrality: INN_FLO, FLO, NutE, and %N_Flo, ordered from the most significant to the least significant. It should be noted that INN_FLO, %N_S, and FLO were not independent, as we detected four chromosomal regions of colocalisation between these three traits. Two of them affected the three traits in the same way. The other two acted in opposite directions on FLO and the two other traits. All QTL common to %N_Flo and INN_FLO affected both traits in the same way.

Fig. 4

Network of QTL colocalisations for 28 traits measured on a 214-line wheat association panel. This network is based on the percentage of common QTL between traits after correction using a hypergeometric law to determine significant colocalisations (P < 0.001). Link thickness is a function of the percentage of common QTL, from 5 % for the thinnest to 100 % for the thickest (values in Suppl data 5)

Discussion

QTL definition and power

In most studies, authors fix a window around QTN peaks, often based on linkage disequilibrium, to define associated chromosomal regions in GWAS. However, massive variation of LD exists along the chromosomes in wheat (Würschum et al. 2013). In this study, we proposed a method based on LD between QTN and LD within the chromosomal region of interest, and assessed its power of detection. This method has the advantage of being based on LD decay in the chromosomal region of interest. Moreover, authors focus on P value methods (ad hoc and post hoc) to control the false positive rate, although the way they define their associated regions also influences it. Indeed, linkage disequilibrium between causal mutations and associated SNP, or mapping errors, can lead to the construction of a chromosomal region which does not contain the causal mutation even though the SNP-trait association was real.

Based on the power simulations and Type II error, we chose a −log10(P) threshold of 3 to validate SNP-trait associations. Our real false positive rate (Type I error) was not influenced by this −log10(P) threshold alone. Indeed, the real Type I error should include all QTL that did not contain the causal mutation, whether the SNP-trait association was real or not. Using the results of the power simulation studies, we estimated our real false positive rate at 7 % (for a QTN heritability between 5 and 10 %; Suppl data 2). If we had chosen a −log10(P) threshold of 6, it would have been 3 %. Thus, increasing the −log10(P) threshold reduced the real Type I error for small-effect QTN yet drastically decreased power (Fig. 1). Moreover, for QTN with a heritability >10 %, a −log10(P) threshold greater than 3 slightly increased the real Type I error due to smaller QTL (Suppl data 2).

In GWAS, the real issue in controlling Type I error does not lie in the definition of a stringent P value threshold. It lies in the development of a powerful method to define QTL boundaries, particularly in the case of GWAS oriented to gene discovery. This field has practically never been investigated and publications mainly focus on the P value. We advocate balancing QTL coverage, real Type I error, and power together. An improvement of our method could be to adapt the construction of the associated region to QTN heritability.

Power, locus heritability, and genetic determinism

The fraction of total genetic variance explained by a single significantly associated SNP (QTN) averaged 8.81 ± 4.79 %, which is consistent with the simulation study. Indeed, power started to be maximised from a locus heritability of 10 % (at a −log10(P) threshold of 3, Fig. 1). Yet variability existed, and the fraction of total genetic variance ranged from GPC (14.0 ± 8.7 %) to NHI (5.3 ± 2.9 %).

When numerous QTN each explain a small fraction of genetic variance, we can presume that the GWAS was powerful and that the genetic determinism underlying the trait is highly polygenic. When QTN have larger locus heritability, the cause can be a less polygenic genetic determinism and/or a lack of power due to low narrow-sense heritability. Narrow-sense heritability estimates the proportion of additive variance in the total variance (Falconer and Mackay 1996). Thus, narrow-sense heritability is also linked to the importance of epistasis in the trait genetic architecture. In this study, we have not searched for epistasis. However, several studies have highlighted its impact. For example, GPC is controlled by major protein concentration genes (Payne 1987; Uauy et al. 2006; Avni et al. 2013) and significant interactions between them (Dumur et al. 2004; Conti et al. 2011; Plessis et al. 2013). Similarly, the epistatic contribution to the genetic control of PH is important, as revealed by Novoselovic et al. (2004), Zhang et al. (2008), and Wu et al. (2010). Using a doubled haploid wheat population, Zhang et al. (2008) estimated the first-order epistatic contribution at up to 19.9 % of the PH phenotypic variation.

Authors have often focused on epistatic interactions between SNP having a significant additive effect. However, epistatic interactions between SNP without additive effect can also explain genetic variability (Huang et al. 2014), as detected for heading date (Le Gouis et al. 2012). Nonetheless, a whole-genome scan for epistasis remains a real computational and analytic challenge, although one that should help pathway mining (Philipps 2008; Mackay 2014).

Candidate genes and comparison with previously published QTL

Altogether, we detected 333 QTL on 28 traits. Significant colocalisations (overlapping QTL boundaries) between some of them and candidate genes or previously published QTL deserve to be pointed out. Regarding major genes for precocity, only the photoperiod sensitivity gene Ppd-D1 on chromosome 2D colocalised with QTL of FLO, HI, INN_FLO, %N_FLO, and %N_S, affecting all these traits in the same way (late genotypes have higher HI, INN_FLO, %N_FLO, and %N_S). Ppd-D1 also colocalised with an ADM_S QTL, with an opposite effect. Two factors can explain why Vrn genes were not associated with precocity: (1) this panel contains only winter wheat varieties and (2) trials were sown only in autumn, so vernalization requirements were fulfilled.

On chromosome 4D, the dwarfing gene Rht-D1 (Rht2) was tested and had an expected significant effect on PH and ADM_S.

Similarly, the three closely mapped genes coding the glutenins and gliadins (Glu3A, Glu3B, and Gli) not surprisingly colocalised with QTL of NUE and NutE_Prot located on chromosome 1A. Moreover, the structural gene for high molecular weight glutenins GluD1 located on chromosome 1D lay within the boundaries of QTL affecting GNY, NTA, and NupEMat.

Several genes from the N assimilation pathway have already been associated with NUE QTL, including the genes coding for glutamate synthase (NADH-Gogat) located in QTL on chromosomes 3A and 3B (Quraishi et al. 2011). On chromosome 3A, this gene colocalised with QTL of NFA, NupEFlo, and %N_S. On chromosome 3B, the NADH-Gogat gene colocalised with QTL of NUE_Prot, GPC, and ABSN. The gene for glutamine synthetase GS1 on 6A (Habash et al. 2007) colocalised with a cluster of QTL for EFFREMN, GPD, NutE_Prot, DMGY, and %N_S. Several publications have already mentioned this region as affecting grain number per ear (Habash et al. 2007; Quarrie et al. 2005), NupEMat (An et al. 2006; Xu et al. 2013), root dry weight (An et al. 2006), and %N_S and DMGY (Xu et al. 2013).

On chromosome 4B, a QTL of %N_S colocalised with numerous previously published QTL for nitrogen efficiency-related traits (An et al. 2006; Guo et al. 2012), glutamate dehydrogenase and glutamine synthase activity (Fontaine et al. 2009), harvest index (Xu et al. 2013), ear-, spike-, and grain-related traits (Quarrie et al. 2005; Habash et al. 2007; Laperche et al. 2007; Fontaine et al. 2009), and root morphology (Laperche et al. 2006). Previously published results were in part due to the presence of Rht-B1 (Rht1) in this chromosomal region. In our case, a diagnostic marker for Rht-B1 was tested and no significant effect was detected for any trait, most probably because of the unbalanced frequencies of the allelic combinations of Rht-B1 and Rht-D1 (0.05, 0.65, 0.18, and 0.12 for the four allelic classes Rht-B1b/Rht-D1b, Rht-B1b/Rht-D1a, Rht-B1a/Rht-D1b, and Rht-B1a/Rht-D1a). The glutamine synthetase gene GSe (Habash et al. 2007), mapped using the SSR gpw7026 (Sourdille et al. 2004; Fontaine et al. 2009), was also within this QTL confidence interval and may be a good candidate gene to investigate.

On chromosome 2A, the Rbcs (Xpsr109) gene for the small subunit of the chloroplast photosynthetic enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) was located in a %N_S QTL. This gene has already been shown to colocalise with a QTL for N grain concentration (Laperche et al. 2006) and with a meta-QTL from an analysis of yield and yield-related traits (Zhang et al. 2010). Considering the small size of this QTL in this study (1.6 cM), and the link between N remobilisation and Rubisco subunit expression and degradation (Hörtensteiner and Feller 2002; Gregersen et al. 2008), Rbcs should be considered a good candidate gene.

Further investigations are needed on two promising regions where no obvious candidate genes were found within QTL boundaries. On chromosome 5B (gwm67-BCD351), a region linked to INN_FLO colocalised with QTL previously published by Fontaine et al. (2009) for carbon percentage in flag leaf, and by Habash et al. (2007) for nitrogen percentage in peduncle. As the nitrogen nutrition index (INN) refers to the minimum N concentration enabling maximum biomass growth (Justes et al. 1994), this confirms the effect of this region on nitrogen/carbon balance before remobilisation. On chromosome 7B (wPt-3530-wPt-7113), Laperche et al. (2007) published a QTL of %N_S which colocalised with a QTL from this study affecting the same trait. This region also appeared in Laperche et al. (2006) as being linked to lateral root number and primary root length, and in Habash et al. (2007) for GNC.

Breeding strategies

As we worked on a panel composed of commercial varieties mostly registered between 1985 and 2010, the results of this study have to be discussed in light of selection pressures. Although QTL have been detected, they are of limited use for future breeding if their favourable alleles are already fixed in the more recent varieties.

As expected, favourable alleles are more frequent in recent varieties for QTL affecting traits under a high selection pressure than for QTL affecting untargeted traits. We estimated a positive correlation (P < 0.001; r² = 0.48) between the frequencies of alleles having a positive effect (in varieties released from 2005) and the genetic progress assessed by Cormier et al. (2013). Cormier et al. (2013) showed that in this panel of European elite varieties, NUE was increased by improving N utilisation (NutE: +0.20 % year⁻¹) and remobilisation (NHI: +0.12 % year⁻¹; %N_S: −0.52 % year⁻¹) through a major positive selection pressure on grain yield (DMGY: +0.45 % year⁻¹), while maintaining constant N uptake. In agreement, we found that for DMGY QTL, NutE QTL, and %N_S QTL, the median frequency of favourable alleles (in varieties released from 2005) was, respectively, 88, 68, and 79 % (Suppl data 7). Moreover, for a given trait, the frequency of alleles having a positive effect in recent varieties is directly linked to the genetic correlation between this trait and DMGY (P < 0.001; r² = 0.49; Suppl data 7). Thus, favourable alleles are already well represented in new varieties at QTL associated with traits directly (e.g. DMGY) or indirectly (e.g. NutE) targeted by breeding. This study has provided information to facilitate their monitoring.

Studying correlations between traits using QTL colocalisations rather than genetic correlations has the advantage of taking into account trait genetic architecture and the power with which we can dissect them. Moreover, it gives a better estimation of the pleiotropic effect of QTL-based selection on a trait. Indeed, the genetic correlation is symmetric (r(a/b) = r(b/a)), contrary to the percentage of QTL colocalising between two traits. For example, based on our detection, selection on GPC QTL will surely affect NUE_Prot as all GPC QTL are also NUE_Prot QTL. However, only 73 % of QTL for GPC would be affected by selection on NUE_Prot QTL.
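The asymmetry can be illustrated with a minimal Python sketch in which each trait is represented by the set of chromosomal regions carrying one of its QTL. The region identifiers and counts below are hypothetical and chosen only for illustration; they are not data from this study, and `coloc_pct` is an assumed helper name rather than the method used to build the colocalisation network.

```python
def coloc_pct(qtl_a, qtl_b):
    """Percentage of trait A's QTL regions that are also QTL regions for trait B."""
    if not qtl_a:
        raise ValueError("trait A has no QTL")
    return 100.0 * len(qtl_a & qtl_b) / len(qtl_a)


# Hypothetical region identifiers, for illustration only (not data from this study).
trait_a = {"r1", "r2", "r3"}
trait_b = {"r1", "r2", "r3", "r4"}

a_to_b = coloc_pct(trait_a, trait_b)  # every QTL of trait A is shared with trait B
b_to_a = coloc_pct(trait_b, trait_a)  # only three of the four QTL of trait B are shared
print(a_to_b, b_to_a)
```

Because the denominator is the number of QTL of the trait under selection, the two directions differ whenever the traits have different numbers of QTL, whereas a correlation coefficient between the same two traits is a single value.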

Results of colocalisation analyses revealed that we should select on INN_FLO, FLO, NutE, and %N_Flo QTL to maximise the number of affected traits. As 57 % (4/7) of INN_FLO QTL and 50 % (4/8) of %N_Flo QTL were also FLO QTL, the effects of phenology and pre-anthesis uptake are confounded. Thus, QTL controlling flowering time should be our first concern. Anthesis corresponds to a physiological transition and, consequently, the date of this transition has a major impact on genotype × environment (G × E) interaction (Kamran et al. 2014). In this study, we observed an average genotypic flowering time standard deviation of 7 days. As varieties were tested in a small range of slightly contrasted environments, anthesis date directly affected G × E interaction and, above all, varieties’ genetic values, favouring genotypes adapted to these environments. This created a confounding effect of major phenology genes (Reynolds et al. 2009), which are more likely to be associated with agronomic traits.

None of the central traits (INN_FLO, FLO, NutE, and %N_Flo; Fig. 4) was linked to final N uptake. As mentioned before, recent breeding efforts improved N remobilisation and N utilisation, and not N uptake (Cormier et al. 2013). Thus, selection pressure enhanced the centrality of N utilisation in our network (Fig. 4). In this panel, the low genetic variance of N uptake was not sufficient to reveal meaningful correlations with other agronomic traits and thus significant QTL colocalisations. Nevertheless, as a component of NUE, N uptake is a promising lever of action (Hirel et al. 2007; Foulkes et al. 2009). This study has provided tools to start selecting for N uptake in elite varieties without tedious phenotyping, and these can be used as an entry point for investigating genes and pathways controlling this trait (Korte and Farlow 2013), with further investigations in a more diverse panel.

Results on QTL colocalisations highlighted the importance of focusing on pre-anthesis nitrogen status, especially on INN_FLO, which had a good heritability (0.63) and for which QTL also have the same effect on TKW and NUE_Prot.

Conclusions

Identification of chromosomal regions associated with nitrogen use efficiency-related traits at both high and moderate N levels will help breeding for better adapted varieties. To our knowledge, this work is the first published study that reports GWAS results on N use efficiency in small grain cereals using a high marker density for precise mapping of genomic regions. Using an LD-based method to define QTL boundaries, 333 QTL were identified for 28 traits. Several colocalisations between our QTL and previously published QTL were pointed out. Using a network approach on colocalisation frequencies between traits, this study highlighted the interest of working on N status at flowering and underscored the effect of recent breeding on N utilisation efficiency.

Author contributions

FC conducted the statistical analyses and wrote the manuscript during his PhD thesis, co-directed by SP and JLG. PD provided useful help on power assessment methods and interpretations. JLG, SL, and SP were involved in methods, interpretations, and in reviewing the manuscript.