Introduction

Grain protein content (GPC) is the most important quality trait in bread wheat (Triticum aestivum L.) and durum wheat (Triticum turgidum L. ssp. durum), determining the nutritional value and baking quality of flour as well as the pasta-making quality of semolina. Compared with grain legumes, mature wheat grain contains a low amount of storage proteins, generally ranging from 10 to 20%. During the past three decades, increases in GPC have mainly been achieved by intensifying nitrogen (N) fertilization. Recently, however, the high cost of N fertilizers and the detrimental environmental impact of nitrate losses from the soil have stimulated interest in reducing the amount of N applied to cereal cropping systems while maintaining the high productivity of modern cultivars. The development of new wheat varieties with high GPC has, therefore, become a breeding priority.

The possibility of improving GPC and yield should be considered in light of major recent advances in identifying genes that may play a key role in N uptake from the soil, and genes encoding enzymes of amino acid metabolism potentially involved in transferring N to grain proteins. Nitrogen metabolism is a complex pathway involving several processes, such as nitrate uptake, reduction, assimilation into amino acids, and translocation throughout the plant. A number of genes and transcription factors are involved in these steps (reviewed by Balyan et al. 2016). Nitrate transporters play a central role in nitrate uptake. Members of the NRT2 (nitrate transporter 2) family act as high-affinity nitrate transporters in roots, and their interplay enables efficient use of soil nitrate when it is poorly available; they have also been implicated in nitrate-regulated root development in Arabidopsis (Garnett et al. 2009). Nitrate taken up by the roots is first reduced by cytosolic nitrate reductase (NR) to nitrite, which is then imported into the chloroplast and further reduced by nitrite reductase (NIR) to ammonium. Ammonium is then incorporated into organic molecules through the glutamine synthetase (GS) and glutamate synthase (GOGAT) pathway (Lea and Azevedo 2007). Several studies of these genes have reported associations with GPC QTL in durum wheat (Gadaleta et al. 2011, 2014; Nigro et al. 2013, 2014, 2016).

In addition to these central enzymes, others have been implicated in N utilization, such as glutamate dehydrogenase, alanine aminotransferase, and asparagine synthetase, all involved in amino acid metabolism. Glutamate dehydrogenase (GDH) catalyzes the reversible interconversion of glutamate and 2-oxoglutarate (Habash et al. 2007), while alanine aminotransferase (AlaAT) is involved in the synthesis and degradation of alanine (Shrawat et al. 2008). Asparagine synthetase (ASN) catalyzes the synthesis of asparagine (Asn) and glutamate from glutamine (Gln) and aspartate. Several ASN genes have been identified in durum wheat, and ASN1 is up-regulated in response to deficiency of certain mineral nutrients (Gao et al. 2016).

Besides the genes and enzymes described above, which have a direct role in nitrogen metabolism, others act as regulatory and signaling factors under specific conditions (Xu et al. 2012). NIN-like protein 7 (NLP7) is a transcription factor that regulates nitrate assimilation in higher plants and is a key player in plant nitrate signaling (Castaings et al. 2009). Genes related to nitrogen remobilization may also have an important role in N content and final yield. Over-expression of the cytosolic pyruvate orthophosphate dikinase (PPDK) gene was found to increase seed weight in Arabidopsis (Taylor et al. 2010): it accelerates N remobilization from leaves, thereby increasing rosette growth rate, seed weight, and N content. All of these genes and enzyme activities are therefore potential targets for improving crop nitrogen use efficiency (NUE).

Grain yield (GY) and GPC are complex traits determined by multiple quantitative trait loci (QTL) that interact with each other and with the environment. Selecting for both traits simultaneously has been difficult in practical breeding programs, and little progress has been made in the last 30 years in developing new wheat lines with increased GPC and high or acceptable GY. GPC improvement has been hindered by its negative correlation with GY and yield-related traits, observed in segregating populations and germplasm collections of all cereals grown under the same conditions of N availability (Simmonds 1995; Oury et al. 2003). Because of this inverse relationship, high-yielding modern cultivars generally have lower GPC than older ones (Simmonds 1995). During the past decades, many QTL controlling GPC have been located on all wheat chromosomes, both by linkage mapping in biparental populations (recent reviews by Blanco et al. 2012; Giancaspro et al. 2016; Quraishi et al. 2017; Kumar et al. 2018) and by association mapping in germplasm collections (Rapp et al. 2018). A major QTL on the short arm of chromosome 6B, accounting for up to 66% of the phenotypic variation, has been reported from the wild tetraploid wheat T. turgidum ssp. dicoccoides, and the underlying gene, Gpc-B1, was cloned by Uauy et al. (2006), who showed that the ancestral wild wheat allele encodes a NAC transcription factor (NAM-B1) that accelerates senescence and increases nutrient remobilization from leaves to developing grains. Introgression of Gpc-B1 has had different effects on grain weight and yield in different genetic backgrounds, and a dedicated breeding effort was suggested to offset its potential negative effects (Brevis and Dubcovsky 2010).
Most GPC QTL were detected in single mapping populations and/or under specific environmental conditions, without considering their relationship with yield-related traits. However, some recent investigations have assessed GPC and grain yield components simultaneously on the same population to identify GPC loci without pleiotropic effects and/or not closely linked to genes associated with low values of yield-related traits (see references in Table S1).

Monaghan et al. (2001) proposed analyzing the residuals from the regression of grain protein concentration on grain yield (grain protein deviation, GPD) to identify genotypes with higher grain protein concentration than expected from their GY. The primary components of GY are the number of productive spikes per unit area, the average number of kernels per spike, and the average kernel weight. The product of kernel number per spike and kernel weight is grain yield per spike (GYS). These yield components are quantitatively inherited; significant correlations and coincident QTL with GPC have most frequently been observed for GY, whereas the number of spikes per unit area was the least coincident yield component (e.g., Börner et al. 2002; Habash et al. 2007; Cuthbert et al. 2008). Kernel number per spike, kernel weight, and GYS have higher heritability than GY and are used as selection criteria in early breeding generations. Because QTL analysis and genome-wide association studies (GWAS) involve phenotypic evaluation of hundreds of lines from segregating populations and germplasm collections, these yield components are often used as proxies for GY. Although these studies described how to calculate GPD and design experiments to assess it, little information is available on its genetic and physiological bases. The identification of genetic sources of elevated protein content without negative pleiotropic effects would therefore be useful for improving GPC and GY simultaneously.
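Because GPD is defined as a residual, it can be obtained from an ordinary least-squares fit of GPC on grain yield (GY), or on a yield proxy such as GYS. The sketch below is a minimal Python illustration under that assumption; the function name and the toy values are hypothetical and are not data from this study.

```python
def grain_protein_deviation(gpc, gy):
    """Grain protein deviation (GPD): residuals of an OLS regression of GPC on GY.

    gpc, gy: equal-length sequences of per-genotype means (GY may be a proxy
    such as grain yield per spike). A positive value means more protein than
    expected for that genotype's yield level.
    """
    n = len(gpc)
    if n != len(gy) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(gy) / n
    mean_y = sum(gpc) / n
    sxx = sum((x - mean_x) ** 2 for x in gy)
    if sxx == 0:
        raise ValueError("GY values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gy, gpc))
    slope = sxy / sxx                  # expected change in GPC per unit GY
    intercept = mean_y - slope * mean_x
    # Residual = observed GPC minus GPC predicted from yield
    return [y - (intercept + slope * x) for x, y in zip(gy, gpc)]


# Hypothetical toy data: four genotypes, GPC in %, GY in arbitrary units
gpd = grain_protein_deviation([16.0, 15.0, 14.5, 12.5], [1.0, 2.0, 3.0, 4.0])
print([round(v, 2) for v in gpd])
```

In this toy example the third genotype has the only clearly positive deviation, i.e. it carries more protein than its yield would predict.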

The objectives of this study were therefore to: (i) identify candidate genes (CGs) for GPC and determine their precise map position on high-density single-nucleotide polymorphism (SNP)-based consensus maps; (ii) identify stable QTL/genes for GPC and GPD using the CG and GWAS approaches in a tetraploid wheat collection genotyped with the 90 K iSelect SNP array (Wang et al. 2014); and (iii) identify molecular markers associated with stable QTL for high GPC that do not decrease grain yield. The identification of loci controlling GPC will provide information on the genetic relationships between grain storage proteins and yield-related traits and on the genetic resources available to breeders to improve the nutritional and technological properties of wheat products, as well as the opportunity to develop closely associated markers for marker-assisted wheat breeding.

Materials and methods

Plant materials and field experiments

A total of 240 accessions of tetraploid wheat (Triticum turgidum L., 2n = 4x = 28; AABB genome) were grown in southern Italy at the experimental fields of Valenzano (Bari) for 5 years (2009, 2010, 2012, 2013, and 2014; hereafter referred to as V09, V10, V12, V13, and V14), Gaudiano (Potenza) for 1 year (2013; G13), and Foggia for 1 year (2012; F12). The collection included accessions of seven T. turgidum subspecies: durum (130 accessions), durum var. ethiopicum (10), turanicum (20), polonicum (19), turgidum (16), carthlicum (14), dicoccum (19), and dicoccoides (12). The genotypes (number/name, pedigree, year of release, and country) are described in detail in Mangini et al. (2018). Details about the population structure and genetic diversity of the collection, investigated with SSR and DArT markers, are provided by Laidò et al. (2014), and with SNP markers by Marcotuli et al. (2015). The collection has previously been used for association mapping of loci controlling grain yield components (Mangini et al. 2018) and nutritional compounds (Marcotuli et al. 2015, 2018; Colasuonno et al. 2017; Nigro et al. 2017a).

A randomized complete block design with three replications was used in all seven field experiments; plots consisted of 1 m rows, 30 cm apart, with 80 germinating seeds per plot. During the growing season, 10 g of nitrogen per m² was applied and standard cultivation practices were adopted. Plots were hand harvested at maturity, and GYS was determined by dividing grain yield per row by the number of spikes per row (about 60–80 spikes). Grain protein content, expressed as a percentage of protein on a dry weight basis, was determined on a 2 g sample of whole-meal flour by near-infrared reflectance spectroscopy (Spectra Alyzer Premium, Zeutec Büchi, Rendsburg, Germany).

DNA extraction and SNP genotyping

Genomic DNA was isolated from fresh leaves following the method described by Sharp et al. (1988) and subsequently purified by phenol–chloroform extraction. DNA concentration, quality, and purity were checked by both agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Scientific™). Genomic DNA of each accession was diluted to 50 ng/µL and sent to TraitGenetics Laboratory (Gatersleben, Germany; http://www.traitgenetics.de) for genotyping. SNP genotyping was performed with the wheat 90 K Infinium iSelect array containing 81,587 gene-associated SNP markers (Wang et al. 2014), following the manufacturer's recommendations as described in Akhunov et al. (2009). The assays were read on an Illumina iScan reader and analyzed with GenomeStudio software version 2011.1.

SNP discovery in candidate genes for grain protein content

Genomic sequences of putative GPC-related genes were retrieved from the Ensembl Plants database (http://plants.ensembl.org/) and aligned by BLAST against the SNP marker sequences reported by Akhunov et al. (2009). Markers aligning with at least 80% identity were considered to lie within the coding sequences of the GPC candidate genes. The BLASTn analysis was further extended to contigs assembled in the chromosome survey sequencing project (http://wheat-urgi.versailles.inra.fr/Seq-Repository) to identify additional SNPs flanking the GPC candidate genes. SNP markers within the coding sequences of the candidate genes were also investigated by screening the variant table reported for each candidate gene in the Ensembl Plants database.
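The 80% identity filter amounts to a single pass over BLAST tabular output. A minimal sketch, assuming BLAST+ output in the default 12-column `-outfmt 6` layout (query ID, subject ID, percent identity, then alignment statistics), with gene sequences as queries and SNP context sequences as subjects; the gene and SNP identifiers in the example are hypothetical.

```python
def snps_per_gene(blast_lines, min_identity=80.0):
    """Group SNP IDs by candidate gene, keeping hits with percent identity >= min_identity.

    blast_lines: lines of BLAST+ tabular output (-outfmt 6); column 1 is the
    query (gene) ID, column 2 the subject (SNP) ID, column 3 the percent identity.
    """
    hits = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        gene_id, snp_id, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            hits.setdefault(gene_id, set()).add(snp_id)
    return hits


example = [
    "GENE_A\tSNP_001\t98.50\t101\t1\t0\t1\t101\t1\t101\t1e-45\t180",
    "GENE_A\tSNP_002\t76.20\t101\t24\t0\t1\t101\t1\t101\t1e-10\t60.2",
    "GENE_B\tSNP_003\t80.00\t101\t20\t0\t1\t101\t1\t101\t1e-20\t95.1",
]
print(snps_per_gene(example))
```

Returning a set per gene keeps duplicate hits (e.g. several HSPs for the same SNP) from being counted twice.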

Statistical analysis and association mapping

Each year–location combination was considered an environment. Analysis of variance of GPC for each environment and combined across environments was carried out with the MSTAT-C software. Phenotypic correlations were calculated for GPC between environments and between GPC and GYS. GPD was estimated by covariance analysis of GPC and GYS. Variance component estimates were used to determine the genetic variance ($\sigma^2_G$), the environmental variance ($\sigma^2_E$), the genotype × environment interaction variance ($\sigma^2_{G \times E}$), and broad-sense heritability, $h^2_B = \sigma^2_G / (\sigma^2_G + \sigma^2_E + \sigma^2_{G \times E})$.
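The broad-sense heritability formula, $h^2_B = \sigma^2_G / (\sigma^2_G + \sigma^2_E + \sigma^2_{G \times E})$, can be evaluated directly once the variance components are available. A minimal Python sketch, assuming the components have already been estimated from the ANOVA (the sketch does not estimate them); the numbers in the example are hypothetical.

```python
def broad_sense_heritability(var_g, var_e, var_gxe=0.0):
    """Broad-sense heritability h2_B = var_G / (var_G + var_E + var_GxE).

    var_g: genetic variance; var_e: environmental variance;
    var_gxe: genotype x environment variance (0 for a single environment).
    """
    components = (var_g, var_e, var_gxe)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total phenotypic variance is zero")
    return var_g / total


# Hypothetical variance components for GPC (units: %^2)
print(broad_sense_heritability(3.0, 1.0, 1.0))  # across environments
print(broad_sense_heritability(3.0, 1.0))       # single environment
```

Negative ANOVA estimates of a variance component are often truncated to zero before this step; the sketch rejects them instead, so that choice stays explicit.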

Mean GPC values across replicates, and across replicates and environments, were used in the GWAS for each environment and over environments, respectively. SNP markers with > 10% missing data or a minor allele frequency (MAF) below 5%, as well as markers not mapped on the durum wheat consensus map (Maccaferri et al. 2014), were removed from the data matrix prior to marker–trait association analysis, which was carried out with TASSEL v5.0 (http://www.maizegenetics.net). Associations between SNP markers and GPC were first calculated with the most widely used statistical models to evaluate the effects of population structure (Q) and kinship (K): (i) the general linear model (GLM); (ii) the GLM including population structure as a covariate, using the Q matrix derived from principal component analysis as implemented in TASSEL (GLM + Q); (iii) the mixed linear model (MLM) based on the kinship matrix (MLM + K); and (iv) the MLM based on both the Q and K matrices (MLM + K + Q). The MLM + K model showed the smallest deviation of observed from expected values in the quantile–quantile (Q–Q) plots and was therefore adopted. A threshold of P = 0.001 (−log10(P) ≥ 3.0) was used to declare significant marker–trait associations, while suggestive QTL were considered at the sub-threshold 2.5 < −log10(P) < 3.0 when supported by co-migrating CGs. In agreement with the linkage disequilibrium estimates of Laidò et al. (2014), a support interval of 8 cM was used to assign significant SNPs to the same QTL. QTL were considered stable when detected at −log10(P) ≥ 3.0 in at least three environments and in the mean across environments.
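The marker quality-control step can be sketched as a filter over a genotype table. This is a minimal illustration, not the TASSEL implementation: it assumes biallelic SNPs scored in homozygous lines, with one allele call per accession (`None` for missing data), and the SNP IDs are hypothetical.

```python
from collections import Counter


def filter_snps(genotypes, mapped_snps, max_missing=0.10, min_maf=0.05):
    """Return SNP IDs passing the missing-data, MAF, and map-position filters.

    genotypes: dict of SNP ID -> list of allele calls (one per accession,
    None when missing); mapped_snps: IDs with a consensus-map position.
    Biallelic SNPs are assumed.
    """
    kept = []
    for snp, calls in genotypes.items():
        if snp not in mapped_snps or not calls:
            continue  # unmapped (or empty) markers are dropped
        observed = [c for c in calls if c is not None]
        missing_rate = (len(calls) - len(observed)) / len(calls)
        if missing_rate > max_missing:
            continue  # more than 10% missing data by default
        counts = Counter(observed)
        if len(counts) < 2:
            continue  # monomorphic: MAF is zero
        maf = min(counts.values()) / len(observed)
        if maf < min_maf:
            continue  # minor allele too rare
        kept.append(snp)
    return kept


table = {
    "snp_1": ["A"] * 10 + ["B"] * 10,             # passes all filters
    "snp_2": ["A"] * 20,                           # monomorphic
    "snp_3": ["A"] * 9 + ["B"] * 8 + [None] * 3,   # 15% missing
    "snp_4": ["A"] * 10 + ["B"] * 10,             # not on the map
    "snp_5": ["A"] * 39 + ["B"],                   # MAF = 2.5%
}
print(filter_snps(table, mapped_snps={"snp_1", "snp_2", "snp_3", "snp_5"}))
```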

For the associations between CGs and GPC, the conservative Bonferroni correction for multiple testing was applied by dividing the nominal significance level (P = 0.01) by the number of markers used in the analysis. Chromosome localization and map positions of SNP markers were derived from the high-density linkage maps of Maccaferri et al. (2014) for durum wheat and Wang et al. (2014) for common wheat, used as reference maps.
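With a nominal level of P = 0.01, the Bonferroni threshold for m tested markers is 0.01/m. A small helper makes the conversion to the −log10(P) scale used for the GWAS explicit; the marker count in the example is hypothetical.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.01):
    """Return the per-marker P threshold alpha / n_markers and its -log10 value."""
    if n_markers < 1:
        raise ValueError("n_markers must be at least 1")
    p_threshold = alpha / n_markers
    return p_threshold, -math.log10(p_threshold)


def passes_bonferroni(p_value, n_markers, alpha=0.01):
    """True when a marker's P value is at or below the Bonferroni threshold."""
    return p_value <= bonferroni_threshold(n_markers, alpha)[0]


# Hypothetical example: 50 markers tested
p_thr, log_thr = bonferroni_threshold(50)
print(p_thr, round(log_thr, 2))
```

For 50 markers the threshold is 2 × 10⁻⁴, i.e. about 3.7 on the −log10(P) scale, noticeably stricter than the 3.0 used for the genome-wide scan.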

Results

Phenotypic variation for grain protein content

The 240 accessions of the tetraploid wheat collection were evaluated for GPC in replicated field trials in seven environments in southern Italy. Analysis of variance revealed highly significant differences (P < 0.01) among genotypes in each of the seven environments (not shown); the combined analysis across environments, including the 218 accessions evaluated in all seven environments, showed significant effects of environment, genotype, and genotype × environment interaction (Table 1). The mean, range, standard deviation, genetic variance, and heritability ($h^2_B$) of GPC in each trial are reported in Table 2. GPC varied greatly among the seven environments, from 11.5% at G13 to 16.2% at V09; the mean across environments was 14.5%, with values ranging from 12.1 to 19.4%. GPC showed a normal frequency distribution in each environment and in the mean over the seven environments. Broad-sense heritability was moderately high in individual environments (from 0.76 at G13 to 0.91 at V10) and relatively low (0.60) across environments.

Table 1 Mean square from the combined analysis of variance for grain protein content in a tetraploid wheat collection evaluated in seven field environments
Table 2 Means, ranges, standard deviation (SD), coefficient of variation (CV), genetic variance ($\sigma^2_G$), and heritability ($h^2_B$) of grain protein content in a tetraploid wheat collection evaluated in seven environments

Correlations of GPC means between environments (Table S2) were all significant and ranged from 0.38 to 0.83, consistent with a strong environmental influence on the phenotypic expression of GPC. Phenotypic correlation analysis between GPC and GYS, scored in the same collection and environments (Mangini et al. 2018), showed highly significant negative correlations (P < 0.001) in each environment (r ranging from −0.23 at G13 to −0.62 at V12) and across environments (r = −0.55) (Table S3 and Fig. 1). These consistent negative correlations indicated the need to account for the relationship between GPC and GYS in order to identify favorable QTL/genes affecting only GPC.
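Assuming the phenotypic correlations are Pearson coefficients, their significance is conventionally tested with $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on n − 2 degrees of freedom. A minimal standard-library sketch follows; converting t into a P value would need a t-distribution function (for example from SciPy), which is not shown, and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("|r| must be < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical genotype means: GYS (g) and GPC (%)
gys = [1.2, 1.8, 2.1, 2.6, 3.0]
gpc = [16.1, 15.4, 15.0, 14.2, 13.9]
r = pearson_r(gys, gpc)
print(round(r, 3), round(correlation_t(r, len(gys)), 2))
```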

Fig. 1

Relationship between grain yield per spike and grain protein content in the tetraploid wheat collection observed in the mean across seven environments

Association of candidate genes with grain protein content

Fourteen genes previously reported as candidate genes for GPC and/or yield-related traits (Habash et al. 2007; Nigro et al. 2013, 2014; Gadaleta et al. 2011, 2014; Krapp 2015; Tian et al. 2015; Gao et al. 2016) were considered for gene-based association mapping in the current study: AlaAT, Fd-GOGAT, NADH-GOGAT, ASN1, GDH, NR, NIR, NRT2, NLP7, PPDK, and four glutamine synthetase genes (GS1, GS2, GSe, and GSr).

BLASTn analysis of the CG sequences against the whole set of 90 K iSelect wheat SNPs identified 81 SNPs with at least 80% identity. No SNP was identified within the NLP7 sequence. Forty-three SNP markers were located on the durum (Maccaferri et al. 2014) and/or bread wheat (Wang et al. 2014) consensus maps, while the remaining 38 were unmapped and not considered further (Table 3). This enabled us to assign the CGs to specific wheat chromosome arms or homoeologous groups: PPDK was mapped on 1BL; AlaAT on homoeologous group 4; Fd-GOGAT on the homoeologous arms 2S; NADH-GOGAT on 3L; GDH on 5BL; ASN1 on group 5; NIR on 6BL; NR on the homoeologous arms 6L and on 6BS; and NRT2 on 6S. The GS family isoforms were mapped as follows: GS2 on 2L, GS1 on 6L, and GSe and GSr on homoeologous group 4 (Fig. 2). As indicated in Table 3, some markers were mapped on the common wheat consensus map but not on the durum map; in such cases, SNP markers co-migrating with, or mapping in the same contig as, the CG sequence were used for the marker–trait association analysis. MAF ranged from 4.3% for PPDK to 45.5% for Fd-GOGAT.

Table 3 Chromosome localization of candidate genes for grain protein content on the durum (Maccaferri et al. 2014) and bread wheat (Wang et al. 2014) consensus maps, and minor allele frequency (MAF) in the tetraploid wheat collection
Fig. 2

Schematic representation of the A- and B-genome chromosomes of the durum wheat consensus linkage map (Maccaferri et al. 2014), with map positions of candidate genes and QTL for grain protein content (GPC, red bars) and grain protein deviation (GPD, green bars). Each chromosome map is represented by the first and last SNP markers and by one SNP marker approximately every 20 cM. SSR markers were also included approximately every 20 cM to allow comparison of the consensus SNP map with published SSR-based maps. Marker names are shown on the right side and cM distances on the left side of each bar. Black solid regions of the chromosome bars indicate the centromeric region. Candidate genes for grain protein content are indicated after the corresponding SNP located in the gene sequence (in blue) or at the map position of the co-migrating SNP marker located in the same contig

Using the Bonferroni threshold to control for multiple testing, nine CGs (AlaAT, NADH-GOGAT, ASN1, NIR, NR, NRT2, GS1, GS2, and GSr) were significantly associated with GPC in the mean across environments and in at least three environments (Table 4). Fd-GOGAT-2B was significant in only one environment, whereas GDH, NADH-GOGAT-3B, and GSe1 were not significant in any environment or in the mean across environments. Notably, AlaAT-4A, ASN1-5A, NR-6A, and GS2-2B were significant in all seven environments, explaining up to 14.9, 14.6, 24.3, and 14.9% of the phenotypic variance, respectively.

Table 4 Regression analysis between candidate genes and grain protein content in the tetraploid wheat collection evaluated in seven environments

Regression analysis between the CGs and GPD detected significant associations in the mean across environments and in at least three environments for AlaAT, NADH-GOGAT, NIR, NR, NRT2, GS2, and GSe1, explaining from 10.4% (NADH-GOGAT) to 23.8% (NIR) of the phenotypic variance in the mean over environments (Table 5). These results suggest that these CGs are involved in GPC without a negative effect on GYS. No significant associations were found for the other CGs.
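In its simplest form, each CG test is a regression of the phenotype (GPC or GPD) on a biallelic marker. For homozygous lines coded 0/1, the least-squares slope equals the difference between the two allele-class means, and R² is the share of phenotypic variance explained. The sketch below assumes that fixed-effect model with no covariates; TASSEL's output, including its convention for reporting additive effects, may differ, and the toy values are hypothetical.

```python
def single_marker_regression(phenotypes, alleles):
    """Regress a phenotype on a 0/1-coded biallelic marker (homozygous lines assumed).

    Returns (effect, r2): effect = mean(allele 1) - mean(allele 0), which is the
    OLS slope for 0/1 coding; r2 = proportion of phenotypic variance explained.
    Pairs with a missing phenotype or genotype (None) are skipped.
    """
    pairs = [(p, a) for p, a in zip(phenotypes, alleles)
             if p is not None and a is not None]
    group0 = [p for p, a in pairs if a == 0]
    group1 = [p for p, a in pairs if a == 1]
    if not group0 or not group1:
        raise ValueError("both allele classes must be present")
    ys = [p for p, _ in pairs]
    grand = sum(ys) / len(ys)
    m0 = sum(group0) / len(group0)
    m1 = sum(group1) / len(group1)
    ss_total = sum((y - grand) ** 2 for y in ys)
    if ss_total == 0:
        raise ValueError("phenotype has no variance")
    # Between-allele-class sum of squares = regression sum of squares
    ss_marker = len(group0) * (m0 - grand) ** 2 + len(group1) * (m1 - grand) ** 2
    return m1 - m0, ss_marker / ss_total


# Hypothetical GPC values (%) for four lines and their marker alleles
effect, r2 = single_marker_regression([13.0, 13.4, 14.1, 14.9], [0, 0, 1, 1])
print(round(effect, 2), round(r2, 3))
```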

Table 5 Regression analysis between candidate genes and grain protein deviation in the tetraploid wheat collection evaluated in seven environments

QTL detection by genome-wide association study (GWAS)

The wheat collection was genotyped with the 90 K iSelect array, which includes 81,587 gene-associated SNPs (Wang et al. 2014). After excluding SNP markers with > 10% missing data or MAF below 5%, as well as markers not mapped on the durum wheat consensus map (Maccaferri et al. 2014), 17,003 SNPs were used for marker–trait association. The four major GWAS models (GLM, GLM + Q, MLM + K, and MLM + K + Q) were first evaluated for their ability to account for relative kinship and population structure in the collection, and thus to minimize false-positive marker–trait associations (type I error). Inspection of the deviation of observed −log10(P) values from the expected distribution in the Q–Q plots in each environment and across environments (Fig. S1) indicated that the models accounting for relative kinship (MLM + K and MLM + K + Q) performed markedly better than GLM and GLM + Q for both GPC and GPD. The MLM + K model produced slightly better results than MLM + K + Q and was therefore used in the GWAS analysis.
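The Q–Q comparison of models can also be expressed numerically by pairing sorted observed −log10(P) values with those expected under a uniform null. A minimal sketch, assuming the plotting positions i/(n + 1) (other conventions exist); the mean absolute gap is only a crude, hypothetical summary of what the plots show visually, since true associations also lift the upper tail, and the P values in the example are invented.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(P) values, both sorted in decreasing order."""
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def mean_abs_deviation(points):
    """Average |observed - expected|; smaller means closer to the null diagonal."""
    return sum(abs(o - e) for e, o in points) / len(points)


# Hypothetical P values from two models fitted to the same markers
model_pvalues = {
    "GLM": [1e-6, 1e-4, 0.002, 0.01, 0.05],
    "MLM+K": [0.01, 0.12, 0.3, 0.55, 0.8],
}
for name, pvals in model_pvalues.items():
    print(name, round(mean_abs_deviation(qq_points(pvals)), 3))
```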

Many QTL for GPC detected in only one or two environments were found on all 14 chromosomes; these were considered environment-specific and are not reported here, because our aim was to identify stable QTL across environments for use in marker-assisted breeding programs. Eleven stable QTL for GPC were detected at −log10(P) ≥ 3.0 in at least three environments and in the mean across environments (Table 6). These QTL were located on chromosome arms 2BS (two QTL), 3AL (two QTL), 3BL, 4AS, 5AS, 5BL, 6BL, 7AS, and 7BL. QGpc.mgb-5B.2 on 5BL and QGpc.mgb-3B on 3BL were detected in six and five environments, respectively, and in the mean across environments. The additive effect of each QTL ranged from 0.65 to 1.91 units and accounted for 5.1–8.7% of the phenotypic variation across environments. Two additional QTL (QGpc.mgb-4B and QGpc.mgb-5B.1), detected at −log10(P) ≥ 3.0 in two or three environments, were also reported because they were supported by co-migrating CGs (GSr1-4A and GDH-5B). Each QTL had additive effects in the same direction and was often significant at the sub-threshold 2.5 < −log10(P) < 3.0 in the other environments.
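The stability rule (−log10(P) ≥ 3.0 in at least three environments and in the mean across environments) can be written as a small classification function. A minimal sketch; the QTL names and scores in the example are hypothetical, and each score is assumed to be the best −log10(P) among the QTL's SNPs in that environment.

```python
def is_stable(scores, threshold=3.0, min_envs=3, mean_key="mean"):
    """Apply the stability rule to one QTL.

    scores: dict of environment code (e.g. "V09") or mean_key -> -log10(P).
    """
    n_sig_envs = sum(
        1 for env, s in scores.items() if env != mean_key and s >= threshold
    )
    return n_sig_envs >= min_envs and scores.get(mean_key, 0.0) >= threshold


def classify(qtl_scores, **rule):
    """Return {QTL name: 'stable' or 'not stable'}."""
    return {
        name: "stable" if is_stable(s, **rule) else "not stable"
        for name, s in qtl_scores.items()
    }


example = {
    "QTL_1": {"V09": 3.4, "V10": 3.1, "V12": 2.7, "G13": 3.6, "mean": 4.2},
    "QTL_2": {"V09": 3.8, "F12": 3.2, "V14": 2.9, "mean": 3.5},
    "QTL_3": {"V10": 3.3, "V13": 3.0, "V14": 3.1, "mean": 2.8},
}
print(classify(example))
```

QTL_2 fails on the number of environments and QTL_3 on the across-environment mean, so only QTL_1 is retained.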

Table 6 Marker–trait associations for grain protein content (GPC) and grain protein deviation (GPD) identified by GWAS (MLM + K model) in the tetraploid wheat collection

To take into account the negative correlation between GPC and GYS (Fig. 1 and Table S3), marker–trait analysis was also carried out on GPD data for each environment and for the mean across environments (Table 6). Using the same criteria for a significant stable QTL, the GWAS on GPD confirmed the QTL on chromosome arm 5BL at 172.9 cM (QGpd.mgb-5B.3) and detected one additional QTL on 5BL at 54.4 cM (QGpd.mgb-5B.1). Two further QTL for GPD (QGpd.mgb-4A and QGpd.mgb-5B.2) were detected in two environments and were reported because they were supported by the candidate genes GSr1-4A and GDH-5B. QGpd.mgb-5B.2, at 141.6 cM on 5BL, could coincide with QGpc.mgb-5B.1, detected at 146.5 cM. These four GPD QTL are expressed independently of variation in GYS, whereas the other 10 QTL for GPC showed no significant stable effects on GPD, suggesting a genetic or physiological association between GPC and GYS.
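The possible coincidence of QGpd.mgb-5B.2 (141.6 cM) and QGpc.mgb-5B.1 (146.5 cM) follows from the 8 cM support interval set in the Methods. The sketch below groups peak positions on the same chromosome whenever adjacent peaks are no more than 8 cM apart; this single-linkage rule is an assumption made for illustration, and the positions are those reported for 5BL.

```python
from itertools import groupby


def group_peaks(peaks, support_cm=8.0):
    """Group (name, chromosome, position_cM) peaks into putative common QTL regions.

    Peaks are sorted along each chromosome; a new region starts whenever the gap
    to the previous peak exceeds support_cm (single-linkage rule, assumed here).
    """
    ordered = sorted(peaks, key=lambda p: (p[1], p[2]))
    regions = []
    for chrom, items in groupby(ordered, key=lambda p: p[1]):
        current = []  # (name, position) pairs in the open region
        for name, _, pos in items:
            if current and pos - current[-1][1] > support_cm:
                regions.append((chrom, [n for n, _ in current]))
                current = []
            current.append((name, pos))
        if current:
            regions.append((chrom, [n for n, _ in current]))
    return regions


peaks_5b = [
    ("QGpd.mgb-5B.1", "5B", 54.4),
    ("QGpd.mgb-5B.2", "5B", 141.6),
    ("QGpc.mgb-5B.1", "5B", 146.5),
    ("QGpd.mgb-5B.3", "5B", 172.9),
]
for chrom, names in group_peaks(peaks_5b):
    print(chrom, names)
```

Only the two peaks 4.9 cM apart fall into a shared region; the peaks at 54.4 and 172.9 cM remain separate.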

Discussion

Nitrogen metabolism is a very complex pathway involving several processes, such as N uptake, reduction, assimilation, and translocation. Dissecting and deciphering these components is still a challenge for both scientists and breeders. As typical quantitative traits, NUE and GPC are influenced both by multiple genes and by environmental factors. During the last decades, GPC improvement has been hampered by its tight negative correlation with grain yield in segregating populations and germplasm collections of all cereals grown under the same conditions of N availability (Simmonds 1995; Oury et al. 2003). Several physiological explanations for this general negative relationship have been proposed, including the dilution of grain nitrogen by carbohydrates (Acreche and Slafer 2009) and the bioenergetic requirements for the synthesis of carbohydrates and proteins (Bogard et al. 2010). Nevertheless, a number of bread wheat (Sears 1998; Oury et al. 2003) and durum wheat lines (Clarke et al. 2005) showing increases in both GY and GPC have been selected. According to Sears (1998), both GPC and GY can be improved simultaneously when an adequate source of genes increasing GPC is used in breeding programs. GPD was proposed by Monaghan et al. (2001) to screen for increased grain protein concentration without a concurrent reduction in grain yield. Several studies in wheat have demonstrated the high heritability of GPD, providing a new selection tool to improve GY and GPC at the same time (Bogard et al. 2010; Thorwarth et al. 2018). The balance between N remobilization and post-flowering N uptake strongly affects both GPC (Kichey et al. 2007) and GPD (Bogard et al. 2010).

Nitrogen-related candidate genes and relationships with GPC

Studying the genetic variability of CGs involved in the processes linking C and N metabolism could be an efficient way to overcome or reduce the negative correlation between GY and GPC, and to develop allele-specific markers for selecting high-GPC genotypes in wheat breeding. As proposed by Garnett et al. (2009), enzymes related to nitrogen metabolism play important roles in N utilization efficiency. Accordingly, one objective of our study was to characterize different N-related CGs and investigate their relationships with GPC.

By genotyping the tetraploid wheat collection with the 90 K SNP array (Wang et al. 2014) and performing BLASTn analysis of the entire SNP data set against the CG sequences, we identified 1–8 SNPs within the coding sequences of 13 of the 14 CGs examined. These genes were precisely positioned on the genetic map thanks to the recent availability of high-resolution consensus maps for both durum (Maccaferri et al. 2014) and common wheat (Wang et al. 2014). Regression analysis was performed between each CG and both GPC and GPD. Interestingly, six of them (AlaAT, NADH-GOGAT, NR, NRT2, GS2, and NIR) were associated with both traits.

Alanine aminotransferase is a ubiquitous enzyme that may influence N assimilation and remobilization. It catalyzes the reversible transfer of an amino group from glutamate to pyruvate, yielding 2-oxoglutarate and alanine, and thus has dual functions in carbon and nitrogen metabolism. Transgenic approaches overexpressing AlaAT have successfully increased NUE in oilseed rape (Good et al. 2007) and rice (Shrawat et al. 2008). Like AlaAT, the NRT2 and NR genes were associated with GPC and GPD by regression analysis in at least three environments and in the mean of all seven environments. NRT2 genes are high-affinity nitrate transporters, and most of them require a nitrate reductase protein to mediate nitrate transport (Kotur et al. 2012). Among the several NRT2 transporters, NRT2.1 is the most active (Li et al. 2007). A series of genetic and physiological studies have shown that the high-affinity NRT2.1–NR protein complex plays a central part in efficient N uptake under low N availability (Krapp et al. 2011; Gu et al. 2013). Taulemesse et al. (2015) showed a significant positive correlation between the expression patterns of the high-affinity NO3− transporter family member TaNRT2.1 and the nitrate reductase TaNR in wheat at different nitrate concentrations.

Moreover, Zhao et al. (2013) reported that overexpressing a tobacco NR gene in wheat increased seed protein content without additional N fertilization. Our analysis also showed a significant positive additive effect of both genes on GPC and GPD, but not on GYS. These results support the possibility of using the NRT2.1–NR complex as a breeding target to improve wheat NUE, yield, and grain quality.

The NADH-GOGAT gene has long been considered one of the major candidate genes for cereal NUE, as supported by the cross-genome ortho-meta-QTL study of Quraishi et al. (2011). Transgenic approaches in wheat and rice (Yamaya et al. 2002; Lu et al. 2011) showed that its over-expression led to an increase in grain weight (up to 80%), whereas its suppression reduced yield per plant and thousand-kernel weight, phenotypic indications of nitrogen starvation, supporting the view that NADH-GOGAT is a key step in nitrogen utilization and grain filling. An association between NADH-GOGAT and GPC has also been reported in durum wheat (Nigro et al. 2017b). A different pattern was observed for the GS2 genes: the plastidic homoeologs GS2-2B and GS2-2A were associated with GPC and GPD, respectively. Among all N metabolism-related genes, glutamine synthetase, which catalyzes the first step in the assimilation of inorganic N into organic compounds, has been one of the most extensively studied and has been used to improve NUE in wheat (Thomsen et al. 2014). Plastidic GS2 has been shown to be involved in the re-assimilation of N during photorespiration (Tabuchi et al. 2007). Correlation studies by Kichey et al. (2007) provided evidence of the role of GS in the control of NUE. In addition, a number of QTL for GY and GPC have been found to co-localize with structural genes encoding plastidic GS2 (Gadaleta et al. 2011; Bordes et al. 2013). As reviewed by Thomsen et al. (2014), the effects of overexpressing GS on NUE-related physiological traits, biomass, and yield have generally been inconsistent. However, a recent paper by Hu et al. (2018) showed that transgenic expression of a specific TaGS2 allele increased flag leaf GS activity, leaf functional duration, N uptake, grain N concentration, nitrogen harvest index, harvest index, grain yield, and yield components under both low and high N conditions. This suggests that the control of GPC- and NUE-related processes may be specific, depending on the species examined but, above all, on the genetic variability within the species and hence on the individual genotype (Simons et al. 2014; Nigro et al. 2017a).

In addition to evaluating CG associations with GPC and GPD, we carried out regression analysis between the candidate genes and grain yield per spike (data from Mangini et al. 2018) in the tetraploid collection (Fig. 3). The most interesting genes are those with a significant positive effect on both GPD and GYS or, at least, a significant positive effect on GPD and no significant effect on GYS, meaning that increased protein content is not associated with reduced yield. The six CGs discussed above, associated with both GPC and GPD, fit this pattern. Specifically, AlaAT-4A, NADH-GOGAT-3A, NIR-6B, NR-6A, NRT2-6A, and GS2-2A showed significant positive additive effects on GPD (and GPC) but no significant effect on GYS. GSe1-4A showed a positive additive effect on GPC but not on GPD, and had no effect on GYS. A different situation was observed for the other five genes: ASN1 (both the 5A and 5B homoeologs), GS1-6B, GS2-2B, and GSr1-4B. These genes, which belong to complex gene families, showed significant additive effects of opposite sign on GPD and GYS, consistent with the known negative correlation between the two traits.

Fig. 3

Additive effects of candidate genes on grain protein content (blue bars), grain protein deviation (yellow bars), and grain yield per spike (green bars), identified by regression analysis on the mean values across seven environments in a tetraploid wheat collection. The effect refers to the allele with a positive effect on grain protein content. *, **, and *** indicate significance at P ≤ 0.05, P ≤ 0.01, and P ≤ 0.001, respectively, using the Bonferroni threshold to control for multiple testing
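The Bonferroni correction named in the Fig. 3 caption divides each significance level by the number of tests m. A minimal sketch, assuming m is the number of candidate-gene tests; m is not stated in this section, so the value in the example is hypothetical, as are the function names.

```python
import math


def bonferroni_logp(alpha, n_tests):
    """-log10 of the Bonferroni-corrected level alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def significance_stars(p, n_tests):
    """'***', '**', '*' or '' for P <= 0.001, 0.01, 0.05 after Bonferroni correction."""
    for stars, alpha in (("***", 0.001), ("**", 0.01), ("*", 0.05)):
        if p <= alpha / n_tests:
            return stars
    return ""


# Hypothetical: 15 tests.
print(round(bonferroni_logp(0.05, 15), 2))  # 2.48
print(significance_stars(1e-5, 15))         # ***
```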

Stable QTL for grain protein content detected by GWAS

The objectives of the current study were to identify stable QTL for GPC and to analyze the genetic basis of the negative relationship between GPC and GYS in a tetraploid wheat collection evaluated in seven replicated field trials in southern Italy. The mean square for environments was of a different order of magnitude from that for genotypes (Table 1), indicating a strong influence of annual environmental variation on GPC. GPD was also analyzed in order to identify QTL for high GPC not associated with low grain yield. The wide phenotypic variation for GPC observed in each of the seven environments (Table 2) can be attributed to the composition of the T. turgidum collection, which includes wild and semi-domesticated accessions, landraces, and modern durum cultivars. The genome-wide association study with 17,003 SNP-derived genes, conducted with the GLM and the three widely adopted MLM models (MLM + Q, MLM + K, and MLM + Q + K), identified MLM + K as the most suitable model for association mapping of both GPC and GPD. This confirms previous GWAS results on grain yield components and grain quality traits in the same wheat collection (Colasuonno et al. 2017; Nigro et al. 2017b; Mangini et al. 2018) and supports several earlier studies on the suitability of the MLM + K model for reducing spurious marker–trait associations (see review by Gupta et al. 2014).
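This section does not state how model suitability was judged. One common diagnostic, assumed here and not necessarily the criterion used in this study, is the genomic inflation factor λ: each two-sided marker P value is converted to a 1-df chi-square statistic, and the median statistic is divided by its expected median under the null (about 0.455). Values well above 1 suggest uncorrected structure or relatedness (as typically seen with a GLM), values near 1 suggest good control. The function name is hypothetical; only the Python standard library is used.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic-control lambda from two-sided marker P values.

    Each P becomes a 1-df chi-square statistic z**2 with z = Phi^-1(P / 2);
    lambda is the median statistic over the chi-square(1) median (~0.455).
    """
    nd = NormalDist()
    # Clamp P away from 0 so inv_cdf stays in its valid open interval.
    chi2 = [nd.inv_cdf(max(p, 1e-300) / 2.0) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


null_like = [(i + 0.5) / 1000 for i in range(1000)]  # uniform P values
inflated = [p ** 2 for p in null_like]               # excess of small P values
print(round(genomic_inflation(null_like), 2))        # 1.0
print(genomic_inflation(inflated) > 1.5)             # True
```

In practice λ is usually inspected together with a quantile-quantile plot of the observed against expected -log10(P) values for each model.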

Eleven stable GPC QTL, distributed on nine chromosomes of the A and B genomes, were detected at −log10(P) ≥ 3.0 in at least three environments and in the mean across environments (Table 6 and Fig. 2); two additional QTL, co-locating with two candidate genes, were detected in two environments.
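The stability rule above can be written as a small filter. A minimal sketch, assuming per-environment results are available as (additive effect, −log10 P) pairs; the environment labels and values are hypothetical, and treating "effects in the same direction in all environments" (reported in the Conclusions) as an extra filter is our assumption rather than a stated criterion.

```python
def is_stable_qtl(per_env, mean_logp, threshold=3.0, min_envs=3):
    """Apply the stability rule to one marker.

    per_env maps an environment label to (additive_effect, -log10 P).
    Stable: -log10 P >= threshold in at least `min_envs` environments and in
    the mean across environments. The same-sign requirement is an assumption.
    """
    n_sig = sum(1 for _, logp in per_env.values() if logp >= threshold)
    same_sign = len({effect > 0 for effect, _ in per_env.values()}) == 1
    return n_sig >= min_envs and mean_logp >= threshold and same_sign


# Hypothetical marker tested in seven environments E1..E7.
marker = {
    "E1": (0.42, 3.4), "E2": (0.35, 2.7), "E3": (0.51, 4.1), "E4": (0.30, 2.6),
    "E5": (0.47, 3.2), "E6": (0.28, 2.5), "E7": (0.33, 2.9),
}
print(is_stable_qtl(marker, mean_logp=4.0))  # True: E1, E3, E5 and the mean pass
```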

The high-resolution consensus linkage map of durum wheat (Maccaferri et al. 2014), used in the current study as the reference map for SNP markers associated with GPC QTL, includes both SNP and SSR markers, which made it possible to compare the detected QTL with QTL previously reported in the literature. Several QTL for GPC have been mapped on all bread and durum wheat chromosomes (see reviews by Blanco et al. 2012; Quraishi et al. 2017; Kumar et al. 2018; Rapp et al. 2018). A summary of 22 peer-reviewed papers on GPC QTL that also considered yield-related traits in the same experiment is reported in Table S1. Many QTL were detected in a single mapping population only; this may be attributed to differences between the parental lines, the high number of QTL/genes controlling GPC, epistatic gene action, the marker density of the linkage maps used in QTL analyses, and the statistical models used to detect marker–trait associations. Many other QTL were detected in individual environments only; these can be considered "adaptive" QTL, as they are expressed under specific environmental conditions or in accordance with the level of an environmental factor (Collins et al. 2008). Most GPC QTL identified in the current study validated QTL previously detected in different genetic backgrounds (Table S1). Notably, the QTL on chromosome arms 3BL (QGpc.mgb-3B) and 5BL (QGpc.mgb-5B.2) were detected in 5 and 6 of the 7 environments, respectively, and in the mean across environments. QGpc.mgb-5B.2 was also detected as a QTL for GPD (QGpd.mgb-5B.3), meaning that it can increase GPC without a negative effect on grain yield per spike. QGpc.mgb-5B.2 maps to the same 5B region as a GPC QTL previously described by Zanetti et al. (2001), Habash et al. (2007), and Wang et al. (2012).

The GWAS on GPD detected one QTL on chromosome arm 4AS and three on chromosome 5B (one on 5BS and two on 5BL). Three of these QTL, which were independent of grain yield per spike, coincided with GPC QTL (QGpd.mgb-4A, QGpd.mgb-5B.2, and QGpd.mgb-5B.3). QGpd.mgb-4A was previously identified by Groos et al. (2003) and Habash et al. (2007) using GLM and MLM analyses, and by Blanco et al. (2012) and Wang et al. (2012) using conditional QTL mapping. QGpd.mgb-5B.1, at 54.4 cM on chromosome 5B, was also detected by other authors in biparental mapping populations (Groos et al. 2003; Habash et al. 2007; Wang et al. 2012; Mahjourimajid et al. 2016) and in a Central European durum wheat collection (Rapp et al. 2018).

We positioned QGpd.mgb-5B.2 and QGpd.mgb-5B.3 on the long arm of chromosome 5B, 31.3 cM apart (at 141.6 and 172.9 cM, respectively). QGpd.mgb-5B.2 co-localized with the candidate gene GDH. This region has been reported to carry at least one GPC QTL not associated with yield-related traits (Zanetti et al. 2001; Habash et al. 2007; Suprayogi et al. 2009; Wang et al. 2012; Bordes et al. 2013; Mahjourimajid et al. 2016). However, because these studies used different linkage maps with different marker types and coverage, QTL positions cannot be compared reliably, and it is hard to ascertain whether they detected one or two QTL on chromosome arm 5BL.

The lower number of putative QTL for GPD than for GPC can be partly attributed to the negative correlation observed between GPC and GYS in each of the seven environments and across environments (Fig. 1 and Table S3). More putative QTL for primary traits than for mathematically derived traits were previously found by Wang et al. (2012) in maize and by Rapp et al. (2018) in wheat. Wang et al. (2012) specifically studied the impact of deriving traits mathematically from primary traits and concluded that derived traits had lower detection power and a higher false discovery rate than their primary traits.
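GPD is such a derived trait. A minimal sketch, assuming GPD is computed as the residual of an ordinary least-squares regression of GPC on GYS (a common definition; whether the study fitted it per environment or on means is not stated in this section). It also illustrates why a derived trait carries less variance: the residual variance equals Var(GPC)(1 − r²), where r is the GPC–GYS correlation, so any variation shared with yield is removed. The data and function name are hypothetical.

```python
import numpy as np


def grain_protein_deviation(gpc, gys):
    """Residuals of an OLS regression of GPC on GYS (assumed GPD definition).

    Positive values mean more protein than expected for that yield level.
    """
    gpc = np.asarray(gpc, dtype=float)
    gys = np.asarray(gys, dtype=float)
    slope, intercept = np.polyfit(gys, gpc, deg=1)  # coefficients, highest degree first
    return gpc - (intercept + slope * gys)


# Hypothetical data: GPC (%) decreasing as grain yield per spike (g) increases.
rng = np.random.default_rng(42)
gys = rng.normal(2.0, 0.4, size=150)
gpc = 16.0 - 1.5 * gys + rng.normal(0.0, 0.8, size=150)
gpd = grain_protein_deviation(gpc, gys)

r = np.corrcoef(gpc, gys)[0, 1]
print(f"r(GPC, GYS) = {r:.2f}")
print(f"Var(GPD)/Var(GPC) = {gpd.var() / gpc.var():.3f}, 1 - r^2 = {1 - r ** 2:.3f}")
```

By construction the residuals are uncorrelated with GYS, which is why a QTL for GPD can be read as affecting protein content independently of yield per spike.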

Co-location of candidate genes with QTL

Among the detected GPC QTL, some were of particular interest because they co-localized with N metabolism-related genes. QGpc.mgb-2B.2 was localized at 70.1 cM on chromosome arm 2BS of the durum consensus map (Maccaferri et al. 2014), at the same position as Fd-GOGAT-2B (Table 4). Although the regression analysis detected no significant effect of this gene on GPC or GPD (it was significant only for GPC in one environment), a SNP marker co-migrating with the gene was significantly associated with GPC by GWAS at −log10(P) = 4.7, with an additive effect of 1.77, explaining 8.7% of the phenotypic variance, the highest value detected by GWAS in this study. In a recent study, Zeng et al. (2017) showed that an amino acid substitution in rice Fd-GOGAT caused by a single-nucleotide polymorphism led to increased GPC, confirming its important role as a potential candidate for NUE improvement.

Another SNP marker, IWB9264, at 44.9 cM on chromosome arm 5AS, was associated with a GPC QTL, QGpc.mgb-5A. This SNP is located 1.9 cM downstream of SNP IWB51581, which lies in the coding region of an asparagine synthetase gene isoform. Asparagine is one of the main amino acids involved in N recycling and remobilization. Asparagine synthetase (ASN) has been reported to act together with GS in the transport of nitrogen-containing molecules during the N remobilization phase. Its involvement in nitrogen remobilization in flag leaves during grain filling has also been reported in wheat (Curci et al. 2018), suggesting a role for ASN genes in the response to crop N status.

Interestingly, two of the QTL detected by GWAS confirmed the involvement of two candidate genes in GPC control, in agreement with the regression analysis. Specifically, peak marker IWB14495, identifying QGpc.mgb-3A.1, is localized on chromosome arm 3AL at 72.2 cM of the consensus map, at the same position as SNP marker IWB71028, which lies in the coding region of the NADH-GOGAT gene (Table 4). NADH-GOGAT has been reported to co-localize with an important QTL for NUE in a meta-QTL analysis (Quraishi et al. 2011). In addition, Salse et al. (2013) proposed that this gene could be used to improve grain filling, either through genetic manipulation or by selecting the best alleles, in wheat and in other cereals. In previous studies on durum wheat, we also found a strong correlation between NADH-GOGAT gene expression and grain protein content (Nigro et al. 2013, 2017a). However, the most interesting result was obtained for SNP marker IWB7702, located 1.6 cM from SNP IWB14244 on chromosome arm 6BL, which lies in the coding region of the NIR-6B gene. This locus was significantly associated with GPC by both regression analysis and GWAS. In the regression analysis between candidate genes and GPD, NIR-6B was highly significantly associated with the trait (−log10(P) = 13.1, with an effect of 1.18, explaining 23.8% of the phenotypic variance; Table 5). The nitrite reductase gene also had a positive and significant additive effect on both GPC and GPD, but none on GYS, making it a suitable candidate gene for improving GPD in wheat.

As noted above, the GWAS on GPD identified four QTL, three of which coincided with GPC QTL. Two of these co-located with N-related CGs. Peak marker IWB74057, identifying QGpd.mgb-4A (coincident with QGpc.mgb-4A), is localized on chromosome arm 4AS at 39.0 cM of the consensus map, in the coding region of the GSr1-4A gene. The homoeologous gene GSr1-4B co-localized with a GPC QTL, QGpc.mgb-4B, but not with a GPD QTL. Peak marker IWB6889 on 5BL identified QGpd.mgb-5B.2, 1.9 cM upstream of the GDH-5B gene. Another GPC QTL, QGpc.mgb-5B.1, was identified 4.9 cM upstream and co-located with the same CG.
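The co-location checks in this section amount to comparing QTL peak positions with candidate-gene SNP positions on the same chromosome of the consensus map. A minimal sketch: the positions below are taken from the text, but the 2 cM window, the function name, and the extra gene "HYPOTHETICAL-4A" (included only to show a non-match) are assumptions; the study does not state a fixed window.

```python
def colocated(qtl_peaks, gene_snps, window_cm=2.0):
    """Return (qtl, gene, distance_cM) for pairs on the same chromosome
    whose positions differ by at most window_cm (an assumed window)."""
    hits = []
    for qtl, (chrom, pos) in qtl_peaks.items():
        for gene, (g_chrom, g_pos) in gene_snps.items():
            dist = abs(pos - g_pos)
            if chrom == g_chrom and dist <= window_cm:
                hits.append((qtl, gene, round(dist, 1)))
    return hits


# (chromosome, cM on the durum consensus map)
qtl_peaks = {
    "QGpc.mgb-2B.2": ("2B", 70.1),
    "QGpc.mgb-3A.1": ("3A", 72.2),
    "QGpd.mgb-4A": ("4A", 39.0),
}
gene_snps = {
    "Fd-GOGAT-2B": ("2B", 70.1),
    "NADH-GOGAT-3A": ("3A", 72.2),
    "GSr1-4A": ("4A", 39.0),
    "HYPOTHETICAL-4A": ("4A", 55.0),
}
for hit in colocated(qtl_peaks, gene_snps):
    print(hit)
```

Co-location on a genetic map is only a first filter; distances in cM translate into physical intervals that vary along the chromosome.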

Comparison between candidate genes and GWAS analyses for QTL detection

The CG and GWAS approaches have been widely used for association mapping in a number of higher plants, including major cereal crops such as rice, wheat, and maize. The CG strategy is often used to identify structural genes/QTL or regulatory genes involved in a specific metabolic pathway, whereas GWAS has generally been used to identify the genomic regions controlling a quantitative trait. In the current study, we used both approaches to identify genes/QTL involved in the control of grain protein content in wheat. The SNPs located in the sequences of the candidate genes NADH-GOGAT, ASN1, NIR, and GSr1 were significantly associated with GPC by both GWAS and the CG approach. However, the SNP markers of the genes AlaAT, NR, NRT2, GS1, and GS2 were significantly associated with GPC QTL only by GLM, whereas SNP markers of the Fd-GOGAT and GDH genes were significant only in the MLM analysis. PPDK was excluded from the regression analysis because its minor allele frequency was lower than 0.05. Similar results were obtained in some recent investigations on different crop species (see review by Gupta et al. 2014). In rice, Zhao et al. (2013) consistently detected the semi-dwarf gene SD1 with both the GLM and MLM models, whereas several SNPs near other height-controlling genes, such as OsBAK1 and DGL1, were detected only by the regression analysis; they suggested that the MLM model might overcompensate for population structure and relatedness, leading to false negatives. Regression analysis generally has high false-positive rates because of the confounding effect of population structure and relatedness among individuals (Breseghello and Sorrells 2006). Conversely, GWAS could effectively eliminate the excess of false positives for most traits, but it likely also eliminated true positives (Gupta et al. 2014). In some situations, GWAS seems to ignore most of the important findings on the genetics and physiology of the traits of interest (Donnelly 2008). To overcome the limitations of both association mapping approaches and to increase the power and precision of QTL detection, the CG approach could be applied after GWAS, selecting the CGs corresponding to detected QTL for further study (Weng et al. 2011). Alternatively, the CG approach could be used simultaneously with GWAS, as in a recent investigation in maize (Lipka et al. 2013) and in the current study. Combining molecular biological knowledge with population genetic models has recently been proposed to increase the statistical power of association mapping and to place GWAS results in a biological context (Marjoram et al. 2014).
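The GLM/MLM trade-off described above can be made concrete with a small MLM + K scan. This is a minimal sketch, not the software or exact algorithm used in this study: the eigen-rotation of the kinship matrix K follows the EMMA idea, the variance ratio δ = σe²/σg² is estimated once under the null model and then held fixed for every marker (in the spirit of the P3D approximation), the Q covariates of MLM + Q + K are omitted, and the function names and example data are hypothetical. K is assumed to be precomputed.

```python
import numpy as np
from scipy import stats


def _wls(Xr, yr, w):
    """Weighted least squares with diagonal weights w on rotated data."""
    XtW = Xr.T * w                      # same as Xr.T @ diag(w)
    XtWX = XtW @ Xr
    beta = np.linalg.solve(XtWX, XtW @ yr)
    return beta, yr - Xr @ beta, XtWX


def mlm_k_scan(y, markers, K, log_deltas=None):
    """MLM + K sketch: y = mu + marker*b + g + e, g ~ N(0, s2g*K), e ~ N(0, s2e*I).

    Returns (-log10 P for each marker column, fitted delta).
    Assumes every marker column is polymorphic.
    """
    if log_deltas is None:
        log_deltas = np.linspace(-5.0, 5.0, 101)
    y = np.asarray(y, dtype=float)
    markers = np.asarray(markers, dtype=float)
    n = y.size
    S, U = np.linalg.eigh(K)            # K = U diag(S) U^T
    S = np.clip(S, 0.0, None)           # remove tiny negative eigenvalues
    yr, onesr, Mr = U.T @ y, U.T @ np.ones(n), U.T @ markers

    # Null model (intercept only): rotated data have covariance s2g*diag(S + delta).
    # Grid search on the profile maximum-likelihood.
    best_ll, best_delta = -np.inf, None
    for delta in np.exp(log_deltas):
        d = S + delta
        _, resid, _ = _wls(onesr[:, None], yr, 1.0 / d)
        s2g = np.sum(resid ** 2 / d) / n
        ll = -0.5 * (n * np.log(2.0 * np.pi * s2g) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_ll, best_delta = ll, delta

    # Marker tests with delta fixed: t test on the marker coefficient.
    w = 1.0 / (S + best_delta)
    logp = np.empty(markers.shape[1])
    for j in range(markers.shape[1]):
        Xr = np.column_stack([onesr, Mr[:, j]])
        beta, resid, XtWX = _wls(Xr, yr, w)
        s2 = np.sum(w * resid ** 2) / (n - 2)
        se = np.sqrt(s2 * np.linalg.inv(XtWX)[1, 1])
        logp[j] = -np.log10(2.0 * stats.t.sf(abs(beta[1] / se), df=n - 2))
    return logp, best_delta


# Hypothetical example: 100 lines, 500 background SNPs for K, 3 test SNPs.
rng = np.random.default_rng(1)
Z = rng.choice([-1.0, 1.0], size=(100, 500))
Zc = Z - Z.mean(axis=0)
K = Zc @ Zc.T / Z.shape[1]              # simple kinship estimate
snps = rng.choice([-1.0, 1.0], size=(100, 3))
y = 12.0 + 0.7 * snps[:, 0] + rng.normal(size=100)
logp, delta = mlm_k_scan(y, snps, K)
print(np.round(logp, 1), round(float(delta), 3))
```

With K equal to the identity matrix the weights are constant and the scan reduces to ordinary single-marker regression; the kinship term is what down-weights associations explained by relatedness, which removes false positives but can also absorb true signals, as discussed above.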

Conclusions

The identification of SNP markers within the coding sequences of QTL/genes associated with GPC can help elucidate the mechanisms of synthesis and accumulation of storage proteins in wheat kernels, clarify the relationship between GPC and GYS, and exploit the genetic variability of wheat collections for the technological and nutritional improvement of end products. The CG approach carried out in parallel with GWAS can overcome some limitations of both association mapping methods, reducing type I (false-positive) and type II (false-negative) error rates and thereby increasing the power and precision of QTL detection. The selection of useful QTL/genes to be transferred into commercial cultivars by marker-assisted and/or genomic selection should take into account the general negative relationship between GPC and important yield-related traits, in order to identify favorable alleles that improve GPC without compromising grain yield. Moreover, marker–trait association analysis should be carried out on phenotypic data from multiple environments to detect stable QTL that can play a relevant role in breeding programs. In the current study, we detected 11 stable GPC QTL that were significant in at least three environments and in the mean across environments; these QTL always had additive effects in the same direction and reached the suggestive sub-threshold 2.5 < −log10(P) < 3.0 in the other environments. Interestingly, seven QTL co-migrated with N-related candidate genes. Four QTL were significantly associated with increases in grain protein deviation, indicating that selecting for GPC at these loci should not reduce final grain yield per spike. These results suggest that the identified SNP markers could be used efficiently in wheat breeding programs to develop new genotypes with better nutritional quality and improved technological value of the grain.

Author contribution statement

DN, AG, and AB conceived and designed the research, analyzed the data, and wrote, reviewed, and edited the manuscript. RS reviewed and edited the manuscript. GM worked on investigation, data curation, and visualization. PC, IM, AG, and SLG contributed resources and data. All authors approved the manuscript.