Introduction

Cereals are the most widely grown and consumed staple foods. Their use in most food products is mainly due to the properties conferred by gluten, which accounts for 80% of total grain protein (Rosell et al. 2014). Gluten is a complex mixture of prolamin proteins, the gliadins and glutenins in wheat, with equivalents in barley and rye (Shewry et al. 1999). Based on their mobility in polyacrylamide gel electrophoresis at low pH, gliadins are traditionally divided into three groups: S-rich α-type gliadins (with 3 intra-chain disulphide bonds (ICDB)), γ-gliadins (with 4 ICDB) and S-poor ω-gliadins with no ICDB. Similarly, glutenin subunits are separated into low and high molecular weight (L/HMW) subunits based on SDS-PAGE analysis (Payne et al. 1980). Wheat gliadins are encoded by medium to large multigene families (Payne et al. 1984a): α-gliadins are encoded by the Gli-2 loci on the short arms of group 6 chromosomes, whereas γ-gliadins and ω-gliadins are encoded by the Gli-1 loci (Gli-A1, Gli-B1 and Gli-D1) on the short arms of homoeologous group 1 chromosomes, which are tightly linked to the Glu-3 loci coding for LMW-glutenins (Anderson et al. 2012; Payne et al. 1984a; Tatham and Shewry 1995). HMW-glutenin genes are located on the long arms of group 1 chromosomes (Glu-1 loci) (Payne et al. 1984b).

γ-Type prolamins are considered the most ancient Triticeae gluten prolamins and have been studied extensively in wheat (γ-gliadins), rye (γ-secalins) and barley (γ-hordeins) (Qi et al. 2009; Stenman 2011). Amino acid composition and molecular mass analyses indicated that rye γ-secalins and barley γ-hordeins are closely related to wheat γ-gliadins. The γ-gliadins, which contribute to the visco-elastic properties of dough, are a heterogeneous collection of 30–78 kDa monomeric proteins with poor solubility in dilute salt solutions and good solubility in 70% ethanol (Bietz and Wall 1972; Gellrich et al. 2003, 2005; Guo et al. 2012; Qi et al. 2009, 2013; Singh et al. 1990; Shan et al. 2002; Shewry and Tatham 1990; Wieser 2007). The number of γ-gliadin genes was initially estimated at 15–40 (Anderson et al. 2001; Shewry et al. 2003), and, in contrast to the α-gliadins, only ~14% of them are pseudogenes in hexaploid bread wheat (Ohno 2013). From a clinical point of view, γ-gliadins show a strong association with celiac disease (CD), a chronic inflammatory condition of the small intestine triggered by the ingestion of gluten from wheat, barley and rye in up to 10% of the population (Colomba and Gregorini 2012; Ferretti et al. 2012; Van den Broeck et al. 2009). The disease is highly heritable, and the human leukocyte antigen (HLA) class II DQ2/DQ8 molecules are its major genetic risk factor. Lesion formation in the small intestine involves the activation of gluten-reactive CD4 (cluster of differentiation 4) T-cells. These T-cells recognize particular proline- and glutamine-rich gluten peptides (CD-epitope cores) presented by the predisposing HLA-DQ2/8 molecules of antigen-presenting cells (APCs); the enzyme tissue transglutaminase 2 (tTG2) converts certain glutamine (Q) residues in these peptides to negatively charged glutamate (E) residues (Van den Broeck et al. 2009; Vaccino et al. 2009; Hausch et al. 2002; Sollid 2002).
The resulting intestinal inflammation often causes symptoms related to malabsorption, but in many patients extra-intestinal symptoms dominate, and in others the disease is clinically silent (Sollid and Khosla 2005; Bethune and Khosla 2012). Currently, the only available effective treatment for CD is a lifelong gluten-free diet. With regard to immunogenicity, several sets of CD-epitope cores (9-mer peptides) have been identified in the first variable domain R1 (domain II) of γ-prolamins (Aggarwal et al. 2012; Altenbach et al. 2010; Goryunova et al. 2012; Gu et al. 2004; Kim et al. 2004; Lionetti and Catassi 2011; Meresse et al. 2012; Qi et al. 2009; Qiao et al. 2005; Salentijn et al. 2012; Shewry et al. 1992; Shewry and Tatham 2016; Sjöström et al. 1998; Stenman et al. 2010; Stepniak et al. 2005).

Searching for natural cereal variants with different epitope core lengths or structures, and/or breeding cultivars through conventional crossing with monosomic/nullisomic lines followed by selection, may enable the development of wheat varieties that are non-immunogenic or less immunogenic (Anderson et al. 2001). Furthermore, evolutionary analyses of gluten seed proteins suggest that L/HMW glutenin subunits (L/HMW-GS) with favorable protein structure and low evolutionary variation may have greater potential for improving wheat quality (Li et al. 2007; Wang et al. 2011). In addition, phylogenetic and heterogeneity analyses of gliadins have shown that the γ-prolamin multigene family is highly diverse and carries multiple sets of CD-epitope cores that are more variable than the conserved dodecamer repeat common to all α/β- and ω-gliadins in wheat and its close relatives (Goryunova et al. 2012; Noma et al. 2016; Qi et al. 2009).

Here, we established the molecular population genetics and evolutionary pattern of the γ-prolamin multigene family in wheat and its close relatives (γ-gliadins), rye (γ-secalins) and barley (γ-hordeins), which have not been reported so far. We assessed the role of the selective forces that have driven polymorphism in these duplicated genes at the nucleotide and amino acid levels. Furthermore, we propose Triticeae genomes with low CD-epitope content for developing cultivars deficient in immunogenic γ-prolamins through conventional breeding and genetic engineering approaches.

Materials and methods

Sequence retrieval and alignment analysis of γ-prolamin multigenes

Orthologs of Triticeae γ-prolamin genes (wheat γ-gliadins, barley γ-hordeins and rye γ-secalins), comprising 557 DNA sequences from 16 population sets, were identified in GenBank at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) (Table 1). The CLUSTALW program (accurate mode) was used to carry out multiple-sequence alignments at the nucleotide level, separately for the complete genes and pseudogenes of wheat γ-gliadins and their close relatives, barley γ-hordeins and rye γ-secalins (McWilliam et al. 2013). Certain insertions were manually removed after alignment, prior to calculating the polymorphism parameters.

Table 1 The nucleotide diversity of 16 different species of Triticeae

In-silico identification of γ-prolamin celiac-immunogenic peptides

A total of 16 γ-prolamin DNA sequences (about 99% identity), one from each of 16 Triticeae species, were selected as query sequences for further bioinformatic analysis; pseudogene sequences were excluded from the in silico CD-epitope analysis. The DNA/protein accession numbers of the query sequences were JN849093/AFC98439 (Triticum aestivum L.), JQ269810/AFQ20244 (Triticum monococcum L.), FJ006563/ACJ03424 (Triticum turgidum Desf.), FJ006634/ACJ03495 (Triticum urartu Thumanjan ex Gandilyan), FJ0065715/ACJ03544 (Aegilops sharonensis), JQ269769/AFQ20207 (Aegilops speltoides (Tausch) Á. Löve), FJ006712/ACJ03541 (Aegilops bicornis Forsk.), FJ006686/ACJ03515 (Aegilops searsii), FJ006648/ACJ03509 (Aegilops longissima Schweinf. et Muschl.), KF880536/AHJ60680 (Aegilops tauschii Coss.), JQ269731/AFQ20178 (Aegilops umbellulata Zhuk.), HQ875873/AEW46778 (Aegilops comosa), JQ269744/AFQ20184 (Aegilops uniaristata), JQ269704/AFQ20157 (Aegilops markgrafii (Greuter) K.Hammer), HQ266703/ADP95511 (Secale cereale L.) and JQ867079/AFM77738 (Hordeum vulgare L.). The CLUSTAL Omega program was used to align the 16 γ-prolamin query amino acid sequences (McWilliam et al. 2013). Based on a previous report of the amino acid sequences of the common γ-type CD-epitopes, the corresponding epitope positions were located in the amino acid sequences using MEGA 6.0 (Tamura et al. 2013); only perfect matches were considered.
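Because only perfect matches were counted, locating an epitope reduces to an exact substring search of each 9-mer core against each deduced protein sequence. The Python sketch below illustrates that step under stated assumptions; it is not the MEGA 6.0 procedure itself. The helper `find_epitopes`, the `EPITOPE_CORES` table (a subset of the cores named in the Results) and the toy fragment are hypothetical, and reported positions refer to the ungapped protein.

```python
from typing import Dict, List

# Hypothetical lookup table: a subset of the 9-mer CD-epitope cores named in the Results.
EPITOPE_CORES: Dict[str, str] = {
    "DQ2.5-glia-γ1": "PQQSFPQQQ",
    "DQ2.5-glia-γ2a": "FPQQPQQPF",
    "DQ2.5-glia-γ3": "QQPQQPYPQ",
    "DQ2.5-glia-γ4a": "SQPQQQFPQ",
    "DQ2.5-glia-γ4b": "PQPQQQFPQ",
    "DQ2.5-glia-γ4c": "QQPQQPFPQ",
}


def find_epitopes(protein: str, cores: Dict[str, str]) -> Dict[str, List[int]]:
    """Return 1-based start positions of every perfect (exact) match of each core.

    Overlapping occurrences are reported. Alignment gaps are removed first,
    so positions refer to the ungapped protein sequence.
    """
    protein = protein.upper().replace("-", "")
    hits: Dict[str, List[int]] = {}
    for name, core in cores.items():
        positions = []
        start = protein.find(core)
        while start != -1:
            positions.append(start + 1)            # convert to 1-based numbering
            start = protein.find(core, start + 1)  # restart one residue later to allow overlaps
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    toy = "AAQQPQQPFPQQPQQPFPQ"  # hypothetical fragment, not a real accession
    print(find_epitopes(toy, EPITOPE_CORES))
```

Overlapping hits are kept deliberately: tandem repeats in the R1 domain can place the same core at closely spaced positions, and counting only non-overlapping matches would undercount epitopes in such repeats.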

Phylogenetic analysis

For the phylogenetic analysis, sequence data of diploid, tetraploid and hexaploid Triticum spp., Aegilops spp., S. cereale and H. vulgare, including their pseudogenes, were retrieved from EMBL/GenBank (August 2011). A total of 461 γ-gliadin, 7 γ-hordein and 89 γ-secalin sequences were included in the phylogenetic analysis (Table 1). The Oryza sativa prolamin sequence (accession number X60979) was used as the outgroup.

Both the nucleotide and the deduced amino acid sequences of the γ-gliadin data set were aligned using MEGA 6.0 (Tamura et al. 2013). The γ-prolamin phylogenetic tree of the coding DNA sequences (CDS) was constructed with the maximum likelihood (ML) method under the Tamura and Nei 1993 (TN93) model (Saitou and Nei 1987; Tamura and Nei 1993), using a discrete Gamma distribution (+G = 3.08), in MEGA 6.0 (Tamura et al. 2013). Bootstrap support was estimated from 1000 replicates, and values above 50% are shown. Positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option).
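The bootstrap values on the tree come from pseudo-replicate alignments in which sites (columns) are resampled with replacement; a tree is rebuilt from each replicate, and the support for a node is the percentage of replicate trees that contain that clade. The minimal Python sketch below shows only the resampling step, assuming a list of equal-length aligned strings; tree reconstruction under TN93+G is left to MEGA. The function name `bootstrap_replicate` and the toy alignment are hypothetical.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Build one nonparametric bootstrap pseudo-alignment.

    Alignment columns are drawn with replacement until the replicate has the
    same number of sites as the original; every sequence uses the same columns.
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


if __name__ == "__main__":
    aln = ["ACGT-A", "ACGTTA", "TCGATA"]  # toy alignment, not real data
    rng = random.Random(2013)             # fixed seed for reproducibility
    replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
    print(replicates[0])
```

Resampling whole columns keeps the nucleotides of all sequences at a site together, which preserves the phylogenetic signal carried by each site within a replicate.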

Evolution and selection pressure analysis

Nucleotide diversity, Tajima’s D test (Tajima 1989), Fay and Wu’s H test (Fay and Wu 2000) and a sliding window analysis were carried out using DnaSP 4.10 (Rozas et al. 2003). The window size for the promoter and transcriptional unit (TU) analyses was 50 bp, with a step size of 10 bp. The number of base differences per site, the number of synonymous differences per synonymous site and the number of non-synonymous differences per non-synonymous site were averaged over all sequence pairs within each group and over all sequences, using the Nei-Gojobori method (Nei 1987) in MEGA 6.0 (Tamura et al. 2013). Support for individual nodes was assessed by bootstrap resampling (1000 replicates). Positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option). The ratio of non-synonymous substitutions per non-synonymous site (dN) to synonymous substitutions per synonymous site (dS), i.e. the dN/dS ratio, was calculated. Sequence divergence between orthologs and paralogs was estimated using MEGA 6.0 (Tamura et al. 2013). The evolutionary distance, k, among γ-prolamin endosperm-specific promoter sequences (500 bp upstream of the coding region) was computed using the Kimura 2-parameter substitution model, with gaps treated as missing data. Standard errors of divergence distances were determined from 500 bootstrap replicates. Heterogeneity in the ratio of polymorphisms to fixed differences was assessed with the Gmean and DKS statistics, computed in DNA Slider 1.13 (McDonald 1996, 1998). For both statistics, recombination parameter (R) values of 2, 4, 8, 16 and 32 were simulated with 1000 replicates each, and the highest p value of each statistic was reported.
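For the promoter distances, the Kimura 2-parameter model separates transitions (A↔G, C↔T) from transversions. Writing P and Q for the proportions of transitional and transversional differences among the compared sites, the distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The Python sketch below computes this for one pair of aligned sequences and skips gapped or ambiguous sites for that pair, in line with treating gaps as missing data. It is an illustration, not MEGA’s implementation; the function name is hypothetical, and bootstrap standard errors are not included.

```python
import math


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (treated as missing data for this pair only).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

With no transversions the second term vanishes, so a single transition in ten sites gives d = −½ ln(0.8) ≈ 0.112.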

Results

Sequence polymorphisms of γ-prolamin genes in Triticeae

Sequence polymorphism analysis revealed that ORF lengths were most variable in Ae. sharonensis (678–1020 bp) and most conserved in T. monococcum, as previously reported by Qi et al. (2009). In total, 126 sequences from Triticeae species other than H. vulgare were pseudogenes, all of which contained one or more internal stop codons or frameshift mutations caused by single-nucleotide indels (insertions/deletions). The remaining 431 sequences were putatively functional, with no internal stop codons.
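The two pseudogene criteria used above can be screened mechanically. The Python sketch below is a simplified, hypothetical helper (`classify_orf`) that assumes each CDS starts at its start codon and ends at its annotated stop codon; it flags a length that is not a multiple of three as a frameshift, and any in-frame stop codon before the final codon as an internal stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(cds: str) -> str:
    """Crude screen for pseudogene signatures in a coding sequence.

    Assumes `cds` runs from the start codon to the annotated stop codon.
    """
    seq = cds.upper().replace("-", "")  # remove alignment gaps
    if len(seq) % 3 != 0:
        # e.g. a single-nucleotide indel has shifted the reading frame
        return "pseudogene: frameshift"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # the final codon is expected to be a stop, so only earlier codons are checked
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "pseudogene: internal stop codon"
    return "putatively functional"
```

A length check cannot detect two compensating indels that restore the reading frame, so this screen is a first pass rather than a substitute for inspecting the alignment.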

Comparative polymorphism analyses of the coding DNA sequences (CDS) and promoter sequences of γ-prolamins revealed substantial differences. The 20 γ-prolamin endosperm-specific promoters analysed shared 99% nucleotide sequence identity. The nucleotide diversity (π) (Nei 1987) of the γ-prolamin promoter regions was 1.25-fold and 1.36-fold greater than that of the γ-secalin and γ-gliadin coding regions, respectively. In contrast, the γ-hordein coding region showed a nucleotide diversity of 0.17673, 1.56-fold higher than that of the γ-prolamin endosperm-specific promoter region. A similar trend was observed for another nucleotide diversity estimate, θw (Watterson 1975). The nucleotide diversity (π) of the γ-gliadin coding region was 0.08263; T. turgidum and T. urartu showed the highest (0.14005) and lowest (0.01966) values, respectively (Table 1).
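Both diversity estimates can be computed directly from an alignment. π is the mean number of pairwise differences per site, and θw = S / (a₁L), where S is the number of segregating sites, L the number of sites and a₁ = Σ 1/i for i = 1 to n − 1, with n sequences. The Python sketch below assumes that columns containing gaps or ambiguous bases are excluded from all sequences (complete deletion); DnaSP’s handling of such sites may differ, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def usable_columns(alignment: Sequence[str]) -> List[Tuple[str, ...]]:
    """Alignment columns free of gaps and ambiguous bases (complete deletion)."""
    return [col for col in zip(*(seq.upper() for seq in alignment))
            if all(base in "ACGT" for base in col)]


def nucleotide_diversity(alignment: Sequence[str]) -> float:
    """pi: mean number of pairwise nucleotide differences per site."""
    cols = usable_columns(alignment)
    n = len(alignment)
    if n < 2 or not cols:
        raise ValueError("need at least 2 sequences and 1 usable site")
    n_pairs = n * (n - 1) / 2
    diffs = sum(a != b for col in cols for a, b in combinations(col, 2))
    return diffs / n_pairs / len(cols)


def watterson_theta(alignment: Sequence[str]) -> float:
    """theta_W per site: S / (a1 * L), with a1 = sum of 1/i for i = 1..n-1."""
    cols = usable_columns(alignment)
    n = len(alignment)
    if n < 2 or not cols:
        raise ValueError("need at least 2 sequences and 1 usable site")
    segregating = sum(len(set(col)) > 1 for col in cols)
    a1 = sum(1 / i for i in range(1, n))
    return segregating / (a1 * len(cols))
```

Fold differences such as those reported above are ratios of these estimates, for example π(promoter) / π(coding region).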

Tajima’s D in the endosperm-specific promoter region of γ-prolamins was negative but not significant (D = −0.584, p > 0.1, Table 1). In the coding region of γ-gliadins, D was not significant in any species except T. urartu (D = −1.9353, p < 0.05, Table 1), a marginally significant negative value among γ-gliadin genes. This suggests that positive selection has acted on the entire coding region of T. urartu γ-gliadins and that beneficial alleles have been fixed. However, a significantly negative D value can also result from an excess of low-frequency deleterious alleles or from a bottleneck effect, as described by Huang et al. (2002). As with γ-gliadins, neither the promoter nor the coding region had significant D values for γ-secalin and γ-hordein genes.
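Tajima’s D compares two estimators of θ: the mean number of pairwise differences k, and S/a₁, which weights all segregating sites equally. An excess of rare variants lowers k relative to S/a₁ and makes D negative, which is why selective sweeps and some demographic histories can both produce negative values. The Python sketch below implements the standard formula (Tajima 1989) from k, S and n; it returns the statistic only, and the significance testing reported above (done in DnaSP) is not reproduced. The function name is hypothetical.

```python
import math


def tajimas_d(k: float, s: int, n: int) -> float:
    """Tajima's D.

    k: average number of pairwise nucleotide differences (not per site)
    s: number of segregating sites
    n: number of sequences (n >= 4; for n = 3 the variance term is zero)
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # numerator: difference between the two theta estimators
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

If only the per-site π is available, k can be recovered as π multiplied by the number of analysed sites.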

Different parts of the γ-prolamin promoter and coding regions were scanned with sliding-window Tajima’s D tests to identify regions that deviate from neutral expectations. In the promoter region (500 bp upstream of the γ-prolamin coding region), D values beyond approximately position 220 were positive but not significant. Across the entire promoter, positive and negative regions occurred in comparable proportions (Fig. 1A). In the CDS, the sliding window analysis showed that singleton polymorphic sites were most frequent between positions 1095 and 1490, within the first variable R1 domain (domain II) (Fig. 1B). Therefore, neither the promoter nor the coding region of γ-prolamin multigenes deviated from neutrality at the nucleotide level.
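The sliding-window scan in Fig. 1 applies the same statistic to successive 50 bp windows moved in 10 bp steps. The self-contained Python sketch below recomputes Tajima’s D per window from an alignment, assuming that gap-containing columns are dropped within each window and that windows are placed on alignment coordinates; DnaSP’s own window placement may differ. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def _tajimas_d(k: float, s: int, n: int) -> Optional[float]:
    """Tajima's D from k (mean pairwise differences), s (segregating sites)
    and n (sequences); None when D is undefined."""
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3 * (n - 1)) - 1 / a1
    c2 = (2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def sliding_tajimas_d(alignment: Sequence[str], window: int = 50,
                      step: int = 10) -> List[Tuple[int, int, Optional[float]]]:
    """Return (first_site, last_site, D) per window in 1-based alignment coordinates."""
    n = len(alignment)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(alignment[0])
    n_pairs = n * (n - 1) / 2
    results = []
    for start in range(0, length - window + 1, step):
        block = [seq[start:start + window].upper() for seq in alignment]
        # keep only columns without gaps or ambiguous bases in this window
        cols = [col for col in zip(*block) if all(b in "ACGT" for b in col)]
        seg = sum(len(set(col)) > 1 for col in cols)
        k = sum(a != b for col in cols for a, b in combinations(col, 2)) / n_pairs
        results.append((start + 1, start + window, _tajimas_d(k, seg, n)))
    return results
```

Windows without segregating sites return None instead of a number, because D is undefined there.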

Fig. 1

Sliding-window Tajima’s D tests along the endosperm-specific promoter (A) and coding region (B) of γ-prolamins. The scale of the sliding window plot was adjusted for the promoter regions, and the last (3′-end) sites of the γ-prolamin promoter sequences were placed at the same position. The window size is 50 bp and the step size is 10 bp.

In-silico identification of the conserved CD-epitope sites

The deduced amino acid sequences of the sixteen templates were aligned using the CLUSTAL Omega program at EMBL (Fig. 2). Seven common CD-epitopes, DQ2.5-glia-γ1 (PQQSFPQQQ), DQ2.5-glia-γ2a (FPQQPQQPF), DQ2.5-glia-γ3 (QQPQQPYPQ), DQ2.5-glia-γ4a (SQPQQQFPQ), DQ2.5-glia-γ4b (PQPQQQFPQ), DQ2.5-glia-γ4c (QQPQQPFPQ) and DQ8-glia-γ1a (QQPQQPFPQ), were identified as the most conserved motifs in the domain R1 region of γ-prolamins (the targets for tTG deamidation are bold and underlined) (Arentz-Hansen et al. 2002; Sjöström et al. 1998; Sollid et al. 2012; Stepniak et al. 2005). The DQ2.5-glia-γ4b/c and DQ2.5-glia-γ1 CD-epitopes showed the highest conservation (identity = 98.7%) across Triticeae γ-prolamins.
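Percent identity between aligned sequences can be defined in several ways, differing mainly in the denominator. The Python sketch below assumes the common convention of counting identical residues over columns where neither sequence has a gap; it may therefore not reproduce the 98.7% value exactly, and the function name is hypothetical.

```python
def percent_identity(seq1: str, seq2: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
                if a != gap and b != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in compared)
    return 100.0 * identical / len(compared)
```

Counting gap columns in the denominator instead would lower the value for sequences that differ by indels, which matters in the length-variable R1 domain.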

Fig. 2

Schematic diagram of the alignment among the 16 deduced amino acid sequences of the γ-prolamins templates from Triticeae. The amino acid sequences of the CD-epitope cores (9-mer peptides) are shown in red letters. In the epitope names, the short terms denote the type of proteins that the epitope derived from: 'glia-γ' denotes γ-gliadin and 'sec' denotes secalin

Clustering and phylogeny relationship analysis

The maximum likelihood (ML) tree of the CDS produced a separate cluster of γ-secalins and two well-separated large groups of γ-gliadins of unequal size: 145 consensus sequences belonged to the first large group and 275 to the second (Fig. 3). In total, ten significant groups (bootstrap support of 95% or higher) were observed for the CDS region. The S. cereale sequence HQ875932, considered ancestral to γ-gliadins and γ-secalins, was used to root the γ-secalin and γ-gliadin branches. All γ-secalin sequences from T. aestivum (AABBDD) and S. cereale (RR), together with the H. vulgare (VV) sequences, were restricted to group 1 in the first ancestral branch. Sequences from species with an A genome (T. monococcum (Am), T. urartu (Au) and T. aestivum (AABBDD)) and from Ae. uniaristata (NN) were restricted to the third branch. Within this branch, all γ-gliadin sequences from T. monococcum (Am), T. urartu (Au) and T. aestivum clustered in group 10. In contrast, sequences from Triticum and Aegilops species with a B genome (T. aestivum (AABBDD), T. turgidum (AABB) and Ae. speltoides (BB)) or a D genome (T. aestivum (AABBDD) and Ae. tauschii (DD)) clustered into groups 2, 7 and 8. All groups except 1, 7, 8 and 10 contained a mixture of sequences from three to six Aegilops and Triticum species.

Fig. 3

Molecular phylogenetic analysis of γ-prolamins from Triticeae by the Maximum Likelihood method. The tree is drawn to scale, with branch lengths measured in substitutions per site (scale bar, 0.1). The analysis involved 558 nucleotide sequences, including an outgroup sequence from Oryza sativa (accession no. X60979). All positions containing gaps and missing data were eliminated, leaving a total of 122 positions in the final dataset. The evolutionary history was inferred using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei 1993). The tree with the highest log likelihood (−2899.9916) is shown. Initial tree(s) for the heuristic search were obtained by applying the neighbor-joining method to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. A discrete Gamma distribution was used to model evolutionary rate differences among sites (3 categories (+G, parameter = 2.3866)). Evolutionary analyses were conducted in MEGA6 (Tamura et al. 2013).

Genetic variation analysis within and among the groups

In addition to the analysis of genetic variation within Triticeae species, evolutionary patterns within and among the groups were compared using Tajima’s D test to detect deviations from neutrality. Tajima’s D was significantly negative in groups 3 (D = −2.312) and 8 (D = −2.451) (p < 0.01). D values for groups 1 (D = −1.560) and 2 (D = −1.848) were also marginally negative (p < 0.05, Table 1). Because most members of groups 2, 3 and 8 were from Ae. tauschii (DD genome) and T. aestivum (AABBDD), their significantly negative D values might be attributable to the D genome. Thus, the γ-gliadins of the D genome were probably shaped either by positive selection, which fixed beneficial alleles during evolution, or by a bottleneck effect. By the same argument, the significantly negative D value for group 1 suggests that positive selection has probably occurred in γ-secalins from S. cereale (RR genome). The signature of positive selection in these groups was further supported by Fay and Wu’s H test and the Gmean and DKS statistics (Table 1). In contrast, no significant D values were observed for comparisons within groups, between the two large groups or across all groups (Table 2).

Table 2 Estimates of average evolutionary divergence and the nucleotide diversity over sequence pairs within and among groups

Fay and Wu’s H test, which contrasts the numbers of high-frequency and intermediate-frequency derived mutations, was used to detect selective sweeps, that is, reductions in polymorphism caused by the fixation of an advantageous mutation (Lin et al. 2008). A homologous sequence from Oryza sativa was used as the outgroup. The most significant H values were −21.78983 (p < 0.01; group 1, R genome), −26.16909 (p = 0.0012; group 2, D genome), −15.79710 (p = 0.0015; group 3, D genome) and −16.90587 (p < 0.02; group 8, D genome). These significantly negative H values suggest recent selective sweeps in S. cereale, T. aestivum and Ae. tauschii.
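Fay and Wu’s H needs the derived-allele frequency at each segregating site, which is why an outgroup is required: the outgroup base is taken as the ancestral state. With S_i sites at which the derived allele occurs in i of n sequences, θπ = Σ 2S_i i(n − i) / [n(n − 1)], θH = Σ 2S_i i² / [n(n − 1)] and H = θπ − θH. The Python sketch below computes this unnormalized H from an alignment and an outgroup sequence; it assumes biallelic sites and skips sites whose ancestral state cannot be inferred. Function names are hypothetical, and p values, which require coalescent simulations, are not computed.

```python
from typing import List, Sequence


def derived_counts(alignment: Sequence[str], outgroup: str) -> List[int]:
    """Derived-allele counts at biallelic sites, polarized by an outgroup.

    Sites are skipped if they are monomorphic in the sample, have more than
    two alleles, contain gaps or ambiguous bases, or the outgroup base matches
    neither allele (ancestral state unknown).
    """
    if any(len(seq) != len(outgroup) for seq in alignment):
        raise ValueError("sample and outgroup must be aligned to the same length")
    counts = []
    for j, ancestral in enumerate(outgroup.upper()):
        col = [seq[j].upper() for seq in alignment]
        alleles = set(col)
        if len(alleles) != 2 or ancestral not in alleles:
            continue
        if not alleles <= set("ACGT"):
            continue
        counts.append(sum(base != ancestral for base in col))
    return counts


def fay_wu_h(counts: Sequence[int], n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H."""
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in counts) / denom
    theta_h = sum(2 * i * i for i in counts) / denom  # weights high-frequency derived alleles
    return theta_pi - theta_h
```

High-frequency derived alleles inflate θH more than θπ, so an excess of them, as expected after a selective sweep, drives H negative.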

In addition, the Gmean and DKS statistics were applied to the coding DNA sequences of groups 1, 2, 3 and 8 to test for heterogeneity in the ratio of polymorphism to divergence along the sequence. Gmean is most sensitive for detecting one or two peaks, whereas DKS is good at detecting a single change from low to high polymorphism (McDonald 1998). Both tests were significant for all four groups, indicating heterogeneity in polymorphism-to-divergence ratios in their coding regions (Table 3), consistent with localized selective sweeps. Thus, four tests (Tajima’s D, Fay and Wu’s H, Gmean and DKS) supported selective sweep events in the R and D genomes, especially in the γ-prolamin coding region.

Table 3 Neutrality and heterogeneity tests of polymorphisms to divergence using an outgroup sequence

Selection pressure in γ-prolamin evolution

The ratio ω = dN/dS was computed within and among the defined groups (Table 2). This ratio reflects the mode of protein evolution: values greater than, equal to and less than 1 indicate positive, neutral and purifying selection, respectively (Yang 2007). Group 9, with the R genome of S. cereale and the C genome of Ae. markgrafii (Greuter) K. Hammer, showed the highest ratio (ω = 1.5463), whereas group 8, with T. aestivum (ABD genome) and Ae. tauschii (D genome), showed the lowest (ω = 0.7198). This suggests that strong positive and purifying selection act on the C and D genomes, respectively. Furthermore, the high dN/dS ratios of group 1 (ω = 1.3346) and group 7 (ω = 1.2022) indicate positive selection in S. cereale (R genome), Ae. speltoides (B genome), Ae. comosa (M genome) and T. turgidum (AB genome). Marginally purifying selection was observed within groups 2, 3 and 8 (D genome), group 5 (AB genome) and group 10 (A genome), whereas groups 4 and 6, both containing Ae. searsii, Ae. sharonensis, Ae. bicornis and Ae. longissima (S genome) and Ae. umbellulata (U genome), were consistent with the neutral hypothesis.
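For the Nei-Gojobori approach, once the numbers of synonymous and non-synonymous sites and differences have been counted for a sequence pair, dS and dN follow from the proportions pS and pN. The Python sketch below assumes the Jukes-Cantor-corrected variant, d = −¾ ln(1 − 4p/3); with the uncorrected p-distance variant, pN and pS would be used directly. Site and difference counting (including averaging over mutational pathways for codons that differ at more than one position) is assumed to have been done already, and the counts in the check are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites: float, nonsyn_sites: float,
          syn_diffs: float, nonsyn_diffs: float) -> float:
    """omega = dN/dS from Nei-Gojobori-style site and difference counts."""
    d_s = jukes_cantor(syn_diffs / syn_sites)        # synonymous substitutions per synonymous site
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # non-synonymous per non-synonymous site
    if d_s == 0:
        raise ZeroDivisionError("no synonymous divergence: omega undefined")
    return d_n / d_s
```

The function returns a point estimate only; it does not test whether ω differs significantly from 1.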

Because the ten γ-prolamin groups fall into two large branches, we also determined the dN/dS ratio of these two ancestral branches. The results indicated that purifying selection acts on duplication within both large ancestral groups, whereas the dN/dS ratio of 1.0470 across all groups indicated neutral evolution.

Discussion

Gene duplication and subsequent functional divergence are among the factors contributing to the evolution of multigene families such as the prolamins (Anderson et al. 2001; Nei 1969; Nei and Roychoudhury 1973; Ohno 2013; Shewry et al. 2003; Shewry and Tatham 1990; Stephens 1951; Ohno 1967). The γ-gliadin multigene family fits the birth-and-death evolutionary model, with multiple gene duplication and divergence events, as previously reported by Goryunova et al. (2012). Signatures of pseudogene formation and gene loss are apparent in all Aegilops/Triticum genomes. The Aegilops/Triticum group (A, S/B, C, D, M, N, U genomes), S. cereale (R genome) and H. vulgare (V genome) all split within a short evolutionary period (2.5–4.5 MYA) (Van Slageren 1994), during which the multigene family expanded (Goryunova et al. 2012). Here, the genetic diversity of γ-type seed prolamins, for which no comparable evolutionary analysis had previously been reported, was analysed in both their promoters and TU regions.

A comprehensive comparative analysis was carried out on 557 γ-prolamin sequences, including 461 γ-gliadins from Triticum spp. and Aegilops spp. (A, B, D, S, U, M, N and C genomes), 89 γ-secalins from S. cereale (R genome) and 7 γ-hordeins from H. vulgare (V genome). The sequences were retrieved from GenBank and used to examine sequence diversity, gene duplication and the natural selection forces reflected in the evolution of the γ-prolamin multigene family. Furthermore, Triticeae genomes were examined to identify genotypes with low CD-epitope content. These genotypes might be useful for developing future cultivars with reduced immunogenicity/allergenicity.

DNA sequence polymorphism and genetic diversity

The nucleotide diversity (θw and π) of the γ-prolamin endosperm-specific promoter was greater than that of the γ-secalin and γ-gliadin transcriptional units (TUs), but lower than that of the γ-hordein TUs. As in many genes, γ-prolamin TUs are more conserved than their promoter sequences, suggesting that the two sequence units have been subject to distinct selection mechanisms and/or demographic histories. In γ-hordeins, by contrast, duplication events have occurred in "promoter + TU" blocks, resulting in relatively higher sequence conservation within the promoter. Furthermore, haplotype diversity was high for γ-prolamins in general.

Previous reports (Qi et al. 2009) and our alignments indicate that the ORF lengths of γ-prolamin sequences range from 678 bp (Ae. sharonensis, the shortest reported so far) to 1089 bp (Ae. speltoides, the longest). None of these ORFs contains introns interrupting the coding sequence (Qi et al. 2009). The γ-prolamin protein comprises a 20-residue signal peptide, followed by a short non-repetitive N-terminal domain (I), a highly variable repetitive domain (II), a non-repetitive domain containing most of the cysteine residues (III), a glutamine-rich region (IV) and a non-repetitive C-terminal domain containing the final two conserved cysteine residues (V) (Goryunova et al. 2012; Qi et al. 2009). The sliding window analysis revealed that singleton polymorphisms are most frequent in the first variable domain R1 (domain II). This domain carries the immunogenic epitopes that trigger CD (Anderson et al. 2001; Salentijn et al. 2012). According to a previous report, the long repetitive domain (about 45% of the total γ-gliadin length) contains regular short repeats whose variation is caused by SNPs (single nucleotide polymorphisms) (Qi et al. 2009). Our data support the role of SNPs and frequent amino acid sequence variation specifically within the first variable R1 domain (~138–537 bp long), which is mainly responsible for the size heterogeneity of γ-gliadins.

Different selection forces working on different γ-prolamin genomes

The evolutionary analyses indicated that selection, in the form of positive selection reflected by negative values of neutrality test statistics (Tajima’s D, Fu and Li’s D), plays a key role in maintaining γ-prolamin promoter and coding regions within and among Triticeae species, as previously described by Goryunova et al. (2012) and Qi et al. (2009). Overall, the Gli-1 loci are diverse, although the γ-type sub-fractions are thought to be the most ancient prolamin family (Shewry and Tatham 1990; Sabelli and Shewry 1991). Tajima’s D was not significantly negative in the γ-prolamin endosperm-specific promoter region or in the transcriptional units within Triticeae species. The only exception was T. urartu, in which positive selection appears to have fixed beneficial alleles. Thus, the neutral hypothesis cannot be rejected for either the promoter or the TU sequences of γ-prolamins. Following duplication events, the nucleotide sequences of Triticeae γ-prolamins have most likely reached a mutation-drift equilibrium.

The phylogenetic analysis grouped Triticeae γ-prolamins into ten clades within two ancestral branches: the first comprising groups 1 to 5 and the second groups 6 to 10. Group 1 comprised the R genome of S. cereale and the V genome of H. vulgare, group 10 the A genome of T. monococcum, and groups 2, 3 and 8 the D genome of Ae. tauschii and T. aestivum. The H. vulgare sequences formed a separate cluster, separating γ-hordeins from γ-secalins and γ-gliadins. Furthermore, γ-hordeins showed greater genetic diversity than the other γ-prolamins. Extensive endo-reduplication and multiple polymorphic indels were evident within group 1 (R and V genomes), group 10 (A genome) and groups 2, 3 and 8 (D genome). In groups containing several species, horizontal gene transfer through introgression, followed by duplication, may explain the observed genetic diversity.

The significantly negative Tajima’s D values of groups 1, 2, 3 and 8 suggest positive selection acting on the duplication process in the R and D genomes. Furthermore, Fay and Wu’s H test and the Gmean and DKS statistics revealed strong selective sweeps within each of these four groups. Although neutrality holds within the two ancestral branches and the groups they contain, the dN/dS ratio suggests strong purifying selection on the ancestral groups. The ω < 1 values of groups 2, 3, 5, 8 and 10, and of the two ancestral groups, suggest that γ-prolamins are an evolutionarily older family than proposed by Goryunova et al. (2012).

Perspective for cereal breeding programs

So far, the only effective therapy for CD is a lifelong gluten-free diet. However, modified or shortened T-cell-stimulatory gluten peptides may reduce or even abolish the immunogenicity of gluten in patients, and consuming foods prepared from such flours may halt disease progression (Salentijn et al. 2012; Shewry and Tatham 1990). A number of potential therapeutic alternatives to the gluten-free diet are currently under development, including enzymatic detoxification of gluten, tissue transglutaminase inhibitors, blocking of HLA-DQ peptide presentation, silencing of gluten-reactive T cells, cytokine therapy, selective adhesion molecule inhibition, and selection or genetic engineering of less toxic grains (Sollid and Khosla 2005; Bethune and Khosla 2012). Several reports have suggested that genetic differences in gliadin and γ-prolamin genes with a short repetitive R1 domain could inform strategies for developing non-toxic and more nutritious cereal varieties (Molberg et al. 2005; Qi et al. 2009; Spaenij-Dekking et al. 2005). Accordingly, we searched for genomes with indels or endo-reduplication in the R1 domain, and identified the conserved CD epitopes in the γ-prolamin TU region using a bioinformatics approach. By overlaying the conserved CD epitopes on the regions of least polymorphism in the A and D genomes, we found that groups 2, 3 and 8 (D genome) and group 10 (A genome) showed the lowest polymorphism and fewer CD epitopes than the other genomes. This suggests that these genomes might be valuable progenitors in breeding programs aimed at generating less immunogenic lines, as previously reported by Wang et al. (2012). We therefore believe that progenitors carrying the A and D genomes should be used in future breeding programs to develop cereal flours that are less immunogenic for CD patients and gluten-sensitive individuals.
Additionally, RNAi technology has been used to reduce the corresponding transcripts and thereby lower seed prolamin content, with the aim of making flour less immunogenic (Altenbach and Allen 2011; Gil-Humanes et al. 2008, 2010, 2014; Kohnehrouz and Nayeri 2016; Wen et al. 2012). The advent of genome editing technologies (Boettcher and McManus 2015; Lozano-Juste and Cutler 2014) may enable the development of varieties with a completely modified R1 domain. Pursuing such avenues to change the genetics of current cereal cultivars may help move beyond the current gluten-free diet (GFD) approach for CD patients and gluten-sensitive individuals, who together account for nearly 10% of the worldwide population (Sollid and Khosla 2011; Pérez et al. 2012).