Introduction

Wheat is the third most important food crop worldwide and feeds 4.5 billion people in 95 developing countries (Braun et al. 2010). However, various biotic stresses, including many pests, can severely affect wheat production. Annual economic losses attributed to the incidence of pests vary in countries where they occur. The use of resistance genes has been the most economic and environmentally friendly approach to protecting wheat from pest damage and minimizing the use of insecticides (Ratcliffe and Hatchett 1997; Kong et al. 2008; Mornhinweg et al. 2009; Fatehi et al. 2009).

The continuing advances in molecular technologies, including genetic mapping, quantitative trait locus (QTL) analysis and, most recently, genome-wide association studies (GWAS), are accelerating the identification and dissection of the genetic control of disease resistance, providing tools for the use of marker-assisted selection (MAS) to accumulate desirable genes in breeding programs (Morgante and Salamini 2003). Association mapping was first successfully used to identify alleles at loci contributing to susceptibility to different human diseases [reviewed by Goldstein et al. (2003)]. GWAS exploit linkage disequilibrium to identify significant marker–trait correlations (Hedrick 1987; Flint-Garcia et al. 2003).

Factors which affect GWAS include (1) population structure, (2) gene–environment interactions and (3) population size (Zheng et al. 2012). GWAS are affected by population subdivision, which results in inbreeding because the individuals in a subpopulation can share common ancestors. Population structure may increase the number of false-positive results by causing genome-wide linkage disequilibrium between unlinked loci. Thus, accounting for population structure (Q matrix) as a fixed effect and for differences in genetic relatedness (kinship or K matrix) as a random effect reduces the number of false positives (Pritchard et al. 2000a, b; Yu and Buckler 2006; Sillanpää 2011).
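The Q + K correction described above is often written as a mixed linear model. The form below is a generic sketch and is not taken from the cited papers; all symbols are our own notation, and some formulations scale the kinship matrix by a factor of 2.

```latex
y = X\beta + Qv + Zu + e, \qquad
u \sim N\!\left(0,\ \sigma_g^{2} K\right), \qquad
e \sim N\!\left(0,\ \sigma_e^{2} I\right)
```

Here y is the vector of phenotypes, X\beta the fixed effect of the marker being tested, Qv the fixed effect of subpopulation membership, Zu the random polygenic effect whose covariance is proportional to the kinship matrix K, and e the residual.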

One advantage of association analysis is the possibility of using sets of historical germplasm where phenotypic data have been accumulated over time, thus avoiding the time needed to develop mapping populations. Therefore, an increasing number of studies are using association mapping in wheat to complement previous QTL studies. These include studies for different physiological traits like seed longevity (Rehman Arif et al. 2012), pre-harvest sprouting (Kulwal et al. 2012), late maturity α-amylase (Emebiri et al. 2010), glutenin content (Ravel et al. 2006) and milling quality (Breseghello and Sorrells 2006). GWAS has also been used for targeting resistance to biotic stresses including Stagonospora nodorum blotch (Tommasini et al. 2007), fusarium head blight (Massman et al. 2011), soil-borne diseases (Mulki et al. 2013) and stem rust (Yu et al. 2012), or abiotic stresses like drought (Zhang et al. 2013). Neumann et al. (2011) used association mapping to identify QTLs for yield and yield components, protein content and sedimentation value, and for resistance against leaf rust and powdery mildew. Peng et al. (2009) presented the only association mapping study on resistance to an insect pest damaging the wheat crop. They utilized a collection of 71 genotypes to identify the association between 51 simple sequence repeat (SSR) markers and Russian wheat aphid resistance in wheat. In this study, we identify associations between Diversity Arrays Technology (DArT) markers and resistance to five major wheat pests.

The Hessian fly (HF) (Mayetiola destructor Say) is arguably the world’s most important pest of wheat, causing economic damage in North Africa, the north of Kazakhstan, southern Europe and North America (El Bouhssini et al. 2012). In recent years, increased losses have been reported in wheat as a result of Hessian fly infestations of wheat fields in Texas, Oklahoma and Kansas in the USA (Comis 2007; Knutson and Swart 2007; Whitworth 2007).

Similarly, Russian wheat aphid (RWA), Diuraphis noxia (Kurdjumov), a pest of wheat and barley, causes economic damage in many parts of the world, and is indigenous to southern Russia, Iran, Afghanistan and countries bordering the Mediterranean Sea. The pest has spread widely and is now found on all continents except Australia (El Bouhssini et al. 2012).

Sunn pest (SP) (Eurygaster integriceps Puton) is the most important insect pest of wheat in Central and West Asia and Eastern Europe (Radjabi 1994; Javahery 1995), affecting about 15 million ha of wheat. Both nymphs and adults of SP cause plant damage, feeding on leaves, stems and grains (Critchley 1998). Yield losses attributable to direct feeding typically range between 50 and 90 % (Darkoh et al. 2010). Furthermore, enzymes injected into the grain during feeding severely compromise the quality of the resulting flour by degrading the vital gluten proteins (Hariri et al. 2000). There is increasing concern that climate warming provides favorable conditions for the expansion of the geographic range of SP with potential to increase its harmfulness (Krupnov 2012).

The wheat stem sawfly (WSSF) (Cephus cinctus Norton) has been a major pest of common wheat in the northern Great Plains of North America for more than 100 years (Beres et al. 2011). Economic loss due to wheat stem sawfly may be due to a reduction in yield and/or grade as a result of larval feeding within the stem. The biggest loss, however, is from the stems being cut and plants falling to the ground, making pick-up for harvest extremely difficult. Dry weather cycles and tight wheat rotations are the primary factors contributing to increased sawfly populations. In North Africa and West Asia, the predominant sawfly species is Cephus pygmaeus L., although it is not the only stem sawfly species that attacks cereals in the region. WSSF infestations leading to as much as 28 % damage in wheat were recorded in Syria. In Morocco, 40.6 and 38.5 % of stems were cut by this pest in bread wheat and durum wheat, respectively (El Bouhssini et al. 1987).

Cereal leaf beetle (CLB) (Oulema melanopus L.), is considered a major pest of small grains in Europe, Asia and the United States (Philips et al. 2011). CLB mainly poses an economic threat to spring-seeded small grain crops such as wheat, barley and oats. Crop damage by CLB may result in significant yield and quality reduction such as lowered grade and reduced economic returns to producers. In addition, this pest has posed marketing limitations because of quarantine restrictions that require fumigation to ship grain and hay to markets in areas not yet infested (Blodgett et al. 2004).

For some of the insects, such as HF, the effectiveness of deploying genetic resistance has been challenged by the ability of the insect to develop different virulent populations or biotypes that overcome the specific resistance genes in deployed cultivars (Ratcliffe et al. 2000). Virulence analysis of six HF populations showed differences in resistance genes in the states mentioned above (Chen et al. 2009). The Syrian HF biotype has been found to be the world’s most virulent; only two of the known genes (H25 and H26) are effective against the Syrian biotype (El Bouhssini et al. 2009a). For RWA, the reaction of the Dn differentials to the Syrian biotype showed high resistance in Dn6 and Dn7, good resistance in Dn2, and moderate resistance in Dn4 and Dn5. Therefore, the Syrian RWA biotype is less virulent than the RWA-2 biotype in the USA (El Bouhssini et al. 2012). To counter the development of new biotypes, it is necessary to identify and utilize different resistance genes derived from diverse sources.

The present study adds to the growing use of GWAS to identify marker–trait associations for economically important pests of wheat. The objective of this study was to use GWAS to identify DArT markers that are associated with QTLs that confer resistance to five major pests, RWA, HF, SP, WSSF and CLB, in a set of wheat genotypes. To our knowledge, this is the first association mapping study for these insect pests in wheat, with the exception of RWA.

Materials and methods

Plant material and insect phenotyping

The germplasm used in this study comprised 134 wheat genotypes including landraces, synthetic hexaploid wheat and some elite germplasm. These had previously been screened for resistance to the five pests using established phenotyping procedures (El Bouhssini et al. 2008, 2009a, b, 2011a, b, 2012).

Phenotyping for resistance to HF, RWA and SP was carried out at the ICARDA experimental station (Tel Hadya, Aleppo, Syria) under artificial infestation using insect populations collected from Syria. The phenotyping for CLB resistance was carried out under natural infestation at the Research Institute of Farming “Zemledeliya” of the Academy of Agricultural Science in Tajikistan.

Hessian fly

Screening for HF resistance was conducted in the greenhouse at 20–22 °C, a light/dark photoperiod of 16/8 h, and a relative humidity of 60–70 %. The initial screening was carried out in a standard greenhouse flat (54 cm × 36 cm × 8 cm) containing a mixture of soil, vermiculite and peat. Five seeds of each entry were sown in hill plots. At the one-leaf stage, each screening flat was covered with a cheesecloth tent where about 50 mated females were released and allowed to lay eggs for 2 days. Plant reactions to larval feeding were determined 20 days after the eggs hatched. Susceptible plants were dark green in color, stunted and contained live larvae, whereas the resistant plants were light green, not stunted and contained dead and/or live larvae. Potentially resistant lines selected from the initial screening, in addition to a susceptible check, were re-evaluated, using the same procedure described above for the initial test, except that seeds were sown in rows (ca. 20 seeds per row). A randomized complete block (RCB) design with four replications was used; resistant plants were dissected under a microscope (40×) and checked for the presence of dead larvae to confirm the antibiosis effect.
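The randomized complete block layout used in the replicated tests can be illustrated with a short sketch. It assumes only that every block contains each entry once in an independently randomized order; the function name, entry names and seed are hypothetical and do not come from the screening protocol.

```python
import random


def rcb_layout(entries, n_blocks, seed=None):
    """Return one independently shuffled copy of the entry list per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(entries)  # every block contains every entry exactly once
        rng.shuffle(block)     # randomize plot order within the block
        layout.append(block)
    return layout


# Hypothetical entry names; four replications as in the replicated tests.
entries = ["line_A", "line_B", "line_C", "susceptible_check"]
for number, block in enumerate(rcb_layout(entries, 4, seed=1), start=1):
    print(f"block {number}: {block}")
```

Fixing the seed makes the field plan reproducible, which is convenient when the same layout has to be regenerated for scoring sheets.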

Russian wheat aphid

Field screening for RWA resistance was conducted at ICARDA, Tel Hadya, Syria in 2009 and 2010. Entries were planted in hill plots using 10 seeds per hill. This initial screening was a non-replicated test, with the susceptible check repeated every 10 entries. At the two-leaf stage, entries were infested with a mixture of RWA nymphs and adults by placing infested leaves from the laboratory culture on the plants. Evaluations were made according to standard procedures when symptoms were clearly seen on susceptible checks using a 1–3 scale for leaf rolling (LR) and a 1–6 scale for leaf chlorosis (LC; El Bouhssini et al. 2011a). The promising lines, based on field screening, were re-evaluated in the greenhouse at 20–24 °C, a light/dark photoperiod of 16/8 h, and a relative humidity of 50–60 %. Seeds were planted in 54 cm × 36 cm metal flats with five seeds per hill, which were thinned to three plants per hill after germination. Seeds were sown in a mixture of soil, sand and peat (2:1:6). Each plant was infested with 10 adult RWA at the two-leaf stage. Evaluations were made 4 weeks post-infestation using the scales described previously. A RCB design with two replications was used. A wheat cultivar from ICARDA was used as a susceptible check.

Sunn pest

The wheat entries were evaluated for resistance to Sunn pest feeding at the vegetative stage using the screening technique developed at ICARDA in which lines are evaluated under artificial infestation in the field (El Bouhssini et al. 2009b). The screening was conducted at ICARDA during the 2009 and 2010 growing seasons. This method uses mesh screen cages of 6 m × 9 m × 3 m. The test was carried out in two stages: initial and advanced evaluation. In the initial screening test, the entries were planted in hill plots using 10 seeds per hill, with a susceptible check from ICARDA every 10 test entries. Plants of each hill were infested with three adults at the time of insects’ migration to wheat fields, around mid-March. Four weeks after infestation, the test entries were evaluated for vegetative stage damage from Sunn pest feeding using a 1–6 rating scale (El Bouhssini et al. 2009b) to assess shoot and leaf damage (and plant stunting). The promising accessions from the initial screening in 2009 were re-evaluated in 2010, using the same screening procedure, using an RCB design with four replications. One susceptible wheat cultivar was included in the test.

Wheat stem sawfly

The wheat lines were evaluated for resistance to WSSF under natural infestation in the field at Tel Hadya in 2007 and 2008, using an RCB design with four replications. The test included a susceptible wheat line from ICARDA. Each entry was planted in a 1-m row. The material was evaluated just before harvest. The evaluation was based on the number of stems cut by WSSF per entry.

Cereal leaf beetle

Screening for resistance to CLB was conducted under natural infestation at the Research Institute of Farming “Zemledeliya” of the Academy of Agricultural Science in Tajikistan during two seasons, 2009 and 2010. The wheat lines were evaluated in a RCB design with four replications. The local cultivar Ziroatkor-70 was included as a susceptible check. A scale of 1–6 was used for evaluation (1 = no damage; 2 = 10 % or less of leaves damaged; 3 = 25 % or less of leaves damaged; 4 = 50 % or less of leaves damaged; 5 = 75 % or less of leaves damaged; 6 = more than 75 % of leaves damaged, including the flag leaf).
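The 1–6 CLB scale can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, it takes the percentage of damaged leaves as its sole input, and it does not model the flag-leaf condition attached to score 6.

```python
def clb_score(percent_leaves_damaged):
    """Map the percentage of damaged leaves to the 1-6 CLB rating scale."""
    if not 0 <= percent_leaves_damaged <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_leaves_damaged == 0:
        return 1  # no damage
    # (score, inclusive upper bound in percent) for scores 2 to 5
    for score, upper in ((2, 10), (3, 25), (4, 50), (5, 75)):
        if percent_leaves_damaged <= upper:
            return score
    return 6  # more than 75 % of leaves damaged


for damage in (0, 8, 25, 60, 90):
    print(damage, clb_score(damage))
```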

Genotyping, genetic diversity and marker allele frequency

Genomic DNA was extracted from 2-week-old seedlings using pooled leaf samples from five plants per line, frozen in liquid nitrogen and stored at −80 °C before DNA extraction. DNA extraction was carried out according to Ogbonnaya et al. (2001), after which 10 μl of a 100 ng μl⁻¹ DNA solution of each sample was sent to Triticarte Pty. Ltd. Australia (http://www.triticarte.com.au), a commercial service provider for whole genome scans using DArT markers. DArT is an array-based genotyping technology; the markers are binary and dominant (Jaccoud et al. 2001; Wenzl et al. 2004). The bases of polymorphism are single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) at restriction enzyme cutting sites and large INDELs within restriction fragments (White et al. 2008). A total of about 3100 DArT markers was generated. The order of DArT markers and their positions were inferred from a consensus genetic linkage map of wheat (Detering et al. 2010). The number of alleles, their frequencies, gene diversity and polymorphism information content (PIC) were estimated using PowerMarker version 3.5 (Liu and Muse 2005). Markers with minor allele frequency lower than 10 % were excluded from all further analyses, resulting in a total of 2,518 polymorphic DArT markers.
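The marker statistics and the minor allele frequency filter can be sketched for binary markers as follows. This is not PowerMarker code. It assumes that each line is homozygous, so the frequency of score 1 is taken as an allele frequency, and it uses the standard two-allele formulas for gene diversity and PIC. Function names and the example scores are hypothetical.

```python
def marker_stats(scores):
    """Return (minor allele frequency, gene diversity, PIC) for 0/1 scores.

    Missing calls are given as None and are ignored.
    """
    called = [s for s in scores if s is not None]
    if not called:
        raise ValueError("marker has no called genotypes")
    p = sum(called) / len(called)  # frequency of the 'present' (1) allele
    q = 1.0 - p
    maf = min(p, q)
    gene_diversity = 1.0 - (p * p + q * q)
    pic = gene_diversity - 2.0 * p * p * q * q
    return maf, gene_diversity, pic


def filter_by_maf(markers, threshold=0.10):
    """Keep markers whose minor allele frequency is at least the threshold."""
    return {name: s for name, s in markers.items()
            if marker_stats(s)[0] >= threshold}


# Hypothetical marker names and scores.
markers = {
    "marker_common": [1, 1, 0, 0, 1, 0, None, 1],
    "marker_rare": [1] * 19 + [0],
}
print(sorted(filter_by_maf(markers)))
```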

Population structure

The population structure of the germplasm was analyzed using the software STRUCTURE version 2.3.3 (Pritchard et al. 2000b) by employing the Bayesian clustering method to infer the number of subpopulations k. Fifty unlinked DArT markers distributed across the wheat genome with two or three loci on each wheat chromosome were used. The genetic distance between pairs of chosen markers on the same chromosome was more than 50 cM [to minimize detecting background linkage disequilibrium (LD) caused by tightly linked markers]. Three runs for each k value from 2 to 8 were performed based on an admixture model. The length of the burn-in period and the number of iterations were both set at 10⁵. To reach the appropriate k value, the estimated natural logarithm of the probability of fit [lnP(D)], provided in the STRUCTURE output, was plotted against k. This value reaches a plateau when the minimal number of groups that best describe the population substructure has been reached (Pritchard et al. 2000a).
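The plateau criterion was applied by inspecting a plot. A minimal sketch of an automated stand-in is given below: it averages lnP(D) over the runs for each k and returns the first k after which the gain falls below a chosen tolerance. The tolerance, the function name and the example values are assumptions for illustration and are not part of the STRUCTURE output or of the analysis reported here.

```python
from statistics import mean


def plateau_k(lnpd_runs, min_gain):
    """Return the smallest k after which mean lnP(D) rises by less than min_gain.

    lnpd_runs maps each k to the list of lnP(D) values from its replicate runs.
    """
    ks = sorted(lnpd_runs)
    means = {k: mean(lnpd_runs[k]) for k in ks}
    for k, k_next in zip(ks, ks[1:]):
        if means[k_next] - means[k] < min_gain:
            return k
    return ks[-1]  # no plateau within the tested range


# Hypothetical lnP(D) values for three runs per k.
example = {
    2: [-9000, -9010, -8990],
    3: [-8600, -8605, -8595],
    4: [-8300, -8310, -8290],
    5: [-8000, -8005, -7995],
    6: [-7800, -7805, -7795],
    7: [-7790, -7800, -7780],
    8: [-7810, -7805, -7815],
}
print(plateau_k(example, min_gain=50))
```

A visual check of the curve remains advisable, because the result depends on the tolerance chosen.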

Linkage disequilibrium

The LD between pairs of DArT markers was estimated as squared allele frequency correlation estimates (R²) using the software package TASSEL 3.0 (Bradbury et al. 2007; http://www.maizegenetics.net). Only DArT markers with known chromosomal position were used to estimate LD and to measure the significance of R² at P values ≤0.01 for each pair of loci on different chromosomes (inter-chromosomal LD) and within the same chromosome (intra-chromosomal LD). In addition, rare alleles with an allele frequency ≤10 % were treated as missing data. The final number of markers used for the estimation of intra-chromosomal and inter-chromosomal LD was 1728 and 2518 markers, respectively, distributed across the 21 wheat chromosomes. The second-degree LOESS smoothing was calculated and plotted using SigmaPlot V.11 software.
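For binary markers scored on homozygous lines, the squared allele frequency correlation equals the squared Pearson correlation of the 0/1 scores. The sketch below illustrates that calculation under this assumption. It is not TASSEL code, it does not compute the P value, and the function name and example scores are hypothetical.

```python
def r_squared(marker_a, marker_b):
    """Squared correlation between two 0/1 marker score vectors.

    Lines with a missing call (None) at either marker are skipped.
    """
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines with calls at both markers")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic marker has no defined correlation")
    return cov * cov / (var_a * var_b)


# Hypothetical scores for six lines at two markers.
print(r_squared([1, 1, 0, 0, 1, None], [1, 1, 0, 1, 1, 0]))
```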

Association mapping

TASSEL 3.0 was used to perform association mapping analysis using the mixed linear model (MLM; Yu and Buckler 2006), which takes into consideration the kinship matrix (K) while implementing the EMMA (Kang et al. 2008) and P3D algorithms (Zhang et al. 2010) to reduce computing time (Bradbury et al. 2007, http://www.maizegenetics.net). MLM was again used after including population structure (Q) to control both Type I and Type II errors (MLM-Q). The association analysis of the D genome markers was assessed after excluding the durum genotypes. DArT markers with P values ≤0.005 were declared significantly associated with SP, RWA and WSSF resistances. However, marker alleles with false discovery rate (FDR) values ≤0.01 were declared significantly associated with HF and CLB resistances. The Manhattan plots were drawn using ggplot2 code in R written by Stephen Turner (http://www.StephenTurner.us/qqman.r).
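The text does not state how the FDR values were computed. As an illustration only, the sketch below applies the widely used Benjamini–Hochberg adjustment to a list of P values; the choice of this procedure, the function name and the example P values are assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for five markers.
pvals = [0.0004, 0.2, 0.003, 0.04, 0.6]
significant = [i for i, q in enumerate(bh_adjust(pvals)) if q <= 0.01]
print(significant)
```

Unlike a fixed threshold such as P ≤ 0.005, the adjusted values depend on the number of markers tested.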

DArT clone sequences and annotation

The full sequences of the DArT clones that contain the markers that co-segregated with resistances to the five pests with P values ≤0.005 were obtained from Triticarte Pty. Ltd. These were used to investigate the potential molecular function of the significant markers and the possible biological process underlying some of the detected QTLs. The sequences were formatted as a FASTA file and analyzed with Blast2GO software (Conesa and Gotz 2008) by blasting them against the Blast2GO databases and the plant resistance genes database PRGdb2 (http://prgdb.crg.eu/wiki/Special:BlastForm; Sanseverino et al. 2013).

Results

Evaluation of pest resistances

The results of the evaluation of wheat landraces, synthetic hexaploid wheat and standard checks for resistance against the five pests are summarized in Supplementary Fig. 1. The response to infestation varied for each pest and ranged from completely resistant to moderately resistant and susceptible reactions. Details have previously been reported in El Bouhssini et al. (2008, 2009a, b, 2011a, b, 2012). In all cases, the material classified as resistant to the five insects demonstrated resistance that was superior to the existing standard resistant check cultivars in bread wheat (data not shown). Of the 134 genotypes, 41 genotypes exhibited moderately resistant to resistant phenotypes for RWA, 28 for SP, 21 for HF, 21 for WSSF and 16 for CLB. In addition, the results for HF resistance indicated that the resistant accessions exhibited antibiosis as a resistance mechanism because the first-instar Hessian fly larvae died on some of the resistant plants (Supplementary Fig. 2).

Marker coverage, population structure and linkage disequilibrium

Out of the 2,518 polymorphic DArT markers that were common between the 134 genotypes, 1,728 were of known map position. Six hundred and twenty, 687 and 421 loci were mapped on the A, B and D genomes, respectively, covering a genetic distance of 909.1, 987.6 and 955 cM with an average distance of 1.5, 1.5 and 2.3 cM for the A, B and D genomes. The map spanned a total genetic distance of 2,851.7 cM at an average distance of 1.7 cM. Chromosomes 4D and 5D had the least number of markers with only eight markers each while 3B and 7D had the highest number of markers (161 and 154 markers, respectively).
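The reported average marker distances can be checked with simple arithmetic. The sketch below compares two readings of "average distance": map length divided by the number of markers, and map length divided by the number of intervals between adjacent markers. The second reading, and the use of seven chromosomes per genome to count intervals, are our assumptions and are not stated in the text.

```python
# Marker counts and map lengths (cM) per genome, as reported in the text.
genomes = {"A": (620, 909.1), "B": (687, 987.6), "D": (421, 955.0)}
CHROMOSOMES_PER_GENOME = 7  # wheat has seven chromosomes per genome


def spacing_per_marker(n_markers, length_cm):
    """Map length divided by the number of markers."""
    return length_cm / n_markers


def spacing_per_interval(n_markers, length_cm, n_chromosomes):
    """Map length divided by the number of intervals between adjacent markers."""
    return length_cm / (n_markers - n_chromosomes)


total_markers = sum(n for n, _ in genomes.values())
total_length = sum(length for _, length in genomes.values())

for name, (n, length) in genomes.items():
    print(name,
          round(spacing_per_marker(n, length), 2),
          round(spacing_per_interval(n, length, CHROMOSOMES_PER_GENOME), 2))
print("all",
      round(spacing_per_marker(total_markers, total_length), 2),
      round(spacing_per_interval(total_markers, total_length,
                                 3 * CHROMOSOMES_PER_GENOME), 2))
```

Under these assumptions, the per-interval values round to 1.5, 1.5, 2.3 and 1.7 cM, in line with the reported averages, whereas the per-marker value for the B genome rounds to 1.4 cM.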

Taking into consideration the population structure of the germplasm prior to association mapping is important in accounting for population subdivisions that may lead to spurious marker–trait associations. The lnP(D) value increased continuously with the increase in k from 2 to 8 but showed a plateau after k = 6 and thereafter tended to fluctuate (Fig. 1). Thus, the best probability for k value was determined to be 6, which appeared to be the most stable prediction of lnP(D) over the eight repetitions. Accordingly, the germplasm panel was clustered into six subgroups which consisted of 27, 25, 21, 30, 18 and 13 genotypes, respectively, with a considerably high mean F value of 0.46 for the six clusters. The germplasm accessions consist of wheat landraces including bread and durum wheat from different locations, and primary synthetic wheat. For example, subgroup 1 comprised durum wheat landraces only, mostly of ICARDA origin, interspersed with a few accessions from Afghanistan and one accession from Tajikistan. Subgroup 2 consisted of synthetic hexaploid wheat mostly produced at CIMMYT, with some produced at ICARDA and in Australia. The third subgroup contained a mixture of bread wheat landraces predominantly from Pakistan, Iran and CIMMYT. The fourth subgroup mostly contained 16 bread wheat genotypes which originated from Kyrgyzstan while the fifth subgroup comprised bread wheat mostly from CIMMYT. The sixth subgroup consisted of bread wheat landraces mostly from Afghanistan (Table S1).

Fig. 1 Population structure of 134 wheat genotypes used in this study

Linkage disequilibrium was estimated using the squared allele frequency correlations (R²) at P ≤ 0.01 from all pairs of DArT markers. About 16 % of inter-chromosomal pairs of loci were in significant LD at P ≤ 0.01 with an average R² of 0.2 while about 44 % of intra-chromosomal pairs of loci were in significant LD with an average R² of more than 0.5. The significant R² values of the intra-chromosomal LD at P ≤ 0.01 were plotted against the interval genetic distance. The LD decayed with increase in genetic distance (Fig. 2a). The markers that were closely linked (at a genetic distance of 0 cM) possessed the highest LD (average R² = 0.47) while the average R² decreased to 0.26 for the pairs of loci at genetic distances of up to 20 cM (Fig. 2a). At genetic distances between 20 and 50 cM, the average R² decreased to 0.16.
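The averages by distance class quoted above (0 cM, up to 20 cM, 20 to 50 cM) correspond to a simple binning of marker pairs. A minimal sketch follows; the function name and the example pairs are hypothetical, and pairs beyond the last bin edge are ignored.

```python
def mean_r2_by_bin(pairs, upper_edges):
    """Average R2 per distance class.

    pairs: iterable of (distance_cM, r2) for marker pairs in significant LD.
    upper_edges: ascending inclusive upper bounds of the classes, e.g. [0, 20, 50].
    Returns one mean per class, or None for an empty class.
    """
    sums = [0.0] * len(upper_edges)
    counts = [0] * len(upper_edges)
    for distance, r2 in pairs:
        for i, upper in enumerate(upper_edges):
            if distance <= upper:
                sums[i] += r2
                counts[i] += 1
                break  # each pair falls into the first class that contains it
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical (distance, R2) pairs.
example_pairs = [(0, 0.5), (0, 0.4), (10, 0.3), (30, 0.1), (60, 0.9)]
print(mean_r2_by_bin(example_pairs, [0, 20, 50]))
```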

Fig. 2 Decline of LD, as measured by R², against genetic distance. a All significant LDs. The red ellipse represents significant LDs among two distant LD blocks on chromosome 7D. b Significant LDs for all chromosomes except 3D, 6D and 7D. The red line represents the second-degree LOESS smoothing

The average P values for the intra-chromosomal LD pairs for chromosomes 3D, 6D and 7D were about 0.0003, 0.0005 and 0.0006 with average R² values of 0.32, 0.3 and 0.26, respectively, while that for the other chromosomes was about 0.001 with an average R² value of 0.17 (Fig. 2b). Significantly high LD values were detected among the markers of two distant LD blocks (between positions 0.8–5.1 cM and 170.4–176.4 cM) on chromosome 7D (red ellipse in Fig. 2a).

Association mapping and resistance gene distribution

DArT marker association with resistance to the five pests was determined by MLM analysis including kinship relationships (K matrix) and population structure (Q matrix) as covariates with P ≤ 0.005. A total of 13 DArT markers were identified to be significantly (P ≤ 0.005) associated with RWA resistance through leaf chlorosis. The markers mapped on chromosomes 1A, 1B, 2B, 3D and 7BL with R² values which ranged from 6.5 to 9.5 % of the phenotypic variation. Only two markers on chromosome 3D out of the 13 DArT markers were associated with resistance conferred by leaf rolling, with R² of about 8.2 % of the total phenotypic variation. Another marker on 7DS (wPt-733729) was associated with LR only, explaining 5.2 % of the total phenotypic variation.

Seventeen markers were associated with visual damage score (VDS) or stunting resistance to SP and were mostly localized on chromosomes 1A, 1B, 2B, 3AS, 4BL, 5AL and 6BS, with R² ranging from 6.2 % for wPt-2400 on 6BS to 10.78 % for wPt-5776 on 1A. Similarly, 17 markers were identified as being associated with visual infestation score (VIS) or shoot/leaf damage resistance. The R² for VIS ranged from 6.2 % for wPt-730435 on 4BL to 11.16 % for wPt-2748 on 3AS. Fifteen markers were common between SP-VDS and SP-VIS at seven common genomic locations. The other markers (two for VDS and two for VIS) were significant at P ≤ 0.01.

Five markers were identified for QTLs contributing to WSSF resistance on five genomic regions on chromosomes 1D, 3BL, 5BL, 6BS and 7AS. The R² values ranged from 6.6 % for wPt-2707 on 5BL to 14.5 % for wPt-666174 on 1D. The absolute phenotypic effect values ranged between 6.3 and 13. Five markers were significantly associated with resistance to CLB on 3BL and 7DS with R² values which ranged from 19.5 % for wPt-74308 on 7DS to 43.4 % for wPt-73166 on 3BL. The three markers on 7DS, wPt-66406, wPt-74308 and wPt-74292, overlapped and explained 33.20, 19.50 and 20.21 %, respectively, of the observed phenotypic variation for resistance to CLB. Thirteen DArT markers were identified to be significantly linked with HF resistance. Of these, seven were assigned to chromosomes 1B, 2D, 3A, 3D, 5D and 7D while six were unmapped. The R² values ranged from 2.8 % for wPt-8699 on 3A to 25.8 % for wPt-2694 on 1BS. Only one marker, wPt-2400 on 6BS, had a significant association with resistance to more than one pest, namely SP and WSSF (Table 1). Figure 3 shows Manhattan plots for all scored pest resistances using MLM that accounted for Q + K.

Table 1 The most significant markers associated with resistances against Sunn pest (SP), Russian wheat aphid (RWA), Hessian fly (HF), wheat stem sawfly (WSSF) and cereal leaf beetle (CLB) on each genomic location

Fig. 3 Manhattan plots for a RWA_LC resistance; b RWA_LR resistance; c SP_VDS resistance; d SP_VIS resistance; e WSSF resistance; f HF resistance; and g CLB resistance. The black, grey and orange dots represent markers from A, B and D genomes respectively

Putative gene function of resistant loci

An in silico approach was used to identify the probable function of selected resistance loci using gene ontology. Of the 54 DArT sequences blasted against the various databases, only three DArT clone sequences showed predicted annotation with gene ontology (Table 2). Of those with annotation, the DArT marker (wPt-9032) associated with resistance to HF on 1B codes for a disease resistance protein rps5-like and was annotated as defense response and apoptosis. The DArT marker, wPt-666174, linked to WSSF resistance on 1D had the same annotated biological process and was identified as an NB-ARC domain-containing protein. The annotated marker for RWA, wPt-731493, located on chromosome 3D relates to chloroplast inner envelope with a molecular function described as transferase activity.

Table 2 The gene ontology description of some DArT markers associated with resistance to HF, RWA and WSSF

Blasting DArT sequences in PRGdb resulted in the identification of genes that exhibited similarities to five DArT sequences and three reported genes from two organisms: two from wheat and one from common bean (Table 3). The common bean gene PRGDB00075685 is a serine-threonine kinase and when compared to our results it was similar to sequences of three DArT markers, wPt-7466 on 2D, wPt-2400 on 6B and wPt-8262 on 5A, associated with HF, WSSF and SP resistances respectively. Another serine-threonine kinase gene of wheat (PRGDB00192902) was similar to the DArT marker wPt-1089 on 6B linked with resistance to SP. The DArT marker, wPt-2861, linked to SP resistance was similar to the leucine rich repeat (LRR) gene PRGDB00078093.

Table 3 The blast result with the plant resistance genes database

Discussion

Variation for resistance in genetic resources for association mapping (AM)

The objectives of this study were to characterize and identify a set of loci associated with resistance genes to five major insect pests affecting wheat production, using genetic resources comprising durum and bread wheat landraces, synthetic hexaploid wheat and elite wheat germplasm of diverse geographical origins. Knowledge of the genetic basis of insect pest resistance in this germplasm is essential because it will facilitate the incorporation of these resistances in locally adapted wheat cultivars. Genetic variation for resistance was extensive for RWA, SP, CLB, HF and WSSF in the germplasm set used for this study. The genotypes exhibited varying levels of resistance to the pests, as previously reported in El Bouhssini et al. (2008, 2009a, b, 2011a, b, 2012). However, none of the genotypes exhibited resistance to more than one pest. Nevertheless, they represent potentially new and diverse sources of genetic resources for deployment in breeding for resistance against these important pests and offer opportunity for pyramiding these resistances in a wheat breeding program. Results from GWAS are strongly influenced by the choice of germplasm, size of the population, number and distribution of markers and the trait under study (Kulwal et al. 2012). In some cases, exotic or diverse germplasm representing different geographic regions of the world are used to minimize LD and identify QTLs.

Marker coverage, population structure and LD

One of the major factors in the success of GWAS is good marker coverage of the genome because sparse coverage reduces the power for marker identification (Poznial et al. 2012). In the current study, 2518 DArT markers were polymorphic, of which 1728 were of known map position and cover about 2,851.7 cM with an average distance of 1.7 cM, a comparatively greater coverage than previously reported in other studies. For example, Crossa et al. (2007) used 813 DArT and 530 SSR and sequence tagged site (STS) markers on 170 CIMMYT wheat germplasms for AM studies, Neumann et al. (2011) used 574 DArT markers for AM studies on 96 winter wheat germplasms, Emebiri et al. (2010) employed 395 DArT markers for AM studies using 91 synthetic hexaploid wheat germplasms, and Mulki et al. (2013) used 667 DArT markers to identify known and potentially new genomic regions associated with resistance to soilborne pathogens in synthetic hexaploid wheat.

The power of association mapping depends on accurate estimation of the population structure using the admixture model to avoid type I errors (Pritchard et al. 2000a). In this study, the result obtained using STRUCTURE indicated that subpopulations exist in the association panel. Six subpopulations adequately separated genotypes into appropriate clusters which were largely in agreement with the geographic origins, pedigrees and type of germplasm, with limited intermixing of genotypes among the germplasm. There was separation of the germplasm type: for example, subgroup one comprised only the tetraploid durum genotypes while the synthetic hexaploid wheats clustered together into subgroup 2. The latter was the major source of resistance to HF (except the resistant check) and possibly explains the high contribution of the D genome to HF resistance (Table 1). Similarly, all of the CLB-resistant genotypes were bread wheat landraces sharing the same origin (Kyrgyzstan) and clustered into the fourth subgroup with negligible admixtures, which included a few WSSF-resistant genotypes from ICARDA. The identification of population structure (Q matrix) as a fixed effect and differences in genetic relatedness among genotypes within the subpopulations (kinship or K matrix) as random effects reduces the number of false positives (Yu and Buckler 2006).

The extent of LD in our germplasm was also examined, considering all pairs of DArT markers in significant LD. The general trend was an extremely high LD with a slight decline even at intervals extending over 50 cM. Different studies presented a long-range LD up to 30–40 cM (Crossa et al. 2007; Dreisigacker et al. 2008) while others observed smaller ranges up to 20 cM (Chao et al. 2007; Emebiri et al. 2010). In our germplasm, chromosomes 3D, 6D and 7D have very significant LDs with very high R² values even between very distant markers (over 100 cM) and they affected the LD decay, resulting in such long-range LD. Figure 2 shows the differences in the LD plot with (Fig. 2a) and without (Fig. 2b) the significant LDs of chromosomes 3D, 6D and 7D. Before excluding the LDs of those chromosomes (Fig. 2a), the LOESS smoothing showed an uneven trend that varied between R² values of 0.13 and 0.19 between 10 and 100 cM, and became almost smooth with a mean R² value of 0.2 after 100 cM. The curve became smoother when the significant LDs of chromosomes 3D, 6D and 7D were excluded, and was almost stable at an R² value of about 0.11 after 15 cM (Fig. 2b). This may be explained by high population structure in the germplasm involving synthetic hexaploid genotypes and the high number of genotypes from the same geographical regions (e.g. Kyrgyzstan and Afghanistan). For GWAS, genetic materials characterized by high LD are preferable due to the reasonably low number of markers required to reveal a significant marker–trait association (Maccaferri et al. 2006). In general, this suggests that the marker coverage (one marker every 5.8 cM) in this study was sufficient for a whole-genome association scan with a marker density of one DArT every 1.7 cM.

Association mapping and comparison with previous studies

In this study a total of 54 DArT markers were identified which were significantly associated with 26 different QTLs conferring resistance to five insect pests. This is the first study to utilize GWAS to identify markers linked to many insect pest resistances and to understand the genetic architecture of resistance to multiple insect pests in wheat. The DArT markers linked to QTLs for resistance to CLB on 7DS (wPt-66406) and 3BL (wPt-73166) were highly significant and explained up to 33 and 43 % of the variation for resistance respectively, which is quite high, suggesting that both are major QTLs and perhaps even major genes for CLB resistance. These QTLs are likely to be novel, being the first reported identification of QTLs on 3B and 7D for resistance to CLB in wheat. In an earlier study, Dobrovolskaya et al. (2007) reported the identification of two major genes responsible for leaf hair density on 4BL and 7BS associated with CLB resistance. Leaf hair trait was not characterized in the germplasm set used in the current study, though Gallun et al. (1973) and Dobrovolskaya et al. (2007) reported that it could play a critical role in CLB resistance.

A limited number of molecular studies for genetic resistance to WSSF and SP has been reported in wheat. In this study, DArT markers linked to QTLs that confer significant resistance to WSSF were identified on chromosomes 1D, 6BS and 7AS as well as those linked to resistance to SP which represent potentially novel QTLs. In one of the few reported studies, resistance to WSSF was primarily identified as co-segregating with a QTL for heading date controlled mainly by the gene Ppd-D1 on chromosome 2D and less so by a QTL associated with the gene VrnB1 on chromosome 5B and stem solidness identified on 3B (Qsf.spa-3B and Qss.msub.3BL; Houshmand et al. 2007; Sherman et al. 2010). In the current study, two QTLs on 3BL and 5BL were identified as linked to WSSF and are most likely to be the same as identified earlier. The VrnB1 gene causes earlier heading which exhibits more stem cutting, so the genotypes that lack this gene may possess higher resistance to WSSF, as suggested by Sherman et al. (2010). Besides the QTLs on 3B and 5B, three novel QTLs were identified in this study on chromosomes 1D, 6BS and 7AS linked to resistance for WSSF. Of these, QTLs on chromosomes 1D and 7AS could be considered as the major QTLs in our germplasm since they explained 14.5 and 14.8 % of the phenotypic variation for resistance to WSSF, respectively, and provide further insight into the diversity of genetic resistance to WSSF available in wheat. The major advantage of association mapping is that more QTLs can be detected than in bi-parental mapping populations involving only two parents.

Two phenotypic traits (VDS and VIS) were scored in regard to resistance to SP. The DArT markers overlapped in the seven genomic regions where QTLs were identified as significantly associated with resistance to SP-VDS and SP-VIS, suggesting that the two traits are controlled by the same genes.

Eleven RWA resistance genes (Dn1–Dn9, Dnx and Dn2414) have been identified and derived from Aegilops tauschii Coss., rye and wheat (Nkongolo et al. 1991; Marais et al. 1994; Liu et al. 2002; Peng et al. 2007). The genes and major QTLs were identified on chromosomes 1B, 1D and 7D, including both arms of 1D and 7D. The RWA resistance gene Dn7 is located on 1BL (Anderson et al. 2003), while Dn4 is located on 1DS (Liu et al. 2002; Arzani et al. 2004) and the long arm of 1D carries Dn9 (Liu et al. 2001). It was previously reported that the gene Dn5 is located on 7DS (Liu et al. 2002). However, Heyns et al. (2006) provided clear evidence of its location on 7DL near the centromere. Amongst these studies, the highest number of RWA resistance genes, including Dn1, 2, 6, 8 and X, was identified on 7DS (Miller et al. 2001; Liu et al. 2001, 2002). We identified a QTL on chromosome 1BL and potentially another four novel QTLs located on chromosomes 1A, 2B, 3D and 7BL (Table 1).

Two scores, namely LC and LR, were used in identifying resistance to RWA as well as linked DArT markers (Table 1). Only one QTL on chromosome 3D was highly significant with both scores while the other QTLs were significantly linked to LR, except for the 7DS QTL (associated with LC). It is most likely that different genes control the two traits, thus offering an opportunity for pyramiding both traits in the development of more robust resistance to RWA.

To date, more than 30 major HF resistance genes have been identified (Ratcliffe and Hatchett 1997; McIntosh et al. 2010). Among these, H13, 22, 23, 24, 26, and 32 originated from Aegilops tauschii (Ratcliffe and Hatchett 1997; Liu et al. 2005c; Zhao et al. 2006; Wang et al. 2006; Xu et al. 2006; Sardesai et al. 2005). In the current study, all HF-resistant genotypes were synthetic hexaploid wheats developed at CIMMYT from hybridization between Ae. tauschii and tetraploid durum, analogous to the evolution of bread wheat. We identified four major QTLs on 1BS, 3D, 5D and 7D and two minor QTLs on 2D and 3A that are associated with resistance to HF. The four QTLs identified in this study on the D genome together explained 62.7 % of the total variation, while that on 1BS explained 25.8 % of the phenotypic variation for HF resistance. Genes that confer resistance to HF, H24, 26 and 32, have previously been identified on chromosome 3D (Zhao et al. 2006; Wang et al. 2006; Xu et al. 2006, 2009; Sardesai et al. 2005), while H7 and H8 were identified on chromosome 5D (Gallun and Patterson 1977) with no linked molecular marker reported for HF on 5D. In this study, we identified a marker, wPt-0886, on chromosome 5D that is significantly linked to HF resistance and explained more than 21 % of the variation for resistance. In addition, we identified DArT markers on 1BS and 7D that have not been previously reported, with R² of more than 25 and 19 %, respectively, suggesting that both loci possess potentially new genes that confer resistance to HF. Other studies reported the identification of HF resistance genes Hdic, 3, 5, 6, 9, 10, 11, 16, 17, and H22, H25 and H13 on chromosomes 1AS, 1DS, 4A and 6DL, respectively (Gallun and Patterson 1977; Roberts and Gallun 1984; Liu et al. 2005a, b, c; Kong et al. 2005, 2008; Zhao et al. 2006). These were not detected in our current study because of differences in the germplasm sets used in the various studies, which illustrates the difficulty in cross-referencing the QTLs identified in different studies.

In silico comparison and gene annotation

In order to infer the potential biological function of some of the DArT markers that were significantly linked to resistance to the insect pests studied, we employed an in silico approach. Most of the DArT markers whose sequences were blasted against databases matched genes of unknown function. However, some of the DArT markers matched genes of known function. These include an Rps5 gene and a gene coding for an NB-ARC domain. Rps5 is a well-documented resistance gene and the NB-ARC domain has been described as a novel signaling motif shared by plant resistance genes (van der Biezen and Jones 1998; Shao et al. 2003; Jones and Dangl 2006; Ade et al. 2007; Inohara and Nunez 2003; Bernoux et al. 2011; Takken and Goverse 2012). In this study, we identified an extended role of an Rps5-like gene in HF resistance (marker wPt-9032) and a role of the NB-ARC domain in WSSF resistance (marker wPt-666174); their biological functions were implicated in “defense response” and “apoptosis”. Qi et al. (2012) reported that exchanging the leucine-rich repeat (LRR) domain in Rps5 resulted in an auto-activation response, consistent with earlier findings showing that incompatibility between the LRR and NB-ARC domains leads to uncontrolled cell death (van der Biezen and Jones 1998).

In this study, the DArT marker wPt-731493, associated with RWA resistance, has similarity to a gene coded as “chromoplast inner envelope” whose molecular function was annotated as a “transferase activity” when annotated using Blast2GO. Previous studies have reported the identification of several resistance genes with transferase activity (Rigden et al. 2000; Brodersen et al. 2002; Maldonado et al. 2002; Moreno et al. 2005; Caplan et al. 2008). For example, NRIP1 is a rhodanese sulfur transferase that resides inside chloroplasts of naive tobacco plants and becomes relocalized to the cytoplasm and nucleus after infection with tobacco mosaic virus (Caplan et al. 2008).

The plant resistance genes database (PRGdb) provides a direct tool for targeting sequences that might be related to resistance. Blasting our sequences against PRGdb enabled us to predict the resistance contribution of eight DArT sequences to three insect pests (Table 3). Three DArT sequences (wPt-7466, wPt-2400 and wPt-8262) associated with HF, WSSF and SP matched the common bean gene PRGDB00075685, which codes for a Ser-Thr kinase domain. The DArT clones were distributed on different chromosomes, 2D, 5A, 6B and 7B, which may indicate that the Ser-Thr kinase domain is a leading domain in insect pest resistance pathways, since all the markers mentioned have a negative phenotypic effect (except for the only common marker, wPt-2400, associated with resistance to both SP and WSSF). AbuQamar et al. (2008) reported that the tomato resistance gene TPK1b is required for resistance to insect feeding and that this gene contains the Ser-Thr kinase domain.

Conclusion

Molecular markers are particularly effective for marker-assisted selection for different traits such as pest resistances. To maximize their efficiency, molecular markers should be closely linked to major resistance loci, so that markers are highly concordant with the desired phenotypes (Krill et al. 2010). In this study, a genome scan of the 21 wheat chromosomes with DArT markers identified markers closely linked to SP, RWA, HF, CLB and WSSF resistances, including genomic regions previously known to confer resistance. In addition, potentially novel loci linked to these insect pest resistances were uncovered. This is the first report of the identification of molecular markers that significantly account for SP resistance and the direct damage for CLB resistance. The DArT markers are likely to be in a very high LD with causative sites or contribute to an allelic series controlling resistance to these pests. Thus converting some of the DArTs into a user-friendly marker would be useful in accelerating the incorporation of resistance into different elite locally adapted wheat germplasm.