Introduction

The NBS (nucleotide-binding site) gene family is one of the largest and most important disease resistance gene families in plants. The NBS domain encoded by these genes contains several conserved motifs, such as Kinase1, Kinase2, Kinase3 and HD residues (Meyers et al. 2003), which directly or indirectly mediate pathogen recognition by binding ATP or GTP and thereby participate in signal transduction (Elmore et al. 2011; Krattinger and Keller 2016). To date, more than 140 disease resistance (R) genes have been cloned, and 80% of them are NBS genes (Shao et al. 2016). With the publication of draft genome sequences for a growing number of plants, NBS gene families have been isolated from the genomes of 19 dicots and 11 monocots (Urbach and Ausubel 2017; Morata and Puigdomènech 2017). In dicots, NBS domains are often linked to an N-terminal TOLL/interleukin 1 receptor (TIR) domain or a C-terminal leucine-rich repeat (LRR) domain, both associated with pathogen recognition; in monocots, the TIR domain is absent and a coiled-coil (CC) domain, which is associated with protein–protein interactions, accompanies the NBS domain instead (Lee and Yeom 2015; Urbach and Ausubel 2017).

Bread wheat (Triticum aestivum L.) is the most widely cultivated food crop and is attacked by various pathogens during its growth. Among these, fungal diseases such as powdery mildew and the rusts (stripe rust, leaf rust and stem rust) can cause severe damage to wheat production (Goutam et al. 2015). Extensively identifying and using R genes in disease resistance breeding is the most cost-effective way to control wheat diseases (Chen and Line 1995). So far, 58 powdery mildew R genes (Wiersma et al. 2017), 76 stripe rust R genes (Xiang et al. 2016), 76 leaf rust R genes (Bansal et al. 2017) and 59 stem rust R genes (Rahmatov et al. 2016) have been formally designated. However, only 24 of them have been cloned, of which 20 (83%) are NBS genes, namely: Pm2a (Sánchez-Martín et al. 2016), Pm3a-g (Yahiaoui et al. 2004; Tommasini et al. 2006), Pm8 (Hurni et al. 2014), Pm21 (Xing et al. 2017; He et al. 2017), Yr10 (Liu et al. 2014), Lr1 (Cloutier et al. 2007), Lr10 (Sela et al. 2012), Lr21 (Huang et al. 2003), Lr22 (Thind et al. 2017), Sr22 (Steuernagel et al. 2016), Sr33 (Periyannan et al. 2013), Sr35 (Saintenac et al. 2013), Sr45 (Steuernagel et al. 2016) and Sr50 (Mago et al. 2015). Moreover, Yr5 (Smith et al. 2007) and Sr21 (Chen et al. 2015) were mapped to NBS gene clusters, and the expressed sequence tag (EST) of Yr5 was reported to encode a partial NBS domain (Smith et al. 2007). Beyond these, a large number of resistance genes and QTLs, formally designated or not, have been mapped only preliminarily. Most of their linked molecular markers cannot be used effectively in molecular breeding because they lie too far from the target genes or QTLs. Therefore, isolating the wheat NBS family together with clear position information and molecular markers is valuable both for fine mapping of these target genes or QTLs and for screening candidate sequences.

Using 454 sequencing data of the common wheat cultivar Chinese Spring (Rachel et al. 2012), 580 and 986 complete NBS sequences were isolated, respectively, and their types and structures were analyzed (Bouktila et al. 2014, 2015; Gu et al. 2015); however, the genomic positions of these sequences are unknown. In this study, the wheat genome sequence obtained by sequencing isolated chromosome arms (International Wheat Genome Sequencing Consortium 2014) was used to isolate wheat NBS sequences with genomic location information. In addition, a set of NBS-related microsatellite (NRM) markers was developed from the microsatellite loci adjacent to the NBS sequences. The NRM markers in wheat homologous group (HG) 2, which harbors the most microsatellite loci, were used to construct a genetic map with a mapping population, and their linkage with R genes was analyzed. These TaNBS sequences and the corresponding molecular markers can be further used for mapping R genes or QTLs.

Materials and methods

Plant materials

A recombinant inbred line (RIL) population of 194 lines derived from a cross between wheat lines CH7034 and SY95-71 was used to construct the NRM marker map. An F2 population of 136 plants from CH5025 (carrying Pm43, He et al. 2009)/Taichang (TC) 29 and 92 F2:3 lines derived from CH7086 (carrying Pm51 and Yr69, Zhan et al. 2014; Hou et al. 2016)/TC 29 were used to test marker-R gene associations. DNA samples of these mapping populations and their parents, as well as the phenotype data (He et al. 2009; Zhan et al. 2014; Hou et al. 2016), were provided by the Shanxi Key Laboratory of Crop Genetics and Molecular Improvement.

NBS sequences isolation and bioinformatics analysis

The wheat whole-genome sequences and predicted protein sequences were downloaded from the URGI database (http://wheat-urgi.versailles.inra.fr/). The predicted wheat proteins were searched with HMMER 3.0 software (Mistry et al. 2013) using the hidden Markov model of the NBS family (accession number PF00931, downloaded from http://pfam.xfam.org/); the search results were then examined for the conserved domains of NBS proteins with SMART (http://smart.embl-heidelberg.de). The resulting NBS sequences were submitted to the CDD database at NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd) to identify further characteristic domains associated with NBS, such as CC and LRR. The corresponding coding sequences and scaffold sequences were extracted from the wheat genome data according to the protein accession numbers, their positions were determined by searching the wheat genome, and the TaNBSs were then assigned to the corresponding chromosomes. Gene structures of the TaNBSs were determined with GSDS 2.0 (http://gsds.cbi.pku.edu.cn/). Expression profiles of the TaNBSs were obtained from wheat transcriptome sequencing data (accession number PRJNA243835, downloaded from http://www.ncbi.nlm.nih.gov/sra/).
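As an illustration of the HMM search step, the sketch below filters the per-sequence tabular output that hmmsearch writes with its `--tblout` option. It is a minimal example under stated assumptions, not the pipeline used in this study: the file names in the comment, the E-value cutoff of 1e-4 and the protein identifiers are hypothetical, since the text does not report a threshold.

```python
# Hypothetical invocation that would produce the table parsed below:
#   hmmsearch --tblout hits.tbl PF00931.hmm wheat_proteins.fa

def parse_tblout(lines, max_evalue=1e-4):
    """Return target names whose full-sequence E-value is <= max_evalue.

    In the --tblout format, whitespace-separated column 1 is the target
    name and column 5 is the full-sequence E-value; comment lines start
    with '#'. The cutoff is an assumed value, not taken from the paper.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.append(target)
    return hits


# Made-up example rows (not real search results).
sample = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "protA - NB-ARC PF00931.22 1.2e-50 170.3 0.1 2.0e-50 169.6 0.1 1.0 1 0 0 1 1 1 1 -",
    "protB - NB-ARC PF00931.22 0.5 5.0 0.0 0.9 4.1 0.0 1.0 1 0 0 1 1 1 0 -",
]
nbs_candidates = parse_tblout(sample)
print(nbs_candidates)
```

Candidates that pass such a filter would still need the domain checks described above (SMART and CDD) before being counted as complete NBS sequences.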

NRM markers development and PCR validation

SSRHunter software (Li and Wan 2005) was used to search for microsatellite loci within the scaffolds on which TaNBSs are located. The search criteria were as follows: two to five nucleotides per repeat unit and at least five repeats. Primers for the NRM loci were designed using the Primer5 software (http://www.premierbiosoft.com/primerdesign/).
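The search criteria above (repeat units of two to five nucleotides, at least five repeats) can be sketched with a regular expression. This is an illustrative stand-in rather than the SSRHunter algorithm itself; the function name `find_ssrs` and the decision to skip homopolymer runs are assumptions made for the example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    start is 0-based. The lazy quantifier makes the shortest repeat unit
    win, so ATATATATAT is reported as (AT)5 and not as a longer unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Assumption: a run such as AAAAAAAAAA is a mononucleotide
            # repeat and falls outside the 2-5 nt criterion.
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence, not wheat data: (AT)5 flanked by unrelated bases.
example = "GGC" + "AT" * 5 + "CCG"
print(find_ssrs(example))
```

Only perfect repeats are detected here; interrupted or compound microsatellites would need a more permissive search.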

For comparison with the wheat molecular map constructed from wPt-DArT and SSR markers (Marone et al. 2013), 34 GBS-DArT markers located at the same loci (genomic distance < 10 kb) as the wPt-DArT markers in HG2 of wheat, as well as 346 pairs of SSR markers, were also included in NRM map construction. PCR for screening and mapping NRM and other SSR markers was performed in a 15 μL reaction mixture containing 1 U Taq DNA polymerase (Takara Bio Inc. Dalian, China), 1.5 μL 10 × buffer, 0.2 mmol L−1 dNTPs, 0.25 μmol L−1 primers and 100 ng of genomic DNA. PCR products were separated on 8% non-denaturing polyacrylamide gels and visualized by silver staining. GBS-DArT marker typing was performed by Diversity Arrays Technology (DArT) PL, Canberra, Australia. The linkage map was constructed using JoinMap 4.0 software (https://www.kyazma.nl/index.php/JoinMap/) with the following parameters: Node-population, Grouping-independence LOD and Mapping function-Kosambi’s.
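For reference, Kosambi's mapping function converts a recombination fraction r into a map distance d = 25 ln((1 + 2r)/(1 − 2r)) cM. The short sketch below evaluates that standard formula; it is not part of JoinMap, and the function name is chosen only for this example.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# A recombination fraction of 0.10 maps to slightly more than 10 cM.
print(round(kosambi_cm(0.10), 2))
```

For small r the result is close to 100r cM, and it grows faster than r as the recombination fraction approaches 0.5.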

Results and analysis

Distribution, types and expression of wheat NBS family

A conserved domain check was conducted on the sequences isolated from the wheat database, and a total of 2288 complete protein sequences containing the NBS domain were obtained. These sequences are distributed across genomes A, B and D at 34.1, 37.7 and 28.2%, respectively; chromosome 4A contained the most protein sequences (227), whereas 4D contained the fewest (24) (Fig. 1a). The length of the TaNBS proteins varies considerably, from only 48 aa for the shortest sequence, Ta1asLoc007418.1, to 1816 aa for the longest, Ta4alLoc027793.2 (Table S1). Based on the presence or absence of the CC and LRR domains, the TaNBS sequences were classified into four types: CC-NBS-LRR (CNL), NBS-LRR (NL), CC-NBS (CN) and NBS (N). The N type was the largest, comprising 1144 sequences (50%) (Fig. 1a). The gene length of the TaNBSs ranges from 251 bp to 7762 bp, and the number of introns varies from one to 13; a total of 903 (56.3%) TaNBSs were expressed in wheat (Table S1); of the 632 TaNBSs with genetic position information, 477 (75.5%) were clustered (Table S1).
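The four-way classification can be written as a simple rule on the presence of the CC and LRR domains. The sketch below is only an illustration of that rule; the function name and the toy input are hypothetical, and no domain annotations from Table S1 are reproduced.

```python
from collections import Counter


def classify_nbs(has_cc, has_lrr):
    """Assign an NBS protein to CNL, NL, CN or N from its domain content."""
    if has_cc and has_lrr:
        return "CNL"
    if has_lrr:
        return "NL"
    if has_cc:
        return "CN"
    return "N"


# Toy annotations as (has_cc, has_lrr) pairs; made up for illustration.
toy_annotations = [(True, True), (False, True), (True, False),
                   (False, False), (False, False)]
type_counts = Counter(classify_nbs(cc, lrr) for cc, lrr in toy_annotations)
print(type_counts)
```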

Fig. 1

Classification and chromosome distribution of TaNBS protein sequences (a) and NBS-related microsatellite loci (b, c) in wheat

TaNBS-related microsatellite loci analysis and marker development

In total, 2203 microsatellite loci were detected on 1061 scaffold sequences containing TaNBSs. Of these, 1621 (73.6%) were dinucleotide repeat loci, whereas pentanucleotide repeat loci were the rarest, with only six (Fig. 1b). The distribution of these microsatellite loci across the wheat HGs is HG2 (20%), HG7 (16%), HG1 (15%), HG6 (15%), HG4 (12%), HG5 (12%) and HG3 (10%) (Fig. 1c). Altogether, we developed 1830 pairs of NRM markers from the TaNBS scaffold sequences with microsatellite loci (Tables 1, S2). Among them, 342 pairs were developed on HG2, which contained the most microsatellite loci, including 49 2AS-NRM markers, 51 2AL-NRM markers, 71 2BS-NRM markers, 94 2BL-NRM markers, 34 2DS-NRM markers and 43 2DL-NRM markers (Table 1).

Table 1 Sixty-nine NRM markers mapped to homologous group 2 of wheat in this study

NRM markers map construction of the HG2 in wheat

The RIL population of CH7034/SY95-71 was amplified with the 342 NRM markers of HG2, of which 115 showed polymorphism between the parents. Finally, 69 NRM markers, 20 SSR markers and 16 DArT markers were placed on the genetic map (Fig. 2). Thirty-one NRM markers were assigned to chromosome 2A; 25 of them were on the short arm and mainly clustered in two regions. Eight NRM markers were identified in the region of DArT markers 1088906-1138983, which may contain disease resistance genes such as Yr69 (Hou et al. 2016) and Yr17/Lr37/Sr38 (Helguera et al. 2003). In addition, 22 NRM markers were assigned to chromosome 2B, and three 2BL-NRM markers were detected in the region of SSR markers Xgwm501-Xwmc332, which may contain Pm51 (Zhan et al. 2014), Yr5 (Smith et al. 2007; McGrann et al. 2014), Lr48 (Singh et al. 2011), Sr9a (Tsilo et al. 2007), Sr28 (Rouse et al. 2012) and QYraq.cau-2BL (Guo et al. 2008). Sixteen NRM markers were assigned to chromosome 2D, and four 2DL-NRM markers were in the region of DArT markers 1114628-1086188, which may contain Sr6 (Tsilo et al. 2010) and QPm.caas-2DL (Lan et al. 2010).

Fig. 2

Genetic map of NRM markers in homologous group 2 of wheat constructed with the CH7034/SY95-71 RIL population. The NRM markers are labelled in blue, the SSR or DArT markers appearing in both linkage maps are labelled in pink, and the genes targeted for the subsequent map-density enhancement are labelled in red. (Color figure online)

Construction of more densely populated map with NRM markers

NRM markers were used to increase the marker density around three disease resistance genes in HG2: Yr69 (2AS, Hou et al. 2016), Pm51 (2BL, Zhan et al. 2014) and Pm43 (2DL, He et al. 2009). As a result, 26, nine and nine NRM markers were integrated into the genetic maps containing Yr69, Pm51 and Pm43, respectively. Among them, eight NRM markers were integrated into the original X2AS33-1.9 cM-Yr69-3.1 cM-Xmag3807 interval of Yr69, narrowing it to X2AS-NRM34-0.4 cM-Yr69-1.3 cM-X2AS-NRM31, which suggests that Yr69 may lie in an NBS gene cluster (Fig. 3a). In addition, a marker closer to Pm43, X2DL-NRM05, was obtained (Fig. 3c).

Fig. 3

Construction of more densely populated maps of Yr69 (a), Pm51 (b) and Pm43 (c). The disease resistance genes and their linked NRM markers were labelled in red and blue, respectively. (Color figure online)

Identification of candidate sequences within gene clusters by TaNBS and NRM markers

Yr5 (Smith et al. 2007) and Sr21 (Chen et al. 2015) are two disease resistance genes that have been confirmed to lie in NBS gene clusters. The EST of Yr5 (GenBank number JN631792) encodes a partial NBS structure, and its co-segregating marker TaAffx.65234.1.S1_at and its flanking linkage marker S23M41-310 are also NBS sequences; S23M41-310 is orthologous to the rice NBS gene OsXa1 (Smith et al. 2007; McGrann et al. 2014). In this study, Yr5 was assigned to a gene cluster consisting of 11 TaNBSs and 11 NRM markers in the region 693,372,707-732,340,422 of chromosome 2B. The TaNBS sequence Ta2blLoc006115.1 includes the whole Yr5-EST (100% sequence similarity) and is expressed in wheat (Table S1); the co-segregating marker TaAffx.65234.1.S1_at and the flanking marker S23M41-310 correspond to Ta2blLoc008215.1 and Ta2blLoc034091.1 in the gene cluster, respectively (Fig. 4).

Fig. 4

Identification and analysis of NBS cluster of Yr5. Yr5 and the NRM markers in this cluster were labelled in red and blue, respectively. (Color figure online)

In addition, we assigned seven linkage markers of Sr21 to chromosome 2A; the closest flanking markers corresponded to a 1.28 Mb region (709,765,601-711,049,068) that contained four TaNBS sequences and two NRM markers (Fig. 5). Multiple sequence alignment showed that the TaNBS protein sequences Ta2blLoc019062.3, Ta2blLoc029872.3 and Ta2blLoc029875.1 contain motifs highly similar to those in the NBS domain of cloned disease resistance genes (Fig. S1), such as the P-loop (GGxGKTT) for ATP/GTP binding, Kinase2 (LLVLDDxW), Kinase3 (GxxxLxTxR) and HD residues. Moreover, Ta2blLoc019062.3 is expressed in wheat (Table S1), which suggests that it may participate in defence against pathogens; this requires subsequent validation in the mapping population using its marker 2AL-NRM05.
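The consensus motifs named above can be located in a protein sequence with a simple pattern search in which x stands for any residue. The sketch below is an illustrative check, not the multiple sequence alignment used for Fig. S1; the function names and the test peptide are hypothetical, and real NBS proteins may carry variants that an exact consensus search misses.

```python
import re

# Consensus motifs as written in the text; 'x' matches any residue.
MOTIFS = {
    "P-loop": "GGxGKTT",
    "Kinase2": "LLVLDDxW",
    "Kinase3": "GxxxLxTxR",
}


def motif_to_regex(motif):
    """Translate a consensus string into a compiled regular expression."""
    return re.compile(motif.replace("x", "."))


def find_motifs(protein):
    """Map each motif name to its first 0-based start position, or None."""
    protein = protein.upper()
    found = {}
    for name, motif in MOTIFS.items():
        match = motif_to_regex(motif).search(protein)
        found[name] = match.start() if match else None
    return found


# Synthetic peptide built for this example; not a wheat sequence.
toy_protein = "MA" + "GGVGKTT" + "AAA" + "LLVLDDVW" + "SS" + "GSRILVTTR"
print(find_motifs(toy_protein))
```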

Fig. 5

Identification and analysis of NBS cluster of Sr21. Sr21 and the NRM markers in this cluster were labelled in red and blue, respectively. (Color figure online)

Discussion

Size of the wheat NBS family

NBS genes, the largest disease resistance gene family, have been surveyed in various plants, including bryophytes, lycophytes, gymnosperms and angiosperms (Elmore et al. 2011; Krattinger and Keller 2016; Shao et al. 2016; Urbach and Ausubel 2017; Lee and Yeom 2015). The fungal bloom at the Cretaceous-Paleogene boundary (~ 66 MYA) may have triggered the intensive expansion of NBS genes in plants (Shao et al. 2016), which suggests an important role for NBS genes in the fight against fungal diseases during evolution. In cotton, NBS expansion in Gossypium raimondii enhanced its resistance to Verticillium dahliae, whereas G. arboreum, which lacks this expansion, is susceptible (Li et al. 2014). By contrast, the banana genome contains only 117 NBS genes despite three independent whole-genome duplications (WGD), which may explain its susceptibility to pathogen attack (D’Hont et al. 2012). Therefore, isolating and analyzing plant NBS gene families can help to elucidate the underlying disease resistance mechanisms.

In this study, 2288 complete TaNBS sequences were isolated from the hexaploid wheat genome, considerably more than in other gramineous crops that evolved from the same grass ancestor (50–70 MYA) (Salse et al. 2008), such as 535 in rice (Zhou et al. 2004), 420 in barley, 316 in Brachypodium distachyon (Gu et al. 2015), 274 in sorghum (Cheng et al. 2010) and 109 in maize (Cheng et al. 2012). We also isolated 463 (Liu et al. 2017) and 701 complete NBS sequences (data not shown) from Triticum urartu and Aegilops tauschii, respectively. It is hypothesized that the NBS families of the three wild ancestral species of wheat were combined in the hexaploid wheat genome through two polyploidization events (0.8 and 0.4 MYA, IWGSC 2014); subsequently, as wheat spread, the TaNBS family expanded again to adapt to infection by various pathogens in different growing areas, eventually producing a gene family with a large number of members. The majority of the TaNBS members (about 70%) are clustered in the genome, as is also the case for NBS family members in Arabidopsis thaliana (71.1%, Meyers et al. 2003), rice (76%, Zhou et al. 2004) and potato (73%, Jupe et al. 2012). The presence of TaNBS gene clusters, together with the loss of some subgenomic copies during evolution (Comai 2005; Otto 2007), resulted in an uneven distribution of NBS genes across the chromosomes within HGs; for example, there were 248 TaNBS genes on chromosome 4A, but only 40 and 24 on chromosomes 4B and 4D, respectively. This may explain why cloned disease resistance genes are often present as a single copy rather than as a ‘triplet gene’ (Pfeifer et al. 2014).

TaNBSs provide reference for homology-based and map-based cloning

To date, several disease resistance-related NBS genes have been cloned from wheat using a homology-based cloning strategy. For powdery mildew resistance, these include TmMla1 (Jordan et al. 2011) from diploid wheat, which is homologous to barley HvMla1 (78% sequence similarity); TdRGA-7Ba (Gong et al. 2013) from durum wheat, a Pm3b homologue (sequence similarity > 90%); and TaRGA from common wheat (Wang et al. 2016), which is homologous to multiple plant disease resistance genes. Many TaNBSs isolated in this study showed high sequence similarity to these cloned disease resistance genes. For example, both Ta1bsLoc017427.1 and Ta1bsLoc003202.1 shared over 70% identity at the protein level with the barley powdery mildew resistance genes HvMla1 (Zhou et al. 2001), HvMla6 (Halterman et al. 2001) and HvMla13 (Halterman et al. 2003). These TaNBSs can provide candidate gene sequences for cloning wheat disease resistance genes by homology-based methods. Normally, genes with similar domains may have similar functions. The 19 cloned wheat NBS-encoding proteins are all of the CNL type, and 240 CNL-type TaNBS proteins were isolated in this study (Fig. 1). In addition, some TaNBSs contain other distinct domains: in HG2, Ta2bsLoc003709 and Ta2bsLoc014737 contain an ABC (ATP-binding cassette) transporter domain, Ta2alLoc020734 and Ta2dlLoc021209 contain a Jacalin domain, and Ta2alLoc012596 and Ta2blLoc002392 contain a zinc finger domain. These domains have been shown to play an important role in disease resistance (Krattinger et al. 2009; Ma et al. 2013; Guo et al. 2013). ABC is also the functional domain of Lr34 (Krattinger et al. 2009). Studies have shown that these domains may interact with the NBS domain and act together against pathogen invasion (Deslandes et al. 2002). Hence, these TaNBSs merit further in-depth study.

Furthermore, TaNBSs can provide reference sequences for map-based cloning of disease resistance genes that lie in NBS gene clusters. Previous studies have reported that if comparative genome analysis shows that the region of a mapped wheat resistance gene corresponds to an NBS gene cluster of rice, B. distachyon or another model plant, then this gene often lies in a wheat NBS gene cluster and is a TaNBS gene, as is the case for Sr35 (Saintenac et al. 2013) and Sr50 (Mago et al. 2015). In this study, we analyzed the genomic location of the NBS gene cluster in which Yr5 is located and identified a TaNBS sequence, Ta2blLoc006115.1, that contains the Yr5-EST, which supports the accuracy of the genomic locations of the TaNBS family. Sr21, which also lies in an NBS gene cluster, was then analyzed and its linkage markers were anchored to the genomic map. Finally, four candidate TaNBS sequences were found in the target region.

NRM markers in gene mapping and molecular breeding

Extensive identification and cloning of disease resistance genes are the foundations of wheat disease resistance breeding. So far, of the nearly 300 powdery mildew and rust resistance genes formally designated in wheat, only a few can be used for wheat improvement, and most are at risk of losing their resistance through pathogen variation before they are ever used in breeding. This situation can be attributed to the fact that the markers linked to these R genes cannot be used efficiently for marker-assisted selection (MAS). Because common wheat contains three sub-genomes and highly repetitive sequences (80%, IWGSC 2014), most of the SSR markers routinely used for R gene mapping have a low distribution density across the wheat genome and therefore often lie far from the gene of interest. Moreover, it is not easy to develop new markers in the linkage region because the genomic locations of these SSR markers are unclear. Thus, recombination may occur between a linked marker and its target gene, so that the marker no longer tracks the gene, which leads to breeding failure.

With the rapid development of sequencing technology, high-density, high-precision markers such as SNP and DArT markers have been developed, which have greatly improved mapping accuracy and narrowed the distance between marker and gene. Meta-analysis showed that a number of DArT markers at the same loci as R genes or QTLs are NBS sequences, such as PmHNK54/wPt-5865, QPm.inra.2A/wPt-6064, Pm23/wPt-7024 and Pm42/wPt-2600 in wheat HG2 (Marone et al. 2013). This can be explained by the fact that many R genes are located in NBS gene clusters, so their linked DArT markers may also be NBS-related sequences. However, the relatively high cost of chip scanning currently makes this approach unsuitable for screening large populations in breeding. To improve the effectiveness of routinely used molecular markers, we developed 1830 NRM markers, each of which lies on the same scaffold sequence as a TaNBS. Of all the NRM markers, 7DL_NRM59 is the farthest from its TaNBS sequence (Ta7dlLoc025447), with a physical distance of 38589 bp, which corresponds to a genetic distance of approximately 0.007 cM based on the ratio of physical to genetic distance on chromosome 7DL (5.41 Mbp/cM; IWGSC 2014). The distances between all other NRM markers and their TaNBSs are below this value. Since most NRM markers are clustered in certain regions together with the NBS genes, the polymorphism of NRM markers is low in genome regions that do not contain disease resistance loci. If an R gene and multiple NRM markers are anchored to the same locus, the gene may be located in an NBS gene cluster, as is the case for Yr69 in this study. Moreover, when a TaNBS is confirmed to be associated with disease resistance, the NRM marker(s) on its scaffold can be used directly as co-segregating markers, which will improve MAS efficiency in breeding.
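The conversion quoted for 7DL_NRM59 can be reproduced with a one-line calculation. The sketch below assumes a uniform ratio of physical to genetic distance along the chromosome arm, which is a simplification because recombination rates vary along chromosomes; the function name is chosen only for this example.

```python
def bp_to_cm(distance_bp, mbp_per_cm):
    """Approximate genetic distance (cM) from a physical distance in bp.

    Assumes a constant physical-to-genetic ratio, given in Mbp per cM.
    """
    return distance_bp / (mbp_per_cm * 1_000_000)


# Values from the text: 38589 bp on chromosome 7DL at 5.41 Mbp/cM.
distance_cm = bp_to_cm(38589, 5.41)
print(round(distance_cm, 3))
```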