Introduction

Diseases in crop plants are caused by various types of pathogens including fungi, bacteria, viruses and insects, which are responsible for severe yield losses. The use of disease-resistant cultivars is an efficient approach to overcome these challenges. In the ecosystem, both diseases and plants are considered to be products of co-evolutionary processes (Burdon and Thrall 2009). Plants have evolved diverse and effective mechanisms to respond to and resist pathogens, of which a bilayered defense strategy is the most common (Panstruga et al. 2009). Initially, transmembrane receptors on the cell surface detect and recognize the pathogen via pathogen-associated molecular patterns (PAMPs). Adapted pathogens can suppress the PAMP-triggered immunity (PTI) by releasing effector molecules into host plant cells. Plants, in turn, activate a second line of defense, the effector-triggered immunity (ETI), which represses the action of the effector molecules (Jones and Dangl 2006). The genetic interaction between effector proteins expressed from specific pathogen avirulence (Avr) genes and plant R-genes results in disease resistance, usually associated with a localized hypersensitive response (HR), a form of programmed cell death (Jones and Dangl 2006). As of now, 112 R-genes and 104,310 putative R-genes from 233 plant species that confer resistance to 122 types of pathogens have been categorized in the Plant Resistance Gene database Release 2.0 (PRGdb; http://pgrdb.crg.eu; Sanseverino et al. 2013).

R-genes are categorized into six major groups on the basis of conserved domains and structures of the predicted proteins (Martin et al. 2003). R-genes having a serine/threonine-protein kinase (KIN) domain form the first group, with roles in recognizing pathogen effectors and in signaling (Tang et al. 1999; Anderson et al. 2006). The second group, the largest among the R-gene classes, has nucleotide-binding site (NBS), leucine-rich repeat (LRR) and putative leucine zipper or other coiled-coil (CC) domains and hence is called CC-NBS-LRR or CNL (Maekawa et al. 2011). R-genes with a Toll/interleukin-1 receptor (TIR) domain form the third group, termed TIR-NBS-LRR or TNL (Medzhitov et al. 1997; Meyers et al. 1999). The fourth group, the receptor-like proteins (RLP), consists of a transmembrane domain with an extracellular LRR motif and a short, motif-less intracellular C-terminal tail (Jones et al. 1994). The fifth group consists of receptor-like kinases (RLK), which include an extracellular LRR domain, a transmembrane region and a cytoplasmic serine/threonine-kinase domain (Song et al. 1995). R-genes that have no conserved domain and do not fit in any of the above groups are classified into a sixth group called Other (Sanseverino et al. 2010). CNL and TNL are the most common R-gene families in plants; however, TNL genes occur only in dicotyledonous plants, whereas CNL genes are present in both dicotyledonous and monocotyledonous plants (Bernoux et al. 2011).

The conserved domain-containing sequences of R-genes have been utilized to identify and characterize them from various plant species by PCR-based approaches (Leister et al. 1996). In silico approaches have also been employed for the detection of R-genes in plants whose genome sequences are available (Kim et al. 2012; Ni et al. 2014). R-genes have also been enumerated from transcriptome sequences of plants (Liu et al. 2012). Many R-genes have been linked to molecular markers that help in the quick identification and isolation of R-genes from other plants and that can be used for marker-assisted selection (MAS) in resistance breeding programs (Gupta et al. 2010).

All the major classes of R-genes consist of NBS and LRR regions that act as intracellular receptors for recognizing the invading pathogen and are responsible for defense-related signaling (Leister and Katagiri 2000; Glowacki et al. 2011). These domains have been targeted by plant breeders for decades to elicit resistance to crop pathogens. The genomic sequences containing the NBS-LRR from different crop plants suggest that only a minute portion of R-genes might be functional (Shen et al. 2002). The NBS domain contains several conserved and highly ordered motifs that distinctively occur in ATP- or GTP-binding proteins as well as in structurally related regulatory proteins associated with animal apoptosis (Traut 1994). The NB domain of NBS-LRR genes is homologous to the human adenosine triphosphatase (ATPase) apoptotic protease-activating factor-1 (APAF-1; Zou et al. 1997) and to the nematode Caenorhabditis elegans CED-4 (C. elegans death-4 protein) (Vaux 1997). The ARC domain is named for its conservation among APAF-1, plant R-proteins and CED-4. The NB sub-domain, which forms a P-loop with Walker motifs, the ARC1 sub-domain, consisting of a four-helix bundle, and the ARC2 sub-domain, having a winged-helix fold, together constitute the three sub-domains of the NB-ARC domain (Leipe et al. 2004; Takken et al. 2006). The ARC domain translates elicitor-induced modulations of the C terminus into a signal initiation event (Rairdan and Moffett 2006). The NB-ARC domain is characteristic of STAND (Signal Transduction ATPases with Numerous Domains) proteins, which are involved in the regulation of programmed cell death (Leipe et al. 2004; Danot et al. 2009). It has been hypothesized that NB-ARC binds and hydrolyzes ATP and thus initiates downstream resistance signaling (Tameling et al. 2002; Takken et al. 2006). The hydrolysis of ATP is probably accompanied by a conformational change of the NB-ARC domain, since the binding affinity for ADP increases substantially after ATP hydrolysis (van der Biezen and Jones 1998; Tameling et al. 2006).

Bread wheat (Triticum aestivum L.) is the single largest crop grown globally and has great social and economic importance, but its production is severely affected by many pathogens (Sharma 2012; FAO 2013). The three rust diseases [stem (black), stripe (yellow) and leaf (brown)] are the most widespread wheat diseases and cause severe reductions in worldwide productivity. Leaf rust, caused by the aggressive biotrophic basidiomycete pathogen Puccinia triticina Eriks., is the most pervasive rust disease and reduces yield by up to 10% annually (Dean et al. 2012). The solution to this problem might be the use of resistant cultivars. Though a number of R-genes have been identified and deployed in wheat breeding programs, only a small number of these genes have been cloned and characterized to date. R-genes such as Lr10 (Feuillet et al. 2003), Lr21 (Huang et al. 2003), Pm3 (Yahiaoui et al. 2004), Lr1 (Cloutier et al. 2007) and Lr34 (Krattinger et al. 2011) were identified in Triticum aestivum, while Sr35 (Saintenac et al. 2013) and Sr33 (Periyannan et al. 2013) were identified in wild relatives of wheat and introgressed into wheat. The huge genome size (16.94 Gb), recent hexaploidy (2n = 6x = 42) and large amounts of repetitive regions with high transposon activity complicate detailed analysis of the wheat genome and genomics-based improvement of wheat. The sequencing of the wheat genome using 454 pyrosequencing technologies generated fivefold coverage with 220 million reads (~85 Gb of sequence; Brenchley et al. 2012). The whole genome shotgun sequence of wheat is available at the NCBI (http://www.ncbi.nlm.nih.gov/bioproject/PRJEB217). The International Wheat Genome Sequencing Consortium (IWGSC 2014) project (http://www.wheatgenome.org/) has also developed a chromosome and chromosome-arm based reference genome sequence for wheat; most of these sequences are available at Ensembl Plants (http://plants.ensembl.org/Triticum_aestivum/Info/Index).

The objective of this study was to decipher the different classes of R-genes present in allohexaploid bread wheat, with special emphasis on the detection of NB-ARC domain-containing R-genes. Transcriptional dynamics during compatible and incompatible interactions in near-isogenic wheat lines challenged with Puccinia triticina were also studied. The data obtained will be helpful in the isolation and characterization of additional R-genes in wheat.

Materials and methods

Retrieval of R-gene sequences

Nucleotide and amino acid sequences of 7719 R-genes of Oryza sativa, Zea mays, Brachypodium distachyon, Sorghum bicolor, Hordeum vulgare and T. aestivum were mined from the PRGdb Release 2.0 (Sanseverino et al. 2013) and were used to localize their homologs on the wheat chromosomes. The allohexaploid T. aestivum reference genome sequences were downloaded from Ensembl Plants. De novo assembled transcriptomes of bread wheat with or without inoculation of Puccinia triticina, prepared earlier in our laboratory, were also used for deciphering the roles of R-genes in response to leaf rust (Chandra et al. 2016). To find the NB-ARC domain within the wheat genome, 405 sequences from the Triticeae tribe annotated as containing the NB-ARC domain were downloaded from NCBI and were checked for the presence of the NB-ARC domain using the Conserved Domain Database (CDD). The sequences without the NB-ARC domain were eliminated and the remaining sequences were made non-redundant using the Cluster Database at High Identity with Tolerance (CD-HIT) software (Li and Godzik 2006).

Identification of R-genes in wheat

To identify the R-genes in the wheat genome, amino acid sequences were obtained from the downloaded 7719 R-genes [comprising 1 KIN, 599 CNL, 71 CN (coiled coil with NBS domain), 134 Mlo (Mlo-like R-proteins), 2234 NL (NBS at N-terminal, LRR at C-terminal and no CC domain), 553 N (only NBS and no LRR domain), 25 Other, 256 RLK-GNK2, 35 RLK, 1663 RLP, 7 RPW8-NL (contain NBS, LRR and RPW8 domain), 11 T (only TIR domain) and 2130 unknown sequences]. These sequences were searched against the wheat reference genome sequence retrieved from Ensembl Plants and the de novo transcriptome assemblies of bread wheat with an e value cutoff of <1e−20 (Kim et al. 2012). For the identification of wheat NB-ARC coding regions, the sequences downloaded from Ensembl Plants were translated in six reading frames using the Transeq algorithm of EMBOSS version 6.6.0.0 (ftp://emboss.open-bio.org/pub/EMBOSS/). To find and confirm the presence of the NB-ARC domain, the sequences downloaded from NCBI were checked by CDD and aligned using Muscle (http://www.ebi.ac.uk/Tools/msa/muscle/). A Hidden Markov Model (HMM) profile was built using the hmmbuild tool of the HMMER suite (HMMER 3.1, http://hmmer.janelia.org/; Finn et al. 2011) with default parameters. The pipeline followed for discovery of R-genes is described in Fig. 1. The HMM profile was then used to search the six-frame translated reference sequence for the NB-ARC domain using the hmmsearch tool of the HMMER suite. The nucleotide contigs corresponding to the NB-ARC domain were retrieved. To identify the candidate NB-ARC domains in the wheat genome, ab initio gene predictions were carried out using FGENESH (http://www.molquest.com; Salamov and Solovyev 2000) with monocot-specific parameters. The predicted sequences were then individually checked for the presence of precise NB-ARC domains in the Pfam database ver. 28.0 (http://www.sanger.ac.uk/Software/Pfam/). Complete gene models were obtained from hypothetical proteins longer than 200 amino acids, as the expected core NBS domain comprises ~170 amino acids in the Pfam database (http://pfam.sanger.ac.uk/family/NB-ARC).
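The six-frame translation step can be illustrated with a minimal Python sketch. This is not the Transeq implementation used in the study; the function names (reverse_complement, translate, six_frame) are hypothetical, the standard genetic code is assumed, and the reverse-frame labels simply denote offsets on the reverse complement, which may differ from Transeq's own frame numbering.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; ambiguous codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return a dict of the three forward and three reverse translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame("ATGGCCTAA").items():
        print(label, protein)
```

Each translated frame would then be the input to a profile search such as hmmsearch; stop codons are kept as "*" so that downstream tools can decide how to handle them.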

Fig. 1

Pipeline followed for data mining, identification and annotation of the NB-ARC encoding R-genes. Initially, the pipeline started with 8,818,642 T. aestivum genome contigs

Annotation of the NB-ARC predicted sequences

The sequences fulfilling the above mentioned criteria were annotated by Blast2GO (B2G; Conesa et al. 2005) software. Transmembrane domains were predicted by TMHMM Server ver. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and coiled-coil motifs were predicted by the COILS program (Lupas et al. 1991). Various motifs were predicted using DREME (Bailey 2011), as this program executes discovery of short regular expression motifs enriched in a given dataset. The motifs were annotated using GOMo software that scans promoters using nucleotide motifs to determine if any motif is significantly associated with genes linked to one or more Gene Ontology (GO) terms (Buske et al. 2010).

Expression studies of the predicted NB-ARC sequences

To identify the role of the NB-ARC predicted sequences, expression analysis based on the abundance of reads pertaining to a particular library was performed. For the expression study, reads from four SOLiD SAGE libraries (details of plant inoculation and library preparation using near-isogenic wheat lines are described in Singh et al. 2012 and Chandra et al. 2016, respectively), namely S-M (Susceptible HD2329 + Mock inoculated), S-PI (Susceptible HD2329 + P. triticina inoculated), R-M (Resistant HD2329 + Lr28 + Mock inoculated) and R-PI (Resistant HD2329 + Lr28 + P. triticina inoculated), were used to decipher the expression profiles of predicted NB-ARC domain-containing genes. Comparisons of S-M vs. S-PI, R-M vs. R-PI and S-PI vs. R-PI libraries were performed by matching the sequences of predicted NB-ARC domain-containing genes with the reference.

Total mapped reads were obtained by mapping high-quality reads of each individual library to the reference sequences. Gene expression between each pair of libraries was evaluated using Reads Per Kilobase of transcript per Million mapped reads (RPKM). Read counts of a particular contig were used to determine the expression value. RPKM was chosen as it measures even sparingly expressed transcripts by considering read counts as the central factor. Contigs with an absolute average fold change ≥2 were considered to be differentially expressed. The other two criteria considered were a false discovery rate (FDR)-corrected P value <0.05 and a difference in absolute value >10 (Benjamini and Hochberg 1995). All available reads were mapped to contigs containing predicted NB-ARC domains using the following parameters: (1) minimum read length fraction 0.9, (2) minimum similarity 0.95 and (3) allowing up to ten nonspecific matches. Contigs with uniquely mapped reads were determined allowing a maximum of two mismatches. Kal’s test was performed to compute statistical differences in gene expression in CLC Genomics Workbench version 9.0 (CLC, Qiagen GmbH, Germany) (Kal et al. 1999).
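The expression measure and the filtering criteria described above can be sketched in Python as follows. This is an illustration only and not the CLC Genomics Workbench implementation: the function names are hypothetical, the fold-change criterion is assumed to mean an absolute fold change of at least 2, the signed fold-change convention and the handling of zero values are assumptions, and the Benjamini-Hochberg step-up procedure is written out for a plain list of P values.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def signed_fold_change(x, y):
    """Fold change of y relative to x; negative when y is lower (assumed convention)."""
    if x == 0 and y == 0:
        return 0.0
    if x == 0:
        return float("inf")
    if y == 0:
        return float("-inf")
    return y / x if y >= x else -(x / y)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(x, y, fdr_p, min_fold=2.0, min_diff=10.0, max_fdr=0.05):
    """Apply the three criteria: fold change, absolute difference and FDR."""
    return (
        abs(signed_fold_change(x, y)) >= min_fold
        and abs(y - x) > min_diff
        and fdr_p < max_fdr
    )


if __name__ == "__main__":
    # Hypothetical contig: 500 reads, 2 kb long, 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
    print(is_differential(10.0, 40.0, fdr_p=0.01))
```

Combining a fold-change threshold with a minimum absolute difference guards against calling contigs with very low expression differential merely because a small change in read counts produces a large ratio.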

Comparison of the NB-ARC contigs within the Poaceae family

Orthologous NB-ARC clusters in Hordeum vulgare, Zea mays, Oryza sativa (indica), Brachypodium distachyon, Sorghum bicolor and NB-ARC sequences of T. aestivum predicted in this study were visualized using the web platform OrthoVenn (http://probes.pw.usda.gov/OrthoVenn; Wang et al. 2015). The web tool, chromoWIZ was employed to visualize genomic positions of the identified genes in T. aestivum (http://pgsb.helmholtzmuenchen.de/cgibin/db2/chromowiz;genome=Triticumaestivum; Nussbaumer et al. 2014).

qRT-PCR assays of the NB-ARC predicted sequences in response to leaf rust infection

Wheat near-isogenic lines (NILs) HD2329 (seedling leaf rust susceptible, infection type 3+) and HD2329 with Lr28 [seedling leaf rust resistant (Nest Immune, infection type 0-0;)], infected with urediniospores of the leaf rust pathogen P. triticina pathotype 77-5 or mock inoculated with talc, were used for the qRT-PCR experiment. The Lr28 resistance gene from Aegilops speltoides had been introgressed into hexaploid wheat (Cherukuri et al. 2005). Puccinia triticina pathotype 77-5, the most prevalent and widespread leaf rust pathotype in the Indian subcontinent, was used for inoculation. The pathogen was maintained as single spore-derived cultures on seedlings of the highly susceptible wheat ‘Agra Local’. Wheat seeds were germinated on sterile composite soil (peat, sand, soil, 1:1:1), grown to the three-leaf stage (~12 days after germination) in a phytotron (22 °C, RH 80%, 16 h light at 300 lx and 8 h of darkness) and were either mock- or pathogen-inoculated. Treatment combinations included talc on susceptible HD2329 (susceptible negative control; S-M), talc + P. triticina urediniospores on susceptible HD2329 (susceptible pathogen infected; compatible interaction S-PI), talc on resistant HD2329 + Lr28 (resistant negative control; R-M) and talc + P. triticina urediniospores on HD2329 + Lr28 (resistant pathogen infected; incompatible interaction R-PI) (Chandra et al. 2016). Leaf samples from five different plants of each treatment were collected at different time points (0, 12, 24, 48, 72 and 168 h post inoculation; hpi). Total RNA was extracted using TRI REAGENT (Molecular Research Center, Inc., USA) following the manufacturer’s protocol and subsequently treated with Deoxyribonuclease I (Fermentas GmbH, Germany). Complementary DNA was prepared from 2 μg of total RNA using the Transcriptor First Strand cDNA Synthesis Kit (Roche Diagnostics GmbH, Germany).

qRT-PCR was carried out for eight NB-ARC domain-containing sequences (1ALN, 1BLN, 2ALC, 2DLC, 4ALC, 6BLN, 7ALN and 7DLN) selected on the basis of high homology to the wheat reference sequence and high expression levels during incompatible interactions in the SAGE dataset. Primers for these eight sequences were designed using Primer Express ver. 2.0 software (Applied Biosystems, USA) (Table 1).

Table 1 List of primers for qRT-PCR of NB-ARC domain-containing sequence and reference gene wheat GAPDH and amplification for cloning

The qRT-PCR experiment was performed on a 7500 Real Time PCR system (Applied Biosystems, USA). The wheat glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene was used as the internal control to normalize the real-time amplification data. The optimized concentrations of template (200 ng) and primer pairs (400 nM for all eight target genes and the reference gene) were used in 25 µl total reaction volumes containing 12.75 µl SYBR Green JumpStart Taq ReadyMix with ROX (Sigma-Aldrich, Missouri, USA). The samples were assessed using three biological replicates, each with three technical replicates, along with two no-template negative controls. Amplification was carried out at 94 °C for 2 min, followed by 40 cycles of 94 °C for 15 s and 58 °C for 1 min, and then by melting curve analysis to ensure that each amplicon was a single product. Instrument operation, data acquisition and processing were performed using the Sequence Detection System v 1.2.2 software (Applied Biosystems, USA). Fluorescence signals at each polymerization step were captured and threshold cycle (Ct) values were obtained from the amplification curves (Rieu and Powers 2009). Gene expression levels were calculated at the same time points with respect to expression of the reference gene GAPDH using the 2^(−ΔΔCt) method (Paolacci et al. 2009). Data from susceptible mock-inoculated plants at 0 hpi were used as the calibrator in this study for relative quantification of gene expression in other samples.
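The relative quantification described above can be written out as a short Python sketch. It assumes the usual comparative Ct formulation with roughly 100% amplification efficiency for both target and reference; the function names and the Ct values in the usage example are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target minus mean Ct of the reference gene (e.g. GAPDH)."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target, sample_reference,
                        calibrator_target, calibrator_reference):
    """Fold change of a sample relative to the calibrator by the 2^(-ddCt) method.

    Each argument is a list of technical-replicate Ct values.
    """
    d_sample = delta_ct(sample_target, sample_reference)
    d_calibrator = delta_ct(calibrator_target, calibrator_reference)
    dd_ct = d_sample - d_calibrator
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: a treated sample versus the calibrator
    # (susceptible mock-inoculated plants at 0 hpi in the text).
    fold = relative_expression(
        sample_target=[22.1, 21.9, 22.0],
        sample_reference=[18.0, 18.1, 17.9],
        calibrator_target=[25.0, 25.1, 24.9],
        calibrator_reference=[18.0, 18.0, 18.0],
    )
    print(round(fold, 2))
```

By construction the calibrator itself has a relative expression of 1, so values above 1 indicate induction relative to susceptible mock-inoculated plants at 0 hpi.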

Plant materials, amplification and cloning of the NB-ARC genes

The two NB-ARC domain-containing sequences showing the highest expression in real-time PCR, 6BLN and 7DLN, were considered for further studies. Total cellular RNA was isolated from two-week-old plants of wheat cultivar HD2329 and converted to cDNA as mentioned earlier. Amplification reactions were assembled in 20 µl containing 100 ng of cDNA and 10 pM of each primer (Table 1). Amplification was carried out at 95 °C for 5 min followed by 30 cycles of denaturation at 94 °C for 45 s, annealing at 55 °C for 1 min and extension at 72 °C for 2 min, with a final extension at 72 °C for 30 min. Amplified DNA was gel-purified and cloned by T/A cloning into pTZ57R/T (Thermo Scientific, Lithuania), and plasmids from five independent clones were sequenced from both ends.

In silico analysis of the NB-ARC genes

The forward and reverse complement orientations of the obtained sequences were compared and the best possible contigs were assembled and analyzed using the sequence analysis software CLC Genomics Workbench 9.0. The full-length amino acid sequences, deduced from the nucleotide sequences, were BLAST searched at NCBI. Multiple sequence alignment (MSA) of the cDNA sequences and 42 other NB-ARC sequences of the Triticeae tribe downloaded from NCBI was performed using CLC Genomics Workbench. A phylogenetic tree was constructed from the sequence alignment by the neighbor-joining method, also in CLC Genomics Workbench, to classify the NB-ARC genes. Tree topology confidence estimates were derived through bootstrap analysis of 1000 replicates.

Open reading frames were obtained in all six possible reading frames using the ORF finder at NCBI. Various motifs from the deduced protein sequences were predicted using PROSITE and conserved domains were identified using the Pfam database (Finn et al. 2016). BLAST searches using the full-length amino acid sequences were carried out to find whether any homologous template with an available three-dimensional crystal structure existed at the Protein Data Bank (PDB) that could be used to model the complete sequence. The BLAST search returned two hits: APAF-1 bound to ADP and an uncharacterized LRR receptor-like serine/threonine-protein kinase with ARC domain of Triticum aestivum. Both templates were used for modeling the 6BLN and 7DLN sequences. The region containing the ARC motif from the full-length sequences was used in MODELLER release 9.15 (Eswar et al. 2006). An alignment file containing the aligned target sequences and a script file containing the commands required to build tertiary models of the target sequences were used as input files. The most reliable model was selected based on Discrete Optimized Protein Energy (DOPE) score and MODELLER Objective Function (MOF) values. The obtained model was validated using the Ramachandran plot at PDBsum (https://www.ebi.ac.uk/pdbsum/).

Results

Identification and classification of R-genes in bread wheat

Several R-genes were detected in the allohexaploid wheat genome in the present study. The gene list, representing all major classes of R-genes, with their PRGdb accession numbers is provided as Supplementary Table 1. The BLAST searches returned 531,734 hits to the reference genome at Ensembl Plants (Table 2) and 259 hits to the de novo assembled wheat contigs (Table 3). Since complete information on the wheat genome is not available and the wheat genome contains high levels of repetitive sequences, many of the predicted sequences could not be mapped to the resources currently available at Ensembl Plants. Among the R-genes identified from the genomic sequences, NL (35.0%) was the largest class, followed by CNL (30.7%) and Unknown (19.3%), while the T class (0.2%) was the least represented in the genome (Fig. 2). The 20 most abundant accessions belonging to each class of R-gene are provided in Supplementary Table 2. The maximum number of hits to the de novo assembled contigs belonged to the Unknown class (170), followed by NL (29) and CNL (22). The two most predominant known classes in both the Ensembl Plants and de novo assembled contigs were NL and CNL; therefore, we concentrated on these two classes for further downstream processing and analysis.

Table 2 Number of resistance-like genes belonging to each class and their localization on wheat chromosomes
Table 3 Distribution of resistance genes belonging to different class in de novo transcriptome assembly of wheat
Fig. 2

Distribution of resistance genes in the wheat genome N (contains only NBS domain and lack LRR), CNL (contains a central nucleotide-binding (NB) subdomain as part of the larger entity, the NB-ARC domain), CN (contains coiled-coil and NBS domains), Kinase (Kinase domain involved in resistance process), Mlo (Mlo-like resistant proteins), Other (miscellaneous set of R proteins that do not fit into any of the known classes), NL (NBS domain at N-terminal and LRR at the C-terminal, and lack coiled-coil domain), RLK (receptor like Kinases, consist of an extracellular leucine-rich repeat region), RLK-GNK2 (RLK class with additional GNK2 domain), RLP (receptor-like proteins consists of a leucine-rich receptor-like repeat, a transmembrane region of ~25 amino acids, and a short cytoplasmic region, with no kinase domain), RPW8-NL (contains NBS, LRR and RPW8 domains), T (contains TIR domain only, lack LRR or NBS) and Unknown (contains possible resistant genes, but do not fit in any of the mentioned classes)

Identification of the NB-ARC genes in the genome of bread wheat

To identify the NB-ARC domain-containing sequences in wheat, similarity-based searches were implemented. The steps involved in data mining are summarized in Fig. 1. A total of 405 NB-ARC-containing sequences of the Triticeae tribe were downloaded from NCBI and checked individually for the NB-ARC domain in the CDD database, and redundancies were removed using CD-HIT, which resulted in 365 sequences. Mining of the translated wheat genome with the HMM profile of these 365 NB-ARC domain-containing sequences returned 1320 contigs. Among these, 334 contigs did not yield any ab initio gene prediction with FGENESH, while 986 yielded at least one gene model per contig. We retained 604 sequences that were longer than 200 amino acids and were in agreement with the distinctive features of the NB-ARC domains after Pfam analysis (default e value = 1.0); the remaining 382 sequences did not pass the filter. The final list of the NB-ARC domain-containing sequences is provided in Supplementary Table 3.
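The length criterion used in this filtering step can be sketched in Python as below. Only the ">200 amino acids" rule is illustrated; the Pfam domain check is not reproduced, and the helper names (read_fasta, filter_by_length) and the assumption that predicted proteins are available as FASTA text are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as identifier, or a placeholder.
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_by_length(records, min_length=200):
    """Keep predicted proteins strictly longer than min_length residues."""
    return {n: s for n, s in records.items() if len(s) > min_length}


if __name__ == "__main__":
    # Hypothetical predicted proteins of 250 and 120 residues.
    example = ">model_1\n" + "M" * 250 + "\n>model_2\n" + "M" * 120 + "\n"
    kept = filter_by_length(read_fasta(example))
    print(sorted(kept))
```

The threshold is set above the ~170 residues expected for the core NBS domain, so that retained models are long enough to contain a complete domain.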

Annotation of the NB-ARC containing contigs

The NB-ARC domain-containing contigs were functionally annotated through Blast2GO. Out of the total 604 predicted sequences, 511 sequences (84.60% overall) had significant matches in GOSlim, 70 sequences were annotated with InterProScan, 8 sequences were annotated with GO mapping, 13 sequences returned no hits and the remaining two sequences could not be analyzed. Functional annotation categorized GO terms into three main categories: cellular component (Fig. 3), biological process (Fig. 4), and molecular function (Fig. 5) (Joslyn et al. 2004). The majority of annotated genes encoded proteins that function in the cytoplasm, within a cell or an organelle, that participate in metabolic and cellular processes and that have mostly binding activity followed by catalytic activity. Most of the identified sequences were annotated as R-genes (Supplementary Table 4). These R-genes included disease resistance protein Resistance Gene Analogs (RGA 1, 2, 3 and 4), RPP13, RPM1, TSN1, RPP8-like protein, RPS2, NBS-LRR disease resistance protein homologue, NB-ARC domain-containing expressed, powdery mildew resistance protein pm3 and 3b, F-box/kelch-repeat protein skip11, blast resistance protein, stripe rust resistance protein Yr10, RP1-like protein and E3 ubiquitin-protein ligase SINA-like.

Fig. 3

Distribution of GO terms under cellular component category

Fig. 4

Distribution of GO terms under biological process category

Fig. 5

Distribution of GO terms under molecular function category

The enzyme code distribution revealed that most of the sequences were assigned hydrolase activity, followed by oxidoreductase and isomerase activity (Fig. 6). Chitinases and glucanases are important hydrolase enzymes with roles in plant defense, whereas oxidoreductases are vital enzymes required for many metabolic processes. Homology distribution of the annotated NB-ARC domain-containing sequences showed the maximum number of hits to Aegilops tauschii, followed by O. sativa and T. urartu, since complete genome sequences of these organisms are available in the public domain (Supplementary Fig. 1). To find whether the protein sequences contained the CC domain, the COILS program was used, which predicted CC domains in 89 sequences (Supplementary Table 5).

Fig. 6

Distribution of enzyme classes within the NB-ARC predicted sequences

To discriminate between soluble R-proteins and membrane-bound R-proteins as well as for their topology prediction, the HMM approach using TMHMM was employed that predicted 73 sequences with transmembrane helices (Supplementary Table 6). GOMo analysis revealed 53 motifs with significant GO terms; the top five GO predictions were considered for each identified motif (Supplementary Table 7). Transcription factor activity, endomembrane system, nucleotide binding and protein serine/threonine-kinase activity were some of the important GO terms predicted for the identified motifs.

Expression study of the NB-ARC containing contigs in SAGE libraries

The expression pattern of the NB-ARC domain-containing sequences during compatible interactions (S-M vs. S-PI) showed 64 differentially expressed sequences, of which 32 were upregulated in S-M and 32 had higher expression in S-PI. Comparison during incompatible interactions (R-M vs. R-PI) revealed 189 differentially regulated sequences, of which 102 were upregulated in R-M and 87 were upregulated in R-PI. Of these 87 sequences, 36 were exclusively expressed in R-PI. Finally, on comparing S-PI vs. R-PI, 209 sequences were found to be differentially expressed. Of these, 131 sequences had higher expression in S-PI, of which 35 were exclusively expressed in S-PI, and 78 sequences had higher expression in R-PI, of which 23 were exclusively expressed in R-PI (Supplementary Table 8). Annotations of the contigs exclusively expressed in R-PI included disease resistance protein NBS-LRR, RGA (2, 3, and 4), disease resistance RPP13-like protein, RPM1 and RP1-like protein.

Comparison of the NB-ARC sequences in Poaceae family and genomic distribution

Analysis of orthologous clusters in whole genomes is an important component of comparative genomics. The identification of overlap between orthologous clusters might illuminate the evolution and function of the NB-ARC domains across important crop species of the Poaceae family. Gene clusters enriched in the five grass genomes and the NB-ARC predicted sequences of T. aestivum detected in the present study were identified by OrthoVenn (Fig. 7). The sequences were compared pairwise with an e value 1e-5. The Edward’s Venn diagram displayed 134 orthologous clusters that were shared between the NB-ARC predicted sequences of T. aestivum and other five grass species. T. aestivum shared 69, 63, 58, 37 and 56 orthologous clusters with O. sativa (indica), B. distachyon, S. bicolor, Z. mays and H. vulgare, respectively. The diagram illustrates sharing of 17 gene clusters by all six species suggesting lineage-specific conservation even after speciation. Additionally, 19 clusters were found to be specific to T. aestivum. Some of the lineage-specific clusters were annotated to have roles in plant disease resistance. The histogram in the figure represents the number of clusters formed by each species. H. vulgare formed 15,178 clusters while S. bicolor, Z. mays, B. distachyon, O. sativa indica and T. aestivum formed 21,307, 20,396, 19,182, 21,114 and 134 clusters, respectively. The graph at the bottom shows 17 clusters that were found in all six species and 11,616, 5135, 2589, 3362 and 4833 clusters that were found in five, four, three, two and one species, respectively.

Fig. 7

Venn diagram showing distribution of the NB-ARC orthologous clusters shared among B. distachyon, O. sativa indica, S. bicolor, H. vulgare, Z. mays with that of T. aestivum. The cluster number in each component is listed (details mentioned in text)

The chromosome arm-specific distribution showed that chromosome arm 4AL had the highest number of NB-ARC hits (74), followed by 7BL (42) (Fig. 8). On comparing the homeologous groups of wheat chromosomes, 152 NB-ARC domain-containing sequences were detected in group 7, which carried the most. At the sub-genome level, the NB-ARC containing genes were distributed fairly evenly among the A, B and D sub-genomes, which contained 217, 208 and 179 NB-ARC sequences, respectively. A heat map visualization of gene density in the wheat genome, in which values are represented as colors, is shown in Supplementary Fig. 2.

Fig. 8

Distribution of the identified 604 wheat NB-ARC-encoding candidate genes across T. aestivum chromosomes and chromosome arms whose sequences are currently available at Ensembl Plants
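
The arm-level counts can be rolled up to sub-genome and homeologous-group totals by parsing the chromosome arm labels. The sketch below assumes labels of the form group digit, sub-genome letter and optional arm (for example 4AL); only the 4AL and 7BL counts come from the text, and the remaining entries are hypothetical placeholders.

```python
import re
from collections import Counter

# Assumed label format: homeologous group (1-7), sub-genome (A/B/D),
# optional arm (S or L), e.g. "4AL" or "3B".
ARM_PATTERN = re.compile(r"^([1-7])([ABD])([SL]?)$")


def tally(arm_counts):
    """Aggregate per-arm hit counts by sub-genome and by homeologous group."""
    by_subgenome = Counter()
    by_group = Counter()
    for arm, n in arm_counts.items():
        match = ARM_PATTERN.match(arm)
        if match is None:
            raise ValueError(f"unrecognized chromosome arm label: {arm!r}")
        group, subgenome, _arm = match.groups()
        by_subgenome[subgenome] += n
        by_group[int(group)] += n
    return by_subgenome, by_group


# 4AL and 7BL are the counts reported in the text; 7DS and 1AS are hypothetical.
example = {"4AL": 74, "7BL": 42, "7DS": 10, "1AS": 5}
by_subgenome, by_group = tally(example)
print(dict(by_subgenome), dict(by_group))

# Consistency check on the reported sub-genome totals (A, B, D).
reported = {"A": 217, "B": 208, "D": 179}
print(sum(reported.values()))  # 604, the number of candidate genes
```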

Up-regulation of the NB-ARC genes during leaf rust infection of wheat

To understand the function of the NB-ARC domain-containing genes at the transcript level, the spatial and temporal expression patterns were compared between mock- and pathogen-inoculated susceptible and resistant wheat NILs. During the incompatible interaction, expression was induced from 12 hpi onwards in all eight NB-ARC domain-containing sequences (Fig. 9). The expression pattern differed considerably among the eight transcripts, indicating their involvement at different time points during infection progression. Maximum expression was found at 24 and 48 hpi, which corresponds to the phase of proliferation and spread of secondary hyphae of the pathogen to adjacent cells (Hu and Rijkenberg 1998). At 72 hpi, ramification of secondary infection hyphae leads to collapse of mature surface appressoria. The susceptible plants, in contrast, showed a much slower increase in expression of the NB-ARC domain-containing genes than the resistant plants. The mock-inoculated resistant (R-M) and susceptible (S-M) plants exhibited negligible changes in expression in the absence of pathogen pressure. This distinct spatio-temporal expression pattern of the NB-ARC domain-containing genes indicates their positive role in the response to P. triticina during incompatible and compatible leaf rust interactions in wheat plants.

Fig. 9

qRT-PCR analyses of three NBS domain-containing sequences. Leaf tissues were collected from pathogen-inoculated and mock-inoculated plants of resistant and susceptible NILs at 0, 12, 24, 48, 72 and 168 hpi. Relative gene expression was calculated by the comparative ΔΔCt method. GAPDH was used as the internal reference gene. Data are plotted as the mean ± SD of three biological replicates, each with three technical replicates
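
The comparative ΔΔCt calculation named in the caption can be sketched as follows. This is an illustrative outline, not the authors' analysis script: it assumes roughly 100% amplification efficiency for target and reference genes, and the choice of calibrator sample (for example, the 0 hpi time point) and all Ct values shown are hypothetical.

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the comparative ddCt method: 2 ** -(ddCt).

    ct_target, ct_ref:          Ct of target and reference gene in the sample
    ct_target_cal, ct_ref_cal:  the same Ct values in the calibrator sample
    """
    delta_ct_sample = ct_target - ct_ref
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** -delta_delta_ct


def summarize(replicates):
    """Mean and sample SD of fold change across biological replicates.

    Each replicate is a tuple (ct_target, ct_ref, ct_target_cal, ct_ref_cal),
    where each Ct is assumed to be already averaged over technical replicates.
    """
    folds = [fold_change(*rep) for rep in replicates]
    return mean(folds), stdev(folds)


# Hypothetical Ct values for three biological replicates.
replicates = [
    (24.0, 18.0, 26.0, 18.0),
    (25.0, 18.0, 26.0, 18.0),
    (23.0, 18.0, 26.0, 18.0),
]
avg, sd = summarize(replicates)
print(f"fold change = {avg:.2f} ± {sd:.2f}")
```

A sample whose ΔCt is two cycles lower than that of the calibrator corresponds to a four-fold increase in relative expression.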

Analysis of the NB-ARC gene structure and organization

To gain insight into the fine structure of wheat NB-ARC genes, the deduced amino acid sequences obtained in the present study were compared with 42 NB-ARC sequences from different members of Poaceae (Supplementary Fig. 3). The phylogenetic tree indicated that the 6BLN and 7DLN sequences originated from a common ancestor, as they belong to the same immediate clade. The other sequences sharing a common ancestor with 6BLN and 7DLN are NBS-LRR-like disease resistance proteins of Eleusine coracana and Setaria italica, an RPP13-like protein of B. distachyon, the RPP8 protein of Aegilops tauschii and a predicted protein of H. vulgare. The phylogenetic tree thus shows that the sequences identified in the present study are closely related to disease resistance proteins from other crops of the Poaceae family. The two sequences were deposited in GenBank, but as they shared more than 99.8% identity, a single accession number, KY026053, was assigned.
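
A pairwise identity figure such as the one quoted above can be computed from two aligned sequences. The sketch below is a generic illustration that assumes the sequences are already aligned to equal length with "-" as the gap character; the short peptides are hypothetical, and the denominator convention (all columns except double gaps) is one of several in common use and may differ from the one used for the reported value.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; all other columns
    count towards the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptides differing at one position.
print(percent_identity("MKTAYIAK", "MKTAYIAR"))
```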

The translated protein sequences (consisting of 236 amino acids) had two major motifs, the NB-ARC domain (consisting of 89 amino acids, from position 147 to 236) and the protein kinase C phosphorylation domain (Fig. 10). The derived full-length amino acid sequences of 6BLN and 7DLN were searched using BLAST against sequences for which a three-dimensional crystal structure was available in PDB. Pairwise alignments in the BLAST search showed homology with two proteins: APAF-1 bound to ADP (human apoptotic protease-activating factor 1, which contains the NB-ARC region) (PDB ID-1z6tA) of 261 amino acids and an uncharacterized LRR receptor-like serine/threonine-protein kinase with an ARC domain of T. aestivum (PDB ID-4r5cA) of 223 amino acids. The 3D structures of the predicted ARC domain were modelled and the model with the lowest DOPE score (−21,741.10156) was selected. The 3D structure displays the α-helical R protein motif in between the β sheets of APAF-1 and CED-4, a characteristic of STAND proteins (Supplementary Fig. 4a). A Ramachandran plot was used to validate the resultant structures (Supplementary Fig. 4b). The output of PDBsum is summarized at the bottom of Supplementary Fig. 4. A good-quality model is expected to have at least 90% of its residues in the most favored regions; therefore, the prepared model, with 90.33% of residues in the most favored regions, was considered acceptable.

Fig. 10

Deduced amino acid sequence of the wheat NB-ARC showing specific position of ARC domain and other detected motifs

Discussion

In this study, a pipeline was followed for identifying and filtering R-genes from 8,818,642 wheat contigs accessible at Ensembl Plants. Triticeae-specific sequences containing the NB-ARC domain were used to build an HMM profile. A total of 1320 hypothetical NB-ARC-containing contigs were identified by HMMER. The stringent parameters used in the gene prediction and filtration steps reduced the total number of NB-ARC-containing contigs to 604, which still exceeds the numbers previously reported: 580 for wheat (Bouktila et al. 2014), 600 for rice (Shang et al. 2009), 191 for H. vulgare (International Barley Genome Sequencing Consortium 2012), 126 for B. distachyon (Tan and Wu 2012) and 593 for T. urartu (Ling et al. 2013).
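
The hit-filtering step of such a pipeline can be sketched as below. This is an assumption-laden illustration rather than the procedure used here: it supposes that the search was run with hmmsearch and its per-target table output (the --tblout option), the E-value threshold of 1e-5 is an arbitrary example, and the contig names, profile name and scores in the sample are hypothetical.

```python
# Hypothetical excerpt of an hmmsearch --tblout file. Data columns are
# whitespace-separated: target name, accession, query name, accession,
# full-sequence E-value, score, bias, then best-domain and domain-count fields.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
contig_001 - NBARC_Triticeae - 1.2e-40 140.1 0.1 1.5e-40 139.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical hit
contig_002 - NBARC_Triticeae - 3.0e-02 12.3 0.0 4.1e-02 11.9 0.0 1.0 1 0 0 1 1 0 0 weak hit
contig_003 - NBARC_Triticeae - 8.0e-12 45.6 0.2 9.0e-12 45.4 0.2 1.1 1 0 0 1 1 1 1 hypothetical hit
"""


def filter_hits(tblout_text, max_evalue=1e-5):
    """Return {target name: best full-sequence E-value} for hits that pass.

    max_evalue is an illustrative threshold, not the value used in the study.
    """
    hits = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


passing = filter_hits(SAMPLE_TBLOUT)
print(sorted(passing))
```

Contigs that pass this kind of threshold would then go on to the gene prediction and filtration steps described above.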

We were able to annotate 84.60% of the total identified NB-ARC sequences, which shows the strength of our analysis. In the molecular function category, nucleotide binding followed by protein binding were the major GO terms. As the name NB suggests, nucleotide-binding activity is associated with the domain, which binds ATP and GTP in various organisms (Saraste et al. 1990; Traut 1994). In the cellular component category, cytoplasm was the most dominant term. In the biological process category, the major GO term represented was DNA metabolic process, which may be associated with the nucleotide-binding activity of the NB-ARC domain.

To facilitate characterization of the NB-ARC domain-containing genes of bread wheat putatively involved in leaf rust resistance, the predicted protein sequences obtained in this study were compared for homology with those of other Poaceae members. Orthologous genes in different species have descended vertically from a single gene in the last common ancestor (Fitch 1970). The NB-ARC domain-containing wheat sequences displayed 69 and 63 orthologous clusters with O. sativa (indica) and B. distachyon, respectively. Moreover, the distribution of the NB-ARC-encoding loci across the wheat genome was highly uneven. Such uneven distribution has been reported previously in other plant genomes, such as grapevine and poplar (Yang et al. 2008), Brassica rapa (Mun et al. 2009) and Solanum tuberosum (Lozano et al. 2012). This may be due to the clustered nature of the NBS-LRR genes, which assists gene-duplication-mediated evolution (Friedman and Baker 2007). The chromosome arm 4AL carried the largest repertoire of potentially functional NB-ARC loci, in accordance with Bouktila et al. (2014). Most NB-ARC loci were found in homeologous group 7 of wheat. Similarly, chromosome 7A was highlighted as containing the most genes conferring powdery mildew (Pm) resistance (Marone et al. 2013).

NB-ARC-mediated resistance has been detected against various fungal, oomycete and bacterial pathogens with biotrophic or hemibiotrophic lifestyles, and several R-genes conferring resistance to these organisms have been characterized from numerous plants (Jones and Dangl 2006). The qRT-PCR analysis of the predicted NB-ARC genes in wheat infected with P. triticina revealed up-regulation in response to the pathogen, indicating a role for the NB-ARC domains in leaf rust resistance. The results obtained in the present study are consistent with those of Li et al. (2013), who reported elevated expression of an NBS-LRR gene, PnAG3, in fruit tissues of an Aspergillus flavus-resistant peanut variety. A pepper NBS-LRR gene, Bs2, overexpressed in transgenic tomato resulted in enhanced resistance to bacterial spot disease as well as a considerable increase in yield (Horvath et al. 2012). Higher expression of the RPP8 gene, which confers downy mildew resistance in Arabidopsis, was also reported in response to Hyaloperonospora arabidopsidis infection and salicylic acid (Mohr et al. 2010). The CC-NBS-LRR group of genes also has a critical function in disease resistance, as overexpression of Lr10, a CC-NBS-LRR gene, increased leaf rust resistance in wheat (Feuillet et al. 2003).

In silico analysis of the NB-ARC genes should be considered preliminary, because correlating a predicted gene in any organism with its phenotype requires experimental validation. Functional redundancy is expected in hexaploid wheat owing to allopolyploidization, and such redundant genes are often lost or converted to pseudogenes, thereby avoiding a fitness cost in host plants (Tian et al. 2003). The present study is a comprehensive effort to find NB-ARC domain-containing genes genome-wide in hexaploid wheat. This will help in the isolation, cloning and structure–function analysis of the NB-ARC and other R-genes, thereby providing opportunities for improving disease resistance in wheat. In conclusion, the results of this study provide critical information on the NB-ARC group of R-genes present in the wheat genome that are related to plant defense mechanisms.

Author contribution statement

KM conceived and designed the research, and MK supervised it. SC, AZK and ZA executed the research, while GR and VK assisted with the in silico data analysis. SC, MK and KM wrote the manuscript. All authors read and approved the manuscript.