Introduction

Pathogen recognition is the first step in the process that triggers plant resistance responses, and is usually mediated by single dominant resistance genes (R genes). Each of these gene products interacts directly or indirectly with the product of a corresponding avirulence (Avr) gene in a pathogen (Flor 1971; Keen 1990). Many R proteins from different dicot and monocot plant species, which confer resistance to a wide variety of pathogens, share several conserved motifs (Hammond-Kosack and Jones 1997). Based on the deduced structure of their products, R genes can be classified into three main groups. Members of the major group encode proteins with a nucleotide-binding site (NBS) domain followed by a leucine-rich repeat (LRR) region. This group can be further subdivided into two classes based on the nature of the N-terminal region. Proteins in the first class show homology to Drosophila Toll or the human interleukin-1 receptor (TIR), whereas proteins in the second class have a coiled-coil (CC) domain (Hammond-Kosack and Jones 1997; Pan et al. 2000). The second group of R genes encodes proteins with only LRRs (Dixon et al. 1998). The third group, represented exclusively by Pto (Martin et al. 1993), displays a serine-threonine kinase (S/T KINASE) domain. In addition, one example of an R protein with an LRR followed by an S/T KINASE has been reported (Song et al. 1995). As new genes with novel motifs are cloned, new classes of R genes are emerging; this is the case for the genes RPW8 (Xiao et al. 2001) and Rpg1 (Brueggeman et al. 2002).

Isolation of R genes has historically involved map-based cloning or transposon tagging, both of which are labor-intensive and expensive strategies. The common features shared by R proteins have led to new cloning strategies. Since the late 1990s, PCR primers corresponding to highly conserved amino acid sequences of the NBS domain have been used to amplify resistance gene analog (RGA) fragments from various plant species (Wang et al. 2001). Many of these RGAs appear to be linked to previously described resistance loci or QTLs. However, R genes are often members of multigene families, frequently organized in clusters, and this PCR strategy generally fails to identify the functional R genes within a given cluster (Graham et al. 2000).

Sugarcane is an economically important crop. However, analysis of its genome has lagged behind, compared to other important grass species, mostly due to its genetic complexity. Modern sugarcane cultivars are highly polyploid (2n=100–130), derived from interspecific hybridizations between the domesticated sugar-producing species Saccharum officinarum L. (2n=80) and the wild species S. spontaneum L. (2n=40–128). They thus represent a particular challenge for breeding, genetics and gene cloning purposes (Butterfield et al. 2001; D'Hont and Glaszmann 2001; Grivet and Arruda 2001).

The Brazilian Sugarcane EST Sequencing Project (SUCEST) database, with 291,689 EST sequences, provides an invaluable source of information (Vettore et al. 2001). In this paper, we report the results of a search for resistance gene analogs (RGAs) using the SUCEST database. We have mapped these RGAs on the sugarcane reference genetic map (Grivet et al. 1996; Hoarau et al. 2001 and unpublished data), in order to investigate their genomic distribution and their relationship with disease resistance loci in sugarcane. Fifty-five single-sequence-repeat (SSR) loci were also mapped to allow the classification of the different haplotypes into homology groups. In addition, we compared the sequences of various members of two NBS/LRR resistance gene clusters to those of their orthologs in rice and maize.

Materials and methods

SUCEST database and sequence analysis

The SUCEST database encompasses 291,689 EST sequences derived from 37 different sugarcane cDNA libraries constructed from total RNAs isolated from various tissues, developmental stages and stress conditions including pathogen inoculated seedlings (Vettore et al. 2001). A total of 261,609 sequences have been grouped into 81,223 clusters based on an analysis with the phrap fragment assembly program. Results of comparisons between cluster consensus sequences and GenBank data were available for homology searches (Telles et al. 2001).

The 81,223 clusters were screened to identify RGAs. "NBS-LRR" and "disease resistance" were used as keywords, and the coding sequences of Mi-1.2 (gi3449380), Rpm1 (gi963017), RPS2 (gi549979), Xa21 (gi1122443), Prf (gi1513144), Pto (gi430992), Cf-2.1 (gi1184075), N (gi558887), L6 (gi862905), M (gi1842251), Pti1 (gi3668069), RPR1 (gi4519936), I2 (gi4689223), Hcr2-5D (gi7488988), Hs1 pro-1 (gi1850968), b5 (gi2792210) and Rp1-D (gi5702196) were used as key genes. The genes were chosen to represent a broad range of plants, pathogen specificities, and R protein structures known at the time the searches were carried out. To avoid spurious hits due to the enormous amount of data, a very stringent expectation value of e−50 or better was used.
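The stringency filter described above can be sketched as follows. This is a minimal illustration, not the SUCEST pipeline itself: the `Hit` record, the cluster identifiers and the function name are hypothetical, and the cut-off "e−50" is assumed to mean an expectation value of 1e-50 or smaller.

```python
from typing import Dict, Iterable, NamedTuple

class Hit(NamedTuple):
    cluster_id: str   # hypothetical identifier of an EST cluster
    key_gene: str     # name of the key gene that matched
    evalue: float     # expectation value of the match

E_CUTOFF = 1e-50      # assumed numerical meaning of "e-50 or better"

def best_hits(hits: Iterable[Hit], cutoff: float = E_CUTOFF) -> Dict[str, Hit]:
    """Return, for each cluster, its best (lowest e-value) hit that passes the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > cutoff:
            continue  # not stringent enough, discard
        current = best.get(hit.cluster_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.cluster_id] = hit
    return best
```

Keeping only the best-matching key gene per cluster mirrors the way the cut-off is applied "for the best matching query" in the Results.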

Plant material

The progeny analyzed in this study consisted of 112 individuals obtained by the self-fertilization of cultivar R570; this is a subset of the population used to build an AFLP genetic map by Hoarau et al. (2001). R570 is a rust-resistant cultivar developed by CERF (Centre d'Essai de Recherche et de Formation, Réunion). Rust resistance phenotypes were determined in the field on the island of Réunion, using natural infection as described in Daugrois et al. (1996).

Restriction Fragment Length Polymorphism (RFLP) analysis

The 55 selected clone sequences were amplified by PCR with universal primers (T7, T3, SP6). The PCR products were purified with the GFX PCR DNA and Gel Band Purification Kit (Amersham Pharmacia Biotech) and radioactive random priming labeling was carried out with the Megaprime DNA Labeling System (Amersham Pharmacia Biotech). Genomic DNA extraction, Southern blotting, and hybridizations were performed as previously described by Grivet et al. (1996). The enzymes used for DNA digestion were HindIII, SstI, DraI and EcoRV.

Simple Sequence Repeat (SSR) analysis

The progeny was analyzed with 76 SSRs developed at CIRAD in collaboration with Génoscope (Evry, France) from an enriched library made with DNA from the cultivar R570, and these markers were localized on a reference RFLP map (in preparation). The primers were end-labeled with [γ-33P]ATP, and amplification was performed in an MJ Research PTC 100 Thermal Cycler in 20-µl reaction mixtures containing 50 ng of sugarcane DNA, 0.2 mM dNTP mix, 2 mM MgCl2, 50 mM KCl, 10 mM TRIS-HCl (pH 8.3), each primer at 0.2 µM, and 1 U of Taq polymerase (Eurobio). The samples were denatured at 94°C for 5 min and subjected to 35 cycles of 94°C for 1 min, 46°C–55°C (depending on the SSR primer sequence) for 45 s, and 72°C for 30 s, followed by an extension step for 10 min at 72°C. After the addition of 20 µl of loading buffer (98% formamide, 10 mM EDTA, bromophenol blue, xylene cyanol), the amplified products were denatured at 92°C for 3 min, and 4 µl of each sample was loaded onto a 5% polyacrylamide gel with 7.5 M urea and electrophoresed in 0.5% TBE buffer at 55 W for 1 h 40 min. The gel was dried for 30 min at 80°C and exposed for 4 days to X-ray film (Fuji RX).

Marker scoring, analysis and map construction

Each segregating RFLP and SSR band was scored independently as a dominant marker (presence vs. absence), and the following nomenclature was adopted. For RGAs: RGA, followed by three digits indicating the EST clone number, then three letters indicating the enzyme used to reveal the marker, and a letter indicating the marker. For SSRs: mSSCIR (microsatellite, Saccharum spp., CIRAD), followed by the number of the SSR, and then the letter 'm' followed by a number indicating the marker. Since sugarcane is highly polyploid, only single-dose markers (Wu et al. 1992) were used for map construction. Such markers show a segregation ratio that is not significantly different (by the χ² test) from 3:1 (presence:absence) at P = 0.05 (Grivet et al. 1996).
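The single-dose criterion can be illustrated with a short sketch. It assumes a plain Pearson χ² goodness-of-fit test with one degree of freedom and no continuity correction (the text does not say whether a correction was applied); the function names are illustrative, and 3.841 is the standard critical value for one degree of freedom at P = 0.05.

```python
CHI2_CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 d.f., P = 0.05

def chi2_3_to_1(present: int, absent: int) -> float:
    """Pearson chi-square statistic for a 3:1 (presence:absence) expectation."""
    n = present + absent
    if n == 0:
        raise ValueError("no individuals scored")
    exp_present = 0.75 * n
    exp_absent = 0.25 * n
    return ((present - exp_present) ** 2 / exp_present
            + (absent - exp_absent) ** 2 / exp_absent)

def is_single_dose(present: int, absent: int) -> bool:
    """True when segregation does not differ significantly from 3:1."""
    return chi2_3_to_1(present, absent) < CHI2_CRITICAL_1DF_P05
```

With 112 individuals, a band present in 84 and absent in 28 fits 3:1 exactly, whereas a band present in 105 and absent in 7 gives χ² = 21.0 and would be rejected as a single-dose marker.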

The single-dose markers were added to the AFLP matrix (883 markers × 112 individuals) developed by Hoarau et al. (2001). The new map was built using MAPMAKER 3.0 (Lander et al. 1987). Marker grouping was performed by two-point analysis at a LOD score threshold of 5 and a recombination fraction threshold of 0.35. Co-segregation groups (CGs) were then ordered by multipoint analysis and the distances calculated using the Haldane function. For homology group VII, we had additional data and thus the map distances were calculated with data from 316 individuals. CGs were assembled into homology groups (HGs) based on (1) common RGA or SSR markers between CGs; and (2) common SSR and AFLP markers with an R570 map encompassing mainly RFLP markers (Grivet et al. 1996, and unpublished results). A minimum of two common markers was necessary for assembly of two CGs into the same HG. When a correspondence between HG and CG could be established between the two maps, we assigned the same name to them: a Roman numeral from I to VIII for the HG, and a number for the CG. Assigned CGs with no correspondence between the two maps were named with the number of the HG followed by a letter. CGs not assigned to a HG were named as U (unassigned) followed by a number.
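The grouping and distance steps can be sketched as below. This is an illustration of the general technique, not a reimplementation of MAPMAKER: markers are grouped transitively whenever a pair passes both two-point thresholds (whether the thresholds are inclusive is an assumption), and the Haldane function converts a recombination fraction r into a map distance of −50 ln(1 − 2r) cM. The function names and the input format are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

def haldane_cM(r: float) -> float:
    """Haldane map distance in centiMorgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def group_markers(markers: Iterable[str],
                  pairs: Iterable[Tuple[str, str, float, float]],
                  min_lod: float = 5.0,
                  max_rf: float = 0.35) -> List[List[str]]:
    """Transitively group markers linked by pairs of (a, b, LOD, r) passing both thresholds."""
    parent: Dict[str, str] = {m: m for m in markers}

    def find(m: str) -> str:
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b, lod, rf in pairs:
        if lod >= min_lod and rf <= max_rf:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for m in parent:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

For example, a recombination fraction of 0.10 corresponds to about 11.2 cM under the Haldane function.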

Analysis of clusters of NBS/LRR-like RGAs

The full length sequences of eight NBS/LRR-like EST clones (RGA118, RGA281, RGA326, RGA185, RGA267, RGA162, RGA152 and RGA087) were obtained by primer walking. Nucleotide sequences were aligned using the program Sequence Navigator 1.0.1 for Macintosh. Sequence variability was estimated using Nei's measure of nucleotide diversity (π) and calculated with the program DnaSP (Rozas and Rozas 1997).
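For readers unfamiliar with the statistic, nucleotide diversity (π) is the average proportion of nucleotide differences over all pairs of aligned sequences. The sketch below is a minimal illustration and not the DnaSP implementation; dropping every alignment column that contains a gap or an ambiguous base is an assumption made here for simplicity.

```python
from itertools import combinations
from typing import Sequence

def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap ('-') or 'N' in any sequence are ignored.
    columns = [i for i in range(length)
               if all(s[i] not in "-N" for s in seqs)]
    if not columns:
        raise ValueError("no comparable sites")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a[i] != b[i] for i in columns) for a, b in pairs)
    return differences / (len(pairs) * len(columns))
```

With the three toy sequences ACGT, ACGA and ACCA, the pairwise differences are 1, 2 and 1 over four sites, giving π = 4/12 ≈ 0.33.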

Results

Identifying RGAs in the SUCEST database

Key gene and keyword searches in the SUCEST database identified 88 clusters homologous to known pathogen resistance genes with an expectation cut-off value, for the best matching query, of e−50 or better. Twenty-two ESTs presented homology to genes encoding NBS-LRR resistance proteins, 13 showed homology to LRR-coding genes and 53 were S/T KINASE homologs. No TIR/NBS/LRR-like RGAs were identified, even though genes encoding these three domains (like N, L6 or M) were used as key genes (Table 1). Matches to the NBS or LRR regions of these TIR/NBS/LRR genes had poorer e-values than did matches to CC/NBS/LRR genes.

Table 1. Characteristics and map location of the 55 EST-RGAs studied

A single clone per cluster was selected for further analysis. To increase the likelihood of obtaining full-length mRNAs, we chose the most 5´ clone. After identity confirmation by sequencing, 55 of the 88 clones analyzed were selected for mapping. We excluded clones that were wrongly addressed, showed evidence of rearrangement or represented redundant information. Table 1 indicates, for the 55 ESTs, the corresponding cluster-consensus homology and the relevant protein domain (GenBank accession numbers: BQ803996 to BQ804049). Only the best hits against a known R gene or RGA are included in Table 1. Hence, not all clones listed show an e-value of e−50 or better. A number of clusters in Table 1 are indicated as Pti1 homologs. Pti1 is not a resistance gene, but is a Pto interactor which shares 36.4% overall protein identity with it (Zhou et al. 1995).

The distribution of RGAs in the sugarcane genome

Fifty-five ESTs were tested on the self-progeny of cultivar R570; no polymorphisms were detected for three of them (RGA251, RGA231 and RGA176) with any of the four enzymes assayed. The other 52 ESTs produced 272 polymorphic markers (an average of 5.23 markers/probe) and, of these, 177 segregated as single-dose markers (3:1 ratio, average of 3.40 markers/probe) and could be used for mapping. Out of these 177 markers, 148 markers corresponding to 50 RGA clones, were localized on the AFLP map (Hoarau et al. 2001) while the others remained unlinked (Fig. 1). Seventy-six SSRs tested on the same progeny produced 170 single-dose markers, of which 134, corresponding to 55 SSRs, were localized on the map. SSR and RGA markers were used to assemble co-segregation groups (CGs) into homology groups (HGs) as described in Materials and methods. The map encompasses 128 CGs, of which 66 could be assigned to seven of the eight HGs in the reference RFLP maps (Grivet et al. 1996, and unpublished results). The RGA markers map on 59 of the 128 CGs. They are present in all seven identified HGs. Six RGAs map on HG I, seven on HG II, two on HG III, six on HG IV, seven on HG VI, two on HG VII and 16 on HG VIII. Alleles of the same RGA map mainly onto the same HG, with four exceptions: RGA142 and RGA526 map on HG IV and HG VIII, RGA258 maps on HG II and HG VI, and RGA149 maps on HG III and HG VI.
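As a quick arithmetic check of the per-probe averages quoted above, the following minimal sketch recomputes them from the counts given in the text (the variable and function names are illustrative).

```python
# Counts taken from the text above; names are illustrative.
polymorphic_probes = 55 - 3   # ESTs tested minus the three without polymorphism
polymorphic_markers = 272
single_dose_markers = 177

def per_probe(markers: int, probes: int) -> float:
    """Average number of markers per probe, rounded to two decimals."""
    return round(markers / probes, 2)

avg_polymorphic = per_probe(polymorphic_markers, polymorphic_probes)
avg_single_dose = per_probe(single_dose_markers, polymorphic_probes)
```

Both averages are computed over the 52 polymorphic probes, which reproduces the reported values of 5.23 and 3.40 markers per probe.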

Fig. 1.

Locations of the 148 RGA markers (shaded) on the genetic map of the sugarcane cultivar R570. The map encompasses 1123 markers, including AFLP and SSR (mSSCIR) markers, assembled into 128 co-segregation groups and seven homology groups (numbered boxes). Genetic distances in centiMorgans are indicated on the left. The rust resistance gene is indicated on CG VII.1

RGAs are not equally distributed along the chromosomes. RGAs that were not more than 5 cM apart were defined as members of a cluster. On this basis, we determined all cluster loci, referred to the basic genome complement, that contain different RGAs (Table 2). We identified four cluster loci with three to six different RGAs, and six cluster loci with two different RGAs. Clusters 1, 7 and 8, contain four, six and three RGAs, respectively, that map on several homologous chromosome segments of HG I and VIII. In these three cases, not all RGAs were mapped on all the homologous CGs. The distance between RGAs on each CG is variable, but in at least one CG the distance between each pair of consecutive RGAs is ≤5 cM.
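The 5-cM clustering rule can be sketched as a simple chaining procedure along one co-segregation group: consecutive markers are kept in the same cluster as long as the gap between them does not exceed the threshold. The marker names and positions in the usage note are hypothetical, and the sketch works on marker positions only; it does not check that the members correspond to different RGAs, as required for the cluster loci in Table 2.

```python
from typing import Dict, List

def rga_clusters(positions_cM: Dict[str, float],
                 max_gap_cM: float = 5.0) -> List[List[str]]:
    """Chain markers on one CG into clusters where consecutive gaps are <= max_gap_cM."""
    ordered = sorted(positions_cM.items(), key=lambda item: item[1])
    clusters: List[List[str]] = []
    current: List[str] = []
    last_position = 0.0
    for name, position in ordered:
        if current and position - last_position > max_gap_cM:
            clusters.append(current)   # gap too large: close the current chain
            current = []
        current.append(name)
        last_position = position
    if current:
        clusters.append(current)
    # A cluster needs at least two members.
    return [c for c in clusters if len(c) >= 2]
```

For instance, hypothetical markers at 0.0, 3.0, 7.5, 20.0 and 30.0 cM would yield a single cluster containing the first three markers.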

Table 2. Clusters containing different RGAs

Sixteen RGAs produced more than one marker on the same CG. The majority are clustered and have been identified with an asterisk in the CG column in Table 1. Some of these markers may be redundant, identifying the same allele due to the presence of a restriction site in the RGA sequence. However, since a few of them are separated by recombination events, we retained all of them on the map.

To date, the only pathogen resistance locus mapped in sugarcane is the common rust resistance gene located in CG VII1a (Asnaghi et al. 2000). Two LRR RGAs (RGA137 and RGA019) map on HG VII. Alleles of RGA019 map on CGs VIIa and VII14. Alleles of RGA137 map on CG VIIa, clustered with RGA019, and on CG VII1a some 5.2 cM from the rust resistance gene.

Characterization of the NBS/LRR RGA clusters

Two NBS/LRR RGA cluster loci were identified. Cluster 10 is located on CG U11 and contains three RGAs (RGA162, RGA152 and RGA087) with homology to the maize rust resistance gene Rp1-D (Collins et al. 1999). Cluster 7 is located on HG VIII on six homologous CGs (VIII1, VIII2, VIIIa, VIIIb, VIII.5 and VIII.15) and includes five NBS/LRR RGAs (RGA118, RGA281, RGA326, RGA185 and RGA267) with homology to the rice gene RPR1, which is responsible for probenazole-induced resistance to rice blast disease (Sakamoto et al. 1999).

Analysis of the full-length sequences of eight RGA clones revealed that almost all cDNAs seem to be incomplete at the 5´ end when compared to rice RPR1 and maize Rp1-D, possibly due to an incomplete reverse transcriptase reaction. There were two exceptions: RGA118 from the RPR1-like cluster and RGA162 from the Rp1-D-like cluster. Figures 2 and 3 show the derived protein sequence alignments for RPR1-like and Rp1-D-like clusters, respectively, and indicate the NBS and LRR domains as well as their conserved motifs. Clones RGA162 and RGA152, from the Rp1-D-like cluster, appear to be pseudogenes, as they have stop codons at amino acid positions 655 and 233, respectively. RGA267, from the RPR1-like cluster, also has stop codons at positions 330 and 335 (Table 3). Although there is no difference between RGA162 and RGA152 at the amino acid level, the cDNAs are not derived from the same gene because they have sequence differences in the 3´ non-coding region (data not shown) and they map 1.7 cM apart (Fig. 1).
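Detecting internal stop codons of the kind reported above can be sketched as follows. This is a minimal illustration under stated assumptions: the coding sequence is supplied in frame, the standard genetic code applies, and a stop codon in the final position is treated as the normal terminator. The function names are illustrative and the sequence in the usage note is a toy example, not one of the sugarcane clones.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}

def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, usable, 3))

def internal_stop_positions(cds: str) -> List[int]:
    """1-based amino acid positions of stop codons before the final codon."""
    protein = translate(cds)
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]
```

For the toy sequence ATGAAATAGGGCTAA the translation is MK*G*, so the only internal stop is at amino acid position 3.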

Fig. 2.

Alignment of derived protein sequences encoded in the RPR1-like NBS/LRR-like RGA cluster on HG VIII. The shaded amino acids indicate sequence identity to the RPR1 protein of rice. NBS motifs (P-loop, Kinase 2 and Kinase 3a) and regions conserved between resistance gene products (CD) are underlined. Protein domains are indicated on the right

Fig. 3.

Alignment of derived protein sequences encoded in the Rp1-D-like NBS/LRR-like RGA cluster on CG U11. The shaded amino acids indicate sequence identity to the Rp1-D protein from maize. NBS motifs (P-loop, Kinase 2 and Kinase 3a) and regions conserved between resistance gene products (CD) are underlined. Protein domains are indicated on the right

Table 3. Characteristics of RPR1- and Rp1-D-like RGAs

With the aim of evaluating the divergence between and within these two NBS/LRR cluster loci, we calculated the sequence variability. For inter-cluster comparison, we aligned the part of the nucleotide sequence encoding the NBS domain of RPR1, RGA118, Rp1-D and RGA162, which were the only full-length clones (amino acids 223 to 624 of RPR1 with amino acids 248 to 454 of Rp1-D). We chose this domain because it is the domain most conserved between R genes and outside of this region there is no significant alignment between RPR1 and Rp1-D. Since some RGA clones are incomplete, and do not include the NBS domain, it was impossible to align this region for intra-cluster analysis. Thus, we aligned part of the LRR nucleotide sequence (amino acids 557 to 901 of RPR1 for the RPR1-like cluster with amino acids 1008 to 1292 of Rp1-D for the Rp1-D-like cluster). This region corresponds to the most variable region in the R genes. Despite the fact that the comparison involved a variable region for intra-cluster analysis and a conserved region for inter-cluster analysis, the intra-cluster diversity at the nucleotide level (0.10±0.04 for the Rp1-D-like cluster and 0.22±0.03 for the RPR1-like cluster) appeared lower than the inter-cluster value (0.42±0.1). This allowed the separation of these sugarcane RGAs into two clearly distinct groups: the RPR1-like group and the Rp1-D-like group.

Discussion

The discovery of common sequence motifs between plant resistance genes has led to their use to develop candidate gene approaches for identifying resistance genes and analyzing their distribution in plant genomes. In this study, we have exploited the sugarcane EST database assembled in the course of the SUCEST project for both purposes.

Among the 81,223 phrap clusters comprising the 261,609 EST sequences, we have identified 88 clusters that are highly similar to R genes, using stringent screening procedures. Examples of RGAs encoding proteins with the three classical domains present in R genes (NBS/LRR, LRR and S/T KINASE) were found. No TIR/NBS/LRR-like RGAs were identified, supporting the hypothesis that this class of R genes has undergone divergent evolution in grasses and dicots (Pan et al. 2000; Goff et al. 2002).

We have mapped 148 markers representing 50 RGAs on the AFLP genetic map of the sugarcane cultivar R570 (Hoarau et al. 2001). Since sugarcane cultivars are highly polyploid and heterozygous, these RGAs were mapped simultaneously on several haplotypes. The SSR markers enabled us to relate the RGA mapping data to the RFLP map of R570 (consisting of approximately 1000 RFLP markers; Grivet et al. 1996, and unpublished results) and thus to organize the different haplotypes into homology groups. This will also allow the comparison of the distribution of RGAs in sugarcane to that in other species of Gramineae (Glaszmann et al. 1997; Dufour et al. 1997).

R genes are frequently reported to occur in clusters (Michelmore and Meyers 1998). In the Arabidopsis genome, 33% of the R genes are organized in pairs and 36% in clusters of three to nine members (The Arabidopsis Genome Initiative 2000). In sugarcane, 16 of the 50 mapped RGA loci are organized in four clusters containing three to six different RGAs, while 12 are in pairs (Table 2). The RGAs that belong to the same cluster were not all mapped on every homologous CG. This is probably due to the constraints of mapping in polyploids (only single dose markers can be mapped) but could also be a consequence of gene losses that may have been part of the rapid and extensive genome changes after polyploidization (Wendel 2000; Feuillet et al. 2001).

There is evidence that R gene clusters may contain functionally related genes that are not necessarily similar at the sequence level. This is the case for the tomato Pto cluster, which encodes five related kinases and a NBS/LRR ( Prf) protein also required for resistance to Pseudomonas syringae (Salmeron et al. 1996). In this study, we also observed a few clusters containing RGAs with different protein domains (Table 2).

It is noteworthy that all the RGAs homologous to rice RPR1 map together in cluster 7, and all the NBS/LRR RGAs homologous to maize Rp1-D map together in cluster 10. Sequence comparison of these NBS/LRR RGAs with the respective references in rice or maize reveals that the members of a given sugarcane cluster are more similar to the alien reference (RPR1 or Rp1-D) than to members of the other sugarcane NBS/LRR locus. This observation suggests the existence of a common ancestral gene for rice RPR1 and the sugarcane RPR1-like cluster, and for maize Rp1-D and the sugarcane Rp1-D-like cluster.

In addition, protein sequence alignments of the RPR1-like group (Fig. 2), as well as nucleotide sequence analysis (data not shown), show that sugarcane RGAs are not always more similar to each other than to the corresponding rice ortholog. This phenomenon of greater distance between paralogous than between orthologous sequences has already been highlighted by others (Michelmore and Meyers 1998; Feuillet et al. 2001), and led Michelmore and Meyers (1998) to propose a model for resistance cluster evolution called "birth and death".

Many authors have reported linkages between RGAs and disease resistance loci or QTLs (Wang et al. 2001; Graham et al. 2000). This is particularly interesting for sugarcane since, due to its particularly complex genome, only one resistance gene has been localized so far (Daugrois et al. 1996; Asnaghi et al. 2000). This major resistance gene, which confers resistance to common rust, has been located on R570 maps and is the focus of a map-based cloning approach (Asnaghi et al. 2000, and unpublished results). The present work identified an LRR cluster near the rust resistance locus, thus indicating the presence of RGAs in this genome region. In addition, the data generated in this study on the distribution of RGAs in the sugarcane genome will provide valuable information for current efforts aimed at mapping resistance genes for other sugarcane diseases, including leaf scald (Offmann, personal communication) and smut (Raboin et al. 2001).

Despite the success of the RGA approach to the identification of disease resistance loci, the challenge often remains in recognizing the functional gene within clusters. R gene clusters typically contain several related sequences, and even in the best studied cases, only for half of them has any specificity been demonstrated (Michelmore and Meyers 1998). With regard to this aspect, EST-RGA resources may have advantages, compared to PCR amplification of conserved motifs or "candidate genes" from genome sequencing data. The EST approach considers only expressed genes, thus eliminating many pseudogenes that cannot be transcribed. However, cDNAs with internal stop codons, indicative of non-functional proteins, were also found in this study (RGA162, RGA152 and RGA267), as already reported by Vicente and King (2001). In polyploids, the formation of pseudogenes through accumulation of mutations may be a consequence of the reduction in selection pressure on genes that are present in several copies (Wendel 2000).