Introduction

Plants are under constant attack from a diverse array of above- and below-ground pathogens, including bacteria, fungi, oomycetes, viruses, and nematodes (Dangl and Jones 2001). Under strong selective pressure from these pathogens, plants have evolved sophisticated mechanisms to recognize and defend against them. Most plant disease resistance (R) genes encode nucleotide-binding site (NBS) and leucine-rich repeat (LRR) proteins, which form one of the largest plant gene families. The NBS domain binds and hydrolyzes ATP and GTP (Meyers et al. 1999), whereas the LRR motif is involved in protein–protein interactions and pathogen specificity (Dangl and Jones 2001). Based on their N-terminal features, NBS–LRR proteins can be further divided into two subfamilies: Toll/interleukin-1 receptor (TIR)–NBS–LRR (TNL) and non-TNL (Meyers et al. 2003). Across plant species, R-genes are evolutionarily diverse in both number and sequence. This diversity allows them to recognize pathogen avirulence factors directly or indirectly, triggering signal transduction cascades that lead to rapid defense responses, hypersensitive reactions, and programmed cell death (Dangl and Jones 2001).

Pathogen recognition genes, such as the vertebrate major histocompatibility complex (MHC) genes and plant R-genes, have exceptionally high levels of polymorphism. For example, the average nucleotide diversity at the MHC locus is about ten times higher than that at other nuclear loci (Nei and Kumar 2000), and some R-loci are even more polymorphic than the MHC locus (Noël et al. 1999; Ding et al. 2007). The extreme diversity of these loci is often the result of positive selection, under which the ratio of a gene’s non-synonymous substitution rate (Ka) to its synonymous substitution rate (Ks) is greater than 1. Indeed, elevated Ka/Ks ratios have been detected at both MHC and R-loci (Bergelson et al. 2001; McDowell and Simon 2008). This pattern is particularly evident in the LRR region, which may determine the resistance specificity of R-genes such as L, RGC2, RPP5, RPP8, and RPP13 (Kuang et al. 2004; Dodds et al. 2006; Jiang et al. 2007; Ding et al. 2007). Evidence of positive selection has also been found in pathogen molecules involved in host–pathogen coevolution (Allen et al. 2004; Rose et al. 2004; Dodds et al. 2006). Surprisingly, a genome-wide survey of R-gene polymorphisms in 96 Arabidopsis thaliana accessions found low Ka/Ks ratios: only 3 of 27 R-genes had Ka/Ks ratios greater than 1 in the solvent-exposed residues (Bakker et al. 2006). Similarly, in a genome-wide comparison of two rice subspecies, only 11.2% of NBS–LRR genes had Ka/Ks ratios significantly higher than 1 (Yang et al. 2006). This unexpectedly low proportion of positively selected R-genes may reflect positive selection that goes undetected because diversity among populations within a species is low.

To better understand the role of positive selection in R-gene evolution, it is useful to compare these genes between two closely related species. In A. thaliana, 147 NBS–LRRs have been identified (Meyers et al. 2003). Its relative Arabidopsis lyrata has recently been sequenced, making a genome-wide comparison possible. A. lyrata diverged from A. thaliana about 5 million years ago (Koch et al. 2000). The two species have different life-history strategies: A. thaliana is an annual selfer with a worldwide distribution, whereas A. lyrata is a perennial outcrosser found in small, disjunct populations across ecologically variable habitats in the northern hemisphere (Koch et al. 2000). Through a genome-wide comparison of R-genes, we identified the footprint of selection at rapidly evolving R-loci. Our results suggest that positive selection is a major driver of interspecific variation in Arabidopsis R-genes.

Materials and Methods

Identification of NBS-Encoding Genes in A. lyrata

The scaffolds and annotations of the A. lyrata (Al) genome were downloaded from http://genome.jgi-psf.org/Araly1/Araly1.download.ftp.html in September 2008. The DNA sequences of 173 NBS-encoding genes in A. thaliana (At) were obtained from the NIBLRRS Project website (http://niblrrs.ucdavis.edu/data_protein.php). In addition, 12,533 one-to-one orthologous gene pairs between At and Al were downloaded from the GRAMENE website (http://www.gramene.org/).

To identify NBS-encoding genes in Al, both BLAST and hidden Markov model (HMM) search methods were employed, as described by Zhou et al. (2004) and Yang et al. (2006). In brief, the DNA sequences of the 173 At NBS-encoding genes were used as queries in BLASTn searches for possible homologs in the Al genome. The expectation-value threshold was set to 10⁻⁴, a value determined empirically to filter out most spurious hits. Next, the amino acid consensus sequence of the NB-ARC domain (Pfam: PF00931) was used as a query in tBLASTn searches for additional NBS-encoding genes in Al, and the NBS domain was then confirmed with Pfam version 23.0 (http://pfam.janelia.org/). Finally, the nucleotide sequences of the candidate NBS genes were used as queries in further BLASTn searches of the Al genome; this step was crucial for recovering the maximum number of candidate genes. All new BLAST hits were extended by 5,000–10,000 bp of flanking sequence on both sides and annotated with the gene-finding programs FGENESH (http://www.softberry.com/) and GENSCAN (http://genes.mit.edu/genescan.html/) to obtain complete open reading frames. To exclude redundant candidates, all sequences were mapped to their scaffold positions using BLASTn, and sequences found at the same location were eliminated.
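The E-value filtering and locus de-duplication steps described above can be sketched in Python. This is an illustrative sketch, not the pipeline used here: it assumes hits have already been parsed into hypothetical `(scaffold, start, end, evalue)` tuples (for example from BLAST tabular output), uses the lower 5,000-bp end of the flank range, and treats overlapping extended hits on the same scaffold as one locus, which is our stand-in for "sequences found at the same location".

```python
from typing import List, Tuple

# Hypothetical record layout: (scaffold, start, end, e-value) for one BLAST hit.
Hit = Tuple[str, int, int, float]
Region = Tuple[str, int, int]

E_VALUE_CUTOFF = 1e-4  # threshold stated in the text
FLANK_BP = 5000        # lower end of the 5,000-10,000 bp flank range (assumption)


def filter_hits(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep hits whose e-value does not exceed the cutoff."""
    return [hit for hit in hits if hit[3] <= cutoff]


def candidate_regions(hits: List[Hit], flank: int = FLANK_BP) -> List[Region]:
    """Extend each hit by `flank` bp on both sides and merge regions that
    overlap on the same scaffold, so that one locus yields one region."""
    extended = sorted(
        (scaffold, max(1, min(start, end) - flank), max(start, end) + flank)
        for scaffold, start, end, _ in hits
    )
    merged: List[Region] = []
    for scaffold, start, end in extended:
        if merged and merged[-1][0] == scaffold and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (scaffold, last[1], max(last[2], end))
        else:
            merged.append((scaffold, start, end))
    return merged


if __name__ == "__main__":
    hits = [
        ("scaffold_1", 10000, 12000, 1e-30),
        ("scaffold_1", 13000, 11500, 1e-12),  # minus-strand hit at the same locus
        ("scaffold_1", 90000, 91000, 1e-2),   # fails the e-value cutoff
        ("scaffold_2", 5000, 7000, 1e-50),
    ]
    print(candidate_regions(filter_hits(hits)))
```

Each merged region would then be passed to a gene finder such as FGENESH or GENSCAN; the sketch does not clip regions at scaffold ends because scaffold lengths are not part of its input.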

All non-redundant candidate NBS genes were surveyed to further verify whether they encoded TIR, coiled-coil (CC), NBS, or LRR motifs using the Pfam database (http://pfam.janelia.org/), SMART protein motif analyses (http://smart.embl-heidelberg.de/), MEME (Multiple Expectation Maximization for Motif Elicitation; Bailey and Elkan 2005), and COILS with a threshold of 0.9 to detect CC domains specifically (Lupas et al. 1991).
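One way to turn these domain calls into the TNL/non-TNL and TN/non-TN classes used in the Results is sketched below. The domain label strings and the `max_coils_prob` argument (the highest per-residue COILS probability) are hypothetical inputs; only the 0.9 COILS threshold comes from the text, and additional domains such as WRKY or RPW8 are simply ignored in the architecture string.

```python
from typing import Iterable, Optional, Tuple

COILS_THRESHOLD = 0.9  # COILS probability cutoff stated in the text
DOMAIN_ORDER = ("TIR", "CC", "NBS", "LRR")


def classify(domains: Iterable[str], max_coils_prob: float = 0.0) -> Optional[Tuple[str, str]]:
    """Return (class, architecture) for an NBS-encoding gene, or None if no NBS.

    `domains` holds hypothetical labels such as "TIR", "NBS", "LRR"; a CC domain
    is added when the highest COILS probability reaches the threshold.
    """
    found = set(domains)
    if max_coils_prob >= COILS_THRESHOLD:
        found.add("CC")
    if "NBS" not in found:
        return None
    # Extra domains (e.g., WRKY, RPW8) are ignored in the architecture string.
    architecture = "-".join(d for d in DOMAIN_ORDER if d in found)
    if "TIR" in found:
        group = "TNL" if "LRR" in found else "TN"
    else:
        group = "non-TNL" if "LRR" in found else "non-TN"
    return group, architecture


print(classify({"TIR", "NBS", "LRR", "WRKY"}))         # ('TNL', 'TIR-NBS-LRR')
print(classify({"NBS", "LRR"}, max_coils_prob=0.95))   # ('non-TNL', 'CC-NBS-LRR')
```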

Identifying R-Gene Families

R-genes sharing at least 60% nucleotide identity and clustering on the same branch of a phylogenetic tree with a bootstrap value >70% were defined as a family. Three types of families were identified. In the first type, each species had a single gene in the family, and the two genes were classified as a single-gene family (or single gene pair). In the second type, at least one species had multiple members, and these genes were classified as a multi-gene family. In the third type, the family was present in only one genome, and its gene(s) were categorized as unpaired P/A gene(s), that is, present in one genome but absent from the other.
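The family rules above can be expressed as a few small Python functions. This is a sketch under stated assumptions: `is_family` interprets "at least 60% nucleotide identity" as every pairwise identity within the candidate clade reaching 0.60, which the text does not specify, and all function names are hypothetical.

```python
from typing import Sequence

MIN_IDENTITY = 0.60  # "at least 60% nucleotide identity"
MIN_BOOTSTRAP = 70   # "bootstrap value > 70%"


def is_family(pairwise_identities: Sequence[float], bootstrap: float) -> bool:
    """Assumption: every pairwise identity within the candidate clade must reach 60%."""
    return bootstrap > MIN_BOOTSTRAP and all(i >= MIN_IDENTITY for i in pairwise_identities)


def family_type(n_al: int, n_at: int) -> str:
    """Classify a family by its copy number in A. lyrata (Al) and A. thaliana (At)."""
    if n_al < 0 or n_at < 0 or n_al + n_at == 0:
        raise ValueError("a family needs at least one member")
    if n_al == 0 or n_at == 0:
        return "P/A"                # present in one genome, absent from the other
    if n_al == 1 and n_at == 1:
        return "single-gene pair"
    return "multi-gene family"      # at least one species has several members


def copy_number_differs(n_al: int, n_at: int) -> bool:
    """True when the two species carry different numbers of family members."""
    return n_al != n_at
```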

Data Analysis

Multiple alignments of amino acid sequences were performed with ClustalW using the default options (Thompson et al. 1994) and corrected manually in MEGA version 4.0 (Tamura et al. 2007). The resulting amino acid alignments were then used to guide the alignment of the nucleotide coding sequences (CDSs). Phylogenetic trees were constructed in MEGA v4.0 by the neighbor-joining method with the Kimura 2-parameter model, and the stability of internal nodes was assessed by bootstrap analysis with 1,000 replicates.
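For reference, the Kimura 2-parameter distance used to build the neighbor-joining trees corrects separately for the proportion of transitions (P) and transversions (Q): d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes it for one aligned pair; it is not MEGA's implementation and assumes that gaps and ambiguous bases are skipped pairwise.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (pairwise deletion)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```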

Nucleotide divergence (Dxy) was estimated with the Jukes and Cantor correction (Lynch and Crease 1990) in DnaSP v4.0 (Rozas et al. 2003). A gene cluster was defined as a region in which two neighboring homologous genes were <200 kb apart. Sequence exchange was investigated with GENECONV 1.81 (http://www.math.wustl.edu/sawyer/geneconv/) using the default setting of 10,000 permutations, and gene conversion events with a global permutation P-value <0.05 were considered significant.
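The Jukes and Cantor correction converts an observed proportion of differing sites p into an estimate of substitutions per site, d = -(3/4) ln(1 - 4p/3). A minimal sketch of Dxy as the mean corrected distance over all between-species sequence pairs is shown below; it assumes aligned sequences and skips gapped or ambiguous sites, and DnaSP's exact averaging and site-handling rules may differ.

```python
import math
from itertools import product
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            sites += 1
            diffs += x != y
    if sites == 0:
        raise ValueError("no comparable sites")
    return diffs / sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site: -3/4 ln(1 - 4p/3)."""
    if p == 0:
        return 0.0
    arg = 1 - 4 * p / 3
    if arg <= 0:
        raise ValueError("distance undefined: p >= 0.75")
    return -0.75 * math.log(arg)


def dxy(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    """Mean JC-corrected divergence over all between-group sequence pairs."""
    distances = [jukes_cantor(p_distance(a, b)) for a, b in product(group_x, group_y)]
    return sum(distances) / len(distances)
```

For a single orthologous pair, `dxy` reduces to the Jukes and Cantor distance between the two genes.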

The xxLxLxx motif includes the solvent-exposed residues (L = Leu or another aliphatic amino acid; x = any amino acid). In the identified R-genes, the LRR regions were divided into individual LRRs and xxLxLxx motifs according to the At R-gene annotations on the NIBLRRS Project website. To detect positive selection, we estimated the ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitutions for the NBS domain, the LRR region, and the xxLxLxx motif between orthologs and/or paralogs. Ka and Ks were calculated in DnaSP v4.0 with the Nei and Gojobori (1986) method. A Ka/Ks ratio greater than 1 was taken to suggest positive selection, and a ratio less than 1, purifying selection. In addition, the fixed effects likelihood (FEL) method in the HyPhy package (Pond et al. 2005) was used to detect positively selected sites with posterior probability >0.95.
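To make the Ka/Ks calculation concrete, here is a compact Python sketch of the Nei and Gojobori (1986) counting method with Jukes and Cantor correction. It is not DnaSP's code and rests on several assumptions: an in-frame codon alignment and the standard genetic code; codons containing gaps, ambiguous bases, or stop codons are skipped; changes to stop codons count as non-synonymous when counting sites; and codons differing at more than one position are averaged over all mutational pathways that avoid intermediate stop codons.

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon (changes to stop codons count as non-synonymous)."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos] and CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple:
    """(synonymous, non-synonymous) differences averaged over stop-free pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_sum = non_sum = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_sum, non_sum, n_paths = syn_sum + syn, non_sum + non, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_sum / n_paths, non_sum / n_paths


def jukes_cantor(p: float) -> float:
    if p == 0:
        return 0.0
    arg = 1 - 4 * p / 3
    return math.inf if arg <= 0 else -0.75 * math.log(arg)


def ka_ks(seq1: str, seq2: str) -> tuple:
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    if ks == 0:
        ratio = math.nan if ka == 0 else math.inf
    else:
        ratio = ka / ks
    return ka, ks, ratio
```

In this sketch the ratio is returned as infinity when Ks is 0 but Ka is not, and as NaN when both are 0, since Ka/Ks is undefined in those cases.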

Results

Identification of NBS–LRR-Encoding Genes in A. lyrata

Based on HMM and BLAST searches for NBS-encoding proteins (see section “Materials and methods”), 156 non-redundant NBS–LRR and 26 NBS (lacking LRR domains) genes were identified in Al (Table 1). Although the Al genome is about twice as large as that of At, the number of Al NBS–LRR genes was similar to that of At (147). In a sliding-window analysis (200-kb window), 71.4% (130/182) of Al NBS-encoding genes were clustered with at least one other R-gene. This proportion may be slightly underestimated because some scaffold sequences are unanchored.
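The clustering criterion can be sketched as a single pass over genes sorted by position on each scaffold. This sketch assumes each NBS-encoding gene is given as a hypothetical `(gene_id, scaffold, position)` tuple, treats all input genes as homologous, and chains genes into one cluster whenever consecutive genes are <200 kb apart. Because genes on different scaffolds can never be clustered, unanchored scaffolds lower the clustered fraction.

```python
from typing import Dict, List, Tuple

MAX_GAP_BP = 200_000  # neighboring genes < 200 kb apart are clustered


def clusters(genes: List[Tuple[str, str, int]], max_gap: int = MAX_GAP_BP) -> List[List[str]]:
    """Group (gene_id, scaffold, position) records into physical clusters."""
    by_scaffold: Dict[str, List[Tuple[int, str]]] = {}
    for gene_id, scaffold, position in genes:
        by_scaffold.setdefault(scaffold, []).append((position, gene_id))
    result: List[List[str]] = []
    for scaffold in sorted(by_scaffold):
        members = sorted(by_scaffold[scaffold])
        current = [members[0]]
        for position, gene_id in members[1:]:
            if position - current[-1][0] < max_gap:
                current.append((position, gene_id))  # close to the previous gene
            else:
                result.append([g for _, g in current])
                current = [(position, gene_id)]
        result.append([g for _, g in current])
    return result


def fraction_clustered(genes: List[Tuple[str, str, int]], max_gap: int = MAX_GAP_BP) -> float:
    """Fraction of genes that share a cluster with at least one other gene."""
    clustered = sum(len(c) for c in clusters(genes, max_gap) if len(c) > 1)
    return clustered / len(genes)
```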

Table 1 NBS gene distribution in Arabidopsis and rice genomes

A phylogenetic tree was constructed from the NBS domain sequences of the 156 Al NBS–LRR genes (Supplementary Fig. S1). Two robustly supported classes (bootstrap value 98%), TNL and non-TNL, were identified. Based on the phylogenetic tree and the Pfam results, 103 Al NBS–LRRs were classified as TNL-type genes, slightly more than in At (93). The other 53 Al R-genes were classified as non-TNLs, similar to the number in At (54).

Interestingly, two Al TNLs (L_AT5G45050 and L_AT5G4526) contain an additional C-terminal WRKY domain (Pfam: PF03106), resembling the At RRS1 gene. RRS1, which confers resistance to Ralstonia solanacearum, encodes a protein with TNL structural motifs plus a C-terminal WRKY domain (Deslandes et al. 2002). Two other Al TNLs had a C-terminal RCC1 domain (Pfam: PF00415) and an N-terminal Toxin domain (Pfam: PF08115), respectively (L_AT4G14370 and L_800386; Table 1). Two Al non-TNLs had an N-terminal RPW8 domain (Pfam: PF05659), which has also been found in a few non-TNLs from the grape and poplar genomes (Yang et al. 2008).

Variation of R-Genes Between the At and Al Genomes

Although At and Al have similar numbers of R-genes, orthologous relationships were difficult to identify for most of them (Fig. 1). Based on the phylogenetic trees of the two genomes (Fig. 1a and Supplementary Fig. S1), 31 single-gene pairs (18 TNLs and 13 non-TNLs) had an average nucleotide divergence (Dxy) of 0.11 (Table 2), similar to the genome-wide average Dxy (0.10) of 12,533 orthologous At–Al gene pairs. In contrast, 26 multi-gene families (families 1–26; Table 2) had an average Dxy of 0.20, significantly larger than the genome-wide average (t-test, P < 0.001). Interestingly, 18 of the 26 multi-gene families differed in copy number between the two species; these families encompass 72 and 66 NBS–LRR genes in Al and At, respectively (Table 2). The orthologous relationships among these R-genes were ambiguous (Fig. 1b and Supplementary Fig. S1), a characteristic of complex R-gene loci. These complex, asymmetric R-gene families suggest that R-gene duplication and/or loss events have been common in these two closely related species.

Fig. 1

An unrooted phylogenetic tree of orthologous–pair (a) and complex–pair (b) R-genes in A. lyrata (triangles) and A. thaliana (squares). Numbers in columns refer to corresponding family numbers in Table 2

Table 2 Detection of positive selection and sequence exchange in R-genes

Many R-genes were present in one genome but absent from the other (P/A R-genes; Table 3 and Fig. 2). In total, 41 Al NBS genes (22.5%) were absent from At, and 29 At NBS genes (16.8%) were absent from Al. That about 20% of the R-genes analyzed were P/A genes suggests that recent duplications and/or losses have produced largely lineage-specific R-gene repertoires. Interestingly, TNL gene families contained more P/A R-genes than non-TNL families (Table 3). The proportions of P/A genes among TNLs (46 genes, ~23.5%) and TNs (17 genes, ~44.7%) were significantly higher (P < 0.001, χ2 test) than those among non-TNLs (5.6%) and non-TNs (6.7%), suggesting that duplication or loss plays a prominent role in the diversification of TNLs. The same trend was observed in copy number variation: the proportion of multi-gene families with copy number variation was higher for TNLs (84.5–87.7% in the two species) than for non-TNLs (63.9–65.8%).
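The χ2 comparison of P/A proportions can be sketched as a 2 × 2 Pearson test without continuity correction. The counts below are an assumption back-calculated from the reported percentages (46 P/A genes among 196 TNLs, i.e., 103 Al + 93 At, and about 6 among 107 non-TNLs, i.e., 53 Al + 54 At); the study's actual contingency table may differ. For one degree of freedom, the upper-tail probability equals erfc(sqrt(χ2/2)).

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, P-value) with 1 degree of freedom."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: TNL, non-TNL; columns: P/A, not P/A (counts back-calculated; assumption).
chi2, p = chi_square_2x2(46, 196 - 46, 6, 107 - 6)
print(f"chi2 = {chi2:.2f}, P = {p:.1e}")
```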

Table 3 Statistics about R-genes present in one genome but absent from the other genome
Fig. 2

An unrooted phylogenetic tree of P/A R-genes in A. lyrata (triangles) and A. thaliana (squares). Numbers in columns refer to corresponding family numbers in Table 2. The genes in the shaded area were NBS genes lacking LRR domains

Detecting Positive Selection in R-Genes

To detect positive selection acting on R-genes, the Ka/Ks ratio was calculated for each At–Al gene pair. In the NBS domain, Ka/Ks (average = 0.44) was significantly less than 1 (P < 0.001, t-test; Table 2) and was consistently low in every pairwise comparison, suggesting that purifying selection acts on this region, possibly because of its conserved function in signal transduction. Nevertheless, the average Ka/Ks of the NBS domain was significantly higher than the genome-wide average from 12,533 orthologous gene pairs (Ka/Ks = 0.29; P < 0.005, t-test).

In the xxLxLxx motif of the LRR region, however, which is assumed to determine recognition specificity for Avr factors (Ellis et al. 1999), Ka/Ks values differed markedly among groups of R-genes (Table 2). Of 31 multi-gene families, including 5 P/A gene families (Table 2), 24 (77.4%) had Ka/Ks ratios greater than 1. These families included 83 Al and 71 At R-genes (Table 2), about 80% of the multi-family genes in each species (103 in Al and 91 in At). Positive selection therefore appears to play a larger role in the evolution of multi-gene R-families than previously believed.
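Locating xxLxLxx segments within LRR sequence can be sketched with a regular expression. This is a naive illustration: it assumes the input is an already delineated LRR unit (scanning a whole protein would over-call motifs), treats L, I, V, and M as the permitted aliphatic residues, and uses a lookahead so that overlapping occurrences are all reported. The study itself took motif boundaries from the NIBLRRS annotations.

```python
import re
from typing import List, Tuple

# x = any residue; L = L, I, V or M (the aliphatic set is an assumption).
# The lookahead lets overlapping occurrences all be reported.
MOTIF = re.compile(r"(?=([A-Z]{2}[LIVM][A-Z][LIVM][A-Z]{2}))")
EXPOSED_OFFSETS = (0, 1, 3, 5, 6)  # the five "x" positions in xxLxLxx


def find_motifs(lrr_unit: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for each xxLxLxx match in an LRR unit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(lrr_unit.upper())]


def solvent_exposed_positions(lrr_unit: str) -> List[int]:
    """0-based positions of the putatively solvent-exposed 'x' residues."""
    positions = set()
    for start, _ in find_motifs(lrr_unit):
        positions.update(start + k for k in EXPOSED_OFFSETS)
    return sorted(positions)


print(find_motifs("AALKLAAQQ"))  # [(0, 'AALKLAA')]
```

Mapping these residue positions to codons (positions 3i to 3i + 2 of the CDS) would give the nucleotide segments on which motif-level Ka/Ks can be computed.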

In contrast, only 3 of the 31 single-gene pairs had a Ka/Ks ratio over 1 (Table 2): AT5G44510 versus L_AT5G44510 (Ka/Ks = 2.37), AT1G69550 versus L_AT1G69550 (Ka/Ks = 1.84), and AT5G17680 versus L_AT5G17680 (Ka/Ks = 1.67). Most single-pair R-genes appeared to evolve under purifying selection. For two functional single-pair R-genes, RPS2 and RPM1, the Ka/Ks ratios of the xxLxLxx motifs between At and Al were only 0.36 and 0.25, respectively. These results suggest that sequence conservation between close relatives may be enforced by the recognition of conserved, important pathogen molecules.

Interestingly, all seven multi-gene families with Ka/Ks ratios <1 in the xxLxLxx motif (families 13–15 and 23–26 in Table 2) have not only equal copy numbers but also clear orthologous relationships between the two species in the phylogenetic tree (Fig. 1). The average Dxy of these orthologous pairs (0.092) was similar to that of single-pair genes, suggesting that these orthologous multi-genes are also conserved and evolve under purifying selection. All remaining multi-gene families with Ka/Ks > 1 (except family 7) show copy number variation and ambiguous orthologous relationships between the two species (Table 2 and Fig. 1), suggesting that these R-genes are all located in complex gene loci (Yang et al. 2006).

These results are consistent with the positively selected sites detected by the FEL method in the HyPhy package: ~77.3% of sites with posterior probability >0.95 were clustered in the LRR region, and ~62.1% of these were located in the xxLxLxx motif (Supplementary Table S1).
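The two percentages above can be read as nested proportions: sites in the LRR region out of all positively selected sites, then motif sites out of the LRR sites. Interpreting the second percentage as a fraction of the LRR sites is an assumption, as is the use of hypothetical 1-based, inclusive residue coordinates for sites and region boundaries in the sketch below.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue coordinates (assumption)


def _inside(site: int, intervals: List[Interval]) -> bool:
    return any(lo <= site <= hi for lo, hi in intervals)


def site_fractions(sites: Iterable[int], lrr: List[Interval],
                   motifs: List[Interval]) -> Tuple[float, float]:
    """Return (share of selected sites in the LRR region,
    share of those LRR sites that fall in xxLxLxx motifs)."""
    sites = list(sites)
    in_lrr = [s for s in sites if _inside(s, lrr)]
    in_motif = [s for s in in_lrr if _inside(s, motifs)]
    frac_lrr = len(in_lrr) / len(sites) if sites else 0.0
    frac_motif = len(in_motif) / len(in_lrr) if in_lrr else 0.0
    return frac_lrr, frac_motif
```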

Dxy and Ka/Ks from orthologous pairs, or from the most closely related gene pairs at complex R-loci, between At and Al were positively correlated both for the full coding regions (correlation coefficient r = 0.616, P < 10⁻⁴ for all pairs; r = 0.557, P < 10⁻³ for clear orthologous pairs; Supplementary Fig. S2a) and for the xxLxLxx motifs (r = 0.663, P < 10⁻⁴; Supplementary Fig. S2b). Because Dxy and Ka/Ks might not be independent of one another, 50 randomly selected genes (Supplementary Table S2) were used as controls; these showed only a weak, non-significant positive correlation (r = 0.222, P = 0.125; Supplementary Fig. S2c). These results suggest that amino acid substitutions could drive the diversification of R-genes.
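The text does not state how the P-values for these correlations were obtained. As an illustrative alternative, the sketch below computes Pearson's r and a two-sided permutation P-value by shuffling one variable; the permutation approach, the 10,000 default permutations, and the fixed seed are our assumptions.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x: Sequence[float], y: Sequence[float],
                  n_perm: int = 10_000, seed: int = 1) -> float:
    """Two-sided permutation P-value: how often a shuffled y gives |r| >= observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never exactly 0
```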

Analyses of sequence exchange, including recombination, gene conversion, and/or unequal crossing-over, detected more significant sequence exchange events (P < 0.05) in complex R-loci (families 1–12 and 16–22; Table 2) than in loci with clear orthologous relationships (families 13–15 and 23–26). This significant difference between ambiguous and clear orthologous loci suggests that frequent sequence exchange between paralogs in complex R-loci could play a major role in R-gene diversification. Approximately 23.0% and 37.2% of sequence exchange events occurred between paralogs within At and Al, respectively, and approximately 39.8% were detected between At and Al homologs.

In general, recombination or gene conversion tends to obscure the age of duplicated paralogs and may therefore produce sequence homogeneity within a species together with considerable divergence between species (Hurles 2004). In multi-gene families, the ~39.8% of sequence exchange events detected between At and Al homologs are most likely attributable to recombination between ancient duplicates in the common ancestor before At and Al split, because the two well-diverged species are genetically isolated. Meanwhile, in most complex families, approximately 60% of sequence exchange events (~23.0% in At and 37.2% in Al) were detected among paralogs within a genome, which could increase the homogeneity of paralogs within a species and the divergence between species (Fig. 1b). These two factors, the deeper coalescence time of duplicates in these multi-gene families and frequent sequence exchange between paralogs within a species, may therefore both contribute to the diversification of complex genes. This result suggests that positive selection might play a considerable role in the spread and fixation of recombined segments across members of multi-gene families within populations.

Estimating the Evolutionary Rate of R-Genes

Our results were consistent with a previous study describing different evolutionary patterns between TNLs and non-TNLs (Yang et al. 2008). According to the phylogenetic tree of orthologous gene pairs, TNL branches were significantly longer than non-TNL branches between At and Al (Fig. 1a). In orthologous pairs, both Dxy and Ks in TNLs were significantly greater than those of non-TNLs for both full CDSs and NBS regions (P < 0.05, t-test; Table 1), suggesting that TNLs evolve faster than non-TNLs.

For R-genes with ambiguous orthologous relationships, the two most closely related genes from At and Al were used to calculate Dxy and Ds (the number of synonymous substitutions per degenerate site, with Jukes and Cantor correction, between At and Al). In contrast to the unambiguous orthologous gene pairs, TNLs did not evolve faster than non-TNLs among these pairs (P > 0.05; Table 4). In addition, Dxy and Ds in complex gene pairs were significantly greater than in single R-gene pairs (P < 0.0001, t-test; Table 4), suggesting different evolutionary patterns for orthologous and complex R-genes.

Table 4 Divergence of orthologous and complex R-genes between Al and At

In the phylogenetic tree of complex R-genes, terminal branches leading to Al genes were shorter than those leading to At genes (Fig. 1b). This difference might reflect different evolutionary rates of R-genes in the two closely related Arabidopsis species. To evaluate this possible bias, we chose 11 highly supported clusters (bootstrap value >85%), each containing at least two R-genes from each species, and compared the branch lengths of At and Al genes. In four clusters, Al branch lengths were significantly shorter than those of At (P < 0.05, t-test), and the remaining seven clusters showed the same trend, although the differences were not significant (Supplementary Table S3). These results suggest that Al R-genes might evolve more slowly than At R-genes.

Discussion

Mechanisms of R-Gene Diversification

Two types of R-genes, a rapidly evolving type with frequent sequence exchange and a conserved type with infrequent exchange, have been observed at the RGC2 locus (Kuang et al. 2004). Several other R-loci, such as RPP8, RPP13, RPP5, and L, show a close association between frequent sequence exchange and high Ka/Ks ratios, particularly in the LRR region (Ellis et al. 1999; Noël et al. 1999; Rose et al. 2004; Dodds et al. 2006; Ding et al. 2007; Jiang et al. 2007). Conversely, many conserved R-loci, such as RPM1 and RPS4, are associated with infrequent exchange and low Ka/Ks ratios (Bergelson et al. 2001). These associations suggest that both modes of evolution are common across the genome.

Our genome-wide comparison of R-gene families in two closely related Arabidopsis species reveals these two distinct evolutionary patterns: positively selected and stably selected. The two patterns differ fundamentally in their dynamics of copy number variation. Most multi-gene families, which together contain >70% of the R-genes in the genome, have unequal copy numbers in the two species, and these genes are often clustered in the genome. In fact, unequally represented R-genes, including P/A R-genes, are widely distributed among individual plants and located at different (asymmetric) positions in different genomes (Yang et al. 2006; Shen et al. 2006; Ding et al. 2007). Such asymmetric genome architecture can generate genetic isolation, increase nucleotide substitution rates for genes in these families, and enhance ectopic recombination (Yang et al. 2007; Tian et al. 2008; Sun et al. 2008). Dynamic copy number variation may therefore provide a distinctive mechanism for generating and maintaining amino acid variation.

Strong positive selection was associated with copy number variation and was detected in most multi-gene families; this selection could lead to rapid diversification of R-genes. Previous studies revealed only a few R-genes with Ka/Ks > 1 in Arabidopsis. In our genome-wide scan, over 60% of the compared R-genes had Ka/Ks values greater than 1 (Table 2), reflecting widespread positive selection among R-genes. Consequently, abundant non-synonymous substitutions have been selectively accumulated. Compared with R-genes with clear orthologs (Dxy, Ka, and Ks of 0.126, 0.104, and 0.239, respectively) or genome-wide reference gene pairs (0.100, 0.058, and 0.185, respectively) between At and Al, R-genes under positive selection had higher Dxy and Ka (0.235 and 0.249, respectively) but similar Ks (0.174). The key factor driving rapid evolution therefore appears to be increased Ka, that is, an increase in non-synonymous substitutions (Supplementary Fig. S2). This may be a general mechanism for the elevated divergence, or accelerated evolution, of R-genes: rapid R-gene evolution may be initiated by non-synonymous substitutions that are selectively generated, frequently shuffled, and selectively maintained.

What drives the selection of non-synonymous substitutions? The most likely answer is the diverse and changing pathogens in the environment. It is remarkable that plants have evolved machinery to selectively accumulate non-synonymous substitutions in genes conferring resistance to pathogens.

Undoubtedly, frequent gene duplication (Fig. 2b) and subsequent sequence exchange (Table 2), through either recombination or gene conversion, can make more efficient use of the accumulated non-synonymous substitutions to create novel R-genes. In other words, positive selection might contribute to the spread and fixation of recombined segments across family members within populations. All of these dynamic R-genes had frequent sequence exchange events, which were 100-fold more frequent in positively selected genes than in genes with Ka/Ks <1 (Table 2).

The stably selected pattern, on the other hand, reflects purifying selection of intermediate strength acting on a subset of R-genes. Compared with positively selected genes, these genes have lower Ka/Ks, few sequence exchanges, equal copy numbers, and low divergence. Although these R-genes evolve much more slowly, they still evolve faster than housekeeping genes (Yang et al. 2006), suggesting that even in conserved R-genes some sites may be under positive selection. Although fewer than 30% of R-genes belong to this category, they could play an important role in recognizing conserved pathogen molecules for durable resistance. The diverse evolutionary patterns of R-genes are likely the result of adaptation that allows plants to cope with different types of pathogens.

Different Evolutionary Patterns Between TNLs and Non-TNLs

Comparative analyses of NBS genes in four plant genomes, Arabidopsis, rice, grape, and poplar, revealed extensive species-specific expansion in TNLs (Yang et al. 2008). In contrast, non-TNLs did not segregate by species but mostly clustered in interspecific phylogenetic clades. These data suggested different evolutionary rates and patterns for TNLs and non-TNLs (Yang et al. 2008). Because the species in that study were distantly related, however, the relative evolutionary rates of the two types of R-genes could not be reliably estimated.

Instead, R-genes from two closely related species can be used to estimate evolutionary rates. The comparison of At and Al showed that gene duplication/loss is more common in TNLs than in non-TNLs: significantly more TNLs (23.5%) and TNs (44.7%) than non-TNLs (5.6%) and non-TNs (6.7%) were present in one genome but absent from the other. These results suggest that TNLs and non-TNLs may have different evolutionary patterns in response to their corresponding pathogens. However, this result contradicts our previous study, in which five non-TNL and four TNL P/A genes were detected within At (Shen et al. 2006). One possible explanation is that P/A polymorphisms in TNLs and non-TNLs are maintained at different frequencies within and between species.

In Shen’s study, seven new P/A polymorphic R-genes were detected within At (Shen et al. 2006), including three non-TNLs (At5g05400, At4g10780, and At4g27220) and four TNLs (At1g63870, At5g49140, At5g45240, and At5g18350). Interestingly, five of them (At4g1078, At4g27220, At5g49140, At5g45240, and At5g18350) were also absent from Al, indicating substantial overlap between intra- and interspecific P/A polymorphisms. However, the two most studied P/A polymorphisms within At, RPM1 and RPS5 (both non-TNL genes), are present in Al (Table 2). Additional genotyping experiments confirmed that both R-genes could be detected in all 15 Al individuals from 5 different European populations (data not shown), suggesting that the P/A polymorphisms at these two R-loci might have arisen independently in At after the divergence of At and Al. This result is consistent with Grant’s conclusion that RPM1 evolved before the divergence of the Brassicaceae and has been deleted independently in Brassica napus and At (Grant et al. 1998).

We also found that external branches are generally longer in non-TNL P/As than in TNL P/As (Fig. 2). These non-TNL P/As could therefore have been more easily classified as single genes in Shen’s study, which focused its sampling on single-copy P/As within At (Shen et al. 2006), leading to the inclusion of more non-TNL P/As in that analysis (Fig. 2). Another possible explanation for the discrepancy (more non-TNL P/As within species) is therefore that biased sampling introduced an artificial difference between intra- and interspecific TNL and non-TNL P/As.

Because NBS–LRR genes share similar sequences and structural motifs, they are believed to share a common ancestor. However, most P/A R-genes were single copy (<60% identity between members) in At and Al (Fig. 2), suggesting that they are unlikely to have arisen by duplication within At or Al after speciation and that the P/A R-genes probably had their own ancestral alleles before speciation. Overall, these results suggest that these P/As have been maintained in At, Al, or both for a long evolutionary time, and that balancing selection might play an important role in the evolution of these R-genes (Stahl et al. 1999; Tian et al. 2002).

In addition, TNL sequences evolve rapidly: both Dxy and Ks are significantly higher in TNLs than in non-TNLs (P < 0.05; Table 1). The rapid evolution of TNLs may be an adaptive strategy against a species-specific array of pathogens.

Dramatic Changes at Some Functional R-Loci Between At and Al

RPP5 has eight and ten homologs in the At ecotypes Columbia (Col-0) and Landsberg erecta (Ler), respectively (Noël et al. 1999). According to our definition of a gene family (≥60% identity), seven RPP5 homologs were identified in Col-0. In the xxLxLxx motif, Ka/Ks ratios averaged 2.5 (Table 2), and the high Ka (0.41) could drive rapid evolution at this locus. Interestingly, only one RPP5 homolog was detected in the Al genome, indicating a dramatic difference in copy number between At and Al, generated either by duplication in At or by loss in Al. To assess whether gene duplication occurred before or after speciation, nucleotide divergence at silent sites (DSil) among paralogs was used, to reduce the influence of selective constraints against amino acid substitutions on the estimate of neutral variability. In Col-0, DSil averaged 0.20 (range 0.027–0.311) among the seven RPP5 paralogs, not significantly different (t-test, P > 0.1) from the average DSil (0.185) of 12,533 orthologous At–Al gene pairs, suggesting that duplication at the RPP5 locus might have occurred in At around the time At and Al diverged. Such copy number variation within a species would allow extensive haplotype diversity, and highly divergent haplotypes could then have been retained in different species by chance or by natural selection.
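The paralog summary reported above (mean and range of pairwise DSil) can be sketched as follows. The sketch assumes the silent sites of each paralog have already been extracted and aligned into equal-length strings, and applies a Jukes and Cantor correction to each pairwise distance; how silent sites were defined and extracted in the study is not specified here.

```python
import math
import statistics
from itertools import combinations
from typing import Sequence, Tuple


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences, ignoring gaps/ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p == 0:
        return 0.0
    arg = 1 - 4 * p / 3
    if arg <= 0:
        raise ValueError("distance undefined (saturated)")
    return -0.75 * math.log(arg)


def paralog_divergence(silent_seqs: Sequence[str]) -> Tuple[float, float, float]:
    """Mean, minimum and maximum pairwise divergence among paralogs."""
    if len(silent_seqs) < 2:
        raise ValueError("need at least two paralogs")
    d = [jc_distance(a, b) for a, b in combinations(silent_seqs, 2)]
    return statistics.mean(d), min(d), max(d)
```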

Another dramatic difference between the two species was found in the RPP13 Ka/Ks ratio within At populations versus between At and Al. In At populations, positive selection at RPP13 maintains allelic variation to recognize different strains of the oomycete Hyaloperonospora parasitica (Rose et al. 2004). In our analysis, however, no positive selection could be detected between Al and At in either the complete CDS (Ka/Ks = 0.31) or the xxLxLxx motif (Ka/Ks = 0.72). Furthermore, in Al populations RPP13 was conserved and likely evolves under purifying selection (data not shown). The same R-gene may therefore be under different selective pressures even in closely related species, suggesting that this locus could respond to different pathogens in each species. The dramatic changes at RPP5 and RPP13 demonstrate that R-genes are highly adaptable to the pathogen spectrum each species faces, a flexibility that may be important for recognizing the molecules of diverse pathogens.