Introduction

Recombination is a major mechanism underlying genetic diversity (Korol et al. 1994; Gaut et al. 2007; Wicker et al. 2007; Morrell et al. 2006). Tracking recombination events can illuminate the evolution of genes and genomes. Recombination and linkage disequilibrium (LD) are basic tools in genetic mapping and positional cloning of genes in plant genomes (Ersoz et al. 2008; Waugh et al. 2009). Unlike linkage in bi-parental mapping populations, LD patterns in wild populations depend on a long history of recombination and therefore enable improved mapping resolution. On the other hand, LD is affected by population structure and selection, and hence linkage between markers may not necessarily reflect physical proximity (Flint-Garcia et al. 2003; Rostoks et al. 2006). The extent of LD in barley, a selfing plant and a relative of wheat, was found to be much lower in wild populations (Hordeum spontaneum) than in a cultivated collection (H. vulgare). LD rapidly decays over the first kb in the wild forms but extends over more than 200 kb in elite cultivars (Caldwell et al. 2006). In durum wheat (Triticum durum), LD stretches for a long distance (Maccaferri et al. 2005), with high LD values found between loci located 50 cM apart. In bread wheat (T. aestivum) collections, the LD extent ranged from 1 cM to 5 cM (Chao et al. 2007; Horvath et al. 2009; Somers et al. 2007). In Aegilops tauschii, the wild donor of the D genome to T. aestivum, a 1,400-bp LD block was observed in the grain softness locus (Massa and Morris 2006). In wild emmer wheat (T. dicoccoides) populations from Amiad and Tabigha in northern Israel, high LD values were observed even between loci on different chromosomes. It seems that LD values in T. dicoccoides populations are inflated by the population structure (Li et al. 2000a, b). To date, only one study has targeted the intra-genic LD in T. dicoccoides populations and only two studies targeted LD within R-genes in selfers (Mauricio et al. 2003; Kuang et al. 2008; Haudry et al. 2007). Therefore, the current study may add insights into LD patterns in T. dicoccoides populations and LD patterns in R-genes.

The leaf rust resistance gene Lr10 is one of the few cloned disease resistance genes (R-genes) in bread wheat. Lr10 encodes a three-domain protein: coiled-coil, nucleotide-binding site and leucine-rich repeats (CC-NBS-LRR) (Feuillet et al. 2003). Lr10 is located 36 cM from the centromere on the short arm of chromosome 1A. Previous studies have shown that Lr10 is a highly polymorphic gene, with the CC domain being its most diverse part. The high diversity in the CC domain is probably due to positive selection (Loutre et al. 2009). Many plant R-genes exhibit high diversity, positive selection, and sequence exchanges, since the arms race with pathogens drives them to change recognition specificity frequently (Wicker et al. 2007; Bergelson et al. 2001; Mondragon-Palomino and Gaut 2005). In some R-genes, diversity is maintained by frequency-dependent selection where alleles are cycled between high and low frequency in a long co-evolution with pathogens as a form of “trench warfare”. The trench warfare model predicts that the polymorphism will be old and under balancing selection (Holub 2001; Stahl et al. 1999). Although many studies of R-gene diversity have been conducted [reviewed in McDowell and Simon (2006)], most of them were limited to a few alleles or to paralogs, and only a few studies focused on population genetics (Mauricio et al. 2003; Meaux and Neema 2003; Butterbach 2007; Yahiaoui et al. 2009; Kuang et al. 2008). These studies mainly targeted the LRR domain, which is considered the most diverse domain due to its interaction with pathogen effectors. However, in Lr10, the CC domain was found to be more diverse than the LRR domain (Loutre et al. 2009). Therefore, in the current study, the CC domain was analyzed separately from the NBS and LRR domains.

T. dicoccoides is the tetraploid (genome BBAA) progenitor of most cultivated wheats. The natural habitats of T. dicoccoides are located in the Middle East Fertile Crescent. T. dicoccoides populations harbor a valuable pool of resistance genes that can be transferred into cultivated wheat (Rong et al. 2000; Knott et al. 2005; Mergoum et al. 2005; Uauy et al. 2005; Marais et al. 2005; Gustafson et al. 2009). Numerous studies of T. dicoccoides genetic diversity have shown that the population in Israel is sub-divided. The selfing nature of emmer wheat and the large difference between T. dicoccoides habitats in Israel largely contribute to this pattern (Golenberg and Nevo 1987; Nevo et al. 2002; Peleg et al. 2008b; Sela et al. 2009).

Leaf rust, caused by Puccinia triticina, is the most common rust disease of wheat. It is highly specific and can spread thousands of kilometers by wind (Bolton et al. 2008). In Israel, the pathogen cannot complete its life cycle due to the absence of an alternate host and harsh summer conditions. It is assumed that new inocula of P. triticina arrive every year from regions adjacent to Israel (Y. Anikster, personal communication). Most T. dicoccoides accessions from Israel are susceptible to leaf rust and only a small fraction are resistant (Anikster et al. 2005; Moseman et al. 1985). Nevertheless, leaf rust resistance genes from T. dicoccoides have been introgressed into cultivated wheat (Marais et al. 2005).

The main objectives of the present study were: (1) to estimate Lr10 nucleotide diversity among emmer wheat populations originating from Israel; (2) to test the differences in Lr10 nucleotide diversity between and within T. dicoccoides populations; (3) to estimate the extent of LD along the gene; and (4) to study the effect of recombination on sequence diversity.

Materials and methods

One hundred T. dicoccoides accessions from 12 populations in Israel and vicinity were used in the current study (Table 1; Fig. 1). A full description of their geographic origin and climatic conditions can be found in Nevo and Beiles (1989). DNA was extracted from 10-day-old seedlings using the ArchivePure DNA extraction kit (5 Prime). The whole Lr10 gene (4 kb) was amplified using Pfu Ultra Fusion II HS polymerase (Stratagene) according to the supplier’s protocol (annealing temperature 53°C, elongation time 1 min) with the primers ThLR10_V (CGGAACTATGGAGAGTGAAC) and ThLR10_U (GGGAAATGTAGACAGGTACAT) (Feuillet et al. 2003). PCR products were separated on agarose gels and extracted using MinElute gel extraction kits (Qiagen). The extracted DNA was cloned into the pSMART vector using a GC cloning kit (Lucigen). Samples were sequenced using Big Dye Terminator chemistry on ABI 3700 or ABI 3730 instruments (Applied Biosystems). Alternatively, eight PCR fragments covering the whole gene region were extracted from the gel and sequenced directly (Table 2). Sequence reads were checked and assembled using Phred, Phrap, and Consed software provided by B. Ewing, P. Green & D. Gordon (available at http://www.phrap.org/). The sequences were aligned using MUSCLE (Edgar 2004) and manually corrected using BioEdit (Hall 2007).

Table 1 List of wild emmer wheat accessions used in the study
Fig. 1

Geographic distribution of the 12 populations of wild emmer wheat, T. dicoccoides, tested in this study. For names of the numbered populations see list in Table 1. The numbers of populations are according to Nevo and Beiles (1989)

Table 2 Primers used for amplification and sequencing of overlapping fragments of Lr10 sequence

Statistical analysis

DNA sequence alignments were analyzed for nucleotide diversity of synonymous (dS) and non-synonymous (dN) substitutions (Nei 1987), estimated by Kumar’s method as implemented in MEGA software (Kumar et al. 2004). Z tests for selection based on the difference between synonymous and non-synonymous substitutions (dS − dN) were also conducted in MEGA (Nei and Kumar 2000). A consensus maximum parsimony (MP) tree from 500 bootstrapped trees was constructed using MEGA. Pairwise distances between all sequences were calculated using MEGA. Evolutionary distances were estimated using the maximum composite likelihood method (Tamura and Kumar 2002). The rate of variation among sites was modeled with a gamma distribution (shape parameter = 1). The differences in the composition bias among sequences were considered in evolutionary comparisons (Tamura et al. 2004). The evolutionary distances were calculated separately for the CC domain and for the NBS-LRR domains. Based on the distances, a principal coordinates analysis (PCO) was conducted using GENALEX (Peakall and Smouse 2006). Raw precipitation data for each population over the past 20 years were obtained from the Israeli Meteorology Service (Beit-Dagan, Israel) and summarized in SPSS (SPSS Inc.). Monthly and annual means and the variability of rain at the collection sites of the populations (Table S1) were tested for correlations with diversity indices. Variability of rain was calculated as standard deviation/annual mean.
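
The rain variability index defined above (standard deviation divided by the annual mean) can be illustrated with a minimal Python sketch. The function name and the example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state whether the sample or the population form was used in SPSS.

```python
import statistics


def rain_variability(annual_precipitation_mm):
    """Return SD / mean of a series of annual precipitation totals.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the original analysis may have used a different form.
    """
    mean_mm = statistics.mean(annual_precipitation_mm)
    sd_mm = statistics.stdev(annual_precipitation_mm)
    return sd_mm / mean_mm


# Hypothetical annual totals (mm) for one collection site.
example_site = [300, 400, 500]
print(round(rain_variability(example_site), 2))
```

A higher value indicates a site where annual rainfall fluctuates more strongly relative to its long-term mean.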

Linkage disequilibrium (LD) coefficients (r²) and p values were calculated in TASSEL (Bradbury et al. 2007). For convenience, p values were transformed into -log p (-logP) values. Data were filtered for SNPs with a minor allele frequency of more than 20%. In order to minimize the effect of population structure, a subset of 20 sequences that represents major branches on the MP tree was selected (Fig. S2). This subset was analyzed and compared with the whole data set. Regressions and correlations were analyzed in SPSS (SPSS Inc.). Recombination events in the sequence alignment were detected with RDP software, which combines seven different recombination detection methods (Martin et al. 2005). Recombination events were confirmed when at least three methods showed p < 0.01 for the event. The population recombination parameter rho (ρ) (Hudson 2001) and the population mutation parameter theta (θ) (Watterson 1975) were estimated using “rhomap” software in the LDhat package (Auton and McVean 2007). All 33 haplotypes revealed in the alignment were used in the analysis. All polymorphic sites were included for the calculation of θ. Only sites with two alleles were used for the estimation of ρ. The run parameters were: 1,100,000 iterations, a burn-in period of 100,000 iterations, and a sampling interval of 100 iterations. Illegitimate recombination events were detected using Dotter (Sonnhammer and Durbin 1995) and BioEdit (Hall 2007) following the pattern described in Wicker et al. (2007). Illegitimate recombination is a process of asymmetric pairing of two dispersed homologous sequences only a few bp in length, followed by sequence exchange that can result in either duplications or deletions (Devos et al. 2002). Two tests were implemented to test for deviation from neutrality of the haplotypes that resulted from illegitimate recombination. The Slatkin exact test (Slatkin 1994) was conducted using Arlequin software (Excoffier et al. 2005), and Tajima’s D test (Tajima 1989) was carried out in DnaSP (Rozas et al. 2003).
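
To make the LD statistic and the minor-allele-frequency filter concrete, the following Python sketch computes r² between two biallelic sites from haplotype data. It is an illustration only, not the TASSEL implementation: the function names are hypothetical, and it assumes one fully homozygous haplotype per accession (plausible for a selfer) with no missing data.

```python
from collections import Counter


def minor_allele_frequency(site):
    """Return the minor allele frequency of a biallelic site, else None."""
    counts = Counter(site)
    if len(counts) != 2:
        return None
    return min(counts.values()) / len(site)


def passes_maf_filter(site, threshold=0.2):
    """Keep only biallelic sites whose minor allele frequency exceeds threshold."""
    maf = minor_allele_frequency(site)
    return maf is not None and maf > threshold


def r_squared(site_a, site_b):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) for two biallelic sites."""
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Each string is one alignment column across eight hypothetical accessions.
print(r_squared("AAAATTTT", "CCCCGGGG"))  # fully associated sites
print(r_squared("AATTAATT", "CGCGCGCG"))  # independent sites
```

Because r² is symmetric in the choice of reference allele, taking the first observed allele at each site as the reference does not affect the result.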

Resistance tests

Ten-day-old seedlings of all T. dicoccoides accessions were tested for leaf rust resistance as described by Schachermayr et al. (1995) using isolate BRW97512-19, which is avirulent to the Lr10 resistance gene in bread wheat (T. aestivum). Seedlings were sprayed with the rust isolate suspended in Soltrol 170 oil (Chevron Phillips Chemical) and incubated overnight at high humidity (90%) and low temperature (16°C). Visual scoring of the phenotypes was performed 10 days after infection on a 0–4 scale (0 = resistant; 4 = susceptible) (McIntosh et al. 1995).

Results

A total of 100 accessions of T. dicoccoides from 12 populations were tested for the presence of Lr10 (Table 1). In 95 accessions Lr10 was present, as determined by PCR amplification with at least three of the primer pairs listed in Table 2. The full-length 4 kb Lr10 from 58 accessions was cloned and sequenced. First, reads were assembled into contigs for each accession and then DNA and protein sequence alignments were generated. The sequences were deposited in GenBank (accessions GU393247-GU393304). Four accessions, originating from two populations (Jaba and Givat-Koach), had a 1.2 kb deletion of the NBS domain and were therefore excluded from the diversity analysis. Three sequences obtained from Mt. Hermon accessions had a premature stop codon at position 130.

Sequence diversity

Out of 54 Lr10 sequences, 33 different haplotypes were detected in the alignment; 332 of 4,114 sites were polymorphic, and the nucleotide diversity (π) was 0.029. Within-population nucleotide diversity for coding regions was calculated for nine populations that were represented by at least three accessions in the alignment. Previous studies have shown that the CC domain is the most diverse region of Lr10 (Feuillet et al. 2003; Loutre et al. 2009). Therefore, the analysis was conducted on the CC domain and on the NBS-LRR domains separately. Diversity values for synonymous and non-synonymous substitution rates (dS and dN, respectively) of the NBS-LRR domains differed significantly between populations (p = 0.04) (Table 3). The highest diversity was observed in the population from Kokhav-Hashahar (dS = 0.036, dN = 0.021) while the lowest diversity was observed in the population from Beit-Oren (dS = 0, dN = 0; Table 3). The overall dS value was 0.0295 while the dN value was only 0.0182. The difference dS − dN was mainly positive and varied between populations, ranging from −0.0003 in the Gamla population to 0.0154 in the Kokhav-Hashahar population. Z tests for purifying selection based on dS − dN values revealed significant purifying selection in the populations of Tabigha, Mt. Gilboa, Gitit and Kokhav-Hashahar (p < 0.05) (Table 3). The mean dS value of these populations was significantly higher than the mean dS value of the rest of the populations (dS = 0.030 and 0.008, respectively, p = 0.016), while no significant difference was observed for dN values (dN = 0.017 and 0.006, respectively; p = 0.11). The population diversity values, dS and dN, were tested for correlation with previously obtained simple sequence repeat marker (SSR) gene diversity (He) values of the same populations (Peleg et al. 2008b) (Table S1). dS and dN were positively correlated with He values (r = 0.73, p = 0.027 and r = 0.76, p = 0.021, respectively) (Fig. 2a, b). In the CC domain, mean population dS and dN values were positively correlated with the NBS-LRR dS and dN values (r = 0.81, p = 0.008 and r = 0.80, p = 0.048, respectively). The overall dS value of the CC domain was 0.069, which is more than twofold higher than the NBS-LRR dS value (0.0295), and the overall dN value for the CC domain was 0.094, which is fivefold higher than the NBS-LRR dN value (0.0182). In contrast to the NBS-LRR values, dS − dN values of the CC domain were negative. The overall dS − dN value was −0.025 and ranged among the populations from −0.05 in the Kokhav-Hashahar population to 0 in the Beit-Oren population. In four populations (Mt. Hermon, Mt. Gilboa, Kokhav-Hashahar and Amirim) significant positive selection was detected.
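
The nucleotide diversity (π) quoted at the start of this section is the average number of pairwise differences per site. The Python sketch below shows the calculation on a toy alignment. The function name is hypothetical, and the sketch assumes equal-length aligned sequences and treats every mismatch (including gaps) as a difference, which may differ from how the software used in this study handles indels.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over all sequence pairs."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(seq_1, seq_2)) for seq_1, seq_2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment of three hypothetical sequences, four sites each.
toy_alignment = ["AAAA", "AAAT", "AATT"]
print(round(nucleotide_diversity(toy_alignment), 3))
```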

Table 3 Nucleotide diversity values dN and dS of T. dicoccoides populations
Fig. 2

Correlations of diversity indices and rain variability. a Correlation of dN values of the populations with SSR He (gene diversity) of the same populations (Peleg et al. 2008b). b Correlation of dS values of the populations with SSR He. c Correlation of dS − dN values of the populations with rain variability (SD of annual precipitation/mean annual precipitation). d Correlation of dS values of the populations with rain variability

Climatic factors associated with diversity

Analysis of the associations between climatic factors and selection revealed that populations with putative purifying selection in the NBS-LRR domains are associated with low precipitation and high rain variability. At the sites where those populations were collected, the annual precipitation ranged from 375 mm to 432 mm and rain variation ranged from 0.34 to 0.46, while at the other sites the ranges were 535–1,300 mm and 0.29–0.32, respectively (Table S1). NBS-LRR dS and dS − dN values of the populations were positively correlated with rain variation at the collection sites (Spearman rank correlation rs = 0.79, p = 0.011 and rs = 0.84, p = 0.005, respectively) (Fig. 2c, d).

Principal coordinates analysis (PCO)

Based on the sequence distances of the NBS-LRR domains and the CC domain, two PCO graphs were plotted (Fig. 3a, b, respectively). Sequences that had the same coordinates were considered as one haplotype. In the plots, many haplotypes were represented by two or more genotypes of the same population and/or genotypes from more than one population. Nevertheless, in most of the cases, populations represented by the same haplotype did not share similar geographic or climatic characteristics. Moreover, different haplotypes of the same population were scattered on the plot and not clustered together. Four T. urartu sequences (Loutre et al. 2009) that were included in the plot did not cluster together. The PCO plots of the CC and the NBS-LRR domains showed different patterns. In the CC plot, haplotypes were scattered all over the plot with some clustering but with no obvious eco-geographic relations between the clustered populations. In the NBS-LRR plot, a clear division into two groups was observed. Seven populations and the T. urartu accessions were represented in both groups, while four populations were represented by only two haplotypes that were almost identical. The main difference in the two groups of sequences lies in the LRR domain where two well conserved, ~400-bp-long, haplotypes exist. These haplotypes diverge from one another. Tajima’s test (Tajima 1989) has shown that in a stretch of 114 bp sequence within these haplotypes there is a deviation from neutrality, suggesting that balancing selection was involved in shaping the diversity (D = 2.46, p < 0.05).
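
For readers unfamiliar with the statistic, the following Python sketch computes Tajima's D (Tajima 1989) from an aligned set of sequences. It is a simplified illustration rather than the DnaSP implementation: the function name is hypothetical, every variable column is counted as one segregating site, and gaps or missing data are not handled. A positive D, as reported above, arises when intermediate-frequency variants are in excess, which is the pattern expected under balancing selection.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D from aligned sequences of equal length (needs n >= 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are required")
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # k: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Two divergent haplotypes at equal frequency give a positive D,
# whereas an excess of singletons gives a negative D.
print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]) > 0)
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]) < 0)
```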

Fig. 3

Principal coordinates analysis (PCO) based on the genetic distances between Lr10 sequences of T. dicoccoides accessions from Israel. a PCO based on NBS-LRR domains. b PCO based on CC domain. Diamonds represent position of haplotypes. Numbers near the diamonds are the populations represented in each haplotype x number of accessions represented. Population numbers are according to Table 1. TA is ThatcherLr10, the first cloned Lr10 from T. aestivum cv. ThatcherLr10 (Feuillet et al. 2003). TU are T. urartu accessions (Table 1). Arrows connect the most frequent haplotypes to their position

Linkage disequilibrium analysis

A previous analysis of population structure showed that T. dicoccoides populations are well structured (Peleg et al. 2008a). Therefore, in order to avoid the bias of the population structure on LD, a subset of 20 sequences was selected from the MP phylogenetic tree (Fig. S2). LD values (-logP and r²) between all pairwise comparisons of polymorphic sites were plotted against the physical distance (bp) between sites in the whole set and in the selected subset (Fig. 4). For the whole set, locally weighted scatterplot smoothing (LOESS) curves of the association values, -logP and r², declined within 3 and 2 kb, respectively (Fig. 4a, b). For the subset, the decline was much steeper with sharp LD decay found within only 1 kb. The differences in LD decay patterns between the whole set and the subset were more pronounced for -logP values than for r² values. Curve estimation of regression models showed that the logarithmic model is the best one to describe the relationships between LD values and physical distance. r² values of the whole set and the subset were correlated with distance (R² = 0.40 and R² = 0.41, respectively, p < 0.001), while -logP values were correlated for the subset (R² = 0.40, p < 0.001) but much less for the whole set (R² = 0.16, p < 0.001). LD plots along the gene showed the same pattern (Fig. 5). In the plot of the whole set, low p values were found away from the diagonal line, indicating associations that are not a result of physical proximity. This pattern was not noted either for p values of the subset or for r² values of the whole set and the subset. In these plots, small LD blocks (<250 bp) were observed along the diagonal with the exception of a larger block of ~500 bp at the 3′ end of the gene. This block lies in the LRR domain in the region where the two well-conserved haplotypes mentioned earlier were observed.
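
The logarithmic model referred to above has the form LD = a + b·ln(distance). A minimal Python sketch of such a fit, using ordinary least squares on log-transformed distances, is given below. The function name and the data points are hypothetical, and the sketch stands in for the SPSS curve estimation only in outline.

```python
import math


def fit_log_model(distances_bp, ld_values):
    """Fit ld = intercept + slope * ln(distance); return (intercept, slope, R^2)."""
    xs = [math.log(d) for d in distances_bp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ld_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ld_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ld_values))
    ss_tot = sum((y - mean_y) ** 2 for y in ld_values)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical pairwise distances (bp) and r-squared values between sites.
distances = [10, 100, 500, 1000, 3000]
ld = [0.85, 0.50, 0.30, 0.22, 0.10]
intercept, slope, r2 = fit_log_model(distances, ld)
print(slope < 0)  # LD declines as distance increases
```

A negative slope corresponds to LD decaying with physical distance, and the returned R² plays the role of the coefficient of determination reported for each fitted curve.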

Fig. 4

LD values of pairwise comparisons between polymorphic sites plotted against their physical distances (bp). LOESS curves are in blue, logarithmic regression curves are in red. a, b -logP and r² values calculated for the whole dataset (60 accessions), respectively. c, d -logP and r² values of the subset (20 selected accessions), respectively

Fig. 5

LD plot of p and r² values of pairwise comparisons between polymorphic sites. a Values for the whole set. b Values for the subset. Upper triangles are r² values; lower triangles are p values. Only polymorphic sites (f > 0.2) are represented

Recombination detection analysis implemented in RDP revealed 17 highly significant recombination events (each confirmed by at least three detection methods, p < 0.01). These events resulted in 22 different combinations of recombinant fragments (Fig. 6). The distribution pattern of breakpoints differed between the CC domain and the other domains. While in the CC domain most of the breakpoints were found within the domain, in the NBS and LRR domains most of the breakpoints were located near the borders of the domain. To test the significance of this observation, the distance of the breakpoints to the nearest border was calculated and transformed to a relative distance as a fraction of the domain size. The distances in the CC domain were significantly higher than in the NBS-LRR domains (mean 0.6 and 0.25, respectively; p = 0.004). A significant difference (p = 0.001) was found in the length of the recombinant fragments between the domains. In the CC domain fragments were short, ranging from 23 to 361 bp (mean 177 bp), while in the NBS and LRR domains fragments were longer, ranging from 349 to 3,020 bp (mean 1,246 bp). Only one breakpoint was found in the intron region. All events occurred between sequences obtained from different populations and were detected in 2–30 pairs of sequences (mean 11). The estimate of the population recombination parameter rho (ρ) was 0.012 per base pair and the estimate of the population mutation parameter theta (θ) was 0.021 per base pair. The ratio between recombination and mutation rates (ρ/θ) was 0.59.

Fig. 6

Schematic sequence display of recombination events detected by RDP (Martin et al. 2005). Full-length bars are the acceptor (“daughter” in RDP terms) sequences. Short bars below each full-length bar are recombinant fragments (“parent” in RDP terms). Different colors represent different representative accessions. Vertical lines are the domain borders

Ten illegitimate recombination sites were detected in the alignment. Three events resulted in deletions of 20–26 bp in the intron region and four more resulted in tri-nucleotide deletions that were scattered along the gene. These deletions were present in 2–32 of the sequences. One repeat of 81 bp in the CC domain was found in only one sequence. Another repeat of 32 bp was found in the intron region of all sequences. This repeat was about 40% degenerated and a subsequent deletion was nested within it, indicating that this region is prone to illegitimate recombination (Fig. 7a, b). None of the illegitimate recombination events observed in the alignment caused a frame shift. A region with high indel variation was revealed between the CC and the NBS domains (positions 380–460), where six indel haplotypes were observed (Fig. 7c). A repeat of 24 bp in the region, caused by illegitimate recombination, and subsequent recombination errors formed a 54-bp difference between the shortest and the longest sequences. However, none of these indels caused a frame shift. This region is a linker between the CC and the NBS domains. Slatkin’s exact test and Tajima’s D test did not reveal a significant deviation from neutrality in this region.

Fig. 7

Illegitimate recombination and indel diversity. a A typical deletion caused by illegitimate recombination. The upper sequence consists of a tri-nucleotide repeat (bold). The sequence between the tri-nucleotide repeats was deleted in the lower sequence. b An example of a degenerated repeat present in all sequences. A sequence aligned against itself. Bold letters are the short hepta-nucleotide repeat that is the cause of the illegitimate recombination. Underlined letters are the tri-nucleotide repeat marked in bold in Fig. 7a. c High indel polymorphism at the linker region between CC and NBS domains (alignment of amino acid sequence). Upper case residues are identical or similar to the consensus. Lower case residues are dissimilar. Lower case x in the consensus sequence are non-consensus residues. Bold and underlined residues are part of a repeat

Resistance tests

Resistance tests were conducted with the leaf rust isolate BRW97512-19, which is avirulent to some Lr10 alleles from T. durum and T. aestivum. All 100 wild emmer wheat accessions tested in the current study showed high infection types (3–4) with this isolate.

Discussion

In the current study, the sequence diversity of Lr10 within and between T. dicoccoides populations from Israel was estimated, the LD and recombination patterns along the Lr10 gene were analysed, and the processes shaping the diversity and the LD were revealed.

Lr10 diversity within and between populations

The diversity revealed in the current paper is of the same magnitude, and shows the same distribution of synonymous and nonsynonymous substitutions along the gene, as observed by Loutre et al. (2009). T. dicoccoides populations showed large differences in diversity. These differences could be caused by the population demographic history, e.g., bottlenecks, or by selection pressure of the pathogen. The dS and dN values of the NBS-LRR domains obtained in this study were strongly correlated with the diversity index He obtained for SSR markers in the same populations studied earlier (Peleg et al. 2008b). This correlation suggests that demographic processes have a large effect on Lr10 nucleotide variation, since they act similarly on all loci. However, the negative dS − dN values in the CC domain obtained in all populations indicate that this domain is under positive selection. Such positive selection in the CC domain might indicate that this domain is interacting with pathogen effectors as suggested by Loutre et al. (2009) for Lr10 and by Bergelson et al. (2001) in general. The dS − dN values differed several-fold between populations both in the CC domain and in the NBS-LRR domains. In the NBS-LRR domains, a significant dS − dN > 0 difference was observed in four populations, suggesting the involvement of purifying selection. However, the significant dS − dN > 0 values in the populations resulted from high dS values and not from low dN values. Therefore, the results may not reveal higher purifying selective pressure in the populations but rather higher diversity in the populations. Nevertheless, purifying selection may have acted on the NBS-LRR domains and its signature may have been maintained in these populations while decaying in the other populations. The populations with significant dS − dN > 0 values originated in the transition zone between Mediterranean and steppe climate with annual precipitation of less than 450 mm and rain variability higher than 0.34. The high dS values may reflect the overall high genomic variability in this climatic transition zone as suggested by Safriel et al. (1994), Volis et al. (2001), Kark et al. (2004), Kark et al. (2008) and Peleg et al. (2008b). In the CC domain, no clear pattern was observed for the distribution of populations with significant dS − dN < 0 values.

The PCO analysis used in the current study showed that the revealed diversity is old and probably existed before populations were separated to the current status, since haplotypes from the same populations did not form clusters. Furthermore, the haplotypes of T. urartu, the donor of the A genome to T. dicoccoides, are scattered and are not clustered. This pattern was also observed in Loutre et al. (2009) where different Triticum species shared the same Lr10 haplotypes. The different pattern observed between the CC PCO and the NBS-LRR PCO could point to different selection pressures and/or a high recombination rate between the domains. The old origin of diversity is supported by the observation that most recombination events were detected between sequences obtained from different populations, implying that the resulting haplotypes were formed before populations separated. Significant gene flow could also contribute to low population differentiation but then one would expect geographical relationships between clustered accessions and this is not the case. Moreover, T. dicoccoides is a selfer with low gene flow rate (Golenberg and Nevo 1987). The genetic distance between T. dicoccoides populations is not correlated with geographic distance, contrary to the expectations for situations where gene flow is substantial (Peleg et al. 2008b; Sela et al. 2009; Haudry et al. 2007). The ancient diversity observed in the current study supports a long evolutionary history of Lr10–pathogen effector interaction in the “trench warfare” model where polymorphism is maintained by alleles that are recycling between high and low frequencies shifted by frequency-dependent selection (Holub 2001; Stahl et al. 1999). The long coalescent time is also supported by the significant Tajima test of the two haplotypes in the LRR domain suggesting balancing selection. These findings of old polymorphism are in agreement with the observations of Loutre et al. (2009).

Resistance tests with a leaf rust isolate which is avirulent to Lr10 of bread wheat did not detect any Lr10-dependent resistance in the collection of T. dicoccoides tested in the current study. Therefore, it was impossible to assess the direct impact of mutations of the different haplotypes on gene function and specificity. However, in most accessions, an intact open-reading frame was observed and the gene was expressed in all eight accessions that were tested for Lr10 mRNA (data not shown). These findings suggest that T. dicoccoides Lr10 haplotypes are functional against unknown leaf rust isolates present now, or that were present in the recent past.

Recombination and linkage disequilibrium

The alignment of Lr10 sequences obtained from 54 accessions of T. dicoccoides revealed many recombination events. In the CC domain there were mostly very short recombinant sequences that are typical of gene conversion (Jeffrys and May 2004), while the NBS and LRR domains exhibited longer sequence exchanges that could result from crossing over events. These results are in agreement with LD analysis, which showed an LD block at the 3′ end (Fig. 5). Recombination events were detected between Lr10 sequences that were derived from different T. dicoccoides populations and not within populations. This may suggest that recombination events took place prior to the current separation between T. dicoccoides populations. Old recombination events imply that selective pressures may have an effect on the recombination outcome. Shuffling haplotypes at the CC domain enhances diversity and may have a selective advantage if this domain is involved in pathogen recognition as predicted. In the LRR domain, two well-conserved haplotypes were observed. These two haplotypes have a long evolutionary history and can be tracked back to T. urartu, the A genome donor of T. dicoccoides (Loutre et al. 2009). It seems that recombination is suppressed in this region. Apparently, there is a tendency not to shuffle haplotypes within the NBS and LRR domains, since most of the breakpoints in the region are restricted to the domain borders. Michelmore and Meyers (1998) proposed that inter-allelic recombination enhances diversity in the LRR domain of R-genes. This expectation was validated by Kuang et al. (2004) and Yahiaoui et al. (2006). The data presented here about the LD block in the Lr10 LRR domain do not match this expectation, but in the CC domain this expectation can be met. The LD block in the Lr10 LRR domain is similar to the LD block that was observed in the Rps2 LRR domain from Arabidopsis thaliana: both blocks are located around a hot spot of diversity. 
The Rps2 hot spot of divergence differentiates between resistant and susceptible clades, while the Lr10 hot spot does not distinguish between resistant and susceptible haplotypes (Mauricio et al. 2003; Loutre et al. 2009). This may be because the A. thaliana Rps2 block is located in the 5′ region of the LRR domain, while in the current Lr10 study the LD block is located in the 3′ region. The two parts of the LRR play different roles in protein function (Lukasik and Takken 2009).

Understanding LD decay patterns is essential when conducting association studies or estimating rates of recombination. Population structure can lead to spurious LD associations, especially in selfing species (Ewens and Spielman 1995). In the current study, r² values were better predictors of physical distance than LD p values since they were less affected by population structure. Choosing a subset of sequences from a tree proved to be a good method to avoid population structure bias. This approach is somewhat similar to the method implemented by Breseghello and Sorrells (2006), although they removed only very similar haplotypes, whereas in the current study only distant haplotypes were selected. Considering that T. dicoccoides is a selfer, LD decay in Lr10 was very rapid, within 1–2 kb. A similar LD extent was found in wild barley, Hordeum spontaneum, which is also a selfer and a relative of wheat (Morrell et al. 2006; Caldwell et al. 2006). Whole genome LD studies in the selfer A. thaliana have shown a longer extent of LD, up to 250 kb (Nordborg et al. 2002). LD in bread wheat collections extended 1–5 cM (Chao et al. 2007; Horvath et al. 2009; Somers et al. 2007). Longer LD extents may be due to longer stretches between the sampled loci, which overlook local gene conversion events and display only crossing over events (Plagnol et al. 2006). This situation creates a paradox, where high-resolution analysis, on a base-pair scale, shows a steep decay in LD, while in low-resolution analysis, on the centiMorgan scale, the decay is far more moderate (Andolfatto and Nordborg 1998). From a practical point of view, attempts to associate very closely linked SNPs with a trait, especially in the highly diverse R genes, may be hampered by the short LD extent. The rapid decay may be a result of a high number of point mutations or recombination events. 
In the current study the ratio ρ/θ was lower than 1 (0.59), which may imply that recombination events had a smaller effect than mutations on sequence diversity. Nevertheless, this ρ/θ ratio is fairly high for an inbreeder. For reference, the average ρ/θ ratio in inbreeders ranges from 0.05 in A. thaliana to 1.5 in wild barley (Morrell et al. 2006; Nordborg et al. 2002). In the outbreeder wild maize, teosinte (Zea mays ssp. parviglumis), the ρ/θ ratio is 4.5 (Wright et al. 2005).
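
For readers who wish to reproduce this kind of LD decay comparison, the following is a minimal sketch of how r² between pairs of biallelic sites can be computed and paired with physical distance. It is an illustration only and not the software used in this study. The function names (r_squared, ld_by_distance), the 0/1 allele coding and the toy positions are assumptions introduced for the example.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites.

    Each site is a list of 0/1 alleles, one entry per haplotype
    (hypothetical coding chosen for this sketch).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def ld_by_distance(positions, sites):
    """Return (distance in bp, r^2) for every pair of sites."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        distance = abs(positions[j] - positions[i])
        pairs.append((distance, r_squared(sites[i], sites[j])))
    return pairs


# Toy data: three sites scored in four haplotypes (not real Lr10 data).
toy_positions = [100, 400, 1900]
toy_sites = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],   # identical pattern to the first site
    [0, 1, 0, 1],   # statistically independent of the first site
]
toy_pairs = ld_by_distance(toy_positions, toy_sites)
```

Plotting the resulting r² values against distance, or fitting a decay curve to them, would then show how quickly LD declines across the gene. Restricting the input to a subset of distant haplotypes, as described above, changes only which haplotypes are passed in.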

Recombination is a major force enhancing indel diversity, as revealed by the numerous events of illegitimate recombination, especially in the intron region of Lr10. One interesting example is the linker region between the NBS and the CC domains of Lr10, where high length variability was observed between six haplotypes. It is not clear whether the variability in this region affects the function of the protein. Several studies have shown that the CC domain in R genes can function even when expressed separately from the rest of the protein (e.g. Rairdan et al. 2008; Moffett et al. 2002). Neutrality tests looking for old polymorphism (Slatkin 1994; Tajima 1989) did not reject the neutrality of indel variation in this region, but the long and short haplotypes found at all ploidy levels (T. urartu AA, T. dicoccoides BBAA, T. aestivum BBAADD) may indicate that these haplotypes are old and stable (Loutre et al. 2009).

The results presented here highlight the role of recombination in enhancing diversity in wild populations of the selfing plant species T. dicoccoides. The results support the trench warfare model of old polymorphism in R-genes. The high diversity observed in Lr10 in T. dicoccoides populations emphasizes the need to further explore and conserve this diverse gene pool.