Introduction

A detailed understanding of the behaviour of linkage disequilibrium (LD) is important in association studies for choosing an appropriate experimental design and determining the marker spacing needed to map functional genes of interest (Flint-Garcia et al. 2003; Rafalski and Morgante 2004; Mueller 2004). The resolution of association mapping generally depends on the extent of LD. Indeed, fine-scale mapping is needed when LD is limited, whereas extensive LD allows the loci underlying traits to be detected more readily using a small number of markers (Mackay and Powell 2007). As stated in the reviews by Flint-Garcia et al. (2003), Gupta et al. (2005) and Rafalski and Morgante (2004), several factors influence LD in populations. Recombination is the only factor that diminishes intrachromosomal LD, whereas the other factors, i.e. mutation, migration, genetic drift and selection, all create LD. Moreover, the reproductive system can also influence levels of LD, with autogamous species showing more extensive LD than allogamous species.

Several LD studies have previously been conducted in plants, in both natural and domesticated populations. In natural populations, the model plant Arabidopsis thaliana has been evaluated for LD levels and showed a decay of LD within approximately 1 cM or 250 kb (Nordborg et al. 2002). In wild tomato populations, Arunyawat et al. (2007) showed that intragenic LD decays very rapidly with physical distance, suggesting high recombination rates and large effective population sizes in the two investigated species. The first investigations of natural populations of forest trees have also been carried out. A number of recent studies in coniferous species (Brown et al. 2004; Kumar et al. 2004; Krutovsky and Neale 2005; Heuertz et al. 2006; Gonzalez-Martinez et al. 2006) and Populus (Ingvarsson 2005) described a rapid decay of LD, most probably explained by large population sizes (Neale and Savolainen 2004; Savolainen and Pyhäjärvi 2007). However, conifers tend to exhibit lower recombination rates than angiosperms (Jaramillo-Correa et al. 2010).

In addition, LD has been studied in numerous domesticated plants (Kraakman et al. 2004 for barley; Barnaud et al. 2006 for grapevine; Remington et al. 2001, Tenaillon et al. 2001, Jung et al. 2004, Stich et al. 2005 for maize; Garris et al. 2003 for rice; Hamblin et al. 2004 for sorghum; Zhu et al. 2003 for soybean; Jannoo et al. 1999 for sugarcane; Simko et al. 2006 for potato; Maccaferri et al. 2005 for wheat). Extensive LD is expected in these species because they have undergone bottlenecks, domestication and modern breeding during their evolutionary history. A slow decay of LD with distance is indeed observed in several species (10 cM for sugarcane, 100 kb for rice), but a rapid decay of LD (within 1 kb) was found in maize. These contrasting patterns of LD in domesticated species may be explained by different reproductive systems (for instance, rice is a selfing species, whereas maize is an outcrossing species) and/or complex evolutionary histories (admixture). Different sampling strategies and strong population structure may also account for different LD estimates among studies. Similarly, studies that investigated LD at the gene level or at the whole-genome level, and with different molecular tools, have also produced contrasting results. Remington et al. (2001) showed that the level of genome-wide LD found using SSR markers was much higher than that found with SNPs in maize. As argued by these authors, this may be due to a higher frequency of mutations in SSRs than in SNPs that arose during the development of regional maize subpopulations after domestication.

Although the number of LD studies in domesticated plants is steadily increasing, several main issues still require further investigation. First, increasing the number of genera studied would allow inter-genera comparisons. Second, the effects of domestication and modern breeding need to be clarified, especially so that their impact can be taken into account before association populations are established.

Prunus avium L. is a diploid fruit tree (2n = 16) that comprises wild cherry and its domesticated form (sweet cherry). Wild cherry is distributed throughout Europe and has been a source of fresh food for humans for thousands of years (Zohary and Hopf 2000). Humans probably picked wild cherries in forests long before the species was cultivated. Several lines of evidence indicate that sweet cherry was present in Europe in Roman and early mediaeval times (Šoštarić and Küster 2001; Rösch 2008). The domestication history of sweet cherry is still unclear, though it is most likely that several domestication events happened and/or that intense gene flow occurred, explaining the current structure observed between wild and sweet cherries (Mariette et al. 2010). Morphologically, wild and sweet cherries have very similar fruit forms, the most obvious difference probably being fruit size (Zohary and Hopf 2000). Conventional quantitative trait locus (QTL) studies based on linkage mapping have focused on the identification of functional genes of interest, e.g. genes responsible for fruit size in cherry (Clarke et al. 2009; Zhang et al. 2010). However, this linkage mapping approach has limitations: it is costly and time-consuming and offers poor QTL resolution. Association mapping may be a promising alternative approach to detect genes underlying traits in cherry, using germplasm structure and the extent of LD as fundamental information for designing the association study (Mackay and Powell 2007; Slatkin 2008). To our knowledge, LD in cherry, and more generally in fruit species and their wild relatives, is still poorly investigated. It is, therefore, necessary to dissect the LD properties of cherry in order to facilitate appropriate association analysis in these species.

In this study, we used population genetic analyses, based on 35 microsatellite markers and the gametophytic self-incompatibility locus, to explore the pattern and magnitude of LD in wild and sweet cherries across the 8 linkage groups (LGs). Specifically, we focused on three main objectives: (1) to estimate the differences in LD extent between wild and sweet cherries, (2) to assess the effect of sample size, population structure and relatedness on LD estimates and (3) to evaluate the implications of the observed LD patterns for future association studies.

Materials and methods

Plant materials

A total of 212 French wild cherry individuals were sampled from the INRA Orleans collection. Our sample was designed to cover most French regions, but did not represent the whole species diversity.

A total of 142 sweet cherry landraces (called landraces hereafter) and 66 sweet cherry modern varieties (called modern varieties hereafter) were obtained from 3 collection centres, including the INRA Bordeaux Prunus Genetic Resources Centre, the INRA Bordeaux sweet cherry breeding collection and the Interprofessional Technical Centre for Fruits and Legumes (CTIFL) collection. The landrace sampling was designed in order to represent European sweet cherry germplasm (e.g. France, Belgium, Czech Republic, Germany, Hungary, Iran, Italy, Romania and Spain); however, a majority of the samples originate from France. Some landraces are of unknown origin. The group of modern sweet cherry varieties selected for this study was defined in order to represent major breeding programmes worldwide (e.g. Australia, Canada, the Czech Republic, France, Italy and USA).

Selection of marker loci and genotyping

We used 35 microsatellite markers, including 21 dinucleotide and 11 other nucleotide repeats and 3 markers for which the number of repeats is not known. Most of the genotypic data used in the current analysis were already available from our previous study (Mariette et al. 2010); 10 additional SSR markers were added for the assessment of LD. The gametophytic self-incompatibility locus, described by Sonneveld et al. (2006) and Vaughan et al. (2006), was included in this study. The alleles of the S-RNase and S-pollen (SFB) genes were amplified using primers developed by Sonneveld et al. (2006) and Vaughan et al. (2006). Allele sizes were determined on a sequencing machine and allele numbering followed Vaughan et al. (2006). The information obtained for this locus was analysed for LD estimation in the same way as the microsatellite markers. The marker information is given in Table 1. Briefly, for genotyping, genomic DNA was isolated from fresh leaves using the DNeasy® 96 Plant Kit from QIAGEN (Hilden, Germany). Multiplex PCR was performed with the QIAGEN Type-it Microsatellite PCR Kit® (QIAGEN, Hilden, Germany). Marker genotypes were determined on an ABI 3730 sequencer, and the GeneMapper software (Applied Biosystems, Foster City, CA, USA) was used to estimate allele sizes.

Table 1 Marker information

Definition of samples

It is known that population structure may create spurious LD between unlinked markers (Flint-Garcia et al. 2003). In a previous study (Mariette et al. 2010), the STRUCTURE analysis (Pritchard et al. 2000) clearly separated the wild and sweet cherry groups and further separated two sub-groups in sweet cherry modern varieties and three sub-groups in sweet cherry landraces, whereas the wild cherry population was defined as an unstructured group. A group of landraces was admixed. Therefore, the LD analysis was conducted independently for wild cherry, sweet cherry, sweet cherry landraces, sweet cherry modern varieties and each cherry population defined by STRUCTURE (sub-groups 1, 2 and 3 for landraces and sub-groups 1 and 2 for sweet cherry modern varieties). Note that the extremely admixed landrace individuals shown in Mariette et al. (2010) were not analysed as a sub-group of landraces in the current study. Sample sizes for each group and sub-group are given in Table 2.

Table 2 Number of markers and sample size for each cherry group and sub-group

Linkage disequilibrium analyses

When using SSR data from diploid individuals to estimate LD, haplotype and gamete frequencies are generally not known. Consequently, we adopted two approaches to estimate LD in the different groups identified. First, we used two different methods without haplotype reconstruction. Second, after haplotype reconstruction, LD was estimated based on the most probable haplotypes. The distribution of allele frequencies may have an effect on the extent of LD; therefore, rare alleles (less than 5% allele frequency) were excluded prior to further LD analyses. The studied loci were sorted within each group following their respective order on each LG, as shown in Table 1. The list of analysed markers in each group is based on the available genotyping data; therefore, the number of markers may be slightly different from one group to another (for more details, see Table 2).

Linkage disequilibrium analysis without haplotype reconstruction

Estimates of LD directly based on gametic frequencies cannot be achieved easily in diploid natural populations because gamete frequencies are not known. To circumvent this problem, we selected two methods that were proposed in the literature.

First, for each pair of loci, we tested for an association between genotypes at the two loci using a G test calculated on contingency tables, as implemented in Genepop (Raymond and Rousset 1995; Rousset 2008).

Second, Cockerham and Weir (1977) proposed a composite LD to assess LD on diploid data for which gamete frequencies are not known and for which random mating cannot be assumed. A correlation coefficient is then derived and corresponding levels of significance can be calculated (Weir 1979). Correlations and levels of significance were assessed using the modification of the correlation as proposed by Garnier-Gere and Dillmann (1992) in the LinkDos program (this correlation will be called common correlation hereafter).

The analyses were performed using the web versions of Genepop (http://genepop.curtin.edu.au/) and LinkDos (http://genepop.curtin.edu.au/linkdos.html).
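To illustrate the statistic behind the G test mentioned above, the following minimal sketch computes a log-likelihood ratio (G) statistic on a two-way contingency table of genotype counts. The function names and the counts are hypothetical, and the sketch does not reproduce how Genepop derives its p values; it only shows what the statistic measures.

```python
import math


def g_statistic(table):
    """Log-likelihood ratio (G) statistic for a two-way contingency table.

    `table` is a list of rows of observed counts (hypothetical layout:
    rows = genotypes at locus 1, columns = genotypes at locus 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a two-way table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: the first table has proportional rows (no association),
# the second concentrates counts on the diagonal (strong association).
independent = [[10, 20], [20, 40]]
associated = [[30, 5], [5, 30]]
print(g_statistic(independent), g_statistic(associated))
```

A G value close to zero indicates that genotypes at the two loci are distributed independently, whereas large values relative to the degrees of freedom point to an association.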

Linkage disequilibrium analysis after haplotype reconstruction

We also used haplotype reconstruction in order to assess LD from haplotypic data. For this purpose, we reconstructed the most probable haplotypes within each LG using the Bayesian approach implemented in PHASE version 2.1 (Stephens and Donnelly 2003). LD was estimated using squared allele frequency correlations (r²) and standardised disequilibrium coefficients (D′). While r² reflects both recombinational and mutational history, D′ only reflects recombinational history. r² depends on differences in allele frequencies at the two sites and is generally favoured for detecting potential allele–trait associations (Flint-Garcia et al. 2003). LD (r² and D′ estimators) was estimated from haplotypic data using TASSEL version 2.1 (Bradbury et al. 2007). The significance of LD (p value < 0.01) was assessed with 10,000 permutations, as implemented in TASSEL version 2.1.
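For two biallelic loci, both estimators derive from the disequilibrium coefficient D = p_AB − p_A p_B: D′ = |D| / D_max and r² = D² / [p_A (1 − p_A) p_B (1 − p_B)]. The sketch below applies these standard definitions to a list of phased haplotypes. It assumes biallelic loci coded 0/1, which is a simplification: SSR loci are multiallelic and require some form of averaging over allele pairs, which is not shown here. The function name and the example data are hypothetical.

```python
def ld_from_haplotypes(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    `haplotypes` is a list of (allele_locus_1, allele_locus_2) tuples with
    alleles coded 0 or 1 (a simplifying assumption for illustration).
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b

    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical sample in which only three of the four haplotypes are observed:
# D' reaches its maximum although the allele frequencies differ between loci.
sample = [(1, 1)] * 2 + [(0, 1)] * 4 + [(0, 0)] * 4
print(ld_from_haplotypes(sample))
```

In this example one of the four possible haplotypes is absent, so D′ is at its maximum while r² remains low because the allele frequencies at the two loci differ. This is one reason why D′ can be much higher than r² for the same pair of loci.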

For genotypic and haplotypic LD estimations, heat maps of probabilities were then produced using the heat map function in the R statistical package (http://www.r-project.org/).

Subsequently, correlations among estimations were assessed both for all pairwise LD values and for mean LD values.

To compare LD significance among groups, we used an experiment-wise type I error rate of 5%. Because multiple comparisons were performed, we applied Bonferroni’s correction.
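As a worked illustration of the Bonferroni correction, the sketch below divides the experiment-wise error rate by the number of pairwise tests. It assumes, purely for illustration, that all 36 loci (35 SSRs plus the self-incompatibility locus) are tested against each other in a group; the actual number of loci differs among groups (Table 2), and the text does not state over which set of tests the correction was applied. All names are hypothetical.

```python
def n_locus_pairs(n_loci):
    """Number of distinct pairs among n_loci loci."""
    return n_loci * (n_loci - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for an experiment-wise error rate alpha."""
    return alpha / n_tests


def significant(p_values, alpha, n_tests):
    """Flag each p value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


# Assumption for illustration: 36 loci, all pairs tested, alpha = 5%.
pairs = n_locus_pairs(36)
threshold = bonferroni_threshold(0.05, pairs)
print(pairs, threshold)
print(significant([1e-6, 1e-4, 0.03], 0.05, pairs))
```

With several hundred locus pairs, the per-test threshold falls well below 0.001, so only strongly supported pairs remain significant after correction.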

Effect of the sample size and relatedness on LD in groups and sub-groups

Effect of the sample size

Sample size can affect LD estimations significantly (Teare et al. 2002). We thus bootstrapped sub-samples of the same data to assess the effect of sample size on the extent of LD (Teare et al. 2002). Using R, we bootstrapped sets of 23, 43 and 66 samples in the sweet cherry modern varieties group and sets of 28, 34, 44 and 142 samples in the landraces group. D′ and r² were estimated over 1,000 replicates for each sample size using the LD pipeline of TASSEL 2.1.
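The resampling logic can be sketched as follows, under strong simplifying assumptions: two unlinked biallelic loci, simulated phased individuals, and r² computed directly rather than through TASSEL. The function names, the simulated population and the number of replicates are hypothetical and are not taken from the original analysis.

```python
import random


def r_squared(haplotypes):
    """r^2 between two biallelic loci coded 0/1; None if a locus is monomorphic."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return (p_ab - p_a * p_b) ** 2 / denom


def bootstrap_mean_r2(individuals, sample_size, n_replicates, rng):
    """Mean r^2 over bootstrap samples of `sample_size` individuals.

    Each individual is a pair of phased haplotypes; individuals are drawn
    with replacement and their haplotypes are pooled before computing r^2.
    """
    values = []
    for _ in range(n_replicates):
        sample = [rng.choice(individuals) for _ in range(sample_size)]
        haplotypes = [hap for individual in sample for hap in individual]
        r2 = r_squared(haplotypes)
        if r2 is not None:
            values.append(r2)
    return sum(values) / len(values)


# Hypothetical population of 142 individuals with two unlinked loci.
rng = random.Random(1)
population = [
    ((rng.randint(0, 1), rng.randint(0, 1)), (rng.randint(0, 1), rng.randint(0, 1)))
    for _ in range(142)
]
for size in (28, 44, 142):
    print(size, bootstrap_mean_r2(population, size, 500, rng))
```

Even though the simulated loci are unlinked, the mean r² is expected to be larger in the smaller bootstrap samples, which is the sampling effect that the bootstrap is meant to quantify.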

Effect of relatedness

The existence of relatedness within groups can result in increased LD and should thus be controlled in association studies (Slatkin 2008). We estimated relatedness within groups and sub-groups using a pairwise kinship matrix computed with TASSEL 2.1. Correlations were calculated between mean kinship and mean LD levels, as measured by the D′ and r² estimators.

Estimation of LD decay

In general, LD is negatively correlated with genetic distance. To examine the decline of LD, we computed mean LD values at 10-cM intervals. We then tested for LD decay using the correlation between LD and pairwise distance (log10 transformed; McRae et al. 2002). In addition, we plotted the LD estimates obtained with TASSEL against the genetic distances (in centimorgans) between marker pairs. We used the non-linear least squares (nls) function in the R statistical package (http://www.r-project.org/) to obtain the expected decay of LD by fitting the observed data to the following formula (Heifetz et al. 2005), which is modified from Sved (1971) and assumes drift–recombination equilibrium:

$$ {\text{LD}} = 1/\left( {1 + 4bd} \right) + e $$

where LD is the observed LD between the two markers of a pair, d denotes the genetic distance between the two markers, b is the decay coefficient estimated with the nls function and e is the model residual.
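The fitting step can be sketched in a few lines. The original analysis used the nls function in R; the sketch below swaps in a golden-section search over b that minimises the residual sum of squares, which is a different and much simpler optimiser. It relies on the residual sum of squares having a single minimum in b, which holds for the noise-free synthetic data used here but is an assumption for real data. All names, the search bounds and the data are hypothetical.

```python
import math


def predicted_ld(b, d):
    """Expected LD at genetic distance d for decay coefficient b."""
    return 1.0 / (1.0 + 4.0 * b * d)


def sse(b, distances, ld_values):
    """Residual sum of squares of the decay model for a given b."""
    return sum((y - predicted_ld(b, d)) ** 2 for d, y in zip(distances, ld_values))


def fit_decay(distances, ld_values, lower=1e-6, upper=10.0, tol=1e-10):
    """Estimate b by golden-section search on the residual sum of squares."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, c = lower, upper
    while c - a > tol:
        x1 = c - inv_phi * (c - a)
        x2 = a + inv_phi * (c - a)
        if sse(x1, distances, ld_values) < sse(x2, distances, ld_values):
            c = x2  # the minimum lies in [a, x2]
        else:
            a = x1  # the minimum lies in [x1, c]
    return (a + c) / 2.0


def distance_at_threshold(b, threshold):
    """Distance at which the predicted LD falls to `threshold`."""
    return (1.0 / threshold - 1.0) / (4.0 * b)


# Synthetic, noise-free data generated with a hypothetical b of 0.05.
distances = [1.0, 5.0, 10.0, 20.0, 40.0, 60.0]
ld_values = [predicted_ld(0.05, d) for d in distances]
b_hat = fit_decay(distances, ld_values)
print(b_hat, distance_at_threshold(b_hat, 0.1))
```

Once b has been estimated, the distance at which the predicted LD drops below a chosen threshold follows directly from the model, in the same units as d.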

r² is generally preferred to quantify the extent of “useful” LD for association studies. The threshold suggested by Kruglyak (1999) was r² = 0.1. We chose a threshold of 0.5 for D′ since the observed D′ values are generally at least five times higher than r² in our study (see the results in Table 3).

Table 3 LD decay with distance for wild and sweet cherry groups

Results

LD estimation: with and without haplotype reconstruction and comparison of estimators

Before comparing LD estimators among biological groups, we compared estimations with and without haplotype reconstruction and we compared estimations with the three different estimators (common correlation on genotypic data, r² and D′ on haplotypic data).

LD estimated with the common correlation estimator based on unphased data was generally lower than LD estimated with D′ based on haplotypic data (data not shown). Moreover, LD estimated with D′ was higher than LD estimated with r² (Tables 3 and 4).

Table 4 LD decay with distance for landraces and modern sweet cherry varieties

Mean LD estimations were significantly correlated (Pearson’s correlation between r² and D′ = 0.908, p value < 0.001; Pearson’s correlation between r² and the common correlation = 0.850, p value < 0.001; Pearson’s correlation between D′ and the common correlation = 0.863, p value < 0.01). Estimations were also significantly correlated at the 0.001 level within each group and each sub-group, except for landrace sub-group 2, while comparing D′ with the common correlation.

As for the significance of disequilibrium, trends were consistent among the three methods, as shown in Fig. 1a–i for the comparison between p values obtained with haplotypic data (TASSEL analysis) and p values obtained with genotypic data (LinkDos analysis), and in Supplementary Fig. 1a–i for the comparison between p values obtained with haplotypic data (TASSEL analysis) and p values obtained with genotypic data (G test on contingency tables).

Fig. 1

Comparison of LD p values obtained with the Tassel software on haplotypic data (upper right of the matrix) and with the LinkDos software on genotypic data (lower left of the matrix). Black squares represent a p value between 0 and 0.0001, dark grey squares represent a p value between 0.0001 and 0.001, light grey squares represent a p value between 0.001 and 0.01 and white squares represent a p value between 0.01 and 1. a Wild cherry, b sweet cherry, c sweet cherry landraces, d sweet cherry modern varieties, e sweet cherry landraces sub-group 1, f sweet cherry landraces sub-group 2, g sweet cherry landraces sub-group 3, h sweet cherry modern varieties sub-group 1, i sweet cherry modern varieties sub-group 2

Since congruent results were obtained between the haplotypic and genotypic analyses, we decided to report our results obtained using the D′ and r² estimators based on reconstructed phase data.

Magnitude of LD: comparison among groups

Effect of domestication: wild versus sweet cherry

LD values estimated in sweet cherry were higher than LD values estimated in wild cherry, showing significant LD levels for 50% and 8.73% of all locus pairs in sweet cherry and in wild cherry, respectively (Table 3).

Effect of breeding: sweet cherry landraces versus sweet cherry modern varieties

Mean LD values across genetic distance classes were high for both landraces and modern varieties (Table 4). In landraces, 39.16% of all marker pairs exhibited significant LD, while 14.53% of all marker pairs showed significant LD in modern varieties.

Effect of sample size

LD decreased as sample size increased for both bootstrapped samples and observed data (Fig. 2a, b), except for the modern varieties sub-group 1, for which a slight increase was observed (sample size = 43; Fig. 2b).

Fig. 2

a Effect of sample size on the mean of r² and D′ for all locus pairs, obtained by re-sampling. Points represent sample sizes of 28, 34, 44 and 142 for landraces and sample sizes of 23, 43 and 66 for modern varieties. b Effect of sample size on the observed mean of r² and D′ estimated with all locus pairs. Points represent sample sizes of 28, 34, 44 and 142 for landraces and sample sizes of 23, 43 and 66 for modern varieties

The decline of LD with increasing sample size partly explained the differences in LD between groups and sub-groups. LD estimated with r² increased more for bootstrapped samples than for observed data, except for sub-group 1 of landraces (sample size = 28) and for modern varieties sub-group 1 (sample size = 43), for which the opposite was observed (Fig. 2a, b; Table 5). Similar LD increases were observed in observed and bootstrapped samples for LD estimated with D′ (Fig. 2a, b; Table 5).

Table 5 Bootstrapped and observed relative r² and D′ values in landraces and modern varieties sub-groups

In conclusion, for sub-group 1 of landraces (sample size = 28) and for sub-group 1 of sweet cherry modern varieties (sample size = 43), the reduction of sample size was not the only possible explanation for the observed LD increase.

Effect of population structure and relatedness

We found a significant impact of residual population structure on LD estimates in sweet cherry groups. Significant LD between unlinked markers indicates the presence of genetic structure. Thus, in this section, we analysed inter-chromosomal LD. Unlinked marker pairs showed a higher percentage of significant LD in sweet cherry, in landraces and in modern varieties than in wild cherry (Tables 3 and 4; compare Fig. 1a with b–d). When we analysed LD within sub-groups, little significant LD was observed between unlinked markers in the three sub-groups of landraces defined by the STRUCTURE analysis, in comparison with the pooled landrace data (Tables 3 and 4). A similar pattern was found in the two sub-groups of modern varieties, where the significance of LD between unlinked markers was lower in sub-groups than in the pooled modern varieties sample (Tables 3 and 4; compare Fig. 1h–i to d). Nevertheless, the impact of population structure on LD extent was smaller in modern varieties than in landraces, which may be explained by the less complex population structure of modern varieties. In particular, landraces showed a higher level of admixed individuals than modern varieties.

A significant correlation was observed between relatedness within groups of samples and mean LD level, both for r² and D′ (Fig. 3; Pearson’s correlation = 0.901, p value < 0.001 for r² and Pearson’s correlation = 0.740, p value < 0.05 for D′). The lowest kinships and the lowest mean LDs were observed for wild cherry, sweet cherries and landraces. The highest kinships and the maximum mean LDs were observed for landraces sub-group 1 and for modern varieties sub-group 1.

Fig. 3

Effect of kinship estimated using the TASSEL software on the observed mean of r² and D′ estimated with all locus pairs

Pattern of LD along linkage groups

We evaluated the pattern of LD along the LGs by considering the significance of LD blocks as presented in Fig. 1a–i and Supplementary Fig. 1a–i. Significant LD estimates were scattered all over the loci. A comparison among markers at less than 10 cM was possible for three LGs (LG1, LG2 and LG6). Interestingly, among all cherry groups compared in this study, we observed significant LD blocks over several loci on LG2 both in groups of landraces and in groups of modern varieties. However, they were absent in wild cherry. In comparison, the significant LD blocks on LG6 (particularly between the GSI locus and the AMPA121 SSR marker) appeared to be common in all cherry groups, which may be explained by the effect of natural selection at the gametophytic self-incompatibility locus on LD extent.

The decay of LD

As expected, strong LD was observed between closely spaced markers, whereas lower levels of LD were found between more distant markers (Tables 3 and 4). For instance, in wild cherry, the estimate of r² decreased from 0.070 for markers at short genetic distances (up to 10 cM) to 0.014 for markers at long distances (>60 cM). LD decay with distance (log10 transformed) was significant for all groups, except for the landraces sub-group 1 and for the D′ estimate for the modern varieties sub-group 2 (Tables 3 and 4).

Predicted values of r² declined much more rapidly than predicted values of D′ (Fig. 4a–i and Supplementary Fig. 2a–i for r² and D′, respectively). Predicted values of r² generally decayed rapidly to less than 0.1 within less than 10 cM (Fig. 4a–i). There was, however, variation among groups, since the predicted LD decay was less rapid in sweet cherry groups, especially in modern varieties (Fig. 4d), in the landraces sub-group 1 (Fig. 4e) and in the modern varieties sub-group 1 (Fig. 4h).

Fig. 4

Plots of r² against genetic distance between marker pairs. Curves show the non-linear fit of r² on distance, following the equation given in the “Materials and methods” section. Estimated parameters of the model are provided on each figure. The dotted line indicates the threshold at r² = 0.1. a Wild cherry, b sweet cherry, c sweet cherry landraces, d sweet cherry modern varieties, e sweet cherry landraces sub-group 1, f sweet cherry landraces sub-group 2, g sweet cherry landraces sub-group 3, h sweet cherry modern varieties sub-group 1, i sweet cherry modern varieties sub-group 2

Predicted values of D′ declined more rapidly in wild cherry than in all groups and sub-groups of sweet cherry (Supplementary Fig. 2a–i). The extent of the predicted D′ was particularly large in modern varieties (Supplementary Fig. 2d) and in sub-groups (Supplementary Fig. 2e–i), except in the landraces sub-group 3 (Supplementary Fig. 2g).

Discussion

For cultivated species, it is generally observed that domestication and breeding have a strong impact on the level of genetic diversity in populations. This statement was supported by our recent study, showing that breeding was responsible for a non-negligible bottleneck observed with all SSR markers in the sweet cherry modern varieties (Mariette et al. 2010). In addition, domestication and breeding are expected to influence the level of LD within a population. In the present study, we assessed the level of LD using SSR markers and the self-incompatibility locus in wild and sweet cherries (P. avium L.).

Comparison of LD estimators

LD estimated with unphased data and with phased data was correlated among groups, and trends for LD significance were similar. We therefore chose to present the results obtained using the r² and D′ estimators based on reconstructed phase data.

The comparison between these two estimators revealed that D′ was always higher than r². This difference has already been reported, for example, in maize (Remington et al. 2001), in chicken (Heifetz et al. 2005) and in sheep (Meadows et al. 2008). Rare alleles and unobserved haplotypes are expected to inflate D′ but not r² (Meadows et al. 2008). Despite this difference between D′ and r², the two estimators gave congruent conclusions when we analysed the patterns of LD among cherry groups. Either statistic could thus be used in the current study to depict the patterns.

However, the predicted decay of r² against genetic linkage distance was more rapid than the predicted decay of D′. Though, again, trends were quite congruent between the two estimators (for example, a more rapid decay in wild cherry than in sweet cherry groups), this discrepancy leads to different conclusions concerning the absolute range of LD decay (around 5–10 cM in wild cherry with r² and around 60 cM with D′). Since D′ extends over larger linkage distances, it reduces the power to identify true associations. In the following sections of the “Discussion,” we thus preferred to use r² for drawing conclusions about LD decay and its consequences for association mapping in cherries.

What affects the patterns of LD among cherry groups?

The interplay of several factors influences the patterns of observed LD within species (Flint-Garcia et al. 2003; Rafalski and Morgante 2004). Generally, LD decays more rapidly in outcrossing than in selfing species because recombination may be less effective in selfing species (Nordborg and Tavaré 2002). Not surprisingly, we detected a rapid decline of LD in our cherry populations, especially in wild cherry, which belongs to a strictly self-incompatible species. Nevertheless, the decline of LD in cherry seems to be relatively slow compared to previous studies based on sequence data in self-incompatible species. For example, LD decays to negligible levels within 150–750 bp in wild tomatoes (Arunyawat et al. 2007) and within 0.2–1.5 kbp in maize (Remington et al. 2001; Tenaillon et al. 2001). Besides, the extent of LD we observed in self-incompatible sweet cherry is comparable to the extent of LD recently published for peach, which is self-compatible (13–15 cM; Aranzana et al. 2010). However, the decay of LD with distance in this study is comparable to the LD decay in SSR-based studies performed in both wild and cultivated grapevine germplasm (Barnaud et al. 2010). On the whole, direct comparisons of LD extent with other studies should be treated with caution because of differences in the LD measures and/or molecular markers used in each study. More interestingly, the comparison between wild and domesticated material provides useful information for future association studies.

Although the decay of LD was comparable among groups, important LD differences were detected between wild and sweet cherries. Domestication and breeding are most likely among the main factors leading to the higher levels of significant LD detected in sweet cherry than in its wild relative. A population bottleneck is a likely consequence of domestication and may explain the LD extent in sweet cherry, especially in modern varieties, for which a bottleneck was indeed demonstrated (Mariette et al. 2010). A significant effect of domestication on the LD extent, due to bottlenecks and genetic drift, has been demonstrated in several crop plants. For example, LD is more extensive in elite barley cultivars, most likely because of domestication and breeding (Caldwell et al. 2006). LD extended 12 times further in cultivated grape than in a wild population (Barnaud et al. 2010). Cultivated sunflower exhibited a somewhat greater LD extent than wild sunflower (Liu and Burke 2006), and a similar trend was found between cultivated and wild rice (Mather et al. 2007). The pattern of diversity and LD in modern sunflower cultivars is another example of a pattern shaped by domestication and breeding bottlenecks (Kolkman et al. 2007). Interestingly, while estimating LD in Arabidopsis lyrata, Kamau et al. (2007) reported that a population bottleneck is a cause of the LD extent in the region linked to self-incompatibility.

Nevertheless, our conclusion about the impact of domestication and breeding should be tempered by the fact that the wild sample is of French origin only, whereas landraces and modern varieties represent a wider range of geographical origins. If we had included material originating from the putative geographical basin of domestication (Caucasia) in this study, wild cherry might have shown population structure, which may have resulted in higher levels of LD (see our results on the impact of structure).

Using small sample sizes is expected to bias LD estimations (Teare et al. 2002). In this study, LD was overestimated in groups with small sample sizes (sub-groups of landraces and modern varieties). Following a bootstrapping approach, we showed that sample size alone may account for differences of LD among groups and sub-groups. The slower decrease of LD observed in sub-groups than in the pooled data for both landraces and modern varieties may be due, in part, to the small sample size, which may reduce statistical power. We, therefore, cannot ignore the effect of small sample size, which may influence the significance level of LD in the sub-group data. Small sample sizes in our study could also explain why we observed extents of LD similar to those reported for self-compatible peach, as stated above (Aranzana et al. 2010), since the study in peach was performed with larger samples (224 individuals in the whole sample and 39, 91 and 94 individuals in each sub-group).

Finally, as detailed in the following two sections, population structure, relatedness and selection also influence LD estimations in cherries.

Do population structure and relatedness play a role in the LD patterns of cherries?

Sweet cherries, particularly landraces, exhibit a complex population structure, whereas wild cherry appears to be an unstructured population, at least at the studied geographical level (Mariette et al. 2010). The presence of population structure in sweet cherry is plausibly one of the factors explaining the difference in LD significance between wild and sweet cherries, especially for unlinked markers. Additionally, LD significance was clearly higher in sweet cherry, which is a pool of landraces and modern varieties, than in the unstructured wild cherry population. Finally, LD significance was markedly higher in landraces and in modern varieties than in their sub-groups. We thus postulate that complex population structure has a significant impact on the significance of LD in our cherry material. In addition, intra-group relatedness could also explain the differences we noticed. In particular, for two groups (landrace sub-group 1 and modern sub-group 1), higher relatedness may account for higher LD values. It is worth noting that a less rapid LD decay was also found in these sub-groups.
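How pooling structured groups can generate LD between unlinked markers can be shown with a small worked sketch. It assumes two groups that are each in linkage equilibrium, mixed in proportions m and 1 - m; the allele frequencies are hypothetical and the function name `pooled_ld` is introduced here for illustration only. Under these assumptions, the pooled disequilibrium equals m(1 - m) times the product of the allele frequency differences at the two loci.

```python
def pooled_ld(m, p_a1, p_b1, p_a2, p_b2):
    """D between loci A and B after pooling two groups, each in linkage equilibrium.

    m is the proportion of the pool contributed by group 1; p_a1 and p_b1 are the
    allele frequencies at loci A and B in group 1, p_a2 and p_b2 those in group 2.
    """
    # Within each group, freq(AB) = pA * pB because the loci are independent there.
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b


# Hypothetical frequencies: strongly differentiated groups versus identical groups.
m = 0.5
d_structured = pooled_ld(m, 0.9, 0.9, 0.1, 0.1)
d_unstructured = pooled_ld(m, 0.5, 0.5, 0.5, 0.5)

# Closed form for comparison: m * (1 - m) * (pA1 - pA2) * (pB1 - pB2)
closed_form = m * (1 - m) * (0.9 - 0.1) * (0.9 - 0.1)
print(d_structured, closed_form, d_unstructured)
```

The pooled sample shows non-zero D only when the groups differ in allele frequency at both loci, regardless of whether the loci are linked.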

In other species, there is considerable evidence that population structure influences the magnitude and pattern of LD. For instance, a large-scale study of European Arabidopsis accessions revealed population structure among groups, which was suggested to be responsible for the level of LD detected (Ostrowski et al. 2006). Furthermore, previous studies reported that complex population structure may shape patterns of genetic variation and influence levels of LD in plant species, whether wild or cultivated, such as maize, Populus and barley (Remington et al. 2001; Ingvarsson 2005; Comadran et al. 2009). Finally, population structure and relatedness may also yield spurious associations in association studies (Pritchard et al. 2000; Helgason et al. 2005).

Consequently, the general observed pattern of LD in sweet cherry populations may be explained by genetic structure and intra-group relatedness.

Insights into cherry association studies: potential applications on LD mapping

The differences between LD extents in wild and sweet cherries are of interest for LD mapping applications, in particular on LG2, where a significant LD extent was observed in sweet cherry but not in wild cherry. One explanation is that selection on particular genes maintains the excess of LD in that LG. It is worth noting that a recent QTL study reported that genes responsible for fruit size are located on LG2 of the “Emperor Francis” cherry linkage map (Zhang et al. 2010). In addition, a QTL linked to fruit size was previously detected on LG2 in other Prunus species, e.g. in sour cherry (Wang et al. 2000) and peach (Quilot et al. 2005). One practical application would be selection for large fruit size in Prunus species.

Similarly, in our study, the gametophytic self-incompatibility locus (S-locus) located on LG6 seems to be responsible for the occurrence of significant LD on this LG. Indeed, several authors reported a similar selection scenario at the S-locus, for example, in Arabidopsis and other Brassicaceae species (Takebayashi et al. 2003; Kamau and Charlesworth 2005; Edh et al. 2009).

The rate of LD decline determines the resolution of association analysis and, inversely, the density of DNA markers needed for functional variation mapping (Nordborg et al. 2002; Rafalski and Morgante 2004; Mackay and Powell 2007; Slatkin 2008). High genotypic and phenotypic variability, together with the rapid decay of LD in our unstructured wild cherry population, is promising for future association studies aimed at fine-scale genotype–phenotype association. High-density marker genotyping may be required for the mapping of genes of interest in wild cherry. Conversely, the significant extent of LD in some sub-groups of sweet cherry, useful for association analysis, may mean that fewer markers are required for QTL mapping. However, the complex population structure evidenced in sweet cherry, especially in landraces, might cause difficulties when using LD to map phenotypic variation. The extent of LD in modern varieties, together with a less pronounced effect of population structure, may make association mapping possible in modern varieties with fewer markers. Nevertheless, the number of markers we assayed is relatively low, and our study did not cover all the LGs. Our results need further investigation with more markers and/or sequence data.
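The inverse relationship between LD extent and required marker density can be made concrete with a back-of-the-envelope sketch. It assumes evenly spaced markers and a single LD extent for the whole map, which is a strong simplification; the map length and LD extents below are hypothetical values, not estimates from this study, and `markers_needed` is an illustrative helper.

```python
import math


def markers_needed(map_length_cm, ld_extent_cm):
    """Rough number of evenly spaced markers so that adjacent markers lie no
    further apart than the distance over which LD persists (both in cM)."""
    if map_length_cm <= 0 or ld_extent_cm <= 0:
        raise ValueError("map length and LD extent must be positive")
    # ceil(L / d) intervals of length <= d require one more marker than intervals.
    return math.ceil(map_length_cm / ld_extent_cm) + 1


# Hypothetical linkage group of 100 cM under rapid versus extended LD.
for label, extent in (("rapid LD decay", 0.5), ("extended LD", 5.0)):
    print(label, markers_needed(100.0, extent))
```

Under these assumptions, a tenfold shorter LD extent requires roughly ten times as many markers to cover the same linkage group.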

Conclusion

This study provides the first detailed understanding of the magnitude and patterns of LD in cherry. Overall, the significance and extent of LD are greater in sweet cherry than in wild cherry, which is most likely explained by the presence of population structure in concert with the impact of the bottleneck associated with domestication and breeding programmes in sweet cherry. Our data also demonstrated the role of selection, either natural or artificial, in the creation of LD blocks. Finally, LD decays rapidly with increasing genetic linkage distance in the analysed cherries, particularly wild cherry, which seems promising for future association studies aiming at mapping phenotypic variation in cherry.