Introduction

Landraces are populations that have evolved in response to human selection in subsistence agriculture. In such systems, traditional farmers have maintained a wealth of genetic variation mediated through human migration, seed exchange, and natural selection (Harlan 1975; Brush 2000). Since landraces pre-date modern plant breeding and its intense purifying selection that produces modern varieties, they constitute an important source of genetic diversity for plant breeding (Frankel et al. 1995). Landraces are therefore the result of evolutionary processes that lead to heterogeneous populations rather than to few superior genotypes (Brown 2000; Ceccarelli and Grando 2000). In selfing species such as barley, this diversity, held together as blocks of genes in chromosomal regions with low frequency of recombination, may confer a specific adaptation to stress environments (Ceccarelli and Grando 2000). Landraces thus provide an interesting model for association mapping to identify genes that control adaptive variation in crop species (Vigouroux et al. 2002; Mazzucato et al. 2008; Bitocchi et al. 2009; Comadran et al. 2009).

The non-random association of alleles at different loci (linkage disequilibrium, LD), namely between a marker locus and a phenotypic trait locus, is the starting point for association mapping studies. Levels of LD depend on the amount and distribution of the genetic diversity, the mating system, selection regimes and recombination events in the ancestry of the genotypes. High LD levels arise from inbreeding, small population size, population structure, admixture, low recombination rate, or natural and artificial selection. The decay or decrease of LD with increasing map distance between markers is usually faster in allogamous than in autogamous species, and in wild relatives than in modern varieties (Flint-Garcia et al. 2003; Gaut and Long 2003; Morrell et al. 2005; Caldwell et al. 2006; Rostoks et al. 2006; Mather et al. 2007; Song et al. 2009).

The level and distribution of LD play a central role in association studies because they determine the number and the density of markers needed for the analysis (Flint-Garcia et al. 2003; Slatkin 2008). In particular, when LD is low and decays within a few thousand base pairs, a candidate gene approach is usually preferred, whereas for populations with moderate or high LD a whole-genome scan can be more appropriate. However, population structure may confound LD mapping studies by producing spurious statistical correlations between unlinked markers and phenotypic variation (Pritchard et al. 2000; Mackay and Powell 2007). Nonetheless, association mapping is achievable when population structure is taken into account (Cockram et al. 2008; Comadran et al. 2011).

Landraces are often included in LD studies mainly as isolated genotypes from different, and often unrelated, populations in the attempt to capture high levels of genetic variation (Maccaferri et al. 2005; Caldwell et al. 2006; Comadran et al. 2009). However, very little is known of the levels and structure of LD within and among populations of the same landrace, growing in a range of agro-ecological conditions and in a context where complex seed exchange networks could exist.

In the present study, we investigate eleven populations of a Sardinian Barley Landrace (SBL), all of which local farmers call “S’orgiu sardu” (i.e. Sardinian barley). Farmers’ management practices together with natural selection have resulted in populations with high levels of genetic diversity and adaptation to the local environment (Attene et al. 1996; Gorham et al. 1994). Papa et al. (1998) documented the value of the Sardinian barley germplasm as a source of genetic variation using morphological and molecular markers (random amplified polymorphic DNA markers and isozymes). Subsequent genotype by environment studies conducted in six different environments with SBL lines, their recombinant derivatives and commercial varieties point to their potential utility in breeding barley adapted to Mediterranean environments (Rodriguez et al. 2008).

The material studied comprised five populations collected in Sardinia in 1990, some of which were also analysed in Papa et al. (1998), and a further six populations collected in 1999. We performed molecular analyses using the retrotransposon-based markers named simple-sequence amplified polymorphism (S-SAP). These markers are reliable and appropriate for a range of genetic analyses (Leigh et al. 2003; Tam et al. 2005; Rodriguez et al. 2006) including the study of LD in eukaryotic genomes (Charlesworth et al. 1994; Fu et al. 2002). However, compared to other kinds of markers relatively few population genetic studies have used the S-SAP technique (e.g. Queen et al. 2004; Soleimani et al. 2007; Tam et al. 2007).

Several studies have investigated the level of linkage disequilibrium among genetic markers in various barley populations (such as different accessions extracted from cultivars, breeding lines, landraces collected from different regions or countries) (Caldwell et al. 2006; Malysheva-Otto et al. 2006; Rostoks et al. 2006; Comadran et al. 2009; Zhang et al. 2009). Such sets often comprise lines with contrasting diversity (such as winter vs. spring barley; malting vs. feed barley). The level of disequilibrium so constructed is of direct interest to mapping and breeding, but may not reflect the genetic structure of the source populations. Our samples are random plants from populations in situ chosen to examine the standing disequilibrium at local and regional levels using S-SAP markers.

To achieve this objective, we estimated: (a) the levels and the structure of the S-SAP genetic diversity present in our collection of populations; (b) the levels and patterns of LD between pairs of S-SAP loci; (c) the population structure of multilocus LD in multiple SBL populations. Specifically, our main objective was to test whether such populations could be suitable for future LD mapping studies. Moreover, we aimed to determine whether landrace populations depart in any key aspect from composite cross populations of cultivated barley, and from populations of wild barley (Hordeum vulgare ssp. spontaneum). This would be useful for a deeper understanding and better management of these different kinds of genetic stocks.

Materials and methods

Plant materials

Eleven populations of a barley (Hordeum vulgare ssp. vulgare L.) landrace were collected on the island of Sardinia (Italy) (see Table S1 of supplementary materials for more details). For each population, we analysed approximately 30 individuals, giving a total of 337 individuals (Table 1). Lines were randomly sampled from each field (one spike per plant) and each field was assumed to be a single population. Five of the eleven populations were collected in 1990 from different agro-ecological areas of Sardinia, and the other six were collected in 1999 from the same geographic areas sampled in 1990.

Table 1 Diversity statistics within populations of SBL and overall

Molecular data

DNA was extracted individually from each of the 337 Sardinian lines, starting from fresh leaf tissues and using the CTAB method (Doyle and Doyle 1987). Each line was analysed using 6 S-SAP primer combinations (Rodriguez et al. 2006). The S-SAP method (Waugh et al. 1997b; Leigh et al. 2003) exploits the combination of a primer designed on the long terminal repeat (LTR) sequence of a barley retrotransposon (e.g. Sukkula, Nikita, BAGY-2, BARE-1) and an Mse primer, which usually generates high levels of polymorphism.

Each primer combination was previously tested in a set of three barley varieties (Leigh et al. 2003) and subsequently used to enrich a Steptoe × Morex (S × M) genetic map (Rodriguez et al. 2006). To infer a map position for the S-SAP markers which were polymorphic in SBL, the 337 individuals were amplified alongside Steptoe, Morex and two of the DH lines obtained from the S × M cross. The S-SAP markers showing the same molecular weight both in the S × M DH lines and in the SBL individuals were presumed to be the products of the same locus (Waugh et al. 1997a; Kraakman et al. 2004). In this way, map positions were deduced for 53 of the 134 polymorphic markers scored in the SBL populations.

Genetic diversity

Barley is a strictly autogamous species with an outcrossing rate of <1 % (Briggs 1978; Abdel-Ghani et al. 2004). A preliminary analysis conducted on SBL using 11 simple sequence repeat (SSR) markers (not shown) revealed only a couple of heterozygous loci in a few individuals, consistent with autogamy. Therefore, all of the individuals in the present study were assumed to be homozygous. Descriptive statistics such as Nei’s gene diversity (H E, Nei 1978) and the number of polymorphic markers were computed using PopGene 1.32 software (Yeh et al. 1999). The number of haplotypes and alleles was computed using Arlequin 3.5.1.2 (Excoffier and Lischer 2010).
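As an illustration of the descriptive statistics named above, the following is a minimal sketch of gene diversity and the polymorphic-marker count for a presence/absence matrix. It assumes that each homozygous line can be treated as one haploid genotype and that the small-sample correction is n/(n - 1); the exact estimator applied by PopGene to dominant markers may differ, and the function names are hypothetical.

```python
import numpy as np

def gene_diversity(genotypes):
    """Mean gene diversity over loci for a 0/1 matrix (individuals x loci).

    Assumption: each homozygous line counts as one haploid genotype, so the
    small-sample correction is n / (n - 1).
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    p = g.mean(axis=0)                      # band-present frequency per locus
    h = (n / (n - 1)) * (1.0 - p**2 - (1.0 - p)**2)
    return h.mean()

def count_polymorphic(genotypes):
    """Number of loci at which both marker states occur in the sample."""
    p = np.asarray(genotypes, dtype=float).mean(axis=0)
    return int(np.sum((p > 0) & (p < 1)))
```

Both functions take the same scored matrix, so they can be applied per population or to the pooled sample.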

Population structure

The hierarchical analysis of molecular variance (AMOVA) was used to test the significance of the partitioning of genetic variance into three levels: individuals, populations and groups of populations (grouped according to the year of collection, i.e. 1990 and 1999) using Arlequin 3.5.1.2 (Excoffier and Lischer 2010). Genetic distances between populations were calculated using Nei’s unbiased genetic distance (Nei 1978). A dendrogram was drawn using the unweighted pair group method with arithmetic mean (UPGMA) clustering method implemented in TFPGA (Miller 1997). The relative strength of the nodes produced by UPGMA analysis was inferred by bootstrapping over loci (1,000 permutations). To gain a better insight into the genetic structure of the SBL populations, different methods were exploited. First, we applied the Bayesian model-based clustering algorithm implemented in STRUCTURE 2.3.1 (Pritchard et al. 2000; Falush et al. 2003, 2007; Hubisz et al. 2009). This method assigns each individual to several different groups, according to a membership coefficient (q i; Σq i = 1.0). The options set for the admixture model were: ‘correlated allele frequencies among populations’ and ‘infer the degree of admixture (α) by the data’. The number of hypothetical populations (K) tested was from 1 to 14, which is three more than the original number of populations. For each K, 20 runs (burn-in length 100,000 and 200,000 iterations) were carried out and the logarithm probability of data, lnP(D), was collected for each run. The most likely number of K was determined using the ΔK ad hoc statistic (Evanno et al. 2005). Because the estimation of the most likely number of clusters, K, is a notoriously difficult problem (Guillot et al. 2009), we also used the InStruct software (Gao et al. 2007) to infer the optimal number of clusters via the deviance information criterion (DIC), which is widely used as a statistic for comparing models in a Bayesian framework (Spiegelhalter et al. 2002).
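The ΔK statistic can be sketched as follows, assuming the replicate lnP(D) values have been gathered into a dictionary keyed by K. This version takes the second-order difference on the run means and divides by the standard deviation across runs at K, which is one common way of evaluating the statistic; the function name and input layout are hypothetical, not part of STRUCTURE.

```python
import numpy as np

def delta_k(lnp):
    """Evanno-style delta K from replicate lnP(D) values.

    lnp maps each tested K to a list of lnP(D) values (one per run).
    Assumption: the second difference is taken on run means and scaled by
    the sample standard deviation of the runs at K.
    """
    means = {k: float(np.mean(v)) for k, v in lnp.items()}
    out = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # delta K is undefined at the two ends of the K range
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        sd = float(np.std(lnp[k], ddof=1))
        out[k] = second_diff / sd if sd > 0 else float("nan")
    return out
```

The K with the largest ΔK is then taken as the most likely number of clusters; the statistic cannot be evaluated for the smallest or largest K tested.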
Genetic relationships among populations were investigated by principal coordinate analysis (PCoA) based on the double-centred Jaccard similarity matrix. Jaccard’s coefficient is appropriate for binary multilocus data (Landry and LaPointe 1996). NTSYSpc 2.02i (Rohlf 2000) was used for this analysis. The correlation between genetic divergence (F ST) and geographic distance (km) between pairs of populations was tested by the non-parametric Mantel test using GenAlEx 6.3 (Peakall and Smouse 2006).
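The logic of the Mantel test can be sketched as below. This is a generic one-sided permutation version written for illustration, not the GenAlEx implementation; the function name, the default number of permutations and the seed are assumptions.

```python
import numpy as np

def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test between two symmetric distance matrices.

    Rows and columns of `a` are permuted jointly, so the dependence among
    pairwise distances that share a population is preserved.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)           # each population pair counted once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r_perm = np.corrcoef(a[perm][:, perm][iu], b[iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)     # observed value counts as one draw
    return r_obs, p_value
```

Here `a` would hold pairwise F ST and `b` the pairwise geographic distances in km for the eleven populations.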

Evaluation of linkage disequilibrium

The estimates of the LD were determined using two different indices:

(a) the squared allele-frequency correlations (r 2) (Hill and Robertson 1968) for pairs of loci, calculated using the software package Tassel 2.1 (Bradbury et al. 2007; http://www.maizegenetics.net/bioinformatics). For the purpose of examining the potential of SBL for association mapping, we preferred to calculate the r 2 statistic, as it is indicative of how markers might correlate with the quantitative trait loci (QTL) of interest (Flint-Garcia et al. 2003; Mueller 2004). The significance (P values) of LD for S-SAPs was determined by permutation (100,000 reps).

(b) A summary measure of multilocus LD (r d), calculated using the software MultiLocus 1.3 (Agapow and Burt 2001). The index r d derives from the multilocus association index (I A) (Brown et al. 1980) which was modified to avoid dependence on number of loci (Agapow and Burt 2001). The significance of the r d estimates was tested by shuffling alleles across individuals (1,000 reps).
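The two indices can be illustrated with a minimal sketch for haploid 0/1 data (homozygous lines scored for band presence). The code below is written for illustration and is not the Tassel or MultiLocus implementation; function names, the default number of permutations and the seed are assumptions. The r d function follows the published definition: the excess of the observed variance of pairwise distances over the sum of per-locus variances, scaled by twice the sum over locus pairs of the square root of the product of their variances.

```python
import numpy as np

def r_squared(x, y):
    """Squared allele-frequency correlation between two biallelic loci."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pa, pb = x.mean(), y.mean()
    d = (x * y).mean() - pa * pb            # disequilibrium coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else float("nan")

def r_squared_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation P value: shuffle one locus across individuals."""
    rng = np.random.default_rng(seed)
    obs = r_squared(x, y)
    y = np.asarray(y)
    hits = sum(r_squared(x, rng.permutation(y)) >= obs for _ in range(n_perm))
    return obs, (hits + 1) / (n_perm + 1)

def r_d(genotypes):
    """Multilocus index r_d for a 0/1 matrix (individuals x loci)."""
    g = np.asarray(genotypes, dtype=float)
    i, j = np.triu_indices(g.shape[0], k=1)
    dist = np.abs(g[i] - g[j])              # pairs x loci, 1 where lines differ
    var_locus = dist.var(axis=0)            # variance of distance at each locus
    v_obs = dist.sum(axis=1).var()          # variance of total pairwise distance
    s = np.sqrt(var_locus)
    scale = s.sum() ** 2 - (s ** 2).sum()   # 2 * sum over j<k of sqrt(var_j * var_k)
    return (v_obs - var_locus.sum()) / scale
```

Because r d is scaled by the maximum covariance attainable for the given per-locus variances, it does not grow with the number of loci, which is the property that motivates its use in place of I A.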

To display the change in LD as a function of genetic distance, LD for pairs of markers was measured for five classes of inter-marker genetic distance (≤3, 4–10, 11–30, 31–50, >50 cM) (e.g. Maccaferri et al. 2005). LD among markers located on different chromosomes was also calculated. The level of long range LD was calculated overall, within each of the eleven SBL populations, and within the genetic groups obtained by STRUCTURE analysis. An overall correlation between the genetic distance between markers on the same chromosome and LD was estimated. To derive a critical value of r 2 indicative of LD above sampling error, the r 2 estimates for pairs of unlinked markers were square root transformed to approximate a normally distributed random variable and the parametric 95th percentile of that distribution determined (Breseghello and Sorrells 2006).
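The critical value described above can be sketched as follows, assuming that the parametric 95th percentile is taken as the mean plus 1.645 standard deviations of the square-root-transformed values and that the cutoff is squared to return to the r 2 scale. The back-transformation step and the function name are assumptions made for illustration.

```python
import numpy as np

Z_95 = 1.6448536269514722  # 95th percentile of the standard normal

def critical_r2(unlinked_r2):
    """Background-LD threshold from r2 values of unlinked marker pairs.

    Assumption: the square-root-transformed values are treated as normal,
    and the parametric 95th percentile is squared to give an r2 cutoff.
    """
    root = np.sqrt(np.asarray(unlinked_r2, dtype=float))
    cutoff = root.mean() + Z_95 * root.std(ddof=1)
    return cutoff ** 2
```

Values of r 2 above this cutoff for linked marker pairs are then read as LD in excess of what unlinked pairs show by sampling alone.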

Structure of multilocus associations

The structure of multilocus LD in multiple populations was analysed using the framework of Brown and Feldman (1981). This method summarises the structure of the multilocus associations in a series of subpopulations as a set of components. Specifically, “Each source is measured in terms of its contribution to the variance in heterozygous loci in two gametes randomly chosen from within the same population or from a pool of all populations.” (Brown and Feldman 1981). They define three single-locus and five two-locus components. Namely, the single-locus components are the average diversity (MH), the variance among populations in gene diversity (VH) and the variance among populations in allele frequency, namely the Wahlund effect (WH). A high percentage of variance accounted for by the single-locus components indicates that alleles are randomly associated in an unstructured total population. The two-locus components include the mean disequilibria (MD), the variance of disequilibria (VD), the covariance of allele frequencies over populations, namely the Wahlund effect at two loci (WC), the interaction of disequilibria (AI) between MD and WC, and CI which arises from covariation in the interaction of disequilibria and the Wahlund covariance among populations (Brown and Feldman 1981). The partitioning assists in hypothesising which of the several evolutionary forces (such as epistatic natural selection, genetic drift, population subdivision, founder effect, genetic hitchhiking) are relatively important when comparing patterns of disequilibria in different sets of populations. For example, when epistatic selection is consistent among populations MD is high, VD low and AI positive. High values of CI are associated with founder effects. High WC and low AI indicate diversifying selection and population subdivision. The analysis was performed with POPGENE 1.32 (Yeh et al. 1999).

Finally, the results of the multilocus analysis of SBL were compared with those obtained by Brown and Feldman (1981) for barley composite cross populations and wild barley populations (Hordeum vulgare ssp. spontaneum) (for details see Brown and Feldman 1981, last column of Table 1 and third column of Table 2, respectively).

Table 2 Percentage (%) of significant pairwise associations, average squared correlation coefficient (r 2) and multilocus index of association (r d) calculated within each geographical population, the four groups identified by STRUCTURE, and the overall bulk

Results

Genetic diversity and population structure

Genetic diversity

Estimates of genetic diversity based on the 134 S-SAP markers confirm that these populations are highly polymorphic (Table 1). Indeed, the number of haplotypes nearly matches the number of individuals assayed; each individual was genetically unique (except for one pair of individuals from VI). Within the populations, the number of polymorphic markers varied from 92 (STU) to 118 (SOR). The overall gene diversity (H E) was 0.30 and the mean within populations was 0.25. The difference in H E between STU (0.18) and COR, SIS3, SEN, SOR and VI (from 0.26 to 0.28) was statistically significant (Tukey-Kramer HSD test, P < 0.05). STU, the population sampled at the highest elevation, was the least genetically diverse. This is consistent with previous results for allozymes in wild barley from Israel (Nevo et al. 1979), and encourages further investigations to compare this result with previous studies on barley (e.g. Tanto Hadado et al. 2010). The percentage of “rare” markers (loci at which the frequency of the minor allele is <0.1) varied among populations from 20 % (SIS3 and COR) to 39 % (CUM) (Table 1).

Population structure

The AMOVA partitioning of S-SAP diversity indicated that the within-population component (82 %) significantly dominated (P < 0.001). No differentiation occurred between the two groups of populations collected in 1990 and in 1999 (F CT = 0.00). The UPGMA dendrogram (Fig. 1) clustered ten landrace populations into three main groups. Group I comprised three populations from the north, group II consisted of four populations from the south and group III included three populations from the centre of Sardinia. The southern population VI weakly clustered with group III. This analysis suggested a relationship between genetic divergence and geographic distance between pairs of populations. However, the correlation was weak and barely significant based on a Mantel non-parametric test (r = 0.27, P = 0.05).

Fig. 1

UPGMA tree describing genetic relationships among the eleven geographical landrace populations. Genetic distances were calculated based on Nei 1978. Consistency of the tree topology was evaluated by bootstrapping over loci (10,000 reps)

Four main genetic groups (hereafter named GRP1-4) were inferred from the STRUCTURE analysis of the pooled data and the calculation of ΔK (STRUCTURE) and DIC (InStruct) (Fig. S1 and S2 of Supplementary materials). The majority of individuals (64 %) were assigned to one of the four genetic groups (membership coefficient, q i, ≥ 0.8) while 36 % were classed as admixed (q i < 0.8). As illustrated in Fig. 2, group 1 mainly consisted of individuals from populations in the north of the island (CUM, N2 and ORO). Individuals from the centre (SIS3, PIR, and SOR) merged into group 2. Individuals from VI, COR and SEN belonged mainly to group 3. Group 4 included individuals from the south (STU, NXM and partly from SEN). The third genetic group appeared more widespread than the other three, although only a few group 4 individuals were present in the north. The PCoA showed a pattern similar to STRUCTURE when the first three principal coordinates were used, which together explained 15.7 % of the total S-SAP allelic variance (Fig. 3a, b). For all assigned individuals, the first axis separated groups 1 and 4 from 2 and 3, the second axis mainly separated group 1 from group 4, and the third axis separated group 2 from group 3. Because the results for population structure obtained from the different analyses are consistent, only the genetic groups obtained from STRUCTURE (Fig. 2) will be used further.

Fig. 2

Map of the sampled sites in Sardinia illustrating the distribution of the four genetic groups. Pie charts were obtained by using the mean value of q i (membership coefficient) per genetic group within populations including only individuals with q i > 0.8. The neighbour-joining tree on the right side was obtained by the net nucleotide distances (allele-frequency divergence) among Structure groups

Fig. 3

Principal coordinate analysis of 337 SBL individuals based on Jaccard’s distance matrix between all pairs of individuals. Scatter plots represent the grouping of the individuals according to the first coordinate versus the second one (a) and the second coordinate versus the third one (b). Individuals are marked according to the four genetic groups identified by STRUCTURE analysis (q i > 0.8). Admixed individuals are indicated by “ж” (q i < 0.8). The three principal coordinates cumulatively explain 15.7 % of the S-SAP genetic variance

Levels of linkage disequilibrium

As evident in Table 2, the levels of LD within these SBL populations and in the overall bulk were low but statistically significant. In the bulk of 337 individuals, some 13.3 % of the locus pairs were in significant LD and the average r 2 was 0.012. Moreover, populations differed in their levels of LD, with population CUM being the lowest for all three measures. Population N2 had the highest values for the r 2 and r d statistics and was above average for its percentage of significant pairwise associations. The r 2 and r d measures were highly significantly correlated over populations (Spearman ρ = 0.79, P < 0.004), whereas the correlations between r 2 and percentage of locus pairs in LD, and between r d and percentage of locus pairs in LD, were not significant (ρ = 0.56, P < 0.08 and ρ = 0.28, P < 0.40, respectively). Within the four genetic groups, the percentage of locus pairs in LD was higher, but the average level of LD (r 2) was less intense. To check the consistency of these estimates of LD, we repeated the analyses modifying the original dataset by (a) omitting the rare alleles (frequency <0.10) (e.g. Caldwell et al. 2006; Rossi et al. 2009), and (b) taking out very similar individuals (we set ≤0.15 % of identical alleles) to obtain a “normalised” sample as suggested in Breseghello and Sorrells (2006). The analyses confirmed the low levels of LD within populations and overall. Moreover, significant correlations were observed among the same parameters of LD estimated from different subsets of the data (Table S2).

Decay of linkage disequilibrium with increasing linkage distance

The relationship between LD and linkage distance between markers was analysed in the bulk sample of 337 individuals for the five classes of inter-marker genetic distance (Fig. 4). Despite the limited number of mapped markers, LD tended to decrease with distance. The correlation coefficient between the inverse of intra-chromosomal linkage distance and r 2 was 0.243 (P < 0.01). The level of LD decayed within 3 cM distance from an average r 2 = 0.10 to below the critical background value (Breseghello and Sorrells 2006). This tendency was replicated within three of the four genetic groups identified by STRUCTURE (Fig. 5), and confirmed within the populations. Among the eleven populations, all seven instances in which the class means exceeded the sampling threshold were for the first distance class (see Fig. S3 of supplementary materials).

Fig. 4

LD decay as calculated across all of the 337 individuals. The horizontal straight line indicates the threshold above which r 2 values are likely due to genetic linkage. Inter Chr inter-chromosomal pairs. All bars are means ± SE (standard error). The number of pairs of loci is shown for each distance class

Fig. 5

LD decay within the four genetic groups identified by STRUCTURE analysis. All bars are means ± SE (standard error). The number of pairs of loci is shown above each bar

The distribution of pairwise r 2 estimates among unlinked loci across SBL populations varied from 0.00 to 0.30, with a median of 0.01 (results not shown). Breseghello and Sorrells (2006) used the 95th percentile of this distribution (0.05 in our case) as an empirical estimate of the mean background LD. This represents a threshold beyond which the r 2 values between unmapped markers are likely to indicate genetic linkage in mapping populations such as segregating F2s. Across the SBL populations, LD scores above this value were observed only for the 3 cM distance class (Fig. 4). Overall, for the 337 individuals, intra-chromosomal LD was 1.14 times higher than interchromosomal LD both as percentage of pairs and as r 2 (Table S3).

Structure of multilocus LD

Table 3 presents the results of the analysis of multilocus LD in structured populations following the method of Brown and Feldman (1981). The multilocus allozyme data on barley composite crosses and wild barley populations (Hordeum vulgare ssp. spontaneum) are included from that study for comparison. Overall, the SBL resembled the composite crosses more than the wild populations. The single-locus effects accounted for only about half the total variance in heterozygosity in both sets of cultivated barley populations. In sharp contrast, the single-locus proportion was much higher for the wild populations (93 %). In SBL and composite crosses, the two-locus effects show an appreciable mean disequilibrium and an average positive AI, which suggest repetitive patterns of linkage disequilibria in different populations. The SBL differed from COM mainly in their WC proportions (30 vs. 9 %).

Table 3 Components of variance in the number of heterozygous comparisons between haploid genotypes for the eleven populations of a Sardinian landrace, composite crosses (COM) and wild populations (WILD)

The SBL differed from WILD not only in the single-locus effects but also in the two-locus effects, having an appreciable MD and a higher WC (Table 3).

The fraction [(AV-MH)/MH] is a standardised measure of multilocus structure within populations, and its value for the SBL is more than threefold higher than composite cross and wild populations (Table 3). The variance of disequilibrium is the most important two-locus source in both SBL and wild populations (50 and 32 % of the average variance, respectively). This indicates that non-systematic disequilibria across populations (i.e. the frequency of multilocus haploid genotypes varies substantially among different populations) were a major contributor to multilocus structure of SBL and wild populations.

Discussion

The present study investigated the genetic diversity, the population structure and the LD levels of 11 populations of a barley landrace from the island of Sardinia (Italy) as evident from polymorphism for 134 S-SAP fragments.

Genetic diversity and population structure

The high number of unique haplotypes detected in the populations studied indicates substantial genetic variation within the SBL, comparable to a previous study of Sardinian barley (Papa et al. 1998). Overall, the level of the S-SAP diversity in these populations is appreciable (H S = 0.24), and similar to that detected by isozymes (H S = 0.35, Papa et al. 1998). The extent of genetic divergence among populations was of a similar order for S-SAP (F ST = 0.18) as for allozymes (G ST = 0.16) (Papa et al. 1998). Considerable genetic marker diversity is present within traditional varieties of barley (Jaradat and Shahid 2006) and in other selfing cereals such as rice (Thomson et al. 2007; Pusadee et al. 2009), as might be expected from the many factors affecting diversity in landraces (Teshome et al. 2001). No relevant differences were observed between the two groups of genotypes collected in 1990 and 1999, suggesting that one decade was insufficient for any significant genetic temporal differentiation among these populations to arise.

A genetic distance-based analysis clustered the populations into groups according to their geographic origin, and a weak correlation was found between genetic distance and geographic distance (r = 0.27, P = 0.05). A STRUCTURE analysis of the bulk of 337 individuals allocated a majority of them to four genetic groups. The hierarchical island structure of genetic variation had four main genetic groups which tended to have distinct geographic occurrence, in agreement with the distance clusters. In our study the structural tendencies were not absolute, as different genetic groups co-occur within each population and more than 30 % of the individuals were apparently admixed, i.e. derived from a hybrid between different genetic groups. Thus, this suite of barley landrace populations exhibited a complex population structure that will influence LD patterns and their exploitation in barley breeding.

Linkage disequilibrium extent and decay with linkage distance between markers

In the bulk sample of 337 individuals the level of LD was relatively low, with some 13 % of locus pairs showing statistical correlations (P < 0.01), or 22 % at P < 0.05 significance level. This value slightly exceeded that between loci in a sample of 25 accessions of the wild subspecies (Hordeum vulgare ssp. spontaneum) from across its range (c. 15 %, P < 0.05) (Morrell et al. 2005). The similar proportion of significant r 2 values probably reflects the net outcome of two opposing factors. On the one hand, the widespread geographic origin of the samples, autogamy and the isolation of wild populations might predict considerable LD within the species. On the other hand, our geographically restricted sample of individuals was 13-fold larger than that for the wild species and therefore likely to detect a greater fraction of low LD values. Yet the “relatively” low overall levels of disequilibrium in both of these studies were unexpected, given the autogamous mating system of barley.

Table 4 summarises six studies of interlocus LD in barley. In a study with a comparable number of mapped markers, Malysheva-Otto et al. (2006) found an average r 2 of 0.10 (range 0.062–0.191) at intra-chromosomal level and of 0.064 (range 0.050–0.136) at inter-chromosomal level (see Table 4 of Malysheva-Otto et al. 2006). Our r 2 values tended to be less than theirs, but the proportions of locus pairs with r 2 > 0.05 were similar. Our data also parallel those from other studies employing a higher density of markers (e.g. Zhang et al. 2009; Comadran et al. 2009). Moreover, in the present study r 2 and P values were similar for the 53 mapped S-SAP pairs of loci and the total 134 markers, which suggests that increased genome coverage would not significantly alter the overall conclusion of limited disequilibrium (data not shown).

Table 4 Estimates of interlocus LD using different criteria in various kinds of samples, of domesticated (Hordeum vulgare ssp. vulgare) and its wild progenitor (Hordeum vulgare ssp. spontaneum), compared to the results for the same criteria in eleven SBL populations using 134 S-SAP markers

The studies in Table 4 used different criteria to gauge the extent of LD and its rate of decay with increasing map distance between markers. Despite the low number of mapped and linked markers in our study, LD decreased below the estimated background threshold (0.05) for marker pairs separated by >3 cM. This outcome is similar to those obtained by Zhang et al. (2009) and Comadran et al. (2009). The reason why the decay is more rapid in barley than might be expected in a selfing species (Zhang et al. 2009; Comadran et al. 2009; Morrell et al. 2005; Caldwell et al. 2006) is unclear. Rostoks et al. (2006) propose this is a consequence of “unique human-induced pseudo-outbreeding” coupled with “strong selection for advantageous alleles” in agriculture. They consider that a collection of lines from diverse breeding programs might approach the recombinational dynamism of an outbreeding species like maize. However, wild and landrace populations, which presumably have not been subject to such intense crossing and selection, also show low LD and rapid decay.

Within the groups inferred by STRUCTURE, the number of locus pairs in significant LD was only approximately 2 % (versus 13 % in the total sample). Fewer individuals than the total (~60 vs. 337) constitute each group, so the observed drop in significant LD is likely due in part to a reduced power of the LD test (Remington et al. 2001; Liu et al. 2003). Moreover, the reduction in the number of marker pairs in significant LD was more evident for the inter-chromosomal comparison, than for the intra-chromosomal comparison (see Table S3 of supplementary materials). Thus, high LD appears to be more associated with close physical linkage within STRUCTURE groups than in the overall sample. Fewer spurious associations are expected from the analysis of the four genetic groups, as was also the case in durum wheat (Maccaferri et al. 2005).

Our results agree with others on the importance of inferring genetic structure and detecting genetic differences among populations for intra-specific biodiversity assessment, evolutionary studies and association mapping (Caldwell et al. 2006; Rostoks et al. 2006; Mazzucato et al. 2008; Rossi et al. 2009; Comadran et al. 2011).

Multilocus LD dissection

Morrell et al.’s (2005) study of disequilibrium in wild barley accessions particularly stressed that geographic divergence was a major source of interlocus disequilibrium. The method of Brown and Feldman (1981) is one approach to analysing the relative importance of various sources of disequilibrium and the biology of multilocus association. Furthermore, the partitioning lends itself to comparisons among different kinds of populations, for example the pattern observed in SBL populations compared with that in composite crosses of cultivated barley and in wild populations of H. vulgare ssp. spontaneum (Brown and Feldman 1981).
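As a rough illustration of what a multilocus association means, the sketch below compares the observed variance of K, the number of loci at which two gametes differ, with the variance expected if the loci were independent, Σ h_j(1 − h_j), where h_j is the probability that two gametes differ at locus j. This is only an overall summary of multilocus association under our own simplifying assumptions (0/1 haplotypes, all distinct pairs of lines compared). It does not reproduce the partition into components discussed in this section, and the function name is hypothetical.

```python
from itertools import combinations


def mismatch_variance(haplotypes):
    """Observed and expected variance of K over all pairs of haplotypes.

    haplotypes: list of equal-length tuples of 0/1 marker scores.
    Returns (observed_variance, expected_variance_under_independence).
    """
    pairs = list(combinations(haplotypes, 2))
    n_loci = len(haplotypes[0])

    # K for each pair: number of loci at which the two haplotypes differ.
    ks = [sum(x != y for x, y in zip(g1, g2)) for g1, g2 in pairs]
    mean_k = sum(ks) / len(ks)
    observed = sum((k - mean_k) ** 2 for k in ks) / len(ks)

    # h_j: fraction of pairs that differ at locus j.
    expected = 0.0
    for j in range(n_loci):
        h_j = sum(g1[j] != g2[j] for g1, g2 in pairs) / len(pairs)
        expected += h_j * (1 - h_j)
    return observed, expected


# Hypothetical sample in which two loci are completely associated.
obs, exp = mismatch_variance([(0, 0), (0, 0), (1, 1), (1, 1)])
print(obs > exp)
```

An observed variance well above the expectation indicates that alleles at different loci tend to co-occur, which is the kind of signal that a partition into single-locus, two-locus and higher-order terms then attributes to different sources.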

In our analysis, the landraces tended to resemble the composite cross populations more than the wild ones, mainly because of the more prominent role of two-locus effects. At the two-locus level, each set of populations displayed a distinct LD pattern, with landraces tending towards a “hybrid” pattern between wild and composite cross populations. Bulked hybrid populations such as composite crosses have been proposed as a model system for studying the evolution of genetic diversity and co-evolution with biotic factors, and for the in situ conservation of agro-biodiversity (Brown 2000). However, our results suggest that neither wild populations nor composite crosses are complete mimics of landraces. Specifically, the multilocus structure of composite crosses seemed to be partly associated with systematic, repeatable selection (high MD and significant positive AI) (Brown and Feldman 1981). In contrast, wild populations showed a pattern consistent with the hypothesis that founder effects (or epistatic localised diversifying selection) might dominate (high VD and positive CI). The SBL populations combined both features. On the one hand, they showed a pattern of LD that indicates systematic, repeatable selection, as in composite crosses. On the other hand, as in wild populations, genetic drift or founder effects and diversifying selection played a prominent role.

When allele frequencies differ among populations, population subdivision might generate a reduction of heterozygosity (WH) and/or non-random associations between alleles at multiple loci (WC). For both SBL and composite crosses, WC was greater than WH (seven- and twofold, respectively), whereas the contrary was true for wild populations. This suggests that wild populations may have had more time to “accumulate” recombination events that reduced the association between alleles at two or more loci. Under this scenario, an appreciable amount of LD persists in SBL and composite crosses even without selection. Similar results are expected for predominantly inbreeding species, such as barley, because associations among haplotypes that are generated by mutation and random genetic drift can persist in inbred and partially isolated subpopulations (Allard 1999; Morrell et al. 2005).
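The two consequences of subdivision mentioned above can be illustrated with a simple two-locus example. The sketch assumes subpopulations that are each in linkage equilibrium internally, so that any disequilibrium in the pooled sample arises only from differences in allele frequency among them. It shows the underlying Wahlund effects rather than the WH and WC components as estimated in our analysis, and the function name and allele frequencies are hypothetical.

```python
def wahlund_effects(subpops):
    """Pooled two-locus D and gene-diversity deficit caused by subdivision.

    subpops: list of (weight, p_a, p_b), where p_a and p_b are the allele
    frequencies at loci A and B in a subpopulation assumed to be in
    linkage equilibrium (haplotype AB frequency = p_a * p_b).
    Returns (pooled_D, deficit), where deficit is the expected
    heterozygosity of the pooled sample minus the mean within
    subpopulations, at locus A.
    """
    total = sum(w for w, _, _ in subpops)
    mean_pa = sum(w * pa for w, pa, _ in subpops) / total
    mean_pb = sum(w * pb for w, _, pb in subpops) / total
    mean_pab = sum(w * pa * pb for w, pa, pb in subpops) / total

    # Equals the covariance of p_a and p_b among subpopulations.
    pooled_d = mean_pab - mean_pa * mean_pb

    h_total = 2 * mean_pa * (1 - mean_pa)
    h_within = sum(w * 2 * pa * (1 - pa) for w, pa, _ in subpops) / total
    # Equals twice the variance of p_a among subpopulations.
    return pooled_d, h_total - h_within


# Two equally weighted, strongly differentiated subpopulations (hypothetical).
d, deficit = wahlund_effects([(1, 0.9, 0.8), (1, 0.1, 0.2)])
print(round(d, 2), round(deficit, 2))
```

In this example the pooled sample shows D = 0.12 and a deficit of 0.32 in expected heterozygosity, although neither subpopulation contains any disequilibrium. When allele frequencies do not differ among subpopulations, both quantities are zero.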

Conclusions

The relatively low level of LD found in the SBL populations is a desirable property for association mapping studies. However, the high variance of disequilibrium among populations means that barley lines should not be pooled indiscriminately, and it indicates the need to control for population structure when conducting association studies. The multilocus analyses revealed features of SBL midway between wild and composite cross populations. This result raises the question of whether the observed pattern of LD is the signature of “evolutionarily sustainable production” that has led to the formation of landraces. Further comparative studies are needed to test this hypothesis.