1 Introduction

Genetic analysis in barley using molecular markers has been conducted extensively over the past 20 years. Based initially on the framework provided by the development of genome-wide linkage maps (Graner et al. 1990; Kleinhofs et al. 1993), important major genes and quantitative trait loci (QTL) have been located using a range of F2, RIL and doubled haploid mapping populations. These studies have yielded genetic markers that have been used extensively for the indirect selection of traits that are difficult to assess in a breeding programme context [e.g. resistance to the soilborne pathogen barley yellow mosaic virus (BaYMV) (Graner et al. 1998) and epiheterodendrin content in barley for the whisky industry (Thomas 2003)] and that, if translated into financial value, have generated millions of € by increasing yield under adverse conditions or improving product quality. These same studies have led to the identification of causal genes and corresponding alleles that confer a variety of traits, generally through the well-established route of positional cloning [e.g. mlo (Büschges et al. 1997), Mla (Wei et al. 1999), Rym4/Rym5 (Stein et al. 2005), Vrn3 (Yan et al. 2006), Ppd-H1 (Turner et al. 2005)].

The use of experimental mapping populations derived from parents that contrast for a target trait has however been of limited use to the more applied research sector because the parents used are frequently irrelevant to current breeding germplasm and the traits identified are already frequently fixed in the elite breeding gene pool. Consequently, a move to assess traits that still segregate in this much more closely related germplasm has been promoted. Genome-wide association scans (GWAS) provide a mechanism to assess variation that segregates in a gene pool, rather than in a biparental population. Originally developed in human genetics to take account of the types of populations available for genetic analysis, GWAS has become popular in plant genetic research over the last decade (Waugh et al. 2009). GWAS is attractive for multiple reasons, the first of which is that it potentially provides an opportunity to exploit existing and extensive phenotypic data collected during the plant registration process, thus making it directly relevant to current breeding material. Second, it holds the promise of increasing genetic resolution because GWAS populations typically contain more genetic breakpoints and more alleles than are found in conventional mapping populations. However, GWAS approaches also raise issues in genetic analysis. These are largely caused by the origins and history of the population, which introduce a tendency to reveal significant false-positive associations due to factors other than genetic linkage. Here, we will attempt to summarise some of the progress and the problems that have been encountered in establishing effective GWAS in barley and the approaches that have been developed or applied to take account of them. Whilst several studies from various groups have shown that GWAS in barley can be an effective tool for QTL analysis, within our group we have focused on the potential of the approach for identifying the actual genes underlying specific plant phenotypes.

2 Linkage Disequilibrium in Different Barley Gene Pools

Determining the extent of linkage disequilibrium (LD) in a target gene pool allows us to estimate the number of molecular markers required to conduct a saturated GWAS and the mapping resolution it is likely to achieve. Studies in outbreeding (e.g. maize; Remington et al. 2001) and inbreeding (e.g. Arabidopsis; Nordborg et al. 2002) species revealed that the extent of LD differs markedly according to breeding habit and, as predicted theoretically, tends to be considerably less extensive in outbreeders. For inbreeders, the derived homozygosity reduces the effective recombination rate at each round of meiosis, and LD is more extensive. However, LD is also highly population dependent, as reported for several species including barley (Caldwell et al. 2006). This has led subsequently to significantly revised estimates of LD (Nordborg et al. 2005; Kim et al. 2007; Yan et al. 2009). Thus, in barley, whilst Kraakman et al. (2004), who assayed a collection of 146 modern two-row spring barley cultivars using 236 AFLPs, observed significant LD between markers extending up to 10 cM, Morrell et al. (2005), examining intra- and inter-gene LD in 18 nuclear genes in a collection of 25 wild barley accessions sampled from across its natural geographic range, concluded that intra-locus LD decayed at a rate similar to that observed in outbreeding maize.
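As an illustration of how pairwise LD is commonly quantified, the following minimal sketch computes r² between two biallelic markers scored on homozygous (inbred) lines. The function name and the 0/1 allele coding are assumptions made for this example and are not taken from the studies cited above.

```python
from typing import Sequence


def ld_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic markers.

    Each vector holds one 0/1 allele call per homozygous line
    (hypothetical coding chosen for this sketch).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("marker vectors must be non-empty and of equal length")
    p_a = sum(a) / n                                  # frequency of allele 1 at marker A
    p_b = sum(b) / n                                  # frequency of allele 1 at marker B
    p_ab = sum(x * y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom
```

Plotting such values for all marker pairs against their genetic or physical distance gives the kind of LD decay curve on which the estimates discussed here are based.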

Caldwell et al. (2006) illustrated this population dependency issue very clearly. By resequencing genes present on a small BAC contig across cultivars, landraces and wild barley isolates, they observed a sharp decline in the extent of LD with increasing wildness, consistent with the evolutionary time between the individuals within each sampled set. Although this study was based on a small region of the barley genome, its general conclusions have been confirmed several times since in both diverse and narrow barley collections, and more importantly at a genome-wide scale (Malysheva-Otto et al. 2006; Cuesta-Marcos et al. 2010; Zhang et al. 2009). Subsequent studies have also shown that LD based on physical distance measurements varies enormously according to genomic position (Hamblin et al. 2010; Comadran et al. 2011a). Thus, within the same elite-cultivated gene pool, LD may extend from hundreds of kilobases in recombinogenic portions of the genome to hundreds of megabases in the rarely recombining (but gene rich) centromere-proximal regions.

3 Genetic Markers

A knowledge and understanding of how LD is elaborated in different gene pools allows us to estimate the number of genetic markers required to best capture the diversity and recombination history of the population (Fig. 18.1). In the cultivated gene pool where LD is extensive, a relatively small number of markers is theoretically required to capture the majority of the recombination events present in the population. Based on practical observations, this led Rostoks et al. (2006) to suggest that roughly 5 × 10² to 5 × 10³ markers may be required to adequately survey the elite NW European barley gene pool. At the other end of the spectrum, the number required to capture the resolution afforded by thousands of years of effective recombination in wild species is likely to exceed this by orders of magnitude. In many respects the density of markers required has constrained the adoption of GWAS, and in many species there is still insufficient understanding of LD and a shortage of genetic markers and marker technologies that can be adequately applied for this purpose.
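The relationship between LD extent and marker number can be sketched as a back-of-envelope calculation. The function below is purely illustrative: it assumes evenly spaced markers and a single genome-wide LD extent, whereas real marker distributions and LD patterns are far less uniform, so the result is best read as a rough lower bound. The map length used in the usage note is a hypothetical round figure, not a value from the studies cited.

```python
import math


def min_markers(map_length_cm: float, ld_extent_cm: float) -> int:
    """Rough lower bound on the markers needed to tag a genetic map.

    Assumes one marker per interval of length `ld_extent_cm`
    (an idealised, evenly spaced design).
    """
    if map_length_cm <= 0 or ld_extent_cm <= 0:
        raise ValueError("map length and LD extent must be positive")
    return math.ceil(map_length_cm / ld_extent_cm)
```

For a hypothetical 1,000 cM map, `min_markers(1000, 10)` gives 100 markers when LD extends over 10 cM, and the requirement grows in inverse proportion as LD shortens.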

Fig. 18.1

A cartoon of linkage disequilibrium (LD) in three different barley gene pools (black arc). The three graphs from left to right symbolise the situation observed in the cultivated, landrace and wild gene pools of barley. They show the impact of changes in the extent of LD due to recombination on the number of genetic markers (vertical red and purple lines on a portion of a hypothetical chromosome) required to detect significant associations between a marker and a target gene. Thus, with more extensive LD in the cultivated gene pool, fewer markers are required to detect the target gene by GWAS when compared to the wild gene pool.

Genetic marker technologies have been evolving continuously for the last 25 years or more in barley, as in most major crops. Despite the early attempts by Kraakman et al. (2004, 2006) and Kraakman (2005) to apply AFLP technology to association mapping in barley, it only became realistic to attempt GWAS studies in large populations of related germplasm with the availability of high-throughput (HTP) single-nucleotide polymorphism (SNP) marker technologies such as Illumina’s ‘GoldenGate’ oligo pool assays (OPAs) (Fan et al. 2003; Rostoks et al. 2005, 2006; Close et al. 2009). These technologies effectively eradicated unintentional error within genotypes introduced during serial marker assays and allowed the collection of massive marker datasets that were virtually inconceivable only a few years earlier. These markers also revealed much about legacy biparental mapping populations, highlighting genotypic errors and unintentional mix-ups, sometimes at frequencies of 10 % or higher, and by eradicating single-marker double recombinants, promoted map shrinkage to lengths broadly consistent with observed numbers of chiasmata during meiosis (Nilsson et al. 1993). HTP SNP marker sets were similarly informative in germplasm collections, revealing sample incongruence, heterogeneity and duplication at previously unprecedented resolution. Recently, SNP platforms containing many thousands of markers have been developed, such as Illumina’s Barley-OPA1 (BOPA1), Barley-OPA2 (BOPA2) (Close et al. 2009) and iSELECT platforms (Comadran et al. 2012), and used widely to genotype thousands of samples in both the public and private sectors (e.g. AGOUEB, http://www.agoueb.org; BarleyCAP, http://barleycap.cfans.umn.edu; ExBarDiv: http://pgrc.ipk-gatersleben.de/barleynet/projects_exbardiv.php) (Waugh et al. 2010).

Despite their success, these SNP marker platforms are already coming under threat from methods that exploit the massive increase in data volumes and reduction in costs associated with next-generation sequencing technologies (NGS). Methods including the use of reduced-representation libraries (RRLs), complexity reduction of polymorphic sequences (CRoPS™), restriction-site-associated DNA sequencing (RAD-seq) and low-coverage genotyping by sequencing (GbS) provide ultra-high density genotyping at extremely low cost per datapoint (reviewed in Davey et al. 2011). These sequence-based methods have no prior development requirements and can be used in species lacking reference genome sequences. In barley, RAD-seq on the Oregon Wolfe Barley population generated 463 new RAD loci on all seven linkage groups (Chutimanitsakun et al. 2011), and GbS on the same population yielded over 25,000 additional markers at exceedingly low cost (Elshire et al. 2011). However, at this point in time, the commercial propositions such as the iSELECT platform remain more accessible to the general user as the vendor provides an ‘out-of-the-box’ informatics solution to capturing, analysing, recording and exporting defined genotypic data into a wide range of analytical software. At the time of writing, the sequence-based methods still require specialised bioinformatics support to collect and interrogate the genotypic data, a big disadvantage for many smaller groups. Nevertheless, sequence-based genotyping is a logical development and a significant step forward. Not surprisingly, GbS has already been implemented in barley association mapping studies.

4 Marker Ascertainment Issues

Whilst the ‘marker constrained’ highly multiplex assays such as the OPA and iSELECT technologies from Illumina are tremendously effective and simple to use, they are not ideally suited to all applications. Because their development generally involves mining sequence data extracted from a limited number of individuals, the utility of the SNPs obtained is affected by this discovery protocol. Basically, SNPs are identified in a small panel of individuals selected from a much larger population. As they represent only a small subset of the individuals, only a fraction of total polymorphisms will be discovered. When these SNPs are then scored on a larger sample of individuals, an ‘ascertainment bias’ is introduced (Nielsen 2000). Because the SNP discovery panel is small, the probability that an SNP will be identified is a function of its frequency in the discovery population. Rare SNPs will go undiscovered more often than common SNPs, and SNPs not present in the discovery population will never be incorporated in the assay platform. When the platform is then used to screen a much broader set of germplasm, this ascertainment bias will compromise measures of relatedness and genetic diversity because statistical measures that rely on allele frequency, such as nucleotide diversity, population genetics parameters and linkage disequilibrium, will be affected (Nielsen 2000; Schlotterer and Harr 2002; Rosenblum and Novembre 2007; Storz and Kelly 2008).
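The dependence of SNP discovery on allele frequency can be made concrete with a small calculation. The sketch below assumes that the discovery panel is a random sample of fully homozygous lines, so that an SNP is detected only if both alleles occur in the panel; the function name and this sampling model are simplifying assumptions for illustration rather than a description of how any particular platform was designed.

```python
def discovery_probability(maf: float, panel_size: int) -> float:
    """Probability that an SNP is seen as polymorphic in a discovery panel.

    `maf` is the minor allele frequency in the wider population and
    `panel_size` the number of homozygous lines sampled at random.
    The SNP is missed if every line carries the same allele.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie between 0 and 1")
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")
    return 1.0 - maf ** panel_size - (1.0 - maf) ** panel_size
```

Under these assumptions, with a panel of eight lines an SNP whose minor allele has a frequency of 0.05 is discovered only about one time in three, whereas one at a frequency of 0.4 is discovered about 98 % of the time, which is the frequency skew that underlies ascertainment bias.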

BOPA1, BOPA2 and the 9K iSELECT platforms were developed from SNP data extracted from a limited number of barley accessions (Rostoks et al. 2005, 2006; Close et al. 2009; Comadran et al. 2012), and several large-scale projects have used them effectively to identify marker-trait associations in elite cultivars (AGOUEB, http://www.agoueb.org; Barley CAP, http://barleycap.cfans.umn.edu; ExBarDiv: http://pgrc.ipk-gatersleben.de/barleynet/projects_exbardiv.php) (Waugh et al. 2010) and in diversity panels comprising both elite cultivars and landraces (Pasam et al. 2012). Despite these apparent successes, we should be mindful that the extent and patterns of diversity observed have been affected by ascertainment issues and that results generated in these studies in most cases still need to be validated. This is particularly true when examining diverse genotypes. For example, understanding genetic diversity inherent within accessions that tolerate extreme conditions of temperature and water availability is likely to be particularly important in future breeding efforts that seek to respond to future environmental challenges. It is therefore important that issues such as ascertainment bias are fully taken into account when using a marker platform derived from one gene pool to investigate another.

One example that highlights this issue and that has been examined in some detail is the use of SNPs sampled from the cultivated gene pool to examine diversity in collections of landrace barleys from Syria and Jordan (Fig. 18.2). Moragues et al. (2010) evaluated the effects of SNP number and selection strategy on estimates of germplasm diversity and population structure in different barley collections. Using the 1,536 BOPA1 SNP data and random or optimised subsets of 384 and 96 SNPs, they compared diversity statistics for 161 landraces from Jordan and Syria with 171 European cultivars that had previously been studied using SSRs (Russell et al. 2003). They observed differences in the patterns of SNP polymorphisms and, somewhat counter-intuitively, a lower estimate of diversity in the landraces, contradicting the SSR results. This bias could be at least partially nullified by selecting an appropriate subset of SNPs.

Fig. 18.2

Principal coordinates analysis illustrating the effect of ascertainment bias on estimations of genetic diversity in diverse barley gene pools. The SNPs were ascertained from the cultivated gene pool and were chosen based on high allele frequencies. The wild and landrace barley germplasm ‘looks’ as if it is much narrower than the cultivated germplasm, which we know from many other studies is completely the wrong way around.

More recently Russell et al. (2011) described the first application of BOPA1 to assess the evolution of barley in a portion of the Fertile Crescent. Specifically, they were interested in examining diversity across the genome but in particular those regions that have been previously identified as playing a role in domestication. They genotyped geographically matched landrace and wild barleys (448 accessions) from Jordan and Syria. One consequence of ascertainment bias would be to skew the landrace-wild comparison by excluding rarely polymorphic markers in the wild barleys, resulting in an underestimate of their true genetic diversity. However, the experimental data showed higher levels of genetic variation in wild material, and furthermore, the differences were similar to those found in previous work (Russell et al. 2004). Also, if the effect of bias introduced by using SNPs sampled from elite cultivars was problematic, the expectation would be a reduction of diversity in the wild compared to landraces around the domestication genes (because SNPs in the wild would not have been assayed). But they identified 141 cases where rolling diversity estimates were significantly different between wild and landrace barley genotypes, with diversity higher in wild material for 94 % of the cases, many in regions where domestication genes are known. As ascertainment bias would have pushed this comparison in the other direction, their observations become increasingly significant.

5 Accounting for Population Structure

When mapping by association, underlying population structure can be a strong confounding factor that results in a high frequency of false-positive associations (Rostoks et al. 2006; Mackay and Powell 2007). Consider a hypothetical trait: if it were frequently associated with a particular sub-population, then all background markers whose alleles show a similar distribution between sub-populations would also be associated with the trait, regardless of whether they were physically linked to it. Minimising these false-positive effects has been the focus of considerable effort in the statistical genetics community, and a number of approaches have been developed in an attempt to nullify them whilst allowing true associations to be detected.

GWAS analysis that does not account for population substructure (a naive approach) is based on the same principles as those applied in biparental QTL mapping populations. Simply, it consists of regressing the phenotype against the alleles at each genetically mapped locus to detect QTLs and is successful because each marker allele in the genetic map has a given probability of being associated with the QTL of interest. The naive approach is not generally suitable for use in structured populations for the reasons given above. However, it is suitable for use in populations in which structure has been intentionally minimised. A popular example of this type is a multiparent advanced generation intercross (MAGIC) population (Cavanagh et al. 2008). Another possibility is to use substantially unstructured sub-populations identified by PCO or STRUCTURE analysis of the associated marker data (Waugh et al. 2010), although some would argue that even within these populations, a structure correction should always be applied.
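The naive approach can be sketched in a few lines. The example below regresses a phenotype on each marker in turn and reports the slope and the proportion of phenotypic variance explained; the function name, data layout and 0/1 allele coding are assumptions made for illustration. Converting these statistics into p-values would require an F or t distribution from a statistics library and is omitted here.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def naive_scan(
    genotypes: Sequence[Sequence[int]], phenotype: Sequence[float]
) -> List[Tuple[float, float]]:
    """Single-marker regression with no correction for structure.

    genotypes[m][i] is the 0/1 allele of line i at marker m.
    Returns one (slope, r_squared) pair per marker.
    """
    n = len(phenotype)
    y_mean = fmean(phenotype)
    ss_y = sum((y - y_mean) ** 2 for y in phenotype)
    results: List[Tuple[float, float]] = []
    for marker in genotypes:
        if len(marker) != n:
            raise ValueError("each marker needs one call per phenotyped line")
        x_mean = fmean(marker)
        ss_x = sum((x - x_mean) ** 2 for x in marker)
        if ss_x == 0 or ss_y == 0:
            # Monomorphic marker or constant trait: nothing to regress.
            results.append((0.0, 0.0))
            continue
        s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(marker, phenotype))
        results.append((s_xy / ss_x, s_xy ** 2 / (ss_x * ss_y)))
    return results
```

In a structured panel, any marker whose alleles happen to track sub-population membership would score highly in such a scan whether or not it is linked to the trait, which is why the approach is reserved for populations with minimal structure.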

The reality is that barley germplasm sampled across the world is strongly stratified into sub-populations, reflecting growth habit, ear morphology and geographical origin, and is linked to local adaptation and crop end use. As a naive approach is unsuitable in this case, several different statistical approaches that correct and/or account for the effects of population structure within such germplasm have been developed. Indeed, correcting for structure has guided most of the research on GWAS for the last few years (Pritchard et al. 2000; Mackay and Powell 2007). Issues arise when the application of different statistical approaches reveals an inconsistent number and/or identity of significant associations or removes known biological factors that are correlated at some level with population structure. This can result in uncertainty over which QTL to prioritise for further studies or to use as diagnostics in marker-assisted selection (MAS).

Structured association uses genome-wide molecular diversity data to compute statistics that define the genetic structure contained within the germplasm. The derived statistics can then be modelled within a mixed linear model (MLM) framework to account for the multiple levels of relatedness that result from historical stratification and kinship (Yu et al. 2006). Statistical software packages including Genstat (VSN International 2011), R (http://www.R-project.org/) and TASSEL (Bradbury et al. 2007, http://www.maizegenetics.net) can then provide (different) corrections for population structure. A variance-covariance matrix containing coefficients of co-ancestry (kinship matrix) can be included in the mixed model to account for genetic relatedness between genotypes. Eigenanalysis (Patterson et al. 2006) uses the scores of the most significant PCA axes from the molecular marker matrix as co-variables in the mixed model, approximating the use of a kinship matrix. In barley, Cockram et al. (2010) and Comadran et al. (2011b) found that a mixed linear regression model that accounts for relatedness due to kinship and historical population substructure performed well. A significance threshold is usually estimated for each analysis using a Bonferroni-corrected p-value of 0.05. Importantly, with the observed increase in marker data volumes, methods that are able to cope with thousands to millions of computationally intensive analyses have emerged that provide a choice of both approximate [e.g. GRAMMAR (Aulchenko et al. 2007), implemented in GenABEL (http://www.genabel.org/packages/GenABEL); P3D (Zhang et al. 2010), implemented in TASSEL (http://www.maizegenetics.net/tassel); EMMAX (Kang et al. 2010) (http://genetics.cs.ucla.edu/emmax/)] and exact methods [e.g. FMM (W. Astle & D. Balding, http://www.genabel.org/MixABEL/FastMixedModel.html); FaST-LMM (Lippert et al. 2011) (http://mscompbio.codeplex.com/); GEMMA (M. Stephens lab, http://stephenslab.uchicago.edu/software.html)] to account for structure effects.
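Two of the ingredients described above can be illustrated with a minimal sketch: a Bonferroni-corrected significance threshold and a simple pairwise similarity matrix. The allele-sharing matrix used here is a deliberately simple stand-in for a kinship matrix and is an assumption of this example; it is not the co-ancestry estimator implemented in Genstat, TASSEL or the other packages named above, and the function names are hypothetical.

```python
from typing import List, Sequence


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-marker p-value threshold for a family-wise error rate of `alpha`."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def allele_sharing_kinship(lines: Sequence[Sequence[int]]) -> List[List[float]]:
    """Proportion of markers at which each pair of inbred lines shares an allele.

    lines[i][m] is the 0/1 allele of line i at marker m (hypothetical coding).
    """
    if not lines or not lines[0]:
        raise ValueError("need at least one line and one marker")
    n_markers = len(lines[0])
    if any(len(line) != n_markers for line in lines):
        raise ValueError("all lines must be scored at the same markers")
    return [
        [sum(a == b for a, b in zip(li, lj)) / n_markers for lj in lines]
        for li in lines
    ]
```

For a scan of 1,536 markers, for example, `bonferroni_threshold(1536)` gives a per-marker threshold of roughly 3.3 × 10⁻⁵. A matrix of the kind returned by `allele_sharing_kinship` would enter a mixed model as the covariance structure of the random genotype effect.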

6 Data Management and Display

With the size of the datasets generated, both molecular and phenotypic, a key issue for the longer-term value of an association mapping population surrounds data management, quality control and data visualisation, particularly if the dataset forms a reference for the wider research community and has been derived from multiple datasets generated by groups from remote locations. Whilst there may be local solutions to this issue, within our programme we have developed and implemented a GERMINATE data warehouse (Lee et al. 2005; http://bioinf.scri.ac.uk/public/?page_id=159) modified to hold high-density phenotypic and genotypic diversity data, Illumina iSELECT and GbS SNP metadata together with the results of our analyses. Working closely with the breeding community has prompted the development of a number of features in GERMINATE that assist data querying, manipulation and visualisation. Interfacing with the Flapjack graphical genotyping environment (Milne et al. 2010) has been of particular significance, with the Flapjack data model (Fig. 18.3) now being widely adopted by other plant breeding and germplasm diversity projects including the ‘SeeD’ programme at CIMMYT, the Triticeae CAP (T-CAP) project in the United States (http://www.triticeaecap.org/?q=node/2), the Gates Foundation-funded GCP Integrated Breeding Platform (http://wiki.cimmyt.org/confluence/display/MBP/Home) and the Gramene Diversity project (http://www.gramene.org/db/diversity/diversity_view). Further developments in these latter projects will enable users to automatically load data and analysis results and provide enhanced tool integration with various genetic analysis platforms. Thus, efforts are underway to more intimately integrate Flapjack with data analysis software such as TASSEL, R, Genstat and genetic simulation tools like QuGene (Podlich and Cooper 1998).

Fig. 18.3

A screenshot of the Flapjack graphical genotyping environment. The marker alleles are colour coded (A, C, G, T, white = missing data) and arranged in genetic marker order along each chromosome (horizontal axis). Individual accessions are shown in the vertical axis. Tracks for visualising trait data are available but not shown. The pattern of SNP alleles along a chromosome can be easily inspected visually (see http://ics.hutton.ac.uk/flapjack/ for further details)

7 Phenotypic Analysis

One of the original attractions of association mapping was that it promised to be able to exploit rich phenotypic information that had already been collected either by prior academic studies or as part of the rigorous trialling and testing procedures that cultivars must go through during the official registration process. For example in the United Kingdom, up to 80 morphological-developmental traits are described and available for use in assessing the distinctiveness, uniformity and stability (DUS) of prospective cultivars and up to 40 (including grain yield, quality and disease resistance) are tested for value for cultivation and use (VCU) (http://www.fera.defra.gov.uk/plants/plantVarieties/nationalListing/documents/protocolCereals10.pdf). To date, however, reports of the use of such data have come mainly from work carried out in the AGOUEB population in the United Kingdom and in cultivated barley collections at IPK in Germany (Cockram et al. 2010; Comadran et al. 2011a; Wang et al. 2012; Matthies et al. 2009, 2012). This may be because it can often be difficult to extract this type of data from archives, or because the data may be difficult to use: official testing protocols and ways of recording the phenotypic data have been modified over time, and accessions may have undergone further selection between the point of testing for DUS/VCU and genotyping. However, where the data are clean, they remain a highly valuable asset that obviates the need for de novo phenotyping. Conducting the necessary quality control prior to analysis is however time consuming and may involve a considerable amount of retesting.

For certain phenotypes, like disease resistance, that are tested on relatively young leaf material using a common ‘treatment’ (e.g. a pathogen population), morphological-developmental differences between accessions can have limited impact on the collected data. However, the opposite can be true when attempting to collect equivalent data on diverse genotypes that may be confounded by significant developmental and morphological differentiation. For example, wild barley isolates and landraces from around the world have highly diverse heading dates and heights and using data such as ‘grain yield’ collected in a single environment across such a diverse population may be effectively meaningless. Because of these difficulties we have found it advantageous to ‘tune’ the accessions in our association mapping population by including only those with broadly similar developmental characteristics. Whilst this necessarily restricts the amount of variation that segregates in the population, we have found that this approach enables rather than restricts genetic dissection of the considerable genetic variation that remains in the population.

8 Association Mapping in Barley

Several individual groups and consortia have recently assembled collections of germplasm into association mapping panels and have phenotyped and genotyped them at varying depths with the objective of performing GWAS (e.g. Haseneyer et al. 2010). To date, none are artificially constructed populations such as nested association mapping (NAM; McMullen et al. 2009) or MAGIC (Cavanagh et al. 2008) that are promoted as exploiting the power of both linkage analysis and association mapping approaches and designed to avoid the population structure issues that inflate false-positive associations in natural populations. Such populations are currently under development (http://triticeaecap.org/?q=node/1). Examples of some of the populations already used for GWAS are as follows.

8.1 Wild Barley Populations

Steffenson et al. (2007) assembled a Wild Barley Diversity Collection (WBDC) comprising 318 accessions selected on the basis of eco-geographic parameters that included longitude/latitude, elevation, high/low temperature, rainfall and soil type. Most were from the Fertile Crescent, Central Asia, North Africa and the Caucasus region. Single plant selections were repeatedly selfed to near homozygosity and the resulting inbreds genotyped using 558 Diversity Array Technology (DArT®; Jaccoud et al. 2001) and 2,878 BOPA1 and BOPA2 SNPs. GWAS was conducted after correcting for structure, initially for leaf, stem and stripe rust (Steffenson et al. 2007) and latterly for spot blotch (Roy et al. 2010) resistance. Between 13 and 15 significant associations of small effect, some corresponding with the location of known resistance genes, were detected for each phenotype. Given the expected extent of LD in the WBDC (Caldwell et al. 2006; Morrell et al. 2005), these results are somewhat surprising and it will be interesting to see if any of the detected associations are subsequently validated. It is tempting to speculate that SNP ascertainment issues, combined with low levels of recombination in the genetic centromeres, may have played some role in these findings.

8.2 Landraces

A European Union-funded project under the acronym EXBARDIV (http://pgrc.ipk-gatersleben.de/barleynet/projects_exbardiv.php) was founded on the hypothesis that stratified germplasm collections may allow genetic resolution to be manipulated in GWAS by shuttling between cultivated, landrace and wild association mapping populations. The Europe-wide team assembled a collection of 360 elite European barley cultivars (overlapping with the UK AGOUEB Project summarised below), 480 landraces from Jordan and Syria known as the ICARDA Syrian-Jordanian Landrace Collection (SJLC; Ceccarelli et al. 1987) and two sets of wild barleys, including a subset of 131 individuals from the WBDC summarised above. These lines have been phenotyped for a wide range of characters at multiple sites across Europe and simultaneously genotyped with the barley 9K iSELECT SNP platform. Manuscripts describing the analysis of the data associated with several of these phenotypes are currently in the pipeline (unpublished). In addition, Casas et al. (2011) surveyed the Spanish Core Collection of barley landraces (Igartua et al. 1998) to identify candidate genes affecting flowering time variation by GWAS. There are, however, few other GWAS studies specifically of barley landraces. Some include landraces as a subset of a wider germplasm collection, e.g. Comadran et al. (2011b), and others have used a limited number of SSR markers, e.g. Jones et al. (2011).

8.3 Cultivars

Several populations have been assembled specifically to exploit the potential power of GWAS in cultivated barley material, starting with the relatively small population used in the original studies of Kraakman et al. (2004, 2006) and Kraakman (2005). We focus on two of these here. However, whilst we highlight these major efforts, other association mapping populations have been assembled and have now been exploited using the BOPA marker technology. These include MABDE (Comadran et al. 2009), EXBARDIV (see above) and GABI-Genobar (Rode et al. 2012), and results from these are now starting to emerge in the literature.

8.3.1 Barley CAP

In order to conduct association mapping (AM) studies of economically important traits in US barley breeding germplasm, a panel of 3,840 US barley breeding lines originating from 10 major breeding programmes was assembled and genotyped with 3,072 SNPs (BOPA1 and BOPA2). Population structure was examined using the programme STRUCTURE (Pritchard et al. 2000) and principal component analysis (PCA), revealing 7–9 sub-populations with some correspondence with the different breeding programmes (Hamblin et al. 2010; Zhou et al. 2012). The major population subdivisions were imposed by inflorescence morphology (two-row vs. six-row), growth habit (spring vs. winter) and end use (malt vs. feed). Average LD within sub-populations, as determined by calculating r², was found to decay across a range of 20–30 cM in Hamblin et al. (2010) and between 4.0 and 19.8 cM in Zhou and Steffenson (2012). The authors estimated that quantitative trait loci (QTL) should be detected in their population with a 50 % probability within a genetic interval of 5 cM and with 95 % probability within 25 cM. These and other studies using subsets of the Barley CAP material (e.g. Cuesta-Marcos et al. 2010; von Zitzewitz et al. 2011; Wang et al. 2012; Massman et al. 2011) and phenotypic data from breeding programmes were able to detect QTL previously detected in other studies, validating the investment in the association mapping approach. However, none so far have advanced as far as identifying the causal underlying genes. In each of these studies, the authors stress that careful consideration must be given to population diversity, size and experimental design.

8.3.2 AGOUEB

The AGOUEB (pronounced Ag-web) consortium was established as a public/private partnership in the United Kingdom to explore the diversity present in European plant breeding programmes using contemporary molecular marker technologies (BOPA1 and BOPA2). Using the same marker platform as Barley CAP, Cockram et al. (2010) genotyped a collection of c. 500 cultivars selected from UK registration trials over the past 20 years. As with Barley CAP, significant population structure was detected, generating high levels of false-positive associations between markers. Significant intrachromosomal LD was observed across the full length of chromosomes (mean distance between significant marker pairs = 40.2 cM, median = 30.7 cM, similar to that observed by Hamblin et al. (2010) in US germplasm). However, after adjustment using a mixed model to take account of population structure, this was reduced to <10 cM (mean = 1.2 cM, median = 0.6 cM), with the proportion of significant inter-chromosomal associations controlled to just 0.1 %. They examined historical phenotypic data for 32 different morphological traits, successfully identifying loci controlling 15 and attributing failure in the other 17 cases to low-quality or variably recorded phenotypic data (e.g. Fig. 18.4). Cockram et al. (2010) also modelled the power to detect 1, 2 and 10 independent loci distributed randomly across the genome, with heritabilities (h²) of 0.5 and 0.9. Using a mixed model to correct for genetic substructure, simulations based on a trait controlled by one locus predicted that their experimental design had a high probability (≥0.92 for both values of h²) of detecting significant (q value ≤0.1) associations within windows of ≤8 cM. However, for a ten-locus trait, they reported that the power to detect one or more loci after correction with the mixed model was low (0.25, h² = 0.5; 0.58, h² = 0.9). As with Barley CAP, the issues associated with using highly structured populations were therefore again highlighted in AGOUEB as a potential impediment to successful GWAS.
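The way population structure generates spurious associations can be shown with a deliberately simple sketch. It is not the mixed model used by Cockram et al. (2010). It only centres marker and trait values within each sub-population, which is a crude fixed-effect adjustment. All data, group labels and function names below are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def centre_within_groups(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """Subtract each sub-population's mean from the values of its members."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical toy data: the marker allele is common in six-row lines and
# the trait mean is higher in six-row lines, but within each row type the
# marker carries no information about the trait.
groups = ["two-row"] * 4 + ["six-row"] * 4
marker = [0, 0, 0, 1, 1, 1, 1, 0]
trait = [1.0, 2.0, 3.0, 2.0, 6.0, 7.0, 8.0, 7.0]

# Pooled analysis: the marker appears associated with the trait.
naive_r = pearson(marker, trait)

# Structure-adjusted analysis: the apparent association disappears.
adjusted_r = pearson(
    centre_within_groups(marker, groups),
    centre_within_groups(trait, groups),
)
```

In this toy case the pooled correlation is clearly positive, whereas the within-group correlation is zero. A mixed model pursues the same aim more flexibly by modelling relatedness among lines rather than discrete groups.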

Fig. 18.4 GWAS for three morphological characters (sterile spikelet attitude, auricle anthocyanin intensity and hairiness of the leaf sheath) using 1,536 SNPs on a collection of c. 500 mixed barley cultivars (adapted from Cockram et al. 2010). Resolution to single-gene level was achieved for anthocyanin pigmentation, where a deletion in HvbHLH1 was shown to be the causal polymorphism

9 GWAS to Single Gene Resolution

An advantage of GWAS over the use of biparental populations for trait dissection is that the amount of recombination that has occurred in the population should potentially afford single-gene resolution, provided that the gene target does not reside in a genomic region with a restricted recombination rate, such as the peri-centromeric heterochromatin. Whilst the success of this depends to a large extent on the population assembled, several examples now exist in the literature where this has indeed turned out to be the case. In Arabidopsis, Atwell et al. (2010) provide a number of examples where large-scale phenotyping combined with high-resolution genotyping and GWAS has identified a significant enrichment of a priori candidate genes for a wide range of traits. For example, Todesco et al. (2010) demonstrated that allelic variation at ACCELERATED CELL DEATH 6 was responsible for fitness benefits expressed as resistance to microbial infection and herbivory. However, the same locus also had a marked pleiotropic impact on vegetative growth. In the maize nested association mapping population, Tian et al. (2011) showed that variation in leaf angle and size, parameters that have allowed maize planting density to be increased due to more efficient light capture, is partially controlled by allelic variation at the LIGULELESS genes. Similar successes have been achieved in a collection of c. 500 rice landraces (Huang et al. 2010).

In barley there are currently three examples in the literature of the successful use of GWAS to single-gene resolution (Fig. 18.3). In the first, Cockram et al. (2010) clearly demonstrated that this level of resolution was achievable in a germplasm collection comprised of winter and spring, two-rowed and six-rowed elite barley cultivars. By focusing on a robust single-gene phenotype, the presence or absence of anthocyanin pigmentation, they were able to show that variation in the anthocyanin pathway regulatory gene HvbHLH1 was responsible for the observed phenotype. ‘White’ alleles contained a diagnostic deletion that resulted in a premature stop codon upstream of the basic helix-loop-helix domain. By assaying for the presence of the deletion in a collection of ‘red’ and ‘white’ alleles present in landrace germplasm originating from across Europe, they were able to infer the geographical origin of the white allele and map its subsequent spread throughout Europe.

In the second, Ramsay et al. (2011) were able to identify SIX-ROWED SPIKE 5 (INTERMEDIUM-C), a gene that affects barley row type, and to show that it is a functional orthologue of the maize domestication gene TEOSINTE BRANCHED 1. They achieved this despite the phenotype being a cause of major population subdivision in the germplasm used in the analysis. Although row type is a simple two-state morphological character, GWAS identified four highly significant associations, suggestive of strong epistatic interactions. As would have been predicted, one association peak mapped to the SIX-ROWED SPIKE 1 (Vrs1) locus (Komatsuda et al. 2007), another to SIX-ROWED SPIKE 5 and the remaining two to separate loci on chromosome 1H. One of these latter loci has subsequently been shown to correspond to the SIX-ROWED SPIKE 3 locus (our unpublished results). Importantly, Ramsay et al. (2011) were able to validate their candidate gene using a legacy collection of independent spike mutants (Druka et al. 2011) that had previously been attributed to lesions in SIX-ROWED SPIKE 5 by allelism tests.

Finally, Comadran et al. (2012) used a modified analytical approach based on divergent selection between the winter and spring barley gene pools to identify regions of the barley genome where contrasting alleles had been selected in these different lifestyle types. They eventually focussed on one such region, which had been called EARLINESS PER SE 2 in QTL studies and had been mapped as the major determinant of earliness in a study examining adaptation of barley to droughted environments. Using available mutant resources, they were able to show that the gene responsible for the observed phenotype was the barley orthologue of the Antirrhinum majus gene CENTRORADIALIS, a paralogue of the Arabidopsis flowering repressor TERMINAL FLOWER 1 (TFL1). Within our group we have now used GWAS to identify a number of additional genes and validated them using the same strategy, i.e. with independent barley mutants.
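The statistic used by Comadran et al. (2012) is not described here. As a generic illustration of scanning for divergence between two gene pools, the sketch below computes a per-marker fixation index (Fst) from allele frequencies in the winter and spring pools. The choice of Fst, the equal weighting of the two pools, the function name and the marker names are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def fst_two_pools(p_winter: float, p_spring: float) -> float:
    """Fst for one biallelic marker from allele frequencies in two pools.

    Assumption: both pools are weighted equally. Returns 0.0 when the
    marker is monomorphic across the combined pools.
    """
    for p in (p_winter, p_spring):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie between 0 and 1")
    p_bar = (p_winter + p_spring) / 2
    h_total = 2 * p_bar * (1 - p_bar)              # gene diversity, pools combined
    if h_total == 0:
        return 0.0
    h_within = (2 * p_winter * (1 - p_winter)
                + 2 * p_spring * (1 - p_spring)) / 2   # mean diversity within pools
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (winter pool, spring pool) per marker.
freqs: Dict[str, Tuple[float, float]] = {
    "marker_1": (0.50, 0.50),   # no differentiation
    "marker_2": (0.75, 0.25),   # moderate differentiation
    "marker_3": (1.00, 0.00),   # alternative alleles fixed in each pool
}

# Rank markers from most to least differentiated between the pools.
ranked: List[str] = sorted(freqs, key=lambda m: fst_two_pools(*freqs[m]), reverse=True)
```

Genomic regions where many adjacent markers rank highly are candidates for divergent selection, although differentiation of this kind can also arise from drift or shared ancestry, so follow-up validation is still required.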

Conclusions

The successes in GWAS-associated identification of gene alleles encoding barley traits described above bode well for the future of this approach, especially since the potential power of the method is continuously increasing. It is not unreasonable to predict that in the next few years, hundreds of thousands of polymorphic sites that are mapped on a reliable physical framework for the barley genome will become available for GWAS in barley. Furthermore, the arrival of GWAS populations with lower substructure, more allelic variation and higher numbers of recombination breakpoints will increase the mapping resolution. In such circumstances single-gene resolution for GWAS will become commonplace.

Future directions of GWAS in barley will to some extent be driven by the falling cost of genotyping associated with next-generation sequencing technologies (NGS). Given the potential to saturate marker coverage of the genome, the discriminatory power of GWAS in barley will be determined by the size of the population studied and the patterns of LD and population structure within the population. The use of larger, more genetically balanced populations that are specifically developed for GWAS (McMullen et al. 2009; Cavanagh et al. 2008) will undoubtedly play an increasing role, though recombination rates in this inbreeding crop will continue to be a limiting factor, particularly in certain regions of the genome. In addition to the importance of choice of population, the potential discriminatory power of GWAS will certainly concentrate more attention on experimental design and the opportunities offered by high-throughput phenotyping. Whilst it is now possible to conduct QTL × environment AM analyses using Genstat (VSN International 2011), current analytical methods are largely single-locus additive models. Future analytical developments will lead to multi-locus models with the potential to detect epistatic interactions, as is now possible in biparental QTL mapping. Finally, the discrimination of GWAS in barley down to the gene level will also necessitate the further development of validation strategies and the integration of future population studies with developments in functional genomics and systems analyses in the crop.

To avoid the majority of the potential issues with population substructuring, we have assembled a population of approaching 1,000 two-rowed spring barley varieties that exhibit low population substructure and show similar morpho-developmental characteristics (particularly flowering time). We are currently using this population extensively to investigate a range of simple and more complex traits, and our experience to date suggests that such populations do simplify underlying genetic complexity making it more amenable to statistical interpretation (Waugh et al. 2010). This population is a powerful resource for future genetic analysis in barley, and we welcome collaboration with groups who would like to exploit the power and resolution it affords.