Introduction

As genotyping becomes more accessible with faster and cheaper DNA sequencing technologies and single nucleotide polymorphism (SNP) platforms, an ever-increasing number of sequence polymorphisms are revealed in various plant species. In maize, sequence comparison of six recently sequenced inbred lines revealed more than 1,000,000 SNPs and 30,000 insertions/deletions (Indels) in the maize genome (Lai et al. 2010). Knowledge about the effect of polymorphisms on trait variation has also been increasing with results from association and nested association mapping studies. Respective quantitative trait polymorphisms (QTP), sequence polymorphisms associated with phenotypic trait variation, can be converted into Functional Markers (FMs) (Andersen and Lübberstedt 2003). Contrary to linked markers, FMs are derived from polymorphisms causing phenotypic variation.

The development of FMs, as described by Andersen and Lübberstedt (2003), requires knowledge of functionally characterized loci. Once polymorphic sites are identified within those functional loci, statistical models can be used to test for genotype-phenotype associations. Such association studies provide inferential, i.e., statistical, evidence for correlations, which does not necessarily reflect biological causality. Validation of trait-associated polymorphisms through gene introgression provides biological evidence of functionality. Importantly, functional polymorphisms can be converted into technical assays using, e.g., any of the SNP or insertion/deletion (Indel) detection technologies (Appleby et al. 2009; Gupta et al. 2008).

Examples of QTP discovery through association studies include starch biosynthesis (Wilson et al. 2004), cell wall digestibility (Andersen et al. 2008; Brenner et al. 2010; Guillet-Claude et al. 2004), flowering time (Salvi et al. 2007; Thornsberry et al. 2001), carotenoid biosynthesis (Harjes et al. 2008; Palaisa et al. 2003), inflorescence architecture (Bortiri et al. 2006), kernel properties (He et al. 2008; Shi et al. 2008), resistance to bacterial blight (Iyer-Pascuzzi and McCouch 2007), and fruit quality (Costa et al. 2008; Ogundiwin et al. 2008). Thornsberry et al. (2001) pioneered association mapping in plants by developing linkage disequilibrium (LD) mapping and employing it to identify associations between polymorphisms within the Dwarf8 (D8) gene affecting flowering time and plant height. The study was based on 92 diverse maize inbred lines from four populations: Stiff Stalk, non-Stiff Stalk, tropical, and semi-tropical. The association analysis, which corrected for population structure, identified nine polymorphisms significantly associated with flowering time. Andersen et al. (2005) investigated the applicability of these nine polymorphisms as FMs in an independent set of 71 elite European inbred lines. When population structure was ignored, six of the nine polymorphisms were significantly associated with flowering time, and none with plant height. However, when population structure was considered, only one association, between a 2-bp Indel in the promoter region and plant height, remained significant, while no association was observed for flowering time. Camus-Kulandaivelu et al. (2006) evaluated a 6-bp Indel identified by Thornsberry et al. (2001) in a larger population consisting of 375 inbred lines and 275 landraces from the United States and Europe. This QTP was confirmed to be associated with flowering time under long-day conditions, with different estimated allelic effects for inbreds and landraces.

This example illustrates that availability of qualified candidate genes can facilitate development of informative molecular markers by means of association studies. QTPs may, however, not be consistent across genetic backgrounds and environments. In this review, the challenges in development, estimation of genetic effects for, and application of FMs in plants are discussed.

Power, Precision and Accuracy in QTP Detection

The goal of genetic mapping studies is to identify genomic regions associated with observed phenotypic variation. In plants, linkage mapping started as a great promise to reveal chromosome fragments with higher-than-expected associations with phenotypic variation observed in segregating bi-parental populations. Today, thousands of linkage mapping experiments have been reported (Behn et al. 2004; Blanc et al. 2006; Byrne et al. 1998; Buerstmayr et al. 2003; Pinson et al. 2005; Tang et al. 2000). Identified QTLs and estimated QTL effects, however, have rarely been consistent across and even within populations, and only a minority have been used for cultivar improvement (Bernardo 2008). The reasons for the lack of repeatability and application of QTL identified from linkage mapping experiments have been extensively discussed (Beavis 1994; Bernardo 2008; Schön et al. 2004). Most linkage mapping experiments have resulted in inconsistent QTL with overestimated effects, mainly due to small population sizes, stringent significance levels, and interactions with different genetic backgrounds and environments (Beavis 1994; Bernardo 2008; Xu 2003a). In addition, the limited number of recombination events accumulated in populations commonly used in linkage mapping experiments (i.e., F2 and backcross) makes it difficult to narrow the associated regions to fewer than several megabases. As a consequence, the identification of causative genes usually requires development of further recombinants of at least 500 individuals for adequate power (Beavis 1994; Lee et al. 2002).

Compared to linkage mapping in families, association mapping in populations can potentially reveal the genetic basis of phenotypic variation with much greater genetic resolution and even identify QTPs. Contrary to linkage mapping, association mapping does not rely on a controlled bi-parental segregating population, but on a collection of lines not necessarily sharing a pedigree, and therefore takes advantage of historical recombination events accumulated among lines. Smaller LD blocks due to accumulated recombination events allow greater genetic resolution, and require, as a consequence, a much higher marker density as compared to linkage mapping within families. In some plant species, like maize, reduced LD combined with large genomes may require hundreds of thousands of molecular markers to adequately cover the genome (Brown et al. 2004; Ching et al. 2002; Flint-Garcia et al. 2005; Hyten et al. 2007; Yu et al. 2008). Evaluating a massive number of markers requires multiple testing corrections to control for false positives, thus decreasing the power of identifying markers associated with phenotypic variation. Reduced power is even more problematic for quantitative traits governed by multiple genes with modest or small phenotypic effects, or alleles with strong phenotypic effect but at low frequencies in the association panel.
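The power cost of multiple testing can be made concrete with a minimal sketch. It uses the Bonferroni correction purely as a simple illustration (the text does not prescribe a particular correction), and the marker counts and the function name are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical marker counts: a sparse linkage map vs. a dense association panel.
for n_markers in (200, 500_000):
    threshold = bonferroni_threshold(0.05, n_markers)
    # A larger -log10(p) means an individual marker needs stronger evidence.
    print(n_markers, threshold, round(-math.log10(threshold), 2))
```

With the same family-wise error rate, each marker in the denser panel must pass a far stricter per-test threshold, which is why small-effect or low-frequency alleles become harder to detect.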

A third approach based on the concept of combining LD with linkage mapping has been referred to as Nested Association Mapping (NAM) (Yu et al. 2008). Several NAM populations have been developed in plant species (Guo et al. 2010). Typically these consist of families of Recombinant Inbred Lines (RILs) derived from a sample of inbred lines crossed to a reference inbred line. The relationships among progeny within families are inbred full sibs, while relationships among progeny from different families are half sibs (Bernardo 2002). NAM populations consisting of doubled haploid lines or RILs are “immortalized”, meaning that homozygous lines within each family can be evaluated in numerous locations and years without confounding effects of genetic segregation (Nordborg and Weigel 2010).

Yu et al. (2008) developed and released a maize NAM population consisting of 25 families with 200 RILs for each family. Simulations from Guo et al. (2010) suggest that NAM populations similar to the one developed by Yu et al. (2008) have adequate power to accurately and precisely identify additive polymorphisms contributing at least 5% of the variation in the phenotype. Guo et al. (2010) also observed that the resolution and power to detect QTP is maintained even if non-functional alleles are in LD with the causal variant. Two recent studies in the maize NAM population identified alleles with small effects in association with southern leaf blight resistance and leaf architecture (Kump et al. 2011; Tian et al. 2011). These results demonstrate the potential of nested designs to identify QTPs.

Challenges in Functional Marker Development: LD, Epistasis, Environmental and GxE Effects

Depending upon the sample size, LD, and genetic architecture, the mapping approaches discussed above usually identify genomic intervals associated with phenotypic variation, and FM development requires the identification of the functional variants (SNPs/Indels) within these intervals. The discrimination of functional vs. non-functional variants is often complicated by LD within candidate loci, where non-functional alleles may be associated with phenotypic variation when in LD with functional ones. Varying levels of LD have previously been observed between genes of the phenylpropanoid pathway, decaying within a few hundred bp for CCoAOMT2 and COMT (Guillet-Claude et al. 2004; Zein et al. 2007) while spanning more than 3.5 kb at the PAL locus (Andersen et al. 2007). Even in populations with substantial intragenic decay of LD, adjacent polymorphic sites might still be in high or complete LD, leading to an overestimation of the number of SNPs/Indels associated with the investigated phenotype. The identification of causal genetic polymorphisms is a difficult task, and statistical evidence and the biological nature of candidate variants may have to be analyzed jointly in order to discriminate QTPs from closely associated non-causal polymorphisms. SNPs located in coding regions causing non-synonymous non-conservative amino acid changes are more likely to be functional than non-synonymous conservative and synonymous substitutions (Risch 2000). Although SNPs have received more attention in mapping studies, Indels involve larger segments of DNA and, when disrupting or causing frame shifts in coding sequences, are more likely to cause phenotypic variation (e.g., loss-of-function mutants). Such extreme phenotypes are more likely to be eliminated or fixed by (natural) selection, and as a result, Indels are usually less frequent in populations than SNPs in genic sequences (Clark et al. 2007; Jones et al. 2009).
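As an illustration of why non-causal sites in complete LD with a functional site cannot be separated statistically, the following minimal sketch computes the common r² measure of LD between two biallelic sites. It assumes homozygous inbred lines scored 0/1 at each site; the function name and the example genotype vectors are hypothetical.

```python
def ld_r_squared(site_a, site_b):
    """r^2 between two biallelic sites scored 0/1 in homozygous inbred lines."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # Frequency of lines carrying allele 1 at both sites.
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical scores for eight inbred lines.
functional_site = [0, 0, 1, 1, 0, 1, 1, 0]
adjacent_site = [0, 0, 1, 1, 0, 1, 1, 0]    # identical pattern: complete LD
unlinked_site = [0, 1, 0, 1, 0, 1, 0, 1]    # independent pattern

print(ld_r_squared(functional_site, adjacent_site))
print(ld_r_squared(functional_site, unlinked_site))
```

A site in complete LD with the functional site yields exactly the same association statistic, so only biological evidence or additional recombinants can distinguish the two.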

Polymorphisms in non-coding regulatory regions are potentially major sources of phenotypic variation when regulating gene expression, while variants in intronic regions may create or delete a splicing site (Talerico and Berget 1990). Salvi et al. (2007) identified a non-coding cis-acting regulatory element located 70 kb upstream of an Ap2-like transcription factor involved in flowering time. Clark et al. (2004) and Camus-Kulandaivelu et al. (2008) also identified cis-acting regulatory elements in regions 60 and 100 kb upstream of the Tb1 and D8 genes, respectively. Consequently, the search for QTPs should not be limited to exonic regions, but ideally should also encompass regulatory and intronic regions with potential impact on the investigated trait (Polidoros et al. 2009).

Besides reducing the resolution of association mapping, another LD-related issue is the identification or development of optimal QTP haplotypes when several polymorphisms within the target locus affect the trait of interest. This is a concern especially when favorable QTP alleles for one trait are closely linked to QTP alleles with unfavorable effect on other traits (Chen et al. 2010). If not available in the characterized population, development of optimal QTP allele combinations based on intragenic recombination events might be difficult to achieve, even by use of large populations and intragenic markers. Alternatively, exotic germplasm might provide a source for novel intragenic combinations of QTP alleles. More recently, the use of Zinc finger nucleases (ZFNs) has been proposed as a promising technology to replace alleles by homologous recombination (Shukla et al. 2009). The induction of recombination in defined genomic intervals is, therefore, a promising approach to develop optimal QTP haplotypes even within large LD blocks.

Even after true QTPs have been identified, their transferability might be affected by the composition of populations in different studies, both with regard to allele frequencies at the target locus, and structure of the respective populations. D8 is the only example in plants so far, where the same locus has been studied independently in different experimental populations of inbred lines (Andersen et al. 2005; Camus-Kulandaivelu et al. 2006; Thornsberry et al. 2001). When correcting for population structure, the QTPs identified by Thornsberry et al. (2001) were not significantly associated with flowering time in the study of Andersen et al. (2005) as haplotypes were confounded with population structure in the latter study. Other factors, apart from population structure, with potential impact on the detection of QTPs are epistasis, dominance (so far, association studies in maize were conducted at line per se level), as well as environment and genotype by environment effects.

If the effect of an allele depends on a second allele, either at the same or at a different locus, the power to detect associations and the accuracy of estimated allelic effects are reduced. Dominance effects cause deviations from additive effects of alleles belonging to the same locus, and simple additive models not accounting for dominance would lead to biased estimation of allelic effects. The relevance of dominance bias for any given trait is directly dependent on the ratio between dominance and additive variances, and it might be reasonably neglected if dominance effects are weak (Hill et al. 2008). In some crop species, the use of RILs or DH lines gives the opportunity to estimate allelic effects free from dominance deviations, permitting more accurate phenotypic predictions of the progeny. In crops evaluated as hybrids, additive effects are still likely the major source of genetic variance among hybrids, but non-estimated dominance effects will probably contribute to phenotypic variation, causing deviations from predicted additive values.

Similarly, epistasis, i.e., the non-additive interaction among alleles at different loci, can bias estimates of allelic effects (Cheverud and Routman 1995). Epistasis estimates are often limited by the number of loci included in the respective models (Carlborg and Haley 2004). If interacting alleles are not considered or are unknown, it is not possible to model epistatic effects and their consequences in association analysis and FM development. For this reason, if a candidate gene is suspected to interact with other genes, e.g., those belonging to a common genetic network, associations identified for a single gene might be inaccurate and misleading. Numerous mapping studies have detected QTL × QTL epistasis as a statistical feature causing deviation from expected additive effects (Juenger et al. 2005; Yang et al. 2010; Zhang et al. 2008), but only a few studies have investigated gene × gene interaction affecting the phenotypic variation in plant association mapping populations (Li et al. 2010; Manicacci et al. 2009; Stracke et al. 2009).

Mapping experiments often require large population sizes for adequate power to identify QTL and accurately estimate their effects. Collecting phenotypic data across multiple environments, years, and replications is costly and challenging, and accommodating large populations in multiple environments requires more efficient experimental designs involving incomplete blocks, e.g., augmented or alpha-lattice designs. Inadequate experimental designs that do not control environmental noise lead to inaccuracy in phenotypic estimation and in the subsequent identification of QTL and estimation of their effects, even if population sizes are adequate. Control of environmental variation within (number of plants/plot) and among experimental rows (replications/location) is essential for estimating environmental variance within locations. Experiments in multiple locations also account for Genotype x Environment interactions (GxE). Using marginal means across locations might lead to inaccurate associations and estimates of allelic effects if GxE is significant. In case of weak genetic correlations across environments, association analyses should be conducted on an individual location basis. Clustering environments according to their genetic correlations for all pairwise comparisons across environments (Cooper and DeLacy 1994) is an alternative that classifies environments into a smaller number of mega-environments based on their influence on GxE.

In conclusion, the genetic effects of QTPs are background, population, and environment dependent (Fig. 16.1). We propose to employ the term “potential” to describe the presence of a beneficial QTP allele, since this term reflects a certain potential of trait expression and is analogous to the risk concept in human genetic diseases, depending on the genetic effect and penetrance of the respective allele.

Fig. 16.1

Association between phenotype and genotype, and key components potentially impacting the identification of FMs and the estimation of their effects: LD, population structure, GxE, epistasis

In humans, the relative risk of an individual developing a complex disease is estimated by taking into account genetic and non-genetic (e.g., sex, age, diet, ethnicity) variables. The genetic component of risk assessment is based on the odds ratio: the odds of a disease occurring in individuals with a certain allele versus the odds of this disease occurring in individuals without this allele. When more than one gene (marker) is considered, the genetic risk of an individual corresponds to the product of odds ratios of individual alleles (Risch 1990; Wray et al. 2007). The same principle may be applicable in plants. Once lines are genotyped for a FM, breeding values for each genotypic class of this FM can be estimated across lines, environments and years, leading to a normal distribution of breeding values for each genotypic class. These distributions can be further characterized by their “displacement” (Risch 2000), which is defined as the number of standard deviations of the average effect of one homozygous genotypic class in relation to the other. Mendelian alleles with strong phenotypic effects are likely to have larger displacement, while alleles from genes affecting complex inherited traits are likely to have smaller displacements (Fig. 16.2).
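The quantities described in this paragraph can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the counts and breeding values are made up, the function names are hypothetical, and the use of a pooled within-class standard deviation as the unit of displacement is an assumption, since the text only states that displacement is measured in standard deviations.

```python
import math
from statistics import mean, variance


def odds_ratio(carriers_affected, carriers_unaffected,
               noncarriers_affected, noncarriers_unaffected):
    """Odds of the outcome in allele carriers divided by the odds in non-carriers."""
    cells = (carriers_affected, carriers_unaffected,
             noncarriers_affected, noncarriers_unaffected)
    if min(cells) <= 0:
        raise ValueError("all four counts must be positive")
    return ((carriers_affected / carriers_unaffected)
            / (noncarriers_affected / noncarriers_unaffected))


def combined_odds_ratio(ratios):
    """Multi-marker genetic risk as the product of single-allele odds ratios."""
    return math.prod(ratios)


def displacement(class_a, class_b):
    """Difference between class means in units of the pooled within-class SD.

    Assumption: the pooled SD is used as the scaling unit.
    """
    n_a, n_b = len(class_a), len(class_b)
    pooled_var = (((n_a - 1) * variance(class_a) + (n_b - 1) * variance(class_b))
                  / (n_a + n_b - 2))
    return (mean(class_a) - mean(class_b)) / math.sqrt(pooled_var)


# Hypothetical counts and breeding values, for illustration only.
print(odds_ratio(30, 70, 10, 90))
print(combined_odds_ratio([2.0, 1.5]))
print(displacement([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

Multiplying odds ratios implicitly assumes that the markers act independently on the odds scale, which the preceding discussion of epistasis suggests will not always hold.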

Fig. 16.2

Potential (P) of genotypes from a FM to pass the threshold of a fictional trait. Probabilities vary according to means and distributions. Mendelian traits (A) usually display larger displacements and larger differences in probabilities, while differences are subtle in quantitative traits (C) (Based on Risch 2000)

Even though the estimation of displacement shows the average effect of one allele in relation to the other, it does not directly measure the likelihood of a genotype to contribute to a desirable phenotype. The “potential” of an allele contributing to a phenotype of interest requires establishment of a threshold separating undesirable from desirable phenotypes (Fig. 16.3). In plant breeding, the threshold may be defined as a value above the mean phenotype of the best commercial lines (normally used as checks in breeding experimental designs). The estimation of the potential of an allele would be defined as odds of lines passing the threshold with a certain FM genotype versus the odds of lines passing the threshold without this FM genotype.
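A minimal sketch of the “potential” as defined above: the odds of lines passing a threshold with a given FM genotype, divided by the odds of lines passing it without that genotype. The trait values, the threshold, and the function names are hypothetical, and the sketch simply refuses to compute a ratio when a group has no passing or no failing lines rather than applying a continuity correction.

```python
def passing_odds(values, threshold):
    """Odds that a line exceeds the threshold: passed / failed."""
    passed = sum(1 for v in values if v > threshold)
    failed = len(values) - passed
    if passed == 0 or failed == 0:
        raise ValueError("odds undefined: need both passing and failing lines")
    return passed / failed


def allele_potential(with_genotype, without_genotype, threshold):
    """Odds ratio of passing the threshold, with vs. without the FM genotype."""
    return (passing_odds(with_genotype, threshold)
            / passing_odds(without_genotype, threshold))


# Hypothetical trait values; the threshold stands in for the mean of check lines.
lines_with_fm = [9.1, 10.4, 11.0, 8.7, 10.9]
lines_without_fm = [8.2, 9.0, 10.2, 8.8, 9.4]
check_threshold = 10.0

print(allele_potential(lines_with_fm, lines_without_fm, check_threshold))
```

A value above 1 indicates that lines carrying the FM genotype pass the threshold more often than lines lacking it; in practice the estimate would be conditional on the population and environments sampled.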

Fig. 16.3

Using the genotype-phenotype relationship to implement selection models. The level of genetic characterization of this relationship may vary from none, such as in genomic selection (GS), to highly characterized, such as in functional markers (FMs)

Systematic Collection of Genotypic and Phenotypic Information

Marker and phenotypic data accumulate as mapping experiments designed to investigate genotype-phenotype associations and/or assist breeding decisions are performed. Combining information from different mapping experiments via meta-analysis is a promising approach to enhance statistical power, reduce type 1 errors, and evaluate effects of QTL/QTP in a broader set of genetic backgrounds and environments (Heo et al. 2001). Combining data, however, is not straightforward. The definition of a phenotype and how it is measured are seldom consistent across research groups. Although standard phenotyping techniques are common practice in the private sector, reaching a consensus would require dialogue among researchers in public institutions. Additionally, a detailed description of the experiment, including information on germplasm (e.g., maturity), locations, number of replications, check lines, and statistical design, would be required for any researcher to assess whether an experiment should be considered for meta-analysis. Locations and years are not only relevant for estimation of interactions between FMs and environments; a detailed description of an environment (such as maximum/minimum daily temperatures and precipitation) might also be important for specific research goals. In drought tolerance studies, for example, temperatures and the amount and distribution of precipitation during different plant development stages are essential information for mapping drought tolerance genes, given that maize responds differently to water stress at different developmental stages (Barker et al. 2005). With this knowledge, breeders would be able to cluster environments according to relevant climate parameters, and evaluate FM potential in different lines (backgrounds) growing in environments with stress occurring at specific developmental stages.

Besides meta-analysis, comparing mapping outcomes across independent studies is a valuable approach for assessing QTL consistency, refining estimates of QTL/QTP effects, and narrowing QTL intervals. Pooling results from different mapping experiments is a popular practice in human genetics, where different research groups combine and compare outcomes from large genome-wide association studies (GWAS) for common complex diseases, such as type 2 diabetes, coronary disease and breast cancer (McPherson et al. 2007; Scott et al. 2007; Stacey et al. 2007).

In plants, most mapping experiments have consisted of single experiments designed for QTL detection, while less attention has been given to meta and post-hoc analysis. SoyBase, the USDA-ARS soybean genetics and genomics database, does not routinely archive raw experimental data from QTL experiments (David Grant, pers. comm.). Although such data could be used to improve QTL mapping, the soybean community has not traditionally done these analyses and so has not made the raw data available. As a consequence, the genetic maps in SoyBase are constructed post-hoc by placing the published QTL positions onto a reference genetic map framework using linear scaling between this framework and the reported results. The interpretation of this composite genetic map is complicated by the facts that (1) many of the reported QTL were identified only by analysis of variance (ANOVA) based on a subset of the markers, and (2) the methods and nomenclature used for phenotypic measurements in different experiments are inconsistent. In addition, the choice of QTL mapping procedures has several important ramifications. First, QTL controlled by the same underlying gene can often show different positions due to variation in marker numbers and locations across experiments. Second, the position of the underlying gene cannot be determined relative to the reported QTL. And third, it is not possible to determine the effect of a QTL, since the effect and QTL position are confounded if no composite interval mapping is used.
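Placing a published QTL onto a reference map by linear scaling can be sketched as a simple interpolation between two markers shared by both maps. The exact procedure used by SoyBase is not described here, so this is an assumption about one plausible form of such scaling; the function name, marker positions, and QTL position are hypothetical.

```python
def project_position(qtl_pos, study_left, study_right, ref_left, ref_right):
    """Linearly rescale a map position between two shared flanking markers.

    study_left/study_right: flanking marker positions (cM) on the published map.
    ref_left/ref_right: positions (cM) of the same markers on the reference map.
    """
    if study_right == study_left:
        raise ValueError("flanking markers must have distinct study positions")
    # Relative position of the QTL within the flanking interval (0 = left marker).
    fraction = (qtl_pos - study_left) / (study_right - study_left)
    return ref_left + fraction * (ref_right - ref_left)


# Hypothetical example: QTL reported at 35 cM between markers at 30 and 50 cM;
# the same markers sit at 42 and 52 cM on the reference framework.
print(project_position(35.0, 30.0, 50.0, 42.0, 52.0))
```

The projected position inherits all the imprecision of the original estimate, which is one reason the same underlying gene can appear at different positions across experiments.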

MaizeGDB, the USDA-ARS maize genetics and genomics database, contains archives for a subset of the raw data for QTL mapping experiments (Carolyn Lawrence, pers. comm.). However, because there is no community agreement on the necessity of submitting such data, the archive is incomplete and no comprehensive re-analysis of the data is possible. As is the case for SoyBase, inconsistencies in trait measurement methodologies and nomenclature, along with often imprecise QTL positions, impair the ability to compare results between studies.

The current constraints on cross-population comparisons are being addressed by both databases. MIQAS (Minimum Information for QTL and Association Studies, http://miqas.sourceforge.net/) will be adopted to ensure that all QTL studies report a critical minimum of information about a given QTL. In particular, researchers will be encouraged to use interval mapping to identify and position QTL rather than simple ANOVA. Standard ontologies for traits and, where possible, accepted methods used to measure them are being developed.

The Buckler lab has developed standardized phenotyping tools in maize (http://www.maizegenetics.net/phenotyping-tools) which could develop into community standards. Also, all phenotypic data from the NAM population will be made publicly available. This together with the NAM GWAS (http://cbsuapps.tc.cornell.edu/namgwas.aspx) will facilitate unprecedented in silico mapping opportunities. Together these improvements to the public databases will facilitate the re-analysis of combined trait and mapping data from multiple populations. This should produce refined genetic positions for QTL which are needed to identify candidate genes.

Application of Functional Markers

As a result of the rapid progress in sequencing technology, the genomic sequence of additional maize inbreds beyond B73 is already a reality (Lai et al. 2010). Projects like the NAM community approach (Kump et al. 2011; Tian et al. 2011; Yu et al. 2008) will lead to accumulation of further characterized genes and QTPs of agronomic relevance. Thus, the number of functionally characterized polymorphisms in maize, a prerequisite for functional marker development, will substantially increase over the next decade. FMs might be useful for various steps along the process of cultivar development. These include (1) identification of novel or better alleles (QTP haplotypes) for characterized genes in exotic germplasm collections, (2) identification of complementary parents for development of new inbreds, (3) description of the “genetic potential” of new inbreds, and (4) variety registration and description. FMs will also be essential to test for negative pleiotropic side-effects. This will in addition lead to a better understanding of the nature of trait correlations, or “pleiotropic” effects described for major genes (Chen and Lübberstedt 2010). Various studies found close genetic correlations between plant height and flowering time. Interestingly, flowering time associated polymorphisms in D8, a gene initially identified by its mutant allele leading to dwarfing, had no effects on plant height (Thornsberry et al. 2001). Similarly, mutant alleles of brown midrib genes in maize were found to affect other agronomic characters, including plant height and biomass yield (Pedersen et al. 2005). However, none of the polymorphisms within the Bm3 gene affecting forage quality affected any of these agronomic traits (Chen et al. 2010). In conclusion, for composition of optimal haplotypes for genes shown to affect one or more traits of interest, multiple traits need to be considered.

It remains to be seen how FMs will contribute to marker-assisted (recurrent) selection, in particular as compared to genomic selection (GS) procedures based on low-cost markers without requirements on their functional characterization (Bernardo and Yu 2007). Although empirical studies of GS are still limited, accurate estimates of breeding values combined with the possibility of selecting kernels before planting (by seed-chipping) and selecting in off-season winter nurseries make GS very promising for maximizing genetic gain in breeding programs, especially when compared to marker-assisted selection based only on markers with statistically significant trait associations (Bernardo and Yu 2007; Heffner et al. 2009; Mayor and Bernardo 2009). GS has been described as a brute-force, black-box procedure to increase genetic gain (Bernardo and Yu 2007), as selection is based on a large number of markers without prior knowledge of QTL positions or genetic mechanisms involved in phenotypic variation. Markers in LD with favorable QTL receive a large estimated breeding value, even if the QTL is unknown. In GS, lines are selected based on the sum of estimated breeding values of markers across the whole genome, rather than site-specific introgression of significant QTL.
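The selection criterion described in the last sentence can be sketched as a sum of marker effects weighted by each line's marker genotypes. This is only an illustration of the summation step: the marker effects are hypothetical and assumed to have been estimated beforehand in a training population, genotypes of homozygous lines are coded -1/+1 by assumption, and all names are made up.

```python
def genomic_breeding_value(genotypes, marker_effects):
    """Sum of marker effects weighted by genotype codes (-1 or +1 per marker)."""
    if len(genotypes) != len(marker_effects):
        raise ValueError("one genotype code is required per marker effect")
    return sum(g * e for g, e in zip(genotypes, marker_effects))


# Hypothetical marker effects, assumed to come from a previously fitted model.
marker_effects = [0.40, -0.10, 0.25, 0.05]

# Hypothetical homozygous lines scored at the same four markers.
candidate_lines = {
    "line_A": [1, -1, 1, 1],
    "line_B": [-1, 1, 1, -1],
    "line_C": [1, 1, -1, 1],
}

# Rank lines by their summed value; no single marker needs to be significant.
ranked = sorted(
    candidate_lines,
    key=lambda name: genomic_breeding_value(candidate_lines[name], marker_effects),
    reverse=True,
)
print(ranked)
```

Note that every marker contributes to the ranking regardless of whether its position or function is known, which is the sense in which GS operates as a black box.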

Current research on GS is focused on developing statistical methods that incrementally improve the accuracy, i.e., the correlation between predicted and observed breeding values of individuals in a breeding population (de los Campos et al. 2009; Gianola et al. 2006; Habier et al. 2007; Heffner et al. 2009; Kizilkaya et al. 2010; Xu 2003b; Zhong et al. 2009). Alternatively, as functional genomic knowledge increases it seems reasonable to hypothesize that the concept of gene pyramiding could be extended to genome assembly (GA) for polygenic traits. To our knowledge GS has not been compared with gene pyramiding, much less GA. The question is: what criteria should be used to make such a comparison? While genetic gain, or its accuracy component, is a simple criterion, it is not realistic. Actual breeding decisions are based upon multiple breeding objectives, such as maximizing genetic gain, while maintaining genetic diversity throughout the genomes of the breeding population.
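Accuracy in the sense used above, the correlation between predicted and observed breeding values, can be computed as a Pearson correlation. The sketch below writes the formula out with the standard library only; the function name and the example values are hypothetical.

```python
import math


def prediction_accuracy(predicted, observed):
    """Pearson correlation between predicted and observed breeding values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    if ss_p == 0 or ss_o == 0:
        raise ValueError("correlation undefined for constant values")
    return cov / math.sqrt(ss_p * ss_o)


# Hypothetical predicted and observed values for five lines.
predicted_values = [0.8, 0.1, -0.3, 0.5, -0.6]
observed_values = [1.1, 0.2, -0.1, 0.3, -0.9]
print(prediction_accuracy(predicted_values, observed_values))
```

A single correlation of this kind summarizes predictive ability but says nothing about how much genetic diversity a selection scheme retains, which is the limitation raised in this paragraph.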

Xu et al. (2011) used an operations research approach to address the challenges imposed by varying degrees of LD among favorable functional alleles to assemble a desired phenotype in minimal time while avoiding loss of genetic diversity for other loci in a population. Importantly, using an optimization approach changes the framework for evaluation from a simple criterion of accuracy to the more realistic situation of meeting multiple breeding objectives simultaneously. Hypothetically, GA, based on knowledge of FMs, LD, and genomic diversity should outperform GS for realistic breeding objectives. The likely outcome will be conditional, i.e., depending upon the structure of the breeding population, genetic architecture of the trait, and genome structure we will likely find Pareto Frontiers describing when the hypothesis is true and when it is false.
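The idea of a Pareto frontier over several breeding objectives can be illustrated with a minimal sketch. It assumes just two objectives, genetic gain and retained genetic diversity, both to be maximized; the strategy labels and scores are entirely hypothetical and do not represent results for GS, GA, or any other method.

```python
def pareto_frontier(candidates):
    """Names of candidates not dominated on (gain, diversity); larger is better.

    candidates: list of (name, gain, diversity) tuples.
    """
    frontier = []
    for name, gain, diversity in candidates:
        dominated = any(
            other_gain >= gain and other_div >= diversity
            and (other_gain > gain or other_div > diversity)
            for _, other_gain, other_div in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical scores for four unnamed selection strategies.
strategies = [
    ("strategy_1", 1.0, 0.6),
    ("strategy_2", 0.9, 0.8),
    ("strategy_3", 0.7, 0.5),
    ("strategy_4", 0.5, 0.9),
]
print(pareto_frontier(strategies))
```

Strategies on the frontier cannot be ranked without stating how the objectives are traded off, which is why a comparison based on accuracy alone can be misleading.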

The remaining question is how GS would benefit from an increasing number of characterized functional genes. Calus et al. (2008) showed that haplotype-based GS predicts breeding values more efficiently than GS based on random markers. It therefore appears likely that marker multiplexes based on previously characterized QTPs and employed in GS procedures are superior to random markers, at least in populations with low LD. For populations with high LD, where markers are more likely to be in LD with favorable QTL, prior knowledge of FMs might not improve genetic gain in GS (Ødegård et al. 2009). The contribution of FMs to genetic gain in this case, however, will come from an increasing knowledge of allele effects, distributions, and environment/genetic interactions.

Perspective: Future Opportunities

New sequencing platforms have motivated genome sequencing projects in larger populations in different species. In humans, the 1000 Genomes Project was launched in 2008 as a consortium involving more than 75 universities and companies worldwide. The goal is to sequence genomes and reveal sequence polymorphisms in more than a thousand individuals from different ethnic groups. Another large sequencing initiative is the Genome 10K Project, aiming to sequence the genomes of 10,000 vertebrate species by 2015 (http://genome10k.soe.ucsc.edu/). In plants, the 1001 Genomes Project was initiated in 2008, with the objective of revealing whole-genome sequence variants in 1,001 accessions of Arabidopsis thaliana. In maize, seven inbred lines have been resequenced by public institutions in the United States and China (Lai et al. 2010; Schnable et al. 2009).

The challenge will be translating this huge amount of genomic information into QTL, QTPs and FMs for crop improvement. In plant breeding, the importance of a marker normally depends on how well it predicts the phenotype, and accurate predictions depend on accurate estimation of marker effects based on phenotypic evaluations. Although phenotyping has become more efficient over the years with larger and automated field machinery and hand-held computers, field characterization of breeding lines normally requires a large allocation of land and labor. As genotyping costs decline, phenotyping becomes the major bottleneck in marker-assisted breeding. More recently, “phenomics”, the use of instrument arrays that allow high-throughput screening of thousands of lines consistently in short periods of time, has been suggested as the approach that will make phenotyping “catch up” with genomics (Finkel 2009). The use of phenomics, however, will not replace field experimentation, and allocation of land and phenotyping labor will still be necessary for major plant breeding traits.

Another challenge associated with FM development is the biological validation of statistically inferred QTPs. Transgenic constructs are subject to time-consuming regulatory procedures for field evaluations, and are usually vulnerable to position effects, which substantially affect the expression of genes depending on the (random) introgression site in the genome. Backcrossing has been a traditional approach for introgression of a moderate number of alleles, but it has the drawback of introgressing unwanted donor-parent genome through linkage drag. The magnitude of linkage drag can be minimized by selecting for recurrent-parent alleles at markers flanking the target region. This approach, however, requires larger populations the closer the flanking markers are to the target region (Hospital 2001). Recently, ZFNs were introduced as a promising technology to assist allele introgression without some of the drawbacks of transgenic and backcross approaches. ZFNs promote recombination in defined chromosome segments, permitting allele introgression without linkage drag and with smaller population sizes (Shukla et al. 2009).

Even though phenotyping, validation, and introgression of favorable QTPs are still major obstacles, identification of candidate QTPs and subsequent FM development are increasingly reported. A number of FMs have already been developed in different plant species (Fan et al. 2009; Ji et al. 2010; Iyer-Pascuzzi and McCouch 2007; Shi et al. 2008; Su et al. 2010; Tommasini et al. 2006). Developing optimal strategies to integrate this increasing knowledge of the functionality of genomic regions, and combining this information with phenotypic selection and GS, will be essential to maximize genetic gain. Most likely, FMs will have to be evaluated on a case-by-case basis, where their contribution to genetic gain will depend on the populations and environments of individual breeding programs.