Introduction

Genomic selection was suggested as a novel approach in animal breeding with the potential to lead to a paradigm shift in the design and implementation of livestock and crop breeding programs (Meuwissen et al. 2001). Genomic selection differs from previous strategies such as linkage and association mapping in that it abandons the objective of mapping the effects of individual genes and instead focuses on efficiently estimating breeding values from a large number of molecular markers, ideally covering the full genome (Jannink et al. 2010). In a first step, marker effects are estimated in a training set of genotypes that have been both phenotyped and fingerprinted with dense marker data. In a second step, individuals related to the training population that have been genotyped but not phenotyped are selected based on the estimated marker effects.

Several statistical approaches have been suggested to estimate marker effects, such as random regression best linear unbiased prediction (RR-BLUP; Whittaker et al. 2000; Meuwissen et al. 2001) and Bayesian shrinkage regression methods (Meuwissen et al. 2001; Xu 2003; Ter Braak et al. 2005; Calus et al. 2008). Relationships among these statistical models have been investigated theoretically (e.g., Piepho 2009). The models have also been compared in simulation studies, which revealed that their accuracy depends on the genetic architecture of the trait (e.g., Daetwyler et al. 2010), the underlying population structure (e.g., Habier et al. 2007; Zhong et al. 2009), and the marker density applied (Meuwissen and Goddard 2010). Recently, statistical approaches have been compared using empirical data of cattle (Luan et al. 2009) and of maize, barley, wheat, and Arabidopsis (Lorenzana and Bernardo 2009; Crossa et al. 2010). In many instances, RR-BLUP performed well, especially at low to medium marker density.

The prediction accuracy of genomic selection has been studied extensively in simulation studies (for review see Heffner et al. 2009). First empirical results for dairy cattle support the large potential of genomic selection for livestock breeding (e.g., Hayes et al. 2009; Luan et al. 2009). The structure of the mapping populations underlying genomic selection in animals, however, differs strongly from that of the mapping populations present in plant breeding programs (e.g., presence of migration, inbreeding, different family structure). For plant breeding, the accuracy of genomic selection has been evaluated empirically by cross validation in bi-parental populations of maize, barley, and Arabidopsis (Lorenzana and Bernardo 2009). Moreover, Crossa et al. (2010) empirically examined the prospects of genomic selection in a diverse panel of maize and wheat lines. These results suggest that genomic selection can be an effective strategy in plant breeding. In applied plant breeding, however, selection is performed not only within a specific bi-parental cross or within a diverse panel of elite lines but rather within and among crosses (Wegenast et al. 2008). To the best of our knowledge, the accuracy of genomic selection has not yet been evaluated for this scenario.

The main goal of our study was to examine the potential of genomic selection using experimental data from a commercial maize breeding program. In particular, our objectives were to (1) study the impact of the number of markers and progenies on the prediction accuracy of genomic breeding values, (2) investigate the prediction accuracy of genomic breeding values within and across six bi-parental maize populations by fivefold cross validation, and (3) compare the prediction accuracies of genomic breeding values within bi-parental populations obtained with models with or without population effects, as well as with an approach relying on preselected markers with low marker × genetic background interaction effects.

Materials and methods

Genotypic and phenotypic data

The field experiments were described in detail by Steinhoff et al. (2011). Briefly, six F3 populations with a total of 788 individuals were obtained from a half-diallel cross among four dent inbreds (A, B, C, and D). The number of progenies per population ranged from 104 to 143 (Table 1). Each F3 plant was selfed to obtain an F3:4 family. Testcross (TC) progenies were produced by crossing all 788 F3:4 lines to an inbred tester from the opposite heterotic group. These testcross progenies and the four parental inbreds were evaluated in 2007 in unreplicated trials at ten locations in Italy for grain yield (Mg ha−1), adjusted to a moisture concentration of 155 g kg−1, and for grain moisture at harvest (g kg−1). Populations were evaluated in separate but adjacent field trials connected by four common checks. In each environment, phenotypic data were adjusted for block effects using the four checks. Genotypic variances (σG²) among and within populations were estimated with the following random model: \( y_{l} = L + \text{Pop} + G(\text{Pop}) + e \), where y_l refers to the adjusted values of genotypes at single locations, and L, Pop, G(Pop), and e refer to the effects of locations, of the six bi-parental populations, of genotypes within populations, and the residual, respectively. Genotypes were treated as independent; thus, relatedness among the parental lines was not accounted for when estimating genotypic variances among and within crosses. Moreover, best linear unbiased estimates (BLUEs) of testcross progenies and parents were determined by assuming fixed genotypic effects.

Table 1 Number of progenies, genotypic variance (σG²), and heritability (h²) for each of the six populations, genotypic variance within (σ²Within) and among populations (σ²Among), and heritability assessed across all six populations for grain yield (Mg ha−1) and grain moisture (g kg−1)

Each F3 plant was genotyped with 960 SNPs using TaqMan technology (Applied Biosystems 2002). Observed genotype frequencies at each marker locus were tested with χ² tests for deviations from Mendelian segregation ratios and from an allele frequency of 0.5. Significance thresholds were adjusted for multiple testing with the Bonferroni–Holm procedure (Holm 1979). After quality control, 857 high-quality SNP markers were retained for further analyses. The number of polymorphic markers per population ranged from 272 for Pop-CxD to 469 for Pop-BxD (Table 2). Genetic map distances between adjacent markers within single populations averaged 4 cM, and fewer than 10% of adjacent marker pairs were more than 10 cM apart.
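The marker quality control described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions the text does not state: genotypes are counted per marker as homozygous first allele, heterozygous, and homozygous second allele; the expected F3 genotype ratio is taken as 3/8 : 1/4 : 3/8 (selfed F2 without selection); and the helper names (`segregation_test`, `allele_freq_test`, `holm_reject`) are hypothetical. The chi-square tail probability uses the closed forms that exist for 1 and 2 degrees of freedom, so no statistics library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic (closed forms for df 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def segregation_test(n_aa, n_ab, n_bb, expected=(0.375, 0.25, 0.375)):
    """Goodness of fit of genotype counts to the assumed F3 ratio 3/8 : 1/4 : 3/8 (df = 2)."""
    n = n_aa + n_ab + n_bb
    stat = sum((o - n * e) ** 2 / (n * e) for o, e in zip((n_aa, n_ab, n_bb), expected))
    return chi2_sf(stat, 2)


def allele_freq_test(n_aa, n_ab, n_bb):
    """Test of allele counts against an allele frequency of 0.5 (df = 1)."""
    a = 2 * n_aa + n_ab
    b = 2 * n_bb + n_ab
    exp = (a + b) / 2.0
    stat = (a - exp) ** 2 / exp + (b - exp) ** 2 / exp
    return chi2_sf(stat, 1)


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure: reject flags in the original order of the tests."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

Markers flagged by `holm_reject` would be discarded. The text does not say whether the segregation and allele-frequency p-values were corrected jointly or separately, so that choice is left to the caller.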

Table 2 Number of polymorphic markers, accuracy (r_GS) of genomic predictions, and 90% confidence intervals (CI) for estimation and prediction within each population (scenario 2) or estimation across and prediction within the six segregating populations (scenario 1b), investigated by fivefold cross validation with Model A for grain yield and grain moisture

Data analysis

Breeding values were estimated with the model \( y = \mu + \sum_{j=1}^{N_m} X_j a_j + e \) (Model A), where y is an N × 1 vector of BLUEs estimated across locations, N_m is the number of markers fitted, a_j is the effect of the jth marker, and X_j is an N × 1 vector denoting the genotypes of the individuals at marker j, with \( X_{ij} = 0 \) if individual i is homozygous for the first allele at locus j, \( X_{ij} = 1/\sqrt{(2 - F)\,p_j(1 - p_j)} \) if it is heterozygous, and \( X_{ij} = 2/\sqrt{(2 - F)\,p_j(1 - p_j)} \) if it is homozygous for the second allele at locus j; F denotes the inbreeding coefficient of individual i and p_j the allele frequency at marker j. The division by \( \sqrt{(2 - F)\,p_j(1 - p_j)} \) standardizes the variance of the marker genotype data to 1 (Habier et al. 2007). The variance of a_j is assumed to be σG²/N_m. As residual variance, we used the error variance of the BLUEs across locations, i.e., σE² divided by the number of locations L (cf. Melchinger et al. 1998). Consequently, the penalty parameter was defined as \( \lambda = (\sigma_E^2/L)/(\sigma_G^2/N_m) \). Estimates of a_j were obtained from the mixed-model equations (Henderson 1984). Given these estimates and the marker genotypes, the genetic value of individual i is predicted as \( \mathrm{PV}_i = \sum_{j=1}^{N_m} X_{ij}\hat{a}_j \), where X_ij is the genotype of individual i at marker j coded as above and \( \hat{a}_j \) is the estimated effect of marker j.
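A minimal pure-Python sketch of Model A may make the marker coding and the role of λ concrete. It assumes genotypes are supplied as counts of the second allele (0, 1, 2), builds Henderson's mixed-model equations with μ as the only fixed effect and λ added to the diagonal of the marker block, and solves them by Gaussian elimination. The function names are hypothetical, and this is an illustration rather than the software used in the study; a real analysis would rely on a dedicated mixed-model or linear-algebra library.

```python
import math


def code_marker(n_second_allele, p, F):
    """Model A coding: 0, 1 or 2 copies of the second allele over sqrt((2 - F) p (1 - p)).

    Monomorphic markers (p = 0 or 1) must be removed first; the denominator would be zero.
    """
    return n_second_allele / math.sqrt((2.0 - F) * p * (1.0 - p))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def rr_blup(X, y, sigma_e2, n_loc, sigma_g2):
    """Return (mu_hat, a_hat) from the mixed-model equations of Model A."""
    n, m = len(X), len(X[0])
    lam = (sigma_e2 / n_loc) / (sigma_g2 / m)  # lambda = (sigma_E^2 / L) / (sigma_G^2 / N_m)
    # Coefficient matrix [[1'1, 1'X], [X'1, X'X + lambda*I]] and right-hand side [1'y, X'y]
    C = [[0.0] * (m + 1) for _ in range(m + 1)]
    rhs = [0.0] * (m + 1)
    C[0][0] = float(n)
    rhs[0] = float(sum(y))
    for j in range(m):
        C[0][j + 1] = C[j + 1][0] = sum(X[i][j] for i in range(n))
        rhs[j + 1] = sum(X[i][j] * y[i] for i in range(n))
        for k in range(m):
            C[j + 1][k + 1] = sum(X[i][j] * X[i][k] for i in range(n))
        C[j + 1][j + 1] += lam  # shrink marker effects only, not mu
    sol = solve(C, rhs)
    return sol[0], sol[1:]


def predict(X, a_hat):
    """PV_i = sum_j X_ij * a_hat_j (intercept omitted, as in the text)."""
    return [sum(x * a for x, a in zip(row, a_hat)) for row in X]


# Toy usage with uncoded 0/1/2 genotypes (hypothetical data); here lambda = 2
mu_hat, a_hat = rr_blup([[0.0], [1.0], [2.0], [1.0]], [1.0, 2.0, 3.0, 2.0],
                        sigma_e2=20.0, n_loc=10, sigma_g2=1.0)
print(mu_hat, a_hat)  # prints 1.5 [0.5]
```

Without the penalty, the toy data would be fitted exactly with a marker effect of 1.0. With λ = 2 the estimate is shrunk to 0.5, which shows how a larger error variance relative to σG²/N_m pulls marker effects toward zero.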

Cross validation

We applied fivefold cross validation to evaluate the accuracy of genomic selection in plant breeding trials. The entire data set was randomly split into five subsets; four subsets were combined to form the training set for estimating genetic effects, and the remaining subset formed the validation set. The correlation between observed and predicted phenotypes (r_MP) was estimated in the validation set, and the accuracy of genomic selection was expressed as r_GS = r_MP/h (Lande and Thompson 1990; Dekkers 2007), where h is the square root of the heritability. The sampling of training and validation sets was repeated 5,000 times. We estimated the marker effects and predicted the genomic breeding values in three scenarios as follows:

  • Scenario 1: Estimation was performed across the six segregating populations.

      (a) Prediction accuracy was evaluated across the six populations.

      (b) Prediction accuracy was evaluated within each population.

  • Scenario 2: Estimation and prediction were performed within each segregating population.

For scenario 1, marker effects were estimated using the genotypic variance of the total population. Prediction accuracy was standardized with the heritability of the total population (scenario 1a) or with the average heritability of single segregating families (scenario 1b). In contrast, scenario 2 is based on the average genotypic variance and heritability within segregating families.
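The cross-validation scheme and the two ways of scoring scenario 1 can be sketched as follows. This is a sketch under stated assumptions: `fit_predict` is a hypothetical callback that fits the model (for example, RR-BLUP) on the training individuals and returns predictions for the validation individuals; each repetition draws a fresh random five-way partition and scores all five folds (the text reports 5,000 resamplings without specifying this detail, and the default `n_rep` here is kept small for illustration); and for scenario 1b the within-population correlations are simply averaged, whereas Table 2 reports populations separately. The caller passes the h that matches the scenario.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def five_fold_accuracy(y, pop, fit_predict, h, within=False, n_rep=10, seed=1):
    """Mean r_GS = r_MP / h over repeated random fivefold partitions.

    within=False: r_MP over the whole validation fold (scenario 1a).
    within=True:  r_MP within each population, then averaged (scenario 1b).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_rep):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        folds = [idx[k::5] for k in range(5)]
        for k in range(5):
            test = folds[k]
            train = [i for f in range(5) if f != k for i in folds[f]]
            pred = dict(zip(test, fit_predict(train, test)))
            if not within:
                r_mp = pearson([y[i] for i in test], [pred[i] for i in test])
            else:
                per_pop = []
                for p in sorted(set(pop[i] for i in test)):
                    members = [i for i in test if pop[i] == p]
                    if len(members) > 2:  # skip populations with too few validation individuals
                        per_pop.append(pearson([y[i] for i in members],
                                               [pred[i] for i in members]))
                r_mp = statistics.fmean(per_pop)
            accuracies.append(r_mp / h)
    return statistics.fmean(accuracies)
```

Dividing by h rescales the observed-versus-predicted correlation to the scale of true breeding values. This is why, in scenario 1b, the within-family heritability rather than the total-population heritability must be supplied.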

Moreover, we studied the effect of the number of markers and of the number of progenies on the prediction accuracy of genomic breeding values by cross validation. To this end, we varied the number of markers from 100 to 800 in steps of 100 and the number of individuals from 12.5 to 100% of the total population size in steps of 12.5%.
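The re-sampling grid can be written down explicitly. In this minimal sketch the helper names are hypothetical, and two details are assumptions not given in the text: subsets are drawn at random without replacement from the 857 retained SNPs and the 788 individuals, and each drawn subset would then enter the fivefold cross validation unchanged.

```python
import random


def resampling_grid(n_individuals_total=788):
    """Marker numbers and population sizes used in the re-sampling study."""
    marker_sizes = list(range(100, 801, 100))  # 100, 200, ..., 800 SNPs
    # 12.5%, 25%, ..., 100% of all individuals, rounded to whole individuals
    # (Python's round() sends exact halves to the nearest even integer)
    individual_sizes = [round(n_individuals_total * k / 8) for k in range(1, 9)]
    return marker_sizes, individual_sizes


def draw_subset(n_total, size, rng):
    """Sorted indices of a random subset drawn without replacement."""
    return sorted(rng.sample(range(n_total), size))


def resampling_plan(n_markers_total=857, n_individuals_total=788, seed=1):
    """One random marker subset and one random individual subset per level."""
    rng = random.Random(seed)
    marker_sizes, individual_sizes = resampling_grid(n_individuals_total)
    marker_sets = {m: draw_subset(n_markers_total, m, rng) for m in marker_sizes}
    individual_sets = {s: draw_subset(n_individuals_total, s, rng) for s in individual_sizes}
    return marker_sets, individual_sets
```

Repeating `resampling_plan` with different seeds and averaging the resulting accuracies would trace out curves of the kind shown in Figs. 2 and 3.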

Models to improve prediction accuracy within populations

In addition to Model A, two further models were used in this study. Model B differs from Model A by additionally including a population effect. In Model C, markers that showed significant (P < 0.1) marker × population interaction effects in a pre-screen were excluded from the analyses. To account for the effect of a reduced number of markers, we further compared the results of Model C with those obtained from the same number of randomly selected markers.
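One way to see what the population effect in Model B does is to note that unpenalized fixed effects in a ridge-type model can be absorbed by centering y and every marker column within the corresponding groups. After centering, the marker effects solve (X′X + λI)a = X′y. Model A corresponds to centering on the overall mean, and Model B to centering within populations. The sketch below uses this equivalence, which is our illustration and not necessarily how the authors set up their equations. It also shows the Model C marker filter and its random-marker control. Function names are hypothetical, and the marker × population interaction p-values are assumed to have been computed elsewhere.

```python
import random


def center_within(values, groups):
    """Subtract each group's mean; this absorbs unpenalized group effects."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def marker_effects(X, y, lam, groups):
    """Solve (Xc'Xc + lam*I) a = Xc'yc with X and y centered within groups."""
    n, m = len(X), len(X[0])
    yc = center_within(y, groups)
    cols = [center_within([X[i][j] for i in range(n)], groups) for j in range(m)]
    A = [[sum(p * q for p, q in zip(cols[j], cols[k])) + (lam if j == k else 0.0)
          for k in range(m)] for j in range(m)]
    b = [sum(p * q for p, q in zip(cols[j], yc)) for j in range(m)]
    return solve(A, b)


def model_a(X, y, lam):
    return marker_effects(X, y, lam, [0] * len(y))  # overall mean only


def model_b(X, y, lam, pop):
    return marker_effects(X, y, lam, pop)  # mean plus population effects


def model_c_markers(p_interaction, threshold=0.1):
    """Indices of markers kept in Model C (interaction P >= threshold)."""
    return [j for j, p in enumerate(p_interaction) if p >= threshold]


def random_control(n_markers, n_kept, seed=1):
    """Random marker subset of the same size as the Model C subset."""
    return sorted(random.Random(seed).sample(range(n_markers), n_kept))


# Hypothetical toy data in which populations differ in mean (lam = 0 for clarity)
X = [[0.0], [1.0], [1.0], [2.0]]
y = [0.0, 1.0, 5.0, 6.0]
print(model_a(X, y, 0.0), model_b(X, y, 0.0, [0, 0, 1, 1]))  # prints [3.0] [1.0]
```

In the toy data, part of the marker-trait association in Model A comes from the difference between population means. Model B removes that part and estimates only the within-population effect.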

Results

For both traits, genotypic variance was significantly (P < 0.01) larger than zero within each of the six populations (Table 1). The ratio of genotypic variance among versus within populations was substantially larger for grain moisture (2.20) than for grain yield (0.04). Heritability within populations and for the total population was high for grain moisture and moderate for grain yield.

Unless stated otherwise, the following results refer to Model A. The prediction accuracy of genomic breeding values across populations (scenario 1a) was higher for grain moisture (0.90) than for grain yield (0.58) (Fig. 1). The confidence intervals for prediction accuracy ranged from 0.87 to 0.97 for grain moisture and from 0.45 to 0.69 for grain yield. Re-sampling reduced sets of SNPs revealed that the accuracy nearly reached a plateau at 800 SNPs (Fig. 2). In contrast, re-sampling reduced sets of individuals revealed that the curve remained steep and did not reach a plateau at large population sizes (Fig. 3). This was particularly true for grain yield.

Fig. 1 Distribution of the accuracy (r_GS) of genomic predictions across populations (scenario 1a) revealed by fivefold cross validation with Model A for grain yield and grain moisture

Fig. 2 Effect of the number of markers on the accuracy of genomic selection for grain yield and grain moisture when the number of markers varied from 100 to 800

Fig. 3 Effect of the number of individuals on the accuracy of genomic selection for grain yield and grain moisture when the size of the training population varied from 12.5 to 100% of the total population size

For grain moisture, the prediction accuracy of genomic breeding values was higher for scenario 1a (0.90; Fig. 1) than for scenario 1b (0.64; Table 2). Similarly, for grain yield the accuracy was higher for scenario 1a (0.58; Fig. 1) than for scenario 1b (0.54; Table 2). In contrast, the accuracies of scenario 1b and scenario 2 differed only slightly, and to a similar extent for both traits (Table 2). The number of polymorphic markers within populations was not associated with the accuracy for either trait.

Further, we tested two additional models for genomic selection with the aim of improving the prediction efficiency within populations. Including a population effect in the estimation model (Model B) led to a negligible improvement in the accuracy of predicting genomic breeding values for grain moisture, but not for grain yield (Table 3). Retaining only SNPs without significant (P < 0.1) SNP × population interaction effects in a pre-screen (Model C) also did not improve the accuracy of predicting genomic breeding values; instead, the accuracy decreased.

Table 3 Differences in the prediction accuracy (r_GS) of genome-wide breeding values between Model A and Models B and C, as investigated by fivefold cross validation for grain yield and grain moisture

Discussion

Simulation studies suggest that genomic selection is promising for the rapid improvement of quantitative traits in plants (for review see Heffner et al. 2009) and better suited than marker-assisted recurrent selection (Bernardo and Yu 2007). However, the prospects of genomic selection ultimately have to be validated with empirical data. For plants, a previous cross validation study of genomic selection with empirical data was based on diverse panels of elite maize and wheat lines (Crossa et al. 2010). Such diverse panels of elite lines are the final product of intensive selection but do not reflect the population structure and genetic variance typically present in plant breeding programs, where promising candidates are selected within and among segregating populations (Wegenast et al. 2008). This motivated us to evaluate the prediction efficiency of genomic breeding values within and across populations using empirical data from a commercial maize breeding program.

Influence of the number of SNPs and population size on the accuracy of genomic selection

Implementing genomic selection with a low-density marker panel is desirable to achieve a good cost-benefit ratio (Heffner et al. 2010; Jannink et al. 2010). The number of markers needed for accurate prediction of genotypic values depends on the extent of linkage disequilibrium (LD) between markers and QTL (Meuwissen et al. 2001) and on the germplasm under consideration (Zhong et al. 2009). For a large panel of European elite maize inbred lines, Van Inghelandt et al. (2011) observed a rapid decay of LD with genetic map distance and proposed a density of up to 1 million markers for effective genome-wide association mapping. In contrast, Lorenzana and Bernardo (2009) suggested fewer than 100 markers for bi-parental populations and 200–800 markers for random-mated maize populations for genome-wide prediction of genotypic values. Our design comprised a few segregating populations, which results in an extent of LD intermediate between that of single bi-parental populations and that of a diverse panel of inbred lines. In line with this expectation, re-sampling subsets of SNPs revealed that marker density was not a major factor limiting the accuracy of genomic selection in the present study (Fig. 2). This finding corroborates the results of simulation studies and suggests that the required marker density also depends on the statistical method employed (e.g., Habier et al. 2007). We applied RR-BLUP to estimate marker effects because it has performed well, especially at low marker density (Luan et al. 2009). This good performance may be due to the efficient exploitation of genetic relationships by RR-BLUP (Habier et al. 2007).

Simulation studies showed that population size is crucial for the prediction accuracy of genomic selection (Habier et al. 2007). In accordance with this expectation, the prediction accuracy for grain yield increased monotonically with population size, without the substantial flattening of the slope observed when re-sampling markers (Figs. 2, 3). Moreover, the range of prediction accuracies was substantially larger when re-sampling individuals than when re-sampling markers. Our results for grain yield thus clearly support the finding of simulation studies that the size of the training population is of crucial importance in genomic selection. The impact of population size on the accuracy of genomic selection was less pronounced for grain moisture (Fig. 3), possibly because the high genetic variance among populations can be exploited efficiently with few individuals per population.

Accuracy of genomic selection evaluated across populations

In breeding programs, selection is performed within and across populations with the ultimate goal of identifying the best-performing individuals in the total population. For grain moisture, the prediction accuracy of genomic breeding values across populations was remarkably high, amounting to 0.90 (Fig. 1). This underlines the large potential of estimating genomic breeding values across populations in elite maize germplasm. For grain yield, in contrast, the accuracy of genomic breeding values was moderate (r_GS = 0.58). These findings are in accordance with results reported by Crossa et al. (2010), who analyzed the prospects of genomic selection for maize grain yield under drought-stress conditions. The large difference in the prediction accuracy of genomic breeding values between grain yield and grain moisture can be explained by the high genotypic variance among populations observed for grain moisture (Table 1), which can be efficiently exploited in genomic selection. Alternatively, the difference may reflect a different complexity of the genetic architecture underlying the two traits, as supported by the substantial variation in prediction efficiency within populations (Table 2).

Accuracy of genomic selection evaluated within populations and potential overestimation of the prospects of genomic selection

Genetic variation among populations can be efficiently exploited in plant breeding through parental selection, which does not require genomic selection. Because the genetic variance among populations was substantial for grain moisture in our study (Table 1), we compared the prediction accuracy of genomic breeding values within (scenario 2) versus across populations (scenario 1a). For grain moisture, the prediction accuracy was substantially lower within than across populations (Fig. 1; Table 2), which indicates that a large genetic variance among populations in the training set may lead to an overestimation of the prospects of genomic selection in plant breeding.

Interestingly, the accuracy of genomic selection was comparable for across-within (scenario 1b) and within-within populations (scenario 2), despite a six-times larger training population in scenario 1b (Table 2). This result contrasts with the findings of a simulation study by de Roos et al. (2009), who suggested estimating SNP effects across rather than within populations. The low prediction accuracy of genomic selection for across-within populations may be due to a high proportion of SNPs with significant (P < 0.05) SNP × population interaction effects (e.g., 46% for grain moisture). Substantial SNP × population interaction effects have recently also been reported in elite maize breeding germplasm (Liu et al. 2011); they can be caused by epistasis (Blanc et al. 2006), by multiple alleles (Calus et al. 2008), and by the fact that associations between SNPs and QTL may not be conserved across populations.

Enhancing accuracy of genomic selection within populations

For joint linkage association mapping, including a population effect has been proposed to obtain unbiased estimates of SNP effects (Reif et al. 2010; Liu et al. 2011). Modeling general population effects in the estimation of marker effects, as done in Model B, yielded no substantial improvement in predicting genomic breeding values (Table 3). Alternatively, in Model C we excluded markers with significant SNP × population interaction effects in an attempt to improve genomic selection. However, the prediction accuracy was lower for Model C than for Model A. This reduction in accuracy for both traits can be explained by the elimination of SNPs that contributed significantly to the genetic variance not only among but also within populations. In summary, none of the tested alternative methods yielded a significant improvement in predicting genomic breeding values within populations; Model A thus appears to be a good choice for the routine implementation of genomic selection in plant breeding programs.

Most models applied in the context of genomic selection focus exclusively on main effects. Extending existing models to epistasis has the potential to further improve the prediction efficiency within populations. Accommodating epistasis in genomic selection, however, requires good prior knowledge of the relative importance of variance due to main versus epistatic effects. This information is still very limited, because many designs to unravel the role of epistasis are hampered by the fact that estimated main effects also contain epistasis (cf. Melchinger et al. 2007). Moreover, the population sizes required to obtain robust estimates are much larger for epistatic effects than for main effects (Carlborg and Haley 2004).

Prospects of genomic selection in maize breeding

The response to one cycle of genomic selection equals that of one cycle of phenotypic selection when the prediction accuracy of genomic breeding values equals h (Lande and Thompson 1990; Dekkers 2007). In the present study, we observed an accuracy of 0.58 for grain yield (Fig. 1). Based on the variance component estimates for grain yield (Table 1), this accuracy corresponds to the precision of unreplicated field trials at 3–4 locations. Genotyping costs per line are currently equivalent to those of 3–4 plots; thus, genomic selection seems to hold great promise for maize breeding programs. It is important to note, however, that our cross validation study is based on data from only one cycle of selection. The observed prediction accuracies should therefore be considered an upper limit for genomic prediction with a population size of around 1,000 individuals.
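The comparison with 3–4 locations can be reproduced in principle by setting the accuracy of phenotypic selection on unreplicated entry means, h = √(σG²/(σG² + σE²/L)), equal to r_GS and solving for L, which gives L = (σE²/σG²) · r_GS²/(1 − r_GS²). The sketch below assumes that σE² pools genotype × location interaction and plot error, as is unavoidable in unreplicated trials. The variance ratio used in the example is hypothetical and is not taken from Table 1.

```python
def phenotypic_accuracy(sigma_g2, sigma_e2, n_loc):
    """h = sqrt(sigma_G^2 / (sigma_G^2 + sigma_E^2 / L)) for unreplicated trials at L locations."""
    return (sigma_g2 / (sigma_g2 + sigma_e2 / n_loc)) ** 0.5


def equivalent_locations(r_gs, sigma_g2, sigma_e2):
    """Number of locations L at which h equals r_gs (h(L) = r_gs solved for L)."""
    return (sigma_e2 / sigma_g2) * r_gs ** 2 / (1.0 - r_gs ** 2)


# Hypothetical variance components (NOT the Table 1 estimates): sigma_E^2 / sigma_G^2 = 7
n_equiv = equivalent_locations(0.58, sigma_g2=1.0, sigma_e2=7.0)
print(round(n_equiv, 2))  # prints 3.55
```

Because L scales linearly with the ratio σE²/σG², a trait with relatively larger error variance translates the same r_GS into proportionally more field locations.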

Selection gain per unit time is crucial when comparing the potential of genomic versus phenotypic selection. In maize, up to three cycles of genomic selection per year are possible (Lorenzana and Bernardo 2009). Genomic selection would therefore be more efficient than phenotypic selection in terms of genetic gain per year, even if the prediction accuracy decreases over cycles due to recombination and fixation of alleles.
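The argument about gain per unit time can be illustrated with the breeder's equation, where the response per cycle is i · r · σ_g and gains are summed over the cycles completed in a given time. In this sketch, the selection intensity and σ_g are held constant across cycles, which ignores the loss of genetic variance. The declining genomic accuracies, the phenotypic accuracy h = 0.8, and the single phenotypic cycle per year are all hypothetical inputs chosen for illustration and do not come from this study.

```python
def gain_per_year(accuracies, intensity, sigma_g, years):
    """Summed response i * r * sigma_g over cycles, divided by elapsed years.

    Simplification: intensity and sigma_g are held constant across cycles.
    """
    return sum(intensity * r * sigma_g for r in accuracies) / years


# Hypothetical inputs for illustration only
genomic = gain_per_year([0.58, 0.50, 0.43], intensity=1.0, sigma_g=1.0, years=1.0)  # 3 cycles, declining r
phenotypic = gain_per_year([0.80], intensity=1.0, sigma_g=1.0, years=1.0)           # 1 cycle, r = h
print(genomic > phenotypic)  # prints True
```

Under these assumptions, shorter cycles can outweigh a lower and declining accuracy per cycle. The actual advantage depends on the real cycle lengths and on how quickly accuracy decays.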