Introduction

Genomic prediction (GP; see Supplementary Table 1 for a list of abbreviations) was pioneered by Meuwissen et al. (2001) for estimating breeding values in cattle breeding. It has proven to be a powerful tool for improving the efficiency of selection in animal and plant breeding (García-Ruiz et al. 2016; Crossa et al. 2017). GP consists of training a statistical model by using phenotypic and genomic information from genotypes of a so-called training set (TS) to predict the breeding value of candidates with only genomic information, the prediction set (PS). Prediction accuracy (\(\rho\)), defined as the correlation between the genomic estimated breeding values (GEBVs) and true breeding values (TBVs), represents a key parameter when comparing the expected response from phenotypic and genomic selection (Isidro et al. 2015).

According to theoretical results, \(\rho\) depends on the trait heritability (\(h^{2}\)), the TS size (\(N_{\text{TS}}\)), and the effective number of chromosome segments (Daetwyler et al. 2008; 2010). Additionally, information from close relatives in the TS has a strong positive influence on the accuracy of GEBVs (Habier et al. 2007; Clark et al. 2012). In animal breeding of major breeds such as Holstein–Friesian cattle (Olson et al. 2012), TS sizes are usually large (\(N_{\text{TS}}\) > 1000), leading to high \(\rho\) and a substantial increase in the selection gain and efficiency of breeding programs employing GEBVs (Meuwissen et al. 2016; Hickey et al. 2017).

In plant breeding for line and hybrid cultivars, TS sizes are generally small because production and phenotyping of the candidates are usually laborious and expensive (Riedelsheimer and Melchinger 2013; Akdemir and Isidro-Sánchez 2019). Moreover, in contrast to the large panmictic populations handled in animal breeding, which include thousands of genotypes in the TS, the germplasm in plant breeding is composed of numerous small biparental families (BPFs), because recycling the best lines from the previous cycle(s) is the predominant method for developing new lines (Mikel and Dudley 2006). Despite these less favorable conditions, GP promises a quantum leap in selection gain due to higher selection intensity and reduced breeding cycle length, provided \(\rho\) is sufficiently large compared with \(h^{2}\) (Bernardo and Yu 2007; Heffner et al. 2009).

With multiple BPFs, there are several options for composing the TS to be used in GP for selecting genotypes within each BPF. In the simplest case, the genotypes in the TS and PS originate from the same BPF, i.e., they are full-sibs. This is expected to yield highest \(\rho\) for a given family size (Lehermeier et al. 2014; Crossa et al. 2017), because Mendelian sampling can be fully accounted for through co-segregation between quantitative trait loci (QTL) and genome-wide markers (Habier et al. 2007; Schopp et al. 2017b). Numerous experimental studies in various crops (e.g., Riedelsheimer et al. 2013; Lehermeier et al. 2014; Liu et al. 2015; Würschum et al. 2017) and simulations (Schopp et al. 2017b) found high \(\rho\) for GP within BPFs, albeit with considerable variation among BPFs. However, even in this most favorable case, the size of the TS must still be sufficiently large (\(N_{\text{TS}}\) ≥ 50) to reach an adequate \(\rho\) (Marulanda et al. 2015; Schopp et al. 2017b), which poses severe restrictions for GP of multiple BPFs under a limited budget.

A second option is to include information from a different BPF for composing the TS, adopting the idea from animal breeding to borrow information from major breeds for GP of minor breeds (Iheshiulor et al. 2016). For a given family \(F\) targeted for GP, candidates of which represent the PS, the TS may comprise related or unrelated BPFs having one or no parent in common with \(F\). As expected under weaker pedigree relationships (Schopp et al. 2017b), experimental studies demonstrated a substantial decrease in \(\rho\) from full-sib to half-sib and unrelated families for given \(N_{\text{TS}}\) including negative estimates in the latter case (Riedelsheimer et al. 2013). Thus, \(N_{\text{TS}}\) must be large when constructing the TS with a half-sib or an unrelated family, but \(\rho\) increases at a much smaller rate beyond \(N_{\text{TS}}\) > 100 and approaches much lower asymptotic values than for full-sib families (Lehermeier et al. 2014). Moreover, simulations revealed a large variation in \(\rho\) among BPFs used as TS, which increased from full-sib to half-sib and unrelated families (Schopp et al. 2017b).

A third alternative is merging several BPFs in a combined TS in order to increase \(N_{\text{TS}}\) beyond what is possible if only a single BPF is used. In an extensive study with testcrosses of maize, Lehermeier et al. (2014) obtained for several traits the same \(\rho\) when the TS was composed of 10 half-sib families instead of the full-sib family, because \(N_{\text{TS}}\) was fivefold increased. However, Riedelsheimer et al. (2013) reported negative \(\rho\) values between pairs of unrelated BPFs and Schopp et al. (2017b) found in simulations a large variation in \(\rho\) when individual half-sib or unrelated families were used as TS. This raises the question whether uncritical pooling of information from all available BPFs into a joint TS might lead to lower \(\rho\) than using a selected subset of all available families.

To avoid possible negative effects of incorporating in the TS any BPF having poor \(\rho\) with the prediction set family \(F\), identifying such families by simple criteria would be beneficial. Schopp et al. (2017b) found that linkage phase similarity (LPS) has a relatively small effect on \(\rho\) compared to the proportion of shared segregating markers between TS and PS (\(\theta\)). Further criteria that come into consideration are the genomic relationship between the genotypes in the TS and PS or forecasts of \(\rho\) described by Wientjes et al. (2015). The most promising BPFs identified by these criteria could be merged in the TS for approaching the optimum TS composition. To the best of our knowledge, whether any of these criteria are useful for reaching this goal has not yet been investigated.

Our main objective was to find, for a given PS family \(F\) and a given set \(A\) of all families, including half-sib and unrelated BPFs as well as full-sibs of \(F\), the best possible subset \(O^{F}\) of \(A\) that yields the highest \(\rho\). In particular, we investigated four questions: (1) If the BPFs are ranked in descending order according to their \(\rho\) with \(F\), how does sequentially adding the BPFs, from highest to lowest rank, to the TS affect \(\rho\)? (2) How big is the difference in \(\rho\) between set \(O^{F}\) and set \(A\)? (3) Since true values of \(\rho\) are needed for finding set \(O^{F}\), which are not available in practice, can we find a subset \(E^{F}\) approaching \(\rho\) for set \(O^{F}\) based on empirical criteria between \(F\) and each BPF in \(A\), such as estimates of \(\rho\), the average simple matching coefficient (SM), the linkage phase similarity or the forecasted prediction accuracy \(\rho^{W}\)? (4) What are the implications of our findings regarding the design of breeding programs applying GP? To answer these questions, we (a) used sets of BPFs simulated from real genomic data of elite germplasm as well as genetic resources and (b) re-analyzed data sets of two experiments from the literature.

Materials and methods

Genotypic data

Genotypic data obtained with the Illumina MaizeSNP50 BeadChip for two populations of maize were taken from two publications (Melchinger et al. 2017; Schrag et al. 2018) and served as starting point for our simulations, following the study of Schopp et al. (2017b). As detailed by these authors, the ancestral population Elite consisted of 149 elite flint lines from the maize breeding program of the University of Hohenheim and displayed mainly long-range linkage disequilibrium (LD). The ancestral population Landrace consisted of 59 doubled haploid (DH) lines derived from the German maize landrace “Gelber Badischer” and displayed mainly short-range LD. The original set of 29,729 polymorphic SNPs was reduced to 10,769 SNPs after removing loci with a minor allele frequency less than 0.05 in either germplasm.

Simulation of families and traits

In each simulation run, a set of BPFs was simulated by selecting at random a number of parents (\(n_{P}\)) that were drawn from the respective ancestral population (either Elite or Landrace) and crossed in silico in a half-diallel mating design, yielding \(n_{P} \left( {n_{P} - 1} \right)/2\) crosses (Supplementary Fig. 1). One of these crosses was randomly chosen as PS family \(F\). From the \(2\left( {n_{P} - 2} \right)\) crosses sharing one common parent with \(F\), set \(H\) composed of \(n_{H}\) half-sib families was developed. Likewise, from the remaining crosses having no parent in common with \(F\), set \(U\) composed of \(n_{U}\) unrelated families was developed. From each cross, \(n_{g}\) DH lines were generated in silico using the R-package Meiosis (Müller and Broman 2017), which simulated meioses in the F1 generation and subsequently duplicated the genome of the resulting gametes. For family \(F\), we had two disjoint subfamilies of DH lines: (1) subfamily \(F^{\text{TS}}\) comprising \(n_{g}\) DH lines that were included in the TS either alone or in combination with other families from \(H\) and/or \(U\); (2) subfamily \(F^{\text{PS}}\) comprising \(n_{g}^{\text{PS}}\) = 1000 DH lines that served as independent PS for precise estimation of \(\rho\) within \(F\). Set \(A = F^{\text{TS}} \cup H \cup U\) with \(n_{A} = 1 + n_{H} + n_{U}\) families, each having \(n_{g}\) genotyped and phenotyped DH lines, represented the maximum TS available for GP of set \(F^{\text{PS}}\); its composition is specified for each scenario described below.
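The counts of crosses in the half-diallel can be checked with a short sketch. This is only an illustration in Python, not the R code used for the simulations; the function name `classify_crosses` and the integer labelling of parents are assumptions made here.

```python
from itertools import combinations

def classify_crosses(n_parents, target):
    """Split the crosses of a half-diallel by their relationship to family F.

    `target` is the pair of parent indices that defines the PS family F.
    Returns (half_sib, unrelated) as lists of parent pairs.
    """
    target = tuple(sorted(target))
    # All n_P (n_P - 1) / 2 crosses among the parents
    crosses = list(combinations(range(n_parents), 2))
    half_sib, unrelated = [], []
    for cross in crosses:
        if cross == target:
            continue  # this cross is F itself
        shared = len(set(cross) & set(target))
        if shared == 1:
            half_sib.append(cross)   # one parent in common with F
        else:
            unrelated.append(cross)  # no parent in common with F
    return half_sib, unrelated

half_sib, unrelated = classify_crosses(8, (0, 1))
print(len(half_sib), len(unrelated))
```

With \(n_{P}\) = 8 this gives 12 candidate half-sib crosses and 15 unrelated crosses, which together with \(F\) account for all 28 crosses of the half-diallel.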

Next, we randomly sampled in each simulation run two disjoint sets of \(n_{\text{SNP}}\) and \(n_{\text{QTL}}\) loci from the 10,769 SNPs, which served as markers and QTL, respectively. For each QTL \(j\), an allele substitution effect \(\alpha_{j}\) was simulated as \(\alpha_{j} = \left( {2\beta - 1} \right)\gamma\). Here, \(\gamma\) was sampled from a Gamma distribution \(\Gamma \left( {1.66, 0.4} \right)\) (Meuwissen et al. 2001) and \(\beta\) was drawn from a Bernoulli distribution with probability 0.5. We assumed an additive-genetic model underlying the allele substitution effects, which is expected to hold true in the absence of epistasis for per se performance of pure-breeding lines as well as for testcross performance of lines in hybrid breeding, as described by Melchinger et al. (1998).
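As a minimal sketch of the sampling of allele substitution effects, the following Python snippet draws \(\alpha_{j} = (2\beta - 1)\gamma\) for a set of QTL. It is not the authors' R implementation. In particular, reading \(\Gamma(1.66, 0.4)\) as shape 1.66 and scale 0.4 is an assumption, because the text does not state which parameter is which; the function name and the `seed` argument are also hypothetical.

```python
import random

def simulate_qtl_effects(n_qtl, shape=1.66, scale=0.4, seed=None):
    """Draw allele substitution effects alpha_j = (2*beta - 1) * gamma.

    ASSUMPTION: (1.66, 0.4) is read as (shape, scale); swap the defaults
    if the opposite parameterization is intended.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_qtl):
        gamma = rng.gammavariate(shape, scale)   # effect magnitude, positive
        beta = 1 if rng.random() < 0.5 else 0    # Bernoulli(0.5) sign indicator
        effects.append((2 * beta - 1) * gamma)   # sign is -1 or +1
    return effects

alpha = simulate_qtl_effects(1000, seed=1)
print(len(alpha))
```

The Bernoulli draw only decides the sign, so positive and negative effects are equally likely and the magnitude distribution is left unchanged.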

The true breeding value (TBV) for each DH line was computed by summation of the allele substitution effects across the whole genome: \(\mathop \sum \limits_{j} q_{j} \alpha_{j}\), where \(q_{j} \in \left\{ {0, 2} \right\}\) is the genotypic score at QTL \(j\) observed for the respective DH line and \(\alpha_{j}\) the allele substitution effect. Phenotypes were simulated by adding a noise variable \(e \sim N\left( {0,\sigma_{e}^{2} } \right)\) to the TBV. For achieving a desired heritability (\(h^{2}\)) across all BPFs, the variance \(\sigma_{e}^{2}\) was calculated as numerical solution of the following equation:

$$h^{2} = \frac{1}{{n_{A} }} \mathop \sum \limits_{v \in A} \left\{ {h_{v}^{2} } \right\} = \frac{1}{{n_{A} }}\mathop \sum \limits_{v \in A} \left\{ {\frac{{\sigma_{{g_{v} }}^{2} }}{{\sigma_{{g_{v} }}^{2} + \sigma_{e}^{2} }}} \right\},$$
(1)

where \(\sigma_{{g_{v} }}^{2}\) and \(h_{v}^{2}\) refer to the variance of TBVs and heritability in family \(v\), respectively.
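One way to solve Eq. 1 numerically is bisection, because the mean heritability decreases monotonically as \(\sigma_{e}^{2}\) grows. The sketch below is an illustration under assumptions: the source does not say which solver was used, the function names are hypothetical, the use of the population variance for \(\sigma_{{g_{v} }}^{2}\) is a choice made here, and the toy families and effects are invented for the example.

```python
from statistics import pvariance

def true_breeding_value(qtl_scores, effects):
    """TBV of one DH line: sum over QTL of q_j * alpha_j, with q_j in {0, 2}."""
    return sum(q * a for q, a in zip(qtl_scores, effects))

def mean_heritability(genetic_variances, error_variance):
    """Right-hand side of Eq. 1: average of sigma_g^2 / (sigma_g^2 + sigma_e^2)."""
    return sum(vg / (vg + error_variance) for vg in genetic_variances) / len(genetic_variances)

def solve_error_variance(genetic_variances, target_h2, tol=1e-10):
    """Find sigma_e^2 such that the mean family heritability equals target_h2.

    Assumes 0 < target_h2 < 1 and positive genetic variances in all families.
    """
    lo, hi = 0.0, 1.0
    while mean_heritability(genetic_variances, hi) > target_h2:
        hi *= 2.0  # enlarge the bracket until heritability drops to the target
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if mean_heritability(genetic_variances, mid) > target_h2:
            lo = mid  # error variance still too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data (hypothetical): two families, four DH lines each, three QTL
effects = [0.5, -0.2, 0.1]
families = [
    [[0, 0, 2], [2, 0, 0], [2, 2, 2], [0, 2, 0]],
    [[2, 2, 0], [0, 0, 0], [2, 0, 2], [0, 2, 2]],
]
genetic_variances = [
    pvariance([true_breeding_value(line, effects) for line in fam]) for fam in families
]
sigma_e2 = solve_error_variance(genetic_variances, target_h2=0.6)
print(sigma_e2, mean_heritability(genetic_variances, sigma_e2))
```

With a single family the equation has the closed-form solution \(\sigma_{e}^{2} = \sigma_{g}^{2}(1 - h^{2})/h^{2}\), which is a convenient check on the solver.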

This procedure was repeated in each of 200 simulation runs, from which we calculated the mean and standard deviation (SD) of \(\rho\). The parameter values underlying our simulations are described in Supplementary Table 2. Default values were ancestral population Elite, \(n_{P}\) = 8, \(n_{A}\) = 28, \(n_{H}\) = 12, \(n_{U}\) = 15, \(n_{g}\) = 50, \(n_{\text{SNP}}\) = 5000, \(n_{\text{QTL}}\) = 1000 and \(h^{2}\) = 0.6, which were subsequently modified depending on the scenarios described below.

Genomic prediction

We used the standard genomic best linear unbiased prediction model (Habier et al. 2007; VanRaden 2008):

$$\varvec{y}_{{\varvec{TS}}} = {\varvec{\upmu}} + \varvec{Z\alpha } +\varvec{\varepsilon}$$
(2)

where \(\varvec{y}_{{\varvec{TS}}}\) is the vector of phenotypes of all DH lines in the TS, \({\varvec{\upmu}}\) is the general intercept, \(\varvec{\alpha}\) is the vector of breeding values, \(\varvec{Z}\) is an incidence matrix associating phenotypes with breeding values and \(\varvec{\varepsilon}\) is a vector of residuals. Standard assumptions were \(\varvec{\alpha } \sim N\left( {0,{\mathbf{G}}\sigma_{\alpha }^{2} } \right)\) and \(\varvec{\varepsilon } \sim N\left( {0,{\mathbf{I}}\sigma_{\varepsilon }^{2} } \right)\), where \(\sigma_{\alpha }^{2}\) is the additive-genetic variance and \(\sigma_{\varepsilon }^{2}\) the residual error variance.

The genomic relationship matrix \({\mathbf{G}}\) based on the \(n_{\text{SNP}}\) SNP markers was calculated by extending the approach of Schopp et al. (2017b) to multiple BPFs. Let \(x_{vi,l} \in \left\{ {0,2} \right\}\) be the genotypic score for the genotype at locus \(l\) of DH line \(i\) from family \(v\). The genomic relationship \(G_{i,j}\) between two DH lines \(i\) and \(j\) from families \(v\) and \(w\), respectively, was then calculated as:

$$G_{i,j} = \frac{{\mathop \sum \nolimits_{l} \left( {x_{vi,l} - 2p_{v,l} } \right)\left( {x_{wj,l} - 2p_{w,l} } \right)}}{{\sqrt {2\mathop \sum \nolimits_{l} p_{v,l} \left( {1 - p_{v,l} } \right)} \sqrt {2\mathop \sum \nolimits_{l} p_{w,l} \left( {1 - p_{w,l} } \right)} }}$$
(3)

where \(p_{v,l} ,p_{w,l} \in \left\{ {0, 0.5, 1} \right\}\) refer to the allele frequencies at locus \(l\) in family \(v\) and \(w\), respectively. For \(v = w\), the formula simplifies to Method 1 of VanRaden (2008). Estimates of \(\sigma_{\alpha }^{2}\) and \(\sigma_{\varepsilon }^{2}\) in the TS as well as GEBVs of the PS were calculated with the mixed.solve function of the R-package rrBLUP (Endelman 2011). For each PS family \(F\) and each TS composed of the families in set \(T\), we computed the Pearson correlation between the TBV and GEBV for the \(n_{g}^{\text{PS}}\) = 1000 DH lines from \(F^{\text{PS}}\). Given the large sample size of the PS, we regarded these values as the true prediction accuracy \(\rho \left( { T,F^{\text{PS}} } \right)\) for GP of \(F^{\text{PS}}\) using TS \(T\).
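The family-specific scaling of the genomic relationship can be sketched as follows. This is an illustration in Python, not the authors' code. It assumes that the 0/2 genotype scores are centered on their family expectation \(2p\), which is the centering under which the \(v = w\) case coincides with Method 1 of VanRaden (2008), and that the family allele frequencies are derived from the two homozygous parents. The function names are hypothetical.

```python
from math import sqrt

def family_allele_freq(parent_a, parent_b):
    """Expected allele frequency per locus in a biparental DH family.

    Parents are homozygous lines coded 0/2, so p is 0, 0.5 or 1.
    """
    return [(a + b) / 4 for a, b in zip(parent_a, parent_b)]

def genomic_relationship(lines_v, p_v, lines_w, p_w):
    """Relationship matrix between the lines of family v (rows) and w (columns).

    ASSUMPTION: scores are centered by 2p within each family.
    Each family must segregate at one locus or more, otherwise the
    scaling factor is zero.
    """
    scale_v = sqrt(2 * sum(p * (1 - p) for p in p_v))
    scale_w = sqrt(2 * sum(p * (1 - p) for p in p_w))
    return [
        [
            sum((x - 2 * pv) * (y - 2 * pw)
                for x, pv, y, pw in zip(line_i, p_v, line_j, p_w))
            / (scale_v * scale_w)
            for line_j in lines_w
        ]
        for line_i in lines_v
    ]

# Toy family: parents differ at loci 1 and 2 and are identical at locus 3
p = family_allele_freq([0, 2, 2], [2, 0, 2])      # [0.5, 0.5, 1.0]
lines = [[0, 2, 2], [2, 0, 2]]                    # two DH lines of this family
print(genomic_relationship(lines, p, lines, p))
```

Loci that are monomorphic within a family contribute nothing to either the numerator or the scaling factor of that family, so only segregating markers carry information on within-family relationships.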

Potential for optimizing the training set composition

Investigating the potential for optimizing the TS composition involved three steps. First, we calculated \(\rho \left( {A,F^{\text{PS}} } \right)\) as benchmark, because set \(A\) is often uncritically used as TS. Second, we searched among all possible subsets \(T \subset A\) for the optimum TS composition, denoted as set \(O^{F}\), which maximizes \(\rho \left( { T,F^{\text{PS}} } \right)\). In principle, this task can be solved by complete enumeration, but since computation time grows exponentially with \(n_{A}\), this approach becomes excessive for large \(n_{A}\). To approach \(O^{F}\) in such a situation, we resorted to an algorithm for numerical optimization called binary particle swarm optimization (BPSO). We implemented a modification of this algorithm described by Khanesar et al. (2007) in a publicly available R-package BPSO (Müller 2019). For \(n_{A}\) = 10, we compared \(\rho \left( {O^{F} ,F^{\text{PS}} } \right)\) for \(O^{F}\) identified with this package and complete enumeration and found close agreement between both methods (data not shown).
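Complete enumeration can be sketched in a few lines. The Python snippet below is an illustration only; it does not reproduce the BPSO package. The callable `accuracy` stands in for fitting the prediction model with a given TS and returning \(\rho\), and the toy scoring function and family labels are hypothetical.

```python
from itertools import combinations

def best_subset(families, accuracy):
    """Exhaustively search all non-empty subsets of `families`.

    `accuracy` is a caller-supplied function that maps a tuple of
    families (a candidate TS) to its prediction accuracy.
    """
    best, best_rho = None, float("-inf")
    for size in range(1, len(families) + 1):
        for subset in combinations(families, size):
            rho = accuracy(subset)
            if rho > best_rho:
                best, best_rho = subset, rho
    return best, best_rho

# Toy stand-in for rho(T, F_PS): additive, made-up contributions per family
contribution = {"F_TS": 0.5, "H_1": 0.2, "U_1": -0.1}

def toy_accuracy(subset):
    return sum(contribution[name] for name in subset)

print(best_subset(list(contribution), toy_accuracy))
```

The number of candidate sets is \(2^{n_{A}} - 1\), which is 1023 for \(n_{A}\) = 10 but 268,435,455 for the default \(n_{A}\) = 28, and each candidate requires a full model fit.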

Empirical optimization of the training set composition

Since the TBVs were known in our simulation study, we were able to calculate \(\rho\) for \(F^{\text{PS}}\). In real data sets, however, TBVs are unknown and only phenotypic data of the lines in \(F^{\text{TS}}\) are available. As an empirical method for approaching \(O^{F}\) by a set \(E^{F}\), we used the phenotypic values of DH lines available from \(F^{\text{TS}}\) to obtain a subset \(T \subset A\) yielding the highest estimate of \(\rho\). The procedure for identifying \(E^{F}\) was almost identical to the one described above for identifying \(O^{F}\), with two exceptions. First, we temporarily restricted our search to all possible subsets \(T \subset H \cup U\), using either complete enumeration or the BPSO algorithm. At this stage, subfamily \(F^{\text{TS}}\) was excluded from the search in \(A\), because including full-sibs in the TS raises \(\rho\) to a high level and, thus, would reduce the differences in \(\rho \left( {T,F^{\text{TS}} } \right)\) between alternative TS compositions \(T \subset A\). Second, we used phenotypic values of the lines in \(F^{\text{TS}}\) as PS. Strictly, this allows only estimation of the predictive ability \(r_{a}\) (i.e., the correlation of phenotypic values with the GEBVs in the DH lines of \(F^{\text{TS}}\)), which, divided by \(\sqrt {h^{2} }\) of \(F^{\text{TS}}\), yields an estimate of the prediction accuracy (Dekkers 2007). However, this rescaling is unnecessary because it does not alter the ranking of different TS compositions. Set \(E^{F}\) was obtained by combining \(F^{\text{TS}}\) with the best TS composition identified in the previous step, because full-sibs contribute most information to GP (Schopp et al. 2017b). Finally, we computed \(\rho \left( {E^{F} ,F^{\text{PS}} } \right)\).
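The reason why the ranking is unaffected can be written out in one line. This is a short worked step using the symbols defined above; \(\hat{\rho}\) is introduced here only as notation for the estimated prediction accuracy, and \(T_{1}\), \(T_{2}\) denote any two candidate TS compositions.

$$\hat{\rho }\left( {T,F^{\text{TS}} } \right) = \frac{{r_{a} \left( {T,F^{\text{TS}} } \right)}}{{\sqrt {h^{2} } }},\quad \sqrt {h^{2} } > 0\;\;{\text{identical for all}}\;T\;\; \Rightarrow \;\;\left[ {r_{a} \left( {T_{1} ,F^{\text{TS}} } \right) > r_{a} \left( {T_{2} ,F^{\text{TS}} } \right) \Leftrightarrow \hat{\rho }\left( {T_{1} ,F^{\text{TS}} } \right) > \hat{\rho }\left( {T_{2} ,F^{\text{TS}} } \right)} \right]$$

Because \(h^{2}\) refers to the validation family \(F^{\text{TS}}\) and not to the TS under evaluation, the divisor is the same positive constant for every candidate \(T\).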

Training set composition scenarios

We analyzed the influence of different TS compositions on \(\rho\) in three scenarios. In Scenario 1 detailed in Supplementary Fig. 2, we investigated the possibility that predictively poor families can reduce \(\rho \left( {T,F^{\text{PS}} } \right)\) when they are included in a combined TS \(T \subset A\). We simulated a half-diallel with \(n_{P} = 12\) parents and selected at random one of the BPFs produced from these crosses as family \(F\), kept the \(n_{H}\) = 20 half-sib families of \(F\) as set \(H\) and sampled from the remaining families \(n_{U}\) = 20 for set \(U\), but used default values for all other factors. We ordered the families \(H_{i}\) within set \(H = \{ H_{i} |i = 1, \ldots ,20\}\) according to decreasing values of \(\rho \left( {H_{i} ,F^{\text{PS}} } \right)\). Likewise, we ordered the families \(U_{i}\) within set \(U = \{ U_{i} |i = 1, \ldots ,20\}\) according to decreasing values of \(\rho \left( {U_{i} ,F^{\text{PS}} } \right)\). For both \(H\) and \(U\), starting with family \(i = 1\) we built combined TS of increasing size \(N_{\text{TS}}\) by sequentially including individual families in the order of their rank \(i\), denoted as \(\cup H_{i}\) and \(\cup U_{i}\). The same procedure was applied using other criteria for ranking the BPFs: (1) the genome-wide linkage phase similarity (LPS) calculated after de Roos et al. (2008), (2) the forecasted pairwise prediction accuracy \(\rho^{W}\) obtained from deterministic equations described by Wientjes et al. (2015), (3) the average simple matching coefficient (SM) (Sneath and Sokal 1973) between the parents of BPF \(H_{i}\) or \(U_{i}\) with family \(F\), and (4) the proportion of polymorphic markers \(\theta\) shared between \(F\) and \(H_{i}\) or \(U_{i}\) (Schopp et al. 2017b).
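The construction of the sequentially merged training sets in Scenario 1 can be sketched as follows. This is a Python illustration with hypothetical family labels and made-up criterion values; any of the ranking criteria could be supplied in place of the pairwise \(\rho\).

```python
def sequential_training_sets(criterion):
    """Build nested training sets by adding families in rank order.

    `criterion` maps each family label to its ranking value (for example
    the pairwise rho with F_PS); higher values are ranked first.
    Returns a list whose k-th element is the union of the k best families.
    """
    ranked = sorted(criterion, key=criterion.get, reverse=True)
    return [ranked[: k + 1] for k in range(len(ranked))]

# Made-up pairwise accuracies for three half-sib families
pairwise_rho = {"H_a": 0.10, "H_b": 0.60, "H_c": -0.05}
n_g = 50  # lines per family

for ts in sequential_training_sets(pairwise_rho):
    print(ts, "N_TS =", n_g * len(ts))
```

Each successive TS contains the previous one, so \(N_{\text{TS}}\) grows by \(n_{g}\) with every added family and any drop in \(\rho\) can be attributed to the family added last.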

In Scenario 2 detailed in Supplementary Fig. 3, we investigated different TS compositions. We compared \(\rho \left( {T,F^{\text{PS}} } \right)\) for TS compositions \(T = F^{\text{TS}}\), \(H\), \(U\), \(F^{\text{TS}} \cup H\), \(F^{\text{TS}} \cup U\), \(H \cup U\), \(A\), \(O^{F}\) and \(E^{F}\). Besides the default values, we also investigated under ceteris paribus conditions how changes in \(h^{2}\), \(n_{\text{SNP}}\), \(n_{\text{QTL}}\), \(n_{H}\) or \(n_{U}\) affect \(\rho\) (Supplementary Table 2).

In Scenario 3 detailed in Supplementary Fig. 4, we investigated how different choices of \(n_{H}\) influence \(\rho\) for set \(F^{\text{TS}} \cup H\), because preliminary results showed that this TS composition yields high and robust estimates of \(\rho\). We used \(n_{H}\) = 0, 2, 4, 12, 20, and varied \(n_{g}\) = 25, 50, 100 as well as \(h^{2}\) = 0.3, 0.6, 0.9. For all other factors, we used default values (Supplementary Table 2).

Experimental data

We used two data sets of multiple BPFs of maize available from the literature. The first was the set of \(n_{H}\) = 10 half-sib families from Lehermeier et al. (2014), in which 10 dent founder lines (B73, D06, D09, EC169, F252, F618, Mo17, UH250, UH304, and W117) were crossed to the central dent line F353. Population size per family ranged between 53 and 104 DH lines, which were evaluated as testcrosses using a tester from the opposite heterotic pool. We chose three traits to cover different genetic architecture and \(h^{2}\): dry matter yield, plant height, and female flowering. A total of 34,116 polymorphic SNPs were available across all BPFs for GP.

The second data set was composed of 10 BPFs from tropical maize germplasm published by Zhang et al. (2015) with different pedigree relationships, including both half-sib and unrelated families. Population sizes ranged between 172 and 191 F2:3 lines, which were evaluated in tropical environments as testcrosses using a tester from the opposite heterotic pool. We used three traits: grain yield, male flowering and plant height recorded under well-watered conditions. The lines for each BPF were genotyped using genotyping by sequencing. After quality check, 3866 SNPs were available across all families.

In each data set, we performed a cross-validation scheme, where \(n_{g}\) = 50 genotypes were sampled from each BPF to construct \(F^{\text{TS}}\), \(H\) and \(U\). The remaining genotypes from the PS family not sampled for \(F^{\text{TS}}\) were used as prediction set \(F^{\text{PS}}\). Prediction accuracy \(\rho\) was calculated as the predictive ability \(r_{a}\) divided by \(\sqrt {h^{2} }\) of the trait estimated from all lines in \(F\) (Dekkers 2007).
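One repetition of this sampling scheme might look like the sketch below. It is a Python illustration under assumptions: the function name, the dictionary layout and the family sizes in the example are hypothetical, and the model fitting and accuracy calculation are left out.

```python
import random

def split_families(families, ps_family, n_g=50, seed=None):
    """Draw one cross-validation split.

    `families` maps a family label to the list of its line IDs.
    Returns (training, prediction): `training` holds n_g sampled lines
    per family, `prediction` holds the lines of `ps_family` that were
    not sampled and therefore form F_PS.
    """
    rng = random.Random(seed)
    training = {name: rng.sample(lines, n_g) for name, lines in families.items()}
    sampled = set(training[ps_family])
    prediction = [line for line in families[ps_family] if line not in sampled]
    return training, prediction

# Hypothetical families with 60 and 80 lines
families = {
    "BPF1": [f"BPF1_{i}" for i in range(60)],
    "BPF2": [f"BPF2_{i}" for i in range(80)],
}
training, prediction = split_families(families, "BPF1", n_g=50, seed=7)
print(len(training["BPF1"]), len(training["BPF2"]), len(prediction))
```

The size of \(F^{\text{PS}}\) equals the family size minus \(n_{g}\), so small families leave only few lines for validation, which contributes to the sampling variance of the estimates.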

Referring to Scenario 1, we also investigated with experimental data whether predictively poor families can reduce \(\rho\) for \(F^{\text{PS}}\) when they are sequentially included in a combined TS. Using each family once as PS, we computed \(\rho\) of \(F^{\text{PS}}\) with every other BPF and ranked these estimates accordingly. As described above, we built combined TS of increasing size \(N_{\text{TS}}\) by sequentially including individual BPFs in the order of their rank. This procedure was applied to the half-sib panel from Lehermeier et al. (2014) as well as the group of unrelated families from Zhang et al. (2015). In the latter study, using the numbering of BPFs employed by these authors, only BPFs F9 and F10 served as PS family, because they had no parent in common with any other BPF. Referring to Scenario 2, we investigated the optimization of the TS composition for BPFs F1 to F8 from Zhang et al. (2015). Using each BPF once as PS, we analyzed \(\rho\) using different TS compositions. For both scenarios, we calculated the mean and SD of \(\rho\) across 50 repetitions of the cross-validation scheme and every family treated once as PS.

Data availability statement

Simulations and data analysis were carried out within the R environment (R core 2019). All functions used in our simulations, including the data of both ancestral populations, can be found within the R-package “HOT”, available at: https://gitlab.com/HOT.

Results

Genomic prediction with pairs and sequential union of families in the training set (Scenario 1)

All subsequent simulation results refer to means over the 200 simulation runs. Using full-sib family \(F^{\text{TS}}\) as TS yielded \(\rho \left( {F^{\text{TS}} ,F^{\text{PS}} } \right)\) = 0.71, whereas the pairwise prediction accuracies \(\rho \left( {H_{i} ,F^{\text{PS}} } \right)\) for the half-sib families \(H_{i} \in H\) and \(\rho \left( {U_{i} ,F^{\text{PS}} } \right)\) for the unrelated families \(U_{i} \in U\) averaged 0.37 and 0.20, respectively (Fig. 1). Estimates of \(\rho \left( {H_{i} ,F^{\text{PS}} } \right)\) varied from 0.68 to −0.06 and those of \(\rho \left( {U_{i} ,F^{\text{PS}} } \right)\) from 0.54 to −0.19. Combining the two half-sib families (i = 1, 2) with the highest pairwise \(\rho\) in the TS with \(N_{\text{TS}}\) = 100 yielded \(\rho \left( { \cup H_{i} ,F^{\text{PS}} } \right)\) = 0.72. Adding further half-sib families with rank i = 3 to 20 sequentially to the TS yielded a steady increase in \(\rho \left( { \cup H_{i} ,F^{\text{PS}} } \right)\) up to 0.84, after which the beneficial effect on \(\rho\) leveled off.

Fig. 1
Prediction accuracy (\(\rho\) ± SD) for genomic prediction of family \(F^{\text{PS}}\) with single half-sib families (\(H_{i} \in H\)) and unrelated families (\(U_{i} \in U\)) shown in red and composite training sets (TS) shown in blue. Families were ranked in descending order according to the magnitude of \(\rho\) with \(F^{\text{PS}}\), with ranks i shown on the x-axis, and sequentially added to the composite TS. Black horizontal dashed line: mean prediction accuracy obtained for the full-sib family \(F^{\text{TS}}\); red horizontal dashed line: mean pairwise prediction accuracies for all \(H_{i} \in H\) and \(U_{i} \in U\). Results were averaged across 200 simulation runs using default values: ancestral population Elite, \(n_{g}\) = 50, \(n_{\text{SNP}}\) = 5000, \(n_{\text{QTL}}\) = 1000, \(h^{2}\) = 0.6

Combining unrelated families from set \(U\) in the order of their rank resulted in a concave curve (Fig. 1). It reached a maximum of \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\) = 0.59 with eight families in the TS, but decreased afterward to 0.53, even slightly below \(\rho \left( {U_{1} ,F^{\text{PS}} } \right)\) = 0.54 for the best unrelated family. While SD of \(\rho\) for the combined TS decreased as the number of sequentially added families from set \(H\) increased, SD increased when merging families with lower rank from set \(U\).

For the panel of 10 half-sib families from Lehermeier et al. (2014), \(\rho\) for \(F^{\text{TS}}\) averaged 0.52 for dry matter yield, 0.65 for plant height and 0.67 for female flowering, whereas \(\rho \left( {H_{i} ,F^{\text{PS}} } \right)\) averaged 0.19 for dry matter yield, 0.33 for plant height and 0.38 for female flowering (Fig. 2). Estimates of \(\rho \left( {H_{i} ,F^{\text{PS}} } \right)\) ranged from 0.62 to −0.34 for dry matter yield, 0.65 to −0.10 for plant height, and 0.72 to −0.05 for female flowering for rank i = 1 to 9. Sequentially adding families to the TS improved \(\rho \left( { \cup H_{i} ,F^{\text{PS}} } \right)\) for dry matter yield (0.68), plant height (0.69) and female flowering (0.75) up to rank 3, yielding values higher than predictions with \(F^{\text{TS}}\). However, adding further families from rank 4 to 9 to the TS resulted in \(\rho \left( { \cup H_{i} ,F^{\text{PS}} } \right)\) lower than \(\rho \left( {H_{1} ,F^{\text{PS}} } \right)\). In contrast to the simulations, the SD for \(\rho\) of the combined TS became larger with an increasing number of half-sib families.

Fig. 2
Prediction accuracy (\(\rho\) ± SD) for genomic prediction of family \(F^{\text{PS}}\) with single half-sib families (\(H_{i} \in H\)) and unrelated families (\(U_{i} \in U\)) shown in red and composite training sets (TS) shown in blue for experimental data taken from Lehermeier et al. (2014) and Zhang et al. (2015) for testcrosses of maize lines. Families were ranked in descending order according to the magnitude of \(\rho\) with \(F^{\text{PS}}\), with ranks i shown on the x-axis, and sequentially added to the composite TS. Results were averaged over 10 families for Lehermeier et al. (2014) and across two families (populations F9 and F10) unrelated with all other populations for Zhang et al. (2015). The black horizontal dashed line and the red horizontal dashed line have the same meaning as in Fig. 1

For the set of 10 unrelated families taken from Zhang et al. (2015), \(\rho\) for \(F^{\text{TS}}\) as TS averaged 0.26 for grain yield, 0.57 for plant height and 0.51 for male flowering (Fig. 2). Estimates of \(\rho \left( {U_{i} ,F^{\text{PS}} } \right)\) varied from 0.29 to − 0.27 for grain yield, from 0.20 to − 0.27 for plant height, and from 0.23 to − 0.28 for male flowering for rank i = 1 and 9, respectively, with a mean close to zero for all traits. In contrast to the simulations, \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\) immediately decreased after including the second ranked family. Only for male flowering did \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\) increase initially when combining the families with rank i = 1, 2. However, sequentially adding further families into the TS up to rank i = 4 hardly altered the prediction accuracy for grain yield and plant height. Adding further families had actually a negative effect so that when all nine unrelated families were combined, \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\) was either zero for grain yield and male flowering or even negative for plant height. For all traits, the SD of the combined TS increased with a larger number of families.

Genomic prediction with different compositions of the training set (Scenario 2)

Simulations with default values averaged \(\rho\) = 0.71, 0.81 and 0.50 for \(F^{\text{TS}}\), \(H\) and \(U\), respectively (Fig. 3). Using all families in sets \(F^{\text{TS}} \cup H\) or \(H \cup U\) resulted in a higher \(\rho\) of ~0.83, whereas \(F^{\text{TS}} \cup U\) yielded a lower \(\rho\) value than \(F^{\text{TS}}\) alone. Combining all available families \(A = F^{\text{TS}} \cup H \cup U\) yielded \(\rho\) = 0.84. In comparison with \(\rho \left( {A,F^{\text{PS}} } \right)\), \(\rho\) was 0.05 higher for \(O^{F}\) and 0.05 lower for \(E^{F}\). SD was highest for sets \(U\) and \(F^{\text{TS}} \cup U\) and lowest for set \(O^{F}\), with all other TS compositions showing intermediate SD values.

Fig. 3
Prediction accuracy (\(\rho\) + SD) in the prediction set family \(F^{\text{PS}}\) by genomic prediction with training sets (TS) composed of the full-sib family (\(F^{\text{TS}}\)), half-sib families (\(H_{i} \in H\)) and unrelated families (\(U_{i} \in U\)) or complete and partial combinations of them as well as the optimum TS composition (\(O^{F}\)) and empirically optimal TS composition (\(E^{F}\)). Results were averaged across 200 simulations using default values (ancestral population Elite, \(n_{P}\) = 8, \(n_{A}\) = 28, \(n_{H}\) = 12, \(n_{U}\) = 15, \(n_{g}\) = 50, \(n_{\text{SNP}}\) = 5000, \(n_{\text{QTL}}\) = 1000 and \(h^{2}\) = 0.6) or modifications in one of these parameters mentioned above each graph

For the set of BPFs from Zhang et al. (2015), the mean of \(\rho\) across all traits was 0.42, 0.34 and 0.07 when the TS comprised \(F^{\text{TS}}\) or all families in \(H\) or \(U\), respectively (Table 1). Constructing the TS with \(F^{\text{TS}} \cup H\) yielded highest \(\rho\) for plant height (0.52) and male flowering (0.46) and the second highest for grain yield (0.49). Combinations \(F^{\text{TS}} \cup U\) and \(H \cup U\) both had \(\rho\) lower than \(F^{\text{TS}}\) and \(H\) alone for plant height and male flowering, whereas for grain yield the values were higher. Combining all families (set \(A\)) yielded the highest \(\rho\) for grain yield (0.51) and third highest for plant height (0.46) and male flowering (0.41). Set \(E^{F}\) had higher \(\rho\) than \(A\) for plant height and male flowering, but for grain yield, the reverse was true.

Table 1 Prediction accuracy (\(\rho\)) in the prediction set (PS) family \(F^{\text{PS}}\) for genomic prediction from composite training sets (TS) using different families with \(n_{g }\) = 50 as well as different number of half-sib and unrelated families (\(n_{H}\) and \(n_{U}\), respectively)

Factors influencing the prediction accuracy for different compositions of the training set

Decreasing the number of half-sib families from default \(n_{H}\) = 12 to \(3\) and increasing the number of unrelated families from default \(n_{U}\) = 15 to 24 reduced \(\rho\) for all TS compositions including \(H\) by at least 19%, with the largest reduction from 0.81 to 0.50 for set \(H\) (Fig. 3). However, it hardly changed \(\rho\) for set \(U\) and combinations with \(U\), whereas the difference between set \(A\) and \(O^{F}\) increased from 0.05 to 0.08. Increasing \(h^{2}\), \(n_{\text{QTL}}\) or \(n_{\text{SNP}}\) improved \(\rho\) uniformly for all TS compositions but hardly changed their relative differences. One exception was \(F^{\text{TS}} \cup H\), which was after \(O^{F}\) the best performing TS composition under low \(n_{\text{SNP}}\). Another exception was \(E^{F }\), which outperformed all other TS compositions except \(O^{F}\) for low \(n_{\text{QTL}}\) and showed similar \(\rho\) as \(A\) for high \(h^{2}\). SD of \(\rho\) substantially increased with lower \(n_{\text{QTL}}\) but increased only slightly with higher \(n_{U}\) (and smaller \(n_{H}\) at the same time), lower \(h^{2}\) and smaller \(n_{\text{SNP}}\). When sampling the parents from ancestral population Landrace, the observed \(\rho\) were slightly lower than for ancestral population Elite, with the strongest reductions observed for \(U\) and combinations with \(U\) (Supplementary Fig. 5), but the relative differences in \(\rho\) were hardly affected.

When \(n_{H}\) increased from 0 to 20, \(\rho \left( {F^{\text{TS}} \cup H,F^{\text{PS}} } \right)\) increased from 0.43 to 0.72 for \(n_{g} = 25\) and \(h^{2}\) = 0.3, but only from 0.88 to 0.93 for \(n_{g}\) = 100 and \(h^{2}\) = 0.9 (Fig. 4). Thus, the gain in \(\rho\) obtained from adding half-sib families was smaller under large \(n_{g}\) and high \(h^{2}\). At intermediate values of \(n_{g}\) and \(h^{2} ,\) the differences between \(\rho\) for \(n_{H}\) = 0, 2 and 4 were at most 0.05.

Fig. 4

Prediction accuracy (\(\rho\) ± SD) in the prediction set family \(F^{\text{PS}}\) by genomic prediction with the full-sib family (\(F^{TS}\)) and combinations of \(F^{\text{TS }}\) with half-sib families from \(H\) in the training set \(F^{\text{TS}} \cup H\). Set \(H\) had \(n_{H} = 0, 2, 4, 12, 20\) families, with both parents of \(F\) being equally represented. Results were averaged across 200 simulation runs using default values (ancestral population Elite, \(n_{\text{SNP}}\) = 5000, \(n_{\text{QTL}}\) = 1000) as well as three different sizes of \(n_{g}\) and \(h^{2}\) mentioned above and to the side of graphs, respectively

Ranking families for their predictive value using other criteria

Calculating SM, \(\theta\), LPS and \(\rho^{W}\) between each BPF and \(F\) revealed substantial differences between the top and lowest ranking family for each of the four criteria, in both half-sib and unrelated families (Fig. 5). The largest range was found for \(\theta\) in set \(H\) (from 0.77 to 0.23) and LPS in set \(U\) (from 0.82 to 0.19) and the smallest for SM in set \(U\). The top ranking families identified with \(\theta\) and LPS had \(\rho \left( {H_{1} ,F^{\text{PS}} } \right)\) and \(\rho \left( {U_{1} ,F^{\text{PS}} } \right)\) values that were on average 34 and 51% lower, respectively, than when the ranking was based on the actual pairwise \(\rho\) values shown in Fig. 1. Sequentially combining families in the TS ranked by these criteria yielded monotonically increasing concave curves for \(\rho \left( { \cup H_{i} ,F^{\text{PS}} } \right)\) and \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\), with the biggest increases observed for SM and LPS.

Fig. 5

Prediction accuracy (\(\rho\) ± SD) for composite training sets (TS) shown in blue and estimates of simple matching coefficient (SM), linkage phase similarity (LPS), \(\theta\) and \(\rho^{W}\) (x ± SD) shown in black between single half-sib families (\(H_{i} \in H\)) or unrelated families (\(U_{i} \in U\)) with family \(F\). Estimates of LPS, \(\theta\) and \(\rho^{W}\) for the composite TS are shown as a black line. Families were ranked in descending order according to the magnitude of the criteria (SM, LPS, \(\theta , \rho^{W}\)) with \(F\), with ranks i shown on the x-axis, and sequentially added to the composite TS. Results were averaged across 200 simulation runs using default values (ancestral population Elite, \(n_{g}\) = 50, \(n_{\text{SNP}}\) = 5000, \(n_{\text{QTL}}\) = 1000, \(h^{2}\) = 0.6)

Discussion

Previous studies evaluated methods for constructing an optimum TS prior to recording phenotypic data in field trials using two optimization criteria based on genomic data alone: the coefficient of determination (Laloë 1993) or the prediction error variance (Rincent et al. 2012). These criteria can be used to select a subset of lines within a population that improves \(\rho\) compared to a random sample of genotypes (Akdemir et al. 2015; Bustos-Korts et al. 2016; Akdemir and Isidro-Sánchez 2019). In contrast, we evaluated how both genomic and preexisting phenotypic information from genotypes of multiple BPFs can be used to identify, for each BPF, the best TS composition in GP. This reflects the situation often faced in breeding programs, where numerous BPFs with different degrees of relatedness to the PS are produced and at least partly tested in field trials (Albrecht et al. 2011; Lian et al. 2014; Zhang et al. 2015).

Comparison of simulation and experimental results

We used both simulated and experimental data to analyze GP with multiple BPFs under Scenarios 1 and 2. Simulations can be rapidly conducted and, due to their flexibility, allow new research questions to be readily addressed. Thus, important factors influencing the prediction accuracy can be investigated and new insights gained into how to improve the development of genetic materials to best exploit the benefits of GP. Addressing such questions with experimental data is not feasible in practice due to the excessive expenditures for the large experiments that would be required. However, the conclusions drawn from simulations strongly depend on how well the underlying models reflect reality. For this reason, we complemented our simulations with a re-analysis of experimental data from two studies in the literature (Lehermeier et al. 2014; Zhang et al. 2015) to validate the simulations.

In general, prediction accuracies of the TS compositions investigated in Scenario 1 were higher in simulations than in experiments (Figs. 1, 2). Most strikingly, the decline in \(\rho \left( { \cup U_{i} ,F^{\text{PS}} } \right)\), when higher ranking unrelated families were sequentially added to the TS, occurred already for i ≥ 3 with experimental data and was much more severe than in the simulations. These discrepancies are partly attributable to differences in the parental inbreds used for producing the BPFs. The lines of the ancestral populations used for generating the BPFs in silico were genetically not very diverse because they originated from the same breeding program or a single landrace. In contrast, the dent founder lines used for producing the half-sib families in the study of Lehermeier et al. (2014) were genetically very distant, because they originated from different maize breeding programs for temperate maize in Europe and the US corn belt. Thus, values for \(\theta\) and LPS were lower, which contributes to a reduction in \(\rho\). Similar arguments apply to the study of Zhang et al. (2015), in which BPFs were derived from crosses among tropical elite lines from CIMMYT and evaluated in tropical environments.

The SNP data for our simulations were taken from a public breeding program to reflect allele frequencies and LD decay encountered in practice, following Daetwyler et al. (2013). However, SNPs with marker allele frequency < 0.05 were removed during the quality check of our data so that rare QTL alleles were not accounted for, which may be important in reality. Furthermore, simulations of the trait architecture assumed a purely additive-genetic model and borrowed distributions of QTL effects from the literature (Meuwissen et al. 2001). Ignoring epistasis is expected to reduce prediction accuracy (Jiang and Reif 2015; Martini et al. 2017) and may, together with the above-mentioned reasons, explain the differences between simulation and experimental results (Figs. 1, 2).

We considered parents from ancestral population Elite as unrelated although latent genetic relationships cannot be excluded. Hence, the prediction accuracies of the simulated half-sib and unrelated families might be biased upward compared to the experimental results because GP benefits from closer relationships between the TS and PS (Clark et al. 2012; Schopp et al. 2017b). However, this bias was most likely small because prediction accuracies obtained for ancestral population Landrace, which included only unrelated lines (Brauner et al. 2018), were only slightly lower than for ancestral population Elite (Fig. 1 and Supplementary Fig. 6).

Factors influencing the prediction accuracy under different training set compositions

Factors influencing \(\rho\) were evaluated under ceteris paribus conditions because their main effects are of primary interest and interactions among them are presumably small. In general, modulating the factor levels did not change the ranking of \(\rho\) for different TS compositions. Reducing the number of half-sib families from \(n_{H}\) = 12 to 3 in favor of an increase in the number of unrelated families resulted for the simulations in a more than 10% reduction of \(\rho\) for all TS compositions except \(U\), \(O^{F}\), and \(E^{F}\) (Fig. 3). This finding was confirmed by the experimental data (Table 1), where the families with \(n_{H}\) = 4 or 3 had generally higher \(\rho\) values than those with \(n_{H}\) = 2 or 1 for plant height and male flowering. This suggests that in the design of breeding programs, clear preference should be given to half-sib families over unrelated families, which might be totally excluded for compiling TS, given the potential negative effects on \(\rho\).

Increasing \(h^{2}\) had the most favorable effect on \(\rho\) for \(F^{\text{TS}}\) and \(E^{F}\), which benefited more from higher \(h^{2}\) values than set \(O^{F}\) (Fig. 3). As candidates are phenotyped with higher precision, predictive abilities \(r_{a} \left( {T,F^{\text{TS}} } \right)\) become more precise for every \(T \subset A\). Therefore, identification of the best TS composition \(E^{F }\) gets more reliable.

Increasing \(n_{\text{SNP}}\) was mainly beneficial when the TS included DH lines from set \(U\), either alone or in combination with other sets (Fig. 3). This is in harmony with results for GP in synthetics, where \(\rho\) is primarily driven by linkage between QTL and markers, which requires a high marker density and large \(N_{\text{TS}}\) (Schopp et al. 2017a). However, an increase in marker density beyond one marker per cM hardly pays off for increasing \(\rho\). Reducing \(n_{\text{QTL}}\) from 1000 to 100 or 20 slightly reduced the differences in \(\rho\) among TS compositions except for \(O^{F}\) and \(E^{F}\), which outperformed the other TS compositions. Thus, the genetic architecture of the trait is apparently of secondary importance for the strategy to find the best TS composition.

Causal analysis of prediction accuracy under different training set compositions

In addition to \(h^{2}\) and \(N_{\text{TS}}\), \(\rho\) between BPFs is associated with the LPS between the PS and TS (Fig. 5) (Riedelsheimer et al. 2013; Lehermeier et al. 2014). Moreover, Schopp et al. (2017b) found the parameter \(\theta\), which reflects the proportion of polymorphic QTL in the PS that are also polymorphic in the TS, to be of equal or even greater importance than LPS. This was our rationale for exploring these measures as well as SM and \(\rho^{W}\) for identifying families having a negative effect on \(\rho\) when merging multiple BPFs in the TS. However, none of these criteria could be effectively used to assemble an optimum TS, as was evident from the lower \(\rho\) for the BPF in rank one identified with the criteria (Fig. 5) compared to the pairwise \(\rho\) (Fig. 1).
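The definition of \(\theta\) given above can be made concrete with a small calculation. The following Python sketch is an illustration only, not the implementation used in the study: the function names are hypothetical, and it assumes that QTL genotypes of DH lines are available as rows of 0/2 codes (one column per QTL), which is realistic only in simulations where the QTL are known.

```python
def polymorphic_loci(genotypes):
    """Return the set of column indices at which the lines (rows) differ."""
    return {
        j
        for j, column in enumerate(zip(*genotypes))
        if len(set(column)) > 1
    }


def theta(ps_qtl, ts_qtl):
    """Proportion of QTL polymorphic in the PS that are also polymorphic in the TS.

    Assumption: both arguments are lists of equal-length genotype rows,
    with columns referring to the same QTL in the same order.
    """
    ps_poly = polymorphic_loci(ps_qtl)
    if not ps_poly:
        return float("nan")  # undefined when no QTL segregates in the PS
    ts_poly = polymorphic_loci(ts_qtl)
    return len(ps_poly & ts_poly) / len(ps_poly)


if __name__ == "__main__":
    # Toy data: two lines per family, four QTL.
    ps = [[0, 0, 2, 2],
          [2, 0, 0, 2]]   # QTL 0 and 2 segregate in the PS
    ts = [[0, 2, 2, 0],
          [2, 2, 2, 2]]   # QTL 0 and 3 segregate in the TS
    print(theta(ps, ts))  # one of the two PS-polymorphic QTL is shared
```

In this toy case only one of the two QTL segregating in the PS also segregates in the TS, so at most half of the PS polymorphisms can be captured by effects estimated in that TS.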

Obviously, \(\theta\) limits the proportion of the genetic variance in the PS, which can be explained by the QTL in the TS, and is more variable among half-sib families than among unrelated families (Fig. 5). In contrast, LPS is more variable among unrelated BPFs, because sampling the two parents of a BPF from an ancestral population generates extensive new “sample” LD that can differ substantially between BPFs (Schopp et al. 2017a). Different linkage phases among loci (both QTL and markers) in two BPFs can cause different substitution effects of short chromosome segments, which reduce the “local” prediction accuracy for the segment and ultimately reduce \(\rho\) (Schopp et al. 2017b). Regarding the effect of LPS on \(\rho\) for a combined TS, we speculate that chromosome segment substitution effects estimated from the latter correspond closely to the average of these effects across all BPFs included in the TS. This can explain the observed reduction in \(\rho\) of set \(\cup U_{i}\), when higher rank families \(U_{i}\) were included in the TS, despite an increase in \(N_{\text{TS}}\) (Figs. 1, 2). Moreover, we hypothesize that the superior \(\rho\) of set \(O^{F }\) is due to a specific combination of BPFs, which optimally balances \(N_{\text{TS}}\), \(\theta\) and LPS for the chromosome segments relevant for the trait and family \(F\), because set \(O^{F}\) often comprised approximately the same number of half-sib and unrelated families (Supplementary Fig. 7).

To investigate the effect of \(N_{\text{TS}}\) on the prediction accuracy independent of \(\theta\), we determined \(\rho\) for individual families \(H_{i}\) or \(U_{i}\) with rank i = 1, 10, 20 under increasing values of \(n_{g}\) (Supplementary Fig. 8). Besides drastic differences in the level of \(\rho\) among these families, increasing \(n_{g}\) markedly improved \(\rho \left( {H_{1} ,F^{\text{PS}} } \right)\) and \(\rho \left( {U_{1} ,F^{\text{PS}} } \right)\), yet the improvement leveled off for \(n_{g}\) ≥ 200. Conversely, \(\rho \left( {H_{20} ,F^{\text{PS}} } \right)\) and \(\rho \left( {U_{20} ,F^{\text{PS}} } \right)\) hardly changed for larger \(n_{g}\). This agrees with the strong increase in both \(\rho\) and \(\theta\) for i < 4 during sequential addition of half-sib families (Fig. 5). However, sequentially including BPFs by order of their value of \(\theta\) with family \(F\) soon approaches a plateau for i ≥ 4 close to 1.00 for set \(H\) and 0.87 for set \(U\). Hence, combining more than four half-sib or unrelated families hardly increases \(\theta\) and \(\rho .\)
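A rough intuition for why gains from larger \(n_{g}\) level off can be obtained from the deterministic formula of Daetwyler et al. (2008), \(\rho = \sqrt{N_{\text{TS}} h^{2} /(N_{\text{TS}} h^{2} + M_{e})}\), where \(M_{e}\) denotes the effective number of chromosome segments. The Python sketch below is an illustration under strong assumptions: the formula was derived for prediction within a single population and ignores \(\theta\) and LPS, so it is not expected to reproduce the between-family results reported here, and the value \(M_{e}\) = 50 is arbitrary.

```python
from math import sqrt


def expected_accuracy(n_ts, h2, m_e):
    """Deterministic accuracy sqrt(N h^2 / (N h^2 + M_e)) (Daetwyler et al. 2008)."""
    if n_ts < 0 or not 0 <= h2 <= 1 or m_e <= 0:
        raise ValueError("need n_ts >= 0, 0 <= h2 <= 1 and m_e > 0")
    return sqrt(n_ts * h2 / (n_ts * h2 + m_e))


if __name__ == "__main__":
    m_e = 50   # hypothetical effective number of segments
    h2 = 0.6   # default heritability used in the simulations
    previous = None
    for n_ts in (50, 100, 200, 400):
        rho = expected_accuracy(n_ts, h2, m_e)
        gain = "" if previous is None else f"  gain {rho - previous:.3f}"
        print(f"N_TS = {n_ts:3d}  rho = {rho:.3f}{gain}")
        previous = rho
```

Each doubling of the training set size adds less accuracy than the previous one, which is the qualitative pattern of diminishing returns described above for the top ranking families.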

Strategies for improving the training set composition

The optimum TS composition \(O^{F }\) had \(\rho\) values about 5–10% higher than set \(A\) and other combinations of \(F^{TS}\), \(H\) and \(U\) (Fig. 3). In reality, TBVs required for determining set \(O^{F }\) are unknown and finding set \(E^{F}\) with the goal to approach \(O^{F}\) was not successful. Even with \(h^{2}\) = 1.0, \(\rho\) of \(E^{F}\) is considerably lower than for \(O^{F }\) because the predictive ability \(r_{a} \left( {T, F^{\text{TS}} } \right)\) of each subset \(T \subset A\) employed for identifying \(E^{F}\) is associated with considerable uncertainty due to the limited size \(n_{g}\) of subfamily \(F^{TS}\) used for calculating the correlation \(r_{a}\) (data not shown). Thus, the large standard error of the \(r_{a}\) values used for identifying the best TS composition is likely the reason for the gap in \(\rho\) between set \(O^{F}\) and \(E^{F} .\) Increasing \(n_{g}\) for \(F^{\text{TS}}\) is an obvious solution for reducing this gap, but this remedy has limitations regarding the optimum allocation of resources discussed below. Moreover, for large values of \(n_{g}\), the prediction accuracy achieved with subfamily \(F^{\text{TS}}\) is already very high so that including half-sib families, not to mention unrelated families, is of little value.
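The sampling uncertainty of a correlation estimated from only \(n_{g}\) lines can be gauged with the standard Fisher z-transformation. The Python sketch below is a generic statistical illustration, not the procedure used in the study: it treats \(r_{a}\) as a simple Pearson correlation from \(n\) independent bivariate-normal pairs, which is an assumption, and the function name is hypothetical.

```python
from math import atanh, sqrt, tanh


def correlation_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3). z_crit = 1.96 gives a 95% interval.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r)
    half_width = z_crit / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        low, high = correlation_interval(0.5, n)
        print(f"n = {n:3d}: {low:.2f} to {high:.2f}")
```

Under these assumptions, an observed correlation of 0.5 based on 50 lines is compatible with true values from roughly 0.26 to 0.68, an interval much wider than the differences in \(\rho\) among many of the TS compositions compared here.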

Other methods envisioned for determining the optimal training set composition by sequential inclusion of BPFs in the TS may employ various criteria such as SM, LPS, \(\theta\) or forecasts of \(\rho\) with \(\rho^{W}\). In previous studies, these criteria were associated with \(\rho\) for pairs of BPFs (Lehermeier et al. 2014; Schopp et al. 2017b). However, for multiple BPFs we were unable to approach \(O^{F}\) empirically by using these criteria (data not shown). One reason why genomic information is insufficient for finding \(O^{F}\) is that \(\rho\) between pairs of BPFs varies considerably among traits (Schopp et al. 2017b), which is not accounted for by any of these criteria. For a given family \(F\), set \(O^{F}\) may vary between traits due to differences in \(\theta\) and LPS for the underlying QTL. Use of parental information from previous breeding cycles might help to improve the forecasting ability of these criteria if the genomic information for haplotypes, and their substitution effects are integrated in order to restrict these measures to specific chromosome segments influencing the trait of interest. However, this warrants further research beyond the scope of this study.

Although pooling all available BPFs did not have a detrimental effect on \(\rho\) in the simulations irrespective of the ancestral population (Fig. 3 and Supplementary Fig. 5), this result did not hold true for the experimental data. With the data of Zhang et al. (2015), combining \(F^{TS}\) with unrelated families \(U\) in the TS had mainly negative effects on \(\rho\), although for grain yield there was a slight advantage (0.02) of set \(A\) over \(F^{\text{TS}} \cup H\) (Table 1). Moreover, inclusion of unrelated families generally increased SD of \(\rho\), reflecting that GP becomes more risky to apply. Thus, we recommend omitting unrelated families from the TS in agreement with the concerns expressed by Riedelsheimer et al. (2013).

Design of breeding programs using GP with multiple biparental families

So far, we focused on the composition of the TS for maximizing the prediction accuracy of a single PS family. In practice, breeders work with multiple BPFs and want to apply GP simultaneously to each of them. Consequently, an important question is how to design \(A\), the set of all BPFs produced in a breeding cycle, for making best use of GP. Adopting the above recommendation to exclude unrelated families from GP, we limited ourselves to consider only full-sib and half-sib families in the TS for every family \(F\) in \(A\). Moreover, we assumed a fixed budget for producing, genotyping and phenotyping altogether \(N_{\text{Tot}}\) DH lines in the entire breeding program, ignoring the problem of balancing the resources spent for the TS and PS addressed by Riedelsheimer and Melchinger (2013) for a single BPF. As main criterion for comparison, we used \(\rho\) for each family, because this determines the selection gain achieved by GP. Long-term consequences for the genetic variation in the breeding program, which can erode by uncritical implementation of GP (Jannink 2010), might be addressed by considering the effective population size (Falconer and Mackay 1996) of all BPFs in set \(A\) as further criterion.

The first decision to be made by the breeder concerns the number of all families \(n_{A}\) versus the size \(n_{g}\) of each family under the side condition that \(N_{\text{Tot}} = n_{A} \times n_{g}\). The second decision concerns the mating design employed to produce the BPFs for set \(A\), which determines \(n_{H}\), the number of half-sib families available for inclusion in the TS of each BPF. One extreme case is that the BPFs in \(A\) are produced by a half-diallel mating design from \(n_{P}\) parents so that \(n_{A} = n_{P} \left( {n_{P} - 1} \right)/2\) and \(n_{H} = 2\left( {n_{P} - 2} \right)\). Another extreme case is that set \(A\) consists only of unrelated crosses between \(n_{P}\) parents so that \(n_{A} = n_{P} /2\), \(n_{H} = 0\) and merely the \(n_{g}\) full-sibs from each family serve as TS. In between these two extremes is the round-robin design, where \(n_{P}\) parents are used to produce \(n_{A} = n_{H} n_{P} /2\) BPFs, each having \(n_{H}\) half-sib families to be determined by the breeder. These formulas show that for given \(n_{A}\), the number of parents \(n_{P}\) used for producing the BPFs in \(A\), which is directly proportional to the effective population size, is much smaller for the half-diallel mating design than for the unrelated crosses, and in between for the round-robin design.
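The formulas for the two extreme designs can be checked numerically. The following Python sketch is not taken from the study: the function names and the budget of 1400 DH lines are hypothetical, and the count of unrelated families per BPF in a half-diallel, \(\left( {n_{P} - 2} \right)\left( {n_{P} - 3} \right)/2\), is a combinatorial identity added here for illustration. The round-robin design is left out because its layout depends on choices made by the breeder.

```python
def half_diallel(n_p):
    """Family counts when every pair of n_p parents is crossed once."""
    if n_p < 2:
        raise ValueError("a half-diallel needs at least two parents")
    return {
        "n_A": n_p * (n_p - 1) // 2,   # all BPFs in set A
        "n_H": 2 * (n_p - 2),          # half-sib families of each BPF
        # Assumption (not stated in the text): families sharing no parent
        # with a given BPF, i.e. pairs among the remaining n_p - 2 parents.
        "n_U": (n_p - 2) * (n_p - 3) // 2,
    }


def unrelated_crosses(n_p):
    """Family counts when each parent is used in exactly one cross."""
    if n_p < 2 or n_p % 2:
        raise ValueError("unrelated crosses need an even number of parents")
    return {"n_A": n_p // 2, "n_H": 0}


def family_size(n_tot, n_a):
    """DH lines per family under N_Tot = n_A * n_g, rounded down."""
    return n_tot // n_a


if __name__ == "__main__":
    for n_p in (8, 12):
        print(n_p, half_diallel(n_p))
    n_tot = 1400  # hypothetical total number of DH lines
    print(family_size(n_tot, half_diallel(8)["n_A"]))
    print(family_size(n_tot, unrelated_crosses(8)["n_A"]))
```

With eight parents, the half-diallel gives \(n_{A}\) = 28, \(n_{H}\) = 12 and \(n_{U}\) = 15, which matches the default values listed in the caption of Fig. 3; the same eight parents used in unrelated crosses yield only four families, each of which could be far larger under the same budget.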

With unrelated BPFs and \(n_{g}\) = 50 or 100 full-sibs for GP of each family, a sufficient level of \(\rho\) is reached unless the heritability is extremely low (Fig. 3). However, the improvement in \(\rho\) observed for traits with medium and high \(h^{2}\), when \(n_{g}\) is doubled from 50 to 100, hardly justifies halving \(n_{A}\) and \(n_{P}\). Thus, composing \(A\) only of unrelated crosses is appealing for warranting long-term selection progress in GP. Conversely, producing \(A\) with a half-diallel mating design has a high risk of narrowing the genetic diversity without substantial improvement in the prediction accuracy, as is evident from the comparison of \(\rho\) values for \(n_{H}\) = 12 or 20 with \(n_{H}\) = 0, unless \(h^{2}\) is low and \(n_{g}\) is small (Fig. 4). However, \(n_{H}\) = 12 or 20 correspond to half-diallels produced from only \(n_{P}\) = 8 and 12 parents requiring \(n_{A}\) = 28 and 66 BPFs, respectively, which limits \(n_{g}\) so that a positive net effect on \(\rho\) is rather questionable. Using a round-robin design with \(n_{H}\) = 2 or 4 and \(n_{g}\) = 50 seems to be a good compromise. It slightly increases \(\rho\) and reduces its SD compared to unrelated BPFs, especially under those conditions where full-sibs alone yield only medium \(\rho\) values. Moreover, if seed production or phenotyping of testcrosses of some BPFs fails, so that they cannot be included in the TS, their half-sib families can still serve as backup. However, compared to composing \(A\) from unrelated crosses, the number of parents \(n_{P}\) in a round-robin design is inversely related to \(n_{H}\).

In this study, we investigated the design of \(A\) for a single breeding cycle. In practice, GP is applied in subsequent breeding cycles, which are closely interconnected because the top lines selected in one cycle generally serve as parents for producing the BPFs of the next cycle. Thus, further research is warranted to investigate how information from previous breeding cycles can be integrated in the design of subsequent cycles.

Conclusions

When implementing GP in recycling breeding with elite lines, breeders are faced with the conflict of working with a sufficiently large TS for predicting candidates in each of multiple BPFs and yet utilizing a large number of BPFs to warrant a sufficient effective population size. A popular solution to this problem is combining all available BPFs in the TS, including unrelated families. We confirmed with simulations and experimental data that big differences exist in the predictive value of individual BPFs as TS, depending on their relationship to the PS family. Whereas the mean prediction accuracy was highest for full-sibs followed by half-sib and unrelated families, the variation among \(\rho\) values showed an opposite trend. Thus, \(\rho\) of unrelated BPFs was often negative and including these families in a combined TS is usually detrimental. Consequently, merging all available BPFs in the TS for the sake of increasing \(N_{\text{TS}}\) is not the best option, because other TS compositions exist that yield higher \(\rho\) values. However, identifying the optimal set of BPFs to be combined in a TS, based on existing genomic and phenotypic data, is still an unsolved problem. To be on the safe side for achieving a positive selection response with GP, we recommend including only full-sib and half-sib families in the TS for each BPF. For producing the entire set \(A\) of BPFs subject to GP in a breeding cycle, we propose a mating scheme such as the round-robin design, which yields two to four half-sib families for each BPF. A medium number of DH lines (e.g., \(n_{g}\) ~ 50) from each BPF is genotyped and phenotyped for training the prediction model. Finally, GP of the numerous candidates in the PS of each BPF is based on a specific TS, which comprises \(n_{g}\) DH lines from (a) the BPF itself (i.e., full-sibs) and (b) a few (\(n_{H}\) = 2 to 4) half-sib families of this BPF. This represents a compromise between achieving high \(\rho\) values and securing a sufficiently large effective population size of the entire breeding program, thus warranting a balance between short- and long-term selection progress with genomic selection.