Introduction

Since genomic prediction was first proposed by Meuwissen et al. (2001), it has proven to be a promising approach for numerous applications in both animal (e.g., Hayes et al. 2009; Hayes and Goddard 2010) and plant breeding (e.g., Bernardo and Yu 2007; Riedelsheimer et al. 2012). In the literature, the focus has so far been on the reliability of genomic estimated breeding values (GEBVs) for unobserved genotypes, whereas the training set (TS) of individuals used for calibrating the prediction model has received little attention. However, in applied plant breeding programs, the TS individuals constitute a considerable fraction of the total breeding population and are usually themselves selection candidates. For TS individuals, both their phenotypic values and their GEBVs are available.

One of the most popular methods for genomic prediction is genomic best linear unbiased prediction (GBLUP), which has proven to be simple and efficient with performance that compares well with more sophisticated prediction methods (de Los Campos et al. 2013). It is based on the animal model (Lynch and Walsh 1998) that has been widely used by animal breeders for decades. The difference lies in the definition of the relationship matrix \({\mathbf {A}}\). While in the classical animal breeding literature, \({\mathbf {A}}\) is calculated from pedigree data (e.g., Lynch and Walsh 1998), the principal innovation of GBLUP was to calculate \({\mathbf {A}}\) from genome-wide marker data (Habier et al. 2007; VanRaden 2008; Goddard et al. 2009), often referred to as the genomic relationship matrix (GRM).

The elements of the GRM are estimates of the genetic correlation between alleles taken from pairs of individuals and can be conveniently computed with reference to the current population (Powell et al. 2010). As such, they can be interpreted as deviations from expected allele sharing between individuals, given the allele frequencies of the current population (Astle and Balding 2009). These deviations are a result of Mendelian sampling and linkage during the segregation of loci (Hill and Weir 2011).

Estimating genetic covariances from marker data allows for defining relationships among individuals of unknown ancestry, which would classically be treated as unrelated. An example in plant breeding would be a diversity panel of lines. Furthermore, it enables the identification of additive-genetic variation within groups of individuals having identical pedigree relationships, for instance full-sib families.

Endelman and Jannink (2012) examined genomic prediction using GBLUP in the TS and demonstrated that the reliability of GEBVs of TS individuals can be substantially increased by shrinking the GRM towards a less complex target matrix that can be estimated from the data with higher precision. The problem was also addressed by Riedelsheimer and Melchinger (2013), who applied selection index theory to construct a selection index that aims to optimally combine GEBVs and phenotypic values of TS individuals. Apart from those previous studies, the importance of genomic prediction in the TS has not been appropriately recognized in the literature so far. Our study aims to alleviate this neglect by comparing the performance of several alternative shrinkage methods as well as the method of Riedelsheimer and Melchinger (2013). Besides two novel shrinkage methods that are based on measures of linkage disequilibrium between marker loci, we applied a regression approach similar to the proposal of Yang et al. (2010) and Goddard et al. (2011) and also used the method presented by Endelman and Jannink (2012). The objective of our study was to compare the alternative shrinkage methods in terms of reliabilities of GEBVs for different population types and marker densities.

Material and methods

Statistical model

The GEBVs were computed by GBLUP with the basic linear mixed model

$$\begin{aligned} y_i = \mu + a_i + e_i, \end{aligned}$$
(1)

where the phenotypic value \(y_i\) of the \(i\)th individual is decomposed into a common intercept \(\mu\) (fixed), a true genetic value \(a_i\) (random), and a residual term \(e_i\). Using vector notation, the model assumes that \({\mathbf {a}} \sim \mathcal {N} \big ( 0, {\mathbf {A}} \sigma _{a}^{2} \big )\) and \({\mathbf {e}} \sim \mathcal {N} \big ( 0, {\mathbf {I}} \sigma _{e}^{2} \big )\), where \(\sigma _{a}^{2}\) and \(\sigma _{e}^{2}\) are the genetic and residual variance components, respectively. The matrix \({\mathbf {A}}\) is the GRM and its computation will be detailed later. The genetic values were predicted using the standard BLUP formulas (Lynch and Walsh 1998)

$$\begin{aligned} \hat{\mu }&= \left( {\mathbf {1}}^T {\mathbf {V}}^{-1} {\mathbf {1}} \right) ^{-1} {\mathbf {1}}^T {\mathbf {V}}^{-1} {\mathbf {y}}\end{aligned}$$
(2)
$$\begin{aligned} \hat{{\mathbf {a}}}&= \sigma _{a}^{2} {\mathbf {A}} {\mathbf {V}}^{-1} \left( {\mathbf {y}} - {\mathbf {1}} \hat{\mu } \right) , \end{aligned}$$
(3)

where \({\mathbf {V}} = {\mathbf {A}} \sigma _{a}^{2} + {\mathbf {I}} \sigma _{e}^{2}\). Variance components and heritabilities were estimated using the spectral decomposition algorithm of Kang et al. (2008) as implemented in the R package rrBLUP (Endelman 2011).
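As an illustration of Eqs. 2 and 3, the following minimal Python sketch computes \(\hat{\mu }\) and \(\hat{{\mathbf {a}}}\) for a given relationship matrix. It assumes that the variance components are already known, whereas in the study they were estimated with rrBLUP. The function names `solve` and `gblup` and the toy inputs are hypothetical and are not part of the original analysis.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def gblup(y, grm, var_a, var_e):
    """Return (mu_hat, a_hat) from Eqs. 2 and 3 for known variance components."""
    n = len(y)
    # V = A * var_a + I * var_e
    v = [[var_a * grm[i][j] + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    v_inv_one = solve(v, [1.0] * n)   # V^-1 1
    v_inv_y = solve(v, list(y))       # V^-1 y
    mu_hat = sum(v_inv_y) / sum(v_inv_one)                        # Eq. 2
    v_inv_resid = [a - mu_hat * b for a, b in zip(v_inv_y, v_inv_one)]
    a_hat = [var_a * sum(grm[i][j] * v_inv_resid[j] for j in range(n))
             for i in range(n)]                                   # Eq. 3
    return mu_hat, a_hat


# Toy input (hypothetical): two unrelated, fully inbred lines.
mu_hat, a_hat = gblup([1.0, 3.0], [[2.0, 0.0], [0.0, 2.0]], var_a=1.0, var_e=2.0)
print(mu_hat, a_hat)
```

With a diagonal relationship matrix, each predicted genetic value is simply the individual's own deviation from the mean multiplied by a constant, which is the limiting case that the shrinkage methods described below move towards.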

Simulation

We simulated two different population types, a population of unrelated lines (UR) and a biparental family of lines (BP). The UR population was simulated by sampling genotypes from a joint distribution as described in Montana (2005) using allele frequencies sampled from the interval \(\left[ 0.35, 0.65\right]\) and LD modelled following the exponential decay function \({\text {LD}}(d) = 0.8 \times e^{-20d}\), where \(d\) is the genetic distance in Morgan. The BP population was generated by recombining the genomes of two divergent parental lines (i.e., lines that were generated by randomly assigning SNP alleles to one or the other parent with equal probability) using the R package hypred (Technow 2013). In both populations, haplotypes were doubled to obtain fully homozygous doubled haploid lines. We simulated ten chromosomes, the lengths of which were taken from the Genetics (2008) Composite Map of Maize (http://www.maizegdb.org) with a total map length of \(\sim \!18\) Morgan. We used a constant number of 200 QTL, such that the QTL density amounted to about 11 QTL per Morgan. In both scenarios, we used different TS sizes \(N \in \left\{ 50, 100, 200 \right\}\) and heritabilities \(h^2 \in \left\{ 0.25, 0.5, 0.75 \right\}\). The size of the prediction set (PS) was held constant at 200 individuals. TS sizes were chosen to reflect the numbers used in practical plant breeding programs.

To vary linkage disequilibrium between markers and QTL, we used increasing numbers of markers \(M \in \left\{ 50, 100, 200, 500, 1,000, 2,500 \right\}\). To place QTL and markers on the genome, first their number per chromosome was sampled from a multinomial distribution with class probabilities equal to the relative chromosome lengths. Subsequently, QTL and markers were uniformly distributed along the respective chromosomes. QTL effects were drawn from a gamma distribution (Meuwissen et al. 2001) with shape 1.0 and rate 2.0. The signs of the effects were sampled from a Bernoulli distribution with \(p=0.5\). The QTL effects were then scaled to achieve an overall genetic variance equal to 1.0. Phenotypes were simulated by adding an independent Gaussian error term with \(\sigma _e^2 = \tfrac{1-h^2}{h^2}\), depending on the heritability \(h^2\). The reliability of GEBVs was calculated as the squared correlation coefficient between GEBVs and the simulated true genetic values and is denoted by \(\rho ^2\).
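The trait simulation described above can be sketched in a few lines of Python. This is a simplified illustration rather than the code used in the study: the toy QTL genotypes are sampled independently (no LD and no chromosomes), the scaling uses the population standard deviation of the sampled lines, and the names `simulate_trait` and `toy_qtl` are hypothetical.

```python
import math
import random
import statistics


def simulate_trait(qtl_genotypes, h2, rng):
    """Return (genetic_values, phenotypes) for lines coded 0/2 at each QTL."""
    n_qtl = len(qtl_genotypes[0])
    # Effect sizes: Gamma(shape=1.0, rate=2.0); gammavariate expects a scale (1/rate).
    # Signs: +1 or -1, each with probability 0.5.
    effects = [rng.gammavariate(1.0, 1.0 / 2.0) * rng.choice((-1.0, 1.0))
               for _ in range(n_qtl)]
    raw = [sum(x * b for x, b in zip(line, effects)) for line in qtl_genotypes]
    # Rescale so that the genetic variance in this sample equals 1.0
    # (assumption: population variance of the sampled lines).
    sd = statistics.pstdev(raw)
    genetic = [g / sd for g in raw]
    # Residual variance implied by h2 when the genetic variance is 1.0.
    sigma_e = math.sqrt((1.0 - h2) / h2)
    phenotypes = [g + rng.gauss(0.0, sigma_e) for g in genetic]
    return genetic, phenotypes


rng = random.Random(2015)
# Toy data (hypothetical): 50 doubled haploid lines, 20 independent QTL.
toy_qtl = [[rng.choice((0, 2)) for _ in range(20)] for _ in range(50)]
genetic, phenotypes = simulate_trait(toy_qtl, h2=0.5, rng=rng)
print(statistics.pvariance(genetic))
```

For \(h^2 = 0.25, 0.5\), and \(0.75\), the residual variance \(\tfrac{1-h^2}{h^2}\) equals 3, 1, and 1/3, respectively.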

All of our results were obtained from 500 independent simulation runs. In order to determine the maximum reliability \(\rho ^2_{\text {max}}\) in the TS and the corresponding optimum shrinkage coefficient \(\delta _{\text {opt}}\) to be used in Eq. 5 described below, we computed the reliability of the resulting GEBVs in the TS at a sequence of 100 shrinkage coefficients equally spaced between 0 and 0.9 for each simulation run. Averages across all runs were calculated for each position in the sequence and \(\rho ^2_{\text {max}}\) and \(\delta _{\text {opt}}\) were determined numerically. The reliability of the phenotypic values, i.e., the squared correlation coefficient between phenotypic values and true genetic values, corresponded to the heritability \(h^2\). All computations were performed within the statistical computing environment R (R Core Team 2014).
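The numerical determination of \(\delta _{\text {opt}}\) and \(\rho ^2_{\text {max}}\) can be sketched as follows. The reliability curves used here are made-up placeholder functions, not simulation results, and the function name `optimum_shrinkage` is hypothetical; only the grid of 100 values between 0 and 0.9 and the averaging across runs follow the description above.

```python
def optimum_shrinkage(deltas, reliability_curves):
    """Average reliability curves across runs and return (delta_opt, rho2_max)."""
    n_runs = len(reliability_curves)
    mean_curve = [sum(run[i] for run in reliability_curves) / n_runs
                  for i in range(len(deltas))]
    best = max(range(len(deltas)), key=mean_curve.__getitem__)
    return deltas[best], mean_curve[best]


# 100 shrinkage coefficients equally spaced between 0 and 0.9.
deltas = [0.9 * i / 99 for i in range(100)]

# Placeholder curves for two hypothetical runs (one reliability per delta).
toy_curves = [
    [0.5 - (d - 0.3) ** 2 for d in deltas],
    [0.6 - (d - 0.5) ** 2 for d in deltas],
]
delta_opt, rho2_max = optimum_shrinkage(deltas, toy_curves)
print(delta_opt, rho2_max)
```

Averaging before taking the maximum, rather than averaging per-run maxima, yields a single coefficient that is optimal for the expected reliability across runs.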

Shrinkage methods

As a starting point and reference for all methods, the GRM was computed according to the first method of VanRaden (2008), which we refer to as Method VR1. As shown by Endelman and Jannink (2012), this method is also suitable for populations of inbred lines and the GRM is computed according to the following formula:

$$\begin{aligned} \widehat{{\mathbf {A}}} = \frac{{\mathbf {W}} {\mathbf {W}}^T}{2\sum _{k} p_k (1-p_k)}, \end{aligned}$$
(4)

(Habier et al. 2007; VanRaden 2008; Endelman and Jannink 2012), where \({\mathbf {W}}\) is the column-centered genotype matrix with \(w_{i\!k} = x_{i\!k} - 2p_k\); here \(x_{i\!k} \in \left\{ 0,1,2 \right\}\) codes the number of major alleles at the \(k\)th locus in the \(i\)th individual and \(p_k\) is the sample allele frequency at the \(k\)th locus. Under the infinitesimal model, the genetic value is determined by an infinitely large number of unlinked loci each of which contributes a small effect (Hill 2010). Given these assumptions, the genomic relationship matrix can be optimally estimated from the observed marker loci by Eq. 4 (Endelman and Jannink 2012).
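Equation 4 can be illustrated with a short Python sketch. The function name `vanraden_grm` and the toy genotypes are hypothetical; the computation assumes complete genotype data and at least one polymorphic marker.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix of Eq. 4 for genotypes coded 0, 1, 2."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency p_k at each locus.
    p = [sum(row[k] for row in genotypes) / (2.0 * n) for k in range(m)]
    denom = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    # Column-centered genotype matrix W with w_ik = x_ik - 2 p_k.
    w = [[row[k] - 2.0 * p[k] for k in range(m)] for row in genotypes]
    return [[sum(wi[k] * wj[k] for k in range(m)) / denom for wj in w]
            for wi in w]


# Toy data (hypothetical): four doubled haploid lines, three markers.
toy_genotypes = [
    [2, 0, 2],
    [0, 2, 2],
    [2, 2, 0],
    [0, 0, 0],
]
grm = vanraden_grm(toy_genotypes)
for row in grm:
    print([round(value, 3) for value in row])
```

For fully homozygous lines coded 0/2, the average diagonal element of this matrix equals 2 whenever sample allele frequencies are used for centering.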

In the following, we describe four methods that are based on the principle of imposing shrinkage on \(\widehat{{\mathbf {A}}}\) to obtain a modified relationship matrix that can be written as

$$\begin{aligned} \widehat{{\mathbf {A}}}^* = \delta {\mathbf {T}} + \left( 1-\delta \right) \widehat{{\mathbf {A}}}, \end{aligned}$$
(5)

where \({\mathbf {T}}\) is a target matrix toward which \(\widehat{{\mathbf {A}}}\) is shrunken. The shrinkage coefficient \(\delta\) specifies the strength of shrinkage imposed on \(\widehat{{\mathbf {A}}}\). Methods 1 and 2 are novel, Method 3 is based on Yang et al. (2010) and Goddard et al. (2011) and further developed by us, and Method 4 was presented by Endelman and Jannink (2012). In Methods 1–3, the target matrix toward which \(\widehat{{\mathbf {A}}}\) is shrunken is a diagonal matrix with elements equal to the average of the diagonal elements of \(\widehat{{\mathbf {A}}}\), which is equal to \(1 + \widehat{f}\). Here \(\widehat{f}\) is the average inbreeding coefficient in the population, which equals 1 for fully inbred lines as used in the present study, so that the diagonal elements of \({\mathbf {T}}\) equal 2.
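A minimal Python sketch of Eq. 5 with the diagonal target matrix of Methods 1–3 is given below. The function name `shrink_grm` and the example matrix are hypothetical.

```python
def shrink_grm(a_hat, delta):
    """Shrink a GRM toward T = mean(diag(a_hat)) * I as in Eq. 5."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(a_hat)
    mean_diag = sum(a_hat[i][i] for i in range(n)) / n
    return [[(1.0 - delta) * a_hat[i][j] + (delta * mean_diag if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


# Toy GRM (hypothetical values) for three inbred lines.
toy_grm = [
    [2.2, 0.6, -0.4],
    [0.6, 1.8, 0.2],
    [-0.4, 0.2, 2.0],
]
shrunken = shrink_grm(toy_grm, delta=0.5)
for row in shrunken:
    print([round(value, 3) for value in row])
```

Off-diagonal elements are multiplied by \(1-\delta\), whereas diagonal elements move toward their common mean, so that \(\delta = 0\) returns \(\widehat{{\mathbf {A}}}\) and \(\delta = 1\) returns \({\mathbf {T}}\).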

Method 1: adjLD

In preliminary analyses, we observed that the optimum shrinkage coefficient is strongly related to LD. We, therefore, developed a heuristic method in which the LD between adjacent marker loci (\({\text {LD}}_{\text {adj}}\)) was used to compute the shrinkage coefficient as \(\delta _{\text {adjLD}} = 1 - {\text {LD}}_{\text {adj}}\). The LD between adjacent markers was obtained as the average of the squared correlation between all pairs of neighboring markers across the genome (Hill and Robertson 1968).
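The computation of \(\delta _{\text {adjLD}}\) can be sketched in Python as follows. The sketch assumes that markers are sorted by map position, that pairs spanning two chromosomes are excluded, and that all markers are polymorphic; these choices and the function names are assumptions made for illustration.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two marker genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("monomorphic marker: correlation undefined")
    return sxy * sxy / (sxx * syy)


def delta_adj_ld(genotypes, chromosome):
    """Return 1 - mean r^2 between neighboring markers on the same chromosome."""
    columns = list(zip(*genotypes))  # one tuple of genotypes per marker
    r2 = [squared_correlation(columns[k], columns[k + 1])
          for k in range(len(columns) - 1)
          if chromosome[k] == chromosome[k + 1]]
    return 1.0 - sum(r2) / len(r2)


# Toy data (hypothetical): four lines, three ordered markers on one chromosome.
toy_genotypes = [
    [2, 2, 2],
    [2, 2, 0],
    [0, 0, 2],
    [0, 0, 0],
]
delta = delta_adj_ld(toy_genotypes, chromosome=[1, 1, 1])
print(delta)
```

In the toy data, the first marker pair is in complete LD and the second pair is uncorrelated, so the average squared correlation is 0.5.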

Method 2: effLD

Because \({\text {LD}}_{\text {adj}}\) only captures LD between adjacent loci, we devised a measure for effective LD (\({\text {LD}}_{\text {eff}}\)) between a single hypothetical QTL and its surrounding markers. In short, \({\text {LD}}_{\text {eff}}\) measures the amount of variation in the genotype of a single locus that is simultaneously explained by the genotypes of several surrounding loci. The shrinkage coefficient \(\delta\) is then analogously computed as \(\delta _{\text {effLD}} = 1 - {\text {LD}}_{\text {eff}}\). A detailed description of the method is provided in the “Appendix”.

Method 3: RG

The third method extends the regression approach described by Yang et al. (2010) and Goddard et al. (2011). Here, the rationale is to regress relationship coefficients computed with QTL on those computed with markers and use the slope \(\beta\) for shrinkage to obtain an unbiased estimate of the GRM. In practice, \(\beta\) has to be estimated based on marker data alone, because the QTL are unknown. In Yang et al. (2010), \(\beta\) is estimated by randomly splitting markers into two equally sized sets for different numbers of markers and subsequently treating one set as proxies for QTL. The regression coefficient \(\beta\) is obtained by regressing the elements of \(({\mathbf {A-I}})\) on the elements of \(({\mathbf {\widehat{A}-I}})\), where \({\mathbf {A}}\) is the GRM computed with the (pseudo-) QTL and \({\mathbf {\widehat{A}}}\) the GRM computed with the markers. In our study, we estimated \(\beta\) by randomly splitting the total number of markers into two distinct sets. Because the number of QTL is relevant for the estimation of \(\beta\), we varied the set size of the pseudo-QTL starting from 5 up to half the number of all markers. Then we performed separate regressions for each set size with 25 replications, where we regressed the elements of \(({\mathbf {A}} - {\mathbf {T}}^{\text {QTL}})\) on the elements of \((\widehat{{\mathbf {A}}} - {\mathbf {T}})\), including the diagonal. Here, \({\mathbf {T}}\) and \({\mathbf {T}}^{\text {QTL}}\) are the diagonal matrices that contain the average of the diagonal elements of \(\widehat{{\mathbf {A}}}\) and \({\mathbf {A}}\), respectively. The mean of all regression coefficients was used as an estimate \(\widehat{\beta }\) and the corresponding shrinkage coefficient was obtained as \(\delta _{\text {RG}} = 1 - \widehat{\beta }\). In addition, we computed the shrinkage coefficient of Method RG using the true QTL genotypes to calculate \({\mathbf {A}}\), denoted \(\delta _{\text {RG}}^{\text {QTL}}\), for comparison.
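The regression step for a single replicate can be sketched in Python as shown below. The sketch takes the two relationship matrices as given and does not perform the random splitting of markers or the averaging over set sizes and replications. It assumes an ordinary least-squares fit with intercept over all \(N^2\) matrix elements; both choices, as well as the function name `rg_shrinkage` and the toy matrices, are assumptions made for illustration.

```python
def rg_shrinkage(a_qtl, a_marker):
    """Return (beta, delta) from regressing (A - T_QTL) on (A_hat - T)."""
    n = len(a_marker)
    t_qtl = sum(a_qtl[i][i] for i in range(n)) / n
    t_marker = sum(a_marker[i][i] for i in range(n)) / n
    # Deviations from the diagonal targets, all elements including the diagonal.
    y = [a_qtl[i][j] - (t_qtl if i == j else 0.0)
         for i in range(n) for j in range(n)]
    x = [a_marker[i][j] - (t_marker if i == j else 0.0)
         for i in range(n) for j in range(n)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    return beta, 1.0 - beta


# Toy matrices (hypothetical): relationships at the pseudo-QTL are half as
# strong as those suggested by the markers.
toy_a_marker = [[2.0, -0.5], [-0.5, 2.0]]
toy_a_qtl = [[2.0, -0.25], [-0.25, 2.0]]
beta, delta_rg = rg_shrinkage(toy_a_qtl, toy_a_marker)
print(beta, delta_rg)
```

In the full procedure, this slope would be computed for every pseudo-QTL set size and replicate, and the mean slope would be used as \(\widehat{\beta }\).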

Method 4: EJ

This method was devised by Endelman and Jannink (2012) and differs from the previous ones in that a different target for shrinkage is used. In the original presentation of Endelman and Jannink (2012), the shrunken GRM is computed as

$$\begin{aligned} {\widehat{{\mathbf {A}}}^* = \frac{\delta _{\text {EJ}} \; \left\langle {\mathbf {S}}_{ii} \right\rangle {\mathbf {I}} + (1 - \delta _{\text {EJ}}) {\mathbf {S}} + \left\langle {\mathbf {W}}_{\cdot k} \right\rangle \! \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T}{2 \left\langle p_k q_k \right\rangle },} \end{aligned}$$

where \(\left\langle {\mathbf {S}}_{ii} \right\rangle\) is the mean of the diagonal elements of \({\mathbf {S}}\) with \({\mathbf {S}} = M^{-1} {\mathbf {W}}{\mathbf {W}}^T - \left\langle {\mathbf {W}}_{\cdot k} \right\rangle \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T\) being the sample covariance matrix, \(\left\langle {\mathbf {W}}_{\cdot k} \right\rangle\) is a column vector containing the row means of \({\mathbf {W}}\), and \(\left\langle p_k q_k \right\rangle\) is the average of the product between allele frequencies across all loci. This can be rearranged to

$$\begin{aligned} {\widehat{{\mathbf {A}}}^* = \delta _{\text {EJ}} \left( \frac{ \left\langle {\mathbf {S}}_{ii} \right\rangle {\mathbf {I}}}{2 \left\langle p_k q_k \right\rangle } + \frac{\left\langle {\mathbf {W}}_{\cdot k} \right\rangle \! \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T}{2 \left\langle p_k q_k \right\rangle } \right) + (1 - \delta _{\text {EJ}}) \widehat{{\mathbf {A}}}.} \end{aligned}$$
(6)

Hence, Endelman and Jannink (2012) use a target matrix similar to ours, which has the same diagonal elements as \({\mathbf {T}}\) but, in addition, non-zero off-diagonal elements determined by the second term in the first parenthesis in Eq. 6. The computation of the shrinkage coefficient \(\delta _{\text {EJ}}\) was described in Endelman and Jannink (2012).
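The rearrangement leading to Eq. 6 can be retraced in two steps, using only the definitions given above. Substituting \({\mathbf {S}} = M^{-1} {\mathbf {W}}{\mathbf {W}}^T - \left\langle {\mathbf {W}}_{\cdot k} \right\rangle \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T\) into the numerator of the original formula and collecting the terms in \(\left\langle {\mathbf {W}}_{\cdot k} \right\rangle \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T\) gives

$$\begin{aligned} (1 - \delta _{\text {EJ}}) {\mathbf {S}} + \left\langle {\mathbf {W}}_{\cdot k} \right\rangle \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T = (1 - \delta _{\text {EJ}}) M^{-1} {\mathbf {W}}{\mathbf {W}}^T + \delta _{\text {EJ}} \left\langle {\mathbf {W}}_{\cdot k} \right\rangle \left\langle {\mathbf {W}}_{\cdot k} \right\rangle ^T. \end{aligned}$$

Because \(2 M \left\langle p_k q_k \right\rangle = 2 \sum _{k} p_k (1-p_k)\) with \(q_k = 1 - p_k\), the first term divided by \(2 \left\langle p_k q_k \right\rangle\) is

$$\begin{aligned} \frac{M^{-1} {\mathbf {W}}{\mathbf {W}}^T}{2 \left\langle p_k q_k \right\rangle } = \frac{{\mathbf {W}}{\mathbf {W}}^T}{2 \sum _{k} p_k (1-p_k)} = \widehat{{\mathbf {A}}}, \end{aligned}$$

which is the GRM of Eq. 4. Adding the term \(\delta _{\text {EJ}} \left\langle {\mathbf {S}}_{ii} \right\rangle {\mathbf {I}}\) and dividing all terms by \(2 \left\langle p_k q_k \right\rangle\) yields Eq. 6.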

Method 5: RM

In the context of resource optimization for a single breeding cycle with genomic selection, Riedelsheimer and Melchinger (2013) proposed a selection index that combines GEBVs with phenotypic data for individuals in the training set. Their index is based on the theory presented in Lande and Thompson (1990), originally developed for marker-assisted selection. Although this method is not based on shrinkage estimation of the GRM, we included it in our analyses because it was originally constructed with the objective of improving the reliability of GEBVs of training set individuals, which is also the ultimate goal of the shrinkage methods presented earlier. Moreover, shrinkage estimation of the GRM effectively leads to an up-weighting of the own phenotypic value of an individual, while down-weighting the information of related individuals. Thus, shrinkage can be conceptually regarded as a selection index combining an individual's own phenotypic value with its GEBV estimated by using a non-shrunken GRM. In the “Appendix”, we provide a detailed derivation of the formulas presented in Riedelsheimer and Melchinger (2013) and point out that some key assumptions implicitly made are violated.

Results

Reliability of method VR1 in the TS and PS

Table 1 Reliability in the training and prediction set using the standard GRM after VanRaden (2008) (Method VR1) for different training set sizes (\(N = 50, 100, 200\)), heritabilities (\(h^2 = 0.25, 0.50, 0.75\)) and number of markers (\(M = 100, 500, 2,500\)) uniformly distributed on 10 chromosomes with a total length of about 18 Morgans

For the same size of the training set \(N\), heritability \(h^2\), and number of markers \(M\), reliabilities for both TS and PS using Method VR1 were always higher in the BP population than in the UR population (Table 1). In general, reliabilities increased with increasing \(N\), \(h^2\) and \(M\). In the BP population, reliabilities in the PS amounted to 51–61 % of those observed in the TS for \(N=50\) and to 81–88 % for \(N=200\), with increasing percentage value for increasing number of markers. On the other hand, in the UR population reliabilities in the PS amounted to 11–25 % of those in the TS for \(N=50\) and 37–57 % for \(N=200\). While the reliabilities for \(N=50\) were above 0.17 and thus reasonably high in the BP population, they were lower than 0.17 in the UR population. In the UR population, the reliability in the TS decreased for increasing TS size when the number of markers was \(<\)500, but increased for \(M\ge 500\) (Online Resource 1, Table S2). Moreover, the reliability in the TS of the UR population only surpassed \(h^2\) when \(M>200\), for all levels of \(N\) and \(h^2\).

Reliabilities in the BP and UR populations

Fig. 1

Reliability (\(\rho ^2\)) in the UR population for a training set size of \(N=200\) for different numbers of markers (\(M = 100, 500, 2,500\)) and heritabilities (\(h^2 = 0.25, 0.50, 0.75\)). The solid black curve shows the reliability when the shrinkage coefficient \(\delta\) is systematically varied between 0 and 0.9. The maximum of this curve, \(\rho ^2_{\text {max}}\), is indicated by the dashed horizontal black line, and the value of \(\delta\) for which \(\rho ^2_{\text {max}}\) was achieved, which is \(\delta _{\text {opt}}\), is shown by the vertical red line, surrounded by a shaded region where the reliability is at least 99.5 % of \(\rho ^2_{\text {max}}\). The boxplots show the mean and the 0.90, 0.65, 0.35, and 0.10 quantiles for the different methods, centered at the average shrinkage coefficient of the respective method. The boxplots for Methods EJ and RM are drawn without a scale on the x-axis in a separate section within each panel, because they cannot be compared to the other methods based on their shrinkage coefficient

The relative performance of the methods was similar for all levels of \(N\). For the sake of brevity, we therefore limit our presentation of results to those obtained for \(N=200\). Results for \(N=50\) and \(N=100\) are shown in Online Resource 1. The performance of the various methods in the UR population for a training set size of 200 showed a strong dependency on the heritability \(h^2\) and the number of markers \(M\) (Fig. 1). The difference between Method VR1 and the maximum reliability \(\rho ^2_{\text {max}}\) was largest for high \(h^2\) and low \(M\) and smallest vice versa. For \(M=100\), the methods adjLD, effLD, and EJ performed equally well, whereas RG showed slightly lower performance, especially for high \(h^2\). Method RM led to the lowest reliability of GEBVs compared to all the other methods and was hardly better than Method VR1. For \(M=500\), effLD and RG were superior, followed by EJ and RM, which had comparable reliabilities. The reliability of Method adjLD was lowest. For \(M=\) 2,500, the reliability of VR1 was already almost identical to the maximum \(\rho ^2_{\text {max}}\). Here, the best methods were RG, EJ, and RM, whereas effLD and adjLD showed the lowest reliability.

In the BP population, for \(M=100\), Methods RG and effLD had the highest reliability. Method RM showed comparable performance to VR1, whereas Methods adjLD and EJ were only marginally better than VR1 for \(h^2 = 0.75\) and otherwise worse. For \(M\ge 500\), the differences between the methods and VR1 were very small, with one exception: for \(M=\) 2,500 and \(h^2 = 0.75\), Method effLD showed a distinctly lower performance than the other methods.

Fig. 2

Reliability (\(\rho ^2\)) in the BP population for a training set size of \(N=200\) for different numbers of markers (\(M =\) 100, 500, 2,500) and heritabilities (\(h^2 = 0.25, 0.50, 0.75\)). The solid black curve shows the reliability when the shrinkage coefficient \(\delta\) is systematically varied between 0 and 0.9. The maximum of this curve, \(\rho ^2_{\text {max}}\), is indicated by the dashed horizontal black line, and the value of \(\delta\) for which \(\rho ^2_{\text {max}}\) was achieved, which is \(\delta _{\text {opt}}\), is shown by the vertical red line, surrounded by a shaded region where the reliability is at least 99.5 % of \(\rho ^2_{\text {max}}\). The boxplots show the mean and the 0.90, 0.65, 0.35, and 0.10 quantiles for the different methods, centered at the average shrinkage coefficient of the respective method. The boxplots for Methods EJ and RM are drawn without a scale on the x-axis in a separate section within each panel, because they cannot be compared to the other methods based on their shrinkage coefficient

Shrinkage coefficients

Table 2 Linear regression of the optimum shrinkage coefficient \(\delta _{\text {opt}}\) on the number of markers (\(M\)), heritability (\(h^2\)) and training set size (\(N\)) as predictors scaled by subtracting the mean and dividing by the standard deviation

In our simulations, we numerically determined the optimum shrinkage coefficient \(\delta _{\text {opt}}\) that maximized the reliability in the TS. To assess the relative importance of the number of markers \(M\), heritability \(h^2\), and training set size \(N\) on the variation in \(\delta _{\text {opt}}\), we used linear regression with scaled predictors (Table 2).

In the UR and BP populations, the proportion of the total variation in the optimum shrinkage coefficient \(\delta _{\text {opt}}\) explained by the linear regression amounted to \(R^2 = 0.633\) and \(R^2 = 0.394\), respectively. In both population types, the number of markers \(M\) showed the largest regression coefficient in absolute value, with \(-0.216\) in UR and \(-0.095\) in BP. Compared to \(M\), heritability \(h^2\) and training set size \(N\) had only a small influence on \(\delta _{\text {opt}}\) in both population types.

Because of this, we computed \(\delta _{\text {opt}}\) for different numbers of markers, averaging over heritability and training set size, and compared it to the shrinkage coefficients obtained by Methods adjLD, effLD, and RG (Table 3), which do not vary with \(h^2\) and \(N\) by definition. In addition, we calculated the shrinkage coefficient for Method RG using the true QTL (\(\delta _{\text {RG}}^{\text {QTL}}\)).

In the UR (BP) population, \(\delta _{\text {opt}}\) was 0.81 (0.39) for \(M=50\) and was reduced to 0.05 (0.01) for \(M=2,500\). Across both population types, \(\delta _{\text {RG}}^{\text {QTL}}\) was remarkably close to \(\delta _{\text {opt}}\), with a correlation of 0.98. For Method RG, \(\delta _{\text {RG}}\) was considerably lower than \(\delta _{\text {opt}}\) for \(M \le 100\), but in good agreement for \(M \ge 200\). The shrinkage coefficient \(\delta _{\text {adjLD}}\) was generally higher than \(\delta _{\text {opt}}\) in both population types for all levels of \(M\) and decreased only to 0.37 for \(M=2,500\) in the UR population. For Method effLD, \(\delta _{\text {effLD}}\) was close to \(\delta _{\text {opt}}\) for \(M \le 200\), but its value stayed almost constant for \(M \ge 500\) in the UR population and even increased in the BP population.

Table 3 Shrinkage coefficients for different numbers of markers (\(M = 50, 100, 200, 500, 1,000, 2,500\)), averaged across heritability and training set size

Discussion

Shrinkage estimation of the GRM

Best linear unbiased prediction (BLUP) is equivalent to a selection index when fixed effects are first estimated using generalized least-squares and subsequently used to correct phenotypic values (Henderson 1973). This index optimally combines the available phenotypic information of related individuals and maximizes the correlation between predicted and true genetic values (Searle et al. 1992). However, this property depends on the correct specification of the covariance structure, i.e., the GRM and the variance components. If markers are not in sufficient LD with QTL, the relationships derived from marker genotypes deviate from the actual relationships at the QTL (Yang et al. 2010), resulting in a misrepresentation of the true QTL relationships in the GRM. This leads to spurious signals coming from the phenotypic values of other individuals and, as a consequence, the reliability of the GEBVs is impaired and can even be substantially lower than the heritability (Figs. 1, 2). A similar phenomenon was observed by Habier et al. (2013), who showed that increasing the TS size can even lead to reduced reliability of individuals in the PS because of ‘relationship noise’ due to the misrepresentation of the actual pedigree relationships in the GRM. Shrinkage estimation of the GRM can then recover some of the lost reliability when a proportionally larger amount of ‘noise’ due to incomplete LD is shrunken to zero compared to actual QTL relationships traced by markers. In terms of the BLUP selection index, shrinkage leads to an up-weighting of the own phenotypic value of an individual and down-weighting of phenotypic values of other individuals and thereby reduces the negative impact of spurious signals from misrepresented relationships.
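One way to make this reweighting explicit, for the diagonal target matrix \({\mathbf {T}} = t\, {\mathbf {I}}\) of Methods 1–3 (where \(t\) denotes the average diagonal element of \(\widehat{{\mathbf {A}}}\), a symbol introduced here for illustration) and for fixed variance components, is to insert Eq. 5 into Eq. 3:

$$\begin{aligned} {\mathbf {V}}^*&= \widehat{{\mathbf {A}}}^* \sigma _{a}^{2} + {\mathbf {I}} \sigma _{e}^{2} = (1-\delta ) \sigma _{a}^{2} \widehat{{\mathbf {A}}} + \left( \delta t \sigma _{a}^{2} + \sigma _{e}^{2} \right) {\mathbf {I}}, \\ \hat{{\mathbf {a}}}^*&= (1-\delta ) \sigma _{a}^{2} \widehat{{\mathbf {A}}} {\mathbf {V}}^{*-1} \left( {\mathbf {y}} - {\mathbf {1}} \hat{\mu } \right) + \delta t \sigma _{a}^{2} {\mathbf {V}}^{*-1} \left( {\mathbf {y}} - {\mathbf {1}} \hat{\mu } \right) . \end{aligned}$$

Under these assumptions, the prediction is the sum of two BLUPs: one for a genetic component with covariance \((1-\delta ) \sigma _{a}^{2} \widehat{{\mathbf {A}}}\), which borrows information from other individuals through the marker-based relationships, and one for a component with covariance \(\delta t \sigma _{a}^{2} {\mathbf {I}}\), which is independent across individuals and therefore informed directly only by the individual's own phenotype. Increasing \(\delta\) shifts weight from the first to the second component. In practice, the variance components are re-estimated for each GRM, so this decomposition should be read as a sketch of the mechanism rather than an exact description of the analysis.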

Optimum shrinkage coefficient

By using linear regression, we found that in both population types most of the variation in the optimum shrinkage coefficient \(\delta _{\text {opt}}\) can be explained by the number of markers (Table 2). The number of markers is strongly related to LD, so that in turn, LD is an important influencing factor of \(\delta _{\text {opt}}\). Consequently, if a sufficient number of markers is present to ensure a high level of LD, relationships in the GRM are specified correctly and shrinkage is not required. This corroborates the notion that information about actual relationships conveyed by markers is tightly associated with LD (Yang et al. 2010). LD also strongly impacted the reliability of GEBVs. The lower LD in the UR compared to BP population can explain the generally lower reliability in both TS and PS in the former. The presence of extended linkage blocks due to cosegregation (Frisch and Melchinger 2007; Smith et al. 2008) in biparental populations of doubled haploid lines can explain the higher reliability in the BP compared to the UR population (Habier et al. 2013).

The difference between the maximum reliability \(\rho ^2_{\text {max}}\) obtained using the optimum shrinkage coefficient \(\delta _{\text {opt}}\) and the reliability obtained for Method VR1 can be regarded as the maximum achievable gain in reliability that can be brought about by shrinkage. This gain was generally highest for a low number of markers \(M\) and high heritability \(h^2\), and vice versa (Figs. 1, 2). However, because the focus is on the reliability in the TS, for which phenotypic values are available, any gain in reliability due to shrinkage has to be considered in relation to \(h^2\), which represents the reliability achieved when selecting on the phenotypic values directly. Therefore, although the gain in reliability went up with increasing \(h^2\), the difference between \(\rho ^2_{\text {max}}\) and \(h^2\) went down. Hence, there is a range where \(h^2\) is high enough to allow shrinkage to substantially improve the reliability of GEBVs in the TS relative to the one obtained with Method VR1, but yet low enough to allow \(\rho ^2_{\text {max}}\) to be appreciably higher than \(h^2\). This range is precisely what was termed the “sweet spot” by Endelman and Jannink (2012). In their article, they showed that shrinkage estimation of the GRM using Method EJ can improve the reliability of GEBVs in the TS in an “unstructured” population of 274 maize inbred lines genotyped for 384 markers, where by “unstructured” they implied that the first principal component explained only 5 % of the total variation.

In the PS, regardless of the combination of the parameters \(M\), \(h^2\) and \(N\), shrinkage did not lead to any gain in reliability, i.e., the maximum achievable gain in reliability was essentially zero (Online Resource 1, Table S3). This result corroborates the findings of Endelman and Jannink (2012) that shrinkage did not improve the GEBV reliability for unphenotyped individuals, even for a low number of markers.

Comparison between methods

In our simulation study, the optimum shrinkage coefficient \(\delta _{\text {opt}}\) could be identified because the true genetic values and the QTL were known. For real applications, however, the shrinkage coefficient must be estimated from the data. The regression Method RG would lead to a shrinkage coefficient \(\delta _{\text {RG}}^{\text {QTL}}\) that closely matches \(\delta _{\text {opt}}\) if the QTL were known, which demonstrates that Method RG is in principle the right approach. However, neither the QTL nor their number is known in practice, which is the reason why markers have to be employed as a proxy for QTL. This poses the problem of deciding on the proportion of the sets into which the markers are partitioned, which should best reflect the unknown true proportion between QTL and markers. Our strategy of assuming the number of QTL ranging from a minimum of 5 up to half the number of markers ensured that values \(\delta _{\text {RG}}\) close to \(\delta _{\text {RG}}^{\text {QTL}}\) were achieved for a high number of markers, but it caused \(\delta _{\text {RG}}\) to have a pronounced downward bias relative to \(\delta _{\text {RG}}^{\text {QTL}}\) when \(<\)200 markers were used (Table 3), a threshold that equals the number of QTL we used throughout our simulations. Consequently, Method RG featured shrinkage coefficients close to \(\delta _{\text {opt}}\) for \(M \ge 200\) and thus was one of the best performing methods for both population types. Method effLD had a shrinkage coefficient in good agreement with \(\delta _{\text {opt}}\) for \(M \le 500\), where it showed reliabilities close to \(\rho ^2_{\text {max}}\). However, for more than 500 markers, \(\delta _{\text {effLD}}\) was considerably higher than \(\delta _{\text {opt}}\), which led to shrinkage that was too strong and consequently reliabilities were even lower than those obtained for Method VR1. The same trend was observed for Method adjLD with shrinkage coefficients \(\delta _{\text {adjLD}}\) that were even more exaggerated for a large number of markers. Method EJ is also based on a shrinkage approach, but towards a slightly different target matrix than Methods RG, effLD and adjLD, which is the reason why it cannot be compared to the other methods based on its shrinkage coefficient. The method showed superior performance in the UR population, especially for a low number of markers, but revealed deficiencies in the BP population for low to medium numbers of markers, where it can underperform Method VR1. Method RM is not based on shrinkage, but on a selection index approach (Riedelsheimer and Melchinger 2013). Although critical assumptions of the method are not fulfilled, it shows reasonable performance in both population types for \(M\ge 500\), but is hardly better than Method VR1 for \(M=50\), particularly in the UR population.

In conclusion, our results demonstrate that shrinkage estimation of the GRM can substantially improve the reliability of GEBVs of TS individuals, in particular when the number of markers is low and the heritability is at intermediate values. Of the shrinkage methods evaluated, Method RG was the most promising with superior performance and reliabilities always as high as or higher than those obtained from VR1.

Author contribution statement

DM conducted all simulations and analyses, devised Methods effLD, adjLD and RG, and wrote the manuscript. FT supported the development of the shrinkage concept, contributed software to conduct the simulations and revised the manuscript. AEM initiated and guided the study, did the algebra of the ‘Method RM’ part of the manuscript and revised the manuscript.