Introduction

Single-step best linear unbiased prediction is a method of combining pedigree information and molecular marker data for use in mixed model analyses for the prediction of breeding values in a single analysis. The method overcomes the cumbersome process of analyzing datasets for populations with partial genotyping which previously was accomplished in multiple stages (Guillaume et al. 2008; VanRaden 2008). DNA markers are used to create a realized relationship matrix (VanRaden 2008) for all genotyped individuals which is populated with very accurate pairwise relationship estimates rather than average information used in a pedigree-based numerator relationship matrix (NRM). The information for genotyped individuals (realized relationship matrix) is combined with the average NRM (from pedigree information) that includes relationship information for all individuals in the population (Legarra et al. 2009; Christensen and Lund 2010). Information from the realized relationship matrix is used to adjust the pedigree-based NRM for related individuals resulting in a hybrid, referred to as the “H-matrix.” The H-matrix can then be used in traditional best linear unbiased prediction (BLUP) (Henderson 1975) to predict breeding values. The H-matrix is more dense (less 0 elements) than the pedigree-based NRM and thus can impose computational issues; however, these have been overcome with the approach proposed by Misztal et al. (2009) and used to calculate the genetic value of over 10 M cattle (Aguilar et al. 2010).

There are many methods for creating the realized relationship matrix from DNA markers (VanRaden 2008). Zapata-Valenzuela et al. (2013) used two methods for creating the realized relationship matrix and found no effect on prediction accuracy of breeding values in a clonal population of loblolly pine (Pinus taeda L.); however, when creating the H-matrix, it is important to scale the realized relationship matrix to the pedigree-based NRM (Christensen and Lund 2010; Forni et al. 2011; Meuwissen et al. 2011). Also, integrating both pedigree and DNA marker data into a single analysis helps improve accuracy of predicted breeding values. For example, the single-step method was more accurate for breeding value prediction than pedigree-based BLUP and the multi-step approach in a simulated dataset (Christensen and Lund 2010) and improved the overall accuracy in the genomic evaluation of pigs (Christensen et al. 2012). Ratcliffe et al. (2017) used single-step BLUP in a population of 1694 genotyped white spruce (Picea glauca (Moench) Voss) trees to investigate performance in the context of a tree breeding program for height growth and wood density. The authors created various H-matrices with different proportions of genotyped trees (genotyping effort) and found that trait heritability decreased with increasing genotyping effort, accuracy of the breeding values improved for genotyped trees compared to non-genotyped trees, and tree ranks were comparable between single-step BLUP and traditional pedigree-based BLUP for both tree height and wood density. Similar results were found using nine microsatellites in a population of Pinus sylvestris L. (Korecký et al. 2013) and Eucalyptus grandis W. Hill ex Maiden using DArT markers for tree growth traits (Cappa et al. 2017). However, Klápště et al. (2018) emphasized the need to make corrections to the pedigree using sib-ship reconstruction (Wang 2004; Klápště et al. 2017) prior to scaling the realized relationship matrix, which resulted in better prediction accuracy.

Molecular DNA markers can be used to predict the genetic value of trees at a young age prior to lengthy testing and costly phenotyping which can shorten the breeding cycle and improve overall efficiency by reducing gain per unit time (Resende et al. 2012a; Beaulieu et al. 2014a; Grattapaglia 2017). One cycle of a tree breeding program can often take many years (> 15 years), and researchers have proposed methods of combining molecular markers with other techniques to improve efficiency (Li and Dungey 2018). Genomic selection typically describes any technique for predicting breeding values without prior knowledge about the underlying causal variants. The original idea of genomic selection was to combine high-density marker data with special analytical techniques to predict the genetic value of individuals (Meuwissen et al. 2001). In theory, these models would be driven by the linkage between markers and quantitative trait loci (QTL); however, it has been shown that the high-prediction accuracy is due to relatedness in the population rather than linkage disequilibrium. An important aspect of genomic selection is the ability to capture the Mendelian segregation component of an individual’s breeding value when no phenotypic information is available, and therefore allow the breeder to rank trees within families. There are several analytical techniques for genomic selection in tree breeding including Bayesian models with different prior densities (Resende et al. 2012b; de Almeida Filho et al. 2016; Durán et al. 2017; Ukrainetz and Mansfield 2020), ridge-regression BLUP (Resende et al. 2012a; Beaulieu et al. 2014b; Gamal El-Dien et al. 2015), or substituting the pedigree-based NRM with a realized relationship matrix or H-matrix (Korecký et al. 2013; Klápště et al. 2014; Ratcliffe et al. 2017). The single-step method uses molecular markers to improve pairwise relationship estimates which will improve estimates of variance components; however, it is unclear if the transfer of information from genotyped to non-genotyped trees in the H-matrix will lead to more accurate breeding value predictions and affect prediction accuracy and the ability to rank trees within families.

Lodgepole pine (Pinus contorta Dougl. ex. Loud.) is a commercially and ecologically important species in British Columbia, Canada, and seed for reforestation is derived from wild collections and seed orchards which are based on two cycles of progeny testing. The goal of first-cycle progeny tests was to provide accurate estimates of the genetic value for base parents in order to select top candidates for seed orchards and future breeding. This was done by establishing first-cycle tests using open-pollinated seed collected from plus trees (base parents) in the wild. Parents were ranked based on the performance of progeny for growth and wood quality (Pilodyn pin penetration) and selected for crossing to produce F1s for second-cycle tests. The F1 generation was deployed across five breeding zones. Despite efforts to minimize relatedness when selecting trees for the base population, Ukrainetz and Mansfield (2020) found evidence of some relatedness between two sets of parents selected from the same provenance as well as other errors in the pedigree using single-nucleotide polymorphism (SNP) markers. The structure of the lodgepole pine breeding program makes it ideal for the use of single-step BLUP because both testing cycles are linked through pedigree, and there is substantial overlap in families tested across all breeding zones (Ukrainetz et al. 2018). Single-step BLUP can be used to complement the existing investments made in the breeding and testing program while integrating genotyping information for a portion of the population and correcting for pedigree errors and unknown relatedness.

The aim of this study was to compare the prediction accuracy of single-step BLUP (HBLUP) to equivalent models using a pedigree (ABLUP) and realized relationship matrix (GBLUP) using cross validation. Using cross validation, we assessed the sensitivity of genetic parameters and prediction accuracy of HBLUP to the proportion of trees that were genotyped. We also quantified the accuracy of predicted breeding values across environments (test sites) and testing cycles (environments and ages) and rank correlations within families.

Materials and methods

Sample population

The lodgepole pine breeding program in British Columbia, Canada, is divided into five high-priority breeding zones that have each been subjected to two cycles of testing: the first cycle of testing, planted between 1984 and 1988, focused on ranking base parents using data from open-pollinated progeny tests, while the second cycle, planted between 2002 and 2006, consisted of F1 progeny from crosses between base parents that ranked high for growth and wood density (based on Pilodyn pin penetration measured on trees between 10 and 13 years from planting). Progeny tests were established using a randomized complete block design with 8 to 10 blocks per site and families planted in row-plots within each block. Four sites were selected for the current study based on patterns of GxE and the structure of the breeding program. Specifically, two sites were selected from each test cycle that represented the breeding zones in northern British Columbia. Site characteristics and climate across northern zones are more homogenous than southern zones, resulting in lower levels of GxE. We focused this pilot study on the breeding populations and the testing program in northern British Columbia. For a more comprehensive description of the lodgepole pine breeding program in British Columbia, see Ukrainetz et al. (2018).

The four sites specifically selected for this study were Tachie Road (TACH) and Chowsunket Lake (CHOW) representing the first cycle of testing, and Chief Lake (CHIE) and Grizzly Lake (GRIZ) representing the second cycle of testing. The base parents selected for this study had been selected for breeding and testing in second-cycle tests and were common to the three northern breeding programs. The final sample contained a mix of half- and full-sibs represented by 57 base parents and 42 full-sib families with a status number of 92 calculated as 1/(2 × θ), where θ is the group co-ancestry (Lindgren et al. 1996). Freshly flushing needles were collected from the upper crowns of all sample trees, placed in cryovials, and transported in a vapor tank at − 80 °C to the University of British Columbia, Vancouver, British Columbia, Canada.

Phenotyping

A detailed description of the phenotyping for this project can be found in Ukrainetz and Mansfield (2020). Height growth was measured to the nearest centimeter at age 10 using a height pole, and a 5-mm increment core was extracted from the northern face of each tree in 2014 (2016 at GRIZ). The cores were then dried and later processed using a precision pneumatic saw to produce sections that were exactly 1.67 mm thick. The wood sections were then Soxhlet-extracted overnight using hot acetone and allowed to dry to achieve a moisture content of 7%. Each sample had growth rings from pith to bark, and the radial face was exposed for X-ray densitometry assessment (QTRS-01X, Quintek Measurement Systems Inc., USA). Mean core density (AWD), earlywood density (EWD), and latewood density (LWD) were measured as the mean for the entire core, and latewood proportion (LWP) was determined as the proportion of the latewood width to the width of the growth ring and averaged across the core. Microfibril angle (MFA) was measured on the most recent growth ring (year of core extraction) for each core sample using X-ray diffraction. The 002 diffraction arc was captured using a Bruker D8 Discovery X-ray diffraction unit fit with a general area detector diffraction system (GADDS). The intensity profile was integrated 360° along the 002 diffraction arc and is symmetrical around two peaks which are indicative of the size of the microfibril angle. The distance between the points of intersection between the baseline and tangent lines was determined for each peak and averaged for each sample, and this is referred to as 2 T which is proportional to the microfibril angle of the sample. The 2 T value for each sample was used in subsequent analyses as MFA.

Marker dataset

The methods used for DNA extraction and genotyping are described in Ukrainetz and Mansfield (2020), and a brief description of the marker dataset is presented here. Genotyping was conducted using a fixed-content SNP array (Affymetrix Axiom array) with 51,213 SNPs selected using a sequence capture dataset (Suren et al. 2016; Yeaman et al. 2016) and the 1.01 draft of the loblolly pine (Pinus taeda L.) draft genome for variant calling (Neale et al. 2014). The Axiom Analysis Suite software package (version 4.0.2.4) was used to screen samples and call genotypes. Samples with a dish QC value of less than 0.82 were removed from further analysis (7 samples). SNPs were screened using samples with a call rate > 97% (1198 samples). SNPs with a call rate > 95% and minor allele frequency > 0.05 were retained for further analysis. The final dataset contained 19,584 SNPs from 1569 trees (37 base parents and 1532 descendants) with a mean of 2.1% missing data. The mean heterozygosity of the sample population using the final SNP dataset was 0.30, and 59% of SNPs met the assumptions of Hardy-Weinberg equilibrium (α = 0.05). The SNP dataset was coded as − 1 and 1 for the two homozygotes, and 0 for the heterozygote. Given the low amount of missing data in the final SNP dataset, a simple method for imputing missing values was employed, where missing values were imputed using the rrBLUP package in R and the “mean” imputation option that uses the mean of each marker.

Statistical analysis

Linear mixed model

Mixed model analyses were conducted using the following linear model:

$$ y= X\tau +{Z}_b{u}_b+{Z}_a{u}_a+e $$
(1)

where y is a vector of phenotypic data, X is an incidence matrix for the fixed effects (site), τ is a vector of fixed effects, Zb is an incidence matrix for the random experimental design features (block nested within site or planting series), ub is a vector of random effects for the design features, Za is an incidence matrix for the random additive effects, ua is the vector of additive genetic effects or estimated breeding values (EBVs), and e is a vector of residual effects. The random effects in the model are assumed to be IID and normally distributed with a mean of 0 and the following variance structure:

$$ \left[\begin{array}{c}{u}_b\\ {}{u}_a\\ {}e\end{array}\right]\sim \left(N\left[\begin{array}{c}0\\ {}0\\ {}0\end{array}\right],\left[\begin{array}{ccc}{G}_b& 0& 0\\ {}0& {G}_a& 0\\ {}0& 0& R\end{array}\right]\right) $$
(2)

The variance of random effects for the experimental design features is

$$ \mathit{\operatorname{var}}\left({u}_b\right)={G}_b=\left[\begin{array}{ccc}{\sigma}_{b1}^2& \cdots & 0\\ {}\vdots & \ddots & \vdots \\ {}0& \cdots & {\sigma}_{bj}^2\end{array}\right]\bigotimes {I}_n $$
(3)

where j refers to block nested within site (for second-cycle tests) or planting series (for first-cycle sites), In is a diagonal incidence matrix for n trees, and ⨂ refers to the direct product of two matrices. A heterogenous (co)variance model was used to model additive variance across sites. The variance of EBVs is defined as

$$ \mathit{\operatorname{var}}\left({u}_a\right)={G}_a=\left[\begin{array}{cccc}{\sigma}_{a1}^2& {\sigma}_a& \cdots & {\sigma}_a\\ {}{\sigma}_a& {\sigma}_{a2}^2& & \\ {}\vdots & & \ddots & \\ {}{\sigma}_a& & & {\sigma}_{as}^2\end{array}\right]\bigotimes A $$
(4)

where \( {\sigma}_{as}^2 \) is the additive variance for site s, σa is the additive covariance among sites, and A is the numerator relationship matrix (Henderson 1975). This model produces a predicted BLUP for each tree at each site, and the mean BLUP across all four sites was used as the individual tree BLUP. The variance of residuals is

$$ \mathit{\operatorname{var}}(e)=R=\left[\begin{array}{cccc}{\sigma}_{e1}^2& 0& \cdots & 0\\ {}0& {\sigma}_{e2}^2& & \\ {}\vdots & & \ddots & \\ {}0& & & {\sigma}_{es}^2\end{array}\right]\bigotimes {I}_n $$
(5)

where In is a diagonal incidence matrix for n trees and \( {\sigma}_{es}^2 \) is the residual variance component for site s. For BLUP calculations, G = Gb ⨁ Ga, Z = Zb ⨁ Za, and ⨁ is the direct sum.

Calculating the G-matrix

The realized relationship matrix (G) was created using the first method proposed by VanRaden (2008). Let M be a matrix of genotypes (SNP markers) with the dimensions n × m, where n is the number of trees and m is the number of SNP markers, coded − 1, 0, and 1 for the homozygote, heterozygote, and opposing homozygote, respectively. Matrix P has the same dimensions as matrix M (n × m) and is a matrix of allele frequencies with the values in each column equal to 2(pk − 0.5) where pk is the allele frequency of the second allele at locus k. Then, subtracting P from M gives Z, a matrix with allele effects set to 0 and the realized relationship matrix is then calculated as

$$ G=\frac{Z{Z}^{\prime }}{2\sum {p}_k\left(1-{p}_k\right)} $$
(6)

The denominator ensures that G is scaled analogous to the NRM A.

Calculating the H-matrix

The H-matrix was calculated according the original method proposed by Legarra et al. (2009) as modified by Christensen et al. (2012). The A matrix was organized according to trees with and without SNP data:

$$ \mathbf{A}=\left[\begin{array}{cc}{\mathbf{A}}_{\mathbf{11}}& {\mathbf{A}}_{\mathbf{12}}\\ {}{\mathbf{A}}_{\mathbf{21}}& {\mathbf{A}}_{\mathbf{22}}\end{array}\right] $$
(7)

where, A11 and A22 are submatrices of A for non-genotyped and genotyped trees, respectively, and the submatrices A12 and A21 are the relationships between genotyped and non-genotyped trees. The H-matrix is then constructed to accommodate genomic relationships according to:

$$ \mathbf{H}=\left[\begin{array}{cc}{\mathbf{A}}_{11}+{\mathbf{A}}_{12}{\mathbf{A}}_{22}^{-1}\left({\mathbf{G}}_{\mathbf{w}}-{\mathbf{A}}_{22}\right){\mathbf{A}}_{22}^{-1}{\mathbf{A}}_{21}& {\mathbf{A}}_{12}{\mathbf{A}}_{22}^{-1}{\mathbf{G}}_{\mathbf{w}}\\ {}{\mathbf{G}}_{\mathbf{w}}{\mathbf{A}}_{22}^{-1}{\mathbf{A}}_{21}& {\mathbf{G}}_{\mathbf{w}}\end{array}\right] $$
(8)

Prior to constructing H, the G matrix was scaled so that G and A11 were compatible according to

$$ {G}_a=\beta G+\alpha $$
(9)

where β and α solve the following system of equations:

$$ {\displaystyle \begin{array}{c} Avg\left(\mathit{\operatorname{diag}}(G)\right)\beta +\alpha = Avg\left(\mathit{\operatorname{diag}}\left({A}_{11}\right)\right)\\ {} Avg(G)\beta +\alpha = Avg\left({A}_{11}\right)\end{array}} $$
(10)

Finally, Ga was weighted in order to avoid problems with inversion according to Aguilar et al. (2010):

$$ {\mathbf{G}}_{\mathbf{w}}=0.95\ast {\mathbf{G}}_{\mathbf{a}}+\left(1-0.95\right)\ast {A}_{22} $$
(11)

Throughout this paper, we refer to BLUP analysis using the traditional pedigree-based NRM (A) as ABLUP, the realized relationship matrix (G) as GBLUP, and the hybrid matrix (H) as HBLUP. Variance components were estimated using REML, and all analyses were conducted in ASReml-R (version 4.1.0.90).

Cross validation and evaluation

The effect of the proportion of genotyped trees in the H-matrix on genetic parameters was assessed by randomly sampling trees to be included in Gw. For each trait, a random sample (set) of 20%, 40%, 60%, and 80% of the trees were included in Gw and this was repeated 8 times for each sampling scheme and trait. Narrow-sense heritability was calculated as:

$$ {h}^2=\frac{\sigma_a^2}{\sigma_a^2+{\sigma}_e^2} $$
(12)

where \( {\sigma}_a^2 \) and \( {\sigma}_e^2 \) are the additive and residual variance components, respectively, averaged across the four sites. Furthermore, we assessed the effect of the proportion of trees in Gw on predictions of (G)EBVs using cross validation (CV1). CV1 was used to show the prediction accuracy in a closed breeding population with close relatedness between the training and validation populations. For each set (20%, 40%, 60%, and 80%), three replicates were run with a unique H-matrix for each, and for each set and replicate, three validation populations were created by randomly sampling 20% of the trees and removing phenotypic data. Trees in the validation set can be genotyped or not genotyped. Therefore, there were 36 models run for each trait (4 sets × 3 replicates × 3 validation populations). Prediction accuracy of ABLUP and GBLUP for CV1 was assessed by creating five validation populations by randomly sampling 20% of the trees and removing phenotypic data (5 models per trait).

An important application of GS is to predict breeding values for special populations of trees that have been grown under different biotic and abiotic conditions (environments). For CV2, the H-matrix was created assuming that the training population (3 sites) has been phenotyped but not genotyped, and that a special population exists that has been genotyped but not phenotyped (validation population). The validation and training populations are connected through pedigree, but not environment. This was compared to the ABLUP model (no genotyping) and GBLUP model (all trees were genotyped). HBLUP, ABLUP, and GBLUP were run so that each site was used once as the validation set. Furthermore, we show the change in prediction accuracy when a proportion of trees in the training population was genotyped. A random sample of 40% of the trees in the training populations was included in the Gw portion of the H-matrix (Eq. 8). Therefore, we report the prediction accuracy when the training and validation populations are connected by average relatedness estimates or a proportion with realized relatedness estimates based on SNP data. We refer to the model with 40% of the training population included in the Gw portion of the H-matrix as HBLUP40.

Finally, we assess the performance of HBLUP to predict breeding values across testing cycles (CV3) which indicates the ability to predict breeding values for mature traits measured in the first test cycle for young trees. For CV3, the H-matrix was composed of trees in one test cycle with phenotypic data but no SNP data (training population), and the validation population had SNP data but no phenotypic data. The HBLUP model was again compared to ABLUP (no genotypes) and GBLUP (all trees are genotyped). ABLUP, GBLUP, and HBLUP models were run so that each test cycle was used as the validation population once (2 models per trait). Similar to CV2, we randomly sampled 40% of the trees in the training population to be included in Gw (genotyped trees) to show the change in prediction accuracy (HBLUP40).

For both CV2 and CV3, the heterogenous variance model in Eq. 4 was replaced with a homogeneous variance model where \( {\sigma}_{a1}^2={\sigma}_{a2}^2={\sigma}_{as}^2 \) and Eq. 5 simplifies to \( R={\sigma}_e^2{I}_n \). We assume that GEBVs from GBLUP are the best approximation of the true breeding value and prediction accuracy is calculated as the Pearson correlation between the (G)EBVs in the validation population and GEBVs from GBLUP with full phenotypic data. ANOVA were conducted using the “aov” method in R and the “drop1” method to determine Type III sums of squares and F-tests. Within-family rank correlations were determined for CV3 as the Spearman correlation between the predicted breeding values of trees within half-sib (first-cycle tests) and full-sib (second-cycle tests) families, and the true breeding value for trees in the validation population.

Results

Heritability and Type B genetic correlations

Heritability estimates were highest for AWD (ranging from 0.47 to 0.57) and lowest for HT (ranging from 0.24 to 0.27) regardless of the model (Table 1). The estimates for all other traits fell between those for AWD and HT. Across the three models, ABLUP had the highest heritability estimates for AWD, EWD, and LWD; GBLUP had higher estimates for MFA and LWP; and the estimate for HT was highest using HBLUP. Moreover, as the proportion of genotyped trees in the H-matrix increased from 20 to 80%, the heritability estimates decreased for AWD, EWD, and LWD, while the trends were inconsistent for LWP, MFA, and HT (Fig. 1a).

Table 1 The heritability estimates for linear mixed model analyses using a pedigree-based relationship matrix (ABLUP), a hybrid matrix of pedigree and realized relationships (HBLUP), and a realized relationship matrix employing SNP data (GBLUP). Standard errors of the heritability estimate are presented in brackets and values for HBLUP are average across all sets (20%, 40%, 60%, and 80% of trees genotyped) and replicates (8 per set)
Fig. 1
figure 1

The heritability for each trait (a) and Type B genetic correlation (b) for varying levels of genotyped trees included in the H-matrix (20%, 40%, 60%, and 80%). ABLUP uses a relationship matrix with 0% genotyped trees, and GBLUP uses the realized relationship matrix with 100% of trees genotyped. Error bars are the mean standard error of the heritability estimates averaged across replicates and validation populations (a), the standard error of the Type B genetic correlations for ABLUP and GBLUP, or standard error of the mean for HBLUP (b)

All wood quality traits (AWD, EWD, LWD, LWP, and MFA) had high Type B genetic correlations (ranging between 0.77 and 1.0) while HT had the lowest ranging from 0.46 to 0.54 across the three models (Table 2). The estimates from the GBLUP models were generally lower than from other models, except for HT and EWD. For AWD, EWD, and LWP, adding a small number of genotyped trees to the H-matrix resulted in slightly higher Type B genetic correlations; however, they decreased as the extent of genotyping increased (Fig. 1b). The Type B correlations were very high and stable across all levels of genotyping effort for MFA. For EWD, the Type B correlations increased to a maximum at 80% genotyping effort, and the trends across levels of genotyping were inconsistent for HT with a maximum at 20% and 60% (Fig. 1b).

Table 2 Type B genetic correlations for linear mixed model analyses using a pedigree-based relationship matrix (ABLUP), a hybrid matrix of pedigree and realized relationships (HBLUP), and a realized relationship matrix employing SNP data (GBLUP). For the HBLUP models, Type B genetic correlations were average across all sets (20%, 40%, 60%, and 80% of trees genotyped) and replicates (8 per set) and standard errors of the mean are presented in brackets. For the ABLUP and GBLUP models, the standard error of the Type B genetic correlation estimates is presented in brackets

Prediction within a closed population

The CV1 scenario is an example of a sampling scheme with good pedigree connections and genotyping across the training and validation populations and across environments. The prediction accuracy was not significantly different among the three models (F = 1.35; p = 0.26). GBLUP had slightly higher accuracy (0.82) than ABLUP (0.76) and HBLUP (0.78). For HBLUP, there was no difference in the prediction accuracy when more genotyped trees were added to the H-matrix (F = 1.01; p = 0.39). Although not significant, there was a trend of increasing prediction accuracy for genotyped trees in the H-matrix from 20 to 80% genotyping effort (Table 3). For non-genotyped trees, the prediction accuracies for all HBLUP models were the same as ABLUP. The largest increases in accuracy occurred between 20 and 40% (0.77 to 0.79) and between 80 and 100% (0.80 to 0.82) genotyping effort. There was no effect on the prediction accuracy of non-genotyped trees.

Table 3 Prediction accuracy among wood and tree growth traits for varying levels of genotyping effort (CV1) and linear mixed models using different relationship matrices

Prediction across environments and test cycles

For CV2 (prediction across test sites), there was no significant variation among mean prediction accuracy for the four models (F = 0.14; p = 0.94), but there was significant variation among the traits (F = 11.3; p < 0.001) and between the training and validation populations (F = 66.0; p > 0.001). The prediction accuracy for the HBLUP model (no genotyping in the training population) ranged from 0.46 for HT at the CHOW and MFA at TACH to 0.85 for LWP at GRIZ (Table 4). The prediction accuracies from the HBLUP40 model were higher for all validation populations and traits except LWP at CHIE, LWD at GRIZ, and EWD at CHOW (Table 4). Prediction accuracy ranged from 0.49 for MFA at TACH to 0.89 for LWP at GRIZ.

Table 4 Prediction accuracy for HBLUP when phenotypic data is available for three sites (training population) and SNP data is available for the fourth site (validation population), or when 40% of the trees in the training population have both genotype and phenotype data (HBLUP40)

For CV3 (prediction across test cycles), there was significant variation in mean prediction accuracy between the two test cycles (F = 116.4; p = <0.001) and six traits (F = 30.9; p < 0.001). Prediction accuracy varied little among the models (0.59–0.62) and differences were not significant (F = 1.37; p = 0.32). The prediction accuracy was equal to or higher for HBLUP40 compared to HBLUP across all traits, and validation sets except MFA using trees in second-cycle tests as the validation set (Table 5). The prediction accuracy was higher when using older first-cycle tests to predict breeding values for the younger second-cycle tests with an H-matrix (HBLUP and HBLUP40) for all traits, except MFA (Table 5).

Table 5 Prediction accuracy for HBLUP when phenotypic data is available for one test cycle (training population) and SNP data is available for the second test cycle (validation population), or when 40% of the trees in the training population have both genotype and phenotype data (HBLUP40)

It is not possible to rank trees within families using the ABLUP models. The high prediction accuracies from ABLUP are driven by variation among families in the population. All trees within families are given the same mid-parent value when using ABLUP to predict breeding values for trees without phenotypic data. The rank correlations for trees within families (comparing the ability of each model to rank trees within open-pollinated and full-sib families) were much higher for GBLUP than HBLUP (0.38 compared to 0.08), while rank correlations for the HBLUP40 model were consistently between those for HBLUP and GBLUP. For HBLUP, the rank correlations were higher when trees in the second-cycle were used to predict breeding values for trees in the first cycle; however, this was the opposite for HBLUP40 and GBLUP across all traits, except MFA (Fig. 2).

Fig. 2
figure 2

The mean within-family Spearman rank correlation when predicting breeding values across test cycles (CV3) for average wood density (a), earlywood density (b), latewood density (c), latewood proportion (e), microfibril angle (d), and tree height at age 10 (f). The error bars are the standard error of the mean

Discussion

Prediction accuracy and genetic parameters

Marker data can be used to precisely estimate relatedness between individuals, correct for errors in the pedigree, and account for underlying population structure that cannot otherwise be detected. GBLUP uses a realized relationship matrix with pairwise estimates based on marker data and is the closest approximation to the true breeding value; however, the G matrix is dense (no 0 elements) requiring extensive computer resources and time-consuming analyses for large datasets and DNA marker data for all trees in the population. The single-step approach (HBLUP) provided a nice framework for integrating DNA marker data for portions of the test population into the NRM used in BLUP analyses. The models were a compromise between having no marker data (ABLUP) and full marker data (GBLUP) with similar genetic parameters (heritability and Type B genetic correlations) and prediction accuracy in all cross-validation scenarios. The within-family rank correlations for HBLUP using results from the CV3 scenario (prediction across test cycles) were very low indicating that the predicted values were almost useless for ranking trees. However, when 40% of the trees in the training population were genotyped and phenotyped, the mean rank correlations increased considerably (HBLUP = 0.08; HBLUP40 = 0.25). The high prediction accuracies for ABLUP and HBLUP models were driven by the variation across families in the validation populations of the different cross-validation scenarios, but little benefit was transferred from genotyped trees to non-genotyped trees through the pedigree that would help capture the Mendelian segregation within families. The clear advantage of using HBLUP is for improving estimates of genetic parameters which will provide more accurate breeding values and can be used to rank trees within families when related trees in the training population are also genotyped.

Estimates of additive genetic variance are prone to error due to underlying genetic structure in the population that cannot be detected without marker data. Wild open-pollinated seed collections from parent trees are assumed to be half-sibs (coefficient of relatedness of 0.25) but are likely more closely related resulting in inflated heritability estimates and breeding values for some traits. Determining relationships based on marker data can help correct these issues and consequently results in more precise estimates of genetic parameters and breeding values. Ratcliffe et al. (2017) used varying proportion of genotyped trees in the H-matrix and showed that HBLUP and GBLUP reduced the inflated genetic parameters of ABLUP which is consistent with the results from our study. However, Klápště et al. (2018) compared HBLUP results using an H-matrix weighted with the regular pedigree and one weighted with a corrected pedigree (Wang 2004; Korecký et al. 2013) and found that heritability estimates were similar, but prediction accuracy increased slightly when using the corrected pedigree.

Models that use DNA markers to predict breeding values (genomic selection models) work well within closed populations (CV1) with low effective population size and within environments with little genotype-environment interaction (Grattapaglia 2017). Using a heterogenous variance model and accounting for additive covariance among test environments, we showed that prediction accuracy was > 0.76 when using HBLUP. Ratcliffe et al. (2017) showed that the theoretical prediction accuracy for genotyped trees increased as more genotyped trees were added to the H-matrix, but not for non-genotyped trees. We found a small increase in prediction accuracy using cross validation within a closed breeding population and within the same environments (CV1) for genotyped trees (Table 3) but once again showed no effect on the accuracy associated with non-genotyped trees. Although using a realized relationship matrix is the best approach, in reality, only a small portion of the breeding population will have marker information that can be used to correct the pedigree, and HBLUP was a good compromise between ABLUP and GBLUP. Prediction accuracy increased little after adding more than 40% genotyped trees.

Prediction across environments and test cycles

Implementation of HBLUP in tree breeding programs will employ a training population from one set of environments to predict breeding values for trees grown in a different environment, such as a different field site or a nursery. Therefore, it is important to understand the effect of GxE on prediction accuracy. Growth traits are more sensitive to GxE than wood traits (Baltunis et al. 2010), and there is substantial GxE for height growth across the lodgepole pine breeding program in British Columbia (Ukrainetz et al. 2018) which is consistent with the results presented here. Wood traits were less affected by environmental variation and had higher Type B genetic correlations indicating less GxE. Height had the lowest Type B correlation coefficients, and tree ranks for growth are expected to be affected by the environment. The prediction accuracy of wood traits in different environments was high with few exceptions. Despite the low Type B correlations for height, the prediction accuracy estimates were above 0.65 for all validation populations with the exception of CHOW. These results lend confidence that with an appropriate (co)variance model to account for environmental variation, prediction accuracy for HBLUP can be high when applied to different environments.

Predicting breeding values using markers can improve efficiency in breeding programs by facilitating more accurate estimates of breeding values for young trees using phenotypic data collected on mature trees (Resende et al. 2012a; Beaulieu et al. 2014a). However, phenotypic data must be collected in the training population at an appropriate age for the target trait because low age-age correlations can affect the prediction accuracy (Thistlethwaite et al. 2017). Tree breeding programs deploy testing cycles as programs advance, but older tests are often available for long periods of time, making them ideal targets as training populations for traits that take many years to mature. We showed that prediction accuracy was high for all wood traits, except MFA. The wood traits from first-cycle tests included more growth rings and a more comprehensive assessment of wood density and latewood proportion that more closely resembles what would be expected at rotation age. Despite the different ages between the test cycles, prediction accuracies were high. MFA is highly affected by maturity of the wood (Mansfield et al. 2007) and changes as the tree matures (Mansfield et al. 2009), and given that data was collected at a single point representing the oldest growth ring for each core sample, the low prediction accuracies for MFA between testing cycles likely reflects the effect of low age-age genetic correlations for this trait. HT was collected at age 10 at all test sites, and the lower prediction accuracies were due to the effect of GxE. The Type B genetic correlation for MFA was high and approached unity indicating very little GxE for this trait across the four sites tested. With such low GxE, we would expect high prediction accuracy for CV2 (prediction across sites) yet the values were of the same magnitude as other traits with lower Type B correlations. The high estimates of Type B genetic correlations likely indicate a problem fitting a model to the MFA data, and the results should be interpreted with caution.

Single-step BLUP in tree breeding

The single-step approach for calculating breeding values is ideal for integrating new genotyping technologies that offer a variety of benefits with the extensive investment made in long-term field trials by tree breeding programs. The H-matrix provides equivalent breeding value estimates to ABLUP and GBLUP and corrects for inflated additive variance estimates due to underestimated average relationship parameters used in the numerator relationship matrix (Ratcliffe et al. 2017; Klápště et al. 2018).

The H-matrix uses genomic marker data, through the realized relationship matrix, to adjust the relationships throughout the entire pedigree, resulting in more accurate estimates of relatedness between individuals, better estimates of variance components, and more precise predictions of breeding values within families (Legarra et al. 2009). One of the benefits of incorporating genotyping data into mixed models is the ability to better estimate the Mendelian segregation within family which makes it possible to rank trees within families even for trees without phenotypic data, although this requires all trees be genotyped. Although there was low value in using HBLUP to rank trees within families when predicting between test cycles, there was an effect on heritability and Type B genetic correlations with even a small number of genotyped trees included in the H-matrix. In order to be effective at ranking trees within families, there must be a significant proportion of related trees (40% in this study) with both genotype and phenotype data in the H-matrix. Future research still needs to be done to find the optimum number of genotyped trees and sampling for building the H-matrix.

There was a small increase in the theoretical prediction accuracy of breeding values for genotyped trees in both height and wood density as genotyping effort increased for Picea glauca (Ratcliffe et al. 2017). However, there was no effect on the theoretical prediction accuracy of maternal or progeny without genotype data. Results from the current study are consistent with Ratcliffe et al. (2017) and show a small, but non-significant, improvement in prediction accuracy with increasing genotyping effort, but no improvement for non-genotyped trees (Table 3). The effect of increasing genotyping effort on heritability and Type B genetic correlations was unique to each trait. For AWD, EWD, and LWD, there was a decrease in heritability as more genotyped trees were added to the H-matrix and this trend was most likely a result of increased precision in pairwise relatedness estimates throughout the relationship matrix resulting in correction of inflated heritability estimates when using average coefficients of relatedness in the pedigree-based NRM (Fig. 1). The change in heritability with increasing genotyping effort was less clear for LWP, MFA, and HT which was likely due to the way variation was partitioned between additive, microsite (blocks within site), and residual effects. For most sites, additive variance estimates decreased as the proportion of genotyped trees increased in the H-matrix; however, the effect on heritability depends on the allocation of the excess variation to the other variance components (blocks and residuals). For some traits (LWP, MFA, and HT), the decrease in additive variance did not result in a corresponding decrease in heritability because the variance no longer attributed to additive effects was captured by block effects rather than being added to the residual variance which resulted in relatively stable heritability estimates across models (Table 1). Breeders planning to incorporate HBLUP into tree breeding programs should expect an equivalent model to ABLUP with slight improvements to the breeding value predictions for genotyped trees and better estimates of variance components. It is no surprise that HBLUP models are still sensitive to GxE which must be considered carefully when calculating breeding values. Tree breeders can also expect good predictions between testing cycles that are connected through pedigree but occur in different environments, which allows prediction of mature traits on young trees or seedlings.

Comparison of single-step BLUP and genomic selection

One advantage of the single-step method is that it considers all phenotypic and genotypic data simultaneously in a single analysis. Genomic selection methods require marker data for all individuals in the population in order to predict GEBVs and a multi-step approach to make use of data from non-genotyped individuals. Computational efficiency becomes an issue for programs with considerable phenotypic data and large marker datasets, but Misztal et al. (2009) provided an approach for the single-step method and large datasets, while Aguilar et al. (2010) showed that the single-step approach was superior to the multi-step approach when using an extremely large number of individuals, such as 10 M cattle records. The single-step method can be used to predict breeding values for trees without phenotypic data but should be viewed as an alternative to ABLUP rather than a genomic selection alternative. Genomic selection can be used to improve the efficiency of breeding programs by accurately predicting the breeding value of trees from marker data and eliminating costly and time-consuming testing (Beaulieu et al. 2014b; Resende et al. 2017). Breeding value predictions must be accurate in order to achieve gains with selection, and cross validation is a good approach to estimate accuracy using different scenarios. The prediction accuracies from our work for genotyped trees are consistent with prediction accuracies for genomic selection studies in other conifers for wood and growth traits using similar cross-validation methods (Zapata-Valenzuela et al. 2013; Beaulieu et al. 2014b; Durán et al. 2017).

Conclusions

The single-step method (HBLUP) for combining pedigree and genomic marker data into a single relationship matrix can improve the estimation of genetic parameters thereby improving breeding value estimates, but should not be used as a method for predicting the genetic value of trees in the absence of phenotypic data as has been proposed for other molecular-based models. The inflated heritabilities for some traits were reduced when marker data was incorporated into the NRM and approached those for GBLUP with full marker data. Prediction accuracy for cross-validation across test environments and test cycles of different ages were high and similar to those reported in the literature for genomic selection models; however, the ranking of trees within families was poor (< 0.25) unless a significant proportion of trees with (training population) and without (validation or selection population) phenotypic data is genotyped. Even a small amount of genotyping (20% of trees in the population) was enough to affect prediction accuracy, heritability, and Type B genetic correlations. HBLUP should be regarded as an effective method for combining pedigree and marker data allowing breeders to fully utilize the investments made in testing, phenotyping, and genotyping, ultimately resulting in more precise estimates of genetic parameters and breeding values.