Introduction

The selection methodology has strongly evolved in plants by incorporating molecular information and developing the concept of marker-assisted selection (Bernardo 2008) to optimize ideotype creation (Donald 1968). Molecular markers have been used to define quantitative trait loci (Robbins et al. 2002), but this inefficient approach (Jannink 2010) was supplanted by genomic selection (GS) (Meuwissen et al. 2001), particularly since the introduction of low-cost high-speed sequencing methods.

Genomic selection (GS) method was quickly implemented in animal breeding (Luan 2009; Hayes and Goddard 2010; Meuwissen and Goddard 2010) and more recently in plants (Heffner et al. 2009; Heffner et al. 2010). GS is very beneficial for perennial crops because the length of the selection cycle can be significantly shortened by the selection of individuals at the juvenile stage using genomic markers to predict the genetic value of adults, in forest trees for example in pines (Isik et al. 2016) or Eucalyptus (Grattapaglia and Resende 2010; Grattapaglia 2014; Bouvet et al. 2015) or oil palm (Cros et al. 2015). Simulation and experimental studies in the context of perennial plants showed that GS increased the genetic gain per unit of time (Grattapaglia and Resende 2010; Denis and Bouvet 2011, 2013).

Most GS studies carried out in plant only considered the single-trait approach and very few have addressed the potential of GS with a multi-trait approach. The accuracy of multi-trait genomic selection (MT-GS) using different methods has first been addressed by simulation (Calus and Veerkamp 2011). Jia and Jannink (2012) showed that the predictive accuracy of a trait that has low heritability can be increased by MT-GS when traits correlated with high heritability are available. Similar results were obtained for perennial crops like oil palm (Marchal et al. 2016) and other plants (Guo et al. 2014).

Additional studies are therefore needed in plants to explore the potential of MT-GS in different breeding contexts. Among the numerous perennial crops of interest in this regard, the genus Eucalyptus is particularly interesting. Widely planted in the tropical and Mediterranean zones, this genus has contributed significantly to the production of pulpwood and fuel wood, with more than 20 million hectares of plantations (Vigneron and Bouvet 1997), and the development of numerous breeding programs stresses the need to implement new selection methods to optimize it.

We implemented our study with Eucalyptus robusta, a species planted in Madagascar for fuel wood and covering about 150,000 ha of plantations managed by smallholders. This species is particularly important since 80% of the population uses its wood for domestic purposes. Two main types of traits are important to consider: volume and two chemical components of wood, namely lignin and cellulose (Higuchi 1997). The place of these traits in selection criteria is critical according to the selection goal. On one side, if the objective is to increase the fuel wood quality, more lignin is recommended, on the other side, if pulp production is the target more cellulose is needed.

Moreover, this species was introduced 100 years ago in Madagascar (Verhaegen et al. 2011); the first results concerning genetic diversity estimated by molecular markers suggested that the seed origin was from a single Australian zone and that the level of relatedness was high within plantations (Daniel Verhaegen, personal communication). One of the best strategies for future plantations would therefore be to put in place a new improvement program using a broader genetic base and innovative breeding methods. The selection method is crucial for establishing the base population of parents that will favorably recombine their genes during subsequent generations. As observed for other Eucalyptus species (Hung et al. 2015), multi-trait selection is one option that can lead to Eucalyptus plantations with improved productivity and wood quality.

The main objective of this study was to evaluate the performance of multi-trait genomic selection (MT-GS) in the context of genetic improvement of Eucalyptus robusta by (i) analyzing the variability of growth and chemical properties of wood within a provenance trial covering the distribution area of E. robusta in Australia, (ii) identifying the best linear mixed model incorporating genomic data for genetic value prediction, and (iii) comparing single- and multi-trait-based selection.

Materials and methods

Population studied

We conducted the present study with a Eucalyptus robusta provenance trial located in the central eastern part of Madagascar at the Mahela Forest Station (19 ° 01 ′16.4′ ′S; 48 ° 57′ 31.7 ″E). It was established in 1993 and included 19 Australian provenances sampled in the entire area of natural distribution of the species (see the identity and geographical information of their origin in Table 1). The experimental design consisted of 18 unbalanced blocks of 15 to 19 plots, each plot representing a single provenance. The plots were randomly distributed into the blocks. Originally, the total number of plots was 324 and each plot was planted with 16 (4 × 4) trees from the same provenance (5184 trees in total). After successive thinnings, 1314 trees remained within the trial, representing an approximate selection rate of 25% in each provenance. Among those 1314 individuals, 415 were randomly retained for measurement, representing on average 22 trees per provenance. The trees were genotyped by single nucleotide polymorphism (SNP) markers (see below) and 395 were measured by near infrared spectroscopy to estimate their wood chemical components.

Table 1 Identity and geographical origin on the E. robusta provenances established in the Mahela trial in Madagascar

Variables studied

We considered three critical traits for the production of fuel wood or pulpwood: stem volume at 49 months (V49), the total lignin (TL) and the holo-cellulose content (Holo). The volume was calculated as a function of the height (H) and the stem circumference at breast height (C) of trees with the formula: V = (0.3 × (H-130) + 130 × 4 × π × C) / 1000. In this case, the volume is expressed in dm3. The age considered was relatively young, 49 months, to minimize the important environmental effect on growth traits due to wind damage after 60 months. Chemical properties, including TL and Holo, were analyzed at the age of 20 years by near infrared spectroscopy on the 395 trees. For chemical predictions, we used existing near infrared spectroscopy models of multispecies Eucalyptus that included samples from this study (Denis et al. 2013).

Molecular markers

We used SNP markers. This information was then integrated into different models to predict genetic value.

DNA was extracted from the leaves of 415 trees that were genotyped using the DArTseq technology (Wenzl et al. 2004). Of the 30,000 SNPs identified, 2919 were selected on the basis of repeatability, a minor allele frequency higher than 2.5%, and no missing data.

Statistical models associating phenotypes and genotypes

A first approach consisted in detecting the models that best fitted the data. Except for the general mean, all effects were considered random (block, provenance, individual within provenance, and residual). The plot effect was not explicitly included in the model as plots were only represented by two individuals on average.

The different models were tested according to a hierarchical approach. The models were compared according to the Akaike information criterion (Akaike 1974). Three models have shown Akaike information criteria that are very close and stand out from the other models tested (see the comparison of models in supplementary material no. 1a). These are the mixed linear model with the random “block” and “provenance” effects (P model), the model with the random “block” and “individual” effects (I model) and the model with the random “block”, “provenance” and “individual” (PI model). Given the small differences between these three models (see the comparison of models in supplementary material no. 1a), we decided to test only the P and I models in the multi-trait selection. It was important to consider the P model in the following analyses as a baseline method, because in Madagascar, provenance selection and the reintroduction of the best provenances remain a very efficient option to get genetic gain.

We then tested three methods of predicting the individual genetic value associating the “block” and the “SNP marker” effects of each individual. These are: elastic net (EN), genomic best linear unbiased predictor (GBLUP) and reproducing kernel Hilbert space (RKHS) (Jacquin et al. 2016). The results showed that these models are very similar in terms of the quality of the prediction (see supplementary material no. 1b). We finally used the GBLUP method, because it performs best in many GS studies (Cros et al. 2015). Moreover, its programming for a multivariate approach is facilitated by ASReml software (Gilmour et al. 2009). Following these preliminary analyses, the two linear mixed models are presented as follows.

The provenance model:

$$ {\mathbf{y}}_{\mathbf{i}}={\upmu}_{\mathbf{i}}+{\mathbf{X}}_{\mathbf{i}}\ {\upbeta}_{\mathbf{i}}+{\mathbf{Q}}_{\mathbf{i}}\ {\uplambda}_{\mathbf{i}}+{\upvarepsilon}_{\mathbf{i}}\ \left(P\ \mathrm{model}\right) $$

The individual model:

$$ {\mathbf{y}}_{\mathbf{i}}={\boldsymbol{\upmu}}_{\mathbf{i}}+{\mathbf{X}}_{\mathbf{i}}\ {\boldsymbol{\upbeta}}_{\mathbf{i}}+{\mathbf{Z}}_{\mathbf{i}}\ {\mathbf{u}}_{\mathbf{i}}+{\boldsymbol{\upvarepsilon}}_{\mathbf{i}}\ \left(\mathrm{I}\ \mathrm{model}\right) $$

where yi (415 × 1) is the vector of phenotypic values for trait, μi is a vector corresponding to the general mean of the trait i, βi (18 × 1)~N(0, Id σ2b) is the random block effect, λi (19 × 1)~N(0, Idσ2prov), is the vector of random provenance effect, ui (415 × 1)~N(0, σ2aA) is the vector of random individual additive effect where, A is the relationship matrix among individuals (estimation is given in the next paragraph), and εi (415 × 1) ~N(0, Id σ2r) is the random residual effect. X i (415 × 18),Q i (415 × 19), and Z i (415 × 415) are the incidence matrices for “block”, “provenance”, and “individual” effects, respectively and Id is the identity matrix.

The formulation of the single and multi-trait models is given in matrix form as an example for the I model:

  • Individual model with single-trait selection (I-ST) for trait 1:

$$ {\boldsymbol{y}}_1={\upmu}_{\mathbf{1}}+{\boldsymbol{X}}_{\mathbf{1}}{\boldsymbol{\beta}}_{\mathbf{1}}+{\boldsymbol{Z}}_{\mathbf{1}}{\boldsymbol{u}}_{\mathbf{1}}+{\boldsymbol{\upvarepsilon}}_{\mathbf{1}} $$
  • Individual model with bivariate selection (I-MT2) for traits 1 and 2:

$$ \left[\begin{array}{c}{\boldsymbol{y}}_1\\ {}{\boldsymbol{y}}_2\end{array}\right]=\left[\begin{array}{c}{\boldsymbol{\mu}}_1\\ {}{\boldsymbol{\mu}}_2\end{array}\right]+\left[\begin{array}{cc}{\boldsymbol{X}}_1& 0\\ {}0& {\boldsymbol{X}}_2\end{array}\right]\left[\begin{array}{c}{\boldsymbol{\beta}}_1\\ {}{\boldsymbol{\beta}}_2\end{array}\right]+\left[\begin{array}{cc}{\boldsymbol{Z}}_1& 0\\ {}0& {\boldsymbol{Z}}_2\end{array}\right]\left[\begin{array}{c}{\boldsymbol{u}}_1\\ {}{\boldsymbol{u}}_2\end{array}\right]+\left[\begin{array}{c}{\boldsymbol{\upvarepsilon}}_1\\ {}{\boldsymbol{\upvarepsilon}}_2\end{array}\right] $$

Random block effect: \( \left[\begin{array}{c}{\boldsymbol{\beta}}_1\\ {}{\boldsymbol{\beta}}_2\end{array}\right]\sim \mathrm{N}\ \left(\left[\begin{array}{c}0\\ {}0\end{array}\right],\left[\begin{array}{cc}{\boldsymbol{\upsigma}}_{b\left(1,1\right)}^2& {\boldsymbol{\upsigma}}_{b\left(1,2\right)}\\ {}{\boldsymbol{\upsigma}}_{b\left(2,1\right)}& {\boldsymbol{\upsigma}}_{b\left(2,2\right)}^2\end{array}\right]\otimes {\boldsymbol{I}}_{\boldsymbol{d}.}\right) \)

Random individual (additive) effect: \( \left[\begin{array}{c}{u}_1\\ {}{u}_2\end{array}\right]\sim \mathrm{N}\ \left(\left[\begin{array}{c}0\\ {}0\end{array}\right],\left[\begin{array}{cc}{\boldsymbol{\upsigma}}_{a\left(1,1\right)}^2& {\boldsymbol{\upsigma}}_{a\left(1,2\right)}\\ {}{\boldsymbol{\upsigma}}_{a\left(2,1\right)}& {\boldsymbol{\upsigma}}_{a\left(2,2\right)}^2\end{array}\right]\otimes \mathbf{A}\right) \)

  • Combined individual model with trivariate selection (I-MT3) for traits 1, 2 and 3:

$$ \left[\begin{array}{c}{\boldsymbol{y}}_1\\ {}{\boldsymbol{y}}_2\\ {}{\boldsymbol{y}}_3\end{array}\right]=\left[\begin{array}{c}{\boldsymbol{\mu}}_1\\ {}{\boldsymbol{\mu}}_2\\ {}{\boldsymbol{\mu}}_3\end{array}\right]+\left[\begin{array}{ccc}{\boldsymbol{X}}_1& 0& 0\\ {}0& {\boldsymbol{X}}_2& 0\\ {}0& 0& {\boldsymbol{X}}_3\end{array}\right]\left[\begin{array}{c}{\boldsymbol{\beta}}_1\\ {}{\boldsymbol{\beta}}_2\\ {}{\boldsymbol{\beta}}_3\end{array}\right]+\left[\begin{array}{ccc}{\boldsymbol{Z}}_1& 0& 0\\ {}0& {\boldsymbol{Z}}_2& 0\\ {}0& 0& {\boldsymbol{Z}}_3\end{array}\right]\left[\begin{array}{c}{\boldsymbol{u}}_1\\ {}{\boldsymbol{u}}_2\\ {}{\boldsymbol{u}}_3\end{array}\right]+\left[\begin{array}{c}{\boldsymbol{\upvarepsilon}}_1\\ {}{\boldsymbol{\upvarepsilon}}_2\\ {}{\boldsymbol{\upvarepsilon}}_3\end{array}\right] $$

random block effect \( \left[\begin{array}{c}{\beta}_1\\ {}{\beta}_2\\ {}{\beta}_3\end{array}\right]\sim \mathrm{N}\ \left(\left[\begin{array}{c}0\\ {}0\\ {}0\end{array}\right],\left[\begin{array}{ccc}{\upsigma}_{b\left(1,1\right)}^2& {\upsigma}_{b\left(1,2\right)}& {\upsigma}_{b\left(1,3\right)}\\ {}{\upsigma}_{b\left(2,1\right)}& {\upsigma}_{b\left(2,2\right)}^2& {\upsigma}_{b\left(2,3\right)}\\ {}{\upsigma}_{b\left(3,1\right)}& {\upsigma}_{b\left(3,2\right)}& {\upsigma}_{b\left(3,3\right)}^2\end{array}\right]\otimes {\boldsymbol{I}}_{\boldsymbol{d}}\right) \)random individual (additive) Effect \( \left[\begin{array}{c}{u}_1\\ {}{u}_2\\ {}{u}_3\end{array}\right]\sim \mathrm{N}\ \left(\left[\begin{array}{c}0\\ {}0\\ {}0\end{array}\right],\left[\begin{array}{ccc}{\upsigma}_{a\left(1,1\right)}^2& {\upsigma}_{a\left(1,2\right)}& {\upsigma}_{a\left(1,3\right)}\\ {}{\upsigma}_{a\left(2,1\right)}& {\upsigma}_{a\left(2,2\right)}^2& {\upsigma}_{a\left(2,3\right)}\\ {}{\upsigma}_{a\left(3,1\right)}& {\upsigma}_{a\left(3,2\right)}& {\upsigma}_{a\left(3,3\right)}^2\end{array}\right]\otimes \mathbf{A}\right) \)

For the provenance model, the single, bivariate and trivariate approaches were abbreviated P-ST, P-MT2, and P-MT3, respectively, and the abbreviations I-ST, I-MT2, and I-MT3 for the individual model.

Relationship matrix A

The relationship matrix used in the I model was defined on the basis of the relationship coefficients between the n = 415 individuals. The relationship coefficient, using the SNP markers, was defined as the probability that two alleles taken at random are identical by state and was estimated using Van Raden’s formula (Van Raden 2007, 2008). The relationship matrix A [n, n] was calculated as follows:

$$ \mathbf{A}=\frac{\left(\mathbf{M}-\mathbf{P}\right){\left(\mathbf{M}-\mathbf{P}\right)}^{\prime }}{2\sum \limits_1^{\mathrm{s}}{\mathrm{p}}_{\mathrm{i}}\left(1-{\mathrm{p}}_{\mathrm{i}}\right)} $$

where M [n, s] is the matrix corresponding to the genotype for all loci of all individuals. Each column of M is coded 0, 1, and 2 for genotypes AA, Aa, and aa, respectively, representing the number of minor alleles; and “s” is the total number of SNPs (2919 SNPs). The matrix P [n, s] contains the frequencies (pi) of the second allele at each locus such that the column i of P is 2pi.

Estimation of variance components, heritabilities, and correlations

The variance components were estimated with ASReml-R (Gilmour et al. 2009). The solutions (BLUP) were obtained by solving the mixed linear model equations (Mrode and Thompson 2005). For the two models, the heritabilities at the provenance level \( {h}_{\mathrm{prov}}^2 \) and individual level \( {\mathrm{h}}_{\mathrm{i}}^2 \) were estimated from the variance components.

P model

$$ {h}_{\mathrm{prov}}^2=\frac{\sigma_{\mathrm{prov}}^2}{\sigma_{\mathrm{b}}^2+{\sigma}_{\mathrm{prov}}^2+\frac{\sigma_{\mathrm{r}}^2}{N\mathrm{prov}}} $$

I model

$$ {h}_{\mathrm{i}}^2=\frac{\sigma_{\mathrm{a}}^2}{\sigma_{\mathrm{b}}^2+{\sigma}_{\mathrm{a}}^2+{\sigma}_{\mathrm{r}}^2} $$

where Nprov is the harmonic mean of the number of individuals per provenance.

The correlations related to genetic effects between two traits (1 and 2) were calculated from variance/covariance estimates with the following formulas:

  • the additive genetic correlation ρa = (cova(1,2)) / (σa1) (σa2),

  • the provenance correlation ρrov = (covprov(1,2)) / (σprov1) (σprov2),

  • the residual correlation ρr = (covr(1,2)) / (σr1) (σr2),

  • the phenotypic correlation ρp = (covp(1,2)) / (σp1) (σp2),

where cov(1,2) is the covariance between traits 1 and 2 and σ1 and σ2 are the standard deviations of traits 1 and 2, respectively.

The phenotypic variance σ2p and covariance covp were calculated as follows:

$$ {\upsigma^2}_{\mathrm{p}}={\upsigma^2}_{\mathrm{b}}+{\upsigma^2}_{\mathrm{p}\mathrm{rov}}+{\upsigma^2}_{\mathrm{r}}\ \mathrm{and}\ {\operatorname{cov}}_{\mathrm{p}\left(1,2\right)}={\operatorname{cov}}_{\mathrm{b}\left(1,2\right)}+{\operatorname{cov}}_{\mathrm{p}\mathrm{rov}\left(1,2\right)}+{\operatorname{cov}}_{\mathrm{r}\left(1,2\right)}\ \mathrm{for}\ \mathrm{the}\ P\ \mathrm{model} $$
$$ {\upsigma^2}_{\mathrm{p}}={\upsigma^2}_{\mathrm{b}}+{\upsigma^2}_{\mathrm{a}}+{\upsigma^2}_{\mathrm{r}}\ \mathrm{and}\ {\operatorname{cov}}_{\mathrm{p}\left(1,2\right)}={\operatorname{cov}}_{\mathrm{b}\left(1,2\right)}+{\operatorname{cov}}_{\mathrm{a}\left(1,2\right)}+{\operatorname{cov}}_{\mathrm{r}\left(1,2\right)}\ \mathrm{for}\ \mathrm{the}\ I\ \mathrm{model} $$

The CVp, CVa, and CVr are respectively the phenotypic, additive, and the residual coefficients of variation and are defined by the ratio standard deviation to the mean over the population.

Goodness-of-fit of the models

The models were compared by evaluating the goodness-of-fit, using the full data set, i.e., 415 trees. The goodness-of-fit was assessed by the correlation between estimated genetic values and phenotypes.

With the P model, we estimated the genetic value at provenance level \( {\hat{\mathrm{G}}}_{\mathbf{P}} \) (which is the same for all individuals from the same provenance) by the sum of the general mean \( {\hat{\upmu}}_{\mathrm{i}} \) and the BLUPs of the provenance effect Q i\( {\overset{\wedge }{\lambda}}_{\mathrm{i}} \).

$$ {\overset{\wedge }{\mathrm{G}}}_{\mathrm{P}}={\overset{\wedge }{\upmu}}_{\mathrm{i}}+{\mathrm{Q}}_{\mathrm{i}}{\overset{\wedge }{\uplambda}}_i $$

The goodness-of-fit was calculated using the Spearman correlation between \( {\overset{\wedge }{\mathrm{G}}}_{\mathrm{P}} \) and the observed phenotype Y: rP_Full = cor (\( {\overset{\wedge }{\mathrm{G}}}_{\mathrm{P}} \),Y).

With the I model, we estimated the individual genetic value ĜI by the sum of the general mean \( {\overset{\wedge }{\upmu}}_{\mathrm{i}} \) and the BLUPs of the individual genetic effect (additive) Zi\( {\overset{\wedge }{u}}_{\mathrm{i}} \).

$$ {\overset{\wedge }{\mathrm{G}}}_{\mathrm{I}}={\overset{\wedge }{\upmu}}_{\mathrm{i}}+{Z}_{\mathrm{i}}{\overset{\wedge }{U}}_i $$

The goodness-of-fit was calculated using the Spearman correlation between ĜI and the observed phenotype Y: rI_Full = cor (\( {\overset{\wedge }{\mathrm{G}}}_{\mathrm{I}} \),Y).

These correlation coefficients were used to compare the single-trait (ST), bivariate, (MT2), and trivariate (MT3) selection according to the two models (P and I) leading to the six situations: (P-ST, P-MT2, P-MT3), (I-ST, I-MT2, I-MT3).

Evaluation of model prediction accuracy

The comparison of the models was also addressed by evaluating their ability to predict the genotype with unobserved phenotypic data. The prediction accuracy of the three models associating the multi-trait approach was carried out by cross-validation. We made up the calibration population by randomly selecting two-thirds of the total population and the validation population was composed of the remaining one-third. The calibration population was used to estimate the variance components that were then integrated into the prediction equations of the mixed linear model (Mrode and Thompson 2005) to estimate the genetic values of the population of validation. Previously, the accuracy of the prediction was measured by the Spearman correlation between the predicted genetic and phenotypic values of the validation population for the two models: rP = cor (\( {\overset{\wedge }{\mathrm{G}}}_{\mathrm{P}} \), Y) and rI = Cor (\( {\overset{\wedge }{\mathrm{G}}}_{\mathrm{I}} \), Y). We used 100 cross-validation procedures to estimate the variability of the prediction accuracy.

Estimation of genetic gains using selection index

The genetic gain was calculated using a selection index defined as the linear combination of the standardized genetic values of each trait estimated using the multi-trait approach with the two models, and coefficients (weights) depending on the economic importance given to the different traits (Gallais and Poly 1990). The objective was to provide a selection method combining the volume production and the chemical wood properties. We therefore considered the indexes based on bivariate analyses (V49, TL) and (V49, Holo), which correspond respectively to two breeding objectives: fuelwood and pulpwood production. To test different scenarios, we chose three pairs of weights (0.75, 0.25), (0.50, 0.50), and (0.25, 0.75), the first weight being associated with V49 and the second with TL or Holo. For each pair of traits, the three indexes followed the formulas below:

Index combining V49 and TL

$$ {\mathrm{Index}}_{75V49,25 TL}=0.75\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.25\frac{{\hat{G}}_{TL}}{\upsigma_{{\hat{\Big(G}}_{TL}\Big)}} $$
$$ {\mathrm{Index}}_{50V49,50 TL}=0.50\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.50\frac{{\hat{G}}_{TL}}{\upsigma_{{\hat{\Big(G}}_{TL}\Big)}} $$
$$ {\mathrm{Index}}_{25V49,75 TL}=0.25\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.75\frac{{\hat{G}}_{TL}}{\upsigma_{{\hat{\Big(G}}_{TL}\Big)}} $$

Index combining V49 and Holo

$$ {\mathrm{Index}}_{75V49,25 Holo}=0.75\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.25\frac{{\hat{G}}_{Holo}}{\upsigma_{{\hat{\Big(G}}_{Holo}\Big)}} $$
$$ {\mathrm{Index}}_{50V49,50 Holo}=0.50\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.50\frac{{\hat{G}}_{Holo}}{\upsigma_{{\hat{\Big(G}}_{Holo}\Big)}} $$
$$ {\mathrm{Index}}_{25V49,75 Holo}=0.25\frac{{\hat{G}}_{V49}}{\upsigma_{{\hat{\Big(G}}_{V49}\Big)}}+0.75\frac{{\hat{G}}_{Holo}}{\upsigma_{{\hat{\Big(G}}_{Holo}\Big)}} $$

where \( {\sigma}_{{\hat{\Big(G}}_{V49}\Big)},{\sigma}_{{\hat{\Big(G}}_{V49}\Big)} \) And \( {\sigma}_{{\hat{\Big(G}}_{Holo}\Big)} \) are the standard deviation of the genetic value of V49, TL, and Holo.

The genetic gain ΔG induced by the index selection for the trait T (V49, TL or Holo) was calculated using the formula:

$$ \varDelta G=100\ast \frac{\overline{T}s-\overline{T} tot}{\overline{T} tot} $$

where \( \overline{T}s \) is the mean of the individuals selected for the trait T based on the index ranking, and \( \overline{T} tot \) is the population mean.

The genetic gains obtained through the previous index based on the multi-trait approach were compared with genetic gains based on the single-trait approach. We considered three selection intensities that can be used in the context of breeding program in Madagascar, 5, 10 and 15%.

Results

Phenotypic performance of the provenances

Considering the total population, we observed that the mean volume was 104 dm3 and that trees contained on average 61% Holo and 37% TL (Table 2). Volume had a high variability with a coefficient of variation of 35%, while TL and Holo showed very low variability, with coefficients of variation less than 6%, which is generally observed with this type of variable in Eucalyptus (Hein et al. 2012; Makouanzi et al. 2017). Although measured trees resulted from successive thinning, reducing the population from 5184 to 1314, of which 415 were analyzed, the total variability within the trial remained high in line with variability observed in similar populations (Denis et al. 2013).

Table 2 Mean phenotypic performance of the 19 provenances in volume at 49 months (V49), total lignin (TL), and holo-cellulose (Holo) content

The results presented for each provenance showed a clear difference in V49 among provenances, varying from 81.59 to 124.98 dm3. As expected, this variation was less marked for TL, which varied from 35.56 to 37.29%, and Holo, which varied from 59.46 to 62.33%.

Variance components and heritabilities

For the P model (Table 3), whatever the trait, the algorithm converged easily leading to the variance component estimation. The results showed the higher variance of the residual effect followed by the variance of the block and that of the provenance effects. This result emphasized the preponderance of individual trees coupled to micro-environmental variability within the experimental design. This distribution of variance among the different effects remained stable using the single- and multi-trait approaches, indicating that covariance between traits did not contribute that much to trait variation. The mean heritability in the single- and multi-trait approaches was higher for V49, h2prov = 0.35 than for wood chemistry traits; h2prov = 0.25 for Holo and h2prov = 0.23 for TL.

Table 3 Variance components, heritability, and goodness-of-fit using the GBLUP method with the single-trait provenance model (P-ST), bivariate provenance model (P-MT2), and trivariate provenance model (P-MT3)

With the I model (Table 4), the residual variance remained preponderant, followed by the block and the additive variance. Compared with the P model (Table 3), we observed a decrease in the residual variance resulting from a partial redistribution into the additive variance. We noted, with the I model, in the case of trivariate analysis that the algorithm failed to converge correctly, and therefore to give solutions. We tried to improve the convergence using the factor analysis model with ASReml, but similarly the algorithm failed to converge with trivariate model. We suspect that the important rate of missing values for wood traits explains this non-convergence. We observed a decrease in heritabilities compared with the P model, which can be explained by a higher residual variance. As for the P model, TL and Holo presented lower heritability than V49, h2i = 0.05 for TL and Holo, with h2i = 0.16 for V49, whatever the multi-trait approach (Table 4).

Table 4 Variance components, heritability, and goodness-of-fit using the GBLUP method with the single-trait individual model (I-ST), bivariate individual model (I-MT2), and trivariate individual model (I-MT3)

Correlation between traits

The phenotypic and genetic correlations between the three traits were estimated for each model (Table 5). With the P model, the correlations between V49/TL and V49/Holo for block, provenance, and residual effects remained moderate to low, with high standard deviations. For example, for the provenance effect, ρprov = 0.41 with a standard deviation (SD) of 0.37, underlining the imprecision of the estimate. The correlations between TL/Holo were high for provenance ρprov = −0.92 and residual ρr = −0.91 with a low standard deviation of SD = 0.11 and SD = 0.01, respectively, underlining a strong opposition in the expression of these two traits.

Table 5 Phenotypic correlations (ρP), environmental block effect correlations (ρBloc), provenance correlation (ρprov), additive correlations (ρa), and residual correlation (ρr) according to the three linear mixed models

With the I model, we also found for V49/TL and V49/Holo moderate correlations for block, individual, and residual effects, with high standard deviations. Estimates were on average close to zero.

Goodness-of-fit of the models

With the P model, the goodness-of-fit was higher for V49 (rP_Full = 0.37) than for TL (rP_Full ranging from 0.23 to 0.25) and Holo (rP_Full ranging from 0.27 to 0.29) (Table 3). The bi- and tri-variate approaches did not improve and even seemed to worsen accuracy, but variations did not exceed 0.02.

With the I model, the goodness-of-fit was higher than for the P model with rI_Full = 0.79 for V49, varying from rI_Full = 0.54 to 0.66 for TL and from rI_Full = 0.70 to 0.74 for Holo (Table 4). With this model, the bi- and tri-variate approaches did not improve the goodness-of-fit, the correlations sometimes being lower than with the single-trait approach.

Prediction accuracy of the models

The effect of models and multi-trait options on prediction accuracy, using the cross-validation procedure, is illustrated in Table 6. Only one model with individual genetic estimation was considered. For V49, the prediction accuracy was, on average, rI = 0.30 for the I model (Table 6), while goodness-of-fit was rI_Full = 0.79 (Table 5). For TL, mean accuracy was rI = 0.05 (Table 7), while with the whole dataset goodness-of-fit was rI_Full = 0.60 (Table 4). Finally, for Holo, mean accuracy was rI = 0.09 (Table 6), while with the whole dataset goodness-of-fit was rI_Full = 0.72 (Table 4).

Table 6 Prediction accuracya (rI) for all trait using single-trait (ST) and bivariate (MT2) of the P and I models
Table 7 Number of provenances represented, which corresponds to the genetic diversity, in each selected set with a selection intensity “i” varying from 5 to 15% according to the different models and selection indices

Evolution of genetic gains by index selection

The evolution of genetic gains for each trait is presented according to the three pairs of economic coefficients (0.75/0.25, 0.50/0.50, and 0.25/0.75) and the two models (Figs. 1 and 2). The gain obtained by index selection was compared to the gain obtained by single trait selection.

Fig. 1
figure 1

Trends in genetic gain for volume at 49 months (V49) according to the I and P models with single-trait (I-ST and P-ST) and bivariate (I-MT2 and P-MT2) models with varying index selection coefficients: (0.75, 0.25), (0.50, 0.50), and (0.25, 0.75)

Fig. 2
figure 2

Trends in genetic gain for (a) total lignin (TL) and (b) holo-cellulose (Holo) according to the I and P models with single-trait (I-ST and P-ST) and bivariate (I-MT2 and P-MT2) models with varying index selection coefficients: (0.75, 0.25), (0.50, 0.50), and (0.25, 0.75)

For V49, the trends in genetic gain were presented for two index types: V49 associated with TL and V49 associated with Holo. As expected, the gain decreased when the selection intensity decreased, i.e., when the percentage of trees selected varied from 5 to 15%. This decrease is generally moderate, around 5% (Fig. 1a and b). We also noted a decrease in the V49 genetic gain when the economic coefficient changed from 0.75 to 0.25, whatever the model, which was an expected result too. The I model led to a greater genetic gain varying from 26 to 37% for the best situation. The higher gains for V49 were obtained using a coefficient equal to 0.75 for both (V49, TL) and (V49, Holo) indexes. Whatever the model, the genetic gain using the ST approach was the best or similar to the index with coefficient pair equal to 0.75/0.25 (Fig. 1).

For TL and Holo, the major difference with V49 was the gain in magnitude, which did not exceed 2% under the most favorable conditions (Fig. 2a and b). Except that, the same trends were observed for wood chemical traits, i.e., a decrease of genetic gain with percentage of trees selected (from 5 to 15%); however, these differences are smaller than 1% between the two extreme indices. As expected, the most important gain for TL and Holo was obtained with economic coefficient equal to 0.75.

Provenance representativeness in the selected population

We studied the evolution of diversity within the selected population by counting the number of provenances represented in the selected population. The comparisons were done according to the different selection strategies combining model-, single-, or multi-trait and selection intensity. Table 7 shows that, with the P model, one to three provenances were represented in the selected population with a selection intensity varying from 5 to 15%, whereas with the I model, the number of provenances varied between 6 and 16 depending on the indexes. The single-trait approach was close to the bivariate approach, as the number of provenances represented in the selected population was not very different and no clear trend was observed. Moreover, the representativeness of provenances was more or less the same using of the different indexes.

Discussion

Heritability and correlations between traits

Although the provenance trial underwent successive thinnings, the genetic diversity was not markedly affected. The phenotypic coefficients of variation for V49 and wood traits were very close to those observed in other experiments on Eucalyptus (Tripiana et al. 2007; Denis et al. 2013; Razafimahatratra et al. 2016).

In our study, whatever the model, we found a lower heritability for TL and Holo than for V49 (Tables 3 and 4). Generally, the wood chemical properties appeared to be more heritable than growth traits. This is the case for lignin in Eucalyptus (Poke et al. 2006; Stackpole et al. 2011; Hein et al. 2012; Mandrou et al. 2012; Makouanzi et al. 2017) and for the syringyl-to-guaiacyl ratio (Hamilton et al. 2009; Poke et al. 2006; Hamilton et al. 2009; Stackpole et al. 2011; Hein et al. 2012; Makouanzi et al. 2017). In our experiment, this unexpected result can be explained by a preponderant residual variance for the wood properties. For example, in the I model, the additive coefficient of variation is three to four times smaller than the residual coefficient of variation for TL (CVa = 1.2% and CVr = 5.0%) and Holo (CVa = 1.4% and CVr = 4.0%), whereas for V49 this ratio is 1.5 (CVa = 16.2% and CVr = 24.6%). The preponderance of residual variance for wood chemical traits is probably related to the strong environmental impact that trees have undergone after 50 months. Strong winds affected the plantation; some of the trees surrounding the sampled trees were broken and produced shoot stumps, thus exacerbating the micro-environmental effects.

On average, whatever the models, the correlations between biomass and wood chemical properties showed a weak but positive estimate between V49 and Holo and a weak but negative estimate between V49 and TL. Although clear conclusions cannot be drawn given the large confidence intervals (Table 5), these results were consistent with other experiments with Eucalyptus, which tended to show a negative correlation between biomass and lignin and a positive correlation between growth and cellulose (Novaes et al. 2010; Denis et al. 2013; Makouanzi et al. 2017). Note that these results have also been observed in other species, such as poplars (Novaes et al. 2010).

The performance of multi-trait genomic selection

One of the assumptions of our study was that the selection accuracy and consequently the genetic gain could be improved by the multi-trait approach. In the context of Eucalyptus breeding, we combined growth and wood chemical traits, both of which are of economic interest. With models that take into account only one or two traits, the algorithm converged easily (ST and MT2 approach). But convergence was difficult to achieve with three traits (MT3 approach). Convergence of the ASReml algorithm depends very much on the size and structure of the variance/covariance matrix to be estimated. This difficulty increases when the data are scarce, which is probably the case with our data set.

There was neither better goodness-of-fit with all the data (Tables 3 and 4) nor better prediction accuracy in the framework of GS (Table 6) by using the bivariate approach compared with a single-trait approach. Using simulations, Jia and Jannink (2012) showed that multi-trait GS increased accuracy, while low-heritability traits benefitted from correlated high-heritability traits when the genetic correlation was higher than 0.5. In our experiment, TL and Holo presented a lower heritability than V49, but the low correlations between V49 and both TL and Holo may explain the absence of impact on the accuracy of wood traits.

Consequence for genetic improvement of Eucalyptus robusta

Our study has shown that efficient individual selection can be achieved in a provenance trial using dense genotyping. We have adopted GBLUP, which is well adapted for implementing multi-trait selection rather than other prediction methods. Estimation of the A relationship matrix between individuals, by means of a consistent set of SNPs, makes it possible to estimate individual genetic value and to increase significantly the genetic gains compared with selection based on the provenance means.

Our analyses have produced various pieces of information to orient the constitution of the E. robusta breeding population. They have shown that biomass production, estimated through the volume of the tree, presented a higher variability than TL and Holo, leading to a higher genetic gain for growth than for wood chemical properties.

The association of volume with wood chemical property traits did not improve the selection accuracy compared with single-trait GS. This could be due to the weak correlations between growth and these wood chemical components. This result corroborates those found by Makouanzi et al. (2017) in E. urophylla×E. grandis hybrid, which suggested that selection for growth will not result in a significant reduction in wood quality traits.

Different selection strategies can impact genetic gain. For example, with equivalent coefficients for volume and wood chemical traits (0.50/0.50), index selection will still lead to a substantial gain in volume (around 30 and 25% associated with TL and Holo with selection intensity of 10% and I model), without affecting the gain for holo-cellulose and total lignin. With a higher coefficient for wood chemical traits (0.25/0.75), the gain in volume is reduced (around 20 and 15% associated with TL and Holo with selection intensity of 10% and I model), but the gain for wood traits changed only by 0.5%. This analysis showed that a strong emphasis on wood chemical traits is needed to achieve genetic gain in these traits during a breeding cycle.

Individual selection increased the genetic diversity in terms of the number of provenances represented in the selected population compared with selection based on provenance means. This result is encouraging if we want to build the breeding population while maximizing diversity. However, this conclusion should not obscure the fact that different provenances may have asynchronous flowering phenology, as has been observed in other species introduced in Madagascar, which may reduce panmixia within seed orchards and increase inbreeding (Chaix et al. 2003). It is therefore very important to incorporate flowering phenology as a selection trait for building the breeding population. On the basis of this latter observation, a two-stage selection may also be envisaged, first by selecting the best provenances on the basis of their growth and flowering phenology and then by selecting the individuals within these provenances using the indexes associating volume and wood properties.