Introduction

Evaluation of breeding programs based on predictions of genetic and phenotypic parameters obtained by studies in quantitative genetics is important both for conservation and to assure efficient selection, especially for perennial plants (Pires and Resende 2011; Bernardo 2010). Studies on many species have been carried out assuming that they have either a predominantly autogamous reproductive system, with an outcrossing rate (\(t\)) ranging from 0.00 to 0.05, or an allogamous one, with \(0.95 \le t < 1.00\) (Fuchs et al. 2015). However, rates of \(t\) in eucalypt species (Eucalyptus and Corymbia) vary widely; for example, Eldridge et al. (1993) document rates between 0.45 and 0.96. The assumption that the eucalypt reproductive system is predominantly allogamous is therefore inappropriate.

Species with a mixed-mating system are problematic for geneticists and plant breeders, as modelling relationships within breeding populations and making selections are complicated (Namkoong 1966; Vencovsky et al. 2001). Treating species as either completely autogamous or completely allogamous when they in fact have a mixed-mating system will lead to errors in the prediction of breeding values and, consequently, in the selection of superior genotypes.

To avoid overestimation of genetic parameters such as heritability, genetic variances (i.e. \(\sigma_{A}^{2}\) = additive variance, \(\sigma_{D}^{2}\) = dominance variance) and genetic gains in breeding programs, the genetic variance of progeny (\(\sigma_{p}^{2}\)) or covariance of any two relatives (\(\varsigma_{XY}\)) based on a test of open-pollinated families should be estimated (Cockerham and Weir 1984) as:

$$\hat{\sigma }_{p}^{2} = 2\theta_{XY} \sigma_{A}^{2} + 2(\Delta_{{\ddot{X} + \ddot{Y}}} - \delta_{{\ddot{X}\ddot{Y}}} )\sigma_{D}^{2} + 2(\gamma_{{\ddot{X}Y}} + \gamma_{{X\ddot{Y}}} )D_{1} + \delta_{{\ddot{X}\ddot{Y}}} D_{2}^{*} \quad + (\Delta_{{\ddot{X} \cdot \ddot{Y}}} - F_{X} F_{Y} )H^{*} + (\tilde{\Delta }_{XY} - F_{X} F_{Y} )(H^{2} - H^{*} )$$
(1)

where \(F_{X}\), \(F_{Y}\), \(\theta_{XY}\), \(\Delta_{\ddot{X} + \ddot{Y}}\), \(\Delta_{\ddot{X} \cdot \ddot{Y}}\), \(\delta_{\ddot{X}\ddot{Y}}\), \(\gamma_{\ddot{X}Y}\) and \(\gamma_{X\ddot{Y}}\) are measures of identity by descent at a single locus for individual X, with alleles \(x_{1}\) and \(x_{2}\), and individual Y, with alleles \(y_{1}\) and \(y_{2}\) (see Cockerham and Weir 1984); \(D_{1}\) is the covariance between additive effects and homozygous dominance effects; \(D_{2}^{*}\) is the total variance of homozygous dominance effects; \(\tilde{\Delta}_{XY}\) is the probability, for two individuals X and Y, that two genes at the ith locus of one and the same two genes at the jth locus of the other are identical by descent; and \(H^{2}\) and \(H^{*}\) are the terms associated with the correlation of inbreeding at different loci within individuals.

The assumption that \(\sigma_{p}^{2} = \frac{1}{4}\sigma_{A}^{2}\), which is used in the case of true half-sib seedling families (\(t = 1\)), is a misleading over-simplification. The traditional approach to dealing with the mixed-mating system in open-pollinated progeny tests of forest trees has been to modify estimates of additive genetic variance by assuming a coefficient of relationship (\(\rho\)) greater than \(1/4\) (e.g. Squillace 1974), to account for an assumed proportion of selfed and other inbred individuals. The coefficient of relationship is a measure of the proportion of genes that two individuals have inherited directly from a common ancestor (Wright 1922). For two individuals X and Y it is given by:

$$\rho_{XY} = \frac{{2\theta_{XY} }}{{\sqrt {(1 + F_{X} )(1 + F_{Y} )} }}$$
(2)

where \(F_{X}\) and \(F_{Y}\) are their inbreeding coefficients and \(\theta_{XY}\) is their coefficient of co-ancestry.
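As a quick check of Eq. (2), the short Python sketch below evaluates \(\rho_{XY}\) from a coancestry coefficient and the two inbreeding coefficients. The pedigree values used (\(\theta_{XY}\) = 1/8 for non-inbred half-sibs, 1/4 for full-sibs, and 1/2 for two selfed progeny of a non-inbred parent, each of which has \(F\) = 1/2) are standard values chosen for illustration; the function name is hypothetical.

```python
import math


def coefficient_of_relationship(theta_xy, f_x=0.0, f_y=0.0):
    """Wright's coefficient of relationship, Eq. (2): 2*theta / sqrt((1 + F_X)(1 + F_Y))."""
    return 2.0 * theta_xy / math.sqrt((1.0 + f_x) * (1.0 + f_y))


# Half-sibs from non-inbred parents: theta = 1/8, F = 0
half_sibs = coefficient_of_relationship(0.125)
# Full-sibs from non-inbred, unrelated parents: theta = 1/4, F = 0
full_sibs = coefficient_of_relationship(0.25)
# Two selfed progeny of a non-inbred parent: theta = 1/2, and each progeny has F = 1/2
selfed_sibs = coefficient_of_relationship(0.5, 0.5, 0.5)

print(f"half-sibs {half_sibs:.3f}, full-sibs {full_sibs:.3f}, selfed sibs {selfed_sibs:.3f}")
```

The three values come out as 0.250, 0.500 and 0.667, which shows why a family containing selfed progeny has a higher average \(\rho\) than the 1/4 assumed for pure half-sibs.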

The coefficient of relationship can then be used to appropriately scale the estimate of additive variance (\(\sigma_{\text{A}}^{2}\)) in the calculation of narrow-sense heritability \(\hat{h}_{a}^{2}\), thus:

$${\hat{h}}_{a}^{2} = \frac{(\text{1}/\rho ){\hat{\sigma }_{\text{A}}^{2}}} {\hat{\sigma }_{P}^{2} }$$
(3)

where \(\sigma_{P}^{2}\) is the estimated phenotypic variance component. Eldridge et al. (1993) recommended a value of \(\rho\) = 1/2.5 as appropriate for eucalypts, a recommendation reiterated by Bush et al. (2011), though values between 1/4 and 1/1.85 have been used in various eucalypt studies (Hodge et al. 1996). However, a recent trend has seen several eucalypt studies published that simply assume \(\rho\) = 1/4 (e.g. Apiolaza et al. 2011; Cane-Retamales et al. 2011; Denis et al. 2013; Henson et al. 2008; Mora et al. 2009; Pelletier et al. 2008; Raymond et al. 2008; Rojas et al. 2017; Vargas-Reeve et al. 2013), although some of these studies have acknowledged that their heritability estimates may be upwardly biased as a result. A possible justification for setting \(\rho\) = 1/4, even when a mixed-mating system is likely or known to be in operation, is that the actual degree of inbreeding is unknown.
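To make the size of this bias concrete, the sketch below applies Eq. (3) to one hypothetical set of variance components with the values of \(\rho\) quoted above. It assumes, as in the Squillace-type form used later in the methods, that the quantity scaled by \(1/\rho\) is the among-family variance component; the variance values and the function name are invented for illustration.

```python
def narrow_sense_h2(sigma2_family, sigma2_phenotypic, rho):
    """Eq. (3): scale the among-family variance by 1/rho and divide by the phenotypic variance."""
    return (1.0 / rho) * sigma2_family / sigma2_phenotypic


# Hypothetical variance components (not taken from any table in this study)
sigma2_family, sigma2_phenotypic = 0.5, 5.0

for label, rho in [("1/4", 1 / 4), ("1/2.5", 1 / 2.5), ("1/1.85", 1 / 1.85)]:
    h2 = narrow_sense_h2(sigma2_family, sigma2_phenotypic, rho)
    print(f"rho = {label:>6}: h2 = {h2:.3f}")
```

Because \(\hat{h}_{a}^{2}\) is proportional to \(1/\rho\), assuming \(\rho\) = 1/4 when 1/2.5 is appropriate inflates the estimate by a factor of \(4\rho\) = 1.6, whatever the variance components are.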

In the Cockerham and Weir (1984) expression for \(\varsigma_{XY}\) or \(\sigma_{p}^{2}\) [Eq. (1)], the kinship coefficients (\(F_{X}, F_{Y}, \theta_{XY}, \Delta_{\ddot{X} + \ddot{Y}}, \Delta_{\ddot{X} \cdot \ddot{Y}}, \delta_{\ddot{X}\ddot{Y}}, \gamma_{\ddot{X}Y}, \gamma_{X\ddot{Y}}\) and \(\tilde{\Delta}_{XY}\)) are all increased when families contain a mixture of individuals with different genetic constitutions. However, the dominance components in particular cannot be estimated in open-pollinated progeny tests. Simplifying the expression to \(\hat{\sigma}_{p}^{2} = 2 \cdot \frac{1}{4}(1 + \hat{F})\hat{\sigma}_{A}^{2} + \hat{V} = \frac{(1 + \hat{F})\hat{\sigma}_{A}^{2}}{2} + \hat{V}\), where \(\hat{V}\) is a bias term [the remaining terms of Eq. (1)] that is not easily estimated in tree species, therefore yields only an approximate estimate of \(\sigma_{A}^{2}\). Consequently, predicting the effect of selection (genetic gain) is complex in this type of population.

The use of molecular markers to provide information on forest-tree breeding systems is becoming more prevalent, making it possible to provide a better-informed estimate of the degree of selfing in a breeding population. In the case of Corymbia species, a number of marker-based determinations of outcrossing have been made. The selfing rate for Corymbia citriodora subsp. citriodora has been estimated as \(s = 0.15\) using isozyme markers (Yeh et al. 1983), and that for C. citriodora subsp. variegata as \(s = 0.10\) using microsatellite markers (Bacles et al. 2009).

Commercial eucalypt plantations are important around the world; however, only a few species and their hybrids are widely planted, especially species that belong to the genus Eucalyptus L’Hér. Within Eucalyptus, the subgenus Symphyomyrtus Schauer in J.G.C. Lehmann contributes the most widely planted species (Harwood 2011; Potts and Dungey 2004). However, Corymbia K.D. Hill & L.A.S. Johnson species, particularly C. citriodora (Hook.) K.D. Hill & L.A.S. Johnson and hybrids with C. torelliana (F. Muell.) K.D. Hill & L.A.S. Johnson, have shown promise in tropical and subtropical regions (Lee 2007; Lee et al. 2009).

The Corymbia genus includes the spotted gum complex of species, which is distributed naturally along the east coast of Australia. The complex has a distribution from tropical north Queensland (C. citriodora subsp. citriodora) through subtropical latitudes in southern Queensland and northern NSW (C. citriodora subsp. variegata and C. henryi) to temperate Victoria in south-east Australia (C. maculata). In Brazil, Corymbia taxa are of interest for planting in regions that may become unsuitable for Eucalyptus species in future due to abiotic and biotic stress driven by climate change. In addition, pest and disease incidence is expected to increase due to the expansion of large areas of genetically- and silviculturally-homogeneous forest plantations (Wingfield et al. 2008). Corymbia spp. have shown differences from Eucalyptus species in tolerance of heat and water stress and in resistance to pests and diseases (Brawner et al. 2011; Dianese et al. 1986). These differences may be used to advantage when Corymbia species and hybrids are planted as components of a more-diverse mix of plantation species.

The area planted to spotted gums has recently expanded in many countries of the world, especially sub-tropical and tropical plantation regions, due to their edaphic and climatic adaptation, potential for fast growth and useful wood properties (e.g. Lee 2007; Gardner et al. 2007; Morais et al. 2010; Brawner et al. 2012; Lin et al. 2017). Pure-species breeding programs of C. citriodora are not being progressed in Brazil, though interest in Corymbia hybridization for environmental-stress-resilient commercial plantations (Lee et al. 2009) with improved wood quality and resistance to pests and diseases (Segura and Silva Jr. 2016) is now emerging. Brazil is also one of the largest producers of C. citriodora essential oil in the world (Reis et al. 2013), and this may provide further impetus for renewed interest in genetic improvement.

This study aimed to answer the following questions regarding two Corymbia spp.: (1) is there enough genetic variability in silvicultural traits in open-pollinated families, evaluated at ages 18 and 36 months, to provide a sufficiently broad base population? (2) how does the marker-based estimate of the selfing rate (\(s\)) influence the estimated genetic parameters at these ages?

Materials and methods

Experimental location and design

In August 2013, two seedling-based Corymbia spp. progeny trials were established using open-pollinated, seed-orchard seedlots. For C. citriodora subsp. citriodora (CCC), family seedlots were sourced from plus trees located at Ouricangas, Bahia State (ex Zimbabwe), while C. citriodora subsp. variegata (CCV) originated from Anhembi, São Paulo State (based on wild families from Woondum and Wondai, CSIRO seedlots 14426 and 14434). The commercial controls (checks) used in the trials were seedlots from Anhembi (ex Lat 15°00′S to 17°00′S), Barra Bonita (ex Rio Claro) and Bauru (ex Rio Claro). The trials were established in the “Estação Experimental de Ciências Florestais” owned by Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo (ESALQ/USP), located in Itatinga Municipality, São Paulo State (23°03′S, 48°22′W), at 840 m altitude, with a climate classified as Cwa under the Koeppen Climate Classification (CEPAGRI 2018). The experimental designs were randomized complete blocks with 48 and 32 families of CCC and CCV, respectively, with three-tree linear plots and 8 or 10 replicates per trial (Table 1).

Table 1 Number of families, replicates and total trees evaluated per species in the experimental trials at Itatinga, São Paulo State, Brazil

Measurement and mating system assumptions

In July 2016, 36 months after planting, diameter at breast height (DBH) and height (H) of the individual trees were evaluated. The estimates of genetic parameters were obtained using different selfing rates (\(s\)), given by \(s = 1 - t\), first considering \(s = 0\), then considering \(s = 0.15\) for CCC based on an isozyme marker estimate (Yeh et al. 1983) and \(s = 0.10\) for CCV based on a microsatellite marker estimate (Bacles et al. 2009).

Linear model and genetic parameter estimation

For this analysis, we used a linear mixed model of the form:

$$\mathbf{y} = \mathbf{X}\mathbf{r} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{b} + \mathbf{e}$$
(4)

where y is the phenotypic data vector for a trait; X is the incidence matrix of fixed effects; r is the fixed effects vector (general mean and complete-block replicate effects); Z is the incidence matrix of genetic random effects; a is the vector of family genetic random effects; W is the incidence matrix of plot effects; b is the vector of plot effects (random) and e is the vector of residuals (random).
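The structure of Eq. (4) can be made concrete by building the incidence matrices for a tiny, hypothetical layout. The plain-Python sketch below assumes one record per tree labelled by block and family, treats each family-within-block combination as a plot, and parameterises the fixed effects as a general mean plus block contrasts against the first block; this is one common choice, not necessarily the one SELEGEN uses internally. The function names and effect values are invented, and residuals are set to zero only so the result is reproducible.

```python
def incidence(levels, observed):
    """One row per observation, one column per level, with a 1 marking the observed level."""
    column = {level: j for j, level in enumerate(levels)}
    rows = []
    for value in observed:
        row = [0] * len(levels)
        row[column[value]] = 1
        rows.append(row)
    return rows


def design_matrices(records):
    """Build X, Z and W of Eq. (4) from (block, family) labels, one record per tree."""
    blocks = sorted({blk for blk, _ in records})
    families = sorted({fam for _, fam in records})
    plots = sorted(set(records))  # a plot is one family within one block
    # X: intercept (general mean) plus block contrasts, first block as reference
    X = [[1] + row[1:] for row in incidence(blocks, [blk for blk, _ in records])]
    Z = incidence(families, [fam for _, fam in records])
    W = incidence(plots, records)
    return X, Z, W


def mat_vec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical layout: 2 blocks x 2 families x 3-tree linear plots = 12 trees
records = [(blk, fam) for blk in (1, 2) for fam in ("F1", "F2") for _ in range(3)]
X, Z, W = design_matrices(records)

r = [10.0, 0.5]             # general mean, block-2 contrast
a = [0.8, -0.8]             # family effects (F1, F2)
b = [0.1, -0.1, 0.2, -0.2]  # plot effects in sorted (block, family) order
e = [0.0] * len(records)    # residuals fixed at zero for a deterministic example
y = [xr + za + wb + ei
     for xr, za, wb, ei in zip(mat_vec(X, r), mat_vec(Z, a), mat_vec(W, b), e)]
print(len(y), len(X[0]), len(Z[0]), len(W[0]))
```

In a real analysis the vectors \(\mathbf{a}\), \(\mathbf{b}\) and \(\mathbf{e}\) are unknown random effects whose variances are what REML estimates; the sketch only shows how each observation picks up one mean, one family effect and one plot effect.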

We assessed the normality of residuals with a univariate normality function that combines the numerical Shapiro–Wilk test with graphical methods, using the R statistical language (R Core Team 2018).
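The graphical side of this check is a normal quantile-quantile (Q-Q) plot. The sketch below pairs sorted residuals with expected normal scores (Blom plotting positions) and summarises the plot with the Shapiro–Francia statistic, the squared correlation between the two. This is a simpler relative of the Shapiro–Wilk test, swapped in here only because it needs nothing beyond the Python standard library; it is not the routine used in the study, and in R the Shapiro–Wilk test itself is available as `shapiro.test()`. The residual sets are illustrative.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected standard-normal order statistics, Blom positions (i - 3/8)/(n + 1/4)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def shapiro_francia(residuals):
    """Squared correlation of the normal Q-Q plot; values near 1 are consistent with normality."""
    return pearson(normal_scores(len(residuals)), sorted(residuals)) ** 2


# Illustrative residuals: an ideal normal-looking set and a strongly right-skewed set
ideal = normal_scores(20)
skewed = [math.exp(1.5 * v) for v in ideal]
print(round(shapiro_francia(ideal), 4), round(shapiro_francia(skewed), 4))
```

Deciding whether a given value is low enough to reject normality needs reference quantiles for the sample size, which this sketch does not attempt.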

We used the restricted maximum likelihood (REML) procedure to estimate the variance components of the model in Eq. (4): genetic variance among families (\(\hat{\sigma }_{p}^{2}\)), environmental variance among plots (\(\hat{\sigma }_{plot}^{2}\)) and residual variance (\(\hat{\sigma }_{e}^{2}\)). The individual phenotypic variance (\(\hat{\sigma }_{P}^{2}\)) is given as

$$\hat{\sigma }_{P}^{2} = \hat{\sigma }_{p}^{2} + \hat{\sigma }_{plot}^{2} + \hat{\sigma }_{e}^{2}$$
(5)

We estimated narrow-sense heritability (\(\hat{h}_{a}^{2}\)), the determination coefficient of plot effects (\(\hat{c}_{plot}^{2}\)) and selection accuracy (\(Ac_{prog}\)). Narrow-sense heritability was estimated using Eq. (3) with \(\rho\) = 1/4, corresponding to true half-sib relationships among open-pollinated family members.

We estimated the coefficient of experimental variation (\(\hat{C}V_{\exp} (\%)\)) and the coefficient of genetic variation (\(\hat{C}V_{g} (\%)\)) using the expressions \(\hat{C}V_{\exp} (\%) = \left(\sqrt{\hat{\sigma}_{e}^{2}}/m\right) \times 100\) and \(\hat{C}V_{g} (\%) = \left(\sqrt{\hat{\sigma}_{p}^{2}}/m\right) \times 100\), where \(m\) is the general mean of the original population. For all these estimates we considered the species as perfectly allogamous (\(s = 0\)).
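Eq. (5) and the two coefficients of variation are simple functions of the REML variance components. The sketch below evaluates them for made-up component values (not those of Table 3). It also computes \(\hat{c}_{plot}^{2}\), assuming the usual definition as the proportion of phenotypic variance due to plots, \(\hat{\sigma}_{plot}^{2}/\hat{\sigma}_{P}^{2}\), which the text does not spell out; the function name is hypothetical.

```python
import math


def variance_summary(var_family, var_plot, var_residual, mean):
    """Eq. (5) phenotypic variance plus CV_exp (%), CV_g (%) and c2_plot."""
    var_phenotypic = var_family + var_plot + var_residual  # Eq. (5)
    return {
        "var_P": var_phenotypic,
        "CV_exp": math.sqrt(var_residual) / mean * 100,
        "CV_g": math.sqrt(var_family) / mean * 100,
        "c2_plot": var_plot / var_phenotypic,  # assumed definition of c2_plot
    }


# Hypothetical REML estimates for a trait with general mean 10 (e.g. DBH in cm)
summary = variance_summary(var_family=0.5, var_plot=0.3, var_residual=4.2, mean=10.0)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```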

To examine the effects of mixed mating on heritability estimates, \(h_{a}^{2}\) was again estimated with values of \(\rho\) updated to reflect the published marker-based selfing rates for CCC and CCV. From Resende et al. (1995), the coefficient used to correct the value of variance among families considering the mixed-mating system was

$$\frac{{(1 + \hat{s})^{2} }}{{2(2 - \hat{s})}}$$
(6)

which is derived following Wright and Cockerham (1986). Thus, for a selfing rate of \(s = 0.10\), applicable to CCV, this corresponds to \(\rho\) = 1/3.3; and for a selfing rate of \(s = 0.15\), applicable to CCC, it corresponds to \(\rho\) = 1/3.0. It also corresponds to Squillace's (1974) correction for heritability, which in this situation is given by \(\hat{h}_{a}^{2} = \frac{\hat{\sigma}_{p}^{2}}{r_{OO} \times \hat{\sigma}_{P}^{2}}\) when parents are uncorrelated (artificial stands), taking into account the proportion of effective natural selfing and the correlation among open-pollinated sibs within families (\(r_{OO}\)). In our case, for \(s = 0.10\) and \(s = 0.15\), these correlations were \(r_{OO}\) = 0.30 and \(r_{OO}\) = 0.33, respectively, and correspond to the \(\rho\) values stated above.
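The sketch below evaluates the correction numerically. A simple gene-flow argument (random outcross pollen from non-inbred, unrelated fathers) gives an additive-covariance coefficient among sibs of \((1 + F)(1 + s)^{2}/4\) for a mother with inbreeding \(F\). Substituting the selfing-equilibrium value \(F = s/(2 - s)\) gives Eq. (6) exactly, whereas \(F = 0\) gives \((1 + s)^{2}/4\). Evaluating Eq. (6) as printed yields \(\rho \approx\) 0.318 (about 1/3.14) for \(s\) = 0.10 and \(\rho \approx\) 0.357 (about 1/2.80) for \(s\) = 0.15. The quoted values 1/3.3 and 1/3.0 (\(r_{OO}\) = 0.30 and 0.33) are instead reproduced by \((1 + s)^{2}/4\). The sketch computes both; which form SELEGEN model 110 applies internally is an assumption the reader should check, and the function names are hypothetical.

```python
def rho_eq6(s):
    """Eq. (6) as printed: (1 + s)^2 / (2 (2 - s)), i.e. mothers at selfing equilibrium."""
    return (1 + s) ** 2 / (2 * (2 - s))


def rho_noninbred_mothers(s):
    """Same gene-flow argument with non-inbred mothers (F = 0): (1 + s)^2 / 4."""
    return (1 + s) ** 2 / 4


for taxon, s in [("CCV", 0.10), ("CCC", 0.15)]:
    rho_a, rho_b = rho_eq6(s), rho_noninbred_mothers(s)
    print(f"{taxon} s={s:.2f}: Eq. (6) rho = {rho_a:.4f} (1/{1 / rho_a:.2f}); "
          f"(1+s)^2/4 = {rho_b:.4f} (1/{1 / rho_b:.2f})")
```

Both forms reduce to the half-sib value of 1/4 when \(s = 0\).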

All computations were implemented with model 110 of the SELEGEN-REML/BLUP Software (Resende 2007). This model has in-built functionality that allows the user to apply various estimates of s and automatically applies Eq. (6).

The significance of the family variance component (\(\hat{\sigma }_{p}^{2}\)) was tested by comparison with a separate model, where the term had been omitted, using the likelihood ratio test (see for example Resende 2007; Gilmour et al. 2009) implemented using SELEGEN-REML/BLUP Software Model 5 (Resende 2007).
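The likelihood ratio test reduces to a difference of deviances. The sketch below assumes that deviance means \(-2\) times the REML log-likelihood, as reported in deviance tables such as Table 2, and compares the difference with a chi-square distribution on one degree of freedom, using the identity \(P(\chi_{1}^{2} > x) = \operatorname{erfc}(\sqrt{x/2})\). SELEGEN's own conventions (for example, any halving of the p-value because a variance cannot be negative) are not reproduced, and the deviance values are invented.

```python
import math


def likelihood_ratio_test(deviance_reduced, deviance_full):
    """LRT for one variance component from two model deviances (-2 log L).

    Returns the LRT statistic and its p-value against a chi-square with 1 df,
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    lrt = deviance_reduced - deviance_full
    p_value = math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))
    return lrt, p_value


# Hypothetical deviances: the model without families fits worse by 10 deviance units
lrt, p = likelihood_ratio_test(deviance_reduced=1010.0, deviance_full=1000.0)
print(f"LRT = {lrt:.2f}, p = {p:.4f}")
```

With one degree of freedom, an LRT above about 3.84 corresponds to significance at the 5% level.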

Results and discussion

Genetic parameter estimates

The residuals were normally distributed, and we found significant family effects (\(\hat{\sigma }_{p}^{2}\)) for all traits (Table 2), indicating genetic variability among the families studied, making selection and breeding for growth a viable option (Namkoong 1966; Zimback et al. 2011). Other authors have also observed significant variability for growth and silvicultural traits in C. citriodora at ages between 13 months and 25 years (Morais et al. 2010; Brawner et al. 2011).

Table 2 Deviance analysis and likelihood ratio tests for families (LRTProg), and control seedlots (LRTGM) for each trait at ages 18 and 36 months

The high \(\hat{C}V_{\exp} (\%)\) values that we observed may be due to within-trial environmental variance not accounted for by the experimental design (Pimentel-Gomes and Garcia 2002), which is reflected in the large proportion of \(\hat{\sigma}_{P}^{2}\) attributable to \(\hat{\sigma}_{e}^{2}\) (Table 3), greater than 76% for all traits. The \(\hat{C}V_{\exp} (\%)\) values were classified as high (ranging from 24.2 to 30.2%) for DBH and as high (ranging from 16.8 to 21.2%) to very high (> 21.2%) for H (Mora and Arriagada 2016). However, high \(\hat{C}V_{\exp} (\%)\) values are not unusual in progeny testing of eucalypt species (Moraes et al. 2007; Berti et al. 2011; Costa et al. 2015).

Table 3 Estimates of variance components and variation coefficients for Corymbia spp. traits at 18 and 36 months old, assuming that within-family relationships in open-pollinated families are entirely half-sib

In contrast to the \(\hat{C}V_{\exp} (\%)\) classification, the estimated determination coefficient of plot effects (\(\hat{c}_{plot}^{2}\)) was considered low (Table 3), as gauged by the relation \(c_{plot}^{2}/h_{a}^{2} \le 1/3\), which indicates relatively little environmental variation among plots within blocks and low environmental correlation among observations within plots (Resende and Sturion 2003). These results mean that we had high experimental precision and low environmental variability within plots, contributing to good accuracy of genetic parameter estimates (Resende 2002; Pagliarini et al. 2016).

The \(\hat{C}V_{g} (\%)\) estimates were much higher than those reported in the literature for open-pollinated families of other eucalypt species, including Eucalyptus tereticornis Sm. at age 25 years (Macedo et al. 2013), Corymbia maculata (Hook.) K.D. Hill & L.A.S. Johnson at age 4 years (Sato et al. 2010) and E. camaldulensis at age 19 years (Moraes et al. 2007), which ranged between 2.8 and 5.39% for DBH and H, whereas we observed values between 6.79 and 11.92%. We observed high values of \(Ac_{prog}\) for DBH and H at both ages (\(0.70 \le Ac_{prog} < 0.90\)) (Resende and Duarte 2007), indicating that the genetic variation in the observed traits was estimated with high precision. \(Ac_{prog}\) is the correlation between predicted and true genetic values (Resende 2002; Moraes et al. 2007; Costa et al. 2015).

The \(\hat{h}_{a}^{2}\) estimates, which relate to the component of inheritance that is effectively transmitted to the next generation (Falconer and Mackay 1996; Bernardo 2010), were classified as moderate (\(0.15 < \hat{h}_{a}^{2} \le 0.50\)) to high (\(0.50 < \hat{h}_{a}^{2}\)) (Resende 2002) for all the traits evaluated (Table 3), with estimates for CCV higher than those for CCC. Hung et al. (2016) observed estimates of \(\hat{h}_{a}^{2}\) for CCV at age 3 years of 0.34 ± 0.03 and 0.42 ± 0.04 for DBH and H, respectively, which are similar to our estimates for CCC and lower than our estimates for CCV. Although Hung et al. (2016) postulated that their results may be biased by the use of 1/3 rather than 1/2.5 for the coefficient of relationship within families and by heavy thinning in some trials, their assumption is similar to our assumption of 1/3.3. Moreover, it is probable that our own estimates of additive variance are inflated, due to (1) confounding of genotype-by-environment interaction (GxE) variance with additive variance, because single-site trials do not allow estimation of GxE, and (2) lack of pedigree information, including the provenance of ancestral origin of the families. Although these trials are based on later-generation selections, persistent provenance effects (that include non-additive genetic variation) have been observed in other, later-generation eucalypt trials (Swain et al. 2015).

Genetic gains

Our results demonstrate the importance of considering the mixed reproductive system when obtaining genetic estimates and predicting genetic gains from selection of the best individuals.

When we used \(\hat{t} = 0.85\) and \(\hat{t} = 0.90\) for CCC and CCV, respectively, the estimates of \(\hat{h}_{a}^{2}\) at both ages were reduced (Table 4) relative to the estimates from the half-sib model. Open-pollinated families of plants with a mixed-mating system will comprise at least three types of relative (selfs, half-sibs and full-sibs) (Namkoong 1966), and more if near-relative inbreeding and ancestral inbreeding are considered. This can result in overestimates of heritabilities and, consequently, inflation of predicted genetic gain if all within-family relationships are assumed to be half-sibs, as has been done by some forest tree breeders. Inbreeding depression and dominance effects are also likely to be complicating factors associated with mixed mating.

Table 4 Estimates of narrow-sense heritability (\(\hat{h}_{a}^{2}\)) considering different outcrossing rates (\(t\)) for Corymbia spp. traits at ages 18 and 36 months

The inattention of breeders and researchers to the importance of the reproductive system in estimating genetic parameters and gains from selection is evident from the small number of studies (e.g. Zanata et al. 2010; Berti et al. 2011; Macedo et al. 2013; Bush et al. 2011) that deal with the effect of a mixed breeding system on the prediction of genetic gain. In the present research, the overestimates of \(\hat{h}_{a}^{2}\) range from 20.93 to 21.01% for CCV and from 32.22 to 32.25% for CCC (Table 4). These values are lower than the overestimate of \(\hat{h}_{a}^{2}\) (46.5%) reported for DBH and H in Eucalyptus pellita (F. Muell.) families at age 23 years (Zanata et al. 2010). These authors used a mean \(\hat{t} = 0.557\) obtained from three populations of E. pellita (House and Bell 1996). Similarly, Berti et al. (2011) obtained overestimates of \(\hat{h}_{a}^{2}\) between 40.38 and 64.31% for 24-year-old Eucalyptus cloeziana (F. Muell.), using a mean \(\hat{t} = 0.754\) obtained from 18 species of Eucalyptus.
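Because \(\hat{h}_{a}^{2}\) in Eq. (3) is proportional to \(1/\rho\), the overestimate from assuming half-sibs depends only on the two values of \(\rho\), not on the trait. The sketch below computes it, assuming that the unrounded values behind \(\rho\) = 1/3.3 and 1/3.0 are \((1 + s)^{2}/4\) for \(s\) = 0.10 and 0.15; under that assumption it gives 21.00% and 32.25%, inside the reported ranges, so the small spread within each range presumably reflects rounding. The function name is hypothetical.

```python
def h2_overestimate_percent(rho_assumed, rho_true):
    """Relative inflation (%) of h2 when dividing by rho_assumed instead of rho_true.

    With the same variance components, h2 is proportional to 1/rho, so
    h2_assumed / h2_true = rho_true / rho_assumed.
    """
    return (rho_true / rho_assumed - 1.0) * 100.0


for taxon, s in [("CCV", 0.10), ("CCC", 0.15)]:
    rho_mixed = (1 + s) ** 2 / 4  # assumed unrounded form of 1/3.3 and 1/3.0
    print(f"{taxon}: {h2_overestimate_percent(0.25, rho_mixed):.2f}% overestimate")
```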

The predicted genetic gain from selection, in percentage, is given by \(Gs\% = \frac{(\bar{y}_{s} - \bar{y}_{p}) \times \hat{h}_{a}^{2}}{\bar{y}_{p}} \times 100\), where \(\bar{y}_{s}\) is the mean of the selected families and \(\bar{y}_{p}\) is the overall mean of the families. The lower estimates of \(\hat{h}_{a}^{2}\) obtained when the mixed reproductive system is considered therefore lead to lower values of genetic gain (\(Gs\%\)), and the reduction in \(Gs\%\) will be greater than that observed for \(\hat{h}_{a}^{2}\), since the mean of the selected population tends to be reduced too. Goodwillie et al. (2005) presented updated evidence suggesting that mixed reproductive systems are more frequent than previously thought, underscoring the importance of studying the evolutionary and biometric implications of mixed mating in these species.
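The gain formula can be evaluated directly. The sketch below uses invented family means and holds the mean of the selected families fixed, so it captures only the part of the reduction that comes from \(\hat{h}_{a}^{2}\); as noted above, the selected mean itself also tends to fall, which would reduce \(Gs\%\) further. The mixed-mating heritability assumes \(\rho = (1 + 0.10)^{2}/4\) with unchanged variance components, and the function name is hypothetical.

```python
def genetic_gain_percent(mean_selected, mean_population, h2):
    """Gs% = [(mean_selected - mean_population) * h2 / mean_population] * 100."""
    return (mean_selected - mean_population) * h2 / mean_population * 100.0


# Hypothetical family means (e.g. DBH in cm) and heritabilities
mean_population, mean_selected = 10.0, 12.0
h2_half_sib = 0.40                       # rho = 1/4
h2_mixed = h2_half_sib * 0.25 / 0.3025   # same components, rho = (1 + 0.10)^2 / 4

print(f"Gs% (half-sib model): {genetic_gain_percent(mean_selected, mean_population, h2_half_sib):.2f}")
print(f"Gs% (mixed mating, s = 0.10): {genetic_gain_percent(mean_selected, mean_population, h2_mixed):.2f}")
```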

Though we have used molecular marker-based estimates of outcrossing to better estimate the coefficient of relationship in this study, there are a number of complicating factors in connection with the eucalypt mixed-mating system that should also be considered. Firstly, the estimation of outcrossing rates from seedling populations may lead to an overestimation of s in the adult population of trees. Selection against inbred individuals is normally very high in eucalypts, such that in some species, very few inbred individuals survive to reproductive maturity (e.g. Costa e Silva et al. 2010; Griffin and Cotterill 1988; Hardner and Potts 1997), though this does not appear to be the case for all eucalypt populations (Bush and Thumma 2013). A further, often-overlooked complication is the likelihood of full-sib progeny within families (i.e. individuals that share a common pollen parent). The assumption of panmixia in eucalypt seed orchards is unrealistic, especially in the quite common case where a few unrelated individuals flower synchronously with each other but not with the majority of other genotypes in the stand. As full-sibs are more closely related than are half-sibs, but are not inbred, they are likely to survive to reproductive maturity. Their presence is a reason to choose a coefficient of relationship reflecting slightly greater average relatedness than would be chosen on the basis of correcting for inbreeding alone. In addition to these considerations, the issue of non-homogeneous rates of inbreeding and relatedness among families and provenances can be significant (Bush et al. 2015), especially where breeding populations contain a mix of wild (first generation), land-race and later-generation materials.
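The effect of shared pollen parents can be illustrated with a simple, hypothetical model (not one used in this study): a fraction \(s\) of seed is selfed, outcrossed seed is sired by non-inbred fathers unrelated to the mother and to each other, and two outcrossed sibs share a father with probability \(r_{p}\). Summing identity-by-descent paths through the mother and the shared father gives an additive relationship \(2\theta = [(1 + F)(1 + s)^{2} + (1 - s)^{2} r_{p}]/4\) between two random sibs, where \(F\) is the mother's inbreeding. This is \(2\theta\) only, not divided by the inbreeding terms of Eq. (2). The function name and parameter values are illustrative assumptions.

```python
def within_family_relationship(s, r_p=0.0, f_mother=0.0):
    """Additive relationship (2 * coancestry) between two random sibs of one open-pollinated mother.

    Hypothetical model: selfing rate s; outcross fathers are non-inbred and unrelated to
    the mother and to each other; two outcrossed sibs share a father with probability r_p.
    """
    through_mother = (1 + f_mother) * (1 + s) ** 2  # paths via maternal genes, including selfing
    through_father = (1 - s) ** 2 * r_p             # outcrossed pairs that are full-sibs
    return (through_mother + through_father) / 4


cases = [("pure half-sibs", 0.0, 0.0), ("pure full-sibs", 0.0, 1.0),
         ("10% selfing", 0.10, 0.0), ("10% selfing, r_p = 0.2", 0.10, 0.2)]
for label, s, r_p in cases:
    print(f"{label:>24}: 2*theta = {within_family_relationship(s, r_p):.4f}")
```

Even modest correlated paternity raises average relatedness beyond what the selfing correction alone gives, which is the direction of adjustment argued for above.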

In addition, breeders of these species may be misjudging the number of trees (\(m\)) required to retain a suitable effective population size (\(N_{e}\)). If breeders do not properly model the relationships among progeny of a species with a mixed-mating system, they neglect the intraclass correlation that will exist among these individuals. Indeed, mixed-mating populations differ from allogamous and autogamous ones because they are likely to be composed of a mixture of individuals with different degrees of inbreeding, and inbreeding influences individual phenotypic values and genetic variances. As an example, if we consider an open-pollinated population of a species that reproduces by panmictic outcrossing, without additional intraclass correlation, the coancestry (\(\theta_{xy}\)) among sibs will be 0.125, the inbreeding coefficient (\(F\)) will be 0, and \(m\) can be estimated in a straightforward manner. However, if there is 10% selfing, one in ten trees will have a self-half-sib relationship with the other nine, and on average each family will have a higher intraclass correlation than expected in a panmictic, allogamous population. To retain the same effective population size, the breeder therefore needs to maintain a larger number of trees: for a sample of \(n\) trees taken from a large population at inbreeding equilibrium, \(N_{e} = \frac{n}{1 + F} = n(1 - 0.5s)\), where \(n\) is the sample size and \(F = s/(2 - s)\) is the equilibrium fixation index (inbreeding coefficient).
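The effective-size expression is easy to invert for planning purposes. The sketch below computes \(N_{e}\) for a given sample size and the smallest sample that reaches a target \(N_{e}\), using the equilibrium fixation index \(F = s/(2 - s)\); the target of 100 and the function names are illustrative assumptions.

```python
import math


def equilibrium_inbreeding(s):
    """Fixation index at selfing equilibrium, F = s / (2 - s)."""
    return s / (2.0 - s)


def effective_size(n, s):
    """N_e = n / (1 + F) = n (1 - 0.5 s) for n trees sampled at inbreeding equilibrium."""
    return n / (1.0 + equilibrium_inbreeding(s))


def trees_needed(target_ne, s):
    """Smallest sample size n whose N_e reaches target_ne."""
    return math.ceil(target_ne / (1.0 - 0.5 * s))


for s in (0.0, 0.10, 0.15):
    print(f"s = {s:.2f}: N_e of 100 trees = {effective_size(100, s):.1f}; "
          f"trees for N_e = 100: {trees_needed(100, s)}")
```

With 10% selfing, for example, 106 trees rather than 100 are needed to retain \(N_{e}\) = 100.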

Concluding remarks and future considerations for eucalypt breeding

The study populations exhibited moderate genetic variability, with potential to become breeding base populations. Estimation of variance components and prediction of genetic gain under the assumption of complete outcrossing in a species which has a mixed-mating system results in overprediction of genetic gain, leading to significant errors in the selection of superior trees and/or families.

Our results show that when eucalypt species are modelled as completely allogamous, there is some upward bias in estimates of additive genetic variance and therefore in the genetic parameters that depend on this quantity. This arises because seed of these species can be produced with deviations from random mating, so that open-pollinated seedling families present various levels of relatedness.

With the recent emergence of next-generation molecular techniques, studying the breeding system and estimating relatedness among individual trees has become faster and cheaper. This allows a much more accurate characterisation of a species’ mating system, so there is no justification for using an approximation that leads to errors in the selection of superior genotypes, as illustrated by our results. Though not routinely carried out, a number of studies of eucalypt genetic parameters informed by marker-based estimates of relatedness and population structure have been made in recent years, either based on marker data taken directly from the breeding populations (e.g. Bush et al. 2015; Cappa et al. 2016; Klápšte et al. 2017) or, as in the present study, making use of previously determined marker-based estimates of outcrossing (Varghese et al. 2017). The advent of high-density markers that can be obtained through genotyping-by-sequencing or “SNP-chip” technologies has opened the door to detailed estimation of inter-tree relatedness. In some cases, it may even be possible to estimate dominance variance in populations where sufficient full-sib relationships exist (e.g. Doerksen and Herbinger 2010; Klápšte et al. 2017), paving the way for significantly more realistic estimation of quantitative genetic parameters in species with mixed-mating systems.