Introduction

Species of genus Eucalyptus L′Hér. constitute the most outstanding source of plantation forest trees in tropical and subtropical regions of the world. Rapid growth, extensive adaptability to a wide diversity of environments, coupled with multipurpose wood properties for energy, solid wood products, pulp and paper, justify their leading position. Although the genus includes more than 890 species in different subgenera and sections (Slee et al. 2006), approximately twenty species of subgenus Symphyomyrtus account for 95% of the world’s planted eucalypts (Harwood 2011). For over 50 years, extensive seed collections of large numbers of open pollinated progenies in several provenances of these species have been carried out by the CSIRO Australian Tree Seed Centre (ATSC). Continued introductions from this rich germplasm resource have been the key driver of several advanced breeding programs worldwide, which in turn led to the remarkable eucalypt forest productivities experienced today in several countries (Birhanu and Kumsa 2018). The evolution of Brazilian plantation forestry is probably one of the most illustrative cases of successful use of eucalypt germplasm. Large seed introductions from the CSIRO seed bank, enhanced by occasional seed collection expeditions of Brazilian scientists, resulted in a large and diverse network of eucalypt genetic resources currently managed both by public institutions and private forest-based companies in Brazil (Oliveira and Pinto Júnior 2021; Silva et al. 2023).

Despite the wealth of these genetic resources, information on the actual standing of the managed and conserved genetic diversity of eucalypts in Brazil is generally only available for species of higher commercial interest and wide geographic distribution such as E. grandis (Miranda et al. 2019) and E. urophylla (Silva et al. 2019). Moreover, the case of eucalypt germplasm conserved in ex situ conditions requires further attention due to the possibility of genetic admixture following accidental interspecific hybridization once germplasm is introduced in an exotic habitat where potentially fewer interspecific reproductive barriers are encountered (Oliveira et al. 2023). No information is available on the genetic diversity available for minor eucalypt species, more recently introduced and successfully adopted by breeders, usually to solve specific problems of adaptation. This becomes even more relevant when such species themselves experience limited genetic diversity in their natural populations in their center of origin.

Eucalyptus benthamii Maiden & Cambage is an emblematic example of an increasingly important subtropical species for plantation forestry, with restricted distribution in its natural range. Its native populations are found in an area to the south-west of Sydney, approximately 100 km long and 40 km wide in the alluvial floodplains around the Nepean River (Benson 1985). These populations have been reduced due to agricultural and urban development. Extensive flooding caused by the construction of the Warragamba Dam on the Nepean River is also likely to have significantly reduced the species’ natural extent (Butcher et al. 2005). Due to its restricted occurrence and ongoing threats to the remaining stands, particularly those occurring in urbanised areas along the Nepean River, the species has been classified as threatened under the Australian State and Commonwealth legislation and listed as Endangered in the IUCN Red List of Threatened Species: (2022.2 list). Like most eucalypt species, E. benthamii reproduces under a mixed mating system (Butcher et al. 2005; Tambarussi et al. 2022). Selfed progenies are likely to suffer from inbreeding depression, which may cause mortality and reduced fitness (Nicholas 2009). In fact, molecular marker surveys have shown that despite reasonable levels of remnant genetic diversity, trees in these fragmented stands display high kinship with elevated selfing rates due to restricted pollen flow among more unrelated individuals (Butcher et al. 2005; Han et al. 2020).

Unlike many species occurring in subtropical latitudes, E. benthamii is well adapted to low temperatures. This property, coupled with fast growth and multipurpose wood qualities, has made it a desirable species for use in subtropical sites exposed to occasional cold snaps (Bush et al. 2023). Eucalyptus benthamii was first introduced to Brazil in the 1980s aiming at wood pulp production in southern areas of the country subject to periodic frosts (Gardner 2007) and has since then been introduced to other countries including parts of Uruguay and the United States, experiencing similar conditions. Initial germplasm introduction in Brazil involved a bulk collection derived from fewer than ten trees located at Wentworth Falls (NSW), Australia (Graça et al. 1999). Additional introductions comprising a supposedly wider genetic base followed the initial imports, but the exact origins of such introductions were not documented. Furthermore, once the species was declared protected, more than a decade ago, all E. benthamii seed exported from Australia has been from cultivated trees in seed orchards derived from collections undertaken in the 1990s. Knowledge on the exact composition of the Brazilian breeding base for the species is therefore scant. While still in their early stages, breeding programs of E. benthamii in Brazil are rapidly developing, driven by its productive potential for cellulose pulp and tolerance to frost in temperate regions of Southern Brazil (Konzen et al. 2017; Tambarussi et al. 2023). Although seed production of E. benthamii is challenging, clonal propagation of pure species individuals or in hybrid combinations with E. dunni has been somewhat successful but limited to specific genotypes (Estopa et al. 2017). Clearly, a better understanding of the extent and distribution of the genetic resources available for this critically important species for frost prone areas in Brazil is needed to support the ongoing efforts to further adapt this germplasm to the target Brazilian environments.

To assess the genetic diversity and recover the pedigree structure of the most representative breeding populations of E. benthamii in Brazil, we used microsatellite markers. Microsatellites are highly polymorphic, co-dominant markers that are widely distributed throughout the Eucalyptus genome. They have a well-established track-record for evaluating genetic diversity and population structure within and between eucalypt populations (Grattapaglia et al. 2012) and well-developed resources for the genus (Grattapaglia et al. 2015). Previous studies have demonstrated the utility of microsatellite markers for population and kinship analysis of E. benthamii (Butcher et al. 2005; Han et al. 2020; Tambarussi et al. 2022; Souza et al. 2023). In this study we evaluated 648 adult trees managed by two Brazilian companies. These two companies retain most, if not all, the genetic variability of E. benthamii germplasm introduced to Brazil. Additionally, to gauge the available diversity in the breeding populations in Brazil and globally, we examined collection and export records of the species from the CSIRO Australian Tree Seed Centre. With this study we wanted to contribute to a better understanding of the history of the introductions and current status of the genetic diversity of E. benthamii available in Brazil. The data provided should help guide the management of the breeding programs and conservation efforts in the country. Due to the likelihood of common origin of the earlier introductions of the species in Brazil, our working hypotheses was that the that current breeding populations managed by the two companies share a common origin and genetic constitution. Although direct comparisons of microsatellite diversity data across studies are only tentative, we also postulated that the genetic diversity available for E. benthamii in Brazil is likely to be narrower than the reported genetic diversity surveyed in natural populations in Australia.

Materials and methods

Analysis of E. benthamii exports to Brazil

Collection and export data from the Australian Tree Seed Centre (ATSC) database, dating back to 1979 when the species was first collected, were analysed, and summarised to determine the likely diversity of genetic material that might be found in Brazil.

Description of the breeding population samples

A total of 694 E. benthamii individuals were drawn from the breeding populations of two Brazilian forest-based companies (A and B) (Table 1). These trees were identified by membership to basic subpopulations, but detailed pedigree information was absent.

Table 1 Summary of samples

Sampling – company B

Genetic samples from B comprised 362 samples from eight different trials based on the base population of open-pollinated families. The trials are located in Santa Catarina and Paraná states, Brazil. According to the Köppen-Geiger climatic classification (Köppen and Geiger 1936) this region presents a Cfb climate, with an average temperature of 17.1 °C and a mean annual rainfall of 1800 mm (Alvares et al. 2013). The genetic material was introduced to Brazil as seeds between 1994 and 2015 from sources including wild populations named Kedumba (Kedumba Valley) and Nepean (Bents Basin and isolated trees growing along the Nepean River) (following Han et al. (2020), as well as Australian genetically improved material based entirely on the Kedumba subpopulation (Kowen Seedling Seed Orchard), and an additional unknown seed source.

Sampling – company A

Genetic samples from A comprised 332 samples from four sources: (i) Kedumba, (ii) Kylisa (an Australian private seed collector that likely acquired the stock from the ATSC), (iii) Ouro Verde, and (iv) three trees labelled “controls” in a clonal seed orchard (Supplementary material- Table S1 and Figure S1).

Genotyping

Microsatellite data provided by the two companies were generated by Hereditas Tecnologia em Análise de DNA Ltda. in Brasília, Brazil using the same genotyping and data analysis protocols. Genomic DNA was extracted from 100 to 150 mg of fresh leaves or vascular cambium of E. benthamii using a previously described method (Inglis et al. 2018). Microsatellites were co-amplified into multiplex panels of 3 to 6 loci and genotyped as described previously (Faria et al. 2011). Alleles were separated and detected on an ABI PRISM® 3100xl Genetic Analyzer using POP-4™ Polymer, using a custom-made ROX-labelled size standard (Brondani and Grattapaglia 2001) and data collected under dye set D spectral calibration using GeneScan® v3.7 and analysed with Genotyper® v3.7 (Life Technologies). The allele sizes were estimated in base pairs using two decimal places and converted to allele size using the Tandem2 software (Matschiner and Salzburger 2009). A common control sample DNA was used in the two, time-spaced analyses of the two sample sets. The genotype profile of the shared control sample was used to adjust for occasional allele size mobility shifts across the two analytical runs, frequently seen for microsatellites genotyping due to different polymer batches, fluorochrome labels and equipment calibration settings used for the production of different datasets (Moran et al. 2006). A set of 19 EMBRA Eucalyptus microsatellites were interrogated for the samples of company B and 13 microsatellites were analysed for the samples provided by A (Supplementary Material - Table S2). However, the analyses were spaced approximately three years apart and the resultant datasets were opportunistically merged for a final set of nine shared microsatellites between the two studies, used to undertake the analysis reported here.

Filtering of clonal ramets

A total of 29 clones were identified in the A breeding population, each represented by two to five ramets. The dataset was filtered to randomly choose one ramet per clone, reducing the number of unique individuals from 332 to 292. In the B population, only one tree had a clonal replicate, resulting in 362 unique individuals after filtering. The combined dataset (A plus population) comprised therefore 648 individuals.

Genetic diversity and population structure

Genetic diversity parameters including expected heterozygosity (HE), observed heterozygosity (HO) and the Wright’s inbreeding coefficients (FIS) were estimated for the two breeding populations. The analyses were performed using the software adegenet (Jombart 2008) in the R environment (R Core Team 2022).

Two alternative population genetic structure analyses were carried out with the genotype data of the combined dataset to verify the possible existance of genetic clusters and juxtapose them with the two breeding populations. Initially, we employed Discriminant Analysis of Principal Components (DAPC) using the R package adegenet (Jombart et al. 2010). The number of genetic clusters involved were determined based on the successive k-means method within the find.clusters function in adegenet, employing 10 random starting centroids. Subsequently, the resulting clusters were fed into the dapc function. In this analysis, we used 15 principal components, determined as optimal through cross-validation procedures recommended by the software, and retained three axes in the discriminant analysis step. Results were visualized using ggplot2 (Wickham 2016). To pinpoint alleles strongly correlated with each discriminant function, we leveraged the variable contributions reported by DAPC.

The second analysis involved Bayesian clustering implemented with the software STRUCTURE (Pritchard et al. 2000). This Markov Chain Monte Carlo technique identifies population clusters within the whole sample by detecting allele frequency differences within the data and assigns individuals to those sub-populations based on analysis of likelihoods. The analysis was carried out under an admixture model with correlated allele frequencies applying a burn-in length of 50,000 and 500,000 iterations for data collection with K hypothetical populations from K = 1 (i.e., a panmictic population) up to K = 5, performed with 10 independent runs each. We analysed the patterns of log-likelihood probability and the associated standard deviation (averaged across the 10 runs) to identify the most strongly supported number of population clusters. We interpreted the results using the approaches of Pritchard et al. (2000) and Evanno et al. (2005).

Co-ancestry

Pairwise co-ancestry (θij) between pairs (dyads) of trees and inbreeding coefficients (f) for each individual were estimated within the two breeding populations using the approach and Coancestry software described by Wang (2011) implementing the method of Milligan (2003). The co-ancestry coefficient gives the probability that randomly sampled homologous genes from the two individuals are identical by descent. To assess the likely precision and bias of the Milligan estimator when applied to the empirical datasets, genotypes representing 1000 dyads each of unrelated, half-sib, full-sib and self-full-sib trees were simulated. The simulations generated individual multi-locus genotypes with allele frequencies matching those of the empirical SSR datasets. The simulations assumed 5% missing data and a 5% genotyping error rate.

Group co-ancestry (Θ) and status number (Ns) were calculated for each breeding population as described by Wei and Lindgren (1996) based on the pairwise co-ancestry estimates described above. Briefly, group co-ancestry was calculated for each breeding population as:

$${\Theta}=\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{n}\,{\theta}_{ij}}{{N}^{2}}$$
(1)

where N is the census number of individual genotypes in the breeding population. Note that the numerator of Eq. 1 includes estimates of each individual’s coefficient of self-kinship which is given by 0.5 + f/2 as well as the pairwise co-ancestries of all individuals.

The status number is equivalent to the census number of a population of unrelated, non-inbred individuals, and can be calculated as:

$$\:{\text{N}}_{S}=\frac{0.5\:}{{\Theta\:}}$$
(2)

The relative status number is given by:

$$\:{\text{N}}_{r}=\frac{{\text{N}}_{s}\:}{\text{N}}$$
(3)

Results

Analysis of exports from the ATSC

Records from the CSIRO Australian Tree Seed Centre (Table 2) show that a total of 76 wild mother trees have been sampled since the species was first collected in 1979. The first line in Table 2 details ATSC seedlot 14,214, an introduction from Wentworth Falls. Wentworth Falls is in fact the nearest town to the Kedumba Valley, the latter area being where the seed was collected, and this seedlot may potentially contain close relatives of others from the Kedumba subpopulation. It seems likely that this is the seedlot that comprised the first introduction to Brazil (as mentioned by Graça et al. 1999), although these authors note the bulk comprised ten mother trees whereas records show that it actually had only eight. Only 18 of the families are from the Nepean subpopulation. As these were sampled on three separate occasions, and the known population at that time consisted of only a few trees (Butcher et al. 2005), some of the mother trees are likely to be represented in the Brazilian breeding populations more than once. Ten of the listed families have not been exported to Brazil by the ATSC. As the species is protected, and the main subpopulation in Kedumba Valley is very difficult to access, even without authority, we hypothesize that the material summarised below likely represents a large proportion of the available genetic resource of this species. The genetically improved sources in Australia include seedling seed orchards (SSO) based entirely on the Kedumba subpopulation (Kowen SSO) and a mix of Kowen and Nepean material (Barclays and Crossleys SSOs) – although the mixed source SSOs are predominantly based on Kedumba.

Table 2 All wild collections of E. Benthamii carried out by the ATSC and ATSC exports to Brazil made between 1979 and 2007

Based on the data in Table 2, the trials are highly likely to have a large amount of common genetic material. A’s population has accessions supplied by Kylisa Pty Ltd which are almost certain to have originated from the ATSC. The accessions found in the B population have been sampled from both major subpopulations of E. benthamii – with inclusion from both Kedumba and Nepean subpopulations.

Genetic diversity and population structure

Observed heterozygosity estimates for the breeding populations and their constituent subpopulations were very similar at Ho = 0.6. Expected heterozygosity was uniformly higher than Ho and similar among subpopulations at around 0.7, although there were some minor differences among the B subpopulations which ranged from 0.68 to 0.75 (Table 3).

Table 3 Diversity statistics for the A, B, and A + B (overall statistics) breeding populations including observed and expected heterozygosity (Ho and He) and the inbreeding coefficient (FIS)

The A trees formed two clusters, differentiated on the x-axis (Fig. 1), with several trees occupying the intervening space. It is evident, however, that these do not correspond in any systematic way to the three main sources of origin. Analysis of the loadings plot revealed that allele 124 at locus EMBRA2, was chiefly responsible for driving the differentiation (Supplementary material - Figures S3 and S4).

Fig. 1
figure 1

DAPC diagram showing clustering within the A breeding population comprising populations 1: Kedumba Valley (green), 2: Kilysa (orange), and 3: Ouro Verde (purple)

Strong clustering was not observed in the B breeding population with seed sources occupying a single, contiguous cluster (Fig. 2). However, seed source 1 (unknown origin) and the majority of 2 (Kedumba Valley) occupy the quadrants to the right of the origin on the x axis. Locus/allele combinations EMBRA204/132 and 142 and EMBRA210/200 and 210 were primarily responsible for the differentiation (Supplementary material – Figure S3).

Fig. 2
figure 2

DAPC diagram showing clustering within the B breeding population comprising populations 1: Unknown origin (green), 2: Kedumba Valley (orange), 3: Kowen SSO (purple), and 4: Bents Basin (pink)

The DK method applied to the STRUCTURE analysis results for K = 1 to 5 hypothetical groups showed K = 2 as the most likely hypothesis for both breeding populations (Fig. 3). In the successive k-means method, however, the optimal number of clusters that minimized the BIC were respectively 19 and 11 for the A (Figure S5) and B (Figure S6) populations. There is clearly a discrepancy in the results of these two analytical approaches. We hypothesized that the k-means clustering approach is most likely capturing an additional substructure within each population grouping more related individuals possibly sharing a common familial ancestry.

Fig. 3
figure 3

K versus DK for breeding populations A and B indicate the most likely number of clusters where DK is at its highest (K = 2)

Figure 4 comprises stacked bars showing the proportional allocation of the individual trees to each of K = 2 hypothetical subpopulations, coloured blue, and orange. There are significant numbers of trees from each source that appear to be predominantly allocated to either K1 or K2, with a large number also showing significant admixture.

Fig. 4
figure 4

Genomic assignment and composition of 292 individual trees from four identified sources in the A breeding population to K = 2 hypothetical subpopulations denoted by the two colours

The probability of assignment of each tree of the B breeding population to the K = 2 subpopulations detected is shown in Fig. 5. For the unknown seed source, all trees have a high (> 0.7) probability of belonging to the K1 (blue) subpopulation. For the remaining three sources (Kedumba, Kowen, and Bents Basin) there is a continuum of probabilities from low to high with some trees having high probabilities of belonging to one or the other cluster and some intermediate between the two, indicating admixture.

Fig. 5
figure 5

Genomic assignment and composition of 360 individual trees from four identified sources in the B breeding population to K = 2 hypothetical subpopulations denoted by the two colours, being orange the subpopulation 1 and blue the subpopulation 2

Kinship within breeding populations

Simulations show that the microsatellite marker panels for the two datasets analysed separately are reasonably reliable in terms of precision and bias for estimating pairwise co-ancestry, while the combined marker panel, comprising nine common loci, is less likely to be able to precisely estimate pairwise kinship, although it may still give acceptable results (Table 4). Bias in the estimation of unrelated dyads (estimates from pairs of trees) was around 8% for the two separate datasets and 10% for the combined dataset. As unrelated dyads would be expected to comprise most of the data in early-generation breeding populations, this is the parameter that would likely have the greatest impact on the estimation of group co-ancestry.

Table 4 Results of relatedness (2q) estimation (Milligan 2003) on simulated datasets comprising 1000 dyads each of unrelated (UR), half-sib (HS), full-sib (FS), and self-full-sib relationship types. The genotype data were simulated with allele frequencies matching each of the three empirical datasets (A, B and these combined) and assuming 5% missing data and 5% genotyping errors

Estimation of pairwise co-ancestry using the method of Milligan (2003) revealed close relatedness and significant family structure in both breeding populations. Visual inspection of heatmaps of genomic relatedness matrices for the individual populations (Fig. 6) reveals small numbers of discrete families, appearing as blocks flanking the diagonals. These diagonal-flanking blocks comprise close relatives with off-diagonal elements 2θ > 0.15. Significant inter-relatedness between these families is also apparent. Many of the creamy-colored blocks, appearing more-or-less randomly as “noise” in areas away from the diagonal, are likely to be dyads that are in fact unrelated, bearing in mind the positive bias in estimation of unrelated dyads demonstrated by the simulations. Analysis of the combined data showed a similar pattern, with numerous individuals across the two breeding populations being close relatives, confirming their common ancestry.

Fig. 6
figure 6

Heat mapped genomic relationship matrix for the A, B and combined (A + B) populations. The asterisked box in the lower left pane is presented at higher resolution in the lower right pane showing inter-relatedness between individuals in the two breeding populations. The matrices comprise elements corresponding to 1 + f on the diagonal and 2\(\:\theta\:\) on the off-diagonals

Several of the families, particularly in the A population, comprise individuals that are significantly more-closely related than half-sibs (HS) (visible as orange, red, mauve, and purple blocks on the off-diagonal).

Status number

Breeding population status numbers (NS) calculated on the raw co-ancestry estimates were very low in each breeding population and in the combined data (Table 5). As the combined data are based on only nine loci, with comparatively poorer power to detect the difference between unrelated and weakly related dyads, bias adjustment (θ = − 0.1) had a major effect on the resulting group co-ancestry. Note that the group co-ancestry.

Table 5 Census number (N) Group co-ancestry (\(\:{\Theta\:}\)) and status number (Ns) for the A, B, and A + B populations breeding populations calculated using the raw marker-based co-ancestry estimates and bias-adjusted estimates

(ϴ) for each breeding population is effectively half the average of the relatedness elements shown graphically in Fig. 6 (including the diagonal). While \(\:{N}_{S}\) based on the bias-adjusted co-ancestry estimates are significantly higher, they are still low relative to the breeding population census numbers (N). The group co-ancestry estimated over the combined populations is lower than the sum of the two separate estimates, reflecting the high proportion of relatives across the two breeding populations (see for example the lower right pane of Fig. 6). A comparison of estimated status numbers for several forest tree breeding populations is presented in the Supplementary material – Table S4.

Discussion

Eucalyptus benthamii is a threatened species with a limited natural distribution but a unique forestry application due to its adaptation to sub-tropical sites that may experience severe frosts. Due to the inaccessibility and fragmentation of its natural stands, the genetic diversity of germplasm collected from the wild is low, as gauged by the number of open-pollinated families sampled, relative to other key industrial eucalypts such as E. grandis (Miranda et al. 2019) and even E. urophylla, which is also relatively inaccessible (Payn et al. 2008). This is borne out by the analysis of collections and exports from the ATSC, which is highly likely to have carried out most of the sampling of this species.

Although a significant portion of the available diversity of E. benthamii has been exported to Brazil, with additional genetic material likely to have arrived via intermediate routes, the historical export records and the genetic data gathered in this study indicate that the two breeding populations surveyed were primarily founded on a restricted number of open-pollinated families. The labelling of seed imports and the parallel importation of seedstocks derived from the same limited wild collections may lead tree breeders in Brazil, and potentially elsewhere, to overestimate the size of their base breeding populations. Most of the sampling from natural stands has been from the larger Kedumba Valley population, and it might be expected that this subpopulation would be the dominant contributor to the Brazilian gene pool.

The two breeding populations analysed here appear to have levels of gene diversity very similar to those previously estimated using microsatellite markers for E. benthamii, although we are aware that direct comparisons should be made cautiously due to different microsatellite sets and variable resolutions of the analytical procedures used. Our estimates of He in the range 0.68 to 0.75 are, nevertheless, comparable to those of both Han et al. (2020) who estimated He between 0.58 and 0.67 and Butcher et al. (2005) who estimated He of mature trees from 0.62 to 0.78. As Han et al. (2020) note, these levels are at the upper end of those reported for eucalypts with a localised or regional distribution.

Investigation of population structure using DAPC revealed clustering that did not closely align with the seed source origins in either the A or B breeding populations. In the case of the A population, two separate clusters were visible, while the B breeding population formed a single cluster, with some evident seed source differentiation. The Bayesian ‘ΔK’ metric of Evanno et al. (2005) applied to the outputs of STRUCTURE revealed that K = 2 clusters was the most likely number for both populations. Although estimation of co-ancestry using small-sized microsatellite panels can only be expected to provide estimates of modest precision and bias at the individual dyad level, relative to high density SNP panels for example, the overall patterns of relatedness in the two breeding populations is reasonably clear. The genomic relationship matrix shows that there is considerable inter-relatedness among families, with significant variation in family size. This may be due to uneven contributions of pollen parents, uneven selection of progeny from particular mother trees, formation of the breeding population from differently labelled source materials that were actually from the same genetic origin, or a combination of these factors.

The analyses carried out here, taken together, indicate that there has probably been significant admixture among most of the breeding population materials, the pedigree of which is not known with certainty. The Bayesian Structure analysis indicates that there are significant numbers of individuals with high probabilities of belonging to the K = 2 hypothetical subpopulations. These subpopulations might correspond to the Kedumba and Nepean subpopulations identified by Han et al. (2020). The STRUCTURE analysis carried out by those authors indicated that wild trees from Kedumba clustered into a single population, while those of Nepean origin formed up to three clusters with very low probability of admixture with the Kedumba cluster. The pattern identified in our study is different, probably indicating some inter-population admixture between Nepean and Kedumba accessions as well as within-subpopulation matings in the first-generation seed orchards from which the Brazilian breeding populations were derived. Progeny within the B breeding population of the “unknown” ancestral origin do not fit this pattern: they all have a high probability of belonging to only one of the two hypothetical groups and do not appear to have significant proportions of admixed ancestry. The co-ancestry analyses provide additional evidence of significant admixture among populations with significant inter-relatedness evident among families of different origins.

The estimated group co-ancestries for the two populations, either considered separately or in combination, are consequently high, with correspondingly low status numbers, even after adjusting for likely upwards bias in the estimator. Table S4 (Supplementary material) compares the group co-ancestry and status number estimates with other small breeding populations of broadly similar sizes (as gauged by the census number). The relative status number range for the E. benthamii breeding populations is among the lowest in the published literature (Supplementary material- Table S4). As status number tends to further decrease through subsequent rounds of directional selection and breeding, careful management of the pedigree and further infusions of unrelated genetic material are strongly recommended to avoid or at least minimize inbreeding by related matings.

Besides providing an overview of the extent and organization of the genetic diversity available in E. benthamii breeding populations in Brazil, the genomic relationship matrices estimated among the 648 trees in our study constitute roadmaps of significant practical and immediate use in devising more efficient breeding strategies and tactics. For example, these data can be used to maximize unrelated matings both within each breeding population as well as between populations through selected germplasm exchange between the two companies. This would maximize the production of novel, high diversity recombinant individuals and expand the available diversity to both breeding programs for more sustainable long term genetic gins. Within each program, genomic relationship information would allow an optimal distribution of genetic diversity across genetic hierarchies of breeding subgroups. Such an approach would enhance the implementation of nucleus breeding or multiple population strategies that include elite selections for intense breeding effort, mainline population managed for long-term genetic gain and archive of diverse selections for the maintaining genetic diversity (White et al. 2007). Alternatively, molecular relatedness data can be used to establish subline breeding strategies allocating more related individuals to each subline depending on whether the objective is to exploit inbreeding to purge genetic load or less related when the idea is to maximize diversity within each subline. Inbreeding builds up within each subline but trees for plantation are derived from crosses among individuals from different sublines (Burdon and Namkoong 1983). Finally, on a more tactical level, genomic relatedness would allow tailoring mixed pollen pools of more unrelated selections to pollinate all breeding trees for better estimation of their breeding values.

While the panel of microsatellite markers used here was sufficient to give a first overview on the diversity, structure and kinship of the Brazilian E. benthamii germplasm, we acknowledge that there would be a significant opportunity to increase the precision of all reported estimates by using a much larger number of SNP markers. E. benthamii with a pooled sample of 12 individuals was one of the 12 species whose genome was resequenced for SNP discovery and development of the EUChip60K genotyping platform (Silva-Junior et al. 2015). Interestingly, in that study although a large number of 558 E. benthamii individuals from a plantation in Brazil were genotyped, only 12,000 (23.5%) of the 51,000 SNPs on the chip were polymorphic when compared to 58.8% for 79 genotyped individuals of E. grandis. That result was attributed to the low genetic diversity available in the E. benthamii as a result of its restricted occurrence and genetic vulnerability. Nevertheless, 12,000 SNPs would certainly allow a much more granular view of the genetic relationships among individuals and possibly the opportunity to refine the population structure results reported in our study.

Conclusion

Our study has shown that moderate genetic diversity is available within the main breeding germplasm of E. benthamii in Brazil. However, accumulation of inbreeding and widespread close relatedness is already evident in these breeding populations. These findings support our initial hypotheses: significant close relatedness was observed between the breeding populations managed by the two companies, strongly suggesting that they have been derived from the same germplasm accessions. This conclusion was expected from our analysis of germplasm exports records from Australia. Most of the seedlots from natural stands exported to Brazil are very likely from the Kedumba Valley subpopulation. Moreover, lower genetic diversity was observed in Brazilian E. benthamii compared to the diversity reported for native Australian populations.

The introduction history, genetic diversity, and structure of E. benthamii breeding populations in Brazil suggests thus, that breeders should focus on improving the overall performance of these population while maintaining or increasing genetic diversity and carefully managing the populations to avoid mating among near relatives as a preferred approach. One of the best management interventions would evidently be to import new, unrelated selections from the wild. This might prove difficult due to current restrictions for germplasm collection in wild stands, and even if accomplished it would ideally require a molecular marker survey to assess whether the new introductions would in fact contribute “new blood” to the current populations. A more viable and efficient alternative to both breeding programs would involve implementing breeding strategies and tactics that take full advantage of the DNA marker data reported in this study to organize and recombine the diversity available within and between the two breeding programs. Establishing a recurrent selection program with a base population composed of individuals from subpopulations spanning both companies holds significance. This strategy should facilitate targeted genetic enhancement and fostering synergies between distinct genetic pools, thereby potentially yielding superior outcomes in the short run and maintain adequate diversity levels for long term genetic gains.

Finally, hybrid breeding with compatible Eucalyptus species of subgenus Symphyomyrtus is also a valuable alternative to exploit the E. benthamii superior cold tolerance. However, there are challenges associated with the production and deployment of E. benthamii hybrids in Brazil. A major problem is that virtually no other species that is adapted to subtropical growing conditions has the cold tolerance of E. benthamii. Interspecific hybrids are therefore not likely to be as cold tolerant as the pure material, a critical trait in the E. benthamii deployment niche. While technical challenges associated with producing E. benthamii hybrids with cold tolerant species such as E. dunnii and E. dorrigoensis might be overcome, none of these species produces rooted cuttings easily, making deployment of hybrids involving them likely to be more difficult.