Introduction

Genome-wide association studies (GWAS) are used to identify significant associations among quantitative traits and genetic loci in plant and animal genomes (Bush and Moore 2012). GWAS have been used extensively to understand the genetic complexity of economically important traits in tree species (Korte and Farlow 2013). Within the Eucalyptus genus, the species Eucalyptus grandis stands out because of its fast growth, high adaptability, and superior wood quality (Malan 1993). It is the most commonly planted hardwood tree globally, with a diverse range of applications in cellulose, paper, timber, and charcoal production (Malan and Gerischer, 1987; Grattapaglia 2008; Carocha et al. 2015).

Cellulose in particular is a key wood product that meets a wide variety of primary human needs, such as paper (Hollertz et al. 2017; Jin et al. 2021), pharmaceuticals (Beyger and Nairn 1986; Giri et al. 2020), biofuels (Carere et al. 2008; Rubin 2008; Carroll and Somerville 2009), and food (Lavanya et al. 2011; Shi et al. 2014). Therefore, tree breeding strategies should select genotypes based not only on growth characteristics but also on wood quality traits (Byram et al. 2005; Grattapaglia and Kirst, 2008; Apiolaza et al. 2013). To improve the quality of wood production, several studies have emphasized the importance of finding the genetic basis of wood quality traits such as lignin (Li et al. 2008; Hisano et al. 2009; Mizrachi et al. 2017), syringyl/guaiacyl ratio (Stackpole et al. 2011; Denis and Bouvet, 2013), wood density (Osorio et al. 2001; Stackpole et al. 2010), total extractives (Gallo et al. 2018; Makouanzi et al. 2018), and cellulose yield (Schimleck et al. 2004; Kien et al. 2009). In this context, the application of genomics to forest improvement (Grattapaglia et al. 2009) and the development of new GWAS strategies are essential for identifying genomic regions that are significantly associated with the phenotypes of interest (Hirschhorn and Daly 2005). Several studies have identified loci related to the expression of growth and wood quality traits in Eucalyptus (Thavamanikumar et al. 2014; Lamara et al. 2016; Resende et al. 2017; Müller et al. 2017, 2019). Generally, growth traits tend to be more strongly correlated with one another and show moderate levels of heritability, while wood quality traits are less correlated but commonly present higher levels of heritability (Mphahlele et al. 2020). For Eucalyptus, Kainer et al. (2019) examined the genetic effects on oil yield, while Resende et al. (2017) conducted regional heritability mapping for growth and wood quality traits to identify quantitative trait loci (QTLs). Nevertheless, few studies have sought to understand the genetic effect of pleiotropic loci in Eucalyptus by comparing single-trait and multi-trait GWAS (Tan and Ingvarsson 2018; Rambolarimanana et al. 2018).

Pleiotropic effects occur when genetic loci have an influence on more than one trait (Solovieff et al. 2013). The application of pleiotropy in breeding means a movement away from selecting for one trait at the genetic level to selecting for multiple traits at a phenotypic level (Paaby and Rockman 2013). Although single-trait GWAS has identified the polygenic inheritance effect of markers, several efforts have been made to understand the pleiotropism between quantitative traits (Liu and Yan 2019), such as multiple trait selection assisted by genetic markers. Among single-trait GWAS algorithms, the fixed and random model circulating probability unification procedure (FarmCPU) performs a multi-locus linear mixed model (MLMM) to effectively control for spurious associations (Liu et al. 2016). On the other hand, multi-trait mixed models (MTMMs) were developed by Korte et al. (2012) to perform multi-trait GWAS and examine the common genetic effects that act in pleiotropy on two correlated phenotypic traits.

The MTMM algorithm performs three different analyses, categorized as full, common, and interaction. While the full model considers both common and interaction effects, the common and interaction models separate these effects individually. Thus, the common model performs a statistical analysis that demonstrates the coincident effects on two traits. Meanwhile, the interaction model identifies interacting genetic effects that act in the opposite direction between two traits (Korte et al. 2012). In the presence of pleiotropy, the power of the multi-trait GWAS is superior to single-trait GWAS because of the additional accuracy obtained when data for two traits are considered together (Korte et al. 2012; Korte and Farlow 2013; Oladzad et al. 2019).

The present study used GWAS to assess the genetic architecture of growth and wood quality traits in an open-pollinated E. grandis seed orchard. The specific objectives were to (1) apply and compare single- and multi-trait GWAS models for identifying significant SNP markers related to growth and wood quality traits, (2) identify QTLs significantly associated with the expression of phenotypic traits, and (3) understand the pleiotropic effects and the genetic architecture of important traits.

Material and methods

Plant material and phenotypes

The study population was an open-pollinated seed orchard of E. grandis located in the municipality of São Miguel Arcanjo, São Paulo, Brazil (− 23.890188, − 47.937138). The population was established in September 2012 by the Suzano company’s breeding team. The experiment consisted of a randomized complete block design, with four blocks, each containing 25 families (treatments) and one clonal control test (commercial clone), with four plots of 20 individuals each (five plants per plot). The spacing between plants was 3 m × 2 m, resulting in a planted area of 1.344 ha with 2240 trees. The open-pollinated seeds used to establish the experiment were collected from seven different locations across Brazil (Rio Claro — São Paulo (SP); Teixeira de Freitas — Bahia; Biritiba Mirim — SP; Salto — SP; Sarapui — SP; Mogi Guaçu — SP; and São Simão — SP) and one from Zimbabwe, Africa. The 25 families are originally from Coff’s Harbour (New South Wales, NSW) and Atherton (Queensland, QLD), Australia.

For the analysis, we considered the genomic and phenotypic information from 1772 individuals. The control genotype was an E. grandis commercial clone used by the SUZANO company. The phenotypic information was subdivided into growth traits (GWTs) and wood quality traits (WQTs). Growth traits were measured at two different ages (3 and 6 years after planting) and were classified as height (HEI3 and HEI6) in meters and diameter at breast height (DBH3 and DBH6) in centimeters. The DBH (DBH3/DBH6) and height (HEI3/HEI6) were used to estimate tree volume at 3 and 6 years of age (VOL3 and VOL6, respectively) in cubic meters according to the formula described by Schumacher and Hall (1933):

$$\mathrm{VOL}= {\mathrm{DBH}}^{2}\times \frac{\pi }{40000}\times \mathrm{HEI}\times f$$
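
As a minimal sketch of the volume formula above, the Python function below evaluates VOL for a single tree. The text does not define \(f\); it is assumed here to be a stem form factor, and the example value of 0.5, like the example tree itself, is hypothetical and not taken from the study.

```python
import math


def stem_volume_m3(dbh_cm: float, height_m: float, form_factor: float) -> float:
    """VOL = DBH^2 * (pi / 40000) * HEI * f, with DBH in cm and HEI in m.

    Dividing by 40000 converts DBH^2 (cm^2) into a basal area in m^2.
    """
    if dbh_cm < 0 or height_m < 0 or form_factor <= 0:
        raise ValueError("DBH and height must be non-negative and f positive")
    return dbh_cm ** 2 * (math.pi / 40000.0) * height_m * form_factor


# Hypothetical tree: 20 cm DBH, 25 m tall, assumed form factor 0.5.
example_volume = stem_volume_m3(20.0, 25.0, 0.5)
print(f"{example_volume:.4f} m^3")
```

The same call applies unchanged to the 3-year and 6-year measurements (DBH3/HEI3 and DBH6/HEI6).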

Furthermore, we analyzed six wood quality traits related to cellulose production. To do so, an increment borer was used at breast height to collect wood cores of 12 mm at 6.5 years after planting. Then, wood material was sent to the laboratory for processing to obtain spectral information using near-infrared spectroscopy (NIRS).

Sawdust samples from 69 genotypes were used to build the calibration curve. The wood material was retained in a mesh sieve and placed in circular cells. The NIR reflectance spectra were obtained using scans of wavelength ranges. The calibration curve was based on samples from five different species (Eucalyptus grandis, Eucalyptus urophylla, Eucalyptus brassiana, Eucalyptus tereticornis, and Eucalyptus pellita) collected in three different regions of Brazil (Maranhão, São Paulo, and Bahia) at 6 years after planting. An internal company (SUZANO S. A.) calibration model was developed using the Bruker FT-NIR spectrophotometer MPA II. The resulting calibration database of NIR wood spectra was obtained following methods outlined in the reference literature. Constituent values predicted from the existing calibration curves were used to estimate the following wood quality traits: pure cellulose yield (PCY) in percentage, basic wood density (WBD) in mass per cubic meter, syringyl/guaiacyl ratio (SGR), soluble lignin (SOL) in percentage, total solid content (TSC), and total extractives (TEX) in percentage.

Phenotypic data analysis

Each of the 1772 samples was evaluated using the Bonferroni outlier test to find the mean-shift outlier with studentized residuals in linear mixed models. Thus, outliers were removed by deleting observations based on standard deviation with the car package in the R software environment (Fox et al. 2012). Then, the normal distribution of phenotypic data was verified using the Shapiro–Wilk test, and data normalization was performed using the bestNormalize package in R (Peterson 2021). Finally, with the normalized dataset, the best linear unbiased predictions (BLUPs) (Rodriguez et al. 2020) were estimated for each trait with the breedR package in R (Muñoz and Sanchez, 2015) using the following mixed model:

$${Y}_{ijk} = \mu + X{b}_{j}+Z{t}_{i}+ Z{p}_{k}+ {\varepsilon }_{ijk}$$

where \(\mu\) is the overall mean; \({b}_{j}\) is the fixed effect of the \({j}^{th}\) block; \({t}_{i}\) is the fixed effect of the \({i}^{th}\) family (progeny); \({p}_{k}\) is the random effect of the \({k}^{th}\) plot with p ~ N(0, \({\sigma }_{p}^{2}\)); and \({\varepsilon }_{ijk}\) is the residual error that represents the nongenetic effects. The matrices \(X\) and \(Z\) are the incidence matrices for the fixed and random effects, respectively. Deregressed best linear unbiased predictions (dBLUPs) were then estimated to avoid the shrinkage inherent to BLUPs (Henderson 1975) according to the formula \(\frac{\widehat{g}}{{r}^{2}}\) (Garrick et al. 2009), where \(\widehat{g}\) is the genomic BLUP, and \({r}^{2}\) is the reliability, estimated as \(1-(\mathrm{PEV}/{\sigma }_{g}^{2})\), where \(\mathrm{PEV}\) is the prediction error variance, and \({\sigma }_{g}^{2}\) is the genotypic variance. Pearson’s genetic correlation tests were then performed using the BLUPs to verify the correlation between the 12 growth and wood quality traits. Correlation distributions were plotted using the ggcorrplot package in R (Kassambara 2019). The p-values used to flag significant correlations were passed to the plot through its “p.mat” argument.
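
The deregression step can be illustrated with a short Python sketch that applies \(\widehat{g}/{r}^{2}\) with \({r}^{2}=1-(\mathrm{PEV}/{\sigma }_{g}^{2})\) to a single prediction. The function names and the input values are hypothetical; in practice the BLUP, the PEV, and the variance component would come from the fitted mixed model.

```python
def reliability(pev: float, sigma2_g: float) -> float:
    """r^2 = 1 - PEV / sigma_g^2."""
    if sigma2_g <= 0:
        raise ValueError("genotypic variance must be positive")
    return 1.0 - pev / sigma2_g


def deregress(blup: float, pev: float, sigma2_g: float) -> float:
    """dBLUP = g_hat / r^2; undefined when the reliability is not positive."""
    r2 = reliability(pev, sigma2_g)
    if r2 <= 0:
        raise ValueError("reliability must be positive to deregress")
    return blup / r2


# Hypothetical inputs: BLUP = 0.8, PEV = 0.2, genotypic variance = 1.0.
example_r2 = reliability(0.2, 1.0)
example_dblup = deregress(0.8, 0.2, 1.0)
print(example_r2, example_dblup)
```

Because the reliability is at most 1, the deregressed value is never smaller in magnitude than the BLUP, which is how the shrinkage is undone.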

DNA extraction and quality control

Cambium tissue was collected individually from 1772 trees and processed using the CTAB Lysis Buffer. DNA was extracted using the CTAB method (Doyle and Doyle 1987). DNA integrity was confirmed by 1% agarose gel electrophoresis, and DNA was quantified with the Nanodrop spectrophotometer (Thermo Fisher, Waltham, MA, USA). Genotyping was performed using the EUChip60K high-density Illumina Infinium SNP chip for Eucalyptus species (Silva‐Junior et al. 2015). For duplicate SNPs in the raw dataset, the copy with the lowest call rate was eliminated. Quality control was conducted using the R package snpReady (Granato et al. 2018). Markers were removed if they were monomorphic or had a call rate lower than 95%. Markers with a minor allele frequency (MAF) lower than or equal to 0.05 were also excluded. The genotypes were coded as “0” and “2” for homozygotes and “1” for heterozygotes. The remaining missing genotypes were imputed using the R package snpReady based on Wright’s equilibrium, in which the probability of occurrence of each genotype is derived from the combination of allele frequency and observed heterozygosity of the marker (Granato et al. 2018). Later, the filtered markers were submitted to linkage disequilibrium (LD) pruning, removing markers with a pairwise \({r}^{2}\) higher than 0.99. This step was performed using the SNPRelate package in R (Zheng et al. 2012). After quality control, high-quality SNPs were selected for association mapping.
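
The marker filters described above can be sketched in plain Python as follows. This is not the snpReady implementation; it only mirrors the stated rules (drop monomorphic markers, markers with a call rate below 95%, and markers with MAF lower than or equal to 0.05) on a toy dataset whose marker names and genotypes are hypothetical.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0 or 2 for homozygotes, 1 for heterozygotes, None if missing


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF computed from the non-missing 0/1/2 dosages."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def passes_qc(calls: List[Genotype], min_call_rate: float = 0.95,
              maf_cutoff: float = 0.05) -> bool:
    """Keep a marker only if call rate >= 95% and MAF > 0.05.

    Monomorphic markers have MAF = 0 and are therefore dropped as well.
    """
    return (call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) > maf_cutoff)


# Hypothetical toy data: 20 individuals per marker.
markers: Dict[str, List[Genotype]] = {
    "marker_a": [0] * 10 + [1] * 6 + [2] * 4,   # polymorphic, fully called
    "marker_b": [0] * 20,                       # monomorphic
    "marker_c": [0, 1, 2, None, None] * 4,      # call rate 0.6
}
kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)
```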

SNP repositioning

We repositioned the markers using the Illumina SNP probe sequences. The probe sequences were aligned to the second version of the Eucalyptus grandis reference genome (v2.0) (https://data.jgi.doe.gov/refine-download/phytozome?genome_id=297) with the Bowtie 2 aligner (Langmead and Salzberg 2012) using sensitive global alignment settings. The SNP positions from version 2.0 were used in the GWAS analysis. We removed all scaffolds from the Brasuz v2.0 that were not in the linkage groups (chromosomes 1 to 11). The success of the repositioning was assessed using a comparison map and dot plots of the marker positions in the two reference genomes (v1.0 and v2.0), plotted using the R packages RIdeogram (Hao et al. 2020) and ggplot2 (Wickham 2011), respectively.

Genetic parameters and population structure

The effective population size (\({N}_{e}\)) was estimated using the molecular linkage disequilibrium method (Waples and Do 2008) as implemented in NeEstimator V2.1 (Do et al. 2014). Population genetic parameters were estimated using the popgen function in the R package snpReady (Granato et al. 2018) and included Nei’s genetic diversity, \(\mathrm{GD}=1- {p}_{j}^{2}- {q}_{j}^{2}\); polymorphic information content, \(\mathrm{PIC}=1-\left({p}_{j}^{2}+{q}_{j}^{2}\right)-2{p}_{j}^{2}{q}_{j}^{2}\); and minor allele frequency, \(\mathrm{MAF}=\mathrm{min}({p}_{j},{q}_{j})\), where \({p}_{j}\) and \({q}_{j}\) are the frequencies of the two alleles at locus \(j\). The observed heterozygosity (\({H}_{o}\)) was obtained with the formula \({H}_{o}= n{H}_{j}/N\), where \(n{H}_{j}\) is the number of heterozygous individuals at locus \(j\), and \(N\) is the number of individuals. For each trait, we estimated the narrow-sense (\({h}_{a}^{2}= {\sigma }_{a}^{2}/ ({\sigma }_{a}^{2}+{\sigma }_{e}^{2})\)) and broad-sense (\({h}_{g}^{2}= ({\sigma }_{a}^{2}+{\sigma }_{d}^{2})/({\sigma }_{a}^{2}+{\sigma }_{d}^{2}+{\sigma }_{e}^{2})\)) genomic heritability, where \({\sigma }_{a}^{2}\) represents the additive variance, \({\sigma }_{e}^{2}\) is the residual variance, and \({\sigma }_{d}^{2}\) is the dominance variance. The narrow- and broad-sense heritabilities were estimated using the ASReml R package (Gilmour et al. 2017). Then, the degree of differentiation between the two origin populations (\({F}_{ST}\)) was estimated using the formula \({F}_{ST}=1-{H}_{S}/{H}_{T}\), where \({H}_{S}\) is the average expected heterozygosity for each population (two different origins), and \({H}_{T}\) is the expected heterozygosity in the total population.
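
The per-locus diversity formulas and the two heritability ratios above are simple enough to check by hand. The Python sketch below evaluates them for a biallelic locus with allele frequency \(p\) (so that \(q=1-p\)); the variance components in the heritability example are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from typing import Tuple


def locus_diversity(p: float) -> Tuple[float, float, float]:
    """Return (GD, PIC, MAF) for a biallelic locus with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    gd = 1.0 - p ** 2 - q ** 2
    pic = 1.0 - (p ** 2 + q ** 2) - 2.0 * p ** 2 * q ** 2
    maf = min(p, q)
    return gd, pic, maf


def narrow_sense_h2(var_a: float, var_e: float) -> float:
    """h2_a = var_a / (var_a + var_e), as written in the text."""
    return var_a / (var_a + var_e)


def broad_sense_h2(var_a: float, var_d: float, var_e: float) -> float:
    """h2_g = (var_a + var_d) / (var_a + var_d + var_e)."""
    return (var_a + var_d) / (var_a + var_d + var_e)


# Upper bounds of the per-locus statistics are reached at p = 0.5.
gd_max, pic_max, maf_max = locus_diversity(0.5)
print(gd_max, pic_max, maf_max)

# Hypothetical variance components: additive 2, dominance 1, residual 2.
print(narrow_sense_h2(2.0, 2.0), broad_sense_h2(2.0, 1.0, 2.0))
```

At \(p=0.5\) the functions return GD = 0.5, PIC = 0.375, and MAF = 0.5, which are the theoretical maxima of these statistics for a biallelic marker.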

The population structure was first analyzed by a principal component analysis (PCA) using genotypic data, where the first two principal components (PC1 and PC2) were used to determine the extent of population structure. The two different origins were represented by different colors. We subsequently used the ADMIXTURE software to identify different genetic clusters with a fixed number of populations (K) ranging from 1 to 40. Genetic correlation between phenotypes was estimated using the BreedR package in R (Munoz and Rodriguez 2014). Correlation was estimated in pairs considering the same model used to estimate BLUPs (see Phenotypic data analysis). The genomic kinship matrix (\({G}_{a}\)) was obtained using the snpReady package in R (Granato et al. 2018), following VanRaden (2008), with the following equation:

$${G}_{a}=\frac{{Z}_{A}{Z}_{A}^{T}}{2 {\sum }_{i=1}^{m}{p}_{i}\left(1-{p}_{i}\right)}$$

where \({Z}_{A}\) is an n × m matrix of marker incidence (n is the number of genotypes, and m is the number of markers) coded as 0 for homozygote \({A}_{1}{A}_{1}\), 1 for heterozygote \({A}_{1}{A}_{2}\), and 2 for homozygote \({A}_{2}{A}_{2}\); and \({p}_{i}\) is the frequency of an allele at locus \(i\). To compare genomic and pedigree information, we estimated the pedigree relationship matrix (A) using the R package pedigreemm (Bates and Vazquez, 2013).
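
A small Python sketch of the genomic relationship matrix is given below. It is not the snpReady code. One assumption is made explicit: following the first method of VanRaden (2008), the 0/1/2 dosages are centered by twice the allele frequency before the cross-product is taken, a step the description above does not spell out. The three-individual genotype matrix is hypothetical.

```python
from typing import List


def vanraden_g(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix from an n x m matrix of 0/1/2 dosages.

    Assumption: dosages are centered as Z = M - 2p (VanRaden 2008, method 1),
    then G = Z Z^T / (2 * sum_i p_i * (1 - p_i)).
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    centered = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / denom
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical data: 3 individuals genotyped at 2 markers.
toy_genotypes = [[0, 2], [1, 1], [2, 0]]
toy_g = vanraden_g(toy_genotypes)
for row in toy_g:
    print(row)
```

In this toy case both allele frequencies are 0.5, so the two opposite homozygotes receive a relationship of −2 with each other and 2 with themselves, while the double heterozygote sits at 0.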

LD decay

Genome-wide pairwise linkage disequilibrium (LD) was estimated for each chromosome using the function LD.decay from the sommer package v 2.9 in R v 4.0.2 (Covarrubias-Pazaran 2016). LD was estimated by the squared allele frequency correlation \({r}^{2}\) between marker pairs, and the decay was plotted considering the first distance classes based on the marker matrix and a map with distances between SNPs on a loess curve. To investigate the average LD decay in the whole genome and within chromosomes, significant intra-chromosomal \({r}^{2}\) values were plotted against the genetic distance between markers using the ggplot2 package in R (Wickham 2011).
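
For readers who want to see what a pairwise \({r}^{2}\) value looks like numerically, the Python sketch below computes the squared Pearson correlation between the 0/1/2 dosages of two markers. This is a common approximation for unphased genotypes and is an assumption here; the text does not state how LD.decay computes the statistic internally, and the genotype vectors are hypothetical.

```python
from typing import Sequence


def ld_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two vectors of genotype dosages."""
    if len(x) != len(y) or not x:
        raise ValueError("marker vectors must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return sxy ** 2 / (sxx * syy)


# Hypothetical dosages for six individuals.
marker_1 = [0, 1, 2, 1, 0, 2]
marker_2 = [0, 1, 2, 1, 2, 0]   # uncorrelated with marker_1 in this toy set
print(ld_r2(marker_1, marker_1))  # identical markers
print(ld_r2(marker_1, marker_2))
```

An LD decay curve is then obtained by plotting such values against the physical distance between the two markers of each pair.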

Genome-wide association study

We performed single-trait GWAS using the fixed and random model circulating probability unification (FarmCPU) (Liu et al. 2016) and multi-trait GWAS using the multi-trait mixed model (MTMM) (Korte et al. 2012) to identify genetic factors associated with the expression of phenotypic traits. The corrected phenotypic data (BLUP) and the genotypic information were used for single- and multi-trait GWAS. The single-trait association was performed using the genome association and prediction integrated tool (GAPIT) (Lipka et al. 2012; Tang et al. 2016). The population structure based on PCA matrix (Q) and kinship (K) were automatically generated (VanRaden 2008; Lipka et al. 2012) using genotypic data and the default GAPIT parameters. Using the GWAS results, we estimated the phenotypic variance explained by a significant marker (\(\mathrm{PVE}\)), described as follows:

$$\mathrm{PVE}= 2\times {\beta }^{2}\times \mathrm{MAF}\times \left(1-\mathrm{MAF}\right)$$

where \(\beta\) is the effect of allele substitution, and \(\mathrm{MAF}\) is the minor allele frequency of the marker. The pleiotropic effect among phenotypic traits, defined as a SNP marker having an effect on two or more traits, was estimated using the multi-trait mixed model (MTMM) (Korte et al. 2012). We performed multi-trait GWAS in pairs for the significantly associated growth and wood quality phenotypic traits. The R scripts provided by Korte et al. (2012) partition the marker effects into three different analysis models: interaction, common, and full. Thus, considering two traits using a single marker model, the MTMM model can be written as (Korte et al. 2012):

$$y= \left[\begin{array}{c}{y}_{1}\\ {y}_{2}\end{array}\right]= {s}_{1}{\mu }_{1}+{s}_{2}{\mu }_{2}+x\beta + \left(x\times {s}_{1}\right)\alpha + \nu$$

where \({y}_{1}\) and \({y}_{2}\) are the phenotypic values of the two traits. The \(y\) value is estimated as \(X\beta +{u}_{G}+ {u}_{G\times E}+e\), where \({u}_{G}\) and \({u}_{G\times E}\) are the genotype and genotype-by-environment interaction values; \({s}_{1}\) and \({s}_{2}\) are vectors with 1 for all values belonging to the trait in question and 0 otherwise; \({\mu }_{1}\) and \({\mu }_{2}\) are the trait means; \(x\) is the marker; \(\beta\) is the marker effect common to both traits; \(\alpha\) is the trait-specific (interaction) marker effect; and \(\nu\) is the error term. The common and interaction models identify markers that act in the same direction or differentially for two traits, respectively. On the other hand, the full model identifies SNPs with either an interaction or common effect. The significance threshold used for the p-values estimated by single- and multi-trait GWAS was calculated using the Bonferroni method (α = 0.05). The p-values (−log10 P) for each evaluated SNP and model were used to generate Manhattan and quantile–quantile (QQ) plots using the R package CMplot (Yin, 2018).
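
Two of the quantities used in this section, the PVE of a marker and a Bonferroni-type significance cutoff, can be sketched in a few lines of Python. The effect size, allele frequency, and number of tests below are hypothetical. The number of tests is deliberately left as an input because the text does not state which count entered the correction. Note also that the PVE expression reads as a proportion of phenotypic variance only when the phenotype has unit variance.

```python
import math


def pve(beta: float, maf: float) -> float:
    """PVE = 2 * beta^2 * MAF * (1 - MAF)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    return 2.0 * beta ** 2 * maf * (1.0 - maf)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_tests


def neg_log10(p_value: float) -> float:
    """Scale used on the y-axis of Manhattan and QQ plots."""
    return -math.log10(p_value)


# Hypothetical marker: allele substitution effect 0.5, MAF 0.25.
example_pve = pve(0.5, 0.25)

# Hypothetical correction over 1000 tests at alpha = 0.05.
example_cutoff = bonferroni_threshold(0.05, 1000)
print(example_pve, example_cutoff, neg_log10(example_cutoff))
```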

Gene ontology

The significant SNPs for growth and wood quality traits were used to conduct a gene ontology analysis according to the physical distance within the GWAS peak regions. Since there were no strong LD blocks along the genome, which is probably related to LD pruning, the upstream and downstream search window for candidate loci was defined by the distance to the two nearest markers flanking each significant SNP. The gene annotation and predicted functional effect of each gene were obtained from the E. grandis v2.0 annotation in Phytozome v11.0 (Egrandis_297_v2.0.gene.gff3.gz). Venn diagrams were drawn using jvenn (Bardou et al. 2014).

Results

Phenotypic data

The number of outliers removed varied among the 12 phenotypic traits (DBH3: 63; HEI3: 111; VOL3: 2; DBH6: 0; HEI6: 6; VOL6: 1; PCY: 22; WBD: 21; SGR: 111; TSC: 23; SOL: 78; and TEX: 83). The genetic correlation among phenotypic traits ranged from −0.96 (PCY/TSC) to 1 (DBH6/VOL6) (Fig. 1a and b). Among wood quality traits, the highest correlation was found between TEX and SOL (0.62). The PCA biplot represents the first two components for the full set of 12 traits (six growth and six wood quality). The first two axes account for 50.7% and 17.9% of the variation in the phenotypic data, respectively (Fig. 1c).

Fig. 1 a Genotypic correlation and b distribution of phenotypic traits for growth and wood quality categories across the 1772 Eucalyptus grandis genotypes; c principal component analysis for wood quality and growth traits. DBH3 diameter at breast height at 3 years; DBH6 DBH at 6 years; VOL3 volume at 3 years; VOL6 volume at 6 years; HEI3 height at 3 years; HEI6 height at 6 years; PCY pure cellulose yield; WBD basic wood density; SGR syringyl/guaiacyl ratio; SOL soluble lignin; TSC total solid content; and TEX total extractives

Population structure and genetic diversity parameters

The PCA using genotypic data revealed that the first component was mainly responsible for the genetic variation (55%) (Fig. S1). Although there was a slight grouping of genotypes according to their origin by PCA, the ADMIXTURE analysis showed an absence of population genetic structure (Fig. S2). Accordingly, the genetic differentiation (FST) between individuals from the two different origins was 0.036, indicating limited genetic divergence between them. A similar pattern was found for the kinship matrix (VanRaden 2008), where different subpopulations were identified but with no evidence of a strong population structure (Fig. 2a). We also noted that the genomic relationship matrix increased the prediction accuracy compared with the pedigree information (Fig. 2b), distinguishing more and less related genotypes. Although the genotypes evaluated are originally from two native populations, the seeds used to establish the breeding population came from open-pollinated trials installed in eight different locations. Thus, we believe that crosses among genotypes from different origins may have generated stratification in the population.

Fig. 2 Kinship heatmaps for a the marker relationship matrix estimated using the 21,254 SNPs based on the VanRaden method and b the pedigree relationship matrix for the Eucalyptus grandis breeding population of 1772 individuals

In general, genetic diversity parameters showed moderate values. Nei’s genetic diversity of the whole population ranged from 0.07 to 0.50, with an average of 0.35. The marker polymorphic information content (PIC) ranged from 0.07 to 0.38, with an average of 0.28. The minor allele frequency (MAF) showed a mean value of 0.26, ranging from 0.06 to 0.50. The observed heterozygosity (\({H}_{o}\)) had an average of 0.40, ranging from 0.24 to 0.47. Similarly, the inbreeding coefficient ranged from 0.04 to 0.50, with a mean value of 0.26. We found an effective population size (\({N}_{e}\)) of 31.5 considering linkage disequilibrium between markers (\({LDN}_{e}\)).

SNP repositioning and quality control

In general, several SNPs changed their original relative position between the first (Myburg et al. 2014) and second (Bartholomé et al. 2015) versions of the E. grandis reference genome, and some even changed chromosomes. However, the genome-scale SNP collinearity (Fig. 3) between the two versions showed that most SNPs maintained similar positions. We did note a high collinearity pattern and more reliable linkage maps with version 2.0 (Bartholomé et al. 2015). Thus, we chose SNP positions estimated using the second version to perform GWAS analysis and identify QTLs and candidate loci related to trait expression.

Fig. 3 a SNP density plot across each chromosome representing the number of SNPs after quality control within a 1 Mb window size; b pairwise LD decay across the 11 chromosomes of the 1772 individuals genotyped using the EUChip60K. Different colors represent different SNP densities, and “Chr” represents the E. grandis chromosomes

For quality control, from the initial total of 64,639 markers, 3425 duplicate SNPs were removed considering the call rate, leaving 61,214 markers. After SNP repositioning, 1946 markers were removed as they were located in small scaffolds. A total of 28,957 markers were removed due to MAF (0.05), and 8229 markers were removed due to the call rate (0.9), leaving 22,082 markers. Furthermore, the 1.08% of missing data points were imputed. Finally, after LD pruning, 828 SNPs with high linkage disequilibrium were removed, leaving a final total of 21,254 markers for the analysis.
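
The marker counts reported above can be retraced with a few lines of Python. The sketch simply subtracts the removals in the order in which they are described; all numbers are taken from the paragraph, and only the variable names are introduced here.

```python
# Removals in the order described in the text: (filter label, markers removed).
removal_steps = [
    ("duplicate SNPs", 3425),
    ("markers on small scaffolds", 1946),
    ("MAF filter", 28957),
    ("call-rate filter", 8229),
    ("LD pruning", 828),
]

remaining = 64639  # initial number of markers
trail = []
for label, removed in removal_steps:
    remaining -= removed
    trail.append((label, remaining))

for label, count in trail:
    print(f"after {label}: {count} markers")
```

The running totals reproduce the three intermediate and final counts stated in the text (61,214 after removing duplicates, 22,082 after the call-rate filter, and 21,254 after LD pruning).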

The informative SNPs selected were uniformly distributed across the 11 chromosomes of the E. grandis genome. Figure 3a shows the occurrence of SNPs along the E. grandis chromosomes, where the number of SNPs is summed within adjacent 1 Mb windows. LD showed a quick and similar decay pattern across the 11 E. grandis chromosomes (Fig. 3b). At the ad hoc threshold of \({r}^{2}=0.10\), the average LD extent across chromosomes ranged from 150 to 200 kb (Fig. S4).

Genome-wide association studies

Broad- and narrow-sense heritability and single-trait genome-wide association study

For growth traits, we found moderate values of narrow-sense heritability, ranging from 0.4299 (HEI3) to 0.5816 (DBH6) (Table 1). Three wood quality traits (SGR, SOL, and TEX) presented relatively low narrow-sense heritability (0.1599, 0.1845, and 0.1515, respectively). On the other hand, pure cellulose yield (PCY) presented the highest broad-sense heritability (0.7107) among all growth and wood quality traits.

Table 1 Significant associations for growth and wood quality traits using the single-trait model (FarmCPU) for a Eucalyptus grandis breeding population. Traits are divided into growth (GWT) and wood quality (WQT). The number of SNPs, MAF, PVE, and number of loci are related to the significant number of associations found by the FarmCPU model

The FarmCPU model successfully performed single-trait GWAS, identifying significant marker associations for growth and wood quality traits in E. grandis. After Bonferroni correction, a total of 81 SNPs with a significant association were identified for six growth traits (43 SNPs) and five wood quality traits (38 SNPs). Only the wood quality trait total extractives (TEX) showed no significant associations with markers (Table 1). The number of significant markers associated with phenotypic traits ranged from 2 (DBH6) to 14 (WBD) (Fig. 4a and b, respectively).

Fig. 4 Manhattan and QQ-plots of GWAS for growth traits (a, c) and wood quality traits (b, d), respectively, using the FarmCPU model for a Eucalyptus grandis breeding population with 21,254 markers. Different colors represent different tested traits. Dashed line indicates the Bonferroni threshold (α = 0.05)

The average minor allele frequency (MAF) for the significant markers ranged from 0.0946 (DBH6) to 0.3164 (TSC). For all significant SNPs, the total phenotypic variance explained (PVE) was low, ranging from 0.0529 (SOL) to 0.2110 (PCY). Marker EuBR04s9558885 (PCY) showed the highest individual PVE (0.1014), suggesting a strong influence of this marker on phenotypic expression. Several markers were associated with multiple phenotypic traits (Fig. 4).

The number of annotated loci associated with the expression of phenotypic traits ranged from 0 (DBH6, for which the significant SNPs had no gene annotation) to 46 (WBD). We found QTLs significantly related to more than one trait for both categories (Table S1; Fig. S3). In general, functional gene annotation presented several categories and descriptions associated with tissue growth on cell walls, cellulose biosynthetic process, transporter activity, DNA, ion, and protein binding, oxidation–reduction process, and catalytic activity, among others. The function and description of all candidate loci for both growth and wood quality traits are shown in Table S1.

The pleiotropic effect among loci for growth traits was first seen for the SNP marker EuBR09s24960947, which presented the most significant association for traits DBH3 and HEI3, with p-values of \(9.34\times {10}^{-9}\) and \(1.91\times {10}^{-9}\), respectively. This marker tags seven different loci (Eucgr.I01459, Eucgr.I01460, Eucgr.I01461, Eucgr.I01462, Eucgr.I01463, Eucgr.I01464, and Eucgr.I01465). In general, the single-trait GWAS revealed 13 candidate loci significantly associated with DBH3. Similarly, most loci found for HEI3 also showed comparable genome locations and molecular functions. Considering the trait HEI3, we found no annotation for candidate loci located near three significant SNPs (EuBR07s925067, EuBR06s38139098, and EuBR08s70063929) (Table S1). On the other hand, for HEI6, we found 44 annotated loci related to trait expression with different descriptions and gene ontology terms.

The SNP EuBR11s17004419 (HEI6) showed four different flanking loci (Eucgr.K01383, Eucgr.K01384, Eucgr.K01385, and Eucgr.K01386). The marker EuBR06s23565060 was identified for volume at both ages (VOL3 and VOL6) and for DBH3 (p-values \(8.29\times {10}^{-8}\), \(1.01\times {10}^{-8}\), and \(8.44\times {10}^{-8}\), respectively). We found a similar pattern of significant SNPs associated with more than one phenotypic trait for wood quality in single-trait GWAS. For PCY and TSC, three different SNPs (EuBR07s252985, EuBR08s57640594, and EuBR10s1696823) were significantly associated with the expression of both traits. These two phenotypic traits (PCY and TSC) presented the highest negative correlation (−0.96), indicating that negative correlations can be effective in identifying pleiotropic loci.

Regarding pure cellulose yield, SNP marker EuBR04s9558885 had the highest significance (p-value \(1.06\times {10}^{-11}\)), presenting five different flanking loci related to trait expression (Eucgr.D00522, Eucgr.D00523, Eucgr.D00525, Eucgr.D00526, and Eucgr.D00527). Additionally, SNP EuBR10s1696823 was related to PCY, with the presence of gene Eucgr.J00155, a wound-induced protein. Similarly, for TSC, we found 12 annotated loci significantly associated with trait expression. The marker EuBR01s39512949 presented six different loci related to its expression (Eucgr.A02909, Eucgr.A02910, Eucgr.A02911, and Eucgr.A02912).

Multi-trait genome-wide association study

The multi-trait GWAS showed good performance for all significant combinations among traits (Fig. S3). We found significant marker-phenotype associations for growth and wood quality traits that were not identified with the single-trait GWAS. Considering the 33 phenotypic correlations among the 12 traits, 22 combinations showed significant associations (Table 1; Fig. S3). For some trait combinations, the multi-trait models (full, common, and/or interaction) did not perform the GWAS properly: the p-values appeared deflated, and the QQ-plots showed more noise (e.g., Fig. S4d). These models were also unable to find significant associations considering the Bonferroni correction, which suggests no influence of possible false positives on the results.

The combinations among growth traits in multi-trait GWAS resulted in the highest number of significant SNPs with pleiotropic effects (24). By contrast, the multi-trait GWAS among wood quality traits (6) and between the two categories (GWT and WQT) (10) yielded fewer significant markers. In total, the multi-trait GWAS revealed 40 SNPs influencing the expression of multiple phenotypic traits (Table S6). Not surprisingly, the multi-trait methodology showed greater power to identify associations, considering that most single-trait analyses using the MTMM methodology (Korte et al. 2012) could not identify associations due to the strict Bonferroni cutoff (α = 0.05; p-value = \(1.63\times {10}^{-5}\)).

Significant associations between the same trait in different years of data sampling (e.g., EuBR03s72654230 for DBH3 and DBH6; EuBR04s246324 for HEI3 and HEI6; EuBR02s2712998 for VOL3 and VOL6) indicate a strong pleiotropic effect on trait association. The SNP EuBR06s39120397 showed a highly significant p-value, and this marker was also significantly associated with HEI3, DBH6, and VOL6. This pattern may be related to the strong genetic correlation among these traits. Similarly, SNP EuBR01s28498846 was also found for the combination of traits TSC, DBH6, VOL6, and VOL3, indicating evidence of a pleiotropic effect of the marker on growth and wood quality traits.

We identified one SNP that was significantly associated with two trait combinations (HEI3 and DBH6; HEI3 and VOL6) (EuBR06s39120397; p-values 7.57 × 10−6 and 1.14 × 10−5, respectively), with nine candidate loci related to its expression (Eucgr.F02939 ~ Eucgr.F02947). One SNP was found to be significant by the full model between the combinations of traits DBH3 and DBH6 and DBH3 and VOL6 (EuBR03s72654230; p-values 1.59 × 10−5 and 1.02 × 10−5, respectively).

The SNP EuBR03s43394028 was significant for three combinations of traits (HEI6 and VOL6, VOL3 and HEI3, and DBH3 and HEI6) (p-values 9.33 × 10−7, 1.01 × 10−5, and 1.54 × 10−6, respectively). However, between traits HEI3 and HEI6, although SNP marker EuBR03s22449999 (p-value 8.41 × 10−6) was identified as significant by the common model, there were no annotations for candidate loci. On the other hand, SNP EuBR04s246324 was significant for the common and full models with a highly significant p-value (9.05 × 10−7). Considering that both traits HEI3 and HEI6 represent plant height, the multi-trait GWAS proved effective at detecting significant candidate loci even for the same trait measured at different developmental stages.

One SNP marker was detected as significant for traits DBH3 and VOL3 (EuBR05s62102817; p-value 2.19 × 10−6) (Table S1). Similarly, between traits VOL3 and DBH6, the marker EuBR07s16969079 showed a significant association (p-value 1.38 × 10−5). We found two significant markers for the first WQT combination, SOL and TEX (EuBR06s19529730 and EuBR06s52964694; p-values 2.66 × 10−7 and 5.06 × 10−6, respectively). The multi-trait GWAS combination between the traits SGR and TEX identified three significant markers (EuBR11s43922247, EuBR11s44284539, and EuBR03s16484895; p-values 7.26 × 10−6, 1.16 × 10−5, and 9.83 × 10−6, respectively) through the full and interaction models. The genomic region around marker EuBR11s44284539 contained two flanking candidate loci (Eucgr.K03516 and Eucgr.K03517).
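The p-values reported for the multi-trait associations can be checked against the Bonferroni cutoff of 1.63 × 10−5 given earlier. The sketch below does this for three of the reported markers. The number of tests behind the reported cutoff is not stated in this section, so the `bonferroni_cutoff` helper is illustrated with a hypothetical test count, and the last dictionary entry is a hypothetical non-significant marker added for contrast.

```python
import math


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Cutoff as reported in the text (alpha = 0.05).
REPORTED_CUTOFF = 1.63e-5

# p-values copied from the text, plus one hypothetical entry for contrast.
p_values = {
    "EuBR06s39120397 (HEI3 and DBH6)": 7.57e-6,
    "EuBR03s72654230 (DBH3 and DBH6)": 1.59e-5,
    "EuBR04s246324 (HEI3 and HEI6)": 9.05e-7,
    "hypothetical_marker": 3.0e-4,  # not from the study
}

significant = {snp: p for snp, p in p_values.items() if p < REPORTED_CUTOFF}
for snp, p in significant.items():
    print(f"{snp}: p = {p:.2e} passes the cutoff")

# Hypothetical test count, only to show how a cutoff is derived.
example_cutoff = bonferroni_cutoff(0.05, 1000)
print(f"cutoff for 1000 tests: {example_cutoff:.1e}")
```

The comparison is strict, so a marker such as EuBR03s72654230 (1.59 × 10−5) passes only narrowly, which illustrates how sensitive the list of significant SNPs is to the chosen cutoff.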

Discussion

Single- and multi-trait GWAS were effective in properly identifying QTLs as well as annotated loci related to phenotypic expression in the studied E. grandis breeding population. Additionally, the quality control process was able to remove uninformative markers, leaving a total of 21,254 highly informative markers that were used in the GWAS analysis. In general, most of the markers removed (28,957) during quality control were excluded because of a low minor allele frequency (< 5%), which is the frequency of the second most common allele in the population. In relation to the rearrangement of the E. grandis genome assembly, Bartholomé et al. (2015) identified 43 non-collinear and 13 non-syntenic regions. Thus, the departures from marker collinearity observed in the linear trend between the two versions of the genome may be related to modifications in the genome assembly.
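The minor allele frequency filter described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study. It assumes biallelic SNPs coded as the count of the alternate allele (0, 1, or 2) with missing calls stored as `None`, and the marker names and genotype vectors are hypothetical.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # 0, 1, or 2 alternate alleles; None = missing


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0  # no calls: treat the marker as uninformative
    alt_freq = sum(called) / (2 * len(called))  # each diploid carries 2 alleles
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(markers: Dict[str, Genotypes],
                  min_maf: float = 0.05) -> Dict[str, Genotypes]:
    """Keep only markers whose minor allele frequency is at least min_maf."""
    return {name: g for name, g in markers.items()
            if minor_allele_frequency(g) >= min_maf}


# Hypothetical toy data: one common and one rare variant.
toy_markers = {
    "snp_common": [0, 1, 2, 1, 0, 1, 2, 1, None, 1],  # alt freq 9/18 = 0.5
    "snp_rare": [0] * 19 + [1],                       # alt freq 1/40 = 0.025
}

kept = filter_by_maf(toy_markers)
print(sorted(kept))
```

With the 5% threshold, the rare variant is dropped because its minor allele appears in only 1 of 40 sampled chromosomes.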

Although there were some rearrangements during SNP repositioning, using the new SNP positions for v2.0 of the genome was effective in finding QTLs and annotated loci. To the best of our knowledge, this is the first GWAS developed using repositioned SNP probes that compares the positions of the two genome versions. Furthermore, considering that gene annotation is based on the second version of the Eucalyptus genome (Bartholomé et al. 2015), we believe that the possibility of errors was reduced.

Another important point to consider is related to population structure. Herein, we found no clear population structure among individuals, which may be related to the population’s breeding history. Although there are two origins, which would likely have produced population structure, the breeding population was established from eight different provenances, which might have promoted outcrossing between individuals from different origins. According to Hayes (2013), not considering population structure in GWAS can cause false-positive associations. Thus, both models (single- and multi-trait GWAS) accounted for population structure, and we believe that this effect did not have an impact on our results.

Several association mapping studies have been used to assess the complexity of the genetic architecture of growth (Freeman et al. 2013; Müller et al. 2017, 2019), wood quality traits (Cappa et al. 2013; Freeman et al. 2013; Resende et al. 2017; Dasgupta et al. 2021), and non-wood traits (Resende et al. 2017; Kainer et al. 2019; Mhoswa et al. 2020) of Eucalyptus. Using the second version of the Eucalyptus genome, it was possible to more accurately identify QTLs. Many studies have also developed single-trait GWAS for growth, wood quality, and disease resistance in Eucalyptus spp. (Resende et al. 2017; Kainer et al. 2019; Müller et al. 2019; Ballesta et al. 2020; Mhoswa et al. 2020; Valenzuela et al. 2021). However, few studies have evaluated multi-trait association models for growth and even fewer for wood quality in eucalypts (Rambolarimanana et al. 2018; Tan and Ingvarsson 2018). As expected, although several markers were found to be significant, the results from the single- and multi-trait GWAS indicate limited genetic variance, which can explain the relatively low number of associations. This pattern might be related to the polygenic nature of quantitative traits (Grattapaglia et al. 2018), indicating that there are many loci related to trait expression, as predicted by Fisher’s infinitesimal model (Fisher 1918).

Although the complexity of multiple loci influences the expression of quantitative traits, the number of significant SNPs identified herein, and consequently the number of QTLs for both single- and multi-trait GWAS, was similar to previous studies (Müller et al. 2019; Ballesta et al. 2020). Additionally, despite the reliable accuracy achieved by the single- and multi-trait GWAS, the phenotypic information used in the present study was obtained from a single environment, which may have limited the phenotypic precision for each individual. Thus, our study reinforces the importance of using multi-trait models combined with single-trait models for highly complex quantitative traits. According to Liu et al. (2016), the FarmCPU model offers the best trade-off between predictive power and false positives. On the other hand, by considering the correlation between two traits (multi-trait GWAS), the MTMM approach can identify pleiotropic effects more clearly than a single marginal trait analysis (Korte et al. 2012).

The implementation of GWAS using phenotypic information from different traits can lead to the discovery of effects stronger than those identified by single-trait analysis (Korte et al. 2012). To increase the statistical power of GWAS, several studies have used multi-trait analysis to identify significant genetic-phenotypic associations (Jaiswal et al. 2016; Thoen et al. 2017; Yoshida and Yáñez 2021). Thus, multi-trait GWAS can increase the power of single-trait GWAS using different measures or multiple traits with a high pattern of genetic correlation (Porter and O’Reilly 2017). Regarding Pearson’s genetic correlations between phenotypic traits, the strongest associations between growth variables found herein are expected because diameter, height, and volume are directly related. On the other hand, wood quality traits did not show strong patterns of association, except for PCY which presented several significant and positive associations with growth traits. This finding suggests that selection for growth traits might lead to a large increase in cellulose yield, which, for example, could have a further effect of reducing the total solid content production. Thus, pleiotropic QTLs are important when using marker-assisted selection for multiple traits.

Generally, our results show that multi-trait GWAS was able to increase the power of single-trait GWAS (FarmCPU) to identify loci that directly affect mutual traits, thus increasing the capacity to identify markers with minor effects. Furthermore, compared to the multi-trait GWAS (MTMM), the FarmCPU showed a lack of power to identify pleiotropic markers and correlated traits with low phenotypic correlation, as shown in previous studies (Korte et al. 2012). The joint association analysis, which considered the full, common, and interaction models, suggested genetic factors acting in the same direction, differentially, or with an interaction or common effect for the expression of the growth and wood quality traits.

Considering single-trait GWAS, several studies identified that FarmCPU increased the power of GWAS for complex traits (Tang et al. 2016; Kusmec and Schnable 2018; Miao et al. 2019). Our study corroborated this finding, identifying 81 significant markers for growth (43) and wood quality (38) traits. Furthermore, FarmCPU was able to control for false positives caused by population structure and kinship, as indicated by the distribution of the quantile–quantile (QQ) plots. On the other hand, the MTMM model performed in the multi-trait GWAS identified a smaller number of significant markers (31) among all significant trait combinations (Table S6). The importance of finding pleiotropic QTLs is related to marker-assisted selection, in which such QTLs can be used together to select multiple regions related to the expression of both growth and wood quality traits (Gupta et al. 2010). Regarding genomic heritability, low/moderate heritability levels were found for growth traits. The high/moderate heritability for the wood quality traits PCY and WBD indicates that they are less influenced by the environment. However, three wood quality traits (SGR, SOL, and TEX) showed critically low heritability, which makes GWAS inappropriate for these traits.

Herein, the pleiotropic effect of loci influencing the expression of phenotypic traits was primarily found through single-trait GWAS analysis. A similar tendency was found by Ward et al. (2019) comparing yield traits in soft red winter wheat, where several markers presenting loci with pleiotropic effects were identified by the FarmCPU model. Here, pleiotropy was identified for both growth and wood quality traits in E. grandis. However, the markers with a pleiotropic effect identified for different traits by single-trait GWAS were not identified when using the multi-trait GWAS. The difference in the significant SNPs found in these analyses might result from the different statistical methodologies used to explore GWAS associations (Hayes 2013).

Conclusion

Our study highlights the importance of examining associations between markers and phenotypes for eucalypt species. Herein, we identified markers that act individually on each trait using the single-trait GWAS and markers that have pleiotropic effects and influence several traits using multi-trait GWAS. The results corroborate previously published data for eucalypt species using moderate-size populations along with high-density SNP datasets. As far as we know, most of the markers identified herein have never been described in previous GWAS for eucalypt species. The results discussed herein provide a better understanding of gene expression and offer important information to inform marker-assisted selection.

In terms of identifying QTLs using single- and multi-trait GWAS, we were able to find clear results related to gene interaction. Gene ontology analysis of GWAS was also important in identifying the biological context of loci. The different GWAS methodologies applied involved the scanning of the whole genome from different trees and identifying genetic markers that can be used to predict phenotypic traits. As a result, GWAS effectively identified candidate loci related to the expression of phenotypic traits. We believe that the results can be used in genetic selection to increase the productivity of eucalypt plantations and improve future breeding programs. Nevertheless, further studies should be conducted to identify significant associations with multiple environmental conditions. Thus, it is essential to continue evaluating the genetic effects and the complexity of the genetic architecture of economically important traits to continue to accumulate genetic gains in each breeding cycle.