Abstract
To improve multi-environment trial (MET) analysis, this study applied a compound method that combines a factor analytic (FA) model with the additive main effect and multiplicative interaction (AMMI) model and the genotype main effect plus genotype-by-environment interaction (GGE) biplot. The diameter at breast height of 36 open-pollinated (OP) families of Pinus taeda at six sites in South China was used as the raw dataset. Best linear unbiased prediction (BLUP) data for all individual trees at each site were obtained from the raw data by fitting spatial effects together with the FA method. The raw data and BLUP data were then analyzed with AMMI and the GGE biplot and compared. The BLUP results showed that the six sites were heterogeneous and that spatial variation could be effectively fitted by spatial analysis with the FA method. AMMI analysis showed that site, family, and their interaction had highly significant effects in both datasets, while the BLUP data had a smaller residual error, a higher proportion of explained variation, and more credible stability estimates than the raw data. The GGE biplot gave different results for raw data and BLUP data in mega-environment delineation, test-environment evaluation, and genotype evaluation. The BLUP results were more reasonable because the first two principal components explained more of the variation. Our study suggests that combining the FA method with AMMI and the GGE biplot could improve the analysis of MET data in Pinus taeda, as it was more reliable than direct AMMI and GGE biplot analysis of raw data.
Introduction
Multi-environment trials (MET) should be carried out in forest genetic testing because of the geographical applicability of forest species (Dutkowski 2005). One of the most important phenomena in MET is that genotype-by-environment interaction (GEI) is often significant. Accurate assessment of GEI is critical for further breeding and for the subsequent promotion of tree varieties. Thus far, most agricultural MET analyses have adopted the joint regression method (Finlay and Wilkinson 1963), additive main effect and multiplicative interaction (AMMI) (Gauch and Zobel 1997), or the genotype main effect plus genotype-by-environment interaction (GGE) biplot (Yan 2001). MET analysis generally includes genotype evaluation, test-environment evaluation, and mega-environment delineation. AMMI and the GGE biplot are equally capable of delineating mega-environments (Gauch et al. 2008). Furthermore, the GGE biplot gives a more complete and visual representation of genotype performance and stability and of the representativeness of test environments (Yan 2001). As far as we know, AMMI and the GGE biplot have been applied to only a few tree species, such as Pinus radiata (Ding and Wu 2008), Populus (Sixto et al. 2011), and Michelia chapensis (Wang et al. 2016).
Although AMMI and the GGE biplot are widely applied to phenotypic data in crops, they have some limitations: (1) they are limited to fixed effect models; (2) the test environments should be homogeneous; and (3) the trial data should be balanced. First, homogeneity of the test environment is hardly achievable in forest trials; the present study confirmed that forest test environments show various patterns of spatial variation. Second, forest trials usually have missing data and are often highly unbalanced. Moreover, genetic parameters such as breeding values require a random effect model to be estimated. These problems might therefore greatly restrict the application of AMMI and the GGE biplot in forest trials.
In recent years, the factor analytic (FA) method has been used in forest MET because it can handle highly unbalanced datasets (Costa e Silva et al. 2006; Costa e Silva and Graudal 2008; Cullis et al. 2014; Ivković et al. 2015; Chen et al. 2017). Moreover, the FA model can capture complex variance structures with a relatively small number of variance parameters (Kelly et al. 2007). Unlike AMMI and the GGE biplot, however, the FA method does not give intuitive results such as the discrimination and representativeness of test environments or the yield performance and stability of test genotypes.
In this study, we present a compound approach that combines the FA method with AMMI and the GGE biplot in order to exploit the advantages of each method in MET. We first used spatial analysis combined with the FA method to obtain the best linear unbiased prediction (BLUP) of each individual tree at each test site, and then applied AMMI and the GGE biplot for family evaluation, test-environment evaluation, and mega-environment delineation. This approach avoids the limitations of a fixed effect model and of test-environment homogeneity, improves the results of AMMI and GGE biplot analysis, and can serve as a reference for future application in tree breeding programs.
Materials and methods
Experimental design
A total of 36 open-pollinated (OP) families of Pinus taeda were used in this study. The progeny trials followed a randomized complete block design (RCBD) with three replications and five trees per plot. The same design was implemented at all six sites. Detailed information on the test sites is summarized in Table 1. Diameter at breast height (DBH) of all trees was measured at age 15 in 2010.
Statistical analysis
The model for MET BLUP by the FA method (Lin 2016) is expressed as
\[y_{ij} = \mu + S_{i} + SG_{ij} + e_{ij}\]
where \(y_{ij}\) is the observation of the jth individual tree at the ith site, \(\mu\) is the overall mean, \(S_{i}\) is the fixed effect of the ith site, \(SG_{ij}\) is the random interaction effect of the individual tree with the ith site, and \(e_{ij}\) is the random residual.
The variance matrix (G) of \(SG_{ij}\) can be fitted by the FA method (Smith et al. 2001) as
\[G = (\Gamma \Gamma^{\prime} + \Psi) \otimes A\]
where \(\Gamma\) is the matrix of site loadings, \(\Psi\) is a diagonal matrix of specific variances for each site, \(\otimes\) is the Kronecker product, and A is the numerator relationship matrix of the individual trees.
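As a rough numerical illustration of this variance structure, the following NumPy sketch builds \((\Gamma \Gamma^{\prime} + \Psi) \otimes A\) for a toy case. The loadings, specific variances, and relationship matrix are made-up values, the Kronecker form is the one usually given for the FA model of Smith et al. (2001), and the function name `fa_covariance` is hypothetical. It is not the ASReml code used in this study.

```python
import numpy as np

def fa_covariance(loadings, specific_var, relationship):
    """Return the between-site matrix (Gamma Gamma' + Psi) and its Kronecker product with A."""
    gamma = np.asarray(loadings, dtype=float)          # sites x k loadings
    psi = np.diag(np.asarray(specific_var, dtype=float))
    site_cov = gamma @ gamma.T + psi                   # sites x sites
    full = np.kron(site_cov, np.asarray(relationship, dtype=float))
    return site_cov, full

# Hypothetical values: 3 sites, one factor, 2 unrelated trees.
gamma = [[1.0], [0.8], [0.5]]
psi = [0.2, 0.3, 0.4]
A = np.eye(2)

site_cov, G = fa_covariance(gamma, psi, A)

# Between-site genetic correlations implied by the FA structure.
d = np.sqrt(np.diag(site_cov))
site_corr = site_cov / np.outer(d, d)
print(np.round(site_corr, 3))
```

With a single factor, three loadings and three specific variances describe all six distinct elements of the between-site matrix, which is the parsimony argument made for the FA model.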
For spatial analysis (Dutkowski et al. 2002), the residual \(e_{ij}\), with variance matrix R, can be partitioned into a spatially correlated error (variance \(\delta_{\xi}^{2}\)) and an independent error (variance \(\delta_{\eta}^{2}\)), so that R can be written as
\[R = \delta_{\xi}^{2} \left[ \Sigma_{c} (\rho_{c}) \otimes \Sigma_{r} (\rho_{r}) \right] + \delta_{\eta}^{2} I\]
where \(\delta_{\xi}^{2}\) is the spatially correlated error variance, \(\Sigma_{c} (\rho_{c})\) and \(\Sigma_{r} (\rho_{r})\) are the autoregressive correlation matrices for columns and rows, \(\rho_{c}\) and \(\rho_{r}\) are the autoregression parameters in the column and row directions, \(\delta_{\eta}^{2}\) is the independent error variance, and I is an identity matrix. The significance of all parameters of the spatial model was tested with the log-likelihood ratio test (LRT).
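The separable spatial structure described here can be sketched numerically as follows. The sketch assumes first-order autoregressive (AR1) correlation in each direction, which is the usual choice in this kind of model but is an assumption here, and all parameter values and function names are hypothetical.

```python
import numpy as np

def ar1_matrix(n, rho):
    """AR1 correlation matrix: entry (i, j) equals rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def spatial_residual_cov(n_col, n_row, rho_c, rho_r, var_spatial, var_indep):
    """R = var_spatial * (AR1(col) kron AR1(row)) + var_indep * I."""
    corr = np.kron(ar1_matrix(n_col, rho_c), ar1_matrix(n_row, rho_r))
    return var_spatial * corr + var_indep * np.eye(n_col * n_row)

# Hypothetical site: 3 columns x 4 rows of trees.
R = spatial_residual_cov(n_col=3, n_row=4, rho_c=0.6, rho_r=0.3,
                         var_spatial=4.0, var_indep=2.0)
print(R.shape)
```

Setting `var_spatial` or `var_indep` to zero reproduces the boundary cases in which only one of the two error components is estimated at a site.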
After fitting the model in ASReml 4.0 (Gilmour et al. 2016), the estimated breeding values of individual trees in each site can be obtained. Further, these values plus the overall mean (μ) were treated as BLUP data for the following analysis.
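The AMMI model equation (Crossa 1990) is written as
\[y_{ijk} = \mu + \alpha_{i} + \beta_{j} + Rep(\beta)_{jk} + \sum_{n = 1}^{N} \lambda_{n} \gamma_{in} \delta_{jn} + \theta_{ij} + \varepsilon_{ijk}\]
where \(y_{ijk}\) is the raw observation or the estimated BLUP data, \(\mu\) is the grand mean, \(\alpha_{i}\) is the main effect of the ith family, \(\beta_{j}\) is the main effect of the jth site, \(Rep(\beta)_{jk}\) is the main effect of the kth replicate within the jth site, N is the number of interaction principal component (IPC) axes retained in the model, \(\lambda_{n}\) is the singular value for IPC axis n, \(\gamma_{in}\) is the score of family i on axis n, \(\delta_{jn}\) is the score of site j on axis n, \(\theta_{ij}\) is the interaction residual not explained by the retained IPC axes, and \(\varepsilon_{ijk}\) is the residual error.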
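The multiplicative part of the AMMI model can be illustrated with a minimal NumPy sketch that works on a family-by-site table of means. It ignores the replicate term and uses simulated values, so it only shows the decomposition itself; the analysis in this study used the agricolae package, and the function name `ammi_decompose` is hypothetical.

```python
import numpy as np

def ammi_decompose(means):
    """Split a family x site table of means into additive effects and IPC axes."""
    y = np.asarray(means, dtype=float)
    grand = y.mean()
    family_eff = y.mean(axis=1) - grand            # alpha_i
    site_eff = y.mean(axis=0) - grand              # beta_j
    # Interaction left after removing the additive main effects.
    gei = y - grand - family_eff[:, None] - site_eff[None, :]
    u, s, vt = np.linalg.svd(gei, full_matrices=False)
    pct = 100 * s ** 2 / np.sum(s ** 2)            # share of GEI sum of squares per IPC
    return grand, family_eff, site_eff, u, s, vt, pct

# Simulated table: 6 families x 4 sites.
rng = np.random.default_rng(0)
toy = 15 + rng.normal(size=(6, 4))
grand, a, b, u, s, vt, pct = ammi_decompose(toy)
print(np.round(pct, 2))
```

The cumulative share of the first two entries of `pct` corresponds to the percentage of GEI explained by IPC1 and IPC2.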
The GGE model equation for the first two principal components (Yan 2001) is written as
\[y_{ij} - \beta_{j} = \lambda_{1} \xi_{i1} \eta_{j1} + \lambda_{2} \xi_{i2} \eta_{j2} + \varepsilon_{ij}\]
where \(y_{ij}\) is the measured mean or the estimated breeding value mean for the ith family at the jth site, \(\beta_{j}\) is the measured mean or the estimated breeding value mean for all families at the jth site, \(\lambda_{1}\) and \(\lambda_{2}\) are the singular values for the first two principal components (PC1 and PC2), \(\xi_{i1}\) and \(\xi_{i2}\) are the scores of family i for PC1 and PC2, \(\eta_{j1}\) and \(\eta_{j2}\) are the scores of site j for PC1 and PC2, and \(\varepsilon_{ij}\) is the residual error.
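A minimal sketch of the GGE decomposition, assuming a complete family-by-site table of means: each site column is centred on its own mean and the result is decomposed by SVD. The partition exponent `f` and the function name `gge_scores` are illustrative choices, with `f = 0.5` standing for symmetrical scaling; the data are simulated.

```python
import numpy as np

def gge_scores(means, f=0.5):
    """Environment-centred SVD; f = 0.5 splits singular values evenly between families and sites."""
    y = np.asarray(means, dtype=float)
    centred = y - y.mean(axis=0)                   # subtract each site mean
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    fam = u[:, :2] * s[:2] ** f                    # family coordinates on PC1 and PC2
    site = vt[:2].T * s[:2] ** (1 - f)             # site coordinates on PC1 and PC2
    explained = 100 * np.sum(s[:2] ** 2) / np.sum(s ** 2)
    return fam, site, explained, centred

# Simulated table: 6 families x 4 sites.
rng = np.random.default_rng(1)
toy = 15 + rng.normal(size=(6, 4))
fam, site, explained, centred = gge_scores(toy)
print(round(explained, 2))
```

The product `fam @ site.T` is the rank-two approximation that the biplot displays, and `explained` is the percentage of the centred variation carried by PC1 and PC2.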
The AMMI stability value (ASV) is calculated by the following formula (Purchase 1997):
\[ASV = \sqrt{\left[ \frac{SS_{IPC1}}{SS_{IPC2}} (IPC1) \right]^{2} + (IPC2)^{2}}\]
where \(SS_{IPC1}\) and \(SS_{IPC2}\) are the sums of squares for IPC1 and IPC2, and IPC1 and IPC2 are the family scores on these two axes.
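The ASV calculation and the ranking derived from it can be sketched as below. The family scores and sums of squares are made-up numbers, and the helper names are hypothetical.

```python
import numpy as np

def ammi_stability_value(ipc1, ipc2, ss_ipc1, ss_ipc2):
    """ASV: IPC1 scores weighted by SS_IPC1 / SS_IPC2, combined with IPC2 scores."""
    ipc1 = np.asarray(ipc1, dtype=float)
    ipc2 = np.asarray(ipc2, dtype=float)
    return np.sqrt((ss_ipc1 / ss_ipc2 * ipc1) ** 2 + ipc2 ** 2)

def rank_ascending(values):
    """Rank 1 is the smallest value (most stable); ties are not handled."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

# Hypothetical scores for three families.
asv = ammi_stability_value([0.3, -0.1, 0.5], [0.4, 0.2, -0.1],
                           ss_ipc1=20.0, ss_ipc2=10.0)
rasv = rank_ascending(asv)
print(np.round(asv, 3), rasv)
```

A family close to the origin on both axes gets a small ASV and therefore a low rank number.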
The MET BLUP procedure was implemented with the program code of Lin (2016) in ASReml 4.0 (Gilmour et al. 2016). AMMI and GGE biplot analyses were carried out with the R packages agricolae (De Mendiburu 2016) and GGEBiplotGUI (Frutos et al. 2014), respectively. The GGE biplot was based on singular value decomposition with symmetrical scaling and focused on the environment (Yan 2010).
Results and discussions
Test environment heterogeneity
Since AMMI and GGE biplot analysis require homogeneous trial environments, it was necessary to test whether all the trial environments were homogeneous. The environmental errors varied greatly among trial sites (Table 2). For random independent errors, site 5 had the largest value (5.04), while the value for site 4 could not be estimated and might be zero. For spatially correlated errors, site 4 had the largest value (6.96), followed by site 2 (4.58), while the value for site 5 could not be estimated and might be zero. When the random independent error is large and uncorrelated, the spatially correlated error might be zero, and vice versa (Dutkowski 2005). In addition, except for site 5, significant autocorrelation existed in the row and/or column direction at each site. Autocorrelation was negative in the row direction (sites 1 and 3) or column direction (site 5), but not significantly so, suggesting that competition was absent or weak. These results indicate that forest trial environments are usually heterogeneous and that spatial analysis can effectively account for environmental heterogeneity within each site. Our results are consistent with studies of other tree species (Costa et al. 2001; Dutkowski et al. 2002, 2006; Terrance and Jayawickrama 2008). Therefore, caution is needed when phenotypic data are used directly for AMMI and GGE biplot analysis.
AMMI analysis of raw data and BLUP data
The AMMI combined analysis of variance for both raw data and BLUP data revealed that the effects of site, family, and their interaction (GEI) were highly significant (Table 3). For raw data, site explained 41.05% of the total sum of squares (TSS) and GEI captured 16.03% of TSS, indicating that the sites were quite diverse and could affect family performance. Replication accounted for 3.89% of TSS and was highly significant. In addition, the first two interaction principal components (IPC1 and IPC2) accounted for only 63.99% of GEI, and only IPC1 had a significant effect. For BLUP data, site and GEI had extremely significant effects with higher F values than in the raw data, implying stronger site and GEI effects. Although site explained a greater percentage of TSS (69.31%), the residual error was reduced to 16.69% of TSS, much smaller than in the raw data (28.17%). Furthermore, the replication effect accounted for only a small percentage of TSS (0.23%) and was not significant, compared with 3.89% in the raw data. This result is consistent with the finding that spatial analysis can effectively reduce the residual error and greatly cut down design effects (Dutkowski 2005; Dutkowski et al. 2006; Terrance and Jayawickrama 2008). Moreover, both IPC1 and IPC2 had significant effects and together accounted for 72.81% of total GEI, which was higher than in the raw data, even though GEI captured a smaller percentage of TSS (11.42%). These results imply that the BLUP data gave a more reliable interpretation of DBH variation than the raw data.
The overall mean DBH was similar between raw data (15.62) and BLUP data (15.43), but the ASV and the rank of ASV (rASV) of the families differed greatly (Table 4). The Spearman correlation of rASV between raw data and BLUP data was only 0.36 (p < 0.05). According to rASV, family 30 had the lowest ASV (0.10) in the raw data, indicating that it was the most stable genotype, but it ranked 12th in the BLUP data. In the BLUP data, family 32 had the lowest ASV (0.10) and was the most stable genotype, while it ranked only 7th in the raw data. Only a few families, namely families 7, 18, 20, and 34 (11.11% of all families), had a stable rASV in both raw data and BLUP data. Since the first two IPCs of the BLUP data accounted for 72.81% of total GEI, higher than in the raw data (63.99%), the ASV results of the BLUP data were more credible.
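For reference, a Spearman correlation between two rankings such as the rASV values from two datasets can be computed as in the sketch below. It uses the textbook formula for rankings without ties, the two rankings are invented for illustration, and the function name is hypothetical; the values in Table 4 are not reproduced here.

```python
import numpy as np

def spearman_from_ranks(rank_a, rank_b):
    """Spearman correlation for two untied rankings of the same items."""
    a = np.asarray(rank_a, dtype=float)
    b = np.asarray(rank_b, dtype=float)
    n = len(a)
    d2 = np.sum((a - b) ** 2)                  # squared rank differences
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five families in two datasets.
rho = spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho)
```

A value near 1 would mean that both datasets rank the families almost identically, while a low value means the stability rankings largely disagree.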
GGE analysis of raw data and BLUP data
Which-Win-Where
The “Which-Win-Where” function of the GGE biplot connects the outermost genotypes to form a polygon and draws a line from the origin perpendicular to each side of the polygon. The test environments are then grouped and the superior genotype is identified within each environment group (Yan 2010). For raw data, the six test sites were divided into three groups: sites 1, 2, and 5 in one group, sites 4 and 6 in another, and site 3 in a group of its own (Fig. 1a). Family 10 was the highest genotype in sites 4 and 6, and family 20 was the highest genotype in sites 1, 2, and 5. For BLUP data, the six sites were divided into four groups: sites 1, 5, and 6 in one group, and sites 2, 3, and 4 each in a group of its own (Fig. 1b). Family 19 was the highest genotype in sites 1, 5, and 6, while families 9, 22, and 11 were the highest genotypes in sites 2, 3, and 4, respectively. Furthermore, the first two PCs of the raw data accounted for only 66.21% of total GEI, less than for the BLUP data (78.46%). As with the AMMI results, the Which-Win-Where results of the BLUP data were more reasonable.
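The grouping logic behind this view can be sketched without drawing the polygon. In the two-dimensional biplot, the family with the largest inner product with a site vector is the vertex family of the sector containing that site, so sites can be grouped by their winning family. The coordinates below are invented, and `which_won_where` is a hypothetical helper rather than a GGEBiplotGUI function.

```python
import numpy as np
from collections import defaultdict

def which_won_where(fam_scores, site_scores):
    """Group sites by the family with the highest fitted value in the biplot."""
    fitted = np.asarray(fam_scores) @ np.asarray(site_scores).T   # families x sites
    winners = fitted.argmax(axis=0)
    groups = defaultdict(list)
    for site_idx, fam_idx in enumerate(winners):
        groups[int(fam_idx)].append(site_idx)
    return dict(groups)

# Hypothetical PC1/PC2 coordinates for 4 families and 4 sites (0-based indices).
fam = np.array([[2.0, 0.5], [-1.0, 1.5], [0.2, -1.8], [-0.5, -0.2]])
site = np.array([[1.0, 0.2], [0.9, -0.1], [-0.2, 1.1], [0.1, -1.0]])
groups = which_won_where(fam, site)
print(groups)
```

Each key is a winning family and each value lists the sites forming one candidate mega-environment. Families that never win, such as the one nearest the origin here, do not appear.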
Discrimination and representation of environments
The choice of test environment is directly related to the reliability of variety breeding, and an ideal test environment should be both strongly discriminating and representative. The blue line with an arrow represents the average environment axis, and the length of the dotted line between a test environment and the origin represents the discriminating ability of that environment (Fig. 2). The angle between a test-environment vector and the average environment axis indicates the representativeness of the test environment: the smaller the angle, the more representative the environment. If the angle is obtuse, the site is not suitable as a test environment. For raw data, sites 3, 4, and 6 were the most discriminating environments and site 5 was the most representative (Fig. 2a). Therefore, site 5 was the ideal test environment for raw data. For BLUP data, sites 1, 4, and 6 were the most discriminating environments, site 2 was the least discriminating, and site 1 was the most representative (Fig. 2b).
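The two criteria described here reduce to a vector length and an angle, which can be sketched as follows. The site coordinates are invented, the average environment is taken as the mean of the site coordinates, and `evaluate_sites` is a hypothetical helper.

```python
import numpy as np

def evaluate_sites(site_scores):
    """Vector length (discriminating ability) and angle to the average environment axis."""
    s = np.asarray(site_scores, dtype=float)
    axis = s.mean(axis=0)                          # average environment point
    length = np.linalg.norm(s, axis=1)
    cos = (s @ axis) / (length * np.linalg.norm(axis))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return length, angle

# Hypothetical PC1/PC2 coordinates for three sites.
site = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
length, angle = evaluate_sites(site)
print(np.round(length, 3), np.round(angle, 1))
```

In this toy case the second site is the most representative (angle of zero) but the least discriminating (shortest vector), which shows why both criteria have to be read together.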
Yield and stability analysis
The GGE biplot uses average environment coordination (AEC) to evaluate the yield and stability of genotypes (Yan 2001). The AEC consists of the average environment axis (solid green line with an arrow) and a solid green line through the origin perpendicular to it (Fig. 3). The black dotted line from each genotype to the average environment axis shows the average yield and stability of that genotype across all environments: the longer the dotted line, the less stable the yield.
The solid green line through the origin, perpendicular to the average environment axis, stands for the grand (overall) mean. Genotypes to the left of this line yield below the grand mean, while genotypes to the right of it yield above the grand mean. Yield performance and stability differed greatly between raw data and BLUP data. For yield performance in raw data (Fig. 3a), family 20 was the highest, family 23 was around the overall mean, and family 9 was the lowest. The most stable genotypes were families 14 and 24, and the most unstable were families 3, 4, 10, 28, and 34.
An ideal genotype should combine high yield and stability (Yan 2010), which made family 21 the best genotype in raw data. In BLUP data, the highest-yielding genotypes were families 19 and 10, the most stable was family 10, and the most unstable were families 11, 22, and 24 (Fig. 3b). The ideal genotype in BLUP data was family 10. Although yield performance and stability differed greatly between raw data and BLUP data, some consistency was found between the two datasets. For example, families 9, 2, and 32 were the worst genotypes in both datasets. As with the Which-Win-Where results, we consider that BLUP data gave better results for yield performance and genotype stability than raw data.
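The reading of the AEC view can be expressed as two projections: the position of a family along the average environment axis approximates its mean performance relative to the grand mean, and its distance from that axis approximates its instability. The sketch below assumes biplot coordinates are already available, uses invented numbers, and `aec_projection` is a hypothetical helper.

```python
import numpy as np

def aec_projection(fam_scores, site_scores):
    """Project families onto the average environment axis and its perpendicular."""
    fam = np.asarray(fam_scores, dtype=float)
    axis = np.asarray(site_scores, dtype=float).mean(axis=0)
    unit = axis / np.linalg.norm(axis)
    mean_perf = fam @ unit                         # signed position along the axis
    perp = np.array([-unit[1], unit[0]])
    instability = np.abs(fam @ perp)               # distance from the axis
    return mean_perf, instability

# Hypothetical PC1/PC2 coordinates: 2 sites, 3 families.
site = np.array([[2.0, 1.0], [2.0, -1.0]])
fam = np.array([[3.0, 0.5], [-1.0, 2.0], [0.0, 0.0]])
mean_perf, instability = aec_projection(fam, site)
print(mean_perf, instability)
```

A family with a large positive `mean_perf` and a small `instability` lies closest to the ideal genotype in this view.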
Conclusions
When AMMI and the GGE biplot are applied directly to phenotypic data, it is not possible to obtain genetic parameters or calculate selection efficiency. Nevertheless, AMMI and the GGE biplot have obvious advantages in mega-environment delineation, test-environment evaluation, and genotype evaluation. Whether AMMI and the GGE biplot are suitable for forest trials remains to be seen. We used a dataset from six progeny test sites to test the combination of a spatial/FA model with AMMI and the GGE biplot for use in forest progeny trials. Our results showed that first obtaining BLUP data from the raw phenotypic data of a forest MET, by fitting spatial effects with the FA method, substantially improved the results of AMMI and GGE biplot analysis. First, spatial analysis with the FA method removed the effects of different spatial variation patterns from the phenotypic data, which addressed the requirement for test-environment homogeneity. Second, in the AMMI analysis, BLUP data greatly reduced the residual error as a percentage of TSS and clearly increased the proportion of variation explained. Finally, raw data and BLUP data gave substantially different results in the GGE biplot. Furthermore, the Spearman correlation of rASV between raw data and BLUP data was low (r = 0.36, p < 0.05), and the first two principal components explained a higher percentage of GEI in BLUP data than in raw data. Therefore, we suggest that carrying out the BLUP procedure by spatial analysis with the FA method before AMMI and GGE biplot analysis will yield more credible results. It should be noted that design effects (such as block or plot) might need to be added to the spatial model for different measured traits if their effects are significant. This BLUP procedure also has some further advantages. For example, if pedigree files or other relationships (such as genomic relationships) are available for the test genotypes, missing values can be predicted even if the dataset is highly unbalanced. Another advantage is that, since the FA method is a random effect model, genetic and residual variances can be estimated for further analysis of genetic parameters.
References
Chen ZQ, Karlsson B, Wu H (2017) Patterns of additive genotype-by-environment interaction in tree height of Norway spruce in southern and central Sweden. Tree Genet Genomes 13:25
Costa e Silva J, Graudal L (2008) Evaluation of an international series of Pinus kesiya provenance trials for growth and wood quality traits. For Ecol Manag 255:3477–3488
Costa e Silva J, Potts B, Dutkowski G (2006) Genotype by environment interaction for growth of Eucalyptus globulus in Australia. Tree Genet Genomes 2:61–75
Costa ESJ, Dutkowski GW, Gilmour AR (2001) Analysis of early tree height in forest genetic trials is enhanced by including a spatially correlated residual. Can J For Res 31(11):1887–1893
Crossa J (1990) Statistical analysis of multi-location trials. Adv Agron 44:55–85
Cullis BR, Jefferson P, Thompson R, Smith AB (2014) Factor analytic and reduced animal models for the investigation of additive genotype-by-environment interaction in outcrossing plant species with application to a Pinus radiata breeding programme. Theor Appl Genet 127(10):2193–2210
De Mendiburu F (2016) Agricolae: statistical procedures for agricultural research. R package version 1.2-4
Ding M, Wu HX (2008) Application of GGE Biplot analysis to evaluate genotype (G), environment (E) and G × E interaction on Pinus radiata: a case of study. N Z J For Sci 38(1):132–142
Dutkowski GW (2005) Improved models for the prediction of breeding values in trees. Ph.D. Thesis. University of Tasmania, 79–107
Dutkowski GW, Costa ESJ, Gilmour AR, Lopez GA (2002) Spatial analysis methods for forest genetic trials. Can J For Res 32(12):2201–2214
Dutkowski GW, Costa ESJ, Gilmour AR, Wellendorf H, Aguiar A (2006) Spatial analysis enhances modeling of a wide variety of traits in forest genetic trials. Can J For Res 36(7):1851–1870
Finlay KW, Wilkinson GN (1963) The analysis of adaptation in a plant breeding programme. Aust J Agric Res 14(6):742–754
Frutos E, Galindo MP, Leiva V (2014) An interactive biplot implementation in R for modeling genotype-by-environment interaction. Stoch Environ Res Risk Assess 28(7):1629–1641
Gauch HG, Zobel RW (1997) Identifying mega-environments and targeting genotypes. Crop Sci 37(2):311–326
Gauch HG, Piepho HP, Annicchiarico P (2008) Statistical analysis of yield trials by AMMI and GGE: further considerations. Crop Sci 48(3):866–889
Gilmour AR, Gogel BJ, Cullis BR, Thompson R (2016) ASReml user guide release 4.0. VSN International Ltd, Hemel Hempstead
Ivković M, Gapare W, Yang H, Dutkowski G, Buxton P, Wu H (2015) Pattern of genotype by environment interaction for radiata pine in southern Australia. Ann For Sci 72:391–401
Kelly AM, Smith AB, Eccleston JA, Cullis BR (2007) The accuracy of varietal selection using factor analytic models for multi-environment plant breeding trials. Crop Sci 47(3):1063–1070
Lin YZ (2016) R and ASReml-R statistics. China Forestry Publishing House, Beijing, pp 524–533. ISBN 978-7-50-388869-4
Purchase JL (1997) Parametric analysis to describe G × E interaction and yield stability in winter wheat. Ph.D. Thesis. Department of Agronomy, Faculty of Agriculture, University of Orange Free State, Bloemfontein, pp 4–83
Sixto H, Salvia J, Barrio M, Ciria MP, Cañellas I (2011) Genetic variation and genotype-environment interactions in short rotation Populus plantations in southern Europe. New For 42(2):163–177
Smith A, Cullis B, Thompson R (2001) Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57(4):1138–1147
Terrance ZY, Jayawickrama KJ (2008) Efficiency of using spatial analysis in first-generation coastal Douglas-fir progeny tests in the US Pacific Northwest. Tree Genet Genomes 4(4):677–692
Wang RH, Hu DH, Zheng HQ, Yan S, Wei RP (2016) Genotype × environmental interaction by AMMI and GGE biplot analysis for the provenances of Michelia chapensis in South China. J For Res 27(3):659–664
Yan W (2001) GGEbiplot-a windows application for graphical analysis of multi-environment trial data and other types of two-way data. Agron J 93(5):1111–1118
Yan W (2010) Optimal use of biplots in analysis of multi-location variety test data. Acta Agron Sin 36(11):1805–1819
Acknowledgements
The authors wish to thank all members of each test forest farm for maintaining the trials, and Dr. Xiaohui Yang (Guangdong Academy of Forestry, China) and Dr. Jerry Zhang (Boston University, USA) for their suggestions on the manuscript. We also wish to express our appreciation to the anonymous reviewers and technical editors of the Journal of Forestry Research for their comments and corrections of the article.
Additional information
Project Funding: This work has been supported by State Key Laboratory of Tree Genetics and Breeding (Northeast Forestry University) (K2013204) and co-financed with NSFC project (31470673) and Guangdong Science and Technology Planning Project (2016B070701008).
The online version is available at http://www.springerlink.com
Corresponding editor: Hu Yanbo.
Cite this article
Zhang, W., Hu, J., Yang, Y. et al. One compound approach combining factor-analytic model with AMMI and GGE biplot to improve multi-environment trials analysis. J. For. Res. 31, 123–130 (2020). https://doi.org/10.1007/s11676-018-0846-8
DOI: https://doi.org/10.1007/s11676-018-0846-8