Introduction

In Venezuela, sugarcane is cultivated under different soil types, fertility levels and humidity conditions. The unequal performance of genotypes in different environments (genotype × environment interaction, or GEI) in yield trials is a challenge for breeders. GEI has been shown to reduce selection progress and to complicate the identification of superior sugarcane cultivars in regional trials (Rea et al. 2011; Rea and De Sousa-Vieira 2002). When GEI is present, one option is to use stability analysis to identify cultivars with higher and more stable yields. Accordingly, several statistical methods have been proposed and used to study the adaptation and stability of varieties in different locations (Roostaei et al. 2014; Dolédec and Chessel 1994; Dray et al. 2003b). Multivariate models such as additive main effects and multiplicative interaction (AMMI), the site regression model (SREG) and coinertia analysis have been used to interpret GEI. The AMMI and SREG models are similar, but in SREG the linear terms of genotypes (G) and environments (E) are not considered individually; instead, they are added to the multiplicative term of the genotype × environment (GE) interaction (Crossa et al. 2002; Silveira et al. 2013; Yan 2011).
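For reference, the two models are commonly written as shown here. This is a sketch in standard notation rather than a formulation taken from this study: the symbols \( y_{ij} \), \( \mu \), \( g_{i} \), \( e_{j} \), \( \lambda_{k} \), \( \alpha_{ik} \), \( \gamma_{jk} \) and \( \varepsilon_{ij} \) are assumptions introduced for illustration, and the SREG form shown is the GGE variant in which the genotype main effect is absorbed into the bilinear terms.

$$ \text{AMMI:}\quad y_{ij} = \mu + g_{i} + e_{j} + \sum_{k=1}^{K} \lambda_{k}\,\alpha_{ik}\,\gamma_{jk} + \varepsilon_{ij} $$

$$ \text{SREG (GGE):}\quad y_{ij} = \mu + e_{j} + \sum_{k=1}^{K} \lambda_{k}\,\alpha_{ik}\,\gamma_{jk} + \varepsilon_{ij} $$

Here \( y_{ij} \) is the mean response of genotype i in environment j, \( \mu \) is the grand mean, \( g_{i} \) and \( e_{j} \) are the genotype and environment main effects, \( \lambda_{k} \) is the singular value of the k-th multiplicative term, \( \alpha_{ik} \) and \( \gamma_{jk} \) are the corresponding genotype and environment scores, and \( \varepsilon_{ij} \) is the residual. With K = 2, the multiplicative terms are the ones displayed in a two-dimensional biplot.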

AMMI models can measure the weight of the environments, the genotypes and their interactions through a value that indicates how stable the yield of a genotype is across all environments. In contrast, SREG emphasizes the behavior of genotypes through a regression upon locations, which is important for studying the possible existence of different megaenvironments in a region (Yan et al. 2007). In fact, SREG models include G + GE or E + GE in the bilinear terms and provide an easily interpreted graphical analysis called the GGE biplot (Yan and Tinker 2006), which has been used in many cultivar × environment interaction studies (Akcura et al. 2011; Jalata 2011; Mortazavian et al. 2014; Rodríguez et al. 2012; Ramburan and Zhou 2011; Rao et al. 2011; Rea et al. 2011). The biplot displays the grouping of genotypes and environments with similar response patterns and makes it possible to identify the most representative and most discriminating environments (Yan 2011). The GGE biplot is built from the first two principal components (PC) of the SREG model. Genotypes close to each other in the biplot have similar response patterns across environments. Likewise, environments whose vectors form an acute angle behave similarly. A 90° angle between vectors indicates a lack of association between environments, and angles greater than 90° and up to 180° indicate a negative association (Ibáñez et al. 2006). Genotypes located on the vertices of the polygon performed either the best or the poorest in one or more environments (Yan and Tinker 2006). Lines drawn perpendicular to each side of the polygon delimit groups of locations or genotypes with similar behavior.

Coinertia analysis, in turn, is a multivariate method that identifies trends or co-relationships in multiple datasets that contain the same samples and simultaneously finds ordinations (dimension reduction diagrams) from the datasets that are most similar. It does this by finding successive axes from the two datasets with maximum covariance (Culhane et al. 2003; Dray et al. 2003a). Separate analyses find axes maximizing inertia in each hyperspace. These axes are projected in a subspace, and each individual is represented by an arrow whose origin is the position of the variety described by one data matrix and whose arrowhead is the position of the variety described by the other matrix (Dolédec and Chessel 1994).

The aim of this study was to identify megaenvironments (ME) and to determine optimal genotypes and environments for higher sugarcane yield, measured as tons of cane per hectare (TCH) and percentage of sucrose (Pol% cane).

Materials and Methods

Material Selection

This work used data from the regional (outfield) sugarcane trials in the final selection stage of the breeding program at the National Institute for Agricultural Research (INIA). The trials followed a randomized complete block design with three replicates and experimental units of 45 m² (3 rows of 10 m with 1.5 m spacing). They were conducted for 2 years with successive cuts (plant cane and first ratoon).

Locations and Genotypes

The trial locations in Venezuela were Quebrada Arriba (QA) and Montaña Verde (MV) in Lara state; Las Majaguas (LM), Ivonne (Iv) and Castillera (Ca) in Portuguesa state; Santa Lucía (SL) and FUNDACAÑA (FC) in Yaracuy state; and Los Tamarindos (LT) in Aragua state (Table 1). The main soil and rainfall characteristics of these areas are also given in Table 1. The experimental material evaluated consisted of the following clones: V91-1 (G1), V91-2 (G2), V91-6 (G3), V91-8 (G4), V91-15 (G5), V98-62 (G6), V98-86 (G7), V98-120 (G8), V99-117 (G9), V99-190 (G10), V99-203 (G11), V99-208 (G12), V99-213 (G13), V99-217 (G14), V99-236 (G15), V99-245 (G16), V00-50 (G17) and three reference clones: B80-408 (G18), C323-68 (G19) and CP74-2005 (G20).

Table 1 Main characteristics of soil and precipitation/year of the locations tested

Analyzed Yield Variables

The analyzed variables were agronomic performance in tons of cane per hectare (TCH), determined by weighing the effective area of each experimental unit at harvest time and the estimate of industrial yield (sugar) through Pol% cane, determined by laboratory tests on samples composed of 10 stalks per experimental unit.

Statistical Analyses

To evaluate yield (TCH and Pol% cane) and to examine general and specific adaptability and possible groupings of environments, AMMI and GGE biplot analyses were carried out (Crossa et al. 2002; Yan 2011). AMMI models were fitted with the R package “agricolae” (De Mendiburu 2015). SREG-GGE biplot analyses were run with the R package “GGEBiplotGUI” (Frutos et al. 2014). To identify genotypes and environments with a positive relationship between both yield characteristics simultaneously (TCH and Pol% cane), coinertia analysis (Dolédec and Chessel 1994; Dray et al. 2003b) was performed with the R package “ade4” (Dray and Dufour 2007; Chessel et al. 2004; Dray et al. 2007). For the selection of genotypes, two n × p matrices were used, one for each yield variable, in which the genotypes were the individuals (rows) and the environments were the variables (columns). For the determination of the best environment, these matrices were transposed so that the environments were the individuals and the genotypes the variables. Each matrix was then subjected to a principal components analysis (PCA), and each pair of matrices was constrained into a subspace of maximal coinertia and maximal correlation between both yield variables. To determine the best suited genotypes for the best environment taking both yield variables into account, a covariance analysis was performed by applying the formula:

$$ \left( G_{ij,\text{TCH}} - \overline{G}_{i,\text{TCH}} \right) \left( G_{ij,\text{Pol}\,\%} - \overline{G}_{i,\text{Pol}\,\%} \right) $$

where \( G_{ij,\text{TCH}} \) is the value of TCH of genotype i in environment j, \( \overline{G}_{i,\text{TCH}} \) is the mean value of TCH of genotype i across all environments, \( G_{ij,\text{Pol}\,\%} \) is the mean value of Pol% cane of genotype i in environment j, and \( \overline{G}_{i,\text{Pol}\,\%} \) is the mean value of Pol% cane of genotype i across all environments.
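The product of deviations defined by this formula can be sketched in a few lines of Python with NumPy. This is only an illustration of the arithmetic, not the procedure used in the study (which was carried out in R); the function name `cross_products` and the toy values are hypothetical.

```python
import numpy as np

def cross_products(tch, pol):
    """Product of deviations from each genotype's mean across environments.

    tch, pol: arrays of shape (n_genotypes, n_environments) holding the
    TCH and Pol% cane means of genotype i (row) in environment j (column).
    """
    tch = np.asarray(tch, dtype=float)
    pol = np.asarray(pol, dtype=float)
    # Deviation of each cell from its genotype (row) mean
    dev_tch = tch - tch.mean(axis=1, keepdims=True)
    dev_pol = pol - pol.mean(axis=1, keepdims=True)
    # Positive where both variables deviate in the same direction
    return dev_tch * dev_pol

# Toy data (hypothetical values): 2 genotypes x 3 environments
tch_demo = np.array([[100.0, 110.0, 120.0],
                     [90.0, 95.0, 130.0]])
pol_demo = np.array([[13.0, 14.0, 15.0],
                     [15.0, 14.0, 13.0]])
cp_demo = cross_products(tch_demo, pol_demo)
print(cp_demo)
```

A positive product only says that the two deviations share a sign, so the signs of the individual deviations still need to be checked before concluding that a genotype is above its own mean for both variables in a given environment.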

All three packages, “agricolae,” “GGEBiplotGUI” and “ade4,” are R packages distributed through the Comprehensive R Archive Network (CRAN).

Results and Discussion

Interrelationship Between Genotypes and Locations

The average sugarcane yield (TCH) and Pol% cane were significantly affected by environmental and genotypic effects (p ≤ 0.01). Environmental effects accounted for 49.50 and 32.77 % of the total (E + G + GEI) variation in TCH and Pol% cane, respectively, while genotypic effects explained 32.27 and 49.29 % (Table 2). This response to environmental and genotypic effects agrees with that reported by Rea et al. (2011). Average TCH yields of genotypes ranged from 89.78 for V99-245 to 142.04 for V98-120, while TCH yield per location ranged from 93.87 for Santa Lucía to 146.11 for Los Tamarindos (Table 3). The average sugar content expressed as Pol% cane ranged from 12.29 % for V91-15 to 15.51 % for V99-245, while Pol% cane per location ranged from 12.54 % in Montaña Verde to 14.63 % in Santa Lucía (Table 4).

Table 2 Mean squares analysis of variance for cane yield (TCH) and Pol% cane
Table 3 Average yield in tons of cane per hectare (TCH) of twenty sugarcane genotypes at eight locations in two cycles of harvest
Table 4 Average yield in Pol% cane of twenty sugarcane genotypes at eight locations in two cycles of harvest

The AMMI stability values (ASVs) are also reported. For TCH (Table 3), some genotypes, mainly G5 and G12, have high ASVs, which indicates greater instability; in other words, the environment affects the sugarcane growth rate of these genotypes more strongly. For Pol% cane (Table 4), the ASVs of all genotypes are relatively low (stable) across the studied environments, which means that sucrose content is less affected by the environment than TCH.

Biplots of the AMMI models for both yield variables were generated using the genotypic and environmental scores of the first two AMMI components (Rea et al. 2011). The AMMI stability values of the genotypes in all environments are represented graphically for both yield variables (Figs. 1, 2). Statistically stable genotypes are represented by points near the origin of the AMMI2 biplot, with values near zero on the two interaction axes (IPCA1 and IPCA2). The distribution of genotype points in the AMMI2 biplot for TCH (Fig. 1) shows that genotypes G4, G8, G1, G20 and G17 lie close to the origin, indicating minimal interaction of these genotypes with the environments. The remaining 15 genotypes are scattered away from the origin, indicating that they were more sensitive to environmental interactive forces. In contrast, the distribution of genotype points in the AMMI2 biplot for Pol% cane (Fig. 2) shows that all genotypes are close to the origin (note the scales of both axes), indicating that all are stable in terms of sucrose production, although the environments are not.
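How AMMI2 scores and stability values of this kind arise can be sketched as follows. The analyses in this study were run with “agricolae” in R; the Python code here is only an illustrative re-implementation under stated assumptions: the function name `ammi_asv` and the random demo table are hypothetical, the scores use symmetric scaling (singular vectors times the square root of the singular values), and ASV follows the commonly used definition sqrt(((SS_IPCA1 / SS_IPCA2) × IPCA1)² + IPCA2²), which the text does not spell out.

```python
import numpy as np

def ammi_asv(y):
    """AMMI2 scores and AMMI stability values from a table of means.

    y: array of shape (n_genotypes, n_environments).
    Returns genotype scores, environment scores, ASV per genotype and the
    share of the interaction sum of squares captured by IPCA1 + IPCA2.
    """
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    g_eff = y.mean(axis=1, keepdims=True) - grand   # genotype main effects
    e_eff = y.mean(axis=0, keepdims=True) - grand   # environment main effects
    ge = y - grand - g_eff - e_eff                  # double-centered GE residuals
    u, s, vt = np.linalg.svd(ge, full_matrices=False)
    # Symmetric scaling (an assumption of this sketch)
    geno_scores = u[:, :2] * np.sqrt(s[:2])
    env_scores = vt[:2].T * np.sqrt(s[:2])
    ss = s ** 2                                     # sum of squares per axis
    explained = ss[:2].sum() / ss.sum()
    # ASV weights IPCA1 by the ratio of the two interaction sums of squares;
    # this requires a non-zero second singular value.
    asv = np.sqrt((ss[0] / ss[1] * geno_scores[:, 0]) ** 2
                  + geno_scores[:, 1] ** 2)
    return geno_scores, env_scores, asv, explained

# Hypothetical demo table: 20 genotypes x 8 environments of random values
rng = np.random.default_rng(0)
demo_table = rng.normal(loc=100.0, scale=10.0, size=(20, 8))
geno_sc, env_sc, asv_demo, expl_demo = ammi_asv(demo_table)
print(np.argsort(asv_demo)[:5])  # indices of the five most stable genotypes
```

Genotypes with the smallest ASV are the ones that plot closest to the origin of the AMMI2 biplot, which is the reading applied to Figs. 1 and 2.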

Fig. 1

AMMI biplot for TCH yield showing the first two GE interaction principal component axes for 20 genotypes and 8 environments. The biplot reveals yield stability only for G4, G8, G1, G20 and G17; the remaining 15 genotypes are scattered away from the origin, indicating that they were more sensitive to environmental interactive forces. The two principal components explained 55.1 % of the GEI. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos

Fig. 2

AMMI biplot for Pol% cane yield showing the first two GE interaction principal component axes for 20 genotypes and 8 environments. The low values on both axes show that all genotypes lie close to the origin; this reveals stability for all genotypes and indicates that the environment does not strongly affect this yield variable. This is corroborated by the relatively low percentage of explained variation in the analysis of variance (Table 2). The two principal components explained 54.1 % of the GEI. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos

Megaenvironment and Genotype Representativeness Determination

As reported previously (Yan et al. 2007), GGE biplots are the best methodology for determining megaenvironments and genotype representativeness. Figures 3 and 4 show the “which-won-where” SREG biplots of TCH and Pol% cane, respectively, which indicate which genotype performed best in which location for the yield variable measured. The angle between the vectors of two environments is related to the correlation coefficient between them (Kempton 1984; Yan 2002). The cosine of the angle between the vectors of two environments (locations) indicates their similarity or dissimilarity in discriminating the genotypes (Yan and Tinker 2006). For TCH (Fig. 3), the eight locations can be grouped into two megaenvironments. One is formed by QA and FC, in which G11, G19, G10, G15 and, to a greater extent, G6 and G12 are the best genotypes. The other is formed by MV, LM, SL, Iv, Ca and LT, in which G8, G17 and G13 are the best genotypes. Given these two megaenvironments and the length of the QA and LT vectors, QA and LT would be the ideal environments for selecting and producing genotypes adapted specifically to each megaenvironment. LM is the most representative environment, and G8 and G12 are the best genotypes in terms of TCH yield.
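The quantities read from a which-won-where biplot can be sketched numerically. The code here is a minimal Python illustration, not the “GGEBiplotGUI” procedure used in the study: it assumes an environment-centered (tester-centered, G + GE) table without scaling and symmetric partitioning of the singular values, and the function names and random demo data are hypothetical.

```python
import numpy as np

def gge_biplot_scores(y):
    """First two principal components of the environment-centered table.

    y: array of shape (n_genotypes, n_environments).
    """
    y = np.asarray(y, dtype=float)
    # Removing each environment (column) mean leaves G + GE
    centered = y - y.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    geno = u[:, :2] * np.sqrt(s[:2])    # genotype markers
    env = vt[:2].T * np.sqrt(s[:2])     # environment vectors
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return geno, env, explained

def env_cosines(env):
    """Cosine of the angle between every pair of environment vectors."""
    unit = env / np.linalg.norm(env, axis=1, keepdims=True)
    return unit @ unit.T

def which_won_where(geno, env):
    """Index of the genotype with the largest fitted G + GE per environment."""
    return (geno @ env.T).argmax(axis=0)

# Hypothetical demo table: 20 genotypes x 8 environments of random values
rng = np.random.default_rng(0)
demo = rng.normal(loc=100.0, scale=10.0, size=(20, 8))
geno_demo, env_demo, expl_demo = gge_biplot_scores(demo)
cos_demo = env_cosines(env_demo)
winners_demo = which_won_where(geno_demo, env_demo)
print(winners_demo)
```

Environments that share the same winning genotype and have cosines close to 1 are candidates for the same megaenvironment, while a cosine near 0 corresponds to the 90° case of no association.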

Fig. 3

Site regression GGE biplot based on symmetrical scaling for the which-won-where pattern for TCH yield. Genotypes are shown in black and the studied environments in blue. Biplot based on a “Tester-centered (G + GE)” table, without any scaling and dual metric preserving. The lines perpendicular to each side of the polygon divide the biplot into several sectors to aid visualization. The two principal components explained 78.7 % of TCH variation. Note the formation of two megaenvironments. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos (color figure online)

Fig. 4

Site regression GGE biplot based on symmetrical scaling for the which-won-where pattern for Pol% cane yield. Genotypes are shown in black and the studied environments in blue. Biplot based on a “Tester-centered (G + GE)” table, without any scaling and dual metric preserving. The lines perpendicular to each side of the polygon divide the biplot into several sectors to aid visualization. The two principal components explained 81.83 % of Pol% cane variation. Note that, despite the environmental variability, no separate megaenvironments are formed. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos (color figure online)

For Pol% cane (Fig. 4), only one megaenvironment is formed. Based on the length of the vectors, FC is the most representative of all locations tested, and G18, G15, G20 and G16 are the genotypes best adapted to this environment.

Selection of Best Genotypes in Terms of Both Yield Variables

As can be seen, the best genotypes for TCH (Fig. 3) turned out to be the worst for Pol% cane (Fig. 4), which makes it challenging to select genotypes with a positive relationship between both yield variables. To select the best genotypes, a coinertia analysis was performed (Dolédec and Chessel 1994; Dray et al. 2003b).

Coinertia analysis is a multivariate method that is often neglected because it is somewhat difficult to interpret, yet it is a very powerful methodology based on a different data analysis strategy. Instead of creating a single table with all variables, the variables are grouped according to a particular characteristic and treated as two hyperspaces, which are then coupled so that the amount of information gathered by the coinertia axes is relatively large, which makes the method very efficient.

Coinertia analysis is very flexible and allows many possibilities for coupling; it is also suitable for quantitative, qualitative or fuzzy variables. Moreover, various weightings of sites and various transformations and/or centerings of species data are available for this method. Hence, more biological considerations can be taken into account in the statistical procedures. In addition, the principle of this method is very general and can easily be extended to distance matrices (Dray et al. 2003b). To apply coinertia, two data matrices were generated, one for each yield variable, and a principal components analysis (PCA) was performed on each. The separate analysis of each data table creates two hyperspaces, one for each yield variable. In each hyperspace, an axis can be determined, which is the vector direction maximizing the projected variability or inertia. Both axes can be isolated and plotted in a multidimensional subspace so that the covariance between the two new sets of projected scores is maximal. This maximal covariance means a maximal correlation between both yield variables (Fig. 5). Coinertia analysis explained 80.09 and 12.48 % of the observed inertia in the TCH hyperspace (X matrix on the x axis) and the Pol% cane hyperspace (Y matrix on the y axis), respectively. The origin of each arrow is the position of the genotype described by the TCH data matrix, and the arrowhead is its position described by the Pol% cane data matrix. Despite the high quality of representation, measured by the large amount of information gathered by the two axes, the maximal correlation between the behavior of both yield variables is very low, as shown by the low value of Escoufier's RV similarity coefficient (0.07817213), which corroborates that the best genotypes for one variable are the worst for the other. Nevertheless, it can still be inferred that the best genotypes, with both yield variables positively correlated, must be represented by an arrow pointing to the right from the upper left to the upper right quadrant. In this sense, the best genotypes for all environments are G15, G17 and G13, in that order. These results show not only the genotypes best adapted to all environments, but also the capability of the data analysis strategy proposed in this paper, which succeeds in finding the greatest differences between genotypes while minimizing the differences between the environments in terms of both yield variables.
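The coupling step described above can be sketched numerically. The study used the “ade4” package in R; the Python code here is only an illustrative reduction of the idea, assuming two column-centered (unscaled) tables with uniform row weights. The function name `coinertia` as defined here, the random demo matrices and the use of raw rather than normalized row scores for the arrows are all assumptions of the sketch.

```python
import numpy as np

def coinertia(x, y):
    """Coinertia axes, row scores and RV coefficient for two tables
    that share the same rows (individuals).

    x, y: arrays of shape (n_individuals, p) and (n_individuals, q).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)               # column-centered, as in a covariance PCA
    yc = y - y.mean(axis=0)
    cross = xc.T @ yc / n                 # cross-covariance between the two tables
    # The singular vectors of the cross-covariance give pairs of axes whose
    # projected scores have maximal covariance (the singular values).
    u, s, vt = np.linalg.svd(cross, full_matrices=False)
    x_scores = xc @ u[:, :2]              # arrow origins
    y_scores = yc @ vt[:2].T              # arrowheads
    # Escoufier's RV: total coinertia scaled by the inertia of each table
    rv = (cross ** 2).sum() / np.sqrt(((xc.T @ xc / n) ** 2).sum()
                                      * ((yc.T @ yc / n) ** 2).sum())
    return x_scores, y_scores, s, rv

# Hypothetical demo: 20 genotypes x 8 environments for each yield variable
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(20, 8))
y_demo = rng.normal(size=(20, 8))
xs_demo, ys_demo, sv_demo, rv_demo = coinertia(x_demo, y_demo)
arrows_demo = ys_demo - xs_demo           # one arrow per genotype
print(round(float(rv_demo), 4))
```

RV ranges from 0 (no common structure) to 1 (identical configurations), so a value as low as the one reported here means the two tables share little structure even when the first axes capture most of the inertia. Transposing both input tables gives the environment-level version of the same analysis.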

Fig. 5

Coinertia analysis of the genotypes combining the TCH and Pol% cane yield variables. a, b Scatterplots represent the coefficients of the combinations of the variables (environments) of each data matrix that define the coinertia axes. Separate analyses find axes maximizing inertia in each hyperspace. These axes of maximum inertia are projected in the scatterplot (c), on which the genotypes are also projected. The origin of each arrow is the position of the genotype described by the TCH data matrix, and the arrowhead is its position described by the Pol% cane data matrix. The analysis explained 80.095 % of the observed inertia in the TCH hyperspace and 12.478 % in the Pol% cane hyperspace, with an Escoufier's RV similarity coefficient of 0.07817213. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos

Selection of Best Environment for Both Yield Variables

Another strategy for determining the best suited genotypes was to identify the environment with the strongest positive correlation between TCH and Pol% cane and then to determine the genotypes best adapted to that environment.

In this case, a second coinertia analysis was performed with the transposed matrices of the two yield variables, so that the hyperspaces were constructed from the genotype information and the subspace analysis described the behavior of the environments (Fig. 6). Coinertia analysis explained 81.482 % of the observed inertia in the TCH hyperspace and 10.617 % in the Pol% cane hyperspace. Here too, the maximal correlation between the behaviors of both yield variables is low (Escoufier's RV similarity coefficient of 0.2203166); however, following the interpretation of the arrows explained above, the best environment must be represented by an arrow pointing to the left from the upper right to the upper left quadrant. In this sense, the best environment, with both yield variables positively correlated, is Los Tamarindos (LT).

Fig. 6

Coinertia analysis of the environments combining the TCH and Pol% cane yield variables. a, b Scatterplots represent the coefficients of the combinations of the variables (genotypes) of each data matrix that define the coinertia axes. Separate analyses find axes maximizing inertia in each hyperspace. These axes of maximum inertia are projected in the scatterplot (c), on which the environments are also projected. The origin of each arrow is the position of the environment described by the TCH data matrix, and the arrowhead is its position described by the Pol% cane data matrix. The analysis explained 81.482 % of the observed inertia in the TCH hyperspace and 10.617 % in the Pol% cane hyperspace, with an Escoufier's RV similarity coefficient of 0.2203166. G1 V91-1; G2 V91-2; G3 V91-6; G4 V91-8; G5 V91-15; G6 V98-62; G7 V98-86; G8 V98-120; G9 V99-117; G10 V99-190; G11 V99-203; G12 V99-208; G13 V99-213; G14 V99-217; G15 V99-236; G16 V99-245; G17 V00-50; G18 B80-408; G19 C323-68; G20 CP74-2005. QA Quebrada Arriba, MV Montaña Verde, LM Las Majaguas, Iv Ivonne, Ca Castillera, SL Santa Lucía, FC FUNDACAÑA, LT Los Tamarindos

Once Los Tamarindos (LT) had been identified as the best environment for growth (TCH) and production (Pol% cane) of sugarcane, the best suited genotypes for that particular environment were determined. A covariance analysis between the yield variables was performed for each genotype. In LT, all genotypes showed positive values, which means that in this environment all genotypes gave results above their mean across all environments for both TCH and Pol% cane. The more positive the value, the more efficient the genotype in terms of TCH and Pol% cane yield in this environment, so G11, G13 and G12 could be considered the best genotypes for successive selection processes based on adaptation and stability (Table 5).

Table 5 Covariance analysis between both yield variables to determine best adapted genotype to “Los Tamarindos (LT)” environment

As demonstrated, this data analysis strategy is robust and makes it possible to determine the best genotype for a particular environment or vice versa, depending on the needs of the breeding program. Nevertheless, for studies concerning national production, the genotype selection process must consider the genotypes best adapted to the megaenvironments determined.

Certainly, the environment (when GEI is present) is one of the most influential factors affecting genotype selection. Although it is a hurdle, it should be seen not as a problem but as a challenge, with the environment treated as one more set of variables. It is only necessary to apply adequate tools to understand and resolve the situation. Here, we show that besides AMMI models and GGE biplots, it is necessary to add another statistical methodology, such as two-table coupling methods (coinertia), which exposes much more information useful to breeders in the field of genotype × environment interaction studies. AMMI and GGE models make it possible to determine stability characteristics and megaenvironments and to identify the genotype best adapted to a given environment, but adding coinertia analysis allows more thorough conclusions, especially because genotypes must be selected with both yield variables positively correlated; in other words, it is useless to obtain genotypes with high TCH if their sucrose content is low, or genotypes with high sucrose content but a low growth rate. The use of coinertia along with AMMI and GGE methodologies could enhance breeding programs to obtain better productivity levels, while maintaining growth rates, to meet required needs.