Introduction

Maize (Zea mays L.) is considered the largest crop worldwide. In the 2018/2019 crop year, world production was 1.1 billion tons (USDA, 2019). An advantage of this crop is its capacity to be cultivated in a range of environments and seasons. This leads to differential responses of genotypes across environments, known as phenotypic plasticity (Marchal et al. 2019) or genotype by environment interaction (GEI) (Smith and Cullis, 2018).

One of the main focuses of maize breeding programs is the identification of contrasting genotypes, as hybridization among them permits the exploitation of heterosis, or hybrid vigor (Grigolo et al. 2018). The evaluation of genetic variability among genotypes through multivariate statistical methods enables a synthetic description of the genetic relationship between genetic materials or populations (Dias et al. 1997). Thus, one of the most important parameters estimated by plant breeders is the genetic dissimilarity of the genotypes selected for future crosses, which also improves knowledge about the genetic variability of the base germplasm.

Many breeders have selected commercial hybrids to begin a breeding program with a population with a high frequency of favorable alleles (Oliboni et al. 2013; Coelho et al. 2020). Crosses among these genotypes shuffle genes and generate inter-population hybrids. The study of these inter-population hybrids is therefore key to the development of the breeding program, since their variability and complementarity will be the basis of the heterotic groups and will allow future hybrids to be developed in the maize breeding program.

Multivariate analysis techniques permit the study of genetic variability and the clustering of genotypes in an efficient way, by identifying the genetic differences between pairs and among groups of genotypes (Godoy et al. 2006; Cruz et al. 2011). They also seek to minimize intra-group variation and maximize inter-group variation (Rodrigues et al. 2017). Morphological and/or molecular markers associated with the traits of interest can be used for genetic variability analyses (Silva and Neves 2011). The choice of method depends on the dataset, its collection procedure and the desired precision, since there is no defined parameter for selecting the best method of genetic variability evaluation for a specific population (Cargnelutti et al. 2008; Cruz et al. 2012).

Among the available methods of genetic variability evaluation, the Mahalanobis distance is applied to measure genetic dissimilarity, which is crucial to clustering and related multivariate methods, such as the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), neighbor-joining trees, principal component analysis and the Tocher optimization method (Silva et al. 2011; Azevedo et al. 2013). The method selected for the genetic variability study must give the breeder confidence when selecting the best parents for crosses. When more than one method is adopted and the results are not concordant, the selection of the best parents will depend on the most efficient method applied (Cargnelutti et al. 2008).

Several parameters are calculated to assess the clustering quality of the different methodologies. For instance, the apparent error rate (AET), obtained by Anderson's discriminant analysis (Anderson, 1958), measures clustering adequacy by quantifying the disagreement between the predicted classification and the clusters obtained by the optimization method. Another is the cophenetic correlation coefficient (CCC), which reflects the distortion introduced by the clustering process. In addition, when dissimilarity measures are plotted, the stress and distortion parameters are calculated from the dissimilarity distances (Cruz et al. 2011).

All the cited parameters provide, in each methodology, relevant information about clustering quality. Several studies have applied different genetic variability methods (Cantelli et al. 2016; Rocha et al. 2017; Gonçalves et al. 2019; Pereira et al. 2019). However, clustering studies dealing with GEI through mixed models are still rare (Steckling et al. 2017). Thus, this study aimed: (1) to contrast the Tocher and UPGMA clustering methods, using the best linear unbiased prediction (BLUP) and considering GEI, in maize breeding; and (2) to indicate, under these perspectives, the best crosses to create potential heterotic groups.

Materials and Methods

Experimental data

The experiments were carried out between January and July 2018 at four sites, all located in southwestern Goiás, Brazil (Appendix – Table A1). The climate of the region is wet temperate, with dry winters and hot summers (Cwa, according to Alvares et al. 2013).

The average annual temperature is around 21.5 °C and average rainfall is between 1400 and 2000 mm year−1. The agricultural practices were based on those used for the maize crop in experimental and commercial production in Brazil (Cruz et al. 2008). The 78 inter-population hybrids (Appendix – Table A2) and the six commercial hybrids (F1) most planted in the region were used in the experiment, which totaled 84 hybrids assessed.

In each environment (E1, E2, E3, and E4), a trial was conducted in a randomized complete block design with three replications and 44 plants per plot. The plots consisted of four 4-m rows, with a spacing of 0.40 m between plants and 0.45 m between rows. To eliminate the competition effect between neighboring plots, only the two central rows were evaluated. The cycle-related traits analyzed were time to tasseling (TA), counted in days from seeding to 50% tasseling, when more than half of the tassel releases pollen; and time to flowering (FL), counted in days from seeding to 50% flowering, when the ear starts silking and the silk is visible outside the husk. The morphological traits were plant height and ear height (PH and EH, in meters, using a three-meter ruler and measuring from the ground to the flag leaf). Yield-related traits were also measured: ear length (EL, in centimeters), as the mean of five unhusked ears in a row; ear diameter (ED, in centimeters), measured at the center of the ear, side by side; and ear yield and grain yield (EY and GY, in kg ha−1), obtained from the plot weight converted to hectares.

Statistical analyses

The statistical analyses followed three steps: (1) restricted maximum likelihood/best linear unbiased prediction (REML/BLUP) to estimate the genotypic values and evaluate the performance of the hybrids over the environments, individually and jointly; (2) multivariate analysis by the Mahalanobis distance, aiming to increase the understanding of the genetic variability among the hybrids; and (3) application of two clustering methodologies, to compare their performances.

(1) Restricted maximum likelihood/best linear unbiased prediction (REML/BLUP)

The estimation of variance components and prediction of the genotypic values of the traits assessed were carried out through the REML/BLUP procedure, proposed by Patterson and Thompson (1971) and Henderson (1975), respectively. The statistical model associated with the evaluation of hybrids, with one observation per plot and in a single trial, was given by the following equation (Model 1):

$$y = {\text{X}}r + {\text{Z}}g + e,$$

where y is the vector of phenotypes; r is the vector of fixed effects, comprising the replicate effect added to the overall mean; g is the vector of genotypic effects [assumed as random, \(g \sim N\left( {0,\sigma_{g}^{2} } \right)\), where \(\sigma_{g}^{2}\) is the genotypic variance between hybrids]; and e is the vector of residual effects [random, \(e \sim N\left( {0,\sigma_{e}^{2} } \right)\), where \(\sigma_{e}^{2}\) is the residual variance]. Uppercase letters refer to the incidence matrices for those effects.
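
For reference, the BLUP solutions of Model 1 can be written through Henderson's mixed model equations. The form below is the standard textbook one, not a formula taken from this study; it assumes independent genotypic effects and a homogeneous residual variance, and the symbol \(\lambda\) is introduced here only for illustration:

$$\left[ {\begin{array}{*{20}c} {{\text{X}}^{\prime}{\text{X}}} & {{\text{X}}^{\prime}{\text{Z}}} \\ {{\text{Z}}^{\prime}{\text{X}}} & {{\text{Z}}^{\prime}{\text{Z}} + \lambda {\text{I}}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\hat{r}} \\ {\hat{g}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\text{X}}^{\prime}y} \\ {{\text{Z}}^{\prime}y} \\ \end{array} } \right],\quad \lambda = \sigma_{e}^{2} /\sigma_{g}^{2} .$$

The larger \(\lambda\) is (that is, the lower the heritability), the more the predicted genotypic values shrink toward zero.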

The statistical model associated with the evaluation of hybrids, with one observation per plot and in a multi-environment trial, was given by the following equation (Model 2):

$${\text{y}} = {\text{X}}r + {\text{Z}}g + {\text{W}}i + e,$$

where y is the vector of phenotypes; r is the vector of replication-environment combinations (fixed), added to the overall mean; g is the vector of genotypic effects [assumed as random, \(g \sim N\left( {0,\sigma_{g}^{2} } \right)\)]; i is the vector of GEI effects [assumed as random, \(i \sim N\left( {0,\sigma_{gxe}^{2} } \right)\), where \(\sigma_{gxe}^{2}\) is the GEI variance]; and e is the vector of residual effects [random, \(e \sim N\left( {0,\sigma_{e}^{2} } \right)\)]. Uppercase letters refer to the incidence matrices for those effects.

For the random effects of the models, significance was tested by the likelihood ratio test (LRT), using the chi-square statistic with 1 degree of freedom and a 5% probability of type I error (Rao 1952), as follows:

$$LRT = - 2(LogL_{R} - LogL),$$

where \(LogL\) is the logarithm of the maximum (\(L\)) of the restricted likelihood function of the full model and \(LogL_{R}\) is the logarithm of the maximum of the restricted likelihood function of the reduced model (without the genotypic or GEI effects).
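
As an illustration of the test, the sketch below computes the LRT statistic as twice the gain in restricted log-likelihood of the full model over the reduced one, and its p-value under a chi-square distribution with 1 degree of freedom. The function names and the log-likelihood values are hypothetical; they are not outputs of the software used in the study.

```python
from math import erfc, sqrt


def lrt_statistic(logl_full, logl_reduced):
    """Twice the gain in restricted log-likelihood of the full model."""
    return -2.0 * (logl_reduced - logl_full)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0)) if x > 0 else 1.0


def lrt_test(logl_full, logl_reduced, alpha=0.05):
    """Return the statistic, its p-value and whether the effect is significant."""
    stat = lrt_statistic(logl_full, logl_reduced)
    p_value = chi2_sf_1df(stat)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods of a full and a reduced model.
stat, p_value, significant = lrt_test(logl_full=-1520.4, logl_reduced=-1523.1)
```

With 1 degree of freedom, the effect is declared significant at the 5% level when the statistic exceeds approximately 3.84.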

In Model 1, the genotypic and residual variances (\(\hat{\sigma }_{g1}^{2}\) and \(\hat{\sigma }_{e1}^{2}\), respectively) were used to calculate the phenotypic variance (\(\hat{\sigma }_{p1}^{2}\)). In Model 2, the genotypic, GEI, and residual variances (\(\hat{\sigma }_{g2}^{2}\), \(\hat{\sigma }_{gxe}^{2}\), and \(\hat{\sigma }_{e2}^{2}\), respectively) were used to calculate the phenotypic variance (\(\hat{\sigma }_{p2}^{2}\)). The broad-sense heritability (\(h_{i}^{2}\)), the coefficient of determination of the GEI effects (\(c_{gxe}^{2}\)), the genotypic correlation between performances across environments (\(r_{gloc}\)), and the selective accuracy (\(r_{{\hat{g}gi}}\)), all in percentage (%), were calculated by the following expressions:

\(\hat{\sigma }_{p1}^{2} = \hat{\sigma }_{g1}^{2} + \hat{\sigma }_{e1}^{2}\),


\(\hat{\sigma }_{p2}^{2} = \hat{\sigma }_{g2}^{2} + \hat{\sigma }_{gxe}^{2} + \hat{\sigma }_{e2}^{2}\),


\(h_{i}^{2} \left( \% \right) = (\hat{\sigma }_{gi}^{2} /\hat{\sigma }_{pi}^{2} )100\),


\(c_{gxe}^{2} \left( \% \right) = \left( {\hat{\sigma }_{gxe}^{2} /\hat{\sigma }_{p2}^{2} } \right)100\),


\(r_{gloc} \left( \% \right) = \left[ {\hat{\sigma }_{g2}^{2} /\left( {\hat{\sigma }_{g2}^{2} + \hat{\sigma }_{gxe}^{2} } \right)} \right]100\), and


\(r_{{\hat{g}gi}} \left( \% \right) = \left[ {\sqrt {1 - \left( {PEV/\hat{\sigma }_{gi}^{2} } \right)} } \right]100\),

where i refers to the model (1 or 2), and PEV is the prediction error variance extracted from the diagonal of the generalized inverse of the coefficient matrix of the mixed model equations.
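
The expressions above can be checked numerically. The sketch below applies them to the GY variance components of the joint analysis reported in the Results. The last function is an assumption of this sketch and not a formula given in the text: for a balanced trial, the accuracy is approximated from the heritability and the number of replications. With three replications, this shortcut reproduces the accuracies reported for E1 and E3.

```python
from math import sqrt


def phenotypic_variance(var_g, var_e, var_gxe=0.0):
    """Sum of the genotypic, GEI (if any) and residual variances."""
    return var_g + var_gxe + var_e


def heritability_pct(var_g, var_e, var_gxe=0.0):
    return 100.0 * var_g / phenotypic_variance(var_g, var_e, var_gxe)


def gxe_determination_pct(var_g, var_gxe, var_e):
    return 100.0 * var_gxe / phenotypic_variance(var_g, var_e, var_gxe)


def genotypic_correlation_pct(var_g, var_gxe):
    return 100.0 * var_g / (var_g + var_gxe)


def accuracy_pct(pev, var_g):
    """Selective accuracy from the prediction error variance (PEV)."""
    return 100.0 * sqrt(1.0 - pev / var_g)


def accuracy_balanced_pct(h2, n_rep):
    """Assumed shortcut for a balanced trial; h2 is given as a proportion."""
    return 100.0 * sqrt(n_rep * h2 / (1.0 + (n_rep - 1) * h2))


# GY variance components of the joint analysis (Model 2).
var_g, var_gxe, var_e = 327385.98, 137335.45, 863359.84
h2 = heritability_pct(var_g, var_e, var_gxe)
c2 = gxe_determination_pct(var_g, var_gxe, var_e)
r_gloc = genotypic_correlation_pct(var_g, var_gxe)

# Individual analyses with three replications (E1 and E3 heritabilities).
acc_e1 = accuracy_balanced_pct(0.4765, n_rep=3)
acc_e3 = accuracy_balanced_pct(0.2151, n_rep=3)
```

The joint-analysis heritability obtained this way agrees with the reported 24.65%.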

The agreement between the hybrids selected by the two models was calculated using Cohen's kappa coefficient (K) (Cohen 1960), given by:

$$K = \left[ {\left( {A - C} \right)/\left( {D - C} \right)} \right]100,$$

where \(A\) is the number of coincident hybrids selected by the two models, \(D\) is the number of hybrids selected, and \(C\) is the number of hybrids coincident due to chance (\(C = bD\), where \(b\) is the selection intensity).

This analysis considered three scenarios, with 3, 5 and 10 selected genotypes out of the 84 hybrids, corresponding to selection intensities (b) of 3.57, 5.95 and 11.90%, respectively.
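
A minimal sketch of this calculation follows. The selection intensities are derived from the 84 hybrids assessed; the number of coincident hybrids used in the example call is hypothetical.

```python
def kappa_pct(n_coincident, n_selected, n_total):
    """Agreement coefficient K (%) between two selections of equal size."""
    b = n_selected / n_total      # selection intensity as a proportion
    c = b * n_selected            # coincidences expected by chance
    return 100.0 * (n_coincident - c) / (n_selected - c)


# Selection intensities (%) for the three scenarios, with 84 hybrids.
intensities = {d: round(100 * d / 84, 2) for d in (3, 5, 10)}

# Hypothetical case: 3 of the 5 selected hybrids coincide.
k = kappa_pct(n_coincident=3, n_selected=5, n_total=84)
```

When every selected hybrid coincides, K equals 100%; when the coincidences equal the number expected by chance, K equals 0.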

(2) Multivariate analysis

For multivariate analysis of genetic variability, the Mahalanobis genetic distance (Mahalanobis 1936) was used as a measure of dissimilarity, estimated from the genetic values predicted by BLUP for each trait that presented a significant genotypic effect. The Mahalanobis distance (\(D_{M}\)) between the individuals \(i\) and \(i^{\prime}\) was given by the following equation:

\(D_{M} \left( {i,i^{\prime}} \right) = \left[ {\left( {u_{i} - u_{{i^{\prime}}} } \right)^{\prime} {\Sigma }^{ - 1} \left( {u_{i} - u_{{i^{\prime}}} } \right)} \right]^{1/2}\),

where \(u_{i}\) is the vector of the means of the eight traits evaluated for individual \(i\); \(u_{{i^{\prime}}}\) is the vector of the means of the eight traits evaluated for individual \(i^{\prime}\); and \({\Sigma }\) is the covariance matrix among traits.
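
The distance can be computed without explicitly inverting the covariance matrix, by solving a linear system instead. The sketch below uses only the Python standard library; the function names are hypothetical, and the elimination routine is one assumed way of solving the system, not the procedure of the software used in the study.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(u_i, u_j, cov):
    """Mahalanobis distance between two vectors of trait means."""
    d = [a - b for a, b in zip(u_i, u_j)]
    w = solve(cov, d)  # w = cov^{-1} d, without forming the inverse
    return sqrt(sum(di * wi for di, wi in zip(d, w)))
```

With an identity covariance matrix the result reduces to the Euclidean distance, which is a convenient sanity check.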

The relative importance of the traits was determined by interpreting the eigenvectors associated with the eigenvalues obtained by principal component analysis (PCA), based on the predicted genotypic values (Rao 1952). This indicates the traits that contributed most to the expression of variation in the population, and the traits liable to be excluded in future works.
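
As a sketch of how eigenvalues and eigenvectors are read in this kind of analysis, the code below converts a set of eigenvalues into relative and accumulated importance, counts how many components are needed to reach a cutoff, and picks the trait with the largest absolute coefficient in an eigenvector. The eigenvalues, the eigenvector and the default 80% cutoff are hypothetical inputs chosen for illustration, not values from Table 2.

```python
TRAITS = ["TA", "FL", "PH", "EH", "EL", "ED", "EY", "GY"]


def eigenvalue_importance(eigenvalues, threshold=80.0):
    """Relative (%) and accumulated (%) importance of each eigenvalue."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    pct = [100.0 * v / total for v in ordered]
    acc, running = [], 0.0
    for p in pct:
        running += p
        acc.append(running)
    # Number of leading components needed to reach the threshold.
    n_needed = next(k + 1 for k, a in enumerate(acc) if a >= threshold)
    return pct, acc, n_needed


def top_trait(eigenvector, trait_names=TRAITS):
    """Trait with the largest absolute coefficient in an eigenvector."""
    k = max(range(len(eigenvector)), key=lambda j: abs(eigenvector[j]))
    return trait_names[k]


# Hypothetical eigenvalues and first eigenvector for eight traits.
pct, acc, n_needed = eigenvalue_importance([2.5, 2.0, 1.5, 1.0, 0.5, 0.3, 0.15, 0.05])
leading = top_trait([0.10, -0.20, 0.30, 0.25, 0.20, 0.30, 0.45, 0.52])
```

Traits whose largest coefficients fall in the last components, which carry the least variance, are the usual candidates for exclusion.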

(3) Clustering analysis

Based on the Mahalanobis genetic distance calculated among the genotypes, the genetic variability was studied by adopting two different clustering methodologies. The first was UPGMA, a simple hierarchical agglomerative method, which uses the mean distances between genotypes as the criterion for their allocation. The number of clusters was determined by the criteria proposed by Milligan and Cooper (1985). The quality of the analysis was verified based on the stability of the clustering, given by the CCC and the distortion and stress levels, by comparing the cophenetic distance matrix with the original distance matrix.
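
The UPGMA procedure and the CCC can be sketched in a few lines. The implementation below is a naive illustration written for readability, not the routine used in the study: it repeatedly merges the two clusters with the smallest average pairwise distance, records the merge height as the cophenetic distance, and then correlates cophenetic and original distances. The distance matrix is hypothetical, and the choice of the number of clusters (Milligan and Cooper 1985) is not covered.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Run UPGMA on a symmetric distance matrix; return cophenetic distances."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def average_link(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Closest pair of clusters under the average-distance criterion.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: average_link(clusters[pq[0]], clusters[pq[1]]))
        height = average_link(clusters[p], clusters[q])
        for i in clusters[p]:
            for j in clusters[q]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical distances among four genotypes.
D = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
coph = upgma_cophenetic(D)
ccc = cophenetic_correlation(D, coph)
```

A CCC close to 1 means the dendrogram preserves the original pairwise distances well.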

The second method was the Tocher optimization method. For the formation of mutually exclusive clusters, it considers a single criterion: the intergroup distance must always be higher than the intragroup distance. To evaluate the clustering quality, Fisher discriminant analysis (Fisher 1938) was performed, which checked the result found by Tocher and provided the apparent error rate (AET).
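
A simplified sketch of a Tocher-type procedure is given below, under stated assumptions. The inclusion limit (here `theta`) is taken as the largest of the nearest-neighbor distances, each group is seeded with the closest remaining pair, and a genotype joins a group while its mean distance to the group does not exceed the limit. Treating leftover genotypes as single-member groups when the closest remaining pair exceeds the limit is a choice made for this sketch. It may differ in detail from the implementation used in the study, and the discriminant analysis that yields the AET is not covered.

```python
from itertools import combinations


def tocher(dist):
    """Group genotypes from a symmetric distance matrix (at least two rows)."""
    n = len(dist)
    # Inclusion limit: largest of the nearest-neighbor distances.
    theta = max(min(dist[i][j] for j in range(n) if j != i) for i in range(n))
    remaining = set(range(n))
    groups = []
    while len(remaining) > 1:
        # Seed a new group with the closest remaining pair.
        i, j = min(combinations(sorted(remaining), 2),
                   key=lambda pair: dist[pair[0]][pair[1]])
        if dist[i][j] > theta:
            break  # assumption: leftovers become single-member groups
        group = [i, j]
        remaining -= {i, j}
        while remaining:
            def mean_dist(k):
                return sum(dist[k][g] for g in group) / len(group)
            best = min(sorted(remaining), key=mean_dist)
            if mean_dist(best) > theta:
                break
            group.append(best)
            remaining.remove(best)
        groups.append(group)
    groups.extend([k] for k in sorted(remaining))
    return groups, theta


# Hypothetical distances among four genotypes.
D = [[0, 1, 4, 5],
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
groups, theta = tocher(D)
```

Because the limit depends on the most isolated genotype, a single outlier can raise it and merge otherwise distinct groups, which is one reason the number of groups can differ from that of a hierarchical method.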

Statistical analyses were performed using the Selegen-REML/BLUP (Resende 2016) and R software (R Development Core Team 2020).

Results

Table 1 (and Appendix – Table A3) presents the variance components and the estimates of genetic and non-genetic parameters for all traits, in each environment and in the joint analysis. The LRT results are also presented in this table and show which traits are significant in each analysis, individually and jointly. The FL trait had no significant genotypic effect at the 5% level in E1 and E4, and neither did the EL trait in E3. In the joint analysis, all traits had a significant genotypic effect, unlike the GEI effect, which was significant only for the EY and GY traits (p < 0.05).

Table 1 Variance components and genetic and non-genetic parameters by individual and joint analyses for the traits: tasseling (TA), flowering (FL), plant height (PH), ear height (EH), ear length (EL), ear diameter (ED), ear yield (EY), and grain yield (GY), evaluated in four environments (E)

Fig. 1 Kappa coefficients for selection of 3 (A), 5 (B), and 10 (C) genotypes

The GY trait is the focus of this study, to demonstrate the variation over environments. For GY, the genotypic variance ranged from 233,962.91 in E4 to 780,380.53 in E1, whilst in the joint analysis it was 327,385.98. The residual variance ranged from 703,525.02 (E4) to 1,130,640.34 (E3); in the joint analysis, this component was 863,359.84. The joint analysis includes the GEI effects in the model, whose variance was 137,335.45. The heritability more than doubled between E3 (21.51%) and E1 (47.65%). The heritability estimated by the joint analysis was 24.65%, similar to E3 and E4. The mean selective accuracy ranged from 67.17% (E3) to 85.55% (E1) in the individual analyses, while in the joint analysis it was 86.89%, thus exceeding the highest accuracy obtained in the individual analyses. The phenotypic means are also shown in Table 1, ranging from 5,594.94 (E4) to 7,832.41 (E2), while the overall mean was 7,045.01.

Besides, the genetic material rankings (Appendix – Table A4) per environment and for the joint analysis show the high performance of some genetic materials developed in this study, compared to the commercial hybrids. At least five of them figure among the ten best genetic materials in all environments and in the joint analysis. In some cases, they appear among the five selected genetic materials.

The results of the principal component analysis are presented in Table 2. In all environments and in the joint analysis, the first eigenvalue (highlighted) accounted for at least 31% of the accumulated importance. The highest eigenvector coefficient of the first eigenvalue (highlighted) was the last one, representing the GY trait, in E1, E2 and the joint analysis, or the second to last, representing the EY trait, in both E3 and E4, all around 0.50.

Table 2 Eigenvalues and eigenvectors by the principal component analysis for the traits: tasseling (TA), flowering (FL), plant height (PH), ear height (EH), ear length (EL), ear diameter (ED), ear yield (EY), and grain yield (GY). The square root (SR), the square root as a percentage (%) of importance, and the accumulated importance of the eigenvalues (Ac), in percentage (%), are presented for each eigenvalue, per environment. Bold numbers in the Ac (%) column mark the values up to approximately 80% of importance; bold numbers in the eigenvectors mark the highest trait value of the first three eigenvalues

The differences between the clustering obtained from the individual analyses and from the joint analysis, using the two clustering methodologies, are presented in Table 3 (and Appendix – Table A5). Both methodologies used the Mahalanobis distance based on the predicted genotypic values. For Tocher clustering, the number of groups varied from 9 (E4) to 18 (joint analysis). The UPGMA method used the average distance to cluster the genetic materials and form the groups. Its number of groups was around 10, except in E3, which had 8 groups. For this method, the CCC, stress and distortion are presented in percentage. The CCC ranged from 62.69 to 74.53% across the environments, and was 69.98% in the joint analysis. The stress was below 13.07% for all environments, individually and jointly. Distortion was around 35% for all results.

Table 3 Clustering analysis methodologies, based on the Mahalanobis distance of the predicted genotypic values of the evaluated traits. Number of clusters and number of hybrids composing each one, for each environment individually (E1, E2, E3, E4) and for all environments simultaneously (ALL). The apparent error rate (AET), in percentage, for the Tocher analysis, and the cophenetic correlation coefficient (CCC), stress and distortion (Dist.), in percentage, for the UPGMA analysis, are presented. The number of clusters (N Clust) is also presented for each methodology

Discussion

Considering the individual analyses for the GY trait, the residual variance in E3 was about 61% higher than in E4, while the genotypic variance in E4 was about 30% of that in E1. The heritability classification ranged from low, in E3 and E4, to moderate, in E1 and E2 (Resende and Duarte 2007). The heritability more than doubled from E3 (21.51%) to E1 (47.65%). In the joint analysis, a value slightly higher than in E3 was observed, 24.65%. The coefficients of residual variation were used to infer the experimental quality. They varied among the environments, rising almost 4 percentage points from E2 to E4, and the joint analysis lay between these extreme values (11.14 and 14.99%), at 13.19%, which indicates high experimental quality (Coelho et al. 2020). This discrepancy among the coefficients of residual variation reinforces the advantage of the joint analysis, which maintained a low coefficient of residual variation.

The agreement index also demonstrates the importance of the joint analysis of multi-environment trial data. Agreement values were low in all three scenarios, mainly in comparisons involving E4, for which the agreement with all other environments reached 0.20 when five hybrids were selected. It is important to highlight that selection by the joint analysis and individual selection (per environment) led to different hybrids being selected. This demonstrates the need to consider the joint analysis for more accurate genetic selection and recommendation, since the joint analysis allows the GEI to be evaluated. The significance of the GEI was confirmed by the LRT for EY and GY.

The different clustering methods provided distinct genetic variability results. Other studies also found differences among methods when their clustering results were compared (Bhering et al. 2015; Oliveira et al. 2016; Silva et al. 2017). This is because hierarchical methods allow multiple ways of determining the number of clusters, whereas the Tocher methodology follows a pre-defined, fixed procedure, which makes it impossible to establish a direct relationship between these two clustering methods. In the agglomerative methodologies, the first clusters contain the highest number of genotypes.

The principal component analysis (PCA) allows genotypes to be projected onto a 2D plot by obtaining coordinates, and allows inferences about the importance of the traits. Therefore, it indicates which traits can be excluded from future analyses due to redundancy. The discussion of PCA projections in 2D or 3D plots is not the aim of this study, but it is worth remembering that the minimum acceptable value of the total variation explained by the eigenvalues is 80% (Cruz et al. 2012).

The interpretation of the eigenvalues from the PCA corroborates that EY is the most redundant trait and liable to be excluded in future works. The presence of its highest eigenvector coefficients in the eigenvalues that carry the lowest variance explains the redundancy of EY. It is also explained by the high correlation found with other morphological traits, such as PH.

The two clustering methods therefore cannot be related directly: the number of clusters can be determined in several ways in the hierarchical methods, whereas Tocher follows a pre-fixed methodology, and differences of this kind have been reported elsewhere (Bhering et al. 2015; Oliveira et al. 2016; Silva et al. 2017). What the methods share is that both are agglomerative, which makes the first clusters contain the highest number of genotypes.

The environment E3 presented the best clustering results in both clustering methods, as indicated by the quality parameters and by its experimental quality. The high experimental quality of E3 led to the lowest experimental error among the environments, which supported the clustering quality. Low stress, distortion and AET values, together with a high CCC, were the quality criteria used in this study. According to Cruz et al. (2011), stress and distortion values should be below 20% and AET below 5%, while CCC should be higher than 80%. These reference values are close to those found in this study. E1 and E4 presented results opposite to those of E3 (higher stress, distortion and AET, and lower CCC), due to their experimental quality.

Even though the joint analysis does not present the best clustering-quality results, it is important to adopt it in genetic variability studies, since it considers the GEI effects. The joint analysis results show that the environmental conditions are accounted for in the model and lead to average values: the best and worst environments are both considered, and the joint analysis gives the average across all of them. Some studies have considered the GEI effects and reinforce the importance of GEI for some traits, as confirmed for the EY and GY traits in this study. However, few studies have discussed and compared the impact of the GEI effect on genetic variability studies (Bueno et al. 2013; Steckling et al. 2017), which highlights the importance of considering this effect in genetic variability studies in maize.

As defined by Melchinger and Gumber (2015), “a heterotic group denotes a group of related or unrelated genotypes from the same or different populations”. In this way, understanding the genetic diversity gives the breeder information about the relationships among genotypes. These authors also point out: “By comparison, the term heterotic pattern used herein refers to a specific pair of two heterotic groups, which express high heterosis and consequently high hybrid performance in their cross”. Understanding these relationships therefore helps to form heterotic groups and, as mentioned, to achieve “high heterosis and consequently high hybrid performance in their cross”.

This study shows the importance of the joint analysis over individual analyses. GEI must be considered in genetic variability evaluation, since it is frequently significant. The GY trait had the highest eigenvector coefficient in the first eigenvalue for E1, E2 and, mainly, the joint analysis, which confirms the importance of considering the GEI effect when developing the clustering analysis. Considering the GEI effects increases the reliability of genetic variability evaluation and improves the clustering indexes, for both Tocher and UPGMA analyses.

The recommended crosses are based on the genetic distance, given by the clusters containing the genotypes, and on the ranking of the genetic materials in the joint analysis for the GY trait. In other words, the greater the genetic distance between two genotypes, the more complementary they are when combined, which can result in offspring with more favorable genes. Considering UPGMA, the 11 × 65 cross was indicated, since both are among the ten selected genetic materials and show good genetic divergence. The crosses 11 × 62 and 11 × 32 are other possibilities, as the parents are in different clusters and well ranked for the GY trait. Considering Tocher, 11 × 20, 11 × 7, 11 × 65, 11 × 45, 65 × 20, 65 × 7 and 65 × 45 involve parents that are among the ten selected genetic materials and present good genetic divergence, as they appear in different clusters.
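
The selection rule described above (well-ranked parents that fall in different clusters) can be sketched as follows. The cluster labels are hypothetical and were chosen only so that the rule returns the seven Tocher-based crosses listed above; the actual cluster composition is given in Table 3 and Appendix – Table A5.

```python
from itertools import combinations


def candidate_crosses(top_ranked, cluster_of):
    """Pairs of top-ranked genotypes that belong to different clusters."""
    return [(a, b) for a, b in combinations(top_ranked, 2)
            if cluster_of[a] != cluster_of[b]]


# Hypothetical cluster labels for five well-ranked genotypes.
cluster_of = {11: "I", 65: "II", 20: "III", 7: "III", 45: "III"}
top_ranked = [11, 65, 20, 7, 45]
crosses = candidate_crosses(top_ranked, cluster_of)
```

Pairs within the same cluster are skipped because they are expected to be genetically similar and therefore less complementary.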

To conclude, both methodologies gave similar cross recommendations. The cross between the inter-population hybrids 11 and 65, indicated by both methodologies, has high potential to provide good population founders. The next step could be to begin two populations from these two inter-population hybrids indicated by the analyses.