Introduction

Regional crop variety trials are routinely conducted to provide a basis for variety release and recommendation. How many test locations are needed in a regional variety trial and how many replicates are needed within trials are perennial questions that plant breeders and regional variety trial coordinators have asked for many years. More than 60 years ago, Sprague and Federer (1951) investigated the optimum numbers of years, test locations, and replicates within trials, on the basis of a fixed number of field plots (or available funds) and genotypes. Likewise, Hanson and Brim (1963) investigated the optimal allocation of resources between test locations and replicates within trials, based on the relative magnitude of various variance components. A similar approach, considering the relative cost of adding one replicate per trial versus that of one extra location and the relative extent of the experimental error variance versus the genotype-by-location interaction variance, was discussed in Wricke and Weber (1986). The same approach was adopted by recent studies such as Mi et al. (2011), who studied the optimal allocation of test resources among the numbers of crosses, genotypes, and test locations, under the mixed model framework. These studies used a fixed number of experimental units such that the estimated numbers are a function of the number of genotypes to be tested. Solutions from this approach may be optimal in terms of relative allocation of the resources among replicates, locations, and years, but may not be optimal in terms of absolute numbers of replicates and locations. Johnson (1997) adopted a different approach, studying the optimum number of test locations for a tree crop by examining the empirical relationship between the number of test locations and genetic gains.

This study had three objectives. The first was to develop formulas for estimating the optimum number of test locations for a given target region and the optimum number of replicates within trials. The second objective was to apply the formulas to determine the optimum numbers of replicates and test locations for the oat (Avena sativa L.) variety registration trials for eastern Canada.

The oat growing regions in eastern Canada cover four provinces: Quebec (QC), Ontario (ON), New Brunswick (NB), and Prince Edward Island (PE). The oat breeding program in the Eastern Cereal and Oilseed Research Center (ECORC) of Agriculture and Agri-Food Canada is the lone publicly funded program in eastern Canada and has the mandate to breed improved oat cultivars for the whole region. Based on GGE [genotypic main effect (G) plus genotype-by-environment interaction (GE)] biplot analysis of the 2006–2008 yield data from the ECORC oat registration trials, it was concluded that the oat growing regions in eastern Canada consist of two distinct mega-environments: southern and eastern Ontario as one, with latitudes around or below 45°N, and the rest of eastern Canada as the other, with latitudes above 45°N and up to 48°N (Yan et al. 2010). Meaningful genotype evaluation and test location evaluation can be achieved only when conducted within mega-environments (Annicchiarico et al. 2005; Yan et al. 2007; Yan 2014a). Similarly, estimation of the optimum number of test locations is meaningful only when conducted within mega-environments. Therefore, the third objective of this study was to validate the mega-environment differentiation of Yan et al. (2010) using independent data and to determine the optimum number of test locations within mega-environments.

Materials and methods

Determining the optimum number of replicates within a trial

Heritability (H, also abbreviated as \( h^{2} \)) is the most fundamental concept in quantitative genetics related to plant breeding. It is defined as the proportion of the genotypic variance in the total phenotypic variance for a given trait, estimated across a set of genotypes in a given trial or across trials representing a target region. H takes values in the range of [0, 1]. H = 1 means all observed differences among genotypic means are due to genetic differences, no matter how small they are; H = 0 means all observed differences are due to noise rather than genetic differences, no matter how large they may appear. H is the ultimate measure of the effectiveness of the trial(s) in genotype evaluation; all measures taken in crop variety trials, from experimental design, implementation, to data analysis, have a single objective, that is, to improve the heritability of the trials, on the basis that the test locations are selected to represent the target region.

Heritability within a trial is calculated by (DeLacy et al. 1996)

$$ {\text{H}} = \frac{{\sigma_{\text{g}}^{2} }}{{\sigma_{\text{g}}^{2} + \frac{{\sigma_{\epsilon }^{2} }}{{{\text{N}}_{\text{r}} }}}} , $$
(1)

where H is the heritability of the trial for the trait of interest, \( \sigma_{\text{g}}^{2} \) is the genotypic variance, \( \sigma_{\epsilon }^{2} \) is the error variance, and \( {\text{N}}_{\text{r}} \) is the number of replicates in the trial. From this equation, the number of replicates required to achieve a certain level of heritability can be determined by

$$ {\text{N}}_{\text{r}} = \left( {\frac{{\sigma_{\epsilon }^{2} }}{{\sigma_{\text{g}}^{2} }}} \right)\frac{\text{H}}{{1 - {\text{H}}}} = {\text{Q}}_{\text{r}} \frac{\text{H}}{{1 - {\text{H}}}} . $$
(2)

This equation indicates that the needed number of replicates \( {\text{N}}_{\text{r}} \) to achieve a certain level of heritability H is a linear function of the error variance versus the genotypic variance quotient (\( {\text{Q}}_{\text{r}} \)) and a curvilinear function of the target level of H. Equation (2) can also be written as,

$$ {\text{H}} = \frac{{{\text{N}}_{\text{r}} }}{{{\text{N}}_{\text{r}} + {\text{Q}}_{\text{r}} }}. $$
(2a)

The relationship among N, H, and Q (ignoring the subscripts) depicted in these equations is illustrated graphically in Fig. 1. In general, at a given level of Q, H increases in response to the increase of N. However, the response rate is greatest in the beginning and is gradually diminished as H becomes larger. At Q = 0 we have H = 1 at any N ≥ 1, thus any N > 1 is a waste of resources. At any level of Q > 0, the H versus N curve may be viewed as consisting of two segments: a replication-sensitive segment in which H is improved effectively by an increase in N, followed by a replication-insensitive segment in which H is improved slowly or even unnoticeably by an increase in N. For each Q level there exists an H level at which the transition between the two segments occurs. For example, at Q = 0.1, the transition appears to occur around H = 0.95 and the corresponding N needed to achieve this level of H is about 2 (Fig. 1). Thus N = 2 may be regarded as the optimum number of replicates at Q = 0.1. At the Q = 1 level, the transition occurs around H = 0.85 with a corresponding N = 4. At Q = 3, the transition occurs around H = 0.75 with N = 9; at Q = 5, the transition occurs around H = 0.68 with N = 11, and so on.
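As a rough numerical companion to Eq. (2a), the short Python sketch below tabulates H for a few combinations of Q and N. The function name and the particular Q and N values printed are illustrative assumptions, not part of the original analysis.

```python
def heritability(n: float, q: float) -> float:
    """Eq. (2a): H = N / (N + Q), for N >= 1 replicates (or locations) and Q >= 0."""
    if n < 1 or q < 0:
        raise ValueError("expected n >= 1 and q >= 0")
    return n / (n + q)


if __name__ == "__main__":
    # Illustrative noise-signal quotients and replicate numbers (assumed values).
    for q in (0.1, 1.0, 3.0, 5.0):
        row = ", ".join(f"N={n}: H={heritability(n, q):.2f}" for n in (1, 2, 4, 9, 11))
        print(f"Q={q}: {row}")
```

For instance, `heritability(9, 3)` evaluates to 0.75 and `heritability(2, 0.1)` to about 0.95, in line with the transition points read from Fig. 1.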

Fig. 1 The curvilinear relationship between heritability (H) and the number of replicates (or test locations, N) needed to achieve it, at various levels of noise-signal quotient (Q)

Another observation from Fig. 1 is that at a higher Q level the response rate of H to the increase in N is weaker in the replication-sensitive segment but stronger in the replication-insensitive segment, as compared to that at a lower Q level. In other words, the transition between the two segments becomes less obvious at a higher Q level; it is most obvious when Q ≤ 3. Therefore, although the N level at the transition point at a low Q level (e.g., Q ≤ 3) may be regarded as the optimum number of replicates, it is an underestimate of the optimum number of replicates at higher Q levels (Q > 3).

Based on these observations, we set H = 0.75 as the target heritability and consider the number of replicates required to achieve H = 0.75 as the optimum number of replicates. According to Eq. (2) this value can be determined by

$$ {\text{N}}_{{{\text{r}}, {\text{H}} = 0.75}} = 3\left( {\frac{{\sigma_{\epsilon }^{2} }}{{\sigma_{\text{g}}^{2} }}} \right) = 3{\text{Q}}_{\text{r}} . $$
(3)

A potential problem with this formula is that the estimated optimum number of replicates would be 0 when \( {\text{Q}}_{\text{r}} \) = 0, which should be 1, of course (Fig. 1). To accommodate this potential exception, we set \( {\text{N}}_{{{\text{r}}, {\text{H}} \,= \,0.75}} = 1 \) if it is estimated to be less than 1.
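A minimal Python sketch of Eq. (3), including the lower bound of one replicate, might look as follows. The function and argument names are hypothetical, and rounding up to a whole replicate is an assumption of the sketch; the formula itself yields a real number.

```python
import math


def optimum_replicates(sigma_e2: float, sigma_g2: float, target_h: float = 0.75) -> int:
    """Replicates needed to reach target_h, following Eq. (2), floored at 1.

    sigma_e2: error variance; sigma_g2: genotypic variance (must be > 0).
    Rounding up to a whole replicate is an assumption of this sketch.
    """
    if sigma_g2 <= 0:
        raise ValueError("genotypic variance must be positive")
    if not 0 < target_h < 1:
        raise ValueError("target_h must lie strictly between 0 and 1")
    q_r = sigma_e2 / sigma_g2                 # noise-signal quotient Q_r
    n_r = q_r * target_h / (1.0 - target_h)   # Eq. (2); equals 3 * Q_r at H = 0.75
    return max(1, math.ceil(n_r - 1e-9))      # small tolerance guards against float noise
```

With equal error and genotypic variances (Q_r = 1) the function returns 3, and with no error variance (Q_r = 0) it returns 1 rather than 0.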

Determining the optimum number of test locations for a target region

The heritability for a given trait across test locations in a single year is defined as (DeLacy et al. 1996)

$$ {\text{H}} = \frac{{\sigma_{\text{g}}^{2} }}{{\sigma_{\text{g}}^{2} + \frac{{\sigma_{\text{ge}}^{2} }}{{{\text{N}}_{\text{e}} }} + \frac{{\sigma_{\epsilon }^{2} }}{{{\text{N}}_{\text{e}} {\text{N}}_{\text{r}} }}}} , $$
(4)

where \( \sigma_{\text{g}}^{2} \) is the genotypic variance, \( \sigma_{\epsilon }^{2} \) the experimental error variance, \( \sigma_{\text{ge}}^{2} \) the genotype-by-location interaction variance, \( {\text{N}}_{\text{e}} \) the number of test locations, and \( {\text{N}}_{\text{r}} \) the number of replicates within trials. Equation (4) can thus be written as:

$$ {\text{N}}_{\text{e}} = \left[ {\frac{{\sigma_{\text{ge}}^{2} + \sigma_{\epsilon }^{2} /{\text{N}}_{\text{r}} }}{{\sigma_{\text{g}}^{2} }}} \right]\frac{\text{H}}{{1 - {\text{H}}}}. $$
(5)

Given the relationships between H and \( {\text{N}}_{\text{e}} \) (Fig. 1), the number of test locations needed to achieve an H of 0.75 may be considered as the optimum number of test locations for a target region, which can be determined by

$$ {\text{N}}_{{{\text{e}}, {\text{H}} = 0.75}} = 3\left[ {\frac{{\sigma_{\text{ge}}^{2} + \sigma_{\epsilon }^{2} /{\text{N}}_{\text{r}} }}{{\sigma_{\text{g}}^{2} }}} \right] = 3{\text{Q}}_{\text{e}} , $$
(6)

where \( {\text{Q}}_{\text{e}} \) is the noise-signal quotient at the multilocation level (Yan 2014a). It consists of two sources of noise: GE and experimental error. When replicated data are not available such that the experimental error variance cannot be estimated, the following may be assumed for the sake of convenience:

$$ \frac{{\sigma_{\epsilon }^{2} /{\text{N}}_{\text{r}} }}{{\sigma_{\text{g}}^{2} }} = 1/3. $$
(7)

This is to assume that an optimum number of replicates are used in each trial (Eq. (3)). We thereby have

$$ {\text{Q}}_{\text{e}} = \frac{{\sigma_{\text{ge}}^{2} + \sigma_{\epsilon }^{2} /{\text{N}}_{\text{r}} }}{{\sigma_{\text{g}}^{2} }} = \frac{{\sigma_{\text{ge}}^{2} }}{{\sigma_{\text{g}}^{2} }} + 1/3, $$
(8)

and

$$ {\text{N}}_{{{\text{e}}, {\text{H}} = 0.75}} = 1 + 3\left( {\frac{{\sigma_{\text{ge}}^{2} }}{{\sigma_{\text{g}}^{2} }}} \right). $$
(9)

Equation (9) indicates that the number of test locations needed to achieve an H = 0.75 (or any other level deemed to be appropriate) is a linear function of the GE variance versus genotypic variance ratio and that a single test location would suffice if the trial is conducted properly and if there is no GE.
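The two routes to the optimum number of test locations, Eq. (6) when the error variance is available and Eq. (9) when it is not, can be sketched in Python as below. The function and argument names are hypothetical, and the value is returned without rounding.

```python
def optimum_locations(sigma_ge2, sigma_g2, sigma_e2=None, n_rep=None):
    """Locations needed for H = 0.75 across a target region.

    With sigma_e2 and n_rep supplied, Eq. (6) is used. Without them, the
    error term is fixed at 1/3 as in Eq. (7), which gives Eq. (9).
    """
    if sigma_g2 <= 0:
        raise ValueError("genotypic variance must be positive")
    if (sigma_e2 is None) != (n_rep is None):
        raise ValueError("supply both sigma_e2 and n_rep, or neither")
    if sigma_e2 is None:
        error_term = 1.0 / 3.0                     # assumption of Eq. (7)
    else:
        error_term = (sigma_e2 / n_rep) / sigma_g2
    q_e = sigma_ge2 / sigma_g2 + error_term        # noise-signal quotient Q_e
    return 3.0 * q_e                               # H / (1 - H) = 3 at H = 0.75
```

With no GE variance and no error information the function returns 1, which corresponds to the single-location case described above.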

The data type needed to use Eq. (3) is replicated data from a single trial, after correction for any human error and field spatial trend. The data type needed for Eq. (9) is a genotype-by-location two-way table of means across replicates; replicated raw data are not needed. Equation (6) can be used if replicated raw data are available.

Case study: data from the oat variety trials in eastern Canada

Replicated data for multiple traits from Ottawa, Ontario and replicated data for grain yield from locations across Canada from the 2008 to 2012 ECORC oat registration trials were used as an example to determine the number of replicates within trials required to achieve a trial heritability of 0.75.

Data for multiple traits from the 2006 to 2012 ECORC oat registration trials conducted across eastern Canada were used as an example to determine the number of test locations needed to achieve a cross-location heritability of 0.75. In these oat variety trials, 32–36 oat genotypes were tested at 7–13 locations, depending on the year, across eastern Canada, with four (Ontario sites) or three (sites in other provinces) replicates in each trial. Traits determined included grain yield, agronomic traits (days to heading, plant height, and lodging score), grain quality (test weight and 1,000-kernel weight), milling quality (e.g., groat percentage), and compositional quality (e.g., beta-glucan concentration). Grain yield was determined at all locations while other traits were measured at a varying number of locations. Within trials, grain yield was determined for each replicate while agronomic and quality traits were determined for a varying number of replicates. The optimum number of test locations for each trait was determined yearly based on Eq. (9) because the yearly data were largely complete and balanced. The yield data from the 2009 to 2012 trials were used in mega-environment analysis to determine the optimum number of test locations within mega-environments.

The GGL+GGE biplot for mega-environment analysis based on multiyear data

The grain yield data from multiyear variety trials were first organized in a genotype-by-environment two-way table, treating each location-year combination as an environment. Each value in the two-way table was the mean of the relevant genotype in the relevant environment across replicates. This two-way table was then centered by environmental means, scaled by the within-environment standard deviation, and weighted by the within-environment heritability. Missing values in the two-way table were then filled with estimates using a singular value decomposition (SVD) based procedure (Yan 2013). This “complete” two-way table was then subjected to SVD and the resulting first two principal components (PC1 and PC2) were used to construct a heritability-adjusted GGE biplot (Yan and Holland 2010).
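The centering, scaling, weighting, and SVD steps can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the table is taken to be complete already (the missing-value procedure of Yan 2013 is not reproduced), the weights are passed in by the caller because the exact form of the heritability weight is not spelled out here, and the symmetric partitioning of singular values is a choice made for the sketch. It is not the GGEbiplot software's implementation.

```python
import numpy as np


def gge_biplot_coordinates(table, weights):
    """Return genotype and environment coordinates on PC1 and PC2.

    table: complete genotype-by-environment array of means (rows = genotypes).
    weights: one weight per environment (e.g., derived from heritability).
    """
    y = np.asarray(table, dtype=float)
    w = np.asarray(weights, dtype=float)
    centered = y - y.mean(axis=0)                # remove environment main effects
    scaled = centered / y.std(axis=0, ddof=1)    # within-environment standard deviation
    p = scaled * w                               # weight each environment column
    u, s, vt = np.linalg.svd(p, full_matrices=False)
    # Symmetric singular-value partitioning (an assumption of this sketch).
    root_s = np.sqrt(s[:2])
    genotype_xy = u[:, :2] * root_s
    environment_xy = vt[:2, :].T * root_s
    return genotype_xy, environment_xy
```

Plotting `genotype_xy` and `environment_xy` on the same axes would give a biplot of this general kind; environments with constant values across genotypes would need to be handled separately, since their standard deviation is zero.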

In this GGE biplot, each test location was represented by several environments (location-year combinations). On the same biplot each test location was also represented by a single marker, the coordinates of which were determined by the average coordinates of the environments at the location. Therefore, this graph can be viewed and interpreted as a GGL (G+GL) biplot as well as a GGE biplot, and hence the term “GGL+GGE biplot” (Yan 2014a, Yan 2014b). This graph can be used for both mega-environment analysis and test location evaluation based on multiyear data. A GGS+GGE (GGS=G+GS, GS standing for genotype by subregion interaction) biplot was similarly generated to summarize the results of the mega-environment analysis (Yan 2014a, Yan 2014b). The numerical and biplot analyses were conducted using the GGEbiplot software (Yan 2014a).
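The placement of a location (or subregion) marker at the average coordinates of its environments can be illustrated with a few lines of Python. The dictionary layout and the labels used in the usage note are hypothetical.

```python
from collections import defaultdict


def location_markers(env_coords, env_location):
    """Average the (PC1, PC2) coordinates of each location's environments.

    env_coords: dict mapping environment label -> (pc1, pc2).
    env_location: dict mapping environment label -> location (or subregion) label.
    """
    groups = defaultdict(list)
    for env, xy in env_coords.items():
        groups[env_location[env]].append(xy)
    return {
        loc: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for loc, pts in groups.items()
    }
```

For example, two environments of a hypothetical location `LOC_A` at (1, 2) and (3, 0) would give a marker at (2, 1). Passing subregion labels instead of location labels gives the markers of a GGS+GGE biplot in the same way.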

Results

Optimum number of replicates for different traits

Days to heading, plant height, test weight, 1,000-kernel weight, and grain yield were among the traits determined in the oat variety trials at Ottawa, Ontario. Four to six replicates were used depending on the year. Grain yield was determined for all replicates but other traits were determined for a varying number of replicates. Table 1 summarizes the actual number of replicates used, the realized within-trial heritability, and the estimated optimum number of replicates for each trait in each year, based on Eq. (3).

Table 1 Heritability achieved (H), number of replicates used (N), and number of replicates needed to achieve a heritability of 0.75 (N_H75) for different traits measured in the oat registration trials at Ottawa, Ontario from 2008 to 2012

All traits had within-trial heritability greater than 0.75 in all years, indicating that the data quality was good at Ottawa and that the number of replicates used in the trials may be more than needed. On average, no more than two replicates were needed for test weight and kernel weight, and no more than three replicates were needed for days to heading, plant height, and grain yield, for achieving a heritability of 0.75. Reducing the number of replicates to the estimated optimum represents an important opportunity for reducing trial cost.

Optimum number of replicates for determining grain yield in different locations and years

When averaged across years, the estimated optimum number of replicates was equal to or smaller than three at locations Belgrave ON, St. Marys ON, Ottawa ON, Normandin QC, Nairn ON, Princeville QC, Amqui QC, and Harrington PE. This number is equal to or smaller than the actual number of replicates used (Table 2). Notably, six of these locations are core test locations in the ECORC oat breeding program. However, the estimated optimum number of replicates was more than what was actually used at the other locations. Among these, the trials at Osgoode, Eganville, and Beachburg were grown in farmers’ fields near Ottawa and were poorly managed. No measures were taken to control weed infestation and wild animal damage, and harvest was not timely. Similar situations may have been true for the other trials with exceptionally high estimated optimum numbers of replicates. In general, the results presented in Table 2 show that three replicates are sufficient for selecting grain yield if the trials are conducted properly, and there is no need to have four replicates as currently required in Ontario. Poor management and human errors occurring anywhere from seed preparation to post-harvest data collection can be a major factor for excessively low within-trial heritability. When the estimated optimum number of replicates is very high, it is advised to first try to find and correct any human errors or field trend. When this fails, it is probably more rational to discard the data or even the test location in question than to increase the number of replicates in future years.

Table 2 Actual number of replicates and estimated optimum number of replicates for grain yield in different locations and years

The optimum number of test locations for oat registration test in eastern Canada

Within each year and for each trait, a genotype-by-location two-way table of means was first prepared, from which the cross-location heritability was determined based on Eq. (1), treating each location as a replicate. The number of test locations used, the achieved heritability, and the estimated optimum number of test locations for each trait in each year based on Eq. (9) are summarized in Table 3.
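One way to obtain the cross-location heritability from such a two-way table is sketched below in plain Python. The method-of-moments estimator based on two-way ANOVA mean squares is an assumption here, since the text does not state how the variance components were estimated; the residual variance it returns pools the GE interaction with the averaged experimental error.

```python
def cross_location_heritability(table):
    """Heritability of genotype means from a complete genotype-by-location table.

    table: list of rows (genotypes), each a list of location means.
    Returns (h, sigma_g2, sigma_res2), where sigma_res2 is the residual
    (interaction plus error) variance of the two-way table.
    """
    g, e = len(table), len(table[0])
    if g < 2 or e < 2 or any(len(row) != e for row in table):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")
    grand = sum(map(sum, table)) / (g * e)
    row_means = [sum(row) / e for row in table]
    col_means = [sum(table[i][j] for i in range(g)) / g for j in range(e)]
    ms_g = e * sum((m - grand) ** 2 for m in row_means) / (g - 1)
    ss_res = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(g)
        for j in range(e)
    )
    ms_res = ss_res / ((g - 1) * (e - 1))
    sigma_g2 = max(0.0, (ms_g - ms_res) / e)     # truncate negative estimates at 0
    denom = sigma_g2 + ms_res / e
    h = sigma_g2 / denom if denom > 0 else 0.0   # Eq. (1) with locations as replicates
    return h, sigma_g2, ms_res
```

A purely additive table (no interaction) gives H = 1, and a table with identical genotype means gives H = 0.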

Table 3 Heritability achieved (H), number of test locations used (N), and number of test locations needed to achieve a heritability of 0.75 (N_H75) for different traits in the oat registration trials conducted across eastern Canada from 2006 to 2012

Days to heading, plant height, 1,000-kernel weight, and beta-glucan concentration were found to be highly heritable, the achieved heritability being greater than 0.90 for most years. For these traits, no more than three test locations were needed to achieve a heritability of 0.75 across eastern Canada (Table 3). Test weight and groat percentage also showed high heritability; no more than four test locations were needed to achieve an H of 0.75 across the whole region in most years (Table 3).

Lodging score, however, was a trait difficult to measure and select for because it is impossible to have a uniform lodging pressure across locations or within trials, and lodging can occur at different developmental stages and for different reasons. Consequently, the number of test locations needed to achieve a heritability of 0.75 was estimated to be 16, which is twice as many as the actual number of test locations in which lodging data were collected (Table 3). As expected, grain yield was another low heritability trait, due to strong yearly genotype-by-location interactions. Twenty test locations would be needed to achieve a heritability of 0.75 across eastern Canada, which is twice as many as the actual number of test locations used (Table 3). Therefore, the test locations used in the oat registration trials were too few for selecting grain yield and lodging score but too many for other traits for effective genotype evaluation across the whole oat growing region in eastern Canada. Determining a trait only at its optimum number of locations can effectively reduce trial cost.

Two oat mega-environments in eastern Canada

If the estimated optimum number of test locations for a key trait is considerably more than what is actually used, as in the case of grain yield (Table 3), the first thing the researcher should consider is to study the genotype-by-location interaction pattern for the trait and explore the possibility to divide the target region into meaningful subregions or mega-environments; this process is known as mega-environment analysis. Dividing a target region into meaningful subregions is the only way to make use of any repeatable genotype by location interactions in plant breeding. Mega-environment analysis can be conducted for any trait but mega-environments are usually defined with regard to the economically most important trait.

Presented in Fig. 2 is the GGL+GGE biplot based on the grain yield data from the 2009 to 2012 oat variety trials, in which the environments (location-year combinations) are represented by “O” and the genotypes by “+” for clarity. Only the ten locations that had data in more than two of the four years were included in the biplot.

Fig. 2 The GGL+GGE biplot showing the differentiation of two groups of test locations. A GGL+GGE biplot is a GGE biplot based on multiyear data imposed with test locations. The GGE biplot is based on the grain yield data from the 2009 to 2012 oat registration trials conducted across eastern Canada. The environments (location-year combinations) are represented by “O” and the genotypes by “+” for clarity. The placement of each test location in the GGE biplot is defined by the average coordinates of the environments at the location. The locations are: EGANVILLE Eganville, Ontario; HARTLAND Hartland, New Brunswick; NAIRN Nairn, Ontario; NEWL New Liskeard, Ontario; NORM3 Normandin, Quebec; OTT Ottawa, Ontario; PALM Palmerston, Ontario; PEI Harrington, Prince Edward Island; PRIN2 Princeville, Quebec; STMARYS St. Marys, Ontario

The ten test locations fell into two apparent groups: OTT, NAIRN, EGANVILLE, PALM, and STMARYS formed one group, representing eastern and southern Ontario, while the others (PEI, HARTLAND, NORM3, PRIN2, and NEWL) formed another group, representing the northern oat growing regions of eastern Canada. This pattern validates the result from the 2006 to 2008 yield data (Yan et al. 2010), which concluded that the oat growing region in eastern Canada can be divided into a southern mega-environment (eastern and southern Ontario) and a northern mega-environment (Quebec, New Brunswick, Prince Edward Island, and northern Ontario). The locations within the northern region were more closely correlated with each other than those within the southern region (Fig. 2). This pattern can be summarized into a GGS+GGE biplot as shown in Fig. 3. It can be seen that the vectors of the two mega-environments (SOUTH and NORTH) formed a right angle, indicating that they were independent of each other. “Independence” means that the relative performance of the genotypes in one mega-environment was uncorrelated with that in the other, and that cultivars best in one mega-environment must be selected in that mega-environment and not in the other. Therefore, genotype evaluation, test location evaluation, and estimation of the optimum number of test locations are meaningful only when analyzed within mega-environments (next section).

Fig. 3 The GGS+GGE biplot showing two (NORTH and SOUTH) oat mega-environments. A GGS+GGE biplot is a GGE biplot based on multiyear data imposed with subregions. The GGE biplot is based on the grain yield data from the 2009 to 2012 oat registration trials conducted across eastern Canada. The environments (location-year combinations) are represented by “O” and the genotypes by “+” for clarity. The placement of each mega-environment is defined by the average coordinates of the environments within the mega-environment as revealed in Fig. 2

The vector length (i.e., the distance from a marker to the biplot origin) of the SOUTH mega-environment is considerably shorter than that of NORTH (Fig. 3). This indicates that the environments in SOUTH were more heterogeneous than those in NORTH, which is a result of greater random genotype-by-environment interaction in SOUTH than in NORTH. This is reflected by the large biplot area occupied by the environments in SOUTH as compared to those in NORTH (Fig. 3). The GGS+GGE biplot form is similar to the genotype plus genotype by blocks of environments biplot of Laffont et al. (2013).

The GGL+GGE biplot (Fig. 2) can also be used in assessing the stability or repeatability of a test location in representing a mega-environment, which is a new development in the area of test location evaluation relative to Yan et al. (2011). The vector length of a test location indicates its relative stability or repeatability in representing the target region. For example, the location OTT had a relatively short vector; this is because the representativeness of OTT varied greatly across years, as indicated by the great differences among the four environments (location-year combinations) at Ottawa. The same can be said for the location “NAIRN.” In contrast, the location “PRIN2” had a relatively long vector, and the four environments involving PRIN2 were placed in a small area around it, indicating that it was a more stable and repeatable test location.

The optimum number of test locations for each mega-environment

Now that the oat growing regions in eastern Canada can be divided into two distinct mega-environments in terms of grain yield, an optimum number of test locations for this trait should be determined for each mega-environment rather than for the undivided whole region (Table 4).

Table 4 Heritability achieved (H), number of test locations used (N), and number of test locations needed to achieve an H = 0.75 (N_H75) for oat grain yield within and across mega-environments

On average, 5.3 test locations were used in the northern mega-environment, and the optimum number of test locations was estimated to be 5.6 (Table 4). Thus, the number of test locations used for this mega-environment was close to the optimum. A very different situation was found for the southern mega-environment, however. On average, 4.3 test locations were used for this subregion whereas the estimated optimum number of test locations was about 12. This result is consistent with the observation that the northern mega-environment is a relatively simple and homogeneous subregion while the southern mega-environment is more complex and heterogeneous (Fig. 3). The total number of test locations needed for eastern Canada was estimated to be 17.6 (12.0 for the southern mega-environment plus 5.6 for the northern mega-environment), which is fewer than the number estimated for the undivided whole region (20.8) (Table 4). It is interesting to note that the northern mega-environment is many times larger than the southern mega-environment in terms of oat acreage. It is economically more profitable to breed for a large but simple mega-environment as opposed to a small but complex mega-environment. This is how private breeding companies choose their target region and crop, leaving small but complex mega-environments unattended and farmers in such regions neglected (Yan 2014a). This is where public breeding programs come into play. The success of such breeding programs is measured not only by the economic profit they generate but also by the social, ecological, and environmental benefits they bring. In our case, oat in eastern Canada is an essential element in the rotation system, in addition to being a grain crop.

Discussion

The formulas for determining the optimum numbers of replicates and test locations

Heritability is a comprehensive measure of the variety trial data quality. The proposed formulas for determining the optimum number of replicates within a trial (Eq. (3)) and that of test locations in a target region (Eq. (6)) were derived from the well-known definitions of within-trial and cross-trial heritability, respectively. An important basis for these formulas was the observation that H can be effectively improved by increasing the number of replicates or test locations when it is small (say H < 0.75), but cannot be effectively improved when it becomes large (say H > 0.75) (Fig. 1). The “optimum” number is, in essence, a compromise between trial accuracy and the cost required to achieve it. H = 0.75 is not a magic number; it was chosen for intuitiveness and convenience. A smaller H (say 0.65) or a larger H (say 0.85) may be used at the researcher’s judgment, and the formulas need only minor modification to accommodate this change.
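To illustrate how small this modification is, Eq. (2) can be kept in its general form, in which the multiplier H/(1 − H) takes the place of the factor 3. The values below are simple arithmetic for the target levels mentioned above, and the same multiplier applies to the bracketed quotient in Eq. (5):

$$ {\text{N}}_{\text{H}} = {\text{Q}}\frac{\text{H}}{{1 - {\text{H}}}},\quad \frac{0.65}{0.35} \approx 1.86,\quad \frac{0.75}{0.25} = 3,\quad \frac{0.85}{0.15} \approx 5.67. $$

Thus a target of H = 0.85 would require nearly twice as many replicates or test locations as a target of H = 0.75 at the same Q.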

Equation (9) is a simplified form of Eq. (6), based on the assumption that the data quality of individual trials is sufficiently good. When the single trial accuracy is poor (with H < 0.75) then more than one test location would be needed even when there is no GE. Only a minor modification of Eq. (9) is needed for such cases, however.
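One possible reading of this modification, offered as an interpretation rather than as the authors' stated formula, is to keep the error term of Eq. (6) explicit instead of fixing it at 1/3 as in Eq. (7):

$$ {\text{N}}_{{{\text{e}}, {\text{H}} = 0.75}} = 3\left( {\frac{{\sigma_{\text{ge}}^{2} }}{{\sigma_{\text{g}}^{2} }}} \right) + 3\left( {\frac{{\sigma_{\epsilon }^{2} /{\text{N}}_{\text{r}} }}{{\sigma_{\text{g}}^{2} }}} \right). $$

The second term equals 1 when Eq. (7) holds and exceeds 1 whenever the error term is larger than 1/3, so that more than one test location is indicated even when \( \sigma_{\text{ge}}^{2} \) = 0.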

The application of Eq. (9) may be extended to data from multilocation trials conducted across years. The parameters in the equation can be redefined as follows: \( \sigma_{\text{g}}^{2} \) will remain the variance for genotypic main effect, but \( \sigma_{\text{ge}}^{2} \) will be the variance for genotype-by-environment interaction (which is composed of genotype-by-location interaction, genotype-by-year interaction, and genotype-location-year interaction), with each environment being a location-year combination, and \( {\text{N}}_{\text{e}} \) will be the total number of test environments needed to achieve H = 0.75 across locations and years. The needed number of test locations per year would be \( {\text{N}}_{\text{e}} \) divided by the number of years involved in the analysis. When Eq. (9) is used this way, it can be expected that the optimum number of test locations would be reduced as the number of years involved in the analysis is increased. From a practical viewpoint, the number of years included in the analysis should be limited to the number that is required to make cultivar recommendations in a particular variety trial system, which is usually two or three years.
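The division of the required number of environments over years amounts to the small calculation sketched below. The function name is hypothetical, and rounding up to whole locations is an assumption of the sketch.

```python
import math


def locations_per_year(n_environments: float, n_years: int) -> int:
    """Split a required number of environments (location-years) evenly over years.

    Rounding up to whole locations is an assumption of this sketch.
    """
    if n_years < 1 or n_environments <= 0:
        raise ValueError("expected n_years >= 1 and n_environments > 0")
    return math.ceil(n_environments / n_years - 1e-9)  # tolerance for float noise
```

For an illustrative requirement of 12 environments, this gives 6 locations per year over two years and 4 over three years.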

For simplicity, balanced data, i.e., complete genotype-by-location two-way tables, were used in the case study. The formulas should also be applicable to unbalanced data except that the variance components required in the formulas have to be estimated using mixed models (e.g., Littell 2006).

Implications on conducting crop variety trials

The purpose of using an optimal number of replicates and test locations in regional variety trials is to maximize the trial heritability, and hence the genetic gain, at a minimum trial cost. When it is determined that too many replicates and/or test locations have been used, the trial cost can be reduced by reducing the number of replicates and/or test locations to the optimum levels while maintaining the same level of trial accuracy. When it is determined that too few replicates and/or test locations have been used, the trial heritability can be effectively improved by increasing the number of replicates and/or test locations to the optimum levels.

Multiple traits are routinely measured in crop variety trials and considered in decision making. Some traits are intrinsically more heritable than others and therefore require fewer replicates and test locations for reliable selection. Although the numbers of replicates and test locations are determined by the most important trait, such as grain yield in most crops, there is still room to reduce trial cost by determining highly heritable traits only at their optimum numbers of replicates and test locations. For example, it was found that no more than four test locations were needed for reliable selection of beta-glucan concentration, groat percentage, days to heading, plant height, test weight, and 1,000-kernel weight across the whole oat growing region of eastern Canada (Table 3).

On the other hand, when it is determined that too few locations have been used, increasing the number of test locations to the optimum can effectively improve the cross-trial heritability. However, low cross-trial heritability could be due to the presence of repeatable genotype-by-location interaction, which can be utilized by dividing the whole target region into subregions and selecting within subregions. Therefore, mega-environment analysis should be conducted before setting out to increase the number of test locations, as demonstrated for grain yield in the case study. Similarly, when it is determined that too few replicates have been used in some trials, increasing the number of replicates can effectively improve the trial heritability. However, low trial heritability could be due to large spatial variation in the field or to human errors. Any human error should therefore be corrected, and any field trend adjusted using an appropriate type of spatial analysis (e.g., Müller et al. 2010), before setting out to increase the number of replicates. Field trend correction based on polynomial regression within blocks was found to be a simple and effective method to improve trial heritability in various scenarios (Yan 2014a).

Two approaches in determining the numbers of replicates and test locations

Methods for determining the numbers of replicates and test locations already exist, as mentioned at the beginning of this paper. These methods optimize the allocation of resources between replicates and test locations based on two aspects: the relative cost of adding one replicate per trial versus that of one extra location, and the relative magnitude of the experimental error variance versus the genotype-by-location interaction variance. These methods have some merit, as the numbers of replicates and of test locations are indeed interdependent under certain conditions.

From Eq. (6), the number of test locations and the number of replicates within trials can be complementary when the within-trial noise-to-signal ratio is large. The number of test locations required to compensate for insufficient replication is determined by

$$ N_{\text{e},\,\text{H} = 0.75}^{\prime} = 3Q_{r} / N_{r}^{\prime}, $$
(10)

where \( N_{r}^{\prime} \) is a number of replicates smaller than the optimum. This formula indicates that at \( {\text{Q}}_{\text{r}} = 1 \), each replicate removed (such that the number of replicates becomes fewer than what is needed to achieve the target within-trial heritability of 0.75) would require three additional test locations to achieve the same cross-trial heritability of 0.75. More test locations would be needed to compensate for the loss of replication at a higher \( {\text{Q}}_{\text{r}} \) level. Excessive replication, however, will not reduce the number of locations required to account for genotype-by-location interactions and is therefore a waste of resources.
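As a rough numerical illustration, the sketch below simply evaluates Eq. (10) as printed for a few replicate numbers. The function name `compensating_locations` and the input values are hypothetical and are not taken from the case study; `q_r` stands for \( {\text{Q}}_{\text{r}} \), taken here to be the within-trial noise-to-signal ratio referred to above.

```python
def compensating_locations(q_r: float, n_reps: int) -> float:
    """Evaluate Eq. (10): N'_e = 3 * Q_r / N'_r.

    q_r    : Q_r, the within-trial noise-to-signal ratio (assumed input).
    n_reps : N'_r, a number of replicates below the optimum.
    """
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    return 3.0 * q_r / n_reps


# Hypothetical scenarios: two noise-to-signal levels, one to three replicates.
for q_r in (1.0, 2.0):
    for n_reps in (1, 2, 3):
        n_loc = compensating_locations(q_r, n_reps)
        print(f"Q_r = {q_r}, replicates = {n_reps}: N'_e = {n_loc:.2f}")
```

The printed values are not whole numbers in general; in practice they would presumably be rounded up to a whole number of locations, which is a choice this sketch leaves to the user.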

In contrast, in this paper we adopted the strategy of first estimating the optimum number of replicates and, on this basis, estimating the optimum number of test locations. This approach is justified for several reasons. First, there is an “official” requirement for the number of replicates in most regional variety trial systems (e.g., four replicates are required in Ontario and three in other Canadian provinces), and there is a real need to know whether the required replication is too much or too little. Similarly, regional variety trial sponsors are always interested in knowing the optimum number of qualified test locations for their target regions and whether their current test locations are too many or too few. The approach proposed in this paper can provide answers to these questions. Second, variety trials need to be analyzed individually to assess their data quality and to identify and correct any problems. Third, the required number of test locations is independent of that of replicates when the data quality in individual trials is sufficiently good; these two aspects can therefore be dealt with separately. A major strength of this approach is that it is simple, straightforward, and easy to use. It may nevertheless be useful to compare results from the two approaches in various scenarios.