Introduction

The diallel genetic design is regularly used in plant breeding. It is commonly employed to provide interpopulation crosses or single crosses/F1’s from inbred/pure/doubled haploid (DH) lines, for assessing heterosis, combining ability, or molecular genetic diversity, and for predicting non-assessed testcrosses and single crosses, among other applications (Mowers et al. 2018; Kadam et al. 2016; Yu et al. 2020; Leng et al. 2019). There are several methods of analysis, but most investigations are based on the models (fixed or random) and methods proposed by Griffing (1956a). The main reasons for the general preference of breeders for Griffing’s combining ability analysis are that the methodology can be used for any crop and trait and that the computation and interpretation of the genetic parameters are simple. Concerning the study of the inheritance of quantitative traits, both heterosis and combining ability analyses allow testing for non-additive effects if there is genetic variability between the parents. However, they do not allow testing for epistasis.

In the context of inheritance, an interesting approach based on the diallel cross was proposed by Hayman (1954). However, very few inheritance studies based on Hayman’s method have been published in the last 10 years (de Lima et al. 2019; Makumbi et al. 2018; Shahadati-Moghaddam et al. 2017; Kalinina and Lyakh 2011). Hayman’s method does have some limitations, such as the assumptions of no epistasis (independent action of non-allelic genes) and no linkage disequilibrium (LD) (genes independently distributed in the parents). But it also has positive aspects, such as testing the adequacy of the additive-dominance model. Thus, if there is epistasis, the breeder can ignore the trait or take into account the influence of epistasis on the analysis. Why, then, has Hayman’s method been almost ignored in studies of the inheritance of quantitative traits? Surely this cannot be attributed to regular evidence of epistasis, since most empirical data show mainly additive genetic variation (Hill et al. 2008). In my opinion, the main reasons are: the method seems very complex to breeders with regard to computation and interpretation; breeders assume that Griffing’s method provides the same inferences; and the method is restricted to diploids and homozygous parents. However, Hayman’s approach is not very complex to compute and interpret, and it provides some information on inheritance that a combining ability analysis does not offer.

Another interesting approach for the study of the inheritance of quantitative traits is the generation mean analysis, proposed by Mather and Jinks (1971). Similar to the heterosis and combining ability analyses, this biometrical genetics methodology is based on the estimation of linear components of the means of populations derived from homozygous parents. This approach has been commonly used by breeders of self- and cross-pollinated crops since it was proposed (Rai et al. 2020; Addy et al. 2020; Verma and Singh 2018). Its main advantages are: it is easily performed for crops with inbred/pure/DH lines; it is applicable to any trait, although few studies have involved grain yield (Mohammed et al. 2018); it allows for testing dominance and epistasis separately; and the computation and the interpretation are simple.

Assuming absence of epistasis, the linear components of means do not depend on LD. However, the genotypic variance and its genetic components are affected by LD (Hill and Maki-Tanila 2015). Because jointly modelling epistasis and LD is a challenge for quantitative and biometrical geneticists, the most important theoretical papers on heterosis and combining ability analysis provide only superficial information on the influence of non-allelic interaction (Gardner and Eberhart 1966; Kempthorne 1956; Griffing 1956b). Because his method is based on components of the genotypic variance for parents and F1’s, Hayman (1954) made some statements on how LD and epistasis affect the diallel analysis (see Sects. 4.4. Correlated gene distributions and 4.5. Non-allelic gene interaction). The knowledge provided by Hayman (1954) was extended in the studies of Hill (1964), Nassar (1965), Mather (1967), and Coughtrey and Mather (1970). In regard to generation mean analysis, Mather and Jinks (1971) included theory and some general conclusions for the analysis assuming LD and epistasis (see Sect. 5.18 Linkage of interacting genes). Hill (1964) and Nassar (1965), based on 3- and 10-gene models, provided contrasting results regarding the consequences of LD on the Wr/Vr graph and the order of dominance. Mather (1967) and Coughtrey and Mather (1970) assumed only complementary and duplicate epistasis in a two-gene model with no LD.

Thus, no previously published study has provided general information on the joint impact of LD and epistasis on Hayman’s diallel and generation mean analyses. The objective of this study was therefore to provide additional knowledge about the influence of LD and epistasis on Hayman’s diallel and generation mean analyses, based on simulated data. I assumed hundreds of genes, variable degree of dominance, variable gene frequencies, LD, and seven types of digenic epistasis. In the first part of this study I present the theoretical background that supports the software used for simulating the dataset.

Material and methods

Genetic variances and covariances of the Hayman’s diallel assuming LD and epistasis

Consider n inbred/pure/DH lines (n > 3). Assume initially LD but no epistasis. For two genes, the genotype probabilities are \(P\left( {AABB} \right) = f_{22} = u_{a} u_{b} + \Delta_{ab}\), \(P\left( {AAbb} \right) = f_{20} = u_{a} v_{b} - \Delta_{ab}\), \(P\left( {aaBB} \right) = f_{02} = v_{a} u_{b} - \Delta_{ab}\), and \(P\left( {aabb} \right) = f_{00} = v_{a} v_{b} + \Delta_{ab}\), where \(u\) and \(v\) are the allelic frequencies and \({\Delta }_{ab}\) is the measure of LD in the gametic pool. Note that I did not state that the parents are a sample from a reference population or that the two genes are linked. Using the same definition as Hayman (1954) for the genotypic values of the r-th parent and the F1 between parents r and s, the variance of the parents is \(V_{0L0} = \left( {1 - w_{a}^{2} } \right)d_{a}^{2} + \left( {1 - w_{b}^{2} } \right)d_{b}^{2} + 8\Delta_{ab} d_{a} d_{b} = D\), where \({d}_{i}\) is the deviation between the genotypic value of the homozygote of greater expression and the mean of the genotypic values of the homozygotes (\(m\)) and \(w=u-v\). Because the means of the parents and the F1 are not affected by the LD (in the absence of epistasis), the difference between the mean of the \(n^{2}\) progeny (\(m_{L1}\)) and the mean of their parents (\(m_{L0}\)) is equal to the function derived by Hayman (1954): \(\left( {1/2} \right)\left[ {\left( {1 - w_{a}^{2} } \right)h_{a} + \left( {1 - w_{b}^{2} } \right)h_{b} } \right] = \left( {1/2} \right)h\), where \(h_{i}\) is the dominance deviation.
The variance of the r-th array is \(V_{r} = \left( {1/4} \right)D + \left( {1/4} \right)H_{1} - \left( {1/4} \right)F_{r} - \left( {1/4} \right)F_{1r}\), where \(H_{1} = \left( {1 - w_{a}^{2} } \right)h_{a}^{2} + \left( {1 - w_{b}^{2} } \right)h_{b}^{2} + 8\Delta_{ab} h_{a} h_{b}\), \(F_{r} = 2\left[ {\left( {1 - w_{a}^{2} } \right)d_{a}\theta_{a} h_{a} + \left( {1 - w_{b}^{2} } \right)d_{b}\theta_{b} h_{b} + 4\Delta_{ab} d_{a}\theta_{b} h_{b} + 4\Delta_{ab} d_{b}\theta_{a} h_{a} } \right]\), where \(\theta= 1\) if AA or BB and \(\theta= - 1\) if aa or bb, and \(F_{1r} = 16k\Delta_{ab} h_{a} h_{b}\), where \(k\) = 0 if AABB or aabb and \(k\) = 1 if AAbb or aaBB.

Thus, the average variance is \(V_{1L1} = \left( {1/4} \right)D + \left( {1/4} \right)H_{1} - \left( {1/4} \right)F - \left( {1/4} \right)F_{1}\), where \(F = 2\left[ {w_{a} \left( {1 - w_{a}^{2} } \right)d_{a} h_{a} + w_{b} \left( {1 - w_{b}^{2} } \right)d_{b} h_{b} + 4\Delta_{ab} w_{b} d_{a} h_{b} + 4\Delta_{ab} w_{a} d_{b} h_{a} } \right]\) and

\(F_{1} = 16\Delta_{ab} \left( {u_{a} v_{b} + u_{b} v_{a} - 2\Delta_{ab} } \right)h_{a} h_{b}\).

The covariance between the non-recurrent parents and their offspring in the r-th array is \(W_{r} = \left( {1/2} \right)D - \left( {1/4} \right)F_{r}\). Thus, the average covariance is \(W_{0L01} = \left( {1/2} \right)D - \left( {1/4} \right)F\). Then, if there is dominance and LD, the difference between the covariance and the variance in the arrays is not constant. The difference is \(\left( {1/4} \right)\left( {D - H_{1} } \right) + k\left( {1/4} \right)F_{1r}\). This implies that the points (\(W_{r} ,V_{r}\)) do not lie on a straight line of unit slope through their mean point (\(W_{0L01} ,V_{1L1}\)). However, the function does not show how much LD makes the slope deviate from 1. The variance of the array means is \(V_{0L1} = \left( {1/4} \right)D + \left( {1/4} \right)H_{1} - \left( {1/4} \right)H_{2} - \left( {1/4} \right)F\), where \(H_{2} = \left( {1 - w_{a}^{2} } \right)^{2} h_{a}^{2} + \left( {1 - w_{b}^{2} } \right)^{2} h_{b}^{2} + 8\Delta_{ab} \left( {1 - w_{a} w_{b} } \right)h_{a} h_{b}\). Finally, the total genotypic variance is \(V_{0L1} + V_{1L1} = \left( {1/2} \right)D + \left( {1/2} \right)H_{1} - \left( {1/4} \right)H_{2} - \left( {1/2} \right)F - \left( {1/4} \right)F_{1}\). Note that, because of LD, the four equations independent of r allow the estimation of the parameters \(D\), \(H_{1} - F_{1}\), \(H_{2} - F_{1}\), and \(F\). This implies that the genetic parameters that depend on \(H_{1}\) and/or \(H_{2}\) (average degree of dominance, mean value of \(uv\) (symmetry) for dominant genes, proportion between dominant and recessive genes, and number of dominant genes) are biased. Because the functions for these parameters are complex, the magnitude of the bias can only be assessed using simulated data. Further, it is also not clear how LD affects the order of dominance of the parents in the graph (\(W_{r} ,V_{r}\)).
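The two-gene expressions for \(V_{r}\) and \(W_{r}\) can be checked numerically. The following is a minimal sketch, not part of REALbreeding: the function name `two_gene_diallel` and the parameter values are hypothetical, the constant \(m\) is omitted because it does not affect variances or covariances, and the parents are coded by \(\theta\) (+1 for AA or BB, −1 for aa or bb). It enumerates the four parental genotypes, computes the array variance and covariance directly from the genotypic values, and compares them with the closed forms above.

```python
def two_gene_diallel(ua, ub, delta, da, db, ha, hb):
    """Return {array: (V_r brute force, V_r formula, W_r brute force, W_r formula)}."""
    va, vb = 1.0 - ua, 1.0 - ub
    wa, wb = ua - va, ub - vb
    # Parental genotype probabilities keyed by (theta_a, theta_b).
    f = {(1, 1): ua * ub + delta, (1, -1): ua * vb - delta,
         (-1, 1): va * ub - delta, (-1, -1): va * vb + delta}

    def parent(t):
        # Genotypic value of a homozygous parent (constant m omitted).
        return t[0] * da + t[1] * db

    def f1(r, s):
        # Genotypic value of the F1 between parents r and s (constant m omitted).
        ga = r[0] * da if r[0] == s[0] else ha
        gb = r[1] * db if r[1] == s[1] else hb
        return ga + gb

    D = (1 - wa**2) * da**2 + (1 - wb**2) * db**2 + 8 * delta * da * db
    H1 = (1 - wa**2) * ha**2 + (1 - wb**2) * hb**2 + 8 * delta * ha * hb
    mean_parents = sum(f[s] * parent(s) for s in f)

    out = {}
    for r in f:
        # Brute-force moments over the non-recurrent parents s of array r.
        mean_r = sum(f[s] * f1(r, s) for s in f)
        v_brute = sum(f[s] * f1(r, s) ** 2 for s in f) - mean_r**2
        w_brute = sum(f[s] * parent(s) * f1(r, s) for s in f) - mean_parents * mean_r
        # Closed forms from the text.
        ta, tb = r
        k = 0 if ta == tb else 1
        Fr = 2 * ((1 - wa**2) * da * ta * ha + (1 - wb**2) * db * tb * hb
                  + 4 * delta * da * tb * hb + 4 * delta * db * ta * ha)
        F1r = 16 * k * delta * ha * hb
        out[r] = (v_brute, 0.25 * (D + H1 - Fr - F1r), w_brute, 0.5 * D - 0.25 * Fr)
    return out


# Hypothetical parameter values, chosen only so that all genotype probabilities are positive.
result = two_gene_diallel(ua=0.6, ub=0.4, delta=0.1, da=2.0, db=1.0, ha=1.0, hb=0.5)
for array, (v_b, v_f, w_b, w_f) in result.items():
    print(array, round(v_f, 4), round(w_f, 4), "Wr - Vr =", round(w_f - v_f, 4))
```

With these illustrative values, \(W_{r} - V_{r}\) differs between the arrays with \(k\) = 0 and \(k\) = 1 by \(\left( {1/4} \right)F_{1r}\), which is the departure from a constant difference described in the text.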

Assume now that the two genes are epistatic. Assuming an epistatic effect (\(I_{ij}\); i, j = 2, 1, and 0) for each genotype, the variance of the parents is \(V_{0L0}^{*} = V_{0L0} + V\left( I \right)_{0} + 2Cov_{0}\), where \(V\left( I \right)_{0} = f_{22} I_{22}^{2} + f_{20} I_{20}^{2} + f_{02} I_{02}^{2} + f_{00} I_{00}^{2} - \left[ {E\left( I \right)_{0} } \right]^{2}\) is the epistatic variance of the parents and \(Cov_{0} = f_{22} \left( {d_{a} + d_{b} } \right)I_{22} + f_{20} \left( {d_{a} - d_{b} } \right)I_{20} + f_{02} \left( { - d_{a} + d_{b} } \right)I_{02} + f_{00} \left( { - d_{a} - d_{b} } \right)I_{00} - \left( {m_{L0} - m_{a} - m_{b} } \right)E\left( I \right)_{0}\) is the covariance between the sum of additive deviations and the epistatic effect for the parents. The diallel mean is \(m_{L1}^{*} = m_{L1} + E\left( I \right)_{1}\), where \(E\left( I \right)_{1} = f_{22}^{2} I_{22} + 2f_{22} f_{20} I_{21} + \ldots + f_{00}^{2} I_{00}\). Because \(m_{L0}^{*} = m_{L0} + E\left( I \right)_{0}\), \(m_{L1}^{*} - m_{L0}^{*} = \left( {1/2} \right)h + E\left( I \right)_{1} - E\left( I \right)_{0}\). The variance in the r-th array is \(V_{r}^{*} = V_{r} + V\left( I \right)_{r} + 2Cov_{r}\), where, for example, for the array of the parent AABB, \(V\left( I \right)_{22} = f_{22} I_{22}^{2} + f_{20} I_{21}^{2} + f_{02} I_{12}^{2} + f_{00} I_{11}^{2} - \left[ {E\left( I \right)_{22} } \right]^{2}\) is the epistatic variance in the array and \(Cov_{22} = f_{22} \left( {d_{a} + d_{b} } \right)I_{22} + f_{20} \left( {d_{a} + h_{b} } \right)I_{21} + f_{02} \left( {h_{a} + d_{b} } \right)I_{12} + f_{00} \left( {h_{a} + h_{b} } \right)I_{11} - \left( {E\left( G \right)_{22} - m_{a} - m_{b} } \right)E\left( I \right)_{22}\) is the covariance between the non-epistatic deviation and the epistatic effect in the array.

Thus, \(V_{1L1}^{*} = V_{1L1} + V\left( I \right)_{1} - V( {\overline{I}} )_{1} + 2Cov_{1}^{1} - 2Cov_{1}^{2}\), where \(V\left( I \right)_{1} = f_{22}^{2} I_{22}^{2} + 2f_{22} f_{20} I_{21}^{2} + \ldots + f_{00}^{2} I_{00}^{2} - \left[ {E\left( I \right)_{1} } \right]^{2}\) is the F1 epistatic variance, \(V( {\overline{I}} )_{1} = f_{22} \left[ {E\left( I \right)_{22} } \right]^{2} + f_{20} \left[ {E\left( I \right)_{20} } \right]^{2} + f_{02} \left[ {E\left( I \right)_{02} } \right]^{2} + f_{00} \left[ {E\left( I \right)_{00} } \right]^{2} - \left[ {E\left( I \right)_{1} } \right]^{2}\) is the variance of the average epistatic values of the array means, \(Cov_{1}^{1} = f_{{22}}^{2} \left( {d_{a} + d_{b} } \right)I_{{22}} + 2f_{{22}} f_{{20}} \left( {d_{a} + h_{b} } \right)I_{{20}} + \ldots + f_{{00}}^{2} \left( { - d_{a} - d_{b} } \right)I_{{00}} - \left( {m_{{L1}} - m_{a} - m_{b} } \right)E\left( I \right)_{1}\) is the covariance between the non-epistatic deviation and the epistatic effect in the F1, and \(Cov_{1}^{2} = f_{22} \left( {E\left( G \right)_{22} - m_{a} - m_{b} } \right)E\left( I \right)_{22} + f_{20} \left( {E\left( G \right)_{20} - m_{a} - m_{b} } \right)E\left( I \right)_{20} + f_{02} \left( {E\left( G \right)_{02} - m_{a} - m_{b} } \right)E\left( I \right)_{02} + f_{00} \left( {E\left( G \right)_{00} - m_{a} - m_{b} } \right)E\left( I \right)_{00} - \left( {m_{L1} - m_{a} - m_{b} } \right)E\left( I \right)_{1}\) is the covariance between the average non-epistatic and the epistatic values of the array means.

The covariance in the r-th array is \(W_{r}^{*} = W_{r} + Cov_{01\left( r \right)}^{1} + Cov_{01\left( r \right)}^{2} + Cov_{01\left( r \right)}^{3}\), where, for example, for the array of the parent AABB,

\(Cov_{{01\left( {22} \right)}}^{1} = f_{22} \left( {d_{a} + d_{b} } \right)I_{22} + f_{20} \left( {d_{a} - d_{b} } \right)I_{21} + f_{02} \left( { - d_{a} + d_{b} } \right)I_{12} + f_{00} \left( { - d_{a} - d_{b} } \right)I_{11} - \left( {m_{L0} - m_{a} - m_{b} } \right)E\left( I \right)_{22}\),

\(Cov_{{01\left( {22} \right)}}^{2} = f_{22} I_{22} \left( {d_{a} + d_{b} } \right) + f_{20} I_{20} \left( {d_{a} + h_{b} } \right) + f_{02} I_{02} \left( {h_{a} + d_{b} } \right) + f_{00} I_{00} \left( {h_{a} + h_{b} } \right) - E\left( I \right)_{0} \left( {E\left( G \right)_{22} - m_{a} - m_{b} } \right)\), and

$$Cov_{{01\left( {22} \right)}}^{3} = f_{22} I_{22}^{2} + f_{20} I_{20} I_{21} + f_{02} I_{02} I_{12} + f_{00} I_{00} I_{11} - E\left( I \right)_{0} E\left( I \right)_{22}$$

are the covariances between the non-epistatic deviation of non-recurrent parent and the epistatic effect of F1, between the non-epistatic deviation of F1 and the epistatic effect of non-recurrent parent, and between the epistatic effects of non-recurrent parent and F1, respectively. The average covariance is \(W_{0L01}^{*} = W_{0L01} + Cov_{01}^{1} + Cov_{01}^{2} + Cov_{01}^{3}\), where \(Cov_{01}^{1} = f_{22} \left( {d_{a} + d_{b} } \right)E\left( I \right)_{22} + f_{20} \left( {d_{a} - d_{b} } \right)E\left( I \right)_{20} + f_{02} \left( { - d_{a} + d_{b} } \right)E\left( I \right)_{02} + f_{00} \left( { - d_{a} - d_{b} } \right)E\left( I \right)_{00} - \left( {m_{L0} - m_{a} - m_{b} } \right)E\left( I \right)_{1}\),

\(Cov_{01}^{2} = f_{22} I_{22} \left( {E\left( G \right)_{22} - m_{a} - m_{b} } \right) + f_{20} I_{20} \left( {E\left( G \right)_{20} - m_{a} - m_{b} } \right) + f_{02} I_{02} \left( {E\left( G \right)_{02} - m_{a} - m_{b} } \right) + f_{00} I_{00} \left( {E\left( G \right)_{00} - m_{a} - m_{b} } \right) - E\left( I \right)_{0} \left( {m_{L1} - m_{a} - m_{b} } \right)\), and \(Cov_{01}^{3} = f_{22} I_{22} E\left( I \right)_{22} + f_{20} I_{20} E\left( I \right)_{20} + f_{02} I_{02} E\left( I \right)_{02} + f_{00} I_{00} E\left( I \right)_{00} - E\left( I \right)_{0} E\left( I \right)_{1}\).

Note that the difference between the covariance and the variance in the arrays is \(W_{r}^{*} - V_{r}^{*} = \left( {1/4} \right)\left( {D - H_{1} } \right) + k\left( {1/4} \right)F_{1r} + Cov_{01\left( r \right)}^{1} + Cov_{01\left( r \right)}^{2} + Cov_{01\left( r \right)}^{3} - V\left( I \right)_{r} - 2Cov_{r}\). Thus, epistasis is an additional factor that deviates the points (\(W_{r}^{*} ,V_{r}^{*}\)) from a straight line of unit slope through the mean point (\(W_{0L01} ,V_{1L1}\)). However, this function also does not show how much epistasis makes the slope deviate from 1. The variance of the array means is \(V_{0L1}^{*} = V_{0L1} + V( {\overline{I}} )_{1} + 2Cov_{1}^{2}\). Finally, the total genotypic variance is \(V_{0L1}^{*} + V_{1L1}^{*} = V_{0L1} + V_{1L1} + V\left( I \right)_{1} + 2Cov_{1}^{1}\). Thus, epistasis introduces an additional bias in the estimates of Hayman’s genetic parameters, which can only be assessed using simulated data.

Generation mean analysis with LD and epistasis

Assuming two linked epistatic genes, an epistatic effect for each genotype, and parents AABB and aabb (association), the genotypic values of the parents and the F1 are \(G_{22} = m_{a} + m_{b} + d_{a} + d_{b} + I_{22}\), \(G_{00} = m_{a} + m_{b} - d_{a} - d_{b} + I_{00}\), and \(G_{11} = m_{a} + m_{b} + h_{a} + h_{b} + I_{11}\), respectively. Using the notation of Mather and Jinks (1971) for the additive and dominance components, \(P_{1} = m + \left[ d \right] + I_{22}\), \(P_{2} = m - \left[ d \right] + I_{00}\), and \(F_{1} = m + \left[ h \right] + I_{11}\). Because of epistasis, \(m\) is not the \(F_{\infty }\) mean, which is given by \(F_{\infty } = m + \left[ {1/2\left( {1 + 2r} \right)} \right]\left( {I_{22} + I_{00} } \right) + \left[ {r/\left( {1 + 2r} \right)} \right]\left( {I_{20} + I_{02} } \right) = m + E\left( I \right)^{\left( \infty \right)}\), where r is the recombination frequency. Note that in F2, linked genes with r lower than 0.5 are in LD. The absolute LD value in the gametic pool of F1 is \(\left( {1 - 2r} \right)/4\). The value is positive with coupling and negative with repulsion. The mean of the F2 generation is \(F_{2} = m + \left( {1/2} \right)\left[ h \right] + E\left( I \right)^{\left( 0 \right)}\), where \(E\left( I \right)^{\left( 0 \right)} = \left[ {\left( {1 - r} \right)/2} \right]^{2} I_{22} + 2\left[ {\left( {1 - r} \right)/2} \right]\left( {r/2} \right)I_{21} + \ldots + \left[ {\left( {1 - r} \right)/2} \right]^{2} I_{00}\) is the expectation of the epistatic values. The Fn+2 mean is \(F_{n + 2} = m + \left( {1/2} \right)^{n + 1} \left[ h \right] + E\left( I \right)^{\left( n \right)}\), where n is the number of selfing generations. It is interesting to note that the expectation of the epistatic values in a selfed generation is not directly proportional to the expectation in the F2 generation, since \(E\left( I \right)^{\left( n \right)} = E\left( I \right)^{\left( 0 \right)} + deviation\) (see the deviation for the Fn+2 generation in the Appendix).

The average genotypic values of the two backcross generations are \(BC_{1} = m + \left( {1/2} \right)\left[ d \right] + \left( {1/2} \right)\left[ h \right] + \left( {1/2} \right)E\left( I \right)_{1}\) and \(BC_{2} = m - \left( {1/2} \right)\left[ d \right] + \left( {1/2} \right)\left[ h \right] + \left( {1/2} \right)E\left( I \right)_{2}\), where \(E\left( I \right)_{1} = \left( {1 - r} \right)\left( {I_{22} + I_{11} } \right) + r\left( {I_{21} + I_{12} } \right)\) and \(E\left( I \right)_{2} = \left( {1 - r} \right)\left( {I_{11} + I_{00} } \right) + r\left( {I_{10} + I_{01} } \right)\). Thus, assuming digenic epistasis, the means of the parents, F1, F2, F3, and backcrosses (seven equations) depend on 10 genetic linear components (seven epistatic components). A well-known approach for testing epistasis and for estimating and testing epistatic components was provided by Mather and Jinks (1971). This simplified approach has been used for modelling epistasis in genomic selection, GWAS (genome-wide association studies), and QTL mapping. Mather and Jinks (1971) assumed \(I_{22} = I_{00} = - I_{20} = - I_{02} = \left[ i \right]\), \(I_{21} = - I_{01} = \left[ j \right]\), \(I_{12} = - I_{10} = \left[ {j^{\prime}} \right]\), and \(I_{11} = \left[ l \right]\). However, because of linkage (LD), \(E\left( I \right)^{\left( 0 \right)} = \left[ {\left( {1 - 2r} \right)/2} \right]\left[ i \right] + \left( {1/4} \right)\left[ {1 + \left( {1 - 2r} \right)^{2} } \right]\left[ l \right]\), \(E\left( I \right)_{1} = \left( {1 - r} \right)\left( {\left[ i \right] + \left[ l \right]} \right) + r\left( {\left[ j \right] + \left[ {j^{\prime}} \right]} \right)\), and \(E\left( I \right)_{2} = \left( {1 - r} \right)\left( {\left[ i \right] + \left[ l \right]} \right) - r\left( {\left[ j \right] + \left[ {j^{\prime}} \right]} \right)\).
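The closed forms for \(E\left( I \right)^{\left( 0 \right)}\), \(E\left( I \right)_{1}\), and \(E\left( I \right)_{2}\) under the Mather and Jinks parameterization can be recovered by enumerating the gametes of a coupling F1 (AB/ab). The sketch below is illustrative only: the function names and the numerical values of r, [i], [j], [j′], and [l] are hypothetical, and genotypes are keyed by the number of copies of A and B.

```python
from itertools import product


def gametes_f1(r):
    """Gamete frequencies of a coupling F1 (AB/ab); alleles coded 1 (A or B) and 0 (a or b)."""
    return {(1, 1): (1 - r) / 2, (0, 0): (1 - r) / 2,
            (1, 0): r / 2, (0, 1): r / 2}


def mj_effects(i, j, jp, l):
    """Epistatic effects I_xy under the Mather and Jinks assumptions, keyed by (copies of A, copies of B)."""
    return {(2, 2): i, (0, 0): i, (2, 0): -i, (0, 2): -i,
            (2, 1): j, (0, 1): -j, (1, 2): jp, (1, 0): -jp, (1, 1): l}


def expected_epistasis(g1, g2, effects):
    """Mean epistatic effect in the progeny of a cross between two gametic pools."""
    total = 0.0
    for (x, px), (y, py) in product(g1.items(), g2.items()):
        genotype = (x[0] + y[0], x[1] + y[1])
        total += px * py * effects[genotype]
    return total


# Hypothetical values for illustration.
r, i, j, jp, l = 0.2, 1.0, 0.5, -0.3, 2.0
g = gametes_f1(r)
I = mj_effects(i, j, jp, l)

e_f2 = expected_epistasis(g, g, I)                      # E(I)^(0)
e_bc1 = 2 * expected_epistasis(g, {(1, 1): 1.0}, I)     # E(I)_1 (BC1 mean carries the factor 1/2)
e_bc2 = 2 * expected_epistasis(g, {(0, 0): 1.0}, I)     # E(I)_2

print(e_f2, (1 - 2 * r) / 2 * i + 0.25 * (1 + (1 - 2 * r) ** 2) * l)
print(e_bc1, (1 - r) * (i + l) + r * (j + jp))
print(e_bc2, (1 - r) * (i + l) - r * (j + jp))
```

Each printed pair should agree up to floating-point rounding. Note that [j] and [j′] cancel in the F2 expectation but not in the backcrosses.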

Note that I call this a simplified approach because the assumptions for the additive × additive, additive × dominance, and dominance × additive components are not necessarily met for any type of digenic epistasis, including complementary, duplicate, dominant, recessive, dominant and recessive, duplicate genes with cumulative effects, and non-epistatic genic interaction, regardless of the degree of dominance. To characterize complementary epistasis it is necessary to assume \(d_{a} = d_{b} = h_{a} = h_{b} = i_{ab} = j_{ab} = j_{ba} = l_{ab}\); for duplicate epistasis, \(d_{a} = d_{b} = h_{a} = h_{b} = - i_{ab} = - j_{ab} = - j_{ba} = - l_{ab}\); assuming recessive epistasis implies \(d_{a} = h_{a} = i_{ab} = j_{ba}\) and \(d_{b} = h_{b} = j_{ab} = l_{ab}\) or \(d_{a} = - h_{a} = - i_{ab} = j_{ba}\) and \(d_{b} = - h_{b} = j_{ab} = - l_{ab}\); in the case of dominant epistasis, \(d_{a} = h_{a} = - i_{ab} = - j_{ba}\) and \(d_{b} = h_{b} = - j_{ab} = - l_{ab}\) or \(d_{a} = - h_{a} = i_{ab} = - j_{ba}\) and \(d_{b} = - h_{b} = - j_{ab} = l_{ab}\) (Mather 1967). Thus, complete dominance is assumed (|h/d| = 1).

Data simulation

The simulated dataset was generated using the software REALbreeding (available upon request). REALbreeding has been used in studies related to genomic selection (Viana et al. 2018), GWAS (Pereira et al. 2018), QTL mapping (Viana et al. 2017), LD (Andrade et al. 2019), population structure (Viana et al. 2013b), heterotic grouping/genetic diversity (Viana et al. 2020), and plant breeding (Viana et al. 2013a). In summary, the software simulates individual genotypes for genes and molecular markers and phenotypes in three phases, using inputs from the user. The first phase (genome simulation) is the specification of the number of chromosomes, molecular markers, and genes as well as marker type and density. The second phase (population simulation) is the specification of the population(s) and sample size or progeny number and size. A population is characterized by the average frequency for the genes (biallelic) and markers (first allele). The last phase (trait simulation) is the specification of the minimum and maximum genotypic values for homozygotes, the minimum and maximum phenotypic values (to avoid outliers), the direction and degree of dominance, and the broad sense heritability.

The current version allows the inclusion of digenic epistasis, genotype × environment interaction, and multiple traits, including pleiotropy. The population mean (M) and additive (A), dominance (D), and epistatic (additive × additive (AA), additive × dominance (AD), dominance × additive (DA), and dominance × dominance (DD)) genetic values or general combining ability (GCA), specific combining ability (SCA), and epistatic (I) effects, or genotypic values (G), depending on the population, are calculated from the parametric gene effects and frequencies and the parametric LD values. The population in LD is generated by crossing two populations in linkage equilibrium, followed by a generation of random crossing. The parametric LD is \(\Delta_{ab}^{{\left( { - 1} \right)}} = \left[ {\left( {1 - 2r_{ab} } \right)/4} \right]\left( {p_{a1} - p_{a2} } \right)\left( {p_{b1} - p_{b2} } \right)\), where \(r_{ab}\) is the recombination frequency, \(p\) is an allelic frequency, and the indexes 1 and 2 indicate the parental populations. The phenotypic values (\(P\)) are computed assuming error effects \(\left( E \right)\) sampled from a normal distribution (\(P = M + A + D + AA + AD + DA + DD + E = G + E\) or \(P = M + GCA1 + GCA2 + SCA + I + E = G + E\)).
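The parametric LD expression can be reproduced from first principles. The sketch below is an illustration and not REALbreeding code. It assumes that each hybrid individual receives one gamete from each parental population (both in linkage equilibrium), and it takes \(\Delta_{ab}^{{\left( { - 1} \right)}}\) to be the LD in the gametes these hybrids produce, which is my reading of the text. The function names are hypothetical.

```python
def ld_from_gametes(pa1, pb1, pa2, pb2, r):
    """LD in the gametic pool produced by hybrids between two populations in linkage equilibrium."""
    # A hybrid carries one haplotype from population 1 and one from population 2.
    # Non-recombinant AB gametes come from an AB haplotype of either population.
    nonrecombinant_ab = 0.5 * (pa1 * pb1 + pa2 * pb2)
    # Recombinant AB gametes combine A from one population with B from the other.
    recombinant_ab = 0.5 * (pa1 * pb2 + pa2 * pb1)
    p_ab = (1 - r) * nonrecombinant_ab + r * recombinant_ab
    p_a = 0.5 * (pa1 + pa2)
    p_b = 0.5 * (pb1 + pb2)
    return p_ab - p_a * p_b


def ld_closed_form(pa1, pb1, pa2, pb2, r):
    """Closed form given in the text."""
    return (1 - 2 * r) / 4 * (pa1 - pa2) * (pb1 - pb2)


# Hypothetical allelic frequencies and recombination frequency.
print(ld_from_gametes(0.8, 0.7, 0.2, 0.3, 0.1), ld_closed_form(0.8, 0.7, 0.2, 0.3, 0.1))
```

Under these assumptions the two values coincide. The LD vanishes when r = 0.5 or when the two populations have the same allelic frequency at either gene.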

The types of digenic epistasis are: complementary (\(G_{22} = G_{21} = G_{12} = G_{11}\) and \(G_{20} = G_{10} = G_{02} = G_{01} = G_{00}\); proportion of 9:7 in an F2, assuming independent assortment), duplicate (\(G_{22} = G_{21} = G_{20} = G_{12} = G_{11} = G_{10} = G_{02} = G_{01}\); proportion of 15:1 in an F2, assuming independent assortment), dominant (\(G_{22} = G_{21} = G_{20} = G_{12} = G_{11} = G_{10}\) and \(G_{02} = G_{01}\); proportion of 12:3:1 in an F2, assuming independent assortment), recessive (\(G_{22} = G_{21} = G_{12} = G_{11}\), \(G_{02} = G_{01}\), and \(G_{20} = G_{10} = G_{00}\); proportion of 9:3:4 in an F2, assuming independent assortment), dominant and recessive (\(G_{22} = G_{21} = G_{12} = G_{11} = G_{20} = G_{10} = G_{00}\) and \(G_{02} = G_{01}\); proportion of 13:3 in an F2, assuming independent assortment), duplicate genes with cumulative effects (\(G_{22} = G_{21} = G_{12} = G_{11}\), and \(G_{20} = G_{10} = G_{02} = G_{01}\); proportion of 9:6:1 in an F2, assuming independent assortment), and non-epistatic genic interaction (\(G_{22} = G_{21} = G_{12} = G_{11}\), \(G_{20} = G_{10}\), and \(G_{02} = G_{01}\); proportion of 9:3:3:1 in an F2, assuming independent assortment).
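The F2 proportions follow directly from the equality constraints on the genotypic values. As a minimal sketch (the dictionary `EPISTASIS_CLASSES` and the function `f2_ratio` are hypothetical names introduced here), each epistasis type is written as a list of classes of genotypes sharing the same genotypic value, with genotypes coded by the number of copies of A and B, and the class frequencies are counted in sixteenths under independent assortment.

```python
# Each class is a set of genotype codes "xy": x = copies of A, y = copies of B.
EPISTASIS_CLASSES = {
    "complementary": [{"22", "21", "12", "11"}, {"20", "10", "02", "01", "00"}],
    "duplicate": [{"22", "21", "20", "12", "11", "10", "02", "01"}, {"00"}],
    "dominant": [{"22", "21", "20", "12", "11", "10"}, {"02", "01"}, {"00"}],
    "recessive": [{"22", "21", "12", "11"}, {"02", "01"}, {"20", "10", "00"}],
    "dominant and recessive": [{"22", "21", "12", "11", "20", "10", "00"}, {"02", "01"}],
    "duplicate genes with cumulative effects": [
        {"22", "21", "12", "11"}, {"20", "10", "02", "01"}, {"00"}],
    "non-epistatic genic interaction": [
        {"22", "21", "12", "11"}, {"20", "10"}, {"02", "01"}, {"00"}],
}


def f2_ratio(classes):
    """F2 class frequencies in sixteenths, assuming independent assortment."""
    # Per-locus F2 frequencies in quarters: AA = 1, Aa = 2, aa = 1.
    quarters = {"2": 1, "1": 2, "0": 1}
    return [sum(quarters[g[0]] * quarters[g[1]] for g in cls) for cls in classes]


for name, classes in EPISTASIS_CLASSES.items():
    print(name, ":".join(str(n) for n in f2_ratio(classes)))
```

The printed ratios reproduce the proportions listed in the text, which also confirms that the seven sets of constraints each partition the nine two-gene genotypes.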

Because the genotypic values for any two interacting genes are not known, there are infinitely many sets of genotypic values that satisfy the specifications of each type of digenic epistasis. For example, fixing the gene frequencies (the population) and the parameters m, d, h, and h/d (degree of dominance) for each gene (the trait), the solutions \(G_{22} = G_{21} = G_{12} = G_{11}\) = 5.25 and \(G_{20} = G_{10} = G_{02} = G_{01} = G_{00}\) = 5.71 or \(G_{22} = G_{21} = G_{12} = G_{11}\) = 6.75 and \(G_{20} = G_{10} = G_{02} = G_{01} = G_{00}\) = 2.71 both define complementary epistasis, but the genotypic values are not the same. The solution implemented in the software allows the user to control the magnitude of the epistatic variance (V(I)) relative to the magnitudes of the additive and dominance variances (V(A) and V(D)). As an input from the user, the software requires the ratio V(I)/(V(A) + V(D)) for each pair of interacting genes (a single value; for example, 1.0). Then, for each pair of interacting genes the software samples a random value for the epistatic value \(I_{22}\) (the epistatic value for the genotype AABB), assuming \(I_{22}\sim N\left( {0, V\left( I \right)} \right)\). Then, the other epistatic effects and genotypic values are computed. In this study, I assumed a ratio of 1. Increasing the ratio increases the magnitude of the additive, dominance, and epistatic genetic values.

I simulated grain yield (g/plant) and expansion volume (a measure of popcorn quality; ml/g), assuming 400 genes distributed in 10 chromosomes of 200 cM. For grain yield and expansion volume, I specified positive dominance (average degree of dominance of 0.6) and bidirectional dominance (average degree of dominance of 0.0), respectively. For grain yield, the minimum and maximum genotypic values for homozygotes were 30 and 160. The minimum and maximum phenotypic values for homozygotes were 10 and 180. For expansion volume, I assumed 5 and 55 as the minimum and maximum genotypic values for homozygotes. The minimum and maximum phenotypic values were 0 and 60. I assumed no epistasis (but LD), seven types of digenic epistasis, and an admixture of these types, with 30 and 100% of epistatic genes. For the diallel cross, I generated 1,000 DH lines from a population with high LD and selected 15 DHs. The number of dominant genes in the DHs ranged from 136 to 278. The parents were selected by sampling two to three DHs at random from each of six classes for the number of dominant genes, from 0 to 140 up to 261 to 290. The number of dominant genes in the selected DHs ranged from 161 to 246. Then, the selected DHs were crossed in a complete diallel without reciprocals. For the generation mean analysis, I generated the contrasting parental inbred lines (P1 and P2), assuming all genes in association, and the generations F1, F2, F3, BC1, and BC2. The numbers of plants per generation were 50, 50, 50, 400, 400, 400, and 400, respectively.

I also generated a scenario of low LD and no epistasis, assuming 10 independent genes and 10 DHs sampled from 1,000 DHs. The criterion for selecting the 10 DHs was minimization of the LD. The LD values (\(\Delta_{ab} = p_{AB} - p_{A} p_{B}\)) ranged from −0.11 to 0.09 (absolute average value 0.04). For comparison purposes, I computed the LD values for the 80 genes in chromosomes 1 and 2. The ranges and the means of the absolute LD values for the genes in chromosomes 1 and 2 and for the independent genes were, respectively, −0.15 to 0.25 and 0.07, −0.16 to 0.21 and 0.05, and −0.20 to 0.20 and 0.05. The broad sense heritability at the plant level was 20%. At the progeny level, the heritabilities were 40, 60, and 80%. To avoid the influence of the experimental error, all analyses were based on the parametric genotypic values, variances, and covariances provided by REALbreeding.
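For a set of homozygous lines, the LD measure \(\Delta_{ab} = p_{AB} - p_{A} p_{B}\) can be computed directly from the line genotypes. The following is a small sketch under the assumption that each DH line is coded as a tuple of 0/1 values (1 when the line is homozygous for the allele of interest at that gene). The function name `ld_delta` and the example lines are hypothetical and are not the simulated data of this study.

```python
def ld_delta(lines, a, b):
    """Delta_ab = p_AB - p_A * p_B over homozygous lines coded 1 (AA or BB) or 0 (aa or bb)."""
    n = len(lines)
    p_a = sum(line[a] for line in lines) / n
    p_b = sum(line[b] for line in lines) / n
    p_ab = sum(line[a] * line[b] for line in lines) / n
    return p_ab - p_a * p_b


# Hypothetical example: six DH lines genotyped at three genes.
dh_lines = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
for a in range(3):
    for b in range(a + 1, 3):
        print(a, b, round(ld_delta(dh_lines, a, b), 4))
```

With few parents, nonzero values are expected even for genes on different chromosomes, which is why independent genes can still show LD in a sample of 10 DHs.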

Results

The impact of LD on Hayman’s diallel analysis is evident even in the scenario of independent genes. The analysis of variance of Wr − Vr indicated adequacy of the additive-dominance model (P value of 1.00) even assuming a heritability of 80%. The regression of Wr on Vr indicated partial dominance (\(\beta_{0} = 0.02\); P value of 0.00) and \(\beta_{1}\) equal to 1 (P value of 0.38). The coefficient of determination was 0.99. The conclusion of partial dominance is correct since the average degree of dominance is 0.52 (in the range 0.01–1.16). However, the LD between the independent genes led to a bias of −22.5 and −27.3% in the estimates of the dominance components and a bias in the range −13.8 to −47.7% for three of the genetic parameters, ignoring the estimate of the number of dominant genes, which is always underestimated (4 vs. 9, because the DHs have one fixed gene). The correlation between the number of recessive genes and Wr + Vr was 0.92 and the number of dominant genes for the parents’ order of dominance was 6, 6, 6, 6, 5, 5, 4, 5, 4, and 2. These results indicate that when there is low LD and no epistasis, Hayman’s diallel analysis provides reliable results about the inheritance of the quantitative trait but biased estimates of the dominance components and the genetic parameters.

The negative consequences of high LD under no epistasis are also biased estimates of the dominance components and genetic parameters, plus a biased estimate of the covariance F, a significant test for the homogeneity of Wr − Vr assuming a heritability of 80%, and an intermediate value for the correlation between the number of recessive genes and Wr + Vr (0.51) (Table 1). Thus, high LD can lead to inadequacy of the additive-dominance model for a high heritability trait. The number of dominant genes for the parents’ order of dominance was 245, 176, 246, 195, 230, 229, 203, 189, 191, 217, 166, 176, 214, 203, and 161. Regardless of the trait heritability, there was evidence of partial dominance and a slight deviation from 1.0 for the regression coefficient.

Table 1 Probability values for the hypothesis of homogeneity of Wr − Vr (P1), assuming heritabilities of 40, 60, and 80%, respectively, β0 = 0 (P2), β1 = 0 (P3), and β1 = 1 (P4), parametric values of the coefficients of the regression of Wr on Vr (β0 and β1) and the coefficient of determination (R2), parametric average heterosis (h) and correlation between the number of recessive genes and Wr + Vr (r1), parametric\(^{a}\) and estimated components of the genotypic variance for parents and F1s, and total genotypic variance (V), assuming no epistasis (No), seven types of digenic epistasis\(^{b}\), and an admixture of types, and 100 and 30% of epistatic genes (% eg)

As demonstrated from the theory, epistasis is a significant additional factor negatively affecting Hayman’s diallel analysis, especially assuming a high proportion of interacting genes. Assuming 100% of epistatic genes, under dominant epistasis, recessive epistasis, duplicate genes with cumulative effects, and non-epistatic genic interaction, there was evidence of inadequacy of the additive-dominance model, regardless of the trait heritability (Table 1 and Online Resource Fig. 1). Irrespective of the type of epistasis and the significance of the test of adequacy of the additive-dominance model, there was a tendency to conclude in favor of overdominance, which is a wrong inference. Further, the correlation between the number of recessive genes and Wr + Vr tends to be lower (in the range −0.01 to 0.67; 0.39 on average) and all genetic components and parameters will be strongly biased. Assuming 30% of epistatic genes, there was a tendency to accept the additive-dominance model for low heritability traits and to reject the model for high heritability traits (Table 1 and Online Resource Table 1 and Fig. 1). Regardless of the significance level of the adherence test, there was evidence of partial dominance and the correlation between the number of recessive genes and Wr + Vr had intermediate magnitude (in the range 0.31–0.62; 0.50 on average). However, as demonstrated from the theory, the genetic components and parameters will be strongly biased. Note that, in general, the coefficient of regression has a magnitude close to 1.0, but only under an admixture of epistasis types is it consistently not statistically different from 1.0. Regarding heterosis, which is unaffected by LD, note that epistasis tends to decrease it, except under duplicate genes with cumulative effects and non-epistatic genic interaction (Table 1 and Online Resource Table 1).

In the absence of epistasis, as theoretically demonstrated, linkage (LD) does not affect the generation mean analysis, bearing in mind that this method cannot distinguish absence of dominance from symmetrical bidirectional dominance. Under the sample sizes and heritability (a low value, 20%) assumed, the experimental evidence is that there is additive variability, dominance, and no epistasis, for both traits (Table 2). Additionally, there was evidence of (predominantly) positive dominance for grain yield and (predominantly) negative dominance for expansion volume. Note also that, regardless of the trait, heritability, and genetic control, if the sample sizes for parents and derived generations are representative, the analysis will provide reliable estimates of the parametric means (correlations close to 1.0 between the estimated and true means). However, linkage and epistasis significantly affect the estimates of the genetic components of the generation means, regardless of the type of epistasis, in proportion to the percentage of epistatic genes (Table 2). One striking result from fitting the epistatic components ([i], [j], and [l]) is that, even assuming 100% of interacting genes, for most epistasis types there was evidence of no epistasis (P values in the range 0.052–0.15). Another remarkable result is a null or negative correlation between the epistatic components of my model and of the Mather and Jinks’ model, for most of the epistasis types. Note that the two higher correlations (among four) are associated with the less biased estimates of the non-epistatic components, under duplicate genes with cumulative effects and non-epistatic genic interaction.
Fitting the Mather and Jinks’ complete model when there is statistical evidence of epistasis or fitting the additive-dominance model when there is epistasis but no statistical evidence of epistasis provides biased estimates of the genetic components, proportional to the percentage of epistatic genes (Table 2 and Online Resource Table 2). For grain yield, under duplicate and dominant and recessive epistasis, a wrong inference about dominance occurred (negative dominance and absence of dominance, respectively, instead of positive dominance).
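To make the fitting step concrete, the following is a minimal sketch of a weighted least squares fit of generation means, assuming the six basic generations (P1, P2, F1, F2, BC1, and BC2) and the standard textbook coefficients of the Mather and Jinks’ model for m, [d], [h], [i], [j], and [l]. The set of generations, the function name `fit_generation_means`, and the weighting by the inverse of the variance of each generation mean are assumptions made for illustration; this is not the code used in the simulation study.

```python
import numpy as np

# Rows: P1, P2, F1, F2, BC1, BC2; columns: m, [d], [h], [i], [j], [l]
X_FULL = np.array([
    [1,  1.0, 0.0, 1.00,  0.00, 0.00],
    [1, -1.0, 0.0, 1.00,  0.00, 0.00],
    [1,  0.0, 1.0, 0.00,  0.00, 1.00],
    [1,  0.0, 0.5, 0.00,  0.00, 0.25],
    [1,  0.5, 0.5, 0.25,  0.25, 0.25],
    [1, -0.5, 0.5, 0.25, -0.25, 0.25],
])

def fit_generation_means(means, var_of_means, n_params=3):
    """Weighted least squares fit of the first n_params components.

    n_params=3 fits the additive-dominance model (m, [d], [h]);
    n_params=6 fits the complete digenic model.
    Returns (estimates, standard errors, lack-of-fit chi-square, df).
    """
    x = X_FULL[:, :n_params]
    y = np.asarray(means, dtype=float)
    w = 1.0 / np.asarray(var_of_means, dtype=float)  # weights: 1 / variance of the mean
    xtw = x.T * w                                    # X'W
    info = xtw @ x                                   # X'WX
    est = np.linalg.solve(info, xtw @ y)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    resid = y - x @ est
    chi2 = float((w * resid ** 2).sum())             # weighted residual sum of squares
    df = len(y) - n_params
    return est, se, chi2, df

# Hypothetical generation means built from m = 10, [d] = 2, [h] = 1 and no epistasis
true_ad = np.array([10.0, 2.0, 1.0])
means_ad = X_FULL[:, :3] @ true_ad
est, se, chi2, df = fit_generation_means(means_ad, np.full(6, 0.1), n_params=3)
print(est, se, chi2, df)
```

With six generations, the lack-of-fit chi-square of the three-parameter fit has three degrees of freedom and can be referred to a chi-square distribution (for example with `scipy.stats.chi2.sf`) as a test of adequacy of the additive-dominance model. The complete six-parameter model leaves no degrees of freedom, so it always fits the six means exactly.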

Table 2 Parametric linear components of means, probability values for the tests of additive effects (P1), dominance (P2), and epistasis (P3), correlation between the parametric and estimated means (r1), and correlation between epistatic components (my model and Mather and Jinks’ model; r2), assuming no epistasis (No), seven types of digenic epistasisᵃ, and an admixture of types, 100 and 30% of epistatic genes, and grain yield (GY; g/plant) and expansion volume (EV; ml/g)

Discussion

Probably because Hill (1964), Nassar (1965), Mather (1967), and Coughtrey and Mather (1970) used simplified genetic models (2 to 10 genes, complete dominance, only LD or epistasis) for investigating the effects of LD or epistasis on Hayman’s diallel analysis, their main findings are very limited and only partially confirmed in this study. Hill (1964) concluded that LD leads to a significant upward curvature on the Wr/Vr graph and alters the level of dominance. My results do not support his conclusions. I observed a downward curvature and a change in the level of dominance only under a high proportion of epistatic genes. Nassar (1965) observed that LD causes serious deviations in the slope of the Wr/Vr regression line (consistently lower than one) and an intercept predominantly below the origin (indicating overdominance). I generally observed a slope with magnitude close to 1, although statistically lower than one in 50% of the scenarios, and there was evidence of overdominance only under a high proportion of epistatic genes. I also did not observe a Wr/Vr graph concave upwards with complementary epistasis, regardless of the percentage of interacting genes, as emphasized by Mather (1967) and Coughtrey and Mather (1970).

Concerning the impact of LD and epistasis on Hayman’s diallel analysis, the results describe an apparently very negative effect: biased estimates of the non-additive components and, consequently, biased estimates of the genetic parameters. But in my opinion, the average degree of dominance, symmetry, proportion of dominant to recessive genes and, especially, number of dominant genes have no current significance for breeders. Assuming no epistasis or 30% of interacting genes, the results show a correct inference on the inheritance of a low heritability quantitative trait: partial positive dominance. The decrease in the correlation between the number of dominant genes and Wr + Vr is not serious enough to make Hayman’s diallel analysis useless. However, breeders should take care to process and interpret only the traits that show adequacy of the additive-dominance model (homogeneity of the difference Wr − Vr), a functional relationship between Wr and Vr (slope of the regression statistically different from zero), and especially a reasonable coefficient of determination of the regression analysis (say, greater than 75%). Unfortunately, this was not observed in some recently published studies (Pessoa et al. 2019; de Lima et al. 2019; Shahadati-Moghaddam et al. 2017). Thus, breeders must realize that their major problem when performing a Hayman’s diallel analysis is not LD or epistasis but the experimental error.
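The three screening criteria listed above can be written as a simple check. This is only an illustrative sketch: the function name, the 5% significance level, and the example values passed to it are assumptions, while the 75% threshold for the coefficient of determination is the value suggested in the text.

```python
def trait_is_interpretable(p_homogeneity, p_slope_zero, r2,
                           alpha=0.05, min_r2=0.75):
    """Screen a trait before interpreting a Hayman's diallel analysis.

    p_homogeneity: P value of the test of homogeneity of Wr - Vr
    p_slope_zero:  P value of the test of the hypothesis beta1 = 0
    r2:            coefficient of determination of the regression of Wr on Vr
    alpha:         assumed significance level (not stated in the text)
    """
    adequate_model = p_homogeneity > alpha       # Wr - Vr homogeneous
    functional_relationship = p_slope_zero < alpha  # slope differs from zero
    reasonable_fit = r2 > min_r2                 # e.g., greater than 75%
    return adequate_model and functional_relationship and reasonable_fit

# Hypothetical example values
print(trait_is_interpretable(p_homogeneity=0.60, p_slope_zero=0.001, r2=0.90))
```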

As theoretically demonstrated, only in a very particular case will the Mather and Jinks’ model provide unbiased estimates of the genetic components. This will occur only if \(I_{22} = I_{00}\), \(I_{21} + I_{12} = - I_{01} - I_{10}\), and \(E\left( I \right)^{\left( n \right)} = \left( {1/4} \right)^{n + 1} I_{11}\). But, because epistasis implies more genetic components than generations, the only option is to fit the Mather and Jinks’ model. In my opinion, biased estimates of the genetic components are not a serious problem if the inferences on the inheritance of the quantitative trait are correct. In general, even assuming 100% of epistatic genes, the analyses correctly indicated positive dominance for grain yield but, unfortunately, in 5 out of 10 scenarios the epistasis was not detected. Thus, a more serious problem of the generation mean analysis is that it may fail to detect epistasis even when the percentage of interacting genes is high. When there is evidence of epistasis, another serious problem is the correct attribution of the predominant type. Regardless of the percentage of interacting genes, the signs of the seven epistatic components do not separate complementary, recessive, and dominant and recessive types, as well as duplicate genes with cumulative effects and non-epistatic genic interaction. Assuming positive partial dominance and 100% of interacting genes, the signs of the epistatic components [i], [j], and [l] do not allow discriminating among complementary, recessive, dominant and recessive, duplicate genes with cumulative effects, and non-epistatic genic interaction (all positive). When the three components are negative, there is predominantly dominant epistasis. If the additive × additive and dominance × dominance components are positive and the additive × dominance component is negative, there is duplicate epistasis, as emphasized by Mather and Jinks (1971).
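The sign rules stated above, which hold under the assumptions of positive partial dominance and 100% of interacting genes, can be summarized as a small lookup. The function name and the returned labels are illustrative assumptions, and any sign pattern not described in the text is reported as undetermined.

```python
def epistasis_type_from_signs(i, j, l):
    """Map the signs of [i], [j], and [l] to the predominant epistasis type.

    Only the three sign patterns described in the text are covered; they
    assume positive partial dominance and 100% of interacting genes.
    """
    if i > 0 and j > 0 and l > 0:
        # These five types share the same sign pattern and cannot be separated
        return ("complementary, recessive, dominant and recessive, "
                "duplicate genes with cumulative effects, "
                "or non-epistatic genic interaction")
    if i < 0 and j < 0 and l < 0:
        return "dominant"
    if i > 0 and l > 0 and j < 0:
        return "duplicate"
    return "undetermined"

# Hypothetical estimates of [i], [j], and [l]
print(epistasis_type_from_signs(0.8, -0.3, 1.2))
```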

Recently published generation mean analyses based on field data, involving diverse crops and traits, six to seven generations, one to six crosses, and one or more seasons or environments, evidenced epistasis in many cases, but not all epistatic components were statistically different from zero (Shirinpour et al. 2020; Rai et al. 2020; Pal et al. 2020; Verma and Singh 2019, 2018; Mohammed et al. 2018). This is not a problem since the non-additive components are sums of signed effects. In a few studies, the authors concluded in favor of complementary or duplicate epistasis but assuming diverse combinations of signs for the epistatic components. This is not in agreement with Mather and Jinks (1971). In two investigations also involving a qualitative analysis, the two analyses provided inconsistent results. For example, the qualitative analysis showed dominant and recessive epistasis and the generation mean analysis evidenced complementary epistasis.

In conclusion, taking into account the theory presented and the results from the simulation study, LD and epistasis can have negative effects on Hayman’s diallel analysis and on the Mather and Jinks’ generation mean analysis, proportional to the LD level and the percentage of epistatic genes, and depending on the predominant type of epistasis. However, biased estimates of quadratic or linear genetic parameters are not so serious if the inheritance of the quantitative trait is correctly inferred, at least partially. Note that, except for a high proportion of epistatic genes under high LD, the general correct conclusion was partial positive dominance for grain yield from both analyses. Further, the order of dominance provides a good discrimination between the parents, regardless of the type of epistasis, percentage of interacting genes, and level of LD. Unfortunately, the detection of epistasis from both analyses is highly affected by the trait heritability, predominant type of epistasis, and percentage of interacting genes.