Abstract
Most theoretical studies on epistatic QTL mapping have shown that this procedure is powerful, efficient to control the false positive rate (FPR), and precise to localize QTLs. The objective of this simulation-based study was to show that mapping epistatic QTLs is not an almost-perfect process. We simulated 50 samples of 400 F2 plants/recombinant inbred lines, genotyped for 975 SNPs distributed in 10 chromosomes of 100 cM. The plants were phenotyped for grain yield, assuming 10 epistatic QTLs and 90 minor genes. Adopting basic procedures of r/qtl package, we maximized the power of detection for QTLs (56–74%, on average) but associated with a very high FPR (65%) and a low detection power for the epistatic pairs (7%). Increasing the average detection power for epistatic pairs (14%) highly increased the related FPR. Adopting a procedure to find the best balance between power and FPR, there was a significant decrease in the power of QTL detection (17–31%, on average), associated with a low average detection power for epistatic pairs (8%) and an average FPR of 31% for QTLs and 16% for epistatic pairs. The main reasons for these negative results are a simplified specification of the coefficients of epistatic effects, as theoretically proved, and the effects of minor genes since 2/3 of the FPR for QTLs were due to them. We hope that this study, including the partial derivation of the coefficients of epistatic effects, motivates investigations on how to increase the power of detection for epistatic pairs, effectively controlling the FPR.
Similar content being viewed by others
Introduction
Quantitative trait loci (QTL) mapping continues to be an important method for studying the genetic architecture of quantitative traits. QTLs are genomic regions that significantly affect complex traits. Recently, most studies on epistatic QTL mapping have used a multiple interval mapping (MIM) model based on a frequentist approach. These investigations have shown that epistasis regularly determines quantitative traits (Goto et al. 2019; Xu et al. 2021; Yang et al. 2020), as emphasized by Mackay (2014). However, most of the genotypic variance is additive (Hill et al. 2008). Epistasis is the interaction between non-allelic genes. In recent investigations, the epistatic QTL mapping was based on 80–200 recombinant inbred lines (RILs), using high to average single-nucleotide polymorphism (SNP) density (one SNP/0.7–4.7 cM). The theoretical background for a multiple QTL model with epistasis was developed in a series of studies that included theory and evaluation of the method efficacy based on field and/or simulated dataset (Boer et al. 2002; Carlborg et al. 2000; Jannink and Jansen 2001; Kao et al. 1999; Sen and Churchill 2001; Yi and Xu 2002; Yi et al. 2003).
The MIM model is a multiple QTL model and its likelihood function is a finite normal mixture (Kao et al. 1999). From the analysis of field data of a backcross design, Kao et al. (1999) observed that MIM identified more QTLs than composite interval mapping and interval mapping. In addition, the epistasis contributed 10–14% of the genotypic variance. Zeng et al. (1999) emphasized that MIM can improve the power of detection for minor QTL and the precision of estimating QTL positions. For mapping multiple interacting QTLs, Carlborg et al. (2000) proposed a weighted least-squares approach. On the basis of a simulated F2 dataset assuming none to 16 QTLs and a single pair of epistatic QTLs, they concluded that the genetic algorithm can be effective for the simultaneous mapping of multiple epistatic QTLs. The idea of a two-dimensional scan for searching epistatic QTLs was presented by Sen and Churchill (2001) using a Bayesian framework. From the analysis of field data from a backcross design and based mainly on the two-dimensional scan, the authors declared five QTLs explaining 3–13.5% of the phenotypic variance and two epistatic pairs explaining 6 and 6.5%.
The one-dimensional maximum likelihood approach proposed by Jannink and Jansen (2001) combines across- and within-population information using three doubled-haploid parents. The authors concluded that the method might double the power to detect first-order epistasis. Boer et al. (2002) proposed a penalized likelihood method for mapping epistatic QTLs with a one-dimensional genome search. The analysis of a simulated backcross dataset including three QTLs and all epistatic digenic interaction effects showed that the inclusion of the epistatic effects increased the QTL power of detection but under an effective dimension of 3 for epistatic interactions. Other Bayesian frameworks were proposed by Zuanetti and Milan (2022), Balestre and de Souza (2016), Yi et al. (2003), and Yi and Xu (2002).
Using simulated and field F2 data, Zuanetti and Milan (2022) proposed the method of data-driven reversible jump (DDRJ) for modeling epistasis and compared its efficacy with the usual reversible jump (RJ), QTLBim, MIM, and LASSO. DDRJ outperformed the other methods in regard to the power of detection and mapping precision without increasing the number of false positive QTLs. Balestre and de Souza (2016) proposed an RJ method for epistasis analysis. From the analyses of a simulated F2 and an empirical F2:3 datasets, the authors concluded that their method is effective to map epistatic QTLs. Using three simulated backcross data, Yi et al. (2003) observed no detection of false QTL under a no QTL model, a lower number of detected QTLs when ignoring epistasis, and a high power of detection for the epistatic QTLs. The genetic model adopted by Yi and Xu (2002) includes the additive × additive, additive × dominance, dominance × additive, and dominance × dominance epistatic effects. The analyses of three simulated backcross datasets, assuming two to three QTLs and one, two, and four epistatic effects, showed that fitting the epistatic model improved the power of detection for QTL with no main effect.
On the basis of the previously described methodological investigations on epistatic QTL mapping, the general conclusion is that this procedure is powerful, efficient to control the false positive rate (FPR), and precise to localize QTLs. That is, the power of detection for individual QTL and for epistatic pairs is close to 100%, there are no false positive QTL or there is an efficient control of the FPR, and the localization of the QTL is very precise. Compared with the almost-perfect results from most of the methodological studies, the investigations of Laurie et al. (2014) and Wei et al. (2010) provide some more realistic results as the influence of the trait heritability, the epistatic heritability, and the type of epistasis on the detection power of epistatic pairs. However, in both studies, there is also an almost-perfect control of the FPR, and the QTL mapping is very precise. It is surprising that an FPR close to zero (0.04) for both individual and epistatic QTLs is associated with a high power of detection, between 78 and 100%. It is also surprising that an increase in the power of QTL detection, from 8–16% to 90–99%, is associated with a decrease or an insignificant increase in the FPR.
Using theoretical and empirical results, we show that these almost-perfect results from simulated datasets are because of a combination of simplified specification of the coefficients of epistatic effects and no inclusion of minor genes. Thus, the objective of this simulation-based study on the efficacy of epistatic QTL mapping is to show that mapping epistatic QTLs is not an almost-perfect process. Differently from the previous studies based on simulation, we included 90 minor genes, defined the most known digenic epistasis types, and generated the additive, dominance, and four epistatic genetic values based on Kempthorne (1954).
Materials and methods
Simulation
The simulated dataset was generated using the software REALbreeding (available on request). The program has been developed using Xojo (https://www.xojo.com/). REALbreeding has been used in studies related to genomic selection (Viana et al. 2018), genome-wide association study (Pereira et al. 2018), QTL mapping (Viana et al. 2017), linkage disequilibrium (LD) (Andrade et al. 2019), population structure (Viana et al. 2013b), heterotic grouping/genetic diversity (Viana et al. 2020), and plant breeding (Viana et al. 2013a). It can also be used in research in human genetics, animal genetics and breeding, population genetics, and evolution. The program simulates individual genotypes for genes and molecular markers, and phenotypes in three stages, using inputs from the user. The first stage (genome simulation) is the specification of the number of chromosomes, molecular markers, and genes, as well as marker type and density. The second stage (population simulation) is the specification of the population(s) and sample size or progeny number and size. A population is characterized by the average frequency for the genes (biallelic) and markers (first allele). The final stage (trait simulation) is the specification of the minimum and maximum genotypic values for homozygotes, the minimum, and maximum phenotypic values (to avoid outliers), the direction and degree of dominance, and the broad sense heritability. Thus, the genetic effects are not sampled from a distribution. They are computed using the information provided by the user. For each gene (QTL and minor gene), the software computes the parametric (true) additive (A) and dominance (D) genetic values. On the basis of Kempthorne (1954, 1973), for each pair of epistatic genes (QTL-QTL, QTL-minor gene, and minor gene-minor gene), the software computes the additive × additive, additive × dominance, dominance × additive, and dominance × dominance genetic values. Finally, it sums the computed values to provide the genetic values for the trait.
The types of digenic epistasis are: complementary (G22 = G21 = G12 = G11 and G20 = G10 = G02 = G01 = G00; proportion of 9:7 in a F2, assuming independent assortment), duplicate (G22 = G21 = G20 = G12 = G11 = G10 = G02 = G01; proportion of 15:1 in a F2, assuming independent assortment), dominant (G22 = G21 = G20 = G12 = G11 = G10 and G02 = G01; proportion of 12:3:1 in a F2, assuming independent assortment), recessive (G22 = G21 = G12 = G11, G02 = G01, and G20 = G10 = G00; proportion of 9:3:4 in a F2, assuming independent assortment), dominant and recessive (G22 = G21 = G12 = G11 = G20 = G10 = G00 and G02 = G01; proportion of 13:3 in a F2, assuming independent assortment), duplicate genes with cumulative effects (G22 = G21 = G12 = G11, and G20 = G10 = G02 = G01; proportion of 9:6:1 in a F2, assuming independent assortment), and non-epistatic genic interaction (G22 = G21 = G12 = G11, G20 = G10, and G02 = G01; proportion of 9:3:3:1 in a F2, assuming independent assortment). Gij is the genotypic value for two epistatic genes, where i and j (i, j = 2, 1, or 0) are the number of copies of the gene that increases the trait expression.
The genotypic values for interacting genes are used to compute the additive, dominance, and epistatic genetic values, on the basis of Kempthorne (1954). Detailed description can be found in Viana and Garcia (2022). In summary, the parametric values of the 36 parameters for the nine genotypic values are obtained by solving the equations β = (X′VX)–1 X′Vy, under the restrictions defined by Kempthorne (1954), where X is the incidence matrix, \(V={\rm{diagonal}}\{{f}_{ij}^{(n)}\}\) is the diagonal matrix of the genotype probabilities (n is the number of selfings), and y is the vector of the genotypic values (Gij) (i, j = 0, 1, and 2). The genotype probabilities for the non-inbred population (as F2) are presented by Viana (2004). The genotype probabilities for inbred populations (as RILs) are presented by Viana and Garcia (2022) (Supplementary information).
Dataset
REALbreeding crossed two contrasting inbred lines, assuming genes in association, and generated the populations F1, F2, and F3. The number of simulations of the segregant generations was 50. The sample size for the F2 and F3 were 400 and 4000 (400 progeny of size 10). The F2 plants were genotyped and phenotypes, and the F3 plants were phenotyped. We assumed genotyping for 975 SNPs distributed in 10 chromosomes of 100 cM. The number of SNPs per chromosome ranged from 93 to 100, and the average density was an SNP/cM. We simulated grain yield (g/plant) assuming 10 epistatic QTLs and 90 minor genes (7–10 per chromosome). We allocated one QTL in chromosome 1 (Q1), two QTLs in chromosomes 3 (Q2 and Q3), 5 (Q4 and Q5), and 7 (Q6 and Q7), and three QTLs in chromosome 9 (Q8–Q10). The minimum distance between linked QTLs was 20 cM. The epistatic pairs were Q1–Q5, Q2–Q4, Q3–Q8, Q6–Q7, and Q9–Q10. Thus, we defined two epistatic pairs in the same chromosome and three epistatic pairs in distinct chromosomes. We assumed seven scenarios of the same type of epistasis and an admixture of the distinct types. The trait broad sense heritability was 60%. The QTL heritabilities varied from approximately 1–13%, depending on the generation and epistasis type (7% on average). In general, one to four QTLs showed heritability between 1 and 5% (low), four to five QTLs had heritability between 6 and 9% (intermediate), and two to three QTLs showed heritability in the range 10–13% (high). The scenario of no epistasis included six QTLs with heritabilities in the range of 6–8%. Thus, for both scenarios, the average QTL heritability was 7%.
We also generated 50 simulations of a RIL population derived from the same parents, using a single seed descent process. The sample size was 400. Because we kept the trait heritability at 60%, the QTL heritabilities increased, ranging from approximately 2–22%, depending on the epistasis type (10% on average). Finally, we also processed the analysis of the RIL dataset, assuming a sample size of 600 and an admixture of epistasis types.
Efficiency of the epistatic QTL mapping
The dataset was processed using the R package qtl version 1.48.1 (Broman et al. 2003). We performed a two-dimension scan for the two-QTL model. The output for the epistatic QTLs from a two-dimension scan depends on five LOD thresholds. These LOD thresholds are for testing the full two-locus model versus the null (no QTL) model, the two-locus additive model versus the null model, interaction, the full two-locus model versus a one-locus model, and the two-locus additive model versus a one-locus model (Broman and Sen 2009). Because Broman and Sen (2009) declare that we are inclined to ignore the maximum LOD for interaction (Mi(j, k)), we processed the data using and ignoring Mi(j, k). Because a permutation test based on the expectation-maximization method is extremely time-consuming, we used 1000 permutations for the single-QTL model (one-dimension scan) but 100 permutations for the two-QTL model. For computing QTL detection power, FPR, and mapping precision (bias in the QTL positioning), we used a REALbreeding tool (eQTL summary). A declared QTL was assumed as a true QTL when the bias between its estimated and actual positions was lower than 20 cM. Because epistatic QTL mapping is more complex to interpret than non-epistatic QTL mapping, we first processed preliminary analyses using the single- and two-QTL models, the F2 and F2:3 designs, and two sample sizes (200 and 400), based on the first simulation data assuming no epistasis and an admixture of epistasis types. The main questions to be answered were: (1) do the two-QTL models provide a more effective analysis relative to the single-QTL model when there is epistasis? (2) does the F2:3 design provide a more effective analysis relative to the F2 design? and (3) does increasing the sample size significantly improve the epistatic QTL mapping?
Results
On the basis of the results from the preliminary analysis (Table 1), we realized that: (1) the two QTLs model provides a more effective analysis relative to the single-QTL model when there is epistasis; (2) the F2:3 design did not provide a more effective analysis relative to the F2 design; and (3) increasing the sample size significantly improves the epistatic QTL mapping. Under no epistasis and higher sample size, note that the power of detection for QTLs with heritability between 6 and 8% was 0.67. The higher FPR (0.20) is attributable to the threshold of 3.9. The FPR would be zero by using a threshold of 4.9 (26% higher). The bias was only 1 cM, on average. Assuming an admixture of epistasis types, note that fitting the two-QTL models to the F2 or F2:3 data of 400 genotyped plants provided the best results concerning power and FPR. Thus, for assessing the efficacy of the epistatic QTL mapping, we used the F2 data with 400 plants. Under this scenario, the thresholds assuming 100 and 250 permutations were essentially the same. An impressive negative result was a very high FPR under epistasis.
The differences between the parents’ genotypic values and the magnitude of the F2 genotypic variance were lower assuming duplicate epistasis and higher under non-epistatic genic interaction (Table 2). Comparing with the scenario of no epistasis, where the ranges for the parents and the F2 genotypic variance were 120 g/plant and 367.7 (g/plant)2, respectively, note that epistasis significantly decreased the difference between parents and the F2 genotypic variance. This is impressive considering that it was assumed only 10% of epistatic genes.
Regardless of the epistasis type, ignoring the maximum LOD for interaction (Mi(j, k)) maximizes the power of QTL detection irrespective of the QTL heritability (Table 2). The average power of QTL detection for low, intermediate, and high heritability QTLs were 56, 67, and 74%, respectively. However, even taking into account the Mi(j, k) obtained by permutations, the FPR was very high, in the range of 60–70%, approximately (65% on average). The high FPR is attributable mainly to the minor genes. The average FPR ignoring the false declarations in chromosomes with no QTL was 45% (range between 40 and 50%). Thus, we can estimate an average FPR of 23% (range of 15–30%), ignoring minor genes. But minor genes cannot be ignored. Note that taking into account or ignoring Mi(j, k), the power of detection for epistatic pairs is generally zero (average value of only 7%). Only under duplicate and duplicate genes with cumulative effects, ignoring Mi(j, k), we observed one or two epistatic pairs with power greater than or equal to 50%. Interestingly, for these three pairs, only one pair involved high heritability epistatic QTLs. Thus, we did not observe a positive association between the power of epistatic pairs and QTL heritability. As expected, due to the low power of detection for epistatic pairs, we observed a low FPR for epistatic pairs (maximum of 15%).
We analyzed the increase in the power of detection for the epistatic pairs by decreasing the Mi(j, k) obtained by permutations but filtering in the file of maximum LODs. But the results were disappointing (Table 2). The average power doubled (14%), but only one to three out of the five epistatic pairs showed approximate power of at least 10% (this means that only one in ten similar pairs would be declared), and the FPR was very high (85 to 97%, with one exception). Finally, we investigated the best balance between the power of QTL detection and FPR for both QTLs and epistatic pairs. That is, we tried to minimize the FPR by keeping a reasonable power of detection. Again, we filtered in the file of maximum LODs from the analysis ignoring Mi(j, k), by specifying a high threshold for the full two-locus model versus a one-locus model. The consequences of keeping an FPR of up to 30% for both QTLs and epistatic pairs were a significant decrease in the power of detection, proportional to the QTL heritability. The average power of detection for the low, intermediate, and high heritability QTLs were 17, 22, and 31%, respectively, and the average power of detection for epistatic pairs was 8%.
In regard to mapping precision, we observed an average bias in the positioning of the true declared QTLs in the range of 1.1, for duplicate genes with cumulative effects, and 6.0 cM, for duplicate epistasis (Table 2), for the best balance between the power of detection and FPR. The problems of very high FPR for individual QTLs and very low power of detection for epistatic pairs were also observed in a RIL population, even increasing the sample size to 600 (Table 3). Comparing with the results from F2, the analysis of the RILs provided, on average, a slightly lower QTL power of detection of approximately −7%, regardless of the maximum LOD for interaction. The average FPR was 67%, irrespective of the Mi(j, k) too. Using RILs did not increase the power of detection of epistatic pairs as well as the mapping precision. The average power of detection for epistatic pairs ranged from 0 to 18% and the bias in the positioning the epistatic QTLs increased 2 and 5%, ignoring and taking into account Mi(j, k). Again, regardless of the epistasis type, 60–100% of the epistatic pairs were not declared. Those pairs mapped showed low average power of detection, 19%. The value for F2 was 37%. No significant advantage was obtained by increasing the sample size to 600.
Discussion
The results from this simulation-based study show that mapping epistatic QTLs is a challenge. Regardless of the epistasis type, the basic procedures of r/qtl maximized the QTL power of detection, by ignoring Mi(j, k), but associated with a very high FPR of approximately 60–70%. Furthermore, the lack of power to detect epistatic pairs is also disappointing. In F2, under complementary, recessive, dominant and recessive, and non-epistatic genic interaction, no epistatic pair was detected. For the other epistasis types and admixture of types, only one to two out of five epistatic pairs showed a power of at least 10%, approximately, with no clear association between power and the interacting QTL heritabilities (48.7% on average). Broman and Sen (2009) emphasized that they “are inclined to ignore Mi(j, k)” in the rule (8.3) and that they “place greater reliance on the numeric summaries from summary.scantwo”. This procedure alone, however, did not clearly provide the best balance between power and FPR for QTLs and epistatic pairs. We stress that this negative general result is not due to the use of r/qtl. We also analyzed the first simulation data using Windows QTL Cartographer (Wang et al. 2011), performing the sequential search proposed by Laurie et al. (2014), and QTL IciMapping (Meng et al. 2015), observing similar results to those provided by r/qtl. The problem could be a simplified specification of the coefficients of epistatic effects implemented in the software available.
Any software for QTL mapping process the phenotypic and molecular datasets from code specified under quantitative genetics theory. For F2 data, the genotypic value for two non-epistatic genes is G = ma + mb + θaaa + θbab + γada + γbdb = M + A + D, where, for each gene, m is the mean of the homozygotes, θ = 1 and γ = 0 if AA or BB, θ = 0 and γ = 1 if Aa or Bb, and θ = −1 and γ = 0 if aa or bb, a is the deviation between the genotypic value of the homozygote of greater expression and m, d is the dominance deviation (Falconer and Mackay 1996). The interval mapping is based on the conditional probabilities of the QTL genotypes in an interval flanked by two molecular markers. For example, if there is a QTL in a given interval, the average genotypic value for the individuals with SNP genotype mn (for the first SNP) op (for the second SNP) is Gmnop = M + [P(QQ│mnop) – P(qq│mnop)]a + P(Qq│mnop)d, where P(QQ│mnop), P(qq│mnop), and P(Qq│mnop) are the conditional probabilities (Haley and Knott 1992). Because all software use correct modeling for the additive and dominance deviations, the QTL mapping of non-epistatic QTLs is really a powerful process that provides effective control of the FPR and precise positioning of the true declared QTLs, irrespective of the statistical approach (Viana et al. 2017).
However, correctly modeling epistasis in QTL mapping is a challenge to overcome. The genotypic value for two interacting genes is G = ma + mb + θaaa + θbab + γada + γbdb + I = M + A + D + I, where I is the epistatic effect (a specific value for each one of the nine genotypes). Partitioning the epistatic effect based on Kempthorne (1954), we have a much more complex situation: \({G}_{ijkl}=M+{\alpha }_{i}^{1}+{\alpha }_{j}^{1}+{\alpha }_{k}^{2}+{\alpha }_{l}^{2}+{\delta }_{ij}^{1}+{\delta }_{kl}^{2}+{({\alpha }^{1}{\alpha }^{2})}_{ik}+{({\alpha }^{1}{\alpha }^{2})}_{jk}+{({\alpha }^{1}{\alpha }^{2})}_{il}+{({\alpha }^{1}{\alpha }^{2})}_{jl}+{({\alpha }^{1}{\delta }^{2})}_{ikl}+{({\alpha }^{1}{\delta }^{2})}_{jkl}+{({\delta }^{1}{\alpha }^{2})}_{ijk}+{({\delta }^{1}{\alpha }^{2})}_{ijl}+{({\delta }^{1}{\delta }^{2})}_{ijkl}=M+A+D+AA+AD+DA+DD\), where α is the average effect of a gene, δ is the dominance value, and (αα), (αδ), (δα), and (δδ) are the additive × additive, additive × dominance, dominance × additive, and dominance × dominance effects, respectively. Assuming two epistatic QTLs in two intervals, the 81 parametric average genotypic values in F2 are complex to derive since the SNP genotype probabilities depend on the gamete probabilities regarding four SNPs and two QTLs. Only in this way could the correct coefficients for the epistatic effects be derived. In all the previous theoretical studies on epistatic QTL mapping there is a simplified specification of the coefficients of epistatic effects. Yi et al. (2003), Boer et al. (2002), Zeng et al. (1999), and Kao et al. (1999) assumed that the coefficient of the epistatic effect is the product of the coefficients of the effects, multiplied by an indicator variable, to define if the QTLs are epistatic or not. When modeling individual epistatic effects, the same general rule was assumed (Kao and Zeng 2002; Yi and Xu 2002). That is, the coefficient of the additive x additive effect, for example, was defined as the product between the coefficients for the two deviations (correctly attributed). However, as demonstrated in the Appendix, even assuming that the SNPs in the first interval have independent assortment (because independence or free recombination (1/2)) relative to the SNPs in the second interval, the coefficients for the epistatic effects do not follow this product rule. The same result was theoretically proved by Weir and Cockerham (1976), for the components of the genotypic variance in inbred populations, under LD. We were also surprised by the disappointing results from RILs. In this generation, the most significant component of the genotypic variance is the additive variance, higher than the additive × additive variance. The other epistatic components have a low magnitude, even with a high percentage of epistatic genes (Viana and Garcia 2022). Thus, even in RILs, there is a problem of theoretically modeling the epistatic effects.
Concerning the differences between this study and the previous investigations on the efficacy of mapping epistatic QTLs, including Laurie et al. (2014) and Wei et al. (2010), our results show that ignoring the effects of minor genes when computing the genotypic values explains the lower FPR previously observed. This study showed that approximately 2/3 of the FPR was due to minor genes. Furthermore, in the previous studies, there is generally a simplified specification of the epistatic effects when generating the simulated genotypic values. Most previous studies on the efficacy of epistatic QTL mapping ignored the types of epistasis and sampled epistatic effects from a Normal distribution, assuming I∼N(0, \({\sigma }_{I}^{2}\)). Only Wei et al. (2010) and Carlborg et al. (2000) considered five to six epistasis types. In regard to high-order epistasis, the epistasis type specification for three or more genes is complex, except for complementary and duplicate (for three epistatic genes, proportions of 27:37 and 63:1, respectively, in an F2 generation, assuming independent assortment). On the basis of Kempthorne (1954), the computation of the parametric genetic effects relative to three epistatic genes depends on the F1 gamete probabilities, where, for example, P(ABC) = (1–rab)(1–rbc)/2, assuming no interference. However, Maki-Tanila and Hill (2014) showed that the majority of the epistatic variance is due to two-locus interactions.
In the simulation-based studies on epistatic QTL mapping of Zuanetti and Milan (2022) and Balestre and de Souza (2016), the authors did not include minor genes and adopted the same product rule to specify the coefficients of the epistatic effects. However, their results also show that mapping epistatic QTLs is not an almost-perfect process. From a single simulated dataset, the DDRJ method proposed by Zuanetti and Milan (2022) provided a power of detection of 50% for the main QTLs and 40% for the epistatic pairs. The FPR for epistatic pairs achieved 78%. The authors also concluded that RJ, MIM, and LASSO showed a greater rate of false positives and/or a lower power of QTL detection, especially when estimating epistatic effects. Furthermore, because MIM was affected by the model selection criterion, the method may overestimate or underestimate the number of QTLs. The RJ method proposed by Balestre and de Souza (2016) provided a power of detection of 31% for the main QTLs and 14% for the epistatic pairs, associated with FPRs of 20 and 25%, respectively. The average bias was 6.1 cM.
The number of applied studies involving epistatic QTL mapping has increased. However, because most of them do not include any information on candidate genes (ontology and level of expression) and gene networks, the investigations do not prove that each declared QTL and epistatic pair are true declarations. Tadmor-Levi et al. (2019) and Li et al. (2018) observed QTLs with significant epistatic effects but with one or both interacting QTLs with no significant additive effects. Goto et al. (2019) found 15 pairs of epistatic QTLs from 27 main-effect QTLs, for chicken body weight. Regarding maize grain yield and five related traits, Yang et al. (2020) found 49 QTLs and 24 epistatic pairs. For rice grain yield and seven related traits, Chen et al. (2021) found 37 QTLs and 47 epistatic pairs. However, Xu et al. (2021) mapped only two QTLs with significant epistatic effects and Hu et al. (2021) did not find epistatic QTLs using MIM.
Concluding, on the basis of Kempthorne (1954) to simulate the components of the genotypic values assuming 10 epistatic QTLs and 90 minor genes, we show that mapping epistatic QTLs is a challenge. The basic procedures for the most important software available provided average power of detection for QTLs in the range of 40–65%, associated with low power for the epistatic pairs and a very high FPR for QTLs. Increasing the average power for epistatic pairs highly increased the related FPR. Adopting a procedure to find the best balance between power and FPR, there was a significant decrease in the power of QTL detection, associated with a low average power for epistatic pairs and an average FPR of 31% for QTLs and 16% for epistatic pairs. Under the best balance, the procedure is precise for positioning QTLs, showing an average bias of 3.5 cM. The main reasons for these negative results are a simplified specification of the coefficients of epistatic effects, as theoretically proved, and the effects of minor genes since 2/3 of the FPR for QTLs were due to them. However, on the basis of the great flexibility of the software available, we have confidence that breeders can achieve a superior balance between power and FPR when processing field data. Finally, we hope that this study, including the partial derivation of the coefficients of the epistatic effects, motivates investigations on how to increase the power of detection for QTLs and epistatic pairs, effectively controlling the FPR.
Data availability
The dataset is available at https://doi.org/10.6084/m9.figshare.19242855.
References
Andrade ACB, Viana JMS, Pereira HD, Pinto VB, Fonseca ESF (2019) Linkage disequilibrium and haplotype block patterns in popcorn populations. PLoS One 14(9):e0219417
Balestre M, de Souza Jr CL (2016) Bayesian reversible-jump for epistasis analysis in genomic studies. BMC Genomics 17(1):1012
Boer MP, Braak CJF, Jansen RC (2002) A penalized likelihood method for mapping epistatic quantitative trait loci with one-dimensional genome searches. Genetics 162:951–960
Broman KW, Sen S (2009). A guide to QTL mapping with R/qtl. Springer
Broman KW, Wu H, Sen S, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19(7):889–890
Carlborg O, Andersson L, Kinghorn B (2000) The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155:2003–2010
Chen J, Tai LY, Luo L, Xiang J, Zhao ZW (2021) Mapping QTLs for yield component traits using overwintering cultivated rice. J Genet 100:2
Falconer DS, Mackay TFC (1996). Introduction to quantitative genetics, 4th edn. Longman
Goto T, Ishikawa A, Nishibori M, Tsudzuki M (2019) A longitudinal quantitative trait locus mapping of chicken growth traits. Mol Genet Genomics 294(1):243–252
Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:15–324
Hill WG, Goddard ME, Visscher PM (2008) Data and theory point to mainly additive genetic variance for complex traits. PLos Genet 4:2
Hu ST, Wang M, Zhang X, Chen WK, Song XR, Fu XY et al. (2021) Genetic basis of kernel starch content decoded in a maize multi-parent population. Plant Biotechnol J 19(11):2192–2205
Jannink J-L, Jansen R (2001) Mapping epistatic quantitative trait loci with one-dimensional genome searches. Genetics 157:445–454
Kao C-H, Zeng Z-B, Teasdale RD (1999) Multiple interval mapping for quantitative trait loci. Genetics 152:1203–1216
Kao CH, Zeng ZB (2002) Modeling epistasis of quantitative trait loci using Cockerham’s model. Genetics 160(3):1243–1261
Kempthorne O (1954) The theoretical values of correlations between relatives in random mating populations. Genetics 40:153–167
Kempthorne O (1973) An introduction to genetic statistics. The Iowa State University Press, Ames
Laurie C, Wang S, Carlini-Garcia LA, Zeng ZB (2014) Mapping epistatic quantitative trait loci. BMC Genet 15:112
Li JC, Huang W, Cao HJ, Xiao GL, Zhou J, Xie CH et al. (2018) Additive and epistatic QTLs underlying the dormancy in a diploid potato population across seven environments. Sci Hortic 240:578–584
Mackay TFC (2014) Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet 15(1):22–33
Maki-Tanila A, Hill WG (2014) Influence of gene interaction on complex trait variation with multilocus models. Genetics 198(1):355–367
Meng L, Li H, Zhang L, Wang J (2015) QTL IciMapping: integrated software for genetic linkage map construction and quantitative trait locus mapping in biparental populations. Crop J 3(3):269–283
Pereira HD, Viana JMS, Andrade ACB, Silva FFE, Paes GP (2018) Relevance of genetic relationship in GWAS and genomic prediction. J Appl Genet 59(1):1–8
Sen S, Churchill GA (2001) A statistical framework for quantitative trait mapping. Genetics 159:371–387
Tadmor-Levi R, Hulata G, David L (2019) Multiple interacting QTLs affect disease challenge survival in common carp (Cyprinus carpio). Heredity 123(5):565–578
Viana JMS (2004) Quantitative genetics theory for non-inbred populations in linkage disequilibrium. Genet Mol Biol 27(4):594–601
Viana JMS, DeLima RO, Mundim GB, Teixeira Conde AB, Vilarinho AA (2013a) Relative efficiency of the genotypic value and combining ability effects on reciprocal recurrent selection. Theor Appl Genet 126(4):889–899
Viana JMS, Garcia AAF (2022) Significance of linkage disequilibrium and epistasis on genetic variances in noninbred and inbred populations. BMC Genomics 23(1):286
Viana JMS, Pereira HD, Mundim GB, Piepho HP, Silva FFE (2018) Efficiency of genomic prediction of non-assessed single crosses. Heredity 120(4):283–295
Viana JMS, Risso LA, Oliveira deLima R, Fonseca e Silva F (2020) Factors affecting heterotic grouping with cross‐pollinating crops. Agron J 113:210–223
Viana JMS, Silva FF, Mundim GB, Azevedo CF, Jan HU (2017) Efficiency of low heritability QTL mapping under high SNP density. Euphytica 213(1):13
Viana JMS, Valente MSF, Silva FF, Mundim GB, Paes GP (2013b) Efficacy of population structure analysis with breeding populations and inbred lines. Genetica 141(7-9):389–399
Wang S, Basten CJ, Zeng ZB (2011). WINDOWS QTL Cartographer
Wei WH, Knott S, Haley CS, de Koning DJ (2010) Controlling false positives in the mapping of epistatic QTL. Heredity 104(4):401–409
Weir BS, Cockerham CC (1976). In: Pollak E, Kempthorne O, Bailey Jr TB (eds) International Conference on Quantitative Genetics. The Iowa State University Press, Ames, p 247–269
Xu YF, La GX, Fatima N, Liu ZH, Zhang LR, Zhao LF et al. (2021) Precise mapping of QTL for Hessian fly resistance in the hard winter wheat cultivar ‘Overland’. Theor Appl Genet 134(12):3951–3962
Yang JW, Liu ZH, Chen Q, Qu YZ, Tang JH, Lubberstedt T et al. (2020) Mapping of QTL for grain yield components based on a DH population in maize. Sci Rep. 10:1
Yi N, Xu S (2002) Mapping quantitative trait loci with epistatic effects. Genet Res 79:02
Yi N, Xu S, Allison DB (2003) Bayesian model choice and search strategies for mapping interacting quantitative trait loci. Genetics 165:867–883
Zeng ZB, Kao CH, Basten CJ (1999) Estimating the genetic architecture of quantitative traits. Genet Res 74(3):279–289
Zuanetti DA, Milan LA (2022) Bayesian modeling for epistasis analysis using data-driven reversible jump. IEEE/ACM Trans Comput Biol Bioinform 19(3):1495–1506
Acknowledgements
The authors thank the National Council for Scientific and Technological Development (CNPq), the Brazilian Federal Agency for Support and Evaluation of Graduate Education (Capes; Finance Code 001), and the Foundation for Research Support of Minas Gerais State (Fapemig) for financial support.
Author information
Authors and Affiliations
Contributions
JMSV designed the study, programmed the software, and revised the manuscript. CASS simulated and processed the data, and wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor: Chenwu Xu.
Appendix: The coefficients of the epistatic effects in F2 generation
Appendix: The coefficients of the epistatic effects in F2 generation
Assume that there is a QTL in two intervals with the independent assortment (due to independence or free recombination; Q1/q1 and Q2/q2). Assume also association for the two QTLs and the four flanking SNPs (A/a, B/b, C/c, and D/d) in the F1 (AQ1BCQ2D/aq1bcq2d). Assuming no interference, the F1 gamete probabilities in relation to the first QTL and their flanking markers are well known, given by:
Defining by R the F1 gamete probabilities in relation to the second QTL and their flanking markers, we have, for example,
The expectation of the genotypic values for the F2 individuals with SNP genotype AABBCCDD is
where
and I2222 is the average epistatic value for the SNP genotype, given by
Note that the coefficients of the a and d deviations are those presented by Haley and Knott (1992). Kempthorne’s assumptions are (since p1 = p2 = 1/2):
(i) restrictions for the AA effects: 1) (αQ1 αQ2)+(αQ1 αq2) = 0; 2) (αq1 αQ2)+(αq1 αq2) = 0; 3) (αQ1 αQ2)+(αq1 αQ2) = 0; and 4) (αQ1 αq2)+(αq1 αq2) = 0.
(ii) restrictions for the AD effects: 1) (αQ1 δQ2Q2)+(αQ1 δQ2q2) = 0; 2) (αQ1 δQ2q2)+(αQ1 δq2q2) = 0; 3) (αq1 δQ2Q2)+(αq1 δQ2q2) = 0; 4) (αq1 δQ2q2)+(αq1 δq2q2) = 0; 5) (αQ1 δQ2Q2)+(αq1 δQ2Q2) = 0; 6) (αQ1 δQ2q2)+(αq1 δQ2q2) = 0; and 7) (αQ1 δq2q2)+(αq1 δq2q2) = 0 (six out of the seven are independent).
(iii) restrictions for the DA effects: 1) (δQ1Q1 αQ2)+(δQ1q1 αQ2) = 0; 2) (δQ1q1 αQ2)+(δq1q1 αQ2) = 0; 3) (δQ1Q1 αq2)+(δQ1q1 αq2) = 0; 4) (δQ1q1 αq2)+(δq1q1 αq2) = 0; 5) (δQ1Q1 αQ2)+(δQ1Q1 αq2) = 0; 6) (δQ1q1 αQ2)+(δQ1q1 αq2) = 0; and 7) (δq1q1 αQ2)+(δq1q1 αq2) = 0 (six out of the seven are independent).
(iv) restrictions for the DD effects: 1) (δQ1Q1 δQ2Q2)+(δQ1Q1 δQ2q2) = 0; 2) (δQ1Q1 δQ2q2)+(δQ1Q1 δq2q2) = 0; 3) (δQ1q1 δQ2Q2)+(δQ1q1 δQ2q2) = 0; 4) (δQ1q1 δQ2q2)+(δQ1q1 δq2q2) = 0; 5) (δq1q1 δQ2Q2)+(δq1q1 δQ2q2) = 0; 6) (δq1q1 δQ2q2)+(δq1q1 δq2q2) = 0; 7) (δQ1Q1 δQ2Q2)+(δQ1q1 δQ2q2) = 0; 8) (δQ1q1 δQ2Q2)+(δq1q1 δQ2Q2) = 0; 9) (δQ1Q1 δQ2q2)+(δQ1q1 δQ2q2)=0; 10) (δQ1q1 δQ2q2)+(δq1q1 δQ2q2) = 0; 11) (δQ1Q1 δq2q2)+(δQ1q1 δq2q2)=0; and 12) (δQ1q1 δq2q2)+(δq1q1 δq2q2) = 0 (nine out of the 12 are independent).
Using the restrictions, I2222 can be expressed as follows:
Thus, the coefficients for the additive × additive, additive × dominance, dominance × additive, and dominance × dominance effects are not given by the product of the coefficients of the additive and dominance effects. Assuming incomplete linkage between the SNPs B/b and C/c (0 < rbc < 1/2), we have
since \(P(AQ1BCQ2D)=P(A).P(Q1|A).P(B|Q1).P(C|B).P(Q2|C).P(D|Q2)=(1/2).(1-{r}_{aq1}).(1-{r}_{q1b}).(1-{r}_{bc}).(1-{r}_{cq2}).(1-{r}_{q2d})={P}_{111}.(1-{r}_{bc}).2{R}_{111}\). Assuming complete linkage (rbc = 0), the probability is the same, but there will be 32 and not 64 gametes for the F1.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Viana, J.M.S., Souza, C.A.S. Efficiency of mapping epistatic quantitative trait loci. Heredity 131, 25–32 (2023). https://doi.org/10.1038/s41437-023-00618-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41437-023-00618-5
- Springer Nature Switzerland AG
We’re sorry, something doesn't seem to be working properly.
Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.