Introduction

Epistasis refers to interactions between alleles at two or more genetic loci (Carlborg and Haley 2004; Phillips 2008). Recent molecular and biochemical work has identified a large number of interactions between genes or their products (Tong et al. 2004; St Onge et al. 2007; He et al. 2010; Costanzo et al. 2010). Although regulatory mechanisms that are not based on DNA sequence, such as epigenetic, post-transcriptional, or post-translational modifications, certainly play a role, the observed networks and the extensive interplay of factors should to some extent be reflected at the genetic level. Consistent with this assumption, epistatic interactions have been identified in animals and plants (Montooth et al. 2003; Carlborg et al. 2006; Würschum et al. 2011a, b). Epistasis scans are, however, computationally demanding and have therefore often been neglected.

Genome-wide association studies (GWAS) in humans have successfully identified a large number of genetic variants associated with complex traits and diseases. Intriguingly, these variants generally explain only a small proportion of the genotypic variance (Visscher 2008), which has raised the question of what may account for the remaining, or hidden, heritability (Dermitzakis and Clark 2009; Manolio et al. 2009; Gibson 2010). Proposed explanations include rare (low-frequency) variants, genetic heterogeneity, epigenetics, and genotype-by-environment interactions, but also simply the contribution of many thousands of variants, each with only a small effect on a complex trait. The assumption that a major part of the genetic variance is simply hidden below the stringent significance thresholds of GWAS is supported by Yang et al. (2010), who fitted all common SNPs simultaneously and thereby explained a much larger proportion of the heritability for height. Epistasis is considered another suspect for the hidden heritability, but detailed knowledge about its contribution to the heritability of complex traits is missing.

Oil content is controlled by fatty acid biosynthesis in the plastid and by the complex assembly of triacylglycerol (TAG) molecules in the endoplasmic reticulum (ER) (Baud and Lepiniec 2010) (Fig. 1a). The TAG metabolism is biochemically characterized, and many of the genes in this pathway have been identified. In addition, transcriptional control of the enzymes involved constitutes another important level of regulation. The precursor for de novo fatty acid biosynthesis is acetyl-CoA, which is formed in two steps: the transphosphorylation of phosphoenolpyruvate (PEP) to pyruvate, catalyzed by the plastidial pyruvate kinase (PK), and the subsequent oxidative decarboxylation of pyruvate to acetyl-CoA and CO2 by the pyruvate dehydrogenase complex. The first committed step in this pathway is the formation of malonyl-CoA from acetyl-CoA and bicarbonate by acetyl-CoA carboxylase (ACC). This reaction requires biotin as a cofactor. Although the organization of plant ACC is a matter of debate, the ACC of rapeseed plastids likely consists of four separate subunits: a homodimer of biotin carboxylase (BC) assembled with a homodimer of biotin carboxyl carrier protein (BCCP), which is loosely associated with a heterotetramer of α- and β-carboxyltransferase subunits (BCT) (Alban et al. 2000). Fatty acids are then synthesized by a series of condensation reactions, the first of which is catalyzed by a 3-ketoacyl-ACP synthase of type III (KAS3). The only enzymatic reaction exclusively committed to TAG formation is the transfer of a third fatty acid to the vacant position of a diacylglycerol by diacylglycerol acyltransferase (DGAT). WRINKLED1 (WRI1) encodes an AP2 transcription factor that was identified in A. thaliana because the wri1 mutant is specifically impaired in TAG accumulation in seeds (Focks and Benning 1998; Cernac and Benning 2004).

Fig. 1

Simplified triacylglycerol pathway and oil content QTL. a Simplified schematic representation of the fatty acid metabolism involving the candidate genes of this study. b Main effect QTL for oil content, intergenic and intragenic epistasis, and linkage disequilibrium (LD). Enzymes: PK pyruvate kinase, PDH pyruvate dehydrogenase, PDHK pyruvate dehydrogenase kinase, ACC acetyl-CoA carboxylase, BC biotin carboxylase, BCCP biotin carboxyl carrier protein, BCT biotin carboxyltransferase, KAS3 beta-ketoacyl synthase III, DGAT diacylglycerol acyltransferase. Transcription factor: WRI1 WRINKLED1. Compartment: ER endoplasmic reticulum

Despite our understanding of the individual steps involved in TAG biosynthesis, attempts to modulate TAG storage in plants have been unsuccessful or only modestly successful (Cahoon et al. 2007). This may be attributed to as yet unknown interactions between components of the TAG pathway, which have so far been neglected. The TAG metabolism thus represents an ideal model pathway for investigating the contribution of epistatic interactions to the heritability of complex traits.

Materials and methods

Plant materials, field experiments and molecular markers

This study was based on 685 diverse elite rapeseed (Brassica napus) inbred lines. All material was provided by the breeding company Norddeutsche Pflanzenzucht Hans-Georg Lembke KG (NPZ, Germany). The evaluated traits were oil content (% of dry matter), glucosinolate content (GSL, μmol/g fat-free dry matter), yield (Mg ha−1), and thousand-kernel weight (TKW, g) (Fig. S1). The 685 genotypes were evaluated in a single year in routine plant breeding trials at four locations, except for GSL, which was evaluated at three locations. Field trials were unreplicated, and block effects were adjusted using three checks included in the trials. The 685 genotypes were fingerprinted following standard protocols with 60 simple sequence repeat (SSR) markers distributed throughout the genome and with 17 SNP markers (SNPs and InDels) based on candidate gene sequences. To identify polymorphic sites, the candidate genes were sequenced in a discovery set of rapeseed lines. Details about the identification of the SNPs in the candidate genes are available in the accompanying publication (GABI OIL consortium, in preparation). In brief, the 17 SNP markers used in this study are derived from nine candidate genes of the TAG biosynthesis pathway: pyruvate kinase (PK, 2 SNPs), pyruvate dehydrogenase (PDH, 2 SNPs), pyruvate dehydrogenase kinase (PDHK, 2 SNPs), biotin carboxylase (BC, 1 SNP), isoform 2 of biotin carboxyl carrier protein (BCCP, 4 SNPs) (Thelen et al. 2001), α-subunit of biotin carboxyltransferase (BCT, 1 SNP), diacylglycerol acyltransferase DGAT1 (DGAT, 2 SNPs), β-ketoacyl synthase III (KAS3, 1 SNP), and WRINKLED1 (WRI1, 2 SNPs).

Phenotypic data analyses

The analyses were based on a two-stage approach, which has been shown to yield results similar to those of a one-step analysis (Möhring and Piepho 2009). In the first step, adjusted entry means were estimated for each location by adjusting for block effects using the included checks. In the second step, these adjusted entry means were used to estimate variance components with the linear mixed model $y_{ij} = \mu + l_j + g_i + e_{ij}$, where $y_{ij}$ is the adjusted entry mean of the $i$th rapeseed line at the $j$th location, $\mu$ the intercept, $l_j$ the effect of the $j$th location, $g_i$ the genetic effect of the $i$th rapeseed line, and $e_{ij}$ the error term, which includes the genotype-by-location interaction effect. To estimate variance components by restricted maximum likelihood (REML), both location and genotype were modeled as random effects. The significance of variance component estimates was tested by model comparison with likelihood ratio tests, using halved P values as an approximation (Stram and Lee 1994). Heritability ($h^2$) on an entry-mean basis was estimated as the ratio of genotypic to phenotypic variance according to Melchinger et al. (1998). In addition, genotypes were treated as fixed effects in the above model to estimate adjusted entry means across locations for all genotypes and traits. These were then used to estimate the proportion of genotypic variance ($p_G$) explained by the detected QTL.
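The two quantities used in this paragraph can be illustrated with a minimal Python sketch. It assumes the common entry-mean form $h^2 = \sigma^2_g / (\sigma^2_g + \sigma^2_e / L)$ for an unreplicated trial at $L$ locations, with $\sigma^2_e$ containing the genotype-by-location interaction as in the model above; the exact expression of Melchinger et al. (1998) may differ in detail. The halved P value corresponds to comparing the likelihood ratio statistic against a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$. The function names and the variance components in the usage lines are hypothetical.

```python
import math

def entry_mean_heritability(var_g: float, var_e: float, n_locations: int) -> float:
    """Entry-mean heritability for an unreplicated multi-location trial.

    Assumption: var_e is the residual variance that already contains the
    genotype-by-location interaction, so the phenotypic variance of an
    entry mean is var_g + var_e / n_locations.
    """
    return var_g / (var_g + var_e / n_locations)

def halved_lrt_pvalue(loglik_full: float, loglik_reduced: float) -> float:
    """P value for testing a single variance component against zero.

    The LRT statistic is referred to a 50:50 mixture of chi2(0) and chi2(1),
    which amounts to halving the chi2(1) P value.
    For 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

# Hypothetical values, not estimates from this study
print(entry_mean_heritability(var_g=1.0, var_e=0.4, n_locations=4))
print(halved_lrt_pvalue(loglik_full=-100.0, loglik_reduced=-102.0))
```

With four locations, the residual variance is divided by four, which is why entry-mean heritabilities of multi-location trials are typically high even for single-year data.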

Genetic correlations were calculated according to Cooper et al. (1996). Relationships among the 685 genotypes were analyzed by principal coordinate analysis (PCoA) (Gower 1966) based on modified Rogers' distances between individuals (Wright 1978). Linkage disequilibrium (LD) between candidate gene SNPs was measured as $r^2$ (Weir 1996), and the significance of LD was tested with Fisher's exact tests (Hill and Robertson 1968). LD and PCoA computations were performed with the software package Plabsoft (Maurer et al. 2008).
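Because the lines are homozygous inbreds, each line carries a single two-locus haplotype, so LD between two biallelic SNPs can be computed directly from a 2 × 2 table of haplotype counts. The Python sketch below computes $r^2 = D^2 / (p_A(1-p_A)p_B(1-p_B))$ with $D = p_{AB} - p_A p_B$ and a two-sided Fisher's exact test on the same table. It is a self-contained illustration under these assumptions, not the Plabsoft implementation; function names and data are hypothetical.

```python
from math import comb

def haplotype_table(snp1, snp2):
    """2x2 table of two-locus haplotype counts for homozygous inbred lines.

    snp1, snp2: equal-length lists of 0/1 allele codes, one entry per line.
    """
    table = [[0, 0], [0, 0]]
    for a, b in zip(snp1, snp2):
        table[a][b] += 1
    return table

def ld_r2(table):
    """r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B."""
    n = sum(map(sum, table))
    p_a = (table[1][0] + table[1][1]) / n   # frequency of allele 1 at SNP 1
    p_b = (table[0][1] + table[1][1]) / n   # frequency of allele 1 at SNP 2
    d = table[1][1] / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if a SNP is monomorphic

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test: sums the probabilities of all tables with
    the observed margins that are no more likely than the observed table."""
    row1 = table[0][0] + table[0][1]
    col1 = table[0][0] + table[1][0]
    n = sum(map(sum, table))
    row2 = n - row1

    def prob(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(table[0][0])
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))

# Hypothetical allele codes for eight inbred lines (complete LD)
snp1 = [0, 0, 0, 0, 1, 1, 1, 1]
snp2 = [0, 0, 0, 0, 1, 1, 1, 1]
table = haplotype_table(snp1, snp2)
print(ld_r2(table), fisher_exact_two_sided(table))
```

The small tolerance factor in the Fisher test guards against floating-point ties when comparing table probabilities.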

Association mapping

The following mixed model was used for association mapping: $y_{ij} = \mu + m_u + g_i + l_j + e_{ij}$, where $y_{ij}$ is the adjusted entry mean of the $i$th rapeseed line at the $j$th location (estimated in the first step of the phenotypic analysis), $\mu$ the intercept, $m_u$ the effect of the $u$th marker genotype, $g_i$ the genetic effect of the $i$th rapeseed line, $l_j$ the effect of the $j$th location, and $e_{ij}$ the residual, which includes the genotype-by-location interaction effect. The marker effect $m_u$ was modeled as a fixed effect, whereas $g_i$ and $l_j$ were treated as random effects.

The K model (Yu et al. 2006; Reif et al. 2011) was applied, in which the variance of the random genetic effect was assumed to be $\mathrm{Var}(g) = 2K\sigma^2_g$, where $\sigma^2_g$ is the genetic variance estimated by REML and $K$ is a 685 × 685 matrix of kinship coefficients that define the degree of genetic covariance between all pairs of entries. Following Bernardo (1993), we calculated the kinship coefficient $K_{ij}$ between inbreds $i$ and $j$ from the SSR marker data as $K_{ij} = 1 + (S_{ij} - 1)/(1 - T_{ij})$, where $S_{ij}$ is the proportion of marker loci with shared variants between inbreds $i$ and $j$, and $T_{ij}$ is the average probability that a variant from one parent of inbred $i$ and a variant from one parent of inbred $j$ are alike in state, given that they are not identical by descent. The coefficient $T_{ij}$ was estimated separately for each trait by REML, and negative kinship values between inbreds were set to zero (Fig. S2).
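The kinship formula of Bernardo (1993) can be sketched in a few lines of Python. As a simplifying assumption, a single value of $T$ is supplied for all pairs instead of being estimated per trait by REML; the SSR alleles and function names are hypothetical. Each inbred is represented by one allele per SSR locus, and loci missing in either line are skipped.

```python
def shared_variant_proportion(line_i, line_j):
    """S_ij: fraction of scored SSR loci at which two inbreds carry the same allele.

    Each line is a list of allele labels (one per locus); None marks missing data.
    """
    pairs = [(a, b) for a, b in zip(line_i, line_j) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci scored in both lines")
    return sum(a == b for a, b in pairs) / len(pairs)

def bernardo_kinship(lines, t):
    """K_ij = 1 + (S_ij - 1) / (1 - T), with negative values set to zero.

    Simplification: one T (input, not estimated here) is used for all pairs.
    """
    n = len(lines)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = shared_variant_proportion(lines[i], lines[j])
            k[i][j] = k[j][i] = max(0.0, 1.0 + (s - 1.0) / (1.0 - t))
    return k

# Hypothetical SSR alleles for three inbreds at four loci
lines = [["a", "b", "c", "d"],
         ["a", "b", "c", "x"],
         ["y", "z", "c", "x"]]
for row in bernardo_kinship(lines, t=0.2):
    print(row)
```

The correction by $T$ rescales allele sharing so that sharing expected by chance alone (alike in state but not identical by descent) maps to a kinship near zero; truncating negative values keeps $K$ interpretable as a covariance matrix of relatedness.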

Main effects of the candidate gene SNPs were detected by fitting them as fixed effects in the mixed model and testing their significance with a Wald F test. To detect two-way and three-way epistatic QTL, the subordinate effects were also included in the model: the main effects for the two-way epistasis scan, and the main and two-way epistatic effects for the three-way epistasis scan. For two-way epistasis, the model thus extends to $y_{ij} = \mu + m_u + m_v + m_u{:}m_v + g_i + l_j + e_{ij}$, where $m_u$ and $m_v$ denote the effects of the $u$th and $v$th marker genotypes and $m_u{:}m_v$ the interaction effect between them. For three-way epistasis, the model is extended to include the $w$th marker genotype $m_w$ and the respective interactions with $m_u$ and $m_v$. The Bonferroni–Holm procedure (Holm 1979) was applied to correct for multiple testing at P < 0.01. All mixed-model calculations were performed with the software ASReml 2.0 (Gilmour et al. 2006).
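Two mechanical parts of the scan can be sketched in Python: building the list of model terms for a focal interaction together with all of its subordinate terms, and the Holm (1979) step-down correction. This is a sketch, not the ASReml model specification; the term notation `u:v` and the function names are hypothetical, and which tests form one multiple-testing family is not stated in the text.

```python
from itertools import combinations

def epistasis_terms(markers):
    """Model terms for an interaction among the given markers:
    all main effects, all lower-order interactions, and the focal interaction."""
    terms = []
    for order in range(1, len(markers) + 1):
        for combo in combinations(markers, order):
            terms.append(":".join(combo))
    return terms

def holm_reject(pvalues, alpha=0.01):
    """Holm step-down procedure; returns one reject flag per test."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger P values are retained
    return reject

print(epistasis_terms(["u", "v", "w"]))
print(holm_reject([0.001, 0.004, 0.02, 0.0024], alpha=0.01))
```

For orientation, scanning all pairs and all triples of the 17 SNPs would involve C(17, 2) = 136 two-way and C(17, 3) = 680 three-way tests, which shows why the three-way scan is considerably more demanding and more heavily penalized by the correction.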

The total proportion of genotypic variance ($p_G$) explained by the detected QTL was calculated by fitting all QTL simultaneously in a linear model to obtain the adjusted coefficient of determination $R^2_{adj}$; the ratio $p_G = R^2_{adj}/h^2$ yielded the proportion of genotypic variance (Utz et al. 2000). The proportions of genotypic variance explained by individual QTL (Tables S3, S4) were obtained by fitting all significant QTL sequentially in the order of their P values, with the most significant QTL entered first. The resulting sums of squares were used to calculate $p_G$, again standardized by the heritability.
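The arithmetic behind $p_G$ can be sketched as follows. The sketch assumes the usual adjusted $R^2$, $1 - (1 - R^2)(n - 1)/(n - k - 1)$ with $k$ non-intercept model parameters, and assumes that a single QTL's share is its sequential sum of squares divided by the total sum of squares before standardizing by $h^2$; the exact adjustment used in the study may differ. All numbers and names are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R^2 for a linear model with n_params non-intercept parameters."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params - 1)

def proportion_genotypic_variance(r2_adj, h2):
    """p_G = R^2_adj / h^2: share of genotypic rather than phenotypic variance."""
    return r2_adj / h2

def single_qtl_pg(seq_ss, total_ss, h2):
    """Assumed p_G of one QTL from its sequential (order-dependent) sum of squares."""
    return seq_ss / total_ss / h2

# Hypothetical values for 685 lines and a model with 11 parameters
r2_adj = adjusted_r2(0.15, n_obs=685, n_params=11)
print(r2_adj, proportion_genotypic_variance(r2_adj, h2=0.9))
print(single_qtl_pg(seq_ss=4.0, total_ss=100.0, h2=0.8))
```

Dividing by $h^2$ rescales the explained phenotypic variance to the genotypic variance, the relevant upper bound for what markers can explain.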

Results

All traits showed significant genotypic variation and, except for yield, very high heritabilities (Table S1). Principal coordinate analysis of the 685 elite rapeseed lines revealed no major population structure (Fig. S2). For association analysis we therefore used a mixed-model approach controlling only for familial relatedness (Yu et al. 2006; Reif et al. 2011; Würschum 2012). All lines were genotyped with SNPs derived from nine candidate genes representing selected members of the TAG pathway (Fig. 1a). We detected significant main effects on oil content for nearly all genes (Figs. 1b, S3). Some main effect QTL were also detected for glucosinolate content (GSL) and a few for yield and thousand-kernel weight (TKW). For oil content, the main effect QTL together explained 14.04 % of the genotypic variance (Table 1). The highest proportion of genotypic variance, 5.48 %, was explained by a SNP within BCCP (Table S3). Low values for the proportion of genotypic variance, as observed for some QTL, can also be caused by collinearity (i.e., LD) between markers in the simultaneous fit of all detected QTL. BC, for example, is in significant LD with eight of the eleven main effect QTL for oil content (Fig. 1b). Because it showed the least significant association with the trait (Fig. S3), it was entered last in the final fit used to obtain the $p_G$ values, which may have contributed to the low proportion of genotypic variance explained by this QTL. The strongest associations were observed for BCCP, BCT, PK, and DGAT (Fig. S3).
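Why a marker in LD with others can receive a small share when entered last can be seen with just two predictors. The Python sketch below uses the standard closed form for the $R^2$ of a regression on two predictors, $R^2 = (r_{1y}^2 + r_{2y}^2 - 2 r_{1y} r_{2y} r_{12})/(1 - r_{12}^2)$, with hypothetical correlations; it is a simplified stand-in for the multi-marker fit described above, not the study's analysis.

```python
def sequential_r2(r1y, r2y, r12):
    """Order-dependent R^2 contributions of two predictors.

    r1y, r2y: correlations of predictors 1 and 2 with the trait.
    r12: correlation between the two predictors (e.g. due to LD).
    Returns (R^2 of predictor 1 alone, increase in R^2 when predictor 2 is added).
    """
    full = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
    first = r1y**2
    return first, full - first

# Hypothetical correlations: marker A slightly stronger, markers correlated at 0.6
print(sequential_r2(0.30, 0.25, 0.6))  # A entered first, B second
print(sequential_r2(0.25, 0.30, 0.6))  # B entered first, A second
```

In this example, marker B alone would account for 6.25 % of the variance, but entered after the correlated marker A it adds less than 1 %. This mirrors the caveat for BC: a small sequential share does not by itself imply a small marginal association.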

Table 1 Detection of main and epistatic QTL. Number of significant main, two-way and three-way epistatic QTL and the proportion of the genotypic variance ($p_G$) explained by these QTL

We performed full two- and three-dimensional epistasis scans for interactions between the candidate genes of the TAG pathway (Figs. 1b, 2a, S4, S5). For oil content, the detected two-way epistatic QTL explained 12.26 % of the genotypic variance, a proportion comparable to that explained by the main effect QTL (Table 1). In contrast, the three-way epistatic QTL together explained only a small proportion of the genotypic variance (3.43 %).

Fig. 2

Inter- and intragenic epistasis contributing to oil content. a Selected intergenic epistatic interactions, allele frequencies, average allele substitution (α) effect on oil content (green refers to a positive and red to a negative α-effect), relative contribution of the alleles to the best epistatic interaction, and the mode of the interaction. The symbols identify the different alleles from the interacting partner. b Intragenic epistasis. The most favorable interaction is indicated in green (colour figure online)

In addition to intergenic epistasis, we observed intragenic epistasis affecting oil content (Figs. 1b, 2b, S6). The difference in mean oil content (SD 1.12 %) between the most and the least favorable haplotypes ranged from 0.75 % (WRI1-1:WRI1-2) to 1.67 % (BCCP-2:BCCP-3). The highest proportion of genotypic variance explained by an intragenic epistatic interaction was 0.54 % (BCCP-2:BCCP-3), which was comparable to that of intergenic epistatic interactions (average 0.56 %) and still approximately half the average of the main effects (1.24 %) (Tables S3, S4).
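The best-minus-worst haplotype contrast can be sketched as follows, assuming homozygous lines coded 0/1 at two SNPs of the same gene. The data and function name are hypothetical, and the sketch uses raw phenotype means rather than the adjusted entry means and mixed model of the study.

```python
from collections import defaultdict
from statistics import mean

def haplotype_contrast(snp_a, snp_b, phenotype):
    """Mean phenotype per two-SNP haplotype and the best-minus-worst difference."""
    groups = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, phenotype):
        groups[(a, b)].append(y)
    means = {hap: mean(values) for hap, values in groups.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return means, best, worst, means[best] - means[worst]

# Hypothetical oil contents (%) of six inbred lines
snp_a = [0, 0, 1, 1, 0, 1]
snp_b = [0, 1, 0, 1, 0, 1]
oil = [44.0, 44.5, 44.2, 45.8, 44.2, 46.0]
means, best, worst, diff = haplotype_contrast(snp_a, snp_b, oil)
print(best, worst, round(diff, 2))
```

Haplotype classes represented by only a few lines yield noisy means, so in practice such a contrast is only meaningful for reasonably frequent haplotypes.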

Discussion

Allelic variation in genes from the TAG pathway

To investigate the extent of epistasis, we aimed to dissect the genetic networks underlying phenotypic diversity in the oil content of rapeseed. In addition, glucosinolate content, yield, and thousand-kernel weight were studied. Glucosinolates are secondary metabolites implicated in plant resistance to insects and pathogens (Sonderby et al. 2010). All four traits were significantly correlated (Table S2), suggesting a direct or indirect interplay between the factors influencing these traits. All correlations were, however, rather small, indicating that the traits are mainly affected by independent pathways. Consistent with their presumed role in the TAG pathway, we detected significant main effects for most candidate genes. The few main effect QTL detected for yield and TKW are likely due to indirect effects. For glucosinolates, by contrast, a more direct link is plausible, because the glucosinolate biosynthetic pathway requires acetyl-CoA and glucose. Thus, the glucosinolate and the TAG pathways rely on common metabolites, and our results exemplify that changes within one pathway can affect interconnected biochemical pathways.

Analyses of mutant and transgenic plants have revealed essential roles for the TAG pathway enzymes that showed the strongest associations with oil content in our experiment. Downregulation of BCCP, for example, resulted in a decreased fatty acid content in mature seeds (Thelen and Ohlrogge 2002), whereas overexpression of DGAT, the last enzyme of the Kennedy pathway, which has been suggested to catalyze the rate-limiting reaction in TAG biosynthesis (Perry et al. 1999), increased oil content (Jako et al. 2001). Our results underline the presence of strong natural variation in enzyme-coding genes and the importance of selecting favorable alleles, whether for bioengineering plants by transgenic approaches or for marker-assisted selection and the exploitation of genetic diversity.

Epistatic interactions in the TAG pathway

Together, the epistatic QTL explained a higher proportion of the genotypic variance than the main effects (12.26 % and 3.43 % for two- and three-way epistatic QTL vs. 14.04 % for main effect QTL), which highlights the importance of gene networks even for quantitative traits such as oil content. Our results are in line with those of Zuk et al. (2012), who showed theoretically that the missing heritability in human association studies may be due in substantial part to genetic interactions. They further showed that traits with greater biological complexity may have a higher contribution of epistasis than less complex traits. The strong contribution of epistasis to the genotypic variance observed here, compared with other association studies in crops (Buckler et al. 2009; Kump et al. 2011), may be attributed to differences in experimental design, in the genetic architecture of the underlying traits, and between genome-wide and candidate-gene based approaches, but also to the fact that our study was based on elite germplasm, in which many of the major additive QTL may have been fixed by selection in breeding programs.

The detected epistatic QTL can mainly be classified as complementary gene interactions (Figs. 2a, S5), which can arise when genes encode enzymes in the same pathway, such that the final product depends on the optimal functioning of all involved gene products (Holland 2001). Interestingly, the alleles with a positive effect on oil content and the highest contribution to the most favorable epistatic interaction were often not the most frequent alleles, which leaves room for knowledge-based improvement of this trait. Some of the genetic interactions identified here also reflect direct biochemical interactions (e.g., between the ACC subunits BCCP and BCT) or are consistent with the interaction of the transcription factor WRI1 with the promoter sequences of PK and BCCP (Baud et al. 2009). Our approach revealed extensive crosstalk between the genes involved in TAG biosynthesis (Fig. 3). This crosstalk is not limited to single cellular compartments or to modules of genes with similar function, but rather involves interactions between the regulatory machinery, the enzymes catalyzing steps in the cytosol/plastid, and those in the ER.
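A stylized two-locus example helps distinguish a complementary interaction from purely additive action. The Python sketch below takes hypothetical mean phenotypes for the four homozygous two-locus genotypes and computes the two main effects and the interaction contrast $m_{11} - m_{10} - m_{01} + m_{00}$. It assumes balanced genotype classes (unweighted means) and is not the classification procedure used in the study.

```python
def two_locus_effects(cell_means):
    """Main effects and interaction contrast from four two-locus genotype means.

    cell_means maps (allele at locus 1, allele at locus 2) -> mean phenotype,
    with alleles coded 0/1 for homozygous inbred lines.
    """
    m00, m01 = cell_means[(0, 0)], cell_means[(0, 1)]
    m10, m11 = cell_means[(1, 0)], cell_means[(1, 1)]
    effect_1 = ((m10 + m11) - (m00 + m01)) / 2  # average effect of allele 1 at locus 1
    effect_2 = ((m01 + m11) - (m00 + m10)) / 2  # average effect of allele 1 at locus 2
    interaction = m11 - m10 - m01 + m00         # zero under purely additive action
    return effect_1, effect_2, interaction

# Hypothetical complementary pattern: only the combination of both
# favorable alleles raises oil content
complementary = {(0, 0): 44.0, (0, 1): 44.0, (1, 0): 44.0, (1, 1): 45.0}
additive = {(0, 0): 44.0, (0, 1): 44.5, (1, 0): 44.5, (1, 1): 45.0}
print(two_locus_effects(complementary))
print(two_locus_effects(additive))
```

In the complementary case, part of the benefit is visible as main effects, but the full gain is realized only in the double-favorable genotype. This is why the combination of favorable alleles, not just their individual frequencies, matters for improvement.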

Fig. 3

Visualization of the interaction networks. Intergenic epistatic interactions detected for oil content between the candidate genes from the different modules

Effect of intragenic epistasis on complex traits

Our epistasis scan also revealed intragenic epistasis affecting oil content. Haig (2011) recently proposed that the hidden heritability of complex traits may reside in epistasis between linked SNPs. If such intragenic epistasis, i.e., favorable haplotypes within a gene, affects the trait, it will go undetected in GWAS because each locus is assessed separately. In contrast, in pedigree studies, with their much greater extent of LD, the haplotypes cannot be resolved and their effect will add to the additive genetic variance. Our results support this theory and show that intragenic epistasis is present even for complex traits and, more importantly, contributes considerably to the genotypic variance.
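The argument that single-locus tests can miss haplotype effects can be illustrated with an extreme, purely hypothetical case: two SNPs in one gene, the four haplotypes equally frequent, and a trait that depends only on which alleles are combined. Each single-SNP contrast is then zero, while the haplotype means differ by one unit. Real intragenic interactions are rarely this symmetric; the sketch only shows the mechanism, and all names and data are illustrative.

```python
from statistics import mean

def marginal_difference(alleles, phenotype):
    """Mean phenotype of allele-1 carriers minus allele-0 carriers at one SNP."""
    ones = [y for a, y in zip(alleles, phenotype) if a == 1]
    zeros = [y for a, y in zip(alleles, phenotype) if a == 0]
    return mean(ones) - mean(zeros)

# Hypothetical lines: equal numbers of the four two-SNP haplotypes and a
# trait value that depends only on whether the two alleles "match"
snp1 = [0, 0, 1, 1] * 5
snp2 = [0, 1, 0, 1] * 5
trait = [1.0 if a == b else 0.0 for a, b in zip(snp1, snp2)]

print(marginal_difference(snp1, trait))  # single-SNP test at SNP 1
print(marginal_difference(snp2, trait))  # single-SNP test at SNP 2
haplotype_means = {h: mean(y for a, b, y in zip(snp1, snp2, trait) if (a, b) == h)
                   for h in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(haplotype_means)                   # the haplotype reveals the effect
```

In a pedigree, where the two tightly linked SNPs are transmitted together, each haplotype behaves like a single allele, so the same effect would appear as additive variance.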

Conclusions

In a wider context, our findings may be of broad relevance for association approaches to complex traits. Our results exemplify the impact of both intergenic and intragenic epistasis, which explained a considerable proportion of the genotypic variance. Future association studies should therefore be designed to accommodate epistasis and include scans for epistatic interactions as one potential source of the hidden heritability.