18.1 Introduction

Biology occupies a unique niche among the domains where higher-order interactions exist and are the object of study. On the one hand, the biological realm is home to many of the oldest examples of complex systems where higher-order interactions have been observed and cataloged. On the other, it is also among the arenas where higher-order interactions remain the most intractable and elusive. But, as we argue in the coming sections, the biological realm is poised to make rapid progress in our scientific understanding of higher-order interactions and our ability to engineer desired outcomes in complex adaptive systems.

While dozens of studies have affirmed that biological systems display the hallmark properties of complex systems (e.g., emergence), there are fewer areas of biology where a consideration of higher-order interactions has improved our understanding enough to make accurate predictions. That is, while the biological realm recognizes the importance of higher-order interactions in its constitution (across scales of biological organization, from the molecular to the cellular to the ecosystem), the study of higher-order interactions has yet to fully resolve any longstanding challenge in biology.

In this chapter, we discuss higher-order interactions within biology, using genetics—and specifically a concept called “epistasis”—as a model problem to highlight how higher-order interactions can be identified, measured, and interpreted. We will focus on higher-order epistasis, a phenomenon that captures both the caprice and the ubiquity of higher-order interactions in biological systems. We examine its multiple definitions within the fields of population, evolutionary, and quantitative genetics, and discuss some of its implications for fundamental theoretical questions and conflicts in the many subfields of genetics. We offer an analysis of a small dataset to demonstrate how higher-order interactions manifest within genes but focus our attention on large questions and conflicts that have undermined a more rigorous understanding of cutting-edge problems in modern genetics.

18.2 Higher-Order Interactions Across Biological Domains

As in all the foundational fields where higher-order interactions exist, higher-order epistasis is defined by nonlinear interactions between actors, entities, replicators, or parcels of information. Such interactions exist in the many subfields of biology, including ecology, where the study of high-dimensional species interactions has long been an important area of research. Indeed, in one of the foundational papers of both mathematical ecology and complex systems, May 1972 asked (and tried to answer) the question, “Will a large complex system be stable?” [1]. Not surprisingly, the answer was that it depends. Whether a large complex system—think about an ecosystem or even a genome—will be stable depends on the number of interacting entities, how they interact, the density of their connections, and—crucially—how we quantify stability. Nevertheless, examples of large, complex biological systems abound [2]. How then should we quantify the nature of the interactions and the degree of stability in these systems? Or said differently, how do we study higher-order interactions in biological systems?

Higher-order interactions have been examined in ecological systems of various kinds, ranging from statistical tests of interactions in ecology [3] to community ecology [4,5,6], predator–prey interactions [7], food webs [8], and microbial systems [9, 10]. The importance of higher-order interactions between taxa of microbes has become the focus of study in the microbiota. The microbiota includes the complex community of microbes—trillions in number, across thousands of taxa—that live within other, often multicellular organisms. The composition of the microbiota is now understood to play a large role in many organismal-level phenotypes—disease states, behavior, and many other phenotypes. And higher-order interactions have been measured and are known to occur between the taxa of the microbiota, which may have consequences in the construction of microbial consortia [11,12,13].

Biomedical systems have highlighted the presence of higher-order interactions between drugs used at the clinical bedside, especially those used to treat microbial infections [14,15,16]. These findings have reframed our understanding of the challenges of therapy, as drug-drug interactions can foster treatment environments that are challenging to predict from the effects of individual treatments. The future of biomedicine must, then, properly incorporate details of how drugs interact in a higher-order fashion to responsibly predict the effects of multiple drug environments. And, perhaps surprisingly, biomechanical and physiological systems also embody higher-order interactions. For example, recent studies have identified how interactions between anatomical traits may have influenced their evolution across taxa of flatfishes [17]. And, in the model organism Arabidopsis thaliana, physiological stress responses to cold vs. drought treatments result in different patterns of higher-order interaction in gene expression [18].

In sum, the biological world is full of examples that fortify the importance of higher-order interactions between actors, entities, and parcels of biological information. And biology’s many subfields utilize a wide breadth of methods to detect and quantify higher-order interactions.

However, across all the subfields of biology, genetics is an area where higher-order interactions remain most present in both theoretical and empirical research. The notion that genetic information interacts with other parcels of genetic information in crafting phenotypes is a defining feature of genotype–phenotype mapping. And it manifests in many of the most sophisticated aspects of modern genetics, including the search for the genetic underpinnings of complex traits (e.g., disease, behavior) and genetic-modification technology. But how did we end up here? Where did some of these concepts related to higher-order epistatic effects originate? While a full examination of the history of these ideas is beyond the scope of this chapter, we will offer a take, which we hope can provide some context for how higher-order effects ended up on the menu of central ideas in evolutionary and population genetics.

18.3 On Epistasis

18.3.1 History

Shortly after the rediscovery of Gregor Mendel's work on mechanisms of inheritance, biologists demonstrated that certain phenotypes appeared to violate the independence assumptions of gene interactions. Bateson et al. 1910 and Weinberg 1910 [19, 20] both found that offspring phenotypes significantly deviated from expectation, in ways that could be accounted for neither by dominance effects nor by differences in environment. In 1909, Bateson published an extensive description of what he referred to as epistasis [21], which is now the most commonly used term for gene interactions leading to deviation from independence. However, he first used the term in a 1907 correspondence to Muriel Wheldale, who was conducting breeding experiments on snapdragons in Bateson’s laboratory [22]. In Wheldale’s paper [23], she ended up using a different term to describe the effect of the expected phenotype being repressed by another site in the genome. Because Bateson 1907 was trying to describe this repression effect, he settled on the term epistasis, which comes from Greek and most closely means “standing upon” [24]. Today, epistasis is used quite broadly to mean any deviation from additive (or in some cases even multiplicative) interactions between regions in the genome.

Deviations from the expected phenotype due to genetic interaction were first rigorously demonstrated in plant and animal breeding experiments in the early 1900s. Concurrently, there was a debate raging between geneticists broadly referred to as “Mendelians” and those referred to as “Biometricians” [25]. The debate was between a continuous view of variation between individuals for the same phenotype (Biometricians) and the discrete view that was empirically demonstrable for many phenotypes in breeding experiments (Mendelians) [24]. Reminiscent of the emergence of quantum mechanics, the seeming incongruence of the continuous and discrete views of variation was resolved by R.A. Fisher in 1918, who showed analytically that the two views were entirely compatible [26]. When discussing statistical deviations from independence, however, Fisher used the term epistacy, which has fallen out of favor and is no longer used [24]. Instead, geneticists now use the term epistasis to refer to any genetic interaction that leads to non-independence (regardless of whether it is suppression, as the term was originally defined, or enhancement, and regardless of whether the effect is demonstrated using continuous or discrete models [24]).

We now know that myriad factors interact to determine an individual's trait value for any specific phenotype. These factors include genetic effects, environmental effects, and all possible interactions. Broadly, we use the term genetic architecture to represent the joint effects of all determinants of a phenotype. For some phenotypes, e.g., the smooth vs. wrinkly peas from Mendel’s experiments, there may be exclusively independent genetic effects. For others, such as schizophrenia, there will be higher-order interactions between genes, environments, and other modifiers. Nearly 100 years of genetic studies have concluded that the vast majority of genetic architectures are complex [27, 28]. Despite this complexity, animal and plant breeders are able to direct the evolutionary trajectory of numerous traits, from milk production to cold tolerance, using equations and frameworks from quantitative genetics [29].

18.3.2 A Contemporary Example: Biological Networks

Today we know that the structure of molecular interactions clearly affects evolutionary processes across biological levels of organization: from cells, to species, to populations and ecosystems. However, the precise role of molecular network evolution in the process of species formation and adaptation to novel environments, i.e. innovation, remains contentious, with prominent researchers claiming regulatory network changes are critical for macroevolutionary processes [30] and others concluding that they rarely matter [31]. These differences in opinion manifest in various settings.

For example, a recent comparative study of biological regulatory networks found that such networks exist at the edge of criticality, straddling the border of chaotic and ordered states [32]. That biological regulatory networks should exhibit the kind of dynamic stability associated with near-critical networks has been theorized as adaptive, both from the perspective of functional robustness [33] and their ability to effectively process information [34]. However, there is also empirical and theoretical evidence for the importance of change in these networks, e.g., if species must evolve to meet shifting environmental or ecological selection pressures [35]. This tradeoff between robustness and evolvability is hypothesized as an explanation for the commonality of “small-world” networks in biology [36]. Nevertheless, foundational work on self-organized criticality and 1/f noise demonstrated that dynamical systems embedded in a spatial dimension, e.g., biological regulatory networks, might naturally evolve to near-critical states [37, 38].

What then is the role of networks in evolution? For adaptation occurring along a single phenotypic axis, e.g., temperature, R.A. Fisher’s geometric model (1930) of adaptive substitutions provides a great deal of insight [39]. Simply put, the farther away a population is from its fitness optimum, the larger the expected effect size of a mutational substitution. Recently, a number of experimental evolution studies have provided empirical support for this model (e.g., Hietpas et al. 2013) [40]. However, it seems unlikely that most historical episodes of adaptation occurred along single phenotypic or genotypic axes. And, as Haldane (1957) concluded [41], for many species experiencing selection along multiple axes of variation (e.g., temperature and salinity) that, importantly, are under independent genetic control, natural selection is expected to drive them rapidly to extinction. As we now know, the genetic architectures underlying even diverse phenotypes are rarely independent: genes underlying all traits exist in complex webs termed molecular networks [42]. Therefore, the role of individual genes in the process of adaptation is affected by their position in these networks, and their very interconnectedness may explain why species are able to simultaneously adapt to multiple axes of selection.

18.4 Epistasis: Definitions and a Brief Survey of Methods

One simple dichotomy that captures many of the different uses of the term “epistasis” is the difference between statistical and physiological epistasis. Sackton and Hartl (2016) described the physiological form as “any situation in which the genotype at one locus modifies the phenotypic expression of the genotype at another” [43]. This is compatible with another very useful definition used by Weinreich et al. (2013), who offered that epistasis is “the surprise at the phenotype when mutations are combined, given the constituent mutations’ individual effects” [44]. Both are highly mechanistic definitions of epistasis and differ from the more statistical pictures that focus on the role of epistasis in genetic variance in populations. Though both forms of epistasis, physiological and statistical, refer to the same sort of phenomena, the means through which epistasis is measured differ greatly depending on which form we are discussing. In general, questions surrounding which methods one should use to study epistasis are analogous to statistical debates regarding the most defensible ways to measure nonlinear effects in complex systems. Quite often, the specific question and context dictate which methods one should use, and some methods specialize in capturing a particular feature of epistasis. For example, some methods address how noise can affect our understanding of epistasis [45, 46]. Others consider the limits of regression in detecting epistatic effects [47, 48], or propose ways to measure epistasis in incomplete data sets [49].

One method for quantifying epistasis considers the marginal effect of non-independence across large sets of mutations in genomes [50]. This test, called the “MArginal ePIstasis Test” or MAPIT, is a linear mixed modeling strategy for detecting genetic variants (e.g., single nucleotide polymorphisms, SNPs) that are involved in epistasis in genomic mapping studies. With respect to Genome-Wide Association Studies (GWAS), MAPIT estimates and tests the marginal epistatic effect, that is, the combined epistatic effect between a SNP of interest and all other SNPs in the data. By inferring the marginal epistatic effects of SNPs, MAPIT can identify variants that exhibit epistatic interactions with any other variant without the need to identify the specific combinations that drive the epistatic association. Therefore, MAPIT represents an important alternative to standard methods for measuring epistasis [50]. Somewhat intriguingly, however, this class of models may perform differentially well at reconstructing epistatic interactions depending on whether the species is outcrossing or self-fertilizing [51]. Despite the documented effectiveness of methods like MAPIT, such approaches are unable to detect epistatic interactions that are greater than second order (pairwise). However, recent applications of approximate Bayesian inference and neural networks, e.g., Biologically Annotated Neural Networks (BANNs), show promise for reconstructing higher-order interactions leading to epistasis [52].

In sum, the methods to diagnose and measure epistasis are as diverse as the settings and varied definitions of epistasis, which poses a challenge to whoever wants to tell a singular narrative about higher-order interactions in genetics. But there are smaller problem cases in genetics where the conceptualization and study of higher-order epistasis is more tractable. These can serve as useful model problems to discuss how interactions between mutations in a single genetic locus may manifest. And for these purposes, the physiological definition of epistasis, one marked by clear ways of measuring the “surprise” at the phenotypic effect of combinations of mutations, is often the simpler way of demonstrating larger implications of the phenomenon.

18.5 A Demonstration of Epistasis Operating in a Gene Encoding an Enzyme

To demonstrate how epistatic interactions for even a relatively “simple” genetic system can lead to complexity, we examine a data set corresponding to a protein that carries multiple mutations that are associated with resistance to an antibiotic (trimethoprim). The example that follows is based on real-world data, but is simplified as a model to illustrate how epistasis manifests in a biological system. We will measure how interactions between mutations are computed from values for individual variants of a protein containing a different suite of mutations. The dataset itself might be called a “fitness landscape,” whereby scientists ask questions about how evolution might be expected to occur across a small discrete portion of sequence space [53, 54].

Specifically, we utilized a suite of three mutations (P21L, A26T, and L28R; single-letter amino acid abbreviations) in bacterial dihydrofolate reductase (DHFR, an essential enzyme target of many antimicrobial drugs) constructed in combination (\(2^{3} = 8\) alleles) across 3 different genetic backgrounds [55, 56]. The three different genomic backgrounds are as follows: (i) wildtype, (ii) a GroEL+ strain, where a protein chaperone is overexpressed, and (iii) a ΔLon protease strain, where an important protease (Lon) has been deleted. Both GroEL and Lon have been demonstrated to regulate DHFR activity in bacterial cells [57].

18.5.1 Data Structure and Methods to Detect Epistasis

The data used here are a subset of those taken from a 2019 study on higher-order epistasis [56]. Importantly, the growth rate and IC50 are different, but related traits, part of a canonical tradeoff that has often been observed between growth and resistance in microbial systems. In this examination, we measure epistasis across both traits.

Because this analysis is intended to be a general examination of higher-order effects in biological systems, rather than a focused study of proteins (or a protein of a certain kind), we will rename the elements in our discussion:

  • Dihydrofolate reductase will be referred to as “an enzyme.”

  • The mutations corresponding to P21L, A26T, and L28R will be referred to with regards to their combinatorial arrangement. For example, “PAL” corresponds to the enzyme variant with amino acids Proline (P), Alanine (A), and Leucine (L) at the three loci of interest.

  • The genomic backgrounds corresponding to the wildtype, GroEL+, and ΔLon protease strains will be referred to as “environment A” (wildtype), “environment B” (GroEL+), and “environment C” (ΔLon).

  • Growth rate will be referred to as “trait 1,” and IC50 (resistance) will be referred to as “trait 2.”

We will make use of a method pioneered in theoretical computer science called the Walsh-Hadamard transform, which computes a coefficient corresponding to the magnitude and sign of an interaction between mutations, akin to an epistatic coefficient. It was introduced to the study of epistasis in a 2013 study that both provided a primer for the calculation and analyzed several combinatorially complete data sets [44]. It has since been further elaborated on and applied to the study of higher-order epistasis across a larger sampling of empirical data sets [58, 59].

The Walsh-Hadamard transform arranges the phenotypic measurements into a vector, which is then multiplied by a Hadamard matrix and subsequently scaled by a diagonal matrix. The calculation yields a set of coefficients that measure the degree to which the relationship between genetic information and phenotype is linear (first order), second order, third order, and so forth. One limitation of the Walsh-Hadamard transform is that its data must be combinatorially complete, with no more than two variants at a given locus of information. In this scenario, the loci are the three sites carrying the mutations associated with trimethoprim resistance in E. coli dihydrofolate reductase: P21L, A26T, and L28R.

The full data set for the alleles consists of a vector of phenotypic values (resistance to trimethoprim in the case of the DHFR mutants) for all possible combinations of mutations (8 in total), represented by their single amino acid substitutions:

PAL, LAL, PAR, PTL, PTR, LAR, LTL, LTR.

These can be represented in binary notation:

000, 100, 001, 010, 011, 101, 110, 111.
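The mapping from amino acid labels to binary strings can be sketched in a few lines. This is an illustrative sketch only: the function name `to_binary` and the choice to sort variants by their binary string are assumptions made here, not part of the original analysis.

```python
# Wild-type residues at the three sites of interest (P21, A26, L28).
WILD_TYPE = "PAL"

def to_binary(variant, wild_type=WILD_TYPE):
    """Encode a variant as a binary string: 0 = wild-type residue, 1 = mutated residue."""
    if len(variant) != len(wild_type):
        raise ValueError("variant and wild type must have the same number of sites")
    return "".join("0" if v == w else "1" for v, w in zip(variant, wild_type))

# Variants in the order listed in the text.
variants = ["PAL", "LAL", "PAR", "PTL", "PTR", "LAR", "LTL", "LTR"]
encoded = [to_binary(v) for v in variants]

# Assumption: sorting by binary string gives the ascending order
# (000, 001, ..., 111) that is convenient for matrix-based calculations.
ordered = sorted(variants, key=to_binary)
```

Sorting by the binary string is only a bookkeeping convenience; any fixed ordering works as long as the phenotype vector and the transform matrices use the same one.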

This vector of phenotypes can be multiplied by an (8 × 8) square matrix, which is the product of a diagonal matrix V and a Hadamard matrix H. These are defined recursively by:

$$V_{n+1} = \begin{pmatrix} \frac{1}{2}V_{n} & 0 \\ 0 & -V_{n} \end{pmatrix}, \quad V_{0} = 1$$
(18.1)
$$H_{n+1} = \begin{pmatrix} H_{n} & H_{n} \\ H_{n} & -H_{n} \end{pmatrix}, \quad H_{0} = 1$$
(18.2)

n is the number of sites that differ in this enzyme (n = 3 in this setting).

Writing the vector of phenotypic values as \(y\), the multiplication gives the following expression:

$$\gamma = V H y$$
(18.3)

In this scenario, H and V are the matrices described in Eqs. 18.1 and 18.2, and \(\gamma\) is the vector of Walsh coefficients, each of which measures the average interaction between parcels of information (mutations in this setting), here measured across environments.

A negative coefficient indicates that the average effect is negative; a positive coefficient indicates a beneficial effect on the phenotype (e.g., antibiotic resistance).

Note: while we have provided some details of the calculation above, we encourage those interested in the subtleties of the calculation to refer to several manuscripts, especially Weinreich et al. 2013 [44] and Poelwijk 2016 [58], for a more focused treatment of these methods.
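As a rough illustration of the calculation described in this section, the following sketch builds H and V from the recursions in Eqs. 18.1 and 18.2 and multiplies them with a phenotype vector. Several things here are assumptions rather than details from the original study: the function names, the requirement that phenotypes be listed in ascending binary order (000, 001, ..., 111, which matches the block structure of the recursion), and the phenotype values themselves, which are invented for illustration and are not the measured data.

```python
from itertools import product

def hadamard(n):
    """Hadamard matrix H_n built with the recursion of Eq. 18.2 (H_0 = 1)."""
    h = [[1]]
    for _ in range(n):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h

def v_diagonal(n):
    """Diagonal entries of V_n built with the recursion of Eq. 18.1 (V_0 = 1)."""
    d = [1.0]
    for _ in range(n):
        d = [0.5 * x for x in d] + [-x for x in d]
    return d

def walsh_coefficients(y):
    """Scale the Hadamard product H y by the diagonal of V.

    Assumes y is ordered by ascending binary genotype (000, 001, ..., 111).
    """
    n = len(y).bit_length() - 1
    if len(y) != 2 ** n:
        raise ValueError("need 2**n phenotypes (combinatorially complete data)")
    h = hadamard(n)
    v = v_diagonal(n)
    size = len(y)
    return [v[i] * sum(h[i][j] * y[j] for j in range(size)) for i in range(size)]

# Hypothetical phenotypes (NOT the chapter's data), ascending binary order.
labels = ["".join(bits) for bits in product("01", repeat=3)]
y = [1.0, 1.2, 0.9, 1.6, 1.1, 1.5, 0.8, 2.0]
gamma = dict(zip(labels, walsh_coefficients(y)))

# A purely additive landscape: site effects 4, 2, and 1 with no interactions.
additive = dict(zip(labels, walsh_coefficients([0, 1, 2, 3, 4, 5, 6, 7])))
```

In the additive landscape, every coefficient of second order or higher comes out as zero, which is a quick sanity check that the transform isolates interactions from independent effects.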

18.5.2 Calculating Higher-Order Interactions

The above formula can be used to calculate the strength of interactions between parcels of information, which in our example are the mutations corresponding to different amino acid substitutions in an enzyme. But what about higher-order interactions (epistasis beyond pairwise in this case)?

Previous studies have examined how higher-order epistasis manifests in adaptive landscapes that include analogously structured data sets, including other enzymes [44, 55, 59, 60]. Because our focus is on how higher-order epistasis manifests in biological systems, we will offer a means through which one can make these measurements.

For example, in a combinatorially complete data set comprising eight variants across three loci, we can describe the interactions between individual loci and genetic background using binary representation.

γ000:

the zeroth-order term, i.e., the average phenotype across all variants in the data set.

γ001:

first-order effect: the average effect of the “third site” mutation across all other genetic backgrounds.

γ010:

first-order effect: the average effect of the “second site” mutation across all other genetic backgrounds.

γ100:

first-order effect: the average effect of the “first site” mutation across all other genetic backgrounds.

γ011:

second-order (pairwise) interaction between mutations at the second and third loci.

γ101:

second-order (pairwise) interaction between mutations at the first and third loci.

γ110:

second-order (pairwise) interaction between mutations at the first and second loci.

γ111:

A third-order interaction between mutations at all three loci.

In this set, there is one zeroth order interaction, three first-order interactions, three second-order interactions, and one third-order interaction. The third-order interaction would formally qualify as “higher-order.”
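The bookkeeping above can be sketched by counting the 1s in each binary subscript: the count gives the order of the term, and the positions of the 1s give the loci involved. The variable names below are illustrative assumptions.

```python
from itertools import product
from math import comb

n = 3  # number of loci

# All binary subscripts, e.g. "000", "001", ..., "111".
labels = ["".join(bits) for bits in product("01", repeat=n)]

# Group subscripts by order (the number of 1s in the label).
by_order = {}
for label in labels:
    by_order.setdefault(label.count("1"), []).append(label)

# Loci involved in each term, numbered from 1 (first site) to n (last site).
loci = {label: [i + 1 for i, bit in enumerate(label) if bit == "1"] for label in labels}

# Number of terms at each order follows the binomial coefficients.
counts = {order: len(terms) for order, terms in by_order.items()}
```

For three loci this reproduces the 1, 3, 3, 1 pattern described in the text, and for n loci the number of terms of order k is the binomial coefficient C(n, k), which is why the number of possible higher-order terms grows so quickly with the number of sites.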

In addition, one can take the mean of these epistatic coefficients within order, which can facilitate comparisons between orders. For a given epistatic coefficient we compute an absolute mean epistatic coefficient, E, as in prior studies that have examined higher-order interactions on in silico fitness landscapes [61]:

$$E_{i} = \frac{\left| \gamma_{i} \right|}{\sum_{j} \left| \gamma_{j} \right|}$$
(18.4)

The absolute value allows us to focus on the magnitude of higher-order interactions. We label them with the term “absolute mean” since we incorporated absolute values and averages in the calculation. This provides mean values for each order, which translates to the overall contribution of, for example, 1st order effects. And we can calculate the higher-order interaction across environments, creating an abstraction called the “mutation effect reaction norm” that highlights how environments influence the effect of mutation interactions [62].
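A minimal sketch of Eq. 18.4 and the within-order averaging follows. Two choices here are assumptions rather than details confirmed by the text: whether the zeroth-order term is included in the normalizing sum (exposed as the hypothetical `include_zeroth` flag, off by default), and the coefficient values, which are invented for illustration.

```python
def absolute_mean_by_order(gamma, include_zeroth=False):
    """Normalize |gamma_i| as in Eq. 18.4, then average within each order.

    gamma maps binary labels such as "011" to Walsh coefficients.
    Returns (per-coefficient values E_i, mean of E_i for each order).
    """
    kept = {k: g for k, g in gamma.items() if include_zeroth or "1" in k}
    total = sum(abs(g) for g in kept.values())
    if total == 0:
        raise ValueError("all coefficients are zero; Eq. 18.4 is undefined")
    e = {k: abs(g) / total for k, g in kept.items()}  # Eq. 18.4
    grouped = {}
    for label, value in e.items():
        grouped.setdefault(label.count("1"), []).append(value)
    means = {order: sum(vals) / len(vals) for order, vals in grouped.items()}
    return e, means

# Hypothetical coefficients for illustration only (not results from the study).
gamma = {
    "000": 1.0,
    "001": 0.4, "010": -0.2, "100": 0.2,
    "011": 0.1, "101": -0.3, "110": 0.0,
    "111": -0.8,
}
e, mean_by_order = absolute_mean_by_order(gamma)
```

Because absolute values are used, a strongly negative third-order coefficient counts as much as a strongly positive one, which is the intent when comparing magnitudes across orders.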

Figure 18.1 is a hypercube representation of the eight mutants, arising from a combination of three different mutations (P21L, A26T, and L28R) in a model enzyme. All eight enzymes were engineered using transgenic methods and their growth and resistance phenotypes measured using experimental methods [55]. Figure 18.2 represents the phenotype data for these eight mutants across the three environmental contexts (A, B, C). From this, we can observe the existence of gene by environment (G × E) interactions, indicated by the fact that different mutants have differing slopes across environments for growth (trait 1) and resistance (trait 2).

Fig. 18.1

A hypercube representation of the combinatorial set of mutations in the enzyme target of study. Letters correspond to amino acids. In this instance, a bacterial enzyme has two amino acid variants at each of three sites. Different combinations of these mutations are associated with different values for traits associated with growth rate (trait 1) and antibiotic resistance (trait 2)

Fig. 18.2

Values for a trait 1 (growth) and b trait 2 (resistance) across environments A, B, and C

Data for traits 1 and 2 can be measured for all the alleles in this hypercube, and graphed with respect to the trait values across environments A, B, and C. This is depicted in Fig. 18.2, in an abstraction called a “reaction norm,” which is often used to detect gene by environment interactions.

From the data in Fig. 18.2, the epistatic coefficients can be computed as outlined in Eqs. 18.1–18.3. These yield calculations for the average effect of individual interactions between mutations across environments, which can also be depicted in terms of their absolute mean as outlined in Eq. 18.4.

Figure 18.3 depicts the coefficients for different interactions between mutants (Fig. 18.3a and c). For example, the mutation effect corresponding to [*1*] translates to the average effect of adding the second-site mutation (A26T) across available genetic backgrounds. Alternatively, [*11] corresponds to the average phenotypic effect of adding A26T and L28R in combination. Also in Fig. 18.3 is the absolute mean data, where we can observe the overall presence of epistatic effects across environments, organized by order.
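To make the meaning of a label such as [*1*] concrete, the sketch below averages the effect of adding one mutation over every genetic background in which it can be added. The function name and the phenotype values are assumptions made for illustration; the values are not the measured data.

```python
from itertools import product

def average_effect(phenotypes, site):
    """Mean phenotypic change from mutating `site` (0-based) over all backgrounds.

    phenotypes maps binary genotype labels (e.g. "010") to trait values.
    """
    n = len(next(iter(phenotypes)))
    diffs = []
    for bits in product("01", repeat=n):
        if bits[site] == "0":
            without = "".join(bits)
            with_mut = without[:site] + "1" + without[site + 1:]
            diffs.append(phenotypes[with_mut] - phenotypes[without])
    return sum(diffs) / len(diffs)

# Hypothetical trait values for the eight variants (illustration only).
phenotypes = {
    "000": 1.0, "001": 1.2, "010": 0.9, "011": 1.6,
    "100": 1.1, "101": 1.5, "110": 0.8, "111": 2.0,
}

# [*1*]: average effect of the second-site mutation across four backgrounds.
second_site_effect = average_effect(phenotypes, site=1)
```

In this invented example the second-site mutation lowers the trait in two backgrounds and raises it in the other two, so a small average effect can conceal large, background-dependent effects of opposite sign.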

Fig. 18.3

Epistasis across different environments. a Mutation effect reaction norm for individual mutation interactions, for trait 1 (growth). b Absolute mean mutation effects for trait 1, showing effects organized by order. c Mutation effect reaction norm for individual mutation interactions, for trait 2 (resistance). d Absolute mean mutation effects for trait 2, showing effects organized by order. The notation in the figure uses only the subscripts of γ described in Sect. 18.5.2

What do we learn about how epistasis manifests in an enzyme with a suite of mutations corresponding to different levels of growth and antibiotic resistance? Calculations of epistasis as depicted in Fig. 18.3 communicate the variability of epistatic effects as a function of environment. When there are gene by environment interactions present (as in Fig. 18.2), there are likely epistasis by environment interactions (as in Fig. 18.3).

Even for traits where we expect a clear tradeoff, e.g., growth and antibiotic resistance, the patterns of epistasis differ considerably. As a result, there exists no singular, tractable pattern for how mutations will interact. More specifically, the prominence of higher-order interactions changes across environmental contexts, something that has been observed in other contexts [56, 63, 64]. Given that no locus operates in a genome alone, we might expect the consequences of higher-order interactions to be far greater than the effects measured by single mutations. Not only do mutations interact in surprising ways within loci (as in the results of this data set), but we might also expect genes to interact with other genes in surprising ways (as in gene networks).

The ability of quantitative genetics to predict the trajectory of mean trait values for populations, coupled with our inability to predict the effect of most mutations, poses both challenges and opportunities for the future of genetics. Can we develop theories capable of making predictions relevant for engineering traits via genetic modification, e.g., CRISPR? What additional ethical considerations arise from continuing investment in genetic engineering without a general theory capable of predicting the individual effect of mutations? In the following paragraphs, we advocate for a complex network approach to genetic engineering. One where—far from trying to reduce the effects of mutations—we instead embrace the gestalt when trying to direct the phenotype of an individual through mutational engineering.

18.6 On Higher-Order Interactions and Genetic Modification

Engineering human traits using CRISPR relies on the assumption that scientists can accurately predict phenotype from genotype. Even for diseases caused by mutations in single genes, such as cystic fibrosis [65] or muscular dystrophy [66], CRISPR-induced mutations have failed to completely restore the “healthy” phenotypes in mouse models or tissue culture.

Many other phenotypes in humans display a property that geneticists term “missing heritability” [67], which means that researchers cannot identify genetic markers that account for the expected phenotypic similarity between parents and offspring. Despite sequencing thousands of human genomes, we can only predict 6% of the heritability of Type 2 diabetes, 5% for HDL cholesterol, 5% for height, and <3% for early onset myocardial infarction [68]. This “missing heritability” implies that the causal role of genetic interactions, environment, epigenetics, etc. is often as strong as simple changes in one’s DNA.

So how can we reconcile the incredible predictive ability of quantitative genetics when applied to animal/plant breeding with our inability to engineer traits using tools like CRISPR? This is where we argue that the modern study of higher-order interactions and genetics currently resides. However, the debate goes back to the early 1900s and the foundations of the field. Specifically, quantitative genetics abstracts away the individual effects of genes and environments (along with their interactions). Instead, it models them as expected effects on the variability of a trait. For example, Huang and Mackay 2016 showed that genetic architecture could not be determined from a principal component analysis of genetic variation (this is despite one’s ability to influence trait evolution by selecting on the principal components) [69]. Our modern understanding of the relationship between the individual determinants of traits and their quantitative genetics representation was reviewed by Stinchcombe and Hoekstra 2007 [70].

The continued effectiveness of quantitative genetics has led some mathematical geneticists to question the importance of epistasis. Specifically, can we understand and predict evolution without needing to know the underlying causes? Mäki-Tanila and Hill 2014 showed that non-independent interactions between genes increase the additive genetic variation at far higher rates than they contribute to deviations from additivity [71]. Indeed, they conclude that “Epistasis may be important in understanding the genetic architecture, for example, of function or human disease, but that does not imply that loci exhibiting it will contribute much genetic variance. Overall we conclude that theoretical predictions and experimental observations of low amounts of epistatic variance in outbred populations are concordant. It is not a likely source of missing heritability, for example, or major influence on predictions of rates of evolution.”

This breakdown in the predictability of how an engineered mutation will affect a phenotype was demonstrated by Guerrero et al. 2019 [56]. This study shows that a mutation's effect depends strongly on genome background, environment, and the interactions between them. As a result, even the sign (positive or negative) of the effect that a mutation will have on a phenotype may be unpredictable. The ability of models that we know are wrong to make accurate predictions is well known outside of mathematical genetics [72] and is indicative of the deep connection between all fields related to complex systems.

18.7 Closing

What are the most important questions, then, about the implications of higher-order interactions for biology? For convenience, we’ve identified three that capture a broader set of curiosities.

  1. Despite most traits being high-order, many established population genetic models assume pairwise interactions and that high-order interactions can be modeled as the summation of pairwise interactions. How can we accommodate higher-order interactions into this theory?

  2. Fundamentally, we can re-ask where (or what) is the source of missing heritability? Considering the plausibility of higher-order interactions between genetic parcels, how can we “find” this heritability in a manner that doesn’t simply relegate the problem to being unsolvable in light of combinatorial explosion?

  3. At a more biophysical and physiological scale, can we use theory from higher-order interactions in other fields to make engineering-level predictions? In the data-driven example in this chapter, a single gene encoding an enzyme and a small set of mutations, we reveal how higher-order interactions are present and context-dependent. But the methods do allow a mechanistic take on how they manifest and influence a trait of interest. Can we apply such methods to other problems?

The future of higher-order interactions in genetics encompasses these questions and many more. And more broadly, as our understanding of the biological world continues to grow in scope, we can expect the prominence of higher-order interactions to also grow. While biological information might be highly specialized, with drift and selection responsible for its arrival and dispersal, it is now surrounded by other parcels of information that all interact in surprising ways, creating a biosphere that is both more corporeal and capricious than scientists and naturalists have appreciated.