Introduction: the genotype–phenotype map

How does genetic variation map onto phenotypic variation? This question is crucial for understanding the linkage between the two processes at the core of biological evolution: the production of heritable phenotypic variation, and its sorting by natural selection. Remarkably, in spite of the major advances in biology since the time of Darwin, we still lack a systematic causal understanding of the genotype–phenotype map (Burns 1970; Alberch 1991; Pigliucci 2010).

The “genotype–phenotype map” refers to the process of biological development, in which genes causally contribute to the formation of complex and differentiated phenotypic end-states. Since evolutionary processes such as selection operate on phenotypes, causal explanations of evolutionary change require causal explanation of how genotypes get mapped onto phenotypes through development. As Horder (1989, 340) states, “[i]n order to achieve a modification of adult form, evolution must modify the embryological processes responsible for that form.” Amundson (2005) calls this idea the “causal completeness principle,” and localizes its origin within the tradition of classical embryology.

Despite the straightforward reasoning behind the causal completeness principle, many evolutionary biologists of the twentieth century thought that the lack of a causal understanding of the genotype–phenotype map did not create any significant problems for evolutionary theory (e.g., Mayr 1961; Wallace 1986). On this view, the missing understanding concerns extraneous causal detail that, while forming a necessary part of a complete causal story, can be safely abstracted away for the purposes of evolutionary (population-level) explanations (see Scholl and Pigliucci 2014). The idea that development can be safely ignored is based on the assumption that the genotype–phenotype map is linear and deterministic, or that this is at least a valid approximation. This assumption allows one to effectively collapse variation at the phenotypic level onto the genetic level, in order to describe and explain evolutionary dynamics purely in terms of genetic variation. This is the central idealization strategy of population genetics.

However, we have known for a long time that the assumption of linearity and determinism is false. The genotype–phenotype map is causally complex and non-linear. It is also degenerate in the sense that the same genotype can generate different phenotypes due to developmental plasticity (West-Eberhard 2003; Gilbert and Epel 2009; Moczek et al. 2011), while different genotypes often produce the same phenotype due to the robustness of development (Waddington 1942, 1957; Schmalhausen 1949; Kitano 2004; Wagner 2005, 2011; Masel and Siegal 2009). Taken together, these properties of the genotype–phenotype map prevent many of the explanatory goals of evolutionary biology from being realized using traditional statistical-correlational methods.

Many biologists look to evolutionary developmental biology (evo-devo) to provide the missing causal understanding of the genotype–phenotype map. This is because evo-devo is supposed to go beyond the statistical association of genes with phenotypes by investigating the developmental causes connecting genotype and phenotype. Since evo-devo explains evolutionary change as change in developmental mechanisms, it can be considered a “mechanistic” approach to evolutionary biology (Wagner et al. 2000). As Brigandt (2015, 136) writes: evo-devo “does not just lay out phylogenetic transformation sequences of morphological characters, but offers a causal explanation of how those character transformations occurred by means of changes in developmental mechanisms.” This sort of explanation is what Calcott (2009) has called a “lineage explanation” in evolutionary biology, which is complementary to traditional population-level explanations. Change in developmental mechanisms is expected not only to explain actual evolutionary sequences, but also to generate a “map of the possible” for evolution (Alberch 1991), which determines the evolutionary potential, or evolvability, of characters (Wagner and Altenberg 1996).

Over the past few decades, evo-devo has made significant progress in probing and disentangling genotype–phenotype maps in different evolutionary lineages. The major conceptual innovation behind its progress has been a shift away from the study of individual gene-trait co-variations and towards network thinking, an approach in which interactions between genes are seen as forming functional modules of gene regulatory networks (GRNs) whose structure explains how development occurs. This has led to better models of the complex regulatory architecture underlying development and developmental evolution. In addition, it has enabled novel questions and approaches in comparative evo-devo. However, even though network thinking makes valuable contributions to our understanding of the structure of the genotype, its capacity to provide mechanistic understanding of the genotype–phenotype map is suspect.

In this paper, we argue that the robust model of the genotype–phenotype map demanded by contemporary evo-devo requires us to go “beyond networks” in the direction of dynamic mechanistic explanations of development. We will show how GRN-centered explanations of development are limited by the complex degeneracy of the genotype–phenotype map. Progress on this problem requires taking into account not only network structure but also dynamics. The sections “From gene-trait atomism to gene regulatory networks” and “GRNs and ChINs” introduce the main ideas of network thinking and GRN-centered theories of development and evolution. In “GRNs do not adequately explain development”, we examine how static networks fail to provide mechanistic explanations of development. In “Dynamical models and diachronic mechanism in evo-devo”, we show how processual models based on dynamical systems theory can overcome this deficiency, and close by considering the relations between mechanism and process in the context of evo-devo theory.

From gene-trait atomism to gene regulatory networks

In order to understand what gene regulatory networks are, and how they have come to occupy the role of explaining development and its evolution, we have to take a brief look at the history leading up to the concept.

Among early evolutionary geneticists, it was common to view the association of genes and traits as an aggregate of linear covariations. Let us call this view gene-trait atomism. This view is taken to the extreme in Ronald Fisher’s (1930) infinitesimal model, which postulates that traits depend on an infinite number of genetic loci, each having an infinitesimally small effect on the trait (Orr 2000). In the limit of infinite loci with infinitesimal effects, the influence of non-linear interactions between genes vanishes completely.

Gene-trait atomism has always been in tension with phenomena that were known early on in genetics, in particular epistasis (Bateson 1909), which refers to the dependence of mutational effects on genetic background (Hansen 2013). However, prominent work by T. H. Morgan and his colleagues attenuated this tension, in part by promoting a difference-making view of genetic causation in contrast to the productivist view common among embryologists. For Morgan, to say that a Mendelian factor (gene) causes a character “does not assume that any one factor produces a particular character directly and by itself, but only that a character in one organism may differ from a character in another because the sets of factors in the two organisms have one difference” (Morgan et al. 1915, 212, emphasis added). In other words, if changing one gene correlates with a change in eye color, then we can justifiably call that gene a cause of eye color, even though “the character is the product of a number of genetic factors and of environmental conditions” (Morgan et al. 1915, 210; see Waters 2007). Morgan’s distinction between genetic and embryological causes of characters was enormously influential for separating the problem of heredity from the complexities of development, and establishing the modern concepts of genotype and phenotype (see Amundson 2005, 148 ff.).

Morgan’s distinction enabled a productive research program in genetics, but it had a number of important limitations. One major problem is that Morgan’s methods could only detect the genes underlying traits for which there is non-lethal variation in the population (e.g., eye color and bristle number). Another major limitation was the lack of an explanation for spatial differentiation in development. This issue was only resolved much later with the operon model of gene regulation (Jacob and Monod 1959), which showed how structural (effector) genes could be turned on and off by proteins encoded by regulatory genes in response to signals from the environment.

Although the operon model was designed for a physiological response network in a bacterium, which is characterized by fast and reversible processes, a central and general explanatory role for gene regulation was immediately postulated (Monod and Jacob 1961), and soon explicitly applied to the slower, irreversible processes of eukaryotic multicellular development and differentiation (Morange 2014). This forms the basis for the concept of gene regulatory networks (GRNs) that govern the dynamics of embryogenesis (e.g., Britten and Davidson 1969). In parallel, researchers formulated the hypothesis that regulatory differences in gene–gene interactions could account for most of the phenotypic differences between evolutionary lineages (e.g., King and Wilson 1975; Davidson and Erwin 2006).

GRNs are composed of regulatory genes, which encode transcription factors and signaling proteins, and the target genes they regulate through cis-regulatory elements in the genome (Fig. 1) (e.g., Davidson et al. 2002). Signaling proteins form cascades that transduce signals from the cellular environment to affect gene expression in the nuclei of receiving cells. This is achieved through transcription factors that bind to DNA and typically either activate or repress the expression of their target genes. Cis-regulatory elements are the non-coding regions of DNA that transcription factors bind to, lying upstream, downstream, or sometimes even within the RNA-producing sequence of the regulated genes. These components can be visually depicted in a network diagram, where nodes represent genes (and their protein products) with associated cis-regulatory elements, and edges represent regulatory interactions such as activation or repression (see Fig. 1).

Fig. 1

Gene regulatory networks consist of genes that encode transcription factors (circles) and their mutual interactions, which can be activating (arrows) or repressive (T-bars, in the abstract depiction on the left). Their activity is modulated by external inputs from the intra- or extracellular environment. Each interaction between a regulator and its target is mediated by transcription factors binding to cis-regulatory sequences (shown on the right) thereby inducing or inhibiting expression of the target
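The network diagram described above can be written down as a signed directed graph. The following is a minimal sketch under stated assumptions: the three genes `A`, `B`, and `C` and their interactions are hypothetical and chosen only for illustration, and the helper names `regulators_of` and `genes` are not taken from any GRN software. Note that such a representation records only which gene regulates which, and with what sign; it contains no rates, interaction strengths, or time.

```python
# Hypothetical three-gene network; gene names and edges are illustrative only.
ACTIVATION = +1
REPRESSION = -1

# Each edge maps (regulator, target) to the sign of the interaction.
grn = {
    ("A", "B"): ACTIVATION,
    ("B", "C"): ACTIVATION,
    ("C", "A"): REPRESSION,
}

def regulators_of(network, target):
    """Return {regulator: sign} for every edge that ends at `target`."""
    return {src: sign for (src, dst), sign in network.items() if dst == target}

def genes(network):
    """Return the sorted list of genes appearing as regulator or target."""
    return sorted({gene for edge in network for gene in edge})

print(genes(grn))
print(regulators_of(grn, "A"))
```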

Despite the advent of GRNs in the 1960s, the implicit assumption of gene-trait atomism survived for decades in the fields of developmental genetics and evo-devo. It was at the heart of much of the work on homeotic genes that boosted the rise of both of those fields in the 1980s and 1990s (e.g., Carroll 1995; Holland 1999). In the early 2000s, a renewed focus on GRNs led to a decisive shift from this gene-trait atomism toward the more holistic “extended genotype” of gene–gene interactions (compare, for example, Carroll 1995 and Carroll 2008). As a consequence, some proponents of evo-devo began to claim that changes in cis-regulatory elements are more important for phenotypic evolution than changes in the protein-coding sequences of the genes themselves (e.g., Wray 2007; Stern and Orgogozo 2008; Carroll 2008).

As the explanatory role of GRNs expanded over the past decades, Morgan’s organizing distinction between evolutionary and developmental genetic causes came to be reconfigured. Morgan’s distinction was made necessary by the fact that the mechanistic causal linkage between genes and characters remained obscure. Once the focus shifted towards the role of gene regulatory activity in development, genes could be re-conceptualized not only as causes of heredity but also, through their contextualization in GRNs, as causes of development. Organized into regulatory networks, they produce phenotypic characters by means of molecular mechanisms. The new theory of developmental gene regulation has thus been seen as offering an integral theory of development and its evolution (Wagner et al. 2000). As Davidson (2010, 918) puts it: “Evolution and development emerge as twin outputs of the same mechanistic domain of regulatory system genomics.”

In the next section, we examine the main features of two exemplary bodies of work, to understand the explanatory role of GRNs in current evo-devo theory: the research program of Eric Davidson and colleagues, as well as Günter Wagner’s genetic model of homology.

GRNs and ChINs

Over the last 50 years, Eric Davidson and his research group at Caltech have carried out one of the most determined and influential efforts to characterize a developmental GRN. Davidson and colleagues rely on engineering metaphors in which a network is pictured as a “wiring diagram” composed of “circuits” and “sub-circuits,” with certain connections “hard-wired” into the genome. In their best-known classification scheme (Davidson and Erwin 2006), GRN components are arranged into the following categories: conserved kernels and more taxon- and tissue-specific plug-ins, with input/output devices (or switches) connecting them, which ultimately activate different sets of downstream differentiation gene batteries. Network components that are downstream in the regulatory hierarchy are expected to be more evolutionarily labile.

The most important aspect of Davidson’s theory in the present context is its claim that GRNs suffice to explain all of development. This idea can be found in various forms throughout Davidson’s recent body of work. For example: “Once it includes all or almost all specifically expressed regulatory genes, a GRN constitutes an explanation of why the events of development occur” (Oliveri et al. 2008, 5961); “The spatial causes of developmental events after the earliest stages of dependence on egg cytoarchitecture are essentially all programmed in the genomic control system” (Oliveri et al. 2008, 5961); “Development and evolution of the body plan, and execution of physiological responses, devolve causally from the regulatory genome” (Davidson 2010, 918). In essence, GRNs implement the genetic program of development.

Such claims would seem to be prima facie overly ambitious in light of the known degeneracy of the genotype–phenotype map. What is new about this form of genetic determinism, however, is that we no longer treat genotypes as sets of individual genes with dissociable effects on phenotypes, but as genetic networks that are organized in such a way as to control specific, modular developmental processes. The explanatory role of the genotype is shifted to GRNs at a higher level of organization than individual genes. In Davidson’s theory, the nature of a particular gene is less important than its position and connections to other genes in the network. Thus, the claim of sufficiency for genetic explanations of development is not exactly a claim about the genotype–phenotype map per se, but about a “GRN-phenotype” map.

For this sufficiency claim to be tenable, it is essential that the behavior or functioning of a network can be inferred from its regulatory structure. As Davidson states explicitly: “[a] given sub-circuit structure implies a given function […] what the circuit can do depends directly on its structure” (Davidson 2010, 911); or “[s]everal types of network subcircuits have been identified so far, each associated with specific regulatory functions” (Peter and Davidson 2017, 5862). This is what we contest, and what makes it necessary to supplement research on the structure of GRNs with dynamical modeling of network behavior.

Before proceeding to our argument, we introduce another theory of developmental evolution that heavily relies on GRNs: Günter Wagner’s genetic theory of homology (2007; 2014). Wagner’s framework differs from Davidson’s in that, rather than attempting to construct a genetic theory of metazoan development and evolution, the goal is to develop an account of homology for morphological characters. Homology has many crucial roles in evolutionary biology and evo-devo in particular (see DiFrisco 2019). While there is widespread agreement that homology is something like sameness or similarity due to common descent (DiFrisco 2019; Hall 1994), a more sophisticated approach is needed for determining which morphological characters are homologous with which. Wagner (2014) argues that we cannot successfully individuate character homologies simply in terms of morphological resemblance, physical distinctness, or embryological origin. Characters must have evolutionary “individuality”—they must behave as variational modules in evolutionary processes. In order for a character that we demarcate to possess this sort of individuality, that character must be underwritten by an appropriately modular genotype–phenotype map (Wagner and Altenberg 1996; Jaeger and Monk 2019). Invoking Riedl (1978), Wagner (2014, 44) formulates this requirement as follows:

The individuality of body parts, required for homology to make biological sense, requires specific genetic and developmental mechanisms to cause the distinctness of the body part during the life of an individual and continuity of distinctness in the course of evolution.

As Wagner (2014, 90) points out, the trouble is that the required sort of underlying genetic and developmental mechanisms might not exist: “some level of variation in the developmental mechanisms of homologous characters is the rule rather than the exception.”

The solution Wagner proposes for this problem involves distinguishing between character identity and character state. Characters are modular body parts such as forelimbs or hearts. Their identity is defined by their position and other structural factors (Wagner 2014, 54). Character states are the more determinate properties that differ across different instances of a character, such as its size, shape, and color (Wagner 2014). Wagner maintains that it is character identity that provides the right sort of individuality and evolutionary stability to ground morphological homology (Wagner 2014, 80).

With this qualification in place, Wagner introduces the central postulate of his genetic theory of homology: “The distinction between character identity and character states […] is reflected in the genetic architecture of development in which character identity has a different genetic substrate than character states” (Wagner 2014, 94). The genetic substrate that uniquely attaches to a character’s identity is a subset of the GRN that generates it, which he calls a Character Identity Network (ChIN). ChINs are the phylogenetically conservative parts of regulatory networks that give characters their modularity and evolutionary stability, and that must be modified in order for evolutionary novelties to appear.

The basic mode of character specification in Wagner’s model is depicted schematically in Fig. 2. During embryogenesis, inductive signals or initial patterning cascades provide positional information to a specific region of the embryo. ChINs then “interpret the positional information signals and activate position-specific developmental programs” (Wagner 2014, 97), thus translating continuous positional information into discontinuous, individualized characters. Finally, “realizer genes” controlled by the ChIN produce specific character states, which vary widely across species sharing the same characters.

Fig. 2

Character Identity Networks (ChINs) determine the identity, but not the state, of a character. They receive inputs from general positional information signals, and induce the expression of specific sets of realizer genes. In the example depicted here, the forewing and hindwing are character identities common to all flying insects, but they exist in different states in different sub-groups. In Drosophila the forewing is a flying wing and the hindwing a haltere, which is used for balance during flight. By contrast, in Tribolium the forewing is a hardened cuticular structure (the elytron) that covers the hindwing, which is used for flying

Wagner is careful not to elevate ChINs to the status of a strict definition that would be conceptually connected to character homology. Instead, the ChIN concept is meant to function as an “idealized image,” or model, which may admit of exceptions but which is nonetheless intended to be useful for orienting research in evo-devo (Wagner 2014, 118). His characterization of ChINs includes the following main features. ChINs are historically coextensive with the characters they specify, thus grounding relations of homology (Wagner 2014, 118). Members of ChINs sustain each other’s expression and jointly repress the development of alternative character identities (Wagner 2014, 117). Finally, genes in a ChIN are jointly necessary and often also individually sufficient to trigger the differentiation of a character (Wagner 2014, 118). These features highlight that ChIN components are not simply arrayed in a linear chain of causation (recall Morgan’s genes), but “form a functional unit in which developmental causality is realized at the level of the network rather than at the level of the single gene” (Wagner 2014, 117).

In ascribing to GRNs the primary causal responsibility in the development of phenotypic characters, Wagner’s genetic theory of homology is quite close to Davidson’s framework. There are, of course, important differences between the two besides their different theoretical aims. For example, Wagner’s framework is not attached to genetic determinism as a general thesis about development. But both frameworks are similar enough in ascribing a privileged causal role to GRNs that they face some of the same difficulties and limitations, which we explore in the following section.

GRNs do not adequately explain development

There are three main problems with GRNs as explanations of developmental processes and morphological characters. The first is the problem of genetic determinism, the second the problem of correspondence, and the third the problem of diachronicity. The latter two problems are closely related to each other through the degeneracy of the genotype–phenotype map (see Introduction), and the fact that network structure only loosely correlates with network dynamics and function.

The problem of genetic determinism

In Davidson’s work, the claim that GRNs provide a sufficient explanation for development is supported by the following type of argument. There is a special resemblance between parents and offspring: “frogs beget frogs and dogs beget dogs, and never does one sort of animal produce an embryo that develops into another” (Peter and Davidson 2015, 2). This resemblance cannot be explained by the environment or by “magic hormones” (Peter and Davidson 2015, 2), but requires a heritable genomic program containing the instructions to build organisms of a certain type. “Such programs must exist; they must be identically replicated, hence genomic; and they must suffice to control the nature of developmental events independently and similarly in each organism” (Peter and Davidson 2015, 2).

The causal sufficiency of the genetic program hinges on two premises, the first being that the genome is the sole source of heritability in organisms, the second that the genomic program contains and processes all the instructions required to construct a phenotype. Both premises are highly problematic.

First, different forms of non-genetic inheritance are well-established phenomena by now (for recent reviews, see Danchin et al. 2011; Kronholm 2017; Bonduriansky and Day 2018). Moreover, the genome does not simply copy itself from generation to generation. Genetic transmission requires a continuity of cell state as well as organismic integrity and activity (Griesemer 2000, 2006; Jaeger et al. 2012; Walsh 2015). This continuity is essential for the maintenance, replication, and ordered segregation of the genome among offspring. The genome depends on processes that are, in turn, dependent on the genome. For this reason, it is more accurate to say that the genome, the organization of the cell, and concurrent regulatory dynamics are all propagated across generations (Jaeger et al. 2012).

Second, regulatory processes occur at all levels of organization, not only at the level of GRNs (cf. Britten and Davidson 1969, 349), and so we should not think that the complete “instructions” for developmental construction lie in a genetic program. The idea that the genome is a program is a metaphor, but its metaphorical status is rarely acknowledged. The program metaphor has become reified, its existence inferred from the robust reproducibility of development (Nijhout 1990). This sort of inference—a type of inference to the best explanation—is only warranted if there are no alternative explanations, or if the alternative explanations are evidently inferior. That is not the case here: there are many different ways to generate reproducible behavior. For example, attractors of dynamical systems provide a powerful alternative explanation that is just as consistent with the reproducibility of developmental outcomes as a genetic program (Thom 1976; Goodwin 1982; Oster and Alberch 1982; Webster and Goodwin 1996; Jaeger et al. 2012; Jaeger and Monk 2014; Green et al. 2015). Moreover, it does not require questionable assumptions native to the program metaphor that are difficult to map onto biological reality, such as algorithmic sets of instructions, or a hardware-software distinction.

The program metaphor quickly shows its limits in the context of biological systems because the instructions of the program and the substrate it is running on are one and the same thing. The components of a GRN (transcription factor proteins and cis-regulatory sequences) are produced by the organism, which is in turn generated by GRNs. A self-referential dynamic system of this kind is very different from what is normally understood as a program, which consists of pre-coded and pre-scheduled algorithmic sequences of instructions. In contrast, the structure of developmental regulatory systems is constantly modified during development through inductive signaling events and environmental cues (Jaeger 2019). If there were instructions to be discovered, they would be continually rewriting themselves.

In summary, the genome is not the only source of organismic heritability, genes and their interactions are not the only major causes of development, and the notion of a genetic program today has limited metaphorical use at best, and is potentially very misleading. Davidson’s view, despite being focused on network connections at the systems level, amounts to a strong form of reductionist preformationism, and in this respect is no different from the classical genetic determinism that preceded it. Because cellular and environmental context is crucial for both genetic inheritance and gene expression, genetic determinism is untenable. GRNs may be necessary, but they are not sufficient to explain organismic development.

Here one could object that, even if the GRN is not causally sufficient for explaining development, it contains all the most important difference-making causes of development. So, even if factors like cell state, cell environment, and dynamics would need to be part of an ideally complete causal story of how development occurs, explanations of development can safely abstract from these factors without much loss of explanatory power or specificity (cf. Waters 2007). This could be a legitimate objection if there were one-to-one correspondences between network structure, cellular dynamics, and phenotypic outcomes. However, these correspondences frequently do not obtain in real biological systems, as we now show.

The problem of correspondence

The problem of correspondence affects the ChIN model of homology most directly, but also undermines genetic determinism interpreted as an abstraction strategy. The problem is succinctly stated by von Dassow and Munro (1999, 315): “there is no a priori reason to believe that the same instantiation of a developmental mechanism underlies a conserved developmental process in even closely related organisms.” Phenotypic evolution and the evolution of GRNs are to a large extent dissociable: evolutionary changes at both levels show a marked degree of independence from each other. This is because developmental processes and phenotypic outcomes are underdetermined by the composition of their trait-generating mechanism. Equivalent dynamics and homologous morphological traits can be generated by a wide variety of regulatory mechanisms. Thus, there is no guarantee that regulatory mechanisms in different individuals or lineages resemble each other even if the resulting character is strongly conserved (von Dassow and Munro 1999). This diversity of mechanisms is due to “network drift” or “developmental system drift,” caused by mutations and polymorphisms in regulatory network interactions that do not affect the robust dynamics or outcome of a developmental process (True and Haag 2001; Haag and True 2018).

The converse of the correspondence assumption is the expectation that different (non-homologous) traits are generated by distinct networks. This expectation does not hold up either: the genotype–phenotype (or GRN-phenotype) map is degenerate in both directions. Not only can many networks generate the same (homologous) phenotype, but the same GRN can produce different phenotypes depending on environmental and organismic context. More specifically, almost any given network is able to generate some range of distinct dynamic behaviors (its dynamic repertoire), depending on the precise parameter values of the system (such as regulatory interaction strengths, production rates, or the stability of network components), and its regulatory and environmental context (given by its initial and boundary conditions) (Fig. 3; Jaeger et al. 2012; Jaeger and Monk 2019, and references therein). Hence, one and the same GRN can produce different phenotypic outcomes in different situations. The subsystems driving these distinct behaviors usually overlap in the sense that they are not cleanly separable in terms of modular network structure (Jiménez et al. 2017; Jaeger 2018; Verd et al. 2019; Jaeger and Monk 2019). Typically, there is no unique and exclusive set of genes and interactions that defines a specific behavior. Instead, network components tend to contribute (in different ways) to different behaviors under different circumstances. As a consequence, we will frequently fail to find specific structural differences between networks underlying distinct developmental and morphological traits.

Fig. 3

Dynamical repertoires of simple gene regulatory (sub)circuits. The structure of two closely related circuits is shown to the left; their dynamical repertoire (consisting of the indicated dynamical regimes) on the right. Top: the repressilator (Elowitz and Leibler 2000), bottom: the AC/DC circuit (Panovska-Griffiths et al. 2013). The specific regime a circuit implements depends on parameter values (e.g., strengths of regulatory interactions) and boundary conditions
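The dependence of a circuit's regime on parameter values can be illustrated numerically. The sketch below is not the model used in the cited studies: it assumes a simplified, dimensionless, protein-only ring of three mutually repressing genes with Hill-type repression, integrated with a plain Euler scheme, and the names `simulate_repressilator`, `late_amplitude`, `alpha`, and `hill` are hypothetical. Under these assumptions, the same wiring settles to a steady state when the maximal production rate `alpha` is small and oscillates when it is large.

```python
def simulate_repressilator(alpha, hill=3.0, t_end=200.0, dt=0.01):
    """Euler-integrate dx_i/dt = alpha / (1 + x_prev**hill) - x_i for a 3-gene ring.

    Returns the trajectory of gene 0 as a list of concentrations.
    """
    x = [1.0, 1.5, 2.0]  # asymmetric start, off the symmetric diagonal
    trace = []
    for _ in range(int(t_end / dt)):
        # Gene i is repressed by gene i-1 (index -1 wraps around to gene 2).
        dx = [alpha / (1.0 + x[i - 1] ** hill) - x[i] for i in range(3)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trace.append(x[0])
    return trace

def late_amplitude(trace):
    """Peak-to-trough range over the second half of the run (transient discarded)."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)

# Same network structure, two different parameter values.
weak = late_amplitude(simulate_repressilator(alpha=2.0))
strong = late_amplitude(simulate_repressilator(alpha=10.0))
print(f"alpha=2:  late amplitude {weak:.4f}")
print(f"alpha=10: late amplitude {strong:.4f}")
```

A late amplitude close to zero indicates convergence to a steady state, whereas a clearly non-zero value indicates sustained oscillation. The graph of the circuit is identical in both runs; only one number differs.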

Another aspect of the correspondence problem concerns which genes and interactions to include in a mechanism and which ones to leave out. Many GRNs exhibit robust dynamics, either due to redundancy in subsystems, or compensatory regulatory capacities of structurally unrelated subsystems (functional multiplexing, Wimsatt 2007; or distributed robustness, Wagner 2008). It is not evident how to assign functions to components of such networks, since perturbation often has no consequences. Other perturbation effects depend heavily on the intra-organismic, genetic, and environmental context of the network. There are many ambiguities and context-dependencies in delimiting the boundaries of a GRN, and so it is often unclear how to establish “the” correspondence base of a specific developmental phenomenon.

The evolutionary dissociability of GRNs and homologous phenotypes blocks the construction of lineage explanations that generalize across species. One way of addressing the problem would be to abandon homology as the criterion of phenotypic sameness, and instead to simply individuate traits in terms of their underlying GRNs. In this approach, traits that have undergone developmental system drift would no longer count as the same trait. However, even assuming that this strategy could be successfully implemented, abandoning homology in order to re-establish correspondence across levels would create other problems for a mechanistic theory of developmental evolution. The homology constraint is necessary to preserve the connection between developmental mechanisms and actual trait phylogenies, which is what is targeted by lineage explanations in evo-devo in the first place. A similar concern pertains to the way in which mechanisms are classified as the same or different in assessing correspondence. Later, in “Dynamical models and diachronic mechanism in evo-devo” section, we explore how dynamic approaches to developmental mechanism attenuate but do not completely resolve the problem. Non-correspondence between developmental mechanisms and phenotypic traits remains as an ultimate obstacle to the mechanistic research agenda of evo-devo.

The problem of diachronicity

The third issue with GRN-based explanations is conceptual: the problem of diachronicity. GRNs defined by genetic and molecular experimental approaches consist of sets of genes and qualitative interactions represented by network graphs (see footnote 3). A graph is a static depiction (a “snapshot”) of the regulatory mechanism implemented by the GRN. To explain a time-extended chain of events, however, a causal mechanism cannot be defined solely by structural properties at a time, but must also include time-extended elements. It must account for the way in which the system proceeds from its initial to its final state. Static graph representations do not do this. The regulatory structure of a network constrains, but does not specify, what the network does. In fact, the connection between structure and dynamics is often rather loose. As mentioned earlier, even the simplest sub-circuits of a complex network can exhibit a more or less extended repertoire of different possible dynamical behaviors (see Fig. 3). It is therefore incorrect to claim that network sub-circuits “imply” specific behaviors or regulatory functions (Peter and Davidson 2017), no matter how simple those sub-circuits are. Network dynamics crucially depend on the strength of regulatory interactions, and the initial and boundary conditions of the system, i.e., the context a sub-circuit is embedded in. Adding or removing a single regulatory interaction can often radically change the dynamical repertoire of a sub-circuit.

Even if parameters and boundary conditions can be determined accurately by experimental means, mental simulation of the behavior of a GRN is rarely possible. As soon as more than two or three simultaneous non-linear interactions are involved in a regulatory process, it quickly becomes impossible to infer system dynamics from the graph of a network alone. A mechanistic understanding of more complex systems must therefore rely on computational models of network dynamics to get any traction at all on the system’s behavior.
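A toy computation shows why simulation is needed even for very small circuits. The sketch below assumes a generic two-gene mutual-repression circuit with Hill-type kinetics. The function `final_state`, its parameters `a` and `n`, and the chosen values are hypothetical and are not drawn from the literature discussed here. It compares the intact circuit with a variant in which a single repressive interaction has been removed.

```python
def final_state(x0, y0, x_represses_y=True, y_represses_x=True,
                a=4.0, n=2, dt=0.01, t_end=50.0):
    """Euler-integrate two genes X and Y and return their late-time levels.

    Each repressive interaction can be switched off to mimic its removal
    from the network graph; an unrepressed gene is produced at rate `a`.
    """
    x, y = x0, y0
    for _ in range(int(t_end / dt)):
        dx = (a / (1.0 + y ** n) if y_represses_x else a) - x
        dy = (a / (1.0 + x ** n) if x_represses_y else a) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Intact circuit: the outcome depends on the initial conditions.
print(final_state(2.0, 0.5))
print(final_state(0.5, 2.0))

# One interaction removed: both initial conditions lead to the same state.
print(final_state(2.0, 0.5, y_represses_x=False))
print(final_state(0.5, 2.0, y_represses_x=False))
```

In this sketch the intact circuit settles into one of two alternative states depending on where it starts, while the circuit lacking one interaction has a single outcome. The difference concerns the repertoire of possible behaviors, and it is not something the two graphs display on their face.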

Dynamical models and diachronic mechanism in evo-devo

It is a central concern of evo-devo to produce a causal account of the developmental processes that constitute the genotype–phenotype map (see Introduction). We have shown in the previous sections that static depictions of GRNs frequently fail to provide the right kind of causal-mechanistic explanation. What is missing is a way to understand how a developmental process, in its particular intra- and extra-cellular context, unfolds through time from its initial to its final state: mechanisms sufficient to explain what the system does.

Philosophical analysis of mechanistic explanation in biology has tended to focus on decomposition, the identification of system parts, and functional localization, the attribution of specific operations or activities to individual components (Bechtel and Abrahamsen 2005). These activities lie at the heart of what counts as an explanation of development, as their orchestrated operation generates the phenotypic outcome of a regulatory process (Bechtel and Abrahamsen 2010; Bechtel 2011, 2012; Brigandt 2015). In practice, mechanistic localization of activities largely relies on methods from genetics and molecular biology, which involve perturbing specific components of a system and then inferring their activity by interpreting the effects caused by the perturbation. These kinds of methods have an important limitation: they can only identify components and activities that are necessary for a given developmental process to occur, but they cannot show that the postulated mechanism is also sufficient to produce the observed phenomenon.

The reason for this shortcoming is that the structure of developmental systems is complex, typically consisting of more than two or three regulatory factors and their non-linear interactions. Non-linear systems above a very basic level of complexity cannot be decomposed into parts that can be studied separately and recomposed additively to yield a faithful representation of system behavior. It is because of this insufficiency of experimental decomposition and localization that the category of “dynamic system” or, more generally, “process” acquires a special status, as it cannot simply be collapsed down to activities of mechanistic components considered separately. Currently, the best way we have of capturing the dynamics of complex systems requires the use of computational and mathematical models (Bechtel and Abrahamsen 2005, 2010; Bechtel 2011, 2012; Brigandt 2013, 2015).
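The failure of additive recomposition can be made concrete with a small numerical sketch. It assumes a hypothetical target gene that responds sigmoidally to the summed input of two activators, A and B. The function `target_expression` and the values of `k` and `n` are illustrative assumptions only. The sketch compares the effects of two single "knock-outs" with the effect of the double knock-out.

```python
def target_expression(a, b, k=0.5, n=4):
    """Sigmoid (Hill-type) response of a target gene to the summed input a + b."""
    s = a + b
    return s ** n / (k ** n + s ** n)


wild_type = target_expression(1.0, 1.0)
effect_a = target_expression(0.0, 1.0) - wild_type   # remove activator A only
effect_b = target_expression(1.0, 0.0) - wild_type   # remove activator B only
effect_ab = target_expression(0.0, 0.0) - wild_type  # remove both activators

print(f"sum of single effects:    {effect_a + effect_b:.3f}")
print(f"effect of double removal: {effect_ab:.3f}")
```

With these assumed values, each single perturbation changes the output only slightly, while removing both activators abolishes it almost entirely. Adding up the effects of components studied one at a time therefore does not recover the behavior of the system as a whole.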

Although they are an essential aspect of dynamic mechanistic explanation, these models do not themselves have to be mechanistic in the stereotypical sense of being constructed bottom-up from basic biomolecular components and measured biophysical parameters representing their activities. A model of a mechanism is deemed explanatory as long as it accurately captures the relevant aspects of its operation, that is, the causal relations between components at an appropriate level of description (Bechtel and Abrahamsen 2005; Brigandt 2013, 2015). This implies that coarse-grained phenomenological models, or models fitted to data, can support mechanistic explanation if they help interpret the systems-level behavior of the mechanism under study (Jaeger and Crombach 2012; Jaeger et al. 2012; Jaeger and Monk 2014; Jaeger and Sharpe 2014; Green et al. 2015). Although such models may not provide mechanistic explanations by themselves, they become an integral part of mechanistic explanation for complex non-linear networks by enabling the recomposition or reconstitution of the overall operation of the system from its decomposed components and localized functions (Bechtel and Abrahamsen 2010; Bechtel 2011, 2012; Brigandt 2013, 2015).

As an example of dynamic mechanistic explanation, let us consider the developmental and evolutionary dynamics of the gap gene network in dipteran insects (flies and midges) (Fig. 4) (see Jaeger 2018, for a recent review). This GRN is involved in pattern formation and determination of body segments during early embryogenesis (Jaeger 2011). It was functionally decomposed in the vinegar fly Drosophila melanogaster through genetic and molecular perturbation assays yielding a complete set of necessary components (transcription-factor encoding genes) and their individual regulatory connections (Nüsslein-Volhard and Wieschaus 1980; Nüsslein-Volhard et al. 1987; Akam 1987; Ingham 1988). This corresponds to a static GRN-type explanation as discussed in the previous sections.

Fig. 4

The gap gene network of dipteran insects (flies). a During early development, the blastoderm-stage embryo (shown as oval shapes, with the anterior to the left) gets subdivided into distinct territories of differential gene expression by the segmentation gene network. This network has a hierarchical structure, with protein gradients encoded by maternal co-ordinate genes at the top (see graph). They provide the regulatory input to the zygotic gap gene system, which forms the top-most zygotic layer of the network. Gap and maternal co-ordinate genes then activate the pair-rule genes in a periodic pattern of seven stripes. Finally, a frequency-doubling event occurs and segment-polarity genes become expressed in 14 stripes, which form a molecular pre-pattern for the body segments that form later in development. Arrows indicate interactions between hierarchical layers and cross-repression within each layer. b Regulatory structure of the gap gene system. The position of gap gene expression domains is indicated as boxes along the antero-posterior axis. Cross-repressive interactions among gap genes are shown as T-bars. The dashed line indicates a bifurcation boundary between stationary and shifting domains. c Subcircuits of the gap gene system active in the anterior (AC/DC1), center (AC/DC2), and the posterior (AC/DC3) of the embryo. Flashes indicate which subcircuit is in a critical state in different dipteran species (Drosophila melanogaster and Megaselia abdita). Gap genes: hunchback (hb), Krüppel (Kr), knirps (kni), and giant (gt). See text and Jaeger (2018) for further details

However, the sufficiency of components and activities to account for the overall dynamics of the system was only established much later, through a modeling effort that reverse-engineered the gap gene network by fitting dynamical computational models to quantitative gene expression data (Jaeger et al. 2004a, b; see also Jaeger and Crombach 2012; Green et al. 2015; Jaeger 2018). The resulting models showed that although the core structure of interactions among gap genes remains the same throughout the relevant developmental stage, inputs to the system from maternal morphogen gradients change over time, rendering the system time-variable (Verd et al. 2017). Analysis of these models revealed switch-like and oscillatory regulatory dynamics in the anterior versus the posterior region of the embryo (Manu et al. 2009; Verd et al. 2018) and mapped those different behaviors back onto specific subsystems (dynamical modules) of the network (Jaeger 2018; Verd et al. 2019; Jaeger and Monk 2019). Surprisingly, it turns out that all of these subsystems share a common regulatory structure despite being composed of different (yet overlapping) sets of components (gap genes), and despite producing qualitatively different behavior depending on their spatial and network context (Fig. 4).
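The general logic of reverse-engineering by model fitting can be sketched in a few lines. What follows is a deliberately minimal illustration and not the procedure used in the cited gap gene studies. It assumes a single gene product with constant production and linear decay, synthetic "data" generated by the same model, and a brute-force grid search. All names (`simulate`, `fit`, `production`, `decay`) are hypothetical.

```python
def simulate(production, decay, times, dt=0.01):
    """Euler solution of dx/dt = production - decay * x, sampled at `times`."""
    x, t, samples = 0.0, 0.0, []
    for target in times:
        while t < target - 1e-9:
            x += dt * (production - decay * x)
            t += dt
        samples.append(x)
    return samples


def fit(observed, times):
    """Grid search for the parameter pair that best reproduces the data."""
    best = None
    for i in range(2, 17):        # production rates 0.5 .. 4.0
        for j in range(1, 11):    # decay rates 0.1 .. 1.0
            p, d = 0.25 * i, 0.1 * j
            predicted = simulate(p, d, times)
            error = sum((m - o) ** 2 for m, o in zip(predicted, observed))
            if best is None or error < best[0]:
                best = (error, p, d)
    return best


times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed = simulate(2.0, 0.5, times)  # synthetic stand-in for measured expression
error, production, decay = fit(observed, times)
print(production, decay)
```

The point of the sketch is only that parameters are inferred from time-resolved expression data by requiring the model to reproduce the observed dynamics. A realistic application involves many interacting genes, spatial structure, noisy measurements, and far more sophisticated optimization.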

One of the idealization strategies used here is to formulate the model at the level of dynamical behaviors, without direct reference to (molecular) model components and their interactions. Instead, model dynamics are characterized by the geometry of configuration space or, more precisely, the type and arrangement of the system’s trajectories and attractors (Jaeger and Crombach 2012; Jaeger and Monk 2014; Jaeger 2018). The connection between this abstract level of analysis and specific components and activities within the network is far from trivial, and needs to be carefully established (Verd et al. 2019). In this case, the mechanistic nature of the explanation arises from a tight integration of experimental and modeling approaches—not from analysis of the model alone, nor from experimental perturbations aimed at identifying the relevant components.

This type of approach not only sheds light on the developmental dynamics of the gap gene network, but also on its evolution. A comparative analysis between D. melanogaster and another dipteran species, the scuttle fly Megaselia abdita, reveals a plausible (‘how-possibly’) scenario for the evolutionary trajectory of the system (Fig. 4) (Wotton et al. 2015; Crombach et al. 2016; Verd et al. 2019). Comparison reveals that remarkably small and localized changes in the strength of regulatory interactions (the activities of specific transcriptional regulators) can account for the observed qualitative differences in gene expression dynamics between the two species. These changes are required to compensate for differences in the input into the gap gene system from upstream maternal gradients, such that the resulting output of the system is equivalent in both flies (Fig. 4). This type of compensatory evolution affecting the strength of regulatory interactions in a network is called quantitative developmental system drift (Wotton et al. 2015; Crombach et al. 2016), and is probably a very widespread mode of network evolution.

Finally, differences in gene expression dynamics between Drosophila and Megaselia can be explained in terms of the behavior of the subsystems (dynamical modules) of the network (Verd et al. 2019). In each species there is a subsystem that is highly sensitive to alterations in the strength of regulatory interactions. Such subsystems are in a critical state, poised around a bifurcation boundary where the dynamics of the system become altered in abrupt and qualitative ways (Jaeger and Monk 2014, 2019; Verd et al. 2019). Differences in expression dynamics are caused by different subsystems being critical in Drosophila compared to Megaselia (Fig. 4). This provides a mechanistic explanation for the evolvability of the gap gene network (Verd et al. 2019; Jaeger and Monk 2019): it reveals that some features of gene expression are much more sensitive to evolutionary change than others.
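What it means for a subsystem to be poised near a bifurcation boundary can be illustrated with a one-variable toy model. The sketch assumes a single self-activating gene with a small basal input. The equation, the function `settled_level`, and all parameter values are hypothetical and unrelated to the fitted gap gene models. It applies a change of the same size to the basal input twice, once across the boundary and once far away from it.

```python
def settled_level(basal_input, decay=0.45, dt=0.01, t_end=300.0):
    """Long-run expression of a self-activating gene that starts switched off.

    Integrates dx/dt = basal_input + x^2 / (1 + x^2) - decay * x by Euler steps.
    """
    x = 0.0
    for _ in range(int(t_end / dt)):
        x += dt * (basal_input + x * x / (1.0 + x * x) - decay * x)
    return x


# A step of 0.05 in the input that straddles the bifurcation boundary ...
near_low, near_high = settled_level(0.03), settled_level(0.08)
# ... and a step of the same size far away from the boundary.
far_low, far_high = settled_level(0.20), settled_level(0.25)

print(f"near boundary: {near_low:.2f} -> {near_high:.2f}")
print(f"far from it:   {far_low:.2f} -> {far_high:.2f}")
```

In this toy system the step that crosses the boundary switches the gene from a low to a high expression state, whereas the equally sized step far from the boundary shifts the outcome only modestly. This is the sense in which a critical subsystem is more sensitive to quantitative change than a non-critical one.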

It is important to note that the kind of mechanistic explanation we are endorsing here does not necessarily include any molecular details of gene regulation. It is not necessarily a molecular mechanism, and does not have to “bottom-out” in maximal detail. The higher- or multi-level aspect of this type of mechanistic explanation is even more strongly highlighted by another example: the process of vertebrate segmentation or somitogenesis (reviewed in Oates et al. 2012; Hubaud and Pourquié 2014).

In contrast to segment determination in flies, vertebrate embryos add their body segments (called somites) one by one during the posterior extension and growth of the paraxial mesoderm (Fig. 5). Confirming an earlier theoretical prediction (Cooke and Zeeman 1976), experimental studies showed that this process involves repeating kinematic waves of gene expression traveling anteriorly through the tissue (a “clock”), in combination with a mechanism to slow down and stop these periodic waves (a “wavefront”) (Palmeirim et al. 1997; Cooke 1998; Dale and Pourquié 2000; Masamizu et al. 2006).

Fig. 5

Vertebrate somitogenesis: conserved dynamics and characters despite divergent clock mechanisms. a The vertebral column as well as the segmentation process that produces it is conserved in vertebrates from fish (left), to birds (center), to mammals (right). b During this process, the U-shaped paraxial mesoderm extends in posterior (P) direction, while a segmentation clock drives waves of gene expression towards the anterior (A). Simultaneously, a wavefront of cell specification advances posteriorly. Wherever a wave of gene expression meets the wavefront, a new symmetric pair of somites (segments) is formed. c Clock mechanisms differ between species, as indicated by different network structures. Note that the real clock mechanisms are far more complicated than the simplified ones shown here
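The clock-and-wavefront principle can be written down as a very small simulation. The sketch below is a strong simplification made for illustration only. It assumes a spatially uniform clock with a fixed period, a wavefront moving at constant speed, and cells that freeze their clock state when the wavefront reaches them. The function `somite_boundaries` and its parameters are hypothetical names.

```python
import math


def somite_boundaries(period, wave_speed, axis_length, n_cells=1000):
    """Positions along the axis where a new segment boundary is laid down.

    A boundary forms between neighbouring cells that were arrested by the
    wavefront during different cycles of the clock.
    """
    boundaries = []
    previous_cycle = 0
    for i in range(1, n_cells + 1):
        x = axis_length * i / n_cells
        t_arrest = x / wave_speed                # time the wavefront reaches the cell
        cycle = math.floor(t_arrest / period)    # completed clock cycles at arrest
        if cycle != previous_cycle:
            boundaries.append(x)
            previous_cycle = cycle
    return boundaries


slow_clock = somite_boundaries(period=2.0, wave_speed=0.5, axis_length=10.25)
fast_clock = somite_boundaries(period=1.0, wave_speed=0.5, axis_length=10.25)
print(len(slow_clock), len(fast_clock))
```

In this sketch, segment length is set by the product of clock period and wavefront speed. Nothing in the computation refers to the genes that implement the oscillator, which is why the same process-level description can apply to species whose clock components differ.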

The general principles of this patterning process are conserved among vertebrate species, and they are reasonably well understood via dynamical models together with diverse sources of experimental evidence. Interestingly, and in contrast to these conserved high-level generative principles, the molecular details of the clock mechanism differ markedly between species (Fig. 5) (Dequéant et al. 2006; Krol et al. 2011). Although most cyclic genes belong to three conserved signaling pathways known to be involved in somitogenesis, very different sets of individual genes exhibit oscillatory gene expression in different vertebrate groups. In other words, dynamic behavior at the process level is conserved, while the underlying molecular details have radically diverged in different lineages. Wotton et al. (2015) call the exchange of components and interactions in a network during evolution, despite conservation of the dynamics and patterning output, qualitative developmental system drift.

Given that both the molecular details and the dynamical model are fairly well-established, the somitogenesis example provides an occasion to revisit the problem of correspondence from the perspective of dynamic mechanistic explanation. In somitogenesis, different molecular mechanisms underlie a homologous character. At the same time, the different molecular mechanisms produce equivalent dynamics, or invariant sets with respect to their pattern-forming potential (Goodwin 1982; Webster and Goodwin 1996). Thus, while there is no one-to-one correspondence between the character and the specific molecular mechanisms, there is a one-to-one correspondence between the character and the developmental process as described in the clock and wavefront model. Molecular differences that have accumulated due to developmental system drift are screened off by phenotypic robustness at the process level. This raises the intriguing possibility that correspondence might be re-gained in other cases of developmental system drift by shifting the correspondence base to a higher level than the molecular-genetic components.

Two factors caution against interpreting this as a complete solution to the correspondence problem, however. First, there is no guarantee that equivalent dynamics can be found in all cases of non-correspondence. Vertebrate digit identity may present a more difficult case than somitogenesis, for example. Second, the clock and wavefront model is only able to identify equivalent pattern-forming mechanisms by abstracting from specific components. One can meaningfully raise the question of whether the model by itself is genuinely mechanistic if it does not identify specific components, molecular or otherwise. The relation between correspondence and abstraction in dynamical models is an unexplored conceptual issue that we leave for future investigation.

Although modeling may turn out to have important limitations when it comes to correspondence, dynamical modeling of developmental processes like somitogenesis nonetheless contributes significantly to our understanding of the genotype–phenotype map. The dissociability of molecular mechanisms and characters implies that we cannot understand the plasticity and robustness of development, or the probability of phenotypic transitions, using evidence at the molecular level alone. To be a truly mechanistic science, evo-devo will need to embrace the dynamics of development as orchestrated patterning activity across levels, from molecules and genes to whole networks, tissues, and organisms.

Conclusion

Many of the central aims of evo-devo are premised on its being a mechanistic science (Wagner et al. 2000). A causal-mechanistic understanding of the genotype–phenotype map is necessary for understanding how the production of heritable variation at multiple organismic levels is causally connected to the sorting of traits by population-level processes like selection. It is necessary for explaining, rather than just describing, developmental phenomena such as phenotypic plasticity and robustness. It is also necessary for understanding the variational properties and evolvability of biological characters, and for illuminating the possibilities and probabilities of evolutionary change (Alberch 1991; Wagner and Altenberg 1996).

The explanatory mode that predominates in contemporary research in evo-devo is based on gene regulatory networks, as exemplified by Davidson’s hierarchical GRN model of development and Wagner’s genetic model of homology. We have argued that, although “network thinking” of this sort represents a major improvement over classical gene-trait atomism, it still falls short of fulfilling the mechanistic research agenda of evo-devo. This is due to problems with genetic determinism, correspondence across levels, and diachronicity. Fundamentally, these problems arise from the fact that GRNs are static structures, whereas much of the difference-making action in development lies in the complex activities and non-linear interactions of system components (Bechtel and Abrahamsen 2005, 2010; Bechtel 2011, 2012; Jaeger and Sharpe 2014; Brigandt 2015; Green et al. 2015).

The proposed alternative is to integrate dynamical modeling of developmental processes into empirical practice alongside the identification of system components and their structural relations. This is currently the only realistic way to go beyond mechanistic decomposition and functional localization—operations that identify causally necessary components and their interactions—and towards the reconstitution of system-level behaviors that are causally sufficient to produce phenotypic outcomes. Dynamic mechanistic explanation resolves the problem of diachronicity by introducing dynamics, it attenuates (but does not eliminate) the problem of correspondence by causally connecting networks with phenotypes while also describing conserved dynamics of divergent molecular mechanisms, and it enables us to avoid the problematic assumptions of gene determinism by including non-genetic regulatory factors and environmental influences in our models. Without the extra step of modeling network dynamics, researchers will frequently and systematically miss out on key aspects of the causal structure of the genotype–phenotype map. In this sense, the way for evo-devo to become adequately “mechanistic” is for it to become “processual.”

In making the above argument, we have picked up some general insights about mechanistic explanation along the way, which we will briefly summarize here. It is common for biologists to conflate mechanisms with molecular mechanisms, and to discount explanations that are not entirely based on molecular components as not being mechanistic and/or genuinely explanatory. This conflation is problematic, and not only for the reason that mechanistic explanations can be based on non-molecular components (e.g., cells, tissues, organs, or environmental factors). The correspondence problem, and developmental system drift in particular, provides empirical reasons why explanations of development should not always bottom-out in molecular details. The fine structure of cyclical gene networks in zebrafish will not explain why birds or mammals have somites, and why they develop the way they do. By contrast, higher-level dynamics, as described by mechanisms based on phase shifts between oscillators, have more explanatory power. There is a degree of generality in these higher-level models that permits addressing mechanistic questions about genotype–phenotype mapping across wider taxonomic ranges than would be possible with molecular mechanistic explanations. The challenge is, of course, to determine which level and amount of detail is best for a given investigation. The response to this challenge is likely to be heavily question- and context-dependent.

A further implication of the somitogenesis case relates to the construction of phylogenetic transformation series, and specifically to what Calcott (2009) calls “lineage explanations.” To explain why the underlying molecular mechanisms diverged in spite of conservation at the level of the oscillatory dynamics and the resulting trait of the morphological somites, we need a working understanding of robustness at the process level. Dynamical models can provide such an understanding, whereas it is difficult to imagine how this could be achieved with a purely bottom-up inventory of molecular components and interactions. This is even more evident in the case of insect segmentation and the gap gene network, where one and the same critical subsystem produces different dynamics depending on spatio-temporal and network context (Verd et al. 2019; Jaeger and Monk 2019). In both cases, a descriptive phylogenetic series of static network representations correlated to phenotypic traits would miss out on essential causal information. It would be limited to recording the change without explaining it, while also omitting key variational properties that arise from the dynamics of the system. Without the requisite dynamical mechanisms, lineage explanations based on gene networks remain “just-so” stories rather than “how-possibly” explanations.

Finally, we have assumed that categories of mechanism and process are complementary, despite their being sometimes pitted against one another (e.g., Austin 2016). In the context of causal explanation for genotype–phenotype mappings, there is no tension between these two categories as long as mechanistic explanation is understood in the suitably broad sense of “dynamic mechanistic explanation” (Bechtel and Abrahamsen 2005, 2010; Bechtel 2011, 2012). Claims to the effect that one category is ontologically more fundamental than the other may be philosophically interesting, but they are underdetermined by the forms of explanation considered in this paper. The ontology of organisms cannot be simply read off from an examination of existing practices of scientific explanation.