1 Introduction

From Darwin’s time to the present day, biologists have debated the question of whether or not the evolutionary theory of the time suffices to explain “macroevolution.” Before the Evolutionary Synthesis (ES), extending from about 1930 to 1950, few biologists, and almost no paleontologists, thought it did. According to the architects of the Synthetic Theory (ST) that emerged during the ES from the union of ideas and evidence from genetics, systematics, natural history, and paleontology, the processes of evolution within species, accumulated over time, explain the origin and diversification of higher taxa (Futuyma 1979; Mayr and Provine 1980). For about 20 years, rather little disagreement was audible, but challenges arose in the 1970s and 1980s that laid bare deep differences within evolutionary biology. For example, disagreements among some parties at a major conference on macroevolution in 1980 were so intense that proceedings of the conference were never published (Levinton and Futuyma 1982). Discontent at that time arose chiefly within paleontology and systematics, disciplines that have since achieved greater rapprochement with the Synthetic Theory, although the arguments raised by paleontologists have not been entirely settled. In the last decade or so, new calls for extension, reconsideration, or even repudiation of the ST have been issued, this time largely from developmental biology. In this essay, I will address several of the major challenges to the Synthetic Theory, ranging from the 1970s to the present. I will conclude that many of these challenges have had a positive impact on evolutionary biology, but that the fundamental principles of the ST remain valid, and can explain known evolutionary phenomena with only modest extension.

It would be useful to define “macroevolution,” but definitions vary. Simpson (1944, p. 97) wrote that “Micro-evolution involves mainly changes within potentially continuous populations…[whereas] Macro-evolution involves the rise and divergence of discontinuous groups.” In Evolution Above the Species Level, Rensch (1959, p. 1) objected to the lack of a clear borderline between “larger” and “smaller” events (and to the hybridization of Greek and Latin roots) and referred instead to “infraspecific” (referring to processes that occur within a species or lead to a new species) and “transspecific” evolution (referring to processes that “lead to new genera, families, and lesser divisions, and thus to new constructional types”). Rensch thus focuses on the evolution of characters of individual organisms that distinguish taxa above the species level. (Levinton (2001) is among modern authors who adhere to this usage.) For many authors, however, “evolution above the species level” also includes patterns and causes of diversification of higher taxa, such as variation in diversity, speciation rates, and extinction among clades or geographic regions or geological periods.

I must at this point emphasize that I am neither a historian nor a philosopher and cannot address many questions that arise in those contexts. For example, I am hesitant to say whether or not the ST explains macroevolution, because I do not know what “explain” means. By “explanation,” I usually mean consistency of explananda with a set of postulated, sufficient causal processes. Others may require that an explanation enables prediction of the explananda, such as prediction of macroevolutionary diversification from a theory of mutation and natural selection. Current evolutionary theory cannot provide so grand a prediction, but it often can predict patterns (e.g., that mitochondrial mutations are, on average, more harmful to males than to females [Innocenti et al. 2011]) or very short-term responses to selection. By way of analogy, all meteorological phenomena are manifestations of physical principles, but you will be disappointed if you expect physics to predict the weather in your location a month from now.

1.1 Background: The Evolutionary Synthesis and Its Aftermath

In order to appreciate discussion of the sufficiency of today’s evolutionary theory, we must be familiar with the Evolutionary Synthesis, and that familiarity itself requires a glance further back. Many historians recognize three major stages in the development of evolutionary theory: Darwinism (from 1859 until about 1898), in which natural selection among “random” variations (meaning undirected with respect to need) was urged as the most important but not sole cause of evolution (for some, inheritance of acquired characters was allowed); neo-Darwinism (from about 1898), referring to August Weismann’s and Alfred Russel Wallace’s complete rejection of Lamarckian inheritance in favor of selection as the sole cause of evolution; and the Synthetic Theory, which in my view extends from about 1930 (with the publication of Fisher’s The Genetical Theory of Natural Selection) to about 1950 (with Stebbins’s Variation and Evolution in Plants). (The definition, temporal extent, major players, and content of the Evolutionary Synthesis, or Modern Synthesis, are all debated by historians.)

The major elements of the ST, which remain major elements of evolutionary theory today, include (1) the units of evolution are populations of organisms, not types or single organisms (“population thinking”); (2) evolution is based on mutations that are random with respect to the adaptive needs of the organism (but not necessarily random in other respects), resulting in inherited variation that may be amplified by recombination; (3) natural selection (at the level of individual organisms), acting on inherited variation, is the major cause of evolution of adaptive characteristics; (4) changes in the genetic composition of populations can also result from random genetic drift, especially in small populations; (5) new species are formed by divergence between populations of an ancestral species, owing to factors that reduce or prevent gene flow between populations that undergo different evolutionary changes; (6) gradual accumulation of changes by these same factors results in character differences that distinguish higher taxa, i.e., macroevolution (Reif et al. 2000; Kutschera and Niklas 2004). In particular, as embodied in the equations of theoretical population genetics, the theory was cast in very general terms. “Selection” is not identified with any specific mode or agent (and so could include ecological sources of selection, sexual selection, the “internal selection” stemming from functional interactions among characters [Schmalhausen 1949], and genic selection owing to factors such as meiotic drive).
“Mutations” are any kind of reasonably stable alternatives (“allelomorphs”) to a prevailing unit of heredity; the equations for the dynamics of mutations in populations apply equally to what we now identify as single-base pair substitutions (whether in structural or regulatory sequences), chromosome inversions, polyploids, and even epigenetic “mutations.” These broad concepts lack mechanistic content; empirical data are needed to describe real instances of evolution, such as the agents of selection and the molecular and developmental basis of phenotypic variants. Thus, the conception of causes of evolution embodied in the Synthetic Theory, i.e., gene frequency change, is quite different from the causes of differences in morphology, physiology, or behavior that are commonly envisioned by mechanistic developmental biologists, physiologists, or neurobiologists (cf. Amundson 2005).
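The generality of these population-genetic equations can be made concrete with a minimal sketch. The Python code below is an illustration only and is not drawn from the sources cited here. It assumes a haploid Wright-Fisher population of constant size, a single heritable variant with relative fitness 1 + s, and no recurrent mutation; the function names and parameter values are hypothetical. Nothing in the code specifies what the variant is or what agent imposes selection, which is the sense in which the theory lacks mechanistic content.

```python
import random


def select(p, s):
    """Deterministic change in variant frequency p under selection.

    Haploid model (an assumption of this sketch): the variant has relative
    fitness 1 + s and the alternative has fitness 1.
    """
    return p * (1 + s) / (1 + p * s)


def drift(p, n, rng):
    """Sample the next generation's frequency from n individuals (binomial drift)."""
    copies = sum(1 for _ in range(n) if rng.random() < p)
    return copies / n


def simulate(p0, s, n, generations, seed=0):
    """Return the frequency trajectory of a variant under selection plus drift."""
    rng = random.Random(seed)
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        p = drift(select(p, s), n, rng)
        trajectory.append(p)
        if p in (0.0, 1.0):  # variant lost or fixed; no mutation to restore it
            break
    return trajectory


if __name__ == "__main__":
    # Hypothetical values: the same call describes a base-pair substitution,
    # an inversion, or an epigenetic variant, provided it is stably inherited.
    for n in (50, 5000):
        final = simulate(p0=0.1, s=0.02, n=n, generations=500)[-1]
        print(f"population size {n}: final frequency {final:.3f}")
```

In a small population, drift can override weak selection, so an advantageous variant may still be lost; in a large population the trajectory tends to follow the deterministic recursion more closely.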

The Evolutionary Synthesis was both a synthesis (especially of genetics and natural selection) and a “constriction” (Provine 2001). The seeming exclusivity of the ES can be understood (and excused, if deemed necessary) only by appreciating the state of evolutionary discourse in the early twentieth century (see Simpson 1944; Rensch 1959; Bowler 1983; Reif et al. 2000). Darwinism was in “eclipse” (Huxley 1942; Bowler 1983), in that almost no biologists accepted natural selection as a significant agent of evolution. (The exceptions were chiefly some of the naturalists.) Almost nobody had attempted to measure selection in natural populations, so it simply had not been documented. Many biologists doubted that organisms’ characteristics are adaptive; Robson and Richards (1936), for instance, devoted much of their book to the thesis that differences between related species are nonadaptive. Selection was thought of as a “random,” undirected process, so “orderly” phenomena such as trends and parallel or convergent evolution were thought to refute evolution by natural selection. After centuries of a theological world view that included divine design and purpose, many morphologists were “idealists” who held a Platonic interpretation of each species’ form as “an element in the overall pattern imposed by Mind upon the material world” (Bowler 1983, p. 47; Winsor 2006; Amundson 2005; and others disagree). Moreover, to those who thought in terms of purpose, Darwinian selection was far less appealing than theories that did not include struggle for survival, and in which organisms could be viewed as active agents, directing their own evolution (Bowler 1983, p. 15). 
Among these was “neo-Lamarckism,” especially popular and long lasting among paleontologists, even after geneticists had refuted and abandoned “soft inheritance.” Lamarckism, in which organisms direct their evolution by use and disuse of certain organs, was related to Haeckel’s recapitulation theory (for ontogeny displays “progress” toward the “goal” of the adult organism), and both of these to orthogenesis, the belief (again persistent among paleontologists) that evolution is driven by irresistible internal factors in specific directions; in some versions, the drive is inexorable progress, while in others it involves momentum that carries the species into maladaptive degeneration and extinction. One might imagine that the geneticists, having disposed of two arguments against the efficacy of Darwinian selection (Lamarckian inheritance and blending inheritance), would have been staunch Darwinians, but Hugo de Vries and Thomas Hunt Morgan, founders of genetics, instead interpreted mutations as a sufficient cause of evolution. Early in his career, Morgan thought that species arise simply as mutations; natural selection simply eliminated mutations that were unfit. If selection explained anything, it was adaptation, not the origin of species, but he denied that most characteristics were adaptations (Bowler 1983, p. 198). A more extreme mutationism was voiced by some paleontologists, such as Schindewolf (1950; cited by Simpson 1953a, b), and most notoriously by the (otherwise respected) geneticist Richard Goldschmidt (1940), who considered gene mutations and selection instrumental within species, but argued that species and higher taxa originate by an entirely different process, involving a major reconfiguration of the genome. Such a “macromutation” would often, perhaps usually, yield a hopelessly dysfunctional organism, but occasionally a coherent, adapted “hopeful monster” instead.
Thus, Goldschmidt proposed evolution by saltation, i.e., a “large” discontinuous change in one or more characteristics that arises in a single generation.

Those who today disparage the Evolutionary Synthesis as a constrained, dogmatic assertion that evolution consists only of natural selection on random genetic mutations within species must recognize that the authors of the Synthesis were responding to an almost complete repudiation of natural selection, adaptation, and coherent connection of macroevolution to these processes. Macroevolution, in particular, was explained by Lamarckian modification, orthogenesis (for which no mechanism was ever articulated), and saltation (mutationism). It is instructive, then, to glance at some of the main arguments presented by the contributors to the Evolutionary Synthesis.

1.2 The Content and Authors of the Evolutionary Synthesis

The best known contributors to, or “architects” of, the Evolutionary Synthesis (sensu lato) are R.A. Fisher, J.B.S. Haldane, Sewall Wright, Theodosius Dobzhansky, Julian Huxley, Ernst Mayr, George Gaylord Simpson, and G. Ledyard Stebbins. Bernhard Rensch is rightfully placed in this company by those who know his work (see below), but that number is regrettably dwindling. A considerable number of other authors should be credited with major conceptual or empirical contributions, especially in Germany and Russia (Adams 1980; Reif et al. 2000). To mention only a few, in Russia, Sergei Chetverikov was a founder of population genetics, and I.I. Schmalhausen integrated natural selection with genetics and some aspects of development. Nikolai Timofeeff-Ressovsky did pioneering, insightful work on genetic variation in natural populations in Russia and later in Germany. Experimental population genetics was initiated by Georges Teissier and Philippe L’Héritier in France. To the well-known names in England should be added, at least, the cytogeneticist C.D. Darlington, author of The Evolution of Genetic Systems (1939), E.B. Ford, who with his students created ecological genetics and applied Fisher’s theory to real genetic data, and Gavin de Beer, who in his many books (e.g., Embryos and Ancestors, 1940) used comparative embryology to dismantle Haeckel’s recapitulation theory (the “biogenetic law”) and interpret macroevolutionary changes in form. In the United States, major contributions to the genetic aspects of evolution came from Hermann Muller and from the botanists Edgar Anderson (author of Introgressive Hybridization, 1949), E.B. Babcock (Smocovitis 2010), and the famous trio of Clausen et al. (1948). Non-Darwinian views of evolution in Germany were countered by many adherents to Darwinism, informed by genetics (Reif et al. 2000).
Erwin Baur, Max Hartmann, Wilhelm Ludwig, and Alfred Kühn, among others, developed arguments for evolution by natural selection of genetic variants that conformed fully to the ST as it developed in England and the United States. As early as 1930, in Die Phylogenie der Pflanzen, ein Überblick über Tatsachen und Probleme, the botanist Walter Zimmermann argued against idealistic morphology, typology, Lamarckism, and saltation, interpreted plant characters as adaptations formed by natural selection, and “single-handedly accomplished a synthesis years before other synthesists” (Reif et al. 2000). (Zimmermann had almost no impact, even during his lifetime, because of academic politics and because he embedded his arguments in scientific philosophy that Simpson (1949), for one, found hard to read.) The broadly trained zoologist Gerhard Heberer edited a book (Die Evolution der Organismen, 1943) in which he and most of the 18 other authors argued for the gradual evolution of higher taxa, and against saltation, Lamarckism, and orthogenesis.

Most of the well-known “architects” of the ES addressed aspects of macroevolution to at least some extent. Huxley (1942) sketched the newly forming theory most broadly in the book that gave the Evolutionary Synthesis its name; his most significant personal contribution was probably his formulation of allometric growth (unequal growth rates of different features or dimensions), which was used, by Haldane among others (Haldane 1932b), to explain some apparently nonadaptive characters—and which, incidentally, illustrates an awareness of the importance of development in evolution. Wright (1932) intended his quite abstract “shifting balance theory,” with the “adaptive landscape” as its metaphor, as a theory of long-term progressive evolution, and his landscape metaphor was adopted by Dobzhansky, Simpson, and subsequently many others. Dobzhansky (in Genetics and the Origin of Species, 1937) said almost nothing about macroevolution, but drew attention to Wright’s and Fisher’s theoretical arguments, including the efficacy of even very weak selection. He marshaled most of the existing evidence of the operation of natural selection, and it is striking to read how few his examples were—and how many more he could cite 14 years later (Dobzhansky 1951). Among the major themes in Systematics and the Origin of Species (Mayr 1942) is geographic variation: its nature, adaptive significance, and the evidence it provides of the gradual evolution of species. Mayr emphasized the uncertain borders of many genera as evidence of continuity of divergence and cited many examples of species that are clearly closely related but were assigned to different genera on the basis of one or a few character differences. 
In a final chapter on “The higher categories and evolution,” he listed seven factors that “deprive the macroevolutionary processes of much of their former mysteriousness,” including “the smallness and frequency of mutations,” pleiotropy, the polygenic basis of traits, allometric growth, and the power of selection (citing Fisher). He closed the book by stating, “all the processes and phenomena of macroevolution and the origin of the higher categories can be traced back to intraspecific variation, even though the first steps of such processes are usually very minute.” Complementing Mayr’s book that was written “from the viewpoint of a zoologist,” Stebbins (in Variation and Evolution in Plants, 1950) described macroevolutionary patterns in plants (e.g., fusion of flower parts) and interpreted them both in terms of likely adaptive value and developmental mechanism, a dual approach that he revisited in Flowering Plants: Evolution Above the Species Level (1974). In the later book, he listed many instances in which a diagnostic feature of a higher taxon is found as a difference between congeneric species in other plant families, illustrating that evolutionary changes at different taxonomic levels do not differ in kind.
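Huxley’s allometric growth, invoked above to explain some apparently nonadaptive characters, is conventionally written as a power law. The notation below is the standard textbook form rather than a quotation from the works cited: $y$ is the size of a part, $x$ is the size of the body (or of another part), and $b$ and $\alpha$ are constants.

```latex
y = b\,x^{\alpha}
\qquad\Longleftrightarrow\qquad
\log y = \log b + \alpha \log x
```

When $\alpha \neq 1$, the part changes disproportionately with body size, so selection on size alone can alter the relative size of the part without any selection acting on the part itself.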

The most widely known treatment of macroevolution during the Synthesis is Tempo and Mode in Evolution (1944), by the vertebrate paleontologist George Gaylord Simpson. By interpreting patterns in the fossil record in terms of genetics, Simpson achieved a remarkable union of widely disparate disciplines. He cited both genetic data (e.g., the adequacy of mutation rates to account for rates of phenotypic evolution in the fossil record, the polygenic basis of most phenotypic traits) and geological data to explain apparent saltations in fossil series, but agreed that the common absence of forms transitional to high-level taxa (e.g., orders, classes) requires a special explanation. He postulated “quantum evolution,” a forerunner of punctuated equilibria, in which new forms evolve very rapidly as they adapt to very different habitats or ways of life. Intermediate fossils would not be found “if the animals involved in the transitions were relatively few in number and if they were evolving at unusually high rates” (p. 117). This was not a saltationist hypothesis, for “in general the genetic processes involved do not permit making the step with a single leap” (p. 210). Having dispatched saltationism, Simpson addresses “inertia, trend, and momentum” in a multifaceted attack on orthogenesis. Rectilinear evolution does occur, but is far from universal, for most clades show a pattern of branching, diversifying in different directions. Mutation can be biased in certain directions, but appears not to coincide “with the direction in which the group is really evolving.” Progressive, rectilinear change is most consistent with persistent natural selection (for example, increasing tooth height in grazing horses). Apparent momentum can be produced by selection in many ways, such as the effect of selection on two correlated characters that reach their optima at different times. 
Simpson does not claim to have demonstrated that particular evolutionary events had these causes, only that they are realistic possibilities, consistent with genetic data and theory, in contrast to Lamarckian inheritance or the undefined, almost mystical factors invoked by supporters of orthogenesis.

Bernhard Rensch, in my opinion, is the great unsung hero of the Evolutionary Synthesis; it is a great misfortune that he is so poorly known, especially in English-speaking countries. Although he was a Lamarckian early in his career, he soon became a neo-Darwinian, did extensive research on geographic variation in land snails, lizards, birds, and mammals, and formulated the well-known Bergmann’s, Gloger’s, and Allen’s “rules” that provided important evidence for adaptation and natural selection. Anticipating much later research, he experimentally altered the color pattern of birds’ eggs, found that more markedly altered eggs elicited more frequent rejection by the parents, and interpreted the egg polymorphism of the brood-parasitic cuckoo (Cuculus canorus) as an adaptive response to egg rejection. He published major papers in 1939 on Typen der Artbildung (kinds of speciation) and in 1943 on Die paläontologischen Evolutionsregeln in zoologischer Betrachtung (paleontological rules of evolution from a zoological viewpoint). He worked on his book, Neuere Probleme der Abstammungslehre (recent problems of the theory of evolution), during World War II, but was unable to publish it until 1947; only after correcting proofs did he see for the first time the books by Huxley (1942), Mayr (1942), and Simpson (1944). He took these into account in the second edition that was published in German in 1954 and finally appeared in English, as Evolution Above the Species Level, in 1959.

Rensch’s treatment of macroevolution is, I think, more impressive, and certainly more multifaceted, than Simpson’s. He not only counters Goldschmidt’s saltationism by citing abundant gradations from geographic races to species to genera, but he also provides deeply insightful analyses of the major problems of macroevolution (arguing against saltation, orthogenesis, and neo-Lamarckism) by summarizing paleontological studies and especially evolutionary patterns revealed by comparative morphology and embryology—an approach that Simpson did not take. In reviewing the first edition of Rensch’s book, Simpson (1949) lavished praise. Although Rensch is not a paleontologist, Simpson wrote, he provides interpretations that may be “commended to paleontologists as examples of how…to understand the facts of their subject.” Simpson praised the book for “an extraordinary richness of pertinent examples and for clearly reasoned interpretation, … so packed with well-integrated information that summary is impossible.”

Rensch interprets the temporal course of clade diversification (cited by paleontologists as an inherent “life cycle”) as adaptive proliferation when a lineage adapts to new habitats that are relatively free of competitors; the rate of diversification may last for more than 100 million years (not a brief, vigorous “adolescence”), and declines, he suggests, because competition increases with the number of species. Dollo’s “law” of irreversibility (a mainstay of orthogenesis) has many exceptions, but there is seldom reversal to exactly the ancestral character, because during the interim, the “whole organism of the animal has undergone change:” any reversal has to be functionally integrated with the entire system. Examining development and the “mechanisms of construction” shows that many possibly nonadaptive features may be ascribed to allometry or to the multiple effects of hormones. Increases in body size carry with them changes in such features as the number of retinal cells or brain cells, which may support new functions and even be the basis of selection on size. Parallel evolution may arise from similar hereditary factors and development, as seen in lepidopteran wing patterns, or from similarity of natural selection, as in the longer wings of diverse migratory bird species compared to nonmigratory relatives. Apparent orthogenesis, as exemplified by Cope’s rule of increase in body size, can readily be caused by selection or by correlated growth (especially allometry). Adaptively novel clades (e.g., of arboreal mammals) must originate from small, unspecialized ancestors, not giant forms. (This argument anticipates Stanley’s 1973 explanation of Cope’s rule of size increase.) Alterations of embryonic development (described in a 27-page passage that draws on comparative and experimental embryology) show that ontogeny can be altered in so many ways that the direction of evolution cannot be set by autonomous factors. 
“Jumps” in the fossil record of the origin of new “structural types” are explained by failure to fossilize (the then recently discovered living coelacanth Latimeria strikingly shows the incompleteness of the fossil record), by geographic shifts in the distribution of new species that evolved elsewhere, and by the accelerated evolution and strong directional selection of lineages that adapt to new habitats or ways of life. (Cf. Simpson’s “quantum evolution” and Eldredge and Gould’s “punctuated equilibria.”) New organs are usually modifications of features that evolved long before (e.g., mammalian ear ossicles, derived from jaw bones), and probably evolved by successive small changes (since most “large” mutations are harmful in Drosophila and other species). Rensch cautioned that “I do not wish to deny the possibility that some day further evidence of the evolutionary effects of macro-mutation may come to light” (p. 106), but he concludes (p. 358) that the wealth of forms that compose a single, giant tree of life “is the result of continuous, undirected mutation and is patterned by the respective conditions of selection,” and that “if there are some special problems to which we can only say ‘Ignoramus’ [we do not know], we need not add ‘Ignorabimus’ [we shall not know].”

In my view, the major contributors to the Synthesis marshaled available evidence (which on some points, such as the prevalence of natural selection, was strikingly sparse) logically and effectively in support of gradual evolution, chiefly by natural selection acting on undirected mutations of mostly small magnitude. The Synthesis “architects” successfully banished orthogenesis and, together with geneticists, Lamarckian mechanisms of change. They replaced mutationism with “population thinking,” although they did not (and could not, as the quotation from Rensch admits) demonstrate that “macromutations” never contribute to the evolution of major change in form. They did not address all evolutionary phenomena, by any means; they said little about patterns and causes of extinction, for example, and Mayr (1960) noted, in a famous essay on evolutionary novelties, that “the problem of the emergence of evolutionary novelties has undoubtedly been greatly neglected during the past two or three decades.” As is widely recognized, ecology, morphology, developmental biology, and phylogeny received little attention during the Synthesis, relative to genetics.

Empirical evolutionary research in the 1950s and 1960s greatly increased information on genetic variation in natural populations, the seeming ubiquity of natural selection (Endler 1986, Table 5.1 lists at least 85 studies of “natural selection in the wild” from these decades), and speciation. Major theoretical advances included the articulation of kin selection, the distinction between individual selection and group selection (the latter still a contentious issue), and the consequent development of theory, based on individual and kin selection, to explain classes of characteristics such as life history traits and social behaviors. These developments may well have led to “hardening” of evolutionary thinking around selection as an almost exclusive factor of evolution, as I will note later. (Gould’s 1983 claim that the Synthesis itself became more exclusively selectionist has not been rigorously analyzed [Reif et al. 2000].) But the all-important role of selection was challenged by interpretations of molecular polymorphism and evolution in neutralist terms (King and Jukes 1969; Kimura 1968, 1983). Students of phenotypic evolution nevertheless tended to remain convinced that the features they studied evolved mostly by natural selection, and Kimura himself agreed that this is likely the case. (Remarkably, Wright disavowed the role of genetic drift in any but very small populations and was little interested in Kimura’s theory, because “the condition that gives the maximum amount of such drift is that of complete neutrality and hence of no evolutionary significance” [Provine 1986, p. 472].)

Evolutionary biology since about 1970 has seen immense growth and integration with other areas of biology (e.g., ecology, behavior, physiology, developmental biology, and especially molecular biology). The ST has proven flexible because it was cast in general terms that could be easily honed to describe specific, newly discovered phenomena such as codon bias and transposable elements. Massive evidence of selection and adaptation was revealed not only by demographic studies of the kind that Endler (1986) and Kingsolver et al. (2001) summarized, but also by “signatures” of selection in DNA sequences, experimental evolution (chiefly in laboratory cultures of microorganisms), the revival and documentation of sexual selection, and the frequent fit of data to adaptive models of life history, behavioral, physiological, and morphological traits. Phylogenetic inference became increasingly rigorous and reliable and is now a major element in evolutionary biology, appreciated not only as a reconstruction of some aspects of evolutionary history, but also as an analytical approach to inferring some evolutionary processes. Evolutionary studies became increasingly quantitative and increasingly compared data against neutral (random) null models. Evolutionary biologists became increasingly cognizant of mechanistic biology: It is necessary to know some molecular biology in order to interpret molecular data. At the same time, there has been a resurgence of challenges to the ST (Depew and Weber 2013), with calls to expand the ST (e.g., by recognizing selection at different levels), to extend it (by incorporating other processes and other fields of study), or to replace it. Paleontologists have been most conspicuous in challenging the ST, but developmental biologists and a few “neontological” evolutionary biologists have also issued calls for change, in some instances echoing the paleontologists. 
Stephen Jay Gould, the most incessant and articulate critic of the ST, played all three roles. Most of these calls for change explore or advocate explanations other than individual selection within populations, which is commonly viewed as too exclusive a theory of evolutionary process. In the remainder of this essay, I will comment on four contentious issues: (1) alternatives to gradualism, (2) internal constraints on adaptation by natural selection, (3) challenges from developmental biology, and (4) long-term changes in diversity.

2 Alternatives to Gradualism

Two major alternatives to gradualism have been posed since 1970, both led by Gould: a revived dalliance with saltation (Gould 1980) and the model of punctuated equilibria introduced by Eldredge and Gould (1972) and elaborated and defended, especially by Gould, in many later publications (see Gould 2002). These are entirely different propositions.

2.1 Macromutation

Gould (1980) envisioned a discontinuity between intraspecific evolution and the origin of new species (which he named the “Goldschmidt break”) and advocated a more favorable reconsideration of a role for discontinuous, macromutational changes in the evolution of major character changes. In an introduction to a 1982 reprint of Goldschmidt’s notorious The Material Basis of Evolution, Gould wrote “I find [Goldschmidt] not victorious, but weighted equally with his self-proclaimed Darwinian opponents.” (The book was reviewed in Paleobiology by four reviewers, among whom Charlesworth (1982) and Templeton (1982) wrote scathing criticisms of Goldschmidt that included highly unflattering comments on Gould’s advocacy.)

“Macromutation” has been used with a variety of very different meanings: For some, such as Goldschmidt, it is manifested as the origin of a radically altered character or set of characters (or a major morphological remodeling, as expressed by Schindewolf’s (1950) famous speculation that the first bird emerged from a reptile’s egg). For other authors, a macromutation merely causes a discrete difference in a character, the magnitude of which need not be specified. An evolutionary role for single discrete mutational changes of single characters, of substantial magnitude, has been admitted from the beginnings of the Synthesis. Haldane (e.g., 1932a), for example, suggested that evolutionary “jumps” could arise by a variety of processes, such as hybridization, polyploidy, and the substitution of fairly “large” mutations, followed by modifier alleles with small effects. Fisher (1930) suggested that the latter model would account for data on inheritance of mimetic color patterns in butterflies: A major “switch” gene decides between two alternatives, each of which may be modified later in evolution by other substitutions. This model was adopted and supported by mimicry researchers such as Philip Sheppard (a student of E.B. Ford) (Sheppard et al. 1985) and by the population geneticist Charlesworth (1980), a vocal defender of the ST.

It is true that by the early 1980s, it was widely thought that almost all the allele substitutions underlying variation in polygenic traits had very small effects, but on closer examination, it became clear that many character differences between closely related species are based on fewer gene differences, of larger effect, than previously supposed (Gottlieb 1984; Orr and Coyne 1992). Nevertheless, quite a few genes contribute to such differences. For example, a bee-pollinated and a hummingbird-pollinated species of Mimulus have very different flowers that differ in at least 12 features. Analysis of interspecific crosses documented one to six quantitative trait loci (QTL) contributing to each trait. In nine traits, at least one QTL accounted for more than 25 % (but always less than 50 %) of the variance (Bradshaw et al. 1998). The authors interpreted the data as meaning that genes of large effect can contribute to speciation. However, the traits were affected by different QTL, a total of 47 QTL were detected, each QTL might well be a cluster of genes rather than a single gene, and the considerable unexplained variance surely is attributable, in part, to many loci with effects too small to be detected by the rather coarse genetic analysis. This is a far cry from a macromutation that might be imagined to underlie a major change in flower form. To be sure, it had been recognized since before the Synthesis that evolution of major differences in form could sometimes arise from changes in development that ensured coherent, integrated change in multiple traits. 
The chief example was paedomorphosis, as exemplified by salamanders (Ambystoma), known as axolotls, that retain larval features in the reproductive adult stage. (See Footnote 2.) The difference between the metamorphic and paedomorphic life cycles is closely associated with a major gene that affects delay of metamorphosis, but other genes clearly contribute to the threshold that determines which developmental mode is expressed (Voss and Smith 2005). Of course, paedomorphosis is merely a change in timing of development of features that are the product of a very long history of, possibly, entirely gradual evolution.

Whether or not major, discontinuous single-step changes in phenotype have occurred in evolution is an empirical question. No mathematical theory excludes the possibility; Fisher’s (1930) famous geometrical model, often cited as an argument against macromutation, is a metaphor, not comparable to, say, models of the conditions for stable polymorphism. The manifold effects of polyploidy (Levin 1983; Ramsey 2011) might be considered macromutational; possibly newly established endosymbioses will likewise have large but beneficial effects. The most intriguing possibilities are raised by developmental genetics, in which major regulatory genes have “coopted” different developmental pathways, or have different spatial expression and so are associated with major morphological differences among taxa. (For example, the somites that develop into different classes of vertebrae differ among major vertebrate taxa, apparently caused by differences in the expression domains of certain Hox genes.) It is conceivable that a single mutational change in the association between a regulatory gene and a developmental pathway accounts for such cases, but it is also possible that it happened incrementally. An instructive example is the gene shavenbaby, responsible for a difference between Drosophila species in the presence or absence of larval trichomes: The difference is based not on a single mutation, but on a combination of mutations in three different enhancers (Stern 2011). Fisher (1930, p. 164), in the discussion of mimicry that I have already cited, noted that a single gene determines sex in some fishes, but that we would not suppose that the various adaptations of one sex have arisen by a single saltation from the other sex. 
This example, he wrote, emphasizes “that it is the function of a Mendelian factor to decide between two (or more) alternatives, but that these alternatives may each be modified in the course of evolutionary development, so that the morphological contrast determined by the factor at a late stage may be quite unlike that which it determined at its first appearance.” It is certainly possible that instances will be found in which a functionally coherent set of character alterations will be found to have originated by one or a few mutations that affect development. So far, however, it seems as if the bulk of evidence continues to favor the view that phenotypic characters generally evolve more or less independently at different rates (mosaic evolution) and by multiple, polygenic substitutions.

2.2 Punctuated Equilibria

The proposition that Eldredge and Gould (1972) dubbed punctuated equilibria (PE) has often been confused with macromutational saltation, but it is entirely different. PE refers both to a pattern that Eldredge and Gould claimed is common in the fossil record, and to a proposed process that would, they said, explain that pattern. The pattern is rapid shift between one long-lasting, virtually constant phenotype and another. The shift is typically not documented by intermediate fossils, but the geological interval during which the shift occurs is typically on the order of thousands of years, long enough for appreciable evolution by standard processes (Stebbins and Ayala 1981; Hunt 2010). Both in 1972 and afterward, both Gould and Eldredge emphasized that character change during the shift (the “punctuation”) may well be gradual, i.e., a continuous change in mean character state, caused by natural selection acting on undirected variation. The radical feature of Eldredge and Gould’s proposed process is that during the long periods of constancy (“stasis”), populations cannot readily respond to natural selection because of genetic constraints (in the form of epistatic interactions among genes), constraints that might be loosened when a population undergoes a bottleneck in size. Genetic drift might then initiate evolution toward a different genetic equilibrium, which they envisioned to be a different species, reproductively isolated from its more widespread “parent” species. Thus, character evolution occurs chiefly in concert with, and is caused by, speciation—bifurcation of an ancestral lineage into two reproductively isolated descendants. The new, modified “daughter” species originates as a small, geographically localized population, in which the evolutionary transition from one optimal phenotype to another occurs rapidly. 
Its existence can be preserved and documented in the fossil record only if it eventually expands its geographic range, perhaps supplanting its ancestor (the “parent” species) as it does so. This hypothesis, they suggested, accounts for the paucity of cases of steady transformation of lineages (which they labeled “phyletic gradualism”) and for gaps in morphology that have plagued evolutionary biologists from Darwin on. (Note, however, that the phenotypic gaps in this model are small; PE describes not the origin of higher taxa with novel features, but closely related, similar species. PE was born, in large part, from Eldredge’s study of the trilobite Phacops rana, in which ancestral and descendant forms are distinguished by a small difference in the number of rows of eye lenses.)

Eldredge and Gould argued that if a character evolved only during the origin of a new “daughter” species, and if the direction of character evolution depends only on local selection that is unlikely to be correlated among successive speciation events, widely separated in time and space, individual selection would produce only random fluctuations in the character, averaged over the members of a clade. Eldredge and Gould concluded that long-term trends in characters should therefore be attributed not to individual selection within species, but to selection at the species level: association of character states with rates of speciation or species extinction (the species-level analogs of birth and death). Paleontologist Stanley (1975) phrased the same argument more dramatically, claiming that macroevolution is decoupled from microevolution. This argument epitomized the rebellion against the ST that started in the 1970s.
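The logic of this argument can be illustrated with a small numerical sketch. It is not drawn from Eldredge and Gould's papers; it is a toy model whose assumptions are mine and purely illustrative: a trait changes only when a daughter species buds off, the daughter differs from its parent by one step up or down with equal probability, parent species persist unchanged, and there is no extinction. The function names and rate values are hypothetical. Expected species counts are tracked, so the outcome is deterministic.

```python
from collections import defaultdict


def clade_mean_trait(steps, speciation_rate):
    """Expected mean trait value of a clade after `steps` time steps.

    `speciation_rate(z)` is the assumed per-step probability that a
    species with trait value z buds off one daughter species.
    """
    counts = {0: 1.0}  # expected number of species at each trait value
    for _ in range(steps):
        new_counts = defaultdict(float)
        for z, n in counts.items():
            new_counts[z] += n  # stasis: the parent persists unchanged
            daughters = speciation_rate(z) * n
            # Undirected change: a daughter is equally likely to shift up or down.
            new_counts[z + 1] += 0.5 * daughters
            new_counts[z - 1] += 0.5 * daughters
        counts = dict(new_counts)
    total = sum(counts.values())
    return sum(z * n for z, n in counts.items()) / total


def uniform_rate(z):
    # Speciation rate is unrelated to the trait.
    return 0.2


def trait_dependent_rate(z):
    # Hypothetical species-level sorting: larger trait values speciate more often.
    return min(1.0, max(0.0, 0.2 + 0.02 * z))


neutral_mean = clade_mean_trait(30, uniform_rate)
sorted_mean = clade_mean_trait(30, trait_dependent_rate)
print("mean trait, uniform speciation rate:", neutral_mean)
print("mean trait, trait-dependent speciation rate:", sorted_mean)
```

Under these assumptions the clade mean stays at its starting value when speciation rate is unrelated to the trait, and drifts upward only when character states are associated with speciation rate, even though no lineage is ever pushed in a consistent direction. A trait-dependent extinction rate could be added in the same way.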

Eldredge and Gould’s hypothesis was not entirely original. As one of several possible reasons for the embarrassing paucity of transitional forms in the fossil record, Darwin (1859, p. 306 of 1979 reprint) wrote: “One other consideration is worth notice: with animals and plants that propagate rapidly and are not highly locomotive, there is reason to suspect, as we have formerly seen, that their varieties are generally at first local; and that such local varieties do not spread widely and supplant their parent-forms until they have been modified and perfected in some considerable degree. According to this view, the chance of discovering in a formation in any one country all the early stages of transition between any two forms, is small, for the successive changes are supposed to have been local or confined to one spot.” Rensch (1959, p. 106) wrote that although paleontologists often invoke macromutations to account for “saltatory deviations” in fossils from successive geological horizons, “these ‘saltations’ are probably due to horizontal shifts of geographic races or closely related species.” Eldredge and Gould (1972) stated clearly that their model was an application to the fossil record of Mayr’s (1954) founder-effect model of speciation (which Mayr later dubbed “peripatric speciation”). Mayr, in fact, had already made this application explicit, asserting that a locally formed species will invade new areas, and “only then will it become widespread and thus likely to be found in the fossil record. But then it is already too late to record the evolutionary change through which it has gone. All the paleontologist finds is the fact that one widespread numerous species was replaced or succeeded by a rather different species…” (reprinted in Mayr 1976, p. 207).

Mayr (1954, 1963) had suggested that adaptive evolution might be slow in large populations because of what evolutionary geneticists call epistasis for fitness: Many mutations fail to increase because they have deleterious interactions with many of the vast number of genetic backgrounds in which they are distributed in a large, highly polymorphic population. He proposed that changes in allele frequencies due to sampling error (genetic drift) at some loci in a population founded by a few individuals, together with the population’s reduced genetic variation, would change the “genetic environment,” in which certain alleles would confer high fitness that would not do so in the large “parent” population. This might be a snowballing process (a “genetic revolution”), leading to such great change as to form a new, reproductively isolated species, rapidly and in a localized area. This was envisioned as a process of evolution by natural selection, but the selection was “internal,” not necessarily imposed by environmental change. When population geneticists later modeled their interpretation of Mayr’s verbal model (and rather similar verbal propositions by Carson and Templeton (1984)), they found it almost indistinguishable from Wright’s (1932 et seq.) shifting balance theory. Like Wright, they said, Mayr appears to envision a shift between peaks on an adaptive landscape, requiring that selection against departure from the original peak be countered by genetic drift. All the population geneticists who modeled this process agreed that this was very unlikely unless selection is very weak; that is, the “valley” between adaptive peaks is very shallow (Charlesworth and Rouhani 1988; Barton and Charlesworth 1984). But a shallow valley would imply, they said, that the fitness of hybrids between “parent” and “daughter” populations would be quite high; hence, reproductive isolation would be very weak. 
The majority of population geneticists judged Mayr’s peripatric speciation unlikely, and some rejected Wright’s shifting balance theory as well (Coyne et al. 1997; but see Wade and Goodnight 2000). Furthermore, they found no genetic evidence, based on allozymes and early DNA data, that speciation is associated with reduced population size. Charlesworth (1984) argued that seemingly unchanging characters actually show substantial fluctuations around a nontrending long-term average, suggesting that they are able to evolve, but are subject to long-term stabilizing selection (see also Gingerich 1983). Thus, evolutionary geneticists criticized the theory on which Eldredge and Gould based punctuated equilibria theory; they rejected genetic constraint as an explanation of stasis; they rejected the proposition that character evolution depends on speciation; and they vigorously defended the ST (Charlesworth et al. 1982). (See Footnote 3.)

As to evidence, the little evidence from studies of closely related species, based on application of coalescent theory to DNA sequences, generally suggests that speciation has not been associated with bottlenecks in population size (e.g., Rovito 2010; Yeung et al. 2011). Whether or not “punctuational” changes in fossil lineages are associated with biological speciation or are simply episodes of rapid evolution within single, nondividing lineages (“punctuated gradualism”) is still unclear. In paleontological taxonomy, “species” are morphologically distinguishable named units, either successive stages in a single evolving lineage (“chronospecies”) or reproductively isolated forms (biological species). Temporal overlap between “parent” and “daughter” forms is the best evidence that they represent cladogenesis (biological species) rather than “chronospecies.” Although punctuated gradualism has been claimed for some lineages of planktonic Foraminifera, which provide exceptionally complete fossil records (e.g., Malmgren et al. 1983), Gould and Eldredge (1993) claimed that many studies find temporal overlap, supporting their model. A recent comprehensive analysis of 337 Cenozoic “speciation” events in Foraminifera concluded that at most 19 % of Cenozoic events (last 65 million years) and 10 % of Neogene events (last 23 million years) represented change within nondividing lineages: The great majority revealed temporal overlap, and hence biological speciation (Strotz and Allen 2013). Analyses of living species can also shed light on the question. Starting with Avise (1977) and Ricklefs (1980), several authors noted that (controlling for clade age) the total amount of phenotypic variation (“disparity”) among species in a clade should be correlated with the number of species according to the PE model, but not if phenotypic evolution is independent of speciation. 
Statistical methods for testing this hypothesis have been developed only recently (Bokma 2010; Magnuson-Ford and Otto 2012), but have indicated that most of the evolution of body size in mammals (Mattila and Bokma 2008; Monroe and Bokma 2009) and of habitat use in primates (Magnuson-Ford and Otto 2012) is associated with speciation. (See Adams et al. 2009 for a counterexample.) Likewise, DNA sequence divergence seems to have been enhanced by the amount of speciation in some higher taxa, but not others (Venditti and Pagel 2010; Goldie et al. 2011; Duchene and Bromham 2013).

The theoretical criticisms of founder-effect speciation by Charlesworth, Barton, and their coauthors do not necessarily eviscerate Eldredge and Gould’s hypothesis that phenotypic evolution is associated with and enabled by speciation. Slatkin (1996) wrote “in defense of founder-flush theories of speciation,” noting that relaxation of selection during the exponential population increase that may occur in newly founded populations can enable new advantageous allele combinations to be formed and selected, and Gavrilets (2004) noted that small populations can drift along adaptive ridges in multidimensional genetic landscapes, and achieve reproductively incompatible genetic configurations without having to cross impassably deep fitness valleys. And as explained in the next section, speciation might well promote trait evolution even if it does not proceed by genetic drift and reduced population size: Any mode of speciation might do.

2.3 Reconciling Punctuated Equilibria with Population Genetics

I proposed a simple explanation of why biological speciation (by any mode) is likely to be associated with substantial, long-lasting phenotypic alterations in fossil lineages (Futuyma 1987; see Futuyma 2010). I noted that only a minority of changes in phenotypic characters are advantageous across a broad array of environmental conditions; most advantageous alterations enhance adaptation to particular ecological niches or circumstances. Many herbivorous insects, for example, have the potential for advantageous changes in behavioral or physiological responses to certain plant species, perhaps adding the plant to the insect’s diet. Most such adaptations have a polygenic basis, often composed of several functionally interacting components (e.g., recognizing a plant and possessing the enzymes needed to digest it or detoxify its chemical defenses). The geographic distribution of a specific “niche” (e.g., an environmental condition or a resource such as a host plant) is often discontinuous (patchy); moreover, it is likely to change over time, due to climate change, if for no other reason. An adaptation to such a “niche” arises and may be fixed in a local population, but for two reasons, both owing to breakdown of the adaptation by recombination, it may not persist long enough to be registered in the fossil record, much less be inherited by a clade of species. Specifically, the constellation of alleles and component characters associated with an adaptive trait will generally not be maintained intact if the population interbreeds freely with another population (such as the ancestral form) that is adapted to a different niche. Two likely consequences follow. First, if the adaptation does not become widespread, it is unlikely to be documented in the fossil record, and unlikely to persist very long because the natal population will eventually become extinct. 
But spread of the new adaptation from its birthplace to other patches with the same niche may well be hindered if emigrants are likely to disperse into intervening patches of the ancestral niche, where they will interbreed with ancestral genotypes. Second, environments undergo geographic shifts, dramatically illustrated by Pleistocene glacial and interglacial fluctuations. When this occurs, species commonly “track their niche”: They undergo range shift, during which new populations are founded by migrants and some old populations become extinct, and the former geographic structure of the species is broken down and reformed. (“Niche tracking” suggests that dispersal may often be “easier” than adaptation in situ to an environmental change. The several possible reasons include genetic constraints, discussed in Sect. 4.2) The founders of a new population often will be drawn from separate, differentiated populations, causing gene flow on a more massive scale than the “trickle gene flow” that characterizes equilibrium populations (Slatkin 1977; McCauley 1993). Such gene flow, if between differently adapted populations, may break down the differences between them. In both of these scenarios, the evolution of reproductive isolation maintains the locally originated adaptation intact, by preventing free interbreeding with the more widespread, common ancestral genotype. Thus, I concluded, “speciation can facilitate morphological change not by liberating a population from genetic homeostasis or accelerating the response to selection, but by enabling a gene pool to remain subject to consistent selection pressures even as it moves about in space. By isolating gene pools from other gene pools that they encounter as they move about, speciation enables them to retain characters that evolved in a local context…

Although speciation does not accelerate evolution within populations, it provides morphological changes with enough permanence to be registered in the fossil record” (Futuyma 1987, p. 467). In that paper, I emphasized the role of reproductive isolation in protecting adaptations from dissolution during massive changes in geographic range, but I am now inclined to think that its more important effect is in enabling a new adaptation to spread by migrants that do not interbreed with residents of intervening ancestral-type populations. Eldredge et al. (2005) also considered the problem of how to reconcile apparent stasis in fossil lineages with the capacity for rapid evolution, and observed high rates of evolution, in populations of living organisms. (See also Thompson 2013.) Chief among the several factors that they suggested might cause stasis was spatially and temporally heterogeneous selection, owing in part to a “geographic mosaic” of different coevolutionary interactions experienced by different populations of a species (Eldredge 2001; Lieberman and Dudgeon 1995; Thompson 2005). Although individual populations may respond rapidly to local selection, consistent directional selection seldom acts on the species as a whole. Eldredge et al. did not discuss what factors overcome the heterogeneity of selection and enable significant character change, i.e., punctuation. Their hypothesis for stasis is related to mine and can be extended to account for punctuation by postulating, as do I, that evolution of reproductive isolation by one such population enables the phenotype to spread and persist. My model has won modest approval, especially among some paleontologists. Gould (2002, pp. 798–802), in particular, admitted in The Structure of Evolutionary Theory that the original explanation of punctuational evolution by founder-effect speciation and “genetic revolution” was untenable, and strongly endorsed my model, writing that “his simple, yet profound, argument has not infused the consciousness of evolutionists because the implied and required hierarchical style of thinking remains so unfamiliar and elusive to most of us” (p. 799). (Well, maybe.) Although this model is not the only one that might account for a correlation between divergence and speciation (Rabosky 2012), the evidence mentioned earlier (e.g., Mattila and Bokma 2008; Venditti and Pagel 2010; Strotz and Allen 2013) is consistent with it. So is evidence suggesting a break between intraspecific evolution and divergence between reproductively isolated populations.

For example, the structure of the phenotypic variance–covariance matrix is much the same among conspecific geographic populations of damselflies, but differs strongly between closely related species, between which divergence has been highly discordant with the intraspecific first principal component of variation (Eroukhmanoff and Svensson 2008). A most intriguing “blunderbuss” pattern of evolution of vertebrate body size has been described by Uyeda et al. (2011), who show that size evolves at a high rate over short time spans, but does not accumulate until lineages have been separated for about a million years or more. That is, the amount of divergence between related lineages is much the same after 10⁵ years as at 10³ years. After 10⁶ years, however, the amount of difference mounts steadily and rapidly with time. It is tempting to attribute the million-year break to speciation, which often requires isolation and genetic divergence for about that long (Coyne and Orr 2004).

If Eldredge and Gould (1972) were right in supposing that trait evolution is facilitated by speciation, they were surely wrong about the mechanism, as Gould (2002, p. 796) came to recognize: “I believe that our critics have been correct in this argument, and that Eldredge and I made a major error by advocating, in the original formulation of our theory, a direct acceleration of evolutionary rate by the processes of speciation.” It is possible, in my view, that phenotypic evolution and speciation are functionally associated, although more evidence will still be needed before the generality and cause of this pattern can be established. Perhaps Mayr (1963, p. 621) rightly wrote that “without speciation there would be no diversification of the organic world, no adaptive radiation, and very little evolutionary progress. The species, then, is the keystone of evolution.”

3 Internal Constraints on Adaptation

Neither Darwin nor major figures in the Evolutionary Synthesis viewed natural selection as the sole important factor of evolution, much less as an omnipotent agent that could always fit organisms optimally to their environment. Darwin made frequent reference to the “mysterious laws of growth,” as well as to environmental modifications that he supposed (especially in later editions of The Origin of Species) might be inherited. Wright included genetic drift as an important component of his Shifting Balance Theory; Fisher recognized genetic drift (especially as it affects the probability of fixation of a new advantageous mutation), and described “runaway” sexual selection in which female preference evolves not because of an advantage, but because of linkage disequilibrium with the male trait. In his well-known essay “What is an adaptive trait?” Dobzhansky (1956) emphasized the importance of nonadaptive pleiotropic effects of selected genes. Rensch (1959) attributed parallel evolution partly to similarity of hereditary factors, emphasized the role of development and “mechanisms of construction,” and explained many characters by character correlation, especially allometric growth. Like Rensch, Mayr (e.g., 1963, p. 608) attributed parallel evolution partly to shared genetic and developmental properties, which also predispose every group of animals “to vary in certain of its structures, and to be amazingly stable in others.” Stebbins (1950, 1974) noted that certain traits, such as the number of ovules per carpel, vary in certain taxa and are invariant, both within and among species, in other taxa.

During and after the ES, however, evidence mounted that natural populations are genetically very variable. Lewontin (1974, p. 92) famously wrote that “[t]here appears to be no character—morphogenetic, behavioral, physiological, or cytogenetic—that cannot be selected in Drosophila,” and concluded that “there is good reason to suppose that any outbred population or cross between unrelated lines will contain enough variation with respect to almost any character to allow effective selection.” This view, still widely held by evolutionary geneticists, supports an optimistic view of species’ adaptability, and skepticism that adaptation is often limited or channeled by available genetic variation.

3.1 Adaptation: Critique and Defense

As I noted earlier, evidence for natural selection increased greatly after the Synthesis, and interest grew in explaining the evolution, by natural selection, of classes of characteristics such as life history traits and animal behaviors. Some such literature included plausible, but not well-tested, adaptive interpretations that became disparaged as “just-so stories” by critics, especially Richard Lewontin and Stephen Jay Gould. Much of the literature, though, consisted of optimality models that could be evaluated by empirically testing their assumptions and, especially, by comparing the models’ predictions with observations (Maynard Smith 1978). Such models included constraints, or boundary conditions, such as trade-offs among traits. “Adaptationism” came under fire in the 1970s, Gould and Lewontin’s (1979) eloquently written paper, “The spandrels of San Marco and the Panglossian paradigm,” being by far the most frequently cited critique. Echoing Gould’s frequent complaint that the Synthesis had “hardened” around natural selection, and Lewontin’s (1977, 1979) critiques of the “adaptationist program” embodied in sociobiology and in The Selfish Gene (Dawkins 1976), Gould and Lewontin criticized what they viewed as a practice of atomizing organisms into unitary traits, proposing adaptive explanations of each, and substituting alternative adaptive hypotheses if the first ones fail. Among the many faults they found in the “adaptationist program” was its supposed failure to consider alternatives to natural selection, such as random genetic drift, alternative stable states, and especially nonadaptive by-products of developmental correlation. 
Maynard Smith (1978), among others, defended optimality theory, noting that the traits usually studied “can hardly be selectively neutral” (e.g., behavior and other traits that affect reproductive success), that the theory does not assume or attempt to show that traits are actually at their optima, and that the models make explicit assumptions about constraints and heredity, but he agreed that it was important to develop adequate methods of testing the models and that the field could benefit from heeding Lewontin’s criticisms. Since then, researchers in this field have indeed become more critical, and the literature now includes countless examples of adaptationist hypotheses that were testable, have been tested, and have (usually) provided evidence of adaptation. For example, the inflorescence of wild carrot (Daucus carota) consists of an umbrella-like array of many tiny white flowers—with one or a few purple flowers in the center. Maynard Smith (1978) quotes Darwin’s passage about this in The Origin of Species: “that the modified central flower is of no functional importance to the plant is almost certain,” and then writes that, having cited this example in conversation, his companions immediately offered two adaptive hypotheses which, however, struck him as “fanciful.” One of these hypotheses was that the dark flower is an “insect mimic” that attracts pollinating insects to the inflorescence. In 2009, Goulson et al. reported experiments, including experimental removal of the dark flower, that showed exactly this effect. As tests of adaptationist hypotheses improved after Gould and Lewontin’s (1979) critique, constraints and the possibility of nonadaptive interpretations (especially based on development) became a common theme. 
A distinction was made between universal constraints (owing to physics and chemistry that affect, for example, the properties of materials) and “phylogenetic” constraints, particular to a clade because the features established in its earlier history can restrict the variety of possible evolutionary paths. (For example, it has long been supposed that the maximum body size of insects is set by the extent to which gas exchange can occur by diffusion through the tracheae.) Constraints might be caused by natural selection (“selective constraints”) or by internal factors that restrict or bias the kinds of phenotypic variations that can arise. These “genetic constraints” and “developmental constraints” are closely related and often are much the same thing. Moreover, the distinction between selective and developmental constraints is often unclear, for a phenotypic change may cause death by disrupting development (e.g., failure of proper formation of the embryonic notochord could abort development of vertebrae, which the notochord induces). Smith et al. (1985) offered the most widely used definition of a developmental constraint: “a bias on the production of variant phenotypes caused by the structure, character, composition, or dynamics of the developmental system.”

In the now extensive literature on constraints, some authors (e.g., Wake 2009) attributed certain evolutionary patterns, such as toe webbing in some salamanders, to developmental correlation rather than adaptation (as had Rensch and others during the Synthesis). Others provided both theoretical and empirical studies of ways in which the direction and extent of evolutionary change might be biased or limited by genetic variance and, especially, covariance among traits (e.g., Bradshaw 1991; Schluter 1996; Futuyma et al. 1995; Marroig and Cheverud 2005). The broad problem addressed is the extent to which constraints are important in explaining a range of phenomena. These include both existing features (such as toe webbing) and restrictions on adaptation, such as limits on species’ geographic range and ecological amplitude (niche width). The following paragraphs summarize my recent review of the importance of genetic constraints, especially as they may limit adaptation (Futuyma 2010). I include under “genetic constraint” both so-called phylogenetic constraint and developmental constraint, which implies strictures set by developmental properties that do not vary, even though they are based at least partly on genetically encoded products (see Sect. 4.1).

3.2 Genetic Constraints

Studies of genetic variation in natural populations, responses to artificial selection, and rapid adaptation to environmental changes have led most population geneticists to conclude that almost every characteristic of most species is so genetically variable that the availability of variation seldom limits the response to selection (e.g., Mather 1955; Barker and Thomas 1987; Barton and Partridge 2000). However, all acknowledge that genetic correlations caused by pleiotropy can greatly retard, or possibly prevent, evolution of a character if there exists antagonistic selection on correlated characters; the strength of this effect grows with the number of correlated characters (Dickerson 1955; Kirkpatrick 2009; Walsh and Blows 2009). Authors past and present (e.g., Schmalhausen 1949; Riedl 1978; Schwenk and Wagner 2004) have emphasized the likely importance of “internal” selection, owing to antagonistic pleiotropy and epistasis, in limiting selection response and evolutionary change.
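
The retarding effect of genetic correlation under antagonistic selection can be illustrated with a minimal numerical sketch. It assumes the standard multivariate response equation of quantitative genetics (change in trait means equals the additive genetic covariance matrix G multiplied by the selection gradient β), which the paragraph above does not state explicitly. The function names, the two-trait setup, and all numerical values are hypothetical and chosen only for illustration.

```python
def g_matrix(var1, var2, r_g):
    """Build a 2x2 additive genetic (co)variance matrix from two
    variances and a genetic correlation r_g (illustrative helper)."""
    cov = r_g * (var1 * var2) ** 0.5
    return [[var1, cov], [cov, var2]]


def response(G, beta):
    """Predicted per-generation change in trait means: delta_z = G * beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


# Hypothetical selection gradient: favor an increase in trait 1 and a
# decrease in trait 2, i.e., selection antagonistic to a positive correlation.
beta = [1.0, -1.0]

for r_g in (0.0, 0.5, 0.9, 1.0):
    dz = response(g_matrix(1.0, 1.0, r_g), beta)
    print(f"r_g = {r_g:.1f}  ->  delta_z = [{dz[0]:+.2f}, {dz[1]:+.2f}]")
```

Under these assumptions the predicted response shrinks as the genetic correlation rises, and it vanishes when the correlation is complete, even though each trait taken alone has ample genetic variance.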

Several phenomena suggest that genetic constraints may effectively prevent response to selection. (1) The most striking evidence of failure of adaptation is extinction, the fate of the vast majority of species that have existed. Even with plentiful genetic variation, adaptation will lag behind very rapid changes of environment. This was surely the case during certain mass extinctions, but a greater fraction of species have succumbed during periods of “background” extinction. Whether the unknown environmental changes that may have caused these extinctions were rapid or slow is not known. (2) Almost all species have limited geographic distribution and habitat occupancy. Why they cannot adapt to often modestly different environments beyond their range is one of the most challenging problems in evolutionary ecology, in my view (see Kirkpatrick and Barton 1997; Holt and Gaines 1992). Bradshaw (1991), an authority on rapid adaptation of plants to metal-contaminated soils, cogently attributed habitat limits, and many other examples of adaptive failure, to what he called “genostasis,” a lack of selectable genetic variation. (3) Although convergent adaptation to similar selective challenges is common, there are also countless examples of unique, one-off adaptations; many are familiar synapomorphies of higher taxa. No bryophytes are more than about 15 cm tall, because they lack the vascular tissues that evolved only once (as far as known), in the ancestor of tracheophytes; among millions of species of insects, only one lineage (aculeate Hymenoptera) evolved a sting. The quantitative difference between evolving a feature once and not at all is slight, and terrestrial biotas would be very different if vascular plants had not evolved.
(4) There are “empty niches,” lacunae in the economy of nature, as we see from geographic comparisons (e.g., sea snakes in the Indo-Pacific but not the Atlantic Ocean) and from the replacement of extinct forms by ecological counterparts only after a very long time (e.g., 120 million years between extinction of the first bivalve-drilling gastropods and the evolution of modern oyster drills) or not at all (e.g., sauropod dinosaurs). (5) “Phylogenetic conservatism” is a major feature of life that is largely unexplained. It is hard to envision an adaptive explanation of many morphological synapomorphies that characterize large, old taxa whose species are distributed among many environments, such as certain wing vein patterns that distinguish large families of Diptera and Hymenoptera. Dobzhansky (1956), in ascribing some traits to pleiotropy, cited a diagnostic feature of all of the 600 species of Drosophilidae then known: three orbital bristles, the anterior bristle oriented forward and the others toward the rear. Phylogenetic “niche conservatism,” associated with limited variation in physiology, morphology, and behavior, has immense ecological consequences (Wiens and Graham 2005). Many families of herbivorous insects have been associated with a single plant family for more than 70 million years; congeneric species of plants have similar latitudinal distributions and climate associations on different continents, after lengthy opportunity for divergence. Thermal tolerance limits are highly conserved, varying little with latitude, in both lizards and Drosophila (Grigg and Buckley 2013; Kellermann et al. 2006).

Genetic evidence of constraints on adaptation is mostly rather indirect. In a few cases, little or no genetic variance could be detected for certain characters in outbred natural populations. Bradshaw’s research group found genetic variation for copper tolerance in populations (from uncontaminated areas) of those species of grasses that have evolved copper tolerance in copper-contaminated areas, but no variation at all in other species of grasses that have failed to evolve copper tolerance (Bradshaw 1991; Macnair 1997). Tolerance of desiccation and cold displayed little or no genetic variation in rainforest-dwelling species of Drosophila (Kellermann et al. 2006). In a series of tests, my colleagues and I screened four species of Ophraella leaf beetles for genetic variation in their willingness to consume, and ability to survive on, species of plants other than their normal host plant; every species failed to display genetic variation in consumption and survival on at least one of the test plants (Futuyma et al. 1995). Moreover, the macroevolutionary pattern of diet evolution in this genus is partly predictable from, and perhaps has been guided by, the abundance or paucity of genetic variation for different responses.

There is considerable evidence that correlations among genetically variable traits may retard response to selection; examples include such traits as sexually selected male features in fishes, crickets, and Drosophila, floral traits in Ipomoea (morning glories), and tarsus length in flycatchers. In an elegant experiment, Etterson and Shaw (2001) transplanted families from a Minnesota population of Chamaecrista fasciculata further south, estimated genetic variance and covariance among several traits, and determined the relationship between trait combinations and fitness in the southern environment—which is expected to prevail in Minnesota about 50 years from now. There was little genetic variance for the trait combinations that would provide the greatest potential enhancement of fitness at that time, suggesting that future adaptation to climate change may be inadequate to ensure population persistence. In several taxa, divergence among species has been along the multivariate axis of greatest intraspecific variation, a pattern that Schluter (1996) called “evolution along genetic lines of least resistance.”
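
Schluter's pattern can be made concrete with a short sketch. It assumes, as is conventional in this literature, that the "line of least resistance" is the leading eigenvector of the genetic covariance matrix (the axis of greatest genetic variance), and it measures the angle between that axis and a vector of divergence between two species. The two-trait matrix, the divergence vectors, and the function names are hypothetical.

```python
import math


def leading_axis(G):
    """Unit vector along the major axis of a 2x2 symmetric covariance
    matrix, i.e., the direction of greatest genetic variance."""
    a, b, d = G[0][0], G[0][1], G[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # principal-axis angle
    return (math.cos(theta), math.sin(theta))


def angle_to_axis(G, divergence):
    """Angle in degrees (0 to 90) between the leading axis of G and a
    vector of between-species divergence in the two trait means."""
    gx, gy = leading_axis(G)
    dx, dy = divergence
    cos_angle = abs(gx * dx + gy * dy) / math.hypot(dx, dy)
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Hypothetical G matrix with a strong positive genetic covariance.
G = [[1.0, 0.8], [0.8, 1.0]]

print(angle_to_axis(G, (2.0, 2.0)))   # divergence along the major axis
print(angle_to_axis(G, (1.0, -1.0)))  # divergence orthogonal to it
```

A small angle is what "evolution along genetic lines of least resistance" predicts; an angle near 90 degrees would indicate divergence in the direction with the least genetic variance.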

The quantitative genetic approach has been used to test whether or not a postulated developmental constraint actually would prevent response to selection. For example, a positive correlation between two characters, perhaps expressed as an allometric relationship, may be postulated to represent developmental constraint. This can be tested by artificially selecting for a character combination orthogonal to the observed axis of variation (i.e., for increase of one trait and decrease of the other). For example, the features of the several “eye spots” on the wing of the butterfly Bicyclus anynana are genetically correlated. Beldade et al. (2002) successfully uncoupled the size of two such eye spots by artificially selecting in different directions, showing that there exists some independent, uncorrelated genetic variation for each trait and that the observed correlation need not constrain response to natural selection, but a similar attempt to decouple their color was unsuccessful (Allen et al. 2008). Theoretically, the likelihood of constraint increases with the number of intercorrelated characters, but all selection experiments to date have addressed bivariate correlations. Nevertheless, both theory and evidence suggest that estimates of genetic variances and correlations generally provide weak evidence on the strength or even existence of genetic and developmental constraints (Conner 2012).

Another kind of evidence, the genetic architecture of a trait, might at least hint at the possibility of constraints or bias on its evolution. The mutational variance of a character, the genetic variance that arises each generation by new mutations, is greater, ceteris paribus, if many genes affect its development (Lynch and Walsh 1998). Highly polygenic characters may be expected to display considerable genetic variation. Conversely, if only a few genes affect a character, the origin of a new character state might be a rarer event, and there might be less standing genetic variation, and adaptive evolution might have to wait for suitable new mutations to arise (Houle 1998). If so, the rate and possibly the direction of evolution of the trait might be limited, or at least biased, by mutation (Hartl and Taubes 2008; Stoltzfus 2006), in contrast to the view that selection generally acts on a nonlimiting pool of standing variation. A considerable number of traits, ranging from pesticide resistance in plants and insects to pelvic reduction in stickleback fish, have evolved in diverse species and populations by independent mutations of the same gene (sometimes the same base pair), suggesting that there are few possible genetic avenues to the adaptive phenotype (Wood et al. 2005; Arendt and Reznick 2008; Martin and Orgogozo 2013). The extent to which adaptation is based on standing variation or new mutations is uncertain (Barrett and Schluter 2008), but the evidence of abundant “selective sweeps” in DNA sequences, which occur when new or previously rare mutations increase fitness, suggests that new mutations might play a more important role than traditionally (and still widely) thought.

Footnote: Even Simpson (1944), in postulating “quantum evolution,” envisioned a rapid shift between peaks in Wright’s adaptive landscape, but “in general the genetic processes involved do not permit making the step with a single leap” (p. 210).

In summary, several phenomena, among which extinction is most conspicuous, strongly imply that there exist constraints on the rate and direction of adaptation, including genetic/developmental constraints. The great attention to the question of constraint is a major, valuable development in evolutionary biology. Testing the constraint hypothesis in any particular instance, however, is not easy, and the evidence to date does not yet enable us to decide on the importance of internal constraints on adaptation.

4 Challenges from Developmental Biology

As many authors have noted, from Darwin into the early twentieth century, the study of evolution was intimately related to embryology. Starting with Haeckel’s recapitulation theory, embryology was viewed as a window into the past, a way of reconstructing ancestral forms. Early leaders of genetics, such as Thomas Hunt Morgan, separated genetics from embryology, which they viewed as speculative; embryology likewise became an experimental science that rejected its speculative past and turned away from evolution, considering it not rigorous enough to be taken seriously (Smocovitis 1996, p. 193; Amundson 2005). However, comparative embryology continued as a parallel discipline: During the 1920s and 1930s, Gavin de Beer, Walter Garstang, and others dethroned recapitulation and described other categories of evolution of development, such as heterochrony (Gould 1977; Love 2003; Love and Raff 2003). The split between genetics and embryology probably affected the formation of the ST (Love 2009), which built on genetic but not developmental mechanisms.

It is often said that development was excluded from the Evolutionary Synthesis, although this claim has been contested by authors like Smocovitis (1996), Amundson (2005), and Love (2009). Mayr claimed that developmental biologists “bitterly resisted the synthesis. They were not left out of the synthesis…they simply did not want to join” (Mayr 1993, p. 32), and the developmental biologist Hamburger (1980, p. 98) noted that leading books on experimental embryology in the 1930s did not treat evolution, and that “the modern synthesis did not receive assistance from contemporary embryologists.” The “architects” of the Synthesis were certainly familiar with contemporary comparative embryology. Ford and Huxley (1929) studied the genetics of “rate factors” in a crustacean, Haldane (1932b) wrote on the evolutionary significance of the time of action of genes, and Mayr (1942) alluded to allometry and compensatory growth. Rensch (1947, 1959) treated developmental phenomena in some depth, as I have noted, and Stebbins (1950) gave equal time to developmental and selectionist interpretation of patterns of morphological evolution. Huxley, whose analysis of allometry was his chief conceptual contribution to evolutionary analysis, included de Beer in The New Systematics (Huxley 1940), and de Beer included Haldane, Huxley, and Ford in Evolution: Essays on Aspects of Evolutionary Biology (de Beer 1938). The only (or at least the only well known) experimental embryologist to address evolutionary processes (and who did not espouse Lamarckism, saltation, or vitalism) was C.H. Waddington, whose experimental studies of canalization and genetic assimilation appeared in the early 1950s, after the Synthesis. Simpson (1953a) expressed some cautious doubt that genetic assimilation is an important factor in evolution, but did not object to it in theory. 
Dobzhansky (1951) referred very favorably to Schmalhausen’s (1949) views on what Waddington called canalization, and in Genetics of the Evolutionary Process (1970, the sequel to Genetics and the Origin of Species) referred repeatedly to Waddington’s and Schmalhausen’s concept of canalization. He described Waddington’s genetic assimilation experiments, noting that Waddington did not interpret them in Lamarckian terms. One has the impression that the architects of the Synthesis were entirely open to admitting a role for development, but that no one stepped forward to join them as an advocate for development—especially experimental embryology.

Whatever the reasons may have been, development was not effectively assimilated into the Evolutionary Synthesis, which lacked a theory of the origin of phenotypic variation, as many authors have noted. I do wonder what kind of theory of variation could have been derived from developmental biology in the 1940s, when even the physical basis of heredity, much less the mechanisms of development, was unknown. Developmental biologists had phenomenological descriptors, with names such as embryonic induction and prepattern (just as the comparative embryologists had phenomenological descriptors such as heterochrony), but development was a black box. Kirschner and Gerhart (2010, p. 276), who have suggested ingredients of a theory of variation, write that the “Modern Synthesis did not and could not incorporate any understanding of how the phenotype is generated.” Certainly, some evolutionary biologists were sensitive to this gap. Early steps toward our growing understanding of developmental mechanisms, especially the models of gene regulation by Jacob and Monod (1961) and Britten and Davidson (1971), informed King and Wilson’s (1975) interpretation of molecular differences between chimpanzee and human, and were featured in chapters on macroevolution in the textbooks by Dobzhansky et al. (1977) and Futuyma (1979) (see footnote 4). Since then, evolutionary developmental biology (EDB) has become (in my opinion) one of the most exciting dimensions of evolutionary biology. Mechanistic understanding of gene action, of regulatory circuits, of the conservation of elements in the “genetic toolkit,” and of their association with different downstream genes is rapidly deepening our understanding of evolutionary changes in form (Carroll et al. 2005; Kirschner and Gerhart 2005; Stern 2011; Davidson 2011).

4.1 Structuralism: An Alternative to Variation and Selection?

Amundson (2005) places much of modern EDB in the structuralist tradition and contrasts the Synthetic (or “neo-Darwinian”) and structuralist concepts of what constitutes the process and “causes” of evolution. Neo-Darwinians, following Dobzhansky (1937), define evolution as change of gene frequencies, and the causes of evolution are therefore the factors that change gene frequencies. For a structuralist concerned with the evolution of form (organisms’ bodies), evolution is change in form, which requires change in ontogenies, the mechanical processes by which form develops. For the adaptationist, says Amundson (2005, p. 255), “Individuals don’t evolve. Populations do. Populations evolve by natural selection,” whereas the structuralist maintains that “Individuals don’t evolve. Ontogenies do. Ontogenies evolve by modifications of ontogeny.”

The distinction, then, is between change in the frequency of alleles that affect a phenotype, and the material mechanisms by which the phenotype is formed and is altered, a contrast closely related to Mayr’s distinction between ultimate and proximate causes. But the distinction between explanation by gene frequency change versus mechanism is not limited to the evolution of ontogenies and form. I noted above (Sect. 1.1) that population genetic theory lacks mechanistic content. The mutations that produce genetic variation have no molecular specification; the trait affected by a mutation is not specified; selection is represented by coefficients that are mute with regard to the ecological or internal sources of selection. Much of evolutionary biology since the 1960s has consisted of applying the abstract theory to real biological systems. A large industry describes the molecular nature of the genes and mutations that affect traits of interest. An even larger industry attempts to identify the sources of selection on life history variables, physiological and biochemical traits, behaviors, and morphological features, often by describing how variation in a trait affects fitness via its interaction with specified environmental factors. Amundson (2005, p. 176) describes a “Causal Completeness Principle,” espoused by earlier authors, according to which understanding development is a requirement for understanding evolution. I suggest that understanding developmental mechanisms is just one of the several components of a “causally complete” explanation of the evolution of form.

Practitioners and supporters of EDB are rightly enthusiastic about their subject. Some authors, however, make slightly hyperbolic claims for EDB’s revolutionary impact on evolutionary biology, either by claiming a power and prevalence of certain developmental mechanisms well beyond what current evidence supports, or by suggesting that some developmental phenomena can replace genetic variation and natural selection as explanations of the evolution of form. Müller (2010) speaks of a “shift from a predominantly statistical and correlational approach to a causal-mechanistic approach.” We can and should applaud a union of these approaches (consider the enormous benefits that have flowed from the union of evolutionary and molecular biology!), but I see no need for a “shift,” if that implies lessening the role of the one in favor of the other in explaining evolution. Developmental mechanisms (which count among Mayr’s (1961) “proximate” explanations) and population-level processes such as selection (“ultimate” explanations) are, of course, complementary. I will take that position throughout the remainder of this essay, as well as the position that, as important as speculative hypotheses are in this as in all fields, a skeptical demand for evidence is also essential.

A strain persists within developmental biology that seems to echo, even if faintly, the idealistic morphology of the nineteenth century that carried over into physicalist or structuralist interpretations of development and evolution. On Growth and Form, by the anti-Darwinian Thompson (1917), was intended to show by mathematics that organisms conform to purely formal laws of growth and structure (Bowler 1983, p. 157), such conformation proving (Thompson wrote) “that a comprehensive ‘law of growth’ has pervaded the whole structure in its integrity, and that some more or less simple and recognizable system of forces has been in control”—by which Thompson presumably meant laws of mechanics. A more recent structuralist interpretation of development and evolution has been provided by Goodwin (e.g., 1984), whose position is appealing to many developmental biologists. Like many other writers, he misinterpreted the “random variation” in neo-Darwinism (i.e., the Synthetic Theory) to imply that “survival is the only constraint,” approvingly citing pre-Darwinian rational morphologists who interpreted regularities, such as the segmented body plan of arthropods, as “basic structural constraints.” He rejected Darwin’s attribution of such similarities among organisms to heredity, and in a vigorous attack on the ability of genetics to explain similarity among organisms concluded that “gene products affecting morphology are to be understood as stimuli which evoke particular categories of response from a structured, self-organizing process which has a limited repertoire of possible responses” (Goodwin 1984, p. 227). The self-organizing processes are the consequences of developmental fields, spatial domains in which “every part has a state determined by neighbouring parts,” and which are capable of reconstituting themselves if perturbed. Goodwin illustrated his point with models (e.g., by Oster et al. 
1980) that describe developmental events and resulting forms, such as gastrulation and invagination, in terms of the properties of cellular elements such as cytoskeletons. He granted that the “main source of the heritable differences between multicellular organisms” surely resides in DNA (p. 236), even though he maintained, a few pages before, that “there is no way of accounting in causal terms for observed differences of form in organisms by the identification of differences in hereditary factors” (p. 219).

I am baffled by the argument that, on the one hand, genes cannot explain commonality of form among related organisms, and on the other hand that they can explain differences—especially since vast amounts of evidence attest to the role of gene activity in the formation and maintenance of phenotypes, both within and among individual organisms. If mutations of genes cause differences, how is it possible that unchanged genes should not cause unchanged, shared properties, at least in part? But my principal criticism of Goodwin’s argument, as I wrote in a review of the book in which it appeared, is that “to provide physicochemical models of developmental events is not to replace genes and selection with a sufficient physicalist theory, as Goodwin believes: obviously the constituents of organisms obey physical laws, but these laws permit innumerable developmental patterns, of which only some are permissible under natural selection” (Futuyma 1984). We see the regularities of development monstrously violated by mutations and environmental teratogens, and we see countless (but not all possible) variations of development and form, even of such supposedly fundamental processes as gastrulation, that are attributable to the action of once-mutated genes that have at least been permitted, if not fixed, by natural selection.

Physical and chemical processes are of course the proximate causes of development, and models of these processes, by Goodwin, Oster et al., and others then and now, are immensely important. They complement not only evolutionary explanations of phenotypes, but also the explanations of development, expressed in terms of gene regulatory pathways and networks (Davidson 2011), that form so much of current developmental biology and describe the genetic “algorithms” or instructions for building concrete features, but not the physical events by which the features are built. Of course, physics sets constraints, but they are broad and do not provide a sufficient account of the origin and evolution of new phenotypes.

Some authors today may disagree with that statement. Perhaps the most thought-provoking structuralist interpretations of evolution today are provided by Müller (e.g., 2007, 2010) and Newman (e.g., 2010).

Newman’s approach, in the tradition of Turing’s (1952) and Murray’s (1981) physicochemical models of animal patterns, is to show in detail that various forms and patterns, both of unicellular and simple multicellular organisms, can arise from known properties of cells and proteins. Newman’s models are intriguing and may well be an important step toward understanding the mechanisms by which some phenotypes are produced. (I am not qualified to make that judgment.) But I have deep reservations about Newman’s interpretation of the evolutionary scenarios he portrays. For example, in discussing “organismal motifs” (complex multicellular structures, as found in metazoans), he writes (Newman 2010, p. 283) that “the all but inevitable emergence, in this view, of organismal motifs that were not products of natural selection, but rather serves as its raw material, raises questions concerning both the necessity and sufficiency of the mechanisms of the neo-Darwinian Modern Synthesis for the origination of ancient multicellular forms” (my italics). The mechanisms (“dynamic patterning modules,” or DPMs) he describes include establishment of cell adhesion by cadherins and C-type lectins like those found in choanoflagellates—but note that these must have evolved during the evolution of multicellularity, perhaps in concert with other molecules, since choanoflagellates are not multicellular. Cell clusters then took on different forms, says Newman, via differential adhesion owing to differences in levels of cadherins (resulting in multilayering), lateral inhibition of neighboring cells mediated by the Notch transduction pathway (enabling coexistence of multiple cell types), cell polarity mediated by the Wnt gene family (enabling lumen formation), and other such changes. From Newman’s description of these DPMs, they all appear to involve multigene pathways, or at least regulation of expression level. 
In other words, they are complex characters that (presumably) did not exist as such in unicellular ancestors of metazoans (perhaps choanoflagellates): They must have arisen during the origin of protometazoans with multiple cell layers, lumens, etc., based partly on gene products in unicellular ancestors. But these gene products required modification if they were to interact in the way the components of the known DPMs do, on which Newman bases his scenario. The only known process by which such modifications can form complex, functional pathways is mutation (and recombination) of genes, coupled with natural selection.

Likewise, I am skeptical of Newman’s proposition that new forms emerged abruptly, almost saltationally, and that “since the resulting pattern or form would potentially self-organize in a significant portion of the founding population, there would be no question of a single, isolated individual” (p. 293) establishing a new lineage (one of the criticisms of Goldschmidt’s hopeful monsters). A new form might be induced by an environment in many individuals (as in one of Newman’s scenarios), but without genetic specification of the critical components, it will persist only as long as the inducing conditions, unless “genotypes associated with increased reliability of developmental outcome” are selected, “leading to what has been termed genetic assimilation or accommodation” (p. 298). Thus, in addition to what I view as the implausible origin of a morphology without the aid of genetic variation and selection, Newman must invoke another quite controversial hypothesis, genetic assimilation (which I treat in the next section). At some point, the concatenation of questionable scenarios or hypotheses should be resisted until sufficient evidence is brought to bear on them.

Müller (2007, 2010) sounds many of the same themes as Newman, mostly in the context of the more familiar realm of the development and evolution of major multicellular clades (specifically, animals). In a thoughtful review (Müller 2007) of EDB (or “evo-devo” in his paper), he analyzes the field’s major research programs, themes, and theoretical implications such as evolvability and organization, which includes features such as modularity. I agree with him that, in contrast to the theory of how genetic variation affects population dynamics (i.e., natural selection), evo-devo “does not invalidate the formal framework of the Modern Synthesis, but adds another level of explanation. The reach of evolutionary theory is expanded in that evo-devo accounts not for what kinds of variation are going to be maintained through natural selection, but also what kinds of variation can possibly arise from specific developmental systems” (p. 947). But the evo-devo that I think makes the greatest contribution to understanding evolution is not the one that “assigns much of the explanatory weight to the generative properties of development, with natural selection providing the boundary condition” (p. 947), nor the one that “posits that the causal basis for phenotypic form resides not in population dynamics or, for that matter, in molecular evolution, but instead in the inherent properties of evolving developmental systems” (p. 948). Like Newman, Müller pays lip service to the complementarity of the ST and developmental mechanism, but in effect treats them as alternative explanations.

Müller (2010) provides more concrete examples of his views in his treatment of morphological evolutionary novelties, in which he distinguishes “Type III” novelties, which are major changes of existing characters (e.g., tusks that are modified teeth), from “Type II” innovations, which are “new constructional elements that do not have a homologous counterpart in the ancestral species or in the same organism.” (The latter provision excludes serially homologous structures; thus, the paired mouthparts of crustaceans are not novelties because their ancestor had paired biramous locomotory appendages.) Examples of Type II innovations include the carapace of turtles and the patella (knee bone) of mammals. Müller describes developmental mechanisms which, when modeled, produce changes in skeletal patterns as an “emergent consequence of activation-inhibition thresholds in geometrically confined spaces,” illustrated by the loss of digits in salamanders and lizards. In his view, such evolutionary changes represent threshold responses to perturbations of “developmental systems that are characterized by cellular self-organization, feedback regulation, and environment dependence” (p. 322). Environmental induction, he says, is a realistic initiating trigger of innovations, via phenotypic plasticity, that are eventually genetically consolidated (or assimilated, or accommodated). Thus, “genetic evolution, while facilitating innovation, serves a consolidating role rather than a generative one, capturing and routinizing morphogenetic templates” (p. 323).

Again, I view this position as an unnecessary concatenation of speculations. I do not understand, for example, why so complex and unsupported a hypothesis is needed to explain the origin of the patella, which arises by osteogenesis in a phylogenetically novel location in the body. Selection of mutations in gene regulation is an alternative, simpler hypothesis. There is plenty of evidence that changes in gene regulation trigger the expression of entire developmental pathways at different times in ontogeny (resulting in heterochrony) or at different locations in the developing body (resulting in heterotopy; Baum and Donoghue 2010). Müller grants that genetic and cellular innovations permitted the formation of such novel mineralized tissues, “but the question of phenotypic novelty is why and how these processes were initiated in specific patterns and at specific locations of the vertebrate body” (p. 313). He does not entertain the hypothesis that mutant heterotopic expression of bone may have occurred a great many times, in various body locations, in diverse vertebrates, and that only those few mutant expression patterns that provided a selective advantage have been retained. The patella is one of many heterotopic bones (cf. the osteoderms of crocodiles, armadillos, and others) that have clear selective value. We must bear in mind, also, that the “inherent properties” of the developmental system themselves can evolve. It has long been known that variation in threshold traits, which display discrete or quasi-discrete states, usually has a polygenic basis, that the position of the threshold can evolve, and that the steepness of transition between states can change under artificial selection. (E.g., Suzuki and Nijhout 2006; Chevin and Lande 2013 provide a review and a model of the evolution of threshold characters from continuous variation.) If a simpler hypothesis of genetic variation and natural selection can explain the observation, especially in view of abundant evidence for that hypothesis, those who propose more complex (and more vague) hypotheses should expect to be asked for evidence.
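The point about threshold traits can be made concrete with a liability model, in which a continuously distributed, polygenic liability underlies the character and the discrete state develops only in individuals whose liability exceeds a threshold. The Python sketch below is offered only as an illustration; the normal distribution of liability, the function name `expressed_fraction`, and all numerical values are assumptions of the sketch and are not taken from the studies cited above. It shows how a shift in mean liability relative to the threshold changes the frequency of the discrete state, and how a smaller spread of liability makes the transition between states steeper.

```python
import math


def expressed_fraction(mean_liability, threshold, sd=1.0):
    """Fraction of a population whose liability exceeds the threshold.

    Assumes (for illustration only) that liability is normally
    distributed with the given mean and standard deviation.
    """
    z = (threshold - mean_liability) / sd
    # Upper tail of the standard normal distribution, via the error function.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical values: the threshold is fixed at 0 and the mean
    # liability is shifted, as it might be by selection on polygenic variation.
    for mean in (-2.0, -1.0, 0.0, 1.0, 2.0):
        wide = expressed_fraction(mean, threshold=0.0, sd=1.0)
        narrow = expressed_fraction(mean, threshold=0.0, sd=0.5)
        print(f"mean={mean:+.1f}  sd=1.0: {wide:.3f}  sd=0.5: {narrow:.3f}")
```

Under these assumptions, continuous change in the mean liability or in the position of the threshold is enough to carry a population from rarely to almost always expressing a discrete state, with no change in the kind of variation that selection acts on.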

4.2 Genetic Assimilation and Accommodation

Perhaps no developmental theme has had as long an uncomfortable relationship to evolutionary theory as phenotypic plasticity. If defined as “the property of a given genotype to produce different phenotypes in response to distinct environmental conditions” (Pigliucci 2001, p. 1), plasticity can be either adaptive or nonadaptive (as illustrated by malformations and stunted growth if individuals are deprived of key nutrients during growth). The array of phenotypes that a genotype produces is the genotype’s norm of reaction. Nobody denies the abundance of adaptive plastic responses (ranging from learning to the different adult morphologies of many organisms that are triggered by environmental stimuli during development); nor does anyone deny that the mean reaction norm can evolve, based on genetic variation in reaction norms. [That is, different genotypes display different reactions to an environmental stimulus or condition, a property called G × E (genotype × environment) interaction.] Under some conditions (such as constant stabilizing selection for a single phenotype), “canalization” may occur: the evolution of a phenotype that is relatively unaffected by environmental (and perhaps also genetic) perturbations. Canalization can sometimes break down in organisms that experience a novel, stressful environment, revealing “cryptic” genetic variation. For instance, genetic variation in body size of marine threespine sticklebacks (Gasterosteus aculeatus) increased dramatically when fishes were reared in freshwater, to which many stickleback populations have become adapted (McGuigan et al. 2010). The environment, then, may be said not only to exert selection, but also to amplify the variation on which selection can act.

Phenotypic plasticity may affect evolution in a variety of ways (Ghalambor et al. 2007; Wund 2012). For example, many authors have suggested that the expression of a modified phenotype in a newly encountered environment may help populations persist until natural selection improves adaptation to the environment (see Lande 2009). The time-honored idea (e.g., Mayr 1960) that animals’ behavior may initiate a shift in ecological niche, leading to morphological and physiological adaptation, provides an important potential role for behavioral plasticity. Aubret et al. (2007) found that young tiger snakes (Notechis scutatus), a terrestrial species, could swim faster if reared for 5 months in water than on a solid surface, and they suggested that this plastic response may have facilitated the evolution of fully aquatic snakes, such as sea snakes. Aside from the fairly considerable phylogenetic distance between these taxa, we might ask whether the snakes (like human athletes) might have been trained to become more proficient at any physically possible task to which they might have been set. More studies on the possibility that behavioral plasticity initiates evolutionary change would be desirable.

I am concerned not with the entire theme of the importance of phenotypic plasticity for evolution, but rather with a single controversial issue: the extent to which a phenotypically plastic response to an environmental stimulus becomes genetically entrained, such that the phenotype develops even in the absence of the stimulus. This is the thrust of several closely related ideas, of which Simpson (1953a, p. 110) wrote “Characters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reinforced or replaced by similar hereditary characters. That is the essence of the evolutionary phenomenon here called ‘the Baldwin effect’.” Simpson noted that this idea had been independently proposed by Baldwin (in 1896), Lloyd Morgan, H.F. Osborn, and Soviet geneticists whose ideas were promulgated by Schmalhausen (1949). Waddington (1953) introduced “genetic assimilation” to describe the genetic fixation, due to selection, of part of an originally broad reaction norm, a character state that initially required an environmental stimulus. In the most comprehensive treatment of this theme, West-Eberhard (2003) wrote that Baldwin’s hypothesis allowed for a broader range of outcomes than Waddington’s, and she introduced “genetic accommodation” to mean a variety of genetic changes, caused by selection on genetic variation, in the “regulation, form, or side effects of the novel trait” (p. 140). But the common controversial element in the Baldwin effect, genetic assimilation, and genetic accommodation is precisely what Simpson identified as “the essence” of the Baldwin effect: the evolution from an environmentally triggered individual developmental response to a similar, genetically determined phenotype. I will refer to this specific aspect of the evolution of reaction norms as genetic assimilation.

Simpson (1953a) noted that the postulated process has three elements: (1) Owing to interaction with the environment, at least some individuals develop a nonhereditary character state that is advantageous. (2) The population includes “genetic factors” that produce the same kind of individual modifications (or, as we would say today, affect the reaction norm so as to make the phenotype more likely to develop, independent of environment). (3) These genetic factors (alleles) are favored by natural selection, increase, and make the character state more hereditary. Simpson wrote (p. 113) that each of these processes, viewed individually, does occur and that all may well occur together. Thus, the Baldwin effect may well occur. “Nevertheless two points remain decidedly questionable: whether the Baldwin effect does in fact explain particular instances of evolutionary change, and the extent to which this effect has been involved in evolution or can explain the general phenomenon of adaptation.”
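Simpson’s three elements can be strung together in a toy quantitative-genetic model. The Python sketch below is a minimal illustration under strong assumptions, none of which come from Simpson or the other authors cited: the character develops when a liability (heritable value plus developmental noise, both normally distributed) exceeds a threshold; the environmental stimulus adds a fixed amount `induction` to liability (element 1); heritable variance `var_g` in liability is present and stays constant (element 2); and only individuals that develop the character in the inducing environment reproduce, so that the mean heritable liability changes by the additive genetic variance times the gradient of log mean fitness (element 3). All names and parameter values are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def assimilation_trajectory(generations=30, threshold=1.0, induction=0.8,
                            var_g=0.1, var_e=0.1, mean_g=0.0):
    """Fraction expressing the character WITHOUT the stimulus, per generation,
    while selection acts in the inducing environment (toy model)."""
    sd = math.sqrt(var_g + var_e)  # spread of liability among individuals
    uninduced = []
    for _ in range(generations):
        # How often the character would develop if the stimulus were removed.
        uninduced.append(norm_cdf((mean_g - threshold) / sd))
        # Mean fitness in the inducing environment is the fraction expressing
        # the character there; its log-gradient drives the response.
        z = (mean_g + induction - threshold) / sd
        gradient = norm_pdf(z) / (norm_cdf(z) * sd)
        mean_g += var_g * gradient
    return uninduced


if __name__ == "__main__":
    trajectory = assimilation_trajectory()
    for generation in (0, 5, 10, 20, 29):
        print(generation, round(trajectory[generation], 3))
```

In this sketch the character is rare without the stimulus at the outset and becomes steadily more frequent without it as selection proceeds, which is the outcome Simpson described. The sketch says nothing about his two open questions, namely whether the process explains particular instances of evolutionary change and how generally it applies.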

By whatever name, genetic assimilation is not a Lamarckian hypothesis, and it is fully compatible with the ST (cf. Lande 2009). After the 1950s, and until very recently, genetic assimilation was the subject of little research and was largely viewed as a “baroque hypothesis” (Orr 1999) that received little attention in most textbooks. Recently, however, it has become a focus of intense interest. (For overviews, see West-Eberhard 2003; Price et al. 2003; Ghalambor et al. 2007; Pfennig et al. 2010; Schwander and Leimar 2011; Moczek et al. 2011; Moczek 2012; Wund 2012.) Genetic assimilation is now very popular with many evolutionary developmental biologists (e.g., Schlichting and Pigliucci 1998; Gilbert and Epel 2009; also authors in Pigliucci and Müller 2010) and is viewed by some as a major extension of, if not replacement for, the Synthetic Theory. The controversy is epitomized (and perhaps partly sparked) by West-Eberhard’s (2003) provocative proposal that “most phenotypic evolution begins with environmentally initiated phenotypic change… The leading event is a phenotypic change with particular, sometimes extensive, effects on development. Gene-frequency change follows, as a response to the developmental change. In this framework, most adaptive evolution is accommodation of developmental-phenotypic change. Genes are followers, not necessarily leaders, in phenotypic evolution” (pp. 157–158).

This proposition is frequently associated with the idea that expression of phenotypic plasticity can help populations persist in a new environment until adaptation evolves by genetic change. That is not controversial, but it is not quite the same as genetic assimilation (or accommodation) of an environmentally induced character state. After all, two different characters might be involved: Animals exposed to unusually high temperature might react via behavioral flexibility, by staying in shady microsites, while natural selection enhances thermal tolerance over the course of generations. I am concerned here only with the question of genetic assimilation: whether or not genes are usually “followers” rather than “leaders” in adaptive evolution. There is abundant evidence that reaction norms evolve, that phenotypic plasticity and canalization can be shaped by natural selection, and that genetic assimilation is possible: It has been demonstrated in artificial selection experiments (e.g., Waddington 1953; Suzuki and Nijhout 2006). The major questions today include Simpson’s queries, still unanswered, on whether or not it explains particular instances of evolution and whether it accounts, as West-Eberhard proposed, for “the general phenomenon of adaptation.” To address these issues, we should ask how a history of genetic assimilation might be detected and demonstrated and how adequate the evidence is at this time.

West-Eberhard (2003) supported her thesis by describing numerous interesting examples of closely related species or populations that differ in plasticity of one or more traits: Commonly, one species exhibits different adaptive phenotypes under different conditions, and another exhibits a relatively fixed, nonplastic phenotype. However, such examples show only that reaction norms can evolve, which is not disputed. They do not show that genes follow plasticity; on the face of it, plasticity might be the derived trait. What is needed, at the least, is evidence of the direction of change (Moczek et al. 2011): Is plasticity the ancestral condition and genetic fixity the derived state? West-Eberhard’s treatment provides little evidence on this point. Evidence on polarity of change is best sought when rapid, recent evolution has been observed or can confidently be reconstructed, or by robust phylogenetic inference (Schwander and Leimar 2011). Only recently has any such evidence been amassed.

One can envision at least three scenarios for genetic assimilation. These are not sharply demarcated.

  1. A population with an adaptively plastic trait that experiences a variety of environments becomes subjected to a new selective regime, owing to constant exposure to one of the formerly experienced environmental states. Stabilizing selection now favors one of the previously expressed phenotypic states, resulting in abbreviation, narrowing, of the formerly broad reaction norm. Plasticity, the capacity to produce different phenotypes if exposed to environments that the population no longer experiences, may be lost if it is costly (DeWitt et al. 1998; Snell-Rood et al. 2010), or perhaps, if the population inhabits a constant environment (Moran 1992; Masel et al. 2007), by mutation and genetic drift that erode the genetic capacity to produce alternative phenotypes. Thus, the phenotype has been canalized around one of the states the ancestral population could express.

  2. The population has an adaptively plastic trait and is exposed to a new constant environment in which a quantitatively, but not qualitatively, different adaptive phenotype outside the previous range of observed variation is induced by an environmental stimulus that is simply an extension of, or is similar to, one of the environmental states that selected for plasticity in the ancestor. The expression of this phenotype is later canalized. For example, if the ancestral population had been selected to develop larger size when exposed to lower temperature, the “novel phenotype” might be a still larger size, triggered by an unprecedentedly low temperature or perhaps a novel stimulus. In this case, the novel character state arose by plasticity and the genetic change followed, but the plastic response is an “exaptation,” a manifestation of an adaptive reaction norm that had been forged in the past, presumably by selection among genotypes with different reaction norms.

  3. The population experiences a qualitatively novel environment that induces a novel phenotypic character state that happens to be advantageous. One possibility is that the new optimal character state is, or is close to, the extension of the ancestral reaction norm, which proves to be “preadaptive” even though the novel environment is not an extension of the range of environmental states the ancestral population experienced. As Ghalambor et al. (2007) have emphasized, however, the ancestral plasticity might well be maladaptive in a new environment: The extended reaction norm might be very different from the new optimum. In this case, plasticity, instead of facilitating adaptation to the new environment, would retard it. Such cases are not uncommon. For example, populations of humans and other vertebrates native to low elevations undergo several maladaptive acclimatization responses to low oxygen availability at high altitude, such as increased hematocrit and decreased affinity of hemoglobin for O2, in opposition to the genetic adaptations seen in adapted highland populations (Storz et al. 2010).

I consider scenarios 1 and 2 to fit well within the standard Synthetic Theory. In neither case have genes been “followers,” and both cases represent simple modifications (abbreviation or extension) of a reaction norm that had already evolved (according to the Synthetic Theory) by selection of alleles that moved the developmental reaction closer to the optimum for an array of environments experienced by the ancestral population. The claim that genes are “followers” would receive strongest support if scenario 3 proves to be common, i.e., when a fortuitously advantageous expression of phenotype is induced in an ancestral population by a novel environment.

The number of empirical studies in which the polarity of change is known or can be inferred with reasonable confidence is too small to establish any generalizations about how commonly these several scenarios have been realized. I suspect that environmental induction of novel characters that are not manifestations of adaptive ancestral reaction norms (scenario 3) is likely to be rare (see also Schwander and Leimar 2011). Nevertheless, a few convincing cases have been described. In one of the clearer examples, Ledón-Rettig et al. (2010) showed that a short larval gut, a feature of the carnivorous morph of the spadefoot Spea multiplicata, can be induced by an animal diet in a related genus, Scaphiopus, which has the ancestral detritivorous habit. Freshwater populations of the stickleback Gasterosteus aculeatus are derived from a marine ancestor and have evolved novel limnetic and benthic-feeding morphologies. Some of their features (body shape and gill raker length) were induced in experimental marine sticklebacks that were reared under conditions of diet and environmental configuration that resembled aspects of those under which the freshwater populations have evolved (Wund et al. 2012).

Some other cases are more difficult to interpret. Aubret et al. (2004) found that an island population of the tiger snake (Notechis scutatus), which feeds on larger prey than mainland populations, has a larger head. This is attributable both to a genetic difference and to greater phenotypic plasticity: Young snakes develop larger heads if fed larger prey. The reaction norm of the mainland population, presumably representing the ancestral state, displays similar, but less pronounced, plasticity. This may indicate that natural selection, based on success in prey capture, has shaped adaptive plasticity in the past and that selection in the island population has acted on genetic variation in the reaction norm.

A number of cases illustrate less ambiguously the genetic assimilation of an adaptive ancestral reaction norm (scenario 1). For example, some montane populations of Daphnia melanica that inhabit lakes to which fish have recently been introduced have lost the ability to develop pigmentation, a shield against ultraviolet radiation that also makes the animal more conspicuous to visual predators (Scoville and Pfrender 2010). Genetic assimilation has occurred, by abbreviating an adaptive ancestral norm of reaction. Similarly, there are many cases in which species with discrete alternative phenotypes, either genetic polymorphism or developmental polyphenism, have given rise to descendants with a single phenotype (Schwander and Leimar 2011). Since that state implies genetic fixity, the derivation of a monophenic form from a polyphenic ancestor can be considered genetic assimilation. Examples include loss of one of the alternative male mating strategies in insects, of the ability to develop a carnivorous phenotype in some populations of a spadefoot toad (Spea), and of polyphenism for wing development in some populations of a water strider (Aquarius remigis), as well as the transition from random bilateral asymmetry to consistent “left-handed” or “right-handed” phenotypes. Except for asymmetry, the ancestral polyphenisms are thought to represent adaptive plasticity, presumably the product of natural selection, and the evolutionary change illustrates my “scenario 1,” narrowing of an adaptive reaction norm.

Considerably more research, on a range of taxa and characters, will be needed before we can judge whether or not plasticity often “leads” adaptation. Most of the literature to date represents a biased sample of characters, viz., those in which a role for plasticity might be suspected from the outset. It will be illuminating to determine whether plasticity has any detectable role at all for sets of randomly selected differences among closely related species. I expect that among those cases in which a history of plasticity can be shown, the great majority will be interpretable as modifications of an adaptive reaction norm that had evolved in an ancestor by the action of natural selection on genetic variation (my scenarios 1 and 2). They will be interesting and important to document, but they will not represent a significant departure from the Synthetic Theory.

4.3 Nongenetic Inheritance

Most of evolutionary theory, during and since the Evolutionary Synthesis, has been framed in terms of inheritance based on variation that (as known since 1953) resides in DNA sequence. It has long been known, however, that there exist other forms of inheritance (Bonduriansky 2012). The cortical structure in ciliates, for example, is transmitted in cell division and grows by building onto the template provided by inherited cortex. Cultural characteristics such as language and wealth are nongenetically inherited. Cultural inheritance can be viewed as an example of inheritance of environmentally caused variation. Jablonka and Lamb (2010, p. 137) use Mayr’s (1982) term “soft inheritance” to include several processes by which “variations that are the result of environmental effects are transmitted to the next generation.” Some authors (e.g., Koonin and Wolf 2009) have enthusiastically welcomed certain of these processes as a return of Lamarckism; the most enthusiastic and prolific such advocacy has been by Jablonka and Lamb (e.g., 1995, 2005). I will comment on only one of these processes, transgenerational epigenetic inheritance.

Waddington (1957) introduced the term “epigenetic” to refer to the developmental processes by which genotypes become expressed as phenotypes. Today, it usually refers to “a mitotically and/or meiotically heritable change in gene function that cannot be explained by changes in DNA sequence” (Gilbert 2006, p. 118). Some of the huge body of research on epigenesis concerns transgenerational inheritance, via meiosis. The most frequently cited molecular mechanism of genomic “imprinting” is methylation of certain cytosine residues, which generally silences the gene. The methylated state persists and is replicated in mitosis; it is usually erased in the germ line or during embryogenesis, but not always—in which case, there is transgenerational inheritance. Methylation and other epigenetic “marking” of genes is often induced by specific environmental stimuli, and often enhances fitness within that environmental context (Bossdorf et al. 2008). For example, the production of some chemical defenses in plants, which are induced by damage by herbivores or pathogens, may be epigenetically inherited (Holeski et al. 2012). It is this inheritance of a potentially adaptive phenotype by a process other than mutation of DNA sequence that stimulates Lamarckian interpretations.

Clearly, epigenesis and epigenetic inheritance are important biological phenomena that have evolutionary implications. But it is necessary to ask whether or not transgenerational epigenetic inheritance fits into or departs from the Synthetic Theory, whether it represents a true vindication of Lamarckism, and whether, and in what ways, it may be important in evolution.

Many phenomena that were not explicitly considered during the formation and early elaboration of the ST (referring here to both mathematical population genetic theory and the verbal theory that extended from population genetics to macroevolution) subsequently found a place in it quite comfortably. For example, maternal effects, in which the maternal genotype affects the phenotype of offspring, were modeled by fairly simple elaborations of traditional population genetic theory (e.g., Wade and Beeman 1994; Wolf et al. 1999), as were the evolutionary dynamics of transposable elements (e.g., Charlesworth and Langley 1989). The Synthetic Theory, formulated before Watson and Crick published on DNA, did not specify the nature of mutations. Thus, the population dynamics of epigenetic mutations (“epimutations”) can be described in the same terms as sequence mutations (Haig 2007; Slatkin 2009). Population genetic models of epigenetic inheritance and its interaction with genetic inheritance have shown some of its most interesting theoretical effects (Day and Bonduriansky 2011). As befits a hitherto unknown biological process, some potential effects were not envisioned by Fisher, Wright, or Haldane, but neither were many other evolutionary dynamics described by population geneticists since then.

The big question is whether transgenerational epigenetic inheritance is Lamarckian. The key feature of Lamarckism is the production, from within the organism (in response to some stimulus), of inherited variation that is biased, directed, toward an adaptive end. In an incisive analysis, Haig (2007), a leading researcher on evolutionary effects of epigenetics, argues that transgenerational epigenetic inheritance is not Lamarckian, even when the phenotypes expressed enhance fitness in the environmental context that induces them. Epigenetic inheritance characterizes few genes. Therefore, some feature of the marked gene must distinguish it from others, and make it susceptible to an epigenetic mark that resists erasure in the germ line. Moreover, there is considerable evidence of genetic variation in the propensity of a gene to be methylated or otherwise marked (Dickins and Rahman 2012). Genetic variants that act as maladaptive developmental switches will be eliminated by purifying natural selection, whereas variants that enhance fitness will be perpetuated by selection. The simplest interpretation, then, of environmental induction of fitness-enhancing inherited epigenetic switches is that they are adaptations honed by the action of natural selection on genetic variation, just like adaptive, phenotypically plastic reaction norms. As Dickins and Rahman (2012) remark in their critique of the evolutionary role of soft inheritance, epigenetic systems are phenotypes, subject to the standard evolutionary processes of mutation, natural selection, and genetic drift. Haig notes that adaptive directedness, or “intentionality,” cannot be intrinsic to the epigenetic process: It must arise by some other process, and the only known candidate process is the “neo-Darwinian” action of natural selection on adaptively undirected variation that is the centerpiece of the Synthetic Theory. What we should like to have, then, is data on phenotypic effects of a large sample of novel epigenetic mutations, similar to the extensive data on de novo genetic mutations, that respond to environmental stimuli in species that have not experienced those or similar environments in their evolutionary history. The prediction is that they will show no overall tendency to be directed toward fitness-enhancing phenotypes.

Is epigenetic inheritance important in evolution? Almost surely it is, but importance can mean many things. In their population genetic models, Day and Bonduriansky (2011, also Bonduriansky and Day 2009) find a variety of ways in which epigenetic inheritance can affect the dynamics of gene frequency change; for instance, it can change the adaptive landscape, resulting in evolution toward a different genetic equilibrium. What is far from certain is that inherited epigenetic variation is the source of long-lasting adaptive phenotypes. Inheritance of epigenetic effects is frequently observed to persist for two or three generations; the highest figure I have encountered (in my limited reading) is nine generations. One of the most famous examples of an epigenetic phenotype is the “peloria” form of Linaria vulgaris, in which the normally bilaterally symmetrical flower is radially symmetrical (the phylogenetically ancestral condition) instead. This form was named by Linnaeus, and it can be found today, but there is no evidence at all that there has been unbroken descent from the mid-eighteenth century to the present time (a point that Jablonka and Lamb 2010 do not make in describing this example). The marked state of a gene is generally highly unstable, so the low fidelity of transmission will reduce the precision of adaptation (Haig 2007) and make it unlikely that an epigenetic phenotype will be fixed in a population and persist for any appreciable period of evolutionary time.
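The consequence of low transmission fidelity can be shown with simple arithmetic. In the Python sketch below, which is only an illustration, a mark is assumed to be transmitted independently in each generation with a constant probability `fidelity`, and re-induction by the environment and selection for the marked state are ignored. The function names and the fidelity values are hypothetical and are not estimates from the literature cited here.

```python
import math


def fraction_retaining_mark(fidelity, generations):
    """Expected fraction of lineages still carrying the mark, assuming
    independent transmission with constant per-generation fidelity."""
    return fidelity ** generations


def generations_until(fidelity, remaining=0.01):
    """Smallest whole number of generations after which at most the
    fraction `remaining` of lineages still carries the mark."""
    return math.ceil(math.log(remaining) / math.log(fidelity))


if __name__ == "__main__":
    # Hypothetical per-generation fidelities, chosen only for illustration.
    for fidelity in (0.5, 0.9, 0.99):
        n = generations_until(fidelity)
        print(f"fidelity={fidelity}: at most 1% of lineages marked after {n} generations")
```

Under these assumptions, even a fidelity far higher than a few generations of persistence would imply leaves a mark in a negligible fraction of lineages on any time scale relevant to macroevolution, unless the mark is repeatedly re-induced or maintained by selection.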

Instances of fitness-enhancing inherited epigenotypes appear to represent adaptations, not the source of adaptations. (As Dickins and Rahman 2012 remark, Jablonka and Lamb conflate proximate and ultimate causes of phenotypes.) But the adaptive epigenetic phenotype seems seldom to be stable enough to characterize an entire population. Future research might reveal, but so far I know of no evidence, that epigenetic differences distinguish different species or different populations of a single species. Despite the paucity or lack of even modest examples of epigenesis as a source of adaptation, Jablonka and Lamb (e.g., 2010) speculate at length about how this “Lamarckian” mechanism will account for adaptation (“genetic change is not necessary”), how it may accelerate adaptive evolution by enhancing the effectiveness of genetic assimilation, how incompatible chromatin marks may lower the fitness of hybrids and contribute to reproductive isolation, and how it may “play a key role in many macroevolutionary changes,” especially if hybridization and polyploidization are accompanied by bursts of epigenomic variation. The claim about adaptation is, I believe, flatly wrong. The other speculations are interesting and enjoyable to read, but it would be good to bear in mind that they are so speculative, so removed from evidence, and so lacking in any compelling, rigorous theoretical foundation that they are wildly premature.

4.4 Evolutionary Developmental Biology and Evolutionary Theory

Many evolutionary biologists react with skepticism, or outright dismissal, to great speculative leaps about the likelihood that developmental mechanisms will replace traditional explanations of macroevolution. Probably most evolutionary biologists strongly disagree with the aversion to genetics some evolutionary developmental biologists evince, and especially with their tendency to proclaim that internal powers of organisms steer their evolutionary fate, a seeming echo of the decades of widespread, deep, almost emotional aversion to Darwin’s theory of natural selection on undirected variation. Some evolutionary biologists, especially population geneticists, are inclined to dismiss EDB altogether. But that would be a great mistake, I believe, for the argument that evolutionary theory lacks but needs a theory of the origin of phenotypic variation is convincing, even obvious. As I indicated in the historical background with which this essay begins, most biologists since Darwin, including the architects of the Evolutionary Synthesis, recognized that not all conceivable variations are possible, and that taxon-specific biases or constraints must affect the likely paths of evolution. Subsequently, many population geneticists and other evolutionary biologists came closer to Gould’s (2002) portrayal of the Synthetic Theory: that it assumed that variation is always small in extent of change, copious in amount, and isotropic in direction. Given the evidence that new variation is limited rather than isotropic, evolutionary biology will clearly be enriched by a theory, founded in mechanistic molecular, cell, and developmental biology, of variation and how it can be shaped by natural selection into diverse, sometimes novel phenotypes.

Such a theory is under construction, with firm foundations in mechanistic biology, population genetic theory, and perhaps systems theory. Much of it stems from the discovery of phylogenetically conserved genes, chiefly regulatory genes, such as the Distal-less gene, which initiates development of evaginations that form legs and other appendages in a wide range of animal phyla. These genes have often been recruited or co-opted to govern other pathways. For example, the anterior–posterior axis of all bilaterian animals is patterned by Hox genes that were recruited, much later, to pattern the proximal–distal axis of tetrapod limbs. Animal phyla share a “genetic toolkit” of such deeply conserved genes and pathways, as Carroll and collaborators (Carroll et al. 2005; True and Carroll 2002) have called it. The remodeling of ancestral features and the origin of new ones may therefore be easier than traditionally thought, if existing genetic and developmental pathways can be expressed at different times or in conjunction with other such pathways.

A similar theme has been advanced by Kirschner and Gerhart (2005, 2010) in their theory of “facilitated variation,” expressed more in terms of cellular and developmental processes than of genes. Phylogenetically “conserved core processes,” such as the formation of the actin-based cytoskeleton, are “the basic machinery” of multicellular organisms that can be expressed, by virtue of gene regulation, in diverse contexts and in various combinations. They can “deconstrain” evolution and increase “evolvability” (the “capacity to generate heritable, selectable phenotypic variation”) partly because they consist of a set of elements that are expressed as a functional unit (that was assembled by past genetic variation and selection) and need not be separately evolved anew. Other features that enhance evolvability include compartmentation (expression only in certain parts of the developing organism) and exploration. For instance, an evolutionary change in the length of a femur entrains changes in muscles, nerves, and blood vessels, all of which grow and proliferate in diverse directions, but persist and differentiate only in proper relation to the bone; evolving a longer leg does not require independent genetic change in all these components. These developmental and cellular processes may well be adaptations, formed by an ancient history of genetic variation and selection, but they make subsequent phenotypic evolution easier than it might otherwise be. The roles of developmental processes that Kirschner and Gerhart propose do not go beyond the empirical evidence; as they note, advances in understanding the mechanisms by which phenotypes are formed “have not undermined the previous achievements of evolutionary theory” (p. 276).

Evolvability has also been explored by Riedl (1978) and by Günter Wagner and colleagues, who approach the topic via population genetic and quantitative genetic models and data (e.g., Wagner 2010; Wagner et al. 2007; Pavlicev and Wagner 2012). They have aimed at developing a theory of the evolution of the mapping between genotype and phenotype via development. For example, pleiotropy will tend to reduce evolvability (the potential of a population to evolve under natural selection) if it affects functionally unrelated characters, for a mutation that improves the function of one character is likely to damage the function of another, leading to antagonistic effects on fitness. On the other hand, pleiotropic effects on functionally related features may be more likely to have advantageously correlated effects. Population genetic models show that evolvability can evolve, in that patterns of pleiotropy can be shaped by natural selection. For example, modifier mutations can be selected that reduce the harmful effect of another antagonistically pleiotropic locus on one of the affected characters, and effectively reduce or eliminate the pleiotropic correlation between the characters, thus changing the genotype–phenotype map (Pavlicev and Wagner 2012). Consequently, modularity, similar in concept to Kirschner and Gerhart’s compartmentation, can be expected to evolve: Pleiotropy will be more frequent among functionally related characters or measurements than among unrelated ones. Pleiotropy is a major cause of genetic correlations among characters, which are estimated by the methods of quantitative genetics (Pavlicev and Wagner 2012). Studies of both genetic and phenotypic correlations in a variety of species have supported the theoretical expectation. For example, genetic correlations among various measurements of the mandible of mice decompose the structure into two modules, corresponding to the tooth-bearing part and the ascending ramus to which muscles attach. 
Correlations between corresponding bones of the forelimb and hindlimb are lower in humans (in which very different functions evolved relatively recently) than in other apes (in which some similarity of function, i.e., climbing, is retained). In this and many other instances, quantitative genetics can sketch the developmental map; identifying the genes and the developmental processes to which they contribute can follow.

Wagner’s theoretical approach, then, explores the far reaches of the relationship between development and evolution. Not only can we look forward to learning the developmental basis of the evolution of modified and novel characters, and how developmental processes can facilitate, bias, or constrain evolution; we may look forward to understanding how natural selection has shaped the structure of the developmental processes themselves.

5 Accounting for Diversity

I will use “diversity” to mean number of taxa (often species) and “disparity” to mean some measure of the variety of different phenotypes among the members of a clade. A large literature is concerned with accounting for differences in diversity (and with disparity to a much lesser extent) among geological time periods, among geographic regions, and among clades. Numbers of species change by speciation (by which I mean the evolution of reproductive isolation between populations) and extinction. Changes in species diversity are often analogized with population growth, so differences in diversity may be attributed to differences in available time (e.g., since a region became habitable, or since the origin of a clade), in rates of increase (speciation rate minus extinction rate), or in limiting or damping factors (e.g., interspecific competition). Rates need not be constant, of course: A mass extinction caused by a bolide impact is a great increase in extinction rate.
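The population-growth analogy can be made concrete with a minimal sketch. It assumes constant per-lineage speciation and extinction rates, and it uses a logistic form as one simple stand-in for a "limiting or damping factor"; the function names and the ceiling parameter `k` are illustrative choices, not quantities taken from any study cited here.

```python
import math

def exponential_diversity(n0, speciation, extinction, t):
    """Expected number of species after time t with constant per-lineage
    rates and no limit on diversity (illustrative assumption)."""
    r = speciation - extinction  # net rate of increase
    return n0 * math.exp(r * t)

def logistic_diversity(n0, speciation, extinction, t, k):
    """Number of species when net diversification declines linearly to
    zero as diversity approaches a hypothetical ceiling k."""
    r = speciation - extinction
    if r == 0:
        return float(n0)
    return k / (1 + ((k - n0) / n0) * math.exp(-r * t))

if __name__ == "__main__":
    # Same rates, different available time: older clades hold more species.
    for t in (5, 10, 20, 40):
        unlimited = exponential_diversity(1, 0.3, 0.1, t)
        damped = logistic_diversity(1, 0.3, 0.1, t, k=100)
        print(f"t={t:>2}  unlimited={unlimited:10.1f}  damped={damped:6.1f}")
```

Under these assumptions, a difference in diversity between two clades could reflect a difference in `t`, in `speciation - extinction`, or in `k`, which are the three classes of explanation named above.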

The field of evolutionary ecology includes extensive theory and evidence on topics, such as the evolution of interactions among species, that bear on the processes that influence diversity. However, much of the theory and other discourse on diversity dynamics and differences uses species as units of evolution, and does not explicitly include evolutionary (or ecological) processes within species. This includes most of ecological theory, which addresses conditions for coexistence at the level of regional assemblages, taking into account competition and interactions between trophic levels. To the traditional equilibrium theory of ecological diversity, which emphasized the importance of resource partitioning, indirect competition, and predator–prey dynamics, has been added a neutral theory, based on rates of speciation and extinction of ecologically equivalent species (Hubbell 2001). Such ecological models, in which species are the units, have been paralleled in paleobiology, in which changes in diversity have been compared with “random clades” (Raup et al. 1973) and have been explored with species-level analogs of competition between species (Sepkoski 1996). Some paleobiologists have reported that rates of change in the diversity of fossilized taxa are negatively related to diversity (Foote 2010), mirroring long-standing observations of rapid evolutionary radiations on islands and rapid increases in diversity after mass extinction events: circumstances in which competition is thought to have been alleviated. An important contribution of paleobiologists in the 1970s (Eldredge and Gould 1972; Stanley 1975) was to draw attention to species selection or clade selection (Jablonski 2008), selection above the level of the individual or the local population, which may be detected as nonrandom differences in diversification rate among clades, and may sometimes be attributed to certain characters (see below). 
Models of species selection can account for some evolutionary trends, especially in characters that affect speciation or extinction rates. This hierarchical approach was important in distinguishing “active” trends (a shift in the entire distribution of character states among species in a clade) from “passive” trends (in which the variance expands from a boundary, carrying with it a change in the mean) (Gould 1988; McShea 1994). A hierarchical perspective, recognizing that selection can act at multiple levels, has been invaluable for understanding macroevolutionary patterns.

Such theory, however, takes speciation rates, extinction rates, and the properties of species as given; it does not include microevolution, i.e., the evolutionary processes within species that might account for speciation and extinction. Williams (1992, p. 31), who perhaps more than anyone else is associated with the defense of individual selection and criticism of group selection, wrote that “the microevolutionary process that adequately describes evolution in a population is an utterly inadequate account of the evolution of the Earth’s biota. It is inadequate because the evolution of the biota is more than the mutational origin and subsequent survival or extinction of genes in gene pools. Biotic evolution is also the cladogenetic origin and subsequent survival and extinction of gene pools in the biota.” However, speciation is based on genetic changes within populations; extinction occurs when genetic changes (if they occur) are insufficient to enable survival of any of the organisms that make up a population or species. Ideally, a microevolutionary theory of these changes could be scaled up to describe a theory of rates of speciation, extinction, and diversification. A combination of theory and data can account for some examples of speciation and of population extinction, but we are very far from having the empirical information that would be necessary to apply such a theory on the scale of entire clades.

The possible role of species selection in shaping diversity and macroevolutionary trends is viewed by some as an extension of and challenge to the ST (e.g., Erwin 2010). However, advocates of species selection differ in whether the process is based only on features that are “emergent” at the species level (such as breadth of geographic range) or on any “aggregate” feature of the organisms that constitute the species. Few cases of species selection based on emergent properties have been identified, but many features of organisms have been identified that affect diversification rate. Such cases seem to fit squarely within the Synthetic Theory. For example, Mitter et al. (1988) introduced the method of “replicated sister-group comparisons,” in which the species diversity in lineages that possess a feature hypothesized to increase diversification, and that has evolved repeatedly, is compared with their sister groups that lack the feature. A causal role in diversification is inferred, based on the assumption that other diversity-enhancing features are randomized among the various lineages. Determining whether a difference in diversification rate resides in the rate of speciation, extinction, or both is difficult, although extinction rate may sometimes be estimated from the fossil record or perhaps from the shape of a phylogeny (a controversial procedure; see Rabosky 2010). Mitter et al. (1988) found that herbivorous lineages of insects usually have more species than their nonherbivorous sister groups. It is not yet known whether herbivorous insects are more diverse because adapting to different host plants causes rapid speciation (“ecological speciation;” Nosil 2012), because specializing on different plants reduces competition and the likelihood of extinction, or both. 
From such comparisons, diversification rate has been associated with many features (Coyne and Orr 2004), such as resin canals, nectar spurs and the herbaceous growth form in plants, sexual dichromatism and feather ornamentation in birds, and viviparity in fishes. (More powerful phylogenetic methods have since been developed to infer the impact of characters on rates of diversification; FitzJohn et al. 2009.)
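The logic of a replicated sister-group comparison can be illustrated with a one-tailed sign test, which is one simple way to ask whether the lineage bearing the feature is the more diverse member of a pair more often than chance would predict. The sketch below is illustrative only: the species counts are invented for demonstration and are not data from Mitter et al. (1988) or any other study, and the function name is hypothetical.

```python
from math import comb

def sign_test_p(pairs):
    """One-tailed sign test on sister-group pairs.

    Each pair is (species in lineage with the feature,
                  species in sister lineage without it).
    Ties carry no information about direction and are dropped.
    Returns (wins, informative pairs, P(X >= wins) under Binomial(n, 0.5)).
    """
    informative = [(a, b) for a, b in pairs if a != b]
    n = len(informative)
    wins = sum(1 for a, b in informative if a > b)
    p = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, n, p

if __name__ == "__main__":
    # Hypothetical species counts, invented for illustration.
    hypothetical_pairs = [(120, 30), (45, 50), (300, 12), (80, 79),
                          (15, 15), (60, 8), (22, 5)]
    wins, n, p = sign_test_p(hypothetical_pairs)
    print(f"{wins} of {n} informative pairs favor the feature; p = {p:.4f}")
```

The test treats each independent origin of the feature as one replicate, which is why the method requires that the feature has evolved repeatedly. It says nothing about whether the difference lies in speciation or extinction.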

Identifying features that affect diversification rate may provide a qualitative relationship between evolutionary processes within species (microevolutionary processes) and the rate or extent of diversification, but it falls short of a functional model that would predict diversity differences in different times or places. Population genetic models and data of speciation are extensive (Gavrilets 2004; Nosil 2012), but only in the last few years have there been efforts to scale the models up to the macroevolutionary level. Using individual-based computer models of parapatric populations that adapt to a variety of multidimensional ecological niches, Gavrilets and Vose (2005) simulated adaptive radiation, and obtained results that matched empirical patterns, especially a rate of diversification that is initially high but later declines (cf. also Gavrilets and Losos 2009). In another such model, Aguilee et al. (2013) found that landscape dynamics affect diversification: In a mosaic of several habitat types, the number of ecologically divergent species is greatest if geographic barriers between habitats are alternately stronger (permitting divergent adaptation) and weaker (enabling populations to meet and evolve reinforced reproductive isolation).

Possibly, the theoretically least developed component of macroevolution is extinction. Populations that are small, for any reason, are susceptible to extinction by random fluctuation of population size, an effect that is exacerbated by accumulation of deleterious mutations. However, extinction of entire species is usually attributed to failure to adapt fast enough to a changing environment. This statement finds its theoretical expression in models of a single quantitative (polygenic) trait, in which the rate of population growth declines, and may become negative, as the difference between the trait mean and the new optimum increases. The models assume either a sudden change in environment of a specified magnitude (i.e., difference between initial trait mean and trait optimum) (Gomulkiewicz and Holt 1995) or a steadily changing optimum that is tracked, with a lag, by a changing trait mean (Chevin et al. 2010). In the latter case, the rate of trait evolution after initial standing genetic variation has been depleted depends on the rate at which new genetic variance arises by mutation. Because more mutations occur in larger populations, the chance that a population survives is affected by its size. The more a population dwindles in size, the more likely it is to dwindle further.
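A deliberately simplified, deterministic sketch can show how models of this kind behave. It is not the formulation of Gomulkiewicz and Holt (1995) or of Chevin et al. (2010). It assumes Gaussian stabilizing selection on one trait, a constant additive genetic variance `g` (so there is no depletion of variance and no mutational input), discrete generations, and extinction whenever population size falls below a critical value. All parameter names and default values are illustrative assumptions.

```python
import math

def simulate(n0, shift, rate, g, p=1.0, omega2=10.0, w_max=1.1,
             generations=200, n_crit=1.0):
    """Track population size while the trait mean chases an optimum.

    shift : sudden displacement of the optimum at time zero
    rate  : further movement of the optimum per generation
    g, p  : additive genetic and phenotypic variance (held constant)
    omega2: width of the Gaussian fitness function
    Returns (survived, generations elapsed, list of population sizes).
    """
    z, theta, n = 0.0, shift, float(n0)
    v = omega2 + p
    history = []
    for t in range(generations):
        # Mean fitness falls as the trait mean lags behind the optimum.
        w_bar = w_max * math.sqrt(omega2 / v) * math.exp(-(z - theta) ** 2 / (2 * v))
        n *= w_bar
        history.append(n)
        if n < n_crit:
            return False, t + 1, history
        # Response to selection: genetic variance times the selection gradient.
        z += g * (theta - z) / v
        theta += rate
    return True, generations, history

if __name__ == "__main__":
    for label, kwargs in [
        ("large population, sudden shift", dict(n0=1000, shift=3.0, rate=0.0, g=0.5)),
        ("small population, sudden shift", dict(n0=5, shift=3.0, rate=0.0, g=0.5)),
        ("no genetic variance", dict(n0=1000, shift=3.0, rate=0.0, g=0.0)),
        ("slowly moving optimum", dict(n0=1000, shift=0.0, rate=0.02, g=0.5)),
        ("rapidly moving optimum", dict(n0=1000, shift=0.0, rate=0.1, g=0.5)),
    ]:
        survived, t, _ = simulate(**kwargs)
        print(f"{label}: {'persists' if survived else 'extinct'} after {t} generations")
```

In this sketch, initial population size matters only because a larger population can decline for longer before crossing the critical size. The dependence of mutational input on population size described above is not represented.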

As is true of many models, these are undoubtedly sufficient to predict survival versus extinction, when conditions meet the assumptions. They could certainly be modified for different assumptions, such as dependence of fitness on more than one character, in which case the genetic variance–covariance matrix (G) and its analog for de novo mutation (M), would be substituted for the additive genetic variance of the single character. The problem with predicting extinction of any particular population or species, or accounting for variation in extinction rate, is an empirical one, comparable to predicting the weather in New York two years from today, or accounting for the difference in the mean July temperature in two successive years: We are not remotely capable, at this time, of obtaining (or, probably, of processing) all the necessary data. If we ask, for example, what the likelihood is that the American population of the monarch butterfly (Danaus plexippus) will survive the next century of climate change, we should need to know the predicted extent and pattern of changes in temperature both in its breeding areas of North America and its overwintering areas in southern Mexico, in relation to the temperature tolerances of the relevant life history stages of the butterfly; and we need to know the magnitude of genetic variation in and genetic correlations among these several physiological measurements, as well as the rate at which these genetic statistics are changed by input of new mutations. That would require a staggering amount of research, but it would by no means be enough. 
The butterfly will experience other ecological changes than temperature alone: There are now and will be temperature-related changes in precipitation that can affect abundance and quality of its food plants (species of Asclepias, milkweeds) and probably the coniferous trees in the Mexican mountains where it overwinters; there will be changes in land use and in the communities of predators, parasites, and competing species. Whether or not the monarch’s current host plants can adapt to the climate change, or be replaced by northward-moving alternative species of Asclepias, and whether or not the butterfly populations have genetic variation in traits that mediate their ecological interaction with other species are unknowns that might be critical determinants of the species’ future. That is, we do not know what ecological factors are likely to require adaptation, much less the butterfly’s “evolvability” with respect to those factors. In this area, as with many aspects of evolutionary biology, we have a theory that explains extinction but has very restricted predictive value—perhaps like physics, which explains climate but is unlikely to yield precise predictions of daily weather in the long term.

6 Conclusions

Maynard Smith (1966), surely one of the most open-minded of the great evolutionary biologists, wrote “It is in the nature of science that once a proposition becomes orthodox it should be subjected to criticism…It does not follow that, because a proposition is orthodox, it is wrong.” More recently, Wagner (2010), acknowledging criticism of his ideas on the evolution of evolvability, wrote, “But, critics are good because only with relentless rational criticism will any scientific idea mature and serve the scientific community or society at large.” I think there is value in all the challenges to the ST that I have discussed in this essay, for at the very least they have forced biologists to examine and defend orthodoxy, and in almost all cases, there has been at least some supportable and valuable content in the new idea. At the same time, I have tried to be critical of these challenges, for two major reasons. First, although science depends on new ideas and challenges to orthodoxy, blind enthusiasm for new ideas can be immensely counterproductive if it is misguided, for it may consume resources, time, and at worst careers, and so the challenges themselves need to be challenged. (And not all challengers are unsung Barbara McClintocks and Alfred Wegeners; some are Velikovskys.) Second, orthodox propositions usually have staying power for good reason. The evolutionary principles articulated in the Evolutionary Synthesis displaced and vanquished anti-Darwinian ideas by force of rigorous theory and multiple lines of evidence consistent with (and in some cases rigorously testing) that theory. The claims embodied in the Evolutionary Synthesis were well founded, and hold up today to an extraordinary extent. It is, of course, inconceivable that they should be complete and sufficient in the face of the vast increase of biological knowledge, especially of molecular, developmental, and physiological processes, but they were well founded enough not to be abandoned lightly. 
Having considered several challenges to the explanation of macroevolution developed during the Evolutionary Synthesis, I conclude that the ST remains fairly intact, but that the challengers have advanced our understanding or at least introduced considerations worth pursuing. My specific conclusions follow.

Higher taxa, with pronounced morphological differences from related taxa, do not arise saltationally, by single “macromutations,” or reorganization of the genome. But there is no strong evidence that all character changes proceed by very slight steps, by the substitution of alleles of small effect at multiple loci. Some mutations (and genomic changes such as polyploidy) of fairly large effect are now known to contribute to evolution. It is possible that mutations of critical regulatory genes that switch on certain developmental pathways have caused large evolutionary changes, but as far as I know, this is still an open question.

The pattern of stasis punctuated by rapid evolutionary changes was wrongly interpreted to mean that natural selection cannot readily alter characteristics except via massive genetic change in small populations during speciation. However, stasis, which had been neglected before Eldredge and Gould brought it to the fore, requires explanation and is plausibly explained by fluctuating and geographically variable selection. The possibility that rapid episodes of character evolution do represent speciation, and that speciation facilitates departure from stasis, remains to be tested, but is consistent with data.

Critiques of adaptation have some validity, but have probably been overemphatic and more skeptical than warranted. Probably no evolutionary biologist has ever subscribed to the caricature of the Evolutionary Synthesis in which variation was supposed to be copious and “isotropic,” i.e., equally available for all possible modifications. Nevertheless, many evolutionary biologists have supposed that genetic or developmental constraints have been so loose as to be negligible in practice. Identification and characterization of such constraints is now a major area of interest, thanks in large part to critiques of the “adaptationist program,” and it is clear that constraints can be very important in biasing the direction of evolution or preventing adaptation altogether. Still, it remains heuristically valuable to ask what kind of selection might have impelled such evolution as has occurred, and in many (perhaps most) cases, it is likely that selection of some form has played a role. There is little reason to doubt a role for selection in the evolution of features that clearly have a close and important bearing on fitness.

The reunion of evolutionary and developmental biology, long overdue, is beginning to fill a major gap in evolutionary theory, the nature of evolutionary changes in the mapping between genotype and phenotype and the origin of phenotypic variation. Before and since the Evolutionary Synthesis, however, some developmental biologists have sought to minimize the significance of natural selection, and even of genetics, in evolution and development, by viewing the physical processes of development, and of biomolecules and cell structures, as the locus of explanation. But these are proximal explanations of form, necessary but not sufficient for explaining evolution. Proximal physical processes can constrain form and are clearly involved in the production of new forms, which cannot exist other than by physical events. But these events cannot explain the fixation of the new forms in species populations, nor the further honing of such features into more precise, effective adaptations. All proteins and cell structures produce effects by physical processes, but genetically based alteration of the proteins and structures alters the processes. Explanation by gene frequency change and explanation by changes in the material, mechanistic properties of organisms are complementary; one need not diminish the significance of the other. Natural selection on genetic variation remains the ultimate explanation of all adaptive evolution.

Renewed interest in a major role of phenotypic plasticity in evolution is being presented as another challenge to orthodox theory. Most of the phenotypically plastic traits under discussion appear to be adaptations to environmental heterogeneity that have been shaped by natural selection among genetically variable reaction norms. In some cases, part of such a reaction norm (the phenotype evoked by and adapted to one of the environmental states) has been genetically consolidated or assimilated. In other cases, a more extreme phenotype, developed as a simple extension of the ancestral reaction norm, develops in response to a more extreme state of the environment. Both of these events, viewed only in the immediate context, appear to illustrate “genes as followers” of developmental phenotypic change, but in a longer historical perspective are seen to emerge as a by-product of a history of selection on genetic variation. Perhaps plasticity could be viewed as the leader, and genes as followers, when a plastically produced phenotype is fortuitously “preadapted” to a qualitatively novel environment. I suspect this occurs rarely, but it remains to be seen.

Many or most epigenetic alterations of phenotype can be viewed as a form of phenotypic plasticity. The developmental switch is usually adaptive; it is often genetically variable, and so it presumably evolved by natural selection. Epigenetic changes that are inherited across generations can be modeled as ordinary mutations, the long-term evolutionary effect of which depends on their stability (or, conversely, on the rate of “back-mutation”) and frequency of occurrence. Their stability seems seldom to extend beyond a dozen generations or so, and no cases have yet been described in which epigenetic differences are fixed between different populations. They clearly can affect fitness and may affect immediate, local adaptation, but any macroevolutionary role has yet to be established. There is no evidence, to my knowledge, of a Lamarckian spontaneous origin of adaptively directed “epimutation” arising de novo.

In agreement with some other authors (e.g., Sterelny 2000; Minelli 2010), I conclude that the developmental phenomena described to date can readily be encompassed by the broad principles of the Evolutionary Synthesis.

Variation in rates of diversification stems from dynamics of speciation and extinction, both of which are explicable in microevolutionary terms. Indeed, the theory of speciation is far advanced, even if still controversial. However, attempts to build a theory of diversification from speciation theory have only started. The fairly minimal existing theory of extinction is surely valid, but obtaining the information necessary to predict extinction or to explain differences in extinction rates will be very difficult.

Finally, can microevolution explain macroevolution? It depends on what “explain” means. Existing theory can provide a plausible account of the history and causes of most or all evolutionary phenomena. In many but not all cases, it will be possible to derive some support or counterevidence from data. The degree of detail of the account will satisfy some, but not others: For example, there may be evidence of selection on the genes underlying a phenotype, and of the source and strength of selection, but the developmental events between gene and phenotype may be unknown. Opinion will vary on whether or not the explanation is complete or sufficient in that case. Likewise, if “explanation” requires that evolution be predictable for more than a few generations, the theory and data of microevolution will provide no more satisfying “explanation” than does physics if it is required to make long-term predictions of weather. I do not know of any macroevolutionary phenomena that are inconsistent with existing evolutionary theory, any phenomena that would require us to reject one of its principles as simply false. Nonetheless, the relative importance of many of the factors of evolution is debatable, and I assume that every part of our explanatory theory is incomplete. Of course, the Evolutionary Synthesis will be extended, molded, and modified. But there will not be a Kuhnian “paradigm shift.” Science really does accomplish something.