1.1 Introduction

One of the major findings of the new field of evolutionary genomics is that duplication events involving individual genes or multigene segments arise at rates comparable to the rate of mutation at single-nucleotide sites (Lynch and Conery 2000, 2003a, b), or possibly at even higher rates (Lipinski et al. 2011). Such observations lend credibility to Ohno’s (1970) early speculation that gene duplication is a major resource for the origin of evolutionary novelties. Moreover, it is now clear that whole-genome duplication (WGD) events have occurred in a wide diversity of phylogenetic lineages, including most of the model systems relied upon in molecular, cellular, and developmental biology. For example, budding yeast is a descendant of an ancient genome duplication (Wolfe and Shields 1997; see Chap. 15, this volume), as is the frog Xenopus laevis (Morin et al. 2006; see Chap. 18, this volume) and the zebrafish (Postlethwait et al. 2000; see Chap. 17, this volume). Many ray-finned fish lineages have experienced additional rounds of WGD (Meyer and Van de Peer 2005; see Chap. 17, this volume), and Ohno’s (1970) suggestion that two WGD events preceded the radiation of the vertebrate lineage has become increasingly credible (Panopoulou and Poustka 2005; Hughes and Liberles 2008; Putnam et al. 2008; see Chap. 16, this volume). Finally, three WGD events are recorded within the genome of Arabidopsis thaliana (Simillion et al. 2002), and nearly all other land-plant genomes appear to harbor a legacy of at least one polyploidization event (Doyle et al. 2008), with a proposed WGD in the ancestor of all seed plants and another in the ancestor of all angiosperms (Jiao et al. 2011). Thus, it is clear that understanding the mechanisms of origin and preservation of duplicate genes promises to reveal not only the ways in which genes acquire new functions and organisms respond to natural selection, but also the roots of organismal diversity across the tree of life.

Because genome duplication adds thousands of duplicate genes to the genome, understanding the evolutionary forces that act on individual duplicate genes is critical to our understanding of polyploidization. Processes such as neofunctionalization and subfunctionalization have the potential to influence all gene duplicates, whether created through polyploidization or smaller scale duplication events. It has become increasingly clear, however, that duplicates that arise via polyploidization are subject to unique evolutionary forces, such as increased retention due to dosage-balance constraints. Further, there may be processes that are exclusive to gene duplicates that arise via specific types of polyploidization, such as changes in duplicate-gene expression due to the genomic merger that occurs with allopolyploidization. The relative contributions of these evolutionary forces to the maintenance and evolution of duplicate genes that arise via WGD, and to the evolution of the genome or species as a whole, are currently unknown. However, discriminating between these forces and their effects is likely to be the subject of much research over the next several years.

1.2 Fates of Duplicate Genes

The fate of the vast majority of duplicate genes arising by segmental duplication is nonfunctionalization of one member of the pair (Lynch and Conery 2000, 2003a, b), and this is expected to occur within a few million years in the absence of any intrinsic advantage of a duplicate copy (Watterson 1983; Lynch et al. 2001). Despite this, most genomes that have been studied contain a large number of duplicate genes, some of which are clearly quite ancient (Lynch and Conery 2000). Based on this observation, several mechanisms have been proposed for the permanent preservation of duplicate genes (Hughes 1994; Force et al. 1999; Lynch et al. 2001; Taylor and Raes 2004; Lynch 2007; Innan and Kondrashov 2010): (1) neofunctionalization, whereby one copy acquires a novel, beneficial function at the expense of an essential ancestral function; (2) subfunctionalization, whereby complementary mutations lead to a partitioning of independently mutable subfunctions in the ancestral gene; (3) selection for increased gene product; and (4) the masking of nonfunctional alleles.

When a duplicate is maintained by selection for increased gene product, it experiences purifying selection (and may also undergo repeated gene conversion) in order to maintain its ancestral function; this process is likely responsible for the multiple copies of ribosomal RNA genes present in many genomes (e.g., Pinhal et al. 2011). Neofunctionalization, on the other hand, is thought to involve positive selection for the mutation(s) responsible for the new function, generally arising at the expense of an essential original function, thereby preserving both copies. There are many examples of neofunctionalization giving rise to novel gene functions in a variety of organisms, including Arabidopsis (Erdmann et al. 2010), fish (Ngai et al. 1993), vertebrates (Layeghifard et al. 2009), and yeast (Byrne and Wolfe 2007; Tirosh and Barkai 2007). Because one duplicate is undergoing positive selection for a new function while the other is under purifying selection to maintain the ancestral function, asymmetric evolutionary rates between duplicates are often thought to be a hallmark of neofunctionalization (Johnson and Thomas 2007; Han et al. 2009), though purely stochastic mechanisms can also give rise to apparent rate asymmetry (Lynch and Katju 2004).

Subfunctionalization may involve positive selection acting on both duplicates if the partitioning of the ancestral functions leads to relaxation of pleiotropic constraints, enabling each ancestral function to be fine-tuned and improved through mutation independently in each copy (Piatigorsky and Wistow 1991; Hughes 1994; Des Marais and Rausher 2008). Alternatively, subfunctionalization may be a completely neutral process if each duplicate copy simply acquires a degenerative mutation that renders it unable to perform one of the ancestral functions (Force et al. 1999). At this point, both copies are needed in order to provide the organism with all of the functionality of the original, single-copy gene, and so both will be maintained in the genome by selection. Although identifying definitive cases of subfunctionalization requires determining that the ancestral gene carried multiple functions that have been partitioned in the daughter duplicates, there are nonetheless several compelling examples (e.g., Force et al. 1999; Altschmied et al. 2002; Yu et al. 2003; Adams and Liu 2007; MacNeil et al. 2008; Semon and Wolfe 2008; Buggs et al. 2010; Deng et al. 2010; Hickman and Rusche 2010; Colon et al. 2011; Froyd and Rusche 2011).

In addition to these cases of qualitative subfunctionalization, where duplicates eventually come to be expressed in different tissues or at different times or carry out different functions relative to each other, quantitative subfunctionalization, in which reduction-of-expression (Force et al. 1999) or activity-reducing mutations (Stoltzfus 1999; Scannell and Wolfe 2008) affect both duplicates, is also possible. In quantitative subfunctionalization, both duplicates acquire partial loss-of-function mutations that affect the same function, again rendering both copies essential for the proper dosage or activity of the gene products. In this case, both copies are preserved while retaining the ancestral gene function. Although few studies demonstrating quantitative subfunctionalization exist, Qian et al. (2010) estimated that this process has been responsible for the maintenance of a large proportion of duplicates in yeast and mammals, whereas Woolfe and Elgar (2007) postulated that sequence evolution in cis-regulatory elements may have caused quantitative subfunctionalization among Fugu duplicates.

A related consequence of gene duplication is that it can allow for the differentiation of multimeric subunits, such as the evolution of heterodimers from homodimers. Consider a gene whose protein product forms a homodimer. After duplication of this gene, protein subunits produced by the two duplicates (denoted A and B) may randomly associate to make mixtures of dimers in the ratio 1 AA: 2 AB: 1 BB. If the duplicate genes are identical initially, the AA, AB, and BB dimers will be identical as well. However, subsequent differentiation of the duplicate genes causes the three types of dimer to become distinct. This differentiation could be neutral, or it could be selective. If, for example, there were pleiotropic constraints on the form or function of the pre-duplication homodimer, duplication could allow for escape from these constraints in the AB heterodimer, as each subunit (A and B) can now evolve independently. This can be viewed as a special type of subfunctionalization of duplicates. Winter et al. (2002) showed that a class-B floral protein heterodimer had evolved from an ancestral homodimer via this mechanism during the gymnosperm/angiosperm transition. In gymnosperms, GGM2-like genes form homodimers, while the duplicated homologs in eudicots, DEF-like genes and GLO-like genes, form heterodimers. Monocots also have duplicated DEF-like genes and GLO-like genes, but, interestingly, it appears the GLO-like proteins of monocots can both homodimerize and heterodimerize with DEF-like proteins, perhaps representing the transition between the homo- and heterodimerized states (Winter et al. 2002; Kanno et al. 2003; Soltis et al. 2006).
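The 1 AA: 2 AB: 1 BB ratio follows from random pairing of subunits when the two duplicates contribute equally to the subunit pool. The short Python sketch below reproduces that arithmetic; the function name `dimer_frequencies` and the optional parameter `p_a` (the fraction of subunits supplied by duplicate A) are illustrative assumptions, and the extension to unequal expression is not something the text itself discusses.

```python
from fractions import Fraction


def dimer_frequencies(p_a=Fraction(1, 2)):
    """Expected dimer proportions under random association of subunits.

    p_a is the (hypothetical) fraction of the subunit pool produced by
    duplicate A; the remainder is produced by duplicate B.
    """
    p_a = Fraction(p_a)
    p_b = 1 - p_a
    return {
        "AA": p_a * p_a,      # both subunits drawn from A
        "AB": 2 * p_a * p_b,  # one from each duplicate, in either order
        "BB": p_b * p_b,      # both subunits drawn from B
    }


if __name__ == "__main__":
    # Equal expression of the two duplicates gives the 1:2:1 mixture.
    for dimer, freq in dimer_frequencies().items():
        print(dimer, freq)
```

With equal contributions the proportions are 1/4, 1/2, and 1/4, so the heterodimer is the most common form immediately after duplication.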

A similar process appears to have occurred several times in the evolution of the DUF606 family of transmembrane proteins in bacteria (Lolkema et al. 2008). In bacteria with a single DUF606 gene, the DUF606 proteins are able to insert into the membrane in both orientations, and functional homodimers are formed by two subunits in opposite (antiparallel) orientations. Other species of bacteria, however, have duplicated DUF606 genes located tandemly in an operon. In all of these latter cases, the two protein subunits each have a fixed but opposite orientation in the membrane, and they heterodimerize to form the necessary antiparallel two-domain complex. A phylogenetic analysis of the DUF606 gene family reveals that this process of duplication followed by heterodimerization likely occurred five different times in the history of this gene family lineage (Lolkema et al. 2008). Other proposed examples of this mechanism include SMC proteins (Surcel et al. 2008), adenylyl cyclases (Sinha et al. 2005), and mitochondrial peptidases (Brown et al. 2007), all gene families that contain duplicates that form heterodimers in eukaryotes (or eukaryotic mitochondria) with single-copy homologs that form homodimers in prokaryotes.

1.3 Fates of Duplicate Genes Arising via WGD

In addition to the general preservational processes just mentioned, paralogs resulting from WGD events are subject to unique mechanisms of duplicate-gene maintenance and evolution (Force et al. 1999; Lynch and Conery 2000; Yang et al. 2003; Davis and Petrov 2005; Veitia et al. 2008). Well-studied polyploid species commonly exhibit 25–75 % retention of paralogous gene pairs from the most recent WGD event (reviewed in Lynch 2007; Otto 2007), budding yeast being an exception with only ~8 % duplicate-gene preservation (Wolfe and Shields 1997). These are surprisingly high preservational levels, when, as discussed above, the fate of the vast majority of duplicate genes arising by segmental duplication is nonfunctionalization of one duplicate (Lynch and Conery 2000, 2003a, b). Although it is possible that many polyploid species have not yet reached equilibrium and are still in an ongoing phase of duplicate-gene loss, it has become increasingly clear that there are likely to be additional forces acting to preserve duplicate genes arising via WGD.

A simple explanation for the large number of preserved duplicates within polyploids is that, unlike single-gene duplicates, WGD duplicates exhibit complete conservation of surrounding regulatory sequences, chromosomal environments, etc. Although this likely contributes somewhat to the pattern of higher duplicate retention in polyploids, it does not explain the observation that different types of genes seem to be preserved following WGD compared to smaller scale duplications. This fact can be better explained by selection for dosage balance among proteins. Due to stoichiometric relationships with other interacting genes (e.g., multi-subunit complexes and numerous pathways involved in metabolism and transcriptional regulation), the functions of a subset of protein-coding loci can be highly influenced by dosage imbalances (Veitia 2002; Papp et al. 2003; Birchler et al. 2005; Veitia et al. 2008). In such cases, duplication of just a single member of a gene interaction may be detrimental and actively selected against. In contrast, following a WGD event, most stoichiometric relationships are initially intact, and therefore subsequent losses of interacting paralogs will be inhibited by selection for proper dosage relationships. Thus, for dosage-dependent genes, the dosage-balance hypothesis predicts an under-representation among duplicates created by single-gene duplications, but an over-representation among those created by WGD (Yang et al. 2003; Davis and Petrov 2005; Veitia et al. 2008). For example, Davis and Petrov (2005) showed that the pool of preserved duplicates from the WGD event in S. cerevisiae is enriched for ribosomal genes (which form a large complex) and regulatory genes encoding transferases, kinases, and transcription factors, while those involved in ion transport are under-represented. Likewise, the Paramecium tetraurelia genome exhibits elevated retention of duplicate genes involved in known complexes (Aury et al. 2006) and in metabolic pathways (Gout et al. 2009). As in yeast, ribosomal genes, transferases, and kinases are over-represented among surviving paralogs, while ion-transport genes are under-represented. In Paramecium, there also appears to be an additional effect whereby highly expressed genes are over-retained in duplicate following the most recent polyploidization event (Gout et al. 2010). That certain types of genes are maintained preferentially following a WGD has achieved fairly convincing empirical support from other studies as well (Papp et al. 2003; Yang et al. 2003; Barker et al. 2008; Liang et al. 2008; Qian and Zhang 2008; Edger and Pires 2009), including studies in Arabidopsis (Blanc and Wolfe 2004; Maere et al. 2005; Thomas et al. 2006), vertebrates (Makino and McLysaght 2010), and across divergent species (Paterson et al. 2006). Selection to maintain dosage balance following WGD has also been hypothesized to be the driving force behind the original selective advantage of the WGD in the Saccharomyces cerevisiae lineage (Conant and Wolfe 2007). In this scenario, the maintenance of glycolytic genes and the loss of non-glycolytic genes following WGD might have increased the relative dosage of glycolytic genes, thereby increasing flux through the glycolysis pathway and providing polyploid yeast with a growth advantage over non-polyploids due to increased glucose fermentation ability.

Duplicate genes that arise via WGD are further unique in that entire (or partial) duplicated pathways or networks of interacting proteins can diverge in concert. For example, Evlampiev and Isambert (2007) modeled the evolution of protein–protein interaction networks following WGD and concluded that such networks grow under exponential, rather than time-linear, dynamics following WGD. Interestingly, they also found that these exponential dynamics relied on asymmetric divergence between duplicates.

Another intriguing possibility is that, following WGD, a whole ancestral network may become neofunctionalized or subfunctionalized, with one set of paralogs carrying out one task or reaction and a parallel set of paralogs carrying out a related, but largely independent, task. Obviously, such innovations require the establishment of multiple mutations and the avoidance of pathway crosstalk. Although the essential population genetic theory remains to be worked out, several examples of such paralog coevolution appear to have followed the WGD in yeast: parallel paralogous networks have been identified where the expression of each gene is highly correlated with the other genes within its network but poorly correlated with its paralog (Blanc and Wolfe 2004; Conant and Wolfe 2006). In this way, polyploidy provides a unique mechanism for the evolution of gene networks with new (or subdivided) functions.

A final consideration in duplicate-gene evolution is whether the forces that act to preserve duplicates change over evolutionary time. For example, it seems possible that following WGD, a large proportion of genes could be initially maintained due to dosage-balance constraints. Subsequently, however, over longer periods of evolutionary time, some duplicates might accumulate mutations that could lead to neofunctionalization or subfunctionalization. Because these genes are dosage sensitive (hence their initial preservation due to selection for dosage balance), it is likely that such neo- or subfunctionalizing mutations would need to be preceded or rapidly followed by mutations affecting the dosage of one or both copies. For a more detailed example, imagine proteins A and B that must interact in a 1:1 ratio for proper functioning. Both genes become duplicated during a WGD, giving rise to duplicates A1 and A2 and B1 and B2. Initially, all four genes are preserved by selection for dosage balance, as loss of any one gene interrupts the 1:1 interaction ratio. Over evolutionary time, however, slightly deleterious mutations in the A1 promoter that decrease its expression level become fixed due to drift. To compensate, mutations in the A2 promoter that increase its expression level are fixed, which helps to restore the 1:1 A/B ratio. At this point, A1 is contributing fewer products to the overall A protein pool. A subsequent mutation that changes the function of A1, allowing it to take on a new role completely, is now more easily accommodated, as A2 is better able to compensate and take on the full load of the ancestral A activity. Note that, instead of A1 and A2 dosage evolving in concert, as above, A1 and B1 dosage could also evolve in concert to maintain the proper 1:1 A/B ratio, allowing both A1 and B1 to take on new functions.

While still just a verbal theory, this scenario has two advantages in terms of allowing for neofunctionalization (or subfunctionalization) of WGD duplicates. First, there is a longer time frame in which neo- or subfunctionalizing mutations can arise, as duplicates are maintained for longer time-scales without becoming nonfunctionalized. This is important because neofunctionalization requires the accumulation of beneficial mutation(s), which are thought to be rare. Second, this process would allow for neo- or subfunctionalization of dosage-sensitive duplicates, both of which might otherwise be constrained to maintain their ancestral function indefinitely following WGD.
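The A/B dosage scenario described above can be made concrete with a toy calculation. The Python sketch below is only an illustration under stated assumptions: the expression values (1, 1/2, 3/2) are hypothetical numbers chosen for the example, not estimates from any study, and `a_to_b_ratio` is an illustrative helper rather than an established model.

```python
from fractions import Fraction


def a_to_b_ratio(expression):
    """Ratio of total A product to total B product for a dict of expression levels."""
    total_a = expression["A1"] + expression["A2"]
    total_b = expression["B1"] + expression["B2"]
    return total_a / total_b


# Immediately after WGD: all four copies expressed equally (arbitrary units).
after_wgd = {name: Fraction(1) for name in ("A1", "A2", "B1", "B2")}

# Losing A1 outright halves the A pool and breaks the 1:1 ratio.
a1_lost_early = dict(after_wgd, A1=Fraction(0))

# Hypothetical compensation: A1 expression drifts down, A2 expression rises.
compensated = dict(after_wgd, A1=Fraction(1, 2), A2=Fraction(3, 2))

# A1 then takes on a new role and stops contributing ancestral A activity.
a1_repurposed = dict(compensated, A1=Fraction(0))

if __name__ == "__main__":
    for label, state in [
        ("after WGD", after_wgd),
        ("A1 lost early", a1_lost_early),
        ("compensated", compensated),
        ("A1 repurposed after compensation", a1_repurposed),
    ]:
        print(label, a_to_b_ratio(state))
```

In this toy example, removing A1's ancestral contribution after compensation leaves an A/B ratio of 3/4, compared with 1/2 if A1 had been lost immediately after the WGD, which is one way to read the claim that a change of function in A1 becomes easier to accommodate.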

1.4 Autopolyploidy Versus Allopolyploidy

Whether polyploidization occurs by autopolyploidy or allopolyploidy can have a significant impact on the expression and evolution of duplicate genes. Autopolyploids arise when there is an increase in ploidy within a single species (often within a single individual), while allopolyploids are created by hybridization between two different species, each of which contributes a full complement of chromosomes to the hybrid, thus doubling the genome (reviewed in Coyne and Orr 2004). Many plant and frog polyploids are the result of allopolyploidization (Adams 2007; Evans 2008), while the yeast WGD appears to have been an autopolyploidization event (Scannell et al. 2007), although in practice it is difficult to ascertain the ancestral state once paralog divergence has become high.

It has long been assumed that autopolyploids would initially form multivalents at meiosis, with all four homologous chromosomes pairing randomly, while allopolyploids would be more likely to form bivalents, with homologous chromosomes from each diploid ancestor pairing independently. This would mean that duplicate copies in autopolyploids would not represent true paralogs as the term is usually understood, but would instead represent a doubling of the number of homologs (i.e., four homologs instead of two). The presence of multivalents is significant biologically, as multivalent pairing can lead to intergenomic recombination via segregation, crossing-over, and double reduction. Certain duplicates from one diploid parent could be lost completely via this process, leaving only duplicates from the other diploid parent. Such losses would not represent gene silencing as it is typically understood, but would rather be a byproduct of multivalent formation and segregation. Evidence from plants indicates that multivalent pairing is indeed more prevalent among autopolyploids, though the difference between the two forms of polyploidy is perhaps less than originally expected: a survey of plant polyploids indicated that the mean percent occurrence of multivalents is 28.8 % in autopolyploids and 8.0 % in allopolyploids (Ramsey and Schemske 2002). Although multivalent formation occurs at a lower rate in allopolyploids, it may be more biologically significant than multivalent formation in autopolyploids, as intergenomic recombination is likely to have a greater effect when genomes are more divergent. Over time, divergence between duplicated chromosomes would lead to increased bivalent formation.

Because allopolyploids are the result of a genomic merger between two species, duplicate genes in allopolyploids are already differentiated to some extent immediately after polyploidization, while duplicates in autopolyploids are likely to be more similar in sequence and may even be identical. Allopolyploids often exhibit immediate changes in gene expression due to the genetic differentiation present between homeologs. This can lead to changes in methylation (Salmon et al. 2005; Gaeta et al. 2007), changes in heterochromatin formation and transposable element suppression (Josefsson et al. 2006), biased expression of homeologs (Adams et al. 2003; Bottley et al. 2006; Tate et al. 2006; Udall et al. 2006; Rapp et al. 2009), and non-additive expression effects between homeologs (Hegarty et al. 2006; Wang et al. 2006; Rapp et al. 2009). These initial expression differences between homeologs can, in turn, impact the long-term evolution of duplicates, as selection pressures may be expected to act differently on genes that are differentially expressed. For example, Anderson and Evans (2009) showed that in octoploid and dodecaploid Xenopus species, paralogs of RAG1β were more likely to become pseudogenized than paralogs of RAG1α (the homeolog of RAG1β from an earlier allopolyploidy event), and they inferred that this was due to differences in ancestral expression between RAG1α and RAG1β.

Many of these effects seen in allopolyploids are believed to be due to the hybridization between two divergent genomes, rather than genome doubling per se. Flagel et al. (2008) estimated that of the genes with biased expression between homeologs in the allopolyploid Gossypium hirsutum, 24 % exhibit a bias due to the genomic merger (i.e., the bias existed immediately when the allopolyploidization occurred, at time zero), while the bias in the remaining 76 % is due to long-term evolutionary forces such as neofunctionalization and subfunctionalization. The relationship between the magnitude of these alterations in gene expression and the genetic divergence between the two parental genomes is not well understood, however, as demonstrated by Brassica allopolyploids (Pires and Gaeta 2011). While the parental species that gave rise to the allopolyploid Brassica napus are more similar to each other than those that gave rise to B. juncea, resynthesized B. napus polyploids exhibit more genomic rearrangements, changes in gene expression, and epigenetic alterations than do resynthesized B. juncea polyploids.

1.5 Polyploidization and Speciation

Perhaps the most pivotal role that polyploidization plays in evolution is in the creation of new species. The polyploidization event itself can lead to instantaneous reproductive isolation and speciation, as the cross between a new tetraploid (4n) and its diploid progenitor (2n) yields triploid (3n) offspring, which are often sterile due to problems with chromosome pairing/segregation during meiosis and the production of aneuploid gametes (reviewed in Coyne and Orr 2004). It is for this reason that models predict that species capable of self-fertilization are more likely to give rise to a successful polyploid lineage (Rodriguez 1996; Baack 2005; Rausch and Morgan 2005).

Perhaps more importantly, however, once a polyploid lineage is established, subsequent silencing of duplicate genes can lead to further reproductive isolation among subpopulations of the polyploids themselves and, therefore, give rise to additional daughter species (Oka 1988; Werth and Windham 1991; Lynch and Conery 2000; Lynch and Force 2000). In this model, we assume a pair of fully functional and redundant duplicate genes, A and B, in an ancestral population, such that each member of the initial population has the genotype AABB (Fig. 1.1). If two subpopulations become geographically isolated and one duplicate becomes nonfunctionalized in each subpopulation, there is a 50 % probability that a different duplicate copy will be lost in each of the two groups. This reciprocal gene loss (or divergent resolution) would result in the genotypes aaBB and AAbb for the two subpopulations, where a and b denote null alleles. Hybridization between the two groups would then lead to offspring with the genotype AaBb. Gametes produced by these F1 individuals would have a 1/4 probability of carrying an ab genotype and would therefore be inviable if a functional copy of the A/B gene were essential for gamete survival or function. Even if this were not the case, 1/16 of the F2 individuals would have the aabb genotype and, if a functional copy were essential for zygote viability or fertility, would be inviable or sterile, whereas another 1/4 would have three null alleles and might experience reduced viability or fertility. Up to 50–65 % of the genes encoding transcription factors, membrane receptors, and members of macromolecular protein complexes are estimated to be haploinsufficient (Jimenez-Sanchez et al. 2001; Veitia 2002), suggesting that carrying only one functional allele of such genes is indeed likely to be deleterious.
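The fractions quoted in this model (1/4 of F1 gametes carrying ab, 1/16 of F2 offspring being aabb, and 1/4 carrying three null alleles) can be checked by direct enumeration. The Python sketch below assumes that the two loci are unlinked and assort independently and that F2 individuals arise from random union of F1 gametes; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def f1_gametes():
    """Gametes of an AaBb F1 hybrid, assuming independent assortment.

    Lowercase letters denote null alleles.
    """
    return [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab


def f2_null_allele_distribution():
    """Fraction of F2 offspring carrying 0..4 null alleles across both loci."""
    gametes = f1_gametes()
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):  # random union of gametes
        nulls = sum(allele.islower() for allele in g1 + g2)
        counts[nulls] += 1
    total = sum(counts.values())
    return {n: Fraction(c, total) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    gametes = f1_gametes()
    print("ab gametes:", Fraction(gametes.count("ab"), len(gametes)))
    for nulls, freq in f2_null_allele_distribution().items():
        print(nulls, "null alleles:", freq)
```

The enumeration gives 1/16 of F2 offspring with four null alleles (aabb) and 4/16 = 1/4 with three, matching the values in the text.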

Fig. 1.1 Divergent resolution of duplicate genes can lead to hybrid sterility/inviability and reproductive isolation. Red bar represents a gene that becomes duplicated. See text for details

An appealing aspect of the divergent resolution model is that it is a natural consequence of degenerative mutations, requiring no adaptive evolution at the molecular level for speciation to occur. Moreover, in addition to genes that are reciprocally silenced, duplicate pairs that undergo neofunctionalization or subfunctionalization may also contribute to hybrid sterility/inviability in a similar fashion—for example if a different duplicate becomes neofunctionalized and loses its ancestral function in each subpopulation, or if the two duplicates become subfunctionalized in complementary ways in the two subpopulations (Lynch and Force 2000).

The process of reciprocal gene loss has been shown to be responsible for male sterility in hybrids between Drosophila melanogaster and D. simulans (Masly et al. 2006). A gene essential for male fertility, JYAlpha, is located on the fourth chromosome in D. melanogaster and on the third chromosome in D. simulans. This translocation presumably occurred via duplication of the JYAlpha gene and subsequent silencing of one copy. The difference in chromosomal location of the gene in the two species causes a proportion of hybrids to completely lack JYAlpha, leading to their sterility.

Two similar cases were recently identified in rice. The first involves reproductive isolation between two subspecies of Oryza sativa. The ancestral O. sativa genome appears to have had a pair of duplicates termed DOPPELGANGER1 (DPL1) and DOPPELGANGER2 (DPL2) (Mizuta et al. 2010). The subspecies japonica and indica have experienced independent losses of one copy each: DPL1 has become a pseudogene in indica, while DPL2 has been nonfunctionalized in japonica. Hybrid pollen lacking a functional copy of either DPL1 or DPL2 is nonfunctional and does not germinate, contributing to the partial reproductive isolation present between the subspecies. This validates an earlier hypothesis by Oka (1988) that F1 sterility between japonica and indica was caused by “duplicate gametophytic sterility genes”, japonica being homozygous for one nonfunctional copy and indica being homozygous for another nonfunctional copy. In the second rice example, reciprocal loss of one of the duplicated nuclear genes encoding mitochondrial ribosomal protein L27 in O. sativa and O. glumaepatula again causes a proportion of the pollen produced by F1 hybrids to be sterile (Yamagata et al. 2010).

The final example of reproductive isolation through reciprocal gene loss comes from A. thaliana, where the histidinol-phosphate amino-transferase gene appears in different chromosomal locations (as in the Drosophila example, presumably via duplication and subsequent silencing of one copy) in the Columbia and Cape Verde Island accessions (Bikard et al. 2009). F2 offspring homozygous for both null alleles completely lack the gene’s product, HPA, which results in arrested embryo development and seed abortion. In addition, in at least one intermediate heterozygote, a quantitative phenotype termed “weak root” was observed, suggesting that the presence of three null alleles is somewhat deleterious in this cross. As these four examples constitute ~1/3 of the dozen or so successful searches for the genes underlying the speciation process (most in Drosophila species; Presgraves 2010), there now seems little question that the passive nonfunctionalization of duplicate genes is a major mechanism of speciation.

These examples demonstrate that the divergent resolution of even one duplicated gene can lead to detectable reproductive isolation. However, genetic incompatibility between two populations can be magnified substantially when reciprocal gene loss occurs at hundreds or thousands of duplicated loci simultaneously, as is the case in polyploid lineages. The probability that an F2 offspring obtained by outcrossing will be double null for at least one of n pairs of divergently resolved loci is $1 - (15/16)^n$, which takes on values of 0.063, 0.276, 0.476, and 0.998 for n = 1, 5, 10, and 100, respectively. Moreover, in species that undergo autogamy or selfing, such as Paramecium, this probability can be as high as $1 - (3/4)^n$, giving probabilities of 0.250, 0.763, 0.944, and ≈ 1 for n = 1, 5, 10, and 100. Speciation events will continue to occur as long as duplicates are still being resolved between subpopulations, leading to nested rounds of speciation, and, because a large number of duplicates are thought to be silenced quickly following WGD (Scannell et al. 2006), a cluster of speciation events might occur within a brief period of time. The net result is the expected generation of a species radiation following a WGD event.
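The probabilities quoted above follow from treating the n divergently resolved pairs as independent: an F2 offspring escapes being double null at any one pair with probability 15/16 under outcrossing (3/4 in the autogamy/selfing case), so it escapes at all n pairs with that value raised to the power n. A minimal Python sketch of this calculation follows; the function name and the `selfing` flag are illustrative assumptions.

```python
def prob_double_null(n, selfing=False):
    """Probability that an F2 offspring is double null at one or more of n pairs.

    Assumes the n pairs of divergently resolved loci behave independently.
    The per-pair escape probabilities (15/16 for outcrossing, 3/4 for the
    autogamy/selfing case) are the values given in the text.
    """
    escape_one_pair = 3 / 4 if selfing else 15 / 16
    return 1 - escape_one_pair ** n


if __name__ == "__main__":
    for n in (1, 5, 10, 100):
        print(n, prob_double_null(n), prob_double_null(n, selfing=True))
```

Evaluated at n = 1, 5, 10, and 100, the function agrees with the values listed in the text to within rounding.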

It has been suggested that this nested speciation process might be responsible for the radiations of the polyploid yeast species (Scannell et al. 2006), teleost fishes (Semon and Wolfe 2007; see Chap. 15, this volume), angiosperms (Soltis et al. 2009; though see Mayrose et al. 2011), and the Paramecium aurelia species complex (Aury et al. 2006). This mechanism may also be responsible for reproductive isolation between mutagenized lines of an experimentally derived allotetraploid created by hybridizing two species of Saccharomyces (Maclean and Greig 2010).

1.6 Unsolved Problems

The maintenance of duplicate genes via selection for increased gene product, neofunctionalization, and subfunctionalization has been hypothesized for nearly 40 years (Ohno 1970). Recent genetic and genomic data have now identified compelling examples of these processes and have further contributed to our understanding of the prevalence of whole-genome duplications and the dosage-balance theory of duplicate maintenance. However, a number of unresolved questions related to WGDs and duplicate maintenance merit further scrutiny.

The first avenue for future study involves a more comprehensive understanding of the relative importance of the forces behind duplicate-gene maintenance, including maintenance for increased dosage, dosage-balance constraints, neofunctionalization, and subfunctionalization. All of these mechanisms have been demonstrated to be responsible for duplicate maintenance in certain cases, but it remains unclear which, if any, is responsible for maintaining the majority of duplicate genes or how such contributions vary among phylogenetic lineages. Most likely, there will be no single driving force for duplicate maintenance; rather, the relative strength of these forces will differ among taxonomic groups or among functional classes of genes. For example, subfunctionalization of duplicates may be more likely within species that have evolved a modular (and therefore independently mutable) regulatory structure. Such modular systems are predicted to arise more easily within species with smaller population sizes (Force et al. 2005), demonstrating how species-level features of an organism may influence the evolutionary forces acting upon duplicate genes. More comprehensive studies of large numbers of duplicates from a variety of organisms are required to address what other features might influence the relative strengths of mechanisms of duplicate maintenance. Such studies must not only detail which genes remain duplicated versus single-copy, but must also detail whether existing duplicates have the same function as each other (to assess rates of neofunctionalization) or share functions with the pre-duplication ancestor (to assess rates of subfunctionalization) (Fig. 1.2).

Fig. 1.2 Distinguishing between the evolutionary forces that maintain duplicate genes. Panel A shows the expression level of a gene (purple) across different conditions or tissues before duplication. The bottom six panels (B–G) show patterns that might be seen for the two copies (red and blue) once the gene has been duplicated and what evolutionary processes these patterns would indicate. Note that panel D might indicate maintenance for increased dosage in the case of a single-gene duplicate or maintenance for dosage balance in the case of a duplicate arising via WGD.

There are few data on the rate of duplicate-gene loss over time following a WGD, though data from polyploid yeast species suggest that the rate changes over time (Scannell et al. 2006). Data from additional taxa would aid in determining whether this is a general pattern among all WGDs (Fig. 1.3). A related unresolved question is whether the evolutionary forces controlling duplicate maintenance change over time following a WGD, e.g., whether a dosage-sensitive gene may initially be preserved due to selection for dosage balance, but then evolve a new function concurrent with its release from such dosage constraints. This question could be addressed by comparing the fates of duplicate genes in multiple lineages descended from a single WGD event. Such an analysis might identify duplicates that had been maintained due to dosage constraints in the majority of daughter lineages but that had become neofunctionalized in one lineage, perhaps suggesting a secondary mechanism of retention.

Fig. 1.3 Determining the rate of duplicate-gene loss over time. (a) An example of a tree for four species that share a whole-genome duplication (WGD). Duplicate-gene presence/absence information for each of the four species could be used to infer the number of duplicate genes lost on each of the branches labeled A, B, C, and D. (b) The data on gene retention/loss gathered in (a) could be used to plot the percentage of duplicate genes lost per unit time (or divergence). Depending on where points A, B, C, or D land on the graph, the data may indicate that the rate of gene loss remains constant (top dashed line) or changes (bottom dotted line) over time.

The unsolved question that promises to be the hardest to answer is why certain lineages or taxonomic groups appear to contain more WGD events than others. This is not the same as asking why certain groups contain more polyploid species, as this may be a simple reflection of the fact that WGD may promote subsequent reproductive isolation and speciation. The pattern remains, however, that some phylogenetic groups seem to contain more independent WGD events than others in their evolutionary pasts. For example, a recent analysis estimated that among ferns, 31 % of speciation events involve polyploidization, while the value for angiosperms is only 15 % (Wood et al. 2009). Similarly, the history of the Xenopus lineage contains many more WGD events than that of, say, mammals. Several factors could contribute to such patterns, such as the ability to hybridize and form allopolyploids, or the ability to self-fertilize (at least transiently) or undergo asexual reproduction, which helps a polyploid lineage become abundant in a surrounding world of diploids. It is not even understood whether mechanistic reasons (at meiosis, say) or differences in developmental programs would facilitate or hinder creation of a viable polyploid in certain lineages, or whether discrepancies in ecological persistence of polyploid species alone are able to explain the patterns that we see. Perhaps the best way to approach such a question is to study closely related lineages where one exhibits several WGD events and the other does not, though teasing apart the mechanistic and ecological differences is certain to remain a challenge for decades to come.

1.7 Conclusions

Whole-genome duplications are widespread across the tree of life and appear in the evolutionary history of a large number of model organisms. Processes such as neo- and subfunctionalization affect retention of individual gene duplicates, and dosage-balance constraints promote the retention of large sets of genes following polyploidization. Allopolyploidization, through hybridization and subsequent changes or biases in homeolog expression, has the ability to instantaneously create a population of individuals that are ecologically and epigenetically distinct from either parent lineage, providing a new lineage upon which natural selection can act. Both allo- and autopolyploidization provide a unique opportunity for the differentiation of new gene networks and pathways through concerted evolution of duplicated, interacting proteins. Most importantly, WGD can lead to reproductive isolation through divergent resolution of duplicated genes, thus creating new species and species groups. Further understanding of the relative importance and the temporal properties of the forces acting on polyploid species and the duplicate genes within their genomes promises to enhance our knowledge of the origins of species as well as genetic, protein network, and organismal complexity.