I. Introduction

The classical endosymbiont hypothesis for the origin of mitochondria and plastids (chloroplasts) posits that on two separate occasions, free-living prokaryotes – from within the bacterial groups α-Proteobacteria and Cyanobacteria, respectively – entered into increasingly intimate associations with eukaryotic host cells, ultimately evolving into the well-studied sub-cellular organelles they are today. Over time and with much data accumulated, this hypothesis has risen to the status of theory; that these quintessential membrane-bound organelles are derived from prokaryotes is now considered textbook ‘fact’. In this era of high-throughput genomics and proteomics, how much do the data still support this classical view?

This chapter provides an overview of the ‘primary’ endosymbiotic origins of mitochondria and plastids and in so doing sets the stage for the chapters that follow. As we shall see, many aspects of the classical endosymbiont hypothesis still hold true for both organelles, but others have changed significantly in response to a wealth of comparative data from diverse eukaryotic lineages. This is especially true in the case of mitochondria, where recent years have seen numerous alternative evolutionary scenarios proposed. Mitochondria are still believed to have evolved before plastids, but there is no evidence supporting the existence of an amitochondriate phase in eukaryotic evolution. Indeed, it is possible that the origin of the mitochondrion was contemporaneous with the origin of the eukaryotic cell itself.

II. Mitochondria

A single origin of mitochondria from within the domain Bacteria (eubacteria) is now widely accepted and overwhelmingly supported by a variety of evidence, in particular molecular evidence that has accumulated over the past four decades. At the heart of this conclusion is the demonstration that the mitochondrial genome is clearly the remnant of a eubacterial genome, arising specifically from within the α-class of the phylum Proteobacteria (Alphaproteobacteria, also known as α-Proteobacteria). Data underlying this conclusion have been extensively reviewed elsewhere (Gray and Doolittle 1982; Gray 1989, 1992, 1993, 1999; Gray and Spencer 1996; Lang et al. 1999a; Gray et al. 2001), and are consistent with the long-standing endosymbiont hypothesis for the evolution of mitochondria. Indeed, the most gene-rich and ancestral (least derived) mitochondrial genome known, that of the protozoon Reclinomonas americana, resembles to a striking degree a miniaturized eubacterial genome, retaining distinct traces of its α-proteobacterial genomic heritage (Lang et al. 1997).

Although the α-proteobacterial ancestry of the mitochondrial genome seems firmly established, our understanding of the evolutionary route from eubacterium to mitochondrion is still murky. It is somewhat disconcerting to realize that the appearance of new molecular data, particularly genomic, phylogenomic and proteomic, has muddied the waters with respect to mitochondrial origin and evolution, rather than clarifying the issue. The bottom line is that we are rather less certain today than we were two decades ago that the ‘classical’ endosymbiont hypothesis, which posits an amitochondriate eukaryotic host cell taking up a bacterial symbiont (Margulis 1970), provides a correct – or at least fully accurate – description of the origin of mitochondria. One reason for this reservation is that although the mitochondrial genome is indisputably α-proteobacterial in evolutionary origin, most of the constituents of the mitochondrial proteome – the collection of proteins that constitute the functional organelle – are not (Gray et al. 2001). Thus, accepting that an α-proteobacterial endosymbiont was the ancestor of mitochondria, the evolutionary re-modeling that occurred subsequently has obviously been so extensive that the contemporary mitochondrion would best be described as a genetic and functional mosaic (Szklarczyk and Huynen 2010).

Another complication is that no extant eukaryotic lineages have been discovered that can convincingly be shown to have diverged before the putative endosymbiotic acquisition of mitochondria (i.e., there are no known eukaryotic lineages that are primitively amitochondriate, in the sense of never having harbored mitochondria during their evolution). Hence, we cannot currently point to any existing eukaryotic candidates that could serve as examples of the sort of host cell that the classical endosymbiont hypothesis requires. As we shall see, these complications considerably constrain our understanding of the origin and evolution of mitochondria.

A. Genetic, Genomic and Phylogenomic Data Bearing on Mitochondrial Origins

Following the discovery that mitochondria contain a genome and carry out DNA replication, transcription and translation, various biochemical and molecular biological data based on these findings were marshaled in support of a eubacterial, endosymbiotic origin of mitochondria (a xenogenous origin), as opposed to an autogenous origin (an origin from within the eukaryotic cell itself; Gray and Doolittle 1982). For example, the fact that mitochondrial protein synthesis is sensitive to chloramphenicol but not to cycloheximide was an early indication that the mitochondrial ribosome is functionally eubacterial in character, and not an evolutionary derivative of the cytoplasmic ribosome. Molecular data stemming from comparative studies of mitochondrial and bacterial genomes have been particularly informative when it comes to assessing the origin of mitochondria. Arguments based on these data, and supporting a single origin of the mitochondrial genome from within the eubacterial class Alphaproteobacteria, are of three sorts (Gray et al. 1999; Lang et al. 1999a). First, mitochondrial genomes in different eukaryotes encode relatively few genes (<100), but in all studied mitochondrial genomes, these genes are essentially sub-sets of the ones found in the R. americana mtDNA. Accepting that mitochondrial genomes in different eukaryotic taxa are radically (but differently) reduced versions of a much larger eubacterial genome carrying a substantially greater number of genes, it is highly improbable that independently acquired α-proteobacterial genomes would have undergone convergent reduction to the same small set of residual genes. Second, although mitochondrial genome organization and gene order vary markedly among eukaryotes, a number of minimally derived mitochondrial genomes (principally found among single-celled eukaryotic microbes, or protists) retain vestiges of eubacterial operons. These operon-like clusters are invariably missing some of the genes that are found in the corresponding eubacterial operons, and these specific deletions are shared among mitochondrial genomes. For example, in the mitochondrial version of the eubacterial S10 operon, comprising a cluster of 11 ribosomal protein genes, the same six genes (rpl3-rpl4-rpl23, rpl22, rpl29-rps17) are missing in all characterized mitochondrial genomes that encode clustered ribosomal protein genes. The inference is that these mitochondrion-specific deletions must have been present in the mitochondrial genome of a common ancestor of the taxa in question. Finally, in phylogenetic reconstructions based on alignments of mtDNA-encoded genes, both rRNA and protein-coding, mitochondria appear as a monophyletic clade branching within Alphaproteobacteria. This is the case even when alignments include homologs that are encoded in the mtDNA of some eukaryotes but in the nuclear DNA of others (e.g., Burger et al. 1996), additionally providing support for the concept of mitochondrion-to-nucleus gene transfer that has shaped the evolution of the mitochondrial and nuclear genomes.
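
The shared-deletion argument can be illustrated with simple set arithmetic. The following Python sketch is illustrative only. The 11-gene list assumes the gene content of the Escherichia coli S10 operon (the text above specifies only the number of genes and the six that are missing), and the three 'mitochondrial' clusters are hypothetical, not data from any sequenced genome.

```python
# Eubacterial S10 operon: 11 ribosomal protein genes.
# ASSUMPTION: gene content and order follow the E. coli S10 operon.
S10_OPERON = ["rps10", "rpl3", "rpl4", "rpl23", "rpl2", "rps19",
              "rpl22", "rps3", "rpl16", "rpl29", "rps17"]


def missing_genes(cluster):
    """Genes of the bacterial operon that are absent from a mitochondrial cluster."""
    return set(S10_OPERON) - set(cluster)


def deletions_shared_by_all(clusters):
    """Genes missing from every cluster, i.e. candidate ancestral deletions."""
    return set.intersection(*(missing_genes(c) for c in clusters.values()))


# HYPOTHETICAL clusters: each lacks the same six genes, and two of them
# have additionally lost one gene of their own (lineage-specific loss).
mito_clusters = {
    "hypothetical_mtDNA_1": ["rps10", "rpl2", "rps19", "rps3", "rpl16"],
    "hypothetical_mtDNA_2": ["rpl2", "rps19", "rps3", "rpl16"],
    "hypothetical_mtDNA_3": ["rps10", "rpl2", "rps19", "rps3"],
}

shared = deletions_shared_by_all(mito_clusters)
print(sorted(shared))
```

In this toy data set the intersection recovers exactly the six genes named in the text, while lineage-specific losses drop out. Deletions common to all clusters are most simply explained as having occurred once, in a common ancestor.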

Initial phylogenetic reconstructions based on rRNA sequence data identified the Alphaproteobacteria as the probable evolutionary source of mitochondria (Yang et al. 1985), a satisfying conclusion in view of the fact that a member of this class (Paracoccus denitrificans) had earlier been proposed as the bacterium whose electron transport chain bears an especially strong resemblance to the mitochondrial one in its biochemical properties (John and Whatley 1975). Subsequently, in phylogenetic trees based on protein as well as rRNA sequences, mitochondria were found to branch together with Rickettsiales (Gupta 1995; Lang et al. 1999a), one of six or more orders within Alphaproteobacteria (Williams et al. 2007). Again, this result was intellectually pleasing because members of Rickettsiales (genera such as Rickettsia, Anaplasma, Ehrlichia and Wolbachia) are obligate, intracellular parasites of eukaryotic cells, superficially resembling mitochondria in their dependence on a host; and, like mitochondria, they harbor markedly reduced genomes compared to typical eubacteria (Sällström and Andersson 2005). However, the genomes of the mitochondrion and members of the Rickettsiales are clearly the products of independent evolutionary reduction (Andersson et al. 1998; Gray 1998), which argues that mitochondria cannot have been derived directly from a Rickettsiales taxon; rather, these two groups share a more distant common ancestor.

Although a specific evolutionary connection between mitochondria and Rickettsiales has been repeatedly demonstrated (Viale and Arakaki 1994; Gupta 1995; Sicheritz-Pontén et al. 1998; Lang et al. 1999b), it is still not certain whether the two are sister groups, or whether mitochondria actually branch within Rickettsiales, which comprises two distinct families, Rickettsiaceae and Anaplasmataceae (Williams et al. 2007). A number of studies have concluded that mitochondria are more closely related to the former family (containing various Rickettsia species) than to the latter (comprising the genera Anaplasma, Ehrlichia and Wolbachia) (Karlin and Brocchieri 2000; Emelyanov 2001a, b, 2003a, b).

The specific affiliation of mitochondria and Rickettsiales in phylogenetic trees has been questioned on the grounds that this rooting, although robust, may represent a phylogenetic artifact attributable to the high rate of sequence divergence and high A + T content of the genomes of Rickettsiales taxa and mitochondria (in other words, a long-branch-attraction artifact). Indeed, Esser et al. (2004) were not able to establish with certainty the placement of mitochondria within their phylogenetic trees, pointing out that Rhodospirillum rubrum, a member of a different α-proteobacterial order (Rhodospirillales), “came as close to mitochondria as any α-proteobacterium investigated”. Accordingly, there has been considerable interest in expanding the availability of α-proteobacterial genome sequences, in particular ones from free-living members of Rickettsiales. Phylogenetic analyses drawing on these additional sequences have used different combinations of data sets and methods of phylogenetic inference to first show that it is possible to generate a robust phylogeny of Alphaproteobacteria, despite concerns about possible complications such as base composition, codon bias, variable rate of sequence divergence, and horizontal gene transfer (HGT) compromising the underlying phylogenetic signal (Williams et al. 2007). These analyses, based on expanded taxon sampling, uniformly place mitochondria within Alphaproteobacteria, although they still vary somewhat in the specific branching position of mitochondria within this subdivision. Wu et al. (2004), for example, reported strong support for a grouping of Wolbachia (family Anaplasmataceae) + Rickettsia (family Rickettsiaceae) within Rickettsiales, to the exclusion of mitochondria. Fitzpatrick et al. (2006), however, concluded that the Rickettsiales as a whole constitutes the sister group to mitochondria. On balance, there is strong convergence on an evolutionary affiliation of mitochondria with the order Rickettsiales within the class Alphaproteobacteria, but still no compelling consistency as to whether mitochondria branch within the Rickettsiales or as a sister group to this order.

A potentially discriminating addition to this debate has been the recent discovery of a large group of predominantly marine members of Alphaproteobacteria, the so-called SAR11 clade, which constitute new representatives of the Rickettsiales (Morris et al. 2002; Williams et al. 2007). Unlike other members of this order, the SAR11 clade comprises free-living species that nevertheless share a number of genomic features with their parasitic cousins, such as streamlined genome and limited metabolic capacity. The genome of one SAR11 taxon, Pelagibacter ubique, is the smallest currently known for a free-living microorganism, contains the lowest number of predicted open reading frames, and exhibits the shortest intergenic spacers yet observed for any cell, with no evidence of pseudogenes, introns, mobile or extrachromosomal elements or inteins (Giovannoni et al. 2005). The phylogenetic placement of the SAR11 clade with respect to mitochondria within Alphaproteobacteria is still uncertain, but we may anticipate that the addition of more genome sequence data from this group will augment phylogenetic reconstructions aimed at answering this question.

While debate about the origin of mitochondria has centered on wholesale acquisition of an α-proteobacterial genome, in which case the various genes encoded by its mitochondrial descendent are assumed to have had the same evolutionary origin, a new wrinkle has been introduced by the discovery that gene transfer agents (GTAs) are pervasive in the genomes of Alphaproteobacteria taxa, including within the various Rickettsiales genera (although not, apparently, Pelagibacter; McDaniel et al. 2010). GTAs are virus-like elements that seem to function solely in the high-frequency transfer of DNA (and therefore genes) between cells, with no apparent adverse effects on the recipient. Recently, Richards and Archibald (2011) raised the possibility that the α-proteobacterial ancestor of mitochondria might have had a genome that had already undergone GTA-mediated gene exchange with other proteobacteria as well as non-proteobacterial species, so that the available phylogenetic signal is to some extent scrambled. Such a situation could conceivably compromise our ability to identify with certainty the precise α-proteobacterial lineage from which mitochondria originated “by generating incongruent tree topologies with ‘mitochondrial’ genes branching in different places within the α-proteobacterial phylogeny and prokaryotes as a whole”. Some analyses (e.g., Esser et al. 2007) have reported discordant phylogenetic affinities for some mtDNA-encoded genes, although apparent discordance is much less pronounced in other studies (e.g., Fitzpatrick et al. 2006). It remains to be seen how much of this discrepancy can be attributed to the vagaries of single-gene tree reconstruction, where the number of phylogenetically informative characters is likely to be limited. In any event, potential gene transfer into the α-proteobacterial proto-mitochondrial ancestor (Esser et al. 2007), perhaps GTA-mediated at least in part (Richards and Archibald 2011), is yet another complication that will have to be taken into account in the continuing quest to delineate more precisely the evolutionary origin of the mitochondrial genome and the genes it contains.

B. Nature of the Host

Although we have a good albeit still imprecise idea of the phylogenetic provenance of the organism that ultimately gave rise to mitochondria, our picture of the host cell is far less clear, particularly regarding whether it was a full-fledged but amitochondriate eukaryotic cell, as classical endosymbiotic theory suggests. Our uncertainty about the host reflects two fundamental but still unanswered questions about the process of eukaryotic cell evolution (eukaryogenesis): (1) What is the evolutionary relationship of eukaryotes to the two prokaryotic groups, archaebacteria and eubacteria? (2) Were the formation of the eukaryotic cell per se and the formation of mitochondria congruent or sequential processes? In other words, was the prior emergence of an amitochondriate eukaryotic cell a sine qua non for the subsequent formation of mitochondria, or did the emergence of the defining structural and biochemical complexity of the eukaryotic cell depend on the prior establishment of the mitochondrion?

Debate about the phylogenetic relationships among eukaryotes, eubacteria and archaebacteria has centered on two opposing views of the so-called ‘tree of life’. Supporters of the three-domains tree accept three separate and phylogenetically distinct (monophyletic) primary divisions, or domains, of life: Eucarya (eukaryotes), Archaea (archaebacteria) and Bacteria (eubacteria; Pace et al. 1986; Woese et al. 1990). On the basis of phylogenetic analyses of ancient paralogous genes – products of a gene duplication event that is presumed to have occurred before the separation of the three groups – Eucarya and Archaea are often considered to be sister groups, to the exclusion of Bacteria (e.g., Gogarten et al. 1989; Iwabe et al. 1989). On the other hand, proponents of the eocyte tree argue that eukaryotes branch within archaebacteria, with a specific group called Crenarchaeota (eocytes), to the exclusion of the other major group of archaebacteria, Euryarchaeota (Lake et al. 1984; Rivera and Lake 1992). In the eocyte tree, therefore, archaebacteria are paraphyletic, not monophyletic.
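
The difference between the two trees can be stated precisely: a group is monophyletic if some single clade of the tree contains exactly its members and nothing else. The Python sketch below encodes simplified four-taxon versions of the two topologies as nested tuples and tests this condition. The encodings are illustrative assumptions, with both trees rooted on the bacterial branch, and are not taken from any published analysis.

```python
def leaves(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the tip set of every subtree, including single tips and the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade of the tree contains exactly the members of `group`."""
    return frozenset(group) in set(clades(tree))


# ASSUMED toy topologies, rooted between Bacteria and everything else.
THREE_DOMAINS = ("Bacteria", (("Crenarchaeota", "Euryarchaeota"), "Eukaryotes"))
EOCYTE = ("Bacteria", ("Euryarchaeota", ("Crenarchaeota", "Eukaryotes")))

ARCHAEA = {"Crenarchaeota", "Euryarchaeota"}

for name, tree in [("three-domains", THREE_DOMAINS), ("eocyte", EOCYTE)]:
    print(name, "archaebacteria monophyletic:", is_monophyletic(tree, ARCHAEA))
```

Under these encodings the archaebacteria form a clade only in the three-domains tree. In the eocyte tree the smallest clade containing both archaeal groups also contains the eukaryotes, which is what paraphyly means.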

Irrespective of which of the above alternatives is correct (Archibald 2008; Cox et al. 2008), what has become increasingly evident is that the nuclear genome of eukaryotes is a genetic mosaic: some of its genes are clearly more similar to archaeal homologs, others to eubacterial homologs, and still others appear to be eukaryote-specific inventions, having recognizable homologs in neither of the other two groups. Koonin (2010) has pointed out a number of examples where components of key functional systems and molecular machines of eukaryotes show evidence of varied phylogenetic ancestry. For example, chromatin/nucleosome proteins and protein constituents of the RNA interference machinery and the endomembrane/endoplasmic reticulum all appear to be complex mixes of archaeal and bacterial origin. Archaeal homologs tend to be ‘informational’, i.e., involved in genetic information transfer and processing (principally components of the replication, transcription and translation machineries), whereas eubacterial homologs are largely ‘operational’, i.e., involved in biosynthesis and metabolism (Rivera et al. 1998; Jain et al. 1999).

This dichotomy has prompted a variety of prokaryote–prokaryote fusion models for the origin of the nuclear genome, whereby an archaeal-type genome is combined with a eubacterial-type genome (e.g., Rivera and Lake 2004). Differential gene loss in the resulting hybrid then results in retention of mostly archaeal informational genes and eubacterial operational genes, although how the initial genome fusion and subsequent gene re-assortment occur (and why) is largely unspecified in these models. Alternatively, in models that invoke symbiosis-type interactions, an archaea-like cell is often seen as serving as host to a eubacteria-like symbiont. The resulting combination subsequently evolves the various hallmarks of the eukaryotic cell, such as membrane-bounded nucleus, endocytosis, complex endomembrane system and cytoskeleton. Importantly, however, the archaea-like and eubacteria-like gene complements of eukaryotic nuclear genomes show a diversity of origins within these two domains, to the extent that it has not been possible, as it has in the case of mitochondria, to pinpoint specific archaeal and eubacterial taxa as ‘founding’ lineages of a putative chimeric proto-eukaryotic cell. In rationalizing these observations, Koonin (2010) has suggested “that the archaeal ancestor of eukaryotes combined a variety of features found separately in diverse archaea”.

Complicating efforts to untangle the evolutionary history of eukaryotes is the phenomenon of horizontal gene transfer (HGT), which is a major contributor to prokaryotic genome evolution (Doolittle 1999; Ochman et al. 2000), but also operates in eukaryotic genome evolution (e.g., Archibald et al. 2003; Andersson 2005; Keeling and Palmer 2008). If one accepts that the nuclear genome arose through a fusion of archaeal-type and eubacterial-type cells, the genomes of these progenitor cells may have themselves already been mosaic to some extent, added to which subsequent HGT from diverse archaeal and eubacterial sources could have further scrambled the nuclear genome’s underlying phylogenetic signal. Thus, we should perhaps not be surprised that the three-domains tree is robustly supported in some analyses while the eocyte tree is strongly supported in others: in the case of a highly mosaic genome, the concept of a unique single origin simply does not apply.

Given the limitations attending reconstruction of deep phylogenies based on single-gene and multiple-gene alignments (even using concatenates of >100 genes), an alternative ‘supertree’ approach shows considerable promise. Supertree methods take individual phylogenetic trees as their input, synthesizing a single consensus tree from the collection of separate trees (Steel et al. 2000; Bininda-Emonds 2004; Wilkinson et al. 2005). Employing a supertree approach, Pisani et al. (2007) generated phylogenetic trees for 5,741 single-copy genes contained in 165 sequenced genomes, including >10 eukaryotic nuclear ones. After rigorously pruning the underlying conservative alignments to reduce sequence-based methodological artifacts (see Esser and Martin 2007), the results suggested that the nuclear genome of eukaryotes is dominated by genes of cyanobacterial and α-proteobacterial origin, attributable to the bacterial symbionts that gave rise to plastids (see below) and mitochondria, respectively, as well as a third component of archaeal origin. These intriguing results suggest that supertree methods may be able to recover signals due to symbiosis/genome melding events even in the face of considerable HGT ‘noise’. Whether this approach is able to further tease apart the key events underlying the formation of the nuclear genome, to the degree we have been able to do so with mitochondrial and plastid genomes, remains to be seen.
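
To give a sense of how trees with only partly overlapping taxon sets can be combined, the Python sketch below implements the coding step of one widely used supertree technique, matrix representation with parsimony (MRP). It is offered as an illustration of the general idea, not as a description of the procedure used in the studies cited above. Each internal clade of each input tree becomes one binary character: taxa inside the clade score 1, other taxa in that tree score 0, and taxa absent from that tree score '?'. The parsimony search that would then build the supertree from this matrix is omitted, and the taxon names A to E and both input trees are hypothetical.

```python
def leaves(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def internal_clades(tree, is_root=True):
    """Yield tip sets of internal nodes below the root, in pre-order."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from internal_clades(child, is_root=False)


def mrp_matrix(trees):
    """Encode a list of rooted trees as an MRP character matrix (taxon -> string)."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in internal_clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")   # taxon not sampled in this tree
                elif taxon in clade:
                    rows[taxon].append("1")   # inside the clade
                else:
                    rows[taxon].append("0")   # sampled, but outside the clade
    return {taxon: "".join(states) for taxon, states in rows.items()}


# HYPOTHETICAL gene trees with partly overlapping taxon sets.
gene_tree_1 = (("A", "B"), ("C", "D"))
gene_tree_2 = ((("A", "B"), "C"), "E")

matrix = mrp_matrix([gene_tree_1, gene_tree_2])
for taxon, states in matrix.items():
    print(taxon, states)
```

The missing-data symbol is what allows gene trees sampled from different sets of genomes to contribute to a single analysis. No individual gene needs to be present in every genome.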

A radically different view of eukaryogenesis has been championed by Kurland et al. (2006), who argue that the modern eukaryotic cell is the evolutionary descendent of “a unique primordial lineage”, and that prokaryotes are derived from a common ancestor with eukaryotes through a process of genomic streamlining (reductive genome evolution) that has operated minimally in the eukaryotic lineage, affecting only select groups. These authors dispute the evidence that has been marshaled in support of the idea that the nuclear genome comprises a hybrid of archaea-derived and eubacteria-derived genes; they further posit that proteins unique to eukaryotes (so-called ‘eukaryotic signature proteins’, or ESPs) are retained primitive traits, rather than derived, eukaryote-specific inventions.

While it is increasingly clear that the Last Eukaryotic Common Ancestor (LECA) was already a highly complex cell containing most or all of the cellular signature structures (CSSs) that distinguish eukaryotic cells from archaeal and bacterial cells (Koonin 2010), compelling data are lacking to support the thesis that these features trace back to the last common ancestor of all cells. This particular eukaryogenesis scenario remains controversial (Kurland et al. 2007; Martin et al. 2007; Koonin 2010). It does not, for example, account for the large number of proteins of apparent archaeal origin in some of the key functional systems and molecular machines of eukaryotes (particularly DNA replication and repair, transcription and translation; Koonin 2010). Instead, Kurland et al. (2006) view archaea-like and eubacteria-like genes in the nuclear genome largely as retained primitive traits present in the last common ancestor of the three domains. The implication of this assumption is that ancestral genes shared specifically between Archaea and Eucarya were selectively lost from Bacteria, whereas those specifically shared between Bacteria and Eucarya (save those attributable to mitochondrial and plastid symbioses) were selectively lost from Archaea. At the moment, the accumulated genomic and phylogenomic data are better accommodated by eukaryogenesis scenarios in which the nuclear genome is initially formed through contributions from (an) archaeal and (a) eubacterial genome(s), however that mixing occurred (via cell/genome fusion or more indirectly; see below), with ESPs evolving as eukaryote-specific inventions within the resulting hybrid cell.

C. How Did It Happen?

A surprisingly large number of endosymbiotic models have been proposed over the years to account for the origin of mitochondria (see Martin et al. 2001 for an excellent and comprehensive overview). In essence, these models can be seen as variations on two fundamentally different themes that have been referred to, respectively, as the archezoan scenario and the symbiogenesis scenario (Koonin 2010). The archezoan scenario holds that “the host of the proto-mitochondrial endosymbiont was a hypothetical primitive amitochondrial eukaryote, termed archezoan”. In contrast, the symbiogenesis scenario proposes that “a single endosymbiotic event involving the uptake of an α-proteobacterium by an archaeal cell led to the generation of the mitochondria”, followed subsequently “by the evolution of the nucleus and compartmentalization of the eukaryotic cell.” The archezoan scenario most closely approximates the classical endosymbiotic hypothesis of mitochondrial origin (Margulis 1970; Doolittle 1980). The hydrogen hypothesis of Martin and Müller (1998) exemplifies the symbiogenesis scenario. A fundamental difference between these two scenarios is whether the α-proteobacterial endosymbiosis that provided the proto-mitochondrion occurred at the same time as (and was integral to) the formation of the eukaryotic cell, or occurred subsequent to the formation of a primitive, amitochondriate cell that was already essentially eukaryotic.

Archezoan Scenario

The archezoan scenario received a major boost with the discovery of eukaryotes that not only lacked identifiable mitochondria but that also appeared to be the earliest branching taxa in phylogenetic trees based on rRNA sequences. The purportedly amitochondriate lineages included microsporidia, diplomonads and parabasalids, protists living as parasites of other eukaryotes, in anaerobic environments. These ‘amitochondriate’ parasites (collectively termed Archezoa) were initially assumed to represent contemporary examples of the sort of primitive eukaryote that might have served as host to an α-proteobacterial symbiont, which would subsequently become the mitochondrion.

In recent years, the archezoan scenario has been substantially weakened by two key findings. First, a number of studies have convincingly demonstrated that the apparent early branching of archezoan taxa is a long-branch artifact, due to an unusually rapid rate of sequence divergence of the genes selected for phylogenetic analysis (e.g., Inagaki et al. 2004). These long-branch sequences cluster at the base of eukaryotic phylogenetic trees, closest to the outgroup (prokaryotic) sequences used to root the trees. Microsporidia, for example, are now known to be evolutionarily degenerate fungi rather than ‘early-branching’ eukaryotes (Hirt et al. 1999; Keeling et al. 2000). In fact, the root of the eukaryotic tree has been notoriously difficult to discern and is not yet established, with six or so eukaryotic supergroups appearing to diverge from one another virtually simultaneously, on an evolutionary timescale. As a result, the eukaryotic tree more closely resembles a bush, and no one lineage can be clearly identified as earliest diverging (Keeling et al. 2005; Koonin 2010).

A second notable nail in the coffin of the archezoan scenario was the discovery in one archezoan species after another of organelles that were eventually recognized as highly reduced mitochondria. These ‘mitochondrion-related organelles’ (MROs) are of two basic types, distinguished by whether or not they retain any capacity for energy (ATP) generation. The first of these MROs to be discovered was the hydrogenosome (Lindmark and Müller 1973). This double-membrane-bound organelle, found in certain anaerobic protists such as the parabasalid Trichomonas vaginalis, lacks a number of the defining features of a conventional mitochondrion: it has no genome, no complete tricarboxylic acid (TCA) cycle, no cytochromes and no complete electron transport chain. Although the T. vaginalis hydrogenosome does not have a capacity to produce ATP through coupled electron transport-oxidative phosphorylation, it is still able to generate ATP from pyruvate via a substrate-level pathway, through the combined activities of a set of enzymes characteristic of this organelle, including an iron–iron hydrogenase. Molecular hydrogen (H2) is one of the end products of this pathway, accounting for the organelle’s name. Because of its unique anaerobic metabolism, it was originally supposed that the hydrogenosome might be the evolutionary product of an endosymbiosis – in this case with an anaerobic-type eubacterium such as a Clostridium (Whatley et al. 1979) – separate from the one that gave rise to the mitochondrion. However, subsequent studies have demonstrated that the T. vaginalis hydrogenosome contains a number of proteins typical of mitochondria, such as chaperonins (Bui et al. 1996), the NADH dehydrogenase module of electron transport Complex I (Hrdy et al. 2004), and components of the mitochondrial machinery for synthesis of iron–sulfur (Fe-S) clusters, the ISC biosynthesis pathway (Sutak et al. 2004). These results strongly support the view that the T. vaginalis hydrogenosome is an evolutionary derivative of a conventional mitochondrial ancestor.

A second group of double membrane-bound MROs, in this case lacking any capacity to generate ATP, has been found in a number of anaerobic, parasitic protists, including the amoebozoans Entamoeba histolytica (Clark and Roger 1995; Mai et al. 1999; Tovar et al. 1999) and Mastigamoeba balamuthi (Gill et al. 2007), the microsporidians Trachipleistophora hominis (Williams et al. 2002) and Encephalitozoon cuniculi (Goldberg et al. 2008; Tsaousis et al. 2008), and the diplomonad Giardia lamblia (Tovar et al. 2003). Collectively, the term ‘mitosome’ is most often applied to these particular MROs (Embley et al. 2003; Embley 2006; Hjort et al. 2010). Again, molecular data identifying typical mitochondrial proteins in these organelles have solidified the view that mitosomes, also, are derived mitochondria, but even more highly reduced than hydrogenosomes (see Hjort et al. 2010 for a detailed listing and discussion of relevant data). The limited metabolic capacity of mitosomes has focused attention on what functionality has been retained in these organelles, suggesting that Fe-S cluster formation, rather than oxidative phosphorylation, may be the essential raison d’être of the mitochondrion and its evolutionary derivatives.

More recently, the distinction between mitochondria, ‘classical’ hydrogenosomes and mitosomes has become blurred by the discovery of what appear to be transitional evolutionary forms that retain a reduced genome, lacking a number of typical mtDNA-encoded genes. Like genome-deficient hydrogenosomes, these novel genome-containing MROs are able to generate H2 via a hydrogenase-mediated reaction; however, they also carry out a more complex biochemistry than genome-deficient hydrogenosomes. Two such genome-containing MROs that have been studied in some detail are those in the anaerobic ciliate Nyctotherus ovalis (Boxma et al. 2005) and the anaerobic stramenopile Blastocystis sp. (Pérez-Brocal and Clark 2008; Stechmann et al. 2008; Wawrzyniak et al. 2008), a relative of brown algae, diatoms and oomycetes (see Chap. 2). In both cases, the MRO genome encodes components of an organellar translation system (rRNAs, tRNAs, ribosomal proteins) as well as components of electron transport complexes I and II, suggesting the presence of a partial electron transport chain.

The punctate phylogenetic distribution of MROs of various types, and their interspersion with aerobic taxa within particular lineages, strongly indicate that MROs have arisen independently a number of times from a conventional mitochondrial ancestor (Embley et al. 2003; Embley 2006; Hjort et al. 2010), and that many of the seemingly shared characteristics of MROs (e.g., between those of Nyctotherus and Blastocystis) are due to convergent evolution rather than vertical inheritance. The continued study of variously evolved MROs will not only be key to elucidating the pathways and mechanisms involved in the evolutionary conversion of a conventional mitochondrion to a relict one, but will also give us a better appreciation of the evolutionary flexibility of mitochondria: a theme considered below in the discussion of mitochondrial proteome evolution.

Significantly, with regard to models of mitochondrial origin, the discovery of MROs has greatly weakened the concept of primitively amitochondriate protists. Although a number of ‘amitochondrial’ eukaryotes (i.e., lacking conventional mitochondria) obviously exist, we can point to no convincing examples of primitively ‘amitochondriate’ eukaryotes (i.e., ones whose evolutionary ancestors never had mitochondria). Accordingly, we are forced to conclude that if such organisms ever existed, their descendant lineages must all have become extinct.

Symbiogenesis Scenario

As support for the archezoan scenario has waned, the alternative view – that the host cell for the mitochondrial endosymbiosis was a prokaryote (specifically an archaeon), not a eukaryote – has correspondingly gained prominence (Koonin 2010). Perhaps the best-known symbiogenesis scenario is the hydrogen hypothesis of Martin and Müller (1998), which suggests that eukaryotes have arisen “through symbiotic association of an anaerobic, strictly hydrogen-dependent, strictly autotrophic archaebacterium (the host) with a eubacterium (the symbiont) that was able to respire, but generated molecular hydrogen as a waste product of anaerobic heterotrophic metabolism. The host’s dependence upon molecular hydrogen produced by the symbiont is put forward as the selective principle that forged the common ancestor of eukaryotic cells.”

If the symbiont was an α-proteobacterium capable of both anaerobic and aerobic energy metabolism, the hydrogen hypothesis can account for the origins of eukaryotic energy metabolism, provided that respiration machinery genes (Krebs cycle and oxidative phosphorylation) and genes for anaerobic energy metabolism (PFO, hydrogenase) were both retained in the hybrid cell but differentially expressed under the relevant environmental conditions, and that genes for aerobic respiration were differentially lost in those eukaryotic lineages in which the mitochondrion was converted to an anaerobic MRO. Thus, the hydrogen hypothesis “posits that the origins of the heterotrophic organelle (the symbiont) and the origins of the eukaryotic lineage are identical”. A corollary of the hydrogen hypothesis and other symbiogenesis scenarios is that the complexity of the eukaryotic cell and its defining features developed after the mitochondrial symbiosis, rather than before.

As noted by Koonin (2010), several arguments can be advanced against a symbiogenesis scenario for the origin of mitochondria. For example, endocytosis (a hallmark eukaryotic character) has long been considered to be an essential capacity for uptake of a bacterial endosymbiont. Cases of bacterial endosymbioses (e.g., γ-proteobacteria inside β-proteobacteria) have, however, been documented (von Dohlen et al. 2001; Thao et al. 2002). Also, as noted earlier, it has not been possible to trace the archaeal and eubacterial contributions to the nuclear genome to single extant prokaryotic lineages: although an α-proteobacterial signal does predominate (Pisani et al. 2007), in any given eukaryotic taxon collectively more eubacterial-type genes appear to derive from a diversity of non-α-proteobacterial lineages (or to branch within Bacteria as a whole, but not robustly with any specific group). Nevertheless, it is possible that ancestral lineages contributing to a eubacterial-archaeal symbiogenesis might have had more complex genomes than their contemporary relatives: genomes already affected to a certain extent by HGT. The hydrogen hypothesis does make a number of testable predictions, for example, that genes of anaerobic energy metabolism (such as PFO and hydrogenase) should form monophyletic clades in phylogenetic reconstructions, branching together with α-Proteobacteria. However, a rigorous study of the phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes has not provided support for this prediction (Hug et al. 2010).

Very recently, a new hypothesis, based on a consideration of the energetics of genome complexity, has added fuel to the eukaryogenesis fire. Lane and Martin (2010) argue that the increase in the number of proteins that eukaryotes encode and express, compared to prokaryotes, required an increase in cellular energy that only the mitochondrion could have provided. Accordingly, this hypothesis views mitochondria as the sine qua non to eukaryotic genomic and cellular complexity. The authors conclude, rather definitively, that “the host for mitochondria was a prokaryote”.

On balance, a symbiogenesis scenario (eubacterial endosymbiont in an archaeal host) better accommodates the accumulated data that address the origin of the mitochondrion than does an archezoan scenario (eubacterial endosymbiont in an amitochondriate but essentially eukaryotic host). However, the latter scenario cannot be ruled out absolutely at this stage. Each scenario raises complications and objections that are difficult to rationalize without resorting to ad hoc explanations – e.g., that true archezoan eukaryotes may exist but simply have not yet been discovered, or that all such lineages have become extinct – and each is complicated by the fact that there is no obvious way to discern how similar the genomes of the proposed prokaryotic ancestors of the eukaryotic cell were compared to their extant descendants.

D. Evolution of the Mitochondrial Proteome

Given that even the most gene-rich mitochondrial genomes retain only a small fraction of the genes that are assumed to have been contained in the genome of its α-proteobacterial ancestor (Gray 1999), gene loss has evidently played a major role in the evolution of the mitochondrial genome. Many of these ‘lost’ genes have been transferred to the nuclear genome from where they are now expressed, with import of only a minority of the resulting proteins back into the organelle; in fact, most of these transferred ‘proto-mitochondrial’ genes now function in other subcellular compartments (Gabaldón and Huynen 2003). Because functional mitochondria are composed of hundreds or even thousands of nucleus-encoded proteins, many of which belong to the category of ESPs, re-tailoring of the mitochondrial proteome through addition of new proteins and functions has been extensive in the course of evolution. Hence, just as comparative mitochondrial genomics, based on complete sequencing of mtDNAs, has proven to be a powerful approach for discerning the nature of the ancestral mitochondrial genome and revealing patterns and mechanisms of mitochondrial genome evolution (Gray et al. 1998; Gray 1999), so is comparative mitochondrial proteomics, based on mass spectrometric analysis of whole mitochondria or sub-mitochondrial fractions and complexes (Dreger 2003; Yan et al. 2009), proving to be an equally powerful method for elucidating the evolution of the mitochondrial proteome.

Initially, the composition of the mitochondrial proteome and the phylogenetic origins of mitochondrial proteins were assessed from complete genome sequence data via bioinformatics analyses of proteins possessing N-terminal mitochondrial targeting peptides. A number of algorithms have been developed to identify such targeting sequences (e.g., Claros and Vincens 1996; Emanuelsson et al. 2000), although not all imported mitochondrial proteins possess mitochondrial import signals identifiable in this way, and the algorithms are variably accurate and may have limited sensitivity in cases where protein sequences are highly divergent (Richly et al. 2003). Early estimates of the number of proteins in the yeast mitochondrial proteome ranged from ∼400 to ∼800, or between ∼7% and ∼13% of the total yeast proteome of ∼6,100 proteins (Karlberg et al. 2000; Marcotte et al. 2000; Kumar et al. 2002). More broadly applied predictions suggest that functional mitochondria could harbor as few as several hundred proteins in Plasmodium falciparum, the malaria parasite, to >3,000 in vertebrate animals (Richly et al. 2003).

Such studies provided a first, and surprising, overview of the evolutionary origins of proteins constituting the yeast mitochondrion (Karlberg et al. 2000; Marcotte et al. 2000; Kumar et al. 2002): surprising because a much smaller proportion (only ∼10–15%) of the mitochondrial proteome than might have been anticipated proved to originate clearly from the α-proteobacterial lineage. A larger, generically ‘prokaryotic’ proportion (∼40–50%) contained proteins whose origins appear to be outside α-Proteobacteria but without necessarily a robust affiliation to any particular bacterial or archaeal lineage. Members of another, ‘eukaryotic’ fraction (∼20–30%) have no obvious homologs in either Archaea or Bacteria and so are, by definition, ESPs. A final, ‘unique’ subset (∼20%) comprises seemingly species-specific proteins having no identifiable homologs in other eukaryotes or in prokaryotes. These results indicate that the yeast mitochondrial proteome has multiple evolutionary origins and a complex evolutionary history (Kurland and Andersson 2000; Gray et al. 2001), a conclusion now firmly established for the mitochondria of other eukaryotes (Gabaldón and Huynen 2004; Szklarczyk and Huynen 2010). A small contribution of bacteriophage-like proteins (notably the mitochondrial RNA polymerase in most eukaryotes) has also been added to the evolutionary mix (Shutt and Gray 2006).

Direct proteomics surveys relying on mass spectrometry (Aebersold and Mann 2003; Yan et al. 2009) have confirmed and extended the initial, bioinformatics-based findings that pointed to a mosaic evolutionary origin of the mitochondrial proteome. This approach, while not biased toward proteins containing N-terminal mitochondrial targeting sequences, has its own limitations, most particularly a bias in favor of the most abundant, soluble targets. Nevertheless, mass spectrometry (MS) has afforded a powerful means of uncovering novel mitochondrial proteins that cannot be identified on the basis of sequence similarity with known mitochondrial proteins. For example, in an MS study of mitochondria from the ciliate protozoon, Tetrahymena thermophila, ∼30% of identified proteins were found to have no demonstrable sequence homologs outside of the ciliate lineage, while a further ∼10% are unique to T. thermophila (Smith et al. 2007). At least 13 of the novel, ciliate-specific proteins have subsequently been found as components of the purified mitochondrial F1FO-ATP synthase (Complex V) of this protist (Nina et al. 2010), illustrating an emerging theme in mitochondrial research: taxon-specific re-tooling of mitochondrial complexes such as electron transport chain assemblies and ribosomes, only the core components of which derive from the α-proteobacterial ancestor of mitochondria. This re-tailoring occurs by addition of novel proteins of generally unknown function, sometimes accompanied by loss of otherwise conserved components. One such example is the ATP synthase of Chlamydomonas reinhardtii, a chlorophycean green alga, in which nine novel ‘Asa’ subunits of unknown evolutionary origin replace eight subunits that are otherwise conserved in the ATP synthase of other non-chlorophycean green algae, as well as in plants, animals and fungi (Lapaille et al. 2010).

This re-tailoring theme can also be seen in other well-studied mitochondrial respiratory complexes, such as Complex I (CI; NADH:ubiquinone oxidoreductase), the multi-subunit proton pump that carries out the first step in the canonical respiratory chain – the oxidation of NADH and subsequent reduction of ubiquinone. Bacterial CI comprises 14 subunits, all of which are present in the corresponding mammalian complex, with seven of the subunits encoded in the mammalian mitochondrial genome. A further 18 subunits that are present in mammalian CI are ubiquitous throughout eukaryotes but are not found in bacteria, and so are assumed to be eukaryote-specific additions already present in the last eukaryotic common ancestor. Thirteen other subunits of mammalian CI appear to have a narrow phylogenetic distribution, having so far been found only in metazoan animals (Brandt 2006).

Attempts have been made to reconstruct the proteins contributed to the eukaryotic cell by the proto-mitochondrial endosymbiont, through comparisons of proteins encoded in sequenced α-proteobacteria with those specified by sequenced eukaryotic genomes. This approach has identified at least 840 orthologous groups that are considered to bear a clear α-proteobacterial signature – i.e., a close and specific evolutionary relationship to α-proteobacterial homologs, without any evidence of recent HGT (Gabaldón and Huynen 2003, 2007; Szklarczyk and Huynen 2010). Comparisons among α-proteobacterial genomes suggest that the free-living bacterial ancestor of mitochondria contained ∼3,000–5,000 genes (Boussau et al. 2004), with an upper bound of ∼1,700 ancestral clusters of orthologous genes in the proto-mitochondrial genome (Szklarczyk and Huynen 2010). These estimates imply that upwards of 1,000–3,000 genes were lost in the transition from bacterial symbiont to proto-organelle. Significantly, of the >800 human genes that display an α-proteobacterial signature, only ∼200 comprise part of the human mitochondrial proteome, clearly implying that the proto-mitochondrial contribution to eukaryotic cell evolution and function extends well beyond the mitochondrion itself.
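The bookkeeping behind these estimates can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming (as a simplification not made in the cited studies) that gene counts and clusters of orthologous genes can be subtracted directly; the variable and function names are illustrative only.

```python
# Approximate figures quoted in the text; all names here are illustrative.
ANCESTOR_GENES = (3000, 5000)      # free-living ancestor (Boussau et al. 2004)
PROTO_MITO_CLUSTERS_MAX = 1700     # upper bound (Szklarczyk and Huynen 2010)
HUMAN_ALPHA_SIGNATURE = 800        # ">800" human genes, taken here as a floor
HUMAN_ALPHA_IN_MITO = 200          # "~200" of those in the mitochondrial proteome


def implied_loss(ancestor_range, retained_max):
    """Return (low, high) genes lost if at most `retained_max` were retained."""
    low, high = ancestor_range
    return low - retained_max, high - retained_max


loss_low, loss_high = implied_loss(ANCESTOR_GENES, PROTO_MITO_CLUSTERS_MAX)
mito_fraction = HUMAN_ALPHA_IN_MITO / HUMAN_ALPHA_SIGNATURE

print(f"Implied minimum loss: {loss_low}-{loss_high} genes")
print(f"Share of signature genes in the mitochondrial proteome: {mito_fraction:.0%}")
```

Because ∼1,700 is an upper bound on what was retained, the differences are lower bounds on what was lost, which is in line with the "upwards of" wording above. On these figures, at most about a quarter of the human genes with an α-proteobacterial signature encode mitochondrial proteins.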

Pathways that are considered to have been complete in the proto-mitochondrion include the full electron transport chain and β-oxidation of fatty acids (providing NADH and FADH2 to the former), indicating that the mitochondrial endosymbiont had an aerobic metabolism. Also prominently represented are pathways for the synthesis of lipids, biotin, heme and iron-sulfur clusters, as well as an abundance of cation transporters. In all, the reconstructed metabolism suggests that the proto-mitochondrion was capable of at least facultative aerobic respiration (Szklarczyk and Huynen 2010). More than half of what remains of this proto-mitochondrial metabolism in modern mitochondria comprises functions involved in energy metabolism and translation, including post-translational modifications: a veritable “hijacking of mitochondrial protein synthesis and metabolism” (Gabaldón and Huynen 2007).

In attempts to elucidate in more detail the ancestral state of selected mitochondrial components and pathways, several groups have initiated comparative analyses of emerging eukaryotic genome data. As noted earlier, mitochondrial CI has an additional 18 subunits that are not present in its bacterial counterpart, and that are considered to have been incorporated at the earliest stages of mitochondrial CI evolution (Gabaldón et al. 2005; Brandt 2006). In plants (Heazlewood et al. 2003; Perales et al. 2004) and green algae (Cardol et al. 2004), mitochondrial CI has also been found to contain multiple proteins with high similarity to γ-type carbonic anhydrases (γCAs), with comparative studies initially suggesting that these proteins represented specific additions in the plant lineage (Parisi et al. 2004). However, a more recent study focusing on protists has revealed a much broader distribution of mitochondrial γCAs, either demonstrated or presumed to be associated with mitochondrial CI (Gawryluk and Gray 2010), than previously suspected. It appears likely that γCAs were ancestral components of mitochondrial CI, and that they were subsequently lost from CI specifically in the evolutionary line leading to animals and fungi (opisthokonts), rather than added to CI specifically in the line leading to plants and algae. These results emphasize the importance of comprehensive taxon coverage in drawing conclusions about mitochondrial proteome evolution.

Other studies have demonstrated that the ancestral mitochondrial ribosome in the last eukaryotic common ancestor was already much larger than its bacterial ancestor, containing some 19 additional eukaryote-specific proteins (Smits et al. 2007; Desmond et al. 2011). The fact that these novel mitochondrial ribosomal proteins are found throughout the eukaryotic domain, in all of the currently recognized eukaryotic supergroups, is yet another strong argument in favor of a monophyletic origin of contemporary mitochondria: a conclusion in this case based on eukaryote-specific rather than prokaryote-specific features.

The mitochondrial ribosome presents a particularly dramatic example of mitochondrial re-tailoring, with both the RNA and protein components varying markedly in size and number among eukaryotes. For example, the 55S human mitochondrial ribosome contains rRNA species that are about half the size of their bacterial 23S and 16S counterparts; however, it has 29 different small subunit and 48 different large subunit proteins, compared to values of 21 and 34, respectively, in E. coli (O’Brien 2003). Clearly, the human mitochondrial ribosome has lost substantial RNA and gained substantial protein in the course of its evolution from a bacterial progenitor, reversing the usual protein:RNA ratio (33:67) to become protein-rich (69:31) (O’Brien 2002).
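The comparison above can be tabulated directly. This is a small sketch using only the subunit protein counts and protein:RNA ratios quoted in the preceding paragraph; the dictionary layout and names are illustrative assumptions.

```python
# Protein counts per ribosomal subunit, as quoted in the text (O'Brien 2003).
RIBOSOME_PROTEINS = {
    "E. coli": {"small": 21, "large": 34},
    "human mitochondrion": {"small": 29, "large": 48},
}
# Approximate protein:RNA ratios, as quoted in the text (O'Brien 2002).
PROTEIN_TO_RNA = {
    "bacterial": (33, 67),
    "human mitochondrion": (69, 31),
}


def total_proteins(counts):
    """Sum small- and large-subunit protein counts for one ribosome."""
    return counts["small"] + counts["large"]


totals = {name: total_proteins(c) for name, c in RIBOSOME_PROTEINS.items()}
net_gain = totals["human mitochondrion"] - totals["E. coli"]
protein_share = {name: p / (p + r) for name, (p, r) in PROTEIN_TO_RNA.items()}

for name, total in totals.items():
    print(f"{name}: {total} ribosomal proteins")
print(f"Net difference: {net_gain} proteins")
```

The net difference (77 versus 55 proteins) understates the number of novel proteins wherever ancestral bacterial proteins have also been lost, so it should be read as a net figure rather than a count of additions.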

An even more extreme situation is seen in the kinetoplastid protozoa, such as Trypanosoma brucei (Ziková et al. 2008) and Leishmania tarentolae (Sharma et al. 2009). Here, rRNA shrinkage is even more pronounced than in the human mitochondrial ribosome whereas protein content has been further expanded, with the Trypanosoma mitochondrial ribosome containing 56 small subunit and 77 large subunit proteins. Notably, the novel mitoribosomal proteins identified in these analyses do not have detectable homologs outside of the kinetoplastid protozoa, and display only a low degree of sequence conservation within this lineage. These observations reinforce the importance of direct mass spectrometric analyses of isolated mitochondrial complexes in order to accurately determine their composition, given that so many of these components appear to be new, lineage-specific inventions. Overall, the mitochondrial proteome has proven to be surprisingly malleable, a situation that is mirrored by the picture emerging from investigations of the plastid proteome in photosynthetic eukaryotes.

III. Plastids

The notion that plastids are of endosymbiotic origin is more than a century old. The Russian botanist Konstantin Mereschkowsky (1855–1921) is generally credited as having been the first to elaborate on the significance of similarities between ‘Cyanophyceae’ (cyanobacteria) and the ‘chromatophores’ (chloroplasts or plastids) of plants and unicellular algae such as diatoms (Mereschkowsky 1905; Martin and Kowallik 1999). Mereschkowsky developed the concept of symbiogenesis – the evolution of new life forms from the amalgamation of two separate organisms – which was championed and rendered ‘mainstream’ by Margulis (1970) as the endosymbiont hypothesis for the evolution of mitochondria and plastids. In the sections that follow we discuss the wealth of data brought to bear on the origin and early evolution of ‘primary’ plastids, i.e., those that have been inherited in a vertical fashion since their inception. The following chapter by Bhattacharya and colleagues deals with so-called ‘secondary’ and ‘tertiary’ endosymbioses, whereby plastids have spread horizontally by mergers between two eukaryotes. Molecular evidence in support of a classical endosymbiotic origin for primary plastids is (and has always been) stronger than that for mitochondria. Nevertheless, as is the case for mitochondrial evolution, genomic and proteomic investigations continue to expose layer upon layer of unexpected complexity.

A. Cyanobacterial Endosymbiont, Complex Eukaryotic Host

With several decades’ worth of ultrastructural, biochemical and molecular phylogenetic data in hand, it can now be concluded that (1) plastids evolved after mitochondria, (2) the endosymbiont was an ancestor of modern-day cyanobacteria capable of oxygenic photosynthesis, and (3) the host was a ‘complex’, fully formed eukaryote with the ability to phagocytose prey (Gray and Spencer 1996; Reyes-Prieto et al. 2007; Gould et al. 2008). The precise ecological and physiological conditions present at the time of the evolution of plastids are unknown, but eukaryotic heterotrophs would presumably have benefited greatly from the ingestion of organisms capable of generating energy from sunlight. Like today’s cyanobacteria, primary plastids are characterized by the presence of two membranes (Gould et al. 2008). This suggests that the endosymbiont somehow ‘escaped’ from its phagocytic vacuole, perhaps allowing it to persist for progressively longer periods of time without being digested. Regardless, the cyanobacterial progenitor of the plastid gradually became one with its eukaryotic host: non-essential genes were lost, scores of essential genes were transferred to the nuclear genome, a protein import machinery evolved, and mechanisms for metabolite transport were ‘invented’, allowing the proto-alga to reap the benefits of cyanobacterial carbon fixation (Martin and Herrmann 1998; McFadden 1999, 2001; Soll and Schleiff 2004; Weber et al. 2006; Howe et al. 2008). Both endosymbiont- and host-derived components appear to have contributed to the integration of the two cells.

How derived are plastids relative to cyanobacteria? Hundreds of plastid genomes have now been sequenced, and even the most gene-rich among them contain only ∼250 genes (Stoebe and Kowallik 1999; Martin et al. 2002; Hagopian et al. 2004). This coding capacity stands in stark contrast to that of cyanobacteria, which have at least ∼1,700 genes (Rocap et al. 2003). The plastid genomes of all photosynthetic organisms retain a very similar core set of genes encoding proteins primarily involved in photosynthesis, transcription and translation (Turmel et al. 1999; Martin et al. 2002; Howe et al. 2003; Kim and Archibald 2009). In a situation analogous to the retention of mitochondrion-related organelles in anaerobic protists, virtually all known secondarily non-photosynthetic eukaryotes, including parasitic plants (Krause 2008), the green algal parasites Helicosporidium (de Koning and Keeling 2006) and Prototheca (Borza et al. 2005), and the malaria parasite Plasmodium (Waller and McFadden 2005; see Chap. 2), retain a plastid. This is because the plastid is the site of essential biochemical processes entirely unrelated to photosynthesis, including the synthesis of heme precursors, fatty acids and certain amino acids (Borza et al. 2005; Mazumdar et al. 2006).

B. Single or Multiple Origins?

While mitochondria (and their derivatives) are part and parcel of the eukaryotic condition, plastid-bearing organisms exhibit a ‘patchy’ phylogenetic distribution. Primary plastids bearing two membranes are restricted to three lineages: the glaucophyte (or glaucocystophyte) algae, red algae and green algae (Bhattacharya et al. 2003; Keeling 2010). Glaucophytes are poorly studied fresh-water unicells that are of particular interest because their plastid envelopes possess a layer of peptidoglycan, as do the cell walls of cyanobacteria (Graham and Wilcox 2000). Despite retention of this ancestral feature, glaucophyte plastids have a genome that is as reduced as those of green and red algae and that shares many features with them (Löffelhardt et al. 1997). Red algae are a diverse lineage comprising both unicellular and multicellular forms, some of which are capable of living in highly acidic environments and at temperatures greater than 50°C (Ciniglia et al. 2004; Reeb and Bhattacharya 2010; Yoon et al. 2010). Green algae are a speciose assemblage of terrestrial and aquatic (both freshwater and marine) phototrophs that are divided into two distinct lines, the chlorophytes (e.g., the model laboratory alga Chlamydomonas) and the streptophytes. It is from within this latter group that multicellular land plants evolved (Karol et al. 2001; Lewis and McCourt 2004; Finet et al. 2010). Together, glaucophytes, red algae and green algae plus land plants belong to the eukaryotic ‘supergroup’ Archaeplastida or Plantae (Adl et al. 2005; Keeling et al. 2005). Whether Archaeplastida represents a monophyletic assemblage is a topic of ongoing debate.

Molecular phylogenetic analyses of plastid rRNA and protein genes from red, green and glaucophyte algae almost always show a clear connection to cyanobacteria (e.g., Douglas and Gray 1991; Delwiche et al. 1995; Turner et al. 1999), but no particular extant cyanobacterial lineage has yet emerged as an unambiguous, specific relative of plastids. The topologies of individual protein and rRNA gene trees have proven frustratingly sensitive to phylogenetic artifacts and taxon representation of both algae and cyanobacteria (Lockhart et al. 1992a, b; Sato 2006; Larkum et al. 2007). Nevertheless, the current trend towards multi-gene and whole-genome-scale analyses has improved matters somewhat. For example, a recent analysis of combined 16S rRNA and ribulose 1,5-bisphosphate carboxylase/oxygenase (rbcL) genes by Falcón et al. (2010) resolved a monophyletic primary plastid clade and suggested a specific association between plastids and nitrogen-fixing unicellular cyanobacteria belonging to the Chroococcales. This result is consistent with a phylogenomics-based analysis carried out by Deusch et al. (2008). These authors compared the complete nuclear genomes of Arabidopsis thaliana, rice, Chlamydomonas reinhardtii, and the red alga Cyanidioschyzon merolae to nine cyanobacterial genomes. They found that, in terms of gene presence/absence and overall sequence similarity, the cyanobacterial-derived gene sets contained in the algal genomes were most similar to those of the nitrogen-fixing, heterocyst-forming cyanobacteria Nostoc sp. and Anabaena variabilis (Deusch et al. 2008). While certainly not definitive, these studies are nevertheless significant in positing a specific role for nitrogen fixation in the early stages of plastid evolution.

As noted above, the plastid genomes of all three of the primary plastid-harboring lineages are highly reduced compared to those of known cyanobacteria. Their gene contents overlap to a substantial degree and when structural similarities are taken into consideration, such as the near-universal presence of rDNA-containing inverted repeats, conserved gene clusters (e.g., the atpA operon) and an unusual tRNA-Leu intron, it seems improbable that green, red and glaucophyte plastid genomes could have evolved to such similar ‘endpoints’ from different (but closely related) cyanobacterial endosymbionts (Kowallik 1997; Martin and Herrmann 1998; Stoebe and Kowallik 1999; Besendahl et al. 2000; Palmer 2003). Nevertheless, convergent evolution of plastid genome structure and content, as could occur if there were serious gene-specific constraints on the process of endosymbiont-to-nucleus gene transfer, cannot be dismissed outright (Palmer 2003; Stiller et al. 2003). Furthermore, as emphasized recently by Larkum et al. (2007), the recovery of monophyletic plastid sequences in phylogenetic trees does not necessarily mean that the organelle was acquired in a common ancestor shared exclusively by the organisms that harbor them (Howe et al. 2003). Much recent attention has thus been given to answering the question of whether phylogenies of mitochondrial, plastid and nuclear genes of red, green and glaucophyte algae agree with one another. The answer appears to be a qualified ‘maybe.’

It is now common practice to try to maximize the extraction of ancient phylogenetic signal from molecular data by analyzing dozens to hundreds of loci together in the context of a single supermatrix (Delsuc et al. 2005). Applied to the question of primary plastid monophyly versus polyphyly, the first such phylogenomic analyses of concatenated mitochondrial-, plastid- and nucleus-encoded proteins yielded results consistent with the hypothesis that red and green algae are each other’s closest relatives (e.g., Burger et al. 1999; Moreira et al. 2000; Rodríguez-Ezpeleta et al. 2005; Burki and Pawlowski 2006). When available, glaucophyte sequences were also found to branch specifically with those of red and green algae, although the relative branching order of the three groups was not resolved (Moreira et al. 2000; Rodríguez-Ezpeleta et al. 2005).
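For readers unfamiliar with the supermatrix approach, the concatenation step itself is simple. The following is a minimal sketch, not the pipeline of any cited study: it assumes each locus has already been aligned, pads taxa missing from a locus with gap characters, and records partition boundaries. The function name, data layout and toy sequences are all hypothetical.

```python
def build_supermatrix(alignments, missing="-"):
    """Concatenate per-gene alignments into a single supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Taxa absent from a gene are padded with `missing` characters.
    Returns (supermatrix, partitions); partitions maps each gene to its
    0-based, half-open (start, end) column range in the supermatrix.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    columns = {taxon: [] for taxon in taxa}
    partitions = {}
    offset = 0
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        (length,) = lengths
        for taxon in taxa:
            columns[taxon].append(aln.get(taxon, missing * length))
        partitions[gene] = (offset, offset + length)
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in columns.items()}
    return supermatrix, partitions


# Toy, made-up alignments; the glaucophyte is missing from geneB.
toy = {
    "geneA": {"red": "MKT", "green": "MKS", "glaucophyte": "MRT"},
    "geneB": {"red": "LLVA", "green": "LIVA"},
}
matrix, parts = build_supermatrix(toy)
for taxon, seq in matrix.items():
    print(f"{taxon:12s} {seq}")
```

Padding absent loci with gaps keeps every taxon in the matrix at the cost of missing data, which is one reason taxon and gene sampling can influence the resulting trees.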

Unfortunately, with more data and increased analytical sophistication, the interrelationships between primary plastid-containing algae have become less and less clear. Individual protein trees sometimes do not agree with one another, even when they correspond to different genes from the same genome (e.g., Stiller and Hall 1997; Longet et al. 2003; Kim and Graham 2008), and analyses of particular subsets of the data in isolation (e.g., slowly evolving proteins) sometimes yield trees that do not show red, green and glaucophyte algae as specific sister lineages (Nozaki et al. 2007, 2009). Such results have spawned alternate hypotheses, such as the idea that a truly ancient primary endosymbiotic event occurred in a common ancestor shared between members of the Archaeplastida and other eukaryotic groups that currently lack plastids, including members of the supergroup Excavata (Nozaki 2005; Nozaki et al. 2007). Parfrey et al. (2010) recently carried out a comprehensive ‘taxon-rich’, multi-gene analysis designed to resolve higher-order relationships amongst eukaryotes. These authors concluded that “…there is no support in any analysis for ‘Archaeplastida’ (‘Plantae’)”. In 2003, Palmer provided the following synopsis of the state of knowledge on early plastid evolution, which, as it has turned out, still fits today: “There is universal consensus that all well-recognized types of primary plastid-containing organisms fall into three groups, each clearly monophyletic: the green algae (including, of course, land plants), red algae and glaucophytes…. There is also broad consensus, based on many lines of evidence, that all three of these lineages ‘probably’ trace back to the same cyanobacterial endosymbiosis; that is, primary plastids arose once and only once. I say ‘probably’, because some authors regard the issue as settled and others see a need for more evidence” (Palmer 2003). 
One of the few points on which there is unanimous agreement is the notion that more data are sorely needed from glaucophytes and red algae: only expressed sequence tag (EST) data are currently available for glaucophyte algae, and red algae are at present represented by only a single (and apparently highly reduced) nuclear genome sequence (Matsuzaki et al. 2004). Fortunately, such data will soon be forthcoming and will undoubtedly give rise to another wave of phylogenomic analyses.

C. Primary Endosymbiosis and Genome–Proteome Mosaicism

One of the most profound recent advances in the field of plastid evolution has been recognition of the huge extent to which the cyanobacterial progenitor of the plastid appears to have contributed to the biochemistry and cell biology of the earliest photosynthetic eukaryotes. Endosymbiotic gene transfer has long been recognized as the mechanism by which endosymbionts surrender genetic material to their hosts (Martin et al. 1993; Martin and Herrmann 1998; Timmis et al. 2004; Kleine et al. 2009). Together with the evolution of an import apparatus for targeting the products of transferred genes, endosymbiotic gene transfer is an essential step in the transition from endosymbiont to organelle (Cavalier-Smith and Lee 1985; Theissen and Martin 2006; Cavalier-Smith 2007). In the days before whole genome-scale analyses, Weeden’s ‘product specificity corollary’ (Weeden 1981) posited that the products of transferred genes remain faithful to their subcellular compartment of origin: proteins functioning in the plastid that are not currently encoded in its genome are the product of cyanobacterial-derived nuclear genes that were present in the plastid progenitor. This has turned out to be true in many cases but it is by no means the rule (Martin and Cerff 1986; Martin and Schnarrenberger 1997; Martin 2010). Conversely, few would have predicted that the cyanobacterial ‘footprint’ on the nuclear genome of algae and plants would extend so far beyond the plastid and photosynthesis.

Pioneering work in the 1980s and 1990s by Martin, Cerff and colleagues provided the first glimpses of the remarkable degree of mosaicism now known to exist in plant metabolic pathways. For example, both the plastid-targeted and cytosol-localized isoforms of the Calvin cycle/glycolytic enzyme phosphoglycerate kinase (PGK) are of cyanobacterial ancestry (Brinkmann and Martin 1996). Such gene duplication-enabled functional reassignments are known as endosymbiotic gene replacements (Martin and Schnarrenberger 1997) and can also happen ‘in reverse’: in the case of plant fructose-1,6-bisphosphatase, the plastid-localized and cytosolic enzymes are derived from duplicated genes of cytosolic (i.e., eukaryotic host) origin (Martin et al. 1996). Endosymbiotic and reverse endosymbiotic gene replacements are now well recognized as generators of metabolic complexity and innovation, and are also useful markers for testing evolutionary hypotheses (e.g., Fast et al. 2001; Nowitzki et al. 2004; Patron et al. 2004; Rogers and Keeling 2004). The phenomenon has been aptly summarized as follows: “there is no evolutionary ‘homing device’ that automatically directs the product of a transferred gene back to the organelle of its provenance, the products of genes that are acquired by endosymbionts are free to explore any and all targeting possibilities within the cell; they can and do replace pre-existing host genes, or even whole pathways, and sometimes pre-existing host genes can be duplicated to provide organelle-targeted copies of host enzymes that can replace organelle-encoded functions” (Martin 2010).

The potential full scope of genome and proteome mosaicism in photosynthetic eukaryotes was revealed in 2002 with an analysis of the flowering plant, Arabidopsis thaliana. Martin et al. (2002) compared the complete set of ∼25,000 genes in the A. thaliana genome to the gene sets of yeast, archaea, bacteria, and cyanobacteria. Approximately 1,700 of the 9,368 A. thaliana genes whose ancestry could be inferred were deemed to be of cyanobacterial origin. Extrapolated to the whole genome, this amounts to ∼4,500 genes, or 18% of the complete gene complement. Unexpectedly, fewer than half of the genes of putative cyanobacterial ancestry were predicted to encode plastid-targeted proteins. Those that did not could be assigned to a wide range of predicted functional categories having nothing to do with the plastid, including cell division and intracellular transport (Martin et al. 2002). Conversely, Suzuki and Miyagishima (2010) recently estimated that ∼40% of the plastid-targeted proteins thought to have been present in the common ancestor of red algae and plants are not of cyanobacterial ancestry but are derived from the eukaryotic host and various bacterial groups. As amply demonstrated on a case-by-case basis for metabolic enzymes in plants (above), whole genome-scale analyses suggest that there is no strict correlation between the evolutionary origin of a given protein and the cellular compartment or biological process in which it presently functions (Martin 2010).
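The extrapolation quoted above for A. thaliana can be retraced with a few lines of arithmetic. The following is only a minimal sketch: it assumes that the 9,368 genes whose ancestry could be inferred are representative of the genome as a whole, it uses the rounded figures given in the text, and the function name is illustrative rather than taken from Martin et al. (2002).

```python
def extrapolate_gene_count(sample_hits, sample_size, genome_size):
    """Scale the hit fraction observed in a classifiable subset up to the whole genome.

    Assumes the classifiable subset is representative of the genome.
    Returns (fraction_of_sample, extrapolated_gene_count).
    """
    fraction = sample_hits / sample_size
    return fraction, fraction * genome_size


# Rounded figures quoted in the text for A. thaliana (Martin et al. 2002)
fraction, genome_wide = extrapolate_gene_count(
    sample_hits=1_700,    # genes deemed to be of cyanobacterial origin
    sample_size=9_368,    # genes whose ancestry could be inferred
    genome_size=25_000,   # approximate size of the complete gene set
)

print(f"{fraction:.1%} of classifiable genes; about {genome_wide:,.0f} genes genome-wide")
```

With these inputs the fraction comes to roughly 18% and the extrapolated count to roughly 4,500 genes, in line with the values cited. The estimate is only as good as the representativeness assumption behind it.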

As striking as these numbers are, there are reasons to tread cautiously. Reyes-Prieto et al. (2006) carried out an analysis of cyanobacterial genes in an EST-based dataset assembled for the glaucophyte alga Cyanophora paradoxa, concluding that ∼10% (∼1,500) of the estimated 12,000–15,000 genes in the genome are cyanobacterial in origin. In contrast to the predictions for A. thaliana (Martin et al. 2002), these authors found that >90% of these proteins were predicted to be plastid-targeted, i.e., <10% of the cyanobacterial proteins in C. paradoxa appear to have plastid-independent functions (Reyes-Prieto et al. 2006). The reasons for these differences are not clear but could be both biological and methodological in nature (Archibald 2006b; Reyes-Prieto et al. 2006). In addition, there is growing evidence for the existence of non-canonical protein import pathways in algae. Primary plastids utilize an evolutionarily conserved import apparatus composed of the Toc and Tic super-complexes (translocators of the outer and inner chloroplast membranes, respectively; Soll and Schleiff 2004; Gutensohn et al. 2006). Nucleus-encoded pre-proteins destined for the plastid possess a characteristic N-terminal transit peptide extension (McFadden 1999; Gould et al. 2008), and it is these extensions that are the target of in silico screens (Emanuelsson et al. 2000, 2007). Modern biochemical analyses have, however, revealed that we currently have a quite limited understanding of the biochemical determinants of plastid targeting. For example, only ∼60% of a set of 604 A. thaliana plastid proteins identified by proteomics contained plastid targeting signals that could actually be identified using bioinformatics (Kleffmann et al. 2004). Examples of ER-to-Golgi-to-plastid targeting have also been uncovered (Radhamony and Theg 2006). The take-home message is that our inferences about the extent to which endosymbiotic gene transfers and replacements have shaped the biology of photosynthetic eukaryotes are ultimately only as good as our ability to accurately determine where in the cell proteins actually function.
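To put the Kleffmann et al. (2004) figure in absolute terms, the short calculation below converts the quoted percentage into approximate protein counts. It is a rough illustration only: the ∼60% value is approximate, it applies to one proteomic dataset, and the variable names are hypothetical.

```python
# Figures quoted in the text (Kleffmann et al. 2004); the percentage is approximate.
plastid_proteins = 604       # plastid proteins identified by proteomics
detectable_fraction = 0.60   # share with a targeting signal recognizable in silico

with_signal = round(plastid_proteins * detectable_fraction)
without_signal = plastid_proteins - with_signal

print(f"recognizable targeting signal: about {with_signal} proteins")
print(f"no recognizable signal:        about {without_signal} proteins")
```

On these numbers, roughly 240 experimentally confirmed plastid proteins would have been missed by a purely sequence-based screen, which illustrates why counts of predicted plastid-targeted proteins should be read with caution.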

Several additional points are worthy of mention from the perspective of non-cyanobacterial contributions to the establishment of primary plastids. Evolutionary analyses have revealed that most – but apparently not all – of the protein components of the Tic and Toc import machinery are demonstrably cyanobacterial in nature (Gould et al. 2008). One such exception is Tic110, a protein found in both red and green algae, which lacks a cyanobacterial counterpart and has been suggested to represent a host-derived contribution to the plastid protein import apparatus (McFadden and van Dooren 2004; Kalanon and McFadden 2008). In addition, phylogenetic analyses of plastid metabolite transporters reveal that they are of host, not endosymbiont, origin (Weber et al. 2006; Tyra et al. 2007). The primary endosymbiosis that gave rise to the plastid was clearly “…a period of considerable evolutionary experimentation, facilitated on one hand by functional redundancy at the level of enzymes and metabolic pathways, and on the other by combining the genetic potential of two very different cell types” (Archibald 2005).

Did other cells play a role as well? Some authors believe so. Huang and Gogarten (2007) uncovered 21 instances of apparent gene transfer from members of the bacterial genus Chlamydia into the algal nuclear genome and proposed that such genes are the remnants of a chlamydial endosymbiont that was somehow involved in cementing the relationship between the cyanobacterial progenitor of the plastid and its eukaryotic host. Interestingly, one of the chlamydial genes in the genome of the red alga Cyanidioschyzon merolae encodes an ATP/ADP translocase, which could have allowed the cyanobacterial endosymbiont to acquire energy from its host. There is little to go on in terms of confirming or refuting this hypothesis: on balance the data are also consistent with an ancestral relationship between cyanobacteria and Chlamydiae (Brinkman et al. 2002) or the presence of chlamydial genes in the cyanobacterial genome prior to the evolution of plastids. Either way, the results of Huang and Gogarten are interesting in that most of the chlamydial genes in algal nuclear genomes encode proteins with predicted plastid targeting sequences, suggesting that they now contribute to the function of the organelle regardless of their origin (Huang and Gogarten 2007).

D. ‘Recent’ Cyanobacterial Endosymbioses: A Window on Plastid Evolution?

Evolutionary biologists work on the assumption that understanding processes taking place in modern-day organisms can shed light on events that have occurred in the past. Understanding the ancient origin of plastids is no exception. This section is devoted to discussion of two examples of recently established associations between microbial eukaryotes and cyanobacterial ‘endosymbionts’. We say ‘endosymbionts’ because it is often far from clear whether the term ‘endosymbiont’ or ‘organelle’ is most appropriate. When does an endosymbiont become an organelle? As noted in previous sections, gene transfer from endosymbiont to host is a major part of this process, and the evolution of a mechanism for importing protein products of such transferred genes is often considered to be the tipping point (e.g., Cavalier-Smith and Lee 1985; Theissen and Martin 2006). There is seemingly no end to the number of recent host-endosymbiont relationships with the potential to improve our understanding of symbiogenesis and the origin of organelles (Nowack and Melkonian 2010).

Arguably the most striking example is that of the rhizarian testate amoeba Paulinella chromatophora, first discovered by the German biologist Robert Lauterborn in 1894. Lauterborn (1869–1952) noticed that P. chromatophora possesses one or two blue/green-pigmented bodies – chromatophores – in its cytoplasm and was clearly struck by their resemblance to cyanobacteria (Lauterborn 1895; Melkonian and Mollenhauer 2005). More than 100 years later, this organism has become the focus of intense genomic investigations to understand the precise nature of the chromatophore-host relationship. Preliminary molecular data indicated that the chromatophore was clearly not specifically related to canonical plastids, but rather to a specific sub-lineage of cyanobacteria, the Synechococcus/Prochlorococcus group (Marin et al. 2005; Yoon et al. 2006). This connection has been firmly established by complete sequencing of the chromatophore genome (Nowack et al. 2008). At ∼1 Mbp in size and with only 867 protein genes, it is the smallest cyanobacterial genome yet sequenced. The genes it retains – and has lost – provide a fascinating window into the biology of the chromatophore and the extent to which it has integrated with its host (Keeling and Archibald 2008).

First and foremost, the chromatophore is clearly all about phototrophy: its genome contains a near-complete set of genes for photosynthesis. It also lacks many genes that would be predicted to be dispensable for an obligate endosymbiont, in particular, those encoding membrane transporters and proteins involved in certain amino acid and cofactor biosynthetic pathways (Nowack et al. 2008; Nowack and Melkonian 2010). With only a quarter of the coding capacity inferred to have been present in its free-living cyanobacterial progenitor, the chromatophore is obviously no longer an autonomous entity. But is it an organelle? Chromatophore division is known to happen in concert with its host (Hoogenraad 1927; Kies 1974; Johnson et al. 1988), an observation that fueled speculation that genes encoding division proteins had been transferred to the P. chromatophora nuclear genome (Archibald 2006a; Yoon et al. 2006). Indeed, while most of the ‘usual suspects’ for cell division are encoded by the chromatophore (e.g., FtsZ and MinD), sulA, a gene encoding an FtsZ polymerization inhibitor, is absent (Nowack et al. 2008). This gene might now reside in the nucleus.

The first definitive evidence for endosymbiotic gene transfer in P. chromatophora came not from cell division protein genes but from a core photosystem gene. Nakayama and Ishida (2009) showed that a cyanobacterial-derived, spliceosomal intron-containing psaE gene encoding subunit IV of the PSI reaction centre is located in the host nuclear genome. Exactly how a nucleus-encoded PsaE protein would make its way to the chromatophore was not immediately obvious. A canonical N-terminal plastid targeting signal was not detected (Nakayama and Ishida 2009) but a follow-up investigation revealed the presence of a signal peptide of the sort that directs co-translational insertion of proteins into the eukaryotic secretory pathway (Mackiewicz and Bodyl 2010). Bodyl and colleagues have now presented compelling data and arguments for the existence of a bona fide protein import apparatus in P. chromatophora, one that could involve divergent chromatophore-encoded Tic-Toc components and the host cell signal peptide secretion system (Bodyl et al. 2010; Mackiewicz and Bodyl 2010). Most recently, Nowack et al. (2011) used next-generation transcriptome sequencing to expand the number of endosymbiotic gene transfer candidates to 32, most of which encode small photosynthetic proteins. Combined with information gleaned from another chromatophore genome sequence from a second species (Reyes-Prieto et al. 2010), these authors speculate on the existence of a minimum of several dozen to perhaps as many as 100 chromatophore-derived genes in the Paulinella nuclear genome (Nowack et al. 2011). Whether the term ‘plastid’ should be used to describe the chromatophores of Paulinella species is perhaps a matter of taste, but ‘organelle’ would now seem to be entirely appropriate.

A second interesting example of recent endosymbiosis involving a eukaryote and a cyanobacterium is found in the diatom Rhopalodia. This case is very different from the situation in Paulinella, and indeed from what is believed to have occurred in the primary endosymbiotic origin of canonical plastids, because the host was already photosynthetic. Diatoms are environmentally significant marine algae that acquired their plastids by secondary endosymbiosis, i.e., the engulfment of a primary plastid-bearing alga (in this case a red alga) by a eukaryotic heterotroph (see Chap. 2). Therefore, the selection pressures that would have driven the establishment of a permanent connection between the photosynthetic host and photosynthetic endosymbiont are not as clear-cut. In the case of Rhopalodia gibba, the so-called ‘spheroid bodies’ reside within cytoplasmic vacuoles (Geitler 1977) and have been shown to be most closely related to members of the cyanobacterial genus Cyanothece (Prechtl et al. 2004), which are well known for carrying out nitrogen fixation. As was the case with Paulinella, genome sequencing has provided important insight into the raison d’être of the Rhopalodia host-spheroid body association. Large genomic fragments from the R. gibba spheroid body genome reveal the presence of a complete set of N2-fixation enzymes and, interestingly, recent pseudogenization of numerous photosynthetic genes (Kneip et al. 2008). There is still much to learn about this fascinating system, but for now it is possible that the Rhopalodia spheroid body is well on its way to becoming a nitrogen-fixing eukaryotic organelle.

IV. Conclusion

Three decades ago we viewed the evolutionary origin of mitochondria and plastids as two sides of the same coin. At that time, the existing evidence supported the view that the two organelles had a ‘classical’ endosymbiotic origin from different eubacterial groups (α-Proteobacteria in the case of mitochondria, Cyanobacteria in the case of plastids) within an initially organelle-less but essentially eukaryotic host cell, with the mitochondrion emerging first and the plastid some time later. Endosymbiosis was followed by pronounced genome reduction as the transition from free-living bacterium to organelle progressed, with the chloroplast genome retaining a greater number of genes and a more pronounced resemblance to a bacterial ancestor than the mitochondrial genome. Endosymbiotic gene transfer from organelle to nucleus contributed in a major way to the evolution of the resulting composite cell, with many initially proto-organellar proteins now performing their functions elsewhere in the cell, and/or acquiring new functions. At the same time, newly minted proteins, novel inventions within the eukaryotic lineage, were acquired by the organelles and assumed essential roles in their biogenesis and function. Particularly prominent among these acquired proteins are membrane components that allow the regulated flow of both small metabolites and macromolecules (proteins but also RNA) across the double-membrane-bound organelles.

In the ensuing years, the evolutionary scenario for the plastid has changed little from that summarized above, but our understanding of the origin and subsequent evolution of the mitochondrion has undergone a substantial shift. With the recognition that mitochondria or mitochondrion-related organelles (MROs) are present in all eukaryotes that have been studied, the archezoan scenario has been severely challenged. It is still possible that the mitochondrion originated in a eukaryotic host cell populating an amitochondriate lineage (‘archezoan’) that has since become extinct; however, recent evidence and argument are turning the tide in favour of a symbiogenesis scenario, in which the host organism for the α-proteobacteria-like endosymbiont was a prokaryotic cell (archaeon?) rather than a eukaryote. Such a scenario raises the possibility that the origin of the mitochondrion was not only concurrent with the origin of the eukaryotic cell, but was in fact the sine qua non of eukaryogenesis.

Comparative analysis of mitochondrial and plastid proteomes has shown that these organelles are genetically highly mosaic, with organellar proteins having evolutionary origins well beyond the specific eubacterial lineages from which the organelles originated. Such studies are increasingly emphasizing how evolutionarily malleable organelles are, with a limited set of universally conserved core proteins and functions and a much larger assemblage of proteins that are phylogenetically diverse. Determining the functions of these lineage-specific proteins constitutes a formidable challenge.

As always, more data from phylogenetically strategic groups will be required to address many of the questions still outstanding about organelle evolution. For example, comprehensive genomic data from red algae and glaucophytes will be critical to resolving once and for all the question of primary plastid monophyly versus polyphyly. Additional data will undoubtedly yield many more examples of biochemical ‘tinkering’ in the course of mitochondrial and plastid evolution. We confidently expect that ‘Origins of Mitochondria and Plastids’ will continue to be a subject of debate for the foreseeable future, and we will not be at all surprised if our understanding of this evolutionary process takes a few more unexpected twists and turns as relevant new information continues to challenge currently accepted ideas.