1 Introduction

This chapter summarizes current knowledge on the diversity of mitochondrial DNAs (mtDNAs) with a focus on unusual genomes discovered in the last decade. For broader reviews on mtDNAs in general and for specialized reviews on the intriguing mitochondrial genomes of kinetoplastids, we refer the reader to earlier publications (Shapiro and Englund 1995; Lang et al. 1999; Burger et al. 2003b; Lukes et al. 2005).

In the context of this chapter, the term “unusual” combines various meanings. One is synonymous with departing from the traditional view on mtDNAs that was minted by the first published mitochondrial genomes in the 1980s and still survives in many textbooks. For example, “small is beautiful” was the title of a Nature news and views article (Borst and Grivell 1981) featuring the first report of a complete mitochondrial genome, that of human (Anderson et al. 1981). Not long after that, a number of other mammalian mtDNAs were sequenced, cementing the impression that this genome is commonly a small circle of ~16 kbp including a dozen protein-coding and two dozen structural RNA genes. Another meaning of “unusual” denotes difference compared to the majority of mtDNAs. According to the compilation in NCBI’s Genome section (subdivision “Organelles”), the large majority of mtDNAs are 15–17 kbp long. This is obviously due to the heavily biased taxonomic sampling, with more than 3,000 sequences from animals, but only 450 from the other (50 or so) eukaryotic groups. A third meaning of “unusual” is deviation from the ancestral state. Mitochondria originated from an endosymbiotic α-Proteobacterium, and the ancestral genome was likely a bacteria-like circular molecule of a few million base pairs with a thousand or so genes.

Here, we focus on mitochondrial genome size, genome architecture, and gene structure, touching upon posttranscriptional gene expression only insofar as it relates to exceptional gene structures. Otherwise, expression mechanisms of mitochondrial genes are dealt with more broadly in Chaps. 10–12. We have also compiled Internet-accessible data sources on mitochondrial genomes, listed in the Appendix. For bioinformatics tools used in mitochondrial genome annotation, we refer the reader to Chap. 17.

2 Taxonomic Background

The most exceptional mitochondrial genomes reported recently come from three different groups of unicellular eukaryotes: ichthyosporeans, diplonemids, and dinoflagellates. Before describing their mtDNAs, we will briefly introduce what these organisms look like, how and where they make their living, and where they belong in the eukaryotic phylogenetic tree.

Traditional classification has subdivided eukaryotes into animals, fungi, and plants, and termed the leftover “protists” – a panoply of obscure creatures belonging to none of the three former divisions. More recent phylogeny-based classifications have abandoned the notion of a protist lineage, while animals and fungi, for instance, are now united together with certain protists from which the former two groups emerged. This new clade known as opisthokonts is one of the groups relevant to the present chapter. The other clades of interest are euglenozoans and alveolates, and all three are phylogenetically extremely distant from one another (Fig. 3.1).

Fig. 3.1

Phylogenetic positions of ichthyosporeans, diplonemids, and dinoflagellates. Dashed lines indicate uncertain relationships

Amoebidium (Opisthokonta) is the best-described genus of ichthyosporeans and was recognized not long ago as a close unicellular relative of animals (Lang et al. 2002). This protist group has an amazing lifestyle and morphology. Amoebidium species populate the armor of freshwater crustaceans and insect larvae as hitch-hikers (epibionts) rather than parasites (Fig. 3.2). In nature, an Amoebidium cell grows as a tiny bush with a thick cell wall (filamentous microthallus) and contains multiple nuclei. For asexual reproduction, microthalli produce uninuclear, naked amoeboid cells (hence the genus name) or walled spores [Fig. 3.2b–d; (Lichtwardt 1986)]. Under rich culture conditions in the laboratory, we observe large spheres, inside which dozens of small daughter cells form that are eventually released [Fig. 3.2a; (Jostensen et al. 2002; Ruiz-Trillo et al. 2007)]. Ichthyosporea are specifically related to Capsaspora, choanoflagellates, and animals, which together form the Holozoa (Lang et al. 2002; Ruiz-Trillo et al. 2008). The only ichthyosporean for which mitochondrial genome information is available is Amoebidium parasiticum (for references, see below).

Fig. 3.2

Amoebidium cell morphology. (a–c) Light microscopy. (a) Spheric growth in synthetic medium (courtesy of Inaki Ruiz-Trillo). Shown is a relative of Amoebidium, Sphaeroforma arctica, which under these conditions has the same morphology as A. parasiticum. (b) Sporangiospores released from an A. parasiticum thallus that grew on a waterflea. (c) A. parasiticum thalli sitting on the antennae of a waterflea (b, c, courtesy of Robert Lichtwardt). (d) Amoebidium lifecycle adapted from Lichtwardt (1973) by adding spheric forms that occur only in synthetic medium

Diplonemids (Euglenozoa) are the poorly known sister group of the notorious kinetoplastids. Species of the two diplonemid genera, Diplonema and Rhynchopus, are abundant in marine habitats, but some also occur in freshwater. In contrast to kinetoplastids, which include numerous obligatory parasitic species pathogenic to humans, livestock, and plants, diplonemids are free-living. They only occasionally parasitize lobsters, clams, and diatoms or cause the sudden decay of aquarium plants (Kent et al. 1987). Diplonemids in the “feeder” stage (under optimal growth conditions) are oval to pear-shaped with two flagella at one tip that are clearly visible in Diplonema, but concealed in Rhynchopus species [Fig. 3.3a–c; (Roy et al. 2007a)]. Rhynchopus has two morphologies. In addition to the moderately fast-moving, short-flagellated “feeder” seen when food is abundant, cells transform into a leaner, rapidly swimming “swarmer” with long flagella when nutrients are in short supply [Fig. 3.3c; (Vickerman 2000; Roy et al. 2007a)]. Mitochondrial genome information is available for three Diplonema species (D. papillatum, D. ambulator, D. sp. 2) and one Rhynchopus species (R. euleeides; for references, see below). The best characterized mtDNA is that of D. papillatum.

Fig. 3.3

Diplonemid cell morphology. (a–c) Light microscopy of Rhynchopus euleeides. (a, b) Feeder cells dynamically changing shape. (c) Swarmer cell. (d) Scanning microscopy of Diplonema papillatum (courtesy of Brian Leander)

Dinoflagellates (Alveolata) are important unicellular organisms of the marine and aquatic ecosystems. This taxon is extremely diverse morphologically and biologically, including photosynthetic and heterotrophic species, predators, and symbionts. They are not only notorious for toxic red tides, but also engage in beneficial partnerships with reef-building corals. Photosynthetic dinoflagellates are major contributors to ocean carbon fixation. The great diversity of cell morphologies is achieved by flattened membrane sacs (alveoli) beneath the plasma membrane. These sacs may be rigid due to polysaccharides and form armors of most ludicrous shapes (Fig. 3.4). All dinoflagellates have two flagella that are inserted at the same point (in heterotrophic taxa at the “mouth”). One flagellum wraps around the cell, while the other is oriented perpendicularly to the first, and their combined action generates a whirling swimming pattern, sometimes of prodigious speed. Investigation of dinoflagellate mitochondrial genomes was initiated in M.W. Gray’s group (Norman and Gray 1997). Today, mtDNA data are available for about 15 different dinoflagellate species, and the ones with the most genomic sequence available are Amphidinium carterae (33 kbp), Alexandrium catenella (27 kbp), and Karlodinium micrum (25 kbp). Further mtDNA and/or EST data are available from Crypthecodinium cohnii, Gonyaulax polyedra, Oxyrrhis marina, Prorocentrum micans, Katodinium rotundatum, Lingulodinium polyedrum, Heterocapsa triquetra, Pfiesteria piscicida, Karenia brevis, Symbiodinium, and Noctiluca scintillans (for references, see below).

Fig. 3.4

Dinoflagellate cell morphology. Scanning electron microscopy of (clockwise from top left) Protoperidinium claudicans; Ornithocercus sp.; Goniodema sphaericum; Phalachroma cuneus. Images courtesy of David Hill

3 Approaches to Studying mtDNAs

When a newly described mtDNA is “usual,” few will question whether the genome is truly mitochondrial. Indeed, if the DNA has a “normal” circular-mapping shape (see Sect. 3.3.2), and if there is really only one chromosome, then there would be little reason to doubt it as the mitochondrial genome. Reservations only arise when an apparent mtDNA does not conform. To substantiate nonconformity, a number of biochemical methods are being employed.

3.1 Genome Localization

A major concern vis-à-vis an unusual mtDNA is whether the genome indeed resides in the mitochondria of the particular organism. Large chunks of mtDNA inserted in nuclear genomes are relatively commonplace [e.g., Arabidopsis (Bensasson et al. 2001) and human (Richly and Leister 2004); for more details, see Chap. 7], and might be mistaken for the genuine organelle genome. Two properties are often used for separating mitochondrial DNA from nuclear DNA: elevated A + T content (in combination with dyes such as bisbenzimide or Hoechst dye that bind preferentially to A + T-rich DNA), and the propensity to form supercoils in the presence of intercalating dyes. Both alter the physical density of molecules, and CsCl equilibrium density centrifugation is able to distinguish these DNAs based on these differences. But these features do not stand up to critical scrutiny, because not all mtDNAs are richer in A + T than the nuclear DNA of a given organism, and only a few mtDNAs can form supercoils, as detailed in the next section. A deviant mtDNA should be confirmed to come from isolated, pure mitochondria, or at least be shown to co-enrich with mitochondrial particles. For experimental methods, see Lang and Burger (2007).

3.2 Mitochondrial Genome Shape

The topology of a genome can be inferred by various techniques, but each method on its own has particular limitations. For example, in DNA sequencing and restriction mapping, both rings and the more common linear tandem-arranged molecules appear as circular. Apparent circles determined by these two techniques should be referred to as circular-mapping. Circular and linear-tandem shapes are easily discernible by pulsed-field electrophoresis (PFE), but this technique cannot resolve more complicated shapes such as branched molecules. These latter forms are more readily discernible by electron microscopy (EM) or fluorescence microscopy [see e.g., (Bendich 1993)].
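Why sequencing alone cannot tell a ring from a head-to-tail concatemer can be illustrated with a toy model. The following is a minimal sketch, not a real assembly procedure: it treats sequencing reads as error-free k-mers, and the monomer sequence, its length, the copy number, and the read length are all arbitrary assumptions chosen for illustration.

```python
import random

def kmers_circular(monomer, k):
    """All k-mers readable from a true circle, including those spanning the junction."""
    extended = monomer + monomer[:k - 1]  # wrap around the circle once
    return {extended[i:i + k] for i in range(len(monomer))}

def kmers_linear(seq, k):
    """All k-mers readable from a linear molecule (no wrap-around)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical 60-nt genome unit; seed fixed only for reproducibility.
rng = random.Random(0)
monomer = "".join(rng.choice("ACGT") for _ in range(60))

k = 25                                              # assumed idealized read length
circle_reads = kmers_circular(monomer, k)           # true ring
tandem_reads = kmers_linear(monomer * 3, k)         # linear head-to-tail concatemer
monomer_reads = kmers_linear(monomer, k)            # single linear unit

# The ring and the concatemer yield exactly the same set of reads,
# so both would assemble into a "circular-mapping" genome.
print("ring == concatemer:", circle_reads == tandem_reads)
# A monomeric linear molecule can only contribute a subset of those reads,
# because nothing spans the junction between its two ends.
print("monomer reads are a subset:", monomer_reads <= circle_reads)
```

In this idealized model, only a physical method such as PFE or microscopy could separate the first two cases, which is the point made in the paragraph above.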

3.3 Number of Mitochondrial Chromosomes

The number of chromosomes is often inferred from PFE and EM, but these techniques determine the number of size classes and not of distinct chromosomes. To provide a comprehensive picture of a mitochondrial genome’s architecture, it is necessary to combine the two methods with DNA sequencing. It should be noted that mitochondrial chromosome numbers reported in the literature are not always comparable, because they can be counted in different ways. Here, we define a mitochondrial chromosome as an (assumed) self-replicating unit, not considering recombination products. For instance, the multiple mtDNA molecules observed in angiosperms are recombination-generated subcircles of a single large mastercircle [(Fauron et al. 1995); see discussion in Sugiyama et al. (2005)], so the number of mitochondrial chromosome types in these plants is one. Another special case is Dicyema, a mesozoan animal of uncertain phylogenetic affiliation. Its mtDNA is a collection of small circles each carrying a single gene (Watanabe et al. 1999). Analysis of different tissues showed that the germ line of these animals contains one type of a larger master molecule, while the reported mitochondrial minicircles reside only in differentiated somatic cells that no longer replicate mtDNA (Awata et al. 2005). Again, under the above premise, the mitochondrial genome of Dicyema consists of a single chromosome.

Furthermore, we distinguish between the number of distinct chromosome types and the number of identical copies. For example, a trypanosome mtDNA is sometimes described as being composed of ~25 mitochondrial maxicircles (the molecule carrying the genes encoding proteins and structural RNAs) and ~5,000 minicircles [e.g., (Lukes et al. 1998)]. However, the figure for maxicircles refers to the number of identical copies, while that for minicircles, which carry gRNA genes involved in RNA editing, represents the total count of molecules including a hundred or so distinct types (Simpson 1997).
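The distinction between copies and types can be made concrete with a small tally. This is only an illustrative sketch: the inventory below is a hypothetical list built from the approximate figures quoted above (~25 maxicircle copies, ~5,000 minicircle molecules, about one hundred minicircle types), and the even distribution of minicircle copies across types is an assumption made for simplicity.

```python
from collections import Counter

# Hypothetical inventory: one entry per physical molecule, labelled by sequence type.
molecules = ["maxicircle"] * 25
molecules += [f"minicircle_type_{i % 100}" for i in range(5000)]

counts = Counter(molecules)

total_molecules = sum(counts.values())   # every physical molecule
distinct_types = len(counts)             # distinct sequence types only

print("total molecules:", total_molecules)
print("distinct chromosome types:", distinct_types)
print("maxicircle copies:", counts["maxicircle"])
```

Counting molecules and counting types give very different numbers for the same genome, which is why figures from different reports are not directly comparable unless the counting convention is stated.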

4 Mitochondrial Genome Architecture and Genome Size

Prevailing throughout eukaryotes is a single type of mitochondrial chromosome (in multiple copies), and this is presumably the ancestral state inherited from the mitochondrion’s bacterial predecessor. There are, however, several exceptions to this rule in animals, fungi, and green algae, where canonical mitochondrial genes are distributed over a few, and in one case 18, chromosomes (Table 3.1). As we will see below, the mitochondrial genes from Amoebidium, diplonemids, and dinoflagellates are spread out over many more chromosomes and in most peculiar ways.

Table 3.1 Multipartite mitochondrial genomes

The majority of mtDNAs are circular-mapping with a physical shape of linear, head-to-tail concatenated molecules (Bendich 1993). Concatemers are likely the product of rolling-circle DNA replication as demonstrated in yeast (Ling and Shibata 2004) and the liverwort Marchantia (Oldenburg and Bendich 2001). Less frequent are truly circular mitochondrial chromosomes and monomeric linear molecules that occur sporadically across the eukaryotic tree [(Valach et al. 2011) and references therein; for multipartite genomes, see Table 3.1].

A size of 15–20 kbp, as seen in many animals, has long been regarded as typical for mitochondrial genomes, but more recent data have changed this view dramatically. In animals alone, mtDNAs range in size from 11 kbp [Sagitta decipiens (chaetognath); (Miyamoto et al. 2010)] to 43 kbp [Trichoplax (placozoan); (Dellaporta et al. 2006; Burger et al. 2009)]. For eukaryotes as a whole, the smallest mtDNAs are found in Plasmodium and relatives, with only 6 kbp (Feagin et al. 1991), and the largest in the cucumber family with ~3,000 kbp (Ward et al. 1981). Currently, the largest fully sequenced mtDNA is that of pumpkin (Cucurbita pepo) with ~1,000 kbp (Alverson et al. 2010). In the face of such diversity, there is little meaning in sustaining the notion of a “usual” mitochondrial genome architecture and size.

4.1 Multipartite Genome Architectures in Amoebidium, Diplonemids, and Dinoflagellates

In the following, we will describe the makeup of the mitochondrial genomes in Amoebidium, diplonemids, and dinoflagellates. In all three cases, a novel, deviant mtDNA architecture has been observed, and the genome has been demonstrated to be located indeed inside mitochondria, with experimental evidence available for Amoebidium parasiticum (Burger et al. 2003a), the diplonemid D. papillatum (Marande et al. 2005), and the dinoflagellate C. cohnii (Jackson et al. 2007).

The mitochondrial genome of Amoebidium is extremely large. As of today, ~170 kbp of the genome have been sequenced, but it is still incomplete and the total size is unknown. There are probably several hundred different types of chromosomes that are all (monomeric) linear ranging in size from ~7 to 0.2 kbp (Fig. 3.5, left panel). Agarose gel electrophoresis visibly separates only the ten or so largest chromosomes (7–3.5 kbp), while the smaller ones (<3.5 kbp) are so numerous that they appear as a contiguous smear. Physical size determination of the linear chromosomes and size estimation based on DNA sequencing are in full agreement (Burger et al. 2003a).

Fig. 3.5

Appearance of mitochondrial DNA. Electrophoretic separation on agarose gel of mtDNAs from Amoebidium parasiticum (Amo), Diplonema papillatum (Dip), and the dinoflagellate A. carterae [(Din); courtesy of Ellen Nisbet]

In D. papillatum, ~220 kbp of mtDNA have been sequenced and the estimated genome size is in the order of 600 kbp. This genome appears to possess somewhat fewer distinct mitochondrial chromosomes than that of Amoebidium, with probably about one hundred in total. But in contrast to Amoebidium, Diplonema molecules are circular, and furthermore, they fall in only two different size classes of 6 and 7 kbp (Fig. 3.5, middle panel). Size and shape were determined by gel electrophoresis and EM, and distinctness of chromosomes within size classes was established by DNA sequencing. Three other diplonemid species, D. ambulator, D. sp. 2, and R. euleeides also have circular mitochondrial chromosomes of unequal size classes, but sizes range between 5 and 10 kbp (Roy et al. 2007b; Kiethega et al. 2011).
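As a rough plausibility check, the figures quoted for D. papillatum can be combined in a back-of-the-envelope calculation. The sketch below uses only the approximate numbers given above; how the roughly one hundred chromosomes are split between the two size classes is not stated, so the calculation brackets the total by assuming all chromosomes belong to one class or the other.

```python
# Approximate figures taken from the text; all values are rounded estimates.
n_chromosomes = 100            # "about one hundred" distinct chromosomes
class_sizes_kbp = (6.0, 7.0)   # the two chromosome size classes
sequenced_kbp = 220.0          # amount of mtDNA sequenced so far

# Bracket the genome size: every chromosome in the small class vs. the large class.
lower_kbp = n_chromosomes * min(class_sizes_kbp)
upper_kbp = n_chromosomes * max(class_sizes_kbp)

# Corresponding range for the fraction of the genome already sequenced.
fraction_low = sequenced_kbp / upper_kbp
fraction_high = sequenced_kbp / lower_kbp

print(f"estimated genome size: {lower_kbp:.0f} to {upper_kbp:.0f} kbp")
print(f"fraction sequenced: {fraction_low:.0%} to {fraction_high:.0%}")
```

The resulting bracket is consistent with the estimate of a genome in the order of 600 kbp, of which roughly one third has been sequenced.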

Dinoflagellate mtDNAs have confused us from the outset. Southern hybridization of uncut mtDNA reveals a continuum of molecule sizes from ~15 kbp (upper size limit of resolution under the particular experimental conditions) down to 0.5 kbp [Fig. 3.5, right panel; (Norman and Gray 2001; Jackson et al. 2007)], whereas preliminary PFE separation in A. carterae indicates ~30-kbp molecules (Nash et al. 2007). In any case, DNA sequencing shows that genes occur as multiple copies, all in different contexts, probably reflecting a high level of recombination. Apparently, this has resulted in inflated genome sizes, and sequencing of upward of 30 kbp in some taxa continues to find novel genomic combinations (Nash et al. 2008; Waller and Jackson 2009). Whether these hundreds of mitochondrial DNA molecules detected in electrophoresis and Southern analyses are self-replicating units or recombination products of one or several master molecules remains obscure. Therefore, the notion of chromosomes is not applied in this system; instead, the term “mtDNA elements” is used (Norman and Gray 2001; Jackson et al. 2007).

5 Noncoding Regions of mtDNA

Genomic sequences are subdivided into coding regions – which are the stretches occupied by genes specifying proteins and structural RNAs and may or may not contain introns – and noncoding regions. These latter harbor replication and transcription origins (known for only few mtDNAs), telomeric repeats in linear molecules, and then what is sometimes referred to as “junk” DNA, i.e., genome regions of unknown biological role. Note that experimental determination of essential and nonessential mtDNA regions is not readily tractable, because reverse genetics techniques are not available for the large majority of mitochondrial systems; and where established [yeast, Chlamydomonas (Butow and Fox 1990)], the methodology is extremely cumbersome.

Some lineages such as vertebrate animals, Chlamydomonas, and apicomplexans have streamlined their mtDNAs to a degree that genes are cramped with virtually no space in between. In other lineages, mtDNAs expand through accumulation not only of introns inserted in genes, but also of repeats and other untranscribed sequences between genes. Angiosperm mtDNAs also accumulate large quantities of mostly inactive chloroplast sequences [for a review, see (Kubo and Newton 2008)]. Noncoding regions are the major contributor to size differences of mtDNAs from closely related species. For example, the mtDNA of different Schizosaccharomyces species varies in size by a factor of four despite the nearly identical gene content (Bullerwell et al. 2003).

5.1 Conspicuous Noncoding mtDNA Regions of Amoebidium, Diplonemids, and Dinoflagellates

In all three taxa, Amoebidium, diplonemids, and dinoflagellates, it is the noncoding part of mtDNA that predominates.

In Amoebidium, the overall noncoding portion of its mtDNA is estimated at 80%, ranging between 20 and 100% for individual chromosomes. In fact, the majority of small chromosomes (<2 kbp) lack recognizable coding sequence. The noncoding regions are structured in an amazingly regular pattern [Fig. 3.6a; (Burger et al. 2003a)]. At the very ends of linear mitochondrial chromosomes sits one copy each of a ~45-nt motif (termed repx) in inverted orientation that has the propensity to form a guanine quadruplex structure known to stabilize ends of nuclear chromosomes [Figs. 3.5, left panel and 3.6a, large arrowheads; (Bartoszewski et al. 2004)]. Terminal inverted repeats have been implicated in replication initiation (Pritchard and Cummings 1981) and in circularization of linear molecules prior to replication (Rycovska et al. 2004). The terminal repeats in Amoebidium mtDNA are flanked by subterminal motifs that are specific for the “left” (repa) and “right” (repb) ends of chromosomes (Fig. 3.6a, circles and octagons). These motifs are most likely involved in transcription initiation (repa) and termination (repb), as all genes are oriented in the direction repa → repb. In addition, Amoebidium mitochondrial chromosomes enclose in their central region at least 20 further repeat motifs (>50 bp) that all have the same orientation and are arranged either as dispersed single units or in tandem arrays (Fig. 3.6a, small arrows/arrowheads).

Fig. 3.6

Mitochondrial chromosome structures. (a) Amoebidium parasiticum; (b) Diplonema papillatum; (c) Dinoflagellates. Protein-coding sequence is shown as shaded cylinders, structural RNAs as black boxes. Repeats are shown by arrows/arrowheads. In Amoebidium, terminal repeats and subterminal motifs (large triangles and circles/octagons, respectively) are found on each chromosome. In diplonemid circular chromosomes the black region is common to all chromosomes (irrespective of their class), and the constant regions A and B are common to class A and B chromosomes, respectively. In dinoflagellates, head-to-head arrows indicate inverted repeats that are abundant in intergenic regions. cob-E1 to E4, exons of the cob gene. LSUE, fragment number 5 of the large subunit rRNA gene

Diplonemid mtDNA contains ~95% noncoding sequence on each of its numerous circular chromosomes. These noncoding regions display a highly regular structure that is diametrically opposed to what is seen in Amoebidium [Fig. 3.6b; (Vlcek et al. 2011)]. In D. papillatum, the two size classes of circular chromosomes (6 kbp, Class A and 7 kbp, Class B) contain a contiguous stretch of approximately 5.7 and 6.7 kbp of noncoding sequence, respectively, that is nearly identical in molecules of the same class. These sequences include a moderate number (>10) of distinct repeat motifs (40 bp or longer), all oriented in the same direction, some dispersed, others in tandem (Fig. 3.6b, arrowheads). Most conspicuous is a 2.2 kbp-long tandem array of a ~70-bp motif in Class B chromosomes. Shared between Class A and B chromosomes is a stretch of 2.6 kbp, which likely includes the replication origin [Fig. 3.6b, black bars; (Vlcek et al. 2011)]. Noncoding sequence that is unique to individual chromosomes is on average only ~150 bp long and flanks the coding region on both sides (see below).
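The ~95% figure follows directly from the class sizes and the length of the shared noncoding stretch. The short calculation below is a sketch using only the rounded values quoted above, so the results are approximate; the dictionary layout and variable names are ours.

```python
# (chromosome size, class-wide noncoding stretch) in kbp, rounded values from the text.
classes = {
    "A": (6.0, 5.7),
    "B": (7.0, 6.7),
}

# Fraction of each chromosome taken up by the class-wide noncoding stretch.
noncoding_fraction = {
    name: noncoding / size for name, (size, noncoding) in classes.items()
}

# What is left over per chromosome for chromosome-specific sequence.
remainder_kbp = {
    name: size - noncoding for name, (size, noncoding) in classes.items()
}

for name in classes:
    print(
        f"Class {name}: {noncoding_fraction[name]:.1%} class-wide noncoding, "
        f"~{remainder_kbp[name]:.1f} kbp chromosome-specific"
    )
```

With these rounded inputs, both classes leave only about 0.3 kbp per chromosome for chromosome-specific sequence, which shows how little of each molecule can be occupied by coding sequence.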

In dinoflagellates, noncoding regions of mtDNA are roughly estimated at 85% for A. carterae [the only taxon for which mtDNA has been purified and randomly sequenced; (Nash et al. 2008)]. A conspicuous feature of dinoflagellate noncoding mtDNA (noted from this species and C. cohnii and K. micrum) is the presence of numerous distinct repeat motifs, most in inverted orientation, with the propensity to form densely packed arrays of stem-loop structures that occasionally overlap genes [Fig. 3.6c; (Norman and Gray 2001; Jackson et al. 2007; Nash et al. 2007)]. Palindromic sequences have been reported before in mtDNAs from various other taxa and have been implicated in different biological processes. For example, the double-hairpin elements present in several fungal mtDNAs are believed to be mobile, spreading across considerable phylogenetic distances and mediating lateral gene transfer (Paquin et al. 2000; Bullerwell et al. 2003). In other organisms, palindromes have been proposed to play roles in mitochondrial recombination (Bartoszewski et al. 2004), replication (Kornberg and Baker 1992), chromosomal rearrangements (Lewis and Cote 2006), and transcript stability (Kuhn et al. 2001). Which role(s) palindromic sequences play in dinoflagellates is yet to be unraveled.

6 Mitochondrion-Encoded Genes

The noncoding regions substantially define the architecture and regulation of mitochondrial genomes, but it is the gene coding regions that define genome functionality. While the biological processes and gene sets residing on mtDNAs are generally confined to a handful of common categories, there is still considerable variation seen throughout the eukaryotic groups. Moreover, the structure of genes themselves can be subject to further modification.

6.1 Pathways and Biological Processes Involving mtDNA-Encoded Genes

Broad sampling of mtDNAs from throughout the eukaryotic tree shows that the pathways and biological processes involving mtDNA-encoded genes are confined to a very select set (Table 3.2). This is in spite of mitochondria performing numerous biological functions, the majority of which rely entirely on nucleus-encoded genes. Two processes that universally depend on at least some mtDNA-encoded genes are electron transport plus oxidative phosphorylation (often referred to collectively as OXPHOS), and mitochondrial translation. Whereas the first of these processes necessitates protein-encoding mitochondrial genes, the latter may be represented only by mitochondrial genes encoding structural RNAs, particularly the ribosomal RNAs (rRNAs) and often, but not always, tRNAs. The basic requirement of the mtDNA to service these two functions has been attributed to the necessity for fast, organelle-based regulation of the redox state of mitochondria through control of oxidative phosphorylation [see Chap. 5 on the CoRR hypothesis (Allen 2003)], and to the presumed difficulty in importing large structural RNAs such as the rRNAs. Mitochondrial genomes that encode only genes for OXPHOS and translation are broadly found among the “usual” mtDNAs of most animals and fungi, but also in more disparate groups including apicomplexans and chlorophycean algae. It is common, however, for mtDNAs to encode molecules involved in a small number of other processes (see Table 3.2). These generally include not only the transmembrane protein transport via the twin-arginine translocase, but also, rarely, the SecY-type transport system. In many lineages, the process of cytochrome c maturation, namely heme transport into the intermembrane space and its covalent linkage to cytochrome c, is controlled by mitochondrial genes. An RNase P RNA for tRNA processing and a cytochrome oxidase assembly protein are specified by mtDNA in select lineages. Finally, mitochondrial genes for transcription have been detected, but in only a single lineage, the jakobids [for a review, see (Gray et al. 2004)].

Table 3.2 Gene content and biological processes encoded by mtDNAs across eukaryotic diversity

All of the mitochondrial processes outlined above are of bacterial origin and are directly derived from the organelle’s bacterial progenitor. The breadth of functions covered by mtDNA-encoded genes generally corresponds to their level of gene reduction, but not necessarily to their phylogenetic affinities. This is evidenced by several of the pathways being distributed seemingly randomly across disparate lineages (Table 3.2). Furthermore, it implies that the genes for these pathways persisted well after the radiation of most eukaryotic groups, and that loss from the mitochondrial genome has happened numerous times independently. Thus in terms of eukaryotic diversity there is really no such thing as a “usual” set of mitochondrion-encoded biological processes.

6.2 Gene Sets

The mtDNA-encoded genes representing the processes discussed above themselves follow a pattern where some are almost universally retained, and others show multiple instances of independent loss (Table 3.2). The large and small subunit rRNA genes (rnl, rns) are encoded on all known mtDNAs, whereas the gene for mitochondrial 5S rRNA is found only sporadically [namely, in some plants, green, red, and brown algae, amoebozoans, and jakobids (Gray et al. 2004)]. The number of mtDNA-encoded tRNAs is quite variable. In many instances, the gene set serves all codons observed in mitochondrial protein-coding genes, but partial sets and complete absence are seen as well (see Chap. 17). Select genes for the respiratory chain complexes and oxidative phosphorylation are universally retained, while others show a hierarchical pattern ranging from those frequently retained to those seldom retained on mtDNA. The genes for cytochrome b (cob) of Complex III and cytochrome oxidase subunit 1 (cox1) of Complex IV reside in all mtDNAs, and additional genes for Complex IV (cox2 and cox3) in most. Complex I subunits are mitochondrion-encoded in most eukaryotes, the basic gene set including nad1, nad4, and nad5, and in most cases also nad2, nad3, nad4L, and nad6, while further nad genes are less frequent. In some lineages such as apicomplexans and certain fungi in the Saccharomycetales, this complex has been entirely lost, and the function has been substituted by a nucleus-encoded single-subunit enzyme (van Dooren et al. 2006). Several genes for Complex V are typically located on mtDNA (notably atp6, atp8, and atp9), although mtDNA-encoded genes for this complex are completely absent from apicomplexans, some green algae, and some animals. Finally, genes for Complex II are less common in mtDNAs, present only in a few lineages.

Genes for ribosomal proteins are the other major class of mtDNA-encoded genes (Table 3.2). While up to 27 such genes are found in jakobids, other lineages contain few or no such genes on their mtDNAs. When multiple ribosomal protein genes exist, genes for the small subunit are most common (notably rps1-4, 7, 8, 11–14, 19) with typically fewer genes for the large subunit (notably rpl2, 5, 6, 11, 14, 16). Analyses of various plant mtDNAs demonstrate well that mtDNA gene loss has taken place frequently and independently. Some ribosomal protein genes were lost over 40 different times within plants alone (Adams et al. 2002; Bergthorsson et al. 2003).

A small set of rarely occurring mitochondrial genes were probably gained secondarily through lateral gene transfer. These specify DNA and RNA polymerases (dpo and rpo) as well as maturase (matR) and reverse transcriptase (rtl), which are involved in intron propagation and splicing and are typically located within introns, but are sometimes free-standing. Rarely, mtDNAs encode a DNA mismatch repair protein (mutS) and an adenine methyl transferase (dam or mtf) (Gray et al. 2004).

ORFs (unidentified open reading frames) constitute the final gene class, and these are potential protein-encoding genes. They might be either unrecognized, highly divergent versions of common genes [e.g., the former orfB (atp8) (Gray et al. 1998), ymf39 (atp4) (Burger et al. 2003c) and murf1 (nad2) (Kannan and Burger 2008)], or open reading frames by chance, neither transcribed nor translated. The highest numbers of ORFs (up to 100 and more) are correlated with the inflated genome sizes seen in plants; the majority of these ORFs most likely occur by chance.

The mtDNAs with the largest gene count belong to the Jakobida, with 66 identified protein-encoding genes and a further 31 genes for structural RNAs (Lang et al. 1997). This genome contains all of the genes represented on any other mtDNA, in addition to unique ones such as ssrA, which specifies tmRNA that releases stalled ribosomes from “stop-less” mRNAs (Jacob et al. 2004). This most ancestral gene complement is restricted to the jakobids. At the other end of the spectrum are the more common minimal mtDNAs with 4–9 genes: rns, rnl, cob, and cox1, and often also cox2, cox3, nad1, nad4, and nad5.

6.3 Gene Structure

The “prototype” gene structure is a contiguous coding sequence that corresponds to a single contiguous product. In mitochondria, the coding sequence may be interrupted by one or more Group I or Group II introns that are removed posttranscriptionally. Introns are most abundant in plants and fungi, and a few protist lineages [for a recent review, see (Lang et al. 2007)], but are completely lacking in other protists, e.g., apicomplexans, ciliates, and kinetoplastids.

Several mitochondrial genes have broken the basic convention of gene structure in a number of creative ways. Perhaps the simplest of these is that genes have become discontinuous, resulting in multiple gene products instead of one. This is the case with rRNA genes, which can be split into anything from two pieces, as for rnl in ciliates, green algae, fungi, and animals (Heinonen et al. 1987; Boer and Gray 1988; Nedelcu et al. 2000; Forget et al. 2002; Dellaporta et al. 2006), to more than 20 pieces, as for the apicomplexan rns and rnl (Feagin et al. 1997). Gene pieces are usually arranged on mtDNAs in an unordered fashion, and the corresponding transcripts assemble in the ribosome through intermolecular interactions.

Protein-encoding genes are also known to fragment, with two possible outcomes for the protein product: either discontinuous or contiguous. For example, the nad1 gene of ciliates is split into two coding sequences that apparently result in a split protein (Edqvist et al. 2000). A gene split can also be accompanied by relocation of one or both parts to the nucleus. This has happened several times independently in green algae, plants, amoebozoans, and apicomplexans (Nedelcu et al. 2000; Adams et al. 2001; Funes et al. 2002a, b; Waller and Keeling 2006; Gawryluk and Gray 2010). The second outcome for fragmented genes is restoration of a single gene product at the RNA level, and the processes that enable this are discussed in Sect. 3.7.1.

A further variation on gene structure seen in mtDNAs is gene fusion. The Complex IV genes cox1 and cox2 are found in the same reading frame and produce a single mRNA in some amoebozoans (Burger et al. 1995; Ogawa et al. 2000). In Acanthamoeba castellanii, there is evidence that the two proteins are separate, but it is not known whether this is due to an unusual translation termination of the cox1/cox2 mRNA or to posttranslational processing (Lonergan and Gray 1996).

6.3.1 Fragmented, Recombined, and Intact Gene Versions in Amoebidium mtDNA

In the currently sequenced portion of Amoebidium mtDNA, we find all genes known from animals, plus three ribosomal protein genes that are also present in their choanoflagellate neighbors (Burger et al. 2003a) (Table 3.3). Unlike mitochondrial genes from the other Holozoa, those of Amoebidium enclose more than 20 introns that are predominantly of the Group I type. Surprisingly, Amoebidium mtDNA contains many gene fragments in addition to complete gene versions. For four nad genes (see Table 3.3), complete gene versions are missing and are thought to be located on as-yet-unsequenced chromosomes. It is also conceivable that these genes are in the process of migrating to the nucleus, leaving incomplete pseudogenes behind. This idea will be testable when more nuclear genome data become available.

Table 3.3 Mitochondrial genes from Amoebidium, Diplonema, dinoflagellates, and their relatives

The most abundant gene class in Amoebidium mtDNA is tRNA genes, existing in astounding numbers. Among the 85 tRNA-like sequences [identified by tRNAscan-SE (Lowe and Eddy 1997)], 54 are bona fide functional genes and 31 are pseudogenes (Burger unpublished). The large majority (80%) of all tRNA genes (functional plus pseudogenes) reside on only three chromosomes in clusters of 20–25 genes (while protein-coding genes are single or grouped by two at most per chromosome). Functional tRNA genes occur in up to four almost identical copies and are often arranged in tandem. tRNA pseudogenes appear to be mostly recombination products of two or more functional tRNAs. The explosion of tRNA-related sequences indicates ongoing and frequent recombination among and within chromosomes, likely facilitated by similar sequence motifs in tRNA genes as well as intergenic repeat elements.

6.3.2 Systematically Fragmented Mitochondrial Genes in Diplonemids

As in Amoebidium, the mtDNA of D. papillatum is not fully sequenced. The gene content in the currently known portion of Diplonema mtDNA (Vlcek et al. 2011) is not much different from that of its sister group, the kinetoplastids (Table 3.3), and includes eleven protein-encoding genes with the common atp6, cob, cox1-3, and five nad genes. The situation for structural RNA genes is less clear. For rnl, only the 3′ terminal portion of the gene has been detected. Apparently, LSU rRNA is fragmented, but the number of pieces is uncertain. The otherwise omnipresent and well conserved rns remains elusive. Transfer RNA genes seem to be missing from mtDNA of diplonemids, as is the case in kinetoplastids (Simpson et al. 1989).

Gene identification is most difficult in Diplonema mtDNA, not only because the sequences are highly divergent, but also because of a most startling dispersed-fragmented gene structure. In fact, all genes in D. papillatum mtDNA seem to be broken up into multiple pieces. But in contrast to LSU rRNA, protein-coding regions rejoin at the RNA level. The absence of complete protein gene versions from both the mitochondrial and nuclear genome has been confirmed by PCR on total cellular DNA.

Gene fragmentation in D. papillatum mtDNA is surprisingly regular, with pieces (also referred to as modules) of relatively constant size (on average 170 nt, +/−100), so that genes consist of a total of four (e.g., the small atp9 gene) to twelve (the large nad7 gene) parts. Even more surprising, each gene module resides on a separate chromosome, rationalizing the large number of distinct chromosomes in Diplonema mitochondria. But exuberance is paired with parsimony: no chromosome has been detected that does not contain a (potential) short coding region (in contrast to Amoebidium mtDNA, where the majority of chromosomes appears to be noncoding; see above). The peculiar, systematically fragmented structure of mitochondrial genes not only occurs in D. papillatum, but also is shared by all diplonemids as we conclude from a survey of the cox1 gene in three additional species from both diplonemid genera (Kiethega et al. 2011).

6.3.3 Not So Systematically Fragmented Mitochondrial Genes in Dinoflagellates

The gene content of dinoflagellate mtDNA most likely reflects that of the sister phylum Apicomplexa, containing only cob, cox1, and cox3, along with heavily fragmented rns and rnl, and no tRNAs (Norman and Gray 2001; Jackson et al. 2007; Kamikawa et al. 2007, 2009; Nash et al. 2007; Slamovits et al. 2007). Although the complex structure of dinoflagellate mtDNAs has prevented a complete genome survey, broad sampling of this genome has been conducted in several taxa, and there is no evidence of further mitochondrial genes. For example, cox2 has relocated to the nucleus in split form (Waller and Keeling 2006), and EST data suggest that dinoflagellates share the loss of Complex I (neither mitochondrial nor nucleus-encoded genes are found) with apicomplexans (Waller and Jackson 2009), fission yeasts (Bullerwell et al. 2003), and a subgroup of budding yeasts including Saccharomyces cerevisiae (Foury et al. 1998; Su et al. 2011).

The complete set of rns and rnl fragments is yet to be identified in dinoflagellate mtDNA, but based on available information, they appear to closely resemble those in apicomplexans in terms of size, fragmentation pattern, and sequence boundaries (Jackson et al. 2007). Unlike in apicomplexans, protein genes exist in numerous and varyingly sized pieces, and together all coding sequences are present in many copies and genomic arrangements (Figs. 3.5c and 3.6c) (Norman and Gray 2001; Jackson et al. 2007; Nash et al. 2007; Slamovits et al. 2007; Kamikawa et al. 2009). In addition to these gene fragments, full-length coding sequences are found for cob and cox1, and it is currently unknown whether the gene fragments serve any function. The cox3 gene is an exception in that complete gene sequences were not detected in dinoflagellate mtDNA, although full-length mRNAs were observed (see Sect. 3.7.1.2) (Jackson et al. 2007; Waller and Jackson 2009). Altogether, mitochondrial gene structure in dinoflagellates is somewhat reminiscent of that in Amoebidium.

The above-described characteristics of dinoflagellate mtDNA genes prevail in a broad taxonomic sample, yet basal dinoflagellate lineages display some variation. Oxyrrhis marina does encode a complete cox3, but it is fused with the upstream cob into a single contiguous ORF (Slamovits et al. 2007). It is unknown whether a fused protein is generated, but given that cob and cox3 contribute to different complexes, they likely form two proteins. Fragmented versions of all genes are also seen in O. marina. Curiously, fragments of genes are only found linked to complete copies or fragments of the same gene (for example, coding sequences of cox1 are only linked to other coding sequences of cox1, but never to those of other genes). This suggests a particular, short-range-restricted form of recombination in this taxon. Hematodinium sp. is another basal dinoflagellate that shares most mtDNA features with other dinoflagellates. In this taxon, however, full-length coding sequences of all three protein-encoding genes (cox1, cox3, and cob) exist, alongside copious numbers of gene fragments (Jackson and Waller unpublished).

7 Expression of Mitochondrial Genes

In general, mitochondrial gene expression is relatively poorly understood across eukaryotic diversity (particularly at the level of regulation, see Chaps. 11–13 and 18). Several observations can be made, however, that pertain to “usual” versus “unusual.” The machinery for transcription in mitochondria was apparently inherited initially from the progenitor α-proteobacterium, as is evident from the persistence of the genes rpoA-C for bacterial-type RNA polymerase in the jakobid Reclinomonas americana (Lang et al. 1997). This state, however, is very unusual, and replacement of this polymerase with a bacteriophage-type RNA polymerase in all other eukaryotic groups suggests an early move to this phage-type system (Shutt and Gray 2006). Transcription in mitochondria from many eukaryotic lineages (e.g., ciliates, apicomplexans, green and red algae, stramenopiles, amoebozoans) is polycistronic, with a small number of transcription initiation sites employed (Gray and Boer 1988; Wolff and Kuck 1996; Richard et al. 1998; Edqvist et al. 2000; Rehkopf et al. 2000). Individual gene transcripts are then generated by precise processing between the often closely spaced genes [e.g., (Wolff and Kuck 1996; Rehkopf et al. 2000)]. An implication of this system is that much of the regulation of gene expression must be posttranscriptional, given that large banks of genes are initially expressed as one. This relatively simple mode of mitochondrial transcription might be common to many eukaryotes, but is unlikely to apply to mtDNAs that are much less gene-dense (e.g., plant mtDNAs) or to those with coding elements dispersed across separate molecules as, for example, in diplonemids.

Polyadenylation of transcripts is known from several mitochondrial systems including animals, apicomplexans, and trypanosomes (Anderson et al. 1981; Gillespie et al. 1999), but also diplonemids and dinoflagellates. The length of the poly(A) tail can contribute to translation control [(Etheridge et al. 2008); for a review, see (Gagliardi et al. 2004)], and we will discuss below that nucleotides of this tail can also contribute to the coding information. Two further posttranscriptional processes can have profound impacts on the expression of mitochondrial genes, notably (a) trans-splicing and (b) RNA recoding. These processes are able to rescue effectively fragmented and/or cryptic genes.

7.1 Trans-splicing

Trans-splicing produces complete RNAs from transcribed pieces of fragmented genes. In mitochondria, this process was first described in plants (Bonen 1993) where trans-splicing of mRNAs is mediated by Group II intron structures. Trans-splicing takes place for several of the Complex I genes (nad1-3, nad5) and requires cofactor molecules, some encoded in the nucleus [reviewed in (Bonen and Vogel 2001; Glanz and Kuck 2009)]. Recently, trans-splicing mediated by discontinuous Group I introns has been reported in early branching animals (placozoans), a lycophyte plant, and a green alga (Burger et al. 2009; Grewe et al. 2009; Pombert and Keeling 2010). In each of these cases, initial evidence for trans-splicing was gathered by modeling complete Group I intron structures from the partial intron sequences that flank gene fragments. In addition, cDNA or RT-PCR data have provided experimental confirmation that trans-splicing takes place in vivo [for a recent review on trans-splicing of all intron types, see (Moreira et al. 2011)]. However, not all trans-splicing in mitochondria is mediated by Group I or II introns, as we will discuss in the following sections.

7.1.1 Trans-splicing in Diplonemid Mitochondria

The systematically fragmented mitochondrial gene sequences of diplonemids are joined at the RNA level. The process of gene module trans-splicing has been investigated in detail for cox1 of D. papillatum by employing various experimental techniques. These include Northern hybridization using individual gene modules as probes, demonstrating that intermediates of trans-splicing are abundant in the cell. Furthermore, sequencing of a cDNA library and of amplicons generated by targeted RT-PCR (reverse transcription followed by polymerase chain reaction) detected module transcripts with noncoding 5′ and 3′ extensions, partially processed transcripts, and various intermediates of the module joining process.

The diverse processing intermediates allow reconstruction of the steps involved in the biogenesis of the cox1 mRNA (Fig. 3.7). First, gene modules are transcribed individually, together with several hundred nucleotides of the constant region upstream and downstream of the module. Subsequently, noncoding regions are clipped off, leaving only the module RNAs. Finally, processed modules engage in trans-splicing, apparently in no specific directionality, to yield a mature mRNA (Marande and Burger 2007). Module joining is a most accurate process, since misassembled RNA species (e.g., Module 1 linked to Module 3) were not encountered.

Fig. 3.7 Processing of Diplonema mitochondrial transcripts. Gene fragments (modules) together with noncoding flanking regions are transcribed as separate RNA fragments. Noncoding sequence is then removed and the last module is poly-adenylated. Gene modules (here indicated as m1 to m9) are joined in no particular directionality. In the case of cox1 shown here, six Us are inserted between Modules 4 and 5.
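The processing steps summarized in Fig. 3.7 can be mimicked with a small toy model. The sketch below is purely illustrative and makes several assumptions that are not part of the published data: modules are represented by placeholder labels rather than real sequences, the flank strings and the poly(A) length of eight are arbitrary, and the six Us are modeled as a junction insert for simplicity. All function names are hypothetical. The only point it demonstrates is that joining neighboring modules in an arbitrary order of junctions always yields the same mature mRNA.

```python
import random

def process_primary(primary):
    """Clip the noncoding 5' and 3' flanks, keeping only the module RNA."""
    return primary["module"]

def trans_splice(module_rnas, rng, junction_inserts=None):
    """Join neighbouring modules one junction at a time, in random junction order.

    junction_inserts maps (left_module_number, right_module_number) to
    nucleotides that end up between the two modules (toy simplification).
    """
    junction_inserts = junction_inserts or {}
    # Each piece records (first module number, last module number, sequence).
    pieces = [(i, i, seq) for i, seq in enumerate(module_rnas, start=1)]
    while len(pieces) > 1:
        k = rng.randrange(len(pieces) - 1)      # pick any remaining junction
        left, right = pieces[k], pieces[k + 1]
        bridge = junction_inserts.get((left[1], right[0]), "")
        pieces[k:k + 2] = [(left[0], right[1], left[2] + bridge + right[2])]
    return pieces[0][2]

# Nine placeholder modules (m1..m9) with arbitrary flanks; not real sequences.
primaries = [{"flank5": "nnn", "module": f"[m{i}]", "flank3": "nnn"}
             for i in range(1, 10)]
module_rnas = [process_primary(p) for p in primaries]
module_rnas[-1] += "A" * 8                      # only the last module is poly-adenylated

mrna = trans_splice(module_rnas, random.Random(0), {(4, 5): "U" * 6})
print(mrna)
```

Running the join with different random seeds changes the order in which junctions are closed but not the product, which is the sense in which joining has "no particular directionality" in this toy.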

The key question is how neighbor modules recognize each other in trans-splicing given a population of one hundred or so different gene pieces to be assembled into a dozen mRNAs. Initial hypotheses postulated discontinuous introns of Group I or Group II type, or alternatively introns of the archaeal type. Yet, no signatures of these introns or spliceosomal introns were detected at module junctions, nor reverse-complementary motifs in adjacent modules or their corresponding flanking regions. Not even single residues are conserved across the various cox1 module boundaries from the same diplonemid species or in the same cox1 module boundary across different species (Kiethega et al. 2011). The lack of significant motifs in cis suggests that module matchmaking is achieved by a third party, for example, guide RNAs similar to those that mediate RNA editing in kinetoplastid mitochondria. Equally possible are guiding proteins. Work is in progress to identify the nature of matchmaking molecules.

7.1.2 Trans-splicing in Dinoflagellate Mitochondria

Unlike in diplonemids, the majority of gene fragments in dinoflagellate mtDNA probably do not contribute to functional transcripts via splicing events. For example, fragmented rRNAs can be observed as discrete poly-adenylated transcripts both by RT-PCR-based techniques and Northern hybridization analysis, with no evidence of larger species being generated by fragment ligation (Jackson and Waller unpublished). Poly(A) tails are typically ~10–20 nt in length and are presumably tolerated in the assembled ribosome by complementary base pairing (Jackson et al. 2007; Slamovits et al. 2007). For the protein-encoding genes cob and cox1, only transcripts corresponding to complete gene sequences are seen, and the detected shorter transcripts do not match gene fragments (Jackson et al. 2007; Nash et al. 2007). Although long polycistronic transcripts containing gene fragments are occasionally found in EST data, these are not sufficiently abundant for Northern detection, and their fate and utility are unknown.

Expression of Karlodinium micrum cox3 is unlike that of cob and cox1 in that trans-splicing is required to generate a complete transcript. In this species, partial cox3 transcripts correspond precisely in length to two cox3 gene fragments encoding nucleotides 1–712, and 718–839 of “ordinary” cox3 [Fig. 3.8a (Jackson et al. 2007; Waller and Jackson 2009)]. Both of these transcripts lack additional 5′ sequence beyond their respective coding regions, are poly-adenylated, and readily detectable by Northern hybridization (Jackson and Waller unpublished). It is conceivable that a split Cox3 protein is generated from these two transcripts. Yet, a third transcript with roughly equal copy number as the two partial ones represents a full-length Cox3 coding sequence (Jackson et al. 2007). It likely arises by trans-splicing of the two shorter RNAs because no single gene corresponds to it, and because the extremities coincide perfectly with those of the two shorter transcripts. The lack of any flanking sequences resembling introns, both in the coding sequences and in the transcripts, suggests that a process other than discontinuous intron splicing takes place in this system. The only cox3 region that seems not encoded by mtDNA is from nucleotides 713–717, between the two fragments, and this is filled with five As in the full-length K. micrum cox3 transcript. Given that the upstream cox3 fragment is poly-adenylated after nucleotide 712, splicing could retain some of these adenosines (As) in the splice product. This event must be controlled precisely to avoid generating a frameshift.

Fig. 3.8 Trans-splicing of cox3 transcripts in dinoflagellates. (a) Gene fragments (i) are transcribed as separate, poly-adenylated RNA fragments (ii) that are then trans-spliced to form a continuous cox3 mRNA (iii). The junction between the two spliced fragments likely inherits adenosine nucleotides from the poly-adenylation tail of the upstream fragment. (b) The length of the internal adenosine stretch varies across dinoflagellate taxa in order to maintain the correct length and reading frame of the encoded product. Poly-adenylation is used in all dinoflagellate cox3 transcripts to generate a UAA termination codon (boxes).

Several dinoflagellate taxa for which EST data are available show equally truncated cox3 transcripts in addition to full-length transcripts with several As bridging the two fragments. The number of As varies between taxa according to the gap in the cox3 coding sequence, and thus the reading frame and protein length are generally maintained (Fig. 3.8b). Precise trans-splicing within the poly(A) tail likely requires some form of a guide molecule, but as in Diplonema mitochondria, no candidates for this role have yet been identified.
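The arithmetic behind the adenosine bridge can be made explicit with the K. micrum coordinates given above (fragments covering nucleotides 1–712 and 718–839 of "ordinary" cox3). The following is a minimal sketch; the function names are hypothetical and the coordinates are treated as 1-based and inclusive, which is an assumption about the numbering convention.

```python
def bridge_length(upstream_end, downstream_start):
    """Reference positions covered by neither fragment (1-based, inclusive)."""
    return downstream_start - upstream_end - 1

def spliced_length(upstream_end, downstream_start, downstream_end, n_bridge):
    """Length of the trans-spliced transcript body, excluding the 3' poly(A) tail."""
    downstream_len = downstream_end - downstream_start + 1
    return upstream_end + n_bridge + downstream_len

def keeps_frame(n_bridge, gap):
    """The downstream fragment stays in frame if the bridge differs
    from the gap by a multiple of three nucleotides."""
    return (n_bridge - gap) % 3 == 0

gap = bridge_length(712, 718)                   # positions 713..717
print(gap, spliced_length(712, 718, 839, gap))  # bridge size and resulting length
print([n for n in range(1, 10) if keeps_frame(n, gap)])
```

Retaining exactly as many As as there are missing positions restores both the length and the reading frame; retaining one A more or fewer would shift the frame of the entire downstream fragment, which illustrates why the splice point within the poly(A) tail has to be controlled precisely.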

7.2 RNA Recoding: Alternative Genetic Codes and Editing

Cryptic genes are those where the sequence of a gene does not correspond to that of the gene’s product. Discrepancies might be amino acid substitutions or interruptions to the reading frame by stop codons or frameshifts. Some cryptic genes can be so obscured as to be completely unrecognizable. Two scenarios can account for such cryptic genes. One is the use of an alternative genetic code to what is expected. Recoding of this type is quite frequent in mitochondria amounting to 16 deviations from the “universal” translation code scattered across plant and animal (including human) mitochondria alone [(Lekomtsev et al. 2007); see also Chap. 17]. Such changes can be identified in multiple protein alignments where otherwise conserved residues are consistently exchanged for another residue. Mitochondrial genomes with very few and divergent gene sequences can present challenges to identifying code changes, and it is conceivable that some have gone unnoticed. A particularly difficult case was cox1 of the dinoflagellate Perkinsus marinus (Masuda et al. 2010), which contains numerous frameshifts in the gene sequence. One hypothesis brought forward by the authors invokes quadruplet and quintuplet codons that are recognized by special tRNAs, and another proposes programmed ribosome frameshifting. Both modes would constitute a radical way of recoding during the translation process.

A second type of cryptic genes are those recoded by RNA editing. There are several forms of RNA editing that are both mechanistically and evolutionarily distinct (Gray 2003). In plant mitochondria (and chloroplasts), C-to-U substitutions are found in most genes, with U-to-C changes less frequent in vascular plants but abundant in some ferns and mosses. Enzymatic base conversion by cytidine deaminases (that is, without cleaving the RNA backbone) is presumed for C-to-U substitution, although the exact biochemistry is unknown (Takenaka et al. 2008). The specificity of changes in plant mitochondria is directed by a large suite of RNA-binding proteins that are thought to interact with sequence motifs in the region of the edited nucleotide. Similar substitution editing is also seen in some animal and protist groups, but again, the mechanisms are unknown [reviewed in (Gray 2003, 2009)]. More drastic editing takes the form of insertion and deletion editing that corrects frameshifts obscuring even a gene’s identity. This is best investigated in trypanosomatid mitochondria where extensive insertion and/or deletion of single to multiple Us is accomplished by elaborate “editosome” complexes. As mentioned earlier, this editing is directed by short RNA molecules known as guide RNAs that bind to the transcript through base-pairing [reviewed in (Stuart et al. 2005)]. Insertion editing is also known from myxomycete protists (slime moulds) where all nucleotide types can be inserted, but unlike in trypanosomatids, editing takes place during transcription and most likely without the participation of guide RNAs (Gott and Emeson 2000; Gray 2003).

The above discussed editing systems can act on all gene types – those encoding proteins, rRNAs and tRNAs – and can have profound effects on the gene product. In protein genes, substitutional changes predominate in the first two codon positions where they usually specify an amino acid change [e.g., (Lin et al. 2002)], whereas insertions/deletions mostly restore open reading frames [e.g., (Liu and Bundschuh 2005)]. Editing of tRNAs can even reengineer the anti-codon, and in trypanosomatids this recodes the UGA stop as a tryptophan codon to achieve an alternative genetic code (Alfonzo et al. 1999).

7.2.1 Rare RNA Editing in Diplonemid Mitochondria

In diplonemid mitochondria recoding is rare, but oddly, it seems to always occur exactly at gene module boundaries. Most noticeable is the addition of six nonencoded uridines (Us) at the junction between Modules 4 and 5 of the cox1 transcript, and this is the case in all diplonemids investigated (Kiethega et al. 2011). The U residues are added in frame 3, contributing position 3 of a first codon, a complete second codon, and positions 1 and 2 of a third codon. This editing event has an important consequence for the protein, because the corresponding three amino acids in the Cox1 protein are invariably present, albeit not highly conserved, across eukaryotes. A lack of these Us would drastically change not only the protein sequence, but also its secondary and tertiary structure and make the protein nonfunctional (Kiethega et al. 2011). Three scenarios of U addition are conceivable. First, the nucleotides could originate from a gene module, encoded by mtDNA and transcribed and processed as described above. But this is unlikely, because no cassette has been found in D. papillatum mtDNA that encloses six contiguous Us, and known gene pieces are all significantly longer. Alternatively, the additional nucleotides could be inserted in a cox1-precursor transcript at the junction of Modules 4 and 5 via concerted action of an endonuclease and a nucleotide transferase. Yet, no RNA species has been detected where the junction 4/5 lacks these nucleotides. Another possibility is that these Us are appended to one of the unjoined modules prior to trans-splicing, and indeed, we have experimental evidence for this latter scenario (Yan and Burger unpublished). The editing event of diplonemid cox1 is conspicuously similar to the much more frequent RNA editing in kinetoplastid mitochondria, which is also limited to Us, but includes nucleotide deletions [for a review, see e.g., (Stuart et al. 2005)]. The main difference is that kinetoplastid mitochondria have contiguous primary transcripts that, prior to nucleotide addition or removal, need to be cleaved by an endonuclease that is an integral part of the editosome. In diplonemids, however, U residues are simply appended to the free end of a module transcript.
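How six inserted Us distribute over three codons when added "in frame 3" can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name; the upstream length of 5 nt is an arbitrary stand-in for any upstream coding length that leaves two nucleotides of an open codon before the insertion point.

```python
def inserted_per_codon(upstream_len, n_inserted):
    """Count how many inserted nucleotides fall into each successive codon.

    upstream_len is the number of coding nucleotides preceding the insertion,
    so upstream_len % 3 gives the codon positions already filled.
    """
    counts = []
    pos = upstream_len
    remaining = n_inserted
    while remaining > 0:
        room = 3 - (pos % 3)            # free positions left in the current codon
        take = min(room, remaining)
        counts.append(take)
        pos += take
        remaining -= take
    return counts

# Insertion starting at codon position 3 (two positions already filled).
print(inserted_per_codon(upstream_len=5, n_inserted=6))
```

With two positions of the first codon already supplied by Module 4, the six Us fill one position of the first codon, all three of the second, and two of the third, so Module 5 must supply the final position of the third codon. Because six is a multiple of three, the reading frame downstream of the junction is unchanged relative to the module sequences themselves.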

A few less obvious and less consistent recoding events appear to take place in Diplonema at the ends of mitochondrial mRNAs. These events affect the termination signal for translation and will be discussed in Sect. 3.7.3.1.

7.2.2 Abundant and Assorted RNA Editing in Dinoflagellate Mitochondria

RNA editing in dinoflagellate mitochondria has a major impact on gene expression. Both protein-encoding and rRNA transcripts are edited with up to 6% of nucleotides affected per gene (Jackson et al. 2007; Nash et al. 2007; Waller and Jackson 2009; Kamikawa et al. 2007; Lin et al. 2002; Zhang et al. 2008). This editing is exclusively substitutional. Dinoflagellates are exceptional in that at least nine out of 12 possible forms of nucleotide substitution are observed including both transitions and transversions. Base conversion could not account for all of these changes and therefore, excision-replacement has to take place. However, the details of the underlying mechanism are unknown including how changes are specified.
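The count of 12 possible substitution forms, and its split into transitions and transversions, follows directly from the four-letter RNA alphabet. The short sketch below enumerates them; it is an illustration only, and the variable names are hypothetical.

```python
from itertools import permutations

BASES = "ACGU"
PURINES = {"A", "G"}

def substitution_class(src, dst):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    same_ring_class = (src in PURINES) == (dst in PURINES)
    return "transition" if same_ring_class else "transversion"

substitutions = list(permutations(BASES, 2))    # ordered pairs (from, to)
transitions = [s for s in substitutions if substitution_class(*s) == "transition"]
transversions = [s for s in substitutions if substitution_class(*s) == "transversion"]

print(len(substitutions), len(transitions), len(transversions))

# If at least nine distinct forms are observed and only four transitions exist,
# at least this many of the observed forms must be transversions:
min_observed_transversions = 9 - len(transitions)
print(min_observed_transversions)
```

Since only four of the twelve forms are transitions, observing nine or more forms implies that at least five are transversions, which is consistent with the statement that both classes occur.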

Conservation of some editing sites across dinoflagellate taxa indicates a certain evolutionary stability of this process, yet frequent emergence of new editing sites in a lineage-specific manner also demonstrates that it is adaptable (Lin et al. 2002; Zhang et al. 2008). The vast majority of editing events in the protein-encoding genes occur at the first and second codon positions that typically lead to changes in the amino acid specified, including the removal of internal stop codons in several instances (Zhang et al. 2008; Waller and Jackson 2009). Thus, this editing process can recover cryptic genes from mtDNA sequences.

7.3 Start and Stop Codons of Mitochondrial Reading Frames

In the nucleus and in organelles alike, the coding region of protein-coding genes is framed by a start codon at the 5′ end (an ATG) that initiates translation, and a stop codon at the 3′ end (in mitochondria usually TAA or TAG) that signals termination of polypeptide elongation and release of the protein and the mRNA from the ribosome. The use of alternative initiation codons such as GTG, ATT, ATA, etc. is seen in bacterial systems (Kozak 1983) and also sporadically in mitochondria from diverse eukaryotic groups (Feagin 1992; Bock et al. 1994; Edqvist et al. 2000). The evidence for alternative start codons in mitochondria is generally indirect and based on multiple protein alignments of close relatives, where the probable beginning of the coding region lacks an ATG and includes instead one of the possible alternatives. However, when sequence information from closely related species is unavailable, the inferred protein sequence is divergent, and protein sequence data are lacking, the N terminus of the protein can be placed only tentatively.
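A first, purely sequence-based step in locating a possible N terminus is to list in-frame candidate initiation codons near the 5′ end of a reading frame. The sketch below does this for the canonical ATG and the alternatives named above; the function name and the example sequence are hypothetical, and such a scan only proposes candidates that would still need support from alignments or protein data.

```python
CANDIDATE_STARTS = ("ATG", "GTG", "ATT", "ATA")

def candidate_start_codons(dna, frame=0, starts=CANDIDATE_STARTS):
    """Return (position, codon) for every in-frame candidate initiation codon.

    Positions are 0-based offsets into dna; frame is 0, 1, or 2.
    """
    hits = []
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in starts:
            hits.append((i, codon))
    return hits

# Hypothetical 5' region, for illustration only.
example = "ATTGCAGTGAAAATG"
print(candidate_start_codons(example))
```

In an A + T-rich genome, ATT and ATA codons are frequent, so a scan of this kind typically returns many candidates; this is one reason why the N terminus can often be placed only tentatively.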

Alternative stop codons are rather rare and have been reported, for example, in a green alga (Nedelcu et al. 2000), and proposed (Jukes and Osawa 1990), but recently refuted (Temperley et al. 2010), in humans (see discussion in Chap. 16). Sometimes, the stop codon is incomplete at the gene level and becomes completed at the transcript level by attachment of the poly(A) tail as for instance in animal mitochondria (Anderson et al. 1981).

Apparently, a stop codon is not always required in the mitochondrial system. One example is “nonstop” mRNAs in plant mitochondria that give rise to functional polypeptides (Raczynska et al. 2006). Another is a human mitochondrial transcript that lost its stop via a deletion (Chrzanowska-Lightowlers et al. 2004). Here, the RNA is polyadenylated and translated normally, i.e., without read-through into the poly(A) tail that would otherwise generate a polylysine extension of the protein. This is thought to be achieved by poly(A) binding proteins that stall the ribosome, RNases that subsequently trim the A-tail, and specific release factors that allow ribosome detachment at the transcript’s 3′ end in the absence of a stop codon.

7.3.1 Unusual Start and Stop Codons in Diplonema Mitochondria

Information on mitochondrial start and stop codons in D. papillatum has been inferred from 11 genes [five complete (GenBank acc. nos. HQ288820-22; EU123538), and six incomplete ones lacking the 5′ portion (Burger et al. unpublished)]. Canonical initiation codons (in the gene and transcript sequence) are found for half of the genes, while GTG appears to serve as a start codon in the other half (Vlcek et al. 2011). The picture for stop codons is confusing. EST data indicate that the most C-terminal modules of four out of five genes (cob, cox2, cox3, and nad7) lack a conventional stop signal. For cob, U addition in the transcript seems not only to complete the last amino-acid codon to specify Phe, but also to supply the first position of the stop codon that, in turn, is apparently completed by addition of poly(A) (Kiethega and Burger unpublished). For cox2, cox3, and nad7, the situation is similar, except that the added nucleotide positions are polymorphic so that a sizable number of transcripts lack canonical stop codons. These observations need to be validated by experiments not relying on a reverse transcriptase reaction primed with an anchored oligo-d(T) primer, which might introduce artifacts [for methods, see (Rodriguez-Ezpeleta et al. 2009)]. If confirmed, they raise several questions. What directs nucleotide addition at the end of the last modules prior to polyadenylation? Are transcript versions without stop codons translated like the nonstop mRNAs mentioned above?

The cox1 gene of Diplonema does have a stop codon encoded by mtDNA, notably an in-frame TAG at a position of the gene where the cox1 reading frame of most other eukaryotes ends. However, this codon is followed by a T (or U) in the genomic and transcript sequence of cox1 Module 9, and this nucleotide is completed upon poly-adenylation to a UAA codon, thus adding a second termination signal. Perhaps the upstream nucleotide context makes the UAG stop codon less effective (Mottagui-Tabar and Isaksson 1998), which may have required recruitment of a second one.
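The way a poly(A) tail can complete a termination codon is easy to illustrate. The sketch below scans a transcript 3′ end, with and without an appended tail, for in-frame UAA or UAG codons. The function name, the tail length of ten, and all three example sequences are hypothetical stand-ins chosen only to reproduce the situations described in the text (a terminal in-frame U, a UAG followed by U, and no terminal U at all); they are not real Diplonema or dinoflagellate sequences.

```python
STOP_CODONS = {"UAA", "UAG"}   # the stops usually used in mitochondria, per the text

def in_frame_stops(coding_3prime, tail_len=10, frame=0):
    """Return (position, codon) of every in-frame stop after adding a poly(A) tail."""
    rna = coding_3prime + "A" * tail_len
    return [(i, rna[i:i + 3])
            for i in range(frame, len(rna) - 2, 3)
            if rna[i:i + 3] in STOP_CODONS]

# cob-like case: ...GCA UUU U; the last U is the first position of a stop codon.
print(in_frame_stops("GCAUUUU", tail_len=0))   # before poly-adenylation
print(in_frame_stops("GCAUUUU"))               # after poly-adenylation

# cox1-like case: encoded UAG followed by U, giving a second stop after tailing.
print(in_frame_stops("GGUUAGU"))

# No terminal in-frame U: the tail is read as AAA codons and no stop arises.
print(in_frame_stops("GGUUUC"))
```

In the last case, translation of the tail would yield lysine codons rather than a stop, which corresponds to the "nonstop" situation for which the termination mechanism is unknown.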

7.3.2 Unusual Start and Stop Codons in Dinoflagellate Mitochondria

Transcripts of cox1 and cox3 from several dinoflagellate taxa consistently lack an AUG in the 5′ region (Jackson et al. 2007; Nash et al. 2007; Slamovits et al. 2007; Kamikawa et al. 2009). While some cob transcripts do contain an AUG codon toward the 5′ end of the transcript, multiple protein alignments suggest that these codons lie downstream of the N terminus. Given the high A + T content of dinoflagellate mtDNAs, any of the many AUA or AUU codons might serve as alternative start codons. Nonstandard initiation codons also seem to be used in most mitochondrial genes of apicomplexans, and the predominance of this trait appears to unite the two groups.

In dinoflagellates, neither the cox1 and cob genes nor their edited transcripts contain a stop codon, and the poly(A) tail starts immediately after the predicted 3′ coding sequence (Jackson et al. 2007; Nash et al. 2007; Slamovits et al. 2007; Kamikawa et al. 2009). A potential alternative stop codon is not observed at the 3′ end of either of these sequences. It is unknown which mechanism enacts translation termination and ribosome detachment.

Dinoflagellate cox3 transcripts are consistently distinct in that a UAA termination codon is present, although not in the gene sequence but in the transcript, where it is created by polyadenylation immediately after an in-frame U [Fig. 3.8; (Jackson et al. 2007; Slamovits et al. 2007)]. This suggests that more conventional translation termination could apply to Cox3. It remains puzzling why in both mitochondrial systems, Diplonema and dinoflagellates, only a minority of genes would retain a standard stop codon, and also why this codon is generated through polyadenylation rather than being encoded in the gene sequence.
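The three 3′-end situations described in this section (no stop codon at all, a stop codon completed by the poly(A) tail after an in-frame U, and an encoded UAG followed by a poly(A)-completed UAA) can be illustrated with a minimal Python sketch. The function name in_frame_stops, the poly(A) length, and the short toy sequences are hypothetical and chosen for illustration only; they are not actual gene sequences. The sketch also assumes that the input starts in frame and that only UAA and UAG act as termination codons.

```python
# Assumption: only UAA and UAG are treated as stop codons here.
STOP_CODONS = ("UAA", "UAG")

def in_frame_stops(coding_3prime, poly_a_len=10):
    """List in-frame stop codons in a transcript after poly(A) addition.

    coding_3prime: encoded (pre-polyadenylation) sequence, starting in frame.
    Returns tuples of (position, codon, origin), where origin tells whether
    the codon is fully encoded or only completed by the poly(A) tail.
    """
    encoded = coding_3prime.upper().replace("T", "U")  # accept DNA or RNA
    mrna = encoded + "A" * poly_a_len                  # add the poly(A) tail
    stops = []
    for i in range(0, len(mrna) - 2, 3):               # walk codon by codon
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            fully_encoded = i + 3 <= len(encoded)
            origin = "encoded" if fully_encoded else "completed by poly(A)"
            stops.append((i, codon, origin))
    return stops

# Hypothetical toy 3' ends (not real sequences):
print(in_frame_stops("AUGGCUUUU"))    # frame ends cleanly: no stop codon
print(in_frame_stops("AUGGCUUUUU"))   # trailing in-frame U + AA -> UAA
print(in_frame_stops("AUGGCUUAGU"))   # encoded UAG, then U + AA -> UAA
```

The function reports every in-frame stop rather than halting at the first one, so that a tandem arrangement such as the one described for Diplonema cox1 becomes visible.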

8 Convergent Evolution of Highly Derived mtDNAs

As detailed above, the mtDNAs of Amoebidium, diplonemids, and dinoflagellates share an extraordinarily large genome size, multi-chromosome genome structure, and fragmented genes. The latter two groups also share polyadenylation, trans-splicing, and RNA editing of mitochondrial transcripts, as well as certain peculiar features of nuclear gene expression and subcellular organization [for a review, see (Lukes et al. 2009)]. The resemblances are startling because these taxa belong to three completely different eukaryotic lineages: opisthokonts, euglenozoans, and alveolates (see Fig. 3.1). In fact, as illustrated in Table 3.3, there is considerably more similarity among the mtDNAs of Amoebidium, diplonemids, and dinoflagellates than between each of them and its phylogenetic neighbors, for example, between Amoebidium and animals, diplonemids and heteroloboseans, and dinoflagellates and ciliates. These neighbors all possess relatively traditional mtDNAs, which implies that the shared, deviant characters of Amoebidium, diplonemids, and dinoflagellates have emerged independently and represent spectacular cases of convergent evolution.

9 Which Forces Shape the Evolution of Organelle Genomes?

Mitochondrial DNAs of Amoebidium, diplonemids, and dinoflagellates are indeed eccentric, each in its own way. It is not even evident that these genomes stem from one common ancestor and that this ancestor is an α-proteobacterial genome with a single large, compact chromosome that encodes a thousand or more genes.

Commonly, evolution is perceived as a force seeking innovative solutions toward a selective advantage for the species (adaptive evolution). But novelty can also emerge in other ways. In the cases of the three protist groups discussed here, their respective ancestors may have been faced with deteriorating mitochondrial replication or runaway recombination leading to massive genome and gene fragmentation. Instead of restitution of the original state or extinction, the damage may have been countered by diverse and quite complex compensatory “measures.” For example, the ancestor of diplonemid mitochondria may have coped with gene fragmentation by adapting an existing RNA ligation process to enable trans-splicing of fragmented genes. Furthermore, some DNA repair machinery may have been recruited and tailored to fix defective gene fragments at the RNA level. As pointed out earlier in the explanation of how RNA editing may have emerged, the compensatory system must have pre-existed (Covello and Gray 1993). Obviously, innovation is not driven by natural selection alone, and we have to consider as well what is called nonadaptive or neutral evolution (Jacob 1977; Lynch et al. 2006). In this light, as Gray and co-workers have put it, highly complex phenomena “generally regarded as evidence of ‘fine tuning’ or ‘sophistication,’ …might be better interpreted as the consequences of runaway bureaucracy – as biological parallel of nonsensically complex Rube Goldberg machines” (Gray et al. 2010).