
Introduction

Higher photosynthetic organisms possess many cell types and display extensive compartmentation. These characteristics make the study of the different metabolic pathways that take place throughout the life of plant cells highly complex.

Specifically, mitochondria (derived from the Greek mitos—a thread—and chondros—a grain) and chloroplasts (or plastids) (from the Greek chloros—green—and plastós—formed) are the intracellular organelles which contain the entire machinery necessary for cell respiration and photosynthesis processes, respectively. These organelles also participate in the biosynthesis of essential metabolites, such as amino acids, nucleotides, lipids, and starch.

Chloroplasts and mitochondria are the two types of cellular power stations. The former harness light energy from the sun and the latter “unpack” the captured energy into smaller packets of adenosine triphosphate (ATP), which are then used as a source of chemical energy to power cellular work. Thus, a clear understanding of physiological processes at the whole-plant level requires a thorough comprehension of the interactions between these power-station organelles and the rest of the cellular compartments. In mammals, these interactions mainly involve the transfer of proteins and metabolites. In plant cells, however, the transfer of genes from mitochondria and chloroplasts to the nucleus is another essential interaction. Although mitochondria and chloroplasts retain part of their ancestral genomes, gene transfer to the nucleus is continuously operating.

Mitochondria were first observed in a variety of cell types during the last decades of the nineteenth century as threads of granules, previously called sarcosomes, bioblasts, or chondrioconts (Schmidt 1913). Nägeli (1846), in turn, discovered that chloroplasts multiply by division in plant cells (Guilliermond and Atkinson 1941). At the beginning of the twentieth century, the first reports of non-Mendelian inheritance, based on studies of variegation in higher plants, were published (Correns 1908). These reports showed that a few cases of green-and-white leaf variegation were caused by factors inherited in a non-Mendelian manner. Further analyses of variegation in higher plants revealed that the genetic determinants for these characters were associated with chloroplasts, suggesting that these organelles may harbor genetic information. These observations led the Russian botanist Mereschkowski to first propose the endosymbiotic theory (Mereschkowski 1905). Wallin (1923) extended this idea to explain the origin of mitochondria. Many textbooks describe this theory in detail, so we will not dwell on it in this chapter.

Ris and Plaut (1962) demonstrated the presence of DNA in chloroplasts of the green alga Chlamydomonas moewusii by electron microscopy and cytochemical methods. Two years later, Gibor and Granick (1964) established that chloroplasts are endowed with their own DNA complement (referred to as the plastome or cpDNA) and thus suggested that these organelles are semi-autonomous systems capable of self-replication and useful models for the study of differentiation. Shortly afterwards, the discovery of 70S ribosomes within the chloroplast stroma (Stutz and Noll 1967) set the foundations for further studies on the importance of chloroplast genomes from a functional perspective. Bedbrook and Bogorad (1976) reported the first physical map of the maize chloroplast genome, which added convincing evidence of the homogeneity and circularity of chloroplast DNA molecules. One year later, they cloned the first chloroplast gene from this species (Bedbrook et al. 1977).

Contemporary with these discoveries were the observations reported by Nass and Nass (1963) and by Schatz et al. (1964). Using two different approaches, these authors reported for the first time that mitochondria from chick embryo and from yeast, respectively, contain a significant quantity of DNA (mtDNA). Regarding higher plants, studies in the early 1960s showed that cytoplasmic male sterility (CMS) is a maternally inherited trait, drawing attention to the existence of unique DNA species within the mitochondria of plant cells in different crop species (Leaver and Gray 1982).

Regarding tomato, Palmer and Zamir (1982) reported the first studies on its chloroplast genome based on a restriction map. This map was designed through comparative restriction enzyme digestion with tobacco and Petunia cpDNA. Later on, Phillips (1985) reported a physical map generated by digestion of the cloned PstI fragments and by Southern-blot hybridization. The model consisted of a circular molecule of ~160 kb with a large inverted repeat. Simultaneously, Piechulla et al. (1985) described nine genes in the tomato chloroplast genome that are coordinately regulated during fruit ripening.

Regarding the mitochondrial DNA (mtDNA) from tomato, however, it was not until 1992 that Melcher et al. published the first physical map of the mitochondrial genome. Years later, a model of its size and organization was reported based on mtDNA digestions and hybridizations (Shikanai et al. 1998). This model proposed that the genome is structured in five subgenomic particles of different sizes with a total length of approximately 450 kb. These particles coexist in a dynamic range regulated somehow by the recombination activity of sequence hotspots.

In this chapter, we provide an updated overview of the current knowledge of the chloroplast and mitochondrial genomes of tomato. In particular, we describe their structures in comparison with sequenced genomes from other Embryophyta species. We also summarize findings on the functionality of these two genomes, together with their dynamics in relation to recent events of DNA exchange with the nucleus, a process that appears to be still operative.

The Tomato Chloroplast Genome

It was not until 1986 that the first chloroplast genome, that of Marchantia polymorpha (the common liverwort), was completely sequenced, providing insights into its structural organization (Ohyama et al. 1986). Since then, hundreds of chloroplast genome sequences from different plant species have been reported. After these pioneering works, in 2006, two research groups simultaneously reported the complete chloroplast genome sequence of tomato. Daniell et al. (2006) analyzed a genome sequence from a Purdue University accession (LA3023 according to the Tomato Genetic Resource Center: http://tgrc.ucdavis.edu/), while Kahlau et al. (2006) sequenced two distinct genotypes (IPA-6, a Brazilian cultivar, and Ailsa Craig [LA2838A], a European cultivar). Although the two groups used different approaches, they reported exactly the same size of 155,461 bp for the Solanum lycopersicum chloroplast genome in all three genotypes. These results are in agreement with the sizes reported for plastomes of other land plant species (Fig. 7.1b). As observed by these authors, and somewhat surprisingly, the nucleotide sequences of the IPA-6 and Ailsa Craig chloroplast DNA (cpDNA) were absolutely identical. However, the degree of plastome sequence conservation among Solanaceae species remains controversial. Whereas Clarkson et al. (2004) described very little sequence variation between the plastid genomes of Nicotiana sylvestris and its allopolyploid descendant N. tabacum, Daniell et al. (2006) revealed several InDels within certain coding sequences when tomato, potato, tobacco, and Atropa were compared.

Fig. 7.1

Number of encoded genes in relation to mitochondrial (a) and chloroplast (b) genome sizes for tomato (red circles) and other selected taxa (black circles). Names of those species with a genome size and/or a gene number above or below the average ± SD are given on the graph for the mitochondrial analyses (panel a). On panel (b), species are referenced as follows: 1 Pelargonium × hortorum. 2 Chlamydomonas reinhardtii. 3 Phaseolus vulgaris. 4 Arabidopsis thaliana. 5 Triticum aestivum. 6 Marchantia polymorpha. 7 Agrostis stolonifera. 8 Oryza sativa. 9 Oryza nivara. 10 Oryza sativa (indica cultivar-group). 11 Guillardia theta. 12 Hordeum vulgare subsp. vulgare. 13 Chlorella vulgaris. 14 Pinus thunbergii. Data were extracted from GenBank (www.ncbi.nlm.nih.gov/genbank/) and/or from the corresponding published papers listed in the reference section.

Even though chloroplast genomes are usually represented as circular double-stranded DNA molecules, it is currently accepted that they exist as linear, concatemeric, and highly branched complex molecules (Bendich 2004). Generally, plastomes present highly conserved tetrapartite structures, with two copies of a large inverted repeat (IR) region separating the large and small single copy regions (LSC and SSC). IR regions usually range from 5 to 76 kb (Palmer 1991; Sugiura 1992). In the case of the tomato plastome, two IR regions of about 25 kb each separate the LSC and SSC regions of 85.6 and 18.4 kb, respectively. Compared to the tobacco and potato plastomes, the tomato IR region is slightly expanded at both ends (into the rps19 and ycf1 genes in the LSC and SSC, respectively). Besides the two large IRs, the tomato plastome also contains nearly 40 IRs of 30–40 bp that are highly conserved among closely related species and are located in the same genes or intergenic regions. These characteristics suggest a functional role. Moreover, this plastome also harbors four other IRs of 57 bp, which are not found in those of potato, tobacco, or Atropa (Daniell et al. 2006). Nevertheless, the tomato chloroplast genome is smaller than that of tobacco owing to deletions in the noncoding intergenic spacer regions (Kahlau et al. 2006; Daniell et al. 2006).
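The tetrapartite layout can be checked with simple arithmetic. The following minimal sketch uses only the rounded sizes quoted in this section; the variable and function names are illustrative, and the implied IR length is a back-calculation under the assumption that total = LSC + SSC + 2 × IR, not a value taken from the cited papers.

```python
# Rounded region sizes quoted in the text (kb); names are illustrative.
TOTAL_KB = 155.461   # complete tomato plastome (155,461 bp)
LSC_KB = 85.6        # large single copy region
SSC_KB = 18.4        # small single copy region


def implied_ir_kb(total_kb: float, lsc_kb: float, ssc_kb: float) -> float:
    """Length of one IR copy implied by total = LSC + SSC + 2 * IR."""
    return (total_kb - lsc_kb - ssc_kb) / 2


ir_kb = implied_ir_kb(TOTAL_KB, LSC_KB, SSC_KB)
print(f"Implied length of each IR copy: {ir_kb:.1f} kb")
```

With these rounded inputs, each IR copy comes out slightly above 25 kb, which is consistent with the approximate IR size stated above.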

In noncoding regions, the tomato plastome contains 25 intergenic spacer regions sharing 80–100 % identity with the same regions of potato, tobacco, and Atropa. Only four regions are 100 % identical among species, and three of them are located in IR regions. This variation in sequence identity makes intergenic spacer regions useful markers for phylogenetic studies.

Regarding gene content, the tomato chloroplast genome is more gene-dense than the mitochondrial (see below) and the nuclear genomes (The Tomato Genome Consortium 2012). This chloroplast genome consists of 41.7 % noncoding regions (intergenic spacers and introns) and 58.3 % coding regions, with 133 annotated genes. Of these 133 genes, 113 are unique and 20 are duplicated in the IR. The same gene content and gene order are conserved in the closely related tobacco, potato, and Atropa species (Fig. 7.1b). Of the 113 unique genes, 61 encode tRNAs, rRNAs, ribosomal proteins, RNA polymerase, maturase, and proteases; 47 correspond to photosynthesis-related genes; and the remaining 5 correspond to other genes and conserved open reading frames. Table 7.2 summarizes a comparative analysis of the main features reported for all Embryophyta plastome sequences up to November 2012.
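The gene counts and percentages above can be cross-checked in a few lines. This is a minimal bookkeeping sketch based only on the figures quoted in this section; the category labels and the derived base-pair values are illustrative back-calculations, not numbers reported by the cited authors.

```python
GENOME_BP = 155_461        # tomato plastome size quoted in the text
CODING_FRACTION = 0.583    # 58.3 % coding regions

annotated = 133
unique = 113
duplicated_in_ir = 20
# Illustrative labels for the three groups listed in the text.
categories = {"genetic system": 61, "photosynthesis": 47, "other/ORFs": 5}

# Internal consistency of the counts reported in the text.
assert unique + duplicated_in_ir == annotated
assert sum(categories.values()) == unique

coding_bp = round(GENOME_BP * CODING_FRACTION)
noncoding_bp = GENOME_BP - coding_bp
genes_per_kb = annotated / (GENOME_BP / 1000)

print(f"coding: {coding_bp} bp, noncoding: {noncoding_bp} bp")
print(f"gene density: {genes_per_kb:.2f} annotated genes per kb")
```

The resulting density is useful mainly as a relative figure when compared with the same ratio computed for the mitochondrial assembly.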

Chloroplast Functional Genomics

Most genes in plant plastid genomes encode proteins that function in photosynthesis. However, a few plastid-encoded products are involved in other cellular functions: the chloroplast tRNA-Glu is required for tetrapyrrole biosynthesis (Schön et al. 1986); the plastid genome-encoded D subunit of the essential enzyme acetyl-CoA carboxylase participates in fatty acid biosynthesis (Kode et al. 2005; Kahlau and Bock 2008); and the plastid-encoded ClpP1 protease subunit is involved in plastid protein homeostasis (Shikanai et al. 2001; Kuroda and Maliga 2003). The involvement of plastid gene expression in these essential functions is probably why the loss of plastid translational activity is fatal in most plants.

Regarding the organization of chloroplast genomes, as in cyanobacterial genomes, genes are clustered and arranged in operons, co-transcribed as polycistronic mRNAs, and translated on 70S ribosomes. Transcript processing and maturation consist of several steps, such as cleavage of polycistronic mRNAs, intron splicing, and RNA editing by C-to-U conversions (Barkan and Goldschmidt-Clermont 2000; Bock 2000). However, gene expression in higher plant plastids is far more complex than in prokaryotes, because the regulation of plant plastids depends both on their own mechanisms and on nuclear genome “signals” that influence plastid functionality. For instance, plastid genes are transcribed specifically by the plastid-encoded RNA polymerase or by the nuclear-encoded RNA polymerase, or they can be transcribed by both (Allison et al. 1996; Hajdukiewicz et al. 1997; Lerbs-Mache 2000; Legen et al. 2002). Further complexity is observed in the transcription factors required for promoter recognition, which are encoded by genes residing in the nuclear genome (Tanaka et al. 1996). Plastid genes are regulated mainly at the transcriptional and translational levels; however, the relative contributions of these levels have been scarcely discussed and remain controversial. Some studies support the view that transcriptional regulation is the main contribution to gene control in plastids (Pfannschmidt et al. 1999; Tullberg et al. 2000). By contrast, other studies indicated that translation constitutes the rate-limiting step in plastid gene expression (Eberhard et al. 2002).
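The C-to-U editing step mentioned above can be illustrated with a toy example. The sketch below is purely schematic: the function name, the sequence, and the edit position are hypothetical and do not correspond to any annotated tomato editing site. It only shows how a single C-to-U conversion can change the coding potential of a transcript, here by turning an ACG triplet into an AUG start codon.

```python
def apply_c_to_u_editing(transcript: str, edit_sites: list[int]) -> str:
    """Return the transcript with C-to-U conversions at 0-based edit_sites.

    Raises ValueError if a listed site does not carry a cytidine.
    """
    bases = list(transcript.upper())
    for pos in edit_sites:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not 'C'")
        bases[pos] = "U"
    return "".join(bases)


# Hypothetical toy transcript: editing the second base of ACG yields AUG.
unedited = "ACGGCUUCA"
edited = apply_c_to_u_editing(unedited, [1])
print(unedited, "->", edited)
```

Sites are passed explicitly because editing is site-specific; the sketch deliberately refuses to edit a position that is not a cytidine.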

In tomato in particular, chloroplasts undergo drastic changes in both ultrastructure and function during fruit maturation. Among these changes, researchers have described the disappearance of the thylakoid membrane system, the degradation of chlorophyll, the appearance of plastoglobuli, and an increase in the biosynthesis of carotenoids, which finally accumulate inside the chromoplast membranes (Rosso 1968; Harris and Spurr 1969; Egea et al. 2011). The genetic control of the chloroplast during this transition has been studied for many years (Piechulla et al. 1985; Bathgate et al. 1985; Kahlau and Bock 2008). In this regard, whereas a drastic downregulation of photosynthetic genes and significant decreases in ribosomal RNAs occur, the expression of other nonphotosynthetic genes rises. Accordingly, recent studies on tomato plastid transcriptomics and proteomics have shown that photosynthetic and carbohydrate metabolism genes are strongly downregulated during fruit development (Kahlau and Bock 2008; Barsan et al. 2012). Conversely, the expression of the genetic system genes (rRNAs, tRNAs, ribosomal proteins, RNA polymerase) seems to be kept at higher levels. Interestingly, the chloroplast-to-chromoplast conversion during the ripening period is not accompanied by drastic changes in transcript abundance.

Analyses of translational regulation based on polysome-bound mRNAs showed that a strong downregulation also affects most plastid genes in fruits in comparison with expanded leaves. During ripening, polysome association successively declines, and the decline is particularly pronounced in the photosynthesis gene group, suggesting that plastid translation is the main contribution to gene expression control during chloroplast-to-chromoplast differentiation. An exception was observed for the accD gene, which encodes an acetyl-CoA carboxylase subunit. The expression of this gene displays strong upregulation and polysome association during fruit ripening, which correlates with the high demand for lipid biosynthesis to generate a storage matrix that will accumulate carotenoids (Kahlau and Bock 2008). However, the ACCD protein level decreased between the mature green and ripe fruit stages, suggesting another point of regulation for this enzyme (Barsan et al. 2012). The trnA (encoding tRNA-Ala) and rpoC2 (encoding an RNA polymerase subunit) genes also tended to be upregulated during this process. In the same study, Kahlau and Bock (2008) analyzed the expression of genes predominantly transcribed by the nuclear-encoded (NEP) and plastid-encoded (PEP) RNA polymerases. They found that the PEP is more intensively used in leaves, whereas transcription from NEP promoters prevails in red fruits.

Notwithstanding the mentioned contributions to the functional role of the tomato plastid genome, knowledge about how plastid translation is regulated in fruits during the autotrophic to heterotrophic transition is scarce.

The plastid translation machinery, which is based on prokaryotic-type 70S ribosomes, consists of two subsets of components. One subset comprises those components encoded by the plastid genome: the 16S rRNA of the small ribosomal subunit as well as the 23S, 5S, and 4.5S rRNAs of the large subunit. The remainder consists of the components encoded by the nuclear DNA. Although the abolishment of plastid protein biosynthesis is lethal, some studies have focused on identifying individual components of the plastid ribosome that may not be essential (Rogalski et al. 2008). Fleischmann et al. (2011) studied candidates for non-essential plastid ribosomal proteins in tobacco. Through reverse genetic analyses, the authors revealed a previously unrecognized role of plastid translational fidelity in two developmental processes: shoot branching and leaf morphogenesis. Notably, the authors also suggested that the transfer of plastid ribosomal protein genes to the nucleus is greatly accelerated in non-photosynthetic lineages. Besides the common plastid ribosomal proteins, plant plastids contain plastid-specific ribosomal proteins (PSRPs) not found in bacteria (Sharma et al. 2007). PSRPs are encoded by the nuclear genome, and the function of five of them has been recently studied (Tiller et al. 2012). In that work, the knock-down of three of these proteins decreased accumulation of the 30S or 50S subunit of the plastid ribosomes, while the knock-down of the other two caused no change.

In general, whereas all the mentioned evidence accounts for the functional role of the tomato plastid genome, the intricate network of coregulation with the other genomes (i.e., mitochondrial and nuclear) is still obscure.

The Tomato Mitochondrial Genome

Anderson et al. (1981) reported the first complete genome sequence from a eukaryotic organelle (the human mitochondrion), and in 1997, Unseld et al. published the first complete mitochondrial genome sequence from a higher plant (Arabidopsis thaliana). After these groundbreaking reports, and within a few decades, the advent of rapid DNA sequencing methods greatly increased the scope and speed of large-scale whole genome sequencing projects. As a result, in 2012, the Tomato Genome Consortium (a multinational team of scientists from 14 countries) reported a high-quality genome draft for the Heinz cultivar 1706 (LA4345 according to the Tomato Genetic Resource Center: http://tgrc.ucdavis.edu/). In this context, not only was the nuclear sequence obtained, but the semi-autonomous DNA from the mitochondria (chondrome) was also sequenced, assembled, and annotated.

A shotgun sequencing strategy was used to produce an assembly of the tomato mitochondrial genome. Highly purified mitochondrial DNA (mtDNA) isolated from etiolated seedlings was used as starting material to produce 4154 Sanger paired-end sequence reads with an average length of 750 nt. Shotgun clones were deposited into a dedicated database and are currently available upon request at http://www.mitochondrialgenome.org/. After trimming, clipping, and filtering, high-quality (quality value Qv ≥ 20) paired reads were used as input for the assembly pipeline. In brief, an overlap-layout-consensus algorithm was chosen owing to the read lengths and library features, and the reads were then fed to the CAP3 Sequence Assembly Program (Huang and Madan 1999). As a result, the tomato chondrome was assembled into six scaffolds (SlmtSC_A, _V, _M, _R, _L and _B) and 164 contigs, spanning 579,717 nucleotides in the first draft of the tomato chondrome (SOLYC_MT_v1.50). The tomato chondrome is also available for download at the Mitochondrial Genome website mentioned above. In addition, these sequences have been deposited as a whole genome project (BioProject ID: 67471) at DDBJ/EMBL/GenBank under the accession AFYB00000000.
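As a rough orientation, the nominal sequencing depth implied by these figures can be estimated as total sequenced bases divided by assembly length. The sketch below is an illustrative back-calculation only: it assumes that the 4154 reads and the 750 nt average refer to the same read set, and it ignores losses from trimming and filtering, so the result should be read as an upper bound rather than a reported value.

```python
READS = 4154            # Sanger reads reported in the text
MEAN_READ_NT = 750      # average read length (nt)
ASSEMBLY_NT = 579_717   # total length of the first draft assembly


def nominal_depth(n_reads: int, mean_len: float, assembly_len: int) -> float:
    """Nominal depth = total sequenced bases / assembly length."""
    return n_reads * mean_len / assembly_len


depth = nominal_depth(READS, MEAN_READ_NT, ASSEMBLY_NT)
print(f"Total bases: {READS * MEAN_READ_NT} nt")
print(f"Nominal depth: {depth:.1f}x")
```

A modest nominal depth of this kind is one reason why repeated regions and alternative mtDNA configurations can leave an assembly split into several scaffolds.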

The version described in this chapter is the first version, AFYB01000000. Overall, the size of the final assembly is in agreement with the physical map previously reported by Shikanai et al. (1998). Furthermore, its multipartite organization (i.e., the existence of mtDNAs of varying structures) is comparable to those reported for the tobacco (Sugiyama et al. 2005) and rice (Tian et al. 2006) chondromes. In this regard, it is currently accepted that the organization of angiosperm chondromes is characterized by the presence of multipartite genome structures, which arises from high-frequency recombination via repeated sequences in the genome (Fauron and Casper 1995). A master circle (MC) model is traditionally constructed based on the restriction fragment mapping of mtDNA in higher plants, in which the total genetic information can be accommodated (Tian et al. 2006). By contrast, an extensive electron microscopy investigation has shown that the mtDNA from Chenopodium album cell cultures appear to consist mainly of linear molecules of various sizes, together with rosette-like and sigma-like structures, in vivo (Backert and Börner 2000). Since the relative amounts of these structures change during the course of cell growth, they may represent replication intermediates. Similar large branched molecules have also been observed in mtDNA from BY-2 tobacco cells under the light microscope (Oldenburg and Bendich 1996). Thus, there are differences between the forms of mtDNA molecules derived from genome mapping data and from microscopic observations (Sugiyama et al. 2004).

Although this discrepancy has not yet been resolved, both types of evidence indicate that the structural organization of mtDNA is highly dynamic. Furthermore, the multipartite structure can provide a redundant gene assembly and modulate the genome copy number in plant chondromes. Low-frequency ectopic recombination among multipartite structures will produce chimeras, aberrant ORFs, and novel subgenomic DNA molecules (Abdelnoor et al. 2003). Thus, multipartite structures are an important factor to consider when analyzing the scaffolds and contigs of the tomato chondrome assembly. This genomic shuffling is apparently reversible and can alter plant phenotype as suggested by two early reports of Kanazawa and Hirai (1994) and Janska et al. (1998). These authors showed that cytoplasmic male sterility (CMS) in Nicotiana tabacum and Phaseolus vulgaris species is related to the occurrence of multipartite structures, heteroplasmy (see below), and/or paternal leakage.

The origin of the various scaffolds of the tomato chondrome can also be related to the occurrence of heteroplasmic DNA structures. Heteroplasmy is defined as a state in which more than one mitochondrial genotype occurs in an organism. Usually, one mitotype is prevalent and the alternative one(s) are present in a very low proportion. Under such conditions, the phenotype of the organism is determined by the predominant mtDNA variant (Kmiec et al. 2006). In plants, this phenomenon has been investigated most often to clarify some mitochondrial abnormalities. For example, there are reports on CMS (Janska et al. 1998), non-chromosomal stripe (NCS) mutants in maize (Yamato and Newton 1999), the chloroplast mutator (CHM) mutant in Arabidopsis (Martínez-Zapater et al. 1992; Sakamoto et al. 1997), and the mitochondrial mutator system in maize (Kuzmin et al. 2005). Recent studies indicate that heteroplasmy also exists in healthy humans (Kajander et al. 2000) and wild-type plants (Arrieta-Montiel et al. 2001; Taylor et al. 2001).

Recombination between large repeated sequences is commonly assumed to be the most important force responsible for maintaining the multipartite structure of the chondrome as a dynamic entity (Kmiec et al. 2006). These recombination events are frequent and easily reversible during plant life, probably in order to fulfill their integrative role. Besides the main genome, whose parts are maintained in a dynamic equilibrium by large repeated sequences, plant mitochondria contain recombinant molecules known as sublimons. These sublimons are very low in number compared to the main mitochondrial genome and are products of rare and irreversible recombination events mediated by short repeated sequences (Kmiec et al. 2006). Short repeats are common in plant mitochondrial genomes (Notsu et al. 2002; Sugiyama et al. 2005; Kubo et al. 2000; Clifton et al. 2004) and may have originated from the insertion of reverse-transcribed copies of untranslated RNA (Gualberto et al. 1988). Another possible origin is the recombinational activity of oligonucleotide motifs (Woloszynska et al. 2001). As a consequence of these active recombination events mediated by both large and short repeats, two types of mtDNA of different quantitative representation coexist in one organism: the mitotype and the sublimons. The mitotype is predominant and constitutes the main genome, while the sublimons exist at a substoichiometric level. These findings suggest that chondrome heteroplasmy may also occur in the tomato cell. This is an important feature to take into account when revising the assembly results. In this vein, the tomato chondrome possesses a high number (849) of single repeats of between 50 and 2200 bp. Likewise, 34 short tandem repeats (2–8) ranging in size between 15 and 100 bp were detected.
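To make the notion of a short tandem repeat concrete, the following minimal sketch scans a sequence for perfect, directly adjacent copies of a unit. It is not the method used for the tomato chondrome, which is not specified here; the function name and parameters are hypothetical, the default bounds assume that the 15 to 100 bp range refers to the repeat unit, and real repeat finders also tolerate mismatches and indels.

```python
def find_tandem_repeats(seq, min_unit=15, max_unit=100, min_copies=2):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    At each position the shortest repeating unit is reported, and the
    scan then resumes after the reported array.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for this unit length
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                found = (i, unit, copies)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]
        else:
            i += 1
    return hits


# Toy sequence with three copies of an 8-nt unit (bounds lowered for the demo).
toy = "TT" + "ACGTTGCA" * 3 + "GG"
print(find_tandem_repeats(toy, min_unit=4, max_unit=10))
```

The brute-force scan is quadratic in the worst case, which is acceptable for a toy sequence but would need a more efficient approach for a genome of several hundred kilobases.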

Gene Annotation

In spite of their larger size, chondromes from higher plant species do not encode many more proteins than mitochondrial genomes from other eukaryotes such as mammals. Plant mitochondrial genomes consist mostly of non-coding sequences. In Arabidopsis, only 20 % of the mitochondrial genome corresponds to functional genes (Unseld et al. 1997). The number of mitochondrial genes in angiosperms ranges from 25 (in the rice cultivar japonica) to 78 (in melon, Cucumis melo) without considering copy number (Fig. 7.1a). Most of the genes that have been lost from the mitochondrion appear to have been transferred to the nuclear genome (Adams and Palmer 2003). The tomato mitochondrial genome encodes at least 36 protein-coding genes, three ribosomal RNA genes, and 18 tRNA genes. These numbers are similar to those reported for other angiosperm mtDNAs, in which most of the genes encode conserved ribosomal proteins and components of the electron transport chain (complexes I–V). Furthermore, an ORF search resulted in the identification of 30 additional sequences encoding hypothetical proteins. A preliminary survey of the expression levels of these mitochondrial genes throughout tomato fruit development has indicated that many of the annotated genes are differentially expressed during this process. For instance, 23 genes belonging to the electron transport chain machinery and 11 ORFs with detectable levels of expression differed in their expression during fruit development (Conte et al. 2013).

Nuclear Copies of Mitochondrial DNA (NUMTs) and Nuclear Insertions of Chloroplast DNA (NUPTs)

The plastome is considered the evolutionary remnant of a cyanobacterial genome (Keeling 2010) where genetic information was transferred from the endosymbiont’s genetic system to the host nuclear genome; interestingly, this transfer is still underway (reviewed in Kleine et al. 2009).

In 2012, the fully sequenced nuclear genome of tomato was published along with a comprehensive structural and comparative analysis with other Solanaceae (The Tomato Genome Consortium 2012). As in other species (Timmis and Scott 1983; Stern and Palmer 1984; Blanchard and Schmidt 1995; Thorsness and Weber 1996), sequences of plastid and mitochondrial origin also contribute to the complexity of the tomato nuclear genome. These sequences have long been called “promiscuous DNA,” and the idea behind this regrettable name was that they constitute a kind of mutation buffering (Conrad 1985). In mechanistic terms, the concept of plastid and mitochondrial DNA transposition to the nucleus and subsequent integration into the nuclear genome has prevailed. In this respect, the small genomes of these organelles are also believed to be remnants left after the relocation of gene function from the ancestral prokaryotes. This process has been accompanied by deletions in the endosymbiont genomes, with a subsequent dependence of mitochondrial and chloroplast biogenesis on nuclear genes. Strong molecular evidence (Baldauf and Palmer 1990) suggests that such gene transfers have occurred. Furthermore, these gene transfers have also been achieved experimentally in mitochondrial (Gray et al. 1996) and chloroplast (Kanevski and Maliga 1994) systems. Both mitochondrial and chloroplast sequence homologies have been identified within the nuclear genomes of spinach (Timmis and Scott 1983; Scott and Timmis 1984; Cheung and Scott 1989), tomato (Pichersky and Tanksley 1988; Pichersky et al. 1991), tobacco (Ayliffe and Timmis 1992a, b), potato (du Jardin 1990), and members of the Chenopodiaceae family (Beta vulgaris, C. album, Chenopodium quinoa, Atriplex cinerea, and Enchylaena tomentosa) (Ayliffe et al. 1998).

Through different analyses, the Tomato Genome Sequencing Consortium further demonstrated the presence of DNA fragments of mitochondrial and chloroplast origin as insertions within the nuclear genome (NUMTs and NUPTs, respectively). In summary, 667 fragments longer than 250 bp were found and reported as NUPT insertions. Furthermore, a colinearity analysis between the tomato chloroplast and nuclear genome sequences demonstrated that 492 fragments could be true insertions of plastome origin. In addition, two noteworthy long colinear insertions were found in chromosomes 2 and 11. By comparison, the tobacco nuclear genome contains multiple chloroplast DNA integrants (i.e., >100 copies of a single plastid sequence), which can be in excess of 18 kb (Ayliffe and Timmis 1992a, b).

Following the endosymbiont theory (Margulis and Bermudes 1985), the mitochondrion and its genome are the remnants of a free-living eubacteria ancestor (probably an extant α-proteobacterium). Therefore, this ancestor was engulfed by a eukaryotic host cell and, as a result, established a symbiotic relationship with it (Gray 1999). The host provided the nuclear genome and most of the endosymbiont genes were either lost or transferred to the nuclear genome at an early stage in evolution. Thus, very little of the original gene pool is found in modern mtDNA. In this regard, many features distinguish the mtDNAs of higher plants from those of animals and other organisms (Sugiyama et al. 2004). Although the transfer of mitochondrial genes to the nucleus and their functional activation ceased in the common ancestor of animals, mitochondrial gene loss, and gene transfer have been an ongoing and frequent process in flowering plants (Palmer et al. 2000). Extensive Southern-blot analyses of 280 genera of flowering plants have provided a global view of gene loss in plant mtDNA (Adams et al. 2000). In addition, the possible mechanisms of DNA transfer between organelles with closed membrane systems and the integration of the DNA into the host genome have been reviewed by Kurland and Andersson (2000). The different chondromes in land plants have significantly expanded in size compared with those of green algae. Land plants evolved from green algae belonging to the Charophyceae (Graham et al. 2000). By comparisons of completely sequenced mtDNAs, Chara vulgaris was recently inferred to be the last common ancestor of green algae and land plants (Turmel et al. 2003). Chara possesses a densely packed mitochondrial genome with a gene content similar to that of its Marchantia counterpart (Oda et al. 1992). This led Turmel et al. 
(2002a, b) to infer that the growth in mtDNA size in Marchantia occurred by the enlargement of intergenic spacers because of frequent duplications and substitutions during evolution from Charophytes to Bryophytes. The subsequent size increase of angiosperm chondromes during evolution from bryophytes occurred both by further enlargement of spacer regions owing to frequent duplications and by the frequent capture of sequences from the chloroplast and nuclear genomes (Marienfeld et al. 1999). Of these incoming DNAs, only plastome-tRNA genes have gained functions in angiosperm chondrome-DNA (Joyce and Gray 1988). Furthermore, the contribution of frequent recombination and transposition of many different classes of retrotransposons to the mitochondrial genome expansion of land plants is at most 15 %. Thus, the origin of most unique sequences (~50 %) in plant mtDNA is not known (Sugiyama et al. 2004). The chondrome size variation is exceptionally wide among higher plants, ranging from the smallest 208 kb estimated for white mustard (Brassica hirta; Palmer and Herbon 1987) to the largest that are believed to be over 2400 kb in muskmelon (C. melo; Ward et al. 1981) (Fig. 7.1a). Such an extensive expansion is attributable to two major factors: protein-coding redundancy and a high level of mitochondrial DNA recombination that results in extraneous DNA integration (Mackenzie and McIntosh 1999). Altogether, these findings have allowed researchers to establish that fragments of mitochondrial DNA are integrated into the nuclear genomes of many organisms including numerous animal and plant species (Bensasson et al. 2001; Timmis et al. 2004). These sequences are named NUMTs (pronounced “new mights”), an abbreviated term for “nuclear mitochondrial DNA,” and describe any transfer or “transposition” of cytoplasmic mtDNA sequences into the separate nuclear genome of a eukaryotic organism (Lopez et al. 1994). 
As whole-genome sequencing projects accumulate, more and more NUMTs are being detected in diverse eukaryotic organisms (see http://www.pseudogene.net for a list of examples). Although no evidence of recent mtDNA transfer into metazoan nuclei has been reported, this process is still ongoing in plants. Current studies indicate that the escape of genetic material from organelles to the nucleus occurs much more frequently than generally believed (Timmis et al. 2004). Computational analyses comparing the tomato mitochondrial and nuclear assemblies revealed 111 locally collinear blocks (LCBs) on the chondrome that are collinear with the nuclear sequence. Of these LCBs, 72 (~197 kb) were inferred to be NUMTs. The NUMTs varied in number, size, and position across chromosomes: none was found on chromosome 2 and seven on chromosome 5, whereas the highest number (21) was detected on chromosome 11. Fluorescence in situ hybridization (FISH) of mtDNA generally supported this in silico analysis. Whether this kind of chondrome instability (called “molecular poltergeists” by Hazkani-Covo et al. 2010) has direct consequences for tomato plant fitness is still an open question.

Chloroplast and Mitochondrial Genome Comparisons Across Green Species

As an additional resource of the tomato genome project, a mitochondrial database (www.mitochondrialgenome.org) was built and made available to facilitate the exchange of information about chondromes. This tool allows flexible BLAST searches and comparisons of the more than 47 mitochondrial genomes from Viridiplantae species that are currently available, including the different versions of the tomato chondrome assembly. The nucleotide sequences of all clones included in the tomato chondrome assembly can be downloaded from the same website and, if necessary, the clones themselves can be requested for research purposes.

Similarly, the Chloroplast Genome Database (http://chloroplast.cbio.psu.edu/; Cui et al. 2006) offers data from more than 100 plastomes of land plants and supports both gene searches by annotated name and flexible BLAST searches. This database also allows researchers to download protein and nucleotide sequences extracted from a selected chloroplast genome and to browse the putative protein families (tribes).

Among many other applications, these resources allow very general descriptions of the main features found in the plastomes and chondromes known to date. Tables 7.1 and 7.2 summarize the main features of these mitochondrial and chloroplast genomes, respectively.

Table 7.1 Main features of mitochondrial genomes from Viridiplantae species
Table 7.2 Main features of plastid genomes from Viridiplantae species

The size disparity among the chondromes of Viridiplantae species appears to reflect a dynamic history of expansion, and possibly contraction, of several regions, such as intergenic and/or repetitive regions. Indeed, these disparities could be explained by the loss or acquisition of nuclear and chloroplast sequences. However, gene content analyses of all Embryophyta chondromes showed that these genomes share the complete core gene set of the electron transport chain complexes I, III and IV. Exceptions are the chondromes of Pleurozia purpurea, Phaeoceros laevis, Megaceros aenigmaticus, Mesostigma viride, and M. polymorpha, which lack the nad7 gene. In addition, the chondrome of Pseudendoclonium akinetum lacks the nad9 gene, and that of Oryza rufipogon lacks four genes of complex I (nad1, nad2, nad4 and nad5) and the cox3 gene (complex IV). Although an incomplete annotation cannot be ruled out, this might reflect an important gene loss in the chondromes of these species.

Regarding the genes of the other functional groups (complex II, complex V, cytochrome c biogenesis, and the ribosomal proteins encoded by the rps and rpl genes), a wide range of situations can be found. Whereas in some species they are all encoded by the chondrome, in others the genes for these complexes are completely absent. A conspicuous example is the green alga Ostreococcus tauri, which harbors two copies of the nad4L, cob, cox1 and atp8 genes in its mitochondrial genome. Furthermore, many species (A. thaliana, B. vulgaris subsp. vulgaris, Oryza sativa subsp. indica, O. sativa subsp. japonica, Sorghum bicolor, Tripsacum dactyloides, Zea luxurians, Zea mays subsp. mays, Z. mays subsp. parviglumis, Zea perennis, Ferrocalamus rimosivaginus, Bambusa oldhamii, Silene latifolia and Vigna radiata) harbor the complete set of cytochrome c biogenesis genes (ccmC, ccmFC, ccmFN, ccmB) but, by contrast, lack the sdh3 and sdh4 genes of complex II. On the other hand, other species (Chaetosphaeridium globosum, C. vulgaris, Chlorokybus atmophyticus, M. polymorpha and M. viride) contain all of the complex II genes but lack the cytochrome c biogenesis genes. Only N. tabacum, P. purpurea, Vitis vinifera, Physcomitrella patens, Carica papaya, Ricinus communis, and S. lycopersicum harbor the complete set of genes for these two complexes in their mitochondrial genomes. The remaining chondromes analyzed differed in their complements of complex II and cytochrome c biogenesis genes.
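
The presence/absence comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene inventories are toy sets built from the statements in the text rather than complete annotations, complex II is represented only by sdh3 and sdh4, and the function name classify is hypothetical.

```python
# Toy presence/absence check for two mitochondrial gene sets.
# Assumption: complex II is represented here only by sdh3 and sdh4.
COMPLEX_II = {"sdh3", "sdh4"}
CCM = {"ccmB", "ccmC", "ccmFC", "ccmFN"}  # cytochrome c biogenesis genes


def classify(genes):
    """Report which of the two gene sets a chondrome encodes in full."""
    has_sdh = COMPLEX_II <= genes  # subset test: every sdh gene is present
    has_ccm = CCM <= genes
    if has_sdh and has_ccm:
        return "both"
    if has_ccm:
        return "ccm only"
    if has_sdh:
        return "complex II only"
    return "neither complete"


# Illustrative inventories only; they are NOT the real annotations.
toy_chondromes = {
    "S. lycopersicum": COMPLEX_II | CCM | {"nad1", "cox1"},
    "A. thaliana": CCM | {"nad1", "cox1"},
    "M. polymorpha": COMPLEX_II | {"nad1", "cox1"},
}

for species, genes in toy_chondromes.items():
    print(f"{species}: {classify(genes)}")
```

Applied to real annotation tables, the same subset tests would reproduce the three groups of species listed in the text, provided that gene names are spelled consistently across annotations.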

The set of ATP synthase subunits (complex V) encoded by the chondrome is also highly variable among Embryophyta species. Similarly, ribosomal protein genes are all well conserved in some species (e.g. M. polymorpha, 16 in total; P. purpurea, 16 in total; and Cycas taitungensis, 18 in total), whereas in others most of them are absent (for example, in S. latifolia and B. vulgaris). In this regard, it should be noted that the V. vinifera chondrome encodes the highest number of rRNA genes (29) among all analyzed species, 17 of which are of chloroplast origin.

Phylogenetic Analyses

Conservation of gene content and a relatively slow rate of nucleotide substitution in protein-coding genes have made the chloroplast genome an ideal focus for studies of plant evolutionary history (Martin et al. 1998; Adachi et al. 2000; De Las Rivas et al. 2002). However, several criteria should be taken into account for this kind of analysis, namely the exclusion of (i) species with non-annotated sequences, (ii) species with genes missing from their annotated genomes, and (iii) protein-coding sequences that are not present in all of the chosen species.
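
As an illustration of how the three exclusion criteria could be applied before building a tree, the following Python sketch filters toy annotation data. The species and protein names are placeholders, and both the function select_shared_proteins and its required_genes parameter (the reference gene list against which "missing genes" are judged) are assumptions introduced here, not part of the original analysis.

```python
def select_shared_proteins(annotations, required_genes):
    """Apply the three exclusion criteria to toy annotation data.

    annotations maps species name -> set of annotated protein names,
    or None when the sequence is not annotated.
    Returns (sorted list of kept species, set of shared proteins).
    """
    # (i) drop species whose sequences are not annotated
    annotated = {sp: genes for sp, genes in annotations.items() if genes}
    # (ii) drop species lacking any gene of the (hypothetical) reference list
    kept = {sp: genes for sp, genes in annotated.items()
            if required_genes <= genes}
    if not kept:
        return [], set()
    # (iii) keep only proteins present in every remaining species
    shared = set.intersection(*kept.values())
    return sorted(kept), shared


# Placeholder data for demonstration only.
toy_annotations = {
    "species_A": {"atpA", "psbA", "rbcL", "ndhF"},
    "species_B": {"atpA", "psbA", "rbcL", "matK"},
    "species_C": None,               # not annotated: removed by (i)
    "species_D": {"atpA", "ndhF"},   # lacks psbA and rbcL: removed by (ii)
}
required = {"atpA", "psbA", "rbcL"}

species, shared = select_shared_proteins(toy_annotations, required)
print(species, sorted(shared))
```

Only the proteins returned in the shared set would then be aligned and concatenated for the phylogenetic comparison.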

Figure 7.2a shows a phylogenetic tree built by comparing the sequences of 50 orthologous proteins from 50 species of the Viridiplantae clade. The clustering of the species matches the currently accepted plant classification, thus confirming the strong association between chloroplast protein modification and plant speciation. Notably, S. lycopersicum clustered closer to Solanum bulbocastanum than to Solanum tuberosum and, altogether, S. lycopersicum clustered with Atropa and Nicotiana species (Fig. 7.2a). Clarkson et al. (2004) described a very low degree of sequence variation between the plastid genomes of N. sylvestris and its allopolyploid descendant N. tabacum. By contrast, Daniell et al. (2006) revealed a significant number of InDels within certain coding sequences between tomato, potato, tobacco and Atropa.

Fig. 7.2

Evolutionary relationships of taxa assessed with chloroplast (a) and mitochondrial (b) protein sequences. The evolutionary history was inferred using the neighbor-joining method (Saitou and Nei 1987). The optimal tree is shown for each analysis (sum of branch lengths: 1.51474643 in a and 2.79018422 in b). Percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches (Felsenstein 1985). Trees are drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the p-distance method (Nei and Kumar 2000) and are in units of the number of amino acid differences per site. Analyses involved 50 and 35 amino acid sequences for a and b, respectively. All ambiguous positions were removed for each sequence pair. There were a total of 3275 and 5856 positions in the final dataset for a and b, respectively. Evolutionary analyses were conducted using the MEGA5 software package (Tamura et al. 2011).
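
The p-distance mentioned in the caption is simply the proportion of compared sites at which two aligned sequences differ. The Python sketch below shows the calculation with ambiguous positions removed per sequence pair. It is a minimal illustration, not the MEGA5 implementation: the set of characters treated as ambiguous ("-", "X", "?") and the toy sequences are assumptions.

```python
def p_distance(seq_a, seq_b, ambiguous="-X?"):
    """Proportion of differing sites between two aligned protein sequences.

    Sites that are gapped or ambiguous in either sequence are skipped,
    i.e. removed for this sequence pair only (pairwise deletion).
    The characters counted as ambiguous are an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in ambiguous or b in ambiguous:
            continue  # skip this site for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (not real chloroplast or mitochondrial proteins).
d = p_distance("MKTAYIAK", "MKSAY-AK")
print(round(d, 4))  # 1 difference over 7 compared sites
```

Because the result is a per-site proportion, distances from alignments of different lengths (3275 versus 5856 positions in the two datasets) remain on the same scale.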

According to the mitochondrial genomic data, the species phylogenetically closest to tomato within the Viridiplantae appears to be N. tabacum (Fig. 7.2b). The nearest species to these two are Vitis vinifera and C. papaya (orders Vitales and Brassicales, respectively), which are connected by Ricinus communis (order Malpighiales).

As expected, the analysis based on the neighbor-joining method strongly supports the placement of most of the included taxa, with Chlorophycean green algae separated from Streptophyta taxa. Within Streptophyta, the only Gymnosperm included in the analysis (C. taitungensis) appeared as the ancestor of all Angiosperm species, showing that Gymnosperms are the earliest-diverging lineage among the Streptophyta. M. polymorpha and P. purpurea, which are placed as early-diverging lineages of land plants, are the exceptions with respect to the Angiosperms. Thus, they represent the ancestral type of mtDNA. This hypothesis is in line with the finding that the mitochondrial genomes of these species are closely related to those of protists in both gene content and gene order (Wang et al. 2009). This analysis also shows that gene loss, especially of genes encoding ribosomal proteins, seems to have occurred after the divergence of the Angiosperm lineage. This hypothesis is also in agreement with the evolutionary analysis reported by Chaw et al. (2008). Finally, within the land plant taxa, monocots and dicots are clearly separated. In general terms, the reconstructed tree is in accordance with the currently accepted phylogenetic relationships (Pombert et al. 2004; Terasawa et al. 2007; Chaw et al. 2008; Ma et al. 2012). However, in a few cases, low bootstrap values were observed among taxa with known phylogenetic relationships: 61 % between Z. mays subsp. parviglumis and Z. luxurians within the genus Zea, and 79 % between the Brassicaceae A. thaliana and Brassica napus. This observation calls into question the appropriateness of the neighbor-joining method for inferring phylogenetic relationships from mitochondrial genome data.
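
To make the meaning of a bootstrap percentage concrete, the following Python sketch resamples alignment columns with replacement and counts how often two taxa are grouped together. It is a deliberately reduced illustration, not the MEGA5 procedure used for Fig. 7.2: it handles only four taxa, for which the neighbor-joining criterion reduces to choosing the pairing with the smallest d(i,j) + d(k,l), and the taxon names, sequences and function names are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def quartet_split(seqs):
    """Return the pairing of four taxa with the smallest d(i,j) + d(k,l).

    For exactly four taxa this is the grouping neighbor joining selects.
    Ties are resolved in favor of the first candidate listed.
    """
    w, x, y, z = sorted(seqs)
    candidates = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]

    def score(split):
        (i, j), (k, l) = split
        return p_distance(seqs[i], seqs[j]) + p_distance(seqs[k], seqs[l])

    best = min(candidates, key=score)
    return frozenset(frozenset(pair) for pair in best)


def bootstrap_support(seqs, pair, replicates=500, seed=1):
    """Share of bootstrap replicates in which `pair` is grouped together."""
    rng = random.Random(seed)
    names = sorted(seqs)
    length = len(seqs[names[0]])
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(seqs[n][c] for c in cols) for n in names}
        if frozenset(pair) in quartet_split(resampled):
            hits += 1
    return hits / replicates


# Toy alignment with an unambiguous signal grouping A with B and C with D.
toy = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQR",
    "C": "MRSGYLAEQK",
    "D": "MRSGYLAEQK",
}
support = bootstrap_support(toy, ("A", "B"))
print(f"support for A+B: {support:.0%}")
```

With a clear signal every replicate recovers the same grouping. Values such as 61 % or 79 % arise when a sizeable share of resampled alignments favors a different grouping, which is why they are read as weak support.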

Further Perspectives and Applications

Outcomes from whole-genome sequencing projects of crop plant species exponentially increase the information available for understanding the role of plastid genome modification in plant evolution and speciation. Particularly in tomato, post-genomic and functional genomics tools can help elucidate how the transition of chloroplasts to chromoplasts occurs during fruit ripening. However, little is yet known about the regulation of gene transcription and protein translation, or about the flow of information between the nucleus and the chloroplast. Understanding the intricate connections between the nucleus and the chloroplast is the challenge for the future and will probably lead to crop improvement. These organelles are fundamental for the production of a wide variety of metabolites for the food industry as well as for the adaptation of plants to stressful conditions.

Even less understood are the function and regulation of the evolutionary mosaics that plant mitochondrial genomes represent. Solid evidence supports the acquisition (and loss) of genetic information (and possibly even active genes) from several distinct sources in the course of evolution. However, the impact of these events at the whole plant level has been overlooked.