Introduction

The Solanaceae (nightshade) family consists of more than 3000 species and has its center of diversity near the equator in South America. The family comprises many agriculturally important crop species, such as tomato, potato, bell pepper, and eggplant, as well as a number of ornamental and medicinal plants. For several thousand years, solanaceous crops have been subjected to intensive human selection. This has led to an enormous phenotypic diversity within species and the adaptation of individual varieties to widely different habitats. Showing this great interspecific and intraspecific diversity and being the most important plant family of vegetable crops, Solanaceae have recently become a model of comparative and evolutionary genomics research. These efforts are bundled in the international Solanaceae Genomics Network (SOL Genomics Network; SGN: http://www.sgn.cornell.edu/), which aims to take a systems approach to genetic diversity and adaptation. While EST projects are under way for a number of solanaceous species and selected members of sister families, the structural genomics efforts are centered on tomato, one of the classical model systems of plant genetics and breeding. The tomato nuclear genome comprises approximately 950 Mb and the gene-rich euchromatic portion of the tomato nuclear genome is currently being sequenced.

The plastid genome (plastome) of higher plants is a circular molecule of double-stranded DNA. Among the three genomes of the plant cell, the plastome is the most gene-dense, with more than 100 genes in a genome of only 120 to 210 kb (for review see, e.g., Sugiura 1989, 1992; Wakasugi et al. 2001). The plastid genome is the evolutionary remnant of a cyanobacterial genome: The endosymbiotic uptake of a cyanobacterium by eukaryotic cells was followed by (i) the loss of dispensable genetic information from the endosymbiont’s genome (e.g., genes for bacterial cell wall biosynthesis), (ii) the elimination of redundant genetic information (e.g., genes for biosynthetic pathways present in both the host’s and the endosymbiont’s genomes), (iii) the acquisition of new (regulatory) gene functions to coordinate gene expression and metabolism between the host cell and the endosymbiont, and (iv) the massive translocation of genetic information from the endosymbiont’s genome to the nuclear genome of the host cell (Martin and Herrmann 1998; Race et al. 1999; Timmis et al. 2004). This has resulted in a dramatic reduction in plastid genome size and coding capacity and, thus, contemporary plastid genomes contain only a small proportion of the genes of their free-living cyanobacterial ancestors. Consequently, all cellular functions fulfilled by present-day plastids are strictly dependent on the import of nuclear-encoded proteins, which make up by far the largest fraction of the chloroplast proteome (Abdallah et al. 2000; Rujan and Martin 2001; Martin et al. 2002; Hippler and Bock 2004).

Both genome organization and mechanisms of gene expression in present-day plastids resemble those of their cyanobacterial ancestors. Groups of genes are linked together in operons giving rise to polycistronic mRNAs which undergo complex processing and maturation steps. Likewise, the translational apparatus of plastids is highly similar to that of prokaryotes and the plastid-encoded RNA polymerase is highly homologous to eubacterial RNA polymerases. In addition to these conserved prokaryotic traits, several evolutionary inventions of the eukaryotic cell have added to the complexity of gene expression and its regulation in plastids. These include, for example, transcription of some genes by nuclear-encoded bacteriophage-type RNA polymerases (Hajdukiewicz et al. 1997; Hedtke et al. 1997; Hess and Börner 1999) and RNA editing as an additional processing step changing the coding properties of chloroplast transcripts (Hoch et al. 1991; Bock 2000, 2001).

We report here the complete sequence of the plastid genome (plastome) from two cultivars of tomato, Solanum lycopersicum. To determine patterns of plastid genome evolution in Solanaceae, we have compared the tomato plastid genome, its structure, coding capacity, and RNA editing sites, with the two previously sequenced plastomes from tobacco (Nicotiana tabacum [Shinozaki et al. 1986; Wakasugi et al. 1998]) and deadly nightshade (Atropa belladonna [Schmitz-Linneweber et al. 2002]).

Materials and Methods

Plant Material

Solanum lycopersicum cv. IPA-6 is a commercially grown South American tomato cultivar. Seeds of cv. Ailsa Craig were obtained from Unwins Seeds (Histon, Cambridge, UK) and germinated and grown in a greenhouse with supplementary lighting of 200 μmol photons m−2 s−1.

Purification of Chloroplasts

For large-scale purification of chloroplasts, IPA-6 plants were grown for 6 weeks in the greenhouse (16 h/22°C light, 8 h/20°C dark). For each isolation, the young expanded leaves from 10 plants were pooled. Leaf material (50 g) was homogenized for 2 × 5 s at high speed and 2 × 5 s at low speed in 2 L ice-cold extraction buffer (350 mM sorbitol, 50 mM Tris-HCl, pH 8.0, 5 mM EDTA, 15 mM 2-mercaptoethanol, 0.1% BSA) in a Waring blender. The homogenate was filtered through four layers of gauze (Hartmann) and one layer of Miracloth (Calbiochem). All subsequent steps were performed at 4°C. The chloroplast suspension was centrifuged at 100g for 5 min and the resulting pellet (largely consisting of cell nuclei) was discarded. The supernatant was centrifuged for 10 min at 2000g to pellet the chloroplasts. Following resuspension in 400 ml wash buffer (350 mM sorbitol, 50 mM Tris-HCl, pH 8.0, 25 mM EDTA, 0.1% BSA), the chloroplasts were pelleted again by centrifugation for 10 min at 2000g. This washing step was repeated three more times. Subsequently, the chloroplast pellet was resuspended in 30 ml wash buffer and loaded on top of six sucrose step gradients (17.5 ml 60% sucrose, 22.5 ml 37% sucrose, each with the same concentrations of Tris and EDTA as present in the wash buffer). The gradients were centrifuged for 1 h at 7000g. The chloroplast band at the interface between the 37% and the 60% sucrose layers was collected, mixed with ∼3 vol dilution buffer (175 mM sorbitol, 50 mM Tris-HCl, pH 8.0, 25 mM EDTA), and pelleted by centrifugation for 10 min at 2000g and 4°C.

Isolation of Nucleic Acids

DNA was extracted from purified chloroplasts by lysing the chloroplast pellet in 1–2 vol lysis buffer (50 mM Tris-HCl, pH 8.0, 20 mM EDTA, 2% N-lauroylsarcosine sodium salt) for 15 min followed by one extraction each with phenol, phenol/chloroform (1:1), and chloroform. Subsequently, the DNA was precipitated with 2.5 vol ethanol at –20°C overnight. Total cellular RNA was extracted using the peqGOLD TriFast reagent (Peqlab GmbH, Erlangen, Germany). RNA samples for cDNA synthesis were purified by treatment with RNase-free DNase I (Roche, Mannheim, Germany).

Cloning and DNA Sequencing

Purified IPA-6 plastid DNA (25 μg) was used to generate fragments for construction of a shotgun library by mechanical shearing. Fragments of an average length of 2 kb were cloned into pUC19 and 768 randomly selected clones were terminally sequenced from both ends (AGOWA GmbH, Berlin). The sequence data from the 1536 sequencing reactions were assembled into contigs and remaining gaps were closed by PCR using primers derived from the sequences flanking the gap. A list of PCR primers is available upon request. Amplified products were purified using the GFX PCR (DNA and Gel Band Purification) kit (Amersham). To exclude mutations introduced by PCR amplification, PCR products were directly sequenced by cycle sequencing.

The nucleotide sequence of the Ailsa Craig plastome was determined on cloned PstI restriction fragments obtained previously (Phillips 1985) using primers designed from the sequence of the tobacco plastid genome (Wakasugi et al. 1998) and from the resulting tomato sequence. Uncloned regions and regions spanning the PstI sites used for the original cloning were amplified by PCR and the PCR products directly sequenced.

cDNA Synthesis and Polymerase Chain Reactions (PCRs)

Approximately 5 μg purified DNA-free RNA was used in a 50-μl cDNA synthesis reaction. Reverse transcription of RNA samples was primed with a random hexanucleotide mixture (2.5 μg per reaction). Elongation reactions were performed with SuperScript III RNase H-free reverse transcriptase (Invitrogen) according to the manufacturer’s instructions. Total cellular DNA or first-strand cDNAs were amplified in an Eppendorf thermal cycler using GoTaq Flexi DNA polymerase (Promega) and gene-specific primer pairs. The standard PCR program was 30 to 40 cycles of 1 min at 94°C, 40 s at 58°C, and 1 to 2.5 min at 72°C, with a 10-min extension of the first cycle at 94°C and a 5-min final extension at 72°C.

Bioinformatics Analyses

The Lasergene software package (DNASTAR; GATC Biotech, Konstanz, Germany) was used to assemble the final genome sequence and to align the genome with other plastid DNAs. Whole-genome alignments were produced with the Martinez/Needleman-Wunsch method, applying a gap penalty of 1.10 and a gap length penalty of 0.33. DNA sequences for individual genes were compared with sequences available in the databases using the NCBI BLAST program. Nucleotide sequences for species other than tomato were extracted from public databases and aligned with the corresponding tomato sequences. The physical map of the tomato plastome was drawn using the Adobe Illustrator software.
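For readers unfamiliar with global alignment, the following is a minimal sketch of the textbook Needleman-Wunsch algorithm with a linear gap penalty. It is not the Martinez/Needleman-Wunsch implementation in Lasergene, and the scoring values (match, mismatch, gap) are illustrative assumptions that do not correspond to the gap penalty of 1.10 and gap length penalty of 0.33 quoted above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Textbook global alignment with a linear gap penalty (scores are assumptions)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Toy sequences, not taken from any plastome.
aligned_a, aligned_b, total = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(total)
```

The dynamic-programming table grows with the product of the two sequence lengths, so whole-genome alignments of roughly 155 kb in practice rely on optimized implementations rather than a naive table like this one.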

Results and Discussion

Properties of the Tomato Plastid Genome

In order to construct a shotgun clone library for sequencing of the tomato IPA-6 plastid genome, chloroplasts were purified from young tomato leaves at large scale. Purified chloroplast DNA was sheared mechanically and used for shotgun cloning followed by terminal sequencing of individual clones. Altogether 768 clones were sequenced in order to obtain a roughly eightfold coverage of the genome. Assembly of the sequence data from 1536 sequencing reactions yielded five sequence contigs. Alignment of these contigs with the tobacco plastid genome confirmed the presence of four gaps and suggested that all four gaps were small enough to be closed by PCR (gap sizes in the tobacco ptDNA: 1901, 1798, 218, and 48 bp, respectively). Primers derived from the termini of the contigs yielded PCR products for all four gaps. Direct sequencing of purified PCR products provided the missing sequence information and thus allowed gap closure and assembly of a single contig with a circular map (Fig. 1), indicating that the complete sequence of the tomato plastid genome had been obtained (accession number AM087200).
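As a rough plausibility check of the coverage figure, the mean sequencing depth can be estimated as (number of reads × mean read length) / genome size. The usable read length is not stated in the text; the value of 800 bp below is purely an assumption for illustration, and the Poisson estimate of uncovered sequence is an idealized model that ignores sequence-specific problems.

```python
import math


def mean_coverage(n_reads, mean_read_len, genome_size):
    """Average depth c = N * L / G."""
    return n_reads * mean_read_len / genome_size


def uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by no read: exp(-c)."""
    return math.exp(-coverage)


N_READS = 1536           # sequencing reactions reported in the text
GENOME_SIZE = 155_461    # tomato plastome size in bp, from the text
ASSUMED_READ_LEN = 800   # ASSUMPTION: mean usable read length in bp (not given)

coverage = mean_coverage(N_READS, ASSUMED_READ_LEN, GENOME_SIZE)
print(f"mean coverage: {coverage:.1f}x")
print(f"expected uncovered fraction: {uncovered_fraction(coverage):.2e}")
```

Under this assumed read length the depth comes out close to eightfold, and the idealized model predicts that almost no sequence should remain uncovered, so gaps at this depth are more plausibly explained by sequence-specific effects than by sampling alone.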

Fig. 1

Physical map of the tomato plastid genome. Genes inside the circle are transcribed clockwise; genes outside the circle are transcribed counterclockwise. Asterisks indicate intron-containing genes; introns are depicted as open boxes.

In order to identify possible causes for gaps in the sequence despite the high-coverage sequencing, the sequences at the gap termini were inspected. In two cases, the DNA sequences had become ambiguous at oligo(T) tracts (of 10 and 13 Ts, respectively), which apparently caused DNA polymerase stuttering in standard cycle sequencing reactions.

The size of the tomato plastid DNA was found to be 155,461 bp (Fig. 1, Table 1). This deviates only slightly from previous estimates based on gel electrophoretic separations of restriction fragments which suggested a plastome size of 156.6 to 159.4 kb (Phillips 1985). The tomato plastome shows the typical tetrapartite genome organization found in most higher plants with a large single copy region (LSC) and a small single copy region (SSC) separating two inverted repeat regions (IRA and IRB; Fig. 1). As expected, the tomato plastome harbors the conserved set of genes present in the plastid genomes of dicotyledonous plants. With the exception of several open reading frames (ORFs; which are discussed below), the gene content is identical to the previously analyzed plastid DNAs from Nicotiana (Shinozaki et al. 1986; Wakasugi et al. 1998; accession number of an updated release from September 2005, Z00044.2) and Atropa (Schmitz-Linneweber et al. 2002). The tomato plastome harbors 114 genes and conserved ORFs (ycfs: hypothetical chloroplast reading frames). Incorporating the data from recent reverse genetics studies, these genes can be grouped as follows (Shimada and Sugiura 1991) (Fig. 1).

Table 1 Properties of the sequenced solanaceous plastid genomes

Photosynthesis-Related Genes

Genetic System Genes

Other Genes and Conserved Open Reading Frames

Comparison of Solanaceous Plastid Genomes

Completion of the nucleotide sequence of the tomato plastid genome offered the unique opportunity to conduct an in-depth comparison of the plastomes of three closely related species belonging to the same family of dicotyledonous plants: Nicotiana tabacum (tobacco [Shinozaki et al. 1986; Wakasugi et al. 1998]), Atropa belladonna (Schmitz-Linneweber et al. 2002), and Solanum lycopersicum (tomato).

Plastid genome sizes, structural properties, and AT content in different genome regions and gene classes are compared in Table 1. Compared to the tobacco plastome, the inverted repeat region (IR) in tomato is slightly expanded on both ends (into rps19 in the LSC and into ycf1 in the SSC; Fig. 1, Table 1). Nonetheless, the tomato genome is smaller than the tobacco plastome, which can be chiefly ascribed to deletions in noncoding intergenic spacer regions (Supplementary Table 1). The tomato plastome is also smaller than that of Atropa, which is mainly due to a slightly larger IR in Atropa (Table 1). While the overall AT content is nearly identical in tobacco and tomato, it is significantly higher in Atropa (Table 1). Whether this can be explained by a somewhat stronger selection pressure toward AT richness operating in Atropa or, alternatively, by differences in the mutation rate and/or spectrum remains to be investigated. Comparison of the AT content in different regions and gene classes reveals striking differences: In all three genomes, AT content is highest in the SSC region and lowest in the IR. Noncoding regions have a dramatically higher AT content than coding regions, and protein-coding genes are considerably richer in AT than RNA genes (Table 1) (Shimada and Sugiura 1991). The latter may be explicable by a high demand for stable GC base pairs to ensure proper folding of the highly structured rRNAs and tRNAs.
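The AT content values compared here can in principle be computed per region with a few lines of code. The sketch below is illustrative only: the function name and the toy sequences are hypothetical and are not plastome data, and how ambiguous bases were handled in the original analysis is not stated, so skipping them is an assumption.

```python
def at_content(seq):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "AT" for base in counted) / len(counted)


# Hypothetical toy regions, NOT real plastome sequence.
toy_regions = {
    "spacer-like": "ATTTTAAATATCGAATTTA",
    "rRNA-like": "GGCCGATCGGCACGGTGGA",
}
for name, seq in toy_regions.items():
    print(f"{name}: {at_content(seq):.1%} AT")
```

Applied to annotated regions (LSC, SSC, IR, coding, noncoding, RNA genes), the same function would yield the kind of breakdown summarized in Table 1.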

While the gene content of the tomato plastome is identical to that of the previously sequenced solanaceous plastid genomes, it was of interest to assess the conservation of ORFs. ORFs that are not conserved between closely related solanaceous species are unlikely to be genuine genes. Table 2 shows a comparison of ORFs in the tobacco and tomato plastomes. This data set excludes highly conserved ORFs (ycfs), for most of which good experimental support has been gained that they constitute genuine genes (see list above). Many of the tobacco ORFs are not conserved and have suffered frameshift mutations and/or larger deletions in tomato, suggesting that these reading frames are fortuitously present in tobacco and are unlikely to encode functional gene products. A few short ORFs in the IR are conserved, which, however, is not necessarily indicative of a possible functional significance: It is well established that the mutation rate in the IR is much lower than in the single-copy regions of the plastome (Wolfe et al. 1987; Maier et al. 1995). This phenomenon is generally explained by the operation of gene conversion between the two IR copies and may well be responsible for the conservation of some of the short ORFs in the IR (Table 2). In the absence of experimental evidence supporting a possible function, these ORFs were therefore not considered in the physical map of the tomato plastome (Fig. 1). Tomato ORF380 (equivalent to ORF350 in tobacco) represents a special case in that it encodes the N-terminal portion of the Ycf1 protein, the function of which is unknown but has been shown to be essential for cell survival (Drescher et al. 2000). As it is a partial duplication of ycf1, it is not surprising that this ORF is conserved. Because the expression signals upstream of the reading frame (promoter, 5’ UTR) are also identical to those of ycf1 (Fig. 1), it is reasonable to assume that ORF380 is expressed. However, whether or not the corresponding protein product serves some function remains to be investigated.

Table 2 Conservation of open reading frames in the tobacco and tomato plastid genomes

When the insertions and deletions (InDels) in the tomato plastome were analyzed using the tobacco sequence as a reference, the vast majority of InDels were located in noncoding spacer regions and introns (Supplementary Table 1). However, a few InDels affect coding regions (Table 3), which prompted us to analyze the consequences for the encoded gene products. With a single exception, all InDels in protein-coding regions do not alter the reading frame and thus just change the lengths of the encoded protein by one or a few amino acids (Table 3). The only exception is rps16, where the deletion of 10 nucleotides causes a frameshift mutation (Table 3, Fig. 2). This frameshift mutation, however, has occurred very close to the termination codon so that the resulting changes at the amino acid level are limited to the very C-terminus of the protein (Fig. 2). The C-terminus of Rps16 is not very well conserved among higher plant species (Fig. 2), suggesting that the frameshift mutation is functionally neutral. Two other InDels affect RNA genes (16S rRNA and tRNA-Ser-UGA). Alignment of the corresponding sequences from a number of species revealed that both InDels are in variable regions of the two genes, again suggesting that they are unlikely to negatively impact on gene product function (Fig. 2).
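The distinction drawn here between in-frame InDels and the rps16 frameshift follows directly from whether the InDel length is a multiple of three. A minimal sketch, with a hypothetical helper name; only the 10-nucleotide rps16 deletion is taken from the text, the other lengths are illustrative:

```python
def indel_effect(length_nt):
    """Classify a coding-region InDel by its effect on the reading frame."""
    if length_nt <= 0:
        raise ValueError("InDel length must be a positive number of nucleotides")
    if length_nt % 3 == 0:
        # Whole codons are added or removed; downstream codons stay in frame.
        return f"in-frame ({length_nt // 3} codon(s) gained or lost)"
    # The reading frame downstream of the InDel is shifted.
    return "frameshift"


print("rps16, 10-nt deletion:", indel_effect(10))    # length from the text
print("illustrative 3-nt InDel:", indel_effect(3))   # hypothetical example
print("illustrative 6-nt InDel:", indel_effect(6))   # hypothetical example
```

Whether a frameshift is tolerated still depends on its position in the reading frame, which is why the location of the rps16 deletion close to the termination codon matters.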

Table 3 InDels in coding regions of the tomato plastid genome
Fig. 2

InDels in coding regions of the tomato plastid genome. Partial sequence alignments for rrn16 (16S rRNA), trnS-UGA, and rps16 are shown. InDels are indicated by hyphens. The brace below the 16S rRNA alignment marks a duplicated sequence motif in tomato and Atropa belladonna which presumably has arisen from replication slippage. Note that InDels in the highly variable 3’ end of the rps16 reading frame cause substantial interspecific variation in the C-terminus of the Rps16 protein as shown in the amino acid sequence alignment. Abbreviations of species names are given at the bottom.

Having three solanaceous plastome sequences available enabled pairwise comparisons of the three plastomes in order to deduce phylogenetic relationships and evolutionary trends for individual genes and gene classes. Pairwise homology values were first determined by aligning the entire genomes (Table 4). This analysis revealed that tobacco and Atropa may be more closely related to each other than tomato is to either of the two other solanaceous species. The analysis of a subset of genes (or even of the entire SSC) would not have been informative enough to deduce phylogenetic relationships among the three closely related species (Table 4), underscoring the importance of acquiring large sequence data sets to resolve such relationships. This becomes even more evident when individual genes are compared pairwise and grouped in identity classes (Table 5). While the overall picture seems to confirm that Atropa and Nicotiana are more closely related than tomato with either tobacco or Atropa (Table 5), the analysis of individual genes can tell a different story, illustrating the danger of basing phylogenetic conclusions on the analysis of only one or a few plastid genes. The pairwise comparison of individual genes presented in Table 5 also reveals a set of plastid genes which display a relatively high level of interspecific variation and thus, together with intergenic spacer regions (Kress et al. 2005), may be particularly informative to resolve phylogenetic relationships at the species level. Among them are clpP, ycf1, ycf10, accD, matK, and ccsA, some of which (e.g., matK and ycf1) are often used in phylogenetic analyses. With the exception of accD, the pairwise comparison of all of them supports the closer relationship between Nicotiana and Atropa. Nonetheless, caution is needed even when entire plastid genome sequences are used for phylogenetic analyses. When, for example, the three InDels in trnS-UGA, rrn16, and rps16 are analyzed (Fig. 2), in all three cases, Atropa belladonna and Solanum lycopersicum show identical insertions and deletions relative to tobacco, which, unlike the total genome sequences, would support a closer phylogenetic association of these two solanaceous species (which would be in congruence with existing phylogenies of the Solanaceae [Olmstead et al. 1999]). Thus, even complete plastome sequences can be insufficient to unambiguously resolve phylogenetic relationships among closely related species.
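Pairwise identity values of the kind used in such comparisons can be derived from an alignment by counting identical columns. The sketch below shows one possible convention, in which gap columns count as mismatches; how gaps were treated in the original calculation is not stated, so this is an assumption, and the aligned strings are toy data rather than plastid sequences.

```python
def percent_identity(aln_a, aln_b):
    """Percent identical columns in a pairwise alignment.

    ASSUMPTION: columns containing a gap ('-') count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("alignment is empty")
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)


# Toy aligned sequences, NOT real plastid genes.
species_a = "ACGTTAGC-A"
species_b = "ACGTTAGCTA"
species_c = "ACGATAGC-A"

print(f"A vs B: {percent_identity(species_a, species_b):.1f}%")
print(f"A vs C: {percent_identity(species_a, species_c):.1f}%")
print(f"B vs C: {percent_identity(species_b, species_c):.1f}%")
```

Because the ranking of species pairs can change with the gap convention and with the gene chosen, single-gene identity values are best read alongside InDel patterns, as the discussion above illustrates.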

Table 4 Pairwise comparison of solanaceous plastid genomes
Table 5 Pairwise comparison of plastid genes in Solanaceae and their grouping in identity classes: Genes in the IR are in boldface

Evolution of Plastid RNA Editing Patterns in Solanaceous Species

A hallmark of gene expression in higher plant cell organelles is the requirement for an additional RNA processing step referred to as RNA editing. RNA editing in plastids and mitochondria is a posttranscriptional process changing the identity of single nucleotides in primary transcripts at highly specific sites. In plastids of seed plants, these changes appear to be restricted to cytidine-to-uridine conversions (Hoch et al. 1991; Kudla et al. 1992; for review see, e.g., Bock 2000, 2001), whereas in chloroplasts of hornworts extensive “reverse” editing by U-to-C transitions has been observed (Kugita et al. 2003). With very few exceptions (Hirose et al. 1996; Kudla and Bock 1999), the vast majority of known plastid editing events alters the coding properties of the affected mRNAs and usually results in the restoration of triplets for phylogenetically conserved amino acid residues (Maier et al. 1992a, b). Transgenic experiments creating tobacco plants with a noneditable version of a plastid gene have provided direct evidence for the functional importance of RNA editing in chloroplast gene expression (Bock et al. 1994).

Comparative phylogenetic analyses of RNA editing sites in plastid genomes have revealed that many plastid editing sites are poorly conserved interspecifically (Freyer et al. 1997). For example, of the 26 RNA editing sites located in the chloroplast genome of black pine, Pinus thunbergii, only one site is also found in the tobacco plastid genome (Wakasugi et al. 1996). Remarkably, even closely related plant species can differ significantly in their editing patterns (Freyer et al. 1995; Schmitz-Linneweber et al. 2001, 2005). While some plastid RNA editing sites are well conserved, at least within certain taxonomic groups, others appear more or less sporadically in largely divergent taxonomic groups (Freyer et al. 1997; Fiebig et al. 2004). Whether the latter can be explained by several independent acquisition events during evolution or, rather, by several independent losses of ancient editing sites that were present in a common ancestor, is largely unknown.

To assess the evolutionary dynamics of RNA editing in plastid genomes in greater detail, the editing patterns in the three sequenced solanaceous plastomes were compared. In previous work, 34 RNA editing sites had been found in the tobacco plastome and 31 sites in Atropa (Hirose et al. 1999; Tsudzuki et al. 2001; Schmitz-Linneweber et al. 2002; Sasaki et al. 2003) (Table 6). In order to identify possible tomato-specific sites of C-to-U RNA editing, all protein-coding genes in the tomato plastome were conceptually translated and the resulting amino acid sequences compared with the corresponding sequences from tobacco, which, however, were obtained by translating the edited mRNA sequences. All mismatches were evaluated with respect to the possibility that a C-to-U conversion in tomato potentially could restore a codon for the amino acid residue present in the corresponding position in the tobacco sequence. This led to the identification of altogether nine candidate sites (data not shown). Two of these candidate sites (ndhD codon 293 and rpoB codon 809) were previously identified as RNA editing sites in Atropa belladonna (Schmitz-Linneweber et al. 2002) and analysis of the corresponding cDNA sequences from tomato confirmed C-to-U RNA editing at these sites also in tomato (Table 6 and data not shown). The remaining seven potential editing sites were also analyzed experimentally by directly sequencing amplified tomato cDNA samples. Whereas in six cases no evidence for RNA editing was detected, a new editing site was discovered in the rps12 gene (Fig. 3A). The editing event changes a genomic serine codon into a conserved leucine codon by a C-to-U change in second codon position (Fig. 3A, Table 6). The two other solanaceous species, tobacco and Atropa, contain the TTA leucine codon at the DNA level and, thus, lack RNA editing at this site. As none of the other candidate sites identified in our computer analyses could be confirmed experimentally, the newly discovered site in rps12 is the only tomato-specific RNA editing site (Table 6, Fig. 4).
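The candidate-site screen described above can be sketched as a simple codon-level test: for a mismatching codon, ask whether changing a single C to T (U in the transcript) would yield the amino acid found in the reference species. The code below is an illustrative reconstruction under that assumption, not the procedure actually used; the function name is hypothetical, and only the codons TCA and CCA are taken from the text and figure legends.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def candidate_edits(codon, target_aa):
    """Return (1-based codon position, edited codon) for every single C-to-T
    change that makes `codon` encode `target_aa`."""
    hits = []
    for pos, base in enumerate(codon):
        if base != "C":
            continue
        edited = codon[:pos] + "T" + codon[pos + 1:]
        if CODON_TABLE[edited] == target_aa:
            hits.append((pos + 1, edited))
    return hits


# rps12: genomic serine codon TCA, leucine expected from the other species.
print(candidate_edits("TCA", "L"))
# ndhD: proline codon CCA in tomato, leucine conserved in other species.
print(candidate_edits("CCA", "L"))
# A codon without any C cannot be a C-to-U editing candidate.
print(candidate_edits("TTA", "L"))
```

A hit only marks a codon as a candidate; as the results above show, most candidates are not confirmed at the cDNA level, so experimental verification remains necessary.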

Table 6 Comparison of RNA editing patterns in solanaceous plastid genomes
Fig. 3

Discovery of novel RNA editing sites. A Identification of a tomato-specific RNA editing site in the rps12 transcript. DNA and corresponding cDNA sequences are shown. The editing position is indicated by a vertical arrow and numbered in the DNA sequence with the nucleotide numbers in the plastid genome (two numbers because the sequence is part of the IR) and in the cDNA sequence with the codon numbers in the rps12 reading frame. Partial rps12 nucleotide and amino acid sequences from selected plant species are also shown, with the editing position marked by vertical arrows. Note that the editing event changes a genomic serine codon (TCA) to a leucine codon (UUA) in tomato. In all other species, a leucine codon is already present at the DNA level, and hence, no editing occurs. B Identification of two novel RNA editing sites in the ndhD transcripts of solanaceous plastids. DNA and corresponding cDNA sequences from tomato are shown. The two editing positions are indicated by vertical arrows and numbered in the DNA sequence with the nucleotide numbers in the plastid genome and in the cDNA sequence with the codon numbers in the ndhD reading frame. In the bottom panel, partial ndhD nucleotide and amino acid sequences from selected plant species are depicted. The editing positions are marked by vertical arrows. Note that the editing event in codon 437 changes a proline codon (CCA) to a leucine codon (CUA) in tomato but alters a UCA serine codon to a UUA leucine codon in the two other solanaceous species. Amino acid residues derived from the unedited sequence are shown in lowercase letters in the amino acid sequence alignments.

Fig. 4

Overview of RNA editing site evolution in Solanaceae. All sites present in a given species are enclosed in the respective color-coded circle. Shared sites between two or more species are shown in blue. Unique sites are shown in the color of the respective species. The majority of sites (30) is shared between all three species, whereas only a few sites are unique or shared by only two species.

Most of the editing sites present in tobacco and Atropa are also conserved in tomato (Table 6, Fig. 4), suggesting that these sites also undergo C-to-U editing in tomato. About two-thirds of the sites were experimentally analyzed in tomato, and as expected, editing was confirmed (Table 6). A notable exception is site 2 in the atpA gene. This site is unique in that it is the only site in tobacco (and Atropa) plastids where editing is silent, that is, does not change the coding properties of the affected triplet. This is because editing occurs in the third position of the codon and both the unedited codon UCC and the edited codon UCU specify the amino acid serine. Editing at this site is only partial, with a large fraction of the atpA mRNA population remaining unedited (Hirose et al. 1996). Interestingly, analysis of amplified atpA cDNAs from tomato revealed no evidence of RNA editing, indicating that the editing at this site in tobacco and Atropa is functionally irrelevant.

To obtain a genome-wide picture of plastid RNA editing, we were also interested in discovering additional editing sites shared by the three solanaceous species which might have escaped detection in the earlier work with tobacco and Atropa (Hirose et al. 1999; Tsudzuki et al. 2001; Schmitz-Linneweber et al. 2002; Sasaki et al. 2003; Tillich et al. 2005) (Table 6). Potential RNA editing sites can be most easily identified by comparing the conceptual translations of plastid genes from higher plant species with those from the liverwort Marchantia polymorpha, a species known to lack RNA editing in plastids (Bock 2001). In these analyses, we discovered two closely spaced codons in the ndhD gene where conserved leucine residues could be restored by C-to-U editing in all solanaceous species (Fig. 3B). Although the ndhD genes from tobacco and Atropa had been analyzed in several earlier studies on plastid RNA editing (Neckermann et al. 1994; Hirose et al. 1999; Tsudzuki et al. 2001; Schmitz-Linneweber et al. 2002; Sasaki et al. 2003), the location of the sites within an otherwise highly conserved protein domain (Fig. 3B) prompted us to test for the editing of these two candidate sites experimentally. To this end, we analyzed ndhD cDNA sequences from both tobacco and tomato. Tobacco was included because, for the second candidate editing site, the potentially edited codons are different between the solanaceous species: Whereas tobacco (and Atropa) has a genomic TCA serine codon, tomato has a CCA proline codon in this position (Fig. 3B). Interestingly, in both cases, C-to-U conversion in the second codon position would restore the conserved leucine residue, because both UUA and CUA triplets specify leucine. Comparison of ndhD DNA and cDNA sequences revealed that the two candidate sites undergo RNA editing in both tomato (Fig. 3B) and tobacco (data not shown; Fig. 3B, Table 6). As the two serine codons are also present in Atropa (Fig. 3B), there is little doubt that they also undergo editing in the third solanaceous species, although we have not verified this experimentally. In order to confirm that the seven identified ndhD editing sites (Table 6) represent the full set, we amplified and sequenced the complete ndhD cDNA from tomato. No other sites of RNA editing were found, which is in line with the absence of additional candidate sites from our bioinformatics analysis.

The current picture of RNA editing in solanaceous species is summarized in Table 6 and Fig. 4. Although it cannot be formally excluded that there are additional sites present in the three genomes which thus far have escaped detection, the reliable prediction of candidate sites by rather simple bioinformatics analyses makes it unlikely that many more sites remain to be discovered. It is noteworthy that the vast majority of sites (31 out of 41) creates leucine codons (Table 6), confirming codon biases of RNA editing observed earlier (Maier et al. 1995; Bock 2000, 2001). Figure 4 illustrates that the total number of sites is highly similar in the three species (37 in tobacco, 35 in Atropa, 36 in tomato). Most of the sites (30) are conserved in all three species, suggesting that they were already present in their common ancestor. Only four sites are unique in that they occur only in one of the three species (two in tobacco, one in Atropa, one in tomato). Few sites are absent from only one of the three species: tobacco and tomato share three sites that are absent from the Atropa plastome, tomato and Atropa have two sites that are missing in tobacco, and, finally, tobacco and Atropa also have two sites in common that are not present in tomato (Fig. 4, Table 6).
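The site counts quoted in this paragraph can be cross-checked with simple bookkeeping. The sketch below uses only the numbers given in the text; the dictionary layout and variable names are illustrative choices.

```python
# Editing-site counts as stated in the text.
SHARED_BY_ALL = 30
UNIQUE = {"tobacco": 2, "Atropa": 1, "tomato": 1}
SHARED_BY_TWO = {
    ("tobacco", "tomato"): 3,   # absent from Atropa
    ("tomato", "Atropa"): 2,    # missing in tobacco
    ("tobacco", "Atropa"): 2,   # not present in tomato
}

# Sites per species = shared by all + unique + every pair the species belongs to.
per_species = {
    species: SHARED_BY_ALL
    + UNIQUE[species]
    + sum(n for pair, n in SHARED_BY_TWO.items() if species in pair)
    for species in UNIQUE
}

# Every site is counted exactly once across the three categories.
total_sites = SHARED_BY_ALL + sum(UNIQUE.values()) + sum(SHARED_BY_TWO.values())

for species, n in per_species.items():
    print(f"{species}: {n} sites")
print(f"total distinct sites: {total_sites}")
```

The per-species totals (37, 35, and 36) and the overall total of 41 distinct sites agree with the figures given above.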

The finding that each species and each pair of species show a few specific sites (Fig. 4) allows some speculation about the evolutionary origin of these sites. The most parsimonious explanation may be that the sites shared by two of the three species were present in the common ancestor of the Solanaceae and were then lost in one of the three species by a genomic C-to-T mutation. The opposite scenario (acquisition in two species) may seem less likely but, at present, cannot be entirely excluded.

With three plastid genomes sequenced and their editing patterns determined, the Solanaceae family currently offers the most comprehensive data set about the evolutionary flexibility and dynamics of plastid RNA editing among closely related species. In view of the functional importance of mRNA processing by editing (Bock et al. 1994) and the recently discovered role for RNA editing in nucleocytoplasmic incompatibility phenomena in solanaceous plants (Schmitz-Linneweber et al. 2005), the significance of these dynamics of RNA editing site evolution should not be underestimated.

Assessing Intraspecific Plastome Variation in Tomato

The cultivated tomato, Solanum lycopersicum, has been subjected to extensive breeding programs in both Latin America and Europe. The tomato was introduced from Mexico into Europe in the sixteenth century but was initially regarded as poisonous, and the fruits were not widely consumed until the nineteenth century (Simpson and Ogorzaly 2001). To determine whether there were differences in the plastome sequences of European and Latin American tomato cultivars, we compared the plastome sequence of IPA-6, a commercial tomato cultivar bred in Brazil, with that of Ailsa Craig, one of the oldest cultivars bred and grown in Europe. Ailsa Craig was bred in a nursery in Girvan, Ayrshire, Scotland, overlooking the island of Ailsa Craig, and was released in 1910 (Lisman 1961). It is suited to a northern European climate and is unlikely to have been directly involved in the origins of IPA-6. Surprisingly, the nucleotide sequences of the IPA-6 and Ailsa Craig plastomes were absolutely identical and did not show a single nucleotide difference. This indicates a remarkable conservation of sequence over a period of at least several hundred years of separation of the two tomato cultivars and suggests that the plastomes of modern tomato varieties display very little, if any, sequence variation. An earlier comparison of large sequence stretches of the closely related plastid genomes of Nicotiana sylvestris and its allopolyploid descendant Nicotiana tabacum had revealed a single nucleotide substitution in 4656 bp of plastid DNA sequence (Clarkson et al. 2004), pointing to a very low degree of sequence variation also in tobacco.