Introduction

Chloroplasts are intracellular organelles with their own genome, which encodes a number of genes for chloroplast components and photosynthesis. Chloroplast genomes vary in size from 35 to 217 kb, but in land plants most are between 115 and 165 kb and are highly similar in structure and gene organization (Jansen et al. 2005). The chloroplast genome typically contains two copies of an inverted repeat (IR), ranging from 5 to 76 kb in length, which are separated by a large single-copy (LSC) region and a small single-copy (SSC) region (Sugiura et al. 1998).

Although the overall structure of the chloroplast genome is relatively uniform and conserved in land plants, a number of mutations have been observed. These include structural changes such as inversions (Kim et al. 2005; Kim and Lee 2005; Sugiura et al. 2003), rearrangements of gene order (Cosner et al. 2004; Saski et al. 2005), and insertions/deletions (indels) (Calsa Junior et al. 2004; Kato et al. 2000; Ogihara et al. 2002; Shahid Masood et al. 2004), as well as base substitutions (Schmitz-Linneweber et al. 2002). A comparison of the complete chloroplast sequences of closely related members of the grass family (maize, wheat, rice, and sugarcane) reveals several hotspots for length mutations (Asano et al. 2004; Calsa Junior et al. 2004; Guo and Terachi 2005; Maier et al. 1995; Ogihara et al. 2002; Ogihara et al. 1991). One of the divergent hotspots is the tRNA gene cluster in the LSC region, comprising trnS(UGA), trnG(GCC), trnfM(CAU), trnG(UCC), trnT(GGU), trnE(UUC), trnY(GUA), trnD(GUC), and trnC(GCA). In the intergenic regions of this tRNA gene cluster, a large number of indels ranging from 1 to 811 bp in length are commonly present. The largest deletions, over 500 bp in size, are found upstream of trnC(GCA) and trnT(GGU) and downstream of trnD(GUC) in the rice chloroplast (Calsa Junior et al. 2004; Maier et al. 1995; Ogihara et al. 2002). A second hotspot of divergence is the region between rbcL and cemA. In dicots, the accD, psaI, and ycf3 genes are present within this region. However, in three grass species (rice, wheat, and maize), the accD gene is deleted and rpl23 has undergone a non-reciprocal translocation (Maier et al. 1995). Another hotspot of divergence characterized in grass species lies in the ycf2 gene. In dicots, the ycf2 gene is conserved and located along with ycf15, ORF115, and ORF92 between trnI and trnL. However, in maize, rice, wheat, and sugarcane, one or two deletions within the ycf2 gene generate several open reading frames (ORFs) (Calsa Junior et al. 2004; Maier et al. 1995; Ogihara et al. 2002).

Table 1 Insertion/deletion and base substitution events in the LSC, IR, and SSC region of S. tuberosum relative to S. bulbocastanum

Currently, complete chloroplast DNA sequences of 48 species, ranging from the single-celled organism Euglena longa to the land plant Oryza sativa, have been determined (NCBI: Organelle Genomes: http://www.ncbi.nlm.nih.gov:80/genomes/static.euk_o.html). Among land plants, the first complete chloroplast genome sequence was determined for tobacco (N. tabacum), which belongs to the Solanaceae family (Shinozaki et al. 1986). The Solanaceae family comprises more than 3000 species, including potato (S. tuberosum), tomato (Solanum lycopersicum), eggplant (Solanum melongena), pepper (Capsicum annuum), and petunia (Petunia hybrida). Potato is a cultivated herbaceous perennial whose tubers are the commercially significant part of the plant. Over the years, potato has become an important crop for both farmers and consumers, and it is the fourth most important crop in the world in terms of production, after rice, wheat, and maize. Reflecting this importance, a growing body of research addresses the genetic engineering of potato, including the production of GM potatoes with insect resistance, virus resistance, or altered nutritional quality such as starch or protein content.

To date, the entire chloroplast DNA sequences of three further species from the Solanaceae family, N. sylvestris (AB237912), N. tomentosiformis (AB240139), and A. belladonna (AJ316582), have been determined in addition to that of N. tabacum (Z00044). Very recently, at the time of this study, the chloroplast DNA sequences of two more Solanaceae species, S. lycopersicum (DQ347959) and S. bulbocastanum (a wild potato species, DQ347958), also became available. Among the Solanaceae species, comparative analyses of chloroplast DNA sequences have been performed between N. tabacum and A. belladonna (Schmitz-Linneweber et al. 2002), between S. lycopersicum and S. bulbocastanum (Daniell et al. 2006), and between Nicotiana species (Yukawa et al. 2006). In this paper, we present the complete chloroplast DNA sequence of S. tuberosum L. cv. Desiree and compare it with the chloroplast genome sequences of six other Solanaceae species: N. tabacum, N. sylvestris, N. tomentosiformis, S. lycopersicum, S. bulbocastanum, and A. belladonna. We focus especially on the ORFs and the length mutations. Our study provides a rich source of nucleotide and amino acid sequence data that can be used to address phylogenetic and molecular evolutionary questions and to support the engineering and breeding of potatoes.

Materials and methods

Isolation of chloroplast DNA

Fresh leaves were harvested from 2- to 3-week-old S. tuberosum L. cv. Desiree plants. Chloroplasts were isolated by the sucrose-gradient method as described in Oharamays and Capwell (1993). Chloroplast DNA (cpDNA) was isolated from the purified chloroplasts by lysis and ultracentrifugation. Template DNA was prepared by polymerase chain reaction (PCR) using 100 ng of genomic DNA. Amplification was performed using ExTaq (Takara Bio Inc., http://www.takara-bio.com) under the following conditions: 5 min denaturation at 95°C followed by 35 cycles of denaturation at 95°C for 50 s, annealing at 55–60°C for 45 s, and extension at 72°C for 5 min. PCR products were about 4000–5000 bp in size, and each product overlapped its adjacent fragments by 500–800 bp. Primers were designed according to the chloroplast sequence of N. tabacum.
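The overlapping-amplicon design described above can be illustrated with a short sketch. It is not the primer layout actually used: the function name `tile_amplicons`, the fixed amplicon length of 4500 bp, and the fixed overlap of 650 bp are assumptions chosen as midpoints of the reported ranges, and the circular genome is treated as a linear sequence for simplicity.

```python
def tile_amplicons(genome_len, amplicon_len=4500, overlap=650):
    """Return (start, end) coordinates (0-based, end-exclusive) of overlapping
    amplicons covering a linear sequence of length genome_len.

    amplicon_len and overlap are illustrative assumptions, not values
    taken from the actual primer design.
    """
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and smaller than amplicon_len")
    step = amplicon_len - overlap          # distance between consecutive starts
    tiles = []
    start = 0
    while True:
        end = min(start + amplicon_len, genome_len)
        tiles.append((start, end))
        if end == genome_len:              # the last amplicon reaches the end
            break
        start += step
    return tiles


# Genome length as reported for S. tuberosum in this paper.
tiles = tile_amplicons(155_312)
print(len(tiles), "amplicons; first:", tiles[0], "last:", tiles[-1])
```

Under these assumed parameters the sketch only indicates the order of magnitude of the number of PCR products needed; real primer positions depend on sequence conservation with N. tabacum.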

Shotgun library construction

Twenty micrograms of five or six combined PCR products were sheared into fragments of approximately 1 kb using a HydroShear DNA shearing device at speed code 3 (GeneMachines, http://www.bst-asia.com). The sheared DNA was blunt-ended and phosphorylated using the DNA End-Repair kit (Epicentre, http://www.epibio.com) according to the manufacturer's instructions. The end-repaired DNA was fractionated on MicroSpin S-400 HR columns (Amersham, http://www.amershambiosciences.com), and fragments of approximately 0.5–1.5 kb were isolated. The fragments were ligated into a pUC118 plasmid vector that had previously been digested with HincII and treated with bacterial alkaline phosphatase. The ligated DNA samples were introduced into Escherichia coli DH5α by electroporation and plated on LB ampicillin plates. The titers of the five libraries were 0.7 × 10³ to 2.0 × 10³ cfu.

Fig. 1

Gene map of S. tuberosum chloroplast genome. IR regions (25,595 bp) are separated by the SSC (18,373 bp) and LSC (85,749 bp) regions. Genes inside the map are transcribed clockwise, those outside are counterclockwise. Intron-containing genes are indicated by asterisks

Sequencing, assembly and annotation

Individual clones were picked into deep-well blocks containing 760 μl of TB with 8% glycerol and 50 μg ml⁻¹ ampicillin. The clones were grown overnight at 37°C with shaking at 600 rpm, and plasmid DNA was isolated using an HT prep machine (Bioneer, http://www.bioneer.co.kr). The 5′ and 3′ end sequences were determined on a capillary DNA sequencer (RISA 384, Shimadzu, http://www.shimadzu.com) using the DYEnamic ET Terminator cycle sequencing kit (Amersham, http://www.amershambiosciences.com). The full sequences of three large gaps (1, 8, and 17), which could not be amplified by PCR, were determined by primer walking. The remaining 31 small gaps were amplified by PCR and sequenced directly from the PCR products. The sequence data were processed with the base-calling program Phred and the assembler Phrap (version 0.990319, http://www.genome.washington.edu/UWGC). The resulting contigs were analyzed in Consed, a software package for sequence finishing (http://www.phrap.org/consed/consed.html) (Gordon et al. 1998). Genes in the S. tuberosum chloroplast genome were identified and annotated using DOGMA (Dual Organellar GenoMe Annotator) (Wyman et al. 2004). This program takes a FASTA-formatted input file of the complete genomic sequence and identifies putative protein-coding genes by performing BLASTX searches against a custom database of 16 published chloroplast genomes of green plants, including Arabidopsis, Chlorella, Lotus, Oenothera, Oryza, Pinus, Spinacia, Zea, and Nicotiana. In addition, a stand-alone BLAST search of all known chloroplast genes against a database of the S. tuberosum chloroplast sequences was performed for comparative analysis (Altschul et al. 1997). The chloroplast genome sequences of the seven Solanaceae species were aligned using the BioEdit sequence alignment editor (North Carolina State University).

Fig. 2

Comparison of the border positions of IR, SSC, and LSC regions among five Solanaceae chloroplast genomes. The border structures of N. sylvestris and S. bulbocastanum are not included because they have the same border positions as N. tabacum and S. tuberosum, respectively. The S. tuberosum, S. lycopersicum, A. belladonna, and N. tomentosiformis chloroplast genomes contain rps19 and ycf1 pseudogenes created at the IRa/LSC and IRb/SSC borders, with lengths determined by how far the IR regions extend into the genes. The locations of the ndhF and trnH genes are shown with distances in base pairs

Phylogenetic analysis of ORFs

Seventeen ORFs in N. tabacum were used as queries in BLAST searches, at normal stringency, against a database of chloroplast DNA sequences from the seven Solanaceae species, Cucumis sativus, and Spinacia oleracea. The sequences retrieved from the 17 ORF regions were readily and unambiguously aligned manually using Sequencher 4.1 (Gene Codes, http://www.genecodes.com). We also used Clustal X (Thompson et al. 1997) to align the sequences with varying gap opening and extension penalties, followed by some manual editing in Sequencher. The aligned sequences were analyzed in combination using the parsimony algorithm of PAUP* for Macintosh (version 4.0b10; Swofford 1998). All characters were weighted equally and treated as unordered (Fitch 1971); gaps were treated as a fifth base. Heuristic searches were conducted with ‘MULPARS’, TBR branch swapping, and ‘ACCTRAN’ optimization. Internal support was assessed by bootstrap analysis (Felsenstein 1985) with 1000 heuristic replicates, using simple addition and TBR branch swapping. For the parsimony analysis of the 13 combined chloroplast DNA ORF regions, C. sativus and S. oleracea were used as outgroups to analyze the phylogenetic position of S. tuberosum among the ingroup taxa of Solanaceae.
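To make the scoring criterion concrete, the following minimal sketch computes the length of one fixed tree under Fitch parsimony with equally weighted, unordered characters and gaps counted as a fifth state. It is not PAUP* and performs no heuristic search, branch swapping, or bootstrapping; the function names, the toy taxa A to D, and the toy alignment are hypothetical.

```python
def fitch_site_cost(tree, states):
    """Minimum number of state changes for one alignment column on a rooted
    binary tree given as nested 2-tuples of leaf names.
    Characters are unordered and equally weighted; '-' is simply a fifth state."""
    def postorder(node):
        if isinstance(node, str):                 # leaf: its observed state
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = postorder(left)
        right_set, right_cost = postorder(right)
        shared = left_set & right_set
        if shared:                                # no change needed at this node
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return postorder(tree)[1]


def tree_length(tree, alignment):
    """Sum of Fitch costs over all columns of an alignment (dict: taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_cost(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy data, not taken from the ORF alignments of this study.
toy_tree = (("A", "B"), ("C", "D"))
toy_alignment = {"A": "ACGT-", "B": "ACGT-", "C": "ATGTA", "D": "ATGAA"}
print(tree_length(toy_tree, toy_alignment))
```

A heuristic search such as the one described above would evaluate this length for many candidate topologies and keep the shortest trees.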

Results and discussion

Overall structure and gene content of S. tuberosum chloroplast DNA

The entire chloroplast DNA sequence of S. tuberosum was determined (GenBank accession no. DQ231562). It is a circular double-stranded DNA molecule of 155,312 bp with the quadripartite structure typical of most plastid genomes; its gene map is shown in Fig. 1. The S. tuberosum chloroplast DNA consists of a pair of inverted repeat regions of 25,595 bp each, separated by a small single-copy (SSC) region of 18,373 bp and a large single-copy (LSC) region of 85,749 bp. The GC content of the chloroplast DNA is 37%, the same as in other Solanaceae species. The S. tuberosum chloroplast genome contains a total of 130 genes, 17 of which are duplicated in the IR. Thirty tRNA genes and four rRNA genes were identified. Seven tRNA genes are duplicated in the IR, and the four rRNA genes are clustered and inversely oriented in the IR, as reported in other land plants. Eighteen genes contain one or two introns, and six of these are tRNA genes. Four introns are located in the IR and one intron in the SSC.
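The region sizes reported above can be checked against the total genome length with simple arithmetic, since a quadripartite genome consists of the LSC, the SSC, and two copies of the IR. The sketch below uses only the values stated in the text; the function and variable names are illustrative.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite plastid genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for S. tuberosum in this paper.
LSC_BP, SSC_BP, IR_BP = 85_749, 18_373, 25_595

total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print("total:", total_bp)

# Share of the genome contributed by each region (IR counted twice).
for name, size in (("LSC", LSC_BP), ("SSC", SSC_BP), ("IR x2", 2 * IR_BP)):
    print(f"{name}: {size / total_bp:.1%}")
```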

Contraction and expansion of IRs

The borders between the inverted repeats (IRa and IRb) and the two single-copy regions (LSC and SSC) usually differ among plant species. Large expansions or contractions of the inverted repeat regions often cause variation in the length of chloroplast genomes. The IR of S. tuberosum is 25,595 bp long, which is nearly the same as that of S. lycopersicum (25,611 bp), but it is 253 bp longer than that of N. tabacum and 311 bp shorter than that of A. belladonna. In Fig. 2, the exact IR border positions of five Solanaceae species are compared. N. sylvestris is not shown because its IR border positions are the same as those of N. tabacum. In all species, the border between IRa and SSC was located within the coding region of the ycf1 gene, creating a ycf1 pseudogene at the IRb/SSC border whose length depends on how far the IR extends into ycf1. For example, the S. tuberosum IR extends far into the ycf1 gene, resulting in a 1122-bp ycf1 pseudogene at the IRb/SSC border. Likewise, N. tabacum has 996 bp of ycf1 duplicated, whereas A. belladonna and N. tomentosiformis have ycf1 pseudogenes of 1438 and 1010 bp, respectively, at the IRb/SSC border. The ndhF gene was located entirely within the SSC region, with intergenic spacers of various lengths. In S. tuberosum, the ndhF gene was located one base from the border, whereas an intergenic spacer of 32 bp or more preceded the ndhF gene in the other species.

Fig. 3

Multiple alignments of the coding genes and introns containing the indels. The nucleotide sequences of the coding genes (A) and introns (B) are presented. Deleted nucleotides are marked by dashes, and the numbers above are based on positions in the S. tuberosum chloroplast genome sequence. Where the nucleotide sequences of N. tabacum and N. sylvestris match perfectly, N. sylvestris is excluded from the multiple alignments

In addition to the IRa expansion into the ycf1 gene, an extension of the IRb region into the rps19 gene was found in S. tuberosum, S. lycopersicum, A. belladonna, and N. tomentosiformis, creating a duplication of the 5′ end of the rps19 gene, of varying length, at the IRa/LSC border. Both A. belladonna and N. tomentosiformis have a 59-bp rps19 pseudogene, and S. tuberosum and S. lycopersicum have rps19 pseudogenes of 69 and 91 bp, respectively. Interestingly, the IRb region does not extend into the rps19 gene in N. tabacum and N. sylvestris. The location of the trnH gene was quite conserved among the Solanaceae. In S. tuberosum, the IRa/LSC border was located 30 bp downstream of the trnH(GUG) gene, in the non-coding region, while in the other four Solanaceae species the IRa/LSC border was located 2–6 bp downstream of the trnH(GUG) gene. Similar IR contraction and expansion has been analyzed in Glycine, Arabidopsis, Lotus, Panax, cucumber, wheat, rice, and maize (Kim et al. 2005; Kim and Lee 2004; Maier et al. 1995; Ogihara et al. 2002; Saski et al. 2005). This structural feature of the borders between the two IRs and the two single-copy regions may be due to intramolecular recombination between two short direct repeat sequences within the genes located at the borders (Maier et al. 1995).

Fig. 4

Sequence comparison of intergenic regions containing indels flanked by direct repeats. Deleted nucleotides are marked by dashes, and numbers above were based on the position in S. tuberosum chloroplast genome sequence. Direct repeats around indels are shaded. When the nucleotide sequences in both N. tabacum and N. sylvestris are perfectly matched, N. sylvestris was excluded from the multiple alignments

Length mutations in the coding genes

Although the overall chloroplast genome structures of the Solanaceae species were quite similar, a complete alignment of the chloroplast genome sequences revealed an appreciable number of indels in the protein-coding genes and intergenic spacer regions. Among the protein-coding genes, indels were found in five coding regions of S. tuberosum when compared with the other six Solanaceae species. Comparison of the accD genes among the seven Solanaceae species revealed two indel events. A 24-bp deletion occurred in the three Solanum species and A. belladonna, whereas a 9-bp insertion was found only in the three Solanum species (Fig. 3A). Direct repeats of TAGT and ACATGT are associated with these indels. The accD gene encodes the beta-carboxyl transferase subunit of acetyl-CoA carboxylase (ACCase) and is present in the plastids of most land plants. In tobacco, ACCase is essential for leaf development, leaf longevity, and seed yield (Kode et al. 2005; Madoka et al. 2002). The accD gene, located between rbcL and ycf4, has been progressively deleted in the grass family (Calsa Junior et al. 2004; Maier et al. 1995). ORF106 is present only in rice as a remnant of the accD gene, and accD is completely deleted in maize, wheat, and sugarcane.

Additional indels occurred in the ycf1 and ycf2 genes. Most higher plants contain these two genes, which appear to be essential for cell survival, as homoplastomic ycf1 deletion mutants of Chlamydomonas and tobacco could not be obtained (Boudreau et al. 1997; Drescher et al. 2000; Maier et al. 1995), although ycf1 in the chloroplast genomes of maize and rice is reduced to a series of shorter reading frames by various deletions (Maier et al. 1995; Ogihara et al. 2002). Because a sequence comparison of ycf2 has already been reported (Daniell et al. 2006), this study considered the comparison of ycf1 among the Solanaceae species. A total of 13 indels occurred within ycf1; three of them, found in the three Solanum species, are depicted in Fig. 3A. The small indels of 6 and 12 bp involved the short base repeats ATTTTT and GTTTTT, whereas a 36-bp deletion was not associated with any repeat sequence.

Fig. 5

Aligned nucleotide sequences of intergenic regions containing indels not associated with direct repeats. Deleted nucleotides are marked by dashes, and numbers above were based on the position in S. tuberosum chloroplast genome sequence

As described above, 18 genes (six tRNA genes and 12 protein-coding genes) contain introns. Indels were identified in the introns of the ycf3 and trnK genes. ycf3 contains two introns of 727 and 750 bp, whereas trnK has a single 2512-bp intron. Deletions of poly(T) tracts were found in both intron 1 and intron 2 of ycf3 in the three Solanum species. In the trnK intron, A. belladonna and S. bulbocastanum each had an 18-bp deletion at a different position, whereas S. tuberosum and S. lycopersicum contained an 8-bp deletion.

Table 2 Comparison of the ORFs from seven Solanaceae species

Length mutations in the intergenic regions

Overall, the gene content and relative gene positions of S. tuberosum are similar to those of the other six Solanaceae species that we compared. The total length of the S. tuberosum chloroplast DNA is the shortest among the Solanaceae plants reported so far, whereas A. belladonna contains the longest chloroplast genome. The LSC of S. tuberosum is 85,749 bp long, which is 937 and 1119 bp shorter than those of A. belladonna and N. tabacum, respectively. This size difference is mainly due to a large number of deletions in the LSC region. Multiple alignment of the seven Solanaceae chloroplast genome sequences revealed 64–69 deletions of 5 bp or more, relative to N. tabacum, in the intergenic regions of the LSC. Three remarkably variable intergenic regions were identified in the Solanaceae species. They are located within the tRNA cluster (trnC–trnfM), between trnL3′ and trnM, and between the rbcL and psbJ genes. A striking feature of S. tuberosum was a single large deletion of 452 bp in the intergenic region between the trnT and trnE genes compared with the other species (data not shown). Divergence of these indel regions has similarly been identified in the grass family (wheat, maize, rice, and sugarcane) (Asano et al. 2004; Calsa Junior et al. 2004; Maier et al. 1995; Ogihara et al. 2002). One or two large deletions of 240–800 bp are commonly found in the intergenic regions between the trnT and trnG5 genes and between the psbM and trnD genes in the wheat and rice chloroplasts. Apart from the large deletions described above, the rice chloroplast also contains a large deletion upstream of trnC (Ogihara et al. 2002). Thus, a comparison of our results with those of the grass family shows that the length mutations and their locations within the tRNA cluster are quite divergent between dicots and monocots, and that the size divergence is even more pronounced within the grass family.
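Counting deletions of 5 bp or more relative to a reference can be sketched as a scan over a gapped pairwise alignment. The function below is a hypothetical illustration, not the procedure used with BioEdit in this study: it reports runs of gap characters in the query that are opposite reference bases, ignores columns that are gaps in both sequences, and uses invented toy sequences.

```python
def deletions_relative_to(ref_aln, query_aln, min_len=5):
    """Return (reference_start, length) for every run of query gaps opposite
    reference bases that is at least min_len long.
    reference_start is a 0-based position in the ungapped reference."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    ref_pos = 0                 # ungapped reference bases consumed so far
    run_start, run_len = 0, 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q == "-":
            continue            # gap in both rows: carries no information here
        if r != "-" and q == "-":
            if run_len == 0:
                run_start = ref_pos
            run_len += 1
        else:                   # aligned bases or an insertion ends the run
            if run_len >= min_len:
                events.append((run_start, run_len))
            run_len = 0
        if r != "-":
            ref_pos += 1
    if run_len >= min_len:      # run that extends to the end of the alignment
        events.append((run_start, run_len))
    return events


# Toy alignment (hypothetical): one 6-bp deletion and one 2-bp deletion.
toy_ref = "ACGTACGTACGTAA"
toy_query = "AC------ACG--A"
print(deletions_relative_to(toy_ref, toy_query))
```

Only the 6-bp run passes the threshold in this toy case; the 2-bp run is discarded, mirroring the 5-bp cutoff used in the text.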

Table 3 Pairwise distance matrix from 13 ORF regions of chloroplast DNA on the S. tuberosum and related taxa

Indels longer than 6 bp that were identified only in S. tuberosum were selected and classified into two groups on the basis of the presence of direct repeats (Figs. 4 and 5). The single largest insertion, of 543 bp, occurred in the intergenic region between ycf4 and cemA in S. tuberosum (Fig. 4). Direct repeats of TTGAGA were associated with this large insertion. A 508-bp insertion corresponding to this largest insertion was also found in S. lycopersicum and S. bulbocastanum. Length polymorphism between ycf4 and cemA has similarly been observed in wheat and its close relative Aegilops by sequence comparison as well as restriction fragment pattern analysis (Guo and Terachi 2005; Ogihara et al. 2002; Ogihara et al. 1991). Ae. caudata has a large 289-bp deletion in the intergenic region between ycf4 and cemA compared with Ae. mutica. In this case, a pair of direct repeats, “A4GAAGAA”, is present in Ae. mutica and other related species.
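One simple way to test whether an indel is flanked by direct repeats is to compare the first bases of the inserted (or deleted) segment with the bases that immediately follow it. The sketch below assumes this particular definition, which is not necessarily the criterion applied in Figs. 4 and 5; the function name, the length limits, and the toy sequences are hypothetical.

```python
def flanking_direct_repeat(seq, start, end, min_k=4, max_k=10):
    """Return the longest k-mer (min_k <= k <= max_k) that both begins the
    segment seq[start:end] and immediately follows it, or None if there is none.
    start and end are 0-based, end-exclusive coordinates of the indel segment."""
    segment_len = end - start
    for k in range(min(max_k, segment_len), min_k - 1, -1):
        if end + k > len(seq):
            continue                      # not enough sequence after the segment
        if seq[start:start + k] == seq[end:end + k]:
            return seq[start:start + k]
    return None


# Toy sequence (hypothetical): a segment starting with TTGAGA, followed by TTGAGA.
toy_seq = "CCC" + "TTGAGA" + "ACGTACGT" + "TTGAGA" + "GGG"
print(flanking_direct_repeat(toy_seq, 3, 17))

# A segment with no matching flank returns None.
print(flanking_direct_repeat("AAAACCCCGGGGTTTT", 4, 8))
```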

Three indels not flanked by direct repeats were found that distinguish S. tuberosum from S. bulbocastanum (Fig. 5). One of them was the single largest deletion in S. tuberosum, of 241 bp, which occurred in the intergenic region between ndhC and trnV. Interestingly, the 241-bp sequence was absent in other land plants including Arabidopsis, spinach, rice, corn, and ginseng, but was present in A. belladonna and Nicotiana species. In N. tomentosiformis, a 41-bp deletion associated with the direct repeat GGATAT occurred within the region corresponding to the 241-bp deletion. The cultivated potato, S. tuberosum, is a tetraploid and is classified, based on its origin, into the Andigenum Group (S. tuberosum L. subsp. andigenum) and the Chilotanum Group (S. tuberosum L. subsp. tuberosum). The 241-bp deletion has been shown to be typical of the Chilotanum Group (Kawagoe and Kikuta 1991; Spooner et al. 2005). Our study indicates that S. bulbocastanum, a diploid wild potato, shares the same gene pool with the Andigenum Group, and that the 241-bp deletion clearly represents a genetic difference between the cultivated potato and the wild potato. Other large deletions were found in the intergenic regions between matK and rps16 and between rpl18 and rpl20. The position and extent of the deletion between matK and rps16 were quite diverse even among the Solanum species, whereas a 45-bp deletion between rpl18 and rpl20 occurred only in S. tuberosum. Relative to S. bulbocastanum, the S. tuberosum chloroplast genome contained 591 base substitutions, 76 insertions, and 33 deletions, including the indels described above, and most of them occurred in the LSC region. Taken together, our data from the cultivated potato S. tuberosum could provide potential molecular markers for evaluating genetic diversity as well as evolutionary processes in potato landraces.

High divergence in annotated open reading frames between Solanaceae

In addition to defined genes, the S. tuberosum chloroplast genome contains various ORFs, like those of other higher plants. The N. tabacum chloroplast genome harbors 17 ORFs (Yukawa et al. 2006). A comparison of 16 of these ORFs (all except ORF350) among the seven Solanaceae species is given in Table 2. The ORFs located in the IR were highly conserved in the three Nicotiana species, but three of them showed base substitutions and indels in the three Solanum species and A. belladonna. In the three Solanum species, ORF115 and ORF92 were reduced to either ORF89 or ORF48, and to ORF66 or ORF54, respectively, mainly because of a 78-bp deletion. ORF75 in A. belladonna contained a premature stop codon due to a base substitution. Unlike the ORFs in the IR, the ORFs in the LSC showed remarkable divergence between N. tomentosiformis, A. belladonna, and the three Solanum species. They are reduced in size, fragmented, or both, owing to indels and base substitutions. The degree of change in the ORFs varied considerably between species. For example, N. tabacum ORF90 was reduced to ORF64 in N. sylvestris due to a 2-bp deletion, and to ORF25 in S. tuberosum, S. lycopersicum, and S. bulbocastanum due to an early stop codon, whereas N. tomentosiformis contained ORF90 with three amino acid changes.
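How a short deletion can truncate an ORF is illustrated below with a toy sequence. The sketch counts codons up to the first in-frame stop codon and shows that removing 2 bp shifts the reading frame and exposes an earlier stop. The sequence is invented for illustration and is unrelated to ORF90; the function name and the use of the three standard stop codons are assumptions of this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_in_codons(seq):
    """Number of codons read from the first base of seq up to, but not
    including, the first in-frame stop codon (or the end of the sequence)."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count


# Toy ORF (hypothetical): ATG GCC ATT GAC CGC TGC TAA
original = "ATGGCCATTGACCGCTGCTAA"
# Deleting the two bases at positions 4-5 shifts the frame: ATG GAT TGA ...
with_deletion = original[:4] + original[6:]

print(orf_length_in_codons(original), orf_length_in_codons(with_deletion))
```

A base substitution that converts a sense codon into a stop codon would shorten the ORF in the same way without changing the reading frame.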

Fig. 6

MP tree of S. tuberosum and related species inferred from the sequences of 13 chloroplast DNA ORF regions. Numbers above the branches indicate character differences, and those below the branches give bootstrap values from 1000 replicates. Tree length is 2544, with a consistency index of 0.941 and a retention index of 0.790

Given the high degree of gene conservation in chloroplast genomes, the ORFs appear to have diverged rapidly enough to be used to differentiate closely related species. Therefore, 13 ORFs retrieved from the seven Solanaceae chloroplast genome sequences and two outgroups were analyzed phylogenetically. The pairwise distance matrix of sequence divergence and patristic distances was generated from 4499 combined sites (Table 3). Parsimony analysis revealed 2347 constant, 1490 variable but parsimony-uninformative, and 662 parsimony-informative characters. Sequence divergences among the ingroup taxa (Solanaceae) ranged from 0.00 (N. tabacum and N. sylvestris) to 0.265 (S. lycopersicum and A. belladonna), while the values between ingroup and outgroup ranged from 0.1345 to 0.2048. Two most parsimonious (MP) trees were produced, with a tree length of 2582. The trees differed from each other only in the terminal resolution within Solanaceae. The consistency index and retention index of the MP tree were 0.939 and 0.830, respectively (Fig. 6). The seven ingroup taxa resolved as a monophyletic group, separate from the outgroups, with a 100% bootstrap value. The Solanaceae taxa formed two sister clades. The first united a Nicotiana subclade with its sister group Atropa, and the three Nicotiana species showed strong monophyly with a 96% bootstrap value. In the second clade, a very strongly supported monophyletic Solanum group, with a 100% bootstrap value, was recovered. These phylogenetic relationships among the Solanaceae are likely to be reflected in the indel patterns among the ORF sequences as well as in morphological features such as leaf, flower, and fruit shape.
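The character categories and pairwise divergences reported above can be illustrated with a small sketch. It classifies alignment columns as constant, variable but parsimony-uninformative, or parsimony-informative (at least two states, each present in at least two taxa), and computes an uncorrected pairwise distance. Gaps are counted as a fifth state, as in the analysis; the toy alignment and function names are hypothetical, and the distance measure actually used for Table 3 is not specified here.

```python
from collections import Counter


def classify_column(column):
    """Classify one alignment column; '-' counts as a fifth character state."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Informative: at least two states that each occur in two or more taxa.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "variable_uninformative"


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Toy alignment (hypothetical), one row per taxon.
toy_rows = ["ACGTA", "ACGTA", "ATGAA", "ATG-A"]
tally = Counter(classify_column(col) for col in zip(*toy_rows))
print(dict(tally), p_distance(toy_rows[0], toy_rows[2]))

# The three categories reported in the text add up to the 4499 combined sites.
reported_total = 2347 + 1490 + 662
print(reported_total)
```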

Conclusion

The complete sequence of the S. tuberosum chloroplast genome revealed extensive similarity to those of six other Solanaceae species in gene content and structure, suggesting a common chloroplast evolutionary lineage within Solanaceae. However, many features specific to S. tuberosum chloroplast DNA were found in the intergenic regions and ORFs, and a few in protein-coding genes; these can be used as molecular markers to study genetic diversity and population-genetic processes in potato landraces. In particular, this study inferred plastid phylogeny and evolution on the basis of ORFs from seven Solanaceae species. The phylogenetic analysis provided robust support for the phylogenetic positions of S. tuberosum and S. bulbocastanum within Solanaceae.