Introduction

Mitochondrial genomes (mitogenomes) are units of genetic information that evolve independently of nuclear genomes. Mitogenomes have several distinctive features, including small size, fast evolutionary rates, relatively conserved gene content and organization, maternal inheritance and limited recombination [1–4]. Thus, mitogenomes offer a broad range of characters for studying phylogenetic relationships among animal taxa. Besides nucleotide and amino acid sequences, several other features of mitogenomes have been used successfully as sources for phylogenetic inference, including tRNA secondary structures [5], deviations from the universal genetic code [6, 7], and changes in mitochondrial gene order [8, 9].

Several genes encoded in the mitochondrion, particularly cox1, cox2, and rrnS, have been widely used in molecular phylogenetics [10, 11]. More recently, growing attention has been paid to gene order, that is, the order in which the genes appear along the molecule, as a highly informative genetic marker for resolving phylogenetic relationships between distantly related taxa [9, 12–15].

The size of insect mtDNA varies from 14 to 19 kb [16–19], although some sequenced mitogenomes are exceptionally large (e.g., 36 kb) [20]. The typical metazoan mtDNA contains genes for a complete set of 22 tRNAs, 2 rRNAs (small- and large-subunit rRNAs) and 13 protein components of the oxidative phosphorylation system. These 13 components are subunits I, II, and III of cytochrome oxidase (cox1, cox2 and cox3), cytochrome b (cob), subunits 6 and 8 of the ATPase complex (atp6 and atp8), and subunits 1–6 and 4L of NADH dehydrogenase (nad1–6 and nad4L) [21]. Additionally, it has a control region, known in insect mtDNA as the A (adenine) + T (thymine)-rich region, which includes the sequences responsible for the origin of heavy-strand mtDNA replication in vertebrates [22] and those for the replication origin of both mtDNA strands in Drosophila species [21]. The length of this region is highly variable in insects owing to indels and to variable copy numbers of tandemly repeated elements [23–25].

The butterfly Papilio xuthus is a well-known agricultural pest in Asia [26]. It belongs to the family Papilionidae (Lepidoptera: Papilionoidea) and has been recorded in Burma, the Philippines, Korea, Mongolia, Japan and China, among other countries. Several protein sequences of P. xuthus have been determined [27], and some partial mtDNA sequences are available in GenBank. However, the mitogenome sequence of this species had not been available before this study. Here, we report the nearly complete mitogenome sequence of P. xuthus. The sequence was compared with other insect mitogenomes and used to reconstruct insect phylogeny.

Materials and methods

Biological material

Specimens of P. xuthus were collected from the park of Hongshan Zoo in Nanjing, Jiangsu Province, People’s Republic of China.

DNA extraction, primer design, PCR amplification, cloning and sequencing

After a brief examination of the external morphology to confirm the identification of the species as P. xuthus, the midgut and wings were removed. Total genomic DNA was extracted from the thorax of adult specimens using a simple proteinase K/SDS method. Minced tissues were resuspended in 400 μl of 0.01 mol/l Tris (pH 8.0), 0.1 mol/l EDTA (pH 8.0), 0.05 mol/l NaCl, 1% SDS and 5 μl Proteinase K, and incubated at 50°C for 8–10 h. The digested samples were phenol-extracted and ethanol-precipitated, then dissolved in 30 μl ddH2O, pH 8.0. DNA quality was checked on a 1% agarose/Tris–borate–EDTA gel. All DNA samples were stored at −20°C and used as templates for subsequent PCR reactions.

Partial sequences of the cox1, cox2, nad1, rrnS, rrnL, nad5 and cob genes have been published for P. xuthus [28–30]. Based on these known sequences, we designed three pairs of primers (c1-F & c1-R, cb-F & cb-R, 12s-F & 12s-R) (Fig. 1) and amplified three short fragments of cox1, cob and rrnS. Based on the sequences of these fragments, five new pairs of primers (c1-c2-F & c1-c2-R, c2-nd5-F & c2-nd5-R, nd5-cb-F & nd5-cb-R, cb-12s-F & cb-12s-R, 12s-c1-F & 12s-c1-R) (Table 1) were designed to amplify the P. xuthus mitogenome using the standard Takara LA Taq™ protocol (Takara) and the following Long-PCR conditions: an initial denaturation for 4 min at 94°C; 15 cycles of 40 s at 94°C and 2 min 30 s at 58°C; 15 cycles of 40 s at 94°C and 2 min 30 s at 58°C, with 5 s added per cycle; and a final extension for 10 min at 72°C. Longer PCR products were sequenced by primer walking with internal primers on an ABI PRISM 310 sequencer.
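The Long-PCR program described above can be written out step by step. The following Python sketch is only an illustration: the function name long_pcr_steps and the step labels are hypothetical, ramp times between temperatures are ignored, and it assumes that the 5 s increment begins with the first cycle of the second block, which the text does not state explicitly.

```python
def long_pcr_steps():
    """Expand the Long-PCR program into (label, temperature_C, seconds) steps."""
    steps = [("initial denaturation", 94, 4 * 60)]

    # First block: 15 cycles with a fixed 2 min 30 s step at 58 C.
    for _ in range(15):
        steps.append(("denaturation", 94, 40))
        steps.append(("annealing/extension", 58, 150))

    # Second block: 15 cycles, lengthening the 58 C step by 5 s per cycle.
    # Assumption: the increment is already applied in the first cycle here.
    for cycle in range(1, 16):
        steps.append(("denaturation", 94, 40))
        steps.append(("annealing/extension", 58, 150 + 5 * cycle))

    steps.append(("final extension", 72, 10 * 60))
    return steps


steps = long_pcr_steps()
hold_seconds = sum(seconds for _, _, seconds in steps)
print(f"{len(steps)} steps, {hold_seconds / 60:.0f} min of programmed hold time")
```

Under these assumptions, the last cycle holds at 58°C for 3 min 45 s, and the programmed hold time, excluding ramping, is about two hours.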

Fig. 1

Summary of the sequencing strategy. Horizontal lines indicate the fragments which were amplified and cloned. Primers above each line were designed on the sequence of the strand for which the majority of genes were transcribed; primers below each line were designed on the opposite strand

Table 1 List of PCR primers used in this study

Because of a poly-A/T stretch located in nad4, we designed one additional pair of primers (n5-n4-F & n5-n4-R). The resulting fragment was TA-cloned into the plasmid vector pUC19 using a cloning kit (Takara) according to the manufacturer’s protocol and then sequenced.

Sequence analysis

Genes were identified from nucleic acid or derived protein sequences by BLAST [31] searches at the National Center for Biotechnology Information (NCBI). The 22 tRNA genes were identified using tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNAscan-SE), and their clover-leaf secondary structures and anticodon sequences were determined using DNASIS (Ver. 2.5, Hitachi Software Engineering). The nearly complete mitogenome sequence was submitted to GenBank.

Phylogenetic analyses

Of the available insect mitogenomes, 20, including that of P. xuthus, were included in the phylogenetic analysis in this paper. Others were excluded because of unusual sequence evolution reported previously [32] (e.g., the phthirapteran Heterodoxus macropus of the hemipteroid assemblage [33] and the hymenopteran Apis mellifera [34]) or because of extensive gene rearrangements (e.g., species of the hemipteroid assemblage: the thysanopteran Thrips imaginis [35] and the psocopteran Lepidopsocis sp. [33]).

The nad2 gene was not entirely sequenced in this study. All protein-coding genes (PCGs) of the chosen insect mitogenomes except nad2 were therefore used for the phylogenetic analyses. The nucleotide sequences of each PCG were retro-aligned using the RevTrans 1.4 server available through the DTU CBS website [36]. Protein-coding positions were translated to amino acids using MEGA 3.0 [37] to confirm the alignment. Unalignable regions were excluded manually from the phylogenetic analyses, and the unalignable 5′ and 3′ ends of the PCGs were trimmed so that all sequences in an alignment had the same length.
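The general idea behind retro-alignment, in which a protein alignment is used as a scaffold for the corresponding coding sequences, can be sketched in a few lines of Python. This is not the RevTrans implementation; the function name thread_codons and the toy sequences are hypothetical, and the sketch assumes that the protein has already been aligned and that the coding sequence contains exactly one codon per residue.

```python
def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto its aligned protein.

    Each residue receives its codon and each gap ('-') becomes '---',
    so the nucleotide alignment inherits the gaps of the protein alignment.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the residue count")

    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example: the protein M-IL is encoded by ATG ATT TTA.
print(thread_codons("M-IL", "ATGATTTTA"))  # ATG---ATTTTA
```

Because gaps are always inserted as whole triplets, the reading frame is preserved in the resulting nucleotide alignment.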

Phylogenetic analyses were performed on the complete amino acid sequences of 12 PCGs using a Bayesian approach as implemented in MrBayes, ver. 3.1 [38, 39]. Two species, Penaeus monodon and Pagurus longicarpus (Crustacea), were set as outgroups in all analyses in this paper [40, 41]. Selection of suitable nucleotide substitution models for Bayesian analyses was guided by the results of hierarchical likelihood ratio tests calculated using Modeltest 3.06 [42] in PAUP* version 4.0b10 [43]. Bayesian inference was done with MrBayes 3.0b4 software [38] using the GTR + I + G model suggested by Modeltest. Bayesian analyses were launched with random starting trees and run for 4 × 10^6 generations, sampling the Markov chains at intervals of 100 generations. Four chains were run simultaneously, three hot and one cold, with the initial 200 cycles discarded as burn-in. To determine whether the Bayesian analyses had reached stationarity, the likelihoods of sample points were plotted against generation time. Sample points generated before stationarity was reached were discarded as “burn-in” samples. After the removal of burn-in, a majority-rule consensus topology of all trees was constructed, with the percentage of trees in which each node was found expressed on the tree as posterior probabilities (Table 2).
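The burn-in removal and consensus step can be illustrated with a minimal Python sketch. It is not MrBayes code: it assumes, purely for illustration, that each sampled tree has already been reduced to a set of clades, with each clade stored as a frozenset of taxon names. The function names clade_support and majority_rule and the four-taxon toy samples are hypothetical.

```python
from collections import Counter


def clade_support(sampled_trees, burnin):
    """Fraction of post-burn-in samples in which each clade appears."""
    kept = sampled_trees[burnin:]
    if not kept:
        raise ValueError("burn-in discards every sample")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep only clades found in more than `threshold` of the samples."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy samples over taxa A-D; the first sample is treated as burn-in.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
samples = [{AC}, {AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_support(samples, burnin=1)
for clade, p in sorted(majority_rule(support).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), p)
```

In this toy run, only the clade (A, B) appears in more than half of the retained samples, so it is the only one that would be drawn on a majority-rule consensus tree, labelled with its frequency as the posterior probability.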

Table 2 Sequences used in this study

Results and discussion

Gene content and genome organization

The nearly complete mitogenome of P. xuthus is 13,964 bp long (GenBank accession number: EF621724) and includes the standard 12 protein-coding genes, 19 tRNA genes, the whole rrnL gene and a partial rrnS gene sequence (Table 3). The gene order is identical to that of Drosophila yakuba [44] (Fig. 2), which is conserved in divergent insect orders and even some crustaceans [40, 45, 46], and reflects the presumed ancestral condition for Pancrustacea. Some of the genes overlap, as in other animal mtDNAs. In P. xuthus, overlaps between genes occur 13 times and involve a total of 72 bp. The nucleotide composition of each major coding strand is shown in Table 4.
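Overlap counts of this kind can be derived from an annotation table. The Python sketch below shows one way to do so, assuming 1-based inclusive gene coordinates and considering only neighbouring genes; the function name gene_overlaps and the coordinates in the example are hypothetical and are not taken from Table 3.

```python
def gene_overlaps(genes):
    """Return (gene_a, gene_b, shared_bp) for neighbouring genes that overlap.

    `genes` is a list of (name, start, end) with 1-based inclusive
    coordinates. Only consecutive genes (sorted by start) are compared.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    overlaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        shared = end_a - start_b + 1
        if shared > 0:
            overlaps.append((name_a, name_b, shared))
    return overlaps


# Hypothetical coordinates, for illustration only.
annotation = [("geneA", 1, 100), ("geneB", 94, 160), ("geneC", 161, 200)]
found = gene_overlaps(annotation)
print(len(found), "overlap(s),", sum(bp for _, _, bp in found), "bp in total")
```

Strand is deliberately ignored here, because two genes can share nucleotides whether they are encoded on the same strand or on opposite strands.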

Table 3 Organization of the P. xuthus mitochondrial genome
Fig. 2

Comparison of the gene arrangements of Drosophila yakuba and P. xuthus. Protein and rRNA genes are transcribed from left to right except for genes indicated by underbars. tRNA genes are designated by single-letter amino acid codes; those encoded by the J and N strands are shown above and below the gene map, respectively. UNK, (A + T)-rich region

Table 4 Nucleotide composition (%) of selected Lepidoptera mitochondrial genomes

The A, T and A + T contents of this species are slightly lower than those of other lepidopteran mitogenomes [47, 48]. The likely reason is that we did not obtain the sequence of the control region, which is known as the A + T-rich region. The primers 12s-c1-F & 12s-c1-R were used to amplify the fragment from rrnS to cox1 (Fig. 1), which in most insect mtDNAs includes the A + T-rich region. Unfortunately, we were unable to sequence the amplified fragment directly, and an attempt to sequence it after TA cloning also failed. In the end, only a partial nad2 sequence was obtained from this fragment. We suspect that the failure to sequence the A + T-rich region was due to its tandemly repetitive nature and relatively large size.
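Base composition values such as those in Table 4 are simple percentages. The Python sketch below shows the calculation under the assumption that ambiguous symbols (e.g., N) are excluded from the denominator; the function name base_composition and the ten-base example sequence are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percent A, C, G, T and A+T, counting only unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    pct = {base: 100.0 * counts[base] / total for base in "ACGT"}
    pct["A+T"] = pct["A"] + pct["T"]
    return pct


# Toy sequence: 4 A, 4 T, 1 C and 1 G.
comp = base_composition("ATTAATCGAT")
print(f"A+T = {comp['A+T']:.1f}%")  # A+T = 80.0%
```

Because the control region is by far the most A + T-rich part of an insect mitogenome, leaving it out of the input sequence lowers the overall A + T value, which is the effect discussed above.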

Gene initiation and termination

Eleven PCGs have a putative in-frame ATR (methionine) or ATT (isoleucine) codon as the start signal; these are the triplets that usually initiate metazoan mitochondrial genes. Canonical initiation codons (ATA or ATG), encoding methionine, are used in 10 PCGs (atp8, nad5, cox2, atp6, cox3, nad4, nad4L, nad6, cob and nad1); the exception is nad3 (Table 3), which appears to use a nonstandard start codon, as often happens in animal mtDNAs [21]. The use of ATG as a start codon is limited to protein genes encoded immediately downstream of either another protein gene (cox1–cox2, atp8–atp6, atp6–cox3, nad6–cob, nad4L–nad4; in all such cases the downstream protein gene has an ATG start codon) or a tRNA gene encoded on the opposite strand (trnT–nad4L) [49].
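The start-codon categories used above can be expressed as a small classifier. The Python sketch below is illustrative only: the function name start_signal and the example sequences are hypothetical, and the input is assumed to be a coding sequence given 5′ to 3′ on its own coding strand.

```python
def start_signal(cds):
    """Classify the first triplet of a coding sequence.

    ATR (ATA or ATG) encodes methionine and ATT encodes isoleucine in the
    invertebrate mitochondrial code; anything else is reported as
    non-canonical.
    """
    codon = cds[:3].upper()
    if codon in ("ATA", "ATG"):
        return "ATR (Met)"
    if codon == "ATT":
        return "ATT (Ile)"
    return "non-canonical"


# Made-up sequences, for illustration only.
for name, seq in [("geneX", "ATGAAATTA"), ("geneY", "ATTTTAGGA"), ("geneZ", "CGAAAATGA")]:
    print(name, start_signal(seq))
```

A gene beginning with CGA, such as the third example, falls into the non-canonical category.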

In cox1, however, no typical ATN initiator for PCGs is found at the start site or in the neighbouring tRNATyr. None of the triplets known to act as initiation codons could be found in the vicinity of the presumed cox1 initiation site. The amino acid sequence at the beginning of cox1 is well conserved in all arthropods, so the initiation signal is likely to lie within a window of four to five triplets [50]. In this location, no ATAA signal, which has been proposed to initiate cox1 in Drosophila and Locusta, was present. The hexanucleotide ATTACG flanks the beginning of cox1 and is followed by a CGA triplet in P. xuthus. Similarly, the hexanucleotide ATTTAA has been proposed in the collembolan Tetrodontophora bielanensis [50]. All lepidopteran species examined to date use R (coded by CGA) as the initial amino acid for cox1, and the use of a non-canonical start codon for this gene is common across insects [51, 52]. Because no mRNA expression data are available for P. xuthus, we tentatively designated the hexanucleotide ATTACG as the initiation codon for P. xuthus cox1. It is unclear why the sequence of cox1 is usually the most conserved among metazoan mitochondrial genes. It has been suggested, without experimental evidence, that the anticodon of the initiating N-formylmethionine tRNA might allow the ATTACG sequence to be recognized as a single codon, which would permit translation to start at this sequence [53].

Eight PCGs terminate with the complete termination codon TAA (cox1, cox2, atp8, atp6, cox3, nad5, nad6 and cob). In all other cases, the stop codons are truncated (T or TA), and their functionality is probably restored by post-transcriptional polyadenylation [54]. These abbreviated stop codons are found in PCGs (nad2, nad3 and nad4) that are followed by a downstream tRNA gene, suggesting that the secondary structure of the tRNA genes could be responsible for the correct cleavage of the polycistronic transcript [55]. Thus, it is highly probable that, during mRNA processing, the U is exposed at the end of the mRNA molecule and polyadenylated, forming the complete UAA termination signal. tRNA genes are usually interspersed among PCGs, and their secondary structures act as signals for the cleavage of the polycistronic primary transcript [54, 56]. However, there is also a direct junction between two PCGs (nad4L/nad4), where cleavage signals other than tRNA secondary structures may be involved in processing the polycistronic primary transcript [57]. For three of the genes with complete stop codons (cox1, cox2 and atp8), the termination codon lies entirely within the next gene or tRNA, because the overlaps between them are more than three nucleotides long.
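The proposed completion of truncated stop codons by polyadenylation can be modelled with a short Python sketch. It assumes that the coding sequence is given in frame from its first codon, so that a trailing T or TA is what remains after the last complete codon; the function name complete_stop and the example sequences are hypothetical.

```python
def complete_stop(cds):
    """Model polyadenylation completing a truncated stop codon.

    If the in-frame sequence ends with an incomplete T or TA, append the
    A residues that a poly(A) tail would supply to form TAA. Sequences
    whose length is already a multiple of three are returned unchanged.
    """
    remainder = len(cds) % 3
    if remainder == 0:
        return cds
    tail = cds[-remainder:].upper()
    if tail not in ("T", "TA"):
        raise ValueError(f"trailing {tail!r} cannot be completed to TAA")
    return cds + "A" * (3 - remainder)


# Made-up sequences ending in T, TA and a complete TAA.
for seq in ("ATGAAAT", "ATGAAATA", "ATGAAATAA"):
    print(seq, "->", complete_stop(seq))
```

All three examples yield the same mature sequence, ATGAAATAA, which corresponds to the UAA termination signal at the mRNA level.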

Ribosomal RNAs and transfer RNAs

As in all other metazoan mtDNAs sequenced to date, P. xuthus mtDNA contains genes for both small and large ribosomal subunit RNAs (rrnS and rrnL). Both genes are encoded on the heavy (H) strand and are separated by tRNAVal, an arrangement identical to that in many other metazoans. The inferred rrnL is 1,332 bp long, and the partial rrnS is about 598 bp long.

Nineteen of the tRNAs typical of metazoan mitogenomes were found, and a secondary structure was drawn for each (Fig. 3). Eighteen tRNAs showed the typical clover-leaf secondary structure; the exception was tRNASer(AGN), which could not form a stable stem-loop structure in the DHU arm, as in many other insect tRNASer(AGN)s. Despite this unusual secondary structure (Fig. 3), the tRNA is still predicted to adopt an appropriate tertiary structure, based on the folding rules proposed by Steinberg and Cedergren [58].

Fig. 3

Predicted clover-leaf secondary structures for the 19 tRNA genes of P. xuthus. The tRNAs are labelled with the abbreviations of their corresponding amino acids. Nucleotide sequences run from 5′ to 3′, as indicated for tRNAAla. Dashes (–) indicate Watson–Crick base-pairing, and plus signs (+) indicate G–U base-pairing. Arms of tRNAs (clockwise from top) are the amino acid acceptor (AA) arm, TΨC (T) arm, the anticodon (AC) arm, and dihydrouridine (DHU or D) arm

In 10 cases, tRNA genes overlapped with an adjacent tRNA or other gene, with overlap lengths ranging from 1 to 35 bp. An 8-bp overlap was observed between tRNATrp and tRNACys, as reported by Lessinger et al. [59]; because the two genes are transcribed in opposite directions, they produce separate transcripts, as in other insect species [34, 44, 59, 60].

Phylogenetic analysis

A 22-taxon data set of 3,830 characters (after removal of gap-containing sites), of which 2,539 were variable and 1,989 were parsimony-informative, was analyzed using the Bayesian method. The topology of the Bayesian tree is shown in Fig. 4.
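The site categories reported above follow standard definitions: a column is variable if it contains more than one state, and parsimony-informative if at least two states each occur in at least two taxa. The Python sketch below counts them for a toy amino acid alignment; the function name classify_sites and the four-sequence alignment are hypothetical and unrelated to the actual data set.

```python
from collections import Counter


def classify_sites(alignment):
    """Count gap-free, variable and parsimony-informative columns.

    `alignment` is a list of equal-length aligned sequences. Columns
    containing a gap ('-') in any sequence are skipped entirely.
    """
    n_gap_free = n_variable = n_informative = 0
    for column in zip(*alignment):
        if "-" in column:
            continue
        n_gap_free += 1
        counts = Counter(column)
        if len(counts) > 1:
            n_variable += 1
        # Informative: at least two states, each seen in at least two taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            n_informative += 1
    return n_gap_free, n_variable, n_informative


toy = ["MKLV-",
       "MKIVA",
       "MRIVA",
       "MRLTA"]
print(classify_sites(toy))  # (4, 3, 2)
```

In the toy alignment, the fourth column is variable but not informative because the state T occurs in a single sequence, and the fifth column is dropped because it contains a gap.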

Fig. 4

Bayesian tree based on amino acid sequences of the 12 protein-coding genes. Numbers at nodes indicate posterior probabilities

The most striking result of this analysis is that Insecta (Microcoryphia + Zygentoma + Pterygota), Microcoryphia, Zygentoma, Pterygota, Diptera, Lepidoptera and Coleoptera are each monophyletic. Within Pterygota, the Dictyoptera and the holometabolan orders Diptera, Lepidoptera and Coleoptera are monophyletic, but Holometabola as a whole is not, because the Coleoptera (Pyrocoelia, Tribolium and Crioceris) is sister to the Hemiptera (Philaenus) of the Hemimetabola. These results confirm most recent molecular analyses [41, 61], but the finding that the Holometabola do not form a monophyletic clade is in open disagreement with most morphological [62, 63] and molecular [64] analyses.

The seven lepidopteran mitogenome sequences represent four superfamilies within the lepidopteran suborder Ditrysia: B. mandarina and A. pernyi for the Bombycoidea, P. xuthus and C. raphaelis for the Papilionoidea, O. furnacalis and O. nubilalis for the Pyraloidea, and A. honmai for the Tortricoidea. This phylogenetic analysis recovered the well-supported monophyletic groups Bombycoidea, Papilionoidea, Pyraloidea and Obtectomera, in good agreement with the traditional classification system [65, 66]. Although further studies with more diverse species are needed, the result supports the relationship (Apoditrysia (Obtectomera (Macrolepidoptera))) [67, 68].