Introduction

The gene content of animal mitochondrial genomes is highly conserved: a typical genome encodes 13 proteins, 2 ribosomal RNAs (rRNAs), and 22 transfer RNAs (tRNAs). The gene order is also highly conserved among most vertebrates [1]. However, gene loss and gene rearrangement have been found in some taxa [2]. Comparisons of mitochondrial systems are useful for modeling genome evolution and for phylogenetic inference [3]. The features compared include gene content and gene arrangement, base composition, modes of replication and transcription, secondary structures of proteins, tRNAs, and rRNAs, and variations in the genetic code. These features are currently more accessible for study in mitochondrial genomes, which are much smaller and simpler than nuclear genomes [3, 4]. Over the last 10 years, comparisons of mitochondrial genome sequences and gene arrangements have been employed as powerful tools for resolving ancient phylogenetic relationships [2, 3, 5–7].

According to the traditional morphology-based classification, Crocodylia is divided into three major groups: the family Alligatoridae (Alligator, Caiman, Melanosuchus, and Paleosuchus), the family Crocodylidae (Crocodylus, Osteolaemus, and Tomistoma), and the family Gavialidae, which includes only one species, Gavialis gangeticus [8]. Crocodylus siamensis belongs to the genus Crocodylus in the family Crocodylidae. A freshwater crocodilian (a group that also includes alligators, caimans and the gharial), Crocodylus siamensis is one of the most endangered crocodiles in the wild, although it is widely bred in captivity [9]. At present, the mitochondrial genomes of seven crocodilian species have been completely sequenced [6, 7, 10–13], but these data are not sufficient to resolve the phylogenetic relationships of crocodilians. We therefore sequenced the complete mitochondrial genome of Crocodylus siamensis and analyzed its gene organization.

Materials and methods

Samples and sequencing

Crocodylus siamensis samples were obtained from the specimen storeroom of the College of Life Sciences, Anhui Normal University. Fresh blood was stored at −80°C.

Mitochondrial DNA was extracted following the procedure described by Arnason et al. [14]. The extracted DNA was diluted tenfold and stored at −20°C until use as a template for the polymerase chain reaction (PCR). The primers for PCR and sequencing were designed based on the complete mtDNA sequences of Alligator mississippiensis (GenBank accession number Y13113), Alligator sinensis (GenBank accession number AF511507), and Caiman crocodilus (GenBank accession number AJ404872). Using ClustalX 1.8 [15], we designed 66 primers (Table 1) for PCR amplification (synthesized by Shanghai Sangon Biotechnology Co., Ltd., Shanghai, China). The amplified products ranged from 900 bp to 1200 bp in length. All overlaps between the products were confirmed by direct sequencing of PCR products amplified from the purified mtDNA template. The whole mitochondrial genome was read at least twice.

Table 1 Sequencing primers used in the analysis of Crocodylus niloticus and Crocodylus siamensis mitochondrial genomes

PCR reactions were conducted on a PTC-200 thermal cycler under the following conditions: an initial denaturation step at 95°C (3 min), followed by 34 cycles of 95°C (30 s), 49–60°C (30 s), and 72°C (90 s), and a final extension at 72°C for 10 min. The PCR products were separated by electrophoresis in 1% agarose gels, purified with a PCR Cleanup Kit (V-gene, Biotechnology Limited, Hangzhou, China), and sequenced on an ABI 3730 (Shanghai Sangon Biotechnology Co., Ltd., Shanghai, China). In this way we obtained the complete mtDNA sequence of Crocodylus siamensis. The nucleotide sequence has been deposited in the DDBJ/EMBL/GenBank nucleotide sequence databases under accession number DQ353946.

Gene identification and sequence analysis

Using DNAStar (Version 5.01) and Sequin (Version 5.35), we identified 12 heavy-strand-encoded protein-coding genes and one light-strand-encoded protein-coding gene. Using tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNA Scan-SE), we found 21 transfer RNA (tRNA) genes. The sequences were aligned with ClustalX 1.8 [15] and then corrected with DNASIS 3.5 (Hitachi). After the positions of the tRNA-Ser (AGC) and rRNA genes had been checked, the positions of the remaining genes were revised and determined.

Results and discussion

Characteristics of the crocodilian mitochondrial genomes

The total length of the mtDNA molecule is 16836 bp.

Thirteen protein-coding genes, 22 tRNA genes, 2 rRNA genes, and one control region were identified in the mitochondrial genome of Crocodylus siamensis (Fig. 1).

Fig. 1

Gene organization of vertebrate mtDNAs. The organization found in Crocodylus siamensis is detailed in Table 2. Genes encoded by the heavy strand are shown outside the circle, whereas those encoded by the light strand are shown inside the circle. Gene abbreviations used are 12S, 12S rRNA; 16S, 16S rRNA; ND1-6, NADH dehydrogenase subunits 1–6; COI–III, cytochrome oxidase subunits I–III; AT6 and AT8, ATPase subunits 6 and 8; cyt b, cytochrome b. tRNA genes are denoted by the one-letter codes of the amino acids they specify (L1 and L2 for the leucine tRNA genes specifying UUR and CUN codons, respectively, and S1 and S2 for the serine tRNA genes specifying AGY and UCN codons, respectively). OH and OL stand for the heavy-strand replication origin and the light-strand replication origin, respectively.

Table 2 Locations of genes and noncoding regions in the mitochondrial genome of Crocodylus siamensis

The overall base composition of the H-strand is A, 32.01% (5389 bp); T, 24.76% (4168 bp); C, 28.47% (4793 bp); and G, 14.77% (2486 bp). The relative order of nucleotide composition is A > C > T > G, with A the most frequent base (32.01%) and G the least frequent (14.77%). As reported for other vertebrate mitochondrial genomes [16, 17], the base composition is skewed, with more A–T base pairs (56.77%) than G–C base pairs (43.23%). The AT skew is 0.13, which is higher than the GC skew (−0.32). As in other vertebrate mtDNAs, overlaps between genes and noncoding bases between genes were also observed. In Crocodylus siamensis, genes overlap at six positions, for a total of 71 bp. Of these 71 bp, the ATPase 8 and ATPase 6 genes share a 22-bp overlap, and NADH5 and NADH6 overlap by 44 bp. Noncoding spacer nucleotides total 222 bp. The longest noncoding spacer, 56 bp, is located between Cyt b and tRNA-Thr.
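
The composition and skew values can be checked directly from the reported base counts. The following is a minimal sketch in Python, assuming the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the text does not state which formula was used, and the function and variable names are illustrative only.

```python
# H-strand base counts as reported in the text.
counts = {"A": 5389, "T": 4168, "C": 4793, "G": 2486}


def composition(base_counts):
    """Return the percentage of each base relative to the total count."""
    total = sum(base_counts.values())
    return {base: 100.0 * n / total for base, n in base_counts.items()}


def skew(x, y):
    """Strand skew between two base counts: (x - y) / (x + y).

    This definition is an assumption; it is the usual one, but the
    source does not spell it out.
    """
    return (x - y) / (x + y)


total = sum(counts.values())
percent = composition(counts)
at_content = percent["A"] + percent["T"]
at_skew = skew(counts["A"], counts["T"])
gc_skew = skew(counts["G"], counts["C"])

print(f"genome length: {total} bp")
for base in "ATCG":
    print(f"{base}: {percent[base]:.2f}%")
print(f"A+T: {at_content:.2f}%")
print(f"AT skew: {at_skew:.2f}, GC skew: {gc_skew:.2f}")
```

Under these assumed definitions, the four counts sum to the reported genome length, and the rounded skews agree with the values given in the text.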

Protein-coding genes

The 13 protein-coding genes of Crocodylus siamensis total 11380 bp. The longest is NADH5 (1854 bp), and the shortest is ATPase 8 (162 bp).

In Crocodylus siamensis mtDNA, ATG is the translation initiation codon in 7 of the 13 protein-coding genes. The start codon ATA occurs in NADH1, NADH3, and NADH5, while COX I and NADH4L begin with the nonstandard start codons GTG and ACC, respectively. In the other crocodilians sequenced, nonstandard start codons (ACC, AGC, ACT, ACA, ATC) also occur in NADH4L, probably as a result of a high rate of evolution. The standard termination codon TAA occurs in 10 genes, whereas AGG is found in NADH1. The incomplete termination codon T occurs in COX III and Cyt b.
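
To make the distinction between complete and incomplete termination codons concrete, the sketch below classifies the 3′ end of a coding sequence. It assumes the vertebrate mitochondrial genetic code (stop codons TAA, TAG, AGA, and AGG) and treats a trailing T or TA as an incomplete stop, which is presumably completed to TAA by polyadenylation of the transcript. The function name and the toy sequences are hypothetical and are not taken from the Crocodylus siamensis genome.

```python
# Stop codons of the vertebrate mitochondrial genetic code (assumed here).
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds):
    """Classify the termination signal at the 3' end of a coding sequence.

    Returns the complete stop codon, "T--" or "TA-" for an incomplete
    stop, or "none" if no termination signal is recognised.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        # Length is a multiple of three: look for a complete stop codon.
        last_codon = cds[-3:]
        return last_codon if last_codon in STOP_CODONS else "none"
    # One or two bases are left after the last full codon.
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return tail + "-" * (3 - remainder)
    return "none"


# Toy sequences for illustration only (not real gene sequences).
examples = {
    "complete TAA": "ATGGCATAA",
    "complete AGG": "ATGGCAAGG",
    "incomplete T": "ATGGCAT",
    "no stop": "ATGGCAGCA",
}
results = {label: classify_stop(seq) for label, seq in examples.items()}
for label, result in results.items():
    print(f"{label}: {result}")
```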

tRNA and rRNA genes

The sequences of all 22 tRNA genes can be folded into the canonical cloverleaf secondary structure, with the exception of tRNA-Ser (AGY), which lacks the “DHU” arm. The tRNAs range in size from 61 to 76 nucleotides. Figure 2 shows the secondary structures of all 22 tRNAs.

Fig. 2

Inferred secondary structure of tRNA in Crocodylus siamensis

Noncoding sequences

The control region of Crocodylus siamensis mtDNA is 1108 bp long and is located between the tRNA-Phe and 12S rRNA genes (Fig. 1). The general structure and conserved sequences of the crocodilian mitochondrial control region were comprehensively analyzed by Ray and Densmore [18]. In crocodilians, domain II is the most conserved part of the D-loop and contains several conserved sequence boxes characterized in other vertebrates. Domain III is the most variable part of the D-loop and contains several interesting sequence motifs, including tandemly repeated sequences and a long poly-A (Crocodylidae) or poly-C (Alligatoridae) tract. Domain I, which tends to be shorter than the same region in mammals and birds, contains sequences similar in structure to both the goose hairpin and the termination-associated sequences (TAS), and so it is more conserved than domain III.

The origin of L-strand replication (OLR), which in vertebrates is usually located within a cluster of five tRNA genes, tRNATrp-tRNAAla-tRNAAsn-tRNACys-tRNATyr (WANCY), is absent from the mtDNA of Crocodylus siamensis. The absence of OLR thus appears to be a common characteristic of all crocodilians. The region between tRNA-Asn and tRNA-Cys, which in other vertebrates has the potential to fold into a stable stem-loop secondary structure, consists of only 6 bp (“AATATT”) in Crocodylus siamensis. The absence of OLR is probably the result of tRNA rearrangement.

The control region of Crocodylus siamensis mtDNA, like that of other vertebrates, is the most variable part of the mitochondrial genome. Nevertheless, this variable region has some typical characteristics, for instance microsatellite sequences and repetitive sequences (5′-CAACCTAGGCCAAAATAGGAAGAAATTTTAAAAAATTTT-3′, a 39-bp unit repeated four times). The control region also contains the origin of heavy-strand (H-strand) replication and the promoters for both heavy-strand and light-strand (L-strand) transcription [19–21].
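
A repeat array of this kind can be located with a simple scan for back-to-back copies of the repeat unit. The sketch below is illustrative only: the function name is hypothetical, and the test sequence is a toy construct (invented flanking bases around four copies of the reported 39-bp unit), not the actual control-region sequence.

```python
def max_tandem_copies(sequence, motif):
    """Return the largest number of consecutive, back-to-back copies of
    `motif` found anywhere in `sequence` (0 if the motif is absent)."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    start = sequence.find(motif)
    while start != -1:
        # Count how many copies follow one another from this start point.
        copies = 0
        pos = start
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
        # Continue the search from the next possible start position.
        start = sequence.find(motif, start + 1)
    return best


# Repeat unit reported for the Crocodylus siamensis control region.
MOTIF = "CAACCTAGGCCAAAATAGGAAGAAATTTTAAAAAATTTT"

# Toy sequence: made-up flanks around four copies of the unit.
toy_sequence = "GATTACA" + MOTIF * 4 + "TTGACC"

copies = max_tandem_copies(toy_sequence, MOTIF)
print(f"unit length: {len(MOTIF)} bp, tandem copies: {copies}")
```

An exact-match scan is the simplest choice here; real repeat arrays often contain diverged copies, for which a dedicated tandem-repeat finder that tolerates mismatches would be more appropriate.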