Introduction

Vertebrate mitochondrial DNA (mtDNA) is a circular molecule of 16–19 kb that contains 37 genes encoding 22 transfer RNAs (tRNAs), 13 proteins and 2 ribosomal RNAs (rRNAs), together with a putative control region (CR) [25, 54]. Owing to its relatively simple genetic structure, maternal mode of inheritance (in most situations), and high rate of evolution, mitochondrial DNA is a useful tool in the study of phylogenetics, molecular evolution and conservation genetics [11, 15, 18, 36].

The bighead croaker, Collichthys niveatus, is distributed in the Western Pacific, Yellow Sea and East China Sea, and inhabits subtropical coastal waters over bottoms of sand or muddy sand [8]. It is very popular with Chinese consumers and is one of the most important commercial fishes in China. Accordingly, intensive studies have been carried out on its feeding habits, the species composition and abundance of ichthyoplankton, and its early life history [49, 52]. However, almost no molecular studies have so far been conducted on this species, and its mitochondrial genome has not yet been analyzed.

C. niveatus belongs to the family Sciaenidae, which comprises 70 genera and about 270 species [27]. Fish of this family are popularly known as croakers or drums because of the sound they produce using muscles associated with the gas bladder. Numerous studies of the morphological and molecular phylogenetic relationships among Sciaenidae species have been conducted. Based on the characters of the gas bladder, sagitta and mental pores, Zhu et al. [59] divided the family into seven subfamilies: Johniinae, Megalonibinae, Bahabinae, Sciaeninae, Otolithinae, Argyrosominae and Pseudosciaeniae. Johnius belangerii was placed as a basal species of Sciaenidae. Pseudosciaeniae was composed of Miichthys, Larimichthys (Pseudosciaenidea) and Collichthys, and was regarded as a close relative of Argyrosominae. The genera Collichthys and Larimichthys were also supported by many other morphological studies [20, 42, 43].

Compared with morphological studies, molecular phylogenies have given different results. Tong et al. [46] investigated the phylogenetic relationships of 10 Sciaenidae fishes using partial 16S rRNA gene sequences. Their analysis revealed a non-monophyletic Larimichthys, in which L. polyactis grouped with Collichthys before grouping with L. crocea. Chen [4] also used 16S rRNA characters and recovered a non-monophyletic Collichthys. However, in a phylogeny proposed by Meng et al. [23] based on 16S rRNA gene sequences, monophyletic Larimichthys and Collichthys were well supported.

Because the phylogenetic position of C. niveatus inferred from these limited data is unstable, the aim of this study was to obtain new sequence data for C. niveatus for use in studies of Sciaenidae phylogeny and population genetic structure. We present the first complete mitochondrial genome of C. niveatus. The mitogenome sequence is described and compared with those of its coordinal species. Finally, we performed neighbor-joining (NJ), minimum evolution (ME), and maximum likelihood (ML) analyses on three datasets (COI, 16S and Cytb) to gain insight into the position of C. niveatus within the Sciaenidae.

Materials and methods

Fish sampling and DNA extraction

Specimens of C. niveatus were captured at the Zhoushan fishing ground (Zhejiang, China). Part of the dorsal fin of one individual was excised. Genomic DNA was extracted using the conventional SDS/proteinase K method followed by organic extraction and ethanol precipitation [37].

PCR amplification

Thirteen sets of primers were used to amplify contiguous, overlapping segments of the complete mitochondrial genome of C. niveatus (Table 1). The 50 μl PCR mixture contained 0.2 μM of each primer, 5.0 μl of 10 × Taq Plus polymerase buffer, 0.2 mM dNTPs, 2 units of proofreading Taq Plus DNA polymerase (TIANGEN), and 1 μl of DNA template. PCR was performed on a PTC-200 thermal cycler. The cycling conditions were as follows: initial denaturation at 94°C for 4 min; 35 cycles of denaturation at 94°C for 50 s, annealing at 60°C for 60 s, and extension at 72°C for 2–3 min; and a final extension at 72°C for 10 min. The PCR products were electrophoresed on a 1% agarose gel to check integrity and visualized with the Molecular Imager Gel Doc XR system (BioRad, USA). The PCR products were purified using a QIAEX II Gel Extraction Kit (Qiagen).

Table 1 PCR primers used in the analysis of the bighead croaker C. niveatus
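
As a rough planning aid, the nominal hold time of the cycling program described above can be worked out from the stated step durations. The sketch below is only arithmetic on those values; it ignores ramp times between temperatures, and the function name is hypothetical.

```python
def pcr_program_seconds(extension_s, cycles=35):
    """Nominal hold time of the cycling program, ignoring ramp times."""
    initial_denaturation = 4 * 60          # 94°C for 4 min
    per_cycle = 50 + 60 + extension_s      # denaturation + annealing + extension
    final_extension = 10 * 60              # 72°C for 10 min
    return initial_denaturation + cycles * per_cycle + final_extension

# The extension step is given as 2-3 min, so report both ends of the range.
short_run = pcr_program_seconds(2 * 60)
long_run = pcr_program_seconds(3 * 60)
print(f"{short_run / 60:.1f} to {long_run / 60:.1f} min of hold time")
```

The real run time on a given instrument will be longer, since heating and cooling between steps are not included.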

Cloning, sequencing and sequence analysis

The purified fragments were ligated into pMD18-T vectors (Takara) and transformed into TOP10 cells (TIANGEN) according to the standard protocol. Positive clones were screened by PCR with M13+/− primers. Amplicons were sequenced on an ABI 3730 automated sequencer with M13+/− primers. The sequence fragments obtained were edited and assembled into contigs in Sequencher™ (Gene Codes, Ann Arbor, MI, USA) to produce the complete mitochondrial genome.

Annotation of protein-coding and ribosomal RNA (rRNA) genes and determination of their boundaries were carried out with reference to Sciaenidae sequences available in GenBank. Most tRNA genes and their cloverleaf secondary structures were identified with tRNAscan-SE 1.21 [22]. The remaining tRNA genes, which could not be found by tRNAscan-SE, were identified by sequence homology, secondary structure and specific anticodons. Nucleotide base frequencies and codon usage of protein-coding genes were determined using MEGA 4 [41]. The complete mitochondrial genome (mitogenome) sequence of C. niveatus was deposited in GenBank under accession number HM219223.
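
For readers who want to reproduce the base-frequency step outside MEGA, a minimal sketch is given below. It is not the MEGA 4 implementation; the function name and the toy fragment are hypothetical, and ambiguous bases are simply skipped, which is an assumption.

```python
from collections import Counter

def base_frequencies(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

toy = "ATGGCCTTAACC"  # hypothetical fragment, not taken from HM219223
freqs = base_frequencies(toy)
print({b: round(p, 1) for b, p in freqs.items()})
print("A+T =", round(freqs["A"] + freqs["T"], 1))
```

Applied to the full deposited sequence, the same function should give values comparable to those reported for the whole genome, provided ambiguous positions are handled the same way.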

Phylogenetic analysis

To determine the phylogenetic position of C. niveatus, three data sets (COI, 16S and Cytb) were assembled from sequences available in GenBank/EMBL/DDBJ (Table 2). For each data set, the nucleotide sequences were aligned using Clustal X with the default settings [44], and the alignments were then checked and adjusted manually. Phylogenetic trees were estimated for each data set using the NJ and ME methods as implemented in MEGA 4. The GTR + I + G model was selected for ML analyses by ModelTest 3.7 [34]. ML analyses were conducted using PhyML 3.0 [9]. Robustness of the inferred trees was evaluated by bootstrap analysis with 1000 replications [7].

Table 2 List of species used in this study, with GenBank/EMBL/DDBJ accession numbers
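
The bootstrap step can be illustrated with a small sketch that resamples alignment columns with replacement and recomputes pairwise p-distances for each replicate. This is not how MEGA 4 or PhyML compute support: instead of rebuilding a tree, it uses a simplified stand-in (how often two taxa form the strictly closest pair), and the function names and toy alignment are hypothetical.

```python
import random
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def closest_pair_support(alignment, a, b, replicates=1000, seed=0):
    """Percentage of replicates in which a and b are the strictly closest pair."""
    rng = random.Random(seed)
    target = frozenset((a, b))
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        dists = {frozenset(p): p_distance(rep[p[0]], rep[p[1]])
                 for p in combinations(rep, 2)}
        if all(dists[target] < d for pair, d in dists.items() if pair != target):
            hits += 1
    return 100.0 * hits / replicates

# Hypothetical toy alignment; real input would be the aligned COI, 16S or Cytb data.
toy = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTACGTACGAACGTACGT",
    "taxon_C": "ACGAACTTACGAACGTTCGT",
    "taxon_D": "TCGAACTTAGGAACCTTCGA",
}
print(closest_pair_support(toy, "taxon_A", "taxon_B"))
```

In a full analysis, each replicate alignment would be passed to the tree-building method, and support for a clade would be the percentage of replicate trees containing it.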

Results

Genomic organization

The total length of the C. niveatus mitogenome is 16469 bp. The gene content (13 protein-coding genes, 2 rRNAs, 22 tRNAs), gene order and coding strand of each gene in the bighead croaker mitogenome conform to the vertebrate consensus (see Fig. 1 in Supplementary Material; Table 3) [48, 57]. The overall base composition is T, 25.0%; C, 31.3%; A, 27.6%; G, 16.2% (Table 4). The A + T content (52.6%) is slightly higher than the G + C content, which is similar to other fishes [1, 47]. An anti-G bias (8.5%) was observed at the third codon position of the protein-coding genes, as reported in other vertebrate mitogenomes [17, 50].

Table 3 Organization of the mitochondrial genome of C. niveatus
Table 4 Base composition of C. niveatus mitochondrial genome
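
The A + T and G + C contents follow directly from the reported base composition. The short check below uses only the percentages stated in the text; the variable names are hypothetical.

```python
# Percentages as reported for the whole mitogenome.
composition = {"T": 25.0, "C": 31.3, "A": 27.6, "G": 16.2}

at = composition["A"] + composition["T"]
gc = composition["G"] + composition["C"]
print(f"A+T = {at:.1f}%, G+C = {gc:.1f}%, total = {at + gc:.1f}%")
```

The four values sum to slightly more than 100% because each percentage is rounded to one decimal place.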

Protein-coding genes

The sizes of the protein-coding genes in the C. niveatus mitogenome are similar to those of their orthologs in other fishes. Among the 13 protein-coding genes, two reading-frame overlaps occur on the same strand (Table 3): ATPase8 and ATPase6 overlap by ten nucleotides, and ND4L and ND4 overlap by seven nucleotides. This is a common vertebrate feature and has been found in other bony fishes [24]. At the second codon position, pyrimidines (T + C = 67.8%) are overrepresented in comparison with purines, owing to the hydrophobic character of the proteins [26]. All protein-coding genes start with an ATG codon. Variation in stop codons seems to be a common tendency in fish mitochondrial genomes [14, 53]. Open reading frames of the bighead croaker end with TAA (ATPase8, ATPase6, ND4L, ND5 and ND6), TAG (ND1) or AGA (COI), or with incomplete stop codons, either TA (ND2, COIII) or T (COII, ND3, ND4, and Cytb). Incomplete stop codons are common among vertebrate mitochondrial genomes, and it appears that complete TAA stop codons are created via posttranscriptional polyadenylation [30].
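
The way an incomplete stop codon becomes a full TAA can be pictured with a minimal sketch: the A residues added by polyadenylation fill the missing positions. The function name is hypothetical, and the gene list simply restates the incomplete stop codons given above.

```python
def complete_stop_codon(partial):
    """Pad an incomplete stop codon (T or TA) with A residues, as poly(A) addition would."""
    if partial not in ("T", "TA", "TAA"):
        raise ValueError(f"not a prefix of TAA: {partial!r}")
    return partial + "A" * (3 - len(partial))

# Genes reported in the text as ending with an incomplete stop codon.
incomplete = {"ND2": "TA", "COIII": "TA", "COII": "T", "ND3": "T", "ND4": "T", "Cytb": "T"}
for gene, stop in incomplete.items():
    print(gene, stop, "->", complete_stop_codon(stop))
```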

Codon usage in the 13 protein-coding genes identified in C. niveatus is shown in Table 5. For amino acids with a fourfold degenerate third position, codons ending in C are the most frequent, followed by codons ending in A and T, for alanine, proline, serine and threonine. However, for arginine, glycine, leucine and valine, A is more frequent than C. Among twofold degenerate codons, C appears to be used more than T in pyrimidine codon families, whereas purine codon families end mostly with A. Except for arginine, G is the least common third-position nucleotide in all codon families. All these features are very similar to those observed in other vertebrates [19, 51].

Table 5 Codon usage in C. niveatus mitochondrial protein-coding genes
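
Third-position preferences in the fourfold degenerate families can be tallied with a short script. The sketch below is not the MEGA 4 procedure; it groups codons by their first two bases for the eight families named in the text, and the function name and toy reading frame are hypothetical.

```python
from collections import Counter, defaultdict

# Fourfold degenerate families named in the text, keyed by the first two codon bases.
FOURFOLD = {"GC": "Ala", "CC": "Pro", "TC": "Ser", "AC": "Thr",
            "CG": "Arg", "GG": "Gly", "CT": "Leu", "GT": "Val"}

def third_position_usage(cds):
    """Count third-position bases per fourfold family in an in-frame coding sequence."""
    cds = cds.upper()
    usage = defaultdict(Counter)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        family = FOURFOLD.get(codon[:2])
        if family and codon[2] in "ACGT":
            usage[family][codon[2]] += 1
    return usage

toy_cds = "ATGGCCGCAGCCCTACTAGGATAA"  # hypothetical reading frame, not from HM219223
usage = third_position_usage(toy_cds)
for family in sorted(usage):
    print(family, dict(usage[family]))
```

Run over the concatenated protein-coding genes, a tally of this kind would show whether C or A dominates the third position in each family.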

Non-coding regions

The origin of light strand replication (OL) of C. niveatus is located in a cluster of five tRNA genes (WANCY), as in other vertebrates [29, 55]. This region is 57 bp long and has the potential to fold into a stem-loop secondary structure. Folding of the OL requires eight nucleotides of tRNA-Asn and 12 nucleotides of tRNA-Cys, including the conserved sequence (5′-GCCGG-3′) at the base of the stem-loop structure.

The mitogenome of C. niveatus contains a non-coding region of 799 bp (Table 3). The CR has a higher A + T content (63.1%) than the average value of the whole genome (52.6%) of C. niveatus, a feature that has been reported in all Percoidei fishes. Structurally, the CR is divided into three domains: the termination associated sequence domain, the central conserved sequence block domain and the conserved sequence block domain [38, 39]. By comparison with the recognition sites in some reported Percoidei species, the conserved blocks ETAS and CSB-1, -2, and -3 can be easily identified in the control regions of Collichthys species, whereas no CSB-F, -E or -D was detected (Fig. 1). The sequences corresponding to CSB-F, CSB-E and CSB-D could not be aligned with the respective sequences in the control regions of Miichthys miiuy (HM447240) (see Fig. 2 in Supplementary Material) and Cynoscion acoupa [35] (see Fig. 3 in Supplementary Material). The lack of the typical sequences of the central conserved sequence block was also confirmed in L. crocea and L. polyactis [6]. In the termination associated sequence domain, an ETAS was identified with the sequence TATATATATGTATTATCAAC ATACAATTATATTA ACCAT; its motif sequence is TATAT, and it contains one palindromic sequence, ATGTA. Moreover, another motif (TACAT) was detected downstream of the ETAS. Although the central conserved sequence block is absent, C. niveatus does contain a GTGGGG box, which is a typical feature of CSB-E in teleosts. In the conserved sequence block domain, CSB-1, CSB-2 and CSB-3 of C. niveatus were found at the 3′-end of the control region; their sequences are ATTTTAAGTATTCAAGTGCATAA, TAGACCCCCCCCTACCCCCCCC and TAAAACCCCATAAAACA, respectively (Fig. 1). CSB-1 lies at the start of the conserved sequence block domain and is less conserved than CSB-2 and CSB-3.
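
The motifs described in the ETAS can be located with a simple string search. The sketch below uses the ETAS sequence exactly as printed above, with the spaces removed on the assumption that they are typesetting breaks; the helper names are hypothetical. It also shows that ATGTA is the reverse complement of the TACAT motif.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_all(seq, motif):
    """0-based start positions of all (possibly overlapping) motif occurrences."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

# ETAS as printed in the text, joined without the spaces.
etas = "TATATATATGTATTATCAAC" "ATACAATTATATTA" "ACCAT"

print("TATAT at", find_all(etas, "TATAT"))
print("ATGTA at", find_all(etas, "ATGTA"))
print("reverse complement of TACAT:", reverse_complement("TACAT"))
```

The same search applied to the full control region would be one way to locate the downstream TACAT motif and the GTGGGG box.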

Fig. 1

Alignment of complete sequences of the mtDNA control regions of C. niveatus, C. lucida, L. crocea and L. polyactis (represented by CN, CL, LC, and LP, respectively). The ETAS, CSB-1, CSB-2, and CSB-3 are shadowed and marked

Transfer and ribosomal RNA genes

The mitochondrial genome of C. niveatus encodes 22 tRNA genes, ranging from 67 to 75 bp, which can be folded into the typical cloverleaf secondary structures with several mismatched pairs (see Fig. 4 in Supplementary Material). Among these tRNAs, we identified two forms of tRNA-Leu (UUR and CUN) and of tRNA-Ser (UCN and AGY) (see Fig. 1 in Supplementary Material; Table 3). The three tRNA clusters (IQM, WANCY, and HSL) are well conserved in C. niveatus, as in typical vertebrate mitogenomes.

Although putative boundaries for the two rRNA genes in the mitogenome have been identified, they cannot be accurately determined until transcript mapping is carried out. As in other vertebrate mitogenomes, these genes are located between tRNA-Phe and tRNA-Val and between tRNA-Val and tRNA-Leu(UUR). Preliminary assessment of their secondary structures indicated that the sequences could be reasonably superimposed on the proposed secondary structures of carp 12S rRNA and loach 16S rRNA, respectively. The lengths of the 12S and 16S rRNA genes are 949 and 1698 bp, respectively. The base composition of the two rRNA gene sequences is T, 26.2%; C, 22.6%; A, 28.1%; G, 23.1%. The overall A + T content of the ribosomal RNA genes is 54.3%, slightly higher than in other bony fishes [21].

Phylogenetic analysis of Sciaenidae and the position of C. niveatus

Phylogenetic analysis of 16S rRNA using NJ, ME, and ML methods revealed three distinct clades (Fig. 2). Clade I included Johnius (Johnius dussumieri and Johnius elongatus), which was distant from the other Sciaenidae fishes. Clade II included Otolithes (Otolithes ruber and Otolithes cuvieri), which was more closely related to Collichthys, Larimichthys and Miichthys; these three genera were placed in clade III. Although Johnius belangerii and Johnius borneensis did not form a clade in the analyses based on the COI gene, Johnius was strongly supported as the most basal genus within Sciaenidae in this study, and Otolithes was again more closely related to Collichthys, Larimichthys and Miichthys than Johnius was, with high bootstrap support (Fig. 3).

Fig. 2

Phylogenetic tree of the Sciaenidae based on the partial 16S rRNA gene. Branch lengths and topology are from the Minimum Evolution analysis. Numbers above branches specify bootstrap percentages for ME (1000 replications), NJ (1000 replications), and ML (1000 replications) analyses. Heterodontus francisci and Lampetra fluviatilis, belonging to the orders Heterodontiformes and Petromyzoniformes, respectively, were used as outgroup taxa

Fig. 3

Phylogenetic tree of the Sciaenidae based on partial COI gene. Branch lengths and topology are from the Neighbor Joining analysis. Numbers above branches specify bootstrap percentages for NJ (1000 replications), ME (1000 replications), and ML (1000 replications) analyses. H. francisci and L. fluviatilis were used as outgroups. Bootstrap values are given for each branch

On the basis of morphological characters, Collichthys, Larimichthys and Miichthys have been grouped as an independent subfamily, Pseudosciaeniae; however, the monophyly of Pseudosciaeniae was not supported by this study. M. miiuy was placed in a clade with Nibea maculata, Chrysochir aureus, Protonibea diacanthus and Pennahia anea, whereas Otolithoides biauritus grouped with Collichthys and Larimichthys (Fig. 3). Analyses based on the 16S rRNA gene also showed a non-monophyletic Pseudosciaeniae, in which M. miiuy formed an independent clade (Fig. 2). In contrast to these findings from the COI and 16S rRNA analyses, the monophyly of Pseudosciaeniae was supported in the analyses based on the partial Cytb gene, although the bootstrap support was very poor (Fig. 4).

Fig. 4

Phylogenetic tree of the Sciaenidae based on the partial Cytb gene. Branch lengths and topology are from the Neighbor Joining analysis. Numbers above branches specify bootstrap percentages for NJ (1000 replications), ME (1000 replications), and ML (1000 replications) analyses. H. francisci was used as the outgroup taxon

The relationships among Collichthys and Larimichthys taxa derived from the NJ, ME, and ML analyses of the COI, 16S rRNA and Cytb sequences were identical. C. niveatus was most closely related to L. polyactis; these two species grouped with L. crocea and then with C. lucida. This result conflicts with the morphological classification, as neither Larimichthys nor Collichthys was supported as monophyletic.

Discussion

General features of mitogenomes of C. niveatus and its coordinal species

The complete mitochondrial genome of C. niveatus is the third to be reported for a member of the family Sciaenidae. The mitochondrial genome is 16469 bp in length and consists of 37 genes (13 protein-coding genes, 2 rRNAs and 22 tRNAs); its length is nearly identical to those of L. crocea and L. polyactis, longer than that of Collichthys lucida and shorter than that of M. miiuy (Table 6). The length variation among the mitogenomes of these species is largely due to the number and size of non-coding spacers and the length of the main non-coding region. The mitochondrial genome of C. niveatus has an overall A + T content of 52.6%, identical to the value in L. polyactis (GU586227) but lower than in L. crocea and C. lucida ([6]; NC_014350). As in other vertebrates, most genes of C. niveatus are encoded on the H-strand, with only ND6 and eight tRNA genes encoded on the L-strand. In addition, all genes are nearly identical in length to those of other Sciaenidae species (Table 3). The lengths of the protein-coding regions in the mitochondrial genomes of C. niveatus and its coordinal species are also nearly identical, differing by only a few bases (Table 6); these differences result from the presence of incomplete stop codons. All initiation codons in these species have been identified as ATG, and some of the stop codons are incomplete (TA or T). Such incomplete stop codons are common among fish mitogenomes. However, the stop codons of ND1 and COI in these species are TAG and AGA, respectively, whereas the corresponding codons in other fishes are often TAA [33, 40]. The mitochondrial genome of C. niveatus contains 22 tRNA genes, which are interspersed between the rRNA and protein-coding genes (Table 3). The tRNA genes in these species range in length from 67 bp (tRNA-Cys and tRNA-Ser(AGY)) to 75 bp (tRNA-Lys). All tRNA genes were predicted to have the typical cloverleaf structure except tRNA-Ser(AGY), which shows a deviant secondary structure [3, 12, 13].
The tRNA genes harbor anticodons identical to those used in other vertebrates, as well as conserved aminoacyl, DHU (dihydrouridine), anticodon and TΨC (thymidine-pseudouridine-cytidine) stems (see Fig. 4 in Supplementary Material). The tRNA-Ser(AGY) found in the C. niveatus mitogenome has no discernible DHU stem, similar to that reported in the lamprey [19], bichir [28], and rock bream [29]. The 12S and 16S rRNA genes of C. niveatus, C. lucida, L. crocea, L. polyactis, and M. miiuy are 949 bp/1698 bp, 947 bp/1696 bp, 947 bp/1693 bp, 950 bp/1697 bp, and 946 bp/1693 bp, respectively. The 12S rRNA genes in these species are all similar in size to their counterparts in Etheostoma radiosum (AY341348), Chaetodon auripes (AP006004), and Lutjanus rivulatus (AP006000). However, the 16S rRNA genes of C. niveatus and its coordinal species are much shorter than those of Chaetodontoplus septentrionalis (AP006007) and Centropyge loriculus (AP006006), members of Pomacanthidae. Taken together, these molecular features reveal marked similarities among these species.

Table 6 Reported mitochondrial genomes in Sciaenidae

The absence of the typical central conserved sequence block in the control region of Collichthys

The mitochondrial control region includes the promoters for both strands, the heavy strand replication origin, and the displacement region [5]. It is also a unique and highly variable part of the mitochondrial genome, noted for being non-coding and for its faster rate of evolution. Southern et al. [39] first recognized the conserved sequences CSB-B, CSB-C, CSB-D, CSB-E and CSB-F in the central conserved sequence block domain in mammals. However, only CSB-F, CSB-E and CSB-D could be identified in fishes [2, 56, 58]. CSB-F is the marker that separates the central conserved sequence block domain from the termination associated sequence domain. CSB-D is highly conserved in fish and may function in the regulation of H-strand replication and the initiation of the D-loop structure, and it may also be involved in mitochondrial metabolism [10, 32]. Although the termination associated sequence domain accumulates base substitutions, insertions and deletions at a substantially higher rate than the central domain, the absence of several conserved motifs can also be observed in fishes [16, 31]. Cui et al. [6] reported the lack of the central domain in L. crocea, in which the conserved blocks TAS and CSB-1, CSB-2, and CSB-3 were easily recognized but no CSB-F, CSB-E, or CSB-D was found. The same phenomenon was further confirmed in its congener L. polyactis. In this study, the lack of the central domain was also identified in Collichthys species. However, we found that the consensus sequences of CSB-F, CSB-E, and CSB-D (ATGTAGTA----GAGACCACC, AGGG-----GTGGGG, and TTAT-CT-GG-ATCTG-T-AA, respectively), which are typically present in Bagridae [58], can be identified in M. miiuy (see Fig. 2 in Supplementary Material) and C. acoupa (see Fig. 3 in Supplementary Material), and also in other Percoidei fishes [33].
Such variation implies rapid evolution of control region structure, which may provide information for elucidating the evolutionary origin of Collichthys and Larimichthys within the family Sciaenidae. It also cannot be ruled out that the control regions of these two genera are not functional owing to the lack of the central domain.

Phylogenetic relationships of Sciaenidae and the position of C. niveatus

Phylogenetic trees obtained with different methods from the same data set are nearly identical, although they differ somewhat in detail and in bootstrap values. In our analyses, Johnius was found to be distantly related to other Sciaenidae fishes, which is consistent with previous studies using different phylogenetic methods and also agrees with the traditional morphological classification. Chen [4] proposed several NJ trees based on 16S rRNA sequence data using different methods; these resulted in an independent clade composed of Miichthys, Collichthys and Larimichthys, which supported Zhu et al. [59]. However, the bootstrap values for this topology were extremely poor. Cui et al. [6] employed partial 16S rRNA and Cytb genes to build a phylogeny of 11 Sciaenidae fishes, in which the support for the clade grouping Miichthys, Collichthys and Larimichthys was also poor. The phylogenies proposed in this study based on the partial COI and 16S rRNA genes showed that Miichthys cannot be merged into the Collichthys-Larimichthys clade. In contrast, the phylogeny based on Cytb placed Miichthys, Collichthys and Larimichthys together in an independent clade, which is consistent with the previous studies; however, this clade remains ambiguous because of its poor bootstrap values (29/21/24; Fig. 4). The monophyly of Pseudosciaeniae was therefore not supported, and the relationship between Miichthys and the Collichthys-Larimichthys clade deserves further study.

Collichthys and Larimichthys are recognized as separate genera of Sciaenidae on the basis of morphological research [59]. In recent years, several phylogenetic studies based on molecular data have been published [4, 23, 45, 46]; however, their results revealed unstable phylogenies, with disagreements about the limits of, and relationships among, the members of Collichthys and Larimichthys. All phylogenetic trees proposed in this study produced completely identical and well-supported Collichthys-Larimichthys clades (Figs. 2, 3, and 4). In these clades, C. niveatus is most closely related to L. polyactis; these two species grouped with L. crocea and then with C. lucida. Our phylogenetic analyses therefore suggest that Collichthys and Larimichthys should be merged into one genus.

Our results conflict with the traditional classification. The phylogenetic position of C. niveatus within the Sciaenidae and the relationships among Sciaenidae species proposed here should therefore be accepted with caution. A complete understanding of Sciaenidae relationships awaits the assembly of additional DNA sequence data (e.g. whole mitochondrial genomes and multiple nuclear loci), broader taxon sampling, and corroborating morphological evidence.