Introduction

Mitochondrial DNA (mtDNA) has long been an active research area in molecular biology. Because of their rapid evolutionary rate and lack of recombination, fragments of mtDNA have been used extensively in studies of genetic structure, phylogenetics, and phylogeography at various taxonomic levels. Because complete mtDNA sequences can reveal additional information about gene rearrangement and other genome-level variation across phyla, the number of available mtDNA sequences has increased significantly in recent years [1–7].

Metazoan mtDNA is a closed-circular molecule, except in some hydrozoan cnidarians where it is one or two linear molecules [8, 9]. Metazoan mtDNA typically contains genes for 2 ribosomal RNAs (small and large subunit rRNA, s-rRNA and l-rRNA), 22 tRNAs, and 13 protein subunits: cytochrome c oxidase subunits I–III (COI–III), cytochrome b (Cytb), ATP synthase subunits 6 and 8 (ATP6 and ATP8), and NADH dehydrogenase subunits 1–6 and 4L (ND1–6 and ND4L). Metazoan mtDNAs range in size from 14 to 42 kb [10–12]. Typically, there are few intergenic nucleotides except for a single large non-coding region generally thought to contain elements that control the initiation of replication and transcription [13]. Size variation in mtDNA is usually due to differences in non-coding regions [14, 15], but occasionally it is due to duplications or multiplications of coding regions [10, 16–19].

Mollusca, the second largest animal phylum, exhibits much variation in the features and gene arrangement of its mitochondrial genomes. Some bivalve lineages (families Mytilidae, Unionidae, and Veneridae) have an unusual mode of inheritance for mtDNA, termed doubly uniparental inheritance [20–22]. The ATP6 and ATP8 genes are separated in scaphopods and most gastropods. Most bivalves lack the ATP8 gene; to date it has been found only in Hiatella arctica and Lampsilis ornata. In addition, several duplicated genes have been observed in the mtDNAs of two squids, Watasenia scintillans and Todarodes pacificus [23]. Veneridae is a large and diverse family of Bivalvia, including over 800 extant, presumably valid, species [24], many of which are commercially important by virtue of their dominance in benthic communities [25]. Despite their colonization of all types of soft bottoms, from coastal to deep-sea areas, venerids generally exhibit few morphological differences connected with soft-tissue anatomy; this makes it difficult to identify cases of morphological parallelism among evolutionarily distant species and of shell diversification among closely related ones. The study of phylogenetic relationships is further complicated by the large number of shell polymorphisms observed in populations of the same species [25].

Hard clams, genus Meretrix (Veneridae), are commercially important bivalves distributed through coastal areas of the Indian Ocean, Southeast Asia, China, the Korean Peninsula and the Japanese Archipelago. Traditionally, hard clams have been identified mainly by visible shell characters. Because of the remarkable variation in the shapes and patterns of shells, early researchers divided this genus into many species, such as M. meretrix, M. typical, M. petechialis, M. castanea, M. graphica, M. labiosa, M. inpudina, M. zonaria, M. lusoria, M. lyrata, M. lamarckii, M. mophina, M. exilis, etc. The taxonomy of Meretrix remains in dispute, with several discrepancies among systematists.

In the present study, we report the complete mitochondrial genome of the clam M. meretrix, a commercially important aquatic resource in China. The features of its mtDNA are described and compared with those of other molluscs.

Materials and methods

Sample collection and DNA isolation

Fifty hard clam specimens were collected from the coast of Panjin, Liaoning province, China. One of them was randomly chosen as the specimen for complete mitochondrial genome sequencing. A portion of the foot muscle tissue was dissected and frozen at −80°C. Total genomic DNA was isolated using a Genomic DNA Extracting Kit (TaKaRa, Dalian) following the manufacturer’s protocols and stored at −20°C.

Primer design, PCR amplification and DNA sequencing

To amplify the entire mitochondrial genome of M. meretrix, seven pairs of primers were designed based on seven partial sequences of mt genes obtained from EST data from a cDNA library of M. meretrix. All PCR primers were designed so that adjacent fragments overlap by at least 100 bp to ensure proper sequence assembly. The primer sequences and PCR amplification conditions are shown in Table 1. The PCR reactions were performed with a Mastercycler gradient PCR machine (Eppendorf, Germany). Cycling consisted of an initial denaturation step at 95°C for 5 min, followed by 35 cycles of denaturation at 94°C for 50 s, annealing at 60°C for 1 min and extension at 72°C for 2 to 6 min, depending on the expected length of the PCR products. The process was completed with a final extension at 72°C for 10 min.
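The 100 bp overlap requirement can be checked once the positions of the amplified fragments on the circular genome are known. The following is a minimal sketch only: the fragment coordinates and the function name overlap_lengths are hypothetical placeholders chosen for illustration and are not the values from Table 1.

```python
GENOME_LENGTH = 19_826  # bp, the M. meretrix mitogenome length reported in this paper


def overlap_lengths(fragments, genome_length):
    """Return the overlap (bp) between each fragment and the next on a circular genome.

    fragments: list of (start, end) pairs, 1-based and inclusive, ordered around
    the circle. An end larger than genome_length means the fragment runs past
    the origin.
    """
    overlaps = []
    n = len(fragments)
    for i in range(n):
        _, end = fragments[i]
        next_start, _ = fragments[(i + 1) % n]
        if i == n - 1:
            # The last fragment closes the circle, so the first fragment's
            # start is shifted by one genome length before comparing.
            next_start += genome_length
        overlaps.append(end - next_start + 1)
    return overlaps


# Hypothetical coordinates for seven fragments (NOT the values from Table 1).
hypothetical_fragments = [
    (1, 3100), (2950, 6000), (5850, 9000), (8800, 12000),
    (11850, 15000), (14800, 18000), (17850, 19976),
]

overlaps = overlap_lengths(hypothetical_fragments, GENOME_LENGTH)
all_ok = all(o >= 100 for o in overlaps)
print(overlaps, all_ok)
```

Any overlap below 100 bp would flag a junction where assembly of the two neighbouring fragments could be unreliable.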

Table 1 Primers and PCR conditions for amplification of M. meretrix

Gene annotation, sequence analysis and genome arrangement

The sequences were assembled using Sequence Analysis v3.4.1 (Applied Biosystems), Seqman V5.05 (DNASTAR) and the Blast Two Sequences program of NCBI (http://www.ncbi.nlm.nih.gov/BLAST/). Protein-coding genes, rRNA genes and the boundaries of each gene were identified with DOGMA [26], BLAST searches (http://www.ncbi.nlm.nih.gov/BLAST) and comparison with published bivalve mitochondrial sequences, including that of M. petechialis [27]. The tRNA genes were identified with tRNAscan-SE 1.21, employing the cove-only search mode and the invertebrate mitochondrial genetic code for secondary structure prediction [28]. In addition, codon usage in the M. meretrix mitochondrial protein-coding genes was estimated with DnaSP 4.50.3 [29]. Mitogenome arrangements of all published species of the class Bivalvia were compared. The tRNA genes were included, and possible mitogenome rearrangements were analyzed both manually and with the rearrangement program DERANGE II [30].

Phylogenetic analysis

Along with the complete mtDNA sequence of M. meretrix determined in this study, the 29 bivalve mitogenomes currently available in GenBank were used in the phylogenetic analyses (Supplementary data file 2). All trees were rooted using Ascaris suum as the outgroup. The amino acid sequences of 12 protein-coding genes, concatenated in the order COI–ND1–ND2–ND4L–COII–Cyt b–ND4–ATP6–ND3–ND5–ND6–COIII, were aligned using Clustal X with the default settings [31]. Model selection for the amino acid dataset was done with ProtTest [32]. A protein maximum-likelihood tree was generated with PhyML version 3.0 under the mtREV model of amino acid substitution for mtDNA-encoded proteins. A Bayesian approach using MRBAYES 3.1.2 [33, 34] was also applied to the aligned data set, with trees built under the mtREV model of evolutionary change. An initial run of 30,000 generations was used to assess convergence of the likelihood values. The Markov Chain Monte Carlo analysis was then run for 150,000 generations (sampling every 100 generations). After discarding the first 300 trees as burn-in, the remaining 1,200 sampled trees were used to estimate the 50% majority-rule consensus tree and the Bayesian posterior probabilities (BPP).
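The relationship between the number of generations, the sampling frequency and the burn-in can be verified with a short calculation. This is an illustrative sketch only; the function name mcmc_tree_counts is hypothetical and the calculation does not call MRBAYES itself.

```python
def mcmc_tree_counts(generations, sample_every, burnin_trees):
    """Return (sampled, retained) tree counts for an MCMC run.

    Assumption: one tree is recorded every `sample_every` generations, and a
    possible starting tree at generation 0 is not counted.
    """
    sampled = generations // sample_every
    retained = sampled - burnin_trees
    return sampled, retained


# Settings stated in the text: 150,000 generations, sampling every 100, burn-in of 300 trees.
sampled, retained = mcmc_tree_counts(150_000, 100, 300)
print(sampled, retained)  # 1500 1200
```

The result agrees with the 1,200 trees stated above; the burn-in corresponds to the first 30,000 generations, which matches the length of the initial convergence run.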

Results and discussion

Genome composition

The complete mitogenome of M. meretrix is 19,826 bp in length, which is within the size range of the molluscan mitogenomes sequenced so far. It contains 37 genes: 12 protein-coding genes, two ribosomal RNA genes, and 23 tRNA genes. All genes are encoded on the heavy strand, a common feature in marine bivalves. In contrast to the typical animal mitochondrial genome, it lacks the protein-coding gene ATP8 and has only one copy of the tRNASer gene but three copies of the tRNAGln gene (Fig. 1). The sequence has been deposited in GenBank under accession number NC_013188. The M. meretrix mtDNA has an overall A + T content of 68.4%, and the coding regions are 16,369 bp in length, accounting for 82.6% of the whole genome. There are two overlapping regions located between three adjacent tRNA genes in the genome (see Supplementary Data 1).
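The summary figures in this paragraph can be reproduced from the stated counts and lengths. The sketch below is illustrative: the helper at_content and the short test sequence are hypothetical, and the actual genome sequence (GenBank NC_013188) is not included here.

```python
GENOME_LENGTH = 19_826   # bp, from the text
CODING_LENGTH = 16_369   # bp, from the text
GENE_COUNTS = {"protein-coding": 12, "rRNA": 2, "tRNA": 23}  # from the text


def at_content(seq):
    """Return the fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


total_genes = sum(GENE_COUNTS.values())
coding_percent = round(100 * CODING_LENGTH / GENOME_LENGTH, 1)

print(total_genes)      # 37
print(coding_percent)   # 82.6

# Toy sequence for illustration only: 6 of its 8 bases are A or T.
print(at_content("ATGCATTA"))  # 0.75
```

Applied to the full deposited sequence, at_content would be expected to return a value close to the 68.4% reported above.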

Fig. 1

Gene map of the mitochondrial genome of M. meretrix. Genes have standard abbreviations except for tRNAs, which are designated by the one-letter code for the corresponding amino acid. S2, L1 and L2 designate genes for the tRNAs recognizing the codons UCN, CUN and UUR, respectively. Grey shaded parts represent non-coding regions. The grey shaded part flanked by G and Q is the putative control region

Protein-coding genes

Of the thirteen typical protein-coding genes (cox1–cox3, nad1–nad6, nad4L, cytb, atp6, and atp8), twelve were identified; no atp8 gene was found in the genome. All genes are transcribed from the same strand. These features have been observed in all marine bivalve mitogenomes published so far except those of Hiatella arctica (family Hiatellidae) and of Cristaria plicata, Hyriopsis cumingii and Lampsilis ornata (family Unionidae) [3, 7]. Thus, the placement of all coding genes on the same strand and the lack of an atp8 gene are the most distinctive features of marine bivalve mitochondrial genomes. Eight of the twelve protein-coding genes of M. meretrix start with ATG, whereas Cyt b, ND5 and ND6 start with ATA and COI with TTG. Eight protein-coding genes use TAA as the stop codon, and the remaining four terminate with TAG (see Supplementary Data 1).

The base-composition bias of an individual strand can be described by skewness [35], which measures the relative numbers of As to Ts and of Gs to Cs and is calculated as (A% − T%)/(A% + T%) and (G% − C%)/(G% + C%), respectively. The twelve protein-coding genes show a typical positive GC skew and negative AT skew (Fig. 2). Cyt b has the weakest GC skew (GC skew = 0.238); the other eleven protein-coding genes show a stronger skew towards G (GC skew = 0.288–0.538). Similarly, COII has the weakest AT skew (AT skew = −0.192), whereas the other genes show a stronger skew towards T (AT skew = −0.244 to −0.421). The nucleotide compositions of the protein-coding genes in M. meretrix are all skewed away from C in favor of G and from A in favor of T, as in other bivalves [36].
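The two skew formulas can be applied directly to base counts, because the percentages share a common denominator that cancels. The sketch below is illustrative: the function name skews and the toy sequence are hypothetical and do not correspond to any M. meretrix gene.

```python
from collections import Counter


def skews(seq):
    """Return (AT skew, GC skew) for one strand of a DNA sequence.

    AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C), as defined
    in the text. A skew is returned as None if its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else None
    gc_skew = (g - c) / (g + c) if (g + c) else None
    return at_skew, gc_skew


# Toy sequence (hypothetical): 1 A, 6 T, 4 G and 1 C.
at_skew, gc_skew = skews("TTTGGGTATGCT")
print(f"{at_skew:.3f} {gc_skew:.3f}")  # -0.714 0.600
```

A negative AT skew together with a positive GC skew, as in this toy case, is the pattern reported for all twelve protein-coding genes.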

Fig. 2

The GC and AT skew for mitochondrial protein-coding genes in Meretrix meretrix mtDNA. Graphical representation of absolute values is shown. Genes are ordered according to their position in the mitochondrial genome

The pattern of codon usage in M. meretrix mtDNA was also examined (Table 2). The twelve mitochondrial protein-coding genes comprise a total of 4,014 codons. Phenylalanine is the most frequent amino acid, leucine the second most frequent, and glutamine the least frequent. TTT (phenylalanine) is the most frequently used codon not only in M. meretrix but also in other bivalves [27, 37].
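Codon usage of this kind is obtained by splitting each coding sequence into consecutive triplets and tallying them. The sketch below shows the counting step only; the function name codon_usage and the toy sequence are hypothetical, and whether stop codons are included in the total reported above is not stated, so the sketch simply counts every complete triplet.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence read in frame from its first base.

    Trailing bases that do not form a complete triplet are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy coding sequence (hypothetical): ATG TTT TTT GGG TAA
usage = codon_usage("ATGTTTTTTGGGTAA")
print(sum(usage.values()))    # 5
print(usage.most_common(1))   # [('TTT', 2)]
```

Summing such counters over the twelve protein-coding genes would give a genome-wide table comparable in layout to Table 2.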

Table 2 Codon usage in the mitochondrial genome of Meretrix meretrix

Transfer and ribosomal RNA genes

The mitogenome of M. meretrix encodes 23 tRNA genes, ranging in size from 62 (tRNAHis) to 71 (tRNAAla) nucleotides, which can be folded into typical clover-leaf secondary structures with several mismatched pairs within the acceptor and anticodon stems (see Supplementary Data 2). Whereas M. petechialis has two copies of tRNAGln, the M. meretrix mitogenome has three, the first such case reported among molluscan mtDNAs. In metazoan mitogenomes, the codons of almost all amino acids except leucine and serine are decoded by a single tRNA each [38]. However, a duplication of tRNAAsp has been observed in the mitogenome of Mizuhopecten yessoensis, and the mtDNAs of most bivalves commonly contain two tRNAMet genes [27, 36, 39].

Although putative boundaries for the two rRNA genes have been identified, they cannot be determined accurately until transcript mapping is carried out. The lrRNA gene, located between cytb and nad4, and the srRNA gene, located between tRNAThr and tRNACys, are 1,581 and 1,187 bp long, with overall A + T contents of 71% and 69.4%, respectively. The lrRNA is the largest rRNA locus reported so far within bivalves. The size and A + T content of lrRNA and srRNA in M. meretrix are the same as those in M. petechialis and larger than those in other bivalves.

Non-coding regions

As in most bivalves, the M. meretrix mtDNA contains a large number of unassigned nucleotides. There are 29 non-coding regions throughout the mitochondrial genome, ranging in size from 2 to 1,625 bp and totaling 3,457 bp (see Supplementary Data 1). Metazoan mtDNAs usually have lengthy non-coding regions that vary in size [40, 41]. Seven of the non-coding regions are relatively large (greater than 100 bp). The largest non-coding region, with elevated A + T composition, is generally thought to contain the signals for replication and transcription and is hence regarded as the control region [11]. In M. meretrix, the largest non-coding region (1,625 bp; putative origin of replication) is located between tRNAGly and tRNAGln and accounts for 47% of the total non-coding sequence. This region has an A + T content of 69.9%, slightly higher than that of the whole genome (68.4%) and of the protein-coding regions (66.9%). Interestingly, one tandem repeat comprising three nearly identical motifs was found in the area covering four adjacent non-coding regions. Each motif contains one tRNAGln, which suggests that the second and third copies of tRNAGln arose through duplication of the motif. In the M. petechialis mitogenome, only two nearly identical motifs were found, in an area covering three adjacent non-coding regions. Tandem repeats are common within the control region of animal mtDNAs [42]; they often form stable secondary structures and play an important role in the early stages of the replication and transcription process [43]. Tandem repeat units within non-coding regions have also been found extensively in molluscs [44, 45].
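The non-coding figures in this paragraph are consistent with the genome and coding lengths given under Genome composition, as the following short check illustrates. It uses only numbers stated in the text; the variable names are arbitrary.

```python
GENOME_LENGTH = 19_826     # bp, whole mitogenome
CODING_LENGTH = 16_369     # bp, coding regions
NONCODING_TOTAL = 3_457    # bp, sum of the 29 non-coding regions
LARGEST_NCR = 1_625        # bp, putative control region

# The non-coding total equals the genome length minus the coding length.
noncoding_from_difference = GENOME_LENGTH - CODING_LENGTH
print(noncoding_from_difference == NONCODING_TOTAL)  # True

# Share of the non-coding sequence contributed by the largest region.
largest_share = 100 * LARGEST_NCR / NONCODING_TOTAL
print(f"{largest_share:.1f}%")  # 47.0%
```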

Gene arrangement

Gene arrangement comparisons can be a useful tool for phylogenetic studies. They rest on the assumption that shared gene arrangements imply common ancestry, since it is highly unlikely that the same gene order would arise independently in separate lineages [46]. The thirty mitochondrial genomes available allow a comparison of mitochondrial gene orders within Bivalvia (Fig. 3). In the family Veneridae, the gene arrangements of M. petechialis and M. meretrix are the same except for one additional tRNAGln gene in M. meretrix. The two Meretrix species and Venerupis share three completely identical gene blocks. In the family Pectinidae, three conserved gene blocks were identified between Mizuhopecten yessoensis and Chlamys farreri. However, the arrangement in Argopecten irradians is very different from those in M. yessoensis and C. farreri: only four small conserved gene blocks could be found between A. irradians and C. farreri. Gene order is thus most highly rearranged in Pectinidae. All published species of Ostreidae except Crassostrea virginica share the same mitochondrial gene arrangement. Compared with C. hongkongensis, C. virginica shows several tRNA translocations and the loss of one tRNA. The overall genomic organization of C. hongkongensis is more similar to that of C. ariakensis than to that of C. virginica, which may reflect their closer genetic relationship. Gene arrangement appears to be highly conserved among Mytilus species. Among Mytilus galloprovincialis, M. edulis and M. trossulus, not only the arrangement of protein-coding genes but also the order of tRNAs, ribosomal RNAs and the control region is almost the same, except for one additional tRNAGln between the control region and tRNATyr in M. trossulus. Several gene inversions are seen in all three published species of the family Unionidae; the genes involved are COI, COII, ND3, tRNAHis, ND5, ND4, ND4L, ATP8, tRNAAsp, ATP6 and COIII. Additionally, the gene arrangement of Cristaria plicata differs from that of Lampsilis ornata only by the loss of tRNAGlu. C. plicata and Hyriopsis cumingii share two completely identical gene blocks.
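Identical gene blocks of the kind described here can be located by scanning two gene orders for maximal runs of genes that occur contiguously and in the same order in both. The sketch below is one simple way to do this and is not the DERANGE II procedure; the function name shared_blocks and the two gene orders are hypothetical, and for simplicity the sketch treats the orders as linear and ignores strand.

```python
def shared_blocks(order_a, order_b, min_genes=2):
    """Return maximal runs of genes found contiguously, in the same order, in both lists."""
    blocks = []
    for i, gene in enumerate(order_a):
        for j, other in enumerate(order_b):
            if gene != other:
                continue
            # Skip positions that only continue a run started one gene earlier.
            if i > 0 and j > 0 and order_a[i - 1] == order_b[j - 1]:
                continue
            # Extend the run for as long as both orders agree.
            k = 0
            while (i + k < len(order_a) and j + k < len(order_b)
                   and order_a[i + k] == order_b[j + k]):
                k += 1
            if k >= min_genes:
                blocks.append(tuple(order_a[i:i + k]))
    return blocks


# Hypothetical gene orders for two species (for illustration only).
species_a = ["cox1", "nad1", "nad2", "cox2", "cytb", "nad4", "atp6"]
species_b = ["cox1", "nad1", "nad2", "atp6", "cox2", "cytb", "nad4"]

blocks = shared_blocks(species_a, species_b)
print(blocks)
# [('cox1', 'nad1', 'nad2'), ('cox2', 'cytb', 'nad4')]
```

In this toy case the only difference is the position of atp6, so the two orders share two blocks of three genes each. A fuller treatment would also handle circular wrap-around and genes on opposite strands.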

Fig. 3

Linearized representation of gene arrangement for bivalve mitochondrial genomes. For all species, the gene arrangement is shown starting from COI, and all genes are transcribed from left to right except those indicated by underlining, which are transcribed from right to left. The bars show identical gene blocks and CR denotes the control region. Non-coding regions are not shown. Genes for tRNAs are designated by a single letter for the corresponding amino acid; L1, L2, S1 and S2 denote tRNALeu(CUN), tRNALeu(UUR), tRNASer(AGN) and tRNASer(UCN), respectively

Among the species of Bivalvia, the ATP8 gene is absent except in species of the family Unionidae and, as noted above, Hiatella arctica. Gene order is highly rearranged in Bivalvia. This may be because all genes are encoded on the heavy strand in most bivalve species, the family Unionidae being the exception, and asymmetric replication and transcription may accelerate gene rearrangement during evolution [27]. Investigation of many more bivalve mtDNAs will provide valuable information for understanding the detailed evolutionary history of bivalve mitogenomes.

Phylogenetic analysis

Bayesian and ML trees based on the concatenated amino acid sequences of 12 protein-coding genes from thirty bivalves (Fig. 4) place the Unionidae as the sister group to all other bivalves, reflecting the general opinion that the Unionidae diverged very early in bivalve evolution. Mytilidae is the sister group of the clade composed of Pectinidae and Ostreidae. Mizuhopecten yessoensis and Chlamys gigas are closer to each other than to other Pectinidae, and Crassostrea angulata is closer to Crassostrea gigas than to other Ostreidae, in agreement with the gene arrangement comparison. Lucinidae is the sister group of the clade composed of Veneridae, Solenidae, Cardiidae and Saxicavidae. However, the relationships among Veneridae, Solenidae, Cardiidae and Saxicavidae are not resolved. The Mytilidae and Unionidae are each divided into two different clades. At present, many questions in the phylogeny of Bivalvia remain unresolved, and it is desirable to increase the resolution by adding more mitochondrial genomes. Further taxon sampling will be very useful for determining the phylogenetic relationships among the major lineages of Bivalvia.

Fig. 4

Phylogenetic tree derived from Bayesian analysis of M. meretrix and 29 other bivalve species. Ascaris suum was used as the outgroup. Phylogenetic relationships were inferred by Bayesian and ML methods. At each node, the first number is from Bayesian inference and the second is from maximum likelihood analysis. Tree topologies produced by the two methods were similar; only bootstrap values above 50% are shown, and the others are represented by “–”

Taxonomic position of M. meretrix and M. petechialis

The comparison of mitogenome structures showed that M. petechialis and M. meretrix are very closely related. The sequence similarity between M. petechialis and M. meretrix is as high as 99%, and their gene arrangement is the same except for one additional tRNAGln gene in M. meretrix. Our results do not support the present taxonomic status of M. meretrix and M. petechialis as separate species; rather, they accord with the view that M. petechialis and M. meretrix are geographic subspecies of a single species [47, 48]. M. meretrix and M. petechialis were previously considered independent species in the genus Meretrix. M. meretrix was reported to differ from M. petechialis in the following shell characteristics: the anterior and ventral margins and the posterior end of the shell. The Meretrix species are widely distributed through coastal areas of the Indian Ocean, Southeast Asia, China, the Korean Peninsula and the Japanese Archipelago. Because of the remarkable variation in the shapes and patterns of shells, early researchers divided this genus into many species, such as M. meretrix, M. typical, M. petechialis, M. castanea, M. graphica, M. labiosa, M. inpudina, M. zonaria, M. lusoria, M. lyrata, M. lamarckii, M. mophina, M. exilis, etc. To date, nine species are generally recognized in Meretrix [49]. Zhuang pointed out that Meretrix contains few species and that only three, M. meretrix, M. lusoria and M. lamarckii, have been found in China [50]. It has also been pointed out that M. meretrix is the species showing the greatest variation among bivalves and that, because of variation in shell shading and color, it was wrongly divided into many species [51]. In summary, the taxonomic status of M. meretrix and M. petechialis is ambiguous when based on morphology. Our results support the view that M. petechialis should be treated as a junior synonym of M. meretrix.