Introduction

Molluscs are an extraordinarily diverse group of animals with an estimated 200,000 species, second only to the phylum Arthropoda. Phylogenetic studies of molluscs have been reported at lower taxonomic levels, such as genus and family [1–3]. At higher taxonomic levels, such as class, molecular classification of molluscs has been difficult because little genomic sequence information is available and the nucleotide substitution rates of the available markers are inappropriate. Although many complete mitochondrial genome sequences of molluscs have been deposited in GenBank, the large divergence of mitochondrial genes among classes, and even among families, makes it almost impossible to distinguish species using mitochondrial genes. A suitable marker gene for phylogenetic analysis of molluscs is therefore more likely to be found among nuclear genes. Here, we evaluated nuclear genes present in GenBank for their potential phylogenetic information and found that tropomyosin is a suitable marker gene for phylogenetic analysis of molluscs.

Materials and methods

Data collection

The tropomyosin coding sequences (cds) of the following 41 bilaterians were acquired from GenBank: Mizuhopecten yessoensis, Argopecten irradians, Sinonovacula constricta, Crassostrea gigas, Mytilus edulis, Solen strictus, Tresus keenae, Pseudocardium sachalinensis, Fulvia mutica, Scapharca broughtonii, Venerupis philippinarum, Chlamys nipponensis, Neptunea polycostata, Turbo cornutus, Biomphalaria glabrata, Haliotis asinina, Haliotis discus, Helix aspersa, Octopus vulgaris, Sepioteuthis lessoniana, Ommastrephes bartramii, Todarodes pacificus, Squilla aculeata, Scolopendra sp., Caenorhabditis elegans, Anisakis simplex, Schistosoma haematobium, Clonorchis sinensis, Danio rerio, Salmo salar, Thunnus thynnus, Pennahia argentata, Xenopus tropicalis, Rana catesbeiana, Ovis aries, Mus musculus, Oryctolagus cuniculus, Cervus elaphus, Homo sapiens, Gallus gallus and Japanese quail. The sequences of 2 cnidarians, Nematostella vectensis and Podocoryna carnea, were acquired from the same database. For bilaterians, long-form transcripts (translated to 284 aa) of tropomyosin genes were chosen for analysis; for cnidarians, short-form transcripts (translated to 242 aa) were used because cnidarians encode only short-form transcripts.

Nucleotide substitution saturation detection

The stop codons of the obtained sequences were removed before further analysis. Based on the translated amino acid sequences, we conducted a comparative analysis of the complete tropomyosin cds (without stop codons) of the above species. Under the selected optimal substitution model, nucleotide substitution saturation was tested using DAMBE V.4.5.2 [4].
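The stop-codon removal step can be illustrated with a minimal sketch. It assumes the standard nuclear genetic code and a cds that is already in frame; the function name `strip_terminal_stop` and the toy sequence are hypothetical and are not taken from the original analysis.

```python
# Stop codons of the standard genetic code (assumption: tropomyosin is a
# nuclear gene, so the standard code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_terminal_stop(cds: str) -> str:
    """Return the cds in upper case with a trailing stop codon removed."""
    seq = cds.upper().replace("U", "T")  # accept RNA-style input as well
    if len(seq) % 3 != 0:
        raise ValueError("cds length is not a multiple of 3")
    if seq[-3:] in STOP_CODONS:
        seq = seq[:-3]
    return seq


# Toy in-frame sequence, not a real tropomyosin cds.
example = "ATGGATGCCATCAAGTAA"
print(strip_terminal_stop(example))  # ATGGATGCCATCAAG
```

Under these assumptions, a long-form cds encoding 284 aa would be 852 nt long once the stop codon is removed.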

Construction of phylogenetic tree

The above sequences were aligned using Clustal W (http://www.ebi.ac.uk/Tools/clustalw) and edited manually. The aligned sequences were analyzed using MEGA 4 [5] for phylogenetic inference. The alignment file included all sites with missing/ambiguous data and gaps. A minimum evolution (ME) tree of the nucleotide sequences was constructed in MEGA 4 under the Jukes-Cantor model with 1000 bootstrap replicates. Two cnidarian species, Nematostella vectensis and Podocoryna carnea, were used as the outgroup in this study.
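For readers unfamiliar with the Jukes-Cantor model, the pairwise distance it implies is d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below computes this distance for two aligned sequences. It is an illustration only and not MEGA's implementation: the function name is hypothetical, and skipping sites with gaps or ambiguous bases is a simplifying assumption that may not match the gap-handling option used in the study.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or ambiguous base in either
        # sequence are skipped for this pair.
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturated)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment with 1 difference in 10 sites (p = 0.1).
print(round(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT"), 4))  # 0.1073
```

The correction grows quickly as p approaches 0.75, which is one reason a saturation test is run before building a distance-based tree.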

Detection of positive selection sites

The dN (nonsynonymous substitution rate), dS (synonymous substitution rate), and ω (dN/dS) values were calculated using the methods implemented in the PAML package [6]. PAML 4 [7] was used to analyze the variation of selective pressure during tropomyosin evolution. Three models were used, each based on the comparison of two sub-models. The first is the “branch-specific” model. One of its sub-models is the one-ratio model, which assumes that ω is the same on all evolutionary branches, while the other is the free-ratio model, which assumes that ω differs on every branch. A likelihood ratio test (LRT) of these two sub-models tests whether ω varies among branches. The second is the “site-specific” model, which assumes that ω can vary among sites. There are two pairs of sub-models for the “site-specific” model: M1a and M2a, and M8a and M8. The third is the “branch-site-specific” model, which divides the branches of the evolutionary tree into “foreground” and “background” categories. The ω values may differ between the two categories, with ω of the “foreground” being higher than that of the “background”. It further assumes that ω is fixed within each category but can differ among sites.
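The LRT used to compare each pair of sub-models can be sketched as follows. The statistic is twice the difference in log-likelihood between the alternative and the null sub-model, and it is compared with a chi-square distribution. This sketch is not part of PAML: the function names and the log-likelihood values are hypothetical, and the closed-form survival function below is valid only for an even number of degrees of freedom (for example, the M1a versus M2a comparison is conventionally tested with 2 degrees of freedom).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for two nested models."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods, not values from this study.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-998.0, df=2)
print(stat, round(p_value, 4))  # 4.0 0.1353
```

With these hypothetical values the null sub-model would not be rejected at the 0.05 level, which corresponds to the situation in which no positive selection is detected.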

Analysis of sequence variation and prediction of motif change

Sequence variations, especially the InDels found in the tropomyosin sequence of Turbo cornutus, were analyzed, and the resulting motif changes were predicted at http://hits.isb-sib.ch/cgi-bin/PFSCAN.

Results and discussion

Nucleotide substitution saturation is low

Nucleotide substitution saturation was tested among these species using DAMBE V.4.5.2. The observed Iss value was significantly lower than Iss.c when assuming a symmetrical topology (Table 1; P = 0.00 < 0.01), which indicates that there is little nucleotide substitution saturation among the tropomyosin sequences of the tested species and that the phylogenetic tree could be constructed using standard methods.

Table 1 Test of substitution saturation of the tropomyosin sequences of the tested species

A mollusc phylogenetic tree consistent with classical taxonomy was constructed

For molluscs, it is very difficult to find a suitable molecular marker to distinguish species from each other because nucleotide substitution is usually saturated. In this study, a phylogenetic tree was constructed based on the tropomyosin sequences of all the studied species (Fig. 1). In this tree, all the molluscs were placed at the same evolutionary positions as in classical taxonomy, which suggests that tropomyosin is a suitable marker gene for phylogenetic analysis of molluscs. In the last two decades, mitochondrial DNA has frequently been used to describe the evolution of molluscs, but usually only at the level of genus or family [8, 9]. The results of this study suggest that the tropomyosin sequence can classify molluscs at both lower and higher taxonomic levels. To our knowledge, this is the first time that a phylogenetic tree of the entire phylum Mollusca has been successfully constructed based on a single gene sequence.

Fig. 1 The ME tree constructed based on the tropomyosin cds sequences of all the studied species

In addition, the whole phylogenetic tree based on all the tested tropomyosin sequences was consistent with the classical species phylogeny based on morphology, which indicates that the tropomyosin gene could also serve as a useful marker gene for phylogenetic studies of most animals.

No positively selected sites were found

Both genetic factors and selection have acted during the evolutionary history of mollusc genes. Two parameters, dN and dS, play an important role in estimating selection. The dN/dS ratio provides a measure of the selection pressure to which a gene pair is subject. If dN/dS < 1, purifying selection acts on the gene, meaning that some replacement substitutions have been purged by natural selection, presumably because of their deleterious effects; if dN/dS > 1, positive selection acts on the gene; if dN/dS = 1, no selection acts on the gene, which means that it evolved neutrally. The smaller the dN/dS ratio, the greater the number of eliminated substitutions and the greater the selective constraint under which the gene has evolved. Tropomyosin is an actin-binding protein that regulates actin mechanics and is important, among other things, for muscle contraction. Nonsynonymous mutations of this gene can cause severe diseases [10, 11], suggesting that it is functionally important. In this study, all pairwise dN/dS ratios were less than 1 and no sites were positively selected, which means that these genes are under purifying selection because of very strong functional constraint. When a deleterious nonsynonymous mutation occurs, it cannot become fixed in the population because purifying selection acts on it and removes it, so the detected dN was less than dS.
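The interpretation of the dN/dS ratio described above can be summarized in a small sketch. The function names, the tolerance used to treat a ratio as equal to 1, and the numerical values are hypothetical illustrations, not estimates from this study.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio; dS must be positive for it to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def selection_regime(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify a gene pair as in the text: <1 purifying, >1 positive, =1 neutral."""
    w = omega(dn, ds)
    if w < 1.0 - tol:
        return "purifying"
    if w > 1.0 + tol:
        return "positive"
    return "neutral"


# Hypothetical rates for one sequence pair.
print(selection_regime(dn=0.02, ds=0.50))  # purifying
```

In practice a point estimate close to 1 is not classified by a fixed tolerance; a statistical test such as the LRT is needed to decide whether the ratio departs significantly from 1.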

Two InDels in tropomyosin of Turbo cornutus and their significance

InDels were found in only three species: Turbo cornutus, Nematostella vectensis and Podocoryna carnea. Nematostella vectensis and Podocoryna carnea are cnidarians, which belong to Radiata, and at the start of the sequence they are 42 amino acids shorter than the other animals in this study, all of which belong to Bilateria. This is understandable given the very long divergence time between Radiata and Bilateria. More interestingly, one deletion (aa 126–132) and one insertion (aa 152–158) were found in Turbo cornutus compared with the other gastropods and the other bilaterians in this study. The insertion (aa 152–157) is identical to the sequence (aa 159–164), and their DNA sequences are also identical (Fig. 2, Panel a, b). Because dN and dS are both 0 between the insertion (aa 152–157) and the sequence (aa 159–164), the segment duplication probably happened recently. Next, domain changes were scanned at http://hits.isb-sib.ch/cgi-bin/PFSCAN, and two new domains (Table 2) appeared because of the InDels, which suggests that the InDels in the tropomyosin sequence of Turbo cornutus are functionally important. It has been reported that positive selection can also act on InDels during the evolution of a protein [12]. In this study, given the short time since the InDels appeared and their functional importance, it can be deduced that positive selection was also the cause of the fixation of the unique InDels in the Turbo cornutus genome.
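A repeat of the kind described above (a 6-aa segment that recurs after a one-residue spacer) can be located with a simple scan. The following sketch is an illustration only: the function name, the parameters `unit_len` and `max_spacer`, and the toy sequence are hypothetical, and the toy sequence is not the Turbo cornutus tropomyosin sequence.

```python
def find_nearby_repeats(seq: str, unit_len: int = 6, max_spacer: int = 3):
    """Return (start1, start2, unit) tuples, 1-based, for identical segments
    of length unit_len separated by at most max_spacer residues."""
    hits = []
    n = len(seq)
    for i in range(n - unit_len + 1):
        unit = seq[i:i + unit_len]
        for spacer in range(max_spacer + 1):
            j = i + unit_len + spacer  # start of the candidate second copy
            if j + unit_len > n:
                break
            if seq[j:j + unit_len] == unit:
                hits.append((i + 1, j + 1, unit))
    return hits


# Toy protein: "ELKRTQ" occurs at positions 7-12 and again at 14-19,
# separated by a single residue.
toy = "MDAIKK" + "ELKRTQ" + "A" + "ELKRTQ" + "GG"
print(find_nearby_repeats(toy))  # [(7, 14, 'ELKRTQ')]
```

The same scan applied to the corresponding nucleotide sequence (with `unit_len` set to 18) would show whether the two copies are also identical at the DNA level, which is what dN = dS = 0 between them implies.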

Fig. 2 The InDels and amino acid fragment repeats in the tropomyosin gene of Turbo cornutus. One deletion (aa 126–132) and one insertion (aa 152–158) were found in Turbo cornutus (Panel a), compared with the other bilaterian animals in this study. The insertion (aa 152–157) is identical to the sequence (aa 159–164) (Panel a), and their DNA sequences are also identical (Panel b)

Table 2 Two new domains caused by the InDels