Introduction

Transposable elements (TEs) are DNA fragments capable of moving within the genome and increasing their copy number. They constitute a considerable part of both prokaryotic and eukaryotic genomes, accounting for 35–69% of genomic DNA in different mammalian species (Lander et al. 2001; Waterston et al. 2002; de Koning et al. 2011) and up to 90% of the genome in some plants (Sergeeva and Salina 2011).

Transposable elements exert a considerable influence on genome function: the genetic rearrangements they cause alter the structure and activity of genes (Strand and McDonald 1989; Zakharenko et al. 2000). TE-induced genomic instability has a dual significance, depending on whether an individual organism or the whole population is considered. For an individual, increased mutability most often results in lethality, whereas for a population, especially one under stress, an increased mutational background offers the opportunity to gain new beneficial features (Chow and Tung 2000; Del Re et al. 2003; Zakharenko et al. 2006), which can result in adaptation and further evolution (Finnegan 1992; Piacentini et al. 2014).

Depending on the mechanism of transposition, TEs are divided into two classes (Kapitonov and Jurka 2008). Class I comprises the retrotransposons, which are first transcribed into an RNA intermediate that is then reverse-transcribed into a DNA copy and inserted into the host genome. Because the original copy stays in place, retrotransposons can easily increase their copy number and thus account for the majority of the TE-derived fraction of the genome.

Class II comprises the DNA transposons, which transpose via excision and subsequent reinsertion at random genomic loci. Their copy number increases only when the transposition event is coupled with replication. The proportion of DNA transposon-derived sequences in the genome of a species rarely exceeds the proportion of retrotransposon-derived sequences (Arensburger et al. 2010; Mesquita et al. 2015; Fernández-Medina et al. 2016; Hernandez-Hernandez et al. 2017).

The class of DNA transposons is further divided into three subgroups: TIR-transposons, helitrons, and polintons. TIR-transposons constitute the largest and most diverse subclass, which includes 21 superfamilies (Kapitonov and Jurka 2008). Their defining feature is the presence of terminal inverted repeats (TIRs). The internal portion of the transposon, flanked by the TIRs, contains the open reading frame (ORF) for the transposase, the enzyme responsible for TE mobility. During excision, the transposase recognizes the TIRs, which are specific to a particular transposon. TEs whose TIRs have been lost or altered so that the transposase no longer recognizes them become immobilized (Wicker et al. 2007).
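The structural idea behind a TIR can be illustrated with a minimal Python sketch: the 5' end of the element should read the same as the reverse complement of its 3' end. The function names and the default limit of 460 bp are illustrative assumptions, and the sketch only detects perfect repeats; it is not the blastn-based procedure used in this work.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def tir_length(element: str, max_len: int = 460) -> int:
    """Length of the perfect terminal inverted repeat of `element`.

    The 5' end is compared with the reverse complement of the 3' end,
    base by base, until the first mismatch. `max_len` is an assumed cap;
    the two repeats cannot overlap, so at most half the element is scanned.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n
```

For a toy element built as a 6-bp repeat, a short internal region, and the reverse complement of the repeat, `tir_length` returns 6; a real analysis would need to tolerate mismatches and indels within the repeats.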

One of the most diverse and widely distributed TIR-transposon superfamilies is IS630/Tc1/mariner. The first representatives of the mariner family (Dmmar1 and Mos1) were isolated from Drosophila mauritiana (Haymer and Marsh 1986; Jacobson et al. 1986). Similar elements were later found in almost every species examined for their presence (Feschotte and Pritham 2007; Liu and Yang 2014).

The length of IS630/Tc1/mariner transposons varies between 1000 and 3000 bp. These elements usually have a single ORF coding for the transposase (~ 350 amino acids (a.a.) in length). The transposase recognizes the TIRs and mediates TE insertion into the TA target site, which is duplicated upon insertion. The length of the TIRs varies between 20 and 460 bp (Shao and Tu 2001; Claudianos et al. 2002).

Although the number of species with fully sequenced genomes increases annually, TE representation and activity in marine species, including invertebrates, remain poorly studied (Puzakov et al. 2017). According to automatic annotation services, the content of TE-derived sequences in mollusk genomes is lower than in vertebrate genomes: 2–8% (Yoshida et al. 2011; Takeuchi et al. 2012; Zhang et al. 2012; Simakov et al. 2013; Albertin et al. 2015) versus 35–52%. The Pacific oyster, Crassostrea gigas, belongs to the mollusks (Mollusca), a large phylum that remains scarcely studied genetically. Sequencing revealed that its genome is rather polymorphic and harbors repetitive sequences, including TEs (Zhang et al. 2012). In this work, we present the results of a detailed analysis of IS630/Tc1/mariner superfamily transposons in C. gigas.

Materials and Methods

In this work, we studied the DNA transposons of the IS630/Tc1/mariner superfamily found in the Pacific oyster, C. gigas (Jurka 2012, 2013; Bao and Jurka 2013a, b, c). Additionally, the sequences of 53 other IS630/Tc1/mariner superfamily DNA transposons representing the Tc1, pogo, mariner, maT, mosquito, plants, IS630, and rosa families were used for the phylogenetic analysis and the DDE/D domain analysis (Online Resource 1).

The terminal inverted repeats were identified with blastn (Zhang et al. 2000). The presence of functional amino acid sequences was determined with blastx on the basis of homology to the transposases of known transposable elements (Altschul et al. 1997). The copy number of C. gigas elements was calculated with blastn (Zhang et al. 2000) as the number of homologous hits longer than 10% of the transposon length. Both the overall number of homologous loci and the number of full-length elements (those retaining > 95% of the sequence length and intact TIRs) were counted.
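The counting rule can be sketched as follows. This is a minimal illustration, assuming the blastn hits have already been parsed into records; the `Hit` class, its field names, and the function name are hypothetical and are not part of any BLAST tool.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Hit:
    """One genomic hit of a transposon query (hypothetical parsed record)."""
    aligned_length: int   # length of the homologous region, bp
    has_both_tirs: bool   # True if both TIRs are intact in this copy


def count_copies(hits: Iterable[Hit], element_length: int,
                 min_fraction: float = 0.10,
                 full_fraction: float = 0.95) -> Tuple[int, int]:
    """Return (total copies, full-length copies) for one element.

    A hit counts as a copy if it exceeds `min_fraction` of the element
    length; it counts as full-length if it exceeds `full_fraction` of the
    element length and retains both TIRs.
    """
    hits = list(hits)
    total = sum(1 for h in hits
                if h.aligned_length > min_fraction * element_length)
    full = sum(1 for h in hits
               if h.aligned_length > full_fraction * element_length
               and h.has_both_tirs)
    return total, full
```

Keeping both thresholds as parameters makes it easy to check how sensitive the counts are to the chosen cut-offs.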

Multiple alignment of amino acid sequences was performed with MUSCLE (Edgar 2004) using the default settings. The appropriate amino acid substitution model was selected using the Akaike information criterion. The search for the best protein model and the phylogenetic analysis were performed in MEGA7.0 (Kumar et al. 2016). The activity of the DNA transposons was evaluated using the presence of transcripts homologous to the predicted transposase sequences as a proxy. Transcripts were searched for in the TSA and EST databases with tblastn (Altschul et al. 1997).
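Model selection by the Akaike information criterion amounts to computing AIC = 2k − 2 ln L for each candidate model, where k is the number of free parameters and L the maximized likelihood, and keeping the model with the lowest value. The sketch below illustrates the rule only; the function names are assumptions, and it does not reproduce the MEGA7 implementation.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

For instance, with placeholder values `{"model_A": (-1000.0, 10), "model_B": (-990.0, 25)}`, the AIC values are 2020 and 2030, so `best_model` returns `"model_A"`: the better likelihood of the second model does not compensate for its 15 extra parameters.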

To estimate the evolutionary dynamics of the elements from C. gigas, we determined the percentage identity between the consensus sequence of each element and its full-size copies. Consensus sequences were derived using the relative majority rule. This approach was previously used by Bouallègue et al. (2017).
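A minimal sketch of this calculation is given below, assuming the copies of an element have already been aligned to equal length with `-` as the gap symbol. The function names, the alphabetical tie-breaking, and the handling of gap columns are illustrative assumptions rather than details taken from the cited approach.

```python
from collections import Counter
from typing import Sequence


def plurality_consensus(aligned_copies: Sequence[str]) -> str:
    """Consensus of aligned sequences by the relative majority rule.

    In each alignment column the most frequent symbol is chosen, even if
    it occurs in fewer than half of the copies.
    """
    if len({len(s) for s in aligned_copies}) != 1:
        raise ValueError("aligned copies must be non-empty and of equal length")
    consensus = []
    for column in zip(*aligned_copies):
        counts = Counter(column)
        # Ties are broken alphabetically (an assumption of this sketch).
        consensus.append(min(counts, key=lambda sym: (-counts[sym], sym)))
    return "".join(consensus)


def percent_identity(consensus: str, copy: str) -> float:
    """Percentage of compared columns in which `copy` matches `consensus`.

    Columns where both sequences carry a gap are skipped.
    """
    compared = matches = 0
    for a, b in zip(consensus, copy):
        if a == "-" and b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0
```

Applying `percent_identity` to every full-size copy of an element yields the distribution of identity values used to judge how recently the element was amplified.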

Results and Discussion

IS630/Tc1/mariner Elements of C. gigas

Repbase contains 31 potentially functional (autonomous) and 21 non-autonomous IS630/Tc1/mariner superfamily elements from C. gigas (Online Resource 2). We also found and uploaded to Repbase six novel elements of this superfamily (Mariner-32_CGi, Mariner-33_CGi, Mariner-34_CGi, Mariner-36_CGi, Mariner-37_CGi, Mariner-38_CGi). For two elements from Repbase (Mariner-2_CGi, Mariner-N1_CGi), we found more complete transposase-encoding sequences; as a result, the non-autonomous Mariner-N1_CGi transposon was reclassified and renamed as the autonomous Mariner-35_CGi.

The majority of the autonomous IS630/Tc1/mariner DNA transposons from the oyster are 1–3 kbp long, which is typical for this superfamily. Five elements (Mariner-38_CGi, Mariner-31_CGi, Mariner-35_CGi, Mariner-9_CGi, and Mariner-34_CGi) are between 3 and 4.5 kbp long, while Mariner-21_CGi is considerably larger than usual (9744 bp). The increased length of a TE may be due to insertions into its sequence or to duplications. Either event can result in the loss of autonomous mobility if the ORF, the promoter, or the TIRs are damaged. The lengths of the ORFs encoded by the majority of IS630/Tc1/mariner elements from C. gigas lie within the range of 300–400 a.a., which is standard for this group of DNA transposons. In ten elements, the putative transposase is longer than 450 a.a., while in three elements (Mariner-3_CGi, Mariner-4_CGi, and Mariner-6_CGi) it is shorter than 250 a.a. The length of the TIRs of IS630/Tc1/mariner DNA transposons from the oyster typically varies between 24 and 52 bp. Although the TIRs of Mariner-31_CGi and its non-autonomous copy are considerably longer (173 and 273 bp, respectively), they remain within the range described for this superfamily (20–460 bp) (Claudianos et al. 2002).

Analysis of Catalytic Domains

The classification of DNA transposons is based on differences in the catalytic domain of the transposase, which is responsible for the cutting and religation of DNA strands during insertion. This domain has three characteristic amino acid residues: two aspartates (D) and either a glutamate (E) or an aspartate as the third one, hence the name of this triad, the DDE/D motif or domain. Based on the number of amino acid residues between the second aspartate and the third glutamate/aspartate in the DDE/D domain, the IS630/Tc1/mariner superfamily is divided into eight groups: Tc1 (DD34E) (Henikoff 1992; Capy et al. 1996), mariner (DD34D) (Robertson 1995; Capy et al. 1996), pogo (DDxD) (Smit and Riggs 1996; Capy et al. 1997), maT (DD37D) (Robertson and Asplund 1996; Shao and Tu 2001; Claudianos et al. 2002), mosquito (DD37E) (Shao and Tu 2001), plants (DD39D) (Jarvik and Lark 1998; Shao and Tu 2001), rosa (DD41D) (Gomulski et al. 2001; Zhang et al. 2016b), and IS630 (DDxE) (Doak et al. 1994; Capy et al. 1996).
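The signature-based grouping can be expressed as a simple lookup, sketched below. The table lists only the fixed-spacing signatures named above plus the two pogo variants (DD30D, DD35D) reported for the oyster elements; the function names and the "unassigned" fallback are assumptions of this sketch. Such a lookup gives only a first-pass assignment, since a signature alone does not settle family membership.

```python
# First-pass mapping from DDE/D signature to family. The variable-spacing
# groups pogo (DDxD) and IS630 (DDxE) cannot be captured by a fixed table,
# so only the pogo variants DD30D and DD35D are listed here.
FAMILY_BY_SIGNATURE = {
    "DD34E": "Tc1",
    "DD34D": "mariner",
    "DD37D": "maT",
    "DD37E": "mosquito",
    "DD39D": "plants",
    "DD41D": "rosa",
    "DD30D": "pogo",
    "DD35D": "pogo",
}


def signature(spacer: int, third_residue: str) -> str:
    """Build a signature such as 'DD34E' from the spacer length and third residue."""
    if third_residue not in ("D", "E"):
        raise ValueError("third residue of the triad must be 'D' or 'E'")
    if spacer < 0:
        raise ValueError("spacer length cannot be negative")
    return f"DD{spacer}{third_residue}"


def candidate_family(spacer: int, third_residue: str) -> str:
    """Return the candidate family for a signature, or 'unassigned'."""
    return FAMILY_BY_SIGNATURE.get(signature(spacer, third_residue), "unassigned")
```

Signatures absent from the table, such as DD36E or DD83D, come back as "unassigned" and have to be placed by alignment and phylogenetic analysis.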

DDE/D domains were identified in virtually every IS630/Tc1/mariner transposon from C. gigas (Online Resource 2). The protein sequence of Mariner-17_CGi is not represented in Repbase, probably because of multiple stop codons in the putative ORF; however, the catalytic triad (DD34E) remains clearly recognizable. For Mariner-9_CGi, Repbase contains a fragmented ORF whose fragments lie at nucleotide positions 1065–1319, 1323–1832, and 1660–2853 of the element. This ORF codes for a protein 653 a.a. in length. However, our further analysis suggests that another ORF (with fragments at positions 1065–1658 and 1660–2856), coding for a protein 596 a.a. in length, is more likely to represent the gene of the putative transposase. Mariner-31_CGi contains two ORFs, and two independent amino acid sequences (Mariner-31-CGi_p1 and Mariner-31-CGi_p2) were analyzed.

The fragments of the identified DDE/D domains, comprising the second and third amino acid residues of the DDE/D triad and 10 a.a. both upstream and downstream of this region, were aligned with the corresponding fragments of the transposases of known IS630/Tc1/mariner superfamily transposons (Fig. 1). The group of DNA transposons whose ORFs code for amino acid sequences with the DD34E catalytic domain was the largest one (18 elements). The multiple alignment showed that these DNA transposons have a high degree of homology with known elements of the Tc1 family. The Mariner-35_CGi (DD36E) element has a DDE/D domain highly homologous to those of the DD34E elements and most probably also belongs to the Tc1 family.
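Extracting such a fragment can be sketched as follows, assuming the positions of the second aspartate and of the third residue of the triad are already known (for example, from a domain search). The function name and the 0-based indexing are assumptions of this illustration.

```python
from typing import Tuple


def dde_fragment(protein: str, second_d: int, third: int,
                 flank: int = 10) -> Tuple[str, str]:
    """Return (fragment, signature) for a transposase sequence.

    `second_d` and `third` are 0-based positions of the second aspartate
    and of the third D/E residue of the triad. The fragment spans both
    residues plus up to `flank` residues on either side; the signature
    encodes the number of residues between them.
    """
    protein = protein.upper()
    if not 0 <= second_d < third < len(protein):
        raise ValueError("positions must satisfy 0 <= second_d < third < len(protein)")
    if protein[second_d] != "D" or protein[third] not in "DE":
        raise ValueError("positions do not point at D and D/E residues")
    spacer = third - second_d - 1
    start = max(0, second_d - flank)            # clip at the N-terminus
    end = min(len(protein), third + flank + 1)  # clip at the C-terminus
    return protein[start:end], f"DD{spacer}{protein[third]}"
```

With 34 residues between the two catalytic positions the function reports DD34E or DD34D and returns a fragment of 10 + 1 + 34 + 1 + 10 = 56 residues, provided the flanks are not clipped by the ends of the protein.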

Fig. 1

Multiple alignment of the fragments of DDE/D domains including the second and third amino acid residues of DDE/D triad (indicated in gray) and 10 a.a. both upstream and downstream of this region. Amino acid residues (SEV) unique for Mariner-18_CGi are underlined

The second largest group of DNA transposons (ten elements) codes for proteins with catalytic domains typical of the pogo family: DD30D (pogo-like) and DD35D (Fot1-like).

Three groups of DNA transposons (with DD34D, DD41D, and DD37E domains) included only two elements each. The elements with DD34D (Mariner-2_CGi, Mariner-11_CGi) and DD41D (Mariner-4_CGi, Mariner-12_CGi) domains show similarities with the mariner and rosa groups of transposons, respectively. The elements with the DD37E domain (Mariner-18_CGi, Mariner-31_CGi) turned out to be more intriguing. For both elements, the pattern of amino acids in the alignment differs from that of the representatives of the mosquito family (Ae-atropalpus1 and An-gambiae1). In addition, three amino acid residues (SEV) of Mariner-31_CGi occupy a completely different position in the alignment than in all other elements of the IS630/Tc1/mariner superfamily (Fig. 1). The Mariner-18_CGi and Mariner-31_CGi elements may belong to new, unique clades of elements with the DD37E signature or represent extensively modified versions of DD34E or other elements.

No elements with DD37D, DD39D, or DDxE domains have been found among the IS630/Tc1/mariner transposons of C. gigas. To date, the DDxE elements (IS630 family) have been found only in prokaryotes (Doak et al. 1994; Capy et al. 1996), while DD39D transposons (plants family) include only the elements from the plants (Shao and Tu 2001).

Several oyster transposons were not included in the alignment, since they could not be reliably aligned with the other members of the superfamily or lacked a clearly recognizable catalytic triad. These included Mariner-21_CGi with its DD83D catalytic domain. Apparently, one or two large insertions occurred within the catalytic triad of this element (sequence not shown). Of note, this element is very large (9744 bp) while encoding a transposase of average size (376 a.a.).

The DDE/D domain of the Mariner-6_CGi transposase could not be precisely identified, since the region of the third amino acid residue is deleted or significantly modified. The catalytic domain of the Mariner-3_CGi element also could not be identified, because the region of the second aspartate of the DDE/D domain is missing.

The amino acid sequence of Mariner-31_CGi_p1 lacks the domains relevant to transposition activity and instead contains a Myosin_tail_1 (pfam01576) domain. This domain mediates the assembly of myosin subunits into the macromolecular filament and contributes to its stability. This sequence was probably captured by the transposon in the course of a transposition event.

Phylogenetic Analysis

The complete transposase sequences of the IS630/Tc1/mariner DNA transposons from C. gigas, as well as the transposases of 52 known DNA transposons representing the families of the IS630/Tc1/mariner superfamily, were used for the phylogenetic analysis (Fig. 2, Online Resource 3). The amino acid sequences of the IS630 elements were used as an outgroup.

Fig. 2

Maximum likelihood (ML) tree for IS630/Tc1/mariner transposons of C. gigas. This tree was generated in MEGA7 using the rtREV + G + I + F model. Bootstrap percentages (100 replications) of > 60% are shown. The analysis involved 90 amino acid sequences (Online Resource 3). Families and subfamilies are indicated in the right-hand part of the tree

Elements from C. gigas possessing the catalytic triads DD35D (Mariner-9_CGi, Mariner-24_CGi, Mariner-28_CGi, Mariner-30_CGi, Mariner-34_CGi, and Mariner-38_CGi) and DD30D (Mariner-10_CGi, Mariner-13_CGi, Mariner-36_CGi, and Mariner-37_CGi) formed a single phylogenetic group with pogo family transposons. Mariner-21_CGi, which possesses the DD83D signature, probably due to the insertions in the catalytic domain region, was classified into the same clade on the phylogenetic tree. Thus, these 11 elements can all be classified as the members of pogo family.

The DD34D clade encompassed all the analyzed elements of the mariner family, as well as four elements from C. gigas. Interestingly, for two of them (Mariner-3_CGi, Mariner-6_CGi), the DDE/D domains could not be identified; however, according to the phylogenetic analysis, both elements belong to the mariner family (Fig. 2). All elements with the DD41D signature (including Mariner-4_CGi and Mariner-12_CGi) together formed the rosa clade.

The DD34E elements from oyster, as well as from the other organisms, formed a single phylogenetic group; however, the individual branches showed low bootstrap values (< 60).

To establish the evolutionary relationships of the DD34E elements from C. gigas more accurately, we performed a separate phylogenetic analysis using the ML method (Fig. 3, Online Resource 4). For this analysis we used 52 known TEs representing different families of IS630/Tc1/mariner transposons and the subset of sequences from C. gigas that have the DD34E signature. In addition, the sequence of the Mariner-35_CGi transposase with the DD36E signature was included, since its similarity to DD34E transposons had been established by the multiple alignment of DDE/D domain fragments (Fig. 1). The IS630 elements were used as an outgroup. The analysis shows that all analyzed elements of C. gigas form a single clade with a bootstrap value of 51. This group also included all Tc1 family elements and the members of the TRT subfamily, which belong to the Tc1-like elements despite possessing DD37E domains (Zhang et al. 2016a). However, several groups can be identified within this clade, four of which consist of oyster elements. Eleven of the elements from C. gigas were more closely related to Minos (Fig. 3), while five elements showed similarity to Tc1, Sleeping Beauty, and others (Fig. 3). Mariner-32_CGi and Mariner-35_CGi did not show close similarity to any of the Tc1-like transposons analyzed, which indicates that these elements belong to separate subfamilies of Tc1-like transposons.

Fig. 3

Phylogenetic relationships of Pacific oyster DD34E elements with other members of the IS630/Tc1/mariner superfamily based on their transposases. The best-suited amino acid substitution model for these data was the LG + G + I model. ML phylogenetic tree was then built using the MEGA7 software. Bootstrap values (100 replications) of < 60% are not shown. The analysis involved 71 amino acid sequences (Online Resource 4). Families and subfamilies are indicated in the right-hand part of the tree

To date, two groups of IS630/Tc1/mariner DNA transposons with the DD37E signature have been described, namely, the mosquito family elements (Shao and Tu 2001) and the members of the TRT subfamily (Zhang et al. 2016a). The phylogenetic tree shows that the two oyster elements possessing DD37E catalytic triads belong to neither of them (Online Resource 2, Fig. 2). Instead, Mariner-18_CGi formed a clade with the Tc1 elements (DD34E), while Mariner-31_CGi formed a branch between the mariner (DD34D) and plants (DD39D) elements.

To classify the oyster elements with the DD37E signature more accurately, we performed a tblastn search of the metazoan WGS databases using the amino acid sequences of the Mariner-18_CGi and Mariner-31_CGi transposases as queries. Only DD37E transposases were selected from the search results. With the Mariner-18_CGi transposase as a query, nucleotide sequences coding for Mariner-18_CGi-like transposases (Tnp-DD37E(L18)) were found in only two mollusk species, both from the genus Crassostrea (C. gigas and C. virginica). Tnp-DD37E(L18) were also identified in some cnidarians (subclass Hexacorallia and class Hydrozoa) and echinoderms (superclass Echinozoa). Among the Arthropoda, Tnp-DD37E(L18) were found in representatives of two crustacean classes (Malacostraca, Maxillopoda), subclass Acari (mites and ticks), and class Collembola. In the Chordata, similar proteins were identified only in some fish species (infraclass Teleostei). With the Mariner-31_CGi transposase as a query, transposases with the DD37E signature (Tnp-DD37E(L31)) were found in the subclass Pteriomorphia (bivalves) and in some members of the subclass Hexacorallia (cnidarians). The mosaic distribution of Mariner-18_CGi-like and Mariner-31_CGi-like elements across the evolutionary tree of metazoans may point to horizontal transfer events in the history of these TEs. Horizontal transfer is a common feature of IS630/Tc1/mariner superfamily elements (Dupeyron et al. 2014; Wallau et al. 2016).

The phylogenetic analysis, which included the Mariner-18_CGi-like and Mariner-31_CGi-like transposases as well as 52 known DNA transposons from the IS630/Tc1/mariner superfamily, shows that Mariner-18_CGi and most Tnp-DD37E(L18) form a single distinct branch closely related to the Tc1-like transposons (Fig. 4, Online Resource 5). Tnp-DD37E(L18) from the copepod Eurytemora affinis forms a separate branch that is also close to the Tc1-like transposons. Tnp-DD37E(L18) from the amphipod crustacean Parhyale hawaiensis fell into the same clade as the TRT subfamily elements. We suggest that Mariner-18_CGi and the Tnp-DD37E(L18) elements are a group of DNA transposons evolutionarily related to the DD34E elements. The Mariner-31_CGi-like elements formed a separate clade distinct from both the mosquito family and the TRT subfamily. These data, together with the fact that the catalytic domain of Mariner-31_CGi differs from those of the other DD37E elements (Fig. 1), suggest that these DNA transposons represent a new family with the DD37E signature.

Fig. 4

The phylogeny based on the amino acid sequences of the Mariner-18_CGi-like (Tnp-DD37E(L18)) and the Mariner-31_CGi-like (Tnp-DD37E(L31)) transposases. This tree was generated in MEGA7 with the ML method, using the LG + G model. Only bootstrap values (100 replications) higher than 60% are shown on the branches. The analysis involved 77 amino acid sequences (Online Resource 5). Families and subfamilies are indicated in the right-hand part of the tree. Species, taxon, and localization in WGS are indicated for the Tnp-DD37E(L18) and the Tnp-DD37E(L31) transposases

Thus, according to the analysis of the DDE/D domains and the phylogenetic analysis, the Tc1 family is the most abundant family of IS630/Tc1/mariner superfamily DNA transposons in the genome of C. gigas, accounting for more than half of the elements (Table 1). More than a quarter of the IS630/Tc1/mariner elements from C. gigas belong to the pogo family. The rest are DNA transposons of the mariner and rosa families and of a new family of Mariner-31_CGi-like elements.

Table 1 Distribution of IS630/Tc1/mariner transposons of Pacific oyster by family

Activity of IS630/Tc1/mariner DNA Transposons from C. gigas

Although elements of the IS630/Tc1/mariner superfamily have been discovered in virtually every eukaryotic genome studied, only a few of them possess a functional transposase and have had their activity confirmed experimentally. To date, the following active IS630/Tc1/mariner DNA transposons have been identified: Impala and Fot1 from the fungus Fusarium oxysporum (Daboussi et al. 1992; Langin et al. 1995), Tc1 and Tc3 from the nematode Caenorhabditis elegans (Emmons et al. 1983; Collins et al. 1989), Minos, Mos1, and Himar1 from drosophila (Bryan et al. 1990; Franz and Savakis 1991; Robertson and Lampe 1995), Mboumar-9 from the ant Messor bouvieri (Munoz-Lopez et al. 2008), and Passport from the flounder (Clark et al. 2009). The scarcity of active elements is explained by the fact that the transposase-encoding nucleotide sequences of the majority of known elements are damaged by deletions, frameshifts, and amino acid substitutions. It has been suggested that most IS630/Tc1/mariner transposons are now at the terminal stage of their “lifecycle,” and that the peak of their activity lies far in the past. According to the concept of the “lifecycle” (Schaack et al. 2010), a new active element enters the host genome and starts to colonize the germ line. The next stage is characterized by an increase in copy number and the spread of the element across the population, followed by an increase in the diversity of active elements, the subsequent vertical inactivation of the copies through mutation accumulation or negative selection and, finally, the elimination of inactive copies of the element from the genome.

To analyze the activity of IS630/Tc1/mariner DNA transposons from the Pacific oyster, we searched the NCBI Transcriptome Shotgun Assembly (TSA) and Expressed Sequence Tags (EST) databases for transcripts homologous to the predicted transposases of the known elements. Although the presence of transcripts homologous to a transposase is not direct proof of its functionality, let alone of transpositional activity, it can still serve as indirect evidence of both.

For 25 elements we found transcripts with a high degree of homology to the putative transposases. However, most of these transcripts were chimeric sequences encompassing fragments of the transposase and of some other proteins, and thus their presence cannot serve as evidence that a full mRNA of a potentially functional transposase is transcribed. Moreover, a small fraction of the detected transcripts represented transposase fragments, from which one can hardly draw a conclusion on the potential activity of the studied DNA transposons. After excluding the chimeric and truncated transcripts, transcripts closely matching the putative transposases were identified for only eight elements (Online Resource 2).

The transcribed RNA sequence of a single element (Mariner-35_CGi) fully matched the predicted transposase, while those of the other seven elements harbored substitutions and deletions/insertions, ranging from short (9 bp) to long (up to 396 bp).

The differences between the transposase sequences predicted by annotation and the transcripts may be due not only to annotation errors, but also to the fact that the genome (PRJNA70283) and the transcriptome (PRJNA301543) were sequenced from different individuals in separate experiments. Despite the presence of transposase transcripts, the activity of the corresponding elements remains in question, especially since almost all of them have low copy numbers.

A blastn search for regions homologous to the DNA transposons in the genome of C. gigas yielded the copy number of each element per genome (Online Resource 2). This analysis was based only on the sequences of potentially autonomous oyster elements. We considered both the number of homologous loci and the number of full-size elements (fragments retaining ≥ 95% of the length and the TIRs). Most IS630/Tc1/mariner elements from the Pacific oyster are represented by only a few full-size copies (Online Resource 2, Fig. 5), and only for two of them (Mariner-2_CGi and Mariner-5_CGi) does the number of complete copies exceed 25. Such a positively skewed unimodal distribution may indicate that the oyster elements have been virtually eliminated from the genome, i.e., that they are at the terminal stage of their “lifecycle”.

Fig. 5

Number of full-length copies of Pacific oyster IS630/Tc1/mariner transposons

However, virtually every element has a unique ratio of the total number of copies to the number of full-size copies, which reflects the individual history of each element. This ratio indicates how long ago the element was active, while the total copy number reflects the scale of its transposition activity. For example, the Mariner-2_CGi, Mariner-3_CGi, and Mariner-8_CGi elements were evidently rather active in the past, as suggested by their high copy numbers (> 100 copies). Elements such as Mariner-33_CGi and Mariner-38_CGi are represented in the genome by only a few sequences (2 copies of each element). Such DNA transposons may have been immobilized by the genome immediately after the initial colonization, which would have terminated their “lifecycle” prematurely.

To evaluate the dynamics of IS630/Tc1/mariner DNA transposons in the genome of the Pacific oyster, we followed the approach previously used by Bouallègue et al. (2017). To estimate the period of amplification of the elements from C. gigas, we determined the percentage identity between the consensus sequence of each element and its full-size copies. Only three elements (Mariner-14_CGi, Mariner-31_CGi, and Mariner-35_CGi) were shown to have an identity percentage of 97–99% for all copies (Figs. 6, 7). Another six elements, in which copies with high identity (97–99%) prevail, also have one or two copies with an identity of 81–96%. For three of the nine elements mentioned above (Mariner-2_CGi, Mariner-30_CGi, and Mariner-35_CGi), full-size transcribed RNA sequences were found (Online Resource 2). The high degree of sequence conservation of those elements may indicate that these copies still retain activity or have been inactivated only recently. The other DNA transposons (with copy identity of ~ 69–98%) are less conserved, which supports the hypothesis that their “lifecycle” has already been completed. Mariner-5_CGi is an interesting example of an element that has undergone a reactivation event and commenced a new “lifecycle” in its evolutionary history in the genome of C. gigas. This is suggested by the presence of two “fractions” of its copies with identity percentages of 69–77% and 88–98%, respectively (Fig. 6).
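Separate “fractions” of copies can be recognized as groups of identity values divided by a wide gap. The sketch below shows one simple way to do this; the function name and the gap threshold of 5 percentage points are assumptions chosen for illustration, not parameters taken from this analysis.

```python
from typing import Iterable, List


def split_into_fractions(identities: Iterable[float],
                         min_gap: float = 5.0) -> List[List[float]]:
    """Group per-copy identity values (%) into fractions.

    Values are sorted, and a new fraction is started whenever the
    difference from the previous value exceeds `min_gap`.
    """
    fractions: List[List[float]] = []
    for value in sorted(identities):
        if fractions and value - fractions[-1][-1] <= min_gap:
            fractions[-1].append(value)
        else:
            fractions.append([value])
    return fractions
```

A single returned fraction corresponds to one period of amplification, whereas two or more fractions are consistent with separate bursts of activity, as proposed above for Mariner-5_CGi.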

Fig. 6

Evolution dynamics of Pacific oyster Tc1-like elements. The identity percentage was estimated by comparing the consensus with the copies. A high level of similarity corresponds to the recent invasion of the element (the beginning of the “lifecycle”), and a decrease in this percentage indicates an earlier event

Fig. 7

Evolution dynamics of non-Tc1-like elements of C. gigas. The identity percentage was estimated by comparing the consensus with the copies. A high level of similarity corresponds to the recent invasion of the element (the beginning of the “lifecycle”), and a decrease in this percentage indicates an earlier event

Thus, the vast majority of IS630/Tc1/mariner DNA transposons from the Pacific oyster seem to lack transposition activity. Among the transcriptionally active elements, only Mariner-2_CGi has both a high copy number (197 copies, 27 of which are full-size) and high identity values (94–99%), and it is probably still functional. The Mariner-30_CGi and Mariner-35_CGi elements may also retain their function, as they have transcribed RNA sequences and identity values of 92–99% and 99%, respectively. A comparison of transposon copy numbers between the groups of elements with and without transcriptional activity of the transposase did not reveal any significant differences.

Conclusion

The IS630/Tc1/mariner DNA transposons of the Pacific oyster are represented by four families, Tc1 (DD34E), mariner (DD34D), pogo (DDxD), and rosa (DD41D), the first of which accounts for more than half of the discovered IS630/Tc1/mariner elements from C. gigas. Mariner-31_CGi is the representative of a new family with the DD37E signature. In addition, we established that eight elements representing three families (Tc1, mariner, and pogo) demonstrate transcriptional activity. Presumably, only three of those elements may retain transposition activity.

Studies of TE diversity and dynamics in species with uninvestigated genomes facilitate a better understanding of both the evolutionary history of TEs and the molecular evolution of genomes. Recently, owing to the growing number of organisms with fully sequenced genomes, several new groups of transposons have been found in the IS630/Tc1/mariner superfamily. For example, elements with the DD41D signature were described by Zhang et al. (2016b) and Bouallègue et al. (2017), while the TRT elements with the DD37E signature were presented by Zhang et al. (2016a). The genome of the Pacific oyster harbors two DD37E elements, which represent a new subfamily of Tc1-like elements (Mariner-18_CGi) and a new, previously unknown family (Mariner-31_CGi). Our results not only identify new TE groups, but also demonstrate that elements with the DD37E signature have repeatedly emerged in the course of evolution of IS630/Tc1/mariner DNA transposons.

The molecular domestication of DNA transposons remains a topical issue. Molecular domestication is an evolutionary process by which a TE-derived coding sequence gives rise to a functional host gene. Several criteria have been proposed for confirming that a gene is TE-derived (Feschotte and Pritham 2007): (i) elements that have undergone domestication exist as single copies in the genome; (ii) structurally, these genes lack the molecular features of DNA transposons, such as flanking TIRs and TSDs; (iii) orthologs are found in distant species. At least three independent events of domestication of pogo-like transposases are known in multicellular organisms (Mateo and Gonzalez 2014). The pogo-like transposases are suggested to be predisposed to domestication owing to some of their specific features (Feschotte and Pritham 2007; Casola et al. 2008). Eight of the eleven elements of the pogo family in the genome of C. gigas possess a single full-size copy, while three of them lack TIRs. The study of the molecular domestication of these TEs is a promising line of research.