Introduction

Mitochondria are organelles derived from an alpha-proteobacterium that lived in symbiosis within the first eukaryotic cells (Andersson et al. 1998). From this bacterial ancestor, mitochondria kept their own genome: the mitochondrial DNA (mtDNA). As in bacteria, the genome of the first mitochondria was probably circular, but after 1.5 billion years of evolution, mtDNA architecture has followed various trends in the eukaryotic lineages (Gray et al. 1999). Although mtDNA is generally depicted as a circular molecule, a growing number of examples show that linear molecules appeared independently in organisms such as plants (Ryan et al. 1978; Coleman et al. 1991; Pérez-Brocal et al. 2010; Smith et al. 2010), fungi (Fukuhara et al. 1993; Forget et al. 2002; Kosa et al. 2006; Valach et al. 2011; Ma et al. 2013) and unicellular eukaryotes (Burger et al. 2000; Burger et al. 2003; de Graaf et al. 2009; Hikosaka et al. 2010; Sesterhenn et al. 2010; Barth and Berendonk 2011; Burger et al. 2013). In animals, only a few examples of fully linear mtDNA have been observed, in sponges and cnidarians (Voigt et al. 2008; Kayal et al. 2012; Smith et al. 2012; Lavrov et al. 2013).

The mechanisms by which circular mtDNA becomes linear are still not fully understood. However, there is evidence that the conversion of mitochondrial genome architecture to a linear form might be due to similar mechanisms in unrelated organisms. Indeed, the majority of linear mitochondrial molecules sequenced so far from diverse organisms carry inverted terminal repeats (ITRs) at their ends (Morin and Cech 1988; Dinouël et al. 1993; Vahrenholz et al. 1993; Kairo et al. 1994; Burger et al. 2000; Forget et al. 2002; Kosa et al. 2006; Shao et al. 2006; Smith and Lee 2008; Voigt et al. 2008; Fricova et al. 2010; Hikosaka et al. 2010; Pérez-Brocal et al. 2010; Smith et al. 2010; Valach et al. 2011; Kayal et al. 2012; Smith et al. 2012; Burger et al. 2013), although there is a counterexample in Paramecium caudatum (Barth and Berendonk 2011). These ITRs are palindromic repeats thought to form telomeric hairpin (t-hairpin) structures, which covalently close the double-stranded linear DNA molecules. T-hairpins make it possible to replicate linear mtDNA molecules in their entirety (Nosek et al. 2004). In some organisms, such as jakobid protists and maize, the inverted repeats (IRs) originate from foreign plasmid DNA inserted into circular mtDNA, which induces linearization of the genome (Schardl et al. 1984; Burger et al. 2013). Although the origin of such plasmids is not known, Nosek et al. (2006) described this foreign DNA as selfish elements that convert circular chromosomes into linear ones, providing a telomere that solves the problems associated with linearization.

A few plants and yeasts have both linear and circular mtDNA molecules co-occurring in their mitochondria (Dinouël et al. 1993; Backert et al. 1997). In animals, only one model combines linear and circular mtDNA molecules: the atypical mitochondrial genome of terrestrial isopods (Crustacea: Oniscidea) (Raimond et al. 1999; Doublet et al. 2012). This genome is composed of two molecular forms: linear monomeric molecules and “head-to-head” palindromic circular dimers (Fig. 1). Both molecular forms have the same nucleotide and gene sequences [except for one point mutation (Marcadé et al. 2007; Doublet et al. 2008)] and are believed to be generated from one another via a permanent mechanism of linearization and circularization. In order to find what facilitates genome architecture conversions in the atypical mtDNA of isopods, the sequencing of the complete mtDNA of the common backyard pill bug Armadillidium vulgare has recently been undertaken (Marcadé et al. 2007). Although 99 % of the genome was successfully sequenced, the palindromic junctions of the dimers and the ends of the linear monomers (where linearization and circularization are believed to occur) were not completely sequenced because secondary structures formed during amplification, a problem also encountered recently by Kilpert et al. (2012) when re-sequencing A. vulgare mtDNA.

Fig. 1

Architecture of the atypical mtDNA of terrestrial isopods composed of linear monomers and “head-to-head” palindromic circular dimers. Gray arrows show the positions of the primers used in this study. Gray boxes show the positions of the genes surrounding the control region: 12S rDNA and cytochrome b (Cytb). Distance between thin lines = 1 kb

To obtain the missing sequence of A. vulgare mtDNA, we tried to amplify it with a new primer pair, assuming the presence of circular monomers with “head-to-tail” junctions, the “typical” architecture of animal mtDNA, which has never been observed in the mtDNA of terrestrial isopods (Raimond et al. 1999; Doublet et al. 2012). With this new approach, we successfully amplified and sequenced what constitutes the mtDNA control region. Here, we present the mtDNA control region sequences of two species of the genus Armadillidium, from 10 populations of A. vulgare and one population of A. pelagicum. In these two species, all features generally present in arthropod mtDNA control regions were observed (origin of replication, poly-T stretch, GA- and TA-rich blocks and one variable domain), plus a conserved IR present in two orientations: either in one sense or in its reverse complement. We discuss the composition of this control region and the implication of the IR in genome architecture conversions in terrestrial isopod mtDNA.

Materials and Methods

Sampling, mtDNA Amplifications and Cloning

Animals were sampled in the field or from populations reared in the laboratory (Table 1). Total DNA was extracted as previously described (Bouchon et al. 1998). MtDNA control regions were amplified by PCR with the primers 12SCR (5′-GAGATAAGTCGTAACAAAGTAG) and CytbCR (5′-CTACCTTGAGGTCAAATATC), designed from conserved regions of the 12S rDNA and Cytochrome b (Cytb) gene sequences. In A. vulgare mtDNA, these two genes are present at both ends of the linear molecules and at the junctions of the palindromic dimers (Fig. 1, Marcadé et al. 2007). Primers were designed from an alignment of the partial A. vulgare mtDNA genome sequence (NCBI accession number EF643519.2) with two other isopod mitochondrial genomes: Ligia oceanica (DQ442914.1) and Idotea baltica (DQ442915.1). To overcome the problem of PCR artefacts, two independent PCR methods were used. Control regions were first amplified with GoTaq (Promega) using the following program: 3 min at 95 °C, followed by 25 cycles of 30 s at 95 °C, 1 min at 54 °C and 1 min 30 s at 72 °C, and a final step of 5 min at 72 °C. To confirm our results, a subset of samples was then amplified using the Phusion High-Fidelity PCR Master Mix (New England Biolabs) with the following program: 30 s at 98 °C, followed by 35 cycles of 15 s at 98 °C, 30 s at 55 °C and 30 s at 72 °C, and a final step of 10 min at 72 °C. Cloning was performed for one heteroplasmic sample using the pGEM-T Easy Vector Systems kit (Promega). Purified PCR products and clones were sequenced with the BigDye Terminator kit (Applied Biosystems) and analyzed on an ABI Prism 3130 Genetic Analyzer.

Table 1 Sample origins. Type “A” and type “B” refer to the orientation of the inverted repeat (IR)

Sequence Analysis

For each sequence obtained, we first checked and assembled the sequences of both strands using Staden (Staden 1996). We aligned these sequences with the annotated partial genome sequence of A. vulgare (EF643519.2) and removed the part of the sequence that belongs to the Cytb gene. Likewise, we used the software ARWEN (Laslett and Canbäck 2008) to locate the tryptophan transfer RNA (tRNATrp) gene, which flanks the 12S rDNA gene, and removed its sequence. The control region sequence was then defined as the sequence between the last nucleotide of tRNATrp and the stop codon of Cytb (these two genes are on opposite strands).

Sequence alignments and dot plot analyses were made using BioEdit software (Hall 1999). Secondary structures of DNA were predicted with RNAfold software (Hofacker 2003), using DNA parameters. Only the most likely secondary structures, as indicated by the color-ranked probability, were considered in this study. Other mtDNA control region features were identified by visual inspection of the sequences, following the descriptions of Zhang and Hewitt (1997) and Kuhn et al. (2008). A search for restriction enzyme sites in the IR sequence was done using NEBcutter V2.0 (New England Biolabs, http://tools.neb.com/NEBcutter2/).
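The principle of a dot plot can be illustrated with a minimal Python sketch. This is not the BioEdit implementation, whose internal parameters are not described here; the function name dot_plot and the word length of 4 are assumptions made for illustration only. A point (i, j) is recorded whenever the word starting at position i of the first sequence is identical to the word starting at position j of the second sequence.

```python
def dot_plot(x: str, y: str, word: int = 4) -> set[tuple[int, int]]:
    """Return the set of (i, j) positions where x[i:i+word] == y[j:j+word].

    Illustrative sketch only: exact word matches, no scoring threshold.
    """
    x, y = x.upper(), y.upper()
    # Index every word of y by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(y) - word + 1):
        index.setdefault(y[j:j + word], []).append(j)
    # Look up every word of x in that index.
    points = set()
    for i in range(len(x) - word + 1):
        for j in index.get(x[i:i + word], []):
            points.add((i, j))
    return points
```

Two identical sequences give an unbroken main diagonal. A segment that is inverted in one of the two sequences leaves a gap in that diagonal, and the gap is filled when one sequence is replaced by its reverse complement.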

Location and Orientation of the Origin of Replication

To locate the origin of replication, the cumulative GC-skew of the now complete A. vulgare mtDNA sequence was calculated using the GenSkew software (http://genskew.csb.univie.ac.at/) with a window size and a step size of 100 and 20 nucleotides, respectively. The GC-skew statistic reflects the strand bias of nucleotide composition in the genome and is calculated as (G−C)/(G+C), which gives the relative excess of guanine (G) over cytosine (C) in a sequence (Perna and Kocher 1995). This strand bias varies gradually along a genome, and the region with the lowest GC-skew value indicates where the origin of replication is located. The GC-skew is generally negative for the heavy strand of arthropod mtDNA, but variants exist, reflecting a reverse orientation of the origin of replication (Hassanin et al. 2005). We compared the results obtained for the mitochondrial genomes of A. vulgare and Daphnia pulex (NC_000844.1), an aquatic crustacean species.
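The calculation can be sketched in a few lines of Python. This is a minimal illustration of the (G−C)/(G+C) formula applied in sliding windows, not the GenSkew code itself; the function names and the treatment of windows without G or C (skew set to 0) are assumptions. The sequence is treated as linear, with a window of 100 nucleotides and a step of 20 as in this study.

```python
def gc_skew(window: str) -> float:
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cumulative_gc_skew(seq: str, window: int = 100, step: int = 20):
    """Return a list of (window start, running sum of GC-skew)."""
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, step):
        total += gc_skew(seq[start:start + window])
        curve.append((start, total))
    return curve


def minimum_position(curve) -> int:
    """Start of the window at which the cumulative curve is lowest."""
    return min(curve, key=lambda point: point[1])[0]
```

On a synthetic sequence made of a C-rich half followed by a G-rich half, the cumulative curve decreases and then increases, and its minimum falls near the junction between the two halves.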

Results

Amplification Success and Nucleotide Composition

Control region sequences were obtained from A. vulgare samples from 10 populations and from two A. pelagicum samples from one population (Table 1). However, we successfully amplified and sequenced the mtDNA control region of only 17 individuals of A. vulgare out of 117 tested, which represents an amplification success of only 14.5 %. No polymorphism was observed within populations, except for the population from Mas Thibert (MT), France (see below), but no more than three individuals were successfully sequenced for each population. Figure 2 shows an alignment of the control region sequences obtained from each population of A. vulgare and from the two individuals of A. pelagicum. The mtDNA control region sequences range in size from 249 to 282 bp (base pairs) in A. vulgare and are 255 bp long in A. pelagicum. The proportion of A and T is 64.2 % (±0.02) for A. vulgare and 63.1 % for A. pelagicum. The control region is thus less AT-rich than the complete genome of A. vulgare (71.2 %). Control region sequences are very similar among A. vulgare populations, with a degree of similarity ranging from 74 to 100 % (Table 2). The degree of similarity of the control region sequence of A. pelagicum with the different mitotypes of A. vulgare varies from 62 to 68 %.
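The summary statistics reported above can be reproduced with a short Python sketch. It is given under stated assumptions: the function names are hypothetical, and the similarity function counts identical columns of a pairwise alignment while ignoring columns in which both sequences have a gap, which may differ from the way the alignment software used in this study treats gaps. Only the counts 17 and 117 come from the text.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical columns between two aligned sequences.

    Assumption: columns where both sequences carry a gap are ignored;
    a gap facing a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Amplification success: 17 individuals sequenced out of 117 tested.
amplification_success = round(100 * 17 / 117, 1)
print(amplification_success)  # 14.5
```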

Fig. 2

Alignment of A. vulgare and A. pelagicum mtDNA control region sequences. Flanking arrows with tRNATrp and Cytb show the localisation of these respective genes (note the 12S rDNA gene is located next to the tRNATrp). The IR and the origin of replication are shown in frames. Black lines show the conserved flanking sequences TATT and G(A)nT surrounding the origin of replication. The arrows locate the variable domain. Poly-T stretch, GA-block, and TA-block are shown in gray boxes. Numeration of bases is restricted to the control region sequence. A and B refer to the type “A” and type “B” orientations of the IR (see Fig. 5)

Table 2 Similarity matrix between control region sequences of A. vulgare populations

Origin of Replication

The origins of replication of A. vulgare and A. pelagicum mtDNAs were identified via a secondary structure analysis of the sequences. They present a GC-rich sequence and are characterized by two conserved flanking motifs, a TATT-like sequence on the 5′ side and a G(A)nT motif on the 3′ side (Fig. 3). Sequences of the origins of replication are fully conserved among A. vulgare populations (Fig. 2). The origin of replication of A. pelagicum mtDNA presents a similar secondary structure and location in the genome compared to A. vulgare (Figs. 2, 3). Analysis of the cumulative GC-skew along the A. vulgare mtDNA sequence confirmed the location of the origin of replication (Fig. 4). The positive value of the cumulative GC-skew in A. vulgare also shows that the origin of replication is in a reverse orientation compared to that of the other crustacean, D. pulex.

Fig. 3

Secondary structures of the origins of replication of A. vulgare and A. pelagicum and their conserved flanking sequences TATT and G(A)nT underlined

Fig. 4

Graphical representation of the cumulative GC-skew along the mitochondrial genome sequences of A. vulgare (EF643519.3) and D. pulex (NC_000844.1), using the software GenSkew (http://genskew.csb.univie.ac.at/), with a window size and a step size of 100 and 20 nucleotides, respectively

Poly-T stretch, GA- and TA-block, and the Variable Domain

Three other common motifs of arthropod mtDNA control regions were identified in A. vulgare and A. pelagicum. These motifs are shown in gray boxes in Fig. 2 and are: (i) a poly-T stretch, here composed of 6 thymines in both species, (ii) a GA-rich block (GA-block), fully conserved between the two species A. vulgare and A. pelagicum, and (iii) a long TA-block located near the Cytb gene, with a conserved sequence within A. vulgare but a different length and sequence in A. pelagicum. A variable domain is also present in the control regions of the two species, located between the IR (see below) and the origin of replication (Fig. 2). This region concentrates most of the variability within A. vulgare and between A. vulgare and A. pelagicum, both in sequence and in size. The control region of the individual from Vancouver (VA) presents the largest variable domain, due to three repeats of the sequence CTGTTATAATA.
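Tandem copies of a motif such as CTGTTATAATA can be counted with a simple pattern search. The following Python sketch is only an illustration: the function name is hypothetical, only perfect and directly adjacent copies are counted, and any test sequence used with it is synthetic rather than taken from the data of this study.

```python
import re


def max_tandem_copies(seq: str, motif: str) -> int:
    """Largest number of perfect, directly adjacent copies of motif in seq."""
    # Each match is one uninterrupted run of the motif.
    runs = re.findall(f"(?:{re.escape(motif.upper())})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)
```

For a synthetic sequence in which the motif is repeated three times in a row between unrelated flanking bases, the function returns 3; it returns 0 when the motif is absent.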

Inverted Repeat (IR)

In all control region sequences from A. vulgare and A. pelagicum samples, a conserved IR was found between the GA-block and the variable domain. This IR can potentially fold into a hairpin structure according to the software RNAfold (Fig. 2). Two different IR sequences were observed in A. vulgare: a type “A”, present in six populations, and a type “B”, present in the other four populations and in the two samples of A. pelagicum (Fig. 5). A dot plot analysis of the control region sequences shows that the two types of IR nevertheless have the same nucleotide sequence and differ only in their orientation (Fig. 6). Indeed, sequences of type “A” correspond exactly to the reverse complements of the sequences of type “B”. The secondary structure analysis shows that the type “A” IR can potentially be longer, including the GA-block (data not shown). Almost no variability was observed within IR sequences among A. vulgare populations, except for the population of VA, in which two point mutations do not affect the secondary structure, and for the French populations Archigny (AR) and Mas Thibert (MT), which present a point mutation in the inside loop of the IR.
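The relationship between the two IR types, and the capacity of an inverted repeat to form a hairpin, can be checked with a short Python sketch. The function names are hypothetical, the stem is counted only as perfect Watson-Crick pairs starting from the two ends of the sequence, and no thermodynamic model is applied, unlike the RNAfold predictions used in this study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_complement(a: str, b: str) -> bool:
    """True if sequence a is exactly the reverse complement of sequence b."""
    return a.upper() == reverse_complement(b).upper()


def stem_length(seq: str) -> int:
    """Number of consecutive bases from the 5' end that pair with the 3' end."""
    seq = seq.upper()
    paired = 0
    for i in range(len(seq) // 2):
        if seq[i].translate(_COMPLEMENT) != seq[-1 - i]:
            break
        paired += 1
    return paired
```

With the synthetic sequence GGATCTTTTGATCC (a 5-base stem around a TTTT loop), stem_length returns 5. A perfectly palindromic stem reads the same in both orientations, so two orientations of such an element can only be told apart by their loop or by imperfect positions in the stem.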

Fig. 5

Secondary structures of the IR of type “A” and type “B”. Sequences from the samples of Porto Alegre (PA) and Archigny (AR) have been used for type “A” and type “B”, respectively

Fig. 6

Dot plot analysis showing the similarity of control region nucleotide sequences between a sample of type “A” from Porto Alegre (PA, x axis) and a sample of type “B” from Archigny (AR, y axis, above plot) and the reverse complement of the sequence from Archigny (AR, y axis, below plot). Numbers represent base positions

To account for possible PCR artefacts, the control region sequence was amplified independently with a high-fidelity DNA polymerase for two individuals of the Tenerife (TE) population. Identical sequences were obtained, confirming the presence of the IR. However, this method does not rule out the hypothesis that the IR sequences obtained here were generated by an overlap of two linear molecules during PCR, giving the impression of a continuous sequence.

Heteroplasmy

Heteroplasmy was detected in the control region of a sample from the MT population. Out of the 14 clones sequenced from the PCR products, 13 control region sequences were of type “B” (as MT is represented in Fig. 2) and one clone was of type “A”, with a 43-base repeated sequence inserted into the variable domain (Supplemental material S1).

Discussion

Armadillidium vulgare mtDNA Control Region

With its control region sequenced, the mitochondrial genome sequence of A. vulgare is now complete. The previously published partial sequence (Marcadé et al. 2007) has been updated in the GenBank database (EF643519.3) with a sequence from a sample of the same population (Celles-sur-Belle, CB). The total size of the A. vulgare mitochondrial genome is 13,939 bp, the shortest described so far in crustaceans and one of the shortest in arthropods. The size reduction of A. vulgare mtDNA is due to short intergenic spacers and overlapping gene sequences, but also to the reduced control region, as observed in other short mitochondrial genomes (Yuan et al. 2010; Lin et al. 2012). In A. vulgare mtDNA, the control region is very small for animal mtDNA, ranging from 249 to 282 bp, due to a size reduction of its variable domain.

All the elements generally present in arthropod mtDNA control regions have been found in A. vulgare and A. pelagicum control regions (origin of replication, poly-T stretches, GA- and TA-blocks), confirming their function. The analysis of the cumulative GC-skew along A. vulgare mtDNA also confirmed the location of the origin of replication in this part of the genome, as well as its reverse orientation in the genome, a characteristic of isopod mtDNA (Kilpert et al. 2012).

The IR and Genome Architecture Conversions

In addition to all the typical elements of mtDNA control region, an IR was also found in all A. vulgare and A. pelagicum sequences. The nucleotide sequence of this IR and the potential secondary structure conserved across A. vulgare populations suggest that this element has an essential function—alternatively, these sequences could also be due to PCR artefact (see “Results” section), or nuclear copies of mtDNA, but the extreme sequence conservation across distant populations of A. vulgare makes this latter hypothesis rather unlikely. IRs are common elements in invertebrates mtDNA non coding regions (Mundy et al. 1996; Arunkumar and Nagaraju 2006; Rondan Dueñas et al. 2006; Lukić-Bilela et al. 2008; Pie et al. 2008; Brauer et al. 2012; Krebes and Bastrop 2012). They can be implicated in various functions including DNA replication (Voineagu et al. 2008). Indeed, IRs present in mtDNA control regions have been shown to be associated with the initiation of replication (Tapper and Clayton 1981; Hixson et al. 1986; Gerhold et al. 2010). The IR present in mtDNA control region of Armadillidium might also have another function. Among the elements described in the control region of Armadillidium mtDNA, the IR also appears as a good candidate for facilitating genome architecture conversions. Thus, using the results of this study and the previous literature about isopod mtDNA structure, we present here a putative model of genome architecture conversions with the IR (Fig. 7). In this model, we have located the IR at both ends of the linear monomers and at the junction of the palindromic “head-to-head” dimers. This is indeed the position of the control region revealed by the sequencing of A. vulgare mtDNA (Marcadé et al. 2007; Kilpert et al. 2012). At the ends of the linear molecules, the IR is represented folded into t-hairpins, a secondary structure that covalently closes the double-strand DNA. 
Although the presence of t-hairpins at both ends of the linear mtDNA of isopods has not been directly observed, they are believed to give stability to the molecules. Linear molecules represent almost half of the mtDNA molecules in A. vulgare, the other half being the circular dimers (Raimond et al. 1999). Without t-hairpins, linear molecules would be prone to degradation by exonucleases and might not be conserved. The failure to clone, amplify, and sequence the ends of the linear molecules of A. vulgare mtDNA (Marcadé et al. 2007) might be a mark of the physical protection that t-hairpins can provide.

Fig. 7

Scheme of genome architecture conversions in the atypical mtDNA of terrestrial isopods. A and B represent the type “A” and type “B” variations of the control regions. Solid arrows show the conversion from one architecture to the other. Dotted arrows represent the scission of the palindromic “head-to-head” dimers at the location of the IR, either by an exonuclease or by double-strand DNA breaks. Gray boxes show the positions of the genes surrounding the control region: 12S rDNA and cytochrome b (Cytb). Distance between thin lines in molecules = 1 kb

T-hairpins present another major advantage for linear molecules. By closing the ends of the double-stranded DNA, they permit the complete replication of linear molecules. In our model, we assume that replication of the linear molecules of isopod mtDNA, facilitated by t-hairpins, would generate palindromic “head-to-head” dimers (Fig. 7). This is supported by the fact that linear monomers and “head-to-head” dimers are constitutive of isopod mtDNA and have the same nucleotide sequences (Raimond et al. 1999; Marcadé et al. 2007; Doublet et al. 2012). An identical replication model exists in yeasts with linear mtDNA, where t-hairpins provide a substrate for terminus elongation and induce the formation of palindromic “head-to-head” dimers, also called “replicative intermediates” (Dinouël et al. 1993; Nosek et al. 1995; Valach et al. 2011). Although this replication model has not been experimentally observed in isopod mtDNA, the products of replication appear similar in yeast linear mtDNA and Armadillidium mtDNA.

The maintenance of nucleotide sequence homogeneity between the two molecular forms (linear monomers and dimers) of isopod mtDNA requires a dynamic mechanism generating one architecture from the other. Thus, linear molecules of isopod mtDNA are likely the product of the scission of the dimers, but in A. vulgare mtDNA the mechanism of resolution of palindromic “head-to-head” dimers into linear molecules remains unknown. In our model (Fig. 7), we assume that a specific mechanism associated with the IR would resolve dimers into two linear molecules. These linear molecules would then have open ends that permit either covalent closure of the double-stranded linear DNA molecules and the production of new dimers by replication, or the formation of another architecture (see below). IRs are known to produce genetic instability due to their ability to form secondary structures and cruciform motifs causing double-strand DNA breaks (Leach 1994; Akgun et al. 1997; Nasar et al. 2000; Lobachev et al. 2007). Another hypothesis would be the scission of the dimers by a restriction enzyme. Such a mechanism has been described in the spirochete bacterium Borrelia, where IRs are processed by a telomere resolvase into t-hairpins (Kobryn et al. 2009). Although the nucleotide sequence of the IR presents only a few mutations across A. vulgare populations, no conserved motif matching a restriction enzyme recognition site has been found. Thus, it is difficult to know how the dimers are split into two linear forms, and experimental confirmation of one or the other mechanism is needed.

As discussed above, the “replicative intermediate” model appears to be the principal mechanism of genome architecture conversions of A. vulgare mtDNA. However, the sequences obtained during this study suggest that this mtDNA is not constituted only of linear monomers and palindromic “head-to-head” dimers, as previously observed (Raimond et al. 1999). Indeed, the primers designed for the amplification of the control region and the IR bind in opposite directions at the two ends of the linear molecules (Fig. 1). The amplification of the control region with these primers demonstrates that alternative architectures exist in A. vulgare mtDNA. Interestingly, the amplification success of the control region was very low. This suggests that this alternative architecture could be rare and maybe transient, possibly not transmitted to the next generation, and thus not detectable by RFLP analyses of pooled individuals as used in past studies (Raimond et al. 1999; Marcadé et al. 2007; Doublet et al. 2012). The form of this alternative architecture is unknown. However, the amplification shows that this form presents a “head-to-tail” junction. Such junctions can be obtained in two ways. First, a “head-to-tail” junction can be made by concatenating molecules. Concatenated “head-to-tail” molecules are rather rare but have been observed in plants with linear mtDNA, where they are formed after replication of the linear molecules (Backert 2002). The second possibility is the formation of circular monomers with a “head-to-tail” junction. This form is the “typical” architecture of animal mtDNA (Boore 1999), but has never been observed in isopods with linear mtDNA. Both concatemers and circular monomers can occur in Armadillidium mtDNA. However, for clarity, although we do not know the form of this alternative molecule, we have represented the alternative architecture in our model by circular monomers (Fig. 7). 
Whatever the form of this alternative architecture, its formation requires the presence of linear molecules with open ends. Thus, we believe that the formation of molecules with “head-to-tail” junctions might occur after the scission of dimers into two linear molecules with open ends. The observation of two different orientations of the IR across A. vulgare populations supports the open ends hypothesis. In our model (Fig. 7), we show how linear molecules with open ends in two different orientations can be generated by scission of the dimers. When these linear molecules convert into the alternative architecture, two “head-to-tail” junction sequences are generated, corresponding to the two types of IR with reversed orientations (A and B, Fig. 7). Interestingly, a reversed orientation of the IR was also observed in yeasts with linear mtDNA (Dinouël et al. 1993). In this case, the authors postulated an equivalent mechanism, called the “flip-flop” inversion mechanism, which allows switching from “head-to-head” dimers to “head-to-tail” dimers via the formation of transient linear molecules with open ends.

The frequency of formation of the alternative architecture with “head-to-tail” junctions, as well as its evolutionary consequences, remain unclear. Our sample size is too small to tell whether multiple reversions of the IR happened during the evolutionary history of A. vulgare mtDNA, and additional sampling would be needed. However, the presence of an individual with the two orientations (heteroplasmy) suggests that this reversion might occur relatively frequently. In previous studies, we observed a correlation between the presence of linear mtDNA molecules and constitutive heteroplasmy in terrestrial isopods (Doublet et al. 2008; Doublet et al. 2012), which might be a reason why the complex mechanism of genome architecture conversions of terrestrial isopod mtDNA is conserved. However, the relationship between heteroplasmy and the presence of different genome architectures remains to be investigated.

Conclusion

The presence and maintenance of different molecular architectures co-occurring in the atypical mtDNA of terrestrial isopods make this genome an interesting model to study genome architecture conversions. Genome architecture conversion appears to be a key mechanism for the replication and evolution of isopod mtDNA, and with the present results we postulate that these conversions might be made possible by a specific IR sequence conserved in all mtDNA control regions sequenced. Similar mechanisms of linearization of circular genomes due to the presence of IRs have been observed in many other organisms with linear mtDNA (plants, yeasts, animals). Such convergent evolution toward linearization shows how flexible organelle genome architectures can be.