Introduction

Water mites of the genus Unionicola (Acariformes: Prostigmata) are common symbionts of freshwater sponges and mollusks. More than half of the known species of Unionicola are symbiotic with freshwater mussels, living on the gills or mantle and foot of their hosts and using these tissues as sites of oviposition (Vidrine 1996b). Some species of mussel-mites depend upon their hosts only for sites for oviposition and post-larval resting stages, while others are obligate symbionts of their hosts throughout post-larval development (Mitchell 1955). The evolution of unionicolid mussel-mites appears to be closely tied to the evolutionary history of their hosts, given that major clades of mussels harbor unique assemblages of Unionicola subgenera (Vidrine 1996a).

Although the identification and classification of Unionicola have been reasonably well documented, the phylogenetic systematics of the group is less well understood. In a recent study (Ernsting et al. 2006), we used heterogeneity in sequence data of the mitochondrial cytochrome oxidase subunit I (cox1) gene to construct a hypothesis of evolutionary relationships among a number of closely related species of mites within the subgenus Unionicola (formerly Parasitatax). However, as this analysis was expanded to include a more diverse assemblage of mites from several additional Unionicola mussel-mite subgenera of North America, we were unable to generate a well-supported phylogeny (D. D. Edwards et al., manuscript submitted). The relatively rapid rate of evolution of the cox1 sequence led us to take two complementary approaches to generating molecular data that could result in better phylogenetic resolution. First, we amplified and sequenced nuclear rRNA sequences from a number of Unionicola mites but found that, in contrast to the cox1 data, these sequences were not sufficiently substituted to be informative at low to intermediate taxonomic levels.

The second approach involves sequencing the complete mitochondrial genome of a representative species of Unionicola. This genome sequence would be expected to be informative on two counts. The complete genome would allow us to choose candidate mitochondrial loci that are likely to be phylogenetically informative for Unionicola species, and the sequence will provide baseline data helpful for designing PCR primers to amplify the chosen loci from other Unionicola species. Perhaps more importantly, the complete genome might be expected to yield idiosyncratic markers: molecular character states that, unlike sequence data, represent unique events, and can be used as synapomorphies in phylogenetic reconstruction (Murrell et al. 2003). As the number of complete mitochondrial genomes increases rapidly, these small genomes are excellent candidates for improving phylogenetic reconstruction in cases where homoplasy and mutational saturation in sequence data can lead to unresolved phylogenies (Boore 2006).

Within the chelicerates, genome structure synapomorphies are relatively rare in some clades, including soft ticks (Shao et al. 2004), but abundant in others, including spiders (Masta and Boore 2008) and the Trombidiformes mites (Van Leeuwen et al. 2006; Shao et al. 2005). Of the six species of Trombidiformes whose completed mitochondrial genomes have been submitted to GenBank, there are five unique gene orders. Additionally, given a correlation between rates of nucleotide substitution and genome rearrangement in animal mitochondria (Xu et al. 2006; Shao et al. 2003), the highly derived cox1 sequence data among Unionicola subgenera suggest that unionicolid mites may contain informative genome rearrangements.

The current work presents the sequence and annotation of the complete mitochondrial genome of Unionicola foili. The highly rearranged genome and non-canonical tRNA structures seen in this species are presented in comparison to other Trombidiformes mites, and are proposed as potentially informative in reconstructing the phylogeny of Unionicola mites.

Materials and methods

Samples and DNA extraction

Unionicola foili individuals used in this study were removed from host mussels (Utterbackia imbecillis) collected from ponds at various locations in Vanderburgh County, Indiana. Specimen processing and DNA extraction were carried out as previously described (Ernsting et al. 2006). Briefly, mites were removed from host mussels, washed, and frozen. Individual mites were ground and total DNA was prepared using a DNeasy Tissue Kit (Qiagen).

PCR amplification

We used a two-stage approach to amplify the mitochondrial genome of U. foili from total genomic DNA. In the first stage, fragments of the genome were amplified using regular PCR with primers designed based on the published sequences of related mites. The amplified fragments were then sequenced, and these sequence data were used to design a second set of primers specific for U. foili. The U. foili-specific primers were then used in long-range PCR and standard PCR reactions that amplified the mitochondrial genome in three overlapping fragments.

For the cob gene, primers COB-F and COB-R (Table 1) were used in 50 μl PCR reactions with Promega PCR Master Mix (Promega). PCR conditions were an initial denaturation at 94°C for 5 min, followed by 30 cycles of denaturation at 94°C for 1 min, annealing at 52°C for 1 min, and extension at 72°C for 2 min, with a 5 min final extension at 72°C. PCR products were visualized on 1% agarose gels and purified using a QIAquick PCR Purification Kit (Qiagen). For the cox1 gene, PCR amplification and DNA sequencing were as described (Ernsting et al. 2006).

Table 1 PCR primers used for the amplification of the Unionicola foili mitochondrial genome

Long PCR was performed using a GeneAmp XL PCR Kit (Applied Biosystems). For the region from cox1 to nad4 (the CN template), primers were CN-C1 and CN-N1 (Table 1), with the CN-C1 primer being based on U. foili cox1 sequence, and the CN-N1 primer a conserved PCR primer originally described by Simon and coworkers (Simon et al. 1994), and subsequently used to amplify a fragment of the mitochondrial genome of Varroa destructor (Navajas et al. 2002). The PCR conditions were an initial 2 min 92°C denaturation, followed by 10 cycles of denaturation at 92°C for 10 s, annealing at 48°C for 30 s, and extension at 68°C for 5 min. The next 20 cycles allowed additional time for extension: denaturation at 92°C for 10 s, annealing at 48°C for 30 s, and extension at 68°C for 5 min plus 20 s per cycle. The PCR included a final extension of 7 min at 68°C. For the region from cox1 to cob (the CB template), primers were CB-C1 and CB-B1 (Table 1), both based on U. foili sequence for these genes, and long PCR conditions were identical to those used for the amplification of the CN template.
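The cycling profile above can be summed to estimate the programmed run time. The following is a minimal sketch, not part of the original protocol: it assumes that the 20 s increment is applied from the first of the final 20 cycles (the text does not say whether the first of those cycles already carries the increment) and it ignores ramp times between temperatures. The function name and its parameter are illustrative.

```python
# Sketch: total programmed time of the long-PCR profile described in the text.
# Assumptions (not stated in the source): ramp times are ignored, and the
# 20 s extension increment starts with the first of the final 20 cycles
# unless increment_from_first_cycle is set to False.

def long_pcr_seconds(increment_from_first_cycle=True):
    initial_denaturation = 2 * 60                      # 2 min at 92 C
    first_block = 10 * (10 + 30 + 5 * 60)              # 10 cycles, fixed 5 min extension
    second_block = 0
    for cycle in range(1, 21):                         # 20 cycles, extension grows by 20 s
        steps = cycle if increment_from_first_cycle else cycle - 1
        second_block += 10 + 30 + 5 * 60 + 20 * steps
    final_extension = 7 * 60                           # 7 min at 68 C
    return initial_denaturation + first_block + second_block + final_extension

total = long_pcr_seconds()
print(f"{total} s (about {total / 60:.0f} min)")
```

Under the stated assumption the program runs for roughly four hours, most of it in the incremented extension steps.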

The final fragment, the BN template, consisted of the region from cob to nad4, and was amplified using standard PCR as described above. Primers were BN-B1 and BN-N1 (Table 1), and PCR conditions were an initial denaturation at 94°C for 5 min, followed by 30 cycles of denaturation at 94°C for 1 min, annealing at 52°C for 1 min, and extension at 72°C for 2 min and 30 s with a 5 min 72°C final extension.

DNA sequencing

Each of the three overlapping fragments was sequenced using a primer-walking strategy where initial sequence reads were carried out with the primers used in the PCR amplification, and subsequent primers were designed based on returned sequence data. PCR products were visualized on 0.8 or 1% agarose gels, and fragment size was estimated using Rf plots. PCR products were excised from the gel, purified using a QIAquick Gel Extraction Kit (Qiagen), and sequenced commercially by SeqWright DNA Technology Services (Houston, TX).

Sequence assembly and annotation

In our primer-walking approach, we emphasized long overlaps between steps such that the average base was sequenced 3.4 times. Sequence assembly was carried out using CodonCode Aligner software (CodonCode), and contigs were exported as FASTA files. Protein and rRNA gene boundaries were determined using the BLAST (Altschul et al. 1990, 1997) and ORF-evaluation tools in the MacVector software suite (MacVector, Inc.). tRNA genes were annotated using tRNAscan-SE (Lowe and Eddy 1997) on sequence regions not previously assigned to protein or rRNA genes. For some tRNA identification, the source parameter on tRNAscan-SE was set to “Nematode Mitochondrion”. tRNAs that were not identified by the tRNAscan software were assigned using a combination of by-eye and Clustal W version 2.0 (Larkin et al. 2007) sequence alignments involving putative U. foili tRNAs and the corresponding tRNA sequences reported for other Trombidiformes mites.
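The coverage figure quoted above is the total number of sequenced bases divided by the genome length. The sketch below shows that arithmetic only. The read lengths are hypothetical placeholders chosen for illustration (the source does not report individual read lengths); the genome length of 14,738 bp is taken from the Results.

```python
# Sketch: mean sequencing depth as total read bases / genome length.
# The read lengths below are hypothetical; only the genome length
# (14,738 bp) comes from the text.

def mean_coverage(read_lengths, genome_length):
    """Average number of times each base was sequenced."""
    return sum(read_lengths) / genome_length

hypothetical_reads = [700] * 72        # 72 invented reads of 700 bases each
depth = mean_coverage(hypothetical_reads, 14738)
print(f"mean coverage: {depth:.1f}x")
```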

Results

General features

The circular mitochondrial genome of U. foili (GenBank accession EU856396) is 14,738 bp in size, 73% A + T, and contains the 37 genes typical for animal mitochondria: 13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes. Twenty-two genes (9 protein-coding and 13 tRNA) are located on the majority (J) strand, with the remaining 4 protein-coding, 2 rRNA and 9 tRNA genes on the minority (N) strand. In addition, the U. foili genome contains two large noncoding regions of 645 and 387 bp, each of which contains regions of predicted secondary structure (Mathews et al. 1999) which could serve as replication origins (Zhang et al. 1995).
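As a quick consistency check on the figures above, the per-strand gene counts can be tallied and the A + T fraction of a sequence computed. This is a minimal sketch: the function `at_content` and the dictionary layout are illustrative, and the short test sequence is a toy string, not part of the U. foili genome.

```python
# Sketch: tally the per-strand gene counts reported in the text and
# compute A+T content for an arbitrary DNA string.

def at_content(seq):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in bases) / len(bases)

# Gene counts per strand as stated in the paragraph above.
strand_counts = {
    "J": {"protein": 9, "tRNA": 13, "rRNA": 0},
    "N": {"protein": 4, "tRNA": 9, "rRNA": 2},
}
totals = {
    kind: sum(counts[kind] for counts in strand_counts.values())
    for kind in ("protein", "tRNA", "rRNA")
}
print(totals, "total genes:", sum(totals.values()))
print(f"A+T of toy sequence: {at_content('ATTAGC'):.2f}")
```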

The U. foili mitochondrial genome is the seventh representative from the order Trombidiformes and shares general characteristics with these relatives (Table 2). First, the A + T content is typical for a Trombidiformes mite. Second, the overall genome size continues a pattern of compact genomes in this group. Finally, both protein-coding genes and tRNA genes are reduced in size when compared to the homologous sequences in the mitochondrial genome of Drosophila yakuba, a well-studied arthropod (Clary and Wolstenholme 1985; Crease 1999; Shao et al. 2005).

Table 2 Comparison of Unionicola foili mitochondrial genome features with those of other Trombidiformes and the fruit fly Drosophila yakuba

Gene arrangement

Like the mitochondrial genomes of other Trombidiformes mites, the U. foili mitochondrial genome (Fig. 1) is highly rearranged when compared to that of the hypothetical ancestor of the arthropods, whose gene order is represented by the horseshoe crab Limulus polyphemus (Boore et al. 1995; Shao et al. 2005). Additionally, the U. foili gene arrangement is highly derived even when compared to other Trombidiformes, continuing a pattern of extensive genome rearrangement among species in this taxon. Specifically, U. foili contains nine gene boundaries that are unique, eight of which involve tRNA translocations. Although the nad4L and nad5 genes are annotated individually in the GenBank accession for this sequence, the ninth unique gene boundary involves an in-frame fusion between these genes. Because our initial annotation did not indicate a stop codon for the nad4L gene, this region was amplified and resequenced independently from six additional individuals, confirming our original annotation.
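One way to count unique gene boundaries is to treat each circular genome as a list of strand-aware adjacent gene pairs and subtract the pairs seen in the comparison genomes. The sketch below illustrates that idea under stated assumptions: it is not necessarily the procedure used for the counts above, the function names are illustrative, and the two five-gene orders are toy data rather than real annotations.

```python
# Sketch: strand-aware gene boundaries in circular gene orders.
# A gene is a (name, strand) tuple with strand +1 or -1. Reading a
# boundary from the opposite strand gives the same boundary, so each
# pair is stored in one canonical orientation.

def boundaries(order):
    result = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]          # wrap around: genome is circular
        forward = (a, b)
        reverse = ((b[0], -b[1]), (a[0], -a[1]))     # same boundary, other strand
        result.add(min(forward, reverse))
    return result

def unique_boundaries(focal, others):
    """Boundaries present in `focal` but in none of the `others`."""
    seen = set().union(*(boundaries(o) for o in others))
    return boundaries(focal) - seen

# Toy gene orders (hypothetical): two tRNA genes, K and D, swap places.
reference = [("cox1", 1), ("cox2", 1), ("K", 1), ("D", 1), ("atp8", 1)]
rearranged = [("cox1", 1), ("cox2", 1), ("D", 1), ("K", 1), ("atp8", 1)]
print(len(unique_boundaries(rearranged, [reference])))
```

In the toy case a single swap of two adjacent tRNA genes creates three new boundaries, which illustrates why a handful of tRNA translocations can account for many unique boundaries.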

Fig. 1

Comparison of the mitochondrial genome structures of Trombidiformes mites and the horseshoe crab Limulus polyphemus. Light gray protein-coding genes, unshaded tRNA genes, dark gray rRNA genes. Circular genome sequences were linearized at the 5′ end of the cox1 gene. Genes transcribed in the same direction as cox1 (left to right) are shown below the line, and genes transcribed in the opposite direction are shown above the line. For protein-coding and rRNA genes, the gene names are shown either in the rectangle or above or below the line. For tRNA genes, gene names are abbreviated with the single-letter abbreviation for the amino acid specified. Serine and Leucine each have two tRNA genes in these mitochondrial genomes, which are distinguished by their anticodons (5′–3′) S1: GCU, S2: UGA, L1: UAG, L2: UAA. Protein-coding and rRNA genes are drawn to approximate scale, but tRNA genes are exaggerated for clarity. Formatting for this figure is adapted from the OGRe database (Jameson et al. 2003)

In addition to the apparent nad4L-nad5 gene fusion, we noted seven pairs of overlapping genes in the U. foili mitochondrial genome. For genes on the same strand, cox1 and cox2 overlap by one base, atp6 and atp8 overlap by seven bases, atp6 and cox3 overlap by one base, and nad6 and cob overlap by one base. Similar overlaps between these gene pairs have been annotated in other Trombidiformes species with completed genomes (Shao et al. 2005, 2006; GenBank accession AB300500). For genes on opposite strands, tRNA-Thr and tRNA-Tyr overlap by one base, tRNA-Gln and tRNA-Ile overlap by one base, and tRNA-Arg and tRNA-Val overlap by three bases. The tRNA-Thr/tRNA-Tyr gene boundary is unique to U. foili, and therefore the overlap has not been annotated in other species. The three-base overlap between tRNA-Arg and tRNA-Val, however, is similar to an annotated overlap in Walchia hayashii, and the tRNA-Gln/tRNA-Ile overlap is similar to an overlap between these genes in L. polyphemus (Lavrov et al. 2000). Since all three pairs of overlapping tRNA genes are on opposite strands, they would be processed from different transcripts, and these overlaps would not be expected to result in truncated tRNAs.
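Overlaps of this kind follow directly from annotated coordinates. The sketch below computes the number of shared positions between two genes given as 1-based inclusive (start, end) pairs. The coordinates are hypothetical, chosen only to reproduce overlaps of seven bases and one base; they are not the GenBank coordinates.

```python
# Sketch: overlap in base pairs between two annotated genes.
# Coordinates are 1-based and inclusive; the example values are invented.

def overlap_bp(a, b):
    """Number of positions shared by intervals a and b (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

hypothetical = {
    "atp8": (100, 250),
    "atp6": (244, 900),    # starts 7 bases before atp8 ends
    "cox3": (900, 1680),   # starts on the last base of atp6
}
print(overlap_bp(hypothetical["atp8"], hypothetical["atp6"]))
print(overlap_bp(hypothetical["atp6"], hypothetical["cox3"]))
```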

Protein-coding genes

Of the thirteen protein-coding genes annotated, twelve begin with one of the common start codons for mitochondrial genes (ATA, ATG, ATT). Only one gene, nad5, is annotated as having a TTG start codon. In addition to nad4L, four other genes (cox2, cox3, nad5, and cob) were found to have incomplete stop codons. In each of these cases, the gene is followed by a tRNA gene, and stop codons for these genes are likely to be generated by cleavage and polyadenylation of the primary transcript (Ojala et al. 1981). Interestingly, although the nad5 stop codon is likely generated by 3′ processing, the tRNA is located on the opposite strand, suggesting that the secondary structure of the strand complementary to that which specifies the functional tRNA can also be recognized by the posttranscriptional processing machinery. This type of processing has also been reported for the mitochondrial nad3 of the spider Ornithoctonus huwena (Qiu et al. 2005).
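The start- and stop-codon checks described above can be expressed as a small classifier. This is a minimal sketch under stated assumptions: it treats TAA and TAG as the complete stop codons (as in the invertebrate mitochondrial code), treats a trailing T or TA as an incomplete stop that polyadenylation could complete to TAA, and assumes the annotated sequence ends exactly at the gene boundary. The function name and the toy sequences are invented for illustration.

```python
# Sketch: classify the start codon and the 3' end of an annotated CDS.
COMMON_STARTS = {"ATA", "ATG", "ATT"}   # start codons named in the text
FULL_STOPS = {"TAA", "TAG"}             # assumed complete stops (invertebrate mt code)

def describe_cds(seq):
    seq = seq.upper()
    start = seq[:3]
    extra = len(seq) % 3                 # bases left over after whole codons
    if extra == 0:
        stop = "complete" if seq[-3:] in FULL_STOPS else "none"
    else:
        # A trailing T or TA can be completed to TAA by polyadenylation.
        stop = "incomplete" if seq[-extra:] in ("T", "TA") else "none"
    return {"start": start, "common_start": start in COMMON_STARTS, "stop": stop}

# Toy sequences, invented for illustration only.
for name, seq in [("toy1", "ATGAAATAA"), ("toy2", "TTGAAAT"), ("toy3", "ATTAAAGGG")]:
    print(name, describe_cds(seq))
```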

rRNA genes

The genes for the small (rrnS) and large (rrnL) ribosomal RNAs were annotated to extend to the boundaries of neighboring genes. This leaves open the possibility that the rRNA genes may be somewhat shorter than the annotated sequence indicates, but if these differences exist, they are likely to be small, as the sizes of both rRNAs are similar to those annotated for the other Trombidiformes mites shown in Fig. 1. rrnS is annotated at 649 bp in U. foili with a range of sizes in other Trombidiformes mites of 601–679 bp, while rrnL is annotated at 1,016 bp with a range in the same group of 992–1,046 bp. In U. foili, as in most mitochondrial genomes of metazoans, the rRNA genes are located near each other on the minority (N) strand. Unusually, these genes are separated by the protein-coding nad1 gene.

Secondary structure of tRNAs

Twenty-two transfer RNA genes were annotated using a combination of tRNAscan-SE, alignments, and by-eye inspection. In addition to being reduced in size (Table 2), U. foili tRNAs share structural features with the tRNAs of other Trombidiformes mites. Figure 2 presents the inferred secondary structures of tRNA molecules from U. foili and two other Trombidiformes mites: Leptotrombidium akamushi (Shao et al. 2006) and the prostigmatid mite Walchia hayashii (Mitani et al. 2008). In general, sequence conservation is high only in the anticodon loop. In base-paired stems, sequence conservation is often moderate to non-existent, although the base-pairing itself is strongly conserved. The exception to this pattern of limited sequence conservation is the D-arm. Most tRNAs with D-arms show moderate to strong sequence conservation for this region. Like the tRNAs of other Trombidiformes species (Shao et al. 2005), the U. foili mitochondrial tRNAs show structures unusual for arthropods. All but seven of the U. foili tRNAs in Fig. 2 lack either a T-arm or a D-arm, and are thus unable to fold into the canonical tRNA cloverleaf structure. For most tRNAs in Fig. 2, the loss of D- or T-arms is conserved: with few exceptions, all three species share a similar predicted secondary structure.

Fig. 2

Inferred secondary structures of tRNAs from Unionicola foili (Uf), Leptotrombidium akamushi (La), and Walchia hayashii (Wh). Acceptor arm, D-arm, anticodon arm, and T-arm are regions involved in base-pairing, with putatively-paired bases underlined. Anticodon bases are bold text. Shaded bases are conserved across these three species

Discussion

Overall, the results of this study suggest that the mitochondrial genome structure of U. foili shares important similarities to those of other Trombidiformes mites. The emerging characteristics of this group include compact, extensively rearranged genomes and tRNA structures that are highly derived when compared to those of other arthropods. Although these characteristics are not universally shared across the arachnids, there are a growing number of examples of rearranged genomes and unusual tRNA structures in non-Trombidiformes Arachnida (Jeyaprakash and Hoy 2007; Fahrein et al. 2007; Masta and Boore 2004; Domes et al. 2008). The wealth of unusual features in these groups has generated intriguing hypotheses on a range of topics including genome recombination (Shao et al. 2005), RNA editing (Masta and Boore 2004), loss of tRNA genes (Domes et al. 2008), and the possibility of non-standard translation of some ORFs (Jeyaprakash and Hoy 2007). In the U. foili mitochondrial genome, the most unusual aspects are the non-canonical structures of most tRNAs and the apparent gene fusion between nad4L and nad5.

Non-canonical mitochondrial tRNA structures like those seen in the U. foili mitochondrial genome are well documented in only a few groups. Loss of the cloverleaf structure has been most extensively described in representatives of the phylum Nematoda (Wolstenholme et al. 1987; Montiel et al. 2006) but is now emerging as common among several groups of arachnids, including spiders and mites (Jeyaprakash and Hoy 2007; Masta and Boore 2004; Shao et al. 2005). Recently, an analysis of tRNA structure across several arachnid groups indicated that the propensity to evolve away from the canonical tRNA structure may have arisen independently in diverse lineages (Masta and Boore 2008), and suggested that as-yet unexplored evolutionary pressures drive this striking propensity.

In the U. foili genome, the apparent gene fusion between nad4L and nad5 is unprecedented among sequenced mitochondrial genomes. There are at least four possible scenarios for the expression of functional protein(s) from the fused genes. First, RNA editing may create a stop codon after the region is transcribed. mRNA editing is widespread in the mitochondria of trypanosomes (Ochsenreiter et al. 2008) and dinoflagellates (Lin et al. 2002), and in the nuclear transcriptomes of arthropods (Stapleton et al. 2006). Mitochondrial tRNA editing has been demonstrated as a mechanism for correcting mismatched tRNA acceptor stems in land snails (Yokobori and Pääbo 1995) and proposed as a solution to similar mismatches in the spider Habronattus oregonensis (Masta 2000). Yokobori and Pääbo (1995) have proposed possible mechanisms for tRNA editing that involve a base-paired region serving as the template for the edited product. Analysis of the region near the potential nad4L-nad5 editing site did not show significant predicted secondary structure, but this does not rule out template-directed editing that might involve sequences outside of this region. Second, a modified genetic code could lead to the usage of a non-standard stop codon for the nad4L ORF. A modified genetic code has been proposed as a possible explanation for unusual gene structure in the related mite Metaseiulus occidentalis (Jeyaprakash and Hoy 2007). Third, it is possible that the proteins are translated as a fusion protein and subsequently processed. Finally, it is also possible that the proteins are expressed as a fusion protein and function without processing, a situation that is possible given the localization of the ND4L and ND5 proteins to the hydrophobic membrane-bound arm of the NADH:ubiquinone oxidoreductase (complex I) of the mitochondrial electron transport chain (Chomyn et al. 1985; Wang et al. 1991). As annotated, the downstream nad5 gene would use TTG as its start codon. TTG is a rare start codon in mitochondrial genomes (Bae et al. 2004), strengthening the likelihood of either a functional fusion protein or editing of the primary transcript.

Previous work in our laboratory has used the DNA sequence of the mitochondrial cox1 gene to examine phylogenetic relationships among Unionicola mites both within and among subgenera. We were able to use the cox1 sequence to reconstruct the phylogeny of closely related species within a subgenus (Ernsting et al. 2006). When we extended this approach to examine the phylogeny of the species across subgenera, we encountered the loss of phylogenetic support associated with applying multiparameter statistical corrections to highly derived sequence data (D. D. Edwards et al., manuscript submitted), underscoring the need for additional loci or other informative characters. Since the rate of molecular evolution is well-correlated with the rate of genome rearrangement in arthropods (Shao et al. 2003; Xu et al. 2006), and given the pattern set by previously-sequenced arachnid genomes, we predicted that the U. foili mitochondrial genome would also be extensively modified. Not only is the U. foili mitochondrial genome highly rearranged compared to other members of the Trombidiformes (Fig. 1), but preliminary data indicate that the mitochondrial genome of Unionicola parkeri (a closely related mantle mite from a different subgenus) has yet another unique gene order that has high affinity for the U. foili arrangement but that also has several unique gene-order characters. The rearrangements and other idiosyncratic markers, including tRNA secondary-structural characters (Murrell et al. 2003) seen in these groups have potential to be used as synapomorphic character states, allowing us to improve the reliability of phylogenetic inference (Boore 2006; Hypsa 2006). Additionally, when combined with DNA sequence data and other characters, genome-level characters can contribute to a total-evidence approach to phylogenetic reconstruction (Shao and Barker 2007; Wortley and Scotland 2006).

The wealth of genome-level characters among Trombidiformes mites suggests the potential for these types of characters to contribute to phylogenetic inference in the group. Although a maximum parsimony analysis based on 19 gene-order characters for the taxa in Fig. 1 did not result in a well-supported phylogeny of Trombidiformes species (data not shown), some features of the genome structures may be phylogenetically informative. In particular, species of Leptotrombidium share what would appear to be genome structure synapomorphies in the nad4L-nad1 region that support their close relationship. Similarly, the genomes of Walchia hayashii and Ascoschoengastia sp. TATW-1 have a unique insertion of the nad1 gene between the rrnS and rrnL genes. U. foili shares this synapomorphy, making it possible that this region will be useful in future phylogenetic reconstruction. Finally, some genome structure characters are similar or identical for most members of the group. For example, the gene order in the nad4-nad6-cob region is very similar in all Trombidiformes except Tetranychus urticae, which on the basis of its genome structure appears to be quite distantly related to both other Trombidiformes and their ancestors (Limulus). As more mitochondrial genome sequences are reported, the utility of genome-level characters for resolving phylogeny will become increasingly important. Additionally, the sequence of the U. foili mitochondrial genome suggests that the cob and rrnS genes may contain regions that are more conserved within the Unionicola than is cox1, raising the possibility that these sequences could contribute to phylogenetic inference. Future studies in our laboratory will be focused on assessing the usefulness of both sequence data and idiosyncratic characters within the Unionicola, with the ultimate goal of using both types of molecular data to generate a phylogeny of the genus.