Introduction

Gasterophilus (Diptera: Gasterophilidae) larvae are common obligate parasites of the digestive tract of equids (Hall and Wall 1995; Wambwa et al. 2004; Anderson 2005). The larvae cause gastrointestinal myiasis in equids and considerable economic loss through swallowing difficulties (throat localization of the immature stages), gastric and intestinal ulcerations, gut obstructions or volvulus, rectal prolapses, anemia, diarrhea, and digestive disorders (Waddell 1972; Dart et al. 1987; Principato 1988; Cogley and Cogley 1999; Sandin et al. 1999; Sequeira et al. 2001). Horse myiasis is widely distributed in China, especially in the northeast, the northwest, and the Inner Mongolia Autonomous Region (Peng et al. 2011).

The metazoan mitochondrial (mt) genome is a circular double-stranded DNA molecule that varies in size from 14 to 20 kb and generally encodes 36–37 genes, including 12–13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes, and 22 transfer RNA (tRNA) genes (Wolstenholme 1992; Boore 1999). Mt genomes have been extensively used as genetic markers in molecular phylogenetic studies because they have several useful properties (i.e., haploidy, compactness, maternal inheritance, relatively high mutation rates, and lack of recombination) (Tao et al. 2014). To date, over 700 complete mt genome sequences of the Insecta are available in GenBank. Diptera is one of the most extensively sequenced orders among the Insecta, with 97 complete or near-complete mt genome sequences available in GenBank (as of September 2015), including 66 Brachycera and 31 Nematocera species representing 16 families. Despite the availability of advanced DNA technologies, however, no mt genome is available so far for any member of the family Gasterophilidae, and Gasterophilidae flies are missing from many phylogenetic analyses.

In the present study, we sequenced the complete mt genome of Gasterophilus intestinalis of the family Gasterophilidae. We then inferred phylogenetic relationships from the concatenated mt amino acid sequences of G. intestinalis and 31 other Oestroidea species that have been sequenced to date.

Materials and methods

Parasites and DNA extraction

Third-stage larvae (L3) representing G. intestinalis were collected from the stomach walls of horses at abattoirs in Heilongjiang province (HLJ) and Xinjiang Uygur Autonomous Region (XJ), China. Individual fresh larvae were washed in physiological saline, identified preliminarily to species based on morphological characters and predilection sites (Roelfstra et al. 2010), and then fixed in 70 % ethanol and stored at −20 °C. Total genomic DNA was extracted from individual larvae using sodium dodecyl sulfate/proteinase K treatment, followed by spin-column purification (Wizard® SV Genomic DNA Purification System, Promega). The identity of these larvae as G. intestinalis was further verified by PCR amplification and subsequent sequencing of the cox1 gene as reported previously (Pawlas-Opiela et al. 2010).

PCR amplification and sequencing

The whole mt genome was amplified by polymerase chain reaction (PCR) using nine primer pairs (Table 1) designed from sequences that are well conserved among distantly related taxa (e.g., Cochliomyia hominivorax, Dermatobia hominis, and Hypoderma lineatum; GenBank accession numbers AF260826, NC_006378, and NC_013932, respectively). PCR reactions (25 μL) were performed using 0.5 μL of each primer (20 pmol/μL), 2.5 μL of Ex Taq buffer (100 mM Tris–HCl and 500 mM KCl), 2 μL of dNTP mixture (2.5 mM each), 0.5 μL of Ex Taq DNA polymerase (5 U/μL; TaKaRa Biotechnology, Dalian, China), and 1 μL of DNA sample in a thermocycler (Biometra, Göttingen, Germany). The cycling conditions were as follows: 94 °C for 5 min (initial denaturation); followed by 35 cycles of 94 °C for 30 s (denaturation), 40–58 °C for 30 s (annealing), and 72 °C for 1–4 min (extension) according to the product length; with a final extension step at 72 °C for 10 min. Each PCR reaction yielded a single band as detected in a 1 % (w/v) agarose gel upon ethidium bromide staining (not shown). PCR products were subsequently sent to Invitrogen Biotechnology Company (Shanghai, China) for sequencing using a primer-walking strategy.
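The final concentrations implied by this reaction mix can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the remaining volume is made up with water (not stated in the text) and treats 20 pmol/μL as equivalent to 20 μM.

```python
# Worked arithmetic for the 25 uL PCR mix described in the text.
# Assumption: the balance of the reaction volume is water (not stated in the source).

TOTAL_UL = 25.0

# Component volumes (uL) as listed in the methods.
volumes_ul = {
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "Ex Taq buffer": 2.5,
    "dNTP mixture": 2.0,
    "Ex Taq polymerase": 0.5,
    "template DNA": 1.0,
}


def final_conc(stock, volume_ul, total_ul=TOTAL_UL):
    """Return the concentration of a stock after dilution into the reaction."""
    return stock * volume_ul / total_ul


water_ul = TOTAL_UL - sum(volumes_ul.values())
primer_um = final_conc(20.0, volumes_ul["forward primer"])  # 20 pmol/uL == 20 uM
dntp_mm = final_conc(2.5, volumes_ul["dNTP mixture"])       # each dNTP
tris_mm = final_conc(100.0, volumes_ul["Ex Taq buffer"])
kcl_mm = final_conc(500.0, volumes_ul["Ex Taq buffer"])
taq_units = 5.0 * volumes_ul["Ex Taq polymerase"]           # 5 U/uL stock

print(f"water (assumed): {water_ul:.1f} uL")
print(f"each primer:     {primer_um:.2f} uM")
print(f"each dNTP:       {dntp_mm:.2f} mM")
print(f"Tris-HCl / KCl:  {tris_mm:.0f} mM / {kcl_mm:.0f} mM")
print(f"Ex Taq:          {taq_units:.1f} U per reaction")
```

Under these assumptions, the stated buffer strength corresponds to a 10× stock diluted to 1× in the reaction.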

Table 1 Sequences of primers used to amplify PCR fragments from Gasterophilus intestinalis

Sequence analyses

Sequences were assembled manually and aligned against the complete mt genome sequence of H. lineatum (Weigl et al. 2010) using MAFFT 7.122 (Katoh and Standley 2013) to identify gene boundaries. Each gene was translated into its amino acid sequence using the invertebrate mt genetic code in MEGA 5 (Tamura et al. 2011) and aligned based on its amino acid sequence using default settings. The translation initiation and termination codons were identified so as to avoid gene overlap and to optimize the similarity between the gene lengths of closely related mt genomes. The program ARWEN (Laslett and Canback 2008) was used to detect tRNA genes and infer their secondary structures (not shown). The two rRNA genes were predicted by comparison with previously reported sequences (Otranto et al. 2005). In addition, the MegAlign procedure within DNAStar 5.0 (Burland 2000) was used to analyze the sequence similarity of G. intestinalis from the two regions in China.
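To illustrate what translation under the invertebrate mt genetic code involves, the following minimal sketch builds the codon table from NCBI translation table 5 and translates a toy sequence. It is not the MEGA 5 procedure used here; the function name and the toy sequence are hypothetical.

```python
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), with codons enumerated
# in TCAG order. Compared with the standard code: ATA = Met, TGA = Trp,
# AGA/AGG = Ser.
BASES = "TCAG"
INVERT_MT_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), INVERT_MT_AAS)
}


def translate(cds):
    """Translate a coding sequence codon by codon.

    Trailing bases that do not form a full codon (e.g. an abbreviated
    T or TA stop codon) are ignored; unknown codons are reported as 'X'.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Hypothetical toy sequence, chosen to hit the codons that differ
# from the standard genetic code.
toy_cds = "ATAGGATGAAGATAA"
print(translate(toy_cds))  # ATA GGA TGA AGA TAA
```

Ignoring an incomplete final codon is a deliberate simplification, since abbreviated stop codons are completed only after transcription.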

Phylogenetic analyses

For comparative purposes, amino acid sequences predicted from the published mt genomes of 31 Oestroidea species were also included in the present analysis, with Musca domestica (GenBank accession number NC_024855) as the outgroup. The 12 amino acid sequences (without atp8) were aligned individually using MAFFT 7.122 and then concatenated, and ambiguously aligned regions were excluded using the Gblocks online server (http://molevol.cmima.csic.es/castresana/Gblocks_server.html) with the default parameters (Talavera and Castresana 2007) and the options for a less stringent selection.
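The concatenation step can be pictured as joining per-gene alignments taxon by taxon. The sketch below is a generic illustration and not the tool chain used in this study; the function name, the gap-padding of missing taxa, and the toy alignments are assumptions.

```python
def concatenate(alignments, gene_order):
    """Join per-gene alignments (taxon -> aligned sequence) into one matrix.

    A taxon missing from a gene is padded with gaps so that every row of
    the concatenated matrix has the same length.
    """
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences differ in aligned length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two genes and two taxa.
toy_alignments = {
    "cox1": {"taxonA": "MSR-W", "taxonB": "MSRQW"},
    "cytb": {"taxonA": "MNKP", "taxonB": "MN-P"},
}
matrix = concatenate(toy_alignments, ["cox1", "cytb"])
for taxon, row in matrix.items():
    print(taxon, row)
```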

Phylogenetic analyses were conducted using Bayesian inference (BI) and maximum likelihood (ML) methods. The MtArt + I + G + F model of amino acid evolution was selected as the most suitable model by ProtTest 2.4 (Abascal et al. 2005) based on the Akaike information criterion (AIC). As the MtArt model is not implemented in the current version of MrBayes, an alternative model, MtREV, was used in BI, and four chains (three heated and one cold) were run simultaneously for the Markov chain Monte Carlo (MCMC) analysis. Two independent runs were performed for 1,000,000 metropolis-coupled MCMC generations, sampling a tree every 100 generations in MrBayes 3.1.1 (Ronquist and Huelsenbeck 2003); the first 2500 trees were discarded as burn-in and the remaining trees were used to calculate Bayesian posterior probabilities (Bpp). The analysis was run until the potential scale reduction factor approached 1 and the average standard deviation of split frequencies was less than 0.01. ML analysis was performed with PhyML 3.0 (Guindon and Gascuel 2003) using the subtree pruning and regrafting (SPR) method with a BioNJ starting tree and the MtArt model of amino acid substitution, with the proportion of invariant sites (I) and gamma distribution (G) parameters estimated from the data and four discretized substitution rate classes, the middle of each class being estimated using the median. Phylograms were drawn using the program FigTree v.1.4.
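The sampling scheme of the Bayesian analysis translates into tree counts as follows. This is simple bookkeeping from the numbers given in the text; it assumes the burn-in applies to each run separately, and the tree file may hold one extra tree if the starting state is also written.

```python
# MCMC settings as stated in the text.
generations = 1_000_000
sample_every = 100
burnin_trees = 2500
runs = 2

# Trees sampled per run (ignoring a possible extra tree for the starting state).
sampled_per_run = generations // sample_every

# Assumption: the 2500-tree burn-in is discarded from each run separately.
kept_per_run = sampled_per_run - burnin_trees
burnin_fraction = burnin_trees / sampled_per_run
kept_total = runs * kept_per_run

print(f"trees sampled per run: {sampled_per_run}")
print(f"burn-in fraction:      {burnin_fraction:.0%}")
print(f"trees kept per run:    {kept_per_run}")
print(f"trees kept in total:   {kept_total}")
```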

Results and discussion

The whole mt genome of G. intestinalis is a circular molecule of 15,687 bp (HLJ) and 15,660 bp (XJ) in length (Fig. 1), which shows the same gene order and orientation as those of other Oestroidea species (i.e., C. hominivorax, H. lineatum, and D. hominis). The complete mtDNA sequences of G. intestinalis generated in this work have the GenBank accession numbers KU236026 (HLJ) and KU236025 (XJ). When compared with other insect genera for which complete mt genomes are known [e.g., Drosophila spp. (Diptera: Drosophilidae) and Ceratitis spp. (Diptera: Tephritidae)], the gene content and general organization pattern correspond to typical Brachycera mtDNA (Clary and Wolstenholme 1985; Lewis et al. 1995; Spanos et al. 2000). As in other insects, this circular mt genome contains 13 PCGs (cox1-3, nad1-6, nad4L, cytb, atp6, and atp8), 22 tRNA genes, two rRNA genes, and one D-loop region (Table 2). The genes are transcribed in two different directions. Except for four PCGs (nad5, nad4, nad4L, and nad1) and eight tRNA genes (trnQ, trnC, trnY, trnF, trnH, trnP, trnL2, and trnV), which are encoded on the minority strand (N-strand), all other genes are encoded on the majority strand (J-strand) (Fig. 1).

Fig. 1

Arrangement of the mitochondrial genome of G. intestinalis. Gene scaling is only approximate. All genes have standard nomenclature including the 22 tRNA genes, which are designated by the one-letter code for the corresponding amino acid, with numerals differentiating each of the two leucine- and serine-specifying tRNAs (L1 and L2 for codon families CUN and UUR, respectively; S1 and S2 for codon families UCN and AGN, respectively)

Table 2 Mitochondrial genome organization of Gasterophilus intestinalis Heilongjiang isolate (HLJ) and Xinjiang isolate (XJ) in China

The PCGs of G. intestinalis rank by length in the following order: nad5 > cox1 > nad4 > cytb > nad2 > nad1 > cox3 > cox2 > atp6 > nad6 > nad3 > nad4L > atp8 (Table 2). A total of 3714 amino acids are encoded in the mt genome of G. intestinalis. In this mt genome, five genes (nad2, atp8, nad3, nad5, and cytb) use ATT, six genes (cox2, cox3, atp6, nad4, nad4L, and nad1) use ATG, and one gene (nad6) uses ATA as the start codon (Table 2). For cox1, the start codon was identified as TCG (Table 2), which differs from the standard invertebrate mitochondrial code. This feature has been reported in cox1 gene sequences of other species belonging to the families Calliphoridae (Sperling et al. 1994; Wells and Sperling 1999), Tephritidae (Spanos et al. 2000), and Culicidae (Beard et al. 1993; Mitchell et al. 1993). Regarding termination codons, seven genes (cox1, atp8, cox3, nad3, nad4L, atp6, and cytb) use TAA and one gene (nad1) uses TAG, whereas the remaining five genes (cox2, nad5, nad2, nad6, and nad4) use an abbreviated stop codon, T or TA (Table 2). The rrnL gene of G. intestinalis is located between trnL2 and trnV, and rrnS is located between trnV and the D-loop. The rrnS gene is 788 bp (HLJ) and 785 bp (XJ) in length, and the rrnL gene is 1321 bp (HLJ) and 1322 bp (XJ) (Table 2). A total of 22 tRNA sequences were identified in the G. intestinalis mt genome, ranging from 63 to 72 bp (Table 2). The D-loop is 898 bp (HLJ) and 875 bp (XJ) in size, with an A + T content of 81.7 % (HLJ) and 80.8 % (XJ).
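The codon usage reported above can be tallied to confirm that the start and stop codon lists each account for all 13 PCGs. The sketch below uses only the gene lists given in the text; the A + T helper is a generic illustration applied to a hypothetical toy sequence, not to the actual D-loop.

```python
from collections import Counter

# Start and stop codons grouped as reported in the text.
START = {
    "ATT": ["nad2", "atp8", "nad3", "nad5", "cytb"],
    "ATG": ["cox2", "cox3", "atp6", "nad4", "nad4L", "nad1"],
    "ATA": ["nad6"],
    "TCG": ["cox1"],
}
STOP = {
    "TAA": ["cox1", "atp8", "cox3", "nad3", "nad4L", "atp6", "cytb"],
    "TAG": ["nad1"],
    "T/TA (abbreviated)": ["cox2", "nad5", "nad2", "nad6", "nad4"],
}


def tally(groups):
    """Count how many genes use each codon."""
    return Counter({codon: len(genes) for codon, genes in groups.items()})


def genes_in(groups):
    """Return the sorted list of all genes named in a grouping."""
    return sorted(gene for genes in groups.values() for gene in genes)


def at_content(seq):
    """Return the A + T content of a nucleotide sequence, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


start_counts = tally(START)
stop_counts = tally(STOP)

# Both groupings should cover the same 13 protein-coding genes.
same_genes = genes_in(START) == genes_in(STOP)
print(dict(start_counts), sum(start_counts.values()))
print(dict(stop_counts), sum(stop_counts.values()))
print("same gene set:", same_genes)

# Hypothetical toy sequence, for illustration only.
print(at_content("ATTAGCATTA"))
```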

Many studies have demonstrated that mtDNA sequences are valuable genetic markers for phylogenetic studies of different groups of parasites, including insects (Cameron et al. 2006; Liu et al. 2014, 2015a, b, 2016; Jabbar et al. 2014; Guo 2015; Li et al. 2015; Cheng et al. 2016). Of the 31 Oestroidea species included in the phylogenetic analyses in this study, two species belonged to the Oestridae, one species belonged to the Gasterophilidae, three belonged to the Tachinidae, eight species belonged to the Sarcophagidae, and 16 species belonged to the Calliphoridae. The results of the present study indicated that the families Gasterophilidae and Oestridae were more closely related to each other than to the Tachinidae (Figs. 2 and 3). These results were consistent with previously proposed classification schemes within the Oestroidea (Kutty et al. 2010). The monophyly of the Calliphoridae was strongly supported, with a posterior probability (PP) of 1 in the Bayesian analysis (Fig. 2) and a bootstrapping frequency (Bf) of 99 % in the ML analysis (Fig. 3). The monophyly of the Sarcophagidae was strongly supported in both BI and ML analyses (PP = 1; Bf = 100 %, Figs. 2 and 3). The monophyly of the Tachinidae was weakly supported in the ML analysis (PP = 1; Bf = 40 %, Fig. 3), and the family was paraphyletic in BI with weak support (PP = 0.55; Bf = 67 %, Fig. 2). The monophyly of the Oestridae was rejected in both BI and ML analyses (PP > 0.55; Bf = 67 %, Figs. 2 and 3), consistent with some previous studies (Marinho et al. 2012; Zhao et al. 2013).

Fig. 2

Phylogenetic relationships among 32 species of Oestroidea inferred by Bayesian inference (BI) of deduced amino acid sequences of 12 mitochondrial proteins. M. domestica (GenBank accession number NC_024855) was used as the outgroup

Fig. 3

Phylogenetic relationships among 32 species of Oestroidea inferred by maximum likelihood (ML) of deduced amino acid sequences of 12 mitochondrial proteins. M. domestica (GenBank accession number NC_024855) was used as the outgroup

Although our results showed that the Oestridae and Gasterophilidae are sister groups, only two species of the Oestridae were included in the present study. Expanding taxon sampling from these lineages is therefore clearly the next step for phylogenetic studies of flies using mtDNA. In addition, the phylogenetic analysis in the present study was based only on mtDNA sequences, so nuclear genomic sequences will still be needed in further studies to provide additional evidence on the phylogeny and genome evolution of these flies.

In conclusion, the present study determined the complete mt genome sequences of G. intestinalis, which represent the first mt genome of any member of the family Gasterophilidae. These data provide novel mtDNA markers for studying the molecular epidemiology and population genetics of G. intestinalis and its congeners.