Introduction

Hymenolepis nana is the most common tapeworm of humans, particularly in young children in developing countries (Willcocks et al. 2015). It is often referred to as the “dwarf tapeworm” because of its small size (about 2–4 cm long and only 1 mm wide). Human hymenolepiasis, caused by H. nana and H. diminuta, is a globally widespread zoonosis that is endemic in Asia, Southern and Eastern Europe, Central and South America, and Africa (Thompson 2015). Although H. nana causes many clinical symptoms, such as headaches, weakness, anorexia, abdominal pain, and diarrhea, it has been seriously neglected (Sirivichayakul et al. 2000). The life cycle of H. nana may be either direct or indirect. Direct human-to-human transmission is the most common route of infection with H. nana, particularly where hygiene is poor and sanitation is inadequate (Willcocks et al. 2015). Fortunately, hymenolepiasis can be treated effectively using drugs such as praziquantel and albendazole.

Metazoan mitochondrial (mt) genomes, ranging in size from 14 to 18 kb, are typically circular and usually encode 36–37 genes, including 12–13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA) genes, and 22 transfer RNA (tRNA) genes (Wolstenholme 1992). There are no introns within genes and only limited spacer regions between genes. Mt genomes have been extensively used as genetic markers in molecular phylogenetic studies because of their several useful properties (i.e., haploidy, compactness, maternal inheritance, relatively high mutation rates, and the lack of recombination) (Boore 1999; Tao et al. 2014). To date, over 4000 complete mt genome sequences of metazoans are available in GenBank, including some from parasites such as tapeworms (Wolstenholme 1992; Boore 1999; Tao et al. 2014; Liu et al. 2013, 2014a, b; Knapp et al. 2011). The class Cestoda currently comprises more than 5000 described species, and many tapeworm species threaten the health of animals and humans on a global scale (Macnish et al. 2002; Mbaya et al. 2014). In spite of the availability of advanced DNA technologies, however, there is still a major gap in our knowledge of the mt genomes of hymenolepidid tapeworms. The hymenolepidid tapeworms have been assigned to many genera based on their morphological traits (Widmer et al. 2013; Makarikov et al. 2015). Pseudanoplocephala crawfordi was first described from the small intestine of a wild pig and assigned to the genus Pseudanoplocephala based on its morphological features (sizes, acetabulum numbers, testicular quantities, and locations) (Jiang et al. 1990). However, recent studies using ribosomal DNA (rDNA) and mitochondrial DNA (mtDNA) sequences proposed that P. crawfordi may represent a member of the genus Hymenolepis (Jia et al. 2014; Zhao et al. 2015). Molecular phylogenetic studies of hymenolepidid tapeworms nevertheless remain scarce (Okamoto et al. 1997), and limited information is available about the mt genomes of hymenolepidid tapeworms (Von Nickisch-Rosenegk et al. 2001; Zhao et al. 2015).

In the present study, we sequenced the complete mt genome of the dwarf tapeworm H. nana (Cyclophyllidea: Hymenolepididae), which is a neglected zoonotic helminth. We also inferred phylogenetic relationships using the concatenated mt amino acid sequences of H. nana and 40 other tapeworm species that have been sequenced to date.

Materials and methods

Parasites and DNA extraction

H. nana samples were collected from a naturally infected Kunming mouse in Lanzhou, Gansu Province, China. The adult tapeworms were isolated from the small intestine. The presence of tapeworms was detected by light microscopic examination, and H. nana was identified by morphological criteria and site of predilection (Jia et al. 2010); the specimens were then fixed in 70 % alcohol and stored at −20 °C until use. Total genomic DNA was isolated from these specimens using sodium dodecyl sulfate/proteinase K treatment, followed by spin-column purification (Wizard SV Genomic DNA Purification System, Promega), in which the DNA was extracted by shaking with 250 μL Wizard SV Lysis Buffer. After centrifugation, the DNA was washed, eluted, and stored.

PCR amplification and sequencing

Five pairs of PCR primers were used to amplify overlapping segments of the complete mt genome of H. nana (Table 1). Briefly, the nad1 segment was amplified using the primers JB11/JB12 (Bowles and McManus 1993). The other segments were amplified using primers (Table 1) designed from sequences well conserved in H. diminuta (Von Nickisch-Rosenegk et al. 2001). All PCR reactions (25 μL) contained 4.0 mM dNTP mixture, 3.0 μL 10 × LA Taq buffer (Mg2+ Plus), 10 pmol of each primer, 1 U LA Taq polymerase (Takara), and 1 μL of DNA sample, and were run in a thermocycler (Eppendorf, Germany) under the following conditions: 94 °C for 2 min (initial denaturation); 5 cycles of 92 °C for 30 s (denaturation), 48–55 °C for 30 s (annealing), and 68 °C for 1–6.5 min (extension); 92 °C for 2 min (denaturation); 30 cycles of 92 °C for 30 s (denaturation), 48–55 °C for 30 s (annealing), and 68 °C for 1–6.5 min (extension), with the extension time set according to the product length; and a final extension step at 68 °C for 10 min. Each PCR reaction yielded a single band as detected in a 1 % (w/v) agarose gel upon ethidium bromide staining (not shown). PCR products were subsequently sent to Invitrogen Biotechnology Company (Shanghai, China) for sequencing using a primer-walking strategy.

Table 1 Sequences of primers used to amplify PCR fragments from Hymenolepis nana

Sequence analyses

Sequences were assembled manually and aligned against the available complete mt genome sequence of H. diminuta using the computer program MAFFT 7.122 (Katoh and Standley 2013) to identify gene boundaries. Each gene was translated into an amino acid sequence using the invertebrate mitochondrial genetic code in MEGA 5 (Tamura et al. 2011) and aligned based on its amino acid sequence using default settings. The translation initiation and termination codons were identified so as to avoid gene overlap and to optimize the similarity of gene lengths to those of the H. diminuta mt genome. The tRNA genes were identified using the program tRNAscan-SE (Lowe and Eddy 1997) or by recognizing potential secondary structures and anticodon sequences by eye, and the two rRNA genes were predicted by comparison with those of H. diminuta (Von Nickisch-Rosenegk et al. 2001).
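The conceptual translation step can be illustrated with a minimal Python sketch. It is not the MEGA 5 workflow used in this study; it only assumes that "invertebrate mitochondrial genetic code" corresponds to NCBI translation table 5, and the function name translate_invertebrate_mt and the toy sequence are hypothetical.

```python
from itertools import product

# Assumption: "invertebrate mitochondrial genetic code" = NCBI translation table 5.
# Amino acids are listed for all 64 codons with bases ordered T, C, A, G
# (third codon position varying fastest).
BASES = "TCAG"
TABLE5_AAS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSSS"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), TABLE5_AAS)
}


def translate_invertebrate_mt(cds: str) -> str:
    """Translate a coding sequence codon by codon (hypothetical helper).

    Trailing bases that do not fill a codon are ignored; codons containing
    ambiguous bases are reported as 'X'; stop codons are reported as '*'.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy sequence (not from H. nana): ATG ATA TGA AGA TAA
print(translate_invertebrate_mt("ATGATATGAAGATAA"))  # prints MMWS*
```

Under the standard code the same toy sequence would read M, I, stop, R, stop, which is why the choice of genetic code matters when delimiting mt gene boundaries. The sketch translates every codon literally, so an alternative start codon such as GTG appears as V rather than M.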

Phylogenetic analyses

For comparative purposes, amino acid sequences predicted from published mt genomes of 40 species were also included in the present analysis, using the trematode Dicrocoelium dendriticum (GenBank accession number JQ966973) as an outgroup (Liu et al. 2014b). The 12 amino acid sequences were aligned individually using MAFFT 7.122 and then concatenated, and ambiguously aligned regions were excluded using the Gblocks online server (http://molevol.cmima.csic.es/castresana/Gblocks_server.html) with the default parameters (Talavera and Castresana 2007) and the options for a less stringent selection. Phylogenetic analyses were conducted using three methods: Bayesian inference (BI), maximum likelihood (ML), and maximum parsimony (MP). The JTT + I + G + F model of amino acid evolution was selected as the most suitable model of evolution by ProtTest 2.4 (Abascal et al. 2005) based on the Akaike information criterion (AIC). As the JTT model is not implemented in the current version of MrBayes, an alternative model, MtREV, was used in BI, and four Markov chain Monte Carlo (MCMC) chains were run simultaneously. Two independent runs were made for 1,000,000 Metropolis-coupled MCMC generations, sampling a tree every 100 generations in MrBayes 3.1.1 (Ronquist and Huelsenbeck 2003); the first 2500 trees represented burn-in, and the remaining trees were used to calculate Bayesian posterior probabilities (Bpp). The analysis was run until the potential scale reduction factor approached 1 and the average standard deviation of split frequencies was less than 0.01. ML analysis was performed with PhyML 3.0 (Guindon and Gascuel 2003) using the subtree pruning and regrafting (SPR) method with a BioNJ starting tree, and the MtArt model of amino acid substitution with proportion of invariant sites (I) and gamma distribution (G) parameters estimated from the data with four discretized substitution rate classes, the middle of which was estimated using the median. Bootstrap frequency (Bf) was calculated using 100 bootstrap replicates. MP analysis was conducted using the PAUP 4.0 Beta 10 program (Swofford 2002), with indels treated as missing character states; 1000 random addition searches were performed using TBR. Bf was calculated using 1000 bootstrap replicates and 100 random taxon additions in PAUP. Phylograms were drawn using the program FigTree v.1.4 (http://tree.bio.ed.ac.uk/software/figtree).
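The Bayesian sampling scheme described above implies a simple piece of arithmetic that can be checked with a short Python sketch. The helper name mcmc_sampling_summary is hypothetical, the figures apply to each independent run, and the sketch ignores the starting tree that MCMC programs typically record at generation 0.

```python
def mcmc_sampling_summary(generations: int, sample_every: int, burnin_trees: int):
    """Return (trees sampled, trees retained, burn-in fraction) for one run.

    Hypothetical helper; the starting tree at generation 0 is not counted.
    """
    sampled = generations // sample_every
    if not 0 <= burnin_trees <= sampled:
        raise ValueError("burn-in must lie between 0 and the number of sampled trees")
    retained = sampled - burnin_trees
    return sampled, retained, burnin_trees / sampled


# Values taken from the text: 1,000,000 generations, one tree every 100
# generations, first 2500 trees discarded as burn-in.
sampled, retained, fraction = mcmc_sampling_summary(1_000_000, 100, 2500)
print(sampled, retained, fraction)  # prints 10000 7500 0.25
```

In other words, the stated settings correspond to discarding the first quarter of the sampled trees in each run and summarizing the remaining 7500.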

Results and discussion

General features of the mt genome of H. nana

The complete mt genome of H. nana was 13,764 bp in length (Fig. 1). This size is within the range of other tapeworm mt genomes (Table 2). The sequence has been deposited in GenBank under the accession number KT951722. This circular mt genome contains 12 PCGs (cox1-3, nad1-6, nad4L, cytb, and atp6), 22 tRNA genes, two rRNA genes, and two noncoding regions (Table 2). All genes are transcribed in the same direction (Fig. 1). The nucleotide composition of the complete mt genome of H. nana is A = 26.96 %, T = 46.01 %, G = 18.19 %, and C = 8.83 %, giving a typically high A + T content of 72.97 %, comparable to the values found in H. diminuta (71.04 %) and P. crawfordi (69.70 %) (Table 2).
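As an illustration of how such composition figures are derived, the following Python sketch computes base percentages and A + T content from a nucleotide string. The function names and the toy sequence are hypothetical; the sketch is not the software used in this study, and it simply skips ambiguous bases.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among unambiguous bases (hypothetical helper)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """A + T content as a percentage of unambiguous bases."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Toy sequence (not from H. nana): 3 A, 5 T, 1 G, 1 C
print(at_content("ATTTGATTCA"))  # prints 80.0

# Consistency check on the reported H. nana percentages: A + T
print(round(26.96 + 46.01, 2))  # prints 72.97
```

Applied to the deposited sequence, the same calculation would be expected to reproduce the reported A + T content, provided that ambiguous bases are treated in the same way.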

Fig. 1

Arrangement of the mitochondrial genome of Hymenolepis nana. Gene scaling is only approximate. All genes have standard nomenclature including the 22 tRNA genes, which are designated by the one-letter code for the corresponding amino acid, with numerals differentiating each of the two leucine- and serine-specifying tRNAs (L1 and L2 for codon families CUN and UUR, respectively; S1 and S2 for codon families AGN and UCN, respectively)

Table 2 Organization of Hymenolepis nana mitochondrial genome

Annotation

The lengths of the PCGs of H. nana were in the following order: cox1 > nad5 > nad4 > cytb > nad1 > nad2 > cox3 > cox2 > atp6 > nad6 > nad3 > nad4L (Table 2). A total of 3337 amino acids are encoded in the mt genome of H. nana. In this mt genome, 11 PCGs (cox1, nad3, nad6, cox2, cox3, atp6, nad4, nad1, nad5, nad2, and cytb) use ATG as the start codon, whereas nad4L uses GTG (Table 2). All PCGs have complete termination codons: half (cox3, nad2, nad4, nad5, nad4L, and cytb) use TAA, and the other half (atp6, cox1, cox2, nad1, nad3, and nad6) use TAG (Table 2). A total of 22 tRNA sequences (ranging from 60 to 75 nucleotides in length) were identified in the H. nana mt genome. Their predicted secondary structures (not shown) are similar to those of H. diminuta (Liu et al. 2011). The rrnS of H. nana is located between tRNA-Cys and cox2, and rrnL is located between tRNA-Thr and tRNA-Cys. The rrnS gene is 710 bp long and the rrnL gene is 967 bp long (Table 2). The A + T contents of rrnS and rrnL are 71.41 and 71.66 %, respectively. The longer non-coding region (NC2) is located between nad5 and tRNA-Gly, and the shorter one (NC1) is located between tRNA-Tyr and tRNA-Ser. Their sizes are 268 bp (NC2) and 182 bp (NC1) (Table 2). The A + T contents of NC2 and NC1 are 79.10 and 83.52 %, respectively (Table 2).
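The start and stop codon usage listed above can be tallied with a short Python sketch. The dictionaries simply transcribe the gene lists given in the text (Table 2 itself is not reproduced here), and the variable names are hypothetical.

```python
from collections import Counter

# Start and stop codons per protein-coding gene, transcribed from the text above.
START_CODONS = {
    "cox1": "ATG", "nad3": "ATG", "nad6": "ATG", "cox2": "ATG",
    "cox3": "ATG", "atp6": "ATG", "nad4": "ATG", "nad1": "ATG",
    "nad5": "ATG", "nad2": "ATG", "cytb": "ATG", "nad4L": "GTG",
}
STOP_CODONS = {
    "cox3": "TAA", "nad2": "TAA", "nad4": "TAA",
    "nad5": "TAA", "nad4L": "TAA", "cytb": "TAA",
    "atp6": "TAG", "cox1": "TAG", "cox2": "TAG",
    "nad1": "TAG", "nad3": "TAG", "nad6": "TAG",
}

# Both lists should cover the same 12 protein-coding genes.
assert set(START_CODONS) == set(STOP_CODONS)

start_usage = Counter(START_CODONS.values())
stop_usage = Counter(STOP_CODONS.values())
print(dict(start_usage))  # prints {'ATG': 11, 'GTG': 1}
print(dict(stop_usage))   # prints {'TAA': 6, 'TAG': 6}
```

The tally confirms that the two lists are mutually consistent: 12 genes in total, 11 ATG starts and one GTG start, and an even split between TAA and TAG stops.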

Comparative analyses among H. nana, H. diminuta, and Pseudanoplocephala crawfordi

The mt genome sequence of H. nana was 13,764 bp in length, 136 bp shorter than that of H. diminuta and 428 bp shorter than that of P. crawfordi. The arrangement of the mt genes (i.e., 12 protein genes, 2 rrn genes, and 22 tRNA genes) and NCRs was the same in the three taxa. A pairwise comparison of the nucleotide sequences of each mt gene and the amino acid sequences conceptually translated from individual protein genes was made among the three taxa of tapeworms (Table 3). The sequence lengths of individual genes varied among these taxa, except for the atp6, nad3, nad4L, nad5, and nad6 genes, which were the same (Table 3). The magnitude of sequence variation in each gene among the three taxa of tapeworms ranged from 12.9 to 32.6 % for nucleotide sequences and from 16.2 to 39.7 % for amino acid sequences (Table 3). The sequence difference across the entire mt genome was 42.9 % between H. nana and H. diminuta, 43.4 % between H. nana and P. crawfordi, and 21.0 % between H. diminuta and P. crawfordi. The greatest variation among the three taxa of tapeworms was in the nad5 gene (25.9–31.9 %), whereas the smallest differences (12.9–16.7 %) were detected in the rrnS gene (Table 3).

Table 3 Nucleotide and/or predicted amino acid (aa) sequence differences for mt protein-coding and ribosomal RNA genes among Hymenolepis nana (HN), Hymenolepis diminuta (HD), and Pseudanoplocephala crawfordi (PC)

Amino acid sequences inferred from individual mt protein genes of H. nana were compared with those of H. diminuta and P. crawfordi. The difference across the amino acid sequences of the 12 protein genes was 30.7 % between H. nana and H. diminuta, 33.9 % between H. nana and P. crawfordi, and 26.6 % between H. diminuta and P. crawfordi. The amino acid sequence differences among the three taxa of tapeworms ranged from 16.2 to 39.7 %, with cox1 being the most conserved and nad6 the least conserved protein.
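The text does not state how these pairwise differences were computed. A minimal Python sketch follows, assuming that a difference is the uncorrected percentage of differing sites between two aligned sequences, with positions containing a gap in either sequence excluded. The function name p_distance and the toy alignment are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected percentage difference between two aligned sequences.

    Assumption: columns with a gap in either sequence are skipped; this may
    differ from the treatment of gaps used to produce Table 3.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


# Toy aligned peptides (not real data): 6 comparable sites, 2 differences
print(round(p_distance("MKL-VFA", "MRLTVYA"), 1))  # prints 33.3
```

The same function works for aligned nucleotide sequences, so under these assumptions it could be applied both to individual genes and to a whole-genome alignment.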

Phylogenetic analyses

Of the 40 tapeworm species included in the phylogenetic analyses in this study, 34 species belonged to the order Cyclophyllidea while six belonged to the order Pseudophyllidea. Both orders Cyclophyllidea and Pseudophyllidea were monophyletic in all of the trees inferred by the BI, ML, and MP methods. The monophyly of the order Cyclophyllidea was strongly supported with a Bpp of 1 in BI analyses (Figs. 2, 3 and 4). The 34 species of Cyclophyllidea tapeworms included in this study were from four families: Taeniidae (29 species), Hymenolepididae (three species), Davaineidae (one species), and Dipylidiidae (one species). The monophyly of the families Taeniidae and Hymenolepididae was strongly supported in all three phylogenetic analyses in the present study (Bpp = 1, Fig. 2; Bf = 100 %, Fig. 3; Bf = 100 %, Fig. 4). The family Diphyllobothriidae was monophyletic with strong support in all three phylogenetic analyses (Bpp = 1, Fig. 2; Bf = 100 %, Fig. 3; Bf = 100 %, Fig. 4). The family Taeniidae consists of four valid genera. Sixteen species were from the genus Taenia, four species were from the genus Hydatigera, nine species were from the genus Echinococcus, and one species was from the genus Versteria. The genus Taenia was monophyletic with strong support in BI analysis (Bpp = 1, Fig. 2), with weak support in ML analysis (Bf = 59 %, Fig. 3), and with moderate support in MP analysis (Bf = 75 %, Fig. 4). Both the genera Hydatigera and Echinococcus were monophyletic with strong support in all three phylogenetic analyses (Bpp = 1, Fig. 2; Bf = 100 %, Fig. 3; Bf = 100 %, Fig. 4).

Fig. 2

Phylogenetic relationships among 40 species of tapeworms inferred by Bayesian analysis of deduced amino acid sequences of 12 mitochondrial proteins. Bayesian posterior probability (Bpp) values are indicated at nodes

Fig. 3

Phylogenetic relationships among 40 species of tapeworms inferred by maximum likelihood (ML) of deduced amino acid sequences of 12 mitochondrial proteins. Bootstrapping frequency (Bf) values are indicated at nodes

Fig. 4

Phylogenetic relationships among 40 species of tapeworms inferred by maximum parsimony (MP) of deduced amino acid sequences of 12 mitochondrial proteins. Bootstrapping frequency (Bf) values are indicated at nodes

Significance and implications

The majority of helminth parasites are considered by WHO to be the cause of neglected diseases, and most are zoonotic with animal reservoirs playing a role in their epidemiology (Palmeirim et al. 2014; Thompson 2015). In the present study, we determined the complete mt genome of the dwarf tapeworm H. nana, which is a neglected zoonotic helminth. The characterization of the H. nana mt genome provides a foundation for the improved diagnosis of human hymenolepiasis using molecular methods. Molecular tools using genetic markers in mt genes could be developed to support clinical diagnosis and to assist in molecular epidemiological investigations of H. nana. In addition, mtDNA markers are of considerable value for investigating the genetic variation of H. nana, particularly at the larval stages or among cryptic/sibling species, because distinguishing morphological features of H. nana are scarce. Macnish et al. (2002) suggested that isolates of H. nana in Australia actually exist as two cryptic or sibling species because they are morphologically identical but genetically different. The availability of the complete H. nana mt genome now allows the identification and differentiation of these cryptic/sibling species.

The characterization of the mt genome of H. nana can generate genetic markers for future systematic studies. Mt genome sequences provide useful genetic markers for examining the taxonomic status of helminths, particularly when protein-coding gene sequences are used as markers in comparative analyses (Liu et al. 2011, 2012, 2016; Yamasaki et al. 2012; Jeon et al. 2007). In the present study, both the orders Cyclophyllidea and Pseudophyllidea were monophyletic in all of the trees inferred by the BI, ML, and MP methods. These results were consistent with those of previous studies (Liu et al. 2011, 2012; Yamasaki et al. 2012). The results of the present study indicate that the genus Taenia is monophyletic, with strong support in BI analysis (Bpp = 1, Fig. 2), weak support in ML analysis (Bf = 59 %, Fig. 3), and moderate support in MP analysis (Bf = 75 %, Fig. 4). Previous studies of both nuclear and mt genes indicated that Taenia mustelae is sister to Echinococcus (Olson and Tkach 2005). Recently, Nakao et al. (2013) proposed the erection of a new genus, Versteria, for T. mustelae (Versteria mustelae) because this species is more closely related to Echinococcus than to Taenia. Our present results support the erection of the new genus Versteria. Based on molecular phylogenies, Nakao et al. (2013) also proposed the resurrection of Hydatigera, with members including Hydatigera taeniaeformis, Hydatigera krepkogorski, and Hydatigera parva. In the present study, the genus Hydatigera (four species) was monophyletic with strong support in all three phylogenetic analyses (Bpp = 1, Fig. 2; Bf = 100 %, Fig. 3; Bf = 100 %, Fig. 4) and is sister to Taenia. The present results suggest that P. crawfordi is a member of the genus Hymenolepis rather than of the genus Pseudanoplocephala. Although several distinct morphological features have been reported for the two genera Pseudanoplocephala and Hymenolepis in the family Hymenolepididae, many similar morphological characters have hampered the accurate identification of species in these two genera (Wang 2002). Based on genetic analyses using ribosomal and mtDNA sequences, Jia et al. (2014) and Zhao et al. (2015) suggested that P. crawfordi should be a member of the genus Hymenolepis. To date, many lineages of hymenolepidid tapeworms are still underrepresented or not represented among the available mt genomes. Therefore, expanded taxon sampling is necessary for future phylogenetic studies of hymenolepidid cestodes using mt genomic datasets.

Conclusion

The present study determined the mt genome of the dwarf tapeworm H. nana, a parasite of human health significance. Phylogenetic analyses of the mt genome sequences of H. nana together with those of 40 other tapeworm species support the monophyly of the orders Cyclophyllidea and Pseudophyllidea and of the families Taeniidae, Hymenolepididae, and Diphyllobothriidae. Analyses of mt genome sequences suggest that Taenia taeniaeformis is a member of the genus Hydatigera and that P. crawfordi is a member of the genus Hymenolepis. Our results indicate the need for further study of the phylogeny of the class Cestoda with more taxa and different molecular markers.