Introduction

Paramphistomes are the causal agents of amphistomiasis, a debilitating disease, especially in ruminants (Sey 1991). In recent times, the disease has emerged as a significant cause of productivity loss (Anuracpreeda et al. 2008). Mortality due to immature paramphistomid flukes can be as high as 80–90 % in domesticated ruminants in some foci of infection (Juyal et al. 2003; Ilha et al. 2005; Khan et al. 2008). The disease has been reported in subtropical and tropical areas, where the infection leads to economic losses related to mortality and low productivity (Chethanon et al. 1985; Prasitirat et al. 1997; Kilani et al. 2003). Of the 31 species of digenetic flukes reported so far from cattle, buffalo, goat, sheep, and pig in the northeastern region of India, 25 species belong to the amphistome group (Roy and Tandon 1992). Of the various families under the superfamily Paramphistomoidea Fischoeder 1901, only four, viz., Paramphistomidae (comprising the subfamilies Paramphistominae and Orthocoeliinae), Olveriidae, Gastrodiscidae, and Gastrothylacidae, are represented in the mammalian hosts in Northeast India. Members of the superfamily Paramphistomoidea are digeneans characterized most conspicuously by the absence of an oral sucker and by the position of the ventral sucker, or acetabulum, at or close to the posterior extremity of the body in both adults and cercariae. The families Paramphistomidae, Olveriidae, and Gastrodiscidae are restricted to paramphistomoid digeneans, parasitic in mammals, which lack pharyngeal sacs, a cirrus sac, and a ventral pouch (Jones 2005a). The identification of various species of the family Paramphistomidae is rather difficult from a systematic point of view (Mage et al. 2002).
Species identification based on assessment of internal morphology is made less exact by the thick tegument of the parasites (Jones 1990) and by the fact that the traits used for their characterization require tedious histological studies of their muscular structures: the pharynx, the acetabulum, and the terminal genitalium (Sey 1991). Such difficulties in identifying these species could account for conflicting reports on the pathogenicity of amphistomes (Sanabria and Romero 2008). As an alternative to these classical approaches, a more suitable choice is the use of molecular tools (usually DNA sequencing), which allow speedy and precise identification of genetically distinct but morphologically similar species (Nolan and Cribb 2005). An assortment of genetic markers is now available to detect polymorphisms in nuclear DNA. Ribosomal genes and their related spacers are among the most versatile sequences for phylogenetic analysis (Hershkovitz and Lewis 1996; Coleman 2000, 2003; Coleman and Vacquier 2002; Álvarez and Wendel 2003; Müller et al. 2007; Wickramasinghe et al. 2009; Yan et al. 2013). The large subunit ribosomal DNA (LSU or 28S rRNA), which is a mosaic of several variable and conserved fragments, is often used as a phylogenetic marker. The use of 28S rRNA together with the small subunit (SSU or 18S rRNA) has provided greater resolution among the Metazoa (Medina et al. 2001). The 28S rRNA region of eukaryotes contains 12 divergent domains or expansion segments, which differ greatly in nucleotide composition as well as length among species (Hassouna et al. 1984; De Rijk et al. 1995). Consequently, the region has been widely used for resolving species phylogenies of Digenea as well (Kaukas et al. 1994; Snyder and Tkach 2001; Tkach et al. 2001; Leon-Regagnon and Paredes-Calderon 2002).
Nevertheless, information on the 28S and 18S rRNA regions of paramphistomes is still very scanty; the few workers who have characterized these flukes molecularly have relied only on a common genetic marker, the second internal transcribed spacer (ITS2) (Itagaki et al. 2003; Rinaldi et al. 2005; Goswami et al. 2009; Lotfy et al. 2010; Shylla et al. 2011; Ghatani et al. 2012). rRNA secondary structure has increasingly been used in phylogenetic inference: for reconstructing optimal alignments, as a supplementary source of “morphological” information about the molecule, and for refining appropriate models of its evolution (Coleman 2003, 2007; Subbotin et al. 2007; Thornhill et al. 2007). In addition, the phylogenetic implications of compensatory base changes (CBCs), defined as “mutations that occur in both nucleotides of a paired structural position while retaining the paired nucleotide bond” (Ruhl et al. 2009), in the secondary structure of 28S rRNA have been studied by a few workers (Wheeler and Honeycutt 1988; Dixon and Hillis 1993; Chilton et al. 2003). The secondary structures of the divergent domains of the 28S rRNA region of paramphistomes, however, remain unexplored.

The present study aimed to determine the nucleotide differences in the divergent domains (D1, D2, and D3) of 28S rRNA, to ascertain which domains contain informative genetic markers for phylogenetic studies, and to quantify the CBCs that may occur in the secondary structures of the D domains. To this end, we assembled 12 species (belonging to eight genera) of paramphistomes to assess the degree of variation in the domains of 28S rRNA and, furthermore, used the informative sequences of 18S rRNA to supplement the findings from the 28S rRNA data.

Materials and methods

Specimen collection and DNA isolation

Live flukes were collected from local abattoirs at different collection sites in various states of Northeast India, viz., Shillong, Jowai, Nongstoin, and Tura (Meghalaya), Dharmanagar (Tripura), and Kohima (Nagaland) (Table 1). The parasites were identified by comparing their morphological features with the voucher specimens mentioned in Table 1. DNA was isolated from individual flukes using a standard phenol-chloroform technique (Sambrook et al. 1989). The 5′ end of the 28S rRNA gene containing the D1–D3 variable domains was amplified using forward primer dig12 (5′-AAG CAT ATC ACT AAG CGG-3′) with the reverse primer 1500R (5′-GCT ATC CTG AGG GAA ACT TCG-3′) (Tkach et al. 2000). 18S rRNA was amplified using forward primer EukA (5′-AACCCGTTGAACCCCATT-3′) and reverse primer EukB (5′-CCATCCAATCGGTAGTAGCG-3′) (Díez et al. 2001). The thermal profile for both marker regions comprised an initial denaturation at 95 °C (5 min), annealing at 56 °C (2 min), and a final extension at 72 °C (10 min). The resultant PCR products were separated by electrophoresis through 1.6 % (w/v) agarose gels in TAE buffer, stained with ethidium bromide, visualized under ultraviolet light, and then photographed. For DNA sequencing, the PCR products were purified using the Genei Quick PCR purification Kit and sequenced in both directions on an automated sequencer by the DNA sequencing services of Macrogen, Korea.

Table 1 List of parasite species used in the study including their respective host species, locality of collection, and accession numbers

Sequence alignment and analysis

DNA Baser v3.5.3 (http://www.dnabaser.com/) was used to create contigs by assembling the forward and reverse sequences of the 18S rRNA and 28S rRNA genes, since the full length of these genes could not be retrieved from one-direction sequencing. Boundaries of the variable domains of 28S rRNA and of 18S rRNA were adjusted manually against the previously aligned sequence of Schistosoma mansoni using BioEdit v7.2.0 (Hall 1999). Gaps were treated as missing data.
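As an illustration of what treating gaps as missing data means for a pairwise comparison, the following minimal Python sketch computes the proportion of differing sites between two aligned sequences while skipping any column in which either sequence carries a gap or an ambiguity code. The function name `p_distance` and the set of characters regarded as missing are assumptions made for this example; they are not taken from the software used in the study.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "-?N") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence holds a character listed in `missing`
    are skipped, i.e. gaps are treated as missing data (assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # column carries no usable information
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    print(f"{100 * p_distance('ACGT-A', 'ACCTTA'):.1f} % difference")
```

Skipping a gapped column, rather than scoring it as a mismatch, keeps indels from inflating the estimated nucleotide difference.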

Phylogenetic tree construction

Phylogenetic analyses were performed using Bayesian Inference (BI) (Ronquist and Huelsenbeck 2003). The BI analysis of the individual divergent domains of 28S rRNA, the concatenated dataset (D1 + D2 + D3), and 18S rRNA was performed using MrBayes version 3.1.2 (Huelsenbeck and Ronquist 2001) to explore relationships between the taxa. The analysis was conducted on the concatenated dataset using the GTR + I + G model, with ngen set to 2–3 × 10^5, two runs each containing four simultaneous Markov Chain Monte Carlo (MCMC) chains, and every 100th tree saved. Samples of substitution model parameters and of trees and branch lengths were summarized using the parameters “sump burnin = 0.25” and “sumt burnin = 0.25.” The topologies were used to generate a 50 % majority rule consensus tree. Posterior probabilities (PP) are given on the appropriate clades. The tree retrieved from the CON file of MrBayes was imported into FigTree v1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/) for editing.
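The settings described above could be written as a MrBayes command block along the following lines. This Python sketch only assembles the text of such a block; it is not the command file used in the study. It assumes that GTR + I + G corresponds to `lset nst=6 rates=invgamma`, that ngen is 200,000 (the lower end of the stated range), and that the quoted burn-in of 0.25 denotes the fraction of sampled trees to discard, converted here to a sample count. The function name `mrbayes_block` is hypothetical.

```python
def mrbayes_block(ngen: int = 200_000, samplefreq: int = 100,
                  nruns: int = 2, nchains: int = 4,
                  burnin_fraction: float = 0.25) -> str:
    """Build the text of a MrBayes command block (illustrative sketch)."""
    # Approximate number of samples per run (the initial state is ignored).
    n_samples = ngen // samplefreq
    # Assumption: the burn-in is a fraction of the samples, given as a count.
    burnin = int(n_samples * burnin_fraction)
    lines = [
        "begin mrbayes;",
        "  lset nst=6 rates=invgamma;",  # GTR + I + G
        f"  mcmc ngen={ngen} samplefreq={samplefreq} "
        f"nruns={nruns} nchains={nchains};",
        f"  sump burnin={burnin};",  # summarize model parameters
        f"  sumt burnin={burnin};",  # summarize trees and branch lengths
        "end;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(mrbayes_block())
```

Whether the burn-in is given as a sample count or as a fraction depends on the MrBayes version, so the `sump` and `sumt` lines should be checked against the manual of the version in use.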

Secondary structure prediction and analysis

Secondary structures of the variable D domains of the 28S rRNA sequences of the various paramphistome species were predicted using the minimum free energy folding algorithm of the RNAfold webserver, and the structures with the lowest (most negative) free energy were chosen (Hofacker et al. 1994). The predicted secondary structures of the D domains were aligned using 4SALE (Seibel et al. 2006). A CBC table was also constructed for each domain. The alignment was imported into PETfold (Seemann et al. 2011) to display and highlight the reliable base pairings.
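The idea behind a CBC table can be sketched in a few lines of Python: given two aligned sequences and a shared consensus structure in dot-bracket notation, report every paired position at which both nucleotides differ between the two sequences while pairing is retained in each. The helper names (`parse_pairs`, `find_cbcs`) and the choice to accept G-U wobble pairs as valid pairs are assumptions made for this illustration; the sketch does not reproduce the 4SALE implementation.

```python
# Pairs regarded as valid (Watson-Crick plus G-U wobble; an assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def find_cbcs(seq_a: str, seq_b: str, structure: str):
    """List CBCs as (i, j, pair_a, pair_b) with 1-based positions."""
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    if not (len(a) == len(b) == len(structure)):
        raise ValueError("sequences and structure must be the same length")
    cbcs = []
    for i, j in parse_pairs(structure):
        pa, pb = (a[i], a[j]), (b[i], b[j])
        # Both nucleotides change, and both sequences stay paired.
        if (pa in CANONICAL and pb in CANONICAL
                and pa[0] != pb[0] and pa[1] != pb[1]):
            cbcs.append((i + 1, j + 1, "-".join(pa), "-".join(pb)))
    return cbcs


if __name__ == "__main__":
    # Toy hairpin, not data from the study.
    print(find_cbcs("GCAAGC", "ACAAGU", "((..))"))
```

A change in only one nucleotide of a pair that still preserves pairing (for example G-C to G-U) is deliberately excluded here, since it alters one side only.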

Results

Sequence analysis of 28S rRNA domains and 18S rRNA region

The 28S and 18S rRNA contigs had lengths of 1,200 and 1,800 bp, respectively. The 28S rRNA domains, viz., D1, D2, and D3, were 194, 547, and 189 bp long, with a GC content of 47.4–50.5 %, 55.9–57.2 %, and 60.8–62.9 %, respectively (Figs. 1, 2, 3, and 4; Table 2). The sequence identity values of all these regions (Fig. 5a–d) indicated that the highest nucleotide difference (15.4 %) was observed in D1 as compared with D2 (4.4 %), D3 (4.3 %), and 18S rRNA (1.8 %). Amongst the domains, D2 was the most conserved, with a 0.8 % degree of divergence, and D1 the most variable, with a 7.9 % degree of divergence (Fig. 6).
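GC content ranges of the kind reported above can be obtained with a short calculation such as the following Python sketch, which counts G and C among the unambiguous bases of each sequence and then takes the minimum and maximum across species. The function names and the decision to ignore gaps and ambiguity codes are assumptions made for this example, and the sequences shown are toy strings rather than data from the study.

```python
def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (gaps and N are ignored)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T") + s.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def gc_range(seqs: dict[str, str]) -> tuple[float, float]:
    """Lowest and highest GC content (percent) across named sequences."""
    values = [gc_content(s) for s in seqs.values()]
    return min(values), max(values)


if __name__ == "__main__":
    toy = {"species_1": "GGCCAATT", "species_2": "GGCC-ATN"}
    low, high = gc_range(toy)
    print(f"GC content: {low:.1f}-{high:.1f} %")
```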

Fig. 1

Aligned nucleotide sequences of 28S rRNA D1 domain from 12 species of paramphistomes and Schistosoma mansoni. Dots indicate nucleotides identical to those in the top sequence. Dashes indicate alignment gaps (indels)

Fig. 2

Aligned nucleotide sequences of 28S rRNA D2 domain from 12 species of paramphistomes and Schistosoma mansoni. Dots indicate nucleotides identical to those in the top sequence. Dashes indicate alignment gaps (indels)

Fig. 3

Aligned nucleotide sequences of 28S rRNA D3 domain from 12 species of paramphistomes and Schistosoma mansoni. Dots indicate nucleotides identical to those in the top sequence. Dashes indicate alignment gaps (indels)

Fig. 4

Aligned nucleotide sequences of 18S rRNA from 12 species of paramphistomes and Schistosoma mansoni. Dots indicate nucleotides identical to those in the top sequence. Dashes indicate alignment gaps (indels)

Table 2 Genetic markers used in the study and their respective lengths, GC content, and divergence
Fig. 5

Similarity index matrices for 28S rRNA domains a D1, b D2, c D3, d 18S rRNA from 12 species of paramphistomes. Circles indicate lowest and highest similarity index values

Fig. 6

A graph depicting the degree of divergence amongst the markers of interest. D2 domain emerged as the most robust amongst the markers used

Phylogenetic tree construction

The three 28S rRNA domains and the 18S rRNA sequence data were each analyzed independently, and the 28S domains were also analyzed as a concatenated dataset (D1 + D2 + D3), using BI. All trees were compared for similar clustering of taxa. The trees retrieved from the individual 28S domains showed dissimilar topologies of the various taxa (Figs. 7, 8, 9, 10, and 11). The D1 tree was the most poorly resolved as compared with the trees of the other domains. In the D1 tree, members of the mentioned subfamilies of Paramphistomidae and members of Olveriidae and Gastrodiscidae (Table 1) do not cluster with their corresponding sister taxa, and the branching and placement of the various taxa conform poorly to the morphology-based taxonomy of these flukes (Fig. 7). D2 provided a better resolution than D1, D3, and 18S rRNA. The sister species of Orthocoelium (Orthocoeliinae), Olveria (Olveriidae), and Gastrodiscoides hominis (Gastrodiscidae) grouped together based on the informative sequences of the D2 domain, indicating the robustness of this domain in comparison to the other domains (Fig. 8). The tree constructed from the informative sequences of the D3 domain could resolve only members of the Orthocoeliinae and Olveriidae; the nodes were not supported by significant support values (Fig. 9).

Fig. 7

Phylogenetic tree depicting relationships between taxa based on Bayesian Inference analysis of 28S rRNA D1 domain. Posterior probabilities are shown at the nodes, with values <50 not shown

Fig. 8

Phylogenetic tree depicting relationships between taxa based on Bayesian Inference analysis of 28S rRNA D2 domain. Posterior probabilities are shown at the nodes, with values <50 not shown

Fig. 9

Phylogenetic tree depicting relationships between taxa based on Bayesian Inference analysis of 28S rRNA D3 domain. Posterior probabilities are shown at the nodes, with values <50 not shown

Fig. 10

Phylogenetic tree depicting relationships between taxa based on Bayesian Inference analysis of 18S rRNA. Posterior probabilities are shown at the nodes, with values <50 not shown

Fig. 11

Phylogenetic tree depicting relationships between taxa based on Bayesian Inference analysis of 28S rRNA (concatenated D domains). Posterior probabilities are shown at the nodes, with values <50 not shown

The BI of 18S rRNA did not yield a good taxonomic resolution at the species level. The fragment consists of relatively long and highly conserved sequences with a divergence of merely 0.102 %. 18S rRNA thus failed to resolve the groupings of these flukes; none of the members clustered with their sister taxa (Fig. 10).

However, the concatenated tree constructed based on the D1 + D2 + D3 domains of 28S rRNA provided a superior topology of the taxa concerned as opposed to 18S rRNA (Figs. 10 and 11). The mentioned members of the various families were well nested accordingly with high support values. The tree was also able to resolve members of subfamily Paramphistominae, which had shown variable nesting in the individual trees of D1, D2, D3, and 18S rRNA.

Secondary structure prediction and analysis

Since the 28S rRNA divergent domains comprise one or a series of putative helical and nonpairing regions that are valuable for evaluating different levels of taxonomic divergence (Gillespie et al. 2005), their secondary structures were generated from the sequence-structure consensus of each domain to determine any “morphological” information that may exist as variations in the helices or loops of these domains. Using PETfold, three consensus structures were predicted, one for each of the D1–D3 domains (Figs. 12, 13, and 14).

Fig. 12

The PETfold output for 28S rRNA D1 domain. (i) Alignment with indication of the sequence conservation and (ii) the predicted RNA structure in dot-bracket format; pairing reliabilities color-coded as per Vienna RNA conservation coloring scheme

Fig. 13

The PETfold output for 28S rRNA D2 domain. (i) Alignment with indication of the sequence conservation and (ii) the predicted RNA structure in dot-bracket format; pairing reliabilities color-coded as per Vienna RNA conservation coloring scheme

Fig. 14

The PETfold output for 28S rRNA D3 domain. (i) Alignment with indication of the sequence conservation and (ii) the predicted RNA structure in dot-bracket format; pairing reliabilities color-coded as per Vienna RNA conservation coloring scheme

According to 4SALE, D1, with a length of 194 bp, comprises three helices (H1–H3), of which H1 has eight sub-helices (a–h) and is the most variable helix; most of its sub-helices show the least degree of conservation, with the exception of H1–f (Fig. 12). The D2 consensus secondary structure is also composed of three helices, with H3 showing the lowest pairing reliabilities, as indicated in the stem; nucleotide variations are scattered to a lesser extent in the other helices. However, with a length of 547 bp, the nucleotide difference expressed for D2 was 0.804 %, the lowest amongst the three domains. This may perhaps explain the improved resolution of the taxa in the tree topology (Fig. 13). The D3 consensus secondary structure, with a length of 189 bp, comprised four helices, with H4 showing the least heterogeneity in terms of nucleotide changes. The D3 segment was able to resolve only members of Orthocoeliinae and Olveriidae. This marker proved to be more robust than D1 (Fig. 14).

As depicted in Table 3, CBCs were found essentially only in the D2 domain. A complete CBC was noted at positions 402 and 484 (G-C ⟺ A-U) between Olveria indica and members of Orthocoeliinae and between O. indica and members of Paramphistominae. Interestingly, such a transition was also observed with Olveria bosi (Fig. 15a). A transitional mutation (A-U ⟺ G-C) was also found at position 43 between Explanatum explanatum and Calicophoron calicophorum and at position 167 between E. explanatum and O. bosi (Fig. 15b). Another complete CBC was detected at positions 229 and 344 between Cotylophoron cotylophorum and Calicophoron shillongensis (G-C ⟺ A-U) (Fig. 15c). These may be the positions that have accumulated a high number of substitutions in the D2 stems.

Table 3 CBC table of D2 domain of 28S rRNA
Fig. 15

Magnified stem regions of D2 predicted secondary structures highlighting CBCs between a Olveria indica and other spp. in Helix 3; b E. explanatum and C. calicophorum, and E. explanatum and Olveria bosi, in Helix 1; c C. cotylophorum and Calicophoron shillongensis in Helix 2

Discussion

In the analysis performed using the various divergent markers, viz., 28S (D1–D3 domains) and 18S rRNA, individually and collectively (D1 + D2 + D3), the D1 and D3 expansion segments of the 28S gene showed significant interspecific sequence differences among the paramphistome taxa. The inability of D1 to resolve the taxa may, therefore, be attributed to the mutational pattern found in its H1 helix. This is in concordance with earlier findings implying that the D1 domain is in fact more appropriate for inference of phylogenetic relationships among closely related families, genera, and some species in the Digenea (Barker et al. 1993). D2, however, emerged as the most robust marker, providing efficient nesting of the flukes in accordance with their taxonomic placement and thus yielding the best resolution. Since D2 could discriminate between closely related species better than the other domains and 18S rRNA, this domain may be used as a species diagnostic marker, possibly contributing to a more reliable phylogenetic inference of paramphistomes.

In the comparison of the concatenated 28S D domains with 18S rRNA, the former resolved the taxa, with well-supported nesting of the members of the paramphistomid group in concordance with their subfamilies; the 18S gene, however, could not resolve the species with the same conformity as 28S rRNA (Zhao et al. 2012). Since it evolves at a slow rate, 18S rRNA is unable to resolve species-level differences between lineages and is considered well suited for evaluating deep-level relationships among organisms (Adoutte et al. 2000; Van de Peer et al. 2000; Fontaneto 2011). Thus, 18S rRNA has proven to be useful for resolving phylogenies at higher taxonomic levels within metazoan groups (Field et al. 1988; Abele et al. 1989; Friedrich and Tautz 1995; Blair et al. 1996; Aguinaldo et al. 1997; Campos et al. 1998; Whiting 1998; Hwang and Kim 1999; Cruickshank 2002). In contrast, the 28S rRNA marker is much larger in size and shows more variation in its rate of evolution than 18S rRNA (Hwang and Kim 1999). The 28S rRNA D domains have been employed as effective genetic markers for determining phylogenetic relationships at both lower and higher taxonomic levels (Al-Banna et al. 1997; Al-Banna et al. 2004; Duncan et al. 1999; Subbotin et al. 2005, 2007, 2008; Vovlas et al. 2008) and may therefore be a well-suited marker for inferring the phylogeny of paramphistomes. Furthermore, its respective domains, in particular D2, may be used as effective markers for species identification. As a whole, the phylogenetic trees that could resolve the paramphistomid flukes indicated that the subfamily Orthocoeliinae shares more similar historical patterns with the family Olveriidae than with its sister subfamily Paramphistominae; the two subfamilies of Paramphistomidae do not cluster together in any of the trees constructed, thus indicating a possible divergence of the members. Species belonging to Paramphistominae are variable in their taxonomic nesting, whereby the clustering of C. cotylophorum and E. explanatum could not be resolved by any of the markers. The family Gastrodiscidae (Homalogaster paloniae and Gastrodiscoides hominis) forms a deeply divergent clade from the rest of the families; this may be explained by the distinct morphological features of these members, which are characterized by a dorsoventrally flattened body that, in some taxa, appears divided into two parts, unlike the paramphistomids of the present study (Jones 2005b). Incidentally, G. hominis is also the only zoonotic amphistome.

Secondary structures, predicted based on the sequence-structure alignment, assist in providing a precise evaluation of nucleotide similarity that is sourced from the same evolutionary origin (Dixon and Hillis 1993; Kjer 1995; Chilton et al. 2001). Secondary structures of the variable regions of 28S rRNA have been used as effective tools for phylogenetic studies (Bachellerie and Michot 1989; Hwang et al. 2000). The nucleotide variations observed in the consensus secondary structures in the present study substantiate the findings of primary homology; there is high variation between taxa in the base composition of helices. The divergence in 28S rRNA domains is thus contributed by the variability of these helices with the D2 region being the most informative.

The CBCs observed in the secondary structures were classified as Type I substitutions, which change one pair of complementary bases to another pair (Dixon and Hillis 1993). Compensatory mutations in stems are associated with the maintenance of the secondary structures (Hancock et al. 1988; Ramirez and Ramírez 2010). The divergent domains of the 28S rRNA, even though not used in higher-level phylogenetic analyses, can be used for lower-level analyses, i.e., at the species or even subspecies level (Littlewood 1994; Mallatt and Sullivan 1998; Jarmen et al. 2000; Litvaitis et al. 2000; Winchell et al. 2002). The high nucleotide heterogeneity in the D domains of the 28S rRNA gene amongst paramphistomid species may be valuable for species distinction. In the case of nematodes, the D2-D3 expansion segments are promising candidates for DNA barcoding (De Ley et al. 2005; Bae et al. 2010). The D2 segment of 28S rRNA may consequently be considered a potential complement to mitochondrial DNA-based barcodes as well.

For paramphistomes, the divergent domains of 28S rRNA and their predicted secondary structures have not been explored so far. The present study provides the first information on this aspect. Identifying any varied structural constraints will require more data from different taxa of Paramphistomidae. The diversity of paramphistomids is still greatly underestimated at both the morphological and molecular levels. A molecular approach will therefore expedite the assessment of this group of parasites of veterinary importance.