Introduction

Coccidiosis is recognized as one of the most widespread and pathogenic parasitic infections in migratory waterfowl throughout the world. It can be caused by several species of Eimeria, as well as other coccidian Apicomplexa (Traill et al. 2009; Hervías et al. 2013; Huang et al. 2014; Hafeez et al. 2015). Eimeria is an obligate intracellular protist parasite that has a complex life cycle in the intestinal mucosa of the host and is directly transmitted from one animal to another by contact with infected feces (Lin et al. 2011; Honma et al. 2011; Huang et al. 2014). Eimeria anseris is an important agent of coccidiosis that is distributed worldwide in domestic poultry, particularly ducks, geese, and swans in the family Anatidae. E. anseris can cause serious damage to the digestive tract of the host, resulting in malabsorption of nutrients and diarrhea, which often causes weight loss and can lead to death (Abbas et al. 2008; Ding et al. 2008).

Mitochondrial DNA (mtDNA) is short, lacks introns, and has short intergenic regions. It has been extensively used as a genetic marker, both for phylogenetic analyses at many different taxonomic levels and as a model for studying gene rearrangement and genome evolution (Liu et al. 2013, 2014). Several complete mitochondrial DNA sequences have been published for Eimeria (Lin et al. 2011). Mitochondrial genome organization in Eimeria is quite different from that of most eukaryotes (Feagin et al. 2012; Ogedengbe et al. 2014; He et al. 2014). Complete Eimeria mtDNA is typically a little longer than 6 kb and contains three protein-coding genes (Cyt b, COI, and COIII), 12–19 gene fragments for large subunit (LSU) ribosomal RNA (rRNA), and 7–14 gene fragments for small subunit (SSU) rRNA (Lin et al. 2011; Ogedengbe et al. 2014; Tian et al. 2015; Hafeez et al. 2016). Like mtDNA in general, it is highly conserved, intron-free, and contains short intergenic spacer regions (Feagin et al. 2012; Ogedengbe et al. 2014; He et al. 2014). Unlike those of most eukaryotes, the mitochondrial genomes of Eimeria species do not possess 5S rRNA or transfer RNA genes (Ogedengbe et al. 2014). In spite of these conserved features, Eimeria mtDNA genome structures vary widely, reportedly including linear concatemers, linear genomes with terminal inverted telomeric repeats, and circular genomes (Ogedengbe et al. 2014).

The greater white-fronted goose (Anser albifrons) is a long-distance migratory waterfowl and an important wetland indicator species within the family Anatidae, in the order Anseriformes (IUCN 2018). We previously found greater white-fronted geese to be infected with many parasites, including E. anseris (unpublished). E. anseris can cause serious coccidiosis, resulting in malabsorption of nutrients and diarrhea, which affects the survival of migratory waterfowl. A. albifrons is a large, colonial, migratory water bird that winters in wetlands and is prone to parasite infection. Attempts have been made to molecularly characterize goose parasites using various nuclear loci; however, the maternally derived and mitotically replicating mitochondrial genome may be more appropriate for molecular epidemiology.

Here, we report the characterization and organization of the complete E. anseris mitochondrial genome isolated from wintering greater white-fronted goose feces at Shengjin Lake, China. The purpose of the present study is not only to determine the genomic organization and structure of the mitochondrial genome of E. anseris but also to draw the attention of public health and ornithological researchers to coccidiosis in waterfowl. Furthermore, our phylogenomic analysis sheds further light on the coevolutionary relationship between Eimeria species and their hosts.

Materials and methods

Sample collection, source, and identification of E. anseris

E. anseris oocysts were isolated from wintering greater white-fronted goose feces. A noninvasive sampling technique was used to collect fecal samples from wintering greater white-fronted geese from Wangba Village (30° 16′ 58.89″ N, 117° 0′ 14.64″ E) at Shengjin Lake, Anhui Province, China, from November 2017 through April 2018. E. anseris oocysts were washed in physiological saline (0.9% sodium chloride) and identified based on morphological characters. Isolation and pretreatment of oocysts followed the method of Yan et al. (2012). Total DNA was extracted from fecal samples using the TIANamp Stool DNA Kit from Tiangen Biotech CO., LTD (Beijing, China). The mtDNA was sequenced by GeneSky Biotech Co., Ltd. (Shanghai, China) using next-generation sequencing (NGS) technology.

Genome annotation and sequence analysis

DNA sequences were analyzed using the programs Seqman (DNASTAR 2001), BioEdit, and Chromas v. 2.22. The protein-coding and rRNA gene boundaries were identified by alignment with the mitochondrial genomes of other Eimeria species. The complete E. anseris mitochondrial genome sequence has been deposited in GenBank under accession number MH758793.

Phylogenetic analyses

To explore phylogenetic relationships among Eimeria, we collected all available complete Eimeria mtDNA sequences in GenBank and added our sequence, with Isospora sp. mtDNA (KP658103) as an outgroup (Table 1). We aligned the 21-member coccidian data set using ClustalX v. 2.1 (Thompson et al. 1997), as implemented in MEGA 4.0 (Tamura et al. 2007), followed by manual adjustment.

Table 1 GenBank accession numbers for the 20 complete Eimeria mtDNA sequences used in this study

Phylogenetic trees were then estimated using the maximum likelihood (ML) and Bayesian inference (BI) methods. ML and BI phylogenetic trees were reconstructed using PAUP* v. 4.0b8 (Strimmer and Haeseler 1996) and MrBayes v. 3.1.2 (Strimmer and Haeseler 1996), respectively, specifying separate partitions for each gene within the Nexus format file (Strimmer and Haeseler 1996). ML analyses were performed in PAUP* using tree bisection and reconnection (TBR) branch swapping (10 random addition sequences) and a general time-reversible model with invariant sites and among-site rate variation (GTR+I+Γ). This model was selected as the best-fit model of evolution using Modeltest v. 3.06 based on the Akaike information criterion (AIC). The support for internal branches in the ML tree was evaluated via the bootstrap test with 100 iterations.
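The AIC-based model choice described above can be illustrated with a minimal Python sketch. The log-likelihood values below are hypothetical placeholders, not results from this study, and the parameter counts cover only the substitution-model parameters (branch lengths are ignored because they are shared by all candidates). Modeltest itself compares a larger set of models; this only shows the criterion AIC = 2k − 2 ln L.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (lnL, free substitution-model parameters) for illustration only.
candidates = {
    "JC": (-25400.0, 0),
    "HKY+G": (-24150.0, 5),     # kappa + 3 base frequencies + gamma shape
    "GTR+I+G": (-24010.0, 10),  # 5 rates + 3 frequencies + pinvar + gamma shape
}


def best_model(models):
    """Return the model name with the lowest AIC and all AIC scores."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    name, scores = best_model(candidates)
    for model, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{model:8s} AIC = {score:.1f}")
    print("Selected:", name)
```

The model with the lowest AIC is preferred; extra parameters are accepted only when they improve the likelihood by more than the penalty of 2 per parameter.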

Bayesian inference of phylogeny was performed using MrBayes (Ronquist and Huelsenbeck 2003) with the same best-fit substitution model as in the ML analysis. MrBayes simultaneously initiates two Markov chain Monte Carlo (MCMC) runs so that convergence of the posterior probability distributions can be assessed. Analyses were run for one million generations, by which point the average standard deviation of split frequencies was less than 0.01, indicating that convergence had most likely been reached. Chains were sampled every 1000 generations.
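The convergence diagnostic mentioned above can be sketched as follows. This is a simplified illustration, not MrBayes code: it assumes the sample standard deviation across runs and a hypothetical minimum-frequency cutoff (`min_freq`) below which rare splits are ignored, and the split labels and frequencies are invented for the example. The exact rules MrBayes applies may differ.

```python
from statistics import stdev


def asdsf(run_freqs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    run_freqs: list of dicts (one per independent run) mapping a split
    label to its sampled frequency. Splits absent from a run count as 0.
    Only splits reaching min_freq in at least one run are included
    (the cutoff value here is an assumption).
    """
    if len(run_freqs) < 2:
        raise ValueError("at least two runs are required")
    splits = set().union(*run_freqs)
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in run_freqs]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


if __name__ == "__main__":
    # Hypothetical split frequencies from two runs.
    run1 = {"AB|CD": 0.98, "AC|BD": 0.02}
    run2 = {"AB|CD": 0.99, "AC|BD": 0.01}
    value = asdsf([run1, run2])
    print(f"ASDSF = {value:.4f}", "(converged)" if value < 0.01 else "(keep running)")
```

Values approaching zero indicate that the independent runs are sampling very similar tree distributions, which is why a threshold such as 0.01 is used as a stopping guide.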

Results

Genome organization and arrangement

The complete E. anseris mtDNA is 6179 bp in size and contains three protein-coding genes (CYT B, COI, and COIII), 12 gene fragments for LSU rRNA, and seven gene fragments for SSU rRNA, but no transfer RNA genes (Fig. 1, Table 2). The longest gene is COI at 1476 bp, located between CYT B and LSUF. The shortest gene fragment is LSUC (16 bp), located between LSUG and SSUF. Overall base composition was as follows: A, 29.8%; C, 17.2%; G, 17.7%; and T, 35.3%. Overall A+T content is 65.1% and C+G content is 34.9%, with cytosine being the rarest nucleotide, unlike most Eimeria mtDNAs, in which guanine is the rarest (Fig. 2).
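The base-composition figures above can be reproduced from a sequence with a few lines of Python. The sketch below is illustrative only: the function names are our own, and the `reported` dictionary simply restates the percentages given in the text rather than recomputing them from accession MH758793.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G, and T in a DNA sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(comp: dict) -> float:
    """A+T percentage from a composition dictionary."""
    return comp["A"] + comp["T"]


def rarest_base(comp: dict) -> str:
    """Base with the lowest percentage."""
    return min(comp, key=comp.get)


# Percentages reported in the text for E. anseris mtDNA.
reported = {"A": 29.8, "C": 17.2, "G": 17.7, "T": 35.3}

if __name__ == "__main__":
    print(f"A+T = {at_content(reported):.1f}%")
    print("Rarest base:", rarest_base(reported))
```

With the reported percentages, A+T sums to 65.1% and the least frequent base is C (17.2%), marginally below G (17.7%).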

Fig. 1 Arrangement of Eimeria anseris mitochondrial genome. SSU rRNAs are colored green, protein-coding genes are colored orange, and LSU rRNAs are colored red

Table 2 Organization of complete Eimeria anseris mtDNA

Fig. 2 Nucleotide composition (%) of Eimeria mitochondrial genomes used in the study. Notes: A, E. anseris; B, E. acervulina; C, E. adenoeides; D, E. brunetti; E, E. dispersa; F, E. gallopavonis; G, E. innocua; H, E. intestinalis; I, E. irresidua; J, E. magna; K, E. maxima; L, E. media; M, E. meleagridis; N, E. meleagrimitis; O, E. mephitidis; P, E. mitis; Q, E. necatrix; R, E. praecox; S, E. tenella; T, E. vejdovskyi

Protein-coding genes and ribosomal RNA genes

The combined length of the three protein-coding sequence (CDS) regions is 3336 bp, which represents 53.60% of the entire mitochondrial genome. The longest CDS is COI (1476 bp), located between CYT B and LSUF. The shortest is COIII (780 bp), located between LSUA and LSU1. CYT B begins with an ATG start codon; COI and COIII begin with ATT. The standard termination codon TAA occurs in CYT B and COI, whereas COIII ends with TAG (Fig. 3, Table 2).
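Start and stop codon usage of the kind summarized above can be read directly from annotated CDS sequences. The following Python sketch shows one way to do this; the toy sequence is hypothetical (it is not the E. anseris COIII gene), and the function name is ours.

```python
STOP_CODONS = {"TAA", "TAG"}  # termination codons reported for Eimeria CDSs in this study


def start_stop_codons(cds: str) -> tuple:
    """Return (start codon, stop codon) of an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3 and at least 6 nt")
    return cds[:3], cds[-3:]


if __name__ == "__main__":
    # Hypothetical 12-nt CDS that, like COIII in the text, starts with ATT and ends with TAG.
    toy_cds = "ATTGCTGGATAG"
    start, stop = start_stop_codons(toy_cds)
    print("start:", start, "| stop:", stop, "| recognized stop:", stop in STOP_CODONS)
```

Applied to each CDS of each genome, the resulting (start, stop) pairs can be tallied to produce a codon-usage summary.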

Fig. 3 Start codon use (a) and stop codon use (b) of three mitochondrial protein-coding genes, Cyt b, COI, and COIII, in 20 Eimeria species

The E. anseris mtDNA contains 12 gene fragments for LSU rRNA (LSUF, LSUG, LSUC, LSU10, LSU13, LSUD, LSU2, LSUA, LSU1, LSUB, LSU3, and LSUE) and seven gene fragments for SSU rRNA (SSUA (SA), SSUF, SSUD (SD), SSU9, SSU8, SSUB, and SSUE), but no transfer RNA genes. The LSU rRNA gene fragments range in length from 16 to 188 bp; the longest is LSUE, and the shortest is LSUC. The longest SSU rRNA fragment is SSUB, which has a length of 116 bp; the shortest is SSUF, which is 61 bp long.

Phylogenetic reconstructions

The ML and BI phylogenetic trees of the 20 Eimeria species, based on complete mtDNA sequences, share identical topologies and high node support values (Fig. 4). The trees divide the sequences into two primary clusters. Some species cluster into clades associated with the parasites’ hosts (e.g., species isolated from rabbits), but many species do not follow clear host-segregating coevolutionary patterns (e.g., species isolated from chickens and turkeys).

Fig. 4 Phylogenetic relationships of 20 Eimeria species based on complete mitochondrial DNA. Numbers at each node indicate Bayesian posterior probabilities (left) and maximum likelihood bootstrap proportions (right). Isospora is the outgroup

One primary cluster contains a well-defined monophyletic clade in which E. praecox, from chicken, is sister to both E. anseris, which infects A. albifrons, and E. acervulina, also from chicken. This clade is sister to three other chicken-infecting species, E. brunetti, E. mitis, and E. maxima. The other major clade in this primary cluster contains E. gallopavonis, E. adenoeides, and E. meleagridis, found in turkey, sister to E. tenella and E. necatrix, from chicken, all sister to E. meleagrimitis, from turkey.

In the other primary cluster, E. dispersa and E. innocua, from turkey, are sister to E. mephitidis, from striped skunk. E. irresidua, E. intestinalis, E. magna, E. media, and E. vejdovskyi, all from rabbit, form a well-supported clade.

In summary, Eimeria species that infect chickens and turkeys do not all assort to host-specific clades. Furthermore, E. anseris, which infects A. albifrons, belongs to a monophyletic clade with species that infect chicken.

Discussion

Mitochondrial genome feature annotation

All sequenced Eimeria mitochondrial genomes are compact, with short intergenic spacer regions, and all are quite similar in length (Lin et al. 2011; Tian et al. 2015; Ogedengbe et al. 2014; Hafeez et al. 2015, 2016). The longest, that of E. mitis, is 6408 bp, and the shortest, that of E. maxima, is 6167 bp (Table 3). Compared with the 19 previously sequenced Eimeria mitochondrial genomes, E. anseris mtDNA contains 4536 absolutely conserved sites, which represents 73.16% of the genome.
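Counting absolutely conserved sites amounts to scanning alignment columns for positions where every genome carries the same base. A minimal Python sketch is given below; the three short aligned sequences are hypothetical, and treating gaps or ambiguous characters as non-conserved is an assumption of this example rather than a statement of how the figure above was obtained.

```python
def count_conserved_sites(alignment) -> int:
    """Count columns in which all sequences share the same A, C, G, or T."""
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = 0
    for column in zip(*alignment):
        bases = {b.upper() for b in column}
        # A column is conserved only if it holds one unambiguous base (no gaps).
        if len(bases) == 1 and bases <= set("ACGT"):
            conserved += 1
    return conserved


if __name__ == "__main__":
    # Hypothetical 7-column alignment of three sequences.
    toy_alignment = ["ATGCA-T", "ATGAA-T", "ATGCAGT"]
    n = count_conserved_sites(toy_alignment)
    print(f"{n} of {len(toy_alignment[0])} columns conserved "
          f"({100.0 * n / len(toy_alignment[0]):.2f}%)")
```

In the toy alignment, columns 4 (C/A/C) and 6 (gap in two sequences) are excluded, leaving five conserved columns.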

Table 3 Nucleotide composition (%) of study Eimeria mitochondrial genomes

E. anseris mtDNA nucleotide composition is biased toward A and T, with T (35.3%) being the most common nucleotide and C (17.2%), the least common. This is different from most Eimeria mtDNA, in which the rarest nucleotide is G (Fig. 2, Table 3) (Lin et al. 2011; Ogedengbe et al. 2014). Overall, E. anseris mtDNA A+T content is 65.1%, and C+G content is 34.9%, similar to other Eimeria, all with higher A+T content than C+G (Table 3) (Lin et al. 2011; Tian et al. 2015; Ogedengbe et al. 2014; Hafeez et al. 2015).

Protein-coding gene variation

E. anseris mtDNA initiation and termination codons follow the same pattern as those of other members of Eimeria. Most CDS regions in Eimeria use ATG as a start codon (Table 4). A few exceptions in our and in previous studies show Eimeria employing ATT, GTT, ATA, TTA, or TTG (Lin et al. 2011; Tian et al. 2015; Ogedengbe et al. 2014; Hafeez et al. 2015, 2016). Stop codons are also similar across species, with TAA and TAG occurring most frequently across all 20 Eimeria species (Lin et al. 2011; Tian et al. 2015; Ogedengbe et al. 2014; Hafeez et al. 2015, 2016). Specific examples from the three CDS regions of all 20 Eimeria species include the following: the CYT B initiation codon is ATG, and the termination codon is TAA in all, except in E. falciformis, which stops with TAG. COI starts with ATG, ATT, GTT, or ATA and ends with TAA in all 20 Eimeria species. All COIII CDS regions end with TAA, except those of E. anseris and E. acervulina, which end with TAG.

Table 4 Predicted initiation and termination codons for three mitochondrial protein-coding genes in study Eimeria mitochondrial genomes

Phylogenetic relationships

Coevolution between Eimeria species and their hosts has contributed to phylogenetic relationships within the genus, and the present study partially corroborates this assertion (Miska et al. 2010). In our analyses, all Eimeria species isolated from rabbit (E. irresidua, E. intestinalis, E. magna, E. media, and E. vejdovskyi) form a monophyletic group (Vrba and Pakandl 2014; Liu et al. 2015; Hafeez et al. 2015, 2016). However, our phylogenetic analyses also show that host swapping with subsequent divergence has occurred in many species. Three species from turkeys, E. gallopavonis, E. adenoeides, and E. meleagridis, form one highly supported clade that does not group with the other turkey eimerians, E. meleagrimitis, E. dispersa, and E. innocua. Hence, Eimeria infecting turkeys are a paraphyletic group, a view that differs from previous research (Barta et al. 1997; Lew et al. 2003; Yabsley 2009; Miska et al. 2010; Ogedengbe et al. 2014). The results also show that some of the Eimeria species that infect chicken, E. tenella and E. necatrix, are more closely related to species that infect turkey. Thus, chicken-infecting Eimeria are also a paraphyletic group. Furthermore, E. anseris, which infects A. albifrons, and E. praecox, E. acervulina, E. brunetti, and E. mitis, all isolated from chicken, form one highly supported clade that is sister, with lower support, to E. maxima, also from chicken. This supports the view held by other researchers of a close phylogenetic relationship among these species (Poplstein and Vrba 2011; Ogedengbe et al. 2014).

The present study sequenced and annotated the complete mtDNA of E. anseris and reports its gene and genome organization for the first time. The mt genome of E. anseris infecting A. albifrons is similar to those of all other Eimeria with respect to genome size, organization, start codon positions, and overall base composition. Coevolution between Eimeria species and their hosts has contributed to phylogenetic relationships within the genus, and the present study partially corroborates this assertion using complete mtDNA. Our molecular phylogenetic analyses show some species clustering into clades associated with the parasites’ hosts, but many species do not follow clear host-segregating coevolutionary patterns. The mtDNA sequence provides useful genetic data for addressing further questions in the systematics and population genetics of these and related Eimeria of relevance to waterfowl. The nature of the E. anseris mt genome also makes the sequence well suited for the development of diagnostic assays, potentially providing genetic markers for molecular epidemiology and the study of coccidian phylogenetics.