Introduction

Clonorchis sinensis is endemic in Far East Asia, i.e., in China, Korea, Japan, Taiwan, northern Vietnam, and the Russian Far East. An estimated 35 million people in these endemic areas are infected with this fluke (Lun et al. 2005). People become infected by eating the flesh of cyprinoid fish (the second intermediate host) harboring C. sinensis metacercariae. Metacercariae excyst in the duodenum, migrate up into the intrahepatic bile ducts, and grow into ovigerous adult flukes. In vivo, the flukes provoke hyperplastic and metaplastic changes in the biliary epithelium and, in some cases, promote a neoplastic outcome, namely cholangiocarcinoma (Rim 2005).

The Schistosoma mansoni genome-sequencing project produced a genome assembly and a gene structure annotation comprising 11,787 protein-coding gene structures. These gene structures were annotated by combining evidence from ab initio gene prediction, expressed sequence tag (EST) and protein alignments, and regions conserved in S. japonicum (Haas et al. 2007). ESTs offer a cost-effective way to explore the genome structures of parasites, and the collection and analysis of large parasite EST data sets have led to discoveries in pathogenesis, metabolism, host–parasite interactions, host adaptation, and gene expression and regulation. More than 1,462,000 ESTs from human and animal parasites have been registered in the publicly accessible dbEST database, including about 264,000 ESTs from human-infecting trematodes such as S. mansoni, S. japonicum, C. sinensis, Opisthorchis viverrini, and Paragonimus westermani (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). High-throughput analyses of parasite ESTs have generated a large amount of genetic and molecular biological information, which has broadened and strengthened our understanding of the pathobiological mechanisms of parasites (Hu et al. 2003; Verjovski-Almeida et al. 2003; Merrick et al. 2003; Kim et al. 2006).

Large-scale EST analysis of a sister fluke, O. viverrini from Thailand, which is known to induce cholangiocarcinoma, identified three secretory proteins with known roles in tumorigenesis (Laha et al. 2007). To date, 2,551 ESTs have been collected from C. sinensis and deposited in public databases (Lee et al. 2003; Cho et al. 2006). Identifying novel gene products, new drug targets, and vaccine candidates for controlling C. sinensis infection requires a large body of genetic information on diverse biological topics. The present study was performed to collect ESTs from C. sinensis metacercariae and thereby provide a fundamental genetic basis for the juvenile fluke.

Materials and methods

cDNA library construction

C. sinensis metacercariae were collected by artificial digestion (Hong et al. 2000) of Pseudorasbora parva caught in Shenyang, Liaoning, China. Metacercariae (1.5 g) were homogenized in guanidinium thiocyanate buffer, and total ribonucleic acid (RNA) was extracted by CsCl ultracentrifugation as described previously (Hong et al. 2000). The extracted total RNA (0.49 mg) had an A260/280 ratio of 1.77, and its quality was checked by RNA gel electrophoresis. A complementary deoxyribonucleic acid (cDNA) library was constructed from 0.17 mg of total RNA in the HybriZAP-2.1XR vector according to the manufacturer’s instructions (Stratagene, CA). The resultant cDNA library had a titer of 2.0 × 10⁵ plaque-forming units per 1 μg of bacteriophage DNA. Library quality was checked by plaque polymerase chain reaction using forward and reverse primers of the library vector. The cDNA insertion rate of the library was 86.2%, and the cDNA inserts ranged from 0.75 to 2.5 kb in length. The bacteriophage cDNA library was then converted into phagemid DNA by mass in vivo excision, yielding a plasmid cDNA library.

Sequencing and homology searches

To assess the insertion rate and insert lengths of the library before large-scale sequencing, 96 colonies were randomly selected from the C. sinensis cDNA library and sequenced. Plasmid DNA was extracted and sequenced once with a forward primer of the pAD-GAL4 vector (5′-ATGATGAAGATACCCCACCAAA-3′) and a BigDye reaction mix on an automatic sequencer (ABI Prism 3700, Perkin-Elmer). This preliminary sequencing revealed a cDNA insertion rate of 86.5%, a redundancy of 19.3%, and an average cDNA length of 600 bp. On this basis, another 504 plasmid colonies were randomly selected from the same cDNA library and sequenced as described previously (Cho et al. 2006).
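
The preliminary quality metrics can be illustrated with a short Python sketch. The record layout (PilotRead) and the definition of redundancy used here (one minus the ratio of distinct clusters to insert-bearing reads) are assumptions for illustration only; the exact formulas behind the reported 86.5% and 19.3% are not stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PilotRead:
    """Hypothetical record for one pilot-sequenced clone."""
    clone_id: str
    insert_length: int          # bp of cDNA insert after trimming; 0 means no insert
    cluster_id: Optional[str]   # cluster the read assembled into; None if no insert


def insertion_rate(reads: List[PilotRead]) -> float:
    """Fraction of sequenced clones that carry a cDNA insert."""
    return sum(1 for r in reads if r.insert_length > 0) / len(reads)


def redundancy(reads: List[PilotRead]) -> float:
    """Assumed definition: 1 - (distinct clusters / reads with an insert)."""
    with_insert = [r for r in reads if r.insert_length > 0]
    distinct = {r.cluster_id for r in with_insert}
    return 1 - len(distinct) / len(with_insert)


def mean_insert_length(reads: List[PilotRead]) -> float:
    """Average insert length over clones that carry an insert."""
    lengths = [r.insert_length for r in reads if r.insert_length > 0]
    return sum(lengths) / len(lengths)
```

Computing redundancy over insert-bearing reads only keeps empty clones from inflating the figure; if the authors counted all clones, the denominator would change accordingly.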

Vector and adaptor sequences were trimmed from the cDNA reads, and the reads were assembled into clusters using TIGR Assembler version 2.0 (www.tigr.org). Clusters were subjected to homology searches against GenBank using BLASTX and were annotated as the corresponding C. sinensis cDNAs when their e values were smaller than 10⁻⁵. Homologues of the remaining clusters were searched using BLASTN and annotated by the same criterion. Clusters without annotation were regarded as unknown ESTs.
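
The two-step annotation rule (BLASTX first, BLASTN for the remainder, e value below 10⁻⁵) can be sketched in Python as follows. This assumes the search results were saved in NCBI BLAST tabular format (-outfmt 6 in BLAST+, -m 8 in legacy blastall) with the 12 default columns, so the e value is the 11th field; the function names are hypothetical and this is not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Optional, Tuple

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Keep the lowest-e-value hit per query from BLAST tabular output.

    Assumes the 12 default tabular columns, so the e value is field index 10.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def annotate_clusters(
    cluster_ids: List[str],
    blastx_lines: Iterable[str],
    blastn_lines: Iterable[str],
    cutoff: float = E_VALUE_CUTOFF,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Annotate by BLASTX first, fall back to BLASTN, otherwise mark unknown."""
    blastx = best_hits(blastx_lines)
    blastn = best_hits(blastn_lines)
    annotations: Dict[str, Tuple[str, Optional[str]]] = {}
    for cid in cluster_ids:
        for program, hits in (("BLASTX", blastx), ("BLASTN", blastn)):
            hit = hits.get(cid)
            if hit is not None and hit[1] < cutoff:
                annotations[cid] = (program, hit[0])
                break
        else:
            # No hit below the cutoff in either search.
            annotations[cid] = ("unknown", None)
    return annotations
```

BLAST already lists hits in order of significance, so taking the first hit per query would usually suffice; comparing e values explicitly keeps the sketch independent of input order.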

Comparison with the adult C. sinensis EST pool

To gain insight into developmental gene expression and regulation in C. sinensis, the metacercarial ESTs collected here were compared with a pool of 2,387 ESTs collected previously from adult C. sinensis by our colleagues (Cho et al. 2006). The two EST pools were compared in terms of annotation rates, shared ESTs, and functional groups.

Results

Annotation and classification

From the C. sinensis metacercaria cDNA library, 600 sequences were read and processed into 419 ESTs after vector and adaptor sequences were trimmed and reads shorter than 100 bp were excluded. The ESTs were registered in GenBank under accession numbers EV523942–EV524360. The ESTs averaged 660 bp in length and were assembled into 322 clusters. Of these clusters, 11.2% were shorter than 200 bp and 32.9% were longer than 1,000 bp; the longest was 1,495 bp. By size, 252 clusters (78.3%) were singletons, 48 (14.9%) consisted of two ESTs, 13 (4.0%) of three ESTs, 4 (1.2%) of four ESTs, and 2 (0.6%) of five ESTs (Fig. 1). The remaining three clusters contained 6, 9, and 12 ESTs, respectively. The largest cluster, with 12 ESTs, was homologous to a hypothetical protein of Neurospora crassa.
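
The read filter and the cluster-size distribution summarized in Fig. 1 reduce to simple counting. The Python sketch below, with hypothetical function names, drops reads shorter than the 100 bp cutoff and tabulates the percentage of clusters of each size; fed the cluster sizes reported above, it reproduces the stated percentages.

```python
from collections import Counter
from typing import Dict, List

MIN_READ_LENGTH = 100  # reads shorter than this (bp) are excluded, per the text


def filter_short_reads(reads: List[str], min_length: int = MIN_READ_LENGTH) -> List[str]:
    """Keep trimmed reads that are at least min_length bases long."""
    return [seq for seq in reads if len(seq) >= min_length]


def size_distribution(cluster_sizes: List[int]) -> Dict[int, float]:
    """Percentage of clusters that contain 1, 2, 3, ... ESTs."""
    total = len(cluster_sizes)
    counts = Counter(cluster_sizes)
    return {size: round(100 * n / total, 1) for size, n in sorted(counts.items())}


# Cluster sizes as reported in the Results (322 clusters in total).
reported_sizes = [1] * 252 + [2] * 48 + [3] * 13 + [4] * 4 + [5] * 2 + [6, 9, 12]
print(size_distribution(reported_sizes))
```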

Fig. 1 Frequency distribution of ESTs in each contig

Of all clusters, 186 (57.8%) had homologues in the databases and were annotated accordingly. Some of these were classified into ten functional groups as reported elsewhere (Cho et al. 2006; Table 1). The clusters constituting each functional group were as follows. The largest group, structural and cytoskeletal proteins, comprised 21 clusters, e.g., myosin, myosin heavy and light chains, microtubule-associated protein tau, gap junction protein (pannexin), and T complex protein 1 α-subunit. The second group, transcription and translation proteins, contained ten clusters, e.g., a transcriptional regulator, DnaJ (HSP40 homologue), reverse transcriptase, 60S ribosomal protein L34, and 40S ribosomal protein S27. The third group, kinases and phosphatases, consisted of eight clusters, e.g., protein kinase C type beta, protein kinase G, protein phosphatase A-2, and pyruvate kinase. The fourth group, energy metabolism, had eight clusters, e.g., glycogen phosphorylase, fumarate hydratase, glyceraldehyde-3-phosphate dehydrogenase type 2, adenosine diphosphate/adenosine triphosphate (ATP) carrier, ATP synthase lipid-binding protein-like protein, and malate dehydrogenase. The fifth group, other metabolic proteins and enzymes, contained eight clusters, e.g., disulfide isomerase-related protein, β-N-acetylhexosaminidase, bile acid β-glucosidase, amidase, and histone deacetylase 3. The sixth group, DNA scaffold and DNA-binding proteins, contained seven clusters, e.g., heterochromatin protein 1 beta, poly(A)-binding protein, single-stranded DNA-binding protein, and TAR DNA-binding protein. The seventh group, regulatory and signal proteins, comprised seven clusters, e.g., proliferation-associated protein 1, erbB3-binding protein EBP1, receptor-mediated endocytosis protein RME-1, and retinoblastoma-binding protein. The eighth group, proteases and inhibitors, had six clusters, e.g., cathepsin B, carboxypeptidase H precursor, and serpin-like protease inhibitor. The ninth group contained transporters and channels, e.g., high voltage-activated calcium channel α subunit (Cavα), Na/Ca exchanger, transient receptor potential cation channel, and FMRFamide-gated and pH-modulated sodium channel. Finally, the tenth group contained a single cluster, a T cell-recognized antigen.

Table 1 Functional classification of the annotated clusters of C. sinensis metacercaria ESTs

Genes expressed in metacercariae and adult stages

The 186 annotated EST clusters were compared with the 848 annotated clusters of adult C. sinensis analyzed previously (Cho et al. 2006). Ten clusters were expressed in both metacercariae and adults: DnaJ (Hsp40 homologue), endoplasmin, peptide chain release factor 3, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), leukotriene A-4 hydrolase, HMG1-like protein, and four unidentified clusters (Table 2).
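
Finding clusters shared by the two stages is essentially a set comparison. The Python sketch below matches clusters by their normalized annotation names, which is a simplifying assumption: unidentified clusters, such as the four listed in Table 2, cannot be matched by name and would require a sequence-level comparison (for example, BLASTN of one cluster set against the other). The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different labels match."""
    return " ".join(name.lower().split())


def compare_stages(
    metacercaria: Iterable[str], adult: Iterable[str]
) -> Dict[str, List[str]]:
    """Split annotation names into shared and stage-specific sets."""
    meta = {normalize(n) for n in metacercaria}
    adu = {normalize(n) for n in adult}
    return {
        "shared": sorted(meta & adu),
        "metacercaria_only": sorted(meta - adu),
        "adult_only": sorted(adu - meta),
    }
```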

Table 2 List of clusters shared by C. sinensis metacercariae and adults

In metacercariae, the most abundantly expressed genes encoded structural and cytoskeletal proteins, followed by transcription and translation machinery proteins and energy metabolism proteins. In contrast, the most abundant mRNA clusters of adult C. sinensis encoded, in descending order, regulatory and signal proteins, other metabolic proteins and enzymes, and structural and cytoskeletal proteins (Fig. 2). Genes for structural and cytoskeletal proteins, DNA scaffolds, energy metabolism enzymes, and kinases and phosphatases were expressed more frequently in metacercariae than in adults, whereas genes encoding regulatory and signal proteins, other metabolic proteins and enzymes, proteases and inhibitors, and antigenic proteins were expressed more abundantly in adults (Fig. 2).
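
The stage comparison in Fig. 2 rests on percentage frequencies of ESTs per functional group. A minimal Python sketch follows; it assumes each stage's percentages are computed over that stage's classified ESTs only, which the text does not specify, and the counts in the usage example are invented for illustration, not taken from this study.

```python
from typing import Dict, List, Tuple


def group_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert EST counts per functional group into percentage frequencies."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


def compare_groups(
    meta_counts: Dict[str, int], adult_counts: Dict[str, int]
) -> List[Tuple[str, float, float, str]]:
    """Return (group, % in metacercariae, % in adults, stage with higher %)."""
    meta = group_percentages(meta_counts)
    adult = group_percentages(adult_counts)
    rows = []
    for group in sorted(set(meta) | set(adult)):
        m, a = meta.get(group, 0.0), adult.get(group, 0.0)
        higher = "metacercaria" if m > a else ("adult" if a > m else "equal")
        rows.append((group, round(m, 1), round(a, 1), higher))
    return rows


# Hypothetical counts for illustration only.
example = compare_groups(
    {"structural/cytoskeletal": 6, "energy metabolism": 2, "proteases/inhibitors": 2},
    {"structural/cytoskeletal": 2, "regulatory/signal": 2, "proteases/inhibitors": 6},
)
for row in example:
    print(row)
```

Comparing percentages rather than raw counts matters here because the two pools differ greatly in size (419 versus 2,387 ESTs).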

Fig. 2 Comparison of C. sinensis metacercariae and adults to identify developmental gene expression using annotated and classified clusters. The scale represents the percentage frequencies of ESTs.

Discussion

In this study, 419 ESTs were collected from a C. sinensis metacercaria cDNA library and assembled into 322 clusters, of which 78.3% were singletons and 21.7% were contigs. Homology searches annotated more than half (57.8%) of the clusters as putative proteins with high probability, whereas the others remained unidentified because their e values exceeded the cutoff or no match was found. Moreover, clusters assembled from more than four ESTs were homologous to hypothetical or unnamed proteins or did not match any record in the public databases searched. The genes highly expressed in metacercariae may therefore be stage specific. Metacercariae parasitize cold-blooded intermediate hosts, which provide an environment completely different from that encountered by the adult stage. Metacercariae in the muscles of freshwater fish are in a resting stage in which they simply maintain basal metabolism. In contrast, adult C. sinensis have a high metabolic rate and produce large numbers of eggs in mammalian hosts (Rim 2005). Thus, the physiological features of metacercariae are likely to be unique. Research on helminthic parasites has focused largely on the adult stage, which causes clinically significant disease. Many proteins, cDNAs, and ESTs have been identified in C. sinensis adults and deposited in public databases, but scarcely any have been identified in metacercariae. The physiological uniqueness of metacercariae and the limited biological information about them may support the notion that ESTs encoding hypothetical proteins are stage specific. These EST pools provide a solid dataset for furthering the understanding of stage-specific physiology, metabolism, and gene expression in C. sinensis metacercariae and adults.

In the present study, ESTs encoding proteins homologous to EBP1, histone deacetylase, and retinoblastoma-binding protein were found in C. sinensis metacercariae. EBP1 is a transcription factor or transcriptional coregulator belonging to the proliferation-regulated protein family (Zhang et al. 2003); it can bind retinoblastoma protein and histone deacetylase 2 and inhibit transcription from cell cycle-regulating promoters, thereby reducing cell proliferation and inducing cell differentiation (Squatrito et al. 2004; Zhang and Hamburger 2004; Zhang et al. 2005). In C. sinensis, EBP1 may regulate cell differentiation during development from the metacercaria to the adult stage.

To survive in bile, C. sinensis probably needs to neutralize and eliminate potentially toxic bile acids. In humans, bile acids are conjugated by glycosylation, sulfation, and amidation and excreted in urine or bile (Momose et al. 1997). A cluster homologous to bile acid β-glucosidase (Matern et al. 2001) was found in C. sinensis metacercariae; this enzyme catalyzes the hydrolysis of bile acid 3-O-glucosides (Matern et al. 1997) and could facilitate the elimination of the glycosylated conjugates of bile acids.

The FMRFamide-gated sodium channel is a ligand-gated channel that is activated by the neuropeptide Phe-Met-Arg-Phe-NH₂ and can be blocked by amiloride or FMRFamide analogues in a pH-dependent manner (Green and Cottrell 1999; Jeziorski et al. 2000). This channel has been found in neurons and elsewhere in the nervous system (Marks et al. 1995; Davey et al. 2001) and has been reported to be responsible for muscle contraction and parasite movement (Nelson et al. 1998; Perry et al. 2001).

In the present study, a high voltage-activated Ca²⁺ channel (VGCC) α subunit was found to be expressed in C. sinensis metacercariae. VGCCs couple changes in membrane potential to the influx of Ca²⁺ that elicits intracellular signals, such as excitation–contraction coupling, excitation–secretion coupling, and other Ca²⁺-dependent processes. They are found in muscle, nerve, and other excitable cells. VGCCs are multisubunit protein complexes composed of a pore-forming α1 subunit and the modulatory subunits β, α2δ, and γ. Phylogenetic analysis divides the α1 subunits into three clusters, known as Cav1, Cav2, and Cav3 (Jeziorski and Greenberg 2006). The α1 subunit contains four homologous domains, each with six transmembrane segments, which together form the membrane pore (Doyle et al. 1998), whereas the β subunit (Cavβ) increases current density and ligand binding to the α1 subunit and modulates various kinetic properties of the channel. Moreover, β subunits contain a conserved 30-residue β interaction domain (BID), which spans the SH3 and guanylate kinase (GK)-like domains; the modified GK domain binds to the α interaction domain of the α1 subunit (Chen et al. 2004). Two variant Cavβ subtypes (Cavβvar), cloned from S. mansoni and S. japonicum, lack two highly conserved serine residues in the BID (Kohn et al. 2001). Each of these serines forms a consensus protein kinase C phosphorylation site, and schistosome Cavβvar subunits have been shown to confer praziquantel sensitivity on α1 subunits (Kohn et al. 2001), resulting in rapid Ca²⁺ influx and sustained Ca²⁺-dependent muscle contraction (Andrews 1985). C. sinensis juveniles and adults are highly susceptible to praziquantel (Rim et al. 1980), which suggests that a Cavβvar homologue could be expressed in C. sinensis and confer praziquantel sensitivity on its α1 subunit.