Introduction

The A subunit of Eukaryotic Elongation Factor 1 is one of the most abundant cytoplasmic proteins (Slobin 1980). In addition to its key function in the elongation step of protein biosynthesis, several additional functions have been attributed to the elongation factor (Yang et al. 1990). Because of its abundance, cDNAs transcribed from eEF1A genes have been isolated from nearly every organism for which an expressed sequence tag (EST) project has been established. Nucleotide and amino acid sequences have been used for phylogenetic studies (Hashimoto et al. 1995), although attempts to use these sequences to resolve deep evolutionary relationships between extant taxa have not been particularly successful (Hashimoto et al. 1995). Nonetheless, eEF1A sequences provide relatively consistent results in resolving more recent evolutionary branches (Cho et al. 1995). The high expression level of the eEF1A gene has prompted several groups to use its promoter region as a driver in chimeric expression vectors (Linney et al. 1999). These constructs have only been applied to mammalian cell cultures, perhaps because very few eEF1A genes have been functionally characterized in eukaryotes other than the few standard models. Although the Caenorhabditis elegans genes for eEF1A were completely sequenced and partially characterized (Bieri et al. 2007), the function of their promoter regions has not been described.

Because of the crucial functions of eEF1A, the genes that code for this protein are usually present at more than one copy per haploid genome (Liu et al. 1996). Indeed, the parasitic protozoan Trypanosoma brucei contains 8 to 10 eEF1A genes per haploid genome (Ridgley et al. 1996).

In the metazoans, there are two or three copies of the eEF1A gene, each of them differentially expressed during development or expressed in distinct tissues (Shepherd et al. 1989; Deschamps et al. 1991; Abdallah et al. 1991; Lee et al. 1992, 1993). A number of diverse biological processes are known to involve eEF1A protein, including actin bundling (Yang et al. 1990) and aging (Shepherd et al. 1989; Negrutskii and El’skaya 1998). Non-expressed pseudogenes resembling eEF1A have been described in the human genome (Madsen et al. 1990) but not in the genomes of other metazoans.

Although partial cDNAs exist for several parasitic nematodes (Wylie et al. 2004), descriptions of eEF1A genes are rare in Nematoda and are restricted to the genus Caenorhabditis. In order to better understand its phylogenetic position among the rhabditid nematodes we characterized the eEF1A genes of O. tipulae. O. tipulae (strain CEW1) has been used as a comparative genetic model in nematodes (Félix 2006) and was the second nematode (after C. elegans) in which operons were shown to be present (Evans et al. 1997). Although morphologically similar to C. elegans, the vulval precursors of O. tipulae arise by double induction of the anchor cell (Dichtel-Danjoy and Félix 2004). In addition, one of the vitellogenin genes of O. tipulae uses different codon combinations from those of C. elegans (see codon usage table, Winter et al. 1996). Ahn and Winter (2005, 2006) showed that although their genomes are the same size, the GC content of O. tipulae genomic DNA is significantly higher than that of C. elegans. Recent results from Baïlle et al. (2008) showed that O. tipulae populations harbor fivefold higher genetic diversity than Caenorhabditis spp. populations.

These results indicate that some introns of O. tipulae eEF1A genes are absolutely conserved in sequence and position, a feature not yet detected in any other metazoan eEF1A gene. Compared to other nematode species, the O. tipulae genome contains twice the number of genes coding for eEF1A. A GenBank search revealed eEF1A ESTs from other species, and these were assembled into complete coding sequences (when possible) and used for phylogenetic analysis of nematodes.

Materials and Methods

Worm Cultures and DNA/RNA Purification

Caenorhabditis elegans, strain N2, and Oscheius tipulae, strain CEW1, were maintained in NGM plates (22-mm diameter) with a lawn of E. coli (Strain NA22), as previously described (Stiernagle 1999). Larger quantities of worms were obtained from 90-mm NGM plates or 1500-ml liquid cultures according to Sulston and Hodgkin (1988). The isolated worms were thoroughly washed in S buffer, floated once in 30% sucrose, kept in S buffer at room temperature for 2 h for intestine cleansing, and washed again before use.

DNA extractions were performed by incubating the cleansed worms in 1 vol of lysis buffer (100 mM Tris–HCl, 50 mM EDTA, 100 μg/ml proteinase K, and 1.0% SDS) at 65°C for 30 min, followed by extractions with 1:1 Phenol:chloroform and chloroform. The genomic DNA was precipitated with ammonium acetate and ethanol (Winter et al. 1996).

Bacteriophage Clones and Libraries

The genomic clone λ-EMB-I3-1 was isolated from an O. tipulae genomic library after screening with a cDNA probe (Winter et al. 1996). Screening of a λ-ZAP II (Promega) cDNA library (obtained from Dr. Marie-Anne Félix, Université de Paris IV) was performed in six 140-mm NZCYM plates, using two Hybond-N replicas for each. Alkali-treatment, washing, hybridization, and phage clone purification were performed as described by Sambrook et al. (1989). Phagemid excision was performed with the Ex-Assist helper phage (Stratagene) according to the manufacturer’s instructions.

DNA and RNA manipulations were performed as described by Sambrook et al. (1989) except when noted in the text. Restriction fragments were subcloned into pTZ-19U (Fermentas) or Litmus 29 (New England BioLabs). The PCR fragments were cloned into pGEM-T easy vector (Promega) or Topo-TA (Invitrogen) according to the manufacturer’s instructions. Chemically competent HB101, DH-5α, and XL1-Blue E. coli strains were used for transformation with these plasmids.

Probes for screening and hybridization were radioactively labeled by random priming (Feinberg and Vogelstein 1983) using the RediPrime kit (Amersham) according to the manufacturer’s instructions. Hybridization took place overnight at 37°C in 50% formamide, 5× SSC, 1.0× Denhardt, 50 mM sodium phosphate buffer, pH 6.5, 200 μg/ml ssDNA, and 0.1% SDS after a 2-h prehybridization at 42°C in 50% formamide, 5× SSC, 2.5× Denhardt, 50 mM sodium phosphate buffer, pH 6.5, 25 μg/ml ssDNA, and 0.1% SDS. After the hybridization incubation, blots were washed three times for 5 min in 2× SSC, 0.1% SDS at 22°C, followed by one 30-min wash in 0.1× SSC, 0.1% SDS at 55°C and 5 min in 0.1× SSC at 22°C prior to film exposure.

PCR amplifications and DNA sequencing reactions (Thermo Sequenase, Amersham; Big Dye automatic sequencing kit, Perkin Elmer) were performed according to the manufacturer’s instructions. The radioactive sequencing reactions were resolved in 0.4-mm-thick polyacrylamide denaturing gels (8% polyacrylamide, 8 M urea) with glycerol and glycerol-tolerant buffer gradients whose values varied, respectively, from 0% glycerol and 0.5× buffer at the top to 10% glycerol and 2.5× buffer at the bottom. Automatic sequencing was performed in an ABI 310 platform (Perkin-Elmer) using the enzymatic method (Sanger et al. 1977) with fluorescence-labeled ddNTPs.

Sequence Alignment and Phylogenetic Analysis

Sequence alignments were performed with ClustalX or with the MegAlign module from the Lasergene software suite; contig assembly was performed with SeqMan (DNAstar Inc.). Phylogenetic trees were constructed using Mr Bayes version 3.1.2 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003) and PAUP version 4.0b10 (Sinauer Associates, Sunderland, MA). Individual ESTs, obtained from GenBank, were assembled into complete eEF1A sequences for each nematode species. We assessed species cited on the Parasitic Nematode Sequencing Project (http://www.nematode.net/) at the Genome Sequencing Center, Washington University School of Medicine. Several GenBank sequences of eEF1A cDNA could not be fully resolved. All robust EST sequences were aligned and grouped according to coincident single-nucleotide polymorphisms (SNPs) and/or variable untranslated regions (UTRs).

From the GenBank EST database, we constructed complete eEF1A cDNA sequences for Ancylostoma caninum (BK006068 and BK006069), Ancylostoma ceylanicum (BK006070 and BK006060), Ascaris suum (BK006061), Globodera pallida (BK006062), Heterodera glycines (BK006063), Pristionchus pacificus (BK006064), Strongyloides stercoralis (BK006065 and BK006066), and Xifinema index (BK006059 and BK006065) and deposited them at the NCBI Third Party Annotation (TPA) database. The complete cDNA sequence of Onchocerca volvulus eEF1A (M64333) was extracted from the GenBank database. The alignment of all nematode eEF1A cDNA coding sequences used for phylogenetic analysis can be found under accession number ALIGN_001131 at the EMBL-Align databank (ftp://ftp.ebi.ac.uk/pub/databases/embl/align/).

The sequence alignments of O. tipulae eEF1A genes described here are databased at GenBank (accession numbers AY928342 [Oti-eft-1], AY928338 [Oti -eft-2], AY928340 [Oti -eft-3] and AY928337 [Oti -eft-4]), as are the sequences of their cDNAs (AY928339, AY928341).

Results

Oscheius tipulae Has Four eEF1A Genes

The first eEF1A gene was identified in a genomic clone (λ-EMB-I3-1; see Materials and Methods) by hybridization with a total cDNA probe (see Materials and Methods). A 1380-bp-long region of eEF1A coding sequence was isolated, along with the regions ~2.0 kbp upstream and ~1.0 kbp downstream of the coding sequence. This entire region was referred to as Oti-eft-1,Footnote 1 and it is the only one of the four regions described here which was characterized directly from a cloned genomic DNA fragment. The coding sequence was completely sequenced, together with 731 bp upstream and 187 bp downstream.

Using a large portion (~70%) of the Oti-eft-1 coding region as a probe, an exhaustive screening identified 10 eEF1A clones from a λ-ZAP II O. tipulae cDNA library. These clones were purified, excised, and sequenced, and revealed similar coding sequences (though the 5′ and 3′ UTRs were predictably divergent). These coding sequences were grouped into four classes, with one class composed entirely of Oti-eft-1 transcripts. The genomic sequences corresponding to the cDNAs were isolated by PCR with primer pairs designed to anneal at the 5′ and 3′ UTR sequences (see Supplementary Table S1). These genomic sequences were named Oti-eft-2, Oti-eft-3, and Oti-eft-4, respectively (Fig. 1).

Fig. 1
figure 1

Alignment of the 5′ ends of the eEF1A cDNAs of O. tipulae (a) and 3′ ends of the eEF1A cDNAs of O. tipulae, C. elegans, and C. briggsae (b). The amino acid sequences shown are named OTI-EFT-n. The filled arrowhead shows the position of the 5′ UTR intron of Oti-eft–1. Conserved nucleotides are shown on a gray background; conserved amino acids are shown inside open boxes. Start and stop codons are in boldface. Dots represent indels

O. tipulae eEF1A Gene Introns

All the isolated genes contain introns of various sizes. Oti-eft-1 contains four introns within the coding region, along with a large intron (516 bp) in the 5′ UTR, seven bases upstream from the start codon (cf. Fig. 1). Unfortunately, this feature could not be confirmed in the other genes because, unlike Oti-eft-1, they were not isolated from a genomic library. Introns located within the 5′ UTR region are not an uncommon feature of eEF1A genes, having been described in Drosophila, Xenopus, Danio rerio, and human eEF1A genes, where 5′ introns appear to play an enhancer role in gene expression (Kim et al. 1990). These 5′ UTR introns have not been identified in the Caenorhabditis spp. eEF1A genes annotated to date (Bieri et al. 2007).

Examination of the coding regions of the four O. tipulae eEF1A genes allowed for identification of conserved and unique introns (Fig. 2). The position-conserved introns were designated by letters (A to E) and ranged from 100% sequence identity to completely dissimilar and unalignable sequences (Fig. 3, Table 1).

Fig. 2
figure 2

Diagram of the O. tipulae eEF1A genes. Genes are represented by boxes, with start codons aligned. Names are shown at the left. Gray boxes correspond to introns, and transparent boxes (E1, E2, etc.) correspond to exons. Dashed gray lines join position-conserved introns (letters inside the light-gray boxes). Numbers below the genes indicate the base pair position relative to the start codon. Small horizontal arrows above the boxes indicate the position and orientation of the primers used for isolation and sequencing of the genes. Start and stop codons are indicated by arrows

Fig. 3
figure 3

Comparison of the intron sequences from Oti-eft genes. The intron nomenclature is consistent with Fig. 2. Intron sequences are in lowercase letters. Conserved regions are shown on a gray background. Numbers above the figures show the position of the first and last bases relative to the cDNA start codon. Protosplice sites are shown in boldface capital letters, and underlined lowercase letters designate the 3′-end consensus for the C. elegans introns

Table 1 Similarity of conserved introns in the eEF1A genes of O. tipulae: percentage nucleotide identity

Oti-eft-1 has the most uniformly sized introns, ranging from 41 to 43 nucleotides (nt; all phase 0). The least uniform is Oti-eft-3, which has two short, 39-bp introns, three medium introns (ranging from 42 to 72 bp), and a 328-bp intron, the longest intron within any of the coding regions.

Oti-eft-2 and Oti-eft-4 share the greatest number of conserved introns (A, B, C, and D), some of which have highly conserved sequences (Fig. 3). Intron C, with four substitutions and one indel in position 194, is found only in these two genes, and is almost four times the size of an average C. elegans intron (Blumenthal and Steward 1997). Considering that large introns in C. elegans tend to be novel (Coghlan and Wolfe 2004), and given the remarkable sequence similarity found between intron C in Oti-eft-2 and intron C in Oti-eft-4 (Table 1), a recent origin for these introns is plausible, and the homology possibly arose due to an insertion followed by a recombination event.

The Oti-eft-3 gene contains the largest number of gene-specific introns (four). The first intron of Oti-eft-3 is of particular interest, as it is bounded by GC-AG ends instead of typical GU-AG ends. The former are found in 0.7% of human pre-mRNA introns and 0.6% of the cDNA-confirmed introns of C. elegans, where they play an important role in the control of the alternative splicing of some genes (Farrer et al. 2002).

The coding sequence of Oti-eft-3 is the most divergent (see phylogenetic analysis below) and includes a three-base insertion close to the translation start codon (Fig. 1a). These three new bases present in the gene insert a glycine residue in a highly conserved region possibly involved in the GTP-binding site (see Fig. 5). This raises the question whether this insertion is a footprint due to the loss of an intron, as occurred with the jingwei gene in Drosophila teissieri (Llopart et al. 2002).

The intron phase distribution is remarkably biased toward phase 0, and indeed, 16 of 20 introns (80%) show phase 0 characteristics. All the position-conserved introns in O. tipulae eEF1A genes are phase 0. Some authors contend that this is an indication of a single ancient origin (Zhaxybayeva and Gogarten 2003), although other authors (Qiu et al. 2004; Coghlan and Wolfe 2004) disagree. One of the phase 0, gene-copy-specific introns, as well as three of the position-conserved introns (which are also phase 0), is inserted within a consensus sequence thought to be a protosplice site, MAG↓R (Dibb and Newman 1989; Fedorov et al. 2003), where introns would preferably insert (Fig. 3). According to Coghlan and Wolfe (2004), these features could be interpreted as evidence of a recent origin for these introns.

Nevertheless, O. tipulae eEF1A genes share introns between genes, but not with other nematodes (Fig. 4). Coghlan and Wolfe (2004) used this characteristic to identify novel introns in C. briggsae and/or C. elegans. In our study, we were unable to distinguish recent insertions from ancient insertions (older than 100 MY) because of the rapid sequence divergence of nematode genomes (Kiontke et al. 2004; Mitreva et al. 2005).

Fig. 4
figure 4

Comparison of intron positioning in O. tipulae with that found in other metazoans. The boxes represent mRNAs, and the bold vertical lines within them represent the intron positions. Vertical numbers designate the distance to the start codon; numbers in brackets, the intron phase. The dashed gray line links the position-conserved introns. ATG and TAA are the start and stop codons, respectively. D.r., X.l. O, H.s. 1, and H.s.2 are the positions of introns located in the 5′ UTR of the eEF1A genes of the vertebrates Danio rerio, Xenopus laevis (oocyte copy), Homo sapiens copy 1, and H. sapiens copy 2, respectively. The capital letters in the gray box between Oti-eft-2 and Oti-eft-4 designate the introns as in Fig. 2

With regard to the gene-specific introns, the intron phase is distributed evenly among the three phases, with three phase 0 introns and two of each other phase.

The 5′ and 3′ ends of the O. tipulae eEF1A introns fit into previously established consensus for C. elegans (Blumenthal and Steward 1997) (Table 2), suggesting a similar splicing apparatus between these two rhabditid nematodes (Winter et al. 1996; Evans et al. 1997; see also Guiliano and Blaxter 2006). O. tipulae eEF1A genes do not contain any introns homologous to C. elegans or C. briggsae homologues. Intron A has a somewhat similar position, three bases upstream of its putative corresponding Cel-eft-3 intron. This is particularly interesting since this C. elegans intron is position conserved between protostomes (nematodes and insects) and deuterostomes (vertebrates) (Fig. 4). A similar conservation of position is observed in O. tipulae Intron D, which is homologous to introns in arthropods and vertebrates, but not in C. elegans or C. briggsae.

Table 2 Intron boundaries of O. tipulae and C. elegans: percentage nucleotide identity

O. tipulae eEF1A Coding Sequences and Their GC Content

Compared with their C. elegans paralogous counterparts, the genes described here contain a 12-nt-long deletion (Fig. 1B), which corresponds to an indel of four amino acids in the C-terminal portion of the translated protein. Aside from this difference, the coding sequences are largely similar, ranging from 80.2% (Oti-eft-3:Cel-eft-4) to 85.1% (Oti-eft-4:Cel-eft-4) sequence identity and from 90.3% (Oti-eft-3) to 91.4% (Oti-eft-4) identity for amino acid sequences (see Supplementary Table S5). This contrasts with the two C. elegans eEF1A genes, in which the translated amino acid sequences are identical.

The G/C content of the O. tipulae eEF1A genes is slightly higher than that of the C. elegans genes. Although the codon usage table of the O. tipulae coding sequences (Supplementary Table S2) resembles that of C. elegans, the former shows some preference for the codons with C or G in the third position. Comparing the eEF1A genes reveals that the average G/C value of the codons is similar: ~55% in O. tipulae and ~53% in C. elegans. However, considering only the third codon positions, O. tipulae shows an average value of 70%, compared to only 62% in C. elegans. A similar preference is observed in proline codon preference (100% CCA in C. elegans, compared to 49.5% CCC in O. tipulae), histidine (59.1% CAC in C. elegans and 80% in O. tipulae), and glutamine (53.8% CAA in C. elegans and 85.7% CAG in O. tipulae). Similar results have been described for the vit-6 gene in O. tipulae (Winter et al. 1996). Furthermore, the G/C content of eEF1A introns in O. tipulae genes is higher, varying from 35% to 50%, compared to an average value of 30% for eEF1A introns in C. elegans (Blumenthal and Steward 1997). These facts reflect the higher overall G/C content in the O. tipulae genome (42%) (Ahn and Winter 2005) compared to the C. elegans genome (36%) (Waterston et al. 1997).

Evolution and Comparative Analysis of Nematode eEF1A cDNA Sequences

In order to evaluate the reliability of eEF1A sequences as a phylogenetic molecular marker for nematodes, we made contigs of complete eEF1A-coding mRNA sequences using EST sequences from various nematode species available in GenBank. Although eEF1A mRNAs tend to be abundant in cells, some nematode EST projects are probably based on normalized cDNA libraries, making complete eEF1A mRNA coding sequences relatively rare in the databanks. The contigs corresponding to complete eEF1A coding sequences were aligned and compared. In some cases, coincident SNPs allowed detection of sequences that had probably been transcribed from different gene copies. Unfortunately, these transcripts often did not contain complete EST coding-region sequences. When entire ESTs were present, the assembled sequences were treated independently, although there is no experimental evidence that they are actually transcribed from different genomic regions.

The inferred amino acid sequences proved to be quite similar, particularly in the N-terminal region, where they were almost identical to the 83rd residue of the C. elegans eEF1A sequence. The other sequences showed a number of differences, some of which were species specific. None of the GTP- or aminoacyl-tRNA binding sites depicted in Fig. 5 seem to be affected by these differences. As in C. briggsae, C. elegans, and O. tipulae, most of the indels are concentrated in the carboxyl end of the protein (cf. Fig. 1).

Fig. 5
figure 5

Amino acid sequence alignment of eEF1A from O. tipulae and from other nematodes. An alignment made with ClustalW was transformed using the Point replacer v. 2.1 program (http://www.geocities.com/alvesjmp/software.html). The points in the alignment correspond to positions where sequence is identical to C. elegans eEF1A. Putative functional domains are highlighted in gray boxes (A, C, E, and G for GTP binding and hydrolysis and J for tRNA binding). Thick lines inside the boxes correspond to motifs found in all the G-protein translation factors (Dever et al. 1987; Carr-Schmid et al. 1999). Horizontal arrows above the alignment represent β-sheets; T’s, turns; and coils, α-helices and η-helices, based on X-ray diffraction data obtained for the eEF1A of S. cerevisiae (Andersen et al. 2000; PDB accession no. 1F60). This figure was originally constructed using ESPript (Gouet et al. 1999) and then edited in Canvas SE (version 7.0.1; build 337). CEL-EFT-N corresponds to the amino acid sequence of Cel-eft-3 and Cel-eft-4, which code for eEF1A with identical amino acid sequences. The same occurs with CBR-EFT-N. OTI sequences correspond to O. tipulae proteins; OVO, to Onchocerca volvolus protein. The numbering above the alignment corresponds to the C. elegans eEF1A amino acid sequence

Most of the differences between nucleotide sequences are in the third position of the codons. The G/C content of the third position varied remarkably, from 26.9% in S. stercoralis to 75.9% in P. pacificus (Fig. 3). The other codon positions showed more uniform values (data not shown). A recent extensive analysis of 25 million codons from ESTs of 32 different nematode species suggests that the frequency of G/C in the third codon position varies with the G/C of the entire genome (Mitreva et al. 2006). The G/C content observed in the third codon position for all the analyzed cDNAs was always higher than the G/C content for the other genes analyzed in these species (Cutter et al. 2006).

Fig. 6
figure 6

Comparison of G/C content in the third position of the codons in eEF1A from different nematodes. Compared sequences were obtained from the following files in GenBank: Sst-ef-1, BK006065; Ovo-eft-1, M74333; Sst-eft-2, BK006066; Xin-eft-1, BK006067; Xin-eft-2, BK6059; Ace-eft-2, BK006060; Hgl-eft-1, BK006063; Aca-eft-1, BK006068; Ace-eft-1, BK006070; Aca-eft-2, BK006069; Cel-eft-3, U40935, F31E3.5; Cel-eft-4, U51994, R03G5.1a; Asu-eft-1, BK006061; Cbr-eft-4, CAAC01000078, CBG16828; Cbr-eft-3, CAAC01000333, CBG24566; Gpa-eft-1, BK006062; Oti-eft-4, AY928337; Oti-eft-2, AY928338; Oti-eft-1, AY928342; Oti-eft-3, AY928340; Ppa-eft-1, BK006064. The G/C content was calculated using CodonW (Peden 1999)

Bayesian, maximum-likelihood, and maximum-parsimony trees constructed with the coding DNA from nematode eEF1A genes showed very similar topologies supported by either bootstrapping values or Bayesian posterior probability (Fig. 7). Though some details differ, our OTU groupings superficially resemble those suggested by rooted trees constructed by other authors (Blaxter 2003; De Ley 2006). All the genes used to construct our trees originated from nematode species of the order Rhabditida, with the exception of Xiphinema index, which belongs to the order Dorylaimida. Within rhabditids, sequences obtained from species of suborder Rhabditina cluster together, though the resolution between orders is not reflected in branch topology. Pristionchus pacificus eEF1A cDNA (Ppa-eft-1) clusters within Rhabditomorpha sequences, though this may be caused by the low sampling of Diplogasteromorpha species.

Fig. 7
figure 7

Phylogram (a) and trees (b, c) for the eEF1A DNA coding regions (cDNAs) from genes of several nematodes. The gray areas group the OTUs (operational taxonomic units) into the different nematode suborders (Rhabditina, Tylenchina, and Spirurina) or order (Dorylaimida). With the exception of the Oscheius tipulae sequences (Oti-eft-x), all sequences were obtained from GenBank either in the form of complete coding sequences or by constructing the putative complete cDNAs from available ESTs (see Materials and Methods). Cbr-eft-x and Cel-eft-x correspond to the Caenorhabditis briggsae and Caenorhabditis elegans eEF1A cDNA sequences, respectively. (a) Phylogram obtained by Bayesian inference using Mr Bayes (version 3.1.2) (bottom-left bar represents expected changes per site) with the following parameters for the Markov Chain Monte Carlo simulations: number of generations = 106; screen output every 100th generation; tree sampling every 100th generation; and number of Monte Carlo chains = 4. The clade credibility values (as percentage of trees) calculated by Mr Bayes are shown above the branches. The substitution model used (GTR+I+G = general time-reversible with invariable sites and gamma distribution) was calculated by MrModeltest (modified from Modeltest 3.0 by Johan Nylander). (b) Unrooted tree obtained by maximum parsimony; the scale bar corresponds to the number of substitutions. (c) Unrooted tree obtained by maximum likelihood; the scale bar corresponds to the number of substitutions per site. Trees in (b) and (c) were obtained with PAUP (version 4.0b10) and used the same settings for the best-fit substitution model (TrN+I+G) selected by hLRT in Modeltest 3.7. The bootstrap values shown above the branches of both trees were obtained with a heuristic search of over 100 replicates. The alignment (accession no. ALIGN_001131 in the EMBL-Align Databank; ftp://ftp.ebi.ac.uk/pub/databases/embl/align/) used for this analysis was made with the coding regions of the sequences shown in Supplementary Table S3. Aca, Ancylostoma caninum; Ace, Ancylostoma ceylanicum; Asu, Ascaris suum; Cbr, Caenorhabditis briggsae; Cel, Caenorhabditis elegans; Gpa, Globodera pallida; Hgl, Heterodera glycines; Oti, Oscheius tipulae; Ovo, Onchocerca volvulus; Ppa, Pristionchus pacificus; Sst, Strongyloides stercoralis; Xin, Xiphinema index

The Ancylostoma genes grouped together but were not consistent between species: A. ceylanicum genes grouped with A. caninum. Xiphinema index was positioned closer to the plant-parasitic nematode branch and the Ascaris suum/Onchocerca volvulus group showed rather low bootstrap values. In the distance trees, these branches are quite long, probably because they connect single OTUs belonging to distinct nematode orders and suborders. The fact that all three methods yielded relatively similar topologies suggests that the inconsistent groupings are not inherent to the methods used to construct the trees, but may be due to low sampling or may indicate that eEF1A nucleotide sequences are not good taxonomic markers for this group.

These puzzling grouping patterns are consistent, even when considering only the third position of each codon and/or discarding the indel-rich portions on the 3′ ends of the sequences (results not shown). A pairwise distance analysis derived from RSCU (Relative Synonymous Codon Usage [McInerney 1998]; see Supplementary Table S4 and Fig. S1) shows a highly distorted tree, probably because of the GC3 differences. Indeed, Ascaris suum groups with members of the suborder Tylenchina (Heterodera glycines and Globodera pallida), and separately from Onchocerca volvulus, a member of the same suborder as A. suum (Spirurina). We conclude that eEF1A gene sequences, although useful for phylogenetic studies of closely related nematodes, should be used with caution if the number of paralogous sequences in the species analyzed is unknown.

Discussion

The Number of eEF1A Genes in the O. tipulae Genome

The four eEF1A genes present in the O. tipulae genome support the Winter et al. (1996) hypothesis that nematode genomes diverge more rapidly than nematode morphology (Kiontke et al. 2004; Mitreva et al. 2005). This large number of eEF1A genes is uncommon in described metazoan genomes, and indeed, most genomes contain only two eEF1A copies. The greatest number of eEF1A genes is found in Xenopus laevis (Deschamps et al. 1991), which appears to have at least three copies, expressed in a developmentally controlled manner (Abdallah et al. 1991). In Homo sapiens, a large number of eEF1A genes (nearly 20) have also been described, although only 2 of them have been confirmed as expressed copies, and all other copies were shown to be pseudogenes (Madsen et al. 1990). The O. tipulae eEF1A genes described here are unlikely to be pseudogenes, as cDNAs were isolated and sequenced for all of them.

Results from Southern blots of restriction enzyme-digested O. tipulae genomic DNA using C. elegans actin probes suggested a greater number of actin genes in the O. tipulae genome than in the C. elegans genome (Winter, unpublished results). Interestingly, although these results could be taken as evidence of partial genome duplications, studies of genomic DNA reassociation kinetics showed that the genome size is the same in O. tipulae and C. elegans (Ahn and Winter 2006). Intrachromosomal rearrangements are fairly common in nematodes (Mitreva et al. 2005) and can lead to gene duplications.

Comparative Analysis of O. tipulae eEF1A Genes: Introns

Oti-eft-3 has the highest number of gene-specific introns. These introns do not appear to show a preference for a particular insertion phase, as expected in a pattern of random intron insertion. The largest intron (328 nt; see Figs. 2 and 3) found interrupting the coding region of an O. tipulae eEF1A gene is found in Oti-eft-3. It is larger than any other intron within a coding sequence described in O. tipulae.

Introns A, D, and E of O. tipulae genes are inserted within a motif that may be a protosplice site (MAG↓R) (Dibb and Newman 1989; Long et al. 1998) (see Fig. 3). This motif appears to be common in regions immediately adjacent to nematode introns (Blumenthal and Steward 1997). Some studies have shown that recently acquired introns in C. elegans tend to be phase 0 introns located within an AG↓G motif (Coghlan and Wolfe 2004). In addition to its similarity to the protosplice site, this motif is also commonly found in the vicinity of introns specific to certain groups of organisms or protein families, supporting the hypothesis of a recent origin (Kent and Zahler 2000; Sverdlov et al. 2003; Qiu et al. 2004). Of all the introns of the O. tipulae genes available in databanks (including those described here), the protosplice site appears in 13 of 34 (35.3%) (results not shown). If only the introns of the eEF1A genes of O. tipulae are considered, the protosplice site appears in one of the seven nonconserved introns and, interestingly, in the position-conserved introns A, D (the introns with weaker sequence conservation), and E (see Fig. 3 and Table 1). This could be interpreted as support for an insertional origin of these introns.

The other conserved introns (B and C) have strong sequence conservation, which is unusual between introns. Even when the introns of the eEF1A genes within the same genome in other organisms are compared, they are conserved only in their position, and not in their sequence.

If we assume that the eEF1A introns of O. tipulae and C. elegans have different origins (i.e., they were not present in an ancestral genome), two hypotheses could potentially explain our finding: (a) intron insertion in one of the copies was followed by spreading through recombination events to other locations, or (b) there has been some selective pressure maintaining the intron sequences. Hypothesis (a) could explain the insertion of intron A within a target sequence with poor sequence conservation, and could also explain its absence in Oti-eft-3. However, the sequence-conserved introns (B and C) are not located within any of the target motifs for intron insertion. Certainly, intron insertion could occur in sites other than the protosplice sites, but these insertion events would have to be followed by repetitive recombination events. As such, the characteristics of introns C and D could be explained by this hypothesis. However, in the case of introns B and E, the insertion event would have to have been followed by no fewer than three (intron B) or two (intron E) recombination events. Hypothesis (b) would presuppose sequence conservation of some introns vital for gene function from a pool of introns that may have existed in an ancestral gene. All other introns would have arisen by secondary insertion events.

The high intron turnover rate described in nematodes (Kiontke et al. 2004), along with the different transcriptional control strategy hypothesized above, could account for the differences in intron positions of the eEF1A genes in O. tipulae and C. elegans. Different transcriptional control strategies would require elements located in different positions, making introns important for one species less important for another.

The first intron of Oti-eft-3 is of particular interest because it has a rare GC-AG intron boundary, which is a unique type of donor site that appears in less than 1% of C. elegans introns. This motif is sometimes implicated in the regulation of alternative splicing (Farrer et al. 2002). However, we failed to detect any Oti-eft-3 cDNA clones that displayed evidence of alternative splicing (results not shown).

Can eEF1A Be Used as a Tool to Study Nematode Evolution?

Unlike the trees in other studies, which were based on the primary structure of eEF1A proteins, the trees described here (see Fig. 7) were based on nucleotide sequences. Despite the small number of sampled taxa, the grouping pattern obtained was relatively similar to that established by rRNA phylogenies. The Pristionchus pacificus eEF1A cDNA sequence is remarkably distinct in its composition from those of the other nematodes (cf. Fig. 6). This might account for its persistent positioning among the infraorder Rhabditomorpha, which includes, among others, the genera Caenorhabditis, Oscheius, and Ancylostoma (see Fig. 7).

Oscheius tipulae as an Example of eEF1A Gene Diversity in Nematodes

The data described in this paper reinforce previous findings (Winter et al. 1996) and reveal striking molecular differences between Oscheius tipulae and C. elegans. Analysis of the large subunit of RNA polymerase I showed that this protein has an amino acid substitution rate in Rhabditid nematodes comparable to that of different orders of the Mammalia (Kiontke et al. 2004). Therefore, although Oscheius tipulae and C. elegans are very different at the molecular level, this does not imply great evolutionary distance. O. tipulae has a strong preference for G or C at the third position of its codons (see Fig. 6) and has a higher copy number of the highly expressed eEF1A gene. The introns show strong sequence conservation, suggesting some selective pressure on them and possibly a consequentially different strategy of gene expression.

The use of eEF1A sequences as a tool for evolutionary studies has been highly controversial. The criticism of this strategy cites the inconsistency of basal branching (Hashimoto et al. 1995; Cho et al. 1995). The results described here show some unexpected branching, including that of Ancylostoma and Caenorhabditis, as well as the splitting of the suborder Tylenchina, suggest that special attention is required when using eEF1A as a phylogenetic marker. Indeed, eEF1A is frequently coded by more than one gene copy, each with different expression patterns. This itself poses a phylogenetic problem, as comparisons between paralogous copies could give rise to false organism phylogenies. Differences in expression patterns may also be confounding, as these can lead to different codon-usage preferences (Duret and Mouchiroud 1999).

The use of eEF1A as a phylogenetic marker in nematodes would require a major effort to increase the number of sampled taxa and to describe a more comprehensive set of genes from each organism. In beetles (Curculionidae), coincident intron positioning appears to be a good criterion for supporting gene orthology inferences (Jordal 2002). A similar approach in nematodes would likely be confounded, however, due to the limited number of closely related taxa that have been sampled. However, the number of eEF1A genes found in O. tipulae and the number of introns in these genes suggest that the genus Oscheius would be a good candidate for this type of analysis. In conclusion, the high diversity of Nematodes at the molecular level (Mitreva et al. 2005) and the results we describe here indicate that nematode eEF1A genes are good candidates for use in the study of intron diversity and evolution. Due to the high genetic diversity of different geographic isolates of O. tipulae (Baïlle et al. 2008), an analysis of eEF1A genes from different populations could yield important data about intron evolution in rhabditid nematodes.