Introduction

Of a total of 302 neurons in the adult Caenorhabditis elegans hermaphrodite, just 32 mediate chemosensory function, and three pairs of these chemosensory neurons are primarily olfactory (Bargmann 1998; Bargmann et al. 1993; Prasad and Reed 1999). Despite this, C. elegans can still discriminate between hundreds of odorants at various concentrations (Bargmann et al. 1993). Mammals utilize over 10 million olfactory neurons to distinguish between a substantially larger range of molecules (Prasad and Reed 1999). Olfactory receptors (ORs) are a subfamily of the G-protein-coupled receptor (GPCR) superfamily and form the largest multigene family in mammals, comprising over 1000 genes in the mouse (Zhang and Firestein 2002). In mammals each olfactory neuron expresses only one functional OR, and this “one olfactory receptor/one neuron” rule is necessary for the conversion of olfactory signals into an accurate topographical map in the olfactory bulb (Serizawa et al. 2003). In Drosophila melanogaster also, each olfactory neuron expresses only one OR, and these neurons project to the antennal lobe, the equivalent of the vertebrate olfactory bulb (Vosshall et al. 1999). However, an exception to this rule has recently been reported in D. melanogaster by Goldman et al. (2005). The C. elegans genome contains circa 550 functional ORs (Robertson 1998) and individual C. elegans chemosensory neurons express multiple ORs (Sengupta et al. 1996; Troemel et al. 1995).

The downstream effectors of ORs are a family of guanine nucleotide binding proteins (G-proteins) that relay signals from seven-transmembrane ORs to intracellular proteins. G-proteins have a common design comprising three polypeptide chains: an α subunit that binds and hydrolyzes guanosine triphosphate (GTP) and a βγ dimer that serves as a functional monomer (Simon et al. 1991). When a G-protein is activated by interaction with a receptor, the α subunit exchanges bound GDP for GTP (Supplementary Fig. 1). An intrinsic GTPase activity of the α subunit restores it to the basal state in which GDP is bound. G-proteins are conserved in animals separated by considerable evolutionary distances, such as mammals and protistans. Gα proteins have previously been divided on the basis of amino acid (aa) similarity into four classes: Gαs, Gαi/o, Gαq, and Gα12 (Simon et al. 1991; Strathmann and Simon 1990, 1991). The human genome contains 17 Gα genes, which are grouped into the four classes listed above on the basis of functional and sequence attributes (Simon et al. 1991). No additional Gα genes were detected when the human genome was sequenced. C. elegans contains 21 Gα genes, 14 of which are specifically expressed in sensory neurons (Jansen et al. 1999; Cuppen et al. 2003).

In C. elegans, chemotaxis to volatile chemoattractants is mediated primarily by two pairs of neurons, AWA and AWC (Bargmann et al. 1993). Both AWA and AWC neurons have been shown to express numerous ORs including the diacetyl receptor, ODR-10 (Sengupta et al. 1996; Troemel et al. 1995), and at least four Gα subunits (Jansen et al. 1999). AWC-mediated olfaction involves the Gα subunits ODR-3, GPA-2, GPA-3, and GPA-13 (Lans et al. 2004; Roayaie et al. 1998), which, upon activation, induce an increase in intracellular cGMP, mediated by the guanylyl cyclases ODR-1 (L’Etoile and Bargmann 2000) and DAF-11 (Birnby et al. 2000). This in turn leads to the activation of a cyclic nucleotide gated channel encoded by the tax-2 and tax-4 genes (Coburn and Bargmann 1996) (Supplementary Fig. 1). The TAX-2/TAX-4 channel is similar to vertebrate visual and olfactory channels and permits the entry of Ca2+ ions into the cell. AWA neurons have been shown to express the Gα subunits ODR-3, GPA-3, GPA-5, and GPA-6 (Lans et al. 2004). Unlike AWC-mediated olfaction, however, AWA-mediated olfaction is generated by an alternative pathway that requires a novel channel encoded by the osm-9, ocr-1, and/or ocr-2 genes (Colbert et al. 1997; Tobin et al. 2002) (Supplementary Fig. 1). These subunits are distantly related to the Drosophila phototransduction channel TRP (Transient Receptor Potential), which is regulated by G-protein signaling.

A phylogenetic analysis of the human G-protein α-subunit gene family indicated the existence of four Gα gene classes (Gq, G12, Gs, and Gi/o), each of which contains a number of closely related isotypes (Simon et al. 1991). These four Gα-subunit classes transduce signals from a great variety of extracellular agents including hormones, neurotransmitters, chemokines, and peptides. Signaling from the extracellular environment via GPCRs is phylogenetically ancient, being found in protistans, fungi, plants, and animals. The existence of one or more representatives of each of these four Gα-subunit classes among the seven Gα-subunit genes cloned from the sponge Ephydatia fluviatilis indicates that these Gα gene classes existed prior to the parazoan-eumetazoan split (Suga et al. 1999). A dendrogram of the predicted aa sequences of all the C. elegans Gα subunits, together with a representative sequence of each of the four human Gα-subunit classes, identified at least one clear homologue of each of the four classes of vertebrate genes, the remaining nematode genes being more divergent (Jansen et al. 1999). To gain further insight into the phylogeny and diversity of the Gα gene family in nematodes, we assembled a data set containing homologues of putative Gα genes from a variety of protistans, metazoans, fungi, and plants. Our phylogenetic analysis shows that, in addition to having at least one member of each of the four mammalian Gα gene classes, C. elegans also possesses two lineage-specific Gα gene expansions, homologues of which are not found in other organisms. We hypothesize that these novel nematode-specific Gα genes increase the functional complexity of individual chemosensory neurons, enabling them to integrate and adapt to odor signals from the multiple distinct ORs expressed on their membranes.

Materials and Methods

Gα protein homologues were located by performing multiple BLASTP (Altschul et al. 1997) searches with a cutoff expectation value (E-value) of 10⁻⁷ against GenBank (http://www.ncbi.nlm.nih.gov/BLAST). In each case putative C. elegans Gα proteins were used as the query sequence. In total, 118 proteins from many diverse genera were located (see supplementary information). Sequences were aligned with ClustalW 1.81 (Thompson et al. 1994) using the default settings. All alignments were corrected for obvious alignment ambiguity using the alignment editor Se-Al 2.0a11. The alignment can be viewed at http://www.biology.nuim.ie//staff/JMESupp.shtml.

Gene Tree Construction

A brief examination of the sequence data revealed that average sequence similarity was approximately 40% (a similarity matrix can be viewed at http://www.biology.nuim.ie//staff/JMESupp.shtml). To account for the problems associated with low levels of sequence similarity, the aa alignment was recoded into the six Dayhoff (1978) groups: C, STPAG, NDEQ, HRK, MILV, and FWY. This is based on the premise that aa substitutions within the six groups will be common and noisy, whereas changes between groups will be rarer and so less affected by saturation (Hrdy et al. 2004). The recoded alignment was analyzed using the Bayesian criterion implemented in MRBAYES v3.0B4 (Huelsenbeck and Ronquist 2001). The model used has a 6 × 6 general time-reversible rate matrix. Among-site rate variation was modeled with a free proportion of invariable sites and a four-category discrete gamma distribution. The analysis used a Markov chain Monte Carlo (MCMC) chain that ran for 15 million generations, sampled every 100th generation. Plots of likelihood versus generation revealed that all chains reached stationarity after 73,500 generations; the trees sampled before this point were therefore discarded as burn-in. Bayesian posterior probability branch supports were determined using the sumt command of MRBAYES.
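The recoding step described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name dayhoff_recode, the digit labels 0 to 5, and the handling of gaps and ambiguous residues are assumptions made for illustration; only the six groups themselves come from the text.

```python
# Six Dayhoff (1978) groups as listed in the text.
# The digit labels 0-5 are arbitrary placeholders (an assumption of this sketch).
DAYHOFF_GROUPS = {
    "C": "0",
    "STPAG": "1",
    "NDEQ": "2",
    "HRK": "3",
    "MILV": "4",
    "FWY": "5",
}

# Flatten to a per-residue lookup table covering the 20 standard amino acids.
RECODE = {aa: label for group, label in DAYHOFF_GROUPS.items() for aa in group}


def dayhoff_recode(sequence, gap_chars="-.", unknown="?"):
    """Recode one aligned protein sequence into six-state Dayhoff classes.

    Gap characters are passed through unchanged so that alignment columns
    stay in register; any other unrecognised symbol (e.g. X, B, Z) becomes
    `unknown`. Both choices are assumptions of this sketch.
    """
    recoded = []
    for residue in sequence.upper():
        if residue in gap_chars:
            recoded.append(residue)
        else:
            recoded.append(RECODE.get(residue, unknown))
    return "".join(recoded)
```

Under this scheme a substitution such as I to L stays within group MILV and becomes invisible after recoding, whereas a change such as L to K moves between groups and is retained as signal.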

According to PUZZLE 5.1 (Schmidt et al. 2002), the initial protein alignment fails the chi-square test for homogeneous aa composition at p=0.95. Furthermore, our initial phylogenetic reconstructions appeared to lack coherent phylogenetic signal, as most groupings were poorly supported. Therefore, highly variable sites were removed from the alignment so that the largest possible alignment in which all sequences passed the chi-square test for homogeneity of aa composition could be recovered. This second phylogenetic analysis was based on a smaller number (47) of exemplar Gα proteins. Sites were categorized into different classes using a method implemented in Tree-Puzzle 5.1. The method assumes that there are eight categories of sites and that rate variation across these sites follows a discrete gamma distribution. The fastest-evolving sites were found in category 8; these were removed until an alignment of 399 aa positions that passed the chi-square test for homogeneity of aa composition was recovered for the Gα protein data set. The most appropriate protein model was selected using the software program MODELGENERATOR (http://www.bioinf.nuim.ie/software/modelgenerator). One hundred bootstrap replicates were performed with the appropriate protein model, using the software program PHYML (Guindon and Gascuel 2003). The results of this analysis were summarized using the majority-rule consensus method.
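The idea behind a per-sequence composition test can be sketched as follows. This is a simplified illustration, not a reimplementation of Tree-Puzzle: it assumes that expected frequencies are pooled from the alignment itself, that all 20 amino acids occur (19 degrees of freedom), and that a sequence fails when its statistic exceeds the tabulated 5% critical value. All function and constant names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Tabulated upper 5% critical value of the chi-square distribution with
# 19 degrees of freedom (20 states - 1). Assumes every state has a
# non-zero pooled frequency; otherwise the degrees of freedom differ.
CHI2_CRIT_DF19_5PCT = 30.144


def pooled_frequencies(seqs):
    """Amino acid frequencies pooled over all sequences (gaps ignored)."""
    pooled = Counter(c for s in seqs for c in s.upper() if c in AMINO_ACIDS)
    total = sum(pooled.values())
    return {aa: pooled[aa] / total for aa in AMINO_ACIDS}


def composition_chi2(seq, expected_freqs):
    """Chi-square statistic comparing one sequence with expected frequencies."""
    counts = Counter(c for c in seq.upper() if c in expected_freqs)
    n = sum(counts.values())
    return sum(
        (counts.get(aa, 0) - n * f) ** 2 / (n * f)
        for aa, f in expected_freqs.items()
        if f > 0
    )


def failing_sequences(named_seqs, critical=CHI2_CRIT_DF19_5PCT):
    """Names of sequences whose composition deviates at the 5% level."""
    freqs = pooled_frequencies(named_seqs.values())
    return [
        name
        for name, seq in named_seqs.items()
        if composition_chi2(seq, freqs) > critical
    ]
```

In a workflow like the one described, such a check would be repeated after each round of site removal until no sequence is flagged; the site-rate categorization itself is not reproduced here.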

EST Database Searches

Using each nematode-specific gene as a query sequence, we performed exhaustive TBLASTN (Altschul et al. 1997) database searches with a cutoff expectation value of 10⁻⁷ against the nematode EST database NEMBASE (Parkinson et al. 2004). The version of NEMBASE used contained 130,184 clustered ESTs from 37 different nematode species from the four major nematode clades. All statistically significant EST sequence hits were extracted and subsequently searched locally against the C. elegans proteome (ftp://ftp.ensembl.org/pub/current_celegans/data/fasta/pep/) using BLASTX with a cutoff expectation value of 10⁻⁷. Significant hits were confirmed by manual inspection of BLAST alignments. The purpose of this approach was to confirm orthology between the nematode EST sequences and the C. elegans protein sequences. The presence or absence of nematode-specific genes within the 37 species found in NEMBASE was noted.
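The two-step search above amounts to a reciprocal best-hit filter, which can be sketched in Python. The sketch assumes BLAST results in the 12-column tabular layout (query, subject, ..., E-value, bit score) produced by current BLAST+ releases, which is not necessarily the format used in the original study, and it covers only the automated filtering, not the manual inspection of alignments. Function names are hypothetical.

```python
E_VALUE_CUTOFF = 1e-7  # cutoff stated in the text


def parse_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Parse tab-separated BLAST hits, keeping those at or below the cutoff.

    Assumes the 12-column tabular layout in which column 11 is the
    E-value and column 12 is the bit score.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= cutoff:
            hits.append((fields[0], fields[1], evalue, bitscore))
    return hits


def best_hit_per_query(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, _evalue, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def confirmed_orthologues(forward_lines, reverse_lines):
    """Return (gene, EST) pairs supported in both search directions.

    forward_lines: C. elegans gene queried against the EST database.
    reverse_lines: significant ESTs queried against the C. elegans proteome.
    """
    forward = parse_hits(forward_lines)
    reverse_best = best_hit_per_query(parse_hits(reverse_lines))
    return sorted(
        {(gene, est) for gene, est, _, _ in forward if reverse_best.get(est) == gene}
    )
```

An EST is retained only if its best match in the C. elegans proteome is the gene that found it, which is one common way to separate putative orthologues from hits to other members of the same gene family.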

Using the same methodology as above, nematode-specific genes were used to search the Schistosome (http://www.ebi.ac.uk/blast2/parasites.html) and Tardigrade (http://www.zeldia.cap.ed.ac.uk/Tardibase/tardibase/tardigrades.html) EST databases. No orthologues were found for the Gns group of Gα-subunit genes in these additional database searches.

Results

The final alignment contained 118 taxa. There is a high degree of sequence dissimilarity between the Gα-subunit genes. This resulted in a poorly supported tree, making it difficult to infer phylogenetic relationships. We therefore recoded each aa into one of the six groups of chemically related aa that commonly replace one another (Dayhoff 1978). This recoding technique has the effect of homogenizing the aa composition between sequences and shortening long branches (Hrdy et al. 2004) and resulted in a more robust phylogenetic hypothesis. Our phylogenetic reconstruction (Fig. 1) shows that the plant Gα genes form a robust monophyletic grouping. The Gq, G12, and Gs classes are derived from a common ancestor, being grouped together with strong support. The Gi/o class forms an adjacent group to the Gq, G12, and Gs clade, as does the fungal Gα gene group. Each of the four described classes contains at least one C. elegans Gα gene, viz., Gq (egl-30), Gs (gsa-1), G12 (gpa-12), and Gi/o (goa-1, gpa-4, and gpa-16). The remaining 15 C. elegans Gα genes are located outside these four classes. Our analysis indicates that the majority of the 15 C. elegans genes form a distinct sister clade to the Gi/o Gα gene class, as they are grouped beside it with high support. The remaining C. elegans Gα genes are grouped among the protistan (gpa-6, gpa-11, gpa-13, and gpa-14) and the fungal (gpa-5) Gα genes. These genes are heterogeneous and highly divergent in their aa sequences. Their inferred positions within the tree may therefore be erroneous, reflecting high levels of sequence divergence rather than relatedness. The Trichomonas vaginalis genes form a distinct but divergent clade at the base of the tree, being grouped together with strong support. The Dictyostelium discoideum Gα genes form three small sister clades at the base of the tree, beside the T. vaginalis clade and the divergent nematode genes. One D. discoideum gene (gpa-7) was positioned beside gpa-17, a very divergent C. elegans Gα gene with an extremely long branch. Because of the extreme heterogeneity of these D. discoideum genes, their inferred positions within the tree are not reliable. A secondary phylogenetic analysis using the maximum likelihood criterion was carried out on a reduced sample of our data. Heterogeneous positions within the data were removed until the data passed a chi-square test for aa homogeneity (Fig. 2). This analysis yielded a phylogeny comparable to that of the full data set. The only difference in this analysis was the grouping together of the divergent C. elegans genes gpa-5 and gpa-17.

Figure 1

Phylogenetic analysis of the G-protein α-subunit protein family. The inferred phylogeny shows a Bayesian consensus tree of a general time-reversible substitution matrix estimated from G-protein α aa sequences recoded into the six Dayhoff (1978) groups. Among-site rate heterogeneity was modeled using a four-category gamma correction with a fraction of invariant sites. The analysis used the Metropolis-coupled MCMC strategy from MrBayes (Huelsenbeck and Ronquist 2001) and parameters such as the composition and substitution rate matrix were free. Posterior probabilities for selected branches are shown at nodes. The scale bar indicates the number of changes per site. This methodology is appropriate for a phylogenetic analysis of divergent genes (Hrdy et al. 2004).

Figure 2

Maximum likelihood phylogenetic tree of exemplar G-protein α-subunit proteins. Highly variable sites have been removed and the resultant alignment (399 aa positions) passes a chi-square test for homogeneous aa composition at p=0.95. Branch supports were determined using the bootstrap resampling technique.

During our searches of GenBank we failed to find orthologues for the majority of the C. elegans Gα genes (orthologues were found only for the Gs, Gq, G12, and Gi/o Gα gene classes). A database search of WormBase (http://www.wormbase.net) revealed that C. briggsae contains homologues for all the Gα proteins found in C. elegans. As a result of our database searches and subsequent phylogenetic analysis we propose that the majority of the C. elegans genes represent unique gene expansions which are confined to the phylum Nematoda. Therefore, we refer to these genes as putative Gα nematode-specific (Gns) genes. To test this proposal we examined all available nematode sequence data. EST data are available for the four major nematode clades (Parkinson et al. 2004). The NEMBASE database contains many EST homologues of the four described Gα gene classes (Gs, Gq, G12, Gi/o), but in addition, we also detected six of the Gns genes in this database. Although we could only locate a small number (typically two or fewer) of representative Gns genes in individual nematodes in NEMBASE, these Gns EST homologues occurred in diverse members from each of the nematode clades represented in the database (Table 1). This finding suggests that the evolution of the Gns genes predates the origin and adaptive radiation of nematodes. The small number of Gns ESTs detected in individual nematodes might indicate that the full expansion of the Gns genes is restricted to Caenorhabditis. A more parsimonious explanation may be that the EST database does not contain sufficient sequence data. We propose that the diversity of Gns gene homologues detected in the NEMBASE database and their distribution pattern across the major clades within the phylum Nematoda support the premise that the expansion of Gns is common to all members of this phylum. 
At present a number of projects are under way to sequence additional nematode genomes (e.g., Brugia malayi, Haemonchus contortus, Heterorhabditis bacteriophora, Meloidogyne hapla, Pristionchus pacificus, Trichinella spiralis). When published, these genome sequences will help elucidate the origins and expansion of the Gns genes. An alternative and less parsimonious hypothesis to the nematode-specific origin and expansion of the Gns clade is that orthologues of the nematode-specific Gα genes may have been lost in the vertebrate and arthropod lineages.

Table 1 Orthologues of C. elegans nematode-specific Gα genes in the four major clades of the phylum Nematoda identified from the nematode EST database NEMBASE (Parkinson et al. 2004)
Table 2 Conserved nucleotide binding motifs in nematode heterotrimeric Gα subunit proteins

Gα subunits consist of two domains: a guanine nucleotide binding GTPase domain with a high structural similarity to Ras GTPases and a compact helical domain. The GTPase domain consists of five α-helices surrounding a six-stranded β-sheet (Rens-Domiano and Hamm 1995). The five α-helices, designated G1 to G5, form the guanine nucleotide binding site, and each helix contains a conserved nucleotide binding motif. Our phylogenetic analyses infer that the majority of the Gns genes are most closely related to members of the Gi/o group. The Gns subunits show strong conservation of the G1–G5 nucleotide binding motifs with those found in the Gi/o group, particularly the G1, G3, and G4 motifs (Table 2). The G2 and G5 motifs show slight deviations from the Gi/o consensus, but all the critical GTP binding contact residues in these motifs are conserved, implying a conservation of Gα protein function. The divergent nematode sequences gpa-5, gpa-6, gpa-11, gpa-13, gpa-14, and gpa-17, which are located outside the main Gα gene groupings at the base of the tree, also retain all the critical GTP binding contact residues in the G1–G4 motifs, but the G5 motif is more variable in these divergent Gns genes. Overall, the aa sequence within the functionally important G1 to G5 motifs is highly conserved (Table 2), whereas the aa sequence outside these motifs shows high levels of divergence. This may have a functional significance, possibly enabling the Gns to couple to novel members of the GPCR superfamily, which has also undergone a major gene expansion in C. elegans and C. briggsae (Robertson 1998).

Discussion

Most individual mammalian cells express multiple (<10) distinct GPCR gene products (Gomperts et al. 2002). The distinct GPCRs expressed on individual cell membranes relay signals from a great variety of extracellular agents, often with antagonistic effects, to intracellular secondary messengers and effectors via G-protein activation. Gαo is the most abundantly expressed Gα subunit in mammalian neurons. Gαsi2, Gα12, and Gαq, are also expressed ubiquitously in most mammalian cells (Gomperts et al. 2002). The C. elegans homologues of these four genes (gsa-1, goa-1, gpa-12, and egl-30, respectively) are also widely expressed in nerve and muscle cells, and some are also expressed in the excretory cells (egl-30) and the ventral hypodermis (gpa-12) (Jansen et al. 1999). In mammals and insects each olfactory neuron typically expresses only one functional OR and the brain discriminates between odors by determining which neurons have been activated (Prasad and Reed 1999). The extent to which similar olfactory coding systems have evolved in other invertebrate phyla is still unknown. However, this “one olfactory receptor/one neuron” design does not occur in C. elegans, reflecting the small number of chemosensory cells in the nematode nervous system and the limited emphasis on cephalization during nematode evolution.

C. elegans olfactory neurons express multiple classes of ORs and Gα subunits within a single cell (Jansen et al. 1999; Lans et al. 2004; Troemel et al. 1995). Jansen et al. (1999) have proposed that the presence of multiple signal transduction pathways in individual C. elegans olfactory neurons most probably evolved to compensate for the small number of chemosensory neurons in the nematode’s nervous system. The Gns subunit ODR-3 is the major activator of the signaling pathway in the two pairs of sensory neurons, AWA and AWC, which mediate odorant attraction in C. elegans. ODR-3 is also required for osmosensation and mechanosensation (Roayaie et al. 1998), suggesting that it can couple to several distinct ORs. In addition to ODR-3, the AWA neurons also express GPA-3, which contributes a second stimulatory signal, and GPA-5, which has inhibitory effects (Lans et al. 2004), while the AWC neurons express GPA-13, which is stimulatory, and GPA-2, which has inhibitory effects (Lans et al. 2004) (Supplementary Fig. 1). Thus the Gns subunits form a regulatory network within the AWA and AWC cells which modulates the olfactory response to various odors. This integrative, cell-autonomous response is similar to the G-protein-coupled response systems in nonolfactory cells in mammals and other animals. Cell-autonomous integration of G-protein signaling represents the ancestral condition in the protistans, sponges, and eumetazoans and appears to have been retained in all cellular systems apart from olfaction. Our results indicate that C. elegans and most likely all nematodes have evolved at least two unique lineage-specific expansions of Gα-subunit genes. With the exception of gpa-7, which is widely expressed in neuron and muscle cells, the remaining 14 Gns genes are expressed in other C. elegans amphid sensory neurons and/or other putative sensory neurons (Jansen et al. 1999). 
Thus it appears that nematodes, like other animals, have increased their olfactory repertoire by expanding the number of ORs, but instead of a concomitant increase in the number of sensory neurons, nematodes have retained the ancestral multireceptor G-protein signaling system in their olfactory neurons. The integrative capacity of the chemosensory system was, however, further refined by the recruitment of additional Gα-subunit genes.

The genomes of C. elegans and C. briggsae contain an unexpectedly large number of genes (Hodgkin 2001; Stein et al. 2003). This may have resulted in part from the relative lack of alternative splicing and domain accretion in nematode genes (Hodgkin 2001), but lineage-specific gene expansions have also occurred in nematodes (Lespinet et al. 2002; Robinson-Rechavi et al. 2005). These gene expansions are particularly noticeable for neuronal genes (Bargmann 1998). For example, the largest and most diverse nicotinic acetylcholine receptor gene family is that of C. elegans (Mongan et al. 1998); novel families of potassium channels have been identified in C. elegans (Wei et al. 1996); and G-protein-coupled (putative) chemoreceptor genes comprise the largest gene family in C. elegans (Robertson 1998). Thus the relative lack of anatomical complexity in the nematode nervous system appears to have been compensated for during nematode evolution by an increased functional complexity and multitasking capacity of individual nerve cells, as is clearly illustrated in the olfactory system of C. elegans.