1. Drug Discovery and Genomics

The rapid growth in numbers of genomic and cDNA sequences deposited in public and private databases in recent years has led to the development of new avenues of exploration and in silico approaches for drug discovery. Drug discovery can now be divided into 2 principal categories: ‘discovery genetics’ and ‘discovery genomics’.[1] Discovery genetics relies on using positional cloning to identify disease-related susceptibility genes. In turn, the molecular identification of these disease-associated genes combined with knowledge of the developmental or metabolic pathways that they affect may help researchers to focus their search for compounds that ameliorate disease. The advantage of this scheme is that targets are pre-validated because of their association with a known disease. Unfortunately, this route of gene discovery is limited because the positional cloning of human disease genes has been a lengthy process and relatively few genes are correlated with known diseases. In addition, genetic susceptibility to disease is often polygenic. Moreover, the molecular identification of a disease gene may be initially uninformative because of the absence of associated function.

Fig. 1

Flow-chart illustrating how the use of functional genomics can facilitate the identification and validation of drug targets. Genomic data collected from sequence databases, expression studies and protein interaction experiments help to identify potential drug targets. The validation of these targets is facilitated by using model organisms to obtain phenotypic and biologic information about the vertebrate homolog.

2. The Power of RNAi and Caenorhabditis elegans Functional Genetics

This review will focus on the nematode Caenorhabditis elegans as a model organism for biomedical studies and highlight the importance of RNAi for learning about gene function. The availability of the whole genome sequence of C. elegans has created the challenge to understand the function of each predicted gene and its global role in organismal biology. RNAi has developed into one of the most indispensable tools for analysing gene function in C. elegans because it is rapid, efficient, and can be scaled up for high-throughput analyses.[9–11] There are many examples in the literature where RNAi has been used to target individual genes or groups of genes for knockout to gain clues about gene activity.[12–18] The basic technique of RNAi involves introducing sequence-specific double-stranded RNA (dsRNA) into C. elegans in order to generate a nonheritable, epigenetic knockout of gene function that phenocopies a null mutation in the targeted gene.[9,19] RNAi can also play a crucial role in drug discovery. RNAi can be used to: (i) validate potential drug targets; (ii) investigate the biologic role of validated targets; and (iii) screen for active compounds.

3. Background: The Biology of C. elegans

C. elegans has many attributes that have helped to establish its role as an excellent model organism for biomedical studies. There are 2 sexes, self-fertilizing hermaphrodites and males. Each hermaphrodite produces approximately 350 embryos that reach maturity in 3 and a half days at 20°C. Adult animals are only one millimetre in length. Worms can be raised on agar plates or in liquid, which can be advantageous for drug-screening. It is also possible to maintain frozen worm stocks in liquid nitrogen.

The biology of C. elegans has been extensively characterized. C. elegans is transparent during all life-stages so it has been possible to determine the cell lineage.[20] The invariant cell lineage is comprised of only 959 somatic cells in the adult hermaphrodite and 1031 somatic cells in the male. Although there are relatively few somatic cells, these cells form epidermal, intestinal, excretory, nerve and muscle tissue. Over the past 3 decades numerous studies have added to our knowledge of biologic processes because C. elegans is highly amenable to genetic analysis. A wide variety of mutants have been identified, carrying mutations in genes affecting cell migration, programmed cell death, movement, cell lineage, defecation, longevity, neurobiology, morphology, susceptibility to drugs, behaviour, viability and many other biological processes.[21] The identification of the genes affected by these mutations and the ability to place them in regulatory pathways have provided major insights into developmental and disease processes.

4. The Genome of C. elegans

C. elegans is the first metazoan with a completely sequenced genome.[22] The small 100 Mb genome of C. elegans is distributed across 6 chromosomes and is predicted to encode 19 705 proteins.[23] Many of these proteins share significant similarity to proteins responsible for human development and disease. Present estimates indicate that as many as 75% of human disease genes have a C. elegans homolog.[22,24] Therefore, many of the potential drug target candidates found in humans are also likely to be represented in the C. elegans genome, making it possible to take advantage of the biology of C. elegans to validate potential targets and to screen compounds for activity. Table I lists some human disease genes with C. elegans homologs.

5. C. elegans is a Valid Model for Elucidating Gene Function in Vertebrates

The biology of C. elegans has been used successfully to gain insights into the role of the human presenilin genes and their involvement in Alzheimer’s disease. Mutations in human presenilin genes PS1 and PS2 are responsible for early onset Alzheimer’s disease. It was found that the C. elegans homologs of PS1 and PS2 are encoded by the sel-12 and hop-1 genes.[25,26] sel-12 was initially identified as a suppressor of an activated form of lin-12, which encodes a homolog of the Notch receptor. By itself, the sel-12 mutant displays an egg-laying defective (Egl) phenotype, characterized by bloating due to the inability to expel developing embryos from the uterus.[27] This phenotype is obviously different from the human phenotype of neurodegeneration; nonetheless, it has been found that presenilins from both organisms play similar roles. For example, the human PS1 and PS2 can substitute functionally for sel-12 in C. elegans, establishing that PS1 and PS2 proteins share a similar function to SEL-12.[28] Mutant PS1 alleles that are associated with Alzheimer’s disease in humans retained partial activity when introduced into C. elegans, suggesting that they are reduced function alleles. Finally, both SEL-12 and the human presenilins appear to play roles in proteolytic processing: SEL-12 affects the intracellular localization and possibly the processing of the LIN-12/Notch-receptor, whereas PS1 and PS2 are involved in the proteolytic processing of amyloid precursor protein (APP).[29,30] This example serves to illustrate that although the phenotypes obtained in the model organism may differ from the human disease phenotype, it is still possible to use model organisms to investigate gene activity.

6. The Mechanism Underlying RNAi

RNAi has emerged as one of the most important tools for elucidating gene function, and efforts to understand the mechanism and physiological function underlying this fascinating phenomenon are providing a stimulating challenge. Study of the par-1 gene, an essential gene for viability, provided the first evidence that direct injection of RNA molecules into the gonad syncytium of C. elegans could lead to inhibition of gene activity throughout the animal.[31] It was shown that injection of antisense par-1 RNA produced a mutant phenocopy that mimics the embryonic lethality caused by the par-1 null mutation. Surprisingly, sense strand RNA also elicited a par-1 mutant phenocopy. The puzzling observation that either antisense or sense strand RNA could interfere with normal gene activity was subsequently resolved. Fire et al.[9] showed that double stranded RNA (dsRNA) was responsible for the RNAi effect; it was speculated that dsRNA was generated as a contamination during the in vitro synthesis of single-stranded RNA. Several observations indicate that RNAi in C. elegans acts at the post-transcriptional level. First, promoter sequences and introns do not elicit RNAi effects.[9] Second, cDNA sequences appear to be more potent in eliciting RNAi effects than a mixture of intronic and exonic sequences.[32,33] Third, RNAi can lead to degradation of the target gene mRNA.[34] Moreover, in Drosophila it has been observed that dsRNA is processed into segments 21–23 nucleotides in length, and that these fragments may be guiding the mRNA cleavage.[35] Finally, there is no indication that the nucleotide sequence of the RNAi targeted gene is modified by processes such as methylation.[34]

By taking advantage of the genetic tractability of C. elegans, mutants have been identified that are defective in RNAi;[36] these genes have been named rde-1, rde-2, rde-3 and rde-4 (RNA interference-deficient). The molecular identification of the mutated genes has shed additional light on the mechanisms underlying the RNAi phenomenon. Mutations in the mut-2 and mut-7 (mutator) genes, which lead to the mobilization of transposons in the germline, also reduce the RNAi effect.[37] Similarly, rde-2 and rde-3 mutants, but not rde-1 and rde-4, also have the property of mobilizing transposons in the germline. Based on these data it has been hypothesised that RNAi may function physiologically as a defence mechanism designed to prevent transposon activation or viral invasion.[37]

Table I. Caenorhabditis elegans homologs of human genes associated with disease. All C. elegans proteins corresponding to predicted C. elegans open reading frames (C.e. ORFs) and corresponding genes (C.e. gene) are listed in the Wormpep33 database.

The rde-1 gene encodes a novel member of a gene family that includes the translational initiation factor eIF2c, suggesting a link between RNAi and translation.[36] The mut-7 gene encodes a protein with homology to bacterial RNaseD, which is implicated in mRNA degradation.[37] Another gene identified as essential for RNAi in C. elegans is ego-1, which encodes a homolog of RNA-directed RNA polymerases.[38] The involvement of ego-1 in C. elegans RNAi provides a link between the gene silencing mechanisms called quelling in Neurospora crassa and post-transcriptional gene silencing (PTGS) in plants. It is hypothesized that these phenomena share a conserved core mechanism that may depend on the generation of dsRNA by RNA-directed RNA polymerases.[39]

7. Practical Applications of RNAi

On a practical level, RNAi is one of the most powerful techniques available for probing gene function in C. elegans. All that is required is a DNA template from which to synthesize dsRNA. Sequence similarity should be considered when deciding on a dsRNA template because cross-interference can occur when genes share significant (greater than 70%) nucleotide similarity.[40] To enhance specificity, unique nucleotide sequences of the mRNA should be targeted, such as the 3′ UTR. It is generally preferable to use cDNA sequences as templates for generating dsRNA because dsRNA generated from genomic DNA may affect the penetrance of the RNAi effect;[32,33] however, in the absence of cDNA, exon-rich genomic DNA regions can be used. It is often convenient to generate DNA templates of 250 to 2000 bp for in vitro transcription using the polymerase chain reaction (PCR) and gene-specific primers flanked with T3 and/or T7 RNA polymerase promoter sequences (fig. 2). Subsequently, RNA can be generated in vitro using one of the many commercially available kits containing T3 or T7 RNA polymerase; the products generated are annealed to form dsRNA. Agarose gel electrophoresis is used to assess the efficiency of the synthesis and annealing reactions.
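The template design rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the promoter string is the commonly used T7 promoter sequence followed by GGG, and the identity check assumes the two sequences have already been aligned to equal length. The 250 to 2000 bp window and the 70% threshold are the values given in the text.

```python
# Illustrative sketch only; helper names are hypothetical and not from the review.

# Commonly used T7 RNA polymerase promoter sequence plus GGG start (assumption:
# the exact promoter variant used in any given protocol may differ).
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def add_t7(primer: str) -> str:
    """Return a gene-specific primer with the T7 promoter added at its 5' end."""
    return T7_PROMOTER + primer.upper()


def template_length_ok(amplicon_bp: int, low: int = 250, high: int = 2000) -> bool:
    """Check that a PCR template falls in the 250-2000 bp range given in the text."""
    return low <= amplicon_bp <= high


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def cross_interference_risk(a: str, b: str, threshold: float = 70.0) -> bool:
    """Flag a pair of sequences sharing more than 70% nucleotide identity."""
    return percent_identity(a, b) > threshold


if __name__ == "__main__":
    print(add_t7("atgc"))
    print(template_length_ok(800))
    print(cross_interference_risk("ACGTACGTAC", "ACGTACGTTT"))
```

A real design pipeline would align candidate templates against all predicted transcripts before scoring identity; the position-by-position comparison here is a deliberate simplification.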

Fig. 2

Generating dsRNA for performing RNAi in Caenorhabditis elegans. (a) The DNA source for generating in vitro transcription templates can be genomic DNA or a plasmid carrying either genomic or cDNA sequence. DNA templates for in vitro transcription are conveniently generated by polymerase chain reaction (PCR) using sequence-specific primers flanked with promoter sequences for T7 or T3 RNA polymerase. dsRNA is generated by in vitro transcription. (b) Methods for performing RNAi. RNAi is performed by administering dsRNA using (1) microinjection (2) feeding worms bacteria expressing both sense and anti-sense RNA (3) soaking worms in dsRNA solutions or (4) expressing a ‘snap-back’ dsRNA from a transgene under the control of an inducible promoter.

RNAi effects have been elicited in worms using 4 different methods (fig. 2). The first technique involves microinjecting dsRNA into the syncytial gonad of adult animals.[9] This method requires a knowledge of C. elegans anatomy, specialized equipment, and skill in microinjection. Surprisingly, it has been found that the RNAi effect is capable of crossing tissue boundaries, hence the exact location of microinjection may not be crucial.[9]

Non-invasive methods for eliciting RNAi include soaking and feeding (fig. 2). Both techniques simplify the administration of dsRNA and can facilitate high-throughput RNAi analyses (discussed in section 8). When a worm is soaked in a solution of dsRNA, it ingests the dsRNA, which is subsequently absorbed by intestinal cells to produce an RNAi response.[15] It has also been shown that RNAi effects can be elicited by feeding worms E. coli carrying plasmids expressing both sense and anti-sense RNA driven from inducible promoters.[41] This technique has the advantage that the costs associated with synthesizing dsRNA are reduced and the bacteria carrying the dsRNA producing plasmids become renewable resources that can be used for multiple purposes.

The methods for applying RNAi thus far discussed produce only transient phenotypes. However, it is possible to generate RNAi inducing transgenes that are stably inherited in C. elegans. This method was developed to investigate the mutant phenotypes of genes that play a role in neuronal development.[42] Many neuronal genes had been refractory to other forms of RNAi. Tavernarakis et al.[42] expressed dsRNA in neuronal cells utilizing a transgene under the control of the heat-shock promoter (fig. 2). The snap-back transgene carries 2 copies of a gene-specific sequence oriented in a head-to-head inverted repeat; the RNA molecule expressed from this transgene is thus capable of folding back on itself to generate dsRNA. The heat shock promoter drives expression of the transgene in most somatic cells, including neurons. The direct expression of dsRNA in neurons appears to overcome the RNAi resistance previously encountered using other modes of RNAi. This result suggests that the RNAi effect may be unable to cross neuronal cell boundaries.[42]
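The logic of the snap-back design can be illustrated with a short Python sketch: joining a gene-specific fragment to its own reverse complement yields a sequence whose transcript is self-complementary and can fold back into dsRNA. The function names, the optional spacer, and the order of the two arms are assumptions made for illustration; they are not taken from the construct described by Tavernarakis et al.[42]

```python
# Illustrative sketch only; names, spacer and arm order are assumptions.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def snap_back_insert(gene_fragment: str, spacer: str = "") -> str:
    """Build an inverted repeat: fragment, optional spacer, reverse complement."""
    frag = gene_fragment.upper()
    return frag + spacer.upper() + reverse_complement(frag)


def is_inverted_repeat(seq: str, arm_len: int) -> bool:
    """True if the first and last arm_len bases can pair with each other."""
    return seq[:arm_len].upper() == reverse_complement(seq[-arm_len:])


if __name__ == "__main__":
    insert = snap_back_insert("ATGC", spacer="TTTT")
    print(insert)
    print(is_inverted_repeat(insert, 4))
```

The spacer stands in for the short loop that would separate the two arms of the hairpin; whether a given construct includes one is not stated in the text.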

As observed with neuronal genes, the method by which dsRNA is delivered to cells and the nature of the gene being studied may affect the ability to detect an RNAi phenotype. Thus it is advisable to attempt different methods for applying RNAi if a mutant phenocopy is not immediately apparent. For genes that act in the embryo, injection into the syncytial gonad of the parent may be the most efficient method for eliminating both maternal and zygotic gene activities in the progeny. For genes expressed later in development, it has been reported that feeding results in 50% more post-embryonic phenotypes than injection.[10] The increased efficacy of feeding may be due to the fact that the animals have a continuous input of dsRNA while feeding.

There are additional considerations to note when performing RNAi. Although RNAi usually leads to the loss of gene function, this loss of activity may range from a partial to a complete loss of activity. In addition, tissues may reveal a differential susceptibility to RNAi. It has also been reported that dsRNA templates derived from different regions of the same gene may produce a gradation of RNAi responses.[18]

8. High-Throughput RNAi Facilitates the Global Analysis of Gene Function

The suitability of RNAi as a method for analysing genes on a global scale has been demonstrated in 2 independent reports that used RNAi to explore the mutant phenotypes of all predicted genes on chromosome I[10] and chromosome III.[11] The studies used different methods to administer the dsRNA: Fraser et al.[10] administered dsRNA from feeding vectors expressing dsRNA corresponding to each predicted gene on chromosome I, whereas Gonczy et al.[11] microinjected dsRNA corresponding to each predicted gene on chromosome III. Both reported similar levels of success in detecting mutant phenotypes; 13.9% of tested genes produced a visible phenotype. Gonczy et al.[11] also found that only 1 or 2 dsRNA species should be included per microinjection to prevent repression of the RNAi phenomenon. The relative efficiency of RNAi in producing mutant phenotypes was tested by comparing 50 genes with known loss of function phenotypes to the RNAi results. 31 of the 50 genes tested resulted in a visible RNAi phenotype and 81% of these phenotypes matched the published phenotype for a corresponding mutant.[10] Embryonic phenotypes were also found to be more susceptible to RNAi than post-embryonic phenotypes. 19 of 21 tested embryonic genes exhibited an RNAi phenotype whereas only 14 of 31 post-embryonic genes exhibited an RNAi phenotype. These studies establish that RNAi is an effective method for screening large numbers of genes to obtain functional information.
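The proportions behind these comparisons are easy to recompute. The short Python sketch below converts the counts exactly as reported above into percentages; the helper name is hypothetical.

```python
# Illustrative arithmetic only; counts are taken as reported in the text.

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    print(percent(31, 50))  # known genes giving a visible RNAi phenotype: 62.0
    print(percent(19, 21))  # embryonic genes with an RNAi phenotype: 90.5
    print(percent(14, 31))  # post-embryonic genes with an RNAi phenotype: 45.2
```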

9. Direct Applications of RNAi for Drug Discovery

RNAi phenotypes form the basis for screening compound libraries in order to identify compounds that enhance or ameliorate specific phenotypes (fig. 3). For example, dsRNA directed against the C. elegans RAD51 homolog results in increased germline apoptosis.[43] Mutations in the DNA checkpoint/repair enzyme mrt-2 prevent Ce-rad-51-induced apoptosis. Therefore, it is now possible to screen for compounds that antagonise the DNA checkpoint/repair pathway by screening for compounds that ameliorate the apoptotic phenotype of Ce-rad-51 (RNAi).

Fig. 3

Screening compounds that affect a specific Caenorhabditis elegans phenotype. Mutant worms displaying an easily scored phenotype, such as egg-laying defective (Egl), are targets for assaying the ability of pharmaceuticals to ameliorate or enhance the selected phenotype. Mutants can be generated either genetically or by using RNAi. Worms are easily propagated in multiwell plates in order to facilitate the parallel screening of multiple compounds.

In a similar manner, it is possible to screen for compounds that phenocopy or mimic an RNAi phenotype. In the case of the C. elegans presenilin homolog, SEL-12, it is possible to screen for compounds that phenocopy the sel-12 egg laying defective phenotype. Compounds resulting in an egg laying block may act in the same pathway as the presenilins and function as inhibitors of presenilin. The availability of the entire C. elegans genomic sequence coupled with the ability to generate dsRNA against any given sequence provides researchers with the opportunity to generate global bacterial libraries expressing dsRNA corresponding to all predicted genes in the C. elegans genome. Such libraries can be used for multiple purposes. For example, worms grown on these bacteria can be treated with a compound that elicits a known phenotype in order to search for genes that enhance or suppress the activity of the compound. It also becomes possible to identify loci that are resistant to the effects of a drug or pesticide and in turn, the identification of these resistant genes can provide rapid information about the possible mode of action of the compound. Screens have been performed using classical genetics to identify mutants that are resistant to the action of drugs such as fluoxetine[44] and the anthelmintic ivermectin.[45] In the case of fluoxetine, resistant mutants were identified that defined a novel gene family encoding multipass transmembrane proteins. These results may have clinical implications relating to the mechanism of fluoxetine action and may identify new targets for drug discovery.

10. Corroborating RNAi Phenotypes by Obtaining a Knockout Deletion Mutant

Presently, C. elegans lacks an efficient system for site-directed homologous recombination and the search for a gene deletion can be laborious, especially when compared to RNAi. However, once sufficient incentive is generated to pursue a target gene of interest, it is usually advisable to generate a chromosomal gene knockout before pursuing further study. Additionally, in cases where there are few potential targets of clear importance it may be more expedient to obtain a genetic knockout of the target gene rather than utilizing RNAi. In the case of Niemann-Pick type C disease, there are only 2 homologs of the human NPC1 gene, designated npc-1 and npc-2.[46] Instead of using RNAi to phenocopy the two C. elegans homologs, which might have proven difficult due to functional redundancy or cross-interference, Sym et al.[46] generated knockout mutations in both npc-1 and npc-2 using the PCR-mediated gene knockout technique described below.

The primary strategy for generating a gene knockout uses high-throughput PCR to detect a chromosomal rearrangement in the gene of interest by screening complex combinatorial libraries of worms (fig. 4). Normally the libraries are divided in half in order to prepare DNA and also to save the remaining live worms as frozen stocks. This strategy was initially developed to detect the presence of transposon insertions in the target gene of interest using PCR and a combination of gene-specific and transposon-specific primers. The desired insertion/deletion worm was identified after multiple rounds of sibling selection in which pools testing positive for a rearrangement were continuously sub-divided into pools until the desired single worm carrying the mutant allele was found.[47] Unfortunately, transposon insertions are often silent because they preferentially integrate in A/T rich sequences, such as introns. Therefore, this method often requires a second step to screen for an imprecise excision event that either removes exonic sequences or creates a frameshift mutation.

Fig. 4

A strategy for identifying a Caenorhabditis elegans deletion mutant. The F1 progeny of animals mutagenized with trimethyl psoralen (TMP) and long wave UV light (UV) are cultured in multiwell plates to generate a worm library. Half of the library is used to generate genomic DNA and the remaining half is stored to maintain a population of live worms. The isolated DNA is pooled combinatorially and PCR is performed using nested primers to identify a deletion in the targeted gene. The deletion allele is detected by agarose gel electrophoresis as a band that migrates with higher mobility than the wild-type allele. A single worm carrying the deletion allele is isolated by sib-selection from the worm library address identified from the PCR screen.

To circumvent the need to perform two rounds of PCR screening in order to find a gene rearrangement that produces a mutant phenotype, a direct mutagenesis approach is now being followed. The mutagen trimethylpsoralen (TMP) coupled with long wave ultraviolet light (UV) appears to be the most effective method for generating small deletions.[48] Thus, combinatorial libraries of TMP/UV mutagenized worms have been generated.[48] A deletion allele is detectable during the PCR screening of these libraries because the synthesis of the shorter deletion product is generally favoured over the longer wild-type product. Moreover, because deletions are generated at random, it is usually possible to screen for a specific deletion event and to recover a single worm carrying this allele (fig. 4).
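The idea of screening a pooled library and then tracing a positive signal back to a library address can be sketched in Python. The scheme below is an assumption chosen for illustration: DNA from a 96-well plate is pooled by row and by column, and the well holding the deletion is inferred from the positive row and column pools. The actual pooling layout used for the TMP/UV libraries is not specified in the text, and all names here are hypothetical.

```python
# Illustrative sketch only; the row/column pooling layout is an assumption.
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)


def build_pools() -> dict:
    """Map each pool name to the wells whose DNA it contains."""
    pools = {f"row_{r}": [f"{r}{c}" for c in COLS] for r in ROWS}
    pools.update({f"col_{c}": [f"{r}{c}" for r in ROWS] for c in COLS})
    return pools


def pcr_positive_pools(pools: dict, wells_with_deletion: set) -> list:
    """Simulate the PCR screen: a pool is positive if any member well has the deletion."""
    return sorted(
        name for name, wells in pools.items() if wells_with_deletion & set(wells)
    )


def decode_addresses(positive_pools: list) -> list:
    """Intersect positive row and column pools to get candidate well addresses."""
    rows = [p.split("_")[1] for p in positive_pools if p.startswith("row_")]
    cols = [p.split("_")[1] for p in positive_pools if p.startswith("col_")]
    return [f"{r}{c}" for r, c in product(rows, cols)]


if __name__ == "__main__":
    pools = build_pools()
    positives = pcr_positive_pools(pools, {"C7"})
    print(positives)
    print(decode_addresses(positives))
```

With this layout, 20 PCR reactions cover 96 wells. When more than one well is positive, the intersection yields several candidate addresses, which would then be resolved by testing the candidate wells individually before sib-selection from the corresponding live worms.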

11. Perspectives and Conclusions

As more potential drug targets are uncovered by genomic approaches, techniques must be developed to rapidly validate good candidate targets. Model organisms such as Drosophila, S. cerevisiae, and Caenorhabditis elegans will play a significant role in validating these targets. The sequenced genome of C. elegans has led to several global techniques for assaying the biological roles of the genes encoded in the genome, including microarray analysis of gene expression,[49] whole genome RNAi screening,[10,11] genome-wide 2-hybrid analyses of protein interactions[50] and high-throughput screening for PCR-mediated gene knockouts.[48] The high-throughput capacity of RNAi makes it a particularly attractive method for rapidly screening and validating targets identified by protein-protein interaction studies, microarray analyses or in silico gene predictions. With such a robust validation system available, it is possible to use RNAi to determine the gene function of entire classes of predicted proteins with the potential to be involved in human disease, for example phosphatases, kinases, or nuclear hormone receptors. In conclusion, if a C. elegans homolog of a disease-associated gene exists, then there is a strong likelihood that the biology of C. elegans may be advantageous in providing information crucial for drug discovery and development.