The classical progression of the pharmaceutical discovery process goes from drug target to lead compound to drug. Effective discovery of disease-associated targets for further validation is the first critical step in this process. The more information we have about potential drug targets, the more opportunities we have to develop successful drugs. Genomics research has deepened the pool of potential drug targets; however, a major challenge for drug development continues to be the rapid and accurate identification of drug targets with true potential. It is reported that just 483 drug targets account for nearly all the drugs currently on the market (45% receptors, 28% enzymes, 5% ion channels, and 2% nuclear receptors).[1] However, it is estimated that there might be thousands of drug targets within the human genome, indicating the huge potential for drug target discovery.[2–4] Currently, most of the new drugs approved by the regulatory authorities are based on protein targets for which marketed drugs already exist.[5] Addressing this ‘innovation gap’ has resulted in the new paradigm of genomics-based drug discovery, in which bioinformatics plays a key role in exploiting genomic, transcriptomic, and proteomic data to gain insights into the molecular mechanisms that underlie disease and to search for targets that will lead to new drugs.[6–8]

In this review, some of the data resources and computational methods for the identification of potential drug targets are summarized.

1. Defining Drug Targets

Drug targets are membrane or cellular receptors or other molecules that are pivotally involved in a disease process. From a pharmacological viewpoint, a drug target is either inhibited or activated by drug molecules (e.g. small organic molecules, antibodies, therapeutic proteins). Drug molecules can physically attach to a drug target, triggering a cascade of intracellular biochemical reactions, followed by a cellular reaction. Potential drug targets can include:

  • genes that are differentially expressed between individuals who are and are not in need of treatment for a particular disease or condition;

  • genes that are differentially expressed when an individual is exposed to a drug known to alleviate or exacerbate the symptoms of interest;

  • genes that are co-expressed with other genes presumed to be involved in the systems and pathways under study;

  • genes that serve as pathway initiators.

Any gene (or its product) falling into any one of those categories may be a gene for which manipulation of its expression might affect disease or symptom progression.[9]

1.1 Characteristics of a Putative Target

Based on analysis of the molecular targets of current therapies, biologists have found that the most successful drug targets share several basic characteristics.[10] First, the most successful targets tend to be amenable to medical intervention using therapeutic drugs that fall into three major classes: small molecules, antibodies, or therapeutic proteins. Second, inhibiting or activating the target should have a clear therapeutic effect. Third, a drug target should have robust assay systems for in vitro characterization and high-throughput screening. In addition, an ideal target should be specific to, and essential for, the disease process, and targeting it should not only address unmet medical needs but also serve major medical markets. These principles have been consistent traits of targets of proven value and can therefore be used as a simple set of rules to guide target discovery, validation, and development.

1.2 Successful Target Classes

Certain classes of proteins are more amenable to drug development than others. Historically, G-protein coupled receptors (GPCRs) have been the major drug target class for the pharmaceutical industry, with ion channels, nuclear hormone receptors, proteases, kinases, phosphodiesterases, phosphatases, and other key enzymes making up the remaining target classes.[2,7,11] Proteins within these target families tend to exert a biological effect that is amplified within cells or organisms by a variety of signaling mechanisms.

1.2.1 G-Protein Coupled Receptors

GPCRs constitute one of the most important families of drug targets in the pharmaceutical industry and are central to the signaling networks that regulate basic cellular processes.[12,13] Over the last decade the number of characterized GPCRs has grown steadily, and more than 700 GPCR genes have been identified in the human genome.[14] Given the track record of GPCRs as validated drug targets, the vast number of potentially untapped targets within this superfamily presents an intriguing challenge for drug target discovery. Despite their importance, the power and utility of microarray technology have not been extended to membrane proteins because of significant technical challenges associated with their fabrication and use. GPCRs, like other membrane-embedded proteins, have characteristics that make their three-dimensional structures extremely difficult to determine experimentally. To date, the only GPCR whose three-dimensional structure has been solved is bovine rhodopsin.[15] As a result, structure-based in silico drug discovery methods cannot be used effectively for GPCR targets, and the design of GPCR ligands has to rely on ligand-based techniques.

As GPCRs are proven to be important drug targets, the pharmaceutical industry is devoting enormous amounts of money and manpower to identify these targets. The success of this monumental effort depends on the correct identification and delineation of the functions of GPCRs and effectively applying that information toward drug discovery.[13] One of the key areas for innovation is further development of two-hybrid methods suitable for GPCRs and other transmembrane proteins.

1.2.2 Ion Channels

Ion channels are another attractive drug target class, for several reasons. First, ion channels are required in various normal physiological processes, and their dysfunction can have a strong impact on cellular function and signaling. Second, ion channels belong to one of the few protein classes that are highly amenable to regulation by small molecule drugs. Third, ion channels are expressed in numerous cell types and occur as large families of related genes with cell-specific expression patterns. Despite their remarkable physiological value, ion channels remain a relatively unexploited therapeutic target class, especially in comparison with target areas such as GPCRs or kinases. Major challenges have been the lack of high-throughput screening assays and of targeted libraries of candidate ion channel modulators.[16–18]

However, ion channels are currently experiencing renewed interest from pharmaceutical and drug discovery companies because of progress in new high-throughput technologies. The large number of diseases attributable to ion channel dysfunction is the primary driver for the development of ion channel targets. According to a 2003 report by the US Food and Drug Administration (FDA), the number of newly approved drugs targeting ion channels is equal to or even higher than the number targeting proteases, polymerases, and reverse transcriptases. In the post-genomics era, progress in functional genomics will reveal the tissue-specific distributions of ion channels and give a greater understanding of these proteins, so ion channels will play an increasingly important role as therapeutic drug targets in a number of areas, including asthma, inflammation, arrhythmia, and CNS disorders.[19]

1.2.3 Nuclear Hormone Receptors

Nuclear hormone receptors are outstanding targets for drug discovery, not only because of their profound roles in human physiology and diseases but also because their structures allow them to interact with small chemical molecules.[20]

The current members of the nuclear receptor gene family can be divided into two main classes: the ‘validated’ nuclear receptors, whose ligands and endocrine pathways are established and as a result serve as bona fide drug targets for human disease; and the ‘orphan’ nuclear receptors, whose ligands, target genes, and physiological function are not completely understood, and offer new first-in-class targets for large therapeutic areas, in particular, cardiovascular and metabolic disorders. In addition, several members of the nuclear hormone receptor superfamily are directly involved in tumor progression, or conversely, have shown tumor-suppressive potential through modulation of cell proliferation, differentiation, and apoptosis (e.g. the anticancer drugs tamoxifen and flutamide act by targeting nuclear receptors). Using advanced structure-based bioinformatics tools, Inpharmatica Ltd has identified 16 proteins with previously unrecognized structural similarity to the ligand-binding domain of the nuclear receptors, all clearly outside of the known family members.[21] The detailed knowledge of the structural mechanism underlying activation and inhibition of nuclear receptors by small molecule modulators begets important therapeutic opportunities.

1.2.4 Proteases and Kinases

Given the importance of altered protease expression/function in many diseases, proteases and their substrates are increasingly viewed as important drug targets.[22,23] Proteases exert high-order post-translational control over a diverse range of cellular functions. Elucidating the substrate repertoire of a protease is critical to understanding its biological role. Serine proteases, the largest human protease gene family, have been implicated in the growth and progression of solid tumor cancers, including breast and prostate cancer.

Protein kinases also present distinctive advantages as potential drug targets.[24] The human genome contains over 500 different protein kinases, which are the key regulatory enzymes that catalyze the phosphorylation of proteins at about 100 000 different sites to reversibly control their functional activities. Defects in specific protein kinases have been linked to over 400 diseases, including cancer, diabetes mellitus, and Alzheimer disease, and about 25% of all pharmaceutical industry research and development is now focused on the discovery and evaluation of protein kinase inhibitors for therapeutic applications.[25] Numerous specific kinases have been identified as attractive drug targets for inflammation, cancer, and other diseases. It has been reported that cyclin-dependent kinase-5 (CDK5) may play a role in microtubule-associated protein tau (MAPT) phosphorylation and contribute to the pathogenesis of Alzheimer disease.[26]

2. Strategies for Drug Target Identification

2.1 Tools and Resources for Drug Target Identification

Drug target identification involves acquiring a molecular-level understanding of a specific disease state and includes analysis of gene sequences, protein structures, protein-protein interactions, and metabolic pathways.[27,28] The ultimate goal of the process is to discover macromolecules that can become binding targets for lead compounds, each one a potential drug. In the age of genomics, the process of drug target identification needs to incorporate and integrate different sources of data, including genetic, transcriptomic, proteomic, and metabolomic data. Relational databases are increasingly effective in facilitating pharmaceutical research and development, as they broaden the range of analytical functions and expand the class of data models supported.
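
As a minimal sketch of the kind of integration a relational database supports, the Python example below uses the standard-library `sqlite3` module to join two small tables: one of differential expression results and one of genes assigned to druggable target families. The table names, column names, gene identifiers, values, and the |log2 fold change| ≥ 1 cut-off are all hypothetical placeholders, not the schema or content of any database in Table I.

```python
import sqlite3


def druggable_hits(expression_rows, family_rows, min_abs_log2_fc=1.0):
    """Join hypothetical expression results with hypothetical target-family
    assignments; return (gene, family, log2_fc) rows, strongest change first."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE expression (gene TEXT PRIMARY KEY, log2_fc REAL)")
        con.execute("CREATE TABLE target_class (gene TEXT PRIMARY KEY, family TEXT)")
        con.executemany("INSERT INTO expression VALUES (?, ?)", expression_rows)
        con.executemany("INSERT INTO target_class VALUES (?, ?)", family_rows)
        # The inner join keeps only genes present in both data sources.
        return con.execute(
            """SELECT e.gene, t.family, e.log2_fc
               FROM expression AS e
               JOIN target_class AS t ON e.gene = t.gene
               WHERE ABS(e.log2_fc) >= ?
               ORDER BY ABS(e.log2_fc) DESC""",
            (min_abs_log2_fc,),
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    expression = [("GENE_A", 2.3), ("GENE_B", -1.4), ("GENE_C", 0.2)]
    families = [("GENE_A", "GPCR"), ("GENE_C", "kinase"), ("GENE_D", "ion channel")]
    print(druggable_hits(expression, families))
```

Only GENE_A survives here: GENE_B has no target-family assignment, and GENE_C changes too little.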

Table I lists some important databases for drug target identification. One of the most important resources is the human genome itself and its associated annotations. The public data infrastructure is as important as the data themselves and includes algorithms for sequence analysis, gene expression analysis, proteomics analysis, and protein structure prediction, one of the most computationally intensive exercises in the drug discovery process.[29] Although these resources represent a good general reference, they also have significant limitations; most importantly, in many cases they do not contain enough data to be used effectively for identifying drug targets.

Table I

Databases and web sites of interest for drug target identification

2.2 Bioinformatics Strategies for Drug Target Identification

With the development of bioinformatics, many computational techniques have been proposed for searching genomic information for novel drug targets. In this paper, the bioinformatics approaches for drug target identification are grouped into four classes: (i) the gene-to-target approach; (ii) the disease-to-target approach; (iii) the gene network approach; and (iv) the protein interaction network approach.

2.2.1 The Gene-to-Target Approach

Selection of a Certain Target Class

For the gene-to-target strategy, the first step is to select a common class of drug targets, then to design computational methods to find new members of this class and to predict their function based on available knowledge about the target class. Suitable target classes are those protein families whose members have historically proven to be successful targets, such as GPCRs, ion channels, kinases, and nuclear hormone receptors. Here we select GPCRs as a case study because of the enormous amount of current pharmaceutical research aimed at understanding their structure and function.

Predicting New Target Genes from a Certain Class

Once we have chosen a particular class of targets, the next step is to screen sequence databases and identify all the possible candidates of that class. Recent studies demonstrated that discovering new members of a target class is important not only for finding useful drug targets but also for understanding the molecular basis of diseases. Early efforts to predict new targets of a particular class relied on two strategies.

  • Data mining the genome: mining the human genome sequence can detect new protein coding genes and find new members of particular target classes.[27,50,51] GPCRs represent the most important target class for drug discovery, and many strategies have been used to identify novel GPCRs in various sequenced genomes. The common strategies attempt to find sequences similar to known GPCRs in sequence databases using primary database search tools (e.g. BLAST) or more sophisticated tools coupled with searches of pattern databases such as PRINTS.[52,53] However, in many cases these have not been sufficiently successful for the identification of GPCRs, because GPCRs make up a highly divergent family, with strikingly little sequence similarity shared between members. To overcome these limitations, other in silico approaches that incorporate features such as amino acid composition, physicochemical properties,[54,55] and transmembrane topology patterns of GPCRs have been proposed.[56,57] In addition, the incorporation of ab initio gene prediction techniques should also be useful in the discovery of new GPCR targets.[58]

  • Data mining the expressed sequence tags (ESTs): genome-wide sequencing projects aim to identify all genes contained in genomes. The huge number of ESTs provides a valuable resource for gene identification, characterization, and tissue-specific gene expression analysis.[7,59,60] One of the most important applications of EST databases (e.g. dbEST) in target discovery is to identify new genes of a target class and infer relative gene expression levels.[61] Wittenberger et al.[62] demonstrated a comprehensive EST database search method to identify new members of the GPCR superfamily. They found at least 14 ESTs that are promising candidates for new putative GPCRs, and five of them, namely the GPR84, GPR86, GPR87, GPR90, and GPR91 sequences, were experimentally validated. Furthermore, GPR86 was found to be central to the pathophysiology of hematopoiesis and immune system disease. Marvanová et al.[63] also investigated the use of ESTs as a starting point to map brain expression patterns and to identify potential novel drug targets. Several factors prevent ESTs from being widely exploited, including alternative splicing and the error-prone nature of EST sequences. Further studies are needed to tackle these problems so that ESTs can be more fully utilized.[64]
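
The transmembrane-topology idea mentioned in the genome-mining item above can be illustrated with a deliberately crude screen. The sketch below, which assumes the classic Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 cut-off, counts hydrophobic stretches and flags sequences with exactly seven as GPCR-like. It is not one of the cited methods; modern topology predictors are considerably more accurate, and this heuristic will miss GPCRs whose helices merge or fall below the cut-off.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence (unknown residues score 0)."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def count_tm_segments(seq, window=19, threshold=1.6):
    """Count contiguous runs of windows whose mean hydropathy reaches the threshold;
    each run is taken as one putative transmembrane segment."""
    count, inside = 0, False
    for value in hydropathy_profile(seq, window):
        if value >= threshold and not inside:
            count += 1
        inside = value >= threshold
    return count


def looks_like_gpcr(seq):
    """Crude screen: GPCRs have seven transmembrane helices."""
    return count_tm_segments(seq) == 7


if __name__ == "__main__":
    # Synthetic test sequence: seven 20-residue leucine blocks separated by aspartate loops.
    synthetic = "D" * 15 + ("L" * 20 + "D" * 15) * 7
    print(count_tm_segments(synthetic), looks_like_gpcr(synthetic))
```

The synthetic sequence is counted as having seven segments, while an all-aspartate sequence has none.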

Predicting the Function of New Genes

One essential requirement for a drug target to be useful is to understand its function.[27] The elucidation of gene function in silico is an important field for bioinformatics in target discovery. Nevertheless, determining protein function is one of the most challenging problems in the post-genomic era.

Functional annotation of completely sequenced genomes has proved to be a formidable task, and a large fraction of genes remains uncharacterized. Even in well-studied genomes, such as Escherichia coli, ∼30% of the genes are annotated as being of unknown function. In the malarial parasite Plasmodium falciparum, ∼60% of genes lack functional assignments.[65] A significant limitation in understanding gene function is the lack of assays evaluating signal-specific cellular metabolic events downstream of the anticipated changes in gene expression and protein phosphorylation. The availability of entire genome sequences and high-throughput capabilities to determine gene function has shifted the research focus from the study of single proteins or small complexes to that of the entire proteome. However, the technology for discovering gene function is lagging behind the advances made in genomic sequencing.

Accurate computational function prediction, which helps speed up the functional annotation of gene products, has become an increasingly important problem in bioinformatics.[66–69] By searching for similar protein sequences with known functional annotations, one can draw inferences about the function of an uncharacterized gene. More sophisticated methods that incorporate sequence information such as sorting signals, post-translational modifications, and domains to predict protein function have also been developed.
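
The homology-based inference described above can be sketched as nearest-neighbour annotation transfer. To keep the example self-contained, the sketch below swaps in shared 3-mer (Jaccard) similarity as a crude stand-in for an alignment tool such as BLAST; the similarity threshold of 0.3 and the example sequences and labels are hypothetical.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_annotation(query, annotated, k=3, min_similarity=0.3):
    """Return (label, similarity) of the most similar annotated sequence,
    or (None, best_similarity) if no hit reaches the threshold."""
    q = kmers(query, k)
    best_label, best_sim = None, 0.0
    for seq, label in annotated.items():
        sim = jaccard(q, kmers(seq, k))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < min_similarity:
        return None, best_sim
    return best_label, best_sim


if __name__ == "__main__":
    # Hypothetical annotated reference set.
    reference = {
        "MKTAYIAKQRQISFVKSHFSRQ": "kinase-like",
        "GGSGGSGGSWWPPWWPP": "unknown-fold",
    }
    print(transfer_annotation("MKTAYIAKQRQISFVKSHFSRE", reference))
```

Returning `None` below the threshold reflects the usual caution that weak similarity should not be used to assign function.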

The function of a protein is highly correlated with its three-dimensional structure, and structural information is also very important to drug discovery and design. However, three-dimensional structure information is lacking for many known protein sequences. Therefore, further studies are needed to develop more accurate structure prediction methods and strategies for linking structure to function, and heterogeneous data should be integrated to tackle this problem. Recently, some researchers have begun to use systems biology approaches to predict target gene function.[70]

2.2.2 The Disease-to-Target Approach

Focusing on a Specific Disease

The identification of therapeutic targets requires knowledge of the etiology of a disease and the associated biological systems. The disease-to-target approach first focuses on a specific disease, or at least diseases in specific therapeutic categories. Then, various techniques such as gene expression analysis and linkage analysis are adopted to identify disease relevant genes and drug targets. Many pharmaceutical companies have focused on specific diseases for drug target identification.

Identifying Disease Relevant Genes and Drug Targets

Microarray technology, which measures the expression levels of thousands of genes simultaneously in a single experiment to generate gene expression profiles, can be utilized to discover disease relevant genes and drug targets.[71] Microarray experiments can not only identify novel candidate molecular targets and biochemical pathways that may be therapeutically exploited, but also increase our understanding of the biology of a disease process and define how a specific compound affects the regulatory networks involved in cellular metabolism or a specific cellular pathway, which may ultimately lead to the identification of other potential drug targets.[72] The first step is to compare gene expression patterns between healthy tissue and tissue at various disease stages, and to identify the genes that are differentially expressed across these conditions. The subsequent process then focuses on whittling down the candidate genes to those that seem central to the disease process and whose products are likely to be amenable to therapeutic intervention. For example, Cellzome Ltd (Heidelberg, Germany) has used this strategy to identify and validate potential drug targets associated with Alzheimer disease, and has developed a series of small-molecule gamma secretase modulators for the treatment of patients with this disease.
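
A minimal sketch of the first step, comparing healthy and disease samples gene by gene, is shown below. It assumes expression values are already log2-transformed, computes a log2 fold change as a difference of means, and ranks genes passing a hypothetical |log2 fold change| ≥ 1 filter by Welch's t-statistic. It applies no multiple-testing correction and is not the procedure of any study cited here; real microarray analyses typically use moderated statistics and false discovery rate control.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic (unequal variances); each sample needs at least two values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def rank_genes(healthy, disease, min_abs_log2_fc=1.0):
    """healthy, disease: dict gene -> list of log2-scale expression values.
    Returns (gene, log2_fc, t) for genes passing the fold-change filter, by |t| descending."""
    results = []
    for gene in sorted(healthy.keys() & disease.keys()):
        h, d = healthy[gene], disease[gene]
        log2_fc = mean(d) - mean(h)  # difference of means on the log2 scale
        if abs(log2_fc) >= min_abs_log2_fc:
            results.append((gene, log2_fc, welch_t(d, h)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical log2 expression values, three samples per group.
    healthy = {"G1": [5.0, 5.1, 4.9], "G2": [7.0, 7.2, 6.8], "G3": [3.0, 3.1, 2.9]}
    disease = {"G1": [7.0, 7.2, 6.9], "G2": [7.1, 6.9, 7.0], "G3": [1.0, 1.2, 0.8]}
    for gene, fc, t in rank_genes(healthy, disease):
        print(gene, round(fc, 2), round(t, 1))
```

With these invented values G2 is filtered out (no change), and G1 (up-regulated) ranks above G3 (down-regulated) because its t-statistic is larger in magnitude.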

Separating genes causally involved in a disease from innocent bystander genes is a crucial problem in the analysis of disease expression profiles.[73] Many statistical methods have been proposed to detect expression differences in single genes.[74] These methods generally produce long lists of differentially expressed genes, but they provide few clues as to which of these changes are important.[75] One promising method is to analyze expression alterations at a functional level, such as biological pathways; this has tremendous potential to detect subtle but coordinated alterations in the expression of groups of functionally related genes, and to unveil the genes and functions that contribute most to disease. Several tools currently provide such analysis, e.g. OntoExpress,[76] GOAL,[77] and MageKey.[78]
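
A common way to test pathway-level alterations is an over-representation test based on the hypergeometric distribution: given N genes on the array, K of them in a pathway, and n differentially expressed genes, it asks how likely an overlap of k or more would be by chance, P(X ≥ k) = Σ C(K, i)·C(N − K, n − i) / C(N, n) for i from k to min(K, n). The sketch below is a generic illustration of this idea and does not reproduce the exact statistics or options of OntoExpress, GOAL, or MageKey; gene and pathway names in the usage are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, in_pathway, drawn):
    """P(X >= k) when `drawn` genes are picked at random from `population` genes,
    `in_pathway` of which belong to the pathway (hypergeometric upper tail)."""
    upper = min(in_pathway, drawn)
    tail = sum(comb(in_pathway, i) * comb(population - in_pathway, drawn - i)
               for i in range(k, upper + 1))
    return tail / comb(population, drawn)


def pathway_enrichment(de_genes, pathways, universe):
    """Return (pathway, overlap, p_value) tuples sorted by p-value.
    Only genes in `universe` (e.g. all genes on the array) are counted."""
    de = set(de_genes) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(de & members)
        p = hypergeom_sf(overlap, len(universe), len(members), len(de))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(20)}
    pathways = {"A": {f"g{i}" for i in range(5)}, "B": {f"g{i}" for i in range(10, 15)}}
    de_genes = {"g0", "g1", "g2", "g3"}
    print(pathway_enrichment(de_genes, pathways, universe))
```

Here all four differentially expressed genes fall in pathway A, giving p = 5/4845 (about 0.001), whereas pathway B has no overlap and p = 1.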

For inherited diseases, analyzing chromosome regions that are linked to disease phenotypes can also identify relevant genes and potential drug targets. Linkage analysis has been widely used to locate disease loci.[79,80] Within the chromosome region of a disease locus mapped by this strategy, however, there are often hundreds of candidate genes. To find the disease-relevant gene, further experiments are needed to check the candidate genes for disease-causative mutations. Selecting candidate genes at random for this experimental search would be very time-consuming and expensive. Therefore, the prediction of disease-relevant genes and the prioritization of candidate genes for mutation analysis is one of the crucial steps in the identification of disease relevant genes.[68,81–83] Currently, the major information used to choose candidate disease genes for mutation analysis includes gene function, gene expression patterns, and features of gene sequences, and several computational methods and tools based on this information have been developed in recent years to predict disease relevant genes.[68,81–84] In addition, Pettipher et al.[85] used a genetic association approach to identify GPCRs involved in inflammatory disease, leading to the identification of genetically associated targets, including TSHR, EDG6, and CRTH2. In this regard, the discovery of disease relevant genes may provide an essential starting point for drug discovery.
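
Prioritization of the genes inside a linkage interval can be pictured as a weighted score over the kinds of evidence listed above (gene function, expression pattern, sequence features). In the sketch below the per-gene feature scores, the weights, and the interval boundaries are hypothetical inputs; published prioritization tools use their own, more principled scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    position_mb: float       # chromosomal position in megabases (hypothetical)
    function_score: float    # 0..1, similarity of annotated function to known disease genes
    expression_score: float  # 0..1, expression in the affected tissue
    sequence_score: float    # 0..1, sequence features shared with known disease genes


# Hypothetical evidence weights; they sum to 1 so scores stay in 0..1.
WEIGHTS = {"function_score": 0.5, "expression_score": 0.3, "sequence_score": 0.2}


def prioritize(candidates, start_mb, end_mb, weights=WEIGHTS):
    """Keep candidates inside the linkage interval and rank them by weighted evidence."""
    in_interval = [c for c in candidates if start_mb <= c.position_mb <= end_mb]

    def score(c):
        return sum(w * getattr(c, name) for name, w in weights.items())

    return sorted(in_interval, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("A", 10.0, 0.9, 0.2, 0.5),
        Candidate("B", 12.0, 0.2, 0.8, 0.5),
        Candidate("C", 30.0, 1.0, 1.0, 1.0),  # outside the interval, so excluded
    ]
    print([c.gene for c in prioritize(candidates, 5.0, 20.0)])
```

The ranked list then sets the order in which candidates would be screened for disease-causative mutations.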

Building Predictive Disease Models

Moving all targets forward through development is prohibitive in terms of cost and time. From the perspective of drug target identification for human diseases, predictive disease models that are suitable for rigorous experimentation can support the case for discovery or validation of a target in humans. We cannot realistically hope to characterize all the relevant molecular interactions one by one as a requirement for building a predictive disease model. The ultimate goal of the disease model is to be able to model a disease process at the molecular level, to predict which specific chemical compounds are best suited to treating the disease in a genetically defined patient population, and to perform all binding experiments in silico. Intradigm Corporation (London and Cambridge, UK) has developed a unique and proprietary method that uses efficacy in animal disease models as a starting point for target discovery and validation. Intradigm’s target discovery method, which combines gene perturbation in animal disease models with pathway analysis, selectively and rapidly identifies novel targets that operate in complex biological processes and are activated as disease pathology expands or contracts. Achieving these goals in silico will dramatically improve the drug discovery process and pave the way for personalized medicine based on a molecular-level understanding of both the patient and the illness.

Phenotypes are generally difficult to recognize and validate, especially at the cellular level. Providing an association between phenotype and genotype is critical to understanding and creating models of disease. This association is also the key to targeting critical pathways in disease and identifying the genes and proteins that regulate biological processes, thus identifying better drug targets. Predictive disease models that are suitable for rigorous experimentation can provide well-informed links between genotypes and individual phenotypes.[86]

Using Pharmacogenomics in Drug Target Identification

Inherited differences in drug targets, as well as polymorphisms in genes with a role in drug metabolism and disposition, have an influence on the efficacy and safety of therapeutics. The field of pharmacogenomics has the potential to lead to the identification of new drug targets, an improved understanding of the causes for variable drug response, and greater knowledge of the mechanistic basis for drug action and disease pathophysiology.[87,88] The critical strategy for a pharmaceutical company going forward is one that uses pharmacogenomics and biomedical informatics to better define disease targets. Until these are clear, and until some form of biomedical informatics is put into place, therapeutic design is going to be flawed by poorly defined targets.

The study of single nucleotide polymorphisms (SNPs) is crucial for characterizing molecular targets and can also validate the role of these targets in disease.[89] SNP technology is expected to contribute substantially to pharmacogenomics and personalized medicine, the understanding of disease mechanisms, and drug target discovery. An important prerequisite for these next-generation achievements is the ability to analyze complex biological associations and to identify their relevance to clinical problems. There are great expectations for the value of exploiting the accumulating genetic and biological data.[90]
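
A typical first analysis for a single SNP is a case-control allelic association test. The sketch below computes a 2 × 2 allelic chi-square statistic (1 degree of freedom, no continuity correction) and its p-value from allele counts; the counts in the usage are invented, and real studies add quality checks such as Hardy-Weinberg equilibrium testing, correction for multiple testing, and adjustment for population structure.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi2, p_value) for a 2 x 2 allelic test without continuity correction.
    Assumes every row and column total is non-zero."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
    print(allelic_chi_square((60, 40), (40, 60)))
```

For these counts every expected cell is 50, so the statistic is 8.0 and the p-value is erfc(2), roughly 0.0047.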

2.2.3 The Gene Network Approach

The aim of the network-based strategy is to reconstruct the endogenous metabolic, regulatory, and signaling networks with which potential drug targets interact. The rationale is that if a drug target participates in many biological pathways, inhibiting it may interfere with many activities associated with those pathways, and it may therefore not be a good drug target candidate.
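
The rationale above can be turned into a simple screen: count how many pathways each candidate participates in and flag those above a cut-off. In this sketch the pathway memberships and the cut-off of two pathways are hypothetical, and pathway count is only a crude proxy for the risk of interfering with many cellular activities.

```python
from collections import Counter


def pathway_counts(pathways):
    """pathways: dict pathway name -> iterable of genes. Returns gene -> number of pathways."""
    counts = Counter()
    for members in pathways.values():
        counts.update(set(members))  # set() so a gene counts once per pathway
    return counts


def flag_pleiotropic(candidates, pathways, max_pathways=2):
    """Split candidates into (kept, flagged) by the number of pathways they belong to."""
    counts = pathway_counts(pathways)
    kept = [g for g in candidates if counts[g] <= max_pathways]
    flagged = [g for g in candidates if counts[g] > max_pathways]
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical pathway memberships.
    pathways = {"P1": ["A", "B", "C"], "P2": ["A", "D"], "P3": ["A", "E"]}
    print(flag_pleiotropic(["A", "B", "D", "Z"], pathways))
```

Gene A, which appears in all three pathways, is flagged; genes absent from every pathway (such as Z) are kept by default.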

Genetic interactions are central to the understanding of the molecular structure and function, cellular metabolism, and response of organisms to their environments. If such interaction patterns can be measured for various kinds of tissues and the corresponding data interpreted, potential clinical benefits are obvious for diagnostics, identification of candidate drug targets, and predictions of drug effectiveness. It has already been shown that it is possible to infer a predictive model of a genetic network by overexpressing each gene of the network and measuring the resulting expression at steady state of all the genes in the network.[91] Using the inferred model, we can endeavor to make useful predictions by mathematical analysis and computer simulations. Model-based and computational analysis can open up a window on the physiology of an organism and disease progression. Recently, several computational methods have been proposed along with gene network models such as Boolean networks,[92] differential equation models,[93] and Bayesian networks,[94,95] to infer gene regulatory networks. These quantitative approaches can be applied to natural gene networks and used to generate a more comprehensive understanding of cellular regulation and elucidation of the underlying gene regulatory mechanisms.
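
Boolean networks are the simplest of the model classes listed above. The sketch below simulates a hypothetical three-gene network with synchronous updates, finds the attractor (the cycle or fixed point the network settles into), and compares it with an in silico knockout of one gene. The genes and rules are invented for illustration and are not inferred from data.

```python
# Hypothetical three-gene network: each rule computes a gene's next value from the current state.
RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}


def step(state, rules=RULES):
    """Synchronous update: every gene is recomputed from the same previous state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}


def find_attractor(state, rules=RULES, max_steps=100):
    """Iterate until a state repeats; return the repeating cycle (length 1 = fixed point)."""
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state, rules)
    raise RuntimeError("no attractor found within max_steps")


def knockout(rules, gene):
    """Return a copy of the rules with one gene clamped OFF (an in silico knockout)."""
    mutated = dict(rules)
    mutated[gene] = lambda s: False
    return mutated


if __name__ == "__main__":
    start = {"A": False, "B": False, "C": False}
    print(len(find_attractor(start)))               # length of the unperturbed cycle
    print(find_attractor(start, knockout(RULES, "C")))
```

Starting from all genes off, the unperturbed network cycles through five states, whereas knocking out C drives it to a fixed point with A and B on; comparing attractors before and after a perturbation is one way such models can suggest which nodes matter.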

2.2.4 The Protein Interaction Network Approach

Proteins are the principal targets of drug discovery. Protein expression in normal and diseased human tissue holds the key to developing more effective drugs to treat a wide variety of diseases. High-throughput proteomics, potentially identifying hundreds to thousands of protein expression changes in model systems following perturbation by drug treatment or disease, lends itself particularly well to target identification in drug discovery, and complements the genomics approach.

Protein-protein interaction data can be utilized in drug target identification.[96] Protein interaction maps can reveal novel pathways and functional complexes, allowing ‘guilt by association’ annotation of uncharacterized proteins and the assignment of these proteins to biochemical pathways and networks. Generation of a comprehensive human protein interaction map would facilitate identification of proteins that could be targeted for therapeutic and diagnostic applications. Once the pathways are mapped, they need to be analyzed and validated functionally in a biological model.[97] Numerous studies aim to map the pathways involved in disease processes. The goal is to identify the key nodes in a complex network of genes and proteins (and small metabolites, i.e. the metabolome) that can serve as drug targets.
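
‘Guilt by association’ annotation can be sketched as a majority vote over the annotated interaction partners of each uncharacterized protein. The interaction list and labels in the usage are hypothetical, and ties are broken alphabetically purely to make the output deterministic.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Undirected adjacency sets from a list of (protein, protein) interactions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def guilt_by_association(edges, annotations):
    """Predict a label for each unannotated protein by majority vote of its
    annotated neighbours; proteins with no annotated neighbours get no prediction."""
    graph = build_graph(edges)
    predictions = {}
    for protein, neighbours in graph.items():
        if protein in annotations:
            continue
        votes = Counter(annotations[n] for n in neighbours if n in annotations)
        if votes:
            best = max(votes.values())
            predictions[protein] = min(label for label, c in votes.items() if c == best)
    return predictions


if __name__ == "__main__":
    edges = [("X", "P1"), ("X", "P2"), ("X", "P3"), ("Y", "P3"), ("Z", "Q")]
    annotations = {"P1": "apoptosis", "P2": "apoptosis", "P3": "cell cycle"}
    print(guilt_by_association(edges, annotations))
```

X inherits "apoptosis" from two of its three partners, Y inherits "cell cycle", and Z gets no prediction because none of its partners is annotated.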

Using an in situ proteomics technology involving whole cell imaging, MelTec GmbH & Co. KG (Magdeburg, Germany) have tried to predict key proteins involved in disease pathways. The ability to monitor pathways with subcellular resolution in the proper tissue context further increases the significance of target predictions, and the ability to associate specific proteins with disease and to localize those proteins within tissues at the cellular level is of critical importance for identifying and validating drug targets.

Although a diagrammatic representation of the information on a pathway facilitates the understanding of the network topology and identification of drug targets, its capacity to predict cell behavior in response to an environmental or genetic change is very limited.[98] The identification of proteins is only the beginning of the process; data analysis and validation of potential protein targets that follows is a time-consuming and labor-intensive process as well.

With our ever-increasing understanding of the complexity of human protein interactions, which directly affect the safety and efficacy of therapeutic interventions, new technology in systems biology allows such data to be interpreted synergistically in the context of functional networks. Technically, cellular processes are represented as an ‘interactome’, an interconnected network of signaling, regulatory, and biochemical modules and pathways. Using a representative set of human protein-protein interaction data, network analysis provides a comprehensive view of disease-implicated pathways, enabling the discovery and validation of modules and pathways that contain disease-specific protein drug targets.
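
One simple way to operationalize a disease module is the largest connected component of the subnetwork induced by disease-implicated proteins. The sketch below computes it with a breadth-first search; the edge list and disease protein set are hypothetical, and the analyses referred to above may define modules differently.

```python
from collections import defaultdict, deque


def largest_disease_module(edges, disease_proteins):
    """Largest connected component of the subnetwork induced by disease proteins."""
    disease = set(disease_proteins)
    graph = defaultdict(set)
    for a, b in edges:
        if a in disease and b in disease:  # keep only edges between disease proteins
            graph[a].add(b)
            graph[b].add(a)
    seen, best = set(), set()
    for start in disease:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("A", "X")]
    print(sorted(largest_disease_module(edges, {"A", "B", "C", "E", "F"})))
```

Edges to non-disease proteins (D and X here) are ignored, so the module is {A, B, C}; a looser definition could allow one intermediate protein.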

Powerful bioinformatics software enables rapid interpretation of protein-protein interactions, comparative pathway analysis, accelerated functional assignment, and drug target discovery.[99,100] Current challenges to fully exploit the available experimental proteomics data include the integration of information available across several databases and the in-depth characterization of the data using new and advanced algorithms.

2.3 Systems Biology and Drug Target Discovery

The surprisingly small total number of genes in the human genome suggests that much of the complexity of human biology resides outside the DNA sequence itself. The recent availability of large-scale heterogeneous (genomic, proteomic, and metabolomic) data has driven the rapid growth of systems biology. Systems biology, that is, the computational integration of data generated by the suite of genetic, transcriptomic, proteomic, and metabonomic platforms to understand function across different levels of biomolecular organization, offers exciting new prospects for determining the causes of human disease and finding possible cures.[101,102]

Systems biology is currently one of the most active areas of biotechnology research and is becoming central to the strategy of many biopharmaceutical and genomics companies. There is little doubt that biomedicine and the pharmaceutical industry stand to benefit significantly from the promise of systems biology. The Cambridge-MIT Institute (CMI) brings together the University of Cambridge and the Massachusetts Institute of Technology (MIT), two of the world’s leading universities, in a dynamic and unique academic partnership to further the application of systems biology and stem cell research to the study and identification of drug targets in complex diseases such as cancer and inflammation. Bioseek, Inc. (Burlingame, CA, USA) has also developed quantitative, automated primary human ‘cell systems biology’ models of inflammation, autoimmunity, and cardiovascular disease that embody disease-relevant complexity for drug discovery.

Systems biology aims to understand complex biological networks through a combination of (comprehensive) experimental analysis and (quantitative) mathematical modeling.[103,104] At present, however, it is largely unclear which knowledge and data are required to build realistic mathematical models. An equally important question is to what extent the data already available allow meaningful models to be developed. Systems biology will therefore also encompass the development of tools and experimental approaches for producing quantitative data. Such information will help us to better understand diseases, and systems biology will hence become an integral part of drug target identification.
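To make the link between quantitative data and mathematical models concrete, here is a deliberately tiny sketch. It assumes a hypothetical target protein that switches between inactive and active states, with the active fraction A obeying dA/dt = k_eff·S·(1 − A) − k_deact·A, where S is an upstream signal and a reversible inhibitor at concentration I lowers the activation rate to k_eff = k_act / (1 + I/Ki). The model form and all parameter values are illustrative assumptions; in practice these rate constants are exactly the quantities that experimental data would have to supply.

```python
def effective_rate(k_act, inhibitor, ki):
    """Activation rate constant reduced by a reversible inhibitor (assumed form)."""
    return k_act / (1.0 + inhibitor / ki)


def simulate(k_act, k_deact, signal, inhibitor=0.0, ki=1.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of dA/dt = k_eff*S*(1 - A) - k_deact*A from A = 0."""
    k_eff = effective_rate(k_act, inhibitor, ki)
    a = 0.0
    for _ in range(int(t_end / dt)):
        a += dt * (k_eff * signal * (1.0 - a) - k_deact * a)
    return a


def steady_state(k_act, k_deact, signal, inhibitor=0.0, ki=1.0):
    """Analytic steady state obtained by setting dA/dt = 0."""
    k_eff = effective_rate(k_act, inhibitor, ki)
    return k_eff * signal / (k_eff * signal + k_deact)


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary units).
    params = dict(k_act=1.0, k_deact=0.5, signal=1.0)
    print(round(simulate(**params), 4))                 # 0.6667
    print(round(simulate(**params, inhibitor=1.0), 4))  # 0.5
    print(round(steady_state(**params), 4))             # 0.6667
```

The numerical and analytic answers agree, and the model predicts how far an inhibitor would shift the active fraction. Such a prediction is only as good as the measured k_act, k_deact, and Ki, which is the point made in the paragraph above.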

3. Discussion

Whether or not the reported number of drug targets is accurate, the currently available data strongly suggest that the number of known, well-validated drug targets is still relatively small. Bioinformatics is making practical contributions by identifying large numbers of potential drug targets; however, target validation is required to link these candidates to the etiology of known diseases and/or to demonstrate that the novel targets have relevant therapeutic potential. Current approaches still face several problems:

  • It is widely acknowledged that the increasing number of bioinformatics tools is not matched by an integrated, systematic interface for selecting and using them.

  • Processing genomics data and extracting useful information from them pose a challenging problem. Sophisticated bioinformatics platforms should be built to integrate genetic and gene expression data and to use them in selecting genes as novel targets.

  • The identification and validation of drug targets depends critically on knowledge of the biochemical pathways in which potential target molecules operate within cells.
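
Integrating genetic and gene expression data to select candidate genes, as called for above, can be sketched as a two-step filter-and-rank. The sketch assumes each gene already has a genetic association p-value and a differential-expression log2 fold change; the gene names, values, cutoffs, and the product score -log10(p) × |log2FC| are all hypothetical, ad hoc choices for illustration rather than an established prioritization method.

```python
import math

# Hypothetical per-gene evidence: genetic association p-value and expression log2 fold change.
GENES = {
    "GENE_A": {"assoc_p": 1e-8, "log2fc": 2.0},
    "GENE_B": {"assoc_p": 1e-3, "log2fc": 0.2},
    "GENE_C": {"assoc_p": 0.5, "log2fc": 3.0},
    "GENE_D": {"assoc_p": 1e-5, "log2fc": -1.5},
}


def prioritize(genes, p_cutoff=1e-4, fc_cutoff=1.0):
    """Keep genes supported by BOTH genetic and expression evidence, then rank
    them by the illustrative combined score -log10(p) * |log2FC|."""
    hits = []
    for gene, ev in genes.items():
        if ev["assoc_p"] <= p_cutoff and abs(ev["log2fc"]) >= fc_cutoff:
            score = -math.log10(ev["assoc_p"]) * abs(ev["log2fc"])
            hits.append((gene, round(score, 2)))
    return sorted(hits, key=lambda kv: -kv[1])


if __name__ == "__main__":
    print(prioritize(GENES))  # [('GENE_A', 16.0), ('GENE_D', 7.5)]
```

GENE_C is strongly dysregulated but lacks genetic support, and GENE_B fails both cutoffs, so neither is kept; requiring concordant evidence is what makes the integration useful for target selection.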

Although database mining and transcriptional profiling have clearly increased the number of putative targets, the current focus is on assigning function to new gene targets in a high-throughput manner. This requires restructuring the classical linear progression from gene identification through functional elucidation and target validation to screen development. More generally, the complexity of the drug discovery process in the post-genome era calls for integrated approaches that move rapidly from target to drug. The study of biochemical pathways is the focus of numerous drug discovery research efforts and is central to the strategy of drug target identification.
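One simple way to assign putative functions at scale is "guilt by association": an unannotated protein inherits the most common annotation among its interaction partners. The sketch below assumes a hypothetical interaction network and a partial annotation table; the protein names, labels, and the `min_support` threshold are illustrative, and majority voting is only one of many function-prediction strategies.

```python
from collections import Counter

# Hypothetical interaction network: protein -> set of partners.
NETWORK = {
    "U1": {"K1", "K2", "K3"},
    "U2": {"K3", "K4"},
    "U3": {"K1", "U1"},
    "K1": {"U1", "U3"},
    "K2": {"U1"},
    "K3": {"U1", "U2"},
    "K4": {"U2"},
}

# Hypothetical functional annotations for the already characterized proteins.
ANNOTATIONS = {"K1": "apoptosis", "K2": "apoptosis", "K3": "DNA repair", "K4": "DNA repair"}


def predict_function(network, annotations, min_support=2):
    """For each unannotated protein, take a majority vote over its annotated
    partners; report (label, votes) only if votes >= min_support."""
    predictions = {}
    for protein, partners in network.items():
        if protein in annotations:
            continue
        votes = Counter(annotations[p] for p in partners if p in annotations)
        if not votes:
            continue
        label, count = votes.most_common(1)[0]
        if count >= min_support:
            predictions[protein] = (label, count)
    return predictions


if __name__ == "__main__":
    print(predict_function(NETWORK, ANNOTATIONS))
    # {'U1': ('apoptosis', 2), 'U2': ('DNA repair', 2)}
```

U3 gets no prediction because only one annotated partner supports any label; the support threshold trades coverage for confidence, which matters when predictions feed into target selection.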

4. Concluding Remarks

Until about 20 years ago, drug discovery was chemistry-driven, proceeded by trial and error, and had few defined targets. Genomics and proteomics technologies have created a paradigm shift in the drug discovery process. With the complete sequencing of the human genome, it is now possible to conceive of the whole pharmaceutical process as a computational approach, with confirmatory experiments at each decision point. Genomics-based drug discovery and development relies on sophisticated bioinformatics and data management tools. As molecular dynamics data become more copious and complex, we may need to develop new in silico methods that provide reliable hypotheses to guide experimental design.

It should be emphasized that although bioinformatics tools and resources can be used to identify putative drug targets, target validation still requires an understanding of the role of the gene or protein in the disease process and depends heavily on laboratory work. Target validation requires high-throughput screening and selectivity analysis of therapeutic compounds that inhibit a mutant protein involved in the disease process.
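The selectivity analysis mentioned above is often summarized as a ratio of potencies. The sketch below assumes IC50 values (in nM) for each compound against the disease-associated mutant protein and against the wild-type protein, and computes the ratio IC50(wild type) / IC50(mutant). The compound names, IC50 values, and both thresholds are hypothetical.

```python
# Hypothetical IC50 values (nM) from a screen; lower IC50 means a more potent compound.
SCREEN = {
    "cmpd_1": {"mutant": 10.0, "wild_type": 1000.0},
    "cmpd_2": {"mutant": 50.0, "wild_type": 100.0},
    "cmpd_3": {"mutant": 500.0, "wild_type": 50000.0},
}


def selective_hits(screen, max_ic50=100.0, min_ratio=10.0):
    """Return (compound, selectivity ratio) for compounds that are potent on the
    mutant (IC50 <= max_ic50) and at least min_ratio-fold weaker on wild type."""
    hits = []
    for compound, ic50 in screen.items():
        ratio = ic50["wild_type"] / ic50["mutant"]  # >1 means mutant-selective
        if ic50["mutant"] <= max_ic50 and ratio >= min_ratio:
            hits.append((compound, ratio))
    return sorted(hits, key=lambda kv: -kv[1])


if __name__ == "__main__":
    print(selective_hits(SCREEN))  # [('cmpd_1', 100.0)]
```

Both criteria matter: cmpd_2 is potent but barely selective, while cmpd_3 is selective but not potent enough, so only cmpd_1 passes.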