1 Introduction

In the drug design process, the computational search for drug molecules is frequently performed by molecular docking or, to a lesser extent, by de novo drug design approaches. While virtual screening relies on pre-existing compounds, de novo design approaches generate novel molecules from building blocks consisting of single atoms or fragments [1]. De novo design involves the design of novel structures based on the structure of the binding site with which they are meant to interact, which can be identified from an X-ray crystallographic study of the receptor protein with a bound ligand. By definition, de novo means “anew” or “from the beginning” and involves the design of complex new molecules from smaller constituent parts. Although the creation of any new chemical entity could be considered de novo design, in practice de novo design is defined as the creation of novel chemical entities that are substantially different from any of the starting materials [2].

Because de novo design starts from the structure of the binding site, a drug can be designed with the correct size and shape to fit the available space and with the functional groups required to interact with the binding site [3]. The operator can carry out each of these procedures manually. Automated de novo design of bioactive molecules is one of the key aims in computational chemistry. To approach this aim, algorithms have to accomplish two principal tasks: first, the search space, i.e., the set of all algorithmically tractable molecules, must be structured into regions of higher and lower quality to allow prediction of the desired properties (e.g., protein binding); and second, the search must be systematic to allow easy navigation in a high-dimensional chemical space [4].

One of the principal attractions of designing drug-like molecules from simpler building blocks is the potential to cover a larger fraction of the available compounds within the size range of interest. A relatively small number of fragments can be compiled to cover most of the possible shapes, features, and properties contained in a much larger set of drug-like compounds. This contrasts with the very small fraction of the accessible space that a screening library of drug-like molecules can cover [2, 5, 6]. Conceptually, Schneider and Fechner focused on three basic questions that a de novo design program must address: (1) how to assemble the candidate compounds; (2) how to evaluate their potential quality; and (3) how to sample the search space effectively. All algorithmic decisions of a de novo design program are ultimately judged by the quality of their outcome, which in turn depends crucially on a meaningful reduction of the search space [7].

In this chapter, we describe, on a conceptual level, the strategies and methods used in computer-based molecular de novo design. The application of this approach is exemplified by a detailed description of the design of CDK2 and CDK4 inhibitors, in which a fragment-based de novo design program is used to identify promising scaffold candidates.

2 Materials

  1. GANDI (Genetic Algorithm-based de Novo Design of Inhibitors).

  2. Protein Data Bank (PDB) for protein structures.

  3. Molinspiration Cheminformatics (www.molinspiration.com) to construct molecules and design the fragment library.

  4. CORINA and Babel for conversion from SMILES strings to MOL2 format.

  5. SEED software to dock the fragment library into the receptor binding site.

  6. LEGEND de novo design program.

  7. MODELLER for homology or comparative modeling of protein three-dimensional structures.

  8. CHARMm force field for model minimization.

3 Methods

The methods described here allow the top-ranking molecules to form one to three hydrogen bonds with the backbone polar groups in the hinge region of CDK2, an interaction pattern observed in potent kinase inhibitors [1]. Crystallography has revealed that the ATP-binding site of CDK2 can accommodate a number of diverse molecular frameworks that exploit various sites of interaction. In addition, residues outside the main ATP-binding cleft have been identified that could be targeted to increase specificity and potency [8]. For simplicity of reference, the ATP-binding site is generally described in terms of the ATP-binding mode as the adenine pocket, the ribose pocket, and the phosphate groove. Figure 1 shows a schematic of this naming scheme, incorporating two additional regions. An important feature in terms of drug design is that ATP does not occupy the total volume of the cleft, and there are nonconserved regions that can be exploited in the development of inhibitors [9–11].

Fig. 1 Schematic presentation of the ATP-binding site, categorized according to the ATP-binding mode in CDK2, with additional nonconserved regions

The search for drug molecules with computational methods is often performed by high-throughput docking or, to a lesser extent, by de novo drug design approaches [7]. The de novo procedures can also be carried out manually [4]; however, this is generally not advisable (see Notes 1 and 2). De novo design approaches generate novel molecules from building blocks consisting of single atoms or fragments. Because synthetic accessibility is difficult to predict, de novo drug design tools often generate molecules that are demanding to synthesize [12]. Other important points should also be taken into consideration in de novo design (see Notes 3–7). Alternatively, many software packages are available that carry out the process automatically; some de novo design programs are listed in Table 1.

Table 1 Different de novo design programs with their scoring function

3.1 Fragment-Based De Novo Ligand Design Strategy of CDK2 Inhibitors

In this section, an approach for Genetic Algorithm-based de Novo Design of Inhibitors (GANDI) is presented [1]. GANDI is a fragment-based method that generates molecules by joining pre-docked fragments with linkers. It uses a parallel genetic algorithm [47], in which multiple populations evolve simultaneously, to search for feasible solutions. Only the pre-docked fragments are encoded by the genetic algorithm, while suitable linker fragments are efficiently evaluated with a tabu search [48–50]. GANDI has been evaluated on CDK2, where it is able to suggest molecules with new scaffolds or substituents that preserve the main binding interaction motifs of known CDK2 inhibitors [1].

3.1.1 GANDI Procedure for Design of CDK2 Inhibitors

In order to connect pre-docked fragments with linker fragments (see Fig. 2), GANDI uses a combination of two stochastic search procedures, namely (1) a genetic algorithm [51, 52] and (2) a tabu search [48–50], which are as follows.

Fig. 2 Schematic illustration of GANDI with the general procedure (a) from top to bottom, including two iterative procedures, which are the main loop of the genetic algorithm and the random tabu search (b)

  1. The genetic algorithm in GANDI is an island model, in which multiple populations evolve simultaneously and independently between exchanges. Genetic material is exchanged after a certain number of optimization iterations by swapping individuals (i.e., molecules) between neighboring, all, or randomly picked islands, as reported by Dey and Caflisch [1]. Every individual contains a single chromosome consisting of multiple genes. In contrast to classic genetic algorithms (see Note 8) [7, 53], the implementation in GANDI uses integers as gene values, which encode the indexes of docked fragments.

  2. The next step links the encoded fragments of each individual by a tabu search [48–50]. GANDI builds a look-up table containing the distances and angles of all pairs of linker fragment connection vectors. All possible connections between the fragment pairs of an individual are generated using cutoff values and the look-up table. A connection solution is then picked at random, and the two fragments are joined with the linker [1].
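The look-up and random-pick idea in step 2 can be illustrated with a small sketch. It is not GANDI's implementation: the linker geometries, the cutoff values, the function names, and the reduction of the tabu search to a simple "do not pick the same linker twice" set are all assumptions made for illustration.

```python
import math
import random

# Hypothetical linkers: each has two connection vectors, given as
# (attachment point, direction). All numbers are invented for illustration.
LINKERS = {
    "L1": (((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)), ((1.5, 0.0, 0.0), (1.0, 0.0, 0.0))),
    "L2": (((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)), ((2.5, 0.0, 0.0), (1.0, 0.0, 0.0))),
    "L3": (((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)), ((2.4, 0.0, 0.0), (0.0, 1.0, 0.0))),
}


def angle_deg(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def build_lookup(linkers):
    """Precompute (distance, angle) for the connection-vector pair of each linker."""
    table = {}
    for name, ((point_a, dir_a), (point_b, dir_b)) in linkers.items():
        table[name] = (math.dist(point_a, point_b), angle_deg(dir_a, dir_b))
    return table


def candidate_linkers(table, gap, bend, dist_cutoff=0.5, angle_cutoff=20.0):
    """Linkers whose geometry matches the fragment pair within the (assumed) cutoffs."""
    return sorted(
        name
        for name, (dist, ang) in table.items()
        if abs(dist - gap) <= dist_cutoff and abs(ang - bend) <= angle_cutoff
    )


def pick_linker(candidates, tabu, rng):
    """Randomly pick a candidate that is not on the tabu list, then mark it tabu."""
    allowed = [name for name in candidates if name not in tabu]
    if not allowed:
        return None
    choice = rng.choice(allowed)
    tabu.add(choice)
    return choice
```

Here `gap` and `bend` stand for the distance and angle measured between the connection vectors of two docked fragments. Precomputing the table once means that each fragment pair only requires a cheap comparison against stored values.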

3.1.2 3D-Similarity and Selection of Fittest Individuals

The 3D similarity between molecules can be measured as reported by several researchers [1, 54–56]. The scoring function implemented in GANDI is a linear combination of terms. The force field-based energy term consists of van der Waals and electrostatic contributions. The potentials of the receptor are calculated and stored on a grid [56] and are used only for the linkers. A further term in the scoring function is a fingerprint-based 2D measure of similarity (2D similarity). Dey and Caflisch used the crystallographic structure of CDK2 in complex with an oxindole-based inhibitor [1, 57] from the Protein Data Bank (PDB ID: 1KE5). The crystallographic ligand can be prepared as described for the protein but is minimized without any constraints inside the rigid protein binding site.
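The idea of a linear combination of an energy term and similarity terms can be sketched as follows. The text does not give the functional form, the signs, or the weights used in GANDI, so the function name, the weights, and the convention that lower scores are better are assumptions.

```python
def combined_score(e_vdw, e_elec, sim_3d, sim_2d,
                   w_energy=1.0, w_3d=0.0, w_2d=0.0):
    """Linear combination of a force field energy and two similarity terms.

    Assumed convention: lower is better, so favorable (negative) energies and
    high similarities to the reference ligand both decrease the score.
    """
    energy = e_vdw + e_elec  # van der Waals plus electrostatic contribution
    return w_energy * energy - w_3d * sim_3d - w_2d * sim_2d
```

Setting a weight to zero switches the corresponding term off, which is one simple way to use a similarity measure for analysis only rather than during optimization.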

3.1.3 Selection of Fragment Library and Docking with SEED

In this step, the fragment library from Molinspiration Cheminformatics (www.molinspiration.com; see Note 9) is the source of the building blocks for the constructed molecules. The library consists of 20,000 fragments with one connection point and 20,000 fragments with two connection points, all occurring in bioactive molecules. Both sets are converted from SMILES strings to MOL2 format with CORINA [58, 59] and Babel [60]. All fragments are then docked into the receptor binding site with SEED, a program for docking mainly rigid fragments that evaluates the protein-fragment energy and electrostatic desolvation [56] (see Note 10).

3.1.4 Quality Assessment of Binding Modes

The quality of the binding modes generated by GANDI is further assessed with an in-house suite of programs for flexible ligand docking. First, the GANDI molecules are decomposed into fragments by DAIM [61]. DAIM also prioritizes the resulting fragments according to their suitability as anchors for the docking program FFLD [62]. In the present case study on CDK2, 2D similarity is used only for analysis (and not during GANDI runs) and is measured by the Tanimoto coefficient based on normalized DAIM fingerprints [1, 61].
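For readers unfamiliar with the Tanimoto coefficient, a minimal sketch is given below. It uses the general vector form, which reduces to the familiar shared-bits over total-bits ratio for binary fingerprints. The layout of DAIM fingerprints is not described here, so the example vectors are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length fingerprint vectors.

    T = a.b / (|a|^2 + |b|^2 - a.b); for 0/1 vectors this equals the number
    of shared bits divided by the number of bits set in either fingerprint.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero fingerprints: returning 0.0 is a convention chosen here.
    return 0.0 if denom == 0 else dot / denom
```

A value of 1 indicates identical fingerprints and a value of 0 indicates no shared features.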

3.2 Generation of Potent CDK4 Inhibitors with De Novo Design Program LEGEND

This section presents the strategy of the de novo design program LEGEND for the structure-based design of novel and potent CDK4 inhibitors. To obtain the CDK4 protein structure, a homology model is built based on the monomeric form of CDK2. A new de novo design strategy has been implemented that combines the de novo design program LEGEND with the in-house structure selection supporting system SEEDS to generate new scaffold candidates [63].

3.2.1 Homology Model Building of CDK4 Protein

Given the level of sequence homology between CDK4 and CDK2 (45%), the CDK4 homology model has been constructed based on the CDK2 protein (see Note 11). The model is minimized with the CHARMm force field [64], with the exception of the region conserved between CDK4 and CDK2. This minimized structure is used for scaffold design in the de novo design strategy.
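A percentage such as the one quoted above is typically derived from a pairwise sequence alignment. The sketch below shows one common way to compute it. The convention of counting only columns without gaps is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Other denominators (for example, the length of the shorter sequence) give slightly different values, so the convention should be stated whenever an identity percentage is reported.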

3.2.2 Identification of New Scaffolds with Structural Requirements

In CDK2, the NH group of Leu83 is the most important for inhibitor binding because it serves as a hydrogen-bond donor in every structure reported in earlier studies [11, 65]. The main chain carbonyl groups of Glu81 and Leu83 also appear to be important because most inhibitors form hydrogen bonds with them. In CDK4, most of the residues that are important for hydrogen-bond interactions are conserved. Among the altered amino acid residues, the change from Leu83 in CDK2 to Val96 in CDK4 is not expected to be critical since only the main chain is used for hydrogen bond(s). This information on structural requirements is therefore also useful for finding new classes of CDK4 inhibitors and is used to identify new scaffolds that satisfy these requirements [63]. The LEGEND program is based on the atom-by-atom approach and is suitable for generating drug molecules in the cavity of the ATP-binding site [66]. The output structures from the primary step may not be commercially available (see Note 12). In the case of CDK4, a diarylurea scaffold has been identified as appropriate for rapid construction of structurally diverse libraries [63]. The data obtained from LEGEND de novo design suggested that the neighboring NH and CO in the diarylurea scaffolds form hydrogen bond(s) with the main chain(s) of Glu94 and Val96, and that the aromatic rings are located in the hydrophobic regions of CDK4. Based on these insights, Honma and his group also provided a further strategy for constructing diarylurea informer libraries based on the structural requirements of CDK inhibitors in the ATP-binding pocket of the CDK4 model. A docking study is then performed to investigate the binding mode of diarylurea in CDK4, and design and synthesis can be carried out based on the predicted binding mode. Finally, X-ray analysis can be performed to confirm the binding mode from de novo design.

4 Notes

  1. In general, the structure of the binding site is identified from X-ray crystallography, and the designed molecule may not bind to the binding site exactly as predicted. If the projected fit is too rigid, a minor change in the binding mode may prevent the molecule from binding. It is better to start with a flexible structure [4].

  2. It is important to leave scope for variation and elaboration in the molecule, which allows fine-tuning of the molecule’s binding affinity and of the required pharmacokinetics.

  3. Flexible molecules are better than rigid molecules because the former are more likely to find an alternative binding conformation. This allows modifications to be carried out based on the actual binding mode. On the other hand, if a rigid molecule fails to bind as predicted, it may not bind at all.

  4. It is pointless to design molecules that are difficult or impossible to synthesize.

  5. Similarly, it is pointless to design molecules that need to adopt an unstable conformation in order to bind.

  6. The energy losses involved in desolvation should be taken into account.

  7. There may be subtle differences in structure between receptors and enzymes from different species.

  8. Evolutionary algorithms are population-based optimization algorithms, based on the ideas described by Charles Darwin, that mimic biological evolution with genetic operators. Mutation introduces new information into a population, whereas recombination exploits the information inherent in the population [7].

  9. Molinspiration offers a broad range of cheminformatics software tools supporting molecule manipulation and processing, including SMILES and SDfile conversion, normalization of molecules, generation of tautomers, molecule fragmentation, calculation of various molecular properties needed in QSAR, molecular modeling and drug design, high-quality molecule depiction, and molecular database tools supporting substructure and similarity searches.

  10. It is important to note that in the study by Dey and Caflisch, 6882 fragments with two connection vectors were also used as linker fragments in GANDI. Their GANDI protocol comprised three different parameter settings: 400 individuals in a single island; 4 islands of 100 individuals each; and 4 islands of 100 individuals each, exchanging 5% of all individuals every 20th iteration with a randomly selected island. The minimized oxindole-based inhibitor co-crystallized with the protein (PDB code 1KE5) was used as the target structure [1].

  11. De Bondt et al. solved the first X-ray structure of CDK2 in 1993. Although the structures of the CDK6-p16 and CDK6-p19 complexes had already been reported, the cyclin binding region and the ATP-binding pocket in CDK6 were both largely altered by the endogenous inhibitors (p16 and p19). Therefore, despite the very high sequence identity between CDK4 and CDK6 (70%), CDK6 structural information could not be used for designing CDK4 inhibitors [63].

  12. This difficulty appears to apply to most de novo design programs that build up structures sequentially. To overcome this disadvantage, one can use the in-house program SEEDS [63].
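The island settings listed in Note 10 (4 islands of 100 individuals, exchanging 5% of the individuals every 20th iteration with a randomly selected island) can be illustrated with a toy island-model genetic algorithm. Only the integer gene encoding and the island and migration parameters are taken from the text. The fitness function (a sum of hypothetical per-fragment scores), the selection scheme, the one-point crossover, and the mutation rate are assumptions, so this is a sketch and not GANDI's algorithm.

```python
import random


def evolve_islands(fragment_scores, n_islands=4, island_size=100, genes=3,
                   iterations=60, migrate_every=20, migrate_fraction=0.05,
                   mutation_rate=0.1, seed=0):
    """Toy island-model GA; each gene is an integer index of a docked fragment."""
    rng = random.Random(seed)
    n_frag = len(fragment_scores)

    def fitness(chrom):
        # Stand-in score (lower is better): sum of hypothetical fragment scores.
        return sum(fragment_scores[g] for g in chrom)

    islands = [[[rng.randrange(n_frag) for _ in range(genes)]
                for _ in range(island_size)]
               for _ in range(n_islands)]

    for iteration in range(1, iterations + 1):
        for pop in islands:
            # Keep the better half as parents, refill with mutated offspring.
            pop.sort(key=fitness)
            parents = pop[: island_size // 2]
            children = []
            while len(parents) + len(children) < island_size:
                mother, father = rng.sample(parents, 2)
                cut = rng.randrange(1, genes)  # one-point crossover
                child = mother[:cut] + father[cut:]
                if rng.random() < mutation_rate:
                    child[rng.randrange(genes)] = rng.randrange(n_frag)
                children.append(child)
            pop[:] = parents + children

        # Migration: swap a fraction of individuals with a random other island.
        if n_islands > 1 and iteration % migrate_every == 0:
            n_swap = max(1, round(migrate_fraction * island_size))
            for i, pop in enumerate(islands):
                j = rng.choice([k for k in range(n_islands) if k != i])
                other = islands[j]
                for _ in range(n_swap):
                    a = rng.randrange(island_size)
                    b = rng.randrange(island_size)
                    pop[a], other[b] = other[b], pop[a]

    best = min((ind for pop in islands for ind in pop), key=fitness)
    return best, fitness(best)
```

Calling `evolve_islands` with `n_islands=1` and `island_size=400` corresponds to the single-island setting, and setting `migrate_every` larger than `iterations` gives islands that never exchange individuals.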