Introduction

Diabetes mellitus is a serious threat to human health. The causes of this chronic disease are complicated, and multiple proteins are believed to be involved in its development. Among those proteins, dipeptidyl peptidase IV (DPP4), a serine protease, plays a major role in glucose metabolism by cleaving a dipeptide from the N-terminus of peptide substrates such as glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic peptide (GIP) [1]. Inhibition of DPP4 in humans has been shown to increase circulating GLP-1 and GIP levels, which leads to decreased levels of blood glucose, hemoglobin A1c and glucagon [2–4]. Moreover, the inhibition acts rapidly and is well tolerated [5]. Hence, DPP4 inhibitors are pursued as therapeutic agents for the treatment of type II diabetes [6–8].

Currently, five DPP4 inhibitors are used in the clinic as anti-diabetic drugs, and dozens of drug candidates are in clinical trials. The first DPP4 inhibitor, Sitagliptin (Januvia, Merck), was approved by the US FDA in October 2006 [9–12]. Later on, Vildagliptin (Galvus, Novartis) was approved for use in Europe in February 2008, and Saxagliptin (Onglyza, Bristol-Myers Squibb) was approved by the FDA in July 2009 [13–15]. Alogliptin (Nesina, Takeda) was approved in Japan in August 2010, and Linagliptin (Tradjenta, Boehringer Ingelheim) was approved by the FDA in May 2011. In those structures (Fig. 1), Vildagliptin and Saxagliptin share a cyanopyrrolidine moiety; the xanthene part of Alogliptin is similar to the pyrimidine part in Linagliptin; and the amino linkage in Sitagliptin resembles that in Vildagliptin and Saxagliptin. However, a potential relationship between Sitagliptin and renal impairment was reported recently [16, 17]. Novel DPP4 inhibitors are therefore still needed, and DPP4 remains an attractive target for the treatment of diabetes.

Fig. 1

Structures of DPP4 inhibitors approved or in late-stage clinical development. The magenta shading marks the cyanopyrrolidine moieties, the blue shading marks the xanthene/pyrimidine moieties, and the olivine shading marks the amino-like linkage

The early examples of DPP4 inhibitors were mainly discovered via ligand-based methods [10, 15, 18]. Many crystal structures of DPP4 in complex with ligands have now been determined experimentally, so structure-based methods can be used to discover novel structures. The crystal structures demonstrate that the binding pocket of DPP4 is located in the center of the enzyme and consists of the S1 pocket, the S2 pocket, the catalytic triad Ser630-Asp708-His740, Tyr547, Glu205-Glu206 and other residues [19–21]. Most DPP4 inhibitors contain a hydrophobic part that occupies the S1 pocket, formed by residues Tyr631, Val656, Trp659, Tyr662, Tyr666 and Val711. The S2 pocket is bounded by Ser209, Phe357 and Arg358 [22].

In this study, virtual screening based on the crystal structure of DPP4 was carried out to identify potential DPP4 inhibitors from a large number of compounds. Observing that most drug structures are relatively small and lipophilic [23], researchers have adopted rules on molecular properties to evaluate the solubility and permeability of structures; such a set of rules is called a druglike filter. We employed a multi-step virtual screening, including both rigid and flexible docking and a druglike filter, to search for novel structures with DPP4 inhibition, as illustrated in Fig. 2. Biological assays were then used to evaluate the inhibitory activities of the selected compounds. Molecular docking and pharmacophore modeling were further performed to understand the binding mode of these inhibitors to DPP4. The results might be helpful for understanding the inhibitory mechanism and for further structural optimization of these compounds.

Fig. 2

Flowchart of the structure-based virtual screening used in this study

Materials and methods

Virtual screening

Virtual screening was carried out using the Glide module of Schrödinger Suite 2008 [24] on a Dell PowerEdge 2950 cluster. The crystal structure of DPP4 was obtained from the Protein Data Bank (PDB code: 2P8S [25]). Water molecules and chain B were removed from the original data file. The ligand in the complex, which is structurally similar to Sitagliptin (Fig. 1), was also removed, but was used later as an active compound. The rest of chain A was prepared using the protein preparation wizard protocol [24]: hydrogens were added; bond orders were assigned and overlaps were treated; the impref utility was run to perform restrained optimization of the protein; and the OPLS2005 force field was employed.

Structures of 50 active compounds [10, 25–28] from the literature were initially sketched in ISIS/BASE [29] and then imported into the SPECS database [30]. The 197,116 compounds from the database and the added active compounds were initially screened using the parallel version of the Glide program [24] in standard precision. All the compounds were exported into an SDF file and prepared with Ligprep [24]. The enclosing box for docking was set to within 10 Å of the center, which was defined as the virtual center of a few key residues in the active site around the reference ligand: Glu205, Glu206, Ser209, Phe357, Arg358, Tyr547, Tyr662 and Tyr666 [25]. The grid was generated with the Receptor Grid Generation program [24]. Following docking, the top 7216 molecules ranked by Glide score were selected for further screening, using the docking score of the active compound as the cutoff value.
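The cutoff step described above can be sketched in a few lines. This is only an illustration, assuming that docking scores have already been exported from Glide and that more negative scores indicate better predicted binding; the compound identifiers, score values and the `select_by_reference_cutoff` helper are hypothetical and are not part of the Schrödinger software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockedCompound:
    compound_id: str    # hypothetical identifier, for illustration only
    glide_score: float  # more negative = better predicted binding


def select_by_reference_cutoff(compounds, reference_id):
    """Keep compounds that score at least as well as the reference active."""
    cutoff = next(c.glide_score for c in compounds if c.compound_id == reference_id)
    kept = [c for c in compounds if c.glide_score <= cutoff]
    # Best (most negative) score first
    return sorted(kept, key=lambda c: c.glide_score)


# Made-up scores; "ref-ligand" stands in for the seeded active compound
library = [
    DockedCompound("SPECS-0001", -8.4),
    DockedCompound("SPECS-0002", -5.1),
    DockedCompound("ref-ligand", -7.0),
    DockedCompound("SPECS-0003", -7.6),
]

selected = select_by_reference_cutoff(library, "ref-ligand")
selected_ids = [c.compound_id for c in selected]
print(selected_ids)
```

Using the score of a seeded active as the threshold ties the cutoff to a compound known to bind, rather than to an arbitrary percentage of the library.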

In the next step, a modified version of Lipinski’s ‘rule of 5’ was applied as the filter criterion for druglikeness analysis of all retrieved compounds [23]. The criteria were: molecular weight (200 ≤ MW ≤ 600), number of rotatable bonds (0 ≤ RB ≤ 6), number of H-bond acceptors (0 ≤ HA ≤ 10) and number of H-bond donors (0 ≤ HD ≤ 6). Molecules that violated any of these rules were eliminated, and 4522 molecules passed the filter.
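The four criteria above amount to a simple range check per descriptor. The following is a minimal sketch, assuming the descriptors have already been computed by a cheminformatics tool; the `Descriptors` class, the helper names and the example values are hypothetical, and only the numeric limits come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float
    rotatable_bonds: int
    hbond_acceptors: int
    hbond_donors: int


# Inclusive (lower, upper) limits as stated in the text
LIMITS = {
    "mol_weight": (200, 600),
    "rotatable_bonds": (0, 6),
    "hbond_acceptors": (0, 10),
    "hbond_donors": (0, 6),
}


def violations(d):
    """Return the names of all descriptors that fall outside their limits."""
    return [
        name
        for name, (low, high) in LIMITS.items()
        if not low <= getattr(d, name) <= high
    ]


def is_druglike(d):
    """A molecule passes only if it violates none of the rules."""
    return not violations(d)


# Made-up descriptor values for illustration
examples = {
    "passes": Descriptors(352.4, 4, 5, 2),
    "too_flexible": Descriptors(410.5, 9, 6, 1),
    "too_small": Descriptors(152.1, 1, 2, 1),
}

for label, desc in examples.items():
    print(label, is_druglike(desc), violations(desc))
```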

Then, we employed the AutoDock4 program [31] to complement the previous filters. Unlike Glide, AutoDock4 considers the flexibility of both the ligands and the side chains of the protein. It uses a free-energy scoring function to evaluate the ligand-receptor interactions. The protein grid file was generated with AutoGrid, and parameters were kept at their default settings. The top 99 compounds ranked by score were selected as the final hits. Samples of those 99 compounds were purchased from the SPECS Company and then evaluated by biological assays.

Biological assays

The ability of compounds to inhibit human recombinant soluble DPP4 was determined using a DPP4 Drug Discovery Kit (Catalog No. BML-AK499, Biomol). The fluorimetric assay is based on the cleavage of the 7-amino-4-methylcoumarin (AMC) moiety from the C-terminus of the peptide substrate (H-Gly-Pro-AMC), which increases its fluorescence intensity at 460 nm. Test compounds were dissolved in DMSO as stock solutions and diluted with the assay buffer (50 mM Tris–HCl, pH 7.5). A 40 μL solution of DPP4 enzyme in assay buffer was incubated with 10 μL of test compound solution at various concentrations. The reaction was initiated by the addition of 50 μL of H-Gly-Pro-AMC (final concentration 5 μM). DPP4 activity was measured in a Synergy™2 Multi-Mode Microplate Reader (BioTek) at an excitation wavelength of 380 nm and an emission wavelength of 460 nm. P32/98 (10 μM) was used as a positive control. IC50 values were determined from three independent determinations using the GraphPad Prism software.
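To make the link between fluorescence readings and IC50 concrete, the sketch below converts raw readings to percent inhibition and estimates IC50 by interpolation on a log-concentration scale. It is a simplified stand-in for the nonlinear curve fitting performed in GraphPad Prism, not a reproduction of it; the function names, the blank and control readings and the concentrations are all hypothetical.

```python
import math


def percent_inhibition(signal, uninhibited, blank):
    """Percent inhibition from blank-subtracted fluorescence readings."""
    return 100.0 * (1.0 - (signal - blank) / (uninhibited - blank))


def ic50_by_log_interpolation(concs_um, inhibitions):
    """Estimate IC50 by linear interpolation in log10(concentration)
    between the two data points that bracket 50% inhibition."""
    points = sorted(zip(concs_um, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


# Made-up readings: blank well, uninhibited enzyme, and three inhibitor wells
blank, uninhibited = 200.0, 1000.0
concs_um = [1.0, 10.0, 100.0]
readings = [840.0, 600.0, 360.0]

inhibitions = [percent_inhibition(r, uninhibited, blank) for r in readings]
ic50_um = ic50_by_log_interpolation(concs_um, inhibitions)
print(inhibitions, ic50_um)
```

Interpolation needs only two points around 50% inhibition, whereas a full dose-response fit uses the whole curve and is the better choice for reported values.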

Molecular redocking and pharmacophore modeling

The 15 active compounds were docked back into the binding pocket of the enzyme with the induced fit docking protocol [24] to further investigate their interactions with DPP4. Each ligand was initially docked by Glide, then the generated protein-ligand complex was minimized by Prime, and the complex structure was docked again by Glide within a specified energy (default 30 kcal mol−1) of the lowest-energy structure. The settings of the binding pocket were the same as outlined above for virtual screening. Thirty conformers were generated for each compound, and cluster analysis was then carried out to select the most reasonable one.

To explore the common structural features of these diverse compounds, a pharmacophore model was further generated using the HipHop module within Catalyst software in Discovery Studio 2.1 package [32, 33]. Ligand conformational generation was carried out using the BEST routine with the maximum conformers of 255 and energy threshold of 20 kcal mol−1. The 15 hit compounds were then submitted to the common feature pharmacophore generation procedure. Finally four features including hydrogen-bond acceptor, hydrogen-bond donor, positive ionizable function and hydrophobic group features were aligned in the HipHop algorithm. The pharmacophore model with the best direct and partial hit masks was selected.

The pharmacophore model was then verified by virtual screening of a database of 1000 molecules, of which 970 were collected from the SPECS compound database and the other 30 were known DPP4 inhibitors [34–40]. Among the top 100 hit compounds, 25 were known DPP4 inhibitors, and among the top 50 hit compounds, 17 were known DPP4 inhibitors. Furthermore, the lowest fit value of the 30 known DPP4 inhibitors was more than 3.2. This result demonstrated that our pharmacophore model has a good ability to detect active compounds in a virtual database.
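One common way to summarize such a validation is the enrichment factor, that is, the hit rate in the top-ranked subset divided by the hit rate expected from random selection. The authors do not report this metric, so the sketch below is an added illustration; it uses only the counts given above, and the function name is hypothetical.

```python
def enrichment_factor(actives_in_top, top_n, total_actives, total_compounds):
    """Ratio of the hit rate in the top-ranked subset to the overall hit rate."""
    hit_rate_top = actives_in_top / top_n
    hit_rate_all = total_actives / total_compounds
    return hit_rate_top / hit_rate_all


# Counts taken from the validation described in the text
TOTAL_COMPOUNDS = 1000
TOTAL_ACTIVES = 30

ef_top100 = enrichment_factor(25, 100, TOTAL_ACTIVES, TOTAL_COMPOUNDS)
ef_top50 = enrichment_factor(17, 50, TOTAL_ACTIVES, TOTAL_COMPOUNDS)
recall_top100 = 25 / TOTAL_ACTIVES  # fraction of known inhibitors recovered

print(f"EF(top 100) = {ef_top100:.2f}")
print(f"EF(top 50)  = {ef_top50:.2f}")
print(f"Recall(top 100) = {recall_top100:.2f}")
```

An enrichment factor of 1 corresponds to random picking, so values well above 1 indicate that known inhibitors are concentrated near the top of the ranked list.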

Results and discussion

In order to identify novel DPP4 inhibitors, a multi-step virtual screening was performed on the basis of the crystal structure of DPP4 complexed with a reference ligand. The SPECS database was used for the virtual screening, and 99 hits were obtained out of the 197,116 compounds. To confirm the result of the multi-step virtual screening, we investigated the inhibitory activity of the 99 hits on DPP4 using an in vitro fluorimetric assay. The biological assay was carried out in three independent experiments, and the results are reported as average values. Among the 99 hits, 15 compounds were found to inhibit DPP4 at the micromolar level, and their chemical structures and IC50 values are listed in Table 1.
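The screening funnel can be tallied stage by stage from the counts reported in this paper. The sketch below is an added illustration; the stage labels and the `retention` helper are hypothetical, and the initial count excludes the 50 seeded active compounds.

```python
# Compound counts at each stage, as reported in the text
stages = [
    ("SPECS library", 197_116),
    ("Glide standard-precision docking", 7_216),
    ("Druglike filter", 4_522),
    ("AutoDock4 flexible docking", 99),
    ("Confirmed actives in assay", 15),
]


def retention(stage_list):
    """Fraction of compounds kept at each stage relative to the previous one."""
    return [
        (name, count, count / prev_count)
        for (_, prev_count), (name, count) in zip(stage_list, stage_list[1:])
    ]


rates = retention(stages)
for name, count, fraction in rates:
    print(f"{name}: {count} compounds kept ({fraction:.2%} of previous stage)")

# Experimental hit rate among the purchased compounds
hit_rate = stages[-1][1] / stages[-2][1]
print(f"Hit rate: {hit_rate:.1%}")
```

The final ratio, 15 confirmed actives out of 99 purchased compounds, corresponds to a hit rate of about 15%.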

Table 1 The identified compounds with DPP4 inhibition

Analysis of the structure-activity relationship

Compound 1 contained a naphthalene ring, a tetrahydro-indazole and a hydrazine linkage. Compounds 2, 4 and 7 contained a heterocyclic linkage instead. The nitrogen atom contributed mostly to the hydrogen bonding interactions with residues Glu205-Glu206. In particular, compound 3 was composed of hydrophobic benzene and negative moieties. It was quite different from known DPP4 inhibitors, but displayed DPP4 inhibition at 17.16 μM. Compound 5, without an obvious linkage, showed an inhibitory activity of 30.59 μM. Compound 6 contained a linear backbone without an amide linkage or analogues. Compounds 9 and 15 were simply linked by a “sulfur bridge”. Their micromolar activities indicated that this category of compounds may act as DPP4 inhibitors. Compounds 11 and 13 shared a similar backbone and accordingly showed similar inhibitory activities of 40.23 μM and 40.65 μM, respectively. In terms of structural skeleton, most of our compounds did not belong to the usual structural categories of DPP4 inhibitors.

Redocking studies of those compounds revealed their possible interactions with the protein. The binding mode of compound 1 was given in Fig. 3a. Its 4H-indazole moiety occupies the hydrophobic S1 pocket at the interface of the two domains. The linker part formed hydrogen bonds with Tyr662 and His740. The naphthalene part interacted with the protein residues through hydrogen bonds with Arg125 and Glu205. Also, naphthalene had the hydrophobic interaction with Phe357, occupying the S2 pocket. Comparatively, the parazine part of compound 2 formed hydrogen bonds with Glu206 and the catalytic residue Tyr547 (Fig. 3b). The two aromatic rings occupied the S1 and S2 pockets, respectively. Other compounds were also found to repeat either the hydrogen bond with Glu205/Glu206, or hydrophobic interaction with Phe357. The binding modes were various due to the different skeleton of these compounds. All the docking scores were given in Table 1, and the binding modes of compounds 3 to 9 were also shown (Fig. 3).

Fig. 3

The binding modes of compounds 1 (a), 2 (b), 3 (c), 4 (d), 5 (e), 6 (f), 7 (g), 8 (h), and 9 (i) redocked to DPP4. The surface is colored by electrostatic potential. The docking results of the drugs Vildagliptin (j) and Alogliptin (k) are also shown. The compounds are shown as green sticks, and residues of the active site are shown as light gray sticks

The binding modes of the drugs Vildagliptin and Alogliptin to DPP4 were also reproduced with the same docking method for reference. Vildagliptin (Fig. 3j) bound to the protein through hydrogen bonds with Glu205, Glu206, Ser630, Asn710 and His740. For Alogliptin (Fig. 3k), the benzonitrile occupied the S1 pocket and the aminopiperidine occupied the S2 pocket. Three hydrogen bonds were found between Alogliptin and Tyr547, Gln553 and Tyr585. The benzene ring also formed a hydrophobic interaction with Tyr547. Like the cyanopyrrolidine derivatives, our compounds rely on hydrogen bonds with Glu205-Glu206, and some interact with Arg125 and Tyr547 as the xanthene-containing Alogliptin does. In addition, our compounds show a preference for Phe357.

Pharmacophore modeling

To explore the common structural features and binding modes of these structurally diverse inhibitors, a pharmacophore model (Fig. 4) was further generated based on the 15 active compounds. The reliability of the model was verified by virtual screening. The model contained four features, namely two hydrophobic features (HP1 and HP2), one hydrogen-bond acceptor (HA) and one hydrogen-bond donor (HD). The 3D arrangement and distance constraints of these pharmacophore features are shown in Fig. 4a and b. Compound 1, with an activity value of 5.77 μM, was selected as the reference to map onto the pharmacophore model. The model was also superimposed on the active site of DPP4 for comparison (Fig. 4c). The hydrophobic features, represented by blue spheres, were mapped to the naphthalene and the 4H-indazole parts, corresponding to the S1 pocket (formed by the side chains of Tyr547, Tyr631, Val656, Tyr662, Tyr666 and Val711) and the S2 pocket (corresponding to residues Phe357 and Tyr666). The HA feature, shown as a green sphere, was mapped to the carbonyl group, in accordance with His740, one of the catalytic triad residues. The HD feature, represented by a magenta sphere, was mapped to the hydroxyl group on the naphthalene, pointing toward the hydrogen bonds with Glu205-Glu206 shown in Fig. 3a. The residues mentioned above were consistent with the observations from the crystal structures. The contribution of the hydrogen bonds with His740 and Glu205-Glu206 to the activities was also supported by site-mutation experiments [19, 21, 41].

Fig. 4

HipHop pharmacophore model for DPP4 inhibitors. (a) The distance constraints of the model. (b) The best HipHop pharmacophore model. (c) The best pharmacophore model was mapped onto the binding site of DPP4 crystal structure with compound 1. The ligand was shown in gray stick model and residues were displayed in white stick model. Nitrogen atoms were colored in light blue and oxygen atoms in red. All hydrogen atoms were not displayed

This model was in accordance with a previously published model [42] to some extent, which supports its general validity. Moreover, the consistency of the pharmacophore model with our previous 3D-QSAR model [43] on a series of DPP4 inhibitors also indicated the rationality of the model. The two hydrophobic features in the S1 and S2 pockets were supported by the hydrophobic-favored areas in the 3D-QSAR model. An H-bond-donor-favored area near Glu205-Glu206 emphasized that a hydrogen bond donor at this position was essential for a DPP4 inhibitor. In addition, a positive group in the CoMFA model was suggested to be favored near the HA feature in this pharmacophore model.

We also analyzed other purchased compounds which exhibited no biological activities. Compounds s1 and s2 were selected for the following SAR studies due to their structural similarity with the active hits (Table 1). Compounds s1, 9 and 15 share similar structures. However, compound s1 showed no DPP4 inhibition at 10 μM in the biological evaluation. The absence of the hydroxyl or methoxy group was therefore related to the loss of inhibition, which also implied the importance of the HD feature. The comparison of either compound s2 with compound 1 or compound 11 with compound 13 implied the importance of the HP features in structures.

As predicted by our pharmacophore model (Fig. 4a), the linear distance between the two hydrophobic features is 8.1 Å, representing the length of 5 ~ 6 C-C bonds. And the distance between the HA feature and the HD feature would be suitable for 2 ~ 3 C-C bonds. Hydrogen bonds with Glu205-Glu206 or Arg125 were essential for the inhibitory activities toward DPP4. Residue His740 also provided the chance to form hydrogen bonds with the ligand. S1 and S2 pockets were suitable for some hydrophobic moieties. And a comparison of those active compounds suggested that the absence of a hydrogen-bond acceptor moiety would result in a loss in activity. The hydrophobic parts made the compounds stable at the hydrophobic binding pocket. Due to the structure diversity and novelty of these compounds, this pharmacophore model could be used as a common-structure suggestion for all kinds of DPP4 inhibitors.
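As a rough check on the bond-count estimate for the two hydrophobic features, the straight-line distance can be divided by a typical C-C single bond length. The bond length of about 1.54 Å is an assumption introduced here, not a value from this study, and because real carbon chains zig-zag, the number of bonds actually needed to span the distance would be somewhat larger than this lower bound.

```latex
\[
  n_{\mathrm{C-C}} \;\gtrsim\; \frac{d_{\mathrm{HP1\text{-}HP2}}}{\ell_{\mathrm{C-C}}}
  \;=\; \frac{8.1\ \text{\AA}}{1.54\ \text{\AA}} \;\approx\; 5.3
\]
```

This is consistent with the stated range of 5 ~ 6 C-C bonds.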

Conclusions

In this study, a multi-step virtual screening, including rigid docking and flexible docking, was used to search for novel DPP4 inhibitors from the SPECS database. Of the 99 purchased compounds, 15 were identified to have DPP4 inhibitory activities with IC50 values in the range of 5 ~ 50 μM. The hits contained structural features different from those of the classical DPP4 inhibitors. Virtual screening played a pivotal role in the identification of these active inhibitors, and the biological activity analysis of structural variants in combination with the pharmacophore model revealed the basic binding mode of these inhibitors. The pharmacophore model, which coincided with the docking results, also provided useful suggestions for further improvement of these DPP4 inhibitors.