Introduction

Rheumatoid arthritis (RA) is characterized by the chronic inflammation of joints that leads to destruction of cartilage and bone deformation (Chen et al., 1993; Pincus, 1995). Patients suffering from arthritis and other autoimmune diseases have elevated levels of pro-inflammatory cytokines such as TNFα and IL-1β in the synovial fluid (Dinarello, 1991; Feldmann et al., 1996). They are produced by synovial macrophages and exhibit similar biological activities like cell proliferation, collagen synthesis, adhesion molecule expression, etc. (Dominguez et al., 2005). Clinical therapies with anticytokines were employed to obstruct the hyperactivated pro-inflammatory cytokines and prevent the joint damage caused by TNFα and IL-1β. The cost of anticytokine treatment, mode of administration, and post-treatment infections hampered the effective use of these anticytokine therapies (Andreakos et al., 2002). Intracellular signaling pathways involving p38 mitogen-activated protein kinase (MAPK) regulate the production of several pro-inflammatory cytokines and are considered as a focal point in the development of new therapeutic agents to treat inflammatory disease such as RA (Kulkarni et al., 2006). p38 MAP kinases are proline-directed serine-threonine protein kinases that are activated by various growth factors and pro-inflammatory cytokines by the dual phosphorylation of TXY motifs in the activation loop (Pearson et al., 2001). Four variants of p38 kinase have been recognized, namely, p38α (also known as p38), p38β, p38γ (ERK6/SAPK3), and p38δ (SAPK4) with 40–60 % structural similarity (Lee et al., 1994; Jiang et al., 1996; Li et al., 1996; Jiang et al., 1997). Several reports indicate that p38α has profound role in RA and is involved in the expression of TNFα and IL-1β at both transcription and translation levels while the roles of p38β, p38γ, and p38δ have to be identified critically (Newton and Holden, 2003).

In the last couple of decades, the three dimensional structure-activity relationship (3D QSAR) techniques such as comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) (Cramer et al., 1988; Klebe et al. 1994; Bohm et al., 1999) have been routinely used in modern drug design. Current studies are now focused on the development of novel inhibitors. Several series of p38α kinase inhibitors are reported and despite these, more structurally diverse inhibitors should be discovered. The primary aim of this study is lead optimization and lead identification of p38 kinase inhibitors. QSAR techniques are most often used as a tool for lead optimization within the congeneric domain of molecules. There is strong literature evidence suggesting that lead identification can be achieved by pharmacophore-based methods (Martin et al., 1993; Jones et al., 1995; Patel et al., 2002; Willett and Winterman, 1986). In this study, we combined both these approaches to identify lead compounds using pharmacophore-based approach and optimize the lead compounds using QSAR approach. We have developed 3D-QSAR models for a series of dihydroquinazolinone inhibitors (John et al., 2003) using receptor-based and ligand-based schemes (atom by atom matching and pharmacophore based). The generated CoMFA and CoMSIA models were further validated by the test set. Receptor-based model was selected for further explanation of contour maps. Furthermore, the pharmacophore which was generated for 3D-QSAR studies was validated with the crystal structure-bound conformation of potent p38 kinase inhibitor Dihydroquinazolinone (PDB code: 1M7Q) and then used in the virtual screening of NCI database. Compounds with high screening scores obtained after different filtering were further supported by CoMFA and CoMSIA with good predicted pIC50 values. The results obtained could be useful for further prospects.

Materials and methods

Dataset

Forty five novel p38α kinase inhibitors were taken from literature with their biological activities (John et al., 2003) in terms of IC50 values and forty one molecules were selected for the development of model. Four molecules were excluded because of their low inhibition concentration (IC50). These compounds were excluded as they might act as outliers and it would affect the quality of the model. The IC50 values i.e., the concentration (nM) of inhibitors that produces 50 % inhibition of p38 kinase were converted into pIC50 as reported in Tables 1, 2, and 3. The dataset is randomly divided into a training set of 32 molecules and test set of nine molecules. The IC50 (nM) values were taken in molar (M) range and converted to pIC50 according to the formula given in following Eq.

$$ {\text{pIC}}_{ 50} = - {\text{log IC}}_{ 50} $$
Table 1 Structure and biological values (pIC50) of dihydroquinazolinone inhibitors (series 1)
Table 2 Structure and biological values (pIC50) of dihydroquinazolinone inhibitors (series 2)
Table 3 Structure and biological values (pIC50) of dihydroquinazolinone inhibitors (series 3)

Molecular docking

Docking study was performed to validate the hits obtained from virtual screening using SYBYL 8.1Footnote 1 (SYBYL 8.1., Tripos, St. Louis, MO) molecular modeling package installed on Linux system. Protein structure was prepared using biopolymer module of SYBYL. Hydrogen atoms were added to structure, atom types and charges were assigned using AMBER7 FF99 force field, and side chain amides were modified. Docking study was performed using Surflex-Dock module of SYBYL, which uses empirical scoring function to score ligand and protomol (Ruppert et al., 1997) guided docking. All the molecules were docked to the inhibitor binding site of p38 kinase crystal structure in complex with DQO (PDB code: 1M7Q) (John et al., 2003).

Pharmacophore generation

The pharmacophore hypothesis was generated using GALAHAD (genetic algorithm with linear assignment of hypermolecular alignment of datasets) module of SYBYL. GALAHAD operates in two main stages: the ligands are aligned to each other in internal coordinate space, and then the conformations produced are aligned in Cartesian space. The features considered in developing the pharmacophore model includes hydrogen bond donor atoms, hydrogen bond acceptor atoms, hydrophobic and charged centers. In this study, eight compounds (i.e., 9g, 14b, 14c, 14e, 14k, 15a, 15f, and 15i) were selected to carry out pharmacophore hypothesis and the genetic algorithm was used to create conformers for all molecules. The compounds which were selected to generate pharmacophore hypothesis are highly active. Five models are generated with default parameters. The pharmacophore indicates that donor and acceptor atoms are quite crucial to target the hinge region. The selected pharmacophore model is shown in Fig. 1.

Fig. 1
figure 1

Representation of the selected pharmacophore model. Cyan indicates hydrophobes, green indicates hydrogen bond acceptors, magenta indicates hydrogen bond donors, and red indicates positive nitrogens (Color figure online)

Atom by atom matching alignment

In this scheme, one of the most active compounds (14c) was used as the template. Systematic search was used in the conformational analysis and all rotatable bonds were searched in 10° increments from 0° to 360°. Conformational energy was computed with electrostatic term and the lowest energy conformer was selected. The template was modified for other ligands in the dataset. The common moiety was constrained for each molecule and the substituents were minimized at Tripos force field. The minimized structures were aligned over the template using atom by atom matching method and are displayed in Fig. 2. The alignment obtained was subsequently used for CoMFA and CoMSIA.

Fig. 2
figure 2

Alignment of molecules in the dataset by atom and atom matching method

Pharmacophore-based alignment

Pharmacophore-based alignment was done using GALAHAD. In this scheme, the molecules in the dataset (41 compounds) were aligned to the selected pharmacophore model. The alignment was done based on the common pharmacophores between the molecules in the dataset. The alignment obtained from this scheme along with their pharmacophoric features is shown in Fig. 3 and subsequently used for CoMFA and CoMSIA.

Fig. 3
figure 3

Alignment of molecules by pharmacophore-based alignment methods. 41 molecules in the dataset are aligned based on the common pharmacophores using GALAHAD

Receptor-based alignment

Among the inhibitors in the dataset, one of the compound (14e) was complexed with the receptor is known (21). Therefore, the ligand (14e) extracted from the protein structure (PDB code: 1M7Q) was used as the template molecule. The common moiety was constrained for each molecule and other molecules in the dataset were modified. The compounds in the dataset were minimized within the receptor site using Tripos force field, but the whole protein active site was fixed during minimization. All minimized structures inside the receptor were superimposed to get the molecular alignment for CoMFA and CoMSIA, and subsequently used for analysis. The alignment of the molecules inside the receptor is shown in Fig. 4.

Fig. 4
figure 4

Alignment of molecules inside the receptor using receptor-guided alignment scheme

Database search

The selected pharmacophore model was validated and converted into a UNITY query for pharmacophore-guided virtual screening studies. The query was screened against NCI2000 database. The “flexible database search” option was implemented to perform virtual screening. Primary filters such as Lipinski’s rule of five, Van der Waals bumps, restricting the number of rotatable bonds to ≤7, and QFIT (pharmacophoric match between query and the hit compound) were applied to reduce the dataset. Further screening of the hits was carried out using the docking algorithm, Surflex-Dock in SYBYL. The generated UNITY query with distance constraints is shown in Fig. 5.

Fig. 5
figure 5

The generated UNITY query for virtual screening against NCI database along with the distance constraints

CoMFA model

The steric and electrostatic field effects were calculated using the Tripos force field with a distance-dependent dielectric constant at all intersections in a regular space (2 Å) grid. A sp3 carbon atom was used as steric probe and a +1 charge as electrostatic probe. The generated steric and electrostatic fields were scaled by CoMFA-Standard scaling method in SYBYL with default energy cutoff value 30 kcal/mol. With standard options for scaling of variables, the regression analysis was carried out using the full cross-validated PLS method (Cramer et al., 1988a, b; Wold et al., 1984) of leave-one-out (LOO) (Wold, 1978). The minimum sigma (column filtering) was set to 2.0 kcal/mol to improve the signal-to-noise ratio by omitting those lattice points whose energy variation was below this threshold. The final model, a non-cross-validated conventional analysis, was developed with the optimum number of components to yield a non-cross-validated r 2 value.

CoMSIA model

The reported CoMSIA method is based on molecular similarity indices (Klebe et al. 1994) with the same lattice box as was used in CoMFA. Molecular similarity is expressed in terms of five different properties, namely steric, electrostatic, hydrophobic, and hydrogen bond donors and acceptors which were calculated using a C+ probe atom with a radius of 1 Å placed at a regular grid spacing of 2 Å. CoMSIA similarity indices (A F ) for molecule j with atoms i at a grid point q were calculated using Eq. 1.

$$ A_{F,K}^{q} \left( j \right) = - \sum {\omega_{{prob , {\text{k}}}} \omega_{ik} e^{{-\alpha r^{2} }} iq,} $$
(1)

where k represents the following physicochemical properties: steric, electrostatic, hydrophobic, and hydrogen bond donor and acceptor. A Gaussian-type distance dependence was used between grid point q and each atom i of the molecule. The default value (0.3) was used as the attenuation factor (α). The steric indices were related to the third power of the atomic radii, electrostatic descriptors were derived from atomic partial charges, hydrophobic fields were derived from atom-based parameters (Viswanadhan et al., 1989), and hydrogen bond donor and acceptor indices were obtained by a rule-based method based on experimental results (Klebe, 1994).

Partial least square (PLS) analysis and validation of QSAR models

To derive 3D-QSAR models, the CoMFA and CoMSIA descriptors were used as independent variables and the pIC50 values as the dependent variable. The PLS method (Wold and Ruhe, 1984; Geladi, 1988) was used to linearly correlate these CoMFA and CoMSIA descriptors to the activity. The CoMFA cutoff values were set to 30 kcal/mol for both steric and electrostatic fields, and all fields were scaled by the default options in SYBYL. The cross-validation analysis was performed using the LOO method in which one compound was removed from the dataset and its activity was predicted using the model derived from rest of the dataset. The cross-validated correlation coefficient (q 2) that resulted in the optimum number of components and the lowest standard error of prediction was considered for further analysis and calculated using Eq. 2.

$$ q^{2} = 1 - \frac{{\sum {(Y_{\text{predicted}} - Y_{\text{observed}} )^{2} } }}{{\sum {(Y_{\text{observed}} - Y_{\text{mean}} )^{2} } }} $$
(2)

where, Y predicted, Y observed, and Y mean are the predicted, observed, and mean values of the target property (pIC50), respectively. The non-cross-validated PLS analyses were performed with a column filter value of 2.0, to reduce analysis time with small effect on the q 2 values. The predictive power of the developed 3D-QSAR models has been assessed by test set (nine molecules) predictions. The predictive abilities of the models were expressed by the predictive r 2 value, which is analogous to the cross-validated r 2(q 2) and is calculated using Eq. 3.

$$ r_{\text{pred}}^{2} = \frac{{{\text{SD}} - {\text{PRESS}}}}{\text{SD}}, $$
(3)
$$ {\text{PRESS}} = \sum\limits_{y} {\left( {Y_{\text{predicted}} - Y_{\text{observed}}}\right)^{2} } $$
(4)

where SD is the sum of the squared deviations between the biological activities of the test set and mean activities of the training molecules, and PRESS is the sum of squared deviation between the predicted and observed activity of the test set molecules and is calculated using Eq. 4.

Results and discussion

Molecular docking

To predict the appropriate binding conformation for p38 kinase inhibitors and the reported hit compounds from virtual screening, Surflex-Dock was used to generate an ensemble of docking conformations. Before the docking was done, the reliability of Surflex-Dock was evaluated by re-docking the co-crystallized ligand [14e (DQO)] into the binding site. It was observed that Surflex-Dock has successfully reproduced the binding pose. Moreover, the resulted hit compounds from screening after QFIT filtering were further screened using molecular docking into the binding site. The docked compounds were filtered based on scoring function and interaction with crucial active site residues (Leu107, Met109, Gly110) in the binding site.

CoMFA and CoMSIA statistical results

The CoMFA and CoMSIA studies were carried out using three different schemes. Results of the PLS analyses of three different alignments are summarized in Table 4. The ligand-based model gave better result for CoMFA with cross-validated r 2(q 2) = 0.775 and non-cross-validated r 2 = 0.951, while for CoMSIA model, combination of steric, electrostatic, donor, acceptor and hydrophobic terms yielded a cross-validated r 2(q 2) = 0.806 and non-cross-validated r 2 = 0.927. These models were validated by a test set of nine molecules with predictive r 2 = 0.778 for CoMFA model and 0.927 for CoMSIA model. The pharmacophore-based model also gave good results for both CoMFA and CoMSIA. CoMFA yields a cross-validated r 2(q 2) = 0.734 and non-cross-validated r 2 = 0.894, while CoMSIA model yielded a cross-validated r 2(q 2) = 0.750 and non-cross-validated r 2 = 0.862. The predictive ability of these models was calculated by an external test set and it gave 0.631 for CoMFA and 0.688 for CoMSIA. In comparison, receptor-guided alignment gave better statistical results. CoMFA yields a cross-validated r 2(q 2) = 0.777 and non-cross-validated r 2 = 0.958, while CoMSIA model yielded a cross-validated r 2(q 2) = 0.782 and non-cross-validated r 2 = 0.927. These models were also validated on a test set of nine molecules with predictive r 2 = 0.818 for CoMFA and 0.710 for CoMSIA. It can be seen that all the models are better in statistics, so we decided to explain receptor-based model to better understand the relationship between the inhibitors and the macromolecule. The predicted pIC50 value for test and training set from CoMFA and CoMSIA models are given in Table 5. Graphical representations of actual and predicted inhibitory activities of CoMFA and CoMSIA are shown in Figs. 6 and 7.

Table 4 The regression summary of CoMFA and CoMSIA models by different alignment methods
Table 5 Observed, predicted, and residual values for the test set and training set by CoMFA and CoMSIA models
Fig. 6
figure 6

Trend of actual and predicted activities of training and test set by receptor-based model (CoMFA)

Fig. 7
figure 7

Trend of actual and predicted activities of training and test set by receptor-based model (CoMSIA)

CoMFA and CoMSIA contour plots

Since the model was built on the basis of receptor-guided scheme, we could overlay the 3D contour maps produced by CoMFA and CoMSIA onto the receptor binding pocket. The results obtained from CoMFA indicate that steric and electrostatic properties play a major role in inhibition activity (Table 4). However, electrostatic properties play a major role in CoMSIA, followed by donor and steric fields. This is because of the fact that the substituents are highly bulky, electropositive, and electronegative in nature. So we explain more on steric and electrostatic field contributions. The steric interactions are represented by green and yellow colored contours while electrostatic interactions are represented by red and blue colored contours. In steric field, green contour represents region where bulky substituent enhances activity, whereas yellow contour indicates region where bulky substituents decrease the activity. In case of electrostatic interactions, the blue contour represents region where electropositive groups enhance the activity, while red-colored region indicates that electronegative groups increases the activity. One of the most active compounds in the series (compound 14j) is shown with CoMFA contour maps of steric and electrostatic fields in Figs. 8 and 9. The green contour region near the piperidine or piperazine moiety of C7 substituent indicates that bulky substituent at this position increases activity. This is consistent with series 2 and 3 where the bulky substituent shows potent activity, whereas in series 1 the absence of bulky groups shows less activity. The presence of yellow contour near the second and third position of phenyl ring at C5 indicates that steric bulk disfavors activity. This is quite visible that compounds (9k, 9l, 9n, 9o) having bulky substitutions at those positions are lower in activity. A blue contour present around the C7 substituent indicates that electropositive and hydrogen donors are favorable at this position. This blue contour also reflects the fact that electronegative groups are unfavorable at this position. This is quite clear that series 1 having carbonyl substitutions lacks activity. Red contours around the phenyl group of C5 indicate that electronegative groups are favorable. Compounds having chlorine, fluorine at those regions are highly active and it is quite evident throughout the dataset.

Fig. 8
figure 8

CoMFA contour maps for steric field with highly active compound (14j), where green contour indicates regions where bulky groups increase activity and yellow contours indicate that bulky groups decrease activity (Color figure online)

Fig. 9
figure 9

CoMFA contour maps for electrostatic field with highly active compound (14j), where blue contour indicates regions where bulky electropositive groups increase activity and red contours indicate regions where electronegative groups increase activity (Color figure online)

The CoMSIA steric and electrostatic contour maps were more or less similar to that of CoMFA. Figures 10 and 11 show compound 14j superimposed on CoMSIA plots. The presence of green contour near the piperidine or piperazine moiety indicates that bulky substitution at this position enhances the activity. A small yellow contour present at the fourth position of phenyl ring at C5 indicates that bulky position around this region disfavors activity. Compounds such as 9l and 9o having bulkier substitutions at these positions lack in activity, whereas compounds having smaller substitutions at this position show better inhibitory activity. This is quite evident throughout the dataset. These results were supported by the presence of red contour around the same region which indicates that electronegative groups such as chlorine and fluorine are favorable.

Fig. 10
figure 10

CoMSIA contour maps for steric field with highly active compound (14j), where green contour indicates regions where bulky groups increase activity and yellow contours indicate that bulky groups decrease activity (Color figure online)

Fig. 11
figure 11

CoMSIA contour maps for electrostatic field with highly active compound (14j), where blue contour indicates regions where bulky electropositive groups increase activity and red contours indicate regions where electronegative groups increase activity (Color figure online)

Evaluation of pharmacophore with crystal complex

The crystal structure of p38α in complex with DQO (PDB code: 1M7Q) was further utilized to evaluate the selected pharmacophore model. It was reported that the complex forms three hydrogen bond interactions with hinge region. The complex structure forms hydrogen bonding with Leu107, Met109, and Gly110 (21). The selected pharmacophore model was mapped on to the bound conformation of DQO. The bound conformation fits quite well on to the pharmacophore model. The RMSD between the pharmacophore and DQO is 0.59 Å. From the mapping we can see that the pharmacophore features were mapped exactly on the complex, where the co-crystal ligand interacts with the key residues of active site. The mapping of pharmacophore with DQO is shown in Fig. 12. These results suggested that the generated pharmacophore was reliable enough to retrieve compounds from a chemical database that fits all the features of the query.

Fig. 12
figure 12

The generated pharmacophore (sticks) was mapped with the co-crystallized ligand DQO (dihydroquinazolinone in capped sticks) inside the active site

Common pharmacophore hypothesis and virtual screening

A set of highly active molecules were used to identify the common pharmacophore, GALAHAD produces five models with default parameters. The results are given in Table 6. The resulted models are more or less similar in statistics. All the models show reasonable specificity, which is a logarithmic indicator of the expected discrimination of each query. The number of hits column indicates that all models hit all ligands in the dataset and each of them has seven features in it. Pareto rank indicates that no model is more or less superior to each another. We have selected one model and the model has been validated for subsequent virtual screening analysis. The virtual screening workflow is shown in Fig. 13. The screening of the pharmacophore query yielded 1,182 hits that met the specific requirements. The compounds are further screened on the basis of QFIT, where QFIT is the pharmacophore match between query and the hit. 268 compounds which were further filtered by QFIT were then subjected to molecular docking to the binding site of p38 to select the compounds on the basis of their ability to form favorable interactions with the active site. Finally, seven compounds are selected on the basis of dock score and favorable interaction with key residues. The pIC50 values of final hits were predicted using CoMFA and CoMSIA models generated on the basis of pharmacophore alignment. The results of hit compounds with their dock score and predicted biological activity values are shown in Table 7. Thus, these potential hits are expected to induce improved binding affinity with p38α kinases. The hit compound NCI 211823 which has good docking score is shown in Fig. 14. The binding mode of this hit compound is similar to that of co-crystallized compound (14e). All the final hit compounds have hydrogen bonding interactions with the hinge residues which is crucial to target the ATP binding site.

Table 6 Number of models obtained along with the pharmacophoric features and their statistical values using GALAHAD
Fig. 13
figure 13

The results obtained from virtual screening. The numbers given in the figure represent the number of molecules after employing the filters

Table 7 Chemical structures of hit compounds, their dock scores, predicted CoMFA and CoMSIA
Fig. 14
figure 14

Binding mode and hydrogen bonding pattern of the lead molecule (NCI 211823) obtained from virtual screening along with co-crystallized compound 14e (capped sticks)

Conclusion

3D-QSAR analysis was performed for a series of potent p38 inhibitors. Ligand-based and receptor-guided alignment schemes were applied to develop models. Receptor-guided CoMFA and CoMSIA were explained, probably because the alignment using receptor information is more realistic. The contour maps revealed that bulky substitutions around piperidine ring of C7 carbon atom and electronegative groups around the phenyl ring of C5 carbon atom are desirable for potent activity. The information obtained can help to design new inhibitors of p38 kinases. We have also performed virtual screening analysis and we predicted the biological activities of obtained hits using the 3D-QSAR model generated by pharmacophore-based alignment. 1,182 hits were obtained using certain filters such as Lipinski’s rule of five and number of rotatable bonds to a maximum of seven. These hits were further reduced to 268 compounds using QFIT. Molecular docking was employed as final filter and seven compounds were selected on the basis of dock score, hydrogen bonding interaction, and the biological activities were predicted using CoMFA and CoMSIA. The identified hit compounds were structurally different from already available inhibitors and we suggest that these compounds could serve as potential leads.