1 Introduction

Glycogen synthase kinase-3 (GSK-3) was discovered over 30 years ago as one of the protein kinases that phosphorylate and inactivate glycogen synthase [1]. GSK-3 has two closely related isoforms that share 98 % identity, a 51 kDa GSK-3α and a 47 kDa GSK-3β; these isoforms are neither functionally identical nor redundant [2, 3]. GSK-3 is an attractive target protein in many diseases [4], including cancer (hepatocellular carcinoma, prostate cancer, pancreatic cancer, and colorectal cancer) [5–9], Alzheimer’s disease [10], and diabetes [11, 12], because it regulates cell division, apoptosis, and insulin action. GSK-3β is a multifunctional serine/threonine kinase that regulates more than 40 proteins, depending on the cellular pathway [13]. Inhibitors of GSK-3β have attracted great interest in drug discovery, and structurally diverse compounds have been reported to inhibit GSK-3β in recent years [14–22]. Unfortunately, most of these inhibitors show cell toxicity, various side effects, and poor ADMET properties, which limit their clinical potential [23]. This motivated us to design novel GSK-3β inhibitors using computer-aided drug design, an approach that has been employed successfully in drug discovery research. Ligand-based pharmacophore modeling is a major tool in drug discovery and is applied in virtual screening and lead identification [24]. Protein–ligand docking is used to predict the position and orientation of a ligand when it is bound to a protein receptor or enzyme. In the present study, pharmacophore modeling, virtual screening, and molecular docking were used to identify potential lead compounds for GSK-3β.

2 Materials and Methods

2.1 Data Set Preparation

A set of 23 structurally distinct compounds reported as glycogen synthase kinase-3β (GSK-3β) inhibitors, with diverse experimental inhibitory activity (IC50) data, was compiled from the literature; their structures are shown in Supplementary Figure 1 [25–27]. All compounds were sketched in ISIS Draw and converted into 3D form, and energy minimization was performed with the CHARMM force field. Conformational models were generated by the FAST method, with a maximum of 255 diverse conformers per compound and an energy threshold of 20 kcal mol−1. The generated conformers were used for hypothesis generation. To form the training set, 17 compounds were chosen, with IC50 values ranging from 0.28 to 300 µM. The activity data were classified into three categories: active (IC50 < 5 µM, +++), moderately active (IC50 between 5 and 50 µM, ++), and less active (IC50 > 50 µM, +).
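The three activity classes can be expressed as a small helper. This is a minimal sketch in Python, not part of the reported workflow; the function name is hypothetical, and assigning values of exactly 5 µM or 50 µM to the moderately active class is an assumption, since the text does not state how the boundaries are handled.

```python
def classify_activity(ic50_um: float) -> str:
    """Map an IC50 value (in µM) to the activity label used in the text."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    if ic50_um < 5:
        return "+++"  # active
    if ic50_um <= 50:
        return "++"   # moderately active (boundary handling is an assumption)
    return "+"        # less active


if __name__ == "__main__":
    # The two extremes of the training set range quoted in the text.
    for value in (0.28, 300.0):
        print(value, classify_activity(value))
```

The most potent training set compound (0.28 µM) falls in the active class and the weakest (300 µM) in the less active class.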

2.2 Pharmacophore Model Generation

The 3D QSAR module in Discovery Studio (DS) was used to develop the pharmacophore model [28]. All 17 training set compounds, together with their conformations, were submitted to the HypoGen module of DS with four pharmacophore features: hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), ring aromatic (RA), and hydrophobic (HY). The HypoGen algorithm used for hypothesis generation has three phases: constructive, subtractive, and optimization. In the constructive phase, the features common to the active compounds are identified; in the subtractive phase, all pharmacophore features present in the least active compounds are removed; and in the optimization phase, the score of each hypothesis is improved through regression parameters that are used to estimate activity. The uncertainty value was set to 2 instead of the default value of 3, which means that the measured biological activity is assumed to lie within a factor of two of the true value. The quality of a HypoGen model is described in terms of fixed cost, null cost, and total cost [29]. Three cost components, error (E), weight (W), and configuration (C), each multiplied by a coefficient (default 1.0), contribute to the cost of a hypothesis. The fixed cost is the cost of the simplest possible model that fits the data perfectly, while the null cost is the cost of a hypothesis with no features that estimates every activity to be the average activity. The pharmacophore model was selected on the basis of the highest correlation coefficient and a large cost difference.
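To make the uncertainty setting concrete, the sketch below computes the interval implied by an uncertainty factor of 2, assuming the usual reading that the true activity lies between the measured value divided by and multiplied by that factor. The function name is hypothetical and the code is only an illustration, not a call into Discovery Studio.

```python
def uncertainty_window(ic50_um: float, uncertainty: float = 2.0) -> tuple:
    """Return (lower, upper) bounds for an activity value.

    Assumes the true value lies within [ic50 / uncertainty, ic50 * uncertainty].
    """
    if ic50_um <= 0 or uncertainty < 1:
        raise ValueError("expected a positive IC50 and an uncertainty >= 1")
    return ic50_um / uncertainty, ic50_um * uncertainty


if __name__ == "__main__":
    # Extremes of the training set range reported in the text (µM).
    for value in (0.28, 300.0):
        low, high = uncertainty_window(value)
        print(f"{value} µM -> {low:.2f} to {high:.2f} µM")
```

With an uncertainty of 2, the 0.28 µM compound is treated as lying between 0.14 and 0.56 µM; with the default of 3 the window would be wider.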

2.3 Pharmacophore Validation and Database Screening

The generated quantitative pharmacophore model was validated by cost analysis, test set prediction, Fischer’s randomization test, and enrichment factor calculation, to find out whether it is capable of identifying active structures and estimating their activity values accurately. HypoGen ranks the 10 generated pharmacophore models by their cost values. The quality of a pharmacophore model is described in terms of total cost, fixed cost, and null cost. The model was selected based on the cost difference between the fixed cost and the null cost; in simple terms, a larger difference between these two costs indicates a better model. If the cost difference is greater than 60, there is more than a 90 % probability that the model correlates the experimental and predicted activities. If the cost difference is between 40 and 60, the probability is 75–90 %, and if the difference is below 40, the predictive value of the model is difficult to establish. The second validation approach is test set prediction. Six test set compounds were mapped onto the pharmacophore model using the Ligand Pharmacophore Mapping module in Discovery Studio, and the mapping was used to predict their activity. The third approach is Fischer’s randomization (Cat-Scramble), for which the 95 % confidence level was selected and 19 random spreadsheets were constructed. This test checks the correlation between chemical structure and biological activity: the activity data of the training set compounds were randomized, and hypotheses were generated with the same parameters used to develop the original pharmacophore hypotheses. The last validation method is the enrichment factor (E value), which was calculated using a database containing active and inactive compounds.
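The cost-difference rule of thumb above can be written as a lookup. The following Python sketch simply encodes the three bands stated in the text; the function name and the returned strings are illustrative choices, and treating exactly 40 and 60 as part of the middle band is an assumption.

```python
def interpret_cost_difference(cost_difference_bits: float) -> str:
    """Translate a HypoGen cost difference (in bits) into the band from the text."""
    if cost_difference_bits > 60:
        return "above 90 % probability of a true correlation"
    if cost_difference_bits >= 40:
        return "75-90 % probability of a true correlation"
    return "predictive value difficult to establish"


if __name__ == "__main__":
    for bits in (82.3, 50.0, 25.0):
        print(bits, "->", interpret_cost_difference(bits))
```

A cost difference of 82.3 bits, the value reported for Hypo1, falls in the first band.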

To identify new potential lead compounds, the selected pharmacophore model was used as a 3D structural query to screen the National Cancer Institute (NCI) and natural compound databases, containing 265242 and 53299 molecules, respectively. All queries were performed using the Ligand Pharmacophore Mapping protocol with the Best/Flexible search method in DS. To be retrieved as a hit, a molecule had to fit all pharmacophore features of the hypothesis. The hits obtained through database screening were filtered to those with an estimated activity of less than 1 μM and then screened further by Lipinski’s rule of five: a compound should have (1) a molecular weight of less than 500, (2) no more than 5 hydrogen bond donors, (3) no more than 10 hydrogen bond acceptors, and (4) an octanol/water partition coefficient (Log P) of less than 5. ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties were predicted using ADMET descriptors to select effective and bioavailable compounds. The remaining hits were screened by similarity analysis against reference compound 679 [2-chloro-5-(4-(3-chlorophenyl)-2,5-dioxo-2,5-dihydro-1H-pyrrol-3-ylamino)benzoic acid] using the Tanimoto coefficient, which quantifies the similarity between two molecules as the ratio of the number of features common to both molecules to the total number of features. The coefficient varies between 0 and 1, where 0 means maximally dissimilar and 1 means maximally similar. In total, 234 hits from the NCI database and 362 hits from the natural database were retained.
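The two filters described above, the rule of five and the Tanimoto coefficient, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Discovery Studio implementation: the descriptor keys, function names, and the toy feature sets are hypothetical, the donor and acceptor limits follow the usual "no more than 5 / no more than 10" statement of Lipinski's rule, and the "total number of features" is taken to be the size of the union of the two feature sets.

```python
from typing import Mapping, Set


def passes_rule_of_five(desc: Mapping[str, float]) -> bool:
    """Check Lipinski's rule of five on precomputed descriptors.

    Expected keys (hypothetical names): mol_weight, h_donors, h_acceptors, log_p.
    """
    return (
        desc["mol_weight"] < 500      # molecular weight below 500
        and desc["h_donors"] <= 5     # at most 5 hydrogen bond donors
        and desc["h_acceptors"] <= 10 # at most 10 hydrogen bond acceptors
        and desc["log_p"] < 5         # octanol/water Log P below 5
    )


def tanimoto(features_a: Set[str], features_b: Set[str]) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(features_a & features_b) / len(union)


if __name__ == "__main__":
    # Toy values for illustration only; they do not describe any real compound.
    candidate = {"mol_weight": 412.3, "h_donors": 2, "h_acceptors": 6, "log_p": 3.1}
    print(passes_rule_of_five(candidate))
    print(tanimoto({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f5"}))
```

In the second call, two features are shared out of five distinct features, so the coefficient is 0.4. In practice the feature sets would come from molecular fingerprints computed by a cheminformatics toolkit.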

2.4 Molecular Docking

Docking calculations were carried out using the LigandFit program in Discovery Studio [30]. The crystal structure of the glycogen synthase kinase-3β (GSK-3β) protein (PDB Id: 1Q4L) was downloaded from the Protein Data Bank (http://www.rcsb.org/pdb). The crystal structure of 1Q4L has two chains, A and B. Chain A gave a higher docking score than chain B with reference compound 679 [2-chloro-5-(4-(3-chlorophenyl)-2,5-dioxo-2,5-dihydro-1H-pyrrol-3-ylamino)benzoic acid], so chain A was selected for the docking studies. Protein preparation involved deletion of water molecules and Het atoms and addition of hydrogen atoms; the CHARMM force field was then applied to the macromolecule. After protein preparation, the active sites were searched using a flood-filling algorithm. The 596 screened compounds (234 from the NCI database and 362 from the natural database) were docked into the active site of 1Q4L.
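The first part of the protein preparation (keeping one chain and discarding waters and Het atoms) can be illustrated on a PDB-format file with plain Python. This is only a sketch of the idea, assuming standard fixed-column PDB records in which the chain identifier is in column 22; the function name is hypothetical, and hydrogen addition and force-field typing, which the study performed in Discovery Studio, are not covered.

```python
from typing import Iterable, List


def select_protein_atoms(pdb_lines: Iterable[str], chain_id: str = "A") -> List[str]:
    """Keep only ATOM records of one chain.

    Waters and other hetero groups are stored as HETATM records in PDB files,
    so restricting the output to ATOM records removes them.
    """
    kept = []
    for line in pdb_lines:
        is_atom_record = line[:6] == "ATOM  "
        # Column 22 (index 21) holds the chain identifier.
        if is_atom_record and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    return kept


if __name__ == "__main__":
    import sys

    # Usage: python prepare.py structure.pdb > chain_A.pdb
    with open(sys.argv[1]) as handle:
        for record in select_protein_atoms(handle, "A"):
            sys.stdout.write(record)
```

The file name in the usage comment is a placeholder; any locally downloaded copy of the structure would be passed in the same way.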

3 Results and Discussion

3.1 Pharmacophore Model

A training set of 17 compounds with diverse structural features was selected for generating the pharmacophore model. This diverse training set includes well-known potent natural inhibitors of GSK-3β. The 3D QSAR pharmacophore generation module was used to construct a pharmacophore model from hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrophobic (HY), and ring aromatic (RA) chemical features. It produced ten hypotheses based on the activity of the training set compounds. The best hypothesis, Hypo1, has three features, one hydrophobic (HY) and two ring aromatic (RA), and shows the highest cost difference of 82.3, the best correlation coefficient of 0.78, and the maximum fit value of 8.2. The fixed and null cost values are 70.29 and 217.4, respectively.

Except for hypotheses 3–5, all hypotheses contained HY and RA features, which implies that HY and RA groups play an important role in GSK-3β inhibition. The cost difference between the null cost and the total cost of Hypo1 was found to be 82.3 bits, which exceeds the 60-bit threshold. All hypotheses had a correlation coefficient higher than 0.7, but Hypo1 showed the highest correlation coefficient of 0.78, demonstrating its good predictive ability. Hypo1 thus had a higher cost difference and a higher correlation value than the other hypotheses and was selected for further analysis. The statistical parameters, such as cost values, correlation, and RMS deviation, are summarized in Table 1, and the experimental and predicted activities of the training set compounds are compared in Table 2. For most of the compounds, the model predicts the activity correctly. Figure 1 shows the chemical features of Hypo1 with its geometric parameters. The alignment of the most active and the inactive compounds of the training set with Hypo1 is shown in Supplementary Figure 2.

Table 1 Results of pharmacophore hypotheses generated by HypoGen algorithm
Table 2 Experimental and estimated activities of training set molecules based on the pharmacophore model Hypo1
Fig. 1 Generated pharmacophore model with distance constraints. Features are color coded: light blue, one hydrophobic group; orange, two ring aromatic groups

3.2 Pharmacophore Validation

The pharmacophore model was validated by four methods: cost analysis, test set prediction, Fischer’s randomization, and enrichment factor (E value) calculation on a decoy set.

3.2.1 Cost Analysis

The HypoGen algorithm in DS generates each pharmacophore model with three cost values: fixed cost, total cost, and null cost. The pharmacophore model was validated by the cost difference between the null cost and the total cost; if the cost difference is above 60 bits, the model has a predictability level of more than 90 %. Hypo1 has the highest cost difference, 82.3 bits, which marks it as a significant model (Table 1).

3.2.2 Test Set Prediction

For the second validation, six compounds were chosen as a test set. The Ligand Pharmacophore Mapping protocol with the flexible search option was used to map the test set compounds. The model predicted the activities of the test set compounds with an error of less than 10 %, and for four of the six compounds the error was less than 5 %. The experimental and predicted activities of the test set compounds are given in Table 3.

Table 3 Experimental and estimated activities of test set molecules

3.2.3 Fischer’s Randomization Method

Fischer’s randomization was the third approach used to validate Hypo1. In this method, the 95 % confidence level was chosen and 19 random spreadsheets (random hypotheses) were generated. The significance of the model is calculated by the formula S = [1 − (1 + X)/Y] × 100 %, where X is the number of hypotheses having a total cost lower than that of Hypo1 and Y is the total number of HypoGen runs (initial + random runs). Here X = 0 and Y = 19 + 1 = 20, so S = [1 − (1 + 0)/(19 + 1)] × 100 % = 95 %. Supplementary Figure 3 shows that Hypo1 is statistically superior to the randomly generated models.
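The significance formula can be checked numerically. The Python sketch below implements S = [1 − (1 + X)/Y] × 100 exactly as written in the text; the function and argument names are illustrative.

```python
def fischer_significance(n_lower_cost_random: int, n_total_runs: int) -> float:
    """Significance (%) of a hypothesis in Fischer's randomization test.

    n_lower_cost_random: number of random hypotheses with a total cost
        lower than that of the original hypothesis (X in the text).
    n_total_runs: initial run plus all random runs (Y in the text).
    """
    if n_total_runs <= 0:
        raise ValueError("the total number of runs must be positive")
    return (1 - (1 + n_lower_cost_random) / n_total_runs) * 100


if __name__ == "__main__":
    # 19 random spreadsheets plus the original run, none beating Hypo1.
    print(fischer_significance(0, 19 + 1))
```

With X = 0 and Y = 20 the result is 95 %, which is why 19 random spreadsheets correspond to the 95 % confidence level; each random hypothesis that beat the original model would lower the significance by 5 percentage points.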

3.2.4 Decoy Set

Finally, a decoy set was generated to validate Hypo1. The decoy set contains 1512 compounds (D), including compounds that are active and inactive against GSK-3β. The Ligand Pharmacophore Mapping module with the flexible search option in DS was used to screen this data set. The screening retrieved 333 hits (Ht), of which 229 (Ha) are active. A number of parameters were calculated: the hit list (Ht), percent yield of actives (%Y), percent ratio of actives in the hit list (%A), enrichment factor (EF), false negatives, false positives, and goodness-of-fit score (GF). The EF and GF are 1.82 and 0.55, respectively, indicating the competence of the screening (Supplementary Table 1).
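For readers who want to reproduce such metrics, the sketch below uses the commonly cited definitions of the enrichment factor and of the Güner-Henry goodness-of-hit score, assuming that GF in the text refers to the latter. The total number of actives in the decoy set (A) is not stated in this section, so it is left as a required argument and the demonstration uses made-up numbers; all function names are hypothetical.

```python
def percent_actives_in_hits(ha: int, ht: int) -> float:
    """Percent ratio of actives in the hit list: Ha / Ht * 100."""
    return 100.0 * ha / ht


def enrichment_factor(ha: int, ht: int, a: int, d: int) -> float:
    """EF = (Ha / Ht) / (A / D)."""
    return (ha / ht) / (a / d)


def goodness_of_hit(ha: int, ht: int, a: int, d: int) -> float:
    """Guner-Henry score: [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    recall_precision_term = ha * (3 * a + ht) / (4 * ht * a)
    false_positive_penalty = 1 - (ht - ha) / (d - a)
    return recall_precision_term * false_positive_penalty


if __name__ == "__main__":
    # Made-up example: 1000 compounds, 100 actives, 50 hits, 40 of them active.
    print(enrichment_factor(40, 50, 100, 1000))
    print(goodness_of_hit(40, 50, 100, 1000))
    # Values reported in the text: Ha = 229, Ht = 333.
    print(percent_actives_in_hits(229, 333))
```

With the reported Ha = 229 and Ht = 333, the percent ratio of actives in the hit list evaluates to roughly 68.8 %.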

3.3 Database Screening

Two databases, NCI (265242 compounds) and Natural compounds (53299 compounds), were searched with Hypo1. In the drug discovery process, virtual screening of databases is an alternative to high-throughput screening (HTS). A total of 79887 compounds (49895 NCI, 29992 Natural) satisfied all features of Hypo1. Of these, 26952 compounds (13709 NCI, 13243 Natural) had an estimated activity of less than 1 µM and were considered for further analysis. Because drug-likeness properties are important for in vitro analysis, these compounds were further filtered by Lipinski’s rule of five and ADMET properties, which gave a total of 3510 compounds (1187 NCI, 2323 Natural). Molecular similarity to reference compound 679 was then calculated for these 3510 molecules using the Tanimoto similarity coefficient. Finally, 596 compounds (234 NCI, 362 Natural) were considered for docking studies. A schematic representation of the virtual screening process is shown in Fig. 2.
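The screening funnel can be tabulated to check that the per-database counts add up to the reported totals and to see how strongly each filter reduces the set. The following Python sketch uses only the counts quoted in this section; the stage labels and function name are illustrative.

```python
# (stage label, NCI count, Natural count), as reported in the text.
STAGES = [
    ("Full databases", 265242, 53299),
    ("Fit all Hypo1 features", 49895, 29992),
    ("Estimated activity < 1 uM", 13709, 13243),
    ("Lipinski and ADMET filters", 1187, 2323),
    ("Tanimoto similarity to compound 679", 234, 362),
]


def summarize(stages):
    """Return (label, total, percent retained from the previous stage)."""
    rows = []
    previous_total = None
    for label, nci, natural in stages:
        total = nci + natural
        retained = None if previous_total is None else 100.0 * total / previous_total
        rows.append((label, total, retained))
        previous_total = total
    return rows


if __name__ == "__main__":
    for label, total, retained in summarize(STAGES):
        note = "" if retained is None else f" ({retained:.1f} % of previous stage)"
        print(f"{label}: {total}{note}")
```

The computed totals after each filter (79887, 26952, 3510, and 596) match the figures given in the text.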

Fig. 2 Flow chart of virtual screening process

3.4 Molecular Docking

To understand the ligand orientation in GSK-3β (PDB Id: 1Q4L), reference compound 679 was first docked into the active site. The GSK-3β structure has two chains, A and B, and the reference compound was docked into the active site of each chain using the LigandFit protocol in DS. The docking score, binding orientation, and hydrogen bond network were considered in choosing the best pose of each docked compound. The docking scores and H-bond interactions of the reference compound with both chains are listed in Supplementary Tables 2 and 3. Chain A (81.67 kcal/mol) gave a higher docking score than chain B (80.16 kcal/mol), with hydrogen bonds to four residues, VAL135, ARG141 (3), GLN185, and ASP133 (Fig. 3a). Chain A was therefore selected, and the final hit compounds (234 NCI, 362 Natural) from virtual screening were docked into the active site of 1Q4L-A; their docking scores were compared with that of reference compound 679. Hit compounds with a higher docking score than the reference compound were considered potential hits. In total, 75 compounds (37 from NCI, 38 from the natural database) scored higher than the reference compound, and the top 20 (10 NCI, 10 Natural) are listed with docking scores and H-bond interactions in Supplementary Tables 4–7. Finally, eight hit compounds (4 NCI, 4 Natural) were shortlisted as potent inhibitors of GSK-3β.

NSC230353 (4-chloro-N2,N7-diphenylnaphthalene-2,7-disulfonamide), NSC66454 [2-(4-(N-thiazol-2-ylsulfamoyl)phenylcarbamoyl)benzoic acid], NSC159593 [N,N′-(2-nitro-1,4-phenylene)dibenzenesulfonamide], and NSC156759 [N,N′-(2-(butylthio)-1,4-phenylene)dibenzenesulfonamide] from the NCI database were identified with good docking scores and estimated activity values of 0.417 µM, 0.216 µM, 0.25 µM, and 0.19 µM, respectively. STOCK1N-81808 (methyl 3-(3-chlorophenyl)-3-[2-(3,4-dihydroxyphenyl)-3,5,7-trihydroxy-4-oxo-chromen-8-yl]propanoate), ZINC02159818 [ethyl 2-((6,7-dihydroxy-2-oxo-2H-chromen-4-yl)methylamino)benzoate], ZINC04042470 [3-(2-(2-carboxybenzamido)ethyl)-5-methoxy-1H-indole-2-carboxylic acid], and ZINC72326235 [2-(2-(7-hydroxy-4-methyl-2-oxo-2H-chromen-8-ylamino)-2-oxoethyl)benzoic acid] from the natural database were identified with good docking scores and estimated activity values of 0.067 µM, 0.172 µM, 0.7 µM, and 0.090 µM, respectively. The eight compounds showed favorable hydrogen bond interactions with residues including ARG141, GLN185, and VAL135. Figure 3 shows the binding orientations of the hit compounds and how well they fit Hypo1.

Among the NCI hits, NSC230353 has a docking score of 105.8 kcal/mol and forms two hydrogen bonds, with ARG141 and GLN185 (Fig. 3b); the amine of the sulfonamide group forms a hydrogen bond with ARG141, and an oxygen of the sulfonamide group forms a hydrogen bond with GLN185. NSC66454 has a docking score of 101.67 kcal/mol and shows H-bond interactions with two residues, ARG141 and GLN185; the nitrogen of the sulfonamide group acts as the acceptor toward ARG141, and the sulfur of the thiazole group acts as the acceptor toward GLN185 (Fig. 3c). The binding mode of NSC159593 is shown in Fig. 3d; it has a docking score of 95.83 kcal/mol and forms two hydrogen bonds, with VAL135 and ARG141: an oxygen of the nitro group bonds with VAL135, and the nitrogen of the sulfonamide group bonds with ARG141. NSC156759 has a docking score of 94.85 kcal/mol and forms hydrogen bonds with ARG141 and GLN185 through the oxygens of its sulfonamide groups (Fig. 3e); there is also a pi–pi interaction between ARG141 and the benzene ring.

STOCK1N-81808, ZINC02159818, ZINC04042470, and ZINC72326235 were identified as potent inhibitors of glycogen synthase kinase-3β from the natural database. STOCK1N-81808 has a docking score of 112.72 kcal/mol and forms hydrogen bonds with three residues, LYS183 (2), GLN185, and ILE62; the binding orientation of the complex is given in Fig. 3f, which shows that the ligand oxygens form the H-bonds with these three residues. Figure 3g shows the binding mode of ZINC02159818, which has a docking score of 109.371 kcal/mol and forms hydrogen bonds with two residues, ARG141 (2) and GLN185 (2). ZINC04042470 has a docking score of 100.17 kcal/mol and interacts with ARG141 (2), GLN185, and VAL135; the binding mode of the protein–ligand complex is shown in Fig. 3h, where oxygen atoms bind to ARG141 and GLN185 and the nitrogen of the indole group forms a hydrogen bond with VAL135. ZINC72326235 has a docking score of 97.71 kcal/mol and shows H-bond interactions with VAL135 (2), ARG141 (2), and GLN185 (Fig. 3i). The pharmacophore overlay of the hit compounds is shown in Supplementary Figure 4, and their 2D chemical structures are given in Fig. 4. Finally, eight structurally diverse compounds were selected for in vitro studies.
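The selection criterion used here (keep compounds that score higher than the reference compound, then rank them) can be sketched as follows. The scores are the LigandFit docking scores quoted in this section; the function and variable names are illustrative, and the code is not part of the reported workflow.

```python
REFERENCE_SCORE = 81.67  # reference compound 679 docked into chain A

# Docking scores of the eight shortlisted compounds, as reported in the text.
HIT_SCORES = {
    "NSC230353": 105.8,
    "NSC66454": 101.67,
    "NSC159593": 95.83,
    "NSC156759": 94.85,
    "STOCK1N-81808": 112.72,
    "ZINC02159818": 109.371,
    "ZINC04042470": 100.17,
    "ZINC72326235": 97.71,
}


def rank_hits(scores, reference_score):
    """Return (name, score) pairs that beat the reference, best score first."""
    better = [(name, score) for name, score in scores.items() if score > reference_score]
    return sorted(better, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_hits(HIT_SCORES, REFERENCE_SCORE):
        print(f"{name}: {score} (+{score - REFERENCE_SCORE:.2f} over reference)")
```

All eight compounds exceed the reference score, with STOCK1N-81808 ranked first and NSC156759 last.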

Fig. 3 Binding mode of reference compound and hit compounds. a Reference compound 679, b NSC230353, c NSC66454, d NSC159593, e NSC156759, f STOCK1N-81808, g ZINC02159818, h ZINC04042470, i ZINC72326235

Fig. 4 2D chemical structures of hit compounds. a NSC230353, b NSC66454, c NSC159593, d NSC156759, e STOCK1N-81808, f ZINC02159818, g ZINC04042470, h ZINC72326235

4 Conclusion

In this study, a pharmacophore model was successfully generated using the HypoGen algorithm. Hypo1 comprises one hydrophobic (HY) and two ring aromatic (RA) pharmacophore features required for glycogen synthase kinase-3β (GSK-3β) inhibitory activity. The model was validated by four methods, which provides additional confidence in the proposed model. This model can be useful in future studies to design structurally novel GSK-3β inhibitors. To identify potent GSK-3β inhibitors, virtual screening studies were carried out with the NCI and Natural databases. Five hundred and ninety-six compounds from the virtual screening were selected for molecular docking studies, and the binding modes of the protein–ligand complexes were analyzed to select lead candidates. Finally, four ligands from each database were selected as lead candidates: NSC230353, NSC66454, NSC159593, and NSC156759 from the NCI database, and STOCK1N-81808, ZINC02159818, ZINC04042470, and ZINC72326235 from the natural database. These novel compounds can be used in experimental studies of GSK-3β inhibition.