Introduction

The Takeda G-protein receptor 5 (TGR5), also known as G-protein bile acid receptor 1 (GPBAR1), is a plasma membrane-bound bile acid receptor of the G-protein coupled receptor (GPCR) family (Kawamata et al., 2003). It is found in many human and animal tissues, including liver, intestine, and brain (Kawamata et al., 2003; Vassileva et al., 2006). Some non-genomic actions of bile acids (BAs) are mediated through cAMP formation following TGR5 activation (Maruyama et al., 2002). A growing body of evidence shows that TGR5 also participates in a number of processes important in inflammation (Wang et al., 2011). Moreover, recent observations have revealed an unexpected role for GPBAR1 in the nervous system, and it has been identified as a liver tumor suppressor in carcinogenesis (Duboc et al., 2014). In cell culture models, TGR5 has been linked to signaling pathways involved in metabolism, cell survival, proliferation, and apoptosis, suggesting a possible role in cancer development (Feng et al., 2007). Its pharmacological modulation may therefore provide alternative therapeutic strategies for diabetes, obesity, other metabolic syndromes, and inflammation (Gioiello et al., 2012). Recent studies have also suggested that TGR5 activation in macrophages may be useful in atherosclerosis (Pols, 2014).

Depending on their chemical characteristics, TGR5 ligands can be classified into two types: steroidal and non-steroidal agents. The Pellicciari group reported potent and selective TGR5 agonists based on natural bile acids. Among them, INT-777, a semisynthetic BA derivative, showed good properties in vitro and in vivo and was considered a promising anti-diabetic drug candidate in preclinical studies (Pellicciari et al., 2009). In addition, a number of pharmaceutical and biotech companies, including Takeda Pharmaceuticals (2004), Kalypsys (Herbert et al., 2010a, b), GSK (SmithKline Beecham Corp., 2007), Hoffmann-La (2010), SIMM (Duan et al., 2012), Pfizer (Futatsugi et al., 2013), and Novartis, have focused their efforts on finding structurally diverse non-steroidal TGR5 agonists. Some representative structures are shown in Fig. 1. Such agonists could provide tool compounds with higher selectivity against other bile acid-mediated pathways, such as FXR, and open access to a wide range of small molecules, since BA derivatives offer only limited structural variation (Jansen, 2010; Lavoie et al., 2010).

Fig. 1

Reported representative TGR5 agonists

As an important approach in drug discovery, computer-aided techniques have been applied to the identification of TGR5 agonists. Macchiarulo et al. (2008) developed a molecular interaction field analysis (MFA) and a 3D quantitative structure–activity relationship (3D-QSAR) study of TGR5 agonists using a training set of 43 bile acid derivatives. Martin et al. (2013) constructed a homology model using bovine rhodopsin as the template and then carried out docking. Because no pharmacophore model of non-steroidal ligands has been published, we focus here on this gap.

We initially planned to generate a common pharmacophore model from TGR5 agonists including BAs, terpenes, and non-steroidal molecules. However, the resulting model was too simple to differentiate TGR5 agonists from inactive compounds: it had only four common features, namely two hydrophobic spheres, one aromatic ring feature, and one hydrogen-bond acceptor (HBA) (Fig. 2). Decoy set validation gave a poor result, with an EF value below 1, i.e., no better than random selection. Although the cost and correlation values of this model were acceptable, it clearly lacked specific information and would likely produce unreliable virtual screening results. Furthermore, SAR studies have shown that BAs, terpenes, and non-steroidal compounds rely on different pharmacophoric elements for TGR5 activation. Steroidal and non-steroidal ligands may bind to a common orthosteric site through different interacting residues, or they may interact with different regions of the TGR5 binding pocket. The common model was therefore unsuitable for further study. Because of these deficiencies, we focused on non-steroidal ligands, a class attracting growing interest from medicinal chemists, to construct the pharmacophore.

Fig. 2

Hypo1 geometric constraint generated by steroidal and non-steroidal compounds consists of one aromatic ring (R), two hydrophobic (H) features, and one hydrogen-bond acceptor (HBA). Pharmacophore features are color represented as blue for hydrophobic, green for hydrogen-bond acceptor, and yellow for ring aromatic feature (Color figure online)

In this work, we combined two 3D-QSAR modeling tools to investigate key pharmacophoric features. We used both HypoGen and Phase to generate pharmacophore models and identified the most significant features by comparing the two (Zhang et al., 2009). Both kinds of models were further validated by various approaches to assess their reliability. We then employed the better-performing Phase pharmacophore model as a search query for chemical databases to identify potential new lead candidates by virtual screening, taking into account a refined Lipinski ‘rule of five’ and absorption, distribution, metabolism, and excretion (ADME) properties. These results extend, on a quantitative basis, the current structure–activity relationships of non-steroidal TGR5 modulators and should help in designing new potent and selective agonists of the receptor.

Results and discussion

HypoGen pharmacophore study

A data set of 29 TGR5 agonists belonging to several structural classes (Fig. 3) was collected and randomly divided into 17 training set and 10 test set compounds (Gioiello et al., 2012; Herbert et al., 2010a, b; Duan et al., 2012; Zhu et al., 2013a, b). The top 10 pharmacophore hypotheses were exported on the basis of the activity values of the training set molecules. HypoGen produces three cost values: the fixed cost, the total cost, and the null cost. The difference between the null cost and the fixed cost was 106.81 bits, i.e., more than 70 bits. The configuration cost should be smaller than 17 for a good pharmacophore hypothesis, since it represents the complexity of the hypothesis (Bharatham et al., 2006). The cost values, correlation coefficients, RMS deviations, and pharmacophore features of the 10 hypotheses are summarized in Table 1. Hypo1 consists of one HBA, three hydrophobic features (H), and one aromatic ring (R), and has the highest cost difference (88.05), the best correlation coefficient (0.93), the maximum fit value (13.54), and the lowest root mean square (RMS) deviation (1.37). The fixed and null cost values for the 10 hypotheses were 67.1629 and 173.97, respectively. Because Hypo1 combines the highest cost difference and correlation with the lowest RMS and error values among all hypotheses, it was selected as the best hypothesis and used for further analyses. Figure 4 shows the chemical features of Hypo1. Figure 5a and b show the best pharmacophore model aligned with the most active and least active molecules, 6 and 23, with EC50 values of 0.3 and 5,100 nM, respectively. All pharmacophore features map well onto the active molecule (Fig. 5a), whereas the HBA feature is poorly matched by the weakly active molecule (Fig. 5b). These results suggest that the HBA moiety is essential for TGR5 agonism.
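The cost bookkeeping above can be made explicit with a few lines of Python. This is a minimal sketch that only recomputes quantities derived from the reported values; the helper name `cost_summary` and the "fraction of range" ratio are illustrative, not HypoGen output. It relies on the usual reading of HypoGen costs, namely that a meaningful hypothesis has a total cost much closer to the fixed cost than to the null cost.

```python
# Cost values reported for the HypoGen run (text and Table 1).
NULL_COST = 173.97
FIXED_COST = 67.1629
HYPO1_COST_DIFF = 88.05  # null cost minus total cost for Hypo1


def cost_summary(null_cost: float, fixed_cost: float, cost_diff: float) -> dict:
    """Derive the quantities commonly used to judge a HypoGen hypothesis.

    achievable_range: null - fixed, the largest possible cost reduction
    total_cost: recovered from the reported (null - total) difference
    fraction_of_range: share of the achievable reduction the hypothesis realises
    """
    achievable = null_cost - fixed_cost
    total_cost = null_cost - cost_diff
    return {
        "achievable_range": achievable,
        "total_cost": total_cost,
        "fraction_of_range": cost_diff / achievable,
    }


summary = cost_summary(NULL_COST, FIXED_COST, HYPO1_COST_DIFF)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under these inputs Hypo1 recovers roughly four fifths of the achievable cost reduction, which is consistent with its selection as the best hypothesis.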

Fig. 3

2D chemical structure of reported TGR5 agonists in the training set and test set together with their biological activity values (EC50)

Table 1 Characteristics of ten hypotheses for training set agonists generated by the HypoGen algorithm
Fig. 4

Hypo1 geometric constraint generated by non-steroidal compounds consists of one hydrogen-bond acceptor (HBA), one ring aromatic (R), and three hydrophobic (H) features. Pharmacophore features are color represented as blue for hydrophobic, green for hydrogen-bond acceptor, and yellow for ring aromatic feature (Color figure online)

Fig. 5

Best pharmacophore model Hypo1 aligned with training set compounds. a With most active compound 6 (EC50 = 0.3 nM). b With least active compound 23 (EC50 = 5,100 nM). Pharmacophore features are color represented as blue for hydrophobic, green for hydrogen-bond acceptor, and yellow for ring aromatic feature (Color figure online)

The random hypotheses generated by Fischer’s randomization test (95 % confidence level) are illustrated in Fig. 6. None of the random pharmacophores had a lower cost than the original hypotheses, which clearly shows that Hypo1 was not generated by chance, since its statistics were superior to those of all random hypotheses. In addition, a decoy set containing both active and inactive compounds with respect to TGR5 was used to assess whether Hypo1 could distinguish active molecules from inactive ones. The enrichment factor (EF) and goodness of hit score (GH) were calculated to evaluate the hypothesis, together with the total number of compounds in the hit list (Ht), the percent yield of actives (%Y), the percentage of known actives retrieved (%A), and the numbers of false negatives and false positives (Nagarajan et al., 2011). There were 357 false positives and 2 false negatives. The EF and GH were 3.784 and 0.25, respectively (Table 2), showing that Hypo1 retrieves actives at several times the rate expected from random selection. On the basis of these validations, Hypo1 was considered suitable for further analyses such as virtual screening.
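The decoy-set statistics follow the Güner–Henry formulas. The sketch below assumes that the database holds the 1,539 DrugBank decoys plus the 45 seeded actives (D = 1,584, per Materials and methods) and that the hit list is the retrieved actives plus the 357 false positives; under those assumptions it reproduces the reported EF (3.784) and GH (about 0.25). The function name `guner_henry` is illustrative.

```python
def guner_henry(ha: int, ht: int, a: int, d: int) -> dict:
    """Decoy-set statistics for a pharmacophore query.

    ha: actives in the hit list, ht: total hits,
    a: actives in the database, d: total compounds in the database.
    """
    pct_yield = 100.0 * ha / ht      # %Y: share of the hit list that is active
    pct_actives = 100.0 * ha / a     # %A: share of all actives that were retrieved
    ef = (ha / ht) / (a / d)         # enrichment over random selection
    gh = (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a))
    return {"%Y": pct_yield, "%A": pct_actives, "EF": ef, "GH": gh,
            "false_negatives": a - ha, "false_positives": ht - ha}


# Hypo1: 45 actives seeded into 1,539 decoys; 2 false negatives, 357 false positives.
A, D = 45, 45 + 1539
HA = A - 2       # 43 actives retrieved
HT = HA + 357    # 400 compounds in the hit list
stats = guner_henry(HA, HT, A, D)
print({k: round(v, 3) for k, v in stats.items()})
```

GH rewards both a high active fraction in the hit list and a high recall of actives, so a hit list that recovers nearly all actives but is diluted by many false positives, as here, still scores a modest GH.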

Fig. 6

Fischer’s randomization test results

Table 2 Statistical parameters of Hypo1 from screening the Decoy set

Phase pharmacophore study

As a first step, common pharmacophore hypotheses were generated, scored, and ranked by Phase. Four highly active compounds of the set (6, 9, 3, and 10 in Fig. 3) were selected to define a reliable, non-subjective alignment rule for the subsequent 3D-QSAR development. The top-ranked hypothesis (AHHRR.1321) comprised five features: one hydrogen-bond acceptor (A), two hydrophobic features (H), and two aromatic rings (R) (Fig. 7). Superimposing AHHRR.1321 on compounds 6 and 23 (Fig. 8) showed a close match for the most active compound and a poorer match for the least active one, in line with their TGR5 activities. Taken together with the SAR comparison above, this suggests that AHHRR.1321 captures relevant interactions between agonists and TGR5, so that matching the pharmacophore may reasonably indicate binding to this receptor.

Fig. 7

AHHRR.1321 consists of one hydrogen-bond acceptor (HBA), two ring aromatic (R), and two hydrophobic (H) features. Pharmacophore features are color represented as blue for hydrophobic, red for hydrogen-bond acceptor, and yellow for ring aromatic feature (Color figure online)

Fig. 8

Pharmacophoric features aligned to (a) the most active ligand and (b) the least active ligand. Pharmacophore features are color represented as blue for hydrophobic, red for hydrogen-bond acceptor, and yellow for ring aromatic feature (Color figure online)

Pharmacophore AHHRR.1321 was then used to align the molecules for an atom-based 3D-QSAR analysis. Models containing one to three PLS factors were generated; their statistical parameters are reported in Table 3. The three-factor model was selected because it performed better overall than the models with fewer factors. The high correlation and cross-validated correlation coefficients (R² = 0.927 and Q² = 0.7613, respectively), together with a high Pearson R value (0.8704), indicate close agreement between predicted and experimental EC50 values and hence a model with good predictive power. A scatter plot of experimental versus predicted activities (Fig. 9) shows that EC50 values were predicted well for both training and test set molecules. These statistics, together with the small number of PLS factors and the large F value, support the reliability of the approach. Notably, for all test set compounds the differences between experimental and calculated EC50 values were within one order of magnitude, demonstrating that the 3D-QSAR model estimates TGR5 activity reasonably well.
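Being "within one order of magnitude" is equivalent to an absolute residual below 1 in log10 (pEC50) units. A minimal Python check is sketched below; the function names are illustrative, and the EC50 values are hypothetical, not the test-set data of this study.

```python
import math


def log_residuals(exp_ec50_nm, pred_ec50_nm):
    """Residuals in log10 units; |residual| < 1 means within one order of magnitude."""
    return [math.log10(p / e) for e, p in zip(exp_ec50_nm, pred_ec50_nm)]


def all_within_one_log_unit(exp_ec50_nm, pred_ec50_nm) -> bool:
    """True if every prediction is within a factor of 10 of its experimental value."""
    return all(abs(r) < 1.0 for r in log_residuals(exp_ec50_nm, pred_ec50_nm))


# Hypothetical illustration values (EC50 in nM), not the paper's test-set data.
experimental = [0.3, 12.0, 250.0, 5100.0]
predicted = [0.9, 8.0, 1200.0, 2000.0]
print(all_within_one_log_unit(experimental, predicted))
```

Working in log units keeps the criterion symmetric: over- and under-prediction by the same factor give residuals of equal magnitude.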

Table 3 Results of selected pharmacophore hypothesis generated by Phase
Fig. 9

Correlation graph between experimental and predicted TGR5 activity using pharmacophore-based QSAR model. a Training set. b Test set

The atom-based 3D-QSAR model represents the 3D characteristics as cubes (volume elements) colored according to the sign of their coefficients, and the results were visualized as 3D plots of the crucial volume elements occupied by the ligands. The 3D plot of the model as a whole, superimposed on derivatives 6 and 23, is shown in Fig. 10. In this representation, blue and red cubes indicate positive and negative coefficients, respectively, that is, volumes in which atoms of the ligands increase or decrease activity. Cubes with small positive or negative coefficients, which therefore have little effect on activity, were filtered out by applying a coefficient threshold of 0.017. Notably, compound 6, the most potent TGR5 agonist, mainly occupies blue regions (Fig. 10a), whereas the less active compound 23 mainly occupies red regions (Fig. 10b). Figure 11a shows that regions where electron-withdrawing groups are favorable (blue cubes) lie close to the carbonyl group. A few regions where electron-withdrawing groups are unfavorable (pink cubes) are scattered discretely and are not reliable enough for prediction. Figure 11b shows that regions where hydrophobic groups are favorable (blue cubes) lie adjacent to the three aromatic rings.
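The cube filtering step can be pictured as a threshold on the absolute coefficient of each volume element. The sketch below is an illustration only: the `Cube` record and `visible_cubes` helper are hypothetical and do not reflect Phase's internal data structures; only the 0.017 threshold and the blue/red sign convention come from the text.

```python
from dataclasses import dataclass


@dataclass
class Cube:
    """One grid voxel of an atom-based 3D-QSAR model (hypothetical structure)."""
    center: tuple        # (x, y, z) in Å
    coefficient: float   # contribution of atoms occupying this voxel


def visible_cubes(cubes, threshold=1.7e-2):
    """Drop voxels whose |coefficient| is below the threshold and colour the rest."""
    kept = []
    for cube in cubes:
        if abs(cube.coefficient) < threshold:
            continue  # small contributions barely affect predicted activity
        colour = "blue" if cube.coefficient > 0 else "red"
        kept.append((cube, colour))
    return kept


cubes = [Cube((0.0, 0.0, 0.0), 0.031), Cube((1.0, 0.0, 0.0), -0.004),
         Cube((0.0, 1.0, 0.0), -0.022)]
for cube, colour in visible_cubes(cubes):
    print(cube.center, colour)
```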

Fig. 10

Atom-based 3D-QSAR model visualized in the context of the most active ligand (a) and least active ligand (b) in the training set

Fig. 11

3D pictorial representation of the cubes generated using the QSAR model. Blue cubes indicate favorable regions and red cubes indicate unfavorable regions for activity. Atom-based 3D-QSAR model visualized in the context of ligand 6: (a) cubes for electron-withdrawing groups; (b) cubes for hydrophobic regions (Color figure online)

After the 3D-QSAR model was generated, it was validated with the same decoy set as used for Hypo1. Forty-five active TGR5 agonists were included in the decoy set to calculate the goodness of hit score (GH) and the enrichment factor (EF), the two main parameters for assessing the ability of a pharmacophore hypothesis to retrieve actives. The EF and GH were 33.48 and 0.95, respectively (Table 4), indicating highly efficient screening. This result provides further evidence that the correlation shown by the model is not accidental.
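Since EF cannot exceed D/A (the value obtained when every hit is active), that ceiling puts 33.48 in context. Assuming the same decoy database as for Hypo1 (1,539 decoys plus 45 actives, as described in Materials and methods), this sketch computes the ceiling and the fraction of the hit list that each reported EF implies. Function names are illustrative.

```python
def max_enrichment(total_compounds: int, actives: int) -> float:
    """EF ceiling: reached when every compound in the hit list is active."""
    return total_compounds / actives


def hit_list_precision(ef: float, total_compounds: int, actives: int) -> float:
    """Fraction of the hit list that is active, implied by a reported EF."""
    return ef * actives / total_compounds


D, A = 1539 + 45, 45  # decoys plus seeded actives, as in the Hypo1 validation
ceiling = max_enrichment(D, A)
phase_precision = hit_list_precision(33.48, D, A)  # AHHRR.1321
hypo1_precision = hit_list_precision(3.784, D, A)  # Hypo1
print(round(ceiling, 2), round(phase_precision, 3), round(hypo1_precision, 3))
```

Under these assumptions roughly 95 % of the AHHRR.1321 hit list consists of actives, against roughly 11 % for Hypo1.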

Table 4 Statistical parameters of AHHRR.1321 from screening the Decoy set

Comparison of the pharmacophore models

Both HypoGen and Phase follow a workflow of selecting a training set, generating conformers, deriving hypotheses from actives, and scoring the hypotheses. Nevertheless, the two methodologies differ: for instance, rejecting hypotheses using inactives is specific to HypoGen, whereas building QSAR models is specific to Phase (Zask et al., 2009). HypoGen tends to generate fewer features because inactives are used to reject hypotheses, and the limits on the maximum number of features and the minimum interfeature distance act in the same direction. Conversely, Phase may retain less essential features, so the selection of candidates and the validation of the chosen hypotheses are particularly important. Moreover, differences in conformer generation and scoring algorithms lead to different hypotheses. When the conclusions of any single approach are limited, combining models and screening queries from different methods therefore appears more reliable (Evans et al., 2008). The difference between the two models here is clear. In the superposition of Hypo1, AHHRR.1321, and the most active compound 1, only H5 and R11 of AHHRR.1321 coincide with features of Hypo1, and one hydrophobic feature of Hypo1 is replaced by R12. Because the ring aromatic feature is a special case of the hydrophobic feature, R12 makes AHHRR.1321 more specific than Hypo1 with regard to TGR5. In the decoy set validation, the EF of AHHRR.1321 (33.48) is almost ninefold that of Hypo1 (3.784), indicating that AHHRR.1321 is more effective at distinguishing active from inactive compounds. AHHRR.1321 was therefore used as the query for virtual screening.

Database screening

Pharmacophore screening

Virtual screening is a valuable way to discover lead compounds more cost-efficiently and with fewer resources than experimental methods (Marcu et al., 2000). Using AHHRR.1321 as a search template, we searched the Specs database of over 200,000 compounds for potential TGR5 agonists. Compounds matching all the critical features of AHHRR.1321 were retained, and 932 compounds passing the fitness cut-off of 1.00 were considered for further analysis.

Drug-like filter

Drug-likeness, which reflects molecular and physicochemical properties such as those captured by Lipinski’s rule of five, is an important criterion for selecting compounds for in vitro studies. We therefore filtered the 932 compounds using the refined Lipinski’s rule of five, and 301 compounds were retained for ADME studies. The human oral absorption of the published compounds ranged from 68 to 100 %. On the basis of the calculated QPlogPo/w, QPlogS, and human oral absorption, 10 hits were finally retrieved (Fig. 12). For these selected lead compounds, the partition coefficient (QPlogPo/w) and aqueous solubility (QPlogS) fell within 4.47–5.95 and −8.04 to −5.06, respectively, and the human oral absorption was 100 %.

Fig. 12

2D chemical structure of 10 retrieved compounds

Conclusion

In summary, we built two different pharmacophore models with Phase and HypoGen, using the same training and test sets of non-steroidal TGR5 agonists. To our knowledge, this is the first report of a pharmacophore hypothesis for non-steroidal TGR5 agonists. Hypo1 indicates that one HBA, three HY, and one R feature are the common features of potential TGR5 agonists, whereas the Phase hypothesis AHHRR.1321 requires one HBA, two HY, and two ring aromatic features. Each hypothesis was validated by several approaches: Fischer validation and decoy set validation suggest that both pharmacophore hypotheses are reliable for the discovery of novel TGR5 agonists. After comparing the two, AHHRR.1321 was applied for further screening, followed by a multistep filtering strategy. Finally, 10 non-steroidal compounds were identified that merit further study. We hope that the pharmacophore generated here will be valuable for researchers seeking to develop novel TGR5 agonists. Interestingly, three compounds (34, 35, and 39) share a benzopyrimidine skeleton. Because no reported TGR5 agonist contains this moiety, it may lead to a new series of TGR5 modulators.

Materials and methods

Collection of data-set

TGR5 agonists with EC50 values ranging from 0.3 nM to 5.1 μM were collected from the literature. Before model building, all molecules were prepared and energy-minimized in ChemBio3D. The data set of 29 compounds was then divided randomly into training and test sets in such a way that both sets contained highly, moderately, and weakly active compounds. The training and test sets consisted of 17 and 10 compounds, respectively. The in vitro activity data are reported as EC50.
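A random split that still covers all activity levels amounts to stratified sampling. The sketch below is one way to do it, assuming activity tiers defined by pEC50 tertiles (the study does not state how tiers were defined) and using hypothetical compounds; `stratified_split` is an illustrative name.

```python
import random


def stratified_split(compounds, n_test, seed=0):
    """Split (name, pEC50) pairs so that every activity tier appears in the test set.

    Tiers are pEC50 tertiles (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    ranked = sorted(compounds, key=lambda c: c[1], reverse=True)
    k = len(ranked)
    tiers = [ranked[: k // 3], ranked[k // 3 : 2 * k // 3], ranked[2 * k // 3 :]]
    test = []
    for i, tier in enumerate(tiers):
        # Distribute test-set slots as evenly as possible across the tiers.
        quota = n_test // 3 + (1 if i < n_test % 3 else 0)
        test.extend(rng.sample(tier, quota))
    test_names = {name for name, _ in test}
    training = [c for c in compounds if c[0] not in test_names]
    return training, test


# 27 hypothetical compounds with evenly spread pEC50 values (illustration only).
data = [(f"cpd{i}", 5.0 + 0.17 * i) for i in range(27)]
training, test = stratified_split(data, n_test=10, seed=42)
print(len(training), len(test))
```

Fixing the seed makes the split reproducible, which matters when the same training and test sets are reused across HypoGen and Phase.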

HypoGen pharmacophore model

Pharmacophore modeling is one of the most powerful and rapid methods for discovering novel scaffolds. The automatic generation procedure of the HypoGen module was used to generate the hypotheses. HypoGen uses the activity values of the compounds in the training set to generate hypotheses, which may reveal the features critical for binding. On the basis of the chemical features of the training set compounds, four features were selected: HBA, hydrogen-bond donor (HBD), hydrophobic (HY), and ring aromatic (RA).

The training set of 17 compounds was used to construct the HypoGen pharmacophores. Conformations were generated with the BEST conformation generation method, using an energy threshold of 20 kcal/mol and a maximum of 255 conformations. The minimum and maximum numbers of features in the hypothesis run were set to 1 and 5, respectively. The default uncertainty value of 3 was changed to 2 to correlate the training set more effectively with activity.
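As we understand HypoGen’s convention, an uncertainty value u means each measured activity A is treated as lying anywhere within [A/u, A×u]. Lowering u from 3 to 2 therefore narrows that window, as this small sketch shows for compound 6; the helper name `activity_range` is hypothetical.

```python
def activity_range(ec50_nm: float, uncertainty: float) -> tuple:
    """Activity interval implied by a HypoGen-style uncertainty factor.

    The measured value A is treated as lying anywhere in [A / u, A * u].
    """
    if uncertainty <= 1:
        raise ValueError("uncertainty factor must be greater than 1")
    return ec50_nm / uncertainty, ec50_nm * uncertainty


for u in (3, 2):  # default value vs. the value used in this study
    low, high = activity_range(0.3, u)  # compound 6, EC50 = 0.3 nM
    print(f"u = {u}: {low:.3g}-{high:.3g} nM")
```

A narrower window makes the fit penalise deviations from the measured activities more strongly, which is the intent behind the change from 3 to 2.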

For validation, Fischer’s randomization method was first used to measure the statistical significance of the model; 19 random spreadsheets were generated to obtain a 95 % confidence level. Second, a decoy set was used: a database of 1,539 decoys was obtained from DrugBank (a random subset of FDA-approved small-molecule drugs with no reported TGR5 activity). Forty-five active TGR5 agonists were also included in the decoy set to calculate the goodness of hit score (GH) and the enrichment factor (EF), the two main parameters for assessing the ability of a pharmacophore hypothesis to retrieve actives. All queries were performed using the Ligand Pharmacophore Mapping protocol with a fast, flexible fitting approach.
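The link between 19 random runs and a 95 % confidence level comes from the significance formula commonly used with Fischer’s randomization test, S = [1 − (1 + x)/y] × 100, where x is the number of random runs whose cost is lower than the original and y is the total number of runs (original plus random). A minimal sketch follows; the random costs are hypothetical, and 85.92 is Hypo1’s total cost implied by the reported null cost (173.97) minus its cost difference (88.05).

```python
def fischer_significance(original_cost: float, random_costs: list) -> float:
    """Significance (%) of a HypoGen run under Fischer's randomization test.

    S = [1 - (1 + x) / y] * 100, with x = random runs whose cost beats the
    original and y = 1 + number of random runs. Written as integer ratio
    (y - 1 - x) / y to avoid rounding noise.
    """
    x = sum(1 for c in random_costs if c < original_cost)
    y = 1 + len(random_costs)
    return 100.0 * (y - 1 - x) / y


# 19 hypothetical random-run costs, all worse than the original hypothesis.
random_costs = [100.0 + 2.5 * i for i in range(19)]
print(fischer_significance(85.92, random_costs))
```

With 19 random runs and none beating the original, the best attainable significance is exactly 95 %; reaching 99 % would require 99 random runs.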

Phase pharmacophore model

A pharmacophore-based 3D-QSAR study was carried out using PHASE implemented in the Maestro 9.4 modeling package (Schrodinger, Inc., LLC, New York, USA). Like HypoGen, Phase can be used for pharmacophore hypothesis generation, activity estimation, and virtual screening. A total of 29 ligands were used to develop the Phase pharmacophore and 3D-QSAR models. Pharmacophore sites for all ligands were created from the available set of pharmacophore features: hydrogen-bond acceptor (A), hydrogen-bond donor (D), hydrophobe (H), negative ionizable (N), positive ionizable (P), and aromatic ring (R). During ligand preparation, specified chiralities of stereoisomers were retained and the ligands were neutralized. Conformational searching was carried out with ConfGen using the thorough sampling method and the MMFFs force field, allowing 100 conformers per rotatable bond and a maximum of 1,000 conformers per structure. The numbers of preprocessing and postprocessing minimization steps were 1,000 and 500, respectively. For each molecule, a maximum of 1,000 conformers was generated within a relative energy window of 20 kcal/mol, and redundant conformers were removed using a root-mean-square deviation (RMSD) cut-off of 1 Å. Four highly active compounds were used to build the pharmacophores: compounds with pEC50 above 8.50 were defined as ‘active’ and those below 6.5 as ‘inactive’, giving 4 actives and 5 inactives. When searching for common pharmacophores, the number of sites was set to the available maximum of five, and all four active compounds were required to match in order to find and score hypotheses (Zhang et al., 2009). Hypotheses were scored with respect to ligand activity using the default parameters for the site, vector, and volume terms. Common pharmacophore hypotheses were identified, scored, and ranked using conformational analysis and a tree-based partitioning technique, and hypothesis selection considered both the survival score and the alignment of indispensable sites (Almerico et al., 2010).
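The active/inactive thresholds are defined on pEC50, i.e., −log10 of EC50 in mol/L. The sketch below converts the nanomolar EC50 values of compounds 6 and 23 and applies the 8.5/6.5 cut-offs; the label "moderate" for compounds between the cut-offs is our own illustrative choice.

```python
import math


def pec50_from_nm(ec50_nm: float) -> float:
    """pEC50 = -log10(EC50 in mol/L); 1 nM = 1e-9 M."""
    return -math.log10(ec50_nm * 1e-9)


def activity_class(pec50: float, active_cut: float = 8.5, inactive_cut: float = 6.5) -> str:
    """Label a compound using the pEC50 thresholds stated in the text."""
    if pec50 > active_cut:
        return "active"
    if pec50 < inactive_cut:
        return "inactive"
    return "moderate"  # neither label; not used to build the hypotheses


for name, ec50 in [("6", 0.3), ("23", 5100.0)]:
    p = pec50_from_nm(ec50)
    print(name, round(p, 2), activity_class(p))
```

In EC50 terms, the cut-offs correspond to roughly 3.2 nM (active) and 316 nM (inactive).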

Atom-based QSAR models were generated for the TGR5 hypothesis using the 17-member training set and a grid spacing of 1.0 Å. Models containing one to three PLS factors were generated. The best 3D-QSAR model was selected on the basis of the correlation coefficients for the training set molecules and was further validated by predicting the activities of the 10 test set molecules. Three external validation statistics, namely Q², Pearson R, and RMSE, were used to evaluate the model. The same training and test sets were used to construct and validate the Phase pharmacophore models. The generated hypotheses were assessed by their statistical parameters and by the correlation between observed and estimated activities for the training set of 17 compounds and the test set of 10 compounds. The best hypothesis was used to align the compounds for the subsequent 3D-QSAR study. To validate the hypothesis, the decoy set method was applied as for HypoGen: a database of 1,539 decoys was obtained from DrugBank (a random subset of FDA-approved small-molecule drugs with no reported TGR5 agonist activity), and 45 active TGR5 agonists were included to calculate the goodness of hit score (GH) and the enrichment factor (EF). All queries were performed using the Advanced Pharmacophore Screening protocol in Maestro 9.4.
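For reference, the three external statistics can be computed as follows. This sketch assumes the common definitions: Q² = 1 − Σ(obs − pred)²/Σ(obs − mean obs)² over the test set, Pearson’s correlation between observed and predicted activities, and the RMSE of the residuals, all in log units. Phase’s exact implementation may differ in detail, and the values in the usage lines are hypothetical.

```python
import math


def external_metrics(observed, predicted):
    """Q², Pearson R and RMSE for test-set predictions (activities in log units)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    q2 = 1 - ss_res / ss_tot
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    pearson_r = cov / (sd_obs * sd_pred)
    rmse = math.sqrt(ss_res / n)
    return q2, pearson_r, rmse


# Hypothetical pEC50 values for a small test set (illustration only).
observed = [9.5, 8.7, 7.9, 7.1, 6.3, 5.6]
predicted = [9.1, 8.9, 7.5, 7.4, 6.0, 5.9]
print([round(v, 3) for v in external_metrics(observed, predicted)])
```

Q² penalises systematic offsets that Pearson R ignores, which is why the two are usually reported together.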

Virtual screening protocol

The validated pharmacophore model was used as a 3D query to retrieve potential agonists from the Specs database of 207,018 molecules, which was downloaded from the Specs website. Multiple conformations of the database compounds were generated using Schrödinger software. The Specs database was then screened with the pharmacophore model in the Phase module under the following conditions: (a) conformers were generated during the search by the rapid sampling method; (b) the maximum number of conformers was set to 50, with up to 5 conformers retained per rotatable bond; (c) at most 1 hit per molecule and 10,000 hits in total were returned; and (d) hits were required to match all five sites. All other options and parameters were left at their default values.
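Settings (c) and (d), together with the 1.00 fitness cut-off used in the database screening, amount to a simple post-processing rule on the search results. This is a hypothetical sketch of that logic only; `Match` and `select_hits` are illustrative names, not part of the Phase API.

```python
from collections import namedtuple

# Hypothetical record for one aligned conformer returned by a pharmacophore search.
Match = namedtuple("Match", "molecule conformer sites_matched fitness")


def select_hits(matches, required_sites=5, min_fitness=1.0, max_hits=10_000):
    """Keep the best-fitting conformer per molecule that matches all required sites."""
    best = {}
    for m in matches:
        if m.sites_matched < required_sites or m.fitness < min_fitness:
            continue  # partial matches and low-fitness alignments are discarded
        current = best.get(m.molecule)
        if current is None or m.fitness > current.fitness:
            best[m.molecule] = m  # at most one hit per molecule
    ranked = sorted(best.values(), key=lambda m: m.fitness, reverse=True)
    return ranked[:max_hits]


matches = [Match("A", 1, 5, 1.8), Match("A", 2, 5, 2.1), Match("B", 1, 4, 2.5),
           Match("C", 3, 5, 0.9), Match("D", 1, 5, 1.2)]
for hit in select_hits(matches):
    print(hit.molecule, hit.conformer, hit.fitness)
```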

Drug-like analysis

A first inspection of some physicochemical properties (lipophilicity (clogP), molecular weight (MW), HBAs, and hydrogen-bond donors (HBD)) of the three structural classes of TGR5 ligands shows that all of them have relatively high MW and lipophilicity. Indeed, 90 % of non-steroidal ligands fall within a narrower HBA range of 1 to 12, with an average value of 6.02 ± 2.80. Together, these properties cause each structural class of TGR5 ligands to comply with Lipinski’s ‘rule of five’ to a different extent. Therefore, a refined Lipinski ‘rule of five’ filter [(1) hydrogen-bond donors should be fewer than 2, (2) HBAs should be more than 1 and fewer than 6, (3) molecular weight should be less than 550 Da, and (4) log P should be more than 3 and less than 6] was applied to remove false-positive hits that are unlikely to be drug-like. Because oral absorbability is an important factor, ADME properties were calculated with QikProp, which predicts the principal and physicochemical descriptors of potential drug compounds. QikProp was run in normal mode to predict principal descriptors and physicochemical properties for all known and screened compounds, with detailed analysis of log P (octanol/water), QPlogS (predicted aqueous solubility), QPlogBB (predicted brain/blood partition coefficient), and percentage human oral absorption.
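The refined rule-of-five criteria listed above translate directly into a predicate. In this minimal sketch the descriptor values are assumed to be precomputed (for example by QikProp or a cheminformatics toolkit such as RDKit); the function name and the example compounds are hypothetical.

```python
def passes_refined_ro5(hbd: int, hba: int, mw: float, logp: float) -> bool:
    """Refined rule-of-five filter with the four criteria stated in the text."""
    return (
        hbd < 2                 # (1) fewer than 2 hydrogen-bond donors
        and 1 < hba < 6         # (2) more than 1 and fewer than 6 acceptors
        and mw < 550.0          # (3) molecular weight below 550 Da
        and 3.0 < logp < 6.0    # (4) log P between 3 and 6
    )


# Hypothetical descriptor sets: (HBD, HBA, MW, log P).
candidates = {
    "hit_1": (1, 4, 452.5, 5.1),
    "hit_2": (2, 4, 410.0, 4.2),   # too many donors
    "hit_3": (0, 5, 568.7, 5.6),   # too heavy
}
kept = [name for name, desc in candidates.items() if passes_refined_ro5(*desc)]
print(kept)
```

All four bounds are strict inequalities, mirroring the wording of the filter; compounds sitting exactly on a boundary are rejected.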