Introduction

G protein-coupled receptors (GPCRs) constitute a large family whose members are involved in numerous physiological functions and represent more than 30 % of all pharmaceutical drug targets (Pal et al., 2012). While the sequencing of the human genome has led to the discovery of novel GPCRs, many of them are orphan receptors whose natural ligands remain undefined (Milligan, 2002). GPR35, one of these orphan GPCRs, was first described in 1998 and shares homology with GPR55, the lysophosphatidic acid receptors LPAR4, LPAR5, and LPAR6, and the nicotinic acid receptor HM74 (Mackenzie et al., 2011). GPR35 is expressed in various tissues, especially in the gastrointestinal, immune, and nervous systems (Cosi et al., 2011; Ohshiro et al., 2008; Taniguchi et al., 2006). Accumulated evidence indicates that the activation of GPR35 plays a key role in several physiological and pathological processes (Mackenzie et al., 2011). For example, GPR35 agonists can regulate insulin release (Jenkins et al., 2011), lower blood pressure, and modulate immune responses (Min et al., 2010; Wang et al., 2006). GPR35 has therefore been recognized as a novel therapeutic target in conditions that range from diabetes and hypertension to asthma. In the search for GPR35 agonists, kynurenic acid, a tryptophan metabolite, was the first compound described as a possible endogenous ligand (Wang et al., 2006). However, because of its low potency, increasing attention has been focused on synthetic agonists. In the past decade, several synthetic GPR35 agonists have been reported, including the antiallergic drug cromolyn (Taniguchi et al., 2006), zaprinast (Yang et al., 2010), and thiazolidinylidenemethylbenzoic acid derivatives (Neetoo-Isseljee et al., 2013). Unfortunately, most of these agonists still have major drawbacks, including low potency and limited selectivity. Notably, a series of 8-substituted chromen-4-one-2-carboxylic acid derivatives, which share a substructure with cromolyn, has recently been identified as potent and selective GPR35 agonists (Funke et al., 2013; Thimm et al., 2013).

Computational tools are increasingly employed in the rational drug discovery process to design new molecules and to optimize their structures (Heikamp and Bajorath, 2013). Commonly used computer modeling techniques mainly include structure-based drug design and ligand-based drug design. Structure-based drug design relies on knowledge of the three-dimensional structure of the biological target obtained through methods such as X-ray crystallography or NMR spectroscopy. In the absence of detailed structural information about the receptor, ligand-based approaches such as three-dimensional quantitative structure–activity relationship (3D-QSAR) methods can be more effective (Wilson and Lill, 2011). 3D-QSAR methods correlate the three-dimensional structures of potential drug molecules with their biological activity and in turn help to predict the activity of new molecules prior to synthesis (Chandrasekaran et al., 2004). To date, the three-dimensional structure of GPR35 has not been available. In the present paper, a series of 8-substituted chromen-4-one-2-carboxylic acid derivatives was used to develop 3D-QSAR models using comparative molecular field analysis (CoMFA), comparative molecular similarity indices analysis (CoMSIA), and Topomer CoMFA methods. The resulting models can be used to identify the structural features essential for enhancing activity and can subsequently enable the design of new, more potent GPR35 agonists.

Materials and methods

Dataset

In the present work, thirty-eight 8-substituted chromen-4-one-2-carboxylic acid derivatives were taken from the published work of Funke et al. (2013). Their human GPR35 agonist activities had been determined with a β-arrestin recruitment assay, the assay system most commonly used to characterize GPR35 ligands pharmacologically. In this assay, stock solutions of the compounds were prepared in DMSO and further diluted in cell plating medium or phosphate-buffered saline, so the compounds were regarded as exerting their activities in their neutral forms. The EC50 values were converted into the corresponding pEC50 values by the formula pEC50 = −log EC50. The pEC50 values span three log units, providing a broad and homogeneous dataset for the 3D-QSAR study. The structures of the compounds and their biological data are given in Table 1. A test set of eight compounds was chosen at random from the original dataset and used to determine the external predictivity of the resulting 3D-QSAR models. The remaining 30 compounds were treated as the training set and used to derive the 3D-QSAR models (Table 1).
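The activity conversion and the random split described above can be sketched as follows. This is a minimal illustration, assuming EC50 values expressed in mol/L; the compound labels, EC50 values, function names, and random seed are hypothetical and are not taken from Table 1.

```python
import math
import random


def pec50(ec50_molar):
    """Convert an EC50 given in mol/L into pEC50 = -log10(EC50)."""
    if ec50_molar <= 0:
        raise ValueError("EC50 must be positive")
    return -math.log10(ec50_molar)


def random_split(compound_ids, n_test, seed=0):
    """Randomly pick n_test compounds as the test set; the rest form the training set."""
    ids = list(compound_ids)
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    test = set(rng.sample(ids, n_test))
    train = [c for c in ids if c not in test]
    return train, sorted(test)


# Hypothetical EC50 values (mol/L), for illustration only.
example_ec50 = {"A": 1.0e-5, "B": 2.5e-7, "C": 1.2e-8}
example_pec50 = {name: pec50(value) for name, value in example_ec50.items()}

# 38 compounds split into 30 training and 8 test compounds, as in the text.
train_ids, test_ids = random_split(range(1, 39), n_test=8)
```

Fixing the seed is a design choice for reproducibility only; the split actually used in this work is the one marked in Table 1.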

Table 1 Structures and biological activities of the training and test sets of compounds

Molecular modeling and alignment

The molecular modeling and calculations were performed using the SYBYL program (SYBYL-X 2.0, Tripos Inc., St. Louis, MO, USA) on a Windows operating system. All molecular structures were built in SYBYL, and energy minimization was then performed using the Tripos force field with a distance-dependent dielectric function and the Powell conjugate gradient algorithm, with a convergence criterion of 0.01 kcal/(mol Å) and a maximum of 1,000 iterations. Partial atomic charges were calculated using the Gasteiger-Hückel method (Clark et al., 1989).

The alignment of the compound structures is a critical step in 3D-QSAR methodologies, such as CoMFA/CoMSIA. The accuracy of the prediction of CoMFA and CoMSIA models and the reliability of the contour models depend strongly on the structural alignment of the compounds (Cho and Tropsha, 1995). In the present study, the most potent compound 34 was employed as a template, and the rest of the compounds in the training set were aligned to it using the common substructure. Figure 1 describes the common substructure for the alignment which is marked in bold red and the aligned compounds are displayed in Fig. 2.

Fig. 1 Chemical structure of compound 34 used as template compound in 3D-QSAR modeling. The common substructure used for compound alignments is represented in bold red (Color figure online)

Fig. 2 Alignment of the 30 training set compounds for 3D-QSAR studies

CoMFA studies

All the aligned compounds were placed in a 3D cubic lattice with a grid spacing of 2 Å, which was generated automatically by the SYBYL program. CoMFA descriptors were calculated using the Tripos force field with an sp3 carbon probe atom with a van der Waals radius of 1.52 Å and a charge of +1 to generate steric (Lennard-Jones 6-12 potential) and electrostatic (Coulombic potential) field energies, with a distance-dependent dielectric, at each lattice point. Values of the steric and electrostatic fields were truncated at 30.0 kcal/mol.
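To make the field calculation concrete, the following sketch evaluates a Lennard-Jones 6-12 steric energy and a Coulombic energy with a distance-dependent dielectric (ε = r) on a 2 Å lattice, and truncates both at 30 kcal/mol. It is an illustration only and not the SYBYL implementation: the atom radius, well depths, toy molecule, symmetric clamping, and all function names are assumptions.

```python
import math

COULOMB_CONSTANT = 332.06   # kcal*Angstrom/(mol*e^2)
CUTOFF = 30.0               # kcal/mol, truncation value quoted in the text
PROBE_RADIUS = 1.52         # Angstrom, probe radius quoted in the text
PROBE_WELL_DEPTH = 0.107    # kcal/mol, illustrative placeholder (assumption)


def truncate(energy, cutoff=CUTOFF):
    """Clamp an energy to [-cutoff, +cutoff] (symmetric clamping is an assumption)."""
    return max(-cutoff, min(cutoff, energy))


def electrostatic_energy(atoms, point, probe_charge=1.0):
    """Coulomb energy with epsilon = r, so each term scales as 1/r^2."""
    total = 0.0
    for atom in atoms:
        r = max(math.dist(atom["xyz"], point), 1e-3)  # avoid division by zero
        total += COULOMB_CONSTANT * atom["charge"] * probe_charge / (r * r)
    return truncate(total)


def steric_energy(atoms, point):
    """Lennard-Jones 6-12 energy between the probe and every atom."""
    total = 0.0
    for atom in atoms:
        r = max(math.dist(atom["xyz"], point), 1e-3)
        r_min = atom["radius"] + PROBE_RADIUS
        depth = math.sqrt(atom["well_depth"] * PROBE_WELL_DEPTH)
        ratio6 = (r_min / r) ** 6
        total += depth * (ratio6 * ratio6 - 2.0 * ratio6)
    return truncate(total)


def lattice_points(lower, upper, spacing=2.0):
    """Yield the points of a regular lattice between two box corners."""
    counts = [int(math.floor((hi - lo) / spacing)) + 1 for lo, hi in zip(lower, upper)]
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                yield (lower[0] + i * spacing,
                       lower[1] + j * spacing,
                       lower[2] + k * spacing)


# Toy one-atom "molecule" with hypothetical parameters.
toy_atoms = [{"xyz": (0.0, 0.0, 0.0), "charge": 0.5, "radius": 1.7, "well_depth": 0.107}]
fields = [(p, steric_energy(toy_atoms, p), electrostatic_energy(toy_atoms, p))
          for p in lattice_points((-4.0, -4.0, -4.0), (4.0, 4.0, 4.0))]
```

Each lattice point contributes one steric and one electrostatic column per compound to the CoMFA descriptor table.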

CoMSIA studies

CoMSIA similarity index descriptors were derived according to Klebe et al. (1994) using the same lattice box as that used in the CoMFA calculations. Five similarity fields, namely steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor, were calculated using an sp3 probe atom with a charge of +1 and a radius of 1 Å. The CoMSIA similarity index \(A^{q}_{F,k}\) of molecule j, summed over its atoms i at grid point q, was calculated as follows:

$$A^{q}_{F,k} (j) = - \sum\limits_{i = 1}^{n} \omega_{{\text{probe}},k}\, \omega_{i,k}\, {\text{e}}^{ - a r^{2}_{iq} }$$

where \(\omega_{i,k}\) is the actual value of the physicochemical property k of atom i; \(\omega_{{\text{probe}},k}\) indicates the probe atom with charge +1, radius 1 Å, hydrophobicity +1, and hydrogen bond donor and acceptor properties +1; a is the attenuation factor; and \(r_{iq}\) is the mutual distance between the probe atom at grid point q and atom i of the test compound. In this paper, steric indices are related to the third power of the atomic radii, electrostatic descriptors are derived from partial atomic charges, hydrophobic descriptors are derived from atom-based parameters, and H-bond donor and acceptor indices are obtained by a rule-based method based on experimental results. The default value of 0.3 was used as the attenuation factor (a) for the Gaussian-type distance dependence on \(r_{iq}\).
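The similarity index defined above can be evaluated with a few lines of code. The sketch below is a direct transcription of the formula for one property k at one grid point; the function name, the data layout, and the toy atom values are assumptions made for illustration.

```python
import math


def comsia_index(atoms, grid_point, probe_value=1.0, attenuation=0.3):
    """Similarity index of one molecule for one property at one grid point.

    atoms: iterable of (xyz, property_value) pairs, one per atom i.
    probe_value: property value of the probe (+1 in the text).
    attenuation: Gaussian attenuation factor (0.3 in the text).
    """
    total = 0.0
    for xyz, property_value in atoms:
        r_squared = sum((a - g) ** 2 for a, g in zip(xyz, grid_point))
        total += probe_value * property_value * math.exp(-attenuation * r_squared)
    return -total


# Toy molecule: two atoms with hypothetical property values.
toy_atoms = [((0.0, 0.0, 0.0), 2.0), ((1.0, 0.0, 0.0), -0.5)]
index_at_origin = comsia_index(toy_atoms, (0.0, 0.0, 0.0))
```

Because the Gaussian term is finite at zero distance, no energy cutoff is needed here, in contrast to the Lennard-Jones and Coulomb potentials used in CoMFA.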

Topomer CoMFA studies

The Topomer CoMFA descriptors were calculated using the SYBYL-X 2.0 software package. Topomer CoMFA is an alignment-independent 3D-QSAR method that combines the topomer search technology with the conventional CoMFA method (Cramer, 2012). A Topomer CoMFA model is generated by splitting the compounds into fragments, topomerically aligning each fragment, and calculating steric and electrostatic field descriptor values for the aligned fragments to create a CoMFA table of field descriptor values. Identifying the R-groups of the training set compounds is an important step in Topomer CoMFA. In this study, all compounds of the dataset were split into two fragments by cutting a single bond, shown as the R1 (red) and R2 (blue) groups in Fig. 3.

Fig. 3 Fragmentation pattern (R1 and R2) for all compounds of the dataset in the Topomer CoMFA analysis. The R1 fragment is shown in red and the R2 fragment in blue (Color figure online)

Partial least squares (PLS) analysis

The partial least squares (PLS) approach, an extension of multiple regression analysis, was used to derive the 3D-QSAR models. Column filtering was set to 2.0 kcal/mol to improve the analysis and reduce noise. In the PLS regression analysis, the CoMFA, CoMSIA, and Topomer CoMFA descriptors were used as independent variables and the biological activity (pEC50) as the dependent variable.
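Column filtering can be pictured as dropping every descriptor column whose variation across the compounds falls below the threshold. The sketch below assumes the filter compares the standard deviation of each column with the 2.0 kcal/mol threshold; whether the population or the sample standard deviation is used is an assumption, as are the function name and the toy descriptor values.

```python
from statistics import pstdev


def filter_columns(descriptor_rows, min_sigma=2.0):
    """Keep only descriptor columns whose standard deviation reaches min_sigma.

    descriptor_rows: one list of field values per compound (rows = compounds).
    Returns the filtered table and the indices of the columns that were kept.
    """
    columns = list(zip(*descriptor_rows))
    kept = [idx for idx, column in enumerate(columns) if pstdev(column) >= min_sigma]
    filtered = [[row[idx] for idx in kept] for row in descriptor_rows]
    return filtered, kept


# Hypothetical descriptor table: 3 compounds x 3 lattice-point energies (kcal/mol).
toy_table = [
    [0.0, 1.0, 10.0],
    [0.1, 1.0, -10.0],
    [0.2, 1.0, 0.0],
]
filtered_table, kept_columns = filter_columns(toy_table)
```

Columns that hardly vary carry little information about activity differences, so removing them shortens the PLS calculation and reduces noise.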

To check the statistical significance of the models, cross-validation by the leave-one-out (LOO) procedure was carried out to identify the optimum number of components (N), which was subsequently used to generate the final QSAR models. The optimum number of components was selected on the basis of the highest cross-validated correlation coefficient (\(q^{2}\)), which is defined as follows:

$$q^{2} = 1 - \frac{{\sum ({\text{Y}}_{\text{pred}} - {\text{Y}}_{\exp } )^{2} }}{{\sum ({\text{Y}}_{\exp } - {\text{Y}}_{\text{mean}} )^{2} }},$$

where \({\text{Y}}_{\text{pred}}\), \({\text{Y}}_{\exp}\), and \({\text{Y}}_{\text{mean}}\) are the values (pEC50) of the predicted activity, experimental activity, and mean activity, respectively, and \(\sum ({\text{Y}}_{\text{pred}} - {\text{Y}}_{\exp } )^{2}\) is the predictive sum of squares (PRESS). All the results were graphically represented by field contour maps, where the coefficients were generated using the StDev*Coeff field type.

To further assess the robustness and statistical confidence of the derived 3D-QSAR models, the leave-five-out (L5O) procedure was also applied by deleting five compounds at a time from the training set, and the corresponding \(q^{2}\) value (\(q_{\text{L5O}}^{2}\)) was reported.
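The cross-validated \(q^{2}\) can be computed with the short sketch below, which covers both the leave-one-out case (group size 1) and leave-group-out variants such as L5O. To keep the example self-contained, a one-descriptor least-squares line stands in for the PLS model; this substitution, the contiguous grouping of held-out compounds, and all names and data are assumptions and do not reproduce the SYBYL procedure.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept (stand-in for PLS)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def q2_leave_group_out(x, y, group_size=1):
    """q2 = 1 - PRESS / sum((Y_exp - Y_mean)^2), holding out group_size compounds at a time."""
    n = len(y)
    press = 0.0
    for start in range(0, n, group_size):
        held_out = set(range(start, min(start + group_size, n)))
        x_train = [x[i] for i in range(n) if i not in held_out]
        y_train = [y[i] for i in range(n) if i not in held_out]
        if len(y_train) < 2:
            raise ValueError("too few compounds left to fit the model")
        slope, intercept = fit_line(x_train, y_train)
        press += sum((slope * x[i] + intercept - y[i]) ** 2 for i in held_out)
    y_mean = mean(y)
    total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / total


# Hypothetical descriptor/activity pairs, for illustration only.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2_loo = q2_leave_group_out(toy_x, toy_y, group_size=1)
q2_l2o = q2_leave_group_out(toy_x, toy_y, group_size=2)
```

In practice the same loop would be repeated for each candidate number of PLS components, and the number giving the highest \(q^{2}\) would be retained.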

Predictive ability of the 3D-QSAR models

The predictive ability of each 3D-QSAR model was determined from the test set of eight compounds not included in model generation. The predictive correlation coefficient (\(r_{\text{pred}}^{2}\)) based on the test set compounds is defined as

$$r_{\text{pred}}^{2} = \frac{{\left( {{\text{SD}} - {\text{PRESS}}} \right)}}{\text{SD}},$$

where SD is the sum of the squared deviations between the biological activity of compounds in the test set and the mean biological activity of the training set compounds, and PRESS is the sum of the squared deviations between predicted and actual activity values for every compound in the test set.
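The definition above translates directly into code. The following sketch computes \(r_{\text{pred}}^{2}\) from test set predictions, using the mean activity of the training set in SD as stated in the text; the function name and the numerical values are hypothetical.

```python
def r2_pred(y_test_exp, y_test_pred, y_train_exp):
    """Predictive r2 = (SD - PRESS) / SD for an external test set."""
    train_mean = sum(y_train_exp) / len(y_train_exp)
    # SD: squared deviations of test set activities from the training set mean.
    sd = sum((y - train_mean) ** 2 for y in y_test_exp)
    # PRESS: squared deviations between predicted and experimental test activities.
    press = sum((yp - ye) ** 2 for yp, ye in zip(y_test_pred, y_test_exp))
    return (sd - press) / sd


# Hypothetical pEC50 values, for illustration only.
example = r2_pred(
    y_test_exp=[5.0, 7.0],
    y_test_pred=[5.5, 6.5],
    y_train_exp=[5.0, 6.0, 7.0],
)
```

Note that \(r_{\text{pred}}^{2}\) can become negative when the test set predictions are worse than simply predicting the training set mean.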

Results and discussion

CoMFA statistical results

The statistical results obtained from the standard CoMFA model constructed with steric and electrostatic fields are summarized in Table 2. The LOO cross-validated \(q^{2}\) is 0.610 with an optimum number of components of 4. The non-cross-validated PLS analysis produced a conventional \(r^{2}\) of 0.918 with an estimated F value of 69.917 and a standard error of estimate (SEE) of 0.325. These statistical indices indicate that the CoMFA model has good internal predictivity.
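As a rough consistency check, the F value can be related to \(r^{2}\), the number of training compounds n, and the number of components N. The sketch below assumes the usual regression F statistic with N and n − N − 1 degrees of freedom; the text does not state how F was obtained, so this relation is an assumption. With the rounded \(r^{2}\) of 0.918, n = 30, and N = 4 it returns a value close to 70, in line with the reported 69.917; a small difference is expected because \(r^{2}\) is rounded.

```python
def f_statistic(r2, n_compounds, n_components):
    """Regression F value from r2 with N and n - N - 1 degrees of freedom (assumed form)."""
    dof_model = n_components
    dof_residual = n_compounds - n_components - 1
    return (r2 / dof_model) / ((1.0 - r2) / dof_residual)


# CoMFA model from the text: r2 = 0.918, 30 training compounds, 4 components.
f_comfa = f_statistic(0.918, n_compounds=30, n_components=4)
```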

Table 2 PLS statistical results of CoMFA, CoMSIA, and Topomer CoMFA models

CoMSIA statistical results

CoMSIA models can be constructed from five different fields: steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor. When each field was tested individually, the CoMSIA models based on the hydrogen bond donor, hydrogen bond acceptor, or steric field alone were statistically poor (\(q^{2}\) < 0.1, data not shown), whereas the electrostatic and hydrophobic fields both played an important role. Furthermore, to generate a reliable model, all possible combinations of fields were examined (Table 3). Among these models, the combination of electrostatic and hydrophobic fields gave the highest \(q^{2}\) (0.649) using 2 components, with an \(r^{2}\) value of 0.788, an SEE value of 0.502, and an F value of 50.307. However, because the steric field was considered a key factor, the model including electrostatic, hydrophobic, and steric fields was finally selected (bold in Table 3). It gave a similar \(q^{2}\) (0.646) with 2 components, a relatively higher \(r^{2}\) (0.800) and F value (53.852), and a smaller SEE (0.489), suggesting that the CoMSIA model is also a reliable predictor.

Table 3 The results of CoMSIA models based on different field combinations (final CoMSIA model in bold)

Topomer CoMFA statistical results

The Topomer CoMFA technique provides an alignment-independent 3D-QSAR approach, which is advantageous not only because it requires no manual alignment but also because it allows automated searching of fragment libraries for activity. In this study, the Topomer CoMFA model was generated using the split-in-two strategy (Fig. 3) and gave better statistics (\(q^{2}\) = 0.746, \(r^{2}\) = 0.979, F = 146.294, SEE = 0.175, N = 7) than the CoMFA and CoMSIA models.

Validation of 3D-QSAR models

Generally, cross-validation is used to assess the internal predictive power of models. In this study, the indices \(q^{2}\) and \(q_{\text{L5O}}^{2}\) (>0.5) account for the internal robustness of these models. However, models with high cross-validated coefficients are not always predictive, so an external validation set is necessary. In this work, the external test set of 8 compounds was used to evaluate the predictive ability of the constructed CoMFA, CoMSIA, and Topomer CoMFA models. As shown in Table 2, the \(r_{\text{pred}}^{2}\) values of the test set for the CoMFA, CoMSIA, and Topomer CoMFA models were 0.892, 0.473, and 0.722, respectively. The CoMFA and Topomer CoMFA models are therefore more predictive than the CoMSIA model. The observed activities, the activities predicted by the constructed models, and the residual values for the training set and the test set are listed in Table 4. In all three 3D-QSAR models, the predicted values fell close to the observed values, deviating by not more than 1.0 logarithmic unit except for one test set compound (31) in the case of Topomer CoMFA. Compound 31 is an outlier (residual greater than 1.0) for Topomer CoMFA, and its residual in CoMSIA is also quite high (0.859). Outliers can arise for several reasons, including incorrect biological measurement, incorrect parameter values, and metabolic inactivation/activation (Wang et al., 2014). In the present study, since all assays were carried out under the same conditions by Funke et al., this outlier might be due to experimental error. Figure 4 shows the plots of experimental versus predicted activities for both the training set and the test set of the three constructed models. In summary, CoMFA and Topomer CoMFA lead to more robust and predictive models than CoMSIA.

Table 4 Experimental activities, predicted activities, and residual values of the 38 compounds for the CoMFA, CoMSIA, and Topomer CoMFA models

Fig. 4 Plots of the experimental versus predicted pEC50 for CoMFA (a), CoMSIA (b), and Topomer CoMFA (c) analysis

CoMFA contour maps

In the CoMFA study, the steric and electrostatic fields contributed almost equally, accounting for 52.8 and 47.2 %, respectively, which suggests that both fields are critical in explaining the variation in GPR35 agonist activity among these compounds.

Figure 5a shows the steric contour map for the CoMFA model with the most active compound 34 as a reference. The green contours indicate areas where sterically bulky substituents increase the potency, whereas the yellow contours indicate areas where sterically bulky groups are detrimental to the activity. There is one green contour around the 6-position, which explains why most of the compounds with bromine at this position have better activity than those bearing only a hydrogen atom. Similarly, increasing the size of the substituent at the 6-position from fluorine (compound 18, pEC50 = 6.951) to chlorine (compound 22, pEC50 = 7.775) to bromine (compound 29, pEC50 = 7.917) gradually enhances the activity. Another small green region is located near the 14-position of compound 34. This is a possible reason why compounds 1, 2, and 3, which lack a bulky group at this position, have poor activities (pEC50 values of 4.671, 4.873, and 4.695, respectively). The R2 groups of compounds 1, 2, and 3 are small alkyl groups, and introduction of the bulkier cyclohexyl substituent improves potency (compound 4, pEC50 = 5.377). In contrast, all the other compounds have an aromatic group at this position, and the majority of these aromatic-substituted compounds displayed higher activities, except for compound 12 (pEC50 < 5.0), which means that the presence of an aromatic group at the 10-position is important for activity. Moreover, two sterically unfavored yellow contours were found surrounding the outer sphere of the 14-position, where overly large substituents such as the cyclopropylmethoxy and propoxy groups (compounds 35 and 36) decreased the activity. These observations are in agreement with previous SAR studies, in which the rank order of potency among compounds with varying substituents at the 14-position was as follows: methoxy (13) > bromo (14) > methyl (11) > hydrogen (5) > trifluoromethyl (12) > propoxy (36) ≈ cyclopropylmethoxy (35) (Funke et al., 2013).

Fig. 5 CoMFA stdev*coeff contour plots for steric (a) and electrostatic (b) fields. Compound 34 is displayed as reference. Sterically favored/disfavored areas are shown in green/yellow, while the blue/red polyhedra depict the favorable sites for positively/negatively charged groups. Favored and disfavored levels of these displayed interaction fields were fixed at 80 and 20 %, respectively (Color figure online)

The CoMFA electrostatic contour map is shown in Fig. 5b with compound 34 as a reference. A favorable negative (red) contour region is found around the 6-position, indicating that electronegative groups could have a positive influence on the activity. This explains why this position is occupied by an electron-rich halogen substituent in most of the highly active compounds. In addition, two large blue regions near the 13- and 14-positions suggest that electronegative substituents would decrease the activity. This is in agreement with the fact that the activities of compounds 10, 12, 21, and 28, which bear electronegative substituents such as –NO2, –CF3, and –CN, are relatively lower. Conversely, in compounds 34 and 29, which bear a methoxy group at the para position of the benzoyl ring, the electron density is decreased through p-π conjugation, and these compounds display potent activity.

CoMSIA contour maps

The three field contributions of the CoMSIA model, namely steric, electrostatic, and hydrophobic, derived from the PLS analysis are presented as 3D contour plots in Fig. 6. The steric, electrostatic, and hydrophobic fields explain 6.9, 42.8, and 50.3 % of the variance, respectively, which demonstrates that hydrophobic interactions play an important role in describing the field properties of GPR35 agonists. Figure 6a illustrates the CoMSIA contour maps of the steric and electrostatic fields, which lead to conclusions similar to those drawn from the field distribution of the CoMFA model (Fig. 5).

Fig. 6 CoMSIA stdev*coeff contour plots for steric along with electrostatic fields (a) and hydrophobic fields (b). Compound 34 was embedded into the contour maps as reference. Sterically favored/disfavored areas are shown in green/yellow, while the blue/red polyhedra depict favorable sites for positively/negatively charged groups. Yellow/white contours indicate the regions where hydrophobic/hydrophilic groups would enhance the activity. Favored and disfavored levels of these displayed interaction fields were fixed at 80 and 20 %, respectively (Color figure online)

The hydrophobic contour map of the CoMSIA model with compound 34 as a reference is displayed in Fig. 6b. The white and yellow contours highlight areas where hydrophilic and hydrophobic properties, respectively, are preferred. A yellow polyhedron was found around the 6-position, which explains why the bromine substituent enhances the biological activity through increased hydrophobicity. There is another large white contour surrounding the benzene ring. This is consistent with the observation that compounds 32–34 with polar substituents in this region display potent activity, whereas compounds 35–36 with more hydrophobic groups (cyclopropylmethoxy and propoxy) exhibit low activity. Moreover, compounds 37 and 38, in which the phenyl ring has been replaced by a pyridine ring, have shown better potency than compound 29 owing to their increased hydrophilicity.

Topomer CoMFA contour maps

Topomer CoMFA interaction maps (steric and electrostatic) for both fragments (R1 and R2), with compound 34 as a reference structure, are shown in Fig. 7. In the steric contour maps, green denotes regions where sterically bulky groups are favored for activity, and yellow indicates regions where they are unfavored. In the electrostatic contour maps, red indicates regions where electronegative groups are favored and blue indicates regions where electropositive groups are favored.

Fig. 7 Topomer CoMFA stdev*coeff contour plots for steric and electrostatic fields. Compound 34 is displayed as reference. Steric contour map for the R1 fragment (a); electrostatic contour map for the R1 fragment (b); steric contour map for the R2 fragment (c); electrostatic contour map for the R2 fragment (d). Sterically favored/unfavored areas are shown as green/yellow contours, while the blue/red polyhedra depict the favorable sites for positively/negatively charged groups (Color figure online)

As shown in Fig. 7a, there is a green region near the 6-position of the R1 fragment, which means that a sterically bulky group such as bromine is favorable and increases the activity. For the R2 fragment, the green contour (Fig. 7c) surrounding the phenyl ring demonstrates that the introduction of a moderately bulky group into the benzene ring can improve the agonist effect. Meanwhile, two small yellow contours were found outside the 14-position of the R2 fragment, which explains why compounds 35 and 36, containing larger substituents at this position, have relatively lower activity.

The electrostatic contour map of the R1 fragment (Fig. 7b) reveals that the electron-withdrawing nature of the bromine substituent at the 6-position of the chromone ring is favored for activity. In addition, a blue area was found outside the hydrogen of the carboxyl group, which means that this electropositive hydrogen helps to retain activity. Figure 7d shows two red polyhedra, indicating that a strong electron-withdrawing group at R2 results in high activity (compounds 26 and 27). Additionally, the blue polyhedron in Fig. 7d suggests that an electron-rich substituent will reduce the biological activity, which is similar to the CoMFA electrostatic contour map.

Comparison of the CoMFA, CoMSIA, and Topomer CoMFA models

The CoMFA and CoMSIA models were developed based on the alignment of a common substructure. Both models provided significant correlations of biological activities with structural descriptors. In comparison, the CoMSIA model yielded slightly lower statistical values than the CoMFA model. However, the CoMSIA model gives more information because steric, electrostatic, and hydrophobic fields were all considered. The CoMFA steric map indicates that a bulkier group around the 6- and 14-positions is favorable for higher activity, while the CoMFA electrostatic contour map highlights the importance of negative substituents at the 6-position and positive groups at the 14-position for improved potency. The CoMSIA hydrophobic map shows that hydrophobic groups at the 6-position can enhance the biological activity, whereas a relatively hydrophilic substituent around the benzene ring is required for high activity. Topomer CoMFA, the combination of the universal "Topomer" methodology and CoMFA technology, is a fragment-based 3D-QSAR method (Ding et al., 2013). In Topomer CoMFA, fragments are generated automatically from the molecular 3D pose, and the identification and alignment of the fragment poses are completed empirically. The Topomer CoMFA model displayed the best statistical parameters in terms of \(q^{2}\) and \(r^{2}\) values among the three models. In addition, the contour maps obtained from the Topomer CoMFA model further confirmed the role of steric and electrostatic effects in the agonist activities.

Conclusions

In this study, CoMFA, CoMSIA, and Topomer CoMFA 3D-QSAR models were developed for a series of 8-substituted chromen-4-one-2-carboxylic acid derivatives as GPR35 agonists. All three models gave good statistical results in terms of \(q^{2}\) and \(r^{2}\) values, and the CoMFA and Topomer CoMFA models had better predictive ability than the CoMSIA model. Furthermore, the CoMFA, CoMSIA, and Topomer CoMFA contour maps provided sufficient information to understand the relationship between the structures of these compounds and their biological activities, and may therefore serve as a powerful tool for designing and predicting more efficacious analogs.