1 Introduction

Corticotropin-releasing factor (CRF), a neuropeptide isolated from mammalian brain (Vale et al. 1981), is the prime regulator of the hypothalamus–pituitary–adrenocortical (HPA) axis (Owens and Nemeroff 1991; De Souza and Grigoriadis 1995). It has a broad extrahypothalamic distribution in the central nervous system and produces a wide spectrum of autonomic, electrophysiological, and behavioral effects consistent with a neurotransmitter or neuromodulator role in the brain (Vale et al. 1983; Koob and Bloom 1985). CRF has been implicated as the mediator of the integrated physiological response to stress (Rivier et al. 1982; Antoni et al. 1990), and it acts through high-affinity binding to its receptors, CRF1-R and CRF2-R, both of which are members of the class B G-protein-coupled receptor superfamily (Steckler and Holsboer 1999). This range of central nervous system-mediated effects suggests that the peptide plays an important role within the brain, especially during stress (Brown 1991). Physiological studies have strongly implicated alteration of the CRF system in anxiety and depression (Holsboer 1999), promoting the concept of CRF1 receptor antagonism for treating these conditions. This hypothesis has stimulated the development of high-affinity peptide and nonpeptide antagonists for the CRF1 receptor (Grigoriadis et al. 2001; Gilligan et al. 2000). Both preclinical and clinical evidence suggests that CRF1 plays a role in anxiety-related diseases (Owens and Nemeroff 1991; Britton et al. 1986; Berridge and Dunn 1987; Dunn and Berridge 1990). In particular, intracerebroventricular injection of CRF in rats produces behavioral and physiological changes that mimic the effects of stress (Dunn and Berridge 1990).
The potential of CRF1 receptor antagonists to provide a novel mechanism for the treatment of depression and anxiety has captured the attention of numerous research groups (Dzierba 2008; Tellew and Luo 2008). A number of CRF1 receptor antagonists have been reported to have entered clinical trials for depression and anxiety-related disorders (Kehne and Cain 2010; Kehne and Maynard 2009; Dzierba et al. 2008). Among computer-aided drug design methods, quantitative structure–activity relationship (QSAR) analysis is a well-accepted approach for this purpose. QSAR plays a crucial role in the construction of novel and potent lead compounds and saves time and cost by predicting the activity of new compounds before they are synthesized (Verma et al. 2010). A QSAR model also indicates the contribution of each substituent site; this knowledge can be used to create a combinatorial library by substituting different entities at that site, supporting the rational design of new compounds for therapeutic purposes.

The aim of the present study was to rationalize the activity of a set of corticotropin-releasing factor-1 (CRF1) receptor antagonists through the application of a 2D-QSAR method. The resulting 2D model is intended to guide further structural modification and to predict the potency and physicochemical properties of clinical drug candidates.

2 Methodology

QSAR studies were performed using the Molecular Design Suite (VLife MDS software package, version 3.5, 2010).

2.1 Data Set

The structures of 57 N3-phenylpyrazinone derivatives reported as novel corticotropin-releasing factor-1 (CRF1) receptor antagonists, together with their biological activities, were collected from the literature (Hartz et al. 2009). The biological activity values [IC50 (nM)], reported in nanomolar units, were converted to molar units and then to the negative logarithmic scale (pIC50 = −logIC50). In all the models subsequently developed, the pIC50 values were used as the dependent variable. These values are presented in Table 1.
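The unit conversion described above can be sketched as follows. This is a minimal illustration; the function name `pic50_from_nm` and the IC50 values in the loop are hypothetical and are not taken from Table 1.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


# Hypothetical IC50 values (nM), for illustration only
for ic50 in (1.0, 10.0, 250.0):
    print(f"IC50 = {ic50:g} nM -> pIC50 = {pic50_from_nm(ic50):.2f}")
```

On this scale a tenfold drop in IC50 raises pIC50 by one unit, so more potent compounds have larger values of the dependent variable.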

Table 1 Structures and activity of phenylpyrazinones as corticotropin-releasing factor-1 (CRF1) receptor antagonists

2.2 Optimization of structures

The software enables the evaluation of several molecular descriptors and provides a facility to build a regression equation relating the best set of descriptors to the activity, which can later be used to predict the activity of new molecules. The molecular structures of all 57 molecules were built using the 2D draw application of the VLife Engine module of VLife MDS 3.5, and the structures were then converted to 3D for further analysis. The ligand geometries were optimized by energy minimization using the MMFF94 force field (Halgren 1996) and Gasteiger–Marsili charges for the atoms, to a gradient of 0.001 kcal/mol Å with an iteration limit of 10,000. While preparing the data set, only compounds whose pharmacological screening was performed with the same experimental protocol and conditions were considered.

2.3 Selection of training set and test set

The sphere exclusion method (Golbraikh and Tropsha 2002) was adopted to divide the data set into a training set of 45 molecules and a test set of 12 molecules (marked with an asterisk in Table 1), using a dissimilarity value of 11.6; the dissimilarity value gives the sphere exclusion radius. The training set was used to generate the 2D-QSAR models, and the test set was used to validate the quality of the models.
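The idea behind the split can be illustrated with a simplified sketch. It assumes Euclidean distances in descriptor space, takes the first remaining compound as the next sphere centre, and assigns centres to the training set and excluded neighbours to the test set. These choices, the function name `sphere_exclusion_split`, and the toy coordinates are assumptions for illustration; the procedure implemented in VLife MDS may differ in its details.

```python
import numpy as np


def sphere_exclusion_split(X, radius, start=0):
    """Return (train, test) index lists: sphere centres vs. excluded neighbours."""
    X = np.asarray(X, dtype=float)
    remaining = list(range(len(X)))
    train, test = [], []
    centre = start
    while remaining:
        # The current centre joins the training set
        train.append(centre)
        remaining.remove(centre)
        if not remaining:
            break
        # Compounds inside the sphere around the centre go to the test set
        dist = np.linalg.norm(X[remaining] - X[centre], axis=1)
        inside = [i for i, d in zip(remaining, dist) if d <= radius]
        test.extend(inside)
        remaining = [i for i in remaining if i not in inside]
        if remaining:
            centre = remaining[0]  # simplest possible choice of next centre
    return train, test


# Toy 2D "descriptor" coordinates (hypothetical)
points = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [3.0, 0.4], [6.0, 0.0]]
train_idx, test_idx = sphere_exclusion_split(points, radius=1.0)
print("training:", train_idx)
print("test:", test_idx)
```

In this variant a larger radius excludes more neighbours per sphere, so the training set shrinks and the test set grows as the dissimilarity value increases.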

2.4 Calculation of descriptors for the 2D-QSAR model

A quantitative structure–activity relationship (QSAR) relates numerical properties of the molecular structure to its biological activity by a mathematical model. The starting point for a 2D QSAR analysis is a set of conformations, one for each molecule in the set.

The descriptors include 266 physicochemical parameters, 500 alignment-type parameters, and 60 atom type count descriptors, all calculated for the energy-optimized molecules using the same software. The following types of physicochemical descriptors were calculated: individual, chi, path count, chi chain, SsCH3count, SdCH2count, SssCH2count, StCHcount, information theory based, path cluster, kappa, element count, estate number, estate contribution, semi-empirical and polar surface area. To calculate the alignment-independent (AI) descriptors of Baumann (2002), we used the following attributes: 2 (double-bonded atom), 3 (triple-bonded atom), C, N, O, S, H, F, Cl, Br and I, with a distance range of 0–7.

Descriptors with no variation were removed, and the remaining descriptors were autoscaled before being passed to the regression method. In this study, more than 251 calculated 2D descriptors were subjected to partial least squares (PLS) analysis to establish a correlation between the physicochemical parameters and biological activity. The cross-validation run returns the optimum number of components, i.e., the number giving the maximum cross-validated r² (q²) and the minimum standard error of prediction (q²_se). To further assess the statistical validity and robustness of the derived equations, a randomization test was performed to obtain a Z score. A QSAR model is considered predictive if the following conditions are satisfied: r² > 0.6, q² > 0.6 and pred_r² > 0.5 (Golbraikh and Tropsha 2002).
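The descriptor preprocessing and the acceptance criteria above can be sketched as follows. Autoscaling is taken here to mean centring each descriptor to zero mean and scaling it to unit sample standard deviation, which is an assumption about the software's convention; the function names and the toy matrix are hypothetical.

```python
import numpy as np


def drop_constant_and_autoscale(X):
    """Remove zero-variance columns, then centre and scale the rest to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)   # sample standard deviation per descriptor
    keep = std > 0                # descriptors that actually vary
    kept = X[:, keep]
    scaled = (kept - kept.mean(axis=0)) / std[keep]
    return scaled, keep


def is_predictive(r2, q2, pred_r2):
    """Thresholds quoted in the text (Golbraikh and Tropsha 2002)."""
    return r2 > 0.6 and q2 > 0.6 and pred_r2 > 0.5


# Toy descriptor matrix: the second column is constant and is dropped
X_toy = [[1.0, 5.0, 0.2],
         [2.0, 5.0, 0.4],
         [3.0, 5.0, 0.9]]
X_scaled, kept_mask = drop_constant_and_autoscale(X_toy)
print("kept columns:", kept_mask)
print(X_scaled)

# Statistics reported for model-1 in Section 3
print("model-1 passes:", is_predictive(0.8141, 0.7391, 0.7827))
```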

2.5 Cross-validation

Internal validation was carried out using the leave-one-out (LOO) method, which yields the cross-validated q² (Cramer et al. 1988). To calculate q², each molecule in the training set was removed in turn, the model was refit using the same descriptors, and the biological activity of the removed molecule was predicted using the refit model. The cross-validated correlation coefficient (q²) of the generated model was calculated as follows:

$$q^{2} = 1 - \frac{{\sum {\left( {y_{i } - \hat{y}_{i} } \right)^{2} } }}{{\sum {\left( {y_{i} - y_{\text{mean}} } \right)^{2} } }},$$

where \(y_{i}\) and \(\hat{y}_{i}\) are the actual and predicted activity of the ith molecule in the training set, respectively, and \(y_{\text{mean}}\) is the average activity of all molecules in the training set. To test the utility of the model as a predictive tool, an external set of compounds with known activities (the test set) was used. For external validation, the activity of each molecule in the test set was predicted using the model generated from the training set. The pred_r² value is calculated as follows:

$${\text{Pred\_r}}^{2} = 1 - \frac{{\sum {\left( {y_{i } - \hat{y}_{i} } \right)^{2} } }}{{\sum {\left( {y_{i} - y_{\text{mean}} } \right)^{2} } }},$$

where \(y_{i}\) and \(\hat{y}_{i}\) are the actual and predicted activity of the ith molecule in the test set, respectively, and \(y_{\text{mean}}\) is the average activity of all molecules in the training set.
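Both statistics can be computed with a short sketch. For brevity, the refit step uses ordinary least squares from NumPy as a stand-in for PLS, so this is not the regression method used in the study; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept (stand-in for the PLS fit)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def q2_loo(X, y):
    """q2 = 1 - PRESS / SS_total, leaving out one training molecule at a time."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])      # refit without molecule i
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def pred_r2(X_train, y_train, X_test, y_test):
    """External pred_r2; the denominator uses the training-set mean, as in the text."""
    coef = fit_ols(X_train, y_train)
    press = np.sum((y_test - predict(coef, X_test)) ** 2)
    return 1.0 - press / np.sum((y_test - y_train.mean()) ** 2)


# Synthetic data with the same 45/12 split sizes as the study
rng = np.random.default_rng(0)
X = rng.normal(size=(57, 3))
y = 7.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=57)
X_tr, y_tr, X_te, y_te = X[:45], y[:45], X[45:], y[45:]
print(f"q2 (LOO) = {q2_loo(X_tr, y_tr):.3f}")
print(f"pred_r2  = {pred_r2(X_tr, y_tr, X_te, y_te):.3f}")
```

Because each left-out molecule is predicted by a model that never saw it, q² cannot exceed the fitted r² for a least-squares model and is usually lower.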

3 Results and discussion

The QSAR study of 57 substituted N3-phenylpyrazinones as CRF1 receptor antagonists was carried out by the PLS method using VLife MDS 3.5 software. The developed QSAR models were evaluated using the following statistical measures: n (the number of compounds in the regression), r² (the squared correlation coefficient), F test (Fisher's value) for statistical significance, q² (the cross-validated correlation coefficient), pred_r² (r² for the external test set), and the Z score (calculated by the randomization test). 2D-QSAR model-1 shows a coefficient of determination (r²) of 0.81 and a cross-validated correlation coefficient (q²) of 0.74. 2D-QSAR models 2 and 3, generated from the same training set obtained by the sphere exclusion method, show lower r², q², and pred_r² values than model-1.

pIC50 = 0.1413 (±0.0043) T_2_F_1 + 0.2958 (±0.3211) SdsNcount + 0.0452 (±0.0143) T_C_Cl_1 −0.2174 (±0.0478) SaasCE-index + 0.3812 (±0.0052) SsOHcount

N training = 45, N test = 12, Degree of freedom = 27, r² = 0.8141, q² = 0.7391, F test = 27.1428, r²_se = 0.3381, q²_se = 0.2994, pred_r² = 0.7827, pred_r²_se = 0.5349, Best Rand R² = 0.3114, Best Rand Q² = 0.1604, Z Score R² = 6.0633, Z Score Q² = 3.1232
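The randomization statistics listed above can be illustrated with a sketch of a y-randomization test. It assumes that the Z score is the difference between the statistic of the real model and the mean of the same statistic over models fitted to shuffled activities, divided by the standard deviation of the shuffled values; this definition is an assumption and is not stated in the text. Ordinary least squares again stands in for PLS, and the function names and data are hypothetical.

```python
import numpy as np


def r2_ols(X, y):
    """Squared correlation coefficient of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def randomization_z(X, y, n_trials=200, seed=0):
    """Return (r2 of real model, best randomized r2, Z score)."""
    rng = np.random.default_rng(seed)
    r2_real = r2_ols(X, y)
    # Refit after shuffling the activities, which breaks any real relationship
    r2_rand = np.array([r2_ols(X, rng.permutation(y)) for _ in range(n_trials)])
    z = (r2_real - r2_rand.mean()) / r2_rand.std(ddof=1)
    return r2_real, r2_rand.max(), z


# Synthetic training set of 45 molecules and 3 descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(45, 3))
y = X @ np.array([1.0, -0.5, 0.8]) + rng.normal(scale=0.3, size=45)
r2_real, best_rand, z = randomization_z(X, y)
print(f"r2 = {r2_real:.3f}, best randomized r2 = {best_rand:.3f}, Z = {z:.1f}")
```

A large Z score indicates that the fit of the real model is unlikely to arise from a chance correlation between the descriptors and the activities.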

2D-QSAR model-1 explains 81.41 % of the variance in the observed activity values. It shows an internal predictive power (q² = 0.7391) of about 74 % and a predictivity for the external test set (pred_r² = 0.7827) of about 78 %. The PLS model also shows the relative importance of each descriptor in the equation. The negative contribution (~18 %) of SaasCE-index indicates that a decrease in the value of this descriptor would be beneficial for the activity of phenylpyrazinone derivatives. The AI descriptors T_2_F_1 (~27 %) and T_C_Cl_1 (~32 %) contribute positively; they signify the count of double-bonded atoms separated from any fluorine atom by one bond and the count of carbon atoms separated from any chlorine atom by one bond, respectively. Thus, the presence of fluoro or chloro substituents would be expected to increase activity. The SsOHcount descriptor represents the total number of hydroxy groups connected by one single bond; its positive coefficient suggests that a hydroxy group attached to the phenylpyrazinone ring favors activity. This finding is also supported by the study of the R1 site, suggesting that these molecules are suitable for further optimization with respect to their biological activities. The correlation matrix (Table 2) shows good correlation of the selected parameters with biological activity. Figure 1 shows the contribution chart (% contributions of the different descriptors) for model-1 developed by the PLS method. Figure 2 gives the fitness plot for the training and test sets, in the form of actual versus predicted activity values obtained by the PLS method for model-1. Model-1 was validated by predicting the biological activities of the training and test molecules, as indicated in Table 3.
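The interpretation of the coefficients can be made concrete with a small worked sketch that computes the change in predicted pIC50 when descriptor values change. It uses the model-1 coefficients as printed and assumes they apply to the descriptor values as given (no rescaling); because only differences are computed, no constant term is needed. The function name `delta_pic50` and the descriptor changes are hypothetical.

```python
# Coefficients of 2D-QSAR model-1 as printed in the equation above
MODEL_1_COEFFS = {
    "T_2_F_1": 0.1413,
    "SdsNcount": 0.2958,
    "T_C_Cl_1": 0.0452,
    "SaasCE-index": -0.2174,
    "SsOHcount": 0.3812,
}


def delta_pic50(changes, coeffs=MODEL_1_COEFFS):
    """Predicted change in pIC50 for the given changes in descriptor values."""
    unknown = set(changes) - set(coeffs)
    if unknown:
        raise KeyError(f"descriptors not in model-1: {sorted(unknown)}")
    return sum(coeffs[name] * delta for name, delta in changes.items())


# Hypothetical modification: T_2_F_1 increases by 2 and one hydroxy group is added
change = {"T_2_F_1": 2, "SsOHcount": 1}
d = delta_pic50(change)
print(f"predicted change in pIC50: {d:+.4f}")
print(f"predicted IC50 ratio (new/old): {10 ** -d:.2f}")
```

A positive change in pIC50 corresponds to an IC50 ratio below one, that is, a more potent compound according to the model.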

Table 2 Correlation matrix between descriptors present in the best QSAR model-1

Fig. 1 Plot of contribution chart of 2D-QSAR model-1

Fig. 2 Graph of observed vs. predicted activity of 2D-QSAR model-1

Table 3 Comparative observed and predicted activities of phenylpyrazinones as corticotropin-releasing factor-1 (CRF1) receptor antagonists

pIC50 = −1.4928 (±0.2301) T_2_O_7 + 0.8290 (±0.2880) SaasCcount + 0.1208 (±0.0534) rotatable bond count

N training = 45, N test = 12, Degree of freedom = 27, r² = 0.7824, q² = 0.6682, F test = 24.6705, r²_se = 0.6340, q²_se = 0.7011, pred_r² = 0.7163, pred_r²_se = 0.4188, Best Rand R² = 0.3768, Best Rand Q² = 0.1954, Z Score R² = 6.4216, Z Score Q² = 5.2298

2D-QSAR model-2, obtained with the PLS method, shows a good squared correlation coefficient (r²) of 0.7824 and thus explains ~78 % of the variance in biological activity. In model-2, the positive coefficients of SaasCcount and rotatable bond count show that an increase in the values of these descriptors is beneficial for activity. The rotatable bond count descriptor indicates that the presence of saturated single bonds at the R1 substitution site increases the activity of the compound. The descriptor T_2_O_7, which carries a negative coefficient, plays an important role (~20 %) in determining activity. The descriptor SaasCcount signifies the total number of carbon atoms connected with one single bond along with two aromatic bonds. Its positive contribution indicates that the CRF1 activity of phenylpyrazinones increases with the number of such carbon atoms. The activity contribution chart for 2D-QSAR model-2 is shown in Fig. 3, and the plot of observed vs. predicted pIC50 values is shown in Fig. 4.

Fig. 3 Plot of contribution chart of 2D-QSAR model-2

Fig. 4 Graph of observed vs. predicted activity of 2D-QSAR model-2

pIC50 = −0.7381 (±0.2541) 1PathCount + 0.5127 (±0.2202) SaasCE-index + 0.2117 (±0.0742) SsCH3count

N training = 45, N test = 12, Degree of freedom = 27, r² = 0.7614, q² = 0.6758, F test = 12.4135, r²_se = 0.1902, q²_se = 0.2932, pred_r² = 0.6193, pred_r²_se = 0.5381, Best Rand R² = 0.2731, Best Rand Q² = 0.4517, Z Score R² = 6.5689, Z Score Q² = 5.3034

In a 2D-QSAR model, r² > 0.7 suggests that a significant percentage of the total variance in biological activity is accounted for by the model. 2D-QSAR model-3 shows a good squared correlation coefficient (r²) of 0.7614 and thus explains ~76 % of the variance in biological activity. The descriptor 1PathCount is a path count parameter that signifies the total number of fragments of single order (single bond paths) in the compound. It is negatively correlated with CRF1 activity, so an increase in the number of single-order paths would be expected to lower activity. The positive coefficient of SaasCE-index indicates that the electrotopological properties of the carbon atoms connected with aromatic rings and single bonds positively influence the CRF1 activity shown by substituted phenylpyrazinone derivatives. SsCH3count (i.e., the descriptor that signifies the total number of methyl groups connected with a single bond) contributes positively (~32 %) to the model. The activity contribution chart for 2D-QSAR model-3 is shown in Fig. 5, and the plot of observed vs. predicted pIC50 values is shown in Fig. 6.

Fig. 5 Plot of contribution chart of 2D-QSAR model-3

Fig. 6 Graph of observed vs. predicted activity of 2D-QSAR model-3

4 Conclusion

In the present study, an attempt has been made to identify the structural and substituent requirements for the CRF1 receptor antagonist activity of N3-phenylpyrazinones. From the present QSAR analysis, three models were generated, any of which can be used to predict the activity of newly designed compounds in the search for more potent molecules. According to the signs of the coefficients in PLS model-1, SaasCE-index is inversely related to CRF1 activity, whereas T_2_F_1, T_C_Cl_1, SdsNcount and SsOHcount are directly related to it. This information was used to search the structural database to find the optimum substitution required at the R1 position. The molecules were designed using the structural restrictions obtained from the QSAR study when selecting the functional groups. The design of novel phenylpyrazinone molecules was thus based on the chemical information obtained from the descriptors of the QSAR equations. The current study provides better insight into the design of more potent corticotropin-releasing factor-1 (CRF1) receptor antagonists prior to their synthesis.