Introduction

Nuclear receptors are a class of proteins that function as ligand-activated transcription factors in human being [1,2,3]. They regulate the expression of specific genes that control processes such as reproduction, development, homeostasis, and metabolism upon binding to small endogenous molecules like steroid hormones, thyroid hormones, retinoid acids, fatty acids, and phospholipids [4,5,6]. Because of their structural similarities to endogenous ligands, some chemical compounds found in foods, cosmetics, environment, pharmaceuticals, and industrial products mimic the behaviors of endogenous ligands and interact inappropriately with nuclear receptors, thereby causing endocrine disruption [7,8,9,10,11,12,13,14,15,16,17]. Human exposure to endocrine disrupting chemicals has been shown in numerous studies to be associated with reduced semen quality [18,19,20], prostate cancer [21,22,23,24], urogenital tract abnormalities [25, 26], precocious puberty [27,28,29], irregular menstrual cycle [30], early menopause [31, 32], breast cancer [33,34,35], alteration of immune responses [36,37,38], diabetes [39,40,41,42], obesity [43,44,45,46], cardiovascular diseases [47], and a host of other adverse health effects.

In recent times, research efforts are being directed towards understanding the mechanisms involved in endocrine disruption by persistent organic pollutants. One of the most widely reported group of persistent organic pollutants with high endocrine disrupting potencies is polychlorinated biphenyls (PCBs) and their metabolites. Hydroxylated polychlorinated biphenyls (OH-PCBs) are a group of exogenous chemicals that are produced by oxidation of PCBs through a variety of mechanisms, including metabolic transformation in living organisms and abiotic reactions with hydroxyl radicals [48]. Recently, the possible roles of nuclear receptors in mediating the endocrine disrupting effects of OH-PCBs are being vigorously studied by many research groups. Using a variety of in vitro bioassay methods, researchers have shown that some OH-PCB congeners act as agonists of estrogen receptor α (ERα) [49], estrogen receptor β (ERβ) [49], constitutive androstane receptor (CAR) [50], retinoid X receptor β (RXRβ) [51], retinoic acid receptor γ (RARγ) [51], and estrogen-related receptor γ (ERRγ) [52]. Some OH-PCB congeners have also been demonstrated in some in vitro studies to act as ERα antagonists [49], ERβ antagonists [49], androgen receptor (AR) antagonists [49], glucocorticoid receptor (GR) antagonists [49], and thyroid hormone receptor (TR) antagonists [53].

The major downside of the in vitro studies reviewed in the preceding paragraph is the high numbers of inactive OH-PCB congeners encountered in most of the reported experiments. For instance, only 22% and 38% of the OH-PCB congeners selected for investigation exhibited agonistic activities against RXRβ and RARγ, respectively [51]. Similarly, only 9%, 6%, and 30% of the OH-PCB congeners selected for investigation exhibited antagonistic activities against ERα, ERβ, and GR, respectively [49]. The low numbers of active OH-PCB congeners recorded in most of the experimental studies reviewed was due to the approach adopted by researchers in selecting OH-PCB congeners used in the experiments. In most experimental studies, the selections of chemicals used for toxicity testing were usually based on trial-and-error approach rather than on rational basis. This has resulted in waste of time and other valuable resources. In order to prioritize the choice of OH-PCB congeners for toxicity testing, it is pertinent to develop rapid and inexpensive quantitative structure–activity relationship (QSAR) models that can reliably predict the biological activities of OH-PCBs prior to experimental studies. QSAR modeling is a computational approach that establishes a correlation between the biological activities and measured or computed molecular features of a series of chemical compounds [54]. It is based on the assumption that changes in molecular structures of chemical compounds reflect corresponding changes in the observed biological activities [54]. Exposure of humans to PCBs and OH-PCBs has been linked to onset of obesity, diabetes, and fatty liver disease, and these metabolic disorders are thought to be mediated through activation of constitutive androstane receptor by this group of persistent organic pollutants [55]. Although QSAR models have been developed for toxicity prediction of some persistent organic pollutants, no QSAR model is currently available in the literature for the prediction of agonistic activities of OH-PCB congeners to constitutive androstane receptor. The objective of this study was to develop a QSAR model that can reliably predict the agonistic activities of untested OH-PCBs to constitutive androstane receptor using the limited experimental data available in literature.

Methods

The procedures adopted for constructing and validating the QSAR model described in this paper are summarized in the workflow displayed in Fig. 1 [56]. Details of the steps involved in these procedures are described below.

Fig. 1
figure 1

Workflow showing the basic procedures for constructing and validating a QSAR model (reproduced from Wang and Hou [56])

Dataset acquisition and molecular descriptor computation

The dataset used for the construction and validation of the QSAR model developed in this paper was obtained from the literature [50]. This dataset consists of structural formulae of 64 OH-PCB congeners and the experimental values of their agonistic activities to constitutive androstane receptor (Table S1 in the Supplementary Information). The agonistic activity of each OH-PCB congener in the dataset (reported as tenfold effective concentration) was measured in a reporter gene assay using yeast cells transduced with human constitutive androstane receptor [50]. This agonistic activity (EC × 10) was defined as the concentration of OH-PCB in solution that produced luminescence intensity that was 10 times greater than the luminescence intensity of a blank solution [50]. Before being used as response variable in the model building step, the original activity values were converted to logarithmic scale (\(log1/(EC\times 10)\)). Using semi-empirical PM6 optimized molecular structures obtained with Spartan '14 program (version 1.1.4) as input [57], a total of 1875 one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) molecular descriptors, representing the numerical information encoded within the molecular structure of each chemical compound in the dataset, were calculated using PaDEL-Descriptor software [58].

Variable elimination, molecular descriptor standardization, and dataset division

Existence of irrelevant and redundant molecular descriptors in multiple regression modeling is problematic and must therefore be eliminated [59]. Molecular descriptors with constant or nearly constant values (variables with low variance) were removed from the descriptors set because these variables are considered irrelevant for model building. In this paper, a descriptor is considered to have constant or nearly constant values if its variance is less than 0.0001. Multicollinearity among molecular descriptors introduces redundancy in a QSAR model since highly correlated descriptors contribute essentially the same information in the model [60]. In this paper, two molecular descriptors are considered redundant if the correlation coefficient between them exceeds 0.90. Multicolinear and low-variance descriptors were removed from the pool of 1875 descriptors calculated in the preceding section using V-WSP algorithm [61] as implemented in V-WSP tool (version 1.2) developed by Ambure et al. [62]. Pairwise correlations in a correlation matrix and variance inflation factor (VIF) were calculated and used to examine the presence or absence of multicollinearity among variables utilized in the final QSAR model [59]. After removing irrelevant and redundant descriptors from the descriptors set, the remaining descriptors were transformed into auto-scaled descriptors using standard normalization method [63]. Standardization of the original descriptors to auto-scaled descriptors (\({X}_{ik}^{n}\)) was accomplished by subtracting the mean of the descriptors (\({\mu }_{k}\)) from the original descriptor values (\({X}_{ik}\)) and then divided by their standard deviations (\({\sigma }_{K}\)). The main reason for using standardized regression coefficients in a QSAR model is that the magnitude of the standardized regression coefficient reflects the relative contribution of each descriptor to the predicted activity in the developed model [64]. Transformation of the original descriptor to auto-scaled descriptors was done using Minitab® 18.1 [65]. Finally, the entire dataset was divided into training set (70% of the entire dataset) and test set (30% of the entire dataset) using Kennard-Stone algorithm [66,67,68] as implemented in Dataset Division GUI 1.2 [62]. This corresponds to 44 OH-PCBs in the training set and 20 OH-PCBs in the test set. In Table S1 (Supplementary Information), compounds assigned to the test set are marked with single asterisk to distinguish it from compounds assigned to the training set.

Variable selection and model building

Selection of predictor variables for inclusion in a regression model is one of the most crucial aspects of a QSAR study. In this paper, genetic function approximation (GFA) [54], as implemented in Materials Studio 7.0 [69], was used to select the most appropriate combination of molecular descriptors for inclusion in the QSAR model developed in this study. This was then followed by a comprehensive regression analysis using multiple linear regression (MLR) method [70] as implemented in Minitab® 18.1 [65] and Materials Studio 7.0 [69]. In this model building step, efforts were made to establish a linear relationship between the dependent variable (\(log1/(EC\times 10)\)) and the selected independent variables (standardized molecular descriptors) of the series of chemical compounds assigned to the training set in Table S1 (Supplementary Information).

Internal and external validation of QSAR model

The goodness-of-fit of the QSAR model developed from the application of the GFA and MLR procedures described in the preceding section was evaluated using the following statistical parameters: \({R}^{2}\), \({R}_{adj}^{2}\), and s in Eqs. (1), (2), and (3), respectively. The stability and robustness of the developed QSAR model were evaluated using leave-one-out cross-validation and Y-randomization techniques [71]. In the leave-one-out cross-validation technique, one chemical compound from the training set was omitted and a model was developed using the data of the remaining chemical compounds. The model developed was then used to predict the agonistic activity of the omitted compound. This procedure was repeated iteratively and values of \({Q}_{LOO}^{2}\) and \({RMSE}_{CV}\) in Eqs. (4) and (5) respectively were calculated. In the Y-randomization procedure, the dependent variable of compounds in the training set was randomly shuffled 50 times while keeping the independent variables as they are. New MLR model was then developed after each random shuffling. Statistical parameters for the generated random models (R, \({R}^{2}\), and \({Q}^{2}\) values for random models) and the Y-randomization parameter, \({cR}_{p}^{2}\) in Eq. (6), were calculated. All the statistical parameters used for evaluating the fitness and robustness of the developed QSAR model were calculated using Minitab® 18.1 [65], Materials Studio 7.0 [69], and Y-randomization tool 1.2 [62]. In addition to the statistical parameters listed in Table 1 for evaluating the goodness-of-fit and robustness of the developed model, the QSAR model constructed in this paper was also externally validated for its ability to predict the agonistic activities of untested OH-PCB congeners. This task was accomplished by using the compounds assigned to the test set in Table S1 (Supplementary information). The parameters used for external validation of the developed QSAR model (\({R}_{EXT}^{2}\), \({R}_{0}^{2}\), \({R}_{0}^{\prime2}\), k, \({Q}_{F1}^{2}\), \({Q}_{F2}^{2}\), \(\overline{{r }_{m}^{2}}\), \({\Delta r}_{m}^{2}\), \({RMSE}_{EXT}\), and \({MAE}_{EXT}\)) are displayed in Table 2. These external validation metrics were calculated using MLRPlusValidation 1.3 and XternalValidationPlus 1.2 tools developed by Ambure et al. [62].

Table 1 Statistical parameters for fitting, cross-validation, and Y-randomization
Table 2 Statistical parameters for external validation

Evaluation of model applicability domain

Applicability domain expresses the scope and limitation of a QSAR model by defining the range of chemical structures for which the QSAR model is considered applicable [72]. In this paper, the leverage approach was used to evaluate the applicability domain of the QSAR model developed in this study. A measure of how far a chemical compound is from the applicability domain of a QSAR model is its leverage in the original variable space, \({h}_{ii}\) [64]. This measure is defined as: \({h}_{ii}={x}_{i}^{T}{({X}^{T}X)}^{-1}{x}_{i}\), where \({x}_{i}\) is the descriptor row-vector of the query compound, and X is the n × p matrix of p model descriptor values for n training set compounds. The superscript T refers to the transpose of the matrix vector. The warning leverage h* is generally fixed at \(3k/n\), where n is the number of training compounds and k is the number of model descriptors plus one (p + 1) [64]. To visualize the applicability domain of the developed model, a plot of standardized residuals versus leverages (Williams plot) was constructed.

Results and discussion

The QSAR model obtained on correlating the agonistic activities (\(log\;1/(EC\times 10)\) values) of 44 OH-PCB congeners assigned to the training set in Table S1 (Supplementary information) with the standardized molecular descriptors computed for these 44 OH-PCB congeners (Table S2 in the Supplementary information) is displayed in Eq. (17). As shown in Eq. (17), a linear relationship was established between \(log\;1/(EC\times 10)\) and eight structural features of OH-PCB congeners. Table 3 shows the descriptions, classes, and types of the molecular descriptors utilized in the developed QSAR model. As shown in Table 3, the QSAR model displayed in Eq. (17) contained six 2D descriptors (MATS1s, VE3_DzZ, VE1_Dzp, SpMin8_Bhv, SpMax5_Bhi, and topoRadius), belonging to four classes of descriptors—autocorrelation descriptor, Barysz matrix descriptor, Burden modified eigenvalues descriptor, and topological descriptor [73]. Table 3 also shows that the two 3D descriptors (RDF95u and RDF45m) in the QSAR model displayed in Eq. (17) belong to the same class of descriptor—radial distribution function (RDF) descriptor [73].

Table 3 Descriptions, classes, and types of molecular descriptors utilized in the developed QSAR model
$$\begin{aligned}\mathrm{log}\;1 /{(EC \times 10)}=&\;{6.1425} + {0.1698} \;{\mathrm{MATS1s}} \\&- {0.1062}\; {VE3\_DzZ} + {0.9477} \;{\mathrm{VE1\_Dzp}} \\&-{0.2241} \;{\mathrm{SpMin8\_Bhv}}- {0.8122} \;{\mathrm{SpMax5\_Bhi}} \\&- {0.3479}\; {\mathrm{topoRadius}}- {0.8045} \;{\mathrm{RDF95u}}\\&+{0.4608}\; {\mathrm{RDF45m}}\end{aligned}$$
(17)

In Table 4, the standardized regression coefficients and the statistical significance of the predictor variables utilized in the QSAR model displayed in Eq. (17)are presented. By examining the magnitudes and signs of the regression coefficients in the QSAR model, a broad interpretation of the model can be made [74]. While the descriptor that plays the most important role in the predictive ability of a model is identified by the magnitude of its regression coefficient in the model, the sign of this regression coefficient indicates the direction of the relationship between the descriptor and the predicted activity [74]. As shown in Table 4, the relative contributions of the eight descriptors in Eq. (17) to the predictive ability of the developed QSAR model, as can be inferred from the magnitudes of the standardized regression coefficients of the descriptors, decreased in the following order: VE1_Dzp > SpMax5_Bhi > RDF95u > RDF45m > topoRadius > SpMin8_Bhv > MATS1s > VE3_DzZ. Table 4 also shows that while three of the eight descriptors in Eq. (17) (VE1_Dzp, RDF45m, and MATS1s) made positive contribution to agonistic activities of OH-PCBs, five of the eight descriptors (VE3_DzZ, SpMin8_Bhv, topoRadius, RDF95u, and SpMax5_Bhi) made negative contribution to agonistic activities of OH-PCBs. The positive contributions of VE1_Dzp, RDF45m, and MATS1s to agonistic activities of OH-PCBs indicate that OH-PCB congener with higher values of these descriptors would be more active than OH-PCB congener with lower values of these descriptors. Conversely, the negative contributions of VE3_DzZ, SpMin8_Bhv, topoRadius, RDF95u, and SpMax5_Bhi to agonistic activities of OH-PCBs indicate that OH-PCB congener with higher values of these descriptors would be less active than OH-PCB congener with lower values of these descriptors. Broad interpretations of QSAR models, consistent with the approach used in this paper, are well documented in the literature [75,76,77]. The p values shown in Table 4 are all less than 0.05, indicating that the relationship between \(log\;1/(EC\times 10)\) and each of the eight descriptors in the QSAR model displayed in Eq. (17) was statistically significant [78]. This indicates that the strength of the association between \(log\;1/(EC\times 10)\) and each of the eight molecular descriptors is strong and reliable.

Table 4 Regression coefficients and statistical significance of predictor variables utilized for QSAR model development and values of metrics used for evaluating the goodness-of-fit and robustness of the developed QSAR model

Variance inflation factor (VIF) and correlation matrix—two approaches used for detecting multicollinearity among molecular descriptors in a QSAR model—are presented in Tables 4 and 5, respectively. Values of VIF reported in Table 4 for the eight descriptors are 1.134 for MATS1s, 1.268 for VE3_DzZ, 1.929 for VE1_Dzp, 3.177 for SpMin8_Bhv, 7.901 for SpMax5_Bhi, 2.350 for topoRadius, 3.790 for RDF95u, and 2.769 for RDF45m. A value of VIF less than 10 obtained for each of the eight molecular descriptors indicates that the QSAR model displayed in Eq. (17) contains no multicollinearity [79]. In the correlation matrix shown in Table 5, the absolute values of the correlation coefficients ranged from 0.000 (between VE1_Dzp and SpMin8_Bhv) to 0.798 (between SpMin8_Bhv and SpMax5_Bhi). Although no generally agreed cut-off value is currently available in literature for defining lack of collinearity between two descriptors, a squared correlation coefficient lower than 0.80 has been suggested as a threshold value for accepting lack of collinearity between two variables [63]. Detraction of QSAR model from mechanistic interpretation and deterioration of the model’s statistical parameters are the two major problems encountered when multicollinearity exists among the descriptors utilized in a QSAR model [60, 80].

Table 5 Correlation matrix of the molecular descriptors utilized for QSAR model development*

The values of coefficient of multiple determination (\({R}^{2}\)) and standard error of estimate (s)—two statistical metrics used for evaluating the goodness-of-fit of a QSAR model - are shown in Table 4. The values of \({R}^{2}\) and s reported in Table 4 were 0.8902 and 0.2840, respectively. This \({R}^{2}\) value indicates that 89.02% of total variation in the response variable (\(log1/(EC\times 10)\)) could be accounted for by its relationship with the eight predictor variables (molecular descriptors) utilized in the developed QSAR model. Value of \({R}^{2}\) > 0.6, as obtained in this paper, suggests that the QSAR model displayed in Eq. (17) provides a good fit to the data used in building the model [81]. The standard error of estimate (s) measures the dispersion of the observed values from the regression line [64]. The low value of s reported in Table 4 indicates that the observed values of the agonistic activities of OH-PCBs are close to the regression line predicted by the developed QSAR model. Another statistical parameter that is used as a measure of goodness-of-fit of a QSAR model is the adjusted \({R}^{2}\) (\({R}_{adj}^{2}\)). Unlike the value of \({R}^{2}\) which always increases regardless of whether the addition of an extra predictor variable to a QSAR model improves the model or not, value of \({R}_{adj}^{2}\) only increases when addition of an extra predictor variable improves the model, thus eliminating the possibility of overfitting [82]. The value of \({R}_{adj}^{2}\) reported in Table 4 for the QSAR model displayed in Eq. (17) was 0.8651. The eight predictor variables retained in the QSAR model are considered acceptable because further addition of extra independent variable to the model caused significant reduction in the reported \({R}_{adj}^{2}\) value [82]. Furthermore, since the ratio of the number of compounds in the training set to the number of descriptors in the QSAR model displayed in Eq. (17) was at least 5:1 as suggested by Topliss and Costello [83], the risk of chance correlation in the developed QSAR model was avoided.

To assess the stability and robustness of the QSAR model displayed in Eq. (17), parameters obtained from leave-one-out cross-validation and Y-randomization techniques were used. In Table 4, values of \({Q}_{LOO}^{2}\) and \({RMSE}_{CV}\)—two parameters obtained from leave-one-out cross-validation technique—are presented. The QSAR model displayed in Eq. (17) is considered robust and stable because the value of \({Q}_{LOO}^{2}\) reported in Table 4 (\({Q}_{LOO}^{2}\) = 0.8201) was greater than the cut-off value (\({Q}_{LOO}^{2}>0.5\)) suggested in the literature [71, 84]. Furthermore, a difference of less than 0.3 between \({R}^{2}\) value and \({Q}_{LOO}^{2}\) value, as obtained in this study, indicates that the QSAR model displayed in Eq. (17) did not suffer from overfitting [84]. The low value of root-mean-square error in cross-validation (\({RMSE}_{CV}\) = 0.3242) reported in Table 4 suggests that the accuracy of the prediction made by the developed model was good [85]. The results of Y-randomization procedure conducted to check the stability and robustness of the QSAR model developed in this paper are shown in Table 6. The \({R}^{2}\) and \({Q}^{2}\) values reported in Table 6 for each of the 50 random models generated were significantly lower than the values of \({R}^{2}\) and \({Q}^{2}\) obtained for the original QSAR model displayed in Eq. (17) (\({R}^{2}\) and \({Q}^{2}\) values obtained for the original model were 0.8902 and 0.8201, respectively). The lower values of \({R}^{2}\) (average was 0.1770) and \({Q}^{2}\) (average was −1.3278) obtained for the random models ruled out the possibility of chance correlation in the original QSAR model displayed in Eq. (17) [86]. The value of \({cR}_{p}^{2}\) reported in Table 6 was 0.8019. Value of \({cR}_{p}^{2}\) higher than 0.5, as obtained in this study, indicates that the QSAR model displayed in Eq. (17) was robust and stable [86].

Table 6 Results of Y-randomization test to check the robustness of the developed QSAR model

One of the core functions of a QSAR model is to make reliable prediction of biological activities for yet to be tested chemical compounds. Because generating an entirely new set of experimental data for the purpose of external validation is often difficult in everyday practice, the strategy suggested by QSAR experts is to divide the available dataset into two subsets—the training set and the test set [64]. In this paper, the 44 OH-PCB congeners assigned to the training set (70% of the entire dataset) were used for model building while the remaining 20 OH-PCB congeners assigned to the test set (30% of the entire dataset) were reserved for external validation of the developed model. Using the computed molecular descriptors presented in Table S3 (Supplementary information) as inputs, agonistic activities of OH-PCBs in the test set were predicted with the QSAR model displayed in Eq. (17). The values of the external validation metrics used for evaluating the predictive ability of the developed QSAR model and the threshold values used for judging the acceptability of these metrics are presented in Table 7. As shown in Table 7, the values of the external validation metrics obtained when prediction was made on chemical compounds assigned to the test set, before and after removing 5% of data with high residuals from the test set, indicate that all the validation metrics satisfied the requirements for accepting the external predictivity of the QSAR model displayed in Eq. (17) [71, 84, 87,88,89,90]. Using the values of observed activities, predicted activities and residuals shown in Tables S4 and S5 (Supplementary information), a plot of residuals versus experimental values of \(log\;1/(EC\times 10)\) (Fig. 2) and a plot of predicted values of \(log\;1/(EC\times 10)\) versus experimental values of \(log\;1/(EC\times 10)\) (Fig. 3) were constructed. The random distribution of the residuals on both sides of the horizontal zero line in Fig. 2 indicates that there was no systematic error in the QSAR model developed in this paper. Figure 3 also shows that there was a positive and strong correlation between the activities predicted by the developed QSAR model and the experimental activities obtained from literature. These results indicate that the prediction made by the QSAR model displayed in Eq. (17) was accurate and reliable.

Table 7 External validation parameters for the developed QSAR model and their threshold values
Fig. 2
figure 2

Plot of residuals versus observed activities

Fig. 3
figure 3

Plot of predicted activities versus observed activities

Applicability domain—a concept that expresses the scope and limitation of a QSAR model by defining the range of chemical structures for which the model is considered applicable—is another crucial aspect that must be included in a QSAR study [72]. To visualize the outliers in both the descriptor space and the response space, the Williams plot shown in Fig. 4 was constructed from the values of standardized residuals and leverages presented in Tables S4 and S5 (Supplementary information). In the Williams plot displayed in Fig. 4, a compound with a standardized residual greater than three standard deviation units is considered to be a response outlier while a compound with leverage higher than the warning leverage (h* = 0.6136) is considered to be a structurally influential chemical in the developed model [88]. As shown in Fig. 4, none of the 64 OH-PCBs in the dataset was response outlier but 2′,4′,6′-trichlorobiphenyl-4-ol and 2′,3,4′-trichlorobiphenyl-2-ol were found to be structurally influential chemicals because of their high leverage values. As structurally influential chemical in the training set, the compound 2′,3,4′-trichlorobiphenyl-2-ol greatly influenced the regression parameters by forcing the fitted regression line near its observed value [64]. On the other hand, the predicted response of a compound in the test set (compound 2′,4′,6′-trichlorobiphenyl-4-ol in this case) with a leverage value greater than the warning leverage may not be reliable because the prediction was probably a result of substantial extrapolation of the QSAR model [64].

Fig. 4
figure 4

Plot of standardized residuals versus leverages (Williams plot), with a warning leverage of 0.6136

Conclusion

Some congeners of hydroxylated polychlorinated biphenyls are known to cause endocrine disruption in humans by acting as nuclear receptor agonists or nuclear receptor antagonists. However, the high numbers of inactive OH-PCB congeners recorded in many experimental toxicity studies designed to measure the agonistic and antagonistic activities of OH-PCBs necessitate the need to develop QSAR models that can predict the activities of OH-PCBs prior to in vitro experiments. Use of QSAR models for large-scale screening of OH-PCB congeners for the purpose of prioritizing the congeners for further experimental testing offers the advantages of being rapid, inexpensive, and high-throughput. In this paper, a local QSAR model for predicting the agonistic activities of OH-PCBs to constitutive androstane receptor was constructed and validated using certain internal and external validation criteria. The developed QSAR model was found to be statistically reliable, robust, and possessed good external predictivity. The QSAR model developed in this paper can therefore be used by toxicologists to predict the agonistic activities of untested OH-PCBs to constitutive androstane receptor prior to experimental studies. Development of more QSAR models for the prediction of OH-PCB activities in toxicity studies involving other nuclear receptors is recommended.