Introduction

Hydroxylated polychlorinated biphenyls (OH-PCBs) and PCB-derived quinones are toxic metabolites of PCBs that have been identified in humans and wildlife. [1–3] According to previous reports, approximately 40 different OH-PCBs have been identified in human plasma, [3] and their levels in human serum were found to constitute 10–30% of the total PCBs. [4–8] The endocrine disruption and other adverse effects of these compounds have drawn great attention because of their structural similarity to the natural estrogens and thyroid hormones. The toxicity of these compounds has been investigated by many researchers. [9–14] Recently, Machala et al. [15] investigated the nongenotoxic adverse effects of oxygenated PCB derivatives. They found that both low molecular weight PCB metabolites and persistent OH-PCBs are capable of multiple nongenotoxic modes of action, including significant aryl hydrocarbon receptor (AhR)-mediated activity, modulation of estrogen receptor α (ER)-mediated responses, and inhibition of intercellular communication potentially associated with promotional effects.

It is clear that sufficient data on the toxicity of OH-PCBs and PCB-derived quinones are necessary for the risk assessment of these PCB derivatives. However, because toxicity testing is costly and time-consuming, toxicity values for the nongenotoxic adverse effects of oxygenated PCB derivatives are rather scarce. Thus, quantitative structure–activity relationships (QSAR), which correlate and predict toxicity data of the oxygenated PCB derivatives from their molecular structural descriptors, may be able to give some insight into the toxicological mechanism and generate predicted toxicity values efficiently. Since quantum chemical descriptors can be obtained easily by computation, clearly describe defined molecular properties, and are not restricted to closely related compounds, the development of QSAR models with quantum chemical descriptors is very useful.

The aim of this study is to develop QSAR models to predict the toxicity of hydroxylated and quinoid PCB metabolites based on their structural descriptors. We used the partial least squares (PLS) algorithm, which can analyze data with strongly collinear, noisy and numerous X variables. [16] It not only searches for the relationship between a matrix Y (containing the dependent variables) and a matrix X (containing the predictor variables), but also reduces the dimension of the matrices while concurrently maximizing the relationship between them. Recent reports have shown that the semiempirical method used to compute the quantum chemical descriptors affects the stability and predictive ability of the resulting model. Therefore, we also used quantum chemical descriptors computed by different semiempirical methods in the PLS regression to analyze the toxicity of hydroxylated and quinoid PCB metabolites.

Materials and methods

Dataset

The in vitro potencies for downregulating gap junctional intercellular communication (GJIC) and for activating the AhR and the ER of a series of OH-PCBs and PCB quinones were recently determined in well-established liver and mammary cell models by Machala et al. [15] The acute inhibition of GJIC (IC50, μM) was determined in the rat liver epithelial cell line WB-F344 by the scrape loading/dye transfer method, [15] and the IC50 (μM) data are listed in Table 1. Because only a small number of compounds were active, AhR- and ER-mediated effects were not included in the present study; only the IC50 values determined by Machala et al. [15] were used as the training set. These data show an approximately 8.4-fold difference between the known IC50 value for 4-OH-PCB 187 (8.7 μM) and that for 2-(4-Cl-phenyl)-1,4-benzoquinone (73.2 μM).
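
As a quick check of the scale involved, the two IC50 values quoted above can be converted to the logIC50 scale used by the models. This is only an illustrative sketch; the dictionary name and layout are assumptions, and only the two numerical values come from the text.

```python
import math

# The two IC50 values (in μM) quoted in the text; Table 1 holds the full set.
ic50_um = {
    "4-OH-PCB 187": 8.7,
    "2-(4-Cl-phenyl)-1,4-benzoquinone": 73.2,
}

# The models are fitted to log10(IC50), so convert each value.
log_ic50 = {name: math.log10(value) for name, value in ic50_um.items()}

# Ratio between the two quoted values (dimensionless fold difference).
fold_difference = max(ic50_um.values()) / min(ic50_um.values())
```

On the log scale the two compounds differ by less than one unit, which is the range the regression models have to resolve.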

Table 1 The hydroxylated and quinoid PCBs under study and their acute inhibition of GJIC (IC50, μM)

Theoretical molecular structural descriptors

Molecular structural descriptors were calculated for OH-PCBs and PCB quinones by the semiempirical AM1, PM3 and MNDO methods. All calculations were performed using MOPAC (2000) contained in CS Chem3D Ultra (Version 6.0). The molecular structures were optimized using Eigenvector following, [17] a geometry optimization procedure within MOPAC 2000. The geometry optimization criterion GNORM was set at 0.1. A total of 15 MOPAC-derived descriptors that reflect the overall character of the oxygenated PCB derivatives were computed using the different semiempirical methods in this study. A full list is given in Table 2. The energy of intermolecular force is closely related to Mw values. The parameter α indicates the ease with which the species can be deformed by an electric field. Atomic charges are related to the reactive centers. The molecular orbital energies of a given molecule are related to chemical reactivity. Inductive effects and resonance effects exerted by the presence of different substituents and substructural groups within the molecule affect the electron distribution and stability of the molecular orbitals. Two non-empirical descriptors, ΔH f and μ, are expected to reflect the affinity for leaching to some extent. Additionally, three combinations of frontier molecular-orbital energies, E LUMO − E HOMO, (E LUMO − E HOMO)2 and E LUMO + E HOMO, which proved to be significant in previous studies, [18, 19] were also selected as predictor variables. E LUMO − E HOMO and E LUMO + E HOMO can be related to absolute hardness and electronegativity, respectively. [20, 21]
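
The three combined frontier-orbital descriptors, and the hardness and electronegativity they relate to, can be sketched as follows. The function name and the example energies are hypothetical; the definitions used are half the absolute gap for hardness and −(E LUMO + E HOMO)/2 for electronegativity, as given in the discussion of model (2).

```python
def frontier_descriptors(e_homo, e_lumo):
    """Combine HOMO and LUMO energies (same units, e.g. eV) into derived descriptors."""
    gap = e_lumo - e_homo
    return {
        "gap": gap,                      # E_LUMO - E_HOMO
        "gap_squared": gap ** 2,         # (E_LUMO - E_HOMO)^2
        "sum": e_lumo + e_homo,          # E_LUMO + E_HOMO
        "hardness": abs(gap) / 2,        # absolute hardness
        "electronegativity": -(e_lumo + e_homo) / 2,  # absolute electronegativity
    }

# Hypothetical orbital energies, not taken from Table 2.
example = frontier_descriptors(e_homo=-9.0, e_lumo=-1.0)
```

Because hardness and electronegativity are simple rescalings of the gap and the sum, they carry the same information as the descriptors entered into the PLS models.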

Table 2 List of molecular structural descriptors of hydroxylated and quinoid PCBs

Statistical analysis

The Simca (Simca-S Version 6.0, Umetri AB and Erisoft AB) software was used to perform the PLS analysis. The conditions for the computation were based on the default values of the software. The criterion used to determine the model dimensionality, i.e. the number of significant PLS components, is cross validation (CV). With CV, when the fraction of the total variation of the dependent variables that can be predicted by a component, \(Q^{2}\), for the whole data set is larger than a significance limit (0.097), the tested PLS component is considered significant. Model adequacy was mainly characterized by the number of observations used for model building in the training set, the number of PLS principal components (k), \(Q^{2}_{\text{cum}}\), the correlation coefficient between observed and fitted values (R), the general standard error (SE) [22, 23] and the significance level (p).
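
The cross-validation criterion can be illustrated with a minimal sketch of a one-component PLS model for a single dependent variable. Several points here are assumptions rather than a description of the Simca procedure: leave-one-out is used instead of the software's default CV grouping, Q2 is computed as 1 − PRESS/SS, the function names are hypothetical, and the toy data are made up. Only the 0.097 significance limit comes from the text.

```python
import math

def fit_pls1(X, y):
    """Fit a one-component PLS1 model on autoscaled X and mean-centred y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    # Unit-variance scaling; a constant column gets scale 1 to avoid division by zero.
    x_sd = [math.sqrt(sum((r[j] - x_mean[j]) ** 2 for r in X) / (n - 1)) or 1.0
            for j in range(p)]
    y_mean = sum(y) / n
    Z = [[(r[j] - x_mean[j]) / x_sd[j] for j in range(p)] for r in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X space with maximal covariance with y.
    w = [sum(Z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum(z * wj for z, wj in zip(row, w)) for row in Z]  # X scores
    tt = sum(v * v for v in t)
    b = sum(ti * yi for ti, yi in zip(t, yc)) / tt if tt else 0.0
    return {"x_mean": x_mean, "x_sd": x_sd, "y_mean": y_mean, "w": w, "b": b}

def predict(model, row):
    """Predict y for one observation from a fitted one-component model."""
    z = [(v - m) / s for v, m, s in zip(row, model["x_mean"], model["x_sd"])]
    score = sum(zj * wj for zj, wj in zip(z, model["w"]))
    return model["y_mean"] + model["b"] * score

def q2_leave_one_out(X, y):
    """Q2 = 1 - PRESS/SS, refitting the model with each observation left out."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss

# Made-up toy data (two descriptors, six observations), not from Table 1 or 2.
X_toy = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.4], [5.0, 0.7], [6.0, 0.2]]
y_toy = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = q2_leave_one_out(X_toy, y_toy)
component_is_significant = q2 > 0.097  # significance limit quoted in the text
```

Unlike R, which measures fit to the training data, Q2 is computed from predictions for observations that were excluded from the fit, so it penalizes models that merely memorize the training set.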

Results and discussion

Variable importance in the projection (VIP) is a parameter that shows the importance of a variable in a PLS model. Terms with a VIP value larger than 1 are the most relevant for explaining the dependent variable. PLS analysis with the acute inhibition of GJIC (Table 1) as the dependent variable and the 18 quantum chemical descriptors as independent variables generated many results. The optimal model, which has the largest \(Q^{2}_{\text{cum}}\), was obtained by stepwise elimination of the variable with the smallest VIP value. The optimal PLS model was then selected based on the statistical values of \(Q^{2}_{\text{cum}}\), R, and p. Following the procedure described above, models (1), (2) and (3) were obtained for logIC50 of oxygenated PCB derivatives using molecular descriptors computed by the semiempirical AM1, PM3 and MNDO methods, respectively. The detailed model-fitting results are listed in Table 3. In Table 3, \(R^{2}_{X(\text{adj})(\text{cum})}\) and \(R^{2}_{Y(\text{adj})(\text{cum})}\) stand for the cumulative variance of all the X's and Y's, respectively, explained by all extracted components. Eig stands for the eigenvalue, which denotes the importance of the PLS principal components. It can be seen from Table 3 that one PLS principal component was selected in each of models (1)–(3). For example, one PLS principal component explained 53.8% of the variance of the predictor variables, and 82.8% of the variance of the dependent variable. The predicted logIC50 values are close to the observed values, as shown in Table 1. Plots of observed and predicted logIC50 values from models (1)–(3) are shown in Fig. 1. All the correlations between observed and predicted dependent values (R) by the AM1, PM3 and MNDO methods are significant (R > 0.907, p < 0.0001). The models obtained from different semiempirical methods suggested that the molecular structural characteristics of OH-PCBs and PCB-derived quinones affect the acute inhibition of GJIC of these molecules.
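
The VIP-based culling step can be sketched as below, using the commonly cited VIP definition (weights squared, weighted by the Y variance each component explains, scaled by the number of variables). This is an assumption about the formula, not a reproduction of the Simca implementation, and the weight values are hypothetical. With this definition the squared VIP values average to 1, which is why 1 serves as the natural threshold.

```python
import math

def vip_scores(weights, ssy_explained):
    """VIP per variable.

    weights: one weight vector per PLS component.
    ssy_explained: Y sum of squares explained by each component.
    """
    p = len(weights[0])
    total = sum(ssy_explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, ssy_explained):
            norm_sq = sum(v * v for v in w)
            acc += ss * w[j] ** 2 / norm_sq  # normalized squared weight
        scores.append(math.sqrt(p * acc / total))
    return scores

# Hypothetical weights for a one-component model with four descriptors.
vip = vip_scores(weights=[[0.7, 0.5, 0.5, 0.1]], ssy_explained=[1.0])

# Candidate for removal in the next culling step: the smallest VIP.
drop_index = min(range(len(vip)), key=vip.__getitem__)
```

In a stepwise procedure, the variable at `drop_index` would be removed, the model refitted, and the cross-validated statistic compared with that of the previous model.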

Table 3 Model fitting results for models (1)–(3)

Fig. 1 Plots of observed vs. predicted logIC50 values of models (1)–(3). a Correlation of observed logIC50 values with predicted logIC50 values by AM1 method; b Correlation of observed logIC50 values with predicted logIC50 values by PM3 method; c Correlation of observed logIC50 values with predicted logIC50 values by MNDO method. (The numbers correspond to those in Table 1)

Based on the unscaled pseudo-regression coefficients of the independent variables and constants transformed from the PLS results, analytical QSAR equations were obtained and are shown in Eqs. 1–3:

AM1 method (model (1)):

$$\log \text{IC}_{50} = 2.089 - 3.707 \times 10^{-3} \alpha - 8.467 \times 10^{-3} \left( E_{\text{LUMO}} - E_{\text{HOMO}} \right)^{2} - 1.407 \times 10^{-1} \left( E_{\text{LUMO}} - E_{\text{HOMO}} \right) + 1.003 \times 10^{-4} \text{EE} + 2.712 \times 10^{-4} \text{CCR} - 1.768 \times 10^{-1} E_{\text{HOMO} + 1} - 3.457 \times 10^{-2} \left( E_{\text{LUMO}} + E_{\text{HOMO}} \right) + 1.501 \times 10^{-3} \Delta H_{\text{f}} $$
(1)

PM3 method (model (2)):

$$\log \text{IC}_{50} = 1.432 - 1.787 \times 10^{-1} E_{\text{HOMO} + 1} - 2.241 \times 10^{-3} \alpha - 6.601 \times 10^{-4} \text{Mw} + 1.107 \times 10^{-4} \text{EE} + 7.686 \times 10^{-5} \text{TE} - 9.291 \times 10^{-2} \left( E_{\text{LUMO}} - E_{\text{HOMO}} \right) - 5.586 \times 10^{-3} \left( E_{\text{LUMO}} - E_{\text{HOMO}} \right)^{2} - 4.099 \times 10^{-2} \left( E_{\text{LUMO}} + E_{\text{HOMO}} \right) + 8.922 \times 10^{-4} \Delta H_{\text{f}} $$
(2)

MNDO method (model (3)):

$$\log \text{IC}_{50} = 1.728 - 9.013 \times 10^{-4} \text{Mw} + 1.340 \times 10^{-4} \text{EE} + 9.270 \times 10^{-5} \text{TE} - 6.305 \times 10^{-1} Q^{-}_{\text{O}} - 5.913 \times 10^{-1} Q^{+}_{\text{H}} + 1.550 \times 10^{-3} \Delta H_{\text{f}} - 6.174 \times 10^{-2} E_{\text{HOMO}} $$
(3)
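
To show how such an equation is applied, the sketch below encodes the intercept and coefficients of Eq. 3 and evaluates them for a set of descriptor values. The dictionary keys and function names are hypothetical, and the descriptor values must be supplied in the same units as those used to fit the model (Table 2); none are invented here.

```python
# Intercept and coefficients transcribed from Eq. 3 (MNDO method).
EQ3_INTERCEPT = 1.728
EQ3_COEFFS = {
    "Mw": -9.013e-4,
    "EE": 1.340e-4,
    "TE": 9.270e-5,
    "Q_O_minus": -6.305e-1,
    "Q_H_plus": -5.913e-1,
    "dHf": 1.550e-3,
    "E_HOMO": -6.174e-2,
}

def log_ic50_model3(descriptors):
    """Predicted logIC50 from Eq. 3 for a dict holding all seven descriptors."""
    return EQ3_INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in EQ3_COEFFS.items()
    )

def ic50_from_log(log_ic50):
    """Convert a predicted logIC50 back to IC50 in μM."""
    return 10 ** log_ic50

# Sign check only: raising Mw by 100 units, all else equal, lowers logIC50.
baseline = {name: 0.0 for name in EQ3_COEFFS}
shifted = dict(baseline, Mw=100.0)
delta = log_ic50_model3(shifted) - log_ic50_model3(baseline)
```

The all-zero baseline is not a physically meaningful molecule; it is used only to isolate the effect of one coefficient.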

Models (1)–(3) include eight, nine and seven predictor variables, respectively. VIP values for the variables are listed in Table 4. In model (1), the VIP values of α, (E LUMO − E HOMO)2, E LUMO − E HOMO and EE are higher than 1.0, indicating their relative importance. Additionally, CCR, E HOMO+1, E LUMO + E HOMO and ΔH f also contribute to the model. As indicated by Eq. 1, increasing EE, CCR and ΔH f values leads to an increase of the logIC50 values. In contrast, increasing α, (E LUMO − E HOMO)2, E LUMO − E HOMO, E HOMO+1 and E LUMO + E HOMO leads to a decrease of the logIC50 values. In model (2), nine quantum chemical descriptors were selected to establish the model. Descriptors with VIP values larger than 1.0 include E HOMO+1, α, Mw, TE and EE. E LUMO − E HOMO, (E LUMO − E HOMO)2, E LUMO + E HOMO and ΔH f also contribute to the model. Equation 2 suggests that increasing TE, EE and ΔH f values leads to an increase of logIC50 values, while increasing E HOMO+1, α, Mw, E LUMO − E HOMO, (E LUMO − E HOMO)2 and E LUMO + E HOMO values will result in a decline of the logIC50 values of PCB derivatives. In model (3), the VIP values of Mw, EE, TE, \(Q^{-}_{\text{O}}\) and \(Q^{+}_{\text{H}}\) are above 1.0. In contrast to the other two models, model (3), computed by the MNDO method, selected the atomic charges (i.e. \(Q^{-}_{\text{O}}\) and \(Q^{+}_{\text{H}}\)) as independent variables. Five other descriptors (Mw, EE, TE, ΔH f and E HOMO) were also selected as independent variables. As shown in Eq. 3, the higher the EE, TE and ΔH f values, the higher the logIC50. However, higher values of Mw, \(Q^{-}_{\text{O}}\), \(Q^{+}_{\text{H}}\) and E HOMO lead to a decrease of logIC50 values. The above results imply that molecular structural descriptors computed by different semiempirical methods may result in QSAR models that contain different descriptors for the acute inhibition of GJIC of OH-PCBs and PCB quinones. The differences among the QSAR models from the different semiempirical methods are examined further below.

Table 4 The VIP values of variables in models (1)–(3)

Based on models (1), (2) and (3), logIC50 values for the other 9 PCB derivatives were predicted. The predicted logIC50 values for these PCB derivatives are shown in Table 1. It can be seen from Table 1 that the predicted inhibition potency of 2′-OH-PCB 3 is lower than that of most of the OH-PCBs, which is consistent with the results of Machala et al., [15] who found that the acute inhibition of GJIC by this compound is not significant. Similarly, the predicted logIC50 value of 2-(2′-Cl-phenyl)-1,4-benzoquinone is also higher than those of its congeners. However, the current models are not well suited for predicting logIC50 values of 1,4-hydroquinones. In the study of Machala et al., [15] all 1,4-hydroquinones showed weak or no significant inhibition. This may result from the fact that the structures of 1,4-hydroquinones are rather different from those of OH-PCBs and 1,4-benzoquinones, which could result in specific toxic mechanisms for the acute inhibition of GJIC. Further experimental investigation of the toxic mechanisms of these congeners is necessary. In brief, the above results show that the QSAR models developed with the selected molecular descriptors in the present study could give an initial estimate of the inhibition potency of PCB derivatives.

Comparing the results from the semiempirical AM1, PM3 and MNDO methods (Table 3), the performance of model (2) from the PM3 method is slightly better than those of AM1 and MNDO, which indicates that the QSAR model from the PM3 method is more robust and has a more reliable predictive capability for the logIC50 values of OH-PCBs and PCB quinones. However, given the dispersion parameters in these three optimal models, there would not be a significant difference between the \(Q^{2}_{\text{cum}}\) values. In model (2), the descriptors TE and EE correlate negatively with Mw and α, as shown in Table 5. This indicates that increasing Mw values can lead to decreasing logIC50 values of the oxygenated PCB derivatives. The E LUMO − E HOMO gap is related to absolute hardness, defined as half the absolute value of E LUMO − E HOMO, [20] which is regarded as a measure of energy stabilization in chemical systems; chemical structures tend to be more stable at higher values of the E LUMO − E HOMO gap. [21, 24] The present study shows that OH-PCBs and PCB quinones with large gap values tend to be more stable and have low logIC50 values, i.e. the acute inhibition of GJIC by such PCB derivatives is strong. Theoretically, the descriptor ΔH f accounts for the stability of a compound. The more negative the ΔH f value, the more stable the compound. In the present study, the inhibition potency of stable OH-PCBs and PCB quinones tends to be strong, which again confirms that the acute inhibition of GJIC by more stable PCB derivatives is stronger. The above conclusions are consistent with the results of Machala et al., [15] who found that the persistent, high molecular weight 4-OH-PCB 187 and 4-OH-PCB 146 showed the strongest inhibition potencies. The descriptor E HOMO+1 may indicate the inhibition potency of PCB derivatives. It can be concluded from this study that PCB derivatives with high E HOMO+1 values may have low logIC50 values. Absolute electronegativity can be defined as −(E LUMO + E HOMO)/2. [20] The result from model (2) shows that PCB derivatives with higher absolute electronegativity values tend to have higher logIC50 values, i.e. the acute inhibition of GJIC by such PCB derivatives is weaker.

Table 5 Correlation matrix of selected quantum chemical descriptors of hydroxylated and quinoid PCBs computed by PM3 method

Conclusions

In the present study, three optimal QSAR models for the acute inhibition of GJIC by OH-PCBs and PCB quinones were developed using quantum chemical descriptors computed by the semiempirical AM1, PM3 and MNDO methods together with PLS regression. The cross-validated \(Q^{2}_{\text{cum}}\) values for the three QSAR models are 0.784, 0.789 and 0.755, respectively, indicating good predictive capabilities for the inhibition potency of OH-PCBs and PCB quinones. The models obtained from different semiempirical methods suggested that molecular structural characteristics of OH-PCBs and PCB-derived quinones affect the acute inhibition of GJIC by these molecules. The QSAR models developed with the selected molecular descriptors in the present study could give an initial estimate of the inhibition potency of PCB derivatives. The slightly higher \(Q^{2}_{\text{cum}}\) value of the model built on descriptors computed with the PM3 Hamiltonian suggests a slightly better predictive power than the models developed using the AM1 or MNDO method.