Introduction

In Japan, polychlorinated biphenyls (PCBs), organochlorine pesticides (OCPs), and polybrominated diphenyl ethers (PBDEs) have been detected in human fetuses and human serum, despite the ban or restriction of their use (Fukata et al. 2005; Kawashiro et al. 2008; Mori et al. 2014). Highly lipophilic and stable, these compounds have a long residence time in human tissues; they have been detected in cord blood (Aylward et al. 2014), indicating that fetuses were exposed to them through the bloodstream. Congeners of dioxin include polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), and dioxin-like polychlorinated biphenyls (DL-PCBs); they affect human reproduction and development because of their endocrine-disrupting effects (Ankley et al. 2010; Brouwer et al. 1995; Mably et al. 1992). Experimental and epidemiological studies also suggest that PCBs, OCPs, and PBDEs have developmental neurotoxicity (Grandjean and Landrigan 2006, 2014). Thus, it is important to determine the mechanisms by which these compounds are transported from the mother to the fetus.

In the present study, the maternal–fetal transfer rate of PCBs, OCPs, PBDEs, and dioxin-like compounds is predicted using multivariate analysis to detect relations between the compounds’ physicochemical properties and their concentrations in maternal blood (MB) and umbilical cord blood (CB) (Jotaki et al. 2011; Kawashiro et al. 2008; Mori et al. 2014; Sakurai et al. 2004). Previous studies have reported ratios of cord/maternal blood concentrations of PCBs, OCPs, PBDEs, and dioxin-like compounds between 0.1 and 1 (Aylward et al. 2014). For higher chlorinated congeners of PCBs or dioxins, which have larger molecular weights, the transfer rate tends to be lower (Mori et al. 2014; Needham et al. 2011). However, correlations between the maternal–fetal transfer rate and the compounds’ physicochemical properties are still not well understood. The quantitative structure–activity relationship (QSAR) method enables the prediction of the physicochemical properties or the theoretical molecular descriptors of chemicals from their molecular structure. The QSAR method has previously been applied to investigate the placental barrier for some organic compounds (Hewitt et al. 2007). However, transfer rates of organohalogen compounds, including dioxin-like compounds, could not be estimated.

Materials and methods

Sample acquisition, processing, and analysis

The MB and CB sample sets (n = 79) were collected from the Chiba University Hospital’s Delivery Unit and various other obstetric units in Japan, after the approval of this study by the Congress of Medical Bioethics of Chiba University and with the written informed consent of the patients. The samples were stored at −20 °C. The concentrations of PCBs (TriCB, TetraCB, PentaCB, HexaCB, HeptaCB, OctaCB, NonaCB, and DecaCB) and OCPs (trans-nonachlor, hexachlorocyclohexane, hexachlorobenzene, and heptachlor epoxide) were analyzed in 29 sample sets using an Agilent 6890 Plus gas chromatograph (Agilent Technologies) and an AutoSpec Ultima NT mass spectrometer (Micromass Ltd., Manchester, UK) equipped with a programmed temperature vaporization (PTV) injection system (Agilent Technologies, Palo Alto, CA, USA) (Jotaki et al. 2011). The analytical method of Sakurai et al. (2004) was employed in 41 sample sets for dioxin-like compounds (polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), and dioxin-like polychlorinated biphenyls (DL-PCBs)). These compounds were analyzed by high-resolution gas chromatography/high-resolution mass spectrometry (HRGC-HRMS). PBDEs (BDE47, BDE100, and BDE153) were also analyzed in nine samples using HRGC-HRMS (Kawashiro et al. 2008).

Statistical analysis and modeling

A possible association of the maternal–fetal transfer rate with the contaminants’ physicochemical properties was investigated by multiple linear regression (MLR), partial least squares (PLS) regression, and random forest (RF) regression.

Maternal–fetal transfer rates were obtained by dividing the concentration of each organohalogen compound in cord blood by its concentration in maternal blood (Table 1). In addition, Table 1 presents other related physicochemical properties: biodegradation half-life, logarithm bioaccumulation factor (logBAF), logarithm bioconcentration factor (logBCF), logarithm octanol/water partition coefficient (logKow), logarithm octanol/air partition coefficient (logKoA), and water solubility. The following quantum-chemical descriptors were acquired for the QSAR model: molecular weight, final heat of formation, energy of the highest occupied molecular orbital (E HOMO), energy of the lowest unoccupied molecular orbital (E LUMO), HOMO–LUMO gap, greatest negative partial atomic charge (q−), greatest positive partial atomic charge (q+), total dipole, total energy, electronic energy, and core–core repulsion (Table 1). These quantum-chemical descriptors were obtained by the PM6 semiempirical method contained in MOPAC 2009 (Ver 9.03CS) (Stewart 2007, 2009), which was implemented in ChemBioOffice 2013 (Cambridge Soft Corporation, USA). The biodegradation half-life, logBAF, logBCF, logKow, and water solubility were drawn from the Estimation Program Interface (EPI) Suite (United States Environmental Protection Agency, Washington, DC, USA) (EPA 2012). For the TetraCB, PentaCB, HexaCB, HeptaCB, and OctaCB isomer groups, the quantum-chemical descriptors and physicochemical properties of CB74, CB118, CB153, CB180, and CB194, respectively, were used as representative values. Furthermore, the toxic equivalency factor (TEF) was used as a descriptor for the present analysis (Van den Berg et al. 2006).

Table 1 Summary of maternal–fetal transfer rates and physicochemical properties of organohalogen compounds

Statistical analysis was performed using R Ver. 3.1.1 (The R Foundation for Statistical Computing), and SIMCA 13 (Umetrics, Umeå, Sweden). Before statistical analysis, all values were standardized using the equation:

$$ z = \left(x - \mu \right)/\sigma $$

where μ is the mean and σ is the standard deviation of the variables.
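The standardization step can be sketched in a few lines of Python. This is only an illustration: the function name `standardize` and the input values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores, z = (x - mu) / sigma, for one variable."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD (n - 1); an assumption, not stated in the text
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]

# Hypothetical descriptor values, for illustration only
z = standardize([300.0, 350.0, 400.0])
```

After this transformation each variable has zero mean and unit variance, so descriptors measured on very different scales contribute comparably to PCA and regression.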

Principal component analysis (PCA) was employed to order the physicochemical and structural properties and the maternal–fetal transfer rates (Fig. 1). Spearman’s rank correlation was used to identify collinear variables among the physicochemical and structural properties (Table 2); for every pair of descriptors exhibiting a correlation coefficient greater than 0.7, one variable was excluded from the analysis.
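The collinearity filter described above might look like the following sketch. Several points are assumptions made for illustration: the threshold is applied to the absolute value of the coefficient, the variable kept from each pair is simply the one encountered first, and the descriptor values are invented.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_collinear(descriptors, threshold=0.7):
    """Keep a descriptor only if |rho| with every already-kept one is <= threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(spearman(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Invented descriptor values for five hypothetical compounds
descriptors = {
    "molecular_weight": [1.0, 2.0, 3.0, 4.0, 5.0],
    "logKow": [2.0, 3.0, 5.0, 7.0, 11.0],      # monotone with molecular weight
    "total_dipole": [3.0, 1.0, 4.0, 2.0, 5.0],
}
kept = drop_collinear(descriptors)
```

In this toy example `logKow` is dropped because it is perfectly rank-correlated with `molecular_weight`, while `total_dipole` is retained.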

Fig. 1 The physicochemical properties of PCBs, OCPs, PBDEs, and dioxin-like compounds. a PCA loading plot and b score plot

Table 2 Intercorrelation matrix of the physicochemical properties of organohalogen compounds

A dataset of individual maternal–fetal transfer rates, containing 29 pairs of each isomer of PCBs and OCPs, 41 pairs of each congener of dioxin-like compounds, and 8 pairs of each congener of PBDEs, was used for developing the prediction models (Tables S1 and S2). Individual maternal–fetal transfer rates were calculated by dividing the concentration of each chemical in CB by its concentration in MB for each pair of CB and MB samples. MLR, PLS regression, and RF regression were applied. The multicollinearity of the independent variables was assessed by calculating the variance inflation factor (VIF) for MLR; the explanatory variables’ VIF values were <5, indicating no serious multicollinearity. The parameters maximizing the likelihood were identified after variable selection by Spearman’s rank correlation and VIF calculation. The maximized likelihoods from different models were then compared using the Akaike information criterion (AIC) (Akaike 1998):

$$ \mathrm{AIC} = 2k - 2\ln L $$

where k represents the number of parameters and L represents maximized likelihood. The model with the lowest AIC was selected to achieve a trade-off between model complexity (preferring models with fewer parameters) and maximized likelihood.
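As a rough illustration of AIC-based model comparison, the sketch below assumes a least-squares model with Gaussian errors, for which the maximized log-likelihood can be written in terms of the residual sum of squares. The residuals and parameter counts are invented, and the function names are hypothetical.

```python
import math

def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood of a least-squares fit with Gaussian errors."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln L, with k the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood

# Two hypothetical candidate models (illustrative residuals only)
aic_a = aic(gaussian_log_likelihood([0.10, -0.10, 0.10, -0.10]), k=3)
aic_b = aic(gaussian_log_likelihood([0.09, -0.09, 0.09, -0.09]), k=5)
best = "A" if aic_a < aic_b else "B"
```

Here the second model fits slightly better but uses two more parameters, so the penalty term 2k outweighs the gain in likelihood and the simpler model is preferred.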

PLS regression is widely used in chemometrics for model development because PLS can analyze data with strong collinear and multiple predictor variables (Wold et al. 2001). RF is a generalized regression method, which is effective in various QSAR tasks; every forest represents a consensus, nonlinear model derived from a large number of single models (Breiman 2001).

PLS and RF models were fitted using R packages. Hyperparameters, mtry for RF (number of variables randomly sampled as candidates in each split) and the number of components for PLS, were optimized by the R package caret (Kuhn 2008). The R package caret was also used to calculate the variable importance for each model. In all cases, the training set contained 80 % of compounds (24 compounds [HCB, HCH, heptachlor epoxide, 1.2.3.7.8.PeCDD, 1.2.3.6.7.8.HxCDD, 1.2.3.4.6.7.8.HpCDD, OCDD, 2.3.4.7.8.PeCDF, 1.2.3.4.6.7.8.HpCDF, CB77, CB126, CB169, CB114, CB118, CB123, CB156, CB157, CB167, TetraCB, PentaCB, HexaCB, OctaCB, BDE47, and BDE153]), and the external test set comprised 20 % of compounds (7 compounds [trans-nonachlor, 1.2.3.4.7.8.HxCDF, 1.2.3.6.7.8.HxCDF, CB105, CB189, HeptaCB, and BDE100]) (Tables S1 and S2). The external validation test set was randomly selected from each type of organohalogen compound (PCDD/Fs, coplanar PCBs, PCBs, organohalogen pesticides, and PBDEs). Models were optimized by 10-fold cross-validation on the training set for internal validation. The optimized values were mtry = 1 for RF and nine components for PLS. Optimized models were validated with the external test set. The root mean square error (RMSE), the coefficient of determination (R²), and its standard deviation (SD) were reported for the cross-validation testing sets (RMSECV, R²_CV, and SDCV) and between predicted and actual values of the response variable of the test data (RMSEpred and R²_pred). Tropsha’s validation factors R²_EXT, k, and (R²_EXT − R²_0) / R²_EXT were also calculated, and the applicability domain and y-randomization were evaluated for the RF model (Tropsha 2010). The RMSE, R²_EXT, k, and R²_0 were calculated by the following equations:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{exp,i} - y_{pred,i}\right)^2} $$
$$ {R^2}_{\mathrm{EXT}} = 1 - \frac{\sum_{i=1}^{N}\left(y_{pred,i} - y_{exp,i}\right)^2}{\sum_{i=1}^{N}\left(y_{pred,i} - \overline{y_{cv}}\right)^2} $$

where $$ \overline{y_{cv}} $$ is the overall mean of the values predicted in cross-validation (those used for R²_CV),

$$ k = \frac{\sum_{i=1}^{N} y_{exp,i}\, y_{pred,i}}{\sum_{i=1}^{N} {y_{pred,i}}^2} $$
$$ {R^2}_0 = 1 - \frac{\sum_{i=1}^{N}\left(y_{pred,i} - k\, y_{exp,i}\right)^2}{\sum_{i=1}^{N}\left(y_{pred,i} - \overline{y_{exp}}\right)^2} $$

and $$ \overline{y_{exp}} $$ is the overall mean of the values used for R²_pred.

Tropsha considered a QSAR model predictive if the following conditions are satisfied (Tropsha 2010):

$$ {R^2}_{\mathrm{pred}}>0.6 $$
$$ {R^2}_{\mathrm{EXT}}>0.5 $$
$$ \left({R^2}_{\mathrm{EXT}} - {R^2}_0\right)/{R^2}_{\mathrm{EXT}} < 0.1 $$
$$ 0.85\ \le k\le\ 1.15 $$
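The RMSE, the slope k, and the four acceptance conditions can be expressed compactly in code. The sketch below is illustrative only: the function names are hypothetical, the R² values are passed in as precomputed inputs rather than derived here, and all numbers are invented.

```python
import math

def rmse(y_exp, y_pred):
    """Root mean square error between experimental and predicted values."""
    n = len(y_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n)

def slope_k(y_exp, y_pred):
    """Slope k of the regression through the origin, as defined in the text."""
    return sum(e * p for e, p in zip(y_exp, y_pred)) / sum(p * p for p in y_pred)

def tropsha_checks(r2_pred, r2_ext, r2_0, k):
    """Evaluate each acceptance condition separately and return the outcomes."""
    return {
        "R2_pred > 0.6": r2_pred > 0.6,
        "R2_EXT > 0.5": r2_ext > 0.5,
        "(R2_EXT - R2_0) / R2_EXT < 0.1": (r2_ext - r2_0) / r2_ext < 0.1,
        "0.85 <= k <= 1.15": 0.85 <= k <= 1.15,
    }

# Invented transfer rates for three hypothetical test compounds
y_exp = [0.10, 0.20, 0.30]
y_pred = [0.12, 0.18, 0.30]
error = rmse(y_exp, y_pred)
k = slope_k(y_exp, y_pred)

# Invented R^2 values, used only to show how the conditions are checked
checks = tropsha_checks(r2_pred=0.70, r2_ext=0.65, r2_0=0.64, k=k)
```

Returning the outcome of each condition individually, rather than a single pass/fail flag, makes it easy to report which criterion a model fails.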

The averaged y-randomized R² (R²_random) was calculated after 100 randomized iterations to check the reliability of the proposed model (Rucker et al. 2007). In each iteration, the dependent variable vector is randomly shuffled, and a new QSAR model is developed using the original independent variable matrix. The new QSAR models are expected to have lower R² values than the proposed model; if they do, the proposed model can be considered acceptable (Rucker et al. 2007; Tropsha 2010).
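The y-randomization procedure can be illustrated with a deliberately simple stand-in model. In the sketch below, a one-descriptor least-squares fit replaces the RF model purely to keep the example self-contained; the data, the seed, and the function names are all assumptions.

```python
import random
from statistics import mean

def r_squared(x, y):
    """R^2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomized_r2(x, y, iterations=100, seed=0):
    """Average R^2 over models refitted to randomly shuffled responses."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(iterations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        scores.append(r_squared(x, shuffled))
    return mean(scores)

# Invented descriptor and transfer-rate values for ten hypothetical compounds
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [0.33, 0.30, 0.29, 0.25, 0.24, 0.20, 0.19, 0.15, 0.13, 0.11]

r2_model = r_squared(x, y)
r2_random = y_randomized_r2(x, y)
```

If the model captured a real relation, `r2_random` should fall well below `r2_model`; a small gap between the two would suggest chance correlation.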

Finally, the distance from each test compound to its nearest neighbor in the training set was calculated and compared to the applicability domain (APD) threshold, calculated as follows:

$$ \mathrm{APD} = \langle d \rangle + Z\sigma $$

where Z is an empirical cutoff value of 0.5 (Zhang et al. 2006).

The prediction was considered unreliable when the distance was higher than the APD. Calculation of <d> and σ was performed as follows: First, the average of the Euclidean distances between all pairs of the training set was calculated. Next, the set of distances that were lower than the average was formulated. <d> and σ were finally calculated as the average and standard deviation of all the distances included in this set (Zhang et al. 2006).
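The APD calculation described above can be sketched as follows. The descriptor vectors are invented two-dimensional points, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which form was used.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def apd_threshold(training, z=0.5):
    """APD = <d> + Z * sigma over the pairwise distances below the overall mean."""
    distances = [euclidean(a, b) for a, b in combinations(training, 2)]
    overall_mean = mean(distances)
    below = [d for d in distances if d < overall_mean]
    return mean(below) + z * pstdev(below)  # population SD; an assumption

def inside_domain(compound, training, threshold):
    """A prediction is treated as reliable if the nearest-neighbor distance <= APD."""
    nearest = min(euclidean(compound, t) for t in training)
    return nearest <= threshold

# Invented, already standardized descriptor vectors
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
threshold = apd_threshold(training)

near = inside_domain((0.5, 0.5), training, threshold)
far = inside_domain((6.0, 6.0), training, threshold)
```

In this toy case the point close to the training cluster falls inside the domain, while the distant point falls outside and its prediction would be flagged as unreliable.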

This analysis included only the contaminants that were detected in at least 80 % of the samples. All values were standardized and scaled to zero mean and unit variance before all statistical analysis. Results below limit of quantification (LOQ) were assigned a value of 0.5 LOQ.
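The data preparation rules above, together with the transfer-rate definition, can be sketched as below. The representation of results below the LOQ as `None`, the concentrations, the LOQ value, and the function names are all hypothetical choices made for illustration.

```python
def detection_frequency(values, loq):
    """Fraction of samples with a result at or above the LOQ."""
    detected = sum(v is not None and v >= loq for v in values)
    return detected / len(values)

def substitute_below_loq(values, loq):
    """Replace results below the LOQ (or missing) with 0.5 * LOQ."""
    return [0.5 * loq if v is None or v < loq else v for v in values]

def transfer_rates(cord, maternal):
    """Maternal-fetal transfer rate per pair: CB concentration / MB concentration."""
    return [c / m for c, m in zip(cord, maternal)]

# Invented paired concentrations for one hypothetical compound
loq = 0.1
maternal = [1.0, 2.0, 4.0, 2.0, 1.0]
cord = [0.2, None, 0.8, 0.5, 0.25]  # None marks a result below the LOQ

freq = detection_frequency(cord, loq)
if freq >= 0.8:  # keep only compounds detected in at least 80 % of samples
    cord_filled = substitute_below_loq(cord, loq)
    rates = transfer_rates(cord_filled, maternal)
```

Substituting 0.5 LOQ keeps the pair in the dataset at the cost of a fixed, somewhat arbitrary value, which is one reason compounds with low detection frequency are excluded first.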

Results

Ordination of the compounds’ physicochemical properties

PCA was applied to summarize profiles of physicochemical properties of PCBs, OCPs, PBDEs, and dioxin-like compounds as well as to examine their relation with the transfer rate; the results are presented in Figs. 1, S1, and S2. The normalized parameters were summarized by four principal components (PC), i.e., PC1 (33.8 %), PC2 (17.2 %), PC3 (12.2 %), and PC4 (11.5 %), which together explained 74.6 % of the total variance. In the score plot, OCPs are positively aligned with PC1, whereas PCDDs and PCDFs are negatively correlated with PC1. PCBs and PBDEs respond positively to PC2, whereas OCPs, PCDDs, and PCDFs correlate negatively with PC2. These results divide the organohalogen compounds into three clusters (OCPs, PCDD/Fs, and PCBs and PBDEs) (Fig. 1a).

In the loading plot, logKow, HOMO–LUMO gap, molecular weight, and TEF are negatively aligned with PC1. Total energy, q−, and E LUMO are positively aligned with PC1. LogBAF, final heat of formation, half-life, and total dipole are positively aligned with PC2. Finally, electronic energy and water solubility respond negatively to PC2 (Fig. 1b). Based on these results, the organohalogen compounds are divided into two categories: (a) PCBs and PBDEs and (b) OCPs and PCDD/Fs. PCBs and PBDEs are aligned with factors of bioconcentration (logBAF, logBCF, and half-life) and polarity (total dipole). PCDD/Fs are aligned with core–core repulsion, molecular weight, and TEF, and OCPs are aligned with water solubility (Fig. 1). The number of halogen atoms in PCBs, PBDEs, and PCDD/Fs aligns with factors related to molecular weight (molecular weight, logKow, and logKoA). The maternal–fetal transfer rate is correlated positively with PC1, PC3, and PC4 and negatively with PC2 (Figs. 1b, S3, and S2), indicating that the maternal–fetal transfer rate of OCPs and lower chlorinated PCBs might be higher than that of PCDD/Fs and higher chlorinated PCBs.

Results of the MLR, PLS, and RF models

In accordance with Spearman’s rank correlation coefficients (Table 2), eight redundant variables (logKow, logKoA, water solubility, half-life, total energy, electronic energy, core–core repulsion, and E LUMO) were removed, one from each pair of descriptors presenting a correlation coefficient greater than 0.7. The remaining 10 variables (molecular weight, TEF, logBCF, logBAF, final heat of formation, E HOMO, q−, q+, HOMO–LUMO gap, and total dipole) were selected for model development.

The MLR and PLS models provide rather low predictive performance (Table 3), as evaluated through the 10-fold cross-validation (R²_CV = 0.425 ± 0.0964 and RMSECV = 0.0740 ± 0.00962 for MLR and R²_CV = 0.492 ± 0.115 and RMSECV = 0.0699 ± 0.0109 for PLS) and the external test set (R²_pred = 0.129 and RMSEpred = 0.0897 for MLR and R²_pred = 0.123 and RMSEpred = 0.112 for PLS). In these models, q−, E HOMO, and HOMO–LUMO gap are significant variables for the PLS; E HOMO, TEF, molecular weight, and logBAF are selected for the MLR (Table 4).

Table 3 Prediction performance of the investigated maternal transfer rate for MLR, PLS, and RF models
Table 4 Important variables of MLR, PLS, and RF for prediction of maternal transfer rate

The RF model provides better predictive performance, as evaluated through the 10-fold cross-validation (R²_CV = 0.566 ± 0.0885, RMSECV = 0.0648 ± 0.00848) and the external test set (R²_pred = 0.519 and RMSEpred = 0.0514) (Table 3), and the values of Tropsha’s validation factors met the criteria (R²_EXT = 0.508, k = 1.033, and (R²_pred − R²_0) / R²_pred = 0.0062). The average R² over 100 random shuffles (R²_random = 0.532) was lower than R²_CV = 0.566, indicating that the results from the proposed model were not due to chance correlation. The applicability domain was evaluated for the test compounds as described above. Because only half of the validation compounds fell inside the applicability domain (Table 5), the reliability of this model with respect to the APD was somewhat low. Overall, the RF model passed the tests for predictive ability except for R²_pred and the applicability domain. In the RF model, total dipole, molecular weight, HOMO–LUMO gap, and E HOMO are identified as significant variables (Table 4).

Table 5 Applicability domain for the test compounds

Discussion

Prediction of the maternal–fetal transfer rates

Transfer rates of PCBs, OCPs, PBDEs, and dioxin-like compounds in this study were 0.124 to 0.235, 0.161 to 0.255, 0.125 to 0.238, and 0.109 to 0.326 on a wet wt basis, respectively. A previous study reported that transfer rates of PCBs, OCPs, PBDEs, and dioxin-like compounds were 0.1 to 0.4, 0.1 to 3, 0.05 to 5, and 0.1 to 0.4 on a wet wt basis, respectively (Aylward et al. 2014), indicating that the cord/maternal blood concentration ratios in this study were at the same level as in the previous report.

In this study, three prediction models are developed and compared, although their prediction accuracy was not expected to differ significantly (Kovdienko et al. 2010). Nevertheless, RF regression clearly offers greater prediction accuracy than the MLR and PLS models in this study. A previous study indicated that RF is suitable for the analysis of small sample sizes, high-dimensional feature spaces, and complex data structures (Qi 2012). The sample size and the number of target compounds in the current study are smaller than in previous research (Lancz et al. 2015). Even so, RF proves to be a sufficiently robust prediction model. However, because the data available for OCPs were limited, the transfer rate of OCPs was difficult to predict with our model. In the future, more data about OCPs are needed to develop a more accurate model. The total dipole, identified as an important variable in RF (Table 4), is closely related to the compounds’ polarity. Previous reports have shown that large polar molecules cross the placenta slowly, whereas lipophilic drugs pass more rapidly (Reynolds 1998), and that the number of ionizable groups contributes negatively to the maternal–fetal transfer rate (Giaginis et al. 2009). It was also suggested that the topological polar surface area, q+, and the dipole moment influence the maternal–fetal transfer rate (Hewitt et al. 2007), indicating that it is difficult for high-polarity compounds to be transported to the fetus. Moreover, multidrug resistance proteins (MRPs) are known to mediate the transport of various glucuronides, xenobiotics, and their metabolites, including polar conjugates (Deeley et al. 2006), indicating that the polarity of compounds might be crucial for the maternal–fetal transfer rate.

The logBCF and the molecular weight were related to the transfer rate in RF (Table 4). Compounds with higher molecular weight (approximately more than 500 Da) are expected to transfer incompletely because they cannot penetrate the pores of the placental membrane (Audus 1999; Bourget et al. 1995; Hewitt et al. 2007; Koppe et al. 1992), indicating that molecular weight correlates negatively with the maternal–fetal transfer rate. Previous studies also reported that logKow is significantly related to the maternal–fetal transfer rate and to several physicochemical properties, such as molecular weight, water solubility, and the number and type of halogen groups (Meylan and Howard 2000; Monteiro et al. 2008). In addition, fatty acids have been suggested as transporters for dioxin-like compounds (Koppe et al. 1992). In the present study, molecular weight is significantly correlated with logKow, logKoA, water solubility, total energy, core–core repulsion, E LUMO, and HOMO–LUMO gap (Table 2). This suggests that molecular weight and/or lipophilicity are important parameters for the maternal–fetal transport of organohalogen compounds.

E HOMO and HOMO–LUMO gap were also related to the transfer rate in the RF model (Table 4). Cytochrome P450 enzymes (CYPs) have been found in the human placenta (Pasanen 1999). CYPs are well-known xenobiotic-metabolizing enzymes and are responsible for the detoxification of drugs and xenobiotics. Lewis et al. have shown that binding to CYP3A4 is negatively dependent on the E HOMO, indicating that compounds with a large HOMO energy tend to be difficult for CYP3A to metabolize (Lewis et al. 2002). In RF, TEF was selected as a predictive variable in this study, and TEF correlates negatively with the maternal–fetal transfer rate; this is also confirmed by PCA in PC1 (Fig. 1b). The aryl hydrocarbon receptor (AhR) protein (Jiang et al. 2010) has been detected in the human placenta, indicating that dioxin-like compounds may bind to AhR proteins (Manchester et al. 1987) and thus have difficulty transferring across the human placenta. Moreover, CYP1A1 is the major CYP isoform present in the human placenta (Pasanen 1999). Placental CYP1A1 is induced by lifestyle factors such as smoking, environmental factors (including PCBs and dioxin-like compounds), and medications (e.g., azidothymidine and glucocorticoids) (Myllynen et al. 2005). Based on these results, it is hypothesized that the transfer of organohalogen compounds might be reduced by CYP metabolism and/or binding to AhR proteins.

Further studies on organohalogen compound transporters are required to develop a prediction model for the maternal–fetal transfer rate. Several transporters are expressed in the human placenta, such as the ATP-binding cassette transporters ATP-binding cassette sub-family G member 2 (ABCG2)/breast cancer resistance protein, ATP-binding cassette sub-family B member 1 (ABCB1)/P-glycoprotein, and ATP-binding cassette sub-family C member 2 (ABCC2)/multidrug resistance protein 2 (MRP2) (Myllynen et al. 2009; Vahakangas and Myllynen 2009); however, the relations between organohalogen compounds and these transporters are still not completely understood. Thyroid hormone (TH) can cross the placenta to the fetus, and maternal thyroxine is crucial for the development of the fetal brain. Because organohalogen compounds are structurally similar to thyroxine, a possible mechanism for the disruption of TH homeostasis is their competitive binding to the TH transport proteins transthyretin (Marchesini et al. 2008), thyroxine-binding globulin, and albumin in blood (Ucan-Marin et al. 2010), which suggests that organohalogen compounds may cross the placenta via TH transporters.

Conclusion

In this paper, we predict the maternal–fetal transfer rate of PCBs, OCPs, PBDEs, and dioxin-like compounds using multivariate analysis to detect relations between the physicochemical properties of these compounds and their maternal–fetal transfer rate. RF regression clearly offers greater prediction accuracy than the MLR and PLS models for the maternal–fetal transfer rate, and molecular weight and/or lipophilicity might be important parameters for the maternal–fetal transport of organohalogen compounds. Further studies on organohalogen compound transporters are required to develop a prediction model for the maternal–fetal transfer rate that includes the protein-binding activity and metabolic rate of these compounds in the placenta.