Introduction

Hydroxylated polychlorinated biphenyls (OH-PCBs) are major metabolites of polychlorinated biphenyls (PCBs), a class of persistent, bioaccumulating, and toxic environmental pollutants that were banned from sale in the 1970s (Agency for Toxic Substances and Disease Registry (ATSDR) 2000, 2011; American Industrial Hygiene Association (AIHA) 2013; Grimm et al. 2015; International Agency for Research on Cancer (IARC) 2016; Lauby-Secretan et al. 2013). OH-PCBs are the oxidative products of PCB biotransformation by cytochrome P-450 monooxygenases (Dhakal et al. 2018; Grimm et al. 2015). OH-PCBs have been found in various organisms including plants and animals (Dhakal et al. 2014; Grimm et al. 2015; Ma et al. 2016; Zhai et al. 2010, 2011, 2013) and are the most commonly reported PCB metabolites in human serum and animal tissues (Gilroy et al. 2012; Jorundsdottir et al. 2010; Kunisue and Tanabe 2009; Letcher et al. 2009; Marek et al. 2013b; Nomiyama et al. 2010a, b; Park et al. 2009; Quinete et al. 2015; Sandau et al. 2000). The toxicity profiles of OH-PCBs overlap with, but are also distinct from, those of PCBs (Dhakal et al. 2018; Grimm et al. 2015). Although OH-PCBs themselves are not known to be carcinogens, they can be further metabolized to PCB quinones, which are ultimate carcinogens (Espandiari et al. 2004; Lehmann et al. 2007; Schuur et al. 1999). Moreover, OH-PCBs can disrupt homeostasis of several endocrine hormones through various mechanisms (Amano et al. 2010; Antunes-Fernandes et al. 2011; Dhakal et al. 2018; Grimm et al. 2015; Liu et al. 2006; Machala et al. 2004; Pěnčíková et al. 2018; Schuur et al. 1998, 1999; Sethi et al. 2017; Shimokawa et al. 2006).

OH-PCBs are found in abiotic environmental matrices including water, snow, sediment, and air (Awad et al. 2016; Marek et al. 2013a, b, 2017; Sun et al. 2016; Ueno et al. 2007). In addition to biotransformation, the reaction between PCBs and the hydroxyl radical is believed to be another source of OH-PCBs in the environment. Such reactions can occur in the atmosphere and may occur at sewage plants when advanced oxidation processes are utilized in the effluent treatment process (Anderson and Hites 1996; Brubaker and Hites 1998; Mandalakis et al. 2003; Marek et al. 2013a; Sedlak and Andren 1994; Totten et al. 2002; Ueno et al. 2007). OH-PCBs are also present in the original commercial Aroclors (Marek et al. 2013a). Therefore, humans are at risk of exposure to OH-PCBs through a variety of pathways.

Despite the toxicological importance and environmental prevalence of OH-PCBs, there are major barriers to the analytical measurement of OH-PCBs. The most important is the limited number of certified analytical standards. There are 837 possible mono-OH-PCBs and many more oxidation products if multiple hydroxyl substitutions are considered (Buser et al. 1992; Rayne and Forest 2010). It may be impractical to obtain all OH-PCB standards. Analytical methods for detection and quantification of OH-PCBs must be developed in the absence of complete analytical standards.

While gas chromatography (GC) is one of the most common chromatographic techniques in the analysis of PCBs, OH-PCBs are poorly volatile and are normally derivatized to more volatile substances. Methoxylated polychlorinated biphenyls (MeO-PCBs) are the most common derivatives of OH-PCBs for GC analysis (Houde et al. 2006; Kunisue and Tanabe 2009; Park et al. 2009; Sandau et al. 2000; Ueno et al. 2007). As with OH-PCBs, there are 837 possible mono-MeO-PCB congeners; however, fewer than 10% of them are commercially available as standards. Even fewer isotope-labeled standards, which are needed for quality control and as internal standards, are available. The lack of OH-PCB and MeO-PCB analytical standards is hence one of the main obstacles to OH-PCB studies.

The low concentrations of OH-PCBs in environmental samples are a second major barrier to the analytical determination of OH-PCB concentrations. Although OH-PCBs have been reported in complex environmental matrices (e.g., air, water, and sediment) and in high-lipid-content tissues (e.g., liver, adipose tissue, and brain), the levels are on the picogram to nanogram scale (Awad et al. 2016; Houde et al. 2006; Kunisue and Tanabe 2009; Marek et al. 2013a, b, 2017; Park et al. 2009; Sandau et al. 2000; Sun et al. 2016; Ueno et al. 2007). It is probably not economically feasible to extract, purify, elucidate, and synthesize new standards of all those unknown OH-PCBs. Consequently, quantitative studies of OH-PCBs are analytically limited.

Since the concept of life-course environmental exposures was introduced in the last decade (Wild 2005), interest in the human and ecosystem exposomes has emerged, and interest in non-targeted analysis (NTA) has grown considerably (Sobus et al. 2017; Wild 2005). The potential of NTA has improved along with advancements in mass spectrometry (MS) and computer technology; thousands of suspected chemicals have been discovered. However, although various algorithms, computer programs, and chemical databases have been introduced, the ability to identify and confirm chemical structures is still limited. The number of unknown or unidentified chemicals still exceeds that of the knowns (Sobus et al. 2017). Because standards of these unknowns are unavailable, quantifying contamination levels and assessing exposure risk are challenging (International Programme on Chemical Safety (IPCS) 2010; U.S. Environmental Protection Agency (EPA) 1992). An alternative to authentic standards is to quantify the unknowns against other chemicals with similar structures and physicochemical properties, as in the case of OH-PCBs.

The purpose of this study is to develop a systematic strategy for the quantitative measurement of OH-PCBs in environmental matrices in the absence of authentic standards. The concept of semi-target analysis is still developing in the environmental research field. However, in the pharmaceutical field, the quantification of unidentified substances is not only common but mandatory because there has long been a concern that patients are unavoidably exposed to synthetic byproducts and degradants of medicines (European Medicines Agency (EMEA) 2006a, b; Food and Drug Administration (FDA) 2006, 2008; The International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) 2006a, b). The level of related substances in life-saving substances and pharmaceuticals is quantitatively controlled whether their chemical structures are identifiable or not. The limits and quantitation methods of both known and unknown impurities are specified and described in numerous modern monographs in many pharmacopeias including the United States Pharmacopeia (USP 2017), the European Pharmacopoeia (Ph. Eur. 2017), and the Japanese Pharmacopoeia (JP 2016). Since authentic standards of impurities are not available, the amounts of unknown impurities are normally computed by comparing their measured responses with those of the corresponding pharmacological substances, which typically share the same chromophores. Likewise, the level of unknown pollutants can be accurately estimated if the responses of the unavailable standards are predicted with the right standards and/or a proper predictive model. This study was designed with this goal in mind.

We hypothesized that a simple but versatile model to predict the responses of the unknown MeO-PCBs can be systematically developed. Such a model would make otherwise discarded information on unknown OH-PCBs usable before authentic standards become available. Hence, the objectives of this study are (1) to develop, optimize, and evaluate the uncertainty of the model and (2) to compare the model with synthetic standards and demonstrate the application in an environmental sample.

Materials and methods

Commercial MeO-PCB standard solution

Details of MeO-PCB calibration standard solution can be found in Table S1 in the supporting information (SI). Briefly, the MeO-PCB standard solution was composed of (i) 70 mono-MeO-PCBs (9 mono-, 5 di-, 6 tri-, 12 tetra-, 13 penta-, 8 hexa-, 10 hepta-, 6 octa-, and 1 nona-chlorinated) purchased from AccuStandard (New Haven, CT, USA) and Wellington Laboratories (Guelph, ON, Canada); (ii) 3 surrogate standards of 4′-methoxyl-3,4-dichloro(1,1′,2,2′,3,3′,4,4′,5,5′,6,6′-13C12)biphenyl (13C-4′-MeO-PCB12), 4′-methoxyl-2,3′,4,5,5′-pentachloro(1,1′,2,2′,3,3′,4,4′,5,5′,6,6′-13C12)biphenyl (13C-4′-MeO-PCB120), and 4-methoxyl-2,2′,3,4′,5,5′,6-heptachloro(1,1′,2,2′,3,3′,4,4′,5,5′,6,6′-13C12)biphenyl (13C-4-MeO-PCB187) (Wellington Laboratories); and (iii) an internal standard of 2,4,6-Trichloro(2′,3′,4′,5′,6′-2H5)biphenyl (d5-PCB30) purchased from Cambridge Isotope Laboratories (Andover, MA, USA). The MeO-PCB standard solution was prepared with hexane (pesticide grade; Fisher Chemical, Fair Lawn, NJ, USA) at the concentration of ~ 50 ng/mL.

Independently synthesized MeO-PCB standard solutions

2-Methoxy-4-chlorobiphenyl (2-MeO-PCB3; > 99.9%), 2′-methoxy-4-chlorobiphenyl (2′-MeO-PCB3; > 99.0%), 2-methoxy-3,3′-dichlorobiphenyl (2-MeO-PCB11; > 99.4%), 4-methoxy-3,3′-dichlorobiphenyl (4-MeO-PCB11; > 99.99%), 5-methoxy-3,3′-dichlorobiphenyl (5-MeO-PCB11; > 99.1%), 6-methoxy-3,3′-dichlorobiphenyl (6-MeO-PCB11; > 99.0%), 4′-methoxy-2,3′,4-trichlorobiphenyl (4′-MeO-PCB25; > 99.7%), 4′-methoxy-3,3′,4-trichlorobiphenyl (4′-MeO-PCB35; > 98.6%), 4-methoxy-3,3′,5-trichlorobiphenyl (4-MeO-PCB36; > 97.0%), 4-methoxy-2,2′,5,5′-tetrachlorobiphenyl (4-MeO-PCB52; > 99.99%), and 4′-methoxy-2,3′,4,5′-tetrachlorobiphenyl (4′-MeO-PCB68; > 99.9%) were prepared by the Suzuki coupling of a suitable benzene boronic acid with a (methoxylated) bromochlorobenzene (Lehmler and Robertson 2001a, b; Li et al. 2008, 2009; McLean et al. 1996; Rodriguez et al. 2016; Zhai et al. 2011; Zhu et al. 2013). The purity of each compound was determined based on the relative peak area determined by GC-MS (Li et al. 2018). Representative GC-MS data are shown in Figures S1–S11.

4′-Methoxy-2,3-dichlorobiphenyl (4′-MeO-PCB5) was synthesized by the Suzuki coupling of 1-bromo-2,3-dichlorobenzene (5 g, 19.5 mmol) with 4-methoxyphenylboronic acid (3.27 g, 21.5 mmol) using Pd(PPh3)4 (0.20 g, 0.175 mmol) as catalyst and Na2CO3 (21 mL) as a base in toluene (40 mL) and ethanol (20 mL). Yield, 3.95 g (81%, white solid, > 99.8%). M.p. 59–60 °C (Lit. mp. 63.4–64.8 °C) (Unsinn et al. 2013). 1H NMR (500 MHz, CDCl3) δ 7.46–7.41 (m, 1H), 7.35 (AA’XX’ system, 2H), 7.24–7.21 (m, 2H), 6.97 (AA’XX’ system, 2H), 3.86 (s, 3H). 13C NMR (126 MHz, CDCl3) δ 159.38, 142.56, 133.57, 131.71, 131.27, 130.49, 129.53, 129.10, 127.06, 113.57, 55.31. See Figures S12–S14 for representative GC-MS data and 1H-NMR and 13C-NMR spectra of 4′-MeO-PCB5.

Sediment sample

Surficial sediment samples were collected in September 2015 from a PCB-contaminated wastewater lagoon located in Altavista, Virginia, at location C3 (37° 06′ 47″ N 79° 16′ 24″ W) described elsewhere (Mattes et al. 2018). Preparation, extraction, and clean-up procedures were modified from our previous procedure (Marek et al. 2013a) and that of Letcher et al. (1995) to achieve a higher concentration of OH-PCBs with the fewest matrix interferences. In duplicate, the wet sediment sample was weighed (~ 15 g), acidified with 3 mL of 6 N hydrochloric acid (Fisher Chemical), mixed with combusted diatomaceous earth (~ 30 g; Thermo Fisher Scientific, Waltham, MA, USA), and spiked with the surrogate standards. Samples were then extracted twice with hexane:acetone (1:1 v/v) (pesticide grade; Fisher Chemical) by accelerated solvent extraction (ASE) 300 (Thermo Fisher Scientific) according to Martinez et al. (2010). The combined solution was evaporated until a thin organic layer (~ 0.5 cm) appeared, treated with concentrated sulfuric acid (Fisher Chemical), held overnight, and extracted with hexane. To separate OH-PCBs from PCBs, a solution of 1 N potassium hydroxide:ethanol (1:1 v/v) (Fisher Chemical and Sigma-Aldrich, St. Louis, MO, USA, respectively) was mixed with the extract. The ethanolic aqueous solution was then acidified with concentrated sulfuric acid and extracted with hexane. Next, OH-PCBs were derivatized to their methoxylated form, MeO-PCBs, with a solution of diazomethane (from Diazald; Sigma-Aldrich) in diethyl ether (pesticide grade; Fisher Chemical) according to Black (1983) and Kania-Korwel et al. (2008). The interferences were removed by washing with concentrated sulfuric acid, passing the extract through sulfuric acid:silica gel (1:2 w/w) columns (Flash Chromatography Grade; 70–230 Mesh; Fisher Chemical), and eluting with dichloromethane (pesticide grade; Fisher Chemical).
After neutralizing the residual acid with sodium bicarbonate solution (10% w/v) (Fisher Chemical), the remaining lipid residue was separated from MeO-PCBs by gel permeation chromatography (GPC). Sixty grams of Bio-Beads S-X3 (40–80 μm styrene-divinylbenzene beads with 3% cross-linkage; Bio-Rad Laboratories, Hercules, CA, USA) presoaked in dichloromethane:hexane (1:1 v/v) was packed in a 1-in.-i.d. glass column. The sample was eluted through the GPC column with the same solvent. The first 175 mL, containing the lipid residue, was discarded, and the following 75 mL was collected for MeO-PCBs. Finally, the MeO-PCB fraction was concentrated to ~ 200 μL and spiked with 25 ng of the internal standard.

Instruments

Mass spectrometry (MS) coupled with gas chromatography (GC) is one of the most common analytical techniques in the analysis of PCBs and OH-PCBs (derivatized to MeO-PCBs), particularly in environmental matrices (Awad et al. 2016; Marek et al. 2013a, b, 2017; Sun et al. 2016; Ueno et al. 2007). We thus chose to develop the model using GC with a positive electron impact (EI) MS system. Two GC-EI-MS systems were used in this study. The first was an Agilent 7890B GC system equipped with either an Agilent DB-1701 capillary column (30 m, 0.25 mm i.d., 0.25 μm film thickness; J&W Scientific, Folsom, CA, USA) or a Supelco SPB-Octyl capillary column (30 m, 0.25 mm i.d., 0.25 μm film thickness; Supelco, Bellefonte, PA, USA) coupled with an Agilent 7000D Triple Quadrupole (QqQ) MS system. The other was an Agilent 7890A GC system equipped with a Supelco SPB-Octyl capillary column coupled with an Agilent 7000B QqQ MS system.

Five microliters of the standard or samples were injected into the GC-EI-MS system in solvent vent mode at 4.4 psi with the following inlet temperature program: initial 45 °C (hold for 0.06 min) and ramp at 600 °C/min to 325 °C. MeO-PCBs were eluted from the GC columns with helium (0.8 mL/min) and the following temperature program (70 min in total): initial 45 °C (hold for 2 min), ramp at 100 °C/min to 75 °C (hold for 5 min), ramp at 15 °C/min to 150 °C (hold for 1 min), and ramp at 2.5 °C/min to 280 °C (hold for 5 min). Although the evaporation of solvent in large volume injection (LVI) mode may alter the peak intensities and the predictive model, our solvent vent program was evaluated and produced the same model as a 1-μL splitless injection (Figure S18). The temperatures of the transfer line, the ionization source, and the quadrupoles in both systems were 280 °C, 250 °C, and 150 °C, respectively.
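As a quick arithmetic check of the column oven program described above, the hold and ramp segments can be summed. The following is a minimal Python sketch; the function name and the tuple layout are illustrative choices, not part of the instrument method.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total run time (min) of a GC oven program.

    ramps is a list of (rate in deg C/min, target temp in deg C, hold in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# Column oven program as stated in the text.
column_program = [
    (100.0, 75.0, 5.0),   # 45 -> 75 deg C at 100 deg C/min, hold 5 min
    (15.0, 150.0, 1.0),   # 75 -> 150 deg C at 15 deg C/min, hold 1 min
    (2.5, 280.0, 5.0),    # 150 -> 280 deg C at 2.5 deg C/min, hold 5 min
]
run_time = oven_program_minutes(45.0, 2.0, column_program)
print(f"{run_time:.1f} min")  # about 70.3 min, consistent with the stated ~70 min
```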

The product or daughter ions of MeO-PCBs and their fragmentation patterns resulting from either the ionization or the induced collision in multiple reaction monitoring (MRM) mode are complex and difficult to predict, particularly when the absolute positions of the substituents are unidentified. The prediction of the peak responses of molecular or parent ions in the selected ion monitoring (SIM) mode is more straightforward. We therefore chose to develop the model to predict the peak responses of the molecular ions. The molecular ions of standard MeO-PCBs were captured in positive SIM mode with a mass resolution of 0.7 u. MRM mode was utilized only for the quantitation of the known compounds and the percent recovery of surrogate standards. Details of the MRM transitions are shown in Table S1.

Approach

We created the test dataset by injecting the MeO-PCB standard solution into GC-EI-MS systems. The MeO-PCB standard solution was injected 6 times under the same instrumental conditions. The instrument parameters, particularly the ionization voltage, were optimized to maximize the model predictivity. According to the internal standard method, the peak responses of molecular ions from the measurement were transformed into relative response factors (RRFs):

$$ \mathrm{RRF}=\frac{{\mathrm{Area}}_{\mathrm{IS}}/{\mathrm{Conc}}_{\mathrm{IS}}}{{\mathrm{Area}}_{\mathrm{MeO}-\mathrm{PCB}}/{\mathrm{Conc}}_{\mathrm{MeO}-\mathrm{PCB}}} $$
(1)

where ${\mathrm{Conc}}_{\mathrm{IS}}$ and ${\mathrm{Conc}}_{\mathrm{MeO}-\mathrm{PCB}}$ are the concentrations (ng/mL) of the internal standard and the MeO-PCB in the standard solution, and ${\mathrm{Area}}_{\mathrm{IS}}$ and ${\mathrm{Area}}_{\mathrm{MeO}-\mathrm{PCB}}$ are the peak areas of the internal standard and the MeO-PCB, respectively. The RRF is inversely proportional to the detector sensitivity to the compound. The RRF values were calculated for each MeO-PCB, and the RRF predictive models were developed as a function of the number of chlorine atoms (#Cl) in the MeO-PCB molecules. See also the SI for additional discussion of RRF calculations.
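Equation 1 can be illustrated with a short Python sketch. The peak areas below are hypothetical numbers chosen only to make the arithmetic easy to follow, and the function name is an illustrative choice.

```python
def relative_response_factor(area_is, conc_is, area_analyte, conc_analyte):
    """RRF as defined in Eq. 1: IS response per unit concentration divided
    by analyte response per unit concentration."""
    return (area_is / conc_is) / (area_analyte / conc_analyte)


# Hypothetical example: both compounds at 50 ng/mL, with the analyte giving
# one third of the internal standard's peak area.
rrf = relative_response_factor(
    area_is=120000.0, conc_is=50.0,
    area_analyte=40000.0, conc_analyte=50.0,
)
print(rrf)  # 3.0: a less sensitive analyte has a larger RRF
```

Because the analyte response sits in the denominator, a compound to which the detector is less sensitive receives a larger RRF, which is the inverse relationship stated above.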

Statistical methods

The regressions were obtained with the least-squares method. The R-squared (RSQ), adjusted R-squared (ADJ.RSQ), Akaike information criterion (AIC), Bayesian information criterion (BIC), and p values are the statistical measures that were considered during model development. Cook’s distance and 10 times 10-fold cross-validation (10 × 10.CV) were employed during model optimization and uncertainty evaluation. The statistics were computed in the R statistical computing environment, version 3.4.4 (R Core Team 2018) with package “boot,” version 1.3-20 (Canty and Ripley 2017; Davison and Hinkley 1997) and package “beeswarm,” version 0.2.3 (Eklund 2016).
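For readers who want to see how these model-selection measures relate to one another, the following Python sketch computes RSQ, ADJ.RSQ, AIC, and BIC from the residual and total sums of squares of a least-squares fit. It assumes the usual Gaussian log-likelihood form with the error variance counted as one extra parameter; the study itself used R, and whether this sketch reproduces those outputs exactly is an assumption. The input numbers are hypothetical.

```python
import math


def fit_statistics(rss, tss, n_obs, n_coef):
    """Model-selection measures for a least-squares fit.

    rss: residual sum of squares; tss: total sum of squares;
    n_coef: number of fitted coefficients, including the intercept.
    """
    rsq = 1.0 - rss / tss
    adj_rsq = 1.0 - (1.0 - rsq) * (n_obs - 1) / (n_obs - n_coef)
    # Gaussian log-likelihood evaluated at the maximum-likelihood variance.
    log_lik = -0.5 * n_obs * (math.log(2.0 * math.pi) + math.log(rss / n_obs) + 1.0)
    n_par = n_coef + 1  # coefficients plus the error variance (assumption)
    aic = 2.0 * n_par - 2.0 * log_lik
    bic = math.log(n_obs) * n_par - 2.0 * log_lik
    return {"rsq": rsq, "adj_rsq": adj_rsq, "aic": aic, "bic": bic}


# Hypothetical sums of squares for a two-coefficient model.
stats = fit_statistics(rss=10.0, tss=100.0, n_obs=10, n_coef=2)
print(stats)
```

Lower AIC and BIC indicate a better trade-off between fit and complexity, whereas RSQ alone can only increase as coefficients are added.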

Results and discussion

Currently, more than 700 congeners of 837 possible mono-MeO-PCBs are unidentified or unknown, and their analytical standards are not commercially available. The quantification of unknown OH-PCBs with different calibrating compounds is not a novel approach, but the methods are not consistent. Park et al. (2009) used the response factor of one standard to quantify unknown OH-PCBs with the same number of chlorine atoms (i.e., in the same homolog group) in livers of harbor seals. Quinete et al. (2015) applied a different approach: they calculated the concentration of unknown OH-PCBs in plasma samples from the standard compound with the closest retention time (RT). Sandau et al. (2000), Houde et al. (2006), Ueno et al. (2007), and Kunisue and Tanabe (2009) used the RRFs of standards in the same homolog group to quantify the unknowns in the whole blood of Canadian Inuit, in the plasma of bottlenose dolphins, in abiotic environmental samples (surface water and precipitation), and in the whole blood of mammals and birds, respectively. However, there are only a few standards for each homolog, especially those with higher numbers of chlorine atoms. The small number of standards cannot provide statistically reliable results, the variation in standard choice and availability among research groups introduces unavoidable bias, and the different quantification approaches lead to results that cannot be compared. To overcome these problems, we propose that the RRFs of all commercially available MeO-PCB standards can be combined into a single model that can predict the RRFs of all other unknown MeO-PCBs. Incorporating a greater number of standards will provide a more reliable result, the model will be robust to various standard choices, and the reproducible approach will result in comparable results.

We present our findings in two sections. First, we describe the model development process: the logic behind the selection of the model predictor, the model selection, the effect of ionization energy on the models’ predictivity, the outlier removal, and the evaluation of the uncertainty. Second, we evaluate the accuracy of our prediction by analyzing a solution of independently synthesized MeO-PCBs and apply our method to a real environmental sample.

Model development

Chlorine number as a predictor of instrumental response

In the early stage of development, we considered several factors that affect the peak intensities of MeO-PCBs and evaluated them as independent variables or predictors in our model. The peak area of MeO-PCBs is a function of their molecular structures and EI ionization energy. The molecular structures are roughly composed of three features: (1) the number of chlorine (#Cl), (2) the position of the methoxy group, and (3) the positions of chlorine in the molecules. Neither the information about the positions of the methoxy group nor that of chlorine can precisely be obtained from the mass spectra or fragmentation patterns. The only apparent information from EI-MS is the number of chlorine from the mass-to-charge ratio (m/z) of molecular ions as shown in Table 1.

Table 1 The mass-to-charge ratio (m/z) of the molecular ions with the highest natural abundance of MeO-PCBs used in this study
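The link between #Cl and the molecular ion can be sketched from first principles. The Python example below assumes the mono-MeO-PCB formula C13H(12−n)Cl(n)O, nominal (integer) masses, and a natural 37Cl abundance of about 24.24%, and it ignores the smaller 13C contribution; it is an illustration only, and the values in Table 1 remain authoritative.

```python
from math import comb

P37 = 0.2424  # approximate natural abundance of 37Cl (assumption of this sketch)


def most_abundant_molecular_ion(n_cl):
    """Nominal m/z of the most abundant molecular-ion isotopologue of a
    mono-MeO-PCB with n_cl chlorine atoms."""
    # Methoxybiphenyl C13H12O has nominal mass 184; each H -> 35Cl adds 34.
    all_35cl_mass = 184 + 34 * n_cl
    # Number of 37Cl atoms that maximizes the binomial isotopologue probability.
    best_k = max(
        range(n_cl + 1),
        key=lambda k: comb(n_cl, k) * (1.0 - P37) ** (n_cl - k) * P37 ** k,
    )
    return all_35cl_mass + 2 * best_k


for n in range(1, 10):
    print(n, most_abundant_molecular_ion(n))
```

From four chlorine atoms upward, the most abundant ion is no longer the all-35Cl species, which is why the monitored m/z does not simply increase by 34 per chlorine.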

The EI ionization energy also greatly affects the peak intensities of MeO-PCBs. To maximize peak intensities, the ionization energy is normally set at 70 eV (Beran and Kevan 1969; Mark 1982). However, this high energy produces several fragment ions with complicated magnitudes and patterns difficult to predict. At a lower EI ionization energy, this phenomenon diminishes, and the molecular ions dominate.

We hypothesized that #Cl could be a predictor in our RRF prediction model and that decreasing the EI ionization energy could improve the predictivity of the model. Chromatographic RTs were considered as another possible predictor since they can indirectly represent physicochemical properties. However, the notion was abandoned because RTs change with the type of column; i.e., a model suitable for one column may not be appropriate for others. EI ionization energy and masses are more consistent between MS systems (Gross 2017; Hoffmann and Stroobant 2007; Linstrom and Mallard 1997; Linstrom and Mallard 2001). We also wanted to preserve the ability to use different columns for confirmational purposes rather than quantitation.

The optimum level of ionization energy and the most suitable predictive model

We investigated the correlation between RRF and #Cl by injecting the MeO-PCB standard solution into the Agilent 7890B GC–7000D MS system equipped with the Agilent DB-1701 capillary column in sextuplicate and capturing the molecular ions in the positive SIM mode. Because our method requires the molecular ions, and lower ionization energies reduce the fragmentation and ionization of the compounds (Gross 2017; Hoffmann and Stroobant 2007), we evaluated the effect of the ionization energy at 70, 60, 50, 40, 30, and 20 eV; at 10 eV, no peak was observed. After discarding co-eluting peaks of compounds with the same #Cl, the remaining peaks were integrated, and their peak areas were transformed to RRFs. The dataset can be found in Saktrakulkla et al. (2019).

To select the most suitable model for predicting RRFs from #Cl, we investigated the possible correlations between them and the statistical measures. The correlations can take the form of polynomial regressions from linear up to octic (Figs. 1 and S15). Although a higher polynomial degree, i.e., a larger number of coefficients, increases the coefficient of determination (RSQ), it does not necessarily indicate an improvement in the predictive performance of the model. In addition to RSQ, we therefore also considered ADJ.RSQ, AIC, and BIC in the model selection. According to these statistical measures, changing the model from linear to quadratic regression remarkably increased RSQ and ADJ.RSQ and decreased AIC and BIC (Fig. 2, left plot and Table S2), while the addition of further coefficients altered the statistics only slightly. Since the simpler form is the better model, we selected the two simplest models, linear and quadratic regressions, to investigate further.

$$ \mathrm{Linear}:\mathrm{RRF}={\upbeta}_1\times \#\mathrm{Cl}+{\upbeta}_0 $$
(2)
$$ \mathrm{Quadratic}-1:\mathrm{RRF}={\upbeta}_2\times \#{\mathrm{Cl}}^2+{\upbeta}_1\times \#\mathrm{Cl}+{\upbeta}_0 $$
(3)
$$ \mathrm{Quadratic}-2:\mathrm{RRF}={\upbeta}_1\times \#{\mathrm{Cl}}^2+{\upbeta}_0 $$
(4)
Fig. 1 The relative response factors (RRF) of the molecular ions of 70 certified MeO-PCBs in hexane solution at 30 eV (left plot). The RRF increases with the number of chlorine atoms in the MeO-PCB molecules. The average RRF for all compounds in each homolog group (#Cl) exhibits the greatest range of response at the electron impact (EI) ionization energy of 30 eV (right plot)

Fig. 2 The R-squared (RSQ), adjusted RSQ (ADJ.RSQ), Akaike information criterion (AIC), and Bayesian information criterion (BIC) of the predictive model of unknown MeO-PCBs’ RRFs from homolog group (#Cl) at 30 eV (left plot). The quadratic regression shows a dramatic improvement when compared with the others. The linear, quadratic-1, and quadratic-2 predictive models (right plot). Among them, quadratic-2 regression is the most suitable predictive model

The comparison of the linear and quadratic-1 regressions (Eqs. 2 and 3) revealed another possible form of the correlation, quadratic-2 regression (Eq. 4). We examined the significances of coefficients (the p values of β) of these three regressions across the ionization energy of 20–70 eV (Table S6). We considered the quadratic-2 regression as the most stable and suitable model in the prediction of unknown MeO-PCBs’ RRFs from #Cl (Fig. 2, right plot).
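Because the quadratic-2 form (Eq. 4) is linear in the transformed predictor #Cl², its coefficients have a closed-form least-squares solution. The Python sketch below illustrates this with synthetic, noise-free data generated from arbitrary coefficients; it is not the study dataset, and the function name is an illustrative choice.

```python
def fit_quadratic2(n_cl, rrf):
    """Least-squares fit of RRF = b1 * #Cl**2 + b0; returns (b0, b1)."""
    z = [x * x for x in n_cl]  # transformed predictor #Cl^2
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(rrf) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, rrf))
    b1 = szy / szz
    b0 = y_mean - b1 * z_mean
    return b0, b1


# Synthetic data (assumption): exact values from RRF = 0.1 * #Cl^2 + 1.5.
n_cl = [1, 2, 3, 4, 5, 6, 7, 8]
rrf = [0.1 * x * x + 1.5 for x in n_cl]
b0, b1 = fit_quadratic2(n_cl, rrf)
print(round(b0, 6), round(b1, 6))  # recovers the generating coefficients
```

Dropping the linear term leaves only two coefficients to estimate, which is one reason this form can be more stable than quadratic-1 across ionization energies.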

The gradual increase of slope in the quadratic regression between RRFs and #Cl is due to decreasing detector sensitivity with increasing chlorination and can be explained by the molecular moieties of MeO-PCBs. EI ionization efficiency or peak intensity is a function of electron density; the higher the electron density, the higher the peak intensity. The electron density of MeO-PCBs is highest at the lone pairs of the oxygen atom, where the molecules are most likely to be ionized (Gross 2017; Hoffmann and Stroobant 2007). The phenyl ring is an electron-donating moiety and constantly supplies electron density to the lone pairs. The chlorine atom, on the other hand, is an electron-withdrawing moiety and draws electron density away from the lone pairs. Therefore, the peak intensity decreases with the increase of #Cl, and the reverse is true for the RRFs. The aromaticity of the biphenyls can compensate for the withdrawal in the lower congeners, but in the higher congeners, the effect of chlorine is pronounced, thereby resulting in the gradual increase of slope in the quadratic regression.

As expected, the reduction of ionization energy reduces both fragmentation and ionization and alters the intensity of the molecular ion. As the ionization energy is reduced from 70 eV, the peak intensities of the molecular ions of MeO-PCBs increase, reach a maximum at 40 eV, and then drop to a minimum at 20 eV (Figure S16). Excess energy leads to fragmentation of molecular ions at the higher energy levels, while inadequate energy diminishes ionization at the lower levels. The adjustment of EI ionization energy certainly benefits the correlation. Although there is a correlation between the RRF and #Cl at every energy level, the range of response is greatest at 30 eV (Fig. 1, right plot and Figure S15). Thus, we selected 30 eV as the optimum ionization energy for our model based on the signal intensities and the relationship between RRFs and #Cl.

Outlier removal and the estimation of the model’s uncertainty

The quadratic regression is the most suitable model for the prediction of unknown MeO-PCBs’ RRFs from #Cl at the optimum ionization energy of 30 eV. However, every prediction comes with uncertainty, and both the prediction and its uncertainty can be distorted by outliers. The outliers may be due to the variability of the measurement or the nature of some chemicals whose sensitivities to EI ionization differ from the others. We used Cook’s distance to remove 18 outliers and regenerated the quadratic regression with the remaining data points. We then estimated the uncertainty of the predictive model with 10 × 10.CV, which is appropriate in this circumstance because it does not require any new standards for the estimation. The resulting root mean square errors (RMSEs) from 10 × 10.CV were used to compute the 95% prediction interval (PI) (Fig. 3 and Table 2). The ratios between the upper limit (UL) and the lower limit (LL) of the PI are less than 5, which demonstrates the accuracy of the model in this GC-EI-MS system. It is noteworthy that the RMSE and PI of unknown MeO-nonaCBs cannot be computed because all RRFs of the standard MeO-nonaCB were determined to be outliers. However, we considered this a minor limitation since the model fails to predict the RRFs of only 2 of more than 700 congeners.
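The two steps described above, outlier screening with Cook’s distance followed by repeated 10-fold cross-validation, can be sketched in plain Python. Several points are assumptions of this sketch rather than details confirmed by the text: the Cook’s distance cut-off is taken as 4/(N − k − 1) with k = 1 predictor, the RMSE is pooled over all homolog groups instead of being computed per group, and the data are synthetic with one planted outlier. The study itself used R with the “boot” package.

```python
import random


def fit(z, y):
    """Least-squares fit of y = b1 * z + b0, where z = #Cl**2; returns (b0, b1)."""
    n = len(z)
    zm, ym = sum(z) / n, sum(y) / n
    szz = sum((zi - zm) ** 2 for zi in z)
    b1 = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y)) / szz
    return ym - b1 * zm, b1


def cooks_distances(z, y):
    """Cook's distance of every observation for the two-coefficient fit."""
    n, p = len(z), 2  # p = number of fitted coefficients
    b0, b1 = fit(z, y)
    zm = sum(z) / n
    szz = sum((zi - zm) ** 2 for zi in z)
    resid = [yi - (b0 + b1 * zi) for zi, yi in zip(z, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    dist = []
    for zi, e in zip(z, resid):
        h = 1.0 / n + (zi - zm) ** 2 / szz  # leverage of this observation
        dist.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return dist


def repeated_kfold_rmse(z, y, k=10, repeats=10, seed=0):
    """Pooled RMSE from `repeats` rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(len(z)))
    sq_err = []
    for _ in range(repeats):
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            b0, b1 = fit([z[i] for i in train], [y[i] for i in train])
            sq_err.extend((y[i] - (b0 + b1 * z[i])) ** 2 for i in test)
    return (sum(sq_err) / len(sq_err)) ** 0.5


# Synthetic data (assumption): three replicates per homolog, small fixed noise.
n_cl = [x for x in range(1, 9) for _ in range(3)]
z = [x * x for x in n_cl]
y = [0.1 * zi + 1.5 + (0.05, -0.05, 0.0)[i % 3] for i, zi in enumerate(z)]
y[-1] += 5.0  # planted outlier

d = cooks_distances(z, y)
threshold = 4.0 / (len(z) - 1 - 1)  # 4 / (N - k - 1) with k = 1 predictor
keep = [i for i, di in enumerate(d) if di <= threshold]
rmse = repeated_kfold_rmse([z[i] for i in keep], [y[i] for i in keep])
print(len(z) - len(keep), "point(s) removed; cross-validated RMSE =", round(rmse, 3))
```

How the RMSE is converted into the 95% PI is not spelled out in the text, so that step is deliberately left out of the sketch.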

Fig. 3 The predictive quadratic model of unknown MeO-PCBs’ relative response factors (RRFs) from homolog group (#Cl) at 30 eV after outliers were removed with Cook’s distance at the threshold of 4/(N − k − 1), where N is the sample size and k is the number of independent variables. The root mean square errors (RMSEs) were then computed with 10 times 10-fold cross-validation (10 × 10.CV) to obtain the 95% prediction intervals (PIs). *All data points of MeO-nonaCB (#Cl = 9) were discarded as outliers, so the PI and RMSE could not be calculated

Table 2 The predicted relative response factor (RRF) for each homolog group (#Cl) at 30 eV, root mean square errors (RMSEs), the upper limit (UL), and the lower limit (LL) of the prediction intervals (PIs). The overall equation is $\mathrm{RRF}=0.108\times \#{\mathrm{Cl}}^2+1.49$ with $R^2$ of 0.871 and N of 318

Model application

The verification of the model accuracy with synthetic MeO-PCB standards

To demonstrate that the model accurately predicts the concentration of MeO-PCBs analyzed by the instrument, we quantified known amounts of 12 independently synthesized MeO-PCB standards. Eleven compounds were synthesized using methods previously reported (Lehmler and Robertson 2001a, b; Li et al. 2008, 2009; McLean et al. 1996; Rodriguez et al. 2016; Zhai et al. 2011; Zhu et al. 2013), 4′-MeO-PCB5 was synthesized with the Suzuki coupling (Unsinn et al. 2013), and their purities were determined with GC-MS (Li et al. 2018). We prepared solutions of 25, 50, and 100 ng/mL of each compound with 50 ng/mL of internal standard, dissolved in hexane. The solutions were injected into the Agilent 7890B GC–7000D MS system equipped with the Agilent DB-1701 capillary column, the peak responses of molecular ions were captured at 30 eV in the positive SIM mode, and the peak areas were converted to peak area ratios by dividing by the peak area of the internal standard. The peak area ratio was then multiplied by the predicted RRF of the corresponding homolog from the quadratic model and by the added amount of internal standard, which gives the predicted mass of MeO-PCB in the solution.
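The calculation described in this paragraph can be sketched in Python using the overall equation reported with Table 2 (RRF = 0.108 × #Cl² + 1.49). The peak areas below are hypothetical, the function names are illustrative, and the sketch works in concentration units (ng/mL) rather than mass, which is an assumption made for simplicity.

```python
def predicted_rrf(n_cl):
    """RRF predicted by the overall quadratic-2 equation reported with Table 2."""
    return 0.108 * n_cl ** 2 + 1.49


def predicted_concentration(area_analyte, area_is, conc_is, n_cl):
    """Peak area ratio x predicted RRF x internal standard concentration."""
    return (area_analyte / area_is) * predicted_rrf(n_cl) * conc_is


# Hypothetical peak areas for an unknown tetrachlorinated MeO-PCB measured
# against 50 ng/mL of internal standard.
conc = predicted_concentration(
    area_analyte=25000.0, area_is=100000.0, conc_is=50.0, n_cl=4
)
print(round(conc, 3))  # 40.225 ng/mL
```

This is Eq. 1 rearranged for the analyte concentration, with the measured RRF replaced by the model prediction for the homolog group.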

For all 12 congeners at all three concentration levels, the prediction intervals from the quadratic model cover the actual concentrations of the synthetic MeO-PCB standards (Figs. 4 and S17 and Tables S7 and S8). Ten of the 12 predicted concentrations are higher than the actual concentrations. The predicted concentrations range from 0.8 to 2.0 times the actual concentrations, with a mean and median of 1.4 and 1.5 times, respectively. These differences are consistent across the three concentrations. Although the model may slightly overestimate the concentrations of MeO-PCBs in this set of standards, all predicted concentrations are of the same order of magnitude as the actual concentrations. Our model is therefore accurate enough to be an alternative to MeO-PCB standards in the quantification of unknown OH-PCBs.

Fig. 4

The plot of the actual concentrations at 25 ng/mL of the synthetic MeO-PCB standards (open circles), the predicted concentrations calculated from the RRFs of the model (solid circles), and the 95% prediction intervals (PIs, error bars)

Quantitation of environmental sample

We demonstrate the utility and importance of the model by analyzing a sample of PCB-contaminated sediment collected from a wastewater lagoon in Altavista, Virginia. Our laboratory analyzed samples from this location previously and reported concentrations of PCB congeners elsewhere (Mattes et al. 2018). We modified the sample preparation procedure described elsewhere (Letcher et al. 1995; Marek et al. 2013a; Martinez et al. 2010) to increase the amount of OH-PCBs obtained and to eliminate matrix interference more effectively. The modifications were increasing the mass of the sediment samples, doubling the ASE extraction, replacing the concentrated hydrochloric acid with sulfuric acid, and employing gel permeation chromatography (GPC). OH-PCBs in the sample were then derivatized to MeO-PCBs with diazomethane and analyzed for known and unknown OH-PCBs.

We took advantage of QqQ MS to compare the fragmentation patterns of unknown MeO-PCBs with those of authentic standards. The fragmentation patterns of MeO-PCBs by EI ionization are generally used to confirm the knowns and to elucidate the structures of the unknowns (Bergman et al. 1995; Li et al. 2009; Liu et al. 2018; Tulp et al. 1980). However, because of the large number of possible MeO-PCBs, the product ions of a congener are confounded by those of co-eluting peaks from the same and neighboring homologs when they are studied in total ion chromatogram (TIC) mode. To overcome this problem, we instead studied the fragmentation of all 70 authentic MeO-PCB standards in product ion mode with the collision energy (CE) ranging from 10 to 50 eV. Fragmentation by gas collision in this study gives product ions similar to those previously reported for EI ionization, possibly because of the same one-electron-shifting-based mechanism (Table S9) (Bergman et al. 1995; Gross 2017; Hoffmann and Stroobant 2007; Li et al. 2009; Liu et al. 2018; Tulp et al. 1980). The fragmentation patterns and their intensities differ among CE levels. For each congener, we selected the product ion and CE level that gave the highest intensity to improve the sensitivity of our MRM transitions in the known OH-PCB analysis (Table S1) (Awad et al. 2016; Marek et al. 2013a, b, 2017). We also used the fragmentation patterns at different CE levels to verify the unknown OH-PCBs.
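The selection of an MRM transition described above amounts to finding the most intense product ion across all CE levels. A minimal sketch follows; the data layout (a dictionary keyed by CE), the function name, and every m/z and intensity value in the example are hypothetical and serve only to illustrate the selection rule.

```python
def best_transition(spectra_by_ce):
    """Return (CE in eV, product ion m/z, intensity) with the highest intensity.

    spectra_by_ce maps a collision energy to a dict of {product ion m/z: intensity}.
    """
    best = None
    for ce, ions in spectra_by_ce.items():
        for mz, intensity in ions.items():
            if best is None or intensity > best[2]:
                best = (ce, mz, intensity)
    if best is None:
        raise ValueError("no product ions supplied")
    return best


# Hypothetical product ion scans of one standard at three CE levels.
spectra = {
    10: {243.0: 1200.0, 208.0: 800.0},
    20: {271.0: 2500.0, 243.0: 4100.0},
    30: {243.0: 3000.0, 207.0: 1900.0, 173.0: 1500.0},
}
print(best_transition(spectra))  # (20, 243.0, 4100.0)
```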

We confirmed and quantified the known OH-PCBs as MeO-PCBs in the sample in MRM mode by comparing their chromatographic RTs and peak responses with those of authentic standards on two capillary columns, Agilent DB-1701 and Supelco SPB-Octyl (Fig. 5, left plot). We named the unknowns by their homolog groups and RTs on the Agilent DB-1701 capillary column (Fig. 5, right plot). The product ions of the unknowns captured at five CE levels were compared with those of authentic standards (Tables S9 and S10). For example, the peak of unknown MeO-triCB at 41.1 min shows the signature product ions (243.0 and 207.9 at 10 eV; 270.9 and 243.0 at 20 eV; 270.9, 243.0, 206.9, 182.9, and 173.0 at 30 eV; 242.9, 217.0, 207.0, 182.9, and 173.1 at 40 eV; and 242.8, 216.9, 207.0, 183.1, and 173.1 at 50 eV) that are similar to those found in the MeO-triCB standards. After the verification, the verified unknown MeO-PCBs were quantified with the same method described in the previous section.
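One simple way to express the comparison of product ions is the fraction of an unknown's product ions that also appear in a standard's spectrum within an m/z tolerance. The sketch below is an illustration under stated assumptions, not the verification procedure used in this study: the tolerance of 0.3, the function name, and the standard's ion list are hypothetical, while the unknown's ions are the 30 eV values quoted above.

```python
def matched_fraction(unknown_ions, standard_ions, tol=0.3):
    """Fraction of the unknown's product ions found in the standard within +/- tol (m/z)."""
    if not unknown_ions:
        raise ValueError("no product ions supplied for the unknown")
    hits = sum(
        any(abs(u - s) <= tol for s in standard_ions) for u in unknown_ions
    )
    return hits / len(unknown_ions)


# Product ions of the unknown MeO-triCB at 41.1 min, CE 30 eV (from the text).
unknown_30ev = [270.9, 243.0, 206.9, 182.9, 173.0]
# Hypothetical product ions of a MeO-triCB standard at the same CE.
standard_30ev = [271.0, 243.0, 207.0, 183.0, 173.0]

print(matched_fraction(unknown_30ev, standard_30ev))  # 1.0
```

Repeating the comparison at each CE level gives one score per level, which mirrors the five-level comparison described above.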

Fig. 5

The concentrations of known and unknown OH-PCBs in a sediment sample from Altavista, Virginia. The knowns are named by homolog group (#Cl), hydroxyl position (OHPos) and congener number (Congener). The unknowns are named by #Cl and retention time (RT) on the Agilent DB-1701 capillary column. The compounds measured with authentic standards (Knowns, black) are 13.5% and the compounds measured using the method developed in this study (Unknowns, white) are 86.5% of the total quantified OH-PCBs. The error bars of the Unknowns indicate 95% prediction intervals (PIs)

The sum total OH-PCB concentration in this sediment sample is 92.0 (55.9–128) ng/g dry weight (DW). However, the total concentration of the knowns, 12.4 ng/g DW, is only about one-sixth of that of the unknowns, 79.5 (43.4–116) ng/g DW. The known OH-PCBs in this sample comprise 35 congeners in 32 co-eluting peaks. Among them, 4′-OH-PCB18 has the highest concentration, 7.34 ng/g, which is about 59% of the total known OH-PCB concentration and 8% of the sum total. Of the 48 peaks of unknown OH-PCBs found in this sample, the OH-triCB at 41.1 min shows the highest concentration, 31.4 (17.1–45.6) ng/g, accounting for 39% of the total unknown and 34% of the sum total OH-PCB concentrations. The tri-chlorinated homolog is dominant among both known and unknown OH-PCBs. See Tables S11 and S12 for individual concentrations. If only the knowns were quantified, the OH-PCB concentration in the sample would be underestimated by almost an order of magnitude.
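The shares quoted in this paragraph can be rechecked from the rounded concentrations. The sketch below uses only values stated in the text; because those values are already rounded, the recomputed sum (91.9) and lower bound (55.8) differ in the last digit from the reported 92.0 and 55.9. Adding the known total to the bounds of the unknown total to obtain the interval of the sum is an assumption about how the reported interval was formed.

```python
def pct(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole


known_total = 12.4                                        # ng/g DW
unknown_total, unknown_lo, unknown_hi = 79.5, 43.4, 116.0  # ng/g DW, 95% PI
top_known = 7.34      # 4'-OH-PCB18, ng/g
top_unknown = 31.4    # OH-triCB at 41.1 min, ng/g

sum_total = known_total + unknown_total
shares = {
    "knowns of sum total": pct(known_total, sum_total),
    "unknowns of sum total": pct(unknown_total, sum_total),
    "top known of knowns": pct(top_known, known_total),
    "top known of sum total": pct(top_known, sum_total),
    "top unknown of unknowns": pct(top_unknown, unknown_total),
    "top unknown of sum total": pct(top_unknown, sum_total),
}
underestimate_factor = sum_total / known_total  # if only knowns were quantified

print(round(sum_total, 1), round(known_total + unknown_lo, 1), round(known_total + unknown_hi, 1))
for label, value in shares.items():
    print(f"{label}: {value:.1f}%")
print(f"sum total / knowns: {underestimate_factor:.1f}x")
```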

The total concentration of the known OH-PCBs in this sample (12.4 ng/g DW) is of the same order of magnitude as the concentrations we previously reported in sediment from a Lake Michigan waterway (0.20–26 ng/g DW) (Marek et al. 2013a). In that report, the samples were analyzed by comparison with the 65 standards available at the time on one GC column, Supelco SPB-Octyl. We now conclude that the high concentration of 3′-OH-PCB65 (up to 10.2 ng/g DW) reported in most samples, including sediment and Aroclors, is an error. Further analysis has revealed that the peak used to identify that compound was an artifact of the instrumental analysis methods used at that time. Regardless, the concentrations of the known OH-PCBs in sediment from the Lake Michigan waterway are still comparable to those from Altavista in this report.

The homolog profile of OH-PCBs calculated with the model suggests enzymatic substrate selectivity of the microbes in the area. The sediment samples from Altavista showed a PCB congener profile similar to that of Aroclor 1248 (Mattes et al. 2018). However, the OH-PCB homolog profile is different from that found in Aroclor 1248 (Marek et al. 2013a). The difference suggests that the enzymatic activity of the microbes is the major source of OH-PCBs in this sample. Consistent with this, the tri-chlorinated homolog is dominant among the OH-PCBs in this sample, as it is among the PCBs reported by Mattes et al. (2018) from the same location. However, the second most abundant homologs differ: di-chlorinated for OH-PCBs and tetra-chlorinated for PCBs. This difference is possibly due to steric hindrance by chlorine that blocks the enzymatic binding of PCBs, thereby suppressing the oxidation of higher chlorinated congeners. Without the model, the interpretation might be different because the homolog profile of the known OH-PCBs alone shows the mono-chlorinated homolog as the second most abundant. Clearly, there are important gaps in our understanding of how PCBs are transformed into OH-PCBs by microbial processes. This semi-target analytical method for quantification of OH-PCBs offers a new approach to resolving these gaps.

Reproducible process to generate the model

We have described a simple process to generate the quadratic model for the prediction of unknown MeO-PCBs’ RRFs from #Cl and an example of its application. The predictive model can be used as an alternative to the chemical synthesis of new standards in the quantitation of unknown OH-PCBs. The procedure to reproduce the predictive model is summarized in Fig. 6. First, the RRFs are calculated from the signals of all non-co-eluting MeO-PCB standards captured with GC-EI-MS. Next, an initial quadratic regression between RRF and #Cl is fitted to the raw data, and outliers are removed. The model is then re-generated with the remaining data points, the RMSEs are computed with 10 × 10.CV, and 95% PIs are obtained.
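The steps above can be sketched in plain Python as follows. This is an illustration under several assumptions and is not the R-script released with this study: the (#Cl, RRF) data are simulated, the outlier rule (absolute residual above 2.5 times the RMSE of the first fit) is a placeholder because the criterion is not restated here, "10 × 10.CV" is read as ten repeats of tenfold cross-validation, and the 95% PI is approximated as the prediction ± z × CV-RMSE with a normal quantile rather than the exact regression prediction interval.

```python
import random
from statistics import NormalDist


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # [X'X | X'y]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def predict(coefs, x):
    b0, b1, b2 = coefs
    return b0 + b1 * x + b2 * x * x


def rmse(coefs, xs, ys):
    return (sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def drop_outliers(xs, ys, cutoff=2.5):
    """Assumed rule: drop points whose |residual| exceeds cutoff x RMSE of the first fit."""
    coefs = fit_quadratic(xs, ys)
    limit = cutoff * rmse(coefs, xs, ys)
    kept = [(x, y) for x, y in zip(xs, ys) if abs(y - predict(coefs, x)) <= limit]
    return [x for x, _ in kept], [y for _, y in kept]


def repeated_kfold_rmse(xs, ys, folds=10, repeats=10, seed=0):
    """Mean held-out RMSE over `repeats` random partitions into `folds` folds."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            coefs = fit_quadratic([xs[i] for i in train], [ys[i] for i in train])
            scores.append(rmse(coefs, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Simulated (#Cl, RRF) pairs for illustration only; NOT measured values.
rng = random.Random(1)
n_cl = [n for n in range(1, 10) for _ in range(3)]
rrf = [2.0 - 0.3 * n + 0.02 * n * n + rng.gauss(0, 0.03) for n in n_cl]
rrf[5] += 1.0  # plant one obvious outlier

xs_clean, ys_clean = drop_outliers(n_cl, rrf)       # first fit, then outlier removal
coefs = fit_quadratic(xs_clean, ys_clean)           # re-generated model
cv_rmse = repeated_kfold_rmse(xs_clean, ys_clean)   # ten repeats of tenfold CV
half_width = NormalDist().inv_cdf(0.975) * cv_rmse  # approximate 95% PI half-width
for n in range(1, 10):
    y = predict(coefs, n)
    print(n, round(y, 3), round(y - half_width, 3), round(y + half_width, 3))
```

Only the standard library is used so that the sketch runs without additional packages; a statistics package would normally supply the regression and the exact prediction intervals.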

Fig. 6

The procedure to generate the predictive quadratic model of unknown MeO-PCBs’ relative response factors (RRFs) from homolog groups (#Cl). Root mean square errors (RMSEs); prediction intervals (PIs); and gas chromatography coupled with electron impact mass spectrometry (GC-EI-MS). *Co-eluting congeners are not included in the model

To demonstrate the reproducibility of this procedure, we generated two quadratic predictive models: one on the same GC-EI-MS system equipped with a different type of column, a Supelco SPB-Octyl capillary column (Figure S19), and the other on a different GC-EI-MS system, an Agilent 7890A GC–7000B MS system, equipped with a Supelco SPB-Octyl capillary column (Figure S20). The dataset can be found in Saktrakulkla et al. (2019). On the same GC-EI-MS system, the RRF difference between columns is due to the competitive ionization of co-eluting peaks. The PIs of the predictive model from the different GC-EI-MS system are broader because sensitivity and variability are system-dependent. Nevertheless, this result shows that the procedure is independent of column type. Moreover, since the correlation between RRF and #Cl is based on the EI ionization efficiency, this quadratic model can be reproduced on any GC-EI-MS system regardless of the type of mass analyzer (e.g., ion trap, time-of-flight, or orbitrap). We have provided the R-script to generate the predictive quadratic model in Saktrakulkla et al. (2019). However, the optimum EI ionization energy, RRFs, and PIs are system-specific, so optimization studies are required. This approach can be applied to liquid chromatography (LC); different types of ionization techniques (e.g., chemical ionization (CI) and electrospray ionization (ESI)); other biphenyl compounds (e.g., polybrominated diphenyl ethers (PBDEs) and polybrominated biphenyls (PBBs)); and other metabolites (e.g., methyl sulfone and sulfate).

Conclusions

Through a statistical approach, we were able to generate a model to predict the RRFs from #Cl for the quantification of unknown OH-PCBs. The procedure to generate the model is simple, and the model is versatile enough to serve as an alternative to synthetic standards. The model enables us to move beyond the limit of standard availability in the quantification of OH-PCBs that contaminate the environment.