Introduction

Polycyclic aromatic hydrocarbons (PAHs) are persistent organic pollutants of great environmental concern (Zhang et al. 2012) because of their environmental persistence (Schenker et al. 2007), global migration (Simonich and Hites 1995) and toxic, mutagenic and carcinogenic properties (Wang et al. 2018; Zhang and Chen 2017). PAHs originate mainly from anthropogenic sources, including incomplete combustion or pyrolysis of fossil fuels and biomass, traffic emissions, aluminum and iron/steel production, coke production and petroleum leakage (Bozlaker et al. 2008; Cai et al. 2017; Liu et al. 2017; Xu et al. 2014). PAHs emitted into the atmosphere can enter surface soil by wet and dry deposition (Ahad et al. 2015; Lau et al. 2010; Yunker et al. 2002). Soil may thus serve as a repository of PAHs which, when perturbed by changing environmental conditions, may release fractions that were initially non-bioavailable or increase their bioaccessibility and bioavailability (Luca et al. 2004; Neff et al. 2005; Nyarko and Klubi 2011; Ogbonnaya et al. 2017; Rhodes et al. 2012; Tao et al. 2008; Wang et al. 2001). It is therefore essential for human health research to evaluate the potential risks of PAHs in soils (Liu et al. 2015).

Several approaches (TEQ, BaP equivalents and exposure risk) have been adopted in ecological and human health research to assess the risk of PAHs. Health exposure risk, comprising the total carcinogenic risk and the total hazard index (non-carcinogenic risk), is widely used to comprehensively evaluate the potential human health risk associated with mixtures of multiple PAHs (Bari and Kindzierski 2017; Liu et al. 2015; MEEPRC 2014). Quantitative identification of the potential sources of PAHs and of their contributions is of great importance for controlling priority pollutants in soils (Tian et al. 2013). Source apportionment of PAH concentrations in soils has been performed with the principal component analysis-multiple linear regression (PCA-MLR) model and the positive matrix factorization (PMF) model, both of which assume that compounds derived from the same source are significantly correlated (Cai et al. 2017; Li et al. 2014a; Liu et al. 2017; Tian et al. 2013; Wang et al. 2013).

Much research has been conducted on human health exposure risk assessment of PAHs at soil sampling sites (Gulan et al. 2017; Li and Li 2017; Liu et al. 2016a; Wang et al. 2017; Zhang and Chen 2017). However, current research on the risk of PAHs in soils has focused only on the receptor risk and has ignored the source-oriented risk. The composition of PAHs differs with source activity, land use, initial composition and aerobic conditions (Ogbonnaya et al. 2017), which leads to different distribution patterns and risk levels (Peng et al. 2016, 2017). In other words, the largest contributor to concentrations may not be the largest contributor to human health exposure risk. Therefore, apportioning sources both to concentrations and to exposure risk is essential for controlling priority contaminants.

Analyzing the human health exposure risk of individual sources improves our understanding of the role that the various source categories play in increasing the potential for adverse effects of PAHs in soils. The results of this study will help to alleviate or control the human health exposure risk posed by PAHs in soils by targeting the anthropogenic sources that cause the major part of that risk.

The objectives of this study were: (1) to investigate the concentration distribution and exposure risk of PAHs in the study region; (2) to apportion the sources of PAHs using receptor models, namely the PCA-MLR and PMF models; and (3) to quantify the source contributions to PAH concentrations and exposure risk by combining the PMF model with the Total-CR and Total-HI. The method used in this work can be applied to derive quantitative source-oriented exposure risk and to manage priority contaminants for human health.

Methods and models

Study region

In this study, Yulin (36°84′–39°35′N, 107°14′–111°14′E) was surveyed for the presence of PAHs. The study region is one of the seven coalfields in the world and represents a National Energy and Chemical Industry Base, with proven reserves of approximately 146 billion tons of coal, 1.831 trillion m³ of natural gas, 0.19 billion tons of petroleum, 885.8 billion tons of salt and 40 other types of minerals (Zhang et al. 2009). The map of the Yulin National Energy and Chemical Industry Base is shown in Fig. 1. The region covers 42,920.18 km², with the Mu Us Sandy Land area (wind erosion) in the north and the Loess Plateau area (water erosion) in the south (Hui 2001).

Fig. 1

Map of study region and sampling sites (green sites near the non-sensitive land and red sites on the sensitive land)

There are 91 national priority monitoring projects in the region, including coal mining, coking, coal preparation, oil and gas exploitation, the chemical industry, thermoelectric power generation and magnesium manufacture. All of these monitored projects are sources of serious industrial pollution that has aggravated ecological and human health hazards, especially coal mining and the coal chemical industry (Li et al. 2012). Nevertheless, to our knowledge, no research has been carried out on the level of PAH pollution in this region. It is therefore very important for environmental and human health protection to identify the concentration distribution, exposure risk and sources of PAHs in Yulin.

The 38 sampling sites are also shown in Fig. 1. Fifteen sites were located in the vicinity of characteristic industries and represented non-sensitive land (S1, S3–S6, S11, S17, S20, S22, S23 and S26–S30), and twenty-three sites were located on randomly selected residential and agricultural land and represented sensitive land (S2, S7–S10, S12–S16, S18, S19, S21, S24, S25 and S31–S38).

Soil sampling

A total of 38 surface soil samples were collected with a stainless steel sampling tool from a depth of 0–20 cm in May 2014. At each site, five subsamples from an area of 1 m² were well mixed and stored in glass bottles. All soil samples were stored at 4 °C until laboratory analysis. In this study, 16 EPA priority PAH homologues were analyzed: naphthalene (Nap), acenaphthene (Acp), acenaphthylene (Acy), fluorene (Fle), phenanthrene (Phe), anthracene (Ant), fluoranthene (Fla), pyrene (Pyr), benzo(a)anthracene (BaA), chrysene (Chr), benzo(b)fluoranthene (BbF), benzo(k)fluoranthene (BkF), benzo(a)pyrene (BaP), dibenzo(a,h)anthracene (DahA), indeno(1,2,3-cd)pyrene (InP) and benzo(g,h,i)perylene (BghiP).

PAH analysis

Each sample was freeze-dried in a vacuum freeze drier (LNB, FD-C10N-50, China), and 10 g of the homogenized soil was then weighed for extraction. Surrogate standards (perdeuterated PAHs) were added to the sample aliquot, which was then Soxhlet-extracted for 18 h with 100 mL of acetone/hexane (1:1, V/V). The extract was concentrated to approximately 2 mL under a gentle nitrogen stream and purified on a column packed with 3 cm of activated silica, 3 cm of neutral alumina and 2 cm of anhydrous sodium sulfate (presoaked in hexane), using 25 mL of a hexane/dichloromethane (1:1) mixture as eluent. The eluate was further concentrated under a gentle nitrogen stream, and the final volume was adjusted to 1.0 mL with hexane.

An Agilent 7890A gas chromatograph coupled to a 7000B mass selective detector (GC–MS, Agilent, USA) was employed for the determination of PAHs. Separation was achieved on a 30 m × 250 μm × 250 μm DB-5 silica capillary column. Helium was used as the carrier gas at a constant flow of 1.0 mL/min (Liu et al. 2016b). Samples were injected in splitless mode with an injection volume of 1.0 μL. The injector, ion source, transfer line and quadrupole temperatures were set at 280, 230, 280 and 150 °C, respectively. The GC oven temperature was programmed as follows: held at 80 °C for 2 min, increased to 180 °C at a rate of 20 °C/min and held for 5 min, then raised to 290 °C at a rate of 10 °C/min and held for 5 min. The electron impact energy was set at 70 eV.

Individual homologues were identified by comparing the GC retention time and the abundance ratio of two characteristic ions (m/z) in the measured sample with those of an authentic standard (Li et al. 2012). The correlation coefficients of the five-point calibration curves ranged from 0.993 to 0.997. The detection limits of the method ranged from 6.0 to 10.0 μg/kg dw (dry weight) for soil. Before receptor-oriented source apportionment analysis, values below the detection limit were replaced with half of the method detection limit. The recoveries of the surrogate standards (naphthalene-d8, acenaphthylene-d8, phenanthrene-d10, chrysene-d12, benzo(a)pyrene-d12) ranged from 70.2 to 104%.

Receptor-oriented source apportionment

PCA-MLR

Principal component analysis with multiple linear regression analysis (PCA–MLR) model is one of the receptor models which has been widely applied to identify the source for PAHs in sediments and air quality studies (Di Gilio et al. 2017; Larsen and Baker 2003; Li et al. 2012; Liu et al. 2015, 2016a; Tian et al. 2013; Zhang et al. 2012). Wang et al. (2017) analyzed 16 samples using PCA method to calculate the percentage contribution of the different sources. Yang et al. (2017) apportioned source of soil heavy metals by PCA-MLR model and achieved reasonable result. To identify the potential source and infer the source contributions of the PAHs, PCA-MLR was carried out by SPSS version 19.0 on a matrix of 38 rows (number of soil sites) and 16 columns (number of PAH homologues). The detailed description of PCA-MLR can be found in the literature (Li et al. 2012; Sofowote et al. 2008; Thurston and Spengler 1985).
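The PCA-MLR workflow described above can be illustrated with a minimal NumPy sketch. It is not the SPSS procedure used in this study: it omits the varimax rotation, retains components by the eigenvalue-greater-than-one rule, and uses the magnitude of each regression coefficient because the sign of an unrotated principal component is arbitrary. The function name `pca_mlr` and the input layout (sites in rows, PAH homologues in columns) are assumptions made for illustration.

```python
import numpy as np

def pca_mlr(conc):
    """conc: (n_sites, n_species) concentration matrix (hypothetical input)."""
    # Standardize each species to zero mean and unit variance.
    z = (conc - conc.mean(axis=0)) / conc.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix, sorted by eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Retain components with eigenvalue greater than 1.
    keep = eigvals > 1.0
    scores = z @ eigvecs[:, keep]
    scores = scores / scores.std(axis=0, ddof=1)
    # Regress the standardized total concentration on the factor scores.
    total = conc.sum(axis=1)
    z_total = (total - total.mean()) / total.std(ddof=1)
    coef, *_ = np.linalg.lstsq(scores, z_total, rcond=None)
    # Component signs are arbitrary, so work with coefficient magnitudes.
    coef = np.abs(coef)
    share = 100.0 * coef / coef.sum()  # (t_i / sum t_i) x 100%
    return eigvals[keep], coef, share
```

For a 38 × 16 matrix, `pca_mlr(conc)` returns the retained eigenvalues, the regression coefficients and the percentage share of each factor. A rotated solution, as used in the study, would generally give different loadings and shares.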

PMF model

In this work, the US Environmental Protection Agency (USEPA) multivariate receptor model positive matrix factorization (EPA PMF 5.0) (USEPA 2014) was applied. The PMF model calculates source profiles and contributions with reliable factorization algorithms (Zhang et al. 2012). It has been widely applied to identify sources in sediments (Di Gilio et al. 2017; Tian et al. 2013; Vu et al. 2017; Xu et al. 2016; Zhang et al. 2012) and especially in air quality studies (Bari and Kindzierski 2016, 2017; Larsen and Baker 2003; Liu et al. 2015; Teixeira et al. 2015), and has recently been used in soil studies (Chen and Lu 2018; Jiang et al. 2017; Liang et al. 2017; Liu et al. 2016a). The mathematical expression of PMF can be described as Eq. (1):

$$x_{ik} = \sum\limits_{p = 1}^{P} g_{ip} f_{pk} + e_{ik} ,$$
(1)

where xik is the ith species concentration measured in the kth sample; fpk is the contribution of the pth source to the kth sample; gip is the concentration of the ith species from the pth source; P is the number of sources; and eik is the error (Tian et al. 2013; Zhang et al. 2012).

The goal of the model is to find values for gip and fpk that best reproduce the measured data xik. These values are adjusted until a minimum value of Q is found, and Q is defined as Eq. (2):

$$Q(E) = \sum\limits_{k = 1}^{m} \sum\limits_{i = 1}^{n} \left( \frac{e_{ik}}{\sigma_{ik}} \right)^{2} ,$$
(2)

where the value σik is the “uncertainty” in the ith species for sample k. The input data of the model were the concentrations of PAHs in soil and their respective uncertainties, which have been calculated according to Eq. (3) (Teixeira et al. 2015; USEPA 2014).

$$\sigma_{ik} = \begin{cases} \dfrac{5}{6} \times \text{MDL} & (x_{ik} \le \text{MDL}) \\ (0.05 \times x_{ik}) + \text{MDL} & (x_{ik} > \text{MDL}) \end{cases}$$
(3)

where the MDL is the Method Detection Limit.
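As a minimal sketch of Eqs. (1)–(3), the uncertainty rule and the objective function Q can be written in a few lines of NumPy. This only evaluates Q for a given pair of matrices; it does not perform the constrained minimization that EPA PMF 5.0 carries out. The function names and the array layout (species in rows, samples in columns) are assumptions made for illustration.

```python
import numpy as np

def pmf_uncertainty(x, mdl):
    """Eq. (3): 5/6 * MDL when x <= MDL, otherwise 0.05 * x + MDL."""
    x = np.asarray(x, dtype=float)
    mdl = np.broadcast_to(np.asarray(mdl, dtype=float), x.shape)
    return np.where(x <= mdl, 5.0 / 6.0 * mdl, 0.05 * x + mdl)

def pmf_q(x, g, f, sigma):
    """Eq. (2): sum of squared residuals scaled by their uncertainties.

    x: (n_species, n_samples), g: (n_species, n_sources),
    f: (n_sources, n_samples), sigma: same shape as x.
    """
    residual = np.asarray(x, dtype=float) - np.asarray(g) @ np.asarray(f)  # e_ik in Eq. (1)
    return float(np.sum((residual / sigma) ** 2))
```

With an MDL of 10 μg/kg, for example, a measured value of 5 μg/kg receives an uncertainty of about 8.33 μg/kg and a value of 20 μg/kg receives 11 μg/kg.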

Receptor model-Total-CR/Total-HI for risk apportionment

Health exposure risk assessment connects the levels of contaminants in the environment with a probability of toxic effects for a human population (Liu et al. 2017; Wang et al. 2017). The total carcinogenic risk (Total-CR) and total hazard index (Total-HI) were used to quantitatively characterize carcinogenic risks and non-carcinogenic risk of exposure to a PAHs mixture, respectively (MEEPRC 2014). For the soil, three exposure routes were considered: (1) oral ingestion of soils; (2) dermal contact of soils; and (3) air inhalation via soil vapor (Jiang et al. 2017). The three exposure routes are described in the supplementary material.

Based on the Total-CR and Total-HI, a coupling method was developed in this work and applied to quantitatively characterize the human health exposure risks for various PAH source categories. The mathematical expression of each source contribution to Total-CR/Total-HI can be described as Eqs. (4)–(13). The corresponding formula parameter values are shown in supplementary material Table S1:

$$\text{Con}\left( \text{CR}_{\text{ois}} \right)_{ij} = \text{Con}_{ij} \times \text{SF}_{o} \times \frac{\text{ABS}_{o} \times \left( \frac{\text{OSIR}_{c} \times \text{ED}_{c} \times \text{EF}_{c}}{\text{BW}_{c}} + \frac{\text{OSIR}_{a} \times \text{ED}_{a} \times \text{EF}_{a}}{\text{BW}_{a}} \right)}{\text{AT}_{\text{ca}}} \times 10^{-6}$$
(4)
$$\text{Con}\left( \text{HQ}_{\text{ois}} \right)_{ij} = \frac{\text{Con}_{ij}}{\text{RfD}_{o} \times \text{SAF}} \times \frac{\text{ABS}_{o} \times \text{OSIR}_{c} \times \text{ED}_{c} \times \text{EF}_{c}}{\text{BW}_{c} \times \text{AT}_{\text{nc}}} \times 10^{-6}$$
(5)
$$\begin{aligned} \text{Con}\left( \text{CR}_{\text{dsc}} \right)_{ij} & = \text{Con}_{ij} \times \frac{\text{SF}_{o}}{\text{ABS}_{gi}} \\ & \quad \times \frac{\text{ABS}_{d} \times \text{E}_{v} \times \left( \frac{\text{SAE}_{c} \times \text{SSAR}_{c} \times \text{ED}_{c} \times \text{EF}_{c}}{\text{BW}_{c}} + \frac{\text{SAE}_{a} \times \text{SSAR}_{a} \times \text{ED}_{a} \times \text{EF}_{a}}{\text{BW}_{a}} \right)}{\text{AT}_{\text{ca}}} \times 10^{-6} \\ \end{aligned}$$
(6)
$$\text{Con}\left( \text{HQ}_{\text{dsc}} \right)_{ij} = \frac{\text{Con}_{ij}}{\text{RfD}_{o} \times \text{ABS}_{gi} \times \text{SAF}} \times \frac{\text{ABS}_{d} \times \text{SAE}_{c} \times \text{SSAR}_{c} \times \text{ED}_{c} \times \text{EF}_{c} \times \text{E}_{v}}{\text{BW}_{c} \times \text{AT}_{\text{nc}}} \times 10^{-6}$$
(7)
$$\begin{aligned} \text{Con}\left( \text{CR}_{\text{pis}} \right)_{ij} & = \text{Con}_{ij} \times \frac{\text{IUR} \times \text{BW}_{a}}{\text{DAIR}_{a}} \times \text{PM}_{10} \\ & \quad \times \left( \frac{\text{DAIR}_{c} \times \text{PIAF} \times \text{ED}_{c} \times \left( \text{fspo} \times \text{EFO}_{c} + \text{fspi} \times \text{EFI}_{c} \right)}{\text{BW}_{c} \times \text{AT}_{\text{ca}}} \right. \\ & \quad \left. + \frac{\text{DAIR}_{a} \times \text{PIAF} \times \text{ED}_{a} \times \left( \text{fspo} \times \text{EFO}_{a} + \text{fspi} \times \text{EFI}_{a} \right)}{\text{BW}_{a} \times \text{AT}_{\text{ca}}} \right) \times 10^{-6} \\ \end{aligned}$$
(8)
$$\begin{aligned} {\text{Con}}\left( {{\text{HQ}}_{\text{pis}} } \right)_{ij} & = \frac{{{\text{Con}}_{ij} }}{{{\text{SAF}} \times \frac{{{\text{RfC}} \times {\text{DAIR}}_{a} }}{{{\text{BW}}_{a} }}}} \\ & \quad \times \frac{{{\text{PM}}_{10} \times {\text{DAIR}}_{c} \times {\text{PIAF}} \times {\text{ED}}_{c} \times \left( {{\text{fspo}} \times {\text{EFO}}_{c} + {\text{fspi}} \times {\text{EFI}}_{c} } \right)}}{{{\text{BW}}_{c} \times {\text{AT}}_{\text{nc}} }} \times 10^{ - 6} \\ \end{aligned}$$
(9)
$$\text{Con}\left( \text{CR}_{k} \right)_{ij} = \text{Con}\left( \text{CR}_{\text{ois}} \right)_{ij} + \text{Con}\left( \text{CR}_{\text{dsc}} \right)_{ij} + \text{Con}\left( \text{CR}_{\text{pis}} \right)_{ij}$$
(10)
$$\text{Con}\left( \text{HI}_{k} \right)_{ij} = \text{Con}\left( \text{HQ}_{\text{ois}} \right)_{ij} + \text{Con}\left( \text{HQ}_{\text{dsc}} \right)_{ij} + \text{Con}\left( \text{HQ}_{\text{pis}} \right)_{ij}$$
(11)
$${\text{Con}}\left( {{\text{Total-CR}}_{k} } \right)_{ij} = \sum {{\text{Con}}\left( {{\text{CR}}_{k} } \right)_{ij} }$$
(12)
$${\text{Con}}\left( {{\text{Total-HI}}_{k} } \right)_{ij} = \sum {{\text{Con}}\left( {{\text{HI}}_{k} } \right)_{ij} }$$
(13)

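where Con(Total-CRk)ij is the calculated total carcinogenic risk of ∑9PAHs from the jth source at the ith sampling site; Con(Total-HIk)ij is the calculated total hazard index of ∑8PAHs from the jth source at the ith sampling site; and Conij is the concentration of each individual PAH apportioned to the jth source category at the ith sampling site. The subscripts ois, dsc and pis refer to the oral ingestion, dermal contact and inhalation routes, respectively, and the sums in Eqs. (12) and (13) run over the individual PAHs.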
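To make the coupling concrete, the oral-ingestion term of Eq. (4) and the summations of Eqs. (10) and (12) can be sketched as follows. The default parameter values are illustrative placeholders only; they are not the values of Table S1, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OralExposure:
    # Illustrative placeholder values, NOT the Table S1 parameters.
    osir_c: float = 200.0   # child soil ingestion rate
    ed_c: float = 6.0       # child exposure duration
    ef_c: float = 350.0     # child exposure frequency
    bw_c: float = 15.9      # child body weight
    osir_a: float = 100.0   # adult soil ingestion rate
    ed_a: float = 24.0      # adult exposure duration
    ef_a: float = 350.0     # adult exposure frequency
    bw_a: float = 56.8      # adult body weight
    abs_o: float = 1.0      # oral absorption factor
    at_ca: float = 26280.0  # averaging time for carcinogenic effects

def con_cr_ois(con_ij, sf_o, p):
    """Eq. (4): oral-ingestion carcinogenic risk of one PAH from source j at site i."""
    child = p.osir_c * p.ed_c * p.ef_c / p.bw_c
    adult = p.osir_a * p.ed_a * p.ef_a / p.bw_a
    return con_ij * sf_o * p.abs_o * (child + adult) / p.at_ca * 1e-6

def total_cr(per_pah_route_risks):
    """Eqs. (10) and (12): sum the three routes for each PAH, then sum over PAHs."""
    return sum(sum(routes) for routes in per_pah_route_risks)
```

Here `con_ij` would be the PMF-apportioned concentration of one PAH, so the same functions applied to each source's share of the concentration give that source's share of the risk.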

Results and discussion

PAHs levels in soils

Sixteen EPA priority PAH homologues were detected in the 38 surface soil samples. The statistical characteristics of the ∑16PAHs concentrations at the 38 sites are shown in Table S2 and Fig. S1. The concentrations of ∑16PAHs in soils of the Yulin National Energy and Chemical Industry Base varied widely, ranging from 110.22 (at S7) to 4934.13 (at S22) μg/kg with a mean value of 1581.87 μg/kg (Fig. 2). Compared with other studies, the mean concentration of ∑16PAHs in Yulin soils was lower than those reported for urban soils of London, UK (18,000 μg/kg) (Vane et al. 2014), Warsaw, Poland (2654 μg/kg) (Bojakowska et al. 2017) and Shanghai, China (1790 μg/kg) (Wang et al. 2013), and for soils in the vicinity of chemical plants in Changzhi, China (2780 μg/kg) (Liu et al. 2016a). It was higher than those reported for coal mine soils of Liaoning, China (1118 μg/kg) (Liu et al. 2012), soils near coal-fired power plants in Xuzhou, China (1089 μg/kg) (Ma et al. 2016), soils around chemical industries in Tarragona County, Spain (1002 μg/kg) (Nadal et al. 2004), multi-land soils in the Yangtze River Delta, China (266 μg/kg) (Cai et al. 2017) and soils of Pristina City, Kosovo and Metohija (168 μg/kg) (Gulan et al. 2017), and higher than the mean concentration in soils of China (730 μg/kg) (Zhang and Chen 2017).

Fig. 2

Spatial distribution of the ∑16PAHs (μg/kg) for the study region

According to the Canadian Soil Quality Guidelines for the Protection of Environmental and Human Health (CCME 2010), to ensure that both human and ecological receptors are protected from direct contact with soil contaminated with carcinogenic PAHs, a benzo[a]pyrene total potency equivalent (BaP TPE) was calculated using Eq. (14). The Index of Additive Cancer Risk (IACR) assesses potential threats to potable groundwater quality from leaching of carcinogenic PAH mixtures from soil and is calculated by Eq. (15):

$${\text{BaP}}\;{\text{TPE}} = \sum\limits_{i = 1}^{n} {({\text{C}}_{i} \times {\text{PEF}}_{i} )} ,$$
(14)
$${\text{IACR}} = \sum\limits_{i = 1}^{n} {({\text{C}}_{i} / {\text{PW}}_{i} )} ,$$
(15)

where BaP TPE is the concentration of the carcinogenic PAH mixture, expressed as a total potency equivalent of BaP (mg/kg) (desired level of acceptable risk, SQGDH = 0.6 BaP TPE); n is the number of carcinogenic PAHs (with an available WHO PEF value); Ci is the concentration of carcinogenic PAH compound i (mg/kg); PEFi is the potency equivalence factor for carcinogenic PAH compound i (unitless) (see Table S2); IACR is the Index of Additive Cancer Risk (unitless), which should not exceed a value of 1.0; and PWi is the potable water component value for carcinogenic PAH compound i (mg/kg) (see Table S2). The calculated BaP TPE and IACR values are shown in Table S2 and Fig. 1, respectively.
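Eqs. (14) and (15) are simple weighted sums, as the following minimal sketch shows. The function names are hypothetical, and the PEF and PW dictionaries passed in are assumed to come from Table S2; no factor values are hard-coded here.

```python
def bap_tpe(conc, pef):
    """Eq. (14): sum of C_i * PEF_i over the carcinogenic PAHs (mg/kg)."""
    return sum(conc[name] * pef[name] for name in pef if name in conc)

def iacr(conc, pw):
    """Eq. (15): sum of C_i / PW_i over the carcinogenic PAHs (unitless)."""
    return sum(conc[name] / pw[name] for name in pw if name in conc)

def exceeds_guidelines(conc, pef, pw, sqg_dh=0.6, iacr_limit=1.0):
    """Compare both indices with the thresholds quoted in the text."""
    return bap_tpe(conc, pef) > sqg_dh, iacr(conc, pw) > iacr_limit
```

Concentrations must be in mg/kg to match the thresholds, so values reported in μg/kg need to be divided by 1000 first.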

The resulting BaP TPE values in soils of the Yulin National Energy and Chemical Industry Base were below the desired level of acceptable risk, ranging from 0.013 BaP TPE (at S7) to 0.385 BaP TPE (at S22) with a mean value of 0.097 BaP TPE. Nonetheless, IACR levels at 29% of the sampling sites exceeded the threshold value, indicating potential threats to potable groundwater quality from leaching of carcinogenic PAH mixtures from soil at those sites, particularly sites S10 and S35, which are on sensitive land.

According to the number of aromatic rings, the 16 PAH compounds were divided into three groups (Qiao et al. 2006): 2–3-ring PAHs (Nap, Acp, Acy, Fle, Phe, Ant), 4-ring PAHs (Fla, Pyr, BaA, Chr) and 5–6-ring PAHs (BbF, BkF, BaP, DahA, InP, BghiP). Most of the sites showed higher proportions of 2–3-ring PAH homologues (40.7–83.5%, average 58.28%) and 4-ring PAH homologues (0.1–52.7%, average 25.97%), except for sites S8, S15, S17, S21 and S37, where the high molecular weight (HMW) PAHs (4–6 rings) were predominant (Fig. 3). HMW PAHs have higher toxic equivalency factors than low molecular weight (LMW) PAHs (Nisbet and LaGoy 1992), as shown in Table S2. In this study, the higher concentrations of LMW PAHs relative to HMW PAHs indicated a relatively lower toxic risk. This result is consistent with studies in which LMW PAHs were the predominant PAHs (Farrar et al. 2005; Motelay-Massei et al. 2005).

Fig. 3

Triangular diagram of percentage concentration for the 16 PAHs in soils

The effects range low (ERL) and effects range median (ERM) values were used for biological risk assessment (Liu et al. 2016a; Tiwari et al. 2017). The measured concentrations of PAHs were compared with the ERL and ERM values (Fig. S1 in the supplementary material). The individual PAH concentrations at all sites were below the ERM; however, the 2–3-ring PAHs exceeded the ERL at 39.47–60.53% of the sites. These findings indicate that those sites may pose a potential biological threat (Xu et al. 2007). Liu et al. (2016a) assessed PAHs in soils around chemical industries in Shanxi, China, and similarly found that LMW PAHs had more adverse ecological effects than HMW PAHs.

Source identification of PAHs based on isomeric ratio method

Generally, the PAH isomeric ratio method is an easy way to distinguish between petrogenic and pyrogenic origins (Qiao et al. 2006; Tobiszewski and Namieśnik 2012; Wang et al. 2017; Yunker et al. 2002). To estimate the origin of the PAHs in the Yulin soils, four specific PAH ratios (BaA/(BaA + Chr), InP/(InP + BghiP), Fla/(Fla + Pyr) and Ant/(Phe + Ant)) were used. As shown in Fig. 4, the BaA/(BaA + Chr) values were above 0.35 at the majority of sites, implying that the PAHs may be attributed to biomass and coal combustion (Yunker et al. 2002). The mean ratio of InP/(InP + BghiP) was 0.34, a value characteristic of petroleum and biomass combustion (Wang et al. 2017). The values of Ant/(Phe + Ant) were above 0.1, indicating that the PAHs may be derived from pyrogenic sources (Wang et al. 2017). The Fla/(Fla + Pyr) values were above 0.4 at the majority of sites, suggesting that the PAHs may be influenced by petroleum, grass, coal and wood combustion (Yunker et al. 2002). These results indicate that petroleum, grass, coal and wood combustion were the major sources of PAHs in soils from the Yulin National Energy and Chemical Industry Base.
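The four diagnostic ratios are straightforward to compute per site, as in the minimal sketch below. The function names are hypothetical, the cut-off values are only those quoted in the paragraph above, and the sketch assumes that each pair of isomers has a non-zero combined concentration.

```python
def isomer_ratios(c):
    """c maps PAH abbreviations to concentrations at one site (same units)."""
    return {
        "BaA/(BaA+Chr)": c["BaA"] / (c["BaA"] + c["Chr"]),
        "InP/(InP+BghiP)": c["InP"] / (c["InP"] + c["BghiP"]),
        "Fla/(Fla+Pyr)": c["Fla"] / (c["Fla"] + c["Pyr"]),
        "Ant/(Phe+Ant)": c["Ant"] / (c["Phe"] + c["Ant"]),
    }

# Cut-offs quoted in the text; other studies may use different values.
CUTOFFS = {"BaA/(BaA+Chr)": 0.35, "Fla/(Fla+Pyr)": 0.4, "Ant/(Phe+Ant)": 0.1}

def above_cutoff(ratios):
    """Flag the ratios that exceed their quoted cut-off."""
    return {name: ratios[name] > cut for name, cut in CUTOFFS.items()}
```

Applying `above_cutoff(isomer_ratios(site))` to every site gives the per-site flags that underlie a cross plot such as Fig. 4.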

Fig. 4

Cross plots for the isomeric ratios of BaA/(BaA + Chr), InP/(InP + BghiP), Ant/(Phe + Ant) and Fla/(Fla + Pyr) in the soils

Source apportionment of PAHs

PCA-MLR method

PCA-MLR was performed using varimax rotation with Kaiser normalization. Four factors with eigenvalues greater than 1.0 (Zhang et al. 2012) were extracted by the PCA model and accounted for > 79% of the variance in the dataset. The rotated factor loadings are presented in supplementary material Table S3. The first factor accounted for 35% of the total variance and showed very high correlations with Acp and Nap and medium correlations with Fle, Phe, Ant, Fla, Pyr, Acy and BaP. Previous studies have shown that these compounds may indicate emissions from coke ovens (Khalili et al. 1995; Liu et al. 2016a; Sofowote et al. 2008). Fla and Pyr are also emitted from coal combustion (Aydin et al. 2014). The predominant PAHs in coke oven emissions are Nap, Acy, Phe, Fle, Ant and Fla (Khalili et al. 1995; Kong et al. 2013). In addition, there are many coking plants in the study region, and during production and transportation, coke oven gas and fly ash can contaminate soil through wet/dry deposition (Liu et al. 2016a). Therefore, factor 1 was identified as coke oven emissions and coal combustion. Factor 2 accounted for 19% of the total variance and was highly related to BkF, BbF and BaP and less strongly related to BaA and BghiP. This pattern suggests a wood combustion source (Khalili et al. 1995). Factor 3 accounted for 14% of the total variance and showed high correlations with BghiP and DahA, which are commonly emitted by gasoline and diesel engines (vehicular emission sources) (Khalili et al. 1995; Liu et al. 2015; Mohamed et al. 2016; Sofowote et al. 2008). Factor 4 accounted for 11% of the total variance and showed high correlations with Chr and BaA, which are common indicators of coal combustion (Sofowote et al. 2008; Wang et al. 2015). The four factors were thus assigned to coke oven emissions (together with coal combustion), wood combustion, vehicular emission and coal combustion, respectively.

An MLR of the PCA factor scores against the standard normal deviate of the ∑PAHs values was carried out to obtain the best correlation between measured and modeled ∑PAHs values for each site. The MLR analysis yielded coefficients (ti) for the four factor scores that were significant at a stipulated minimum confidence level of 95%. The PCA-MLR model for the standard normal deviate of the ∑PAHs values is expressed by Eq. (16):

$$Z_{{\sum {\text{PAHs}}}} = 0.830T_{1} + 0.365T_{2} + 0.290T_{3} + 0.305T_{4}$$
(16)

The percentage contributions to the mean for the four factors were calculated as (ti/∑ti) × 100% of each source, with 47% for factor 1 (coke oven emissions and coal combustion), 20% for factor 2 (wood combustion), 16% for factor 3 (vehicular emission), 17% for factor 4 (coal combustion).
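The percentage calculation can be reproduced directly from the coefficients of Eq. (16), as in the short sketch below (the variable names are illustrative). Computed from the rounded coefficients, the share of factor 1 comes out slightly below the 47% quoted above, presumably because the published coefficients are themselves rounded.

```python
# Regression coefficients t_i taken from Eq. (16).
coefficients = {
    "factor 1": 0.830,
    "factor 2": 0.365,
    "factor 3": 0.290,
    "factor 4": 0.305,
}

total = sum(coefficients.values())
# Percentage contribution of each factor: (t_i / sum of t_i) x 100%.
shares = {name: 100.0 * t / total for name, t in coefficients.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```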

PMF model

In this study, a 16 × 38 dataset (16 PAH homologues and 38 sampling sites) was analyzed with the EPA PMF 5.0 model to identify the source contributions to ∑PAHs in soils. After different numbers of factors and Fpeak values were tested, a four-factor solution with Fpeak = 0.32 was adopted on the basis of lower Q values and clear interpretability of the factors. The source profiles of the 16 PAHs are shown in Table S4. In the first factor, 2–3-ring PAHs, including Acp, Acy, Nap and Fle, had relatively high weightings. These LMW PAHs can be linked to a coke production source (Khalili et al. 1995; Kong et al. 2013). The second factor had a relatively high loading only on Ant, which is regarded as a marker of wood combustion (Harrison et al. 1996; Khalili et al. 1995; Zhang et al. 2012). The relatively high loadings of BghiP and InP identified factor 3 as vehicular emission sources (Khalili et al. 1995; Liu et al. 2016a; Sofowote et al. 2008; Zhang et al. 2012). Finally, the fourth factor showed high loadings for Fla, Pyr and Chr, consistent with sources related to coal combustion (Aydin et al. 2014; Liu et al. 2015; Sofowote et al. 2008; Wang et al. 2015).

Besides the extracted source profiles, the source contributions to ∑PAHs of four factors were also obtained by PMF model (Yang et al. 2013) and 98.4% of the measured ∑PAHs were explained. The highest contribution to the ∑PAHs in soils was coke oven emissions source (mean value is 34%), followed by coal combustion (33%), wood combustion (22%) and vehicular emission (11%). Coal industry is the regional pillar industry of Yulin, and there are hundreds of coal chemical industries and thermal power plants. The source contribution was primarily influenced by coal-derived activity.

Comparisons of source contributions by the two models

Both models were capable of quantifying source contributions to ∑PAHs. The fits between the measured and modeled ∑PAHs concentrations at the 38 sites for the PMF and PCA-MLR models are presented in Fig. S2 and Fig. S3, respectively. The relationship between measured and modeled ∑PAHs concentrations was good and significant (R² = 0.984 for PMF and R² = 0.998 for PCA-MLR), and the slope of the regression line (1.01 for PMF and 1 for PCA-MLR) was close to unity, indicating close agreement (Aydin et al. 2014).
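A fit of this kind can be checked with an ordinary least-squares line and the squared correlation coefficient, as in the following sketch. It assumes that the modeled values are regressed on the measured values; the function name is hypothetical and the study's own fitting procedure may differ.

```python
import numpy as np

def fit_quality(measured, modeled):
    """Return slope, intercept and R^2 of modeled versus measured values."""
    measured = np.asarray(measured, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    # First-degree polynomial fit: modeled = slope * measured + intercept.
    slope, intercept = np.polyfit(measured, modeled, 1)
    # Squared Pearson correlation coefficient.
    r_squared = np.corrcoef(measured, modeled)[0, 1] ** 2
    return slope, intercept, r_squared
```

A slope near 1 together with a small intercept and an R² near 1 indicates that the model reproduces the measured totals closely.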

Additionally, the PAH contribution attributed to coal-derived sources (coke oven emissions and coal combustion) by PCA-MLR (64%) is comparable to the sum (67%) of coke oven emissions and coal combustion identified as separate factors by PMF (34% and 33%, respectively). The contribution of wood combustion was 20% by PCA-MLR and 22% by PMF, and the contribution of vehicular emission was 16% by PCA-MLR and 11% by PMF.

However, the source apportionment derived from the PCA-MLR model included unexplainable negative contributions in the profiles, and the PCA-MLR calculation does not weight the data by their uncertainties. The PMF analysis not only differentiated between the coke oven emission and coal combustion sources but also overcame the problem of negative contributions in the profiles (Karakas et al. 2017; Shi et al. 2009). Thus, PCA-MLR is not an optimal tool for quantifying source contributions at multiple sites; PMF is the preferred method for source apportionment and was coupled with the health exposure risk to assess the source-oriented risk.

Risk apportionment of PAHs by PMF-Total-CR/Total-HI method

The calculated total carcinogenic risk values of ∑9PAHs and total hazard index values of ∑8PAHs in this study are presented in Table S5 and Fig. S4. In terms of both carcinogenic and non-carcinogenic risk, the adverse effects associated with oral ingestion of soils were greater than those associated with dermal contact with soils and air inhalation via soil vapor.

As shown in Fig. S5, among the nine carcinogenic PAHs, the CR values of BaP, BghiP, DahA and BaA exceeded the target cancer risk level (1 × 10⁻⁶) at some sites, where individual PAHs will contribute to cumulative site risks (Li et al. 2014b; USEPA 2002; Wang et al. 2015). Figure 5a shows that the Total-CR values for the 38 soil samples ranged from 2.63 × 10⁻⁷ to 5.99 × 10⁻⁶; the values at 60% of the sites were within the 1 × 10⁻⁶–1 × 10⁻⁴ cumulative risk range (USEPA 2002). The median value was 1.21 × 10⁻⁶, which indicates an increased potential for adverse effects, corresponding to 121 additional cancer cases per 100 million people due to exposure to PAHs in soils.
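As a quick check of the arithmetic, and treating the median Total-CR as a lifetime excess cancer probability applied uniformly to a population of 100 million (a simplifying assumption), the expected number of additional cases is:

$$1.21 \times 10^{-6} \times 10^{8} = 1.21 \times 10^{2} = 121$$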

Fig. 5

Spatial distribution of Total-CR and Total-HI values in the study area, a normal Total-CR values of 38 sampling sites, b normal Total-HI values of 38 sampling sites, c relative contributions of sources to Total-CR, d relative contributions of sources to Total-HI

The spatial distribution of the non-carcinogenic risk values (Total-HI) is presented in Fig. 5b. Total-HI ranged from 7.59 × 10⁻⁵ to 5.44 × 10⁻³, far below the safe level (= 1) (USEPA 2002). Higher Total-CR and Total-HI values were found at the sites near chemical industrial plants, indicating that these plants had a significant influence on the spatial characteristics of health exposure risk in soils of the study region (Fig. 5a, b).

In this study, receptor models were used to quantitatively characterize the human health exposure risk of PAH sources in soils of the Yulin National Energy and Chemical Industry Base by combining the total carcinogenic risk and total hazard index values with the source contributions estimated by the PMF model (defined as PMF-Total-CR and PMF-Total-HI). Figure 5c, d shows the spatial distribution of the relative risk contributions of the four identified PAH sources at the 38 sites in Yulin. At 23 sites (S1–S6, S10–S11, S14–S17, S20–S23, S26–S27, S30, S33, S35 and S37–S38), the Total-CR values exceeded the target cancer risk, and the major exposure risk at these sites was related to the coke oven emission source. The distribution of both PMF-Total-CR and PMF-Total-HI values was strongly influenced by land use type. Coal-derived sources made a larger contribution to exposure risk at the sites near the chemical industry, whereas vehicular emission sources made the major contribution at heavily populated sites.

According to Fig. 5, among the four source categories, coke oven emission was the principal contributor to the mean increased carcinogenic risk, which may be attributed to two factors: (1) coke oven emission made the largest contribution to total PAH concentrations (Fig. S2), and (2) PAH compounds with high cancer slope factors were more prominent in coke oven emission than in the other three source categories. In contrast to PMF-Total-CR, PMF-Total-HI was apportioned mainly to the coal combustion source (Fig. 5), even though coal combustion contributed less to total PAH concentrations than coke oven emission. Coal combustion contributed 33% of the total PAH concentrations, whereas the non-carcinogenic risk ascribed to this source reached 39%. The PMF-Total-CR and PMF-Total-HI results are therefore more informative than the general Total-CR and Total-HI results for managing priority PAH sources. For instance, coke oven emissions should be strictly controlled in order to reduce the human health exposure risk from PAHs.

Conclusion

In this study, the concentrations, source contributions, total carcinogenic risk and total hazard index of 16 priority PAHs in soils of the Yulin National Energy and Chemical Industry Base were investigated. Compared with other studies, the concentrations of ∑16PAHs were at a medium level, with LMW PAHs accounting for the majority. The PCA-MLR and PMF models were applied to identify potential source categories of PAHs in soils and their contributions. Coke oven emissions, coal combustion, vehicular emission and wood combustion were the major source categories. The comparison of the two models indicated that PMF is the preferred model for this dataset. The exposure risk of each source was assessed using the PMF-Total-CR/Total-HI method. The results demonstrated that coke oven emissions accounted for the largest proportion of the increased Total-CR, followed by coal combustion, vehicular emission and wood combustion. The increased Total-HI was apportioned mainly to the coal combustion source, despite its smaller contribution to total PAH concentrations. The results of this study reveal the concentrations, sources and potential risks of PAHs, as well as the spatial distribution of the risk associated with each source category, in soils from the Yulin National Energy and Chemical Industry Base. The study provides an effective approach for quantifying risk apportionment in human health research.