Introduction

Polycyclic aromatic hydrocarbons (PAHs), containing two or more fused benzene rings, are ubiquitous contaminants detected in aquatic sediments throughout the world. PAHs are of great concern due to their potential and proven carcinogenicity. In addition, they exhibit a range of hazardous properties, including toxicity, persistence, and mutagenicity (Zedeck 1980; NRC 1983). These organic compounds originate from both natural and anthropogenic processes. Some studies have suggested that the anthropogenic input of PAHs to aquatic sediments far surpasses that from natural processes (NAC 1971). Major human activities that produce PAHs include biomass burning, pyrolysis of wood to produce charcoal and black carbon, coke production, manufacturing of gas fuel, combustion of fossil fuels in internal combustion engines and power plants, incineration of industrial and domestic wastes, oil refining and chemical engineering operations, and aluminum manufacturing. By-products of these processes, which contain significant amounts of PAHs, have been dumped on land or in water, or buried at subsurface sites. Airborne particulates carrying PAHs, generated from these processes, are transported worldwide in the atmosphere and eventually accumulate in soils and aquatic sediments (Zuo et al. 2007; Li et al. 2012).

Therefore, identifying the potential sources of PAHs in aquatic sediments is critical to better understanding and controlling PAH contamination. Several methods have been developed to determine the possible PAH sources in sediments, such as diagnostic PAH ratio approaches and receptor models. Diagnostic ratios of specific PAHs can provide qualitative information and have been widely applied to identify sources in various environments (Soclo et al. 2000; Rocher et al. 2004; Zhang et al. 2004; Wang et al. 2006; Li et al. 2006; Malik et al. 2011). However, their usage is restricted due to a lack of reliability. Receptor models can determine the pollution sources and quantify their relative contributions to the receptor (Thurston and Spengler 1985; Harrison et al. 1996; Zhang et al. 2012). Receptor models involving multivariate statistical methods (principal component analysis (PCA)/multiple linear regression (MLR)) (Sofowote et al. 2008; Li et al. 2009; Shi et al. 2009, 2011), the UNMIX model (Hopke 2003; Zhang et al. 2012), and the positive matrix factorization (PMF) model (Sofowote et al. 2008; Vialle et al. 2011) have proved to be useful tools in source apportionment studies. Some applications of the PCA/MLR and PMF models to characterize pollution sources of PAHs in aquatic sediments have been published (Sofowote et al. 2008; Li et al. 2009; Zhang et al. 2012). The UNMIX model has been widely used for air source apportionment, but its application to sediments is rather scarce (Zhang et al. 2012).

In our previous work, the levels and spatial distribution of PAHs in surface sediments from the middle and lower reaches of the Yellow River, China, were reported (Sun et al. 2009). The purpose of this work is to identify and apportion the contributions of the major sources of sedimentary PAHs in the middle and lower reaches of the Yellow River using PCA/MLR, PMF, and UNMIX. The spatial distribution of the contribution from each identified source category was also studied. The results of this study will provide valuable information for regulatory actions to improve the environmental quality of the Yellow River, China.

Experimental methods

Sample collection

The sampling sites are illustrated in Fig. 1; details of the sampling stations are given elsewhere (Sun et al. 2009). A total of 22 sampling sites along the middle and lower reaches of the Yellow River and its tributaries were selected. Sediment samples were collected using a grab sampler in November 2005, placed on ice, and transferred directly to the laboratory. All sediment samples were stored at −18 °C until analysis. In this study, 16 PAHs including naphthalene (Nap), acenaphthylene (Acy), acenaphthene (Ace), fluorene (Flu), phenanthrene (Phe), anthracene (Ant), fluoranthene (Fla), pyrene (Pyr), benzo[a]anthracene (BaA), chrysene (Chr), benzo[b]fluoranthene (BbF), benzo[k]fluoranthene (BkF), benzo[a]pyrene (BaP), indeno[1,2,3-cd]pyrene (InP), dibenzo[a,h]anthracene (DBA), and benzo[ghi]perylene (BghiP) were determined in the sediment samples.

Fig. 1 Map of the study area

PAHs analysis

The homogenized sediment was spiked with surrogate standards and Soxhlet-extracted for 48 h with 250 mL of dichloromethane in a water bath maintained at 60 °C (Mai et al. 2001). Activated Cu was added for desulfurization. The extract for each sample was concentrated using a rotary evaporator, solvent-exchanged to hexane, and further reduced to approximately 1–2 mL under a gentle nitrogen stream. The concentrated extract was passed through a 1:2 alumina/silica gel glass column with 1 cm of anhydrous sodium sulfate overlaying the silica gel for cleanup and fractionation. Elution was performed first with 15 mL of hexane and then with 70 mL of hexane/dichloromethane (7:3, v/v). The second fraction, containing the PAHs, was reduced to 1–2 mL, solvent-exchanged to hexane, and concentrated to 0.5 mL under a gentle stream of purified nitrogen. The internal standard (hexamethylbenzene, 200 mg L−1, 5 μL) was added to the sample prior to GC/MS analysis.

PAHs were analyzed using a Hewlett-Packard 5890 gas chromatograph and 5972 mass selective detector (GC-MSD) with an HP-5 capillary column (30 m × 0.25 mm × 0.25 μm) in the electron impact mode (70 eV). Instrumental conditions were as follows: the injector port and ion source were maintained at 280 and 180 °C, respectively. The column temperature was programmed from 60 °C (held for 2 min) to 290 °C at 3 °C/min and then held at 290 °C for 30 min. The carrier gas was helium at a constant flow rate of 1.5 mL min−1. One microliter of each sample was injected in splitless mode. A mass range of m/z 50–500 was used for quantitative determinations. Concentrations of individual PAHs were obtained by the internal standard peak area method with a 6-point calibration curve for each component (Mai et al. 2001).

All analytical operations were conducted under strict quality control guidelines. The instrument was calibrated daily with standards. Method blanks (solvent), spiked blanks (standards spiked into solvent), matrix spike duplicates, and sample duplicates were analyzed routinely with field samples. In addition, surrogate standards were added to all of the samples to monitor procedural performance and matrix effects. The method detection limits (MDLs) of each PAH were determined using six replicates of sediment spiked at three times the instrument detection limits (IDLs), which were calculated from the lowest standards by extrapolating to the corresponding levels of PAHs that would generate a signal-to-noise ratio of 3:1. Non-spiked samples were also processed for blank subtraction. The MDLs were calculated by multiplying the standard deviation of the six spiked replicates by the Student's t value of 3.36 (one-sided t distribution for 5 degrees of freedom at the 99 % level of confidence).
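The MDL calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the replicate values are hypothetical (they are not data from this study), and the function name `method_detection_limit` is introduced here for convenience.

```python
import statistics

# One-sided Student's t value for 5 degrees of freedom at the 99 % level,
# as quoted in the text.
T_99_DF5 = 3.36


def method_detection_limit(replicates, t_value=T_99_DF5):
    """Return MDL = t * s, where s is the sample standard deviation."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed")
    return t_value * statistics.stdev(replicates)


# Hypothetical replicate results (ng/g) for one PAH; NOT data from this study.
example_replicates = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
mdl = method_detection_limit(example_replicates)
print(f"MDL = {mdl:.2f} ng/g")
```

The sample standard deviation (n − 1 in the denominator) is used because six replicates leave 5 degrees of freedom, which matches the t value quoted in the text.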

Receptor models

  1. PCA/MLR

    PCA/MLR is a traditional factor analysis tool for receptor modeling in environmental source apportionment studies (Sofowote et al. 2008; Li et al. 2009; Shi et al. 2009, 2011). The factor loading matrix and the factor score matrix are obtained from the input dataset of PAH concentrations:

    $$ X=L\times T $$
    (1)

    where X is the concentration matrix of PAHs, L is the factor loading matrix, and T is the factor score matrix. The potential source categories are identified from the factor loading matrix. The PCA scores are then calculated from the factor score matrix (Thurston and Spengler 1985). Finally, the mass apportionment of each source to the total PAH burden in sediments is obtained by regressing the total PAH concentrations on the PCA scores using MLR.

  2. PMF model

    The PMF model, developed by Paatero and Tapper (1994), has been used to determine source profiles and contributions of PAHs in sediments based on factor analysis (Sofowote et al. 2008; Vialle et al. 2011). The principle of the model is to decompose the initial data matrix X (n × m) into the source contribution matrix G (n × p) and the source profile matrix F (p × m), plus the residual matrix E (n × m).

    $$ X= GF+E $$
    (2)
    $$ x_{ij}=\sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij} $$
    (3)

    where n, m, and p represent the number of samples, number of PAHs, and number of independent sources, respectively; i = 1, …, n samples; j = 1, …, m species; k = 1, …, p sources. The PMF solution minimizes the objective function (Q), which relates the residuals to the uncertainties, using weighted least squares.

    $$ Q=\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{e_{ij}}{\sigma_{ij}}\right)^{2} $$
    (4)

    where e_ij is the difference between the observation and the model for species j in sample i, and σ_ij is the uncertainty of that observation. The Q value, which indicates the goodness of the model fit, can be used to determine the optimal number of sources. The Q calculated by PMF should be approximately equal to the theoretical Q, estimated as Q = m × n − p × (m + n) (Wang et al. 2009). Source apportionment of PAHs in sediments by PMF has been described in detail elsewhere (Larsen and Baker 2003; Sofowote et al. 2008). In the present study, EPA PMF ver. 3.0 was employed.

  3. UNMIX model

    The UNMIX model is a PCA-based receptor model with non-negativity constraints, meaning that it does not produce negative results. For a dataset with n samples and m PAHs, the UNMIX model first reduces the dimensionality of the data, and hence the number of possible sources, by performing a singular value decomposition of the data matrix. It then projects the dataset onto a plane perpendicular to the first axis of the N-dimensional space, where the edges of the projected data indicate the samples that determine the sources (Larsen and Baker 2003). Further details of the UNMIX model can be found in the references (Henry 2003; Larsen and Baker 2003; Zhang et al. 2012). In this work, the EPA UNMIX 6.0 model was used.
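The PMF relations in Eqs. 2–4 can be made concrete with a small numerical sketch. The code below only evaluates the residual matrix E = X − GF and the objective Q for given matrices; it is not the EPA PMF algorithm, which additionally searches for non-negative G and F that minimize Q. All matrix values and function names are hypothetical and chosen purely for illustration.

```python
def matmul(g, f):
    """Multiply G (n x p) by F (p x m) using plain nested lists."""
    n, p, m = len(g), len(f), len(f[0])
    return [[sum(g[i][k] * f[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


def pmf_objective(x, g, f, sigma):
    """Return Q = sum_i sum_j (e_ij / sigma_ij)^2 with E = X - GF."""
    model = matmul(g, f)
    q = 0.0
    for i in range(len(x)):
        for j in range(len(x[0])):
            e_ij = x[i][j] - model[i][j]  # residual for sample i, species j
            q += (e_ij / sigma[i][j]) ** 2
    return q


# Hypothetical toy problem: n = 2 samples, m = 3 species, p = 2 sources.
G = [[1.0, 2.0],
     [3.0, 1.0]]
F = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]]
X = [[0.8, 1.5, 0.8],   # equals GF except for two small residuals
     [1.6, 1.4, 0.9]]
SIGMA = [[0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1]]

print(pmf_objective(X, G, F, SIGMA))
```

In this toy case only two entries of X deviate from GF, each by one uncertainty unit, so Q is close to 2. A perfect reconstruction gives Q = 0.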

Results and discussion

Method validation

The relative percentage difference between the daily calibration and the 6-point calibration was less than 10 %. All PAHs in the method blanks were below the IDLs. The mean recoveries of the surrogate standards naphthalene-d8, acenaphthene-d10, phenanthrene-d10, chrysene-d12, and perylene-d12 were 41.74 ± 7.57 % (n = 6), 61.29 ± 6.33 % (n = 6), 86.70 ± 9.21 % (n = 6), 92.45 ± 12.04 % (n = 6), and 102.56 ± 10.90 % (n = 6), respectively. The MDLs of the individual PAHs were 3.07 ng g−1 (Nap), 1.53 ng g−1 (Acy), 0.73 ng g−1 (Ace), 0.77 ng g−1 (Flu), 0.96 ng g−1 (Phe), 0.56 ng g−1 (Ant), 0.60 ng g−1 (Fla), 0.34 ng g−1 (Pyr), 0.44 ng g−1 (BaA), 0.51 ng g−1 (Chr), 0.51 ng g−1 (BbF), 0.44 ng g−1 (BkF), 0.45 ng g−1 (BaP), 0.38 ng g−1 (DBA), and 0.44 ng g−1 (BghiP).

Diagnostic ratios of PAHs

PAH congener distributions vary with the composition and combustion temperature of the organic material as well as with the source. Molecular ratios of selected PAHs, such as Ant/(Ant + Phe) and Fla/(Fla + Pyr), can therefore be used to identify possible sources. The ratios of Ant/(Ant + Phe) and Fla/(Fla + Pyr) in sediments from the middle and lower reaches of the Yellow River are plotted in Fig. 2. The Ant/(Ant + Phe) ratios ranged from 0.06 to 0.16 with a mean of 0.12. For Ant/(Ant + Phe), a ratio <0.1 suggests a petrogenic source, while a ratio >0.1 indicates pyrogenic pollution (Soclo et al. 2000). Therefore, pyrogenic sources were the major sources of PAHs in sediments from the middle and lower reaches of the Yellow River. For Fla/(Fla + Pyr), ratios below 0.4 suggest petrogenic origins, ratios between 0.4 and 0.5 imply petroleum combustion, and ratios above 0.5 indicate coal, grass, and wood combustion (Soclo et al. 2000). Fig. 2 shows that the Fla/(Fla + Pyr) ratios at most sites were higher than 0.4, with an average value of 0.53. These values are similar to those reported for combustion of petroleum, coal, wood, and biomass (Soclo et al. 2000; Rocher et al. 2004; Zhang et al. 2004). Based on the above information, PAHs in sediments from the middle and lower reaches of the Yellow River were derived not from a single source but from a mixture of petroleum combustion, coal combustion, and biomass combustion.
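The threshold rules quoted above can be written as a short classification sketch. The function names and the example concentrations are hypothetical; the thresholds are those cited from Soclo et al. (2000) in the text. The treatment of values exactly on a boundary (0.1, 0.4, 0.5) is an assumption, since the text does not specify it.

```python
def classify_ant_ratio(ant, phe):
    """Classify Ant/(Ant + Phe): < 0.1 petrogenic, otherwise pyrogenic."""
    ratio = ant / (ant + phe)
    # Assumption: a ratio of exactly 0.1 is treated as pyrogenic.
    return ratio, ("petrogenic" if ratio < 0.1 else "pyrogenic")


def classify_fla_ratio(fla, pyr):
    """Classify Fla/(Fla + Pyr) using the 0.4 and 0.5 thresholds."""
    ratio = fla / (fla + pyr)
    if ratio < 0.4:
        label = "petrogenic"
    elif ratio <= 0.5:  # assumption: 0.4 and 0.5 fall in the middle class
        label = "petroleum combustion"
    else:
        label = "coal, grass, or wood combustion"
    return ratio, label


# Hypothetical concentrations (ng/g), chosen so that the ratios equal the
# mean values reported in the text (0.12 and 0.53).
print(classify_ant_ratio(ant=12.0, phe=88.0))
print(classify_fla_ratio(fla=53.0, pyr=47.0))
```

Because each ratio maps to a single class, such rules give only qualitative source indications, which is why they are complemented by receptor models in this work.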

Fig. 2 Diagnostic ratio plots of Ant/(Ant + Phe) vs Fla/(Fla + Pyr)

Identification and source apportionment using PCA/MLR

Before statistical analysis of the data, undetectable concentrations were replaced by half of the limit of detection. Statistical analyses, including the Kolmogorov–Smirnov (K–S) test and PCA/MLR, were performed using SPSS 17.0. The K–S test was carried out to test the frequency distribution of the PAH data, and all of the variables were consistent with a normal distribution (P > 0.05).
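The substitution of non-detects can be illustrated with a minimal sketch. The function name and the sample values are hypothetical; the Nap detection limit is the MDL reported in the Method validation section. Treating any value below the limit as a non-detect is an assumption made here for illustration.

```python
def replace_nondetects(values, detection_limit):
    """Replace non-detects (None or below the limit) with half the limit."""
    half = detection_limit / 2.0
    return [half if (v is None or v < detection_limit) else v for v in values]


# MDL for Nap (ng/g) as reported in the Method validation section.
NAP_MDL = 3.07

# Hypothetical Nap concentrations (ng/g); None marks "not detected".
raw = [12.4, None, 2.1, 30.8]
cleaned = replace_nondetects(raw, NAP_MDL)
print(cleaned)
```

Half-limit substitution keeps every sample in the data matrix, which the factor analysis models require, at the cost of slightly distorting the low end of the distribution.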

Source apportionment by PCA

PCA with varimax rotation was applied to the PAH concentrations in sediments from the middle and lower reaches of the Yellow River to account for the variance of the data set. The loadings identify the most representative PAH compounds in each factor, and usually a value >0.5 is selected. Table 1 shows the factor loading matrix obtained for PAHs in sediments from the middle and lower reaches of the Yellow River. As shown in Table 1, three factors were extracted for the study area. Factor 1 explained 74.8 % of the total variance of the data and had high (>0.7) positive loadings on InP (0.917), BghiP (0.915), BbF (0.902), BkF (0.887), BaP (0.888), and DBA (0.800) and moderate (>0.5) positive loadings on Phe (0.643), Ant (0.601), BaA (0.579), Chr (0.566), and Flu (0.540). Flu loaded on two factors, with a higher loading on factor 3. Factor 1 could be selected to represent emissions from vehicles because it aggregated mainly PAHs of high molecular weight, except for Flu, Phe, and Ant, which have low molecular weight (LMW). According to the source fingerprints summarized in the literature, elevated levels of BkF relative to other PAHs have been suggested to indicate diesel vehicles (Venkataraman et al. 1994; Larsen and Baker 2003), while BghiP has been identified as a tracer of auto emissions (Harrison et al. 1996; Li and Kamens 1993; Miguel and Pereira 1989). In addition, InP has been found in both diesel and gasoline engine emissions (May and Wise 1984; Larsen and Baker 2003). Based on the above information, factor 1 was selected to represent traffic emissions. The traffic emission source was possibly attributable to rapid industrialization, which has resulted in the consumption of large amounts of petroleum and diesel. This source type is consistent with the diagnostic ratios (Fig. 2).
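The way loadings are read against the 0.7 and 0.5 cut-offs can be sketched as follows. The loadings are the factor 1 values quoted in the text; the function name `group_loadings` and the "low" class are introduced here for illustration only, and the code does not perform the PCA or the varimax rotation itself.

```python
def group_loadings(loadings, high=0.7, moderate=0.5):
    """Group compounds by loading: high (> 0.7), moderate (> 0.5), or low."""
    groups = {"high": [], "moderate": [], "low": []}
    for compound, value in loadings.items():
        if value > high:
            groups["high"].append(compound)
        elif value > moderate:
            groups["moderate"].append(compound)
        else:
            groups["low"].append(compound)
    return groups


# Factor 1 loadings as quoted in the text (compounds not listed are omitted).
factor1 = {
    "InP": 0.917, "BghiP": 0.915, "BbF": 0.902, "BkF": 0.887,
    "BaP": 0.888, "DBA": 0.800, "Phe": 0.643, "Ant": 0.601,
    "BaA": 0.579, "Chr": 0.566, "Flu": 0.540,
}
groups = group_loadings(factor1)
print(groups["high"])
print(groups["moderate"])
```

The high-loading group contains only high-molecular-weight PAHs, which is the pattern used in the text to attribute factor 1 to traffic emissions.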

Table 1 Rotated component matrix of PAHs in sediments from middle and lower reaches of the Yellow River

Factor 2, responsible for 8.4 % of the total variance, was highly loaded on Fla (0.933), Pyr (0.885), and BaA (0.782) and, to a lesser extent, on Chr (0.634), Ace (0.607), and Ant (0.568). This pattern is consistent with sources related to coal combustion. Khalili et al. (1995) noted that Fla, Pyr, and Chr are indicators of coal combustion. Larsen and Baker (2003) identified Fla and Pyr as typical markers for coal combustion. In addition, as summarized by Harrison et al. (1996), Fla, Pyr, and Chr are considered to be tracers of coal combustion. Thus, factor 2 was tentatively attributed to coal combustion, which was also indicated by the diagnostic ratios. Coal combustion in the study area might be associated with domestic heating and with industries that use coal as their main energy source.

The third rotated factor, which contributed 7.0 % of the total variance, was characterized by a high (>0.7) positive loading on Acy (0.841) and moderate (>0.5) positive loadings on Nap (0.607), Ace (0.515), Flu (0.592), and Phe (0.578). This factor aggregated primarily PAHs of low molecular weight (LMW). The LMW PAHs Nap, Acy, Ace, Flu, and Phe are markers of low-temperature pyrogenic processes such as the combustion of straw and firewood (Jenkins et al. 1996; Yang et al. 2006; Zhang et al. 2008). PAH inputs from biomass combustion were also indicated by the diagnostic ratios discussed above. This source is plausible because burning straw and firewood for cooking and heating is a common practice in rural parts of the study area. In addition, Flu and Phe are indicators of coke oven origin (Simcik et al. 1999; Shen et al. 2007; Ma et al. 2010). This is not unexpected, since the middle and lower reaches of the Yellow River are located in Henan province, where there are many coke production industries. Therefore, factor 3 seemed to represent a combination of biomass burning and coke oven origin.

PCA suggested that traffic emissions, coal combustion, and biomass burning and coke ovens were the main sources of PAH contamination in sediments from the middle and lower reaches of the Yellow River. This is broadly comparable to the results of other studies in the Huanghuai Plain and North China (Xu et al. 2006; Zuo et al. 2007; Yang et al. 2012).

Estimation of source contribution by MLR

MLR analysis was carried out on the factor scores in order to obtain the mass apportionment of the three sources to the total PAHs in each sample from the middle and lower reaches of the Yellow River. The source contributions to the sum of the 16 PAH concentrations are listed in Table 2. In addition, the correlation of the calculated ∑PAHs concentrations with the measured values is presented in Fig. 3. The correlation coefficient shows how well the results are fitted by the model: the closer the value is to 1, the more acceptable the estimated result. According to the correlation coefficient (R² = 0.972) in Fig. 3, the fitted results in this work could be accepted. The mean percent contribution was 36.6 % for vehicle emissions, 34.2 % for coal combustion, and 29.2 % for the combined biomass burning and coke oven source (Table 2).
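How the agreement between measured and model-predicted ∑PAHs can be summarized as R² is sketched below. This is an assumption about the statistic (the squared Pearson correlation); the text does not state how SPSS computed the reported value. The concentrations are hypothetical and are not data from this study.

```python
def r_squared(measured, predicted):
    """Return the squared Pearson correlation of two equal-length series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_m) * (b - mean_p) for a, b in zip(measured, predicted))
    var_m = sum((a - mean_m) ** 2 for a in measured)
    var_p = sum((b - mean_p) ** 2 for b in predicted)
    return cov * cov / (var_m * var_p)


# Hypothetical measured and predicted total PAH concentrations (ng/g).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
r2 = r_squared(measured, predicted)
print(round(r2, 3))
```

A high R² only shows that the total is reproduced well; it does not guarantee that each individual source contribution is physically meaningful, for example that it is non-negative.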

Fig. 3 Comparison of measured ∑PAHs concentration with predicted PAHs concentration determined by PCA/MLR, PMF, and UNMIX model

Table 2 Average source contributions (%) for PCA/MLR, PMF, and UNMIX model

Identification and source apportionment using PMF

PMF analyses were performed using 16 PAHs from 22 sites in the present study. We selected the random seed mode with 100 random starting points. The number of factors was examined from 3 to 10. The PAH concentrations estimated by PMF were compared with the observed concentrations obtained from our field measurements. The five-factor solution, which gave the best correlation (R² = 0.999), was chosen for further discussion in this study. The Q value produced by PMF was approximately 158.3, in very close agreement with the theoretical Q of 162, which suggested that five factors were appropriate for this data set. Each factor obtained by PMF was compared with source profiles reported in previous works. The results are shown in Table 3. As seen from Table 3, five distinct sources of PAHs were identified in this study: (1) coal combustion, (2) coke oven, (3) vehicle exhaust, (4) residential coal combustion, and (5) biomass burning.
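The theoretical Q quoted above follows directly from the expression Q = m × n − p × (m + n) given in the Receptor models section. The short check below uses the dimensions stated in the text (22 samples, 16 PAHs, 5 factors); the function name is hypothetical.

```python
def theoretical_q(n_samples, n_species, n_factors):
    """Return Q_theory = m * n - p * (m + n)."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


q_theory = theoretical_q(n_samples=22, n_species=16, n_factors=5)
q_model = 158.3  # Q value reported for the five-factor PMF solution

relative_gap = abs(q_model - q_theory) / q_theory
print(q_theory)                      # 352 - 190 = 162
print(round(relative_gap * 100, 1))  # gap in percent of the theoretical Q
```

The modeled Q lies within about 2.3 % of the theoretical value, which is the sense in which the text describes the two as being in close agreement.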

Table 3 Source profiles obtained from positive matrix factorization

Factor 1 mainly consisted of three- or four-ring PAHs such as Phe, Ant, Fla, Pyr, BaA, and Chr. A similar profile has been reported for coal combustion in the literature (Khalili et al. 1995; Larsen and Baker 2003; Harrison et al. 1996). We attributed factor 1 to coal combustion, for example in boilers used mainly for electricity generation. Factor 2 was identified as coke oven based on the loadings of Acy, Flu, and Phe, because Flu and Phe have been reported as tracers for coke ovens (Simcik et al. 1999; Shen et al. 2007; Ma et al. 2010). The profile of factor 3 was dominated by BbF, BkF, BaP, InP, and BghiP. The predominance of these PAHs has been attributed to vehicular emissions. Factor 4 was mainly composed of Flu, Phe, Ant, Chr, and BbF. Its profile was similar to that of coal combustion, especially for residential heating (Esen et al. 2008). Therefore, factor 4 was related to coal combustion for residential heating. The profile of factor 5 was similar to that of biomass burning. Elevated levels of low-molecular-weight PAHs including Nap, Phe, Ant, and Pyr have been suggested to indicate biomass burning (Jenkins et al. 1996; Yang et al. 2006; Zhang et al. 2008). Factor 5 was therefore selected to represent biomass burning. The estimated ∑PAHs, that is, the total concentration of all PAHs obtained by the PMF model, was compared with the observed PAH concentrations from this study. The ratio of estimated to observed concentrations was almost unity (Fig. 3, R² = 0.999, n = 22), which suggested that all PAHs were well estimated by the PMF model. In addition, the contributions of the five factors to ∑PAHs were obtained from the PMF model (Table 2). The average contributions to ∑PAHs in sediments were 25.1 % from vehicular emissions, 28.8 % from coal combustion, 17.6 % from coke ovens, 17.1 % from residential coal combustion, and 11.9 % from biomass burning.

Identification and source apportionment using UNMIX

From the original matrix of PAHs, four “seed” compounds were chosen to start the “overnight” mode of UNMIX, and 16 remained at the completion of the analysis. Each compound had a correlation coefficient of 0.8 for a particular source. As seen from Table 4, UNMIX determined four sources of PAHs in the data. As in the PCA/MLR and PMF analyses, one source had a vehicular signature with high levels of BbF, BkF, BaP, InP, and BghiP. One source was similar to the coal signature, with high levels of Phe, Ant, Fla, Pyr, BaA, and Chr. Another source had high fractions of Phe, Fla, and Pyr but low levels of Chr and BbF and was attributed to biomass burning. The last source contained substantial levels of Acy, Flu, and Phe, similar to the coke oven source found by PCA/MLR and PMF. The estimated average contributions of the four sources were in the order of vehicular emission (36.7 %) > coal combustion (34.0 %) > coke oven (15.7 %) > biomass burning (13.6 %) (Table 2).

Table 4 Source profiles obtained from UNMIX

Comparison of source contributions by three receptor models

The source contributions of PAHs in sediments from the middle and lower reaches of the Yellow River estimated by the three receptor models are compared here. The fits between the measured and estimated total PAH concentrations at the 22 sites for the three models are presented in Fig. 3. As seen from Fig. 3, most of the predicted ∑PAHs concentrations were close to the measured concentrations, with R² values ranging from 0.98 to 1.00. This suggests that the three models are well suited to the sediment dataset. The overall source contributions presented in Table 2 compare well among the three methods. Vehicles contributed 25.1–36.7 % of the PAHs in sediments and coal 34.0–41.6 %, while biomass burning and coke ovens showed the smallest disparity among the models, at 29.2–33.2 %.

Distribution of PAH source contributions

The spatial distribution of the PAH contributions from each source category (extracted by the three models) in sediments from the middle and lower reaches of the Yellow River was studied. The results are plotted in Fig. 4. Taking the source distribution extracted by PCA/MLR as an example, samples T2, T4, and L1 received high contributions from coal combustion. This is because these sampling sites are located in a mixed industrial and residential region, where coal combustion for residential heating and for the chemical and coal industries is abundant. Apparently, municipal and industrial wastes enriched with combustion-derived PAHs have been discharged into the river and mainly deposited in the sediments. Samples T7 and L2 received dominant contributions from vehicular emissions. This is reasonable, as site L2 is a well-known tourist attraction, where gasoline leakage from yachts could be the dominant contributor, while site T7 is located close to a highway, so pollution there was probably caused mainly by automobile exhaust and street runoff. Sample M3 received a dominant contribution from biomass burning. For the other sites, the potential source categories were complex, and it is difficult to determine which source category was the most important. For PMF and UNMIX, distribution patterns similar to those from the PCA/MLR analysis were found. The results of PCA/MLR showed a very high correlation (R² = 0.972); however, negative source contributions in some samples (M1, M10, T5) were observed in the source contribution plots. These negative contributions cannot be explained rationally and are the outcome of improper variable scaling inherent in eigenvalue-based methods (e.g., PCA). With PMF and UNMIX, however, the sources no longer exhibit extreme negative contributions.

Fig. 4 Distribution of source contributions for each sediment sample from middle and lower reaches of the Yellow River

Conclusion

In this work, the sources of PAHs in sediments from the middle and lower reaches of the Yellow River were determined using three source apportionment methods: PCA/MLR, PMF, and UNMIX. All three methods showed that biomass burning, coal combustion, traffic-related pollution, and coke ovens were the dominant contributors to PAHs in the sediments from the middle and lower reaches of the Yellow River. In addition, the overall source contributions compared well among the methods. Vehicles contributed 25.1–36.7 %, coal 34.0–41.6 %, and biomass burning and coke ovens 29.2–33.2 % of the total PAHs. Coal combustion and traffic-related pollution together contributed approximately 70 % of the anthropogenic PAHs in sediments, which indicated that energy consumption was a predominant factor in PAH pollution in the middle and lower reaches of the Yellow River. The spatial distribution of the contribution from each identified source category was also studied, and the three models showed similar distribution patterns for each source category among the sampling sites.