Introduction

Soil contamination at industrial sites has received wide attention in China in recent years (Luo et al. 2009; Wang et al. 2010). Industrial production and other anthropogenic emissions are the major source of polycyclic aromatic hydrocarbons (PAHs) in the atmosphere and have seriously contaminated soils at such sites (Wang et al. 2003). It has been confirmed that soils at large-scale coking sites are severely contaminated by PAHs (Wang et al. 2012). The PAHs present in contaminated sites not only directly affect soil physicochemical properties and the environment, but also threaten human health in the contaminated area (Blake et al. 2007; Mostert et al. 2012). Removing PAHs from contaminated soils is therefore urgently required as part of the risk management and remediation of contaminated sites. In some large-scale abandoned industrial sites in China, the historical production layout, storage, management, and other factors have led to extreme contaminant concentrations and to contamination levels that vary widely across the site. PAH contamination at industrial sites is thus typically characterized by significant variability in contaminant concentrations, which often makes it difficult to describe spatial patterns and identify pollution hotspots.

Following remedial investigations of contaminated sites, hotspots of serious contamination have been found and these have an important role in the formulation of remediation strategies (Sinha et al. 2007). Hotspot areas also need to be identified in order to develop effective management practices for soil contamination (Zhang et al. 2008). There are many definitions of what is meant by “hotspots” (Cressie 1991; Komnitsas et al. 2009) and several methods have been proposed to identify hotspots (Kulldorff 1997; Carson 2001; Patil et al. 2001). Spatial autocorrelation analysis approaches may also be useful tools for identifying hotspots and spatial patterns of pollution. A few spatial autocorrelation analysis methods have been suggested for hotspot or spatial cluster pattern identification, such as Moran’s I (Jing et al. 2010; Prasannakumar et al. 2011), Geary’s C (Tiefelsdorf et al. 1997; Barreto-Neto et al. 2004), Getis’ G (Lai et al. 2009; Bohórquez et al. 2011), and Join Count analysis (Kabos et al. 2002; Epperson 2003), but Moran’s I index seems to be the most widely used (Getis et al. 1996). Moran’s I method has been used in a number of research fields, including environmental management (Zhang et al. 2004; Brody et al. 2006; Castillo et al. 2011; Su et al. 2011). However, within the geoscientific variables of a large-scale industrially contaminated site, this method is relatively untried.

This study used Global Moran’s I and Local Moran’s I indices of spatial autocorrelation to analyze the statistical structure characteristics of samples from a contaminated site. Global Moran’s I was used to describe the correlation distance and spatial pattern of soil PAHs. The standardized spatial autocorrelation coefficients of different spatial lags were arranged in order and mapped on to standardized spatial correlograms, which were used to show spatial dependence. The Local Moran’s I index was used to calculate the relationship between each sample and its neighbors, which allowed hotspots and cool spots for soil PAHs to be identified at the contaminated site. The research has provided important information about contamination at the site that can be used at other sites to aid remediation investigations and the creation of remediation strategies for large-scale industrially contaminated sites.

Materials and methods

Site characteristics

The plant is located in the north of China and has a total land area of 1.35 km². It was one of China's large-scale independent coking chemical industrial enterprises and had been in operation for over 40 years. The major products were coke, coke oven gas, tar, benzene, ammonium sulfate, asphalt and naphthalene, amongst a total of 40 chemical products. Poor production technology and pollution controls during the plant's initial period caused serious environmental pollution, and carcinogenic substances and harmful emissions have damaged the area occupied by the plant and its surrounding environment. The plant took in coal as a raw material to produce gas and coke and extracted coal-based chemicals from crude tar. On site, the facility consisted of a former coal preparation factory, coking factory, sieve coke factory, gas purification system, tar factory, and a gas factory. Spills and leakages during gas purification and refining, tar processing, storage, transport and production processes, as well as planned and unplanned emissions, were the major causes of the contamination of the site by polycyclic aromatic hydrocarbons and associated compounds. Site reconnaissance and analysis of the relevant data indicate that soil contamination at the site is serious. It is now imperative to undertake a site risk assessment, contamination management, and other related work.

Sampling and chemical analyses

Based on the plant’s historical production, management and technological layout, a total of 64 monitoring points were sampled first. These points were placed in areas of potentially contaminated ground in order to confirm known contamination areas, contamination depth, and contaminant species, and this first round of analysis focused on the seriously polluted areas. A second sampling grid was then combined with the first set of monitoring points in order to provide a comprehensive picture of contamination at the site. A total of 114 effective soil sampling points were obtained across both sampling visits.

For the extraction of PAHs, 10 g of soil sample was extracted with acetone/dichloromethane (1/1, v/v) using an ASE-300 accelerated solvent extractor (Dionex, Beijing, China). The extracts were concentrated using an Organomation evaporator, eluted with approximately 30 mL dichloromethane/n-hexane (2/1, v/v), and concentrated to 1 mL for analysis (Grimalt et al. 2004). PAHs in the extracts of all samples were analyzed by gas chromatography–mass spectrometry (Agilent 6890N GC with a 5975B mass spectrometric detector, USA) equipped with an HP-5MS capillary column (30 m × 0.25 mm inner diameter × 0.25 μm film thickness, Agilent, USA). The 16 priority PAHs were identified by gas chromatography–mass spectrometry, and quantification was based on peak areas against an external standard containing the 16 PAHs (Supelco Co., USA): Nap, Acy, Ace, Fle, Phe, Ant, Fla, Pyr, Baa, Chr, Bbf&Bkf, Bap, Daa, Bgp, and Inp. The analytical procedure was comprehensively evaluated against quality control acceptance criteria (USEPA 2007a), and the linear quantitative equation was obtained with r² > 0.99 (USEPA 2007b). The method detection limits ranged from 10 to 15 μg kg−1, and the recoveries were 50 % to 105 % with a relative standard deviation lower than 11 %.

Spatial autocorrelation

Spatial autocorrelation refers to the correlation of a single variable with itself across different spatial positions, that is, the relationship between the value at one location and the values at nearby locations. Ordinary correlation, by contrast, refers to the mutual relationship between two or more variables. Spatial autocorrelation analysis distinguishes spatial clusters from spatial outliers. A spatial cluster is a location with a high (or low) value surrounded by locations with similarly high (or low) values. In contrast, a spatial outlier is a location with a high value surrounded by low values, or a low value surrounded by high values. When the values of a variable show such spatial regularity, the variable is spatially autocorrelated.

Spatial autocorrelation is measured with global and local indicators. Global spatial autocorrelation describes the overall distribution of a variable in order to determine whether spatial clusters of this variable exist over a larger area, and it uses a single value to reflect the degree of autocorrelation of the region. Local spatial autocorrelation evaluates each spatial unit in relation to its neighboring units for a given property. Many methods have been proposed for the calculation of spatial autocorrelation. Moran’s I, Geary’s C, Getis’ G, and Join Count statistics express global spatial autocorrelation characteristics, whereas local Moran’s I and local Getis’ G express local spatial autocorrelation characteristics. Moran’s I seems to be the most often used, so this study identified pollution hotspots at the large coking contaminated site in north China using Moran’s I index.

Global Moran’s I index

Global Moran’s I was introduced by Moran in 1948 and examines whether spatial correlation exists or not over an entire region. The Global Moran’s I statistic for spatial autocorrelation is given as (Cliff et al. 1981):

$$ I=\frac{n}{{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{C}_{ij}}}}\cdotp \frac{{\displaystyle \sum_{i=1}^n}{\displaystyle \sum_{j=1}^n}{C}_{ij}\left({x}_i-\overline{x}\right)\left({x}_j-\overline{x}\right)}{{\displaystyle \sum_{i=1}^n}{\left({x}_i-\overline{x}\right)}^2} $$

where n is the number of observations (sampling locations) in the whole region, \( x_i \) and \( x_j \) are the variable values at locations i and j, \( \overline{x} \) is the mean of x, C is the spatial weights matrix, and \( C_{ij} \) is the spatial weight between locations i and j. The value of Moran’s I usually varies between −1 and 1. A value close to −1 indicates negative autocorrelation and a value close to 1 indicates positive autocorrelation. No autocorrelation results in a value close to 0.
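As a minimal sketch of the statistic above (not the OpenGeoDa/STARS workflow used in this study), the following Python functions compute Global Moran’s I for values measured at point locations. The sketch assumes a binary distance-band weight, with \( C_{ij}=1 \) when two distinct samples lie within `band` metres of each other and 0 otherwise; the function names, coordinates, and concentrations are hypothetical.

```python
import math


def distance_band_weights(coords, band):
    """Binary weights: C[i][j] = 1 if samples i and j (i != j) are within `band`."""
    n = len(coords)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= band:
                C[i][j] = 1.0
    return C


def global_morans_i(x, C):
    """Global Moran's I for values x and spatial weights matrix C."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    total_weight = sum(sum(row) for row in C)
    cross = sum(C[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    squares = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squares)


# Hypothetical transect: four samples 100 m apart, neighbours within 150 m.
coords = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
C = distance_band_weights(coords, band=150.0)

clustered = [10.0, 9.0, 2.0, 1.0]     # similar values sit next to each other
alternating = [10.0, 1.0, 10.0, 1.0]  # high and low values alternate

print(global_morans_i(clustered, C))
print(global_morans_i(alternating, C))
```

In this toy example the clustered transect gives a positive I and the alternating transect gives a negative I, which matches the interpretation of the sign given above.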

Generally, normal approximation is a precondition, so the Global Moran’s I index can be standardized to Z(I) as follows (Cliff et al. 1981):

$$ Z(I)=\frac{I-E(I)}{\sqrt{Var(I)}} $$

where

$$ E(I)=-\frac{1}{n-1};\quad Var(I)=\frac{{n}^2{w}_1-n{w}_2+3{w}_0^2}{w_0^2\left({n}^2-1\right)}-{E}^2(I) $$

$$ {w}_0={\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{C}_{ij}}};\quad {w}_1=\frac{1}{2}{\displaystyle \sum_{i=1}^n{\displaystyle \sum_{j=1}^n{\left({C}_{ij}+{C}_{ji}\right)}^2}};\quad {w}_2={\displaystyle \sum_{i=1}^n{\left({C}_{i\ast }+{C}_{\ast i}\right)}^2} $$

Here \( C_{i\ast} \) is the sum of all weights in row i and \( C_{\ast i} \) is the sum of all weights in column i of the spatial weights matrix C. Following significance testing on the I value (at the 0.05 level), a Z(I) greater than 1.96 signifies significant positive spatial autocorrelation, and a Z(I) smaller than −1.96 signifies significant negative spatial autocorrelation.
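The standardization can be sketched in Python as follows. This is an illustrative implementation of the normal-approximation formulas for E(I) and Var(I) given above, with the spatial weights supplied as a square list of lists; the function names and the example weights matrix are assumptions made for the illustration.

```python
import math


def morans_i_z_score(I, C):
    """Standardize Global Moran's I (normality assumption) for weights matrix C."""
    n = len(C)
    w0 = sum(C[i][j] for i in range(n) for j in range(n))
    w1 = 0.5 * sum((C[i][j] + C[j][i]) ** 2 for i in range(n) for j in range(n))
    row_sums = [sum(C[i]) for i in range(n)]
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    w2 = sum((row_sums[i] + col_sums[i]) ** 2 for i in range(n))
    expected = -1.0 / (n - 1)
    variance = (n * n * w1 - n * w2 + 3 * w0 ** 2) / (w0 ** 2 * (n * n - 1)) - expected ** 2
    return (I - expected) / math.sqrt(variance)


def is_significant(z, critical=1.96):
    """Two-sided test at the 0.05 level: |Z(I)| must exceed 1.96."""
    return abs(z) > critical


# Hypothetical weights for four samples in a row, each linked to its adjacent samples.
C = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
z = morans_i_z_score(-1.0, C)
print(z, is_significant(z))
```

With so few samples the variance of I is large, so even a strongly negative I may not pass the 1.96 threshold; the example is only meant to show the mechanics of the calculation.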

Local Moran’s I index

Local Moran’s I is the index used to describe the locations of spatial clusters and spatial outliers. This study used the Local Moran’s I index to identify spatial clusters and outliers. The function is as follows:

$$ \begin{array}{c}\hfill {I}_i=\frac{\left({x}_i-\overline{x}\right)}{S^2}{\displaystyle \sum_{j=1,j\ne i}^n{C}_{ij}\left({x}_j-\overline{x}\right)}\hfill \\ {}\hfill {S}^2=\frac{1}{n-1}{\displaystyle \sum_{j=1,j\ne i}^n{\left({x}_j-\overline{x}\right)}^2}\hfill \end{array} $$

The notation is the same as that for the Global Moran’s I. Local Moran’s I can also be standardized so that the significance of the local spatial autocorrelation can be tested, in a manner similar to that used for the Global Moran’s I.
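A minimal Python sketch of the Local Moran’s I calculation is given below. It assumes the commonly used form of the index, in which the sum runs over the neighbours j ≠ i of location i and S² is the variance of the values at all locations other than i; the function name and the example data are hypothetical and do not come from the study.

```python
def local_morans_i(x, C):
    """Local Moran's I for each location, given values x and weights matrix C."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    result = []
    for i in range(n):
        # Variance of the values at all locations except i.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Weighted sum of the neighbours' deviations from the mean.
        lag = sum(C[i][j] * dev[j] for j in range(n) if j != i)
        result.append(dev[i] / s2 * lag)
    return result


# Hypothetical weights for four samples in a row, each linked to its adjacent samples.
C = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(local_morans_i([10.0, 9.0, 2.0, 1.0], C))   # similar neighbours
print(local_morans_i([10.0, 1.0, 10.0, 1.0], C))  # dissimilar neighbours
```

A positive value indicates that a sample resembles its neighbours (a spatial cluster), and a negative value indicates that it differs from them (a spatial outlier).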

Data analyses using computer software

The spatial autocorrelation indices were calculated using OpenGeoDa 1.0.1 and STARS 0.8.2 software. The maps were produced using OriginPro 7.5 and ArcGIS 10.0 software. Multivariate and statistical analyses of the sample data were performed using SPSS 14.0.

Results and discussion

Correlation analysis for PAHs

Correlation analysis can be used to test the similarity among PAH concentration data. Correlation analysis of different pollutants helps to analyze the cause of soil contamination and to identify pollutant sources and typical pollutants. The PAH data for soils from the coking plant site varied greatly and were highly skewed, that is, non-normally distributed, so the Spearman rank correlation was used. Table 1 presents the correlation coefficients between soil PAHs. The results showed that all the PAHs were significantly and positively correlated (at the 0.05 level). These high correlations may indicate that the PAHs had similar sources. Although all the correlations were significantly positive, their strength differed. For example, the coefficients for Nap and Acy and for Nap and Ant were r = 0.814 and r = 0.674, respectively (both significant at the 0.05 level). A higher correlation coefficient represents a stronger association between the two contaminants.
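To illustrate why a rank-based coefficient suits skewed data, the sketch below computes the Spearman coefficient as the Pearson correlation of the ranks, with tied values given their average rank. It is a plain-Python illustration rather than the SPSS procedure used in the study, and the concentrations are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical, highly skewed concentrations (mg/kg) of two PAHs at five samples.
pah_a = [0.01, 0.5, 3.2, 41.0, 4100.0]
pah_b = [0.02, 0.3, 5.0, 20.0, 900.0]
print(spearman_rho(pah_a, pah_b), pearson(pah_a, pah_b))
```

Because only the ordering of the values enters the calculation, a single extreme concentration cannot dominate the coefficient in the way it can for the Pearson coefficient.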

Table 1 Correlation analysis results for PAHs

Principal component analysis of PAHs

Principal component analysis (PCA) is a statistical technique that reduces the dimension of a high-dimensional dataset and is commonly used to find patterns in data. The degree of pollution caused by the 16 PAHs analyzed in this study differed, as did the number of samples that exceeded the threshold concentration. Two principal components (PCs) with eigenvalues higher than 1 were extracted; their eigenvalues were 9.92 and 2.51. PCA thus reduced the original 16 PAHs to two components that explained 84.52 % of the data variation. Representation of the variables by the two components was adequate, considering the communalities obtained in the analysis. In the rotated component matrix (Table 2), the first principal component was dominated by Chr, Bbf&Bkf, Inp, Daa and Bgp, and the second principal component was dominated by Nap, which meant that these six PAHs were representative of the site. In the scattergram of “factor scores” (Fig. 1), principal component 1 is the X-axis and principal component 2 is the Y-axis. The scattergram shows that the samples in division 2 (sample numbers 13, 27, 39 and 89) had the highest scores on principal component 2 and relatively low scores on principal component 1; the corresponding Nap concentrations seriously exceeded acceptable levels. The samples in division 1 had the highest scores on principal component 1 and relatively low scores on principal component 2, which showed that the five PAHs of the first component had similar levels of contamination.

Table 2 Principal component matrix analysis for PAHs
Fig. 1 PCA scattergram of “factor scores” for each sample location

Status of PAH concentrations and identification of outliers

Table 3 shows a number of statistical parameters, including skewness, kurtosis, standard deviation (SD) and coefficient of variation (CV), for the concentrations of PAHs in soils at the contaminated site. The concentration ranges were quite wide. For example, Nap concentrations in the 114 samples ranged from 0.01 to 4,100.00 mg kg−1, with a mean of 100.91 mg kg−1. The data had high CVs and skewness, which implied that the original data included extreme values. According to the Kolmogorov–Smirnov (K-S) test, the data were neither normally nor lognormally distributed, because all the K-S P values were 0.00. In contaminated site investigation and remediation, only extremely large values (rather than extremely small ones) have a major effect on the statistics and are of concern. In this study, the box-and-whiskers plot method was used to identify extreme values, which were defined as values greater than the mean plus at least three times the standard deviation.
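The summary statistics and the extreme-value criterion described above can be sketched with the Python standard library as follows. The sketch assumes the sample standard deviation (n − 1 denominator) and a moment-based skewness; the exact estimators used by SPSS may differ, and the concentrations are hypothetical.

```python
import statistics


def summarize(conc):
    """Mean, sample SD, CV (%) and moment-based skewness of a list of concentrations."""
    n = len(conc)
    mean = statistics.fmean(conc)
    sd = statistics.stdev(conc)  # sample standard deviation (n - 1 denominator)
    m2 = sum((v - mean) ** 2 for v in conc) / n
    m3 = sum((v - mean) ** 3 for v in conc) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": sd / mean * 100.0,
        "skewness": m3 / m2 ** 1.5,
    }


def extreme_values(conc, k=3.0):
    """Values greater than the mean plus k standard deviations."""
    mean = statistics.fmean(conc)
    sd = statistics.stdev(conc)
    return [v for v in conc if v > mean + k * sd]


# Hypothetical data: 19 low concentrations and one very high one (mg/kg).
conc = [1.0] * 19 + [100.0]
print(summarize(conc))
print(extreme_values(conc))
```

A single very high concentration is enough to produce a CV well above 100 % and a strongly positive skewness, which is the pattern reported for the site data.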

Table 3 Summary statistics for the original PAH data

Global spatial autocorrelation analysis

In general, the higher the absolute value of Moran’s I, the stronger the spatial autocorrelation; the larger the absolute value of the standardized Moran’s I, the more significant the spatial structure (Huo et al. 2012). Figure 2 shows the standardized spatial correlograms for the soil PAHs, in which the Y-axis is the standardized Moran’s I and the X-axis is the lag distance. The spatial autocorrelation of regionalized variables at each lag can be read from the correlogram, which can also be used to estimate whether spatial clusters or spatial outliers exist in the study area and to compare the significant spatial patterns of the variables. Generally, a correlogram contained two or more ranges of positive autocorrelation, and the range at the shorter distance was taken as the spatial correlation distance of the variable. Figure 2 provides the standardized Moran’s I values for the six PAHs. The values for Nap were positive at distances of 350 to 750 m and 1,850 to 2,100 m, which meant that spatial clusters of Nap concentrations existed in these ranges. The values for Nap were negative at distances of 800 to 1,750 m, which implied that spatial outliers existed in this range. Similarly, the five other PAHs showed spatial clusters and spatial outliers in the study area. With the exception of Bgp, the amplitudes of the spatial outliers were larger than those of the spatial clusters, which meant that negative spatial autocorrelation existed in the study area. The autocorrelation distances for Nap, Chr, Bbf&Bkf, Inp, Daa, and Bgp were 750, 850, 1,200, 850, 750, and 1,200 m, respectively.
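Reading a correlation distance from a correlogram can be sketched as follows: the lag distances at which the standardized Moran’s I is positive are grouped into ranges, and the upper bound of the nearest range is taken as the correlation distance. The function names and the correlogram values below are made up for illustration; they only mimic the sign pattern reported for Nap and are not the study’s data.

```python
def positive_ranges(lags, z_values, critical=0.0):
    """Group consecutive lags whose standardized Moran's I exceeds `critical`."""
    ranges = []
    start = None
    last = None
    for lag, z in zip(lags, z_values):
        if z > critical:
            if start is None:
                start = lag
            last = lag
        elif start is not None:
            ranges.append((start, last))
            start = None
    if start is not None:
        ranges.append((start, last))
    return ranges


def correlation_distance(lags, z_values):
    """Upper bound of the nearest range of positive autocorrelation, or None."""
    ranges = positive_ranges(lags, z_values)
    return ranges[0][1] if ranges else None


# Made-up correlogram: positive at 350-750 m, non-positive at 800-1,800 m,
# positive again at 1,850-2,100 m.
lags = list(range(350, 2150, 50))
z_values = [2.5 if lag <= 750 else (-3.0 if lag <= 1750 else (0.0 if lag == 1800 else 2.1))
            for lag in lags]

print(positive_ranges(lags, z_values))
print(correlation_distance(lags, z_values))
```

Setting `critical=1.96` instead of 0 would restrict the ranges to lags with significant positive autocorrelation at the 0.05 level.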

Fig. 2 Standardized spatial correlograms for soil PAHs

Using the Local Moran’s I index to identify hotspots

Hotspots and cool spots in the distribution of soil PAHs can be identified using the Local Moran’s I index. Cool spots are often considered to be clean areas, but soil PAH pollution studies of this particular contaminated site focused on high pollution risk areas, so it was important to identify hotspots if the degree of pollution at the site was to be quantified. The Local Moran’s I index was calculated based on a distance band of 300 m, and the results are shown in Fig. 3. All soil PAHs showed significant spatial clusters and spatial outliers. High–high values indicated hotspots and low–low values indicated cool spots. For Nap, there were nine high–high values, clustered in the mid-bottom part of the site, and seven low–low values. Several low–high and high–low outliers were also identified in the mid-bottom part of the site. For Chr, the numbers of high–high and low–low values were also nine and seven, but the number of low–high spatial outliers increased to 15, and most of these samples were located in the eastern part of the site. Bbf&Bkf and Daa had 12 and six clustered high–high values, respectively. Bgp had the most low–low values of the six PAHs (eight) and Daa had the fewest (five).
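The four categories used above can be sketched in Python by comparing each sample’s deviation from the mean with the average deviation of its neighbours. This is a simplified illustration with hypothetical names and data: it labels every sample and omits the significance test that OpenGeoDa applies before a sample is reported as a cluster or an outlier.

```python
def classify_quadrants(x, C):
    """Label each sample high-high, low-low, high-low or low-high (no significance test)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    labels = []
    for i in range(n):
        total = sum(C[i][j] for j in range(n) if j != i)
        if total == 0:
            labels.append("no neighbours")
            continue
        # Average deviation of the neighbours (spatial lag).
        lag = sum(C[i][j] * dev[j] for j in range(n) if j != i) / total
        if dev[i] > 0 and lag > 0:
            labels.append("high-high")  # hotspot
        elif dev[i] < 0 and lag < 0:
            labels.append("low-low")    # cool spot
        elif dev[i] > 0 and lag < 0:
            labels.append("high-low")   # high outlier among low values
        elif dev[i] < 0 and lag > 0:
            labels.append("low-high")   # low outlier among high values
        else:
            labels.append("undefined")
    return labels


# Hypothetical weights for four samples in a row, each linked to its adjacent samples.
C = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(classify_quadrants([10.0, 9.0, 2.0, 1.0], C))
print(classify_quadrants([1.0, 10.0, 1.0, 1.0], C))
```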

Fig. 3 Local spatial cluster characteristics for soil PAHs

All the PAHs showed four kinds of spatial association (high–high, low–low, high–low, and low–high). Although their distributions differed, pollution hotspots were visible in the mid-bottom part of the site, where soil samples with high PAH concentrations were surrounded by samples with similarly high PAH concentrations. The hotspot distribution was consistent with the contamination that would have been caused by past processing operations at the site. The mid-bottom part of the site was mainly where the production workshops for coking, gas purification, tar products, etc. were situated, and it was expected to be seriously contaminated, so it was to be expected that nearly all of the high–high values would be found there. A relationship can therefore be established between the locations of the pollution hotspots and the influencing factors, based on the occurrence and distribution of processes around the contaminated site. Other areas of the site mainly contained coal preparation areas and gas purification workshops, and these areas did not show serious contamination. Relatively low–low values were found in these areas.

Zhang et al. (2008) used Local Moran’s I and GIS to identify pollution hotspots for Pb in urban soils. They found that different distance bands produced different results and suggested that a number of factors should be considered when choosing a suitable distance band. A distance band of 300 m was used in this study. The distance between samples was 80–200 m. The spatial interpolation of un-sampled areas at a contaminated site is often based on the weighted average of the nearest neighbors, so a distance band of 300 m was chosen to define the weight matrix for this study.
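One practical check when choosing a distance band is how many neighbours each sample receives. The sketch below counts neighbours for several candidate bands on a hypothetical regular grid with 100 m spacing; the actual sampling layout at the site was irregular, so the grid, the candidate bands, and the function name are assumptions made only for illustration.

```python
import math


def neighbour_counts(coords, band):
    """Number of other samples within `band` metres of each sample."""
    counts = []
    for i, p in enumerate(coords):
        counts.append(
            sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= band)
        )
    return counts


# Hypothetical 4 x 4 sampling grid with 100 m spacing.
grid = [(100.0 * col, 100.0 * row) for row in range(4) for col in range(4)]

for band in (50.0, 100.0, 300.0):
    counts = neighbour_counts(grid, band)
    print(band, min(counts), sum(counts) / len(counts))
```

A band shorter than the sample spacing leaves samples without neighbours, so no local statistic can be computed for them, whereas a band that exceeds the spacing gives every sample several neighbours to be compared with.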

Pollution hotspots in large-scale abandoned industrially contaminated sites need to be identified for remediation investigations. Knowledge of the hotspots and of their statistical characteristics is useful for site environmental management. This study showed that the Global Moran’s I and Local Moran’s I indices of spatial autocorrelation are reliable tools for classifying the spatial cluster and spatial outlier characteristics of PAH concentration data sets, and also for identifying hotspots of PAH pollution in contaminated sites. The research has provided important information about hotspot identification for the environmental risk management of large-scale industrially contaminated sites.

Conclusions

Principal component analysis and correlation analysis were applied to 114 samples, in which the total concentrations of 16 PAHs were measured at a contaminated large-scale independent coking chemical industrial site in China. Two significant components were extracted by PCA, which explained 84.52 % of the total variance. The analysis showed that Chr, Bbf&Bkf, Inp, Daa, Bgp, and Nap were representative of the whole site.

The results of the global spatial autocorrelation analysis showed that the distribution of PAHs in soil at the contaminated site was not random but had significant spatial autocorrelation features. The positive spatial correlation ranges for Nap, Chr, Bbf&Bkf, Inp, Daa, and Bgp extended to 750, 850, 1,200, 850, 750, and 1,200 m, respectively.

According to the Local Moran’s I index analysis, different significant spatial structures existed in the PAH distribution at the target site. The pollution hotspots were located in the mid-bottom part of the site, which contained the workshops for coking, gas purification, and tar products and was suspected to be the most seriously contaminated area of the site. Pollution hotspots at this contaminated site need to be identified so that efficient environmental remediation strategies can be developed.