Introduction

Soil contamination, especially of agricultural soil, has become a serious environmental problem because it threatens human health through entry into food chains and threatens environmental security through leaching into groundwater (Romic and Romic 2003; Shen and Chen 2000). To solve the food safety problem, modern agriculture and facility agriculture have been encouraged in densely populated developing countries such as China, and many modern agricultural demonstration zones (MADZs) have been built in China since the 1980s. The rapid industrialization of agriculture has introduced several types of pollutants into soil, including heavy metals (HMs). Among the many kinds of soil pollution, HM contamination has become an important environmental issue because HMs are nonbiodegradable and have long biological half-lives for elimination from the body (Raghunath et al. 1999; Gallego et al. 2002; Cui et al. 2005). Agricultural land is a fundamental resource for agricultural production and provides a livelihood for the majority of people in developing countries. It is therefore essential to identify the sources of pollution and the pathways by which contaminants enter agricultural soil, i.e., to carry out pollution source analysis, in order to formulate policies that control or eliminate pollution and ensure food safety.

Source analysis is one of the major concerns of current environmental research. Multivariate analysis, such as factor analysis (FA) and cluster analysis (CA), has been widely used to assist the interpretation of environmental data (e.g., Tuncer et al. 1993; Einax and Soldt 1999) and to distinguish between natural and anthropogenic inputs (e.g., Jobson 1991; Hopke 1992; Facchinelli et al. 2001; Gallego et al. 2002; Lucho-Constantino et al. 2005; Zhang 2006; Luo et al. 2007). Under the combined influence of natural and anthropic inputs, such as the use of chemical fertilizers and pesticides, the sources of HMs in agricultural soil become more complex. Meanwhile, because soil itself is heterogeneous, concentrations of HMs vary remarkably over space (Luo et al. 2007). Relying on multivariate statistics or geostatistics alone, it is very difficult both to disclose the sources of HMs and to characterize their regional variation. A combination of the two methods provides an appropriate solution and has proved feasible in some studies (Chen et al. 2008; Facchinelli et al. 2001).

In the past decades, many soil pollution surveys of HMs have been carried out at different scales, and many studies have been reported in the scientific literature (Benvenuti et al. 1995; Chen et al. 2008; Cullbard et al. 1988; Li et al. 2004; Lucho-Constantino et al. 2005; Micó et al. 2008; Shi et al. 2007; von Steiger et al. 1996). However, few investigations have been conducted in the modern agricultural zones of developing countries, where the agricultural soil ecosystem usually faces a higher pollution risk because of agricultural industrialization. In this study, we chose a typical MADZ in southeastern China, Haining City, which is surrounded by rapidly industrialized cities such as Shanghai, Hangzhou and Suzhou. Previous studies showed that land use/cover affects the accumulation and spatial distribution of HMs, and that HM sources usually differ among land use/cover types (Bloemen et al. 1995; Kelly et al. 1996; Rui et al. 2008). To remove the effect of land use/cover on HM accumulation, we studied HMs only in the main land use/cover type, paddy fields, which are widely distributed in southeastern China. The objectives of this study were to (1) investigate the total concentrations of HMs in the topsoil of paddy fields, (2) characterize their spatial distribution, and (3) identify their possible sources using chemometric techniques and geostatistics.

Materials

Descriptions of study area

The study area is located in the Hang-Jia-Hu Plain, in the northeastern region of Zhejiang Province, China (Fig. 1). It is bounded by east longitude 120°18′–120°52′ and north latitude 30°15′–30°35′ and has a total area of about 731 km². The area lies in the northern subtropical monsoon zone, with a temperate and humid climate throughout the year and four distinct seasons. The average annual temperature is 15.9°C and the mean annual precipitation is approximately 1,190 mm. Paddy field is the dominant land use/land cover of the arable land, and paddy soil (Gleysols), a kind of anthropogenic soil, is the main soil type, although several other soil types occur in the area.

Fig. 1 The location of the study area and distribution of sampling points

Sampling design and soil analysis

A total of 224 topsoil samples (0–15 cm) were collected from paddy fields in November 2005. Land use uniformity and soil type were taken into account so that all samples were located in paddy fields and every soil type was sampled (Fig. 1). At each site, topsoil from 6–8 points within an area of about 0.1–0.2 ha was collected, thoroughly mixed, and divided into parts of 1–2 kg each. Only one part was bagged and brought back to the laboratory for analysis. All sampling sites were recorded using a hand-held global positioning system (GPS) receiver. All samples were air-dried at room temperature (20–22°C), cleared of stones and other debris, and passed through a 2 mm polyethylene sieve. Portions of the soil samples (about 100 g) were ground in an agate grinder and sieved through a 0.149 mm mesh. The prepared soil samples were then stored in polyethylene bottles for analysis.

Total metal concentration is useful in identifying the pollution source and the potential for contamination (He et al. 1997; Jung 2001). In this study, we analyzed soil pH, soil organic matter (SOM) concentration and the total concentrations of copper (Cu), lead (Pb), zinc (Zn), cadmium (Cd), chromium (Cr), mercury (Hg), cobalt (Co) and arsenic (As). As was also regarded as an HM in this study because it has properties similar to those of HMs. Soil pH was measured with a pH meter at a soil/water ratio of 1:2.5. SOM was determined by wet oxidation at 180°C with a mixture of potassium dichromate and sulfuric acid (Agricultural Chemistry Committee of China 1983). Before the HM concentrations were measured, soil samples were digested with a mixture of nitric acid (HNO3) and perchloric acid (HClO4) (Agricultural Chemistry Committee of China 1983). Total concentrations of Cu, Pb, Cd and Co were measured by inductively coupled plasma mass spectrometry (ICP-MS). The ICP-MS was calibrated using standard solutions with concentrations ranging from 10 to 1,000 μg l−1. Samples with analyte concentrations outside the calibration range were diluted in 1% HNO3. Zn and Cr were analyzed by inductively coupled plasma optical emission spectrometry (ICP-OES). The concentrations of As and Hg in the digested solution were determined by hydride generation atomic fluorescence spectrometry (HG-AFS) and cold vapor atomic fluorescence spectrometry (CV-AFS), respectively.

The validity of the whole analytical procedure was checked using the certified reference materials (CRMs) IAEA-Soil-7 and CRM 277. Analyses of CRMs, replicate samples and blanks were performed after every ten samples and were carried through the entire sample preparation and analytical process. The precision of the measurements, estimated from 22 replicates, was in the range of 2.1–4.3% RSD (relative standard deviation). The average recovery rates of certified elements were between 93 and 105%. The method detection limits (MDLs) were 1 μg g−1 for Cu, As and Co; 2 μg g−1 for Pb and Zn; 5 μg g−1 for Cr; 20 ng g−1 for Cd; and 2 ng g−1 for Hg.
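The two quality-control figures quoted above follow from simple ratios. The sketch below shows one way they could be computed; the function names and the replicate readings are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def rsd_percent(replicates):
    """Relative standard deviation (%) of replicate measurements."""
    return 100.0 * stdev(replicates) / mean(replicates)

def recovery_percent(measured, certified):
    """Recovery (%) of a measured value relative to a certified value."""
    return 100.0 * measured / certified

# Hypothetical replicate Cu readings (mg/kg) and a hypothetical certified value.
cu_replicates = [27.1, 28.0, 27.6, 28.4, 27.3]
print(round(rsd_percent(cu_replicates), 2))
print(round(recovery_percent(10.5, 11.0), 1))
```

A recovery between 93 and 105%, as reported here, means the measured value lies within a few percent of the certified one.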

Methods

Data transformation

In multivariate statistics and linear geostatistics, a normal distribution of the variables under study is desirable (Webster and Oliver 2001; Gallego et al. 2002; Zhang and McGrath 2004). To avoid distorted results and low levels of significance, data transformation was performed on all measured values. Among the many data transformation methods, the logarithmic transformation is widely applied (Webster and Oliver 2001). However, environmental variables do not always follow a lognormal distribution (Zhang and McGrath 2004). In our study, the Box–Cox transformation was used to bring the data closer to normality and reduce skewness. The theory and procedure of the Box–Cox transformation are described in detail by Box and Cox (1964).
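As an illustration of the transformation, the following minimal sketch applies y = (x^λ − 1)/λ (or ln x when λ = 0) and chooses λ by a grid search over the profile log-likelihood. The grid bounds, function names and Hg-like sample values are assumptions made for the example; the software used in the study for this step is not stated, and library routines such as scipy.stats.boxcox would normally be preferred.

```python
import math

def boxcox(x, lam):
    """Box-Cox transform of positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def profile_loglik(x, lam):
    """Profile log-likelihood of lambda, assuming normal transformed data."""
    y = boxcox(x, lam)
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1.0) * sum(math.log(v) for v in x)

def best_lambda(x, lo=-2.0, hi=2.0, steps=401):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_loglik(x, lam))

def skewness(x):
    """Moment-based sample skewness."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

# Hypothetical, positively skewed concentrations (mg/kg).
hg = [0.08, 0.10, 0.11, 0.12, 0.13, 0.15, 0.17, 0.21, 0.28, 0.45]
lam = best_lambda(hg)
print(lam, skewness(hg), skewness(boxcox(hg, lam)))
```

Comparing the skewness before and after the transformation mirrors the check reported later in Table 2.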

Geostatistical analysis

Geostatistical estimation allows one to predict values at unsampled locations by taking into account the spatial correlation between estimated and sampled points (i.e., spatial variability) while minimizing the variance of the estimation error (Saito et al. 2005). Many studies have used geostatistics to describe the spatial distribution of pollutants in contaminated soils (Atteia et al. 1994; Arrouays et al. 1996; Carlon et al. 2001; Goovaerts 1997; Meuli et al. 1998). The semivariogram, the main component of kriging, is an effective tool for evaluating spatial variability (Boyer et al. 1991; Cahn et al. 1994); it provides a clear description of the spatial structure of a variable and some insight into the processes that may affect its distribution (Paz González et al. 2001; Wang and Tao 1998; Webster and Oliver 1990). In this study, no anisotropy was found in the variograms, so each isotropic semivariogram was fitted with spherical, exponential, Gaussian and linear models, and the best-fitting model was used for kriging interpolation. Ordinary kriging was chosen to create the spatial distribution maps of HMs after data transformation, using the nearest 16 sampling points and a maximum search distance equal to the range of the variable. For a more technical description of kriging and the semivariogram, see Cressie (1993) or Webster and Oliver (2001). All geostatistical computations were performed with GS+ Geostatistics for the Environmental Sciences, version 7.0.
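To make the semivariogram concrete, the sketch below computes an isotropic experimental semivariogram, γ(h) = Σ(z_i − z_j)² / (2N(h)), and defines two of the candidate models. It is not the GS+ implementation; the function names, the lag binning and the exponential-model parameterization (packages differ in how they define its range) are assumptions.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (lag centre, semivariance) pairs for non-empty distance bins."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(points)), 2):
        h = math.dist(points[i], points[j])
        k = int(h // lag_width)
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [(lag_width * (k + 0.5), sums[k] / (2 * counts[k]))
            for k in range(n_lags) if counts[k]]

def spherical(h, nugget, psill, a):
    """Spherical model; reaches the sill (nugget + psill) at range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + psill
    return nugget + psill * (1.5 * h / a - 0.5 * (h / a) ** 3)

def exponential(h, nugget, psill, a):
    """Exponential model; approaches the sill asymptotically."""
    if h <= 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / a))

# Hypothetical coordinates (m) and transformed concentrations.
pts = [(0, 0), (1000, 0), (2000, 0), (0, 1500), (1500, 1500)]
z = [1.2, 1.4, 1.9, 1.1, 1.6]
print(empirical_semivariogram(pts, z, lag_width=1000, n_lags=3))
```

Fitting then amounts to choosing the nugget, partial sill and range that bring a model curve closest to the experimental points.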

Multivariate analysis

Multivariate statistical solutions are mathematical hypotheses, and their interpretation requires environmental knowledge. These techniques have been widely used to assist the interpretation of environmental data and to distinguish between natural and anthropogenic inputs of HMs (e.g., Dudka 1992; Einax and Soldt 1999; Lucho-Constantino et al. 2005). In this study, multivariate analysis (correlation analysis, FA and CA) was used to group HMs with the same or similar sources.

Inter-element relationships can provide useful information on HM sources and pathways (Manta et al. 2002). In the study area, correlation analysis was applied to identify the relationships between the total concentrations of HMs and soil properties (soil pH and SOM). Correlation analysis was also performed among the total concentrations of HMs to check the relationships between them. The Spearman nonparametric correlation coefficient was used because most of the variables in this study were positively skewed. FA is a useful statistical tool that can extract latent information from multidimensional data and group the measured elements into fewer groups (Gallego et al. 2002). The original data matrix is decomposed into the product of a matrix of factor loadings and a matrix of factor scores plus a residual (Kowalkowski et al. 2006). In our study, the common factors were extracted by the principal components method so as to reproduce the observed correlation matrix. Furthermore, to facilitate the interpretation of results, varimax rotation was applied because this orthogonal rotation minimizes the number of variables with a high loading on each factor (Gallego et al. 2002). CA is often coupled with FA to confirm results and provide a grouping of variables (Facchinelli et al. 2001). In our study, CA was performed according to the between-groups linkage method. Results are shown in a dendrogram, where the steps in the hierarchical clustering solution and the distances between clusters are based on correlation coefficients (Pearson coefficient). The dendrogram was built by average linkage clustering, which calculates the dissimilarity between clusters from cluster average values. Multivariate analysis was performed using SPSS R13.0 for Windows.
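The factor extraction and rotation steps can be sketched with NumPy as follows. This is an illustrative reconstruction, not the SPSS procedure: the synthetic data, the function names and the use of raw varimax without Kaiser normalization are all assumptions.

```python
import numpy as np

def pca_loadings(data, n_factors):
    """Principal-component loadings from the correlation matrix of the columns."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = eigvals / eigvals.sum()        # share of total variance
    return loadings, explained

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (raw, without Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1.0 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

# Synthetic data: 224 samples, 6 variables driven by 2 hypothetical latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(224, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.85, 0.0],
                   [0.1, 0.8], [0.0, 0.9], [0.5, 0.5]])
data = latent @ mixing.T + 0.3 * rng.normal(size=(224, 6))

loadings, explained = pca_loadings(data, n_factors=2)
rotated = varimax(loadings)
print(np.round(rotated, 2))
print(np.round(explained[:2], 3))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors becomes easier to read.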

Results and discussion

Descriptive statistics

The soil pH in the area ranged from 4.44 to 8.30 with a mean of 6.57, and the SOM ranged from 5.7 to 45.9 g kg−1 with a mean of 22.43 g kg−1. A descriptive summary of soil HMs (Cu, Pb, Zn, Cd, Cr, Hg, As and Co) is given in Table 1. The concentrations of all elements spanned wide ranges, suggesting that extrinsic factors affect these element concentrations in the topsoil of paddy fields. The mean concentrations of Cu, Pb, Zn, Cr and Hg were 27.87, 29.59, 87.30, 77.60 and 0.17 mg kg−1, respectively, higher than their background values at the Zhejiang Province scale (Zhejiang Soil Survey Office 1994; Cheng et al. 2006). The mean Cr (87.30 mg kg−1) and Hg (0.17 mg kg−1) concentrations of paddy fields in the study area were also significantly higher than the mean Cr (67.29 mg kg−1) and Hg (0.13 mg kg−1) concentrations of agricultural soils in Zhejiang Province (Cheng et al. 2007), suggesting that Cr and Hg enrichment in the study area was more serious than in other areas of Zhejiang Province. In the study area, the Cu, Zn and Cr concentrations exceeded their local background concentrations in more than 85% of samples, and their mean concentrations were significantly higher than the local background concentrations, showing that Cu, Zn and Cr were moderately to highly enriched over most of the area; the Cd, As and Co concentrations were lower than their local background concentrations in more than 65% of samples, showing that Cd, As and Co were enriched in only a minority of the area; the Pb concentration exceeded its local background concentration in more than 90% of samples, while its mean concentration was only slightly higher than the background, showing that Pb was slightly enriched over most of the area; and the Hg concentration exceeded its background value in about half of the samples, showing that the topsoil was partially enriched in Hg. The enrichment of Pb is probably due to vehicle exhaust emissions, as the number of vehicles has increased sharply in recent years.

The coefficient of variation (CV) of the metals was calculated as an indicator of heterogeneity; it varied from 11.7% for Pb to 49.8% for Hg. For most metals, the CV exceeded 20%, indicating considerable variability. The high concentrations of Hg coupled with its high coefficient of variation suggest anthropogenic sources for Hg (Manta et al. 2002).

Table 1 Summary statistics for the eight heavy metal concentrations in paddy soils (mg kg−1) (n = 224)

Compared with the enriched elements in topsoil, the mean concentrations of Cd, As and Co were generally low and close to their local background concentrations. The mean Cd concentration (0.14 mg kg−1) of paddy fields in the study area was significantly lower than that (0.20 mg kg−1) of agricultural soils in Zhejiang Province (Cheng et al. 2007). There were 38 (17.0%) soil samples for Cd, 78 (34.8%) for As and 73 (32.6%) for Co that exceeded the background values at the Zhejiang Province scale. This indicates that Cd, As and Co were not enriched in the topsoil over most of the area and that natural factors play an important role in controlling their distribution.
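The exceedance percentages quoted above are simple shares of the 224 samples. The short check below reproduces them from the reported counts; the variable names are illustrative.

```python
# Counts of samples above the provincial background values, as reported in the text.
n_samples = 224
exceeding = {"Cd": 38, "As": 78, "Co": 73}

# Share of samples (in %) exceeding the background, rounded to one decimal.
shares = {metal: round(100.0 * count / n_samples, 1)
          for metal, count in exceeding.items()}
print(shares)
```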

Data transformation

Prior to multivariate and spatial analysis, the normality of all element concentrations was checked. The skewness, kurtosis and significance level of the Shapiro–Wilk test for normality (S–W p) are shown in Table 2. Only Cd and Co passed the Shapiro–Wilk normality test (S–W p > 0.05) before data transformation, whereas the other variables were all strongly skewed, with skewness values greater than 0. Their kurtosis values were also high because the majority of samples were clustered at relatively low values. Compared with the log-transformation, the Box–Cox transformation significantly reduced the skewness of the data sets, pushing it toward 0, although four transformed variables still did not pass the Shapiro–Wilk normality test.

Table 2 Skewness, kurtosis, and significance level of Shapiro–Wilk’s test for normality (S–W p) of the raw, log-transformed (Log), and Box–Cox transformed (Box–Cox) data sets of eight soil heavy metals (n = 224)

Geostatistical analysis

Spatial structure analysis

Soil HMs have spatial structure, including spatial autocorrelation. In this study, geostatistics was used to analyze this spatial structure. The semivariograms and fitted models for the eight metals are presented in Fig. 2, and the semivariogram parameters for each element are summarized in Table 3. The semivariograms of soil Cu, Pb, Zn, Cd and Cr were best fitted by an exponential model, whereas those of soil Hg, As and Co were best fitted by a spherical model.

Fig. 2 Experimental semivariograms of soil heavy metals with fitted models

Table 3 Best-fitted semivariogram models of soil heavy metals and their parameters

The Nugget/Sill ratio was used as a criterion to classify the spatial dependence of HMs. Ratios lower than 25% and higher than 75% correspond to strong and weak spatial dependence, respectively, while ratios between 25 and 75% correspond to moderate spatial dependence (Cambardella et al. 1994). Usually, strong spatial dependence of soil properties can be attributed to intrinsic factors, and weak spatial dependence to extrinsic factors (Cambardella et al. 1994). The Nugget/Sill ratios of all models were between 25 and 75%, except for the Pb model, whose ratio was lower than 25%. This indicates that all metals had moderate spatial dependence except Pb, which had strong spatial dependence, and that the spatial dependence of Pb in the study area may be attributed to natural factors.
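The classification rule of Cambardella et al. (1994) described above can be written as a small function. The function name and the treatment of the exact boundary values are assumptions; the Pb ratio of 0.073 is the value quoted in the text.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget/sill ratio (in %)."""
    ratio = 100.0 * nugget / sill
    if ratio < 25.0:
        return "strong"
    if ratio <= 75.0:
        return "moderate"
    return "weak"

# Pb: a nugget/sill ratio of 0.073 corresponds to 7.3%.
print(spatial_dependence(0.073, 1.0))
```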

The range is a measure of the distance over which autocorrelation exists (Li et al. 2007; Webster and Oliver 1990). The ranges of spatial autocorrelation varied greatly from metal to metal. The variability of Pb was dominated by relatively short-range spatial correlation (3,270 m), whereas the spatial correlation structures of the other elements had long ranges (16,500–60,090 m). The short range of Pb suggests that anthropogenic factors affect the Pb distribution; however, its low Nugget/Sill ratio (0.073) and low CV (11.7%) suggest that natural factors also affect it. It is therefore reasonable to conclude that Pb in the topsoil of paddy fields was controlled by both natural and anthropic factors. Comparing the ranges of the eight elements, Cd had a longer effective range than Cu, Pb, Zn, Cr, As and Hg, indicating that Cd has better spatial structure and less variation caused by extrinsic factors. According to the descriptive statistics, the Cd concentration of most samples was lower than the local background concentration, suggesting that natural factors play an important role in controlling the Cd distribution.

Spatial distribution

To reveal the distribution patterns of the eight elements, kriging interpolation was used to produce filled contour maps (Fig. 3). The maps of six elements (Cu, Zn, Cd, Cr, As and Co) showed similar trends, with high concentrations in the northeast and low concentrations in the southwest. According to the maps, the Hg concentration over most of the area was lower than the local background value, and only the southwest had high Hg, suggesting that the main pollution source in the southwest differed from that in the rest of the area. The southwest of the study area adjoins the traditional industrial area of Xiaoshan City, so industrial fumes and atmospheric deposition could also be an important source of Hg enrichment there, as reported in previous studies (Wang et al. 2003; Mukherjee and Zevenhoven 2006). The map of Pb showed local spatial variability, and its distribution pattern was more irregular than those of the other elements, suggesting that anthropogenic factors play an important role in the Pb distribution.
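The interpolation behind such maps can be illustrated with a compact ordinary-kriging sketch. It solves the kriging system built from a semivariogram model for the nearest neighbours of one target location. It is not the GS+ routine used in the study: the function names, coordinates, values and model parameters are hypothetical, the search-radius limit is omitted, and no back-transformation of Box–Cox values is shown.

```python
import numpy as np

def exponential(h, nugget, psill, a):
    """Exponential semivariogram; gamma(0) is defined as 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / a)), 0.0)

def ordinary_kriging(xy, z, target, model, n_neighbors=16):
    """Return (estimate, kriging variance) at one target location."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    d0 = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
    idx = np.argsort(d0)[:n_neighbors]          # nearest sampling points
    pts, vals, d0 = xy[idx], z[idx], d0[idx]
    n = len(idx)
    # Kriging system: semivariances between samples plus the unbiasedness row.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = model(d0)
    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    return float(weights @ vals), float(weights @ b[:n] + mu)

# Hypothetical sampling points (m) and transformed concentrations.
pts = [(0, 0), (1000, 0), (0, 1000), (1000, 1000), (500, 500)]
vals = [1.0, 2.0, 3.0, 4.0, 2.5]
model = lambda h: exponential(h, nugget=0.1, psill=0.9, a=800.0)
print(ordinary_kriging(pts, vals, (300, 700), model))
```

Repeating the call over a regular grid of targets yields the values from which a filled contour map is drawn.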

Fig. 3 The prediction maps of soil heavy metals

Multivariate statistical analysis

Correlation analysis

Spearman nonparametric correlation coefficients were calculated between the eight elements and related soil properties (pH and SOM), and the results are shown in Table 4. SOM correlated well with Cu, Pb, Zn, Cr and As, indicating strong adsorption of these elements by organic matter. The concentrations of all elements except Cd, Hg and Co correlated significantly with pH; however, the correlation coefficients were relatively low, indicating that the effect of pH on the distribution of trace metals was limited. Strong correlations were found among Cu, Zn and Cr, indicating that the main sources of these elements were the same or similar. In contrast, Hg was relatively poorly correlated with Cu, Zn and Cr, indicating that the main source of Hg differed from that of Cu, Zn and Cr.
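For readers unfamiliar with the coefficient, Spearman's correlation is the Pearson correlation of the ranks, with tied values given their average rank. The following sketch uses only the standard library; the function names and sample values are hypothetical.

```python
import math

def ranks(x):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = average
        i = j + 1
    return r

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical SOM (g/kg) and Cu (mg/kg) values for five samples.
som = [12.0, 18.5, 22.4, 30.1, 41.0]
cu = [19.0, 24.5, 23.8, 31.2, 36.0]
print(round(spearman(som, cu), 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to the positive skew noted for most variables.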

Table 4 The nonparametric correlations between the contents of eight heavy metals and two selected soil properties (n = 224)

Factor analysis

To reduce the dimensionality of the variable space and better understand the relationships among HMs, FA was applied to the transformed data matrix. The results of FA for HM concentrations are presented in Tables 5 and 6. According to the initial eigenvalues in Table 5, two factors were extracted from the data set, which together account for over 76% of the total variation of the soil elements (Cu, Pb, Zn, Cd, Cr, Hg, As and Co). In Table 6, the rotated component matrix indicates that Cu, Zn, Cr, As and Co were strongly associated with the first component (F1), with similarly high absolute loadings. Soil Cd was also associated with the first component, with a moderate absolute loading. Pb and Hg loaded highly on the second component (F2).

Table 5 Total variance explained for heavy metal concentrations by principal component analysis
Table 6 Component matrix and rotated component matrix for heavy metal concentrations

F1 accounted for 57.8% of the total variance and was dominated by Cu, Zn, Cr, As, Co and Cd. Cu, Zn and Cr were strongly positively associated with F1, whereas As, Co and Cd were negatively associated with it, suggesting that Cu, Zn and Cr may share one source and As, Co and Cd another, different source. The mean concentrations of Cu, Zn and Cr were high, with more than 85% of samples exceeding the local background concentrations, suggesting that anthropic factors control their distribution; the mean concentrations of As, Co and Cd were generally low, lower than their local background concentrations, suggesting that natural factors control their distribution. F2 explained 18.2% of the total variance and was mainly attributed to Pb and Hg. The mean concentrations of Pb and Hg in the study area were slightly higher than the local background values, suggesting that both anthropic and natural factors control their distribution.

Cluster analysis

To distinguish groups of HMs as tracers of natural or anthropic sources, CA was performed on the eight elements. The results are illustrated by the dendrogram (Fig. 4), which identifies two main groups that describe the complex reality of the study area, distinguishing Cu, Zn and Cr (Group I) from As, Co, Cd, Hg and Pb (Group II).
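The between-groups (average) linkage procedure can be sketched in a few lines. The example below clusters four metals from a hypothetical correlation matrix, using 1 − r as the dissimilarity; that conversion, the function name and the correlation values are assumptions made for illustration and do not reproduce the SPSS output.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering; returns the merge history (a, b, distance)."""
    index = {label: i for i, label in enumerate(labels)}
    clusters = [(label,) for label in labels]

    def between(a, b):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        pairs = [(between(a, b), a, b)
                 for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        d_min, a, b = min(pairs, key=lambda item: item[0])
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d_min))
    return merges

# Hypothetical Pearson correlations among four metals.
labels = ["Cu", "Zn", "As", "Co"]
corr = [[1.00, 0.80, 0.10, 0.20],
        [0.80, 1.00, 0.15, 0.10],
        [0.10, 0.15, 1.00, 0.70],
        [0.20, 0.10, 0.70, 1.00]]
dist = [[1.0 - r for r in row] for row in corr]

merges = average_linkage(labels, dist)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

The merge distances are what a dendrogram plots on its distance axis; late merges at large distances separate the main groups.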

Fig. 4 Dendrogram of the cluster analysis of soil heavy metals (between-groups linkage method)

Group I consisted of the elements that were strongly positively associated with F1 and appear to be controlled by anthropic factors. Several studies have shown that soil acts as a long-term sink for Cu and Zn when it is fertilized with manure and chemical fertilizers (L’Herroux et al. 1997; Chang and Page 2000; Nicholson et al. 2003). Overuse or misuse of fertilizers and manure was widespread in the area, and these elements were enriched in the topsoil over most of the area. This indicates that fertilization may be the main pollution source of Cu, Zn and Cr.

In Group II, Hg and Pb were the notable contaminant elements, and their origins were not entirely the same as those of Cd, As and Co, so their cluster distances from the other three elements (Cd, As and Co) were the largest. It is therefore reasonable to divide Group II into two subgroups for pollution source identification: the first subgroup (Group II1) comprising As, Co and Cd and the second subgroup (Group II2) comprising Hg and Pb. Group II1 consisted of the elements that were negatively associated with F1, in agreement with the factor analysis results. Group II2 consisted of the elements associated with F2; they were poorly correlated with the metals of Group II1 and Group I, suggesting that the main source of Pb and Hg differed from those of Group II1 (natural factors) and Group I (anthropic factors) and that natural and anthropic factors jointly control the Pb and Hg distributions. This was also in agreement with the factor analysis results. However, multivariate statistical methods can only discriminate between natural and anthropic sources of HMs; they cannot quantify or pinpoint the specific sources.

Based on the above analysis, it is reasonable to conclude that Cu, Zn and Cr in the topsoil of paddy fields are mainly controlled by anthropic inputs, that Hg and Pb are controlled by both anthropic inputs and parent materials, and that As, Co and Cd appear to be mainly associated with parent materials.

Conclusions

In the study area, Cu, Zn and Cr were moderately to highly enriched over most of the area, Pb was slightly enriched over most of the area, Hg was enriched in part of the area, and Cd, As and Co were enriched in only a minority of the area. Cu, Zn, Cd, Cr, As, Hg and Co had moderate spatial dependence, whereas Pb had strong spatial dependence. Cu, Zn, Cd, Cr, As and Co in the topsoil of paddy fields showed similar distribution trends, with high concentrations in the northeast and low concentrations in the southwest; Hg showed high concentrations in the southwest and lower concentrations elsewhere; and Pb showed lower concentrations in the middle of the area and high concentrations elsewhere.

The degree of enrichment of the eight HMs in the topsoil of paddy fields differed across the study area because the influence of anthropogenic activity on each of them differed. The chemometric and geostatistical analyses indicated that Cu, Zn and Cr were mainly controlled by anthropic factors, Cd, As and Co by natural factors, and Pb and Hg by both natural and anthropic factors.

The survey data show Cu, Zn and Cr enrichment in over 85% of the samples from the study area, indicating that agricultural cultivation practices in these areas need to be modified. The spatial map showed that the southwest had high Hg, indicating that Hg enrichment needs to be controlled in this area.