1 Introduction

Soil contains numerous metals from both natural occurrence and anthropogenic activities (Kabata-Pendias and Pendias 2001). They can be essential or beneficial to plants under certain conditions but toxic when available beyond specific thresholds. Furthermore, heavy metals are non-biodegradable and persistent (Adriano 2001). Depending on the soil properties and changing environmental conditions, including land use, agricultural input, and climate change, they can be readily transferred within the soil matrix and into soil–crop systems, leached into groundwater, and eventually accumulated in human bodies via the food web (Facchinelli et al. 2001; Wei and Yang 2010). With continuous exposure to heavy metals, even at very low doses, some elements such as cadmium or lead can cause significant adverse effects on human health (Raghunath et al. 1999; Senesi et al. 1999).

The concentration of heavy metals in arable soil depends primarily on the composition of the geological parent material, but metals can also originate from various other sources such as inorganic fertilizers, manure and pesticide application, sewage irrigation, atmospheric deposition, and industrial activities (Nan et al. 2002; Rattan et al. 2005). Heavy metal contamination of agricultural lands poses a potential threat to safe crop production worldwide, and much effort has been expended to identify its sources (e.g., Chen et al. 2008). However, the contamination sources vary geographically due to differing local environments and development history. Therefore, efforts to identify the sources of heavy metals in arable lands are of great significance for contamination prevention and control in various agro-ecosystems.

Multivariate statistical analysis is a useful technique for identifying common patterns in data distribution, leading to a reduction of the initial dimensions of data sets and facilitating their interpretation (Jolliffe 2002; Castellano et al. 2007). Different multivariate statistical methods have been applied to urban soils and dust (Manta et al. 2002; Lu et al. 2010), sediments (Liu et al. 2003), and agricultural soils (Micó et al. 2006). As one of the most effective multivariate analysis methods, principal component analysis (PCA) is a statistical technique that transforms an original set of variables into a smaller set, which represents the systematic information of the original data set. The results of the PCA are easy to interpret because the resulting scores and loadings can be plotted for visual inspection. A geographical information system (GIS) is a valuable tool for interpreting spatial variability and providing evidence of nonpoint source contamination. With GIS-based mapping techniques, the spatial relationships between environmental data and other land features can be visualized, which, in turn, enables the influence of human activities on heavy metal contents to be spatially highlighted.

The present study was undertaken in a typical farm-scale area of the Sanjiang Plain in Northeast China, an area that has gone through significant farming development during the last several decades, with transformation of wetlands into farmlands (Liu et al. 2004). In the past, contamination from heavy metals was rarely reported in this area, but concern about the accumulated risk of metals in soil gained attention after several decades of intensive tillage with continuously increasing fertilizer input (Table 1). In 2011, topsoils in farmlands in the study area were collected to measure soil pH and the concentrations of organic matter (SOM), total iron (Fe), manganese (Mn), copper (Cu), nickel (Ni), lead (Pb), zinc (Zn), cadmium (Cd), chromium (Cr), and cobalt (Co), the results of which were then analyzed by PCA and GIS interpolation. The main objectives of this study were: (1) to explore this complex dataset and reveal the hidden structures of heavy metals in agricultural soils by analyzing the sources of these metals, and (2) to connect the main factors derived from anthropogenic activities with land use in the study area, highlighting the influence of various farming practices on soil heavy metal accumulation. This work will support strategy and policy making to reduce heavy metal inputs and protect soils from long-term heavy metal accumulation.

Table 1 Fertilizer rates in study area from 1980 to 2010 (pure NPK, kg hm−2)

2 Materials and methods

2.1 Study area

The study focused on a farming area (47°18′–47°50′ N, 133°50′–134°33′ E) located in the Sanjiang Plain, Heilongjiang Province of Northeast China, on the west bank of the lower reaches of the Wusuli River, one of the boundary rivers of China and Russia (Fig. 1). The area covers 1,345 km2 with an elevation between 36 and 345 m. The climate is continental with a mean annual temperature of 2.9 °C (ranging from −19.3 °C in January to 21.6 °C in July). The average annual precipitation is 595 mm (occurring mainly in July and August) with an average annual evaporation of 1,002 mm. It is a rather cold growing area: the average frost-free period is only 138 days and the frozen earth has a mean annual maximum depth of 141 cm.

Fig. 1 Location of study area

According to the world reference base for soil resources (WRB) provided by the International Union of Soil Sciences (IUSS) (WRB, FAO 2006), most of the agricultural soils in the study area have been classified as Albic Luvisol (Luvisols having an albic E horizon), a fertile soil with a stable yield potential and mainly used for arable or forestland. Luvisols extend over 500 to 600 million hectares worldwide and are distributed most commonly in flat or gently sloping land in cool temperate regions, such as west/central Russia, the USA, and central Europe.

2.2 Soil sampling and analysis

Plants take up heavy metals mainly through their roots, and the top 20–30 cm of soil is the layer most affected by anthropogenic activities. Consequently, 149 soil samples of the arable layer (0–20 cm) were taken in the study area in 2011 as shown in Fig. 1. At each sample site, five replicate sub-samples were taken, hand mixed, and sieved in plastic bags.

First, all the soil samples were air-dried at 25 °C, cleared of plant residues, ground to pass through 2-mm nylon sieves, and then stored in plastic bags (Jackson 1979). Soil pH was determined using a pH meter (METTLER TOLEDO, Switzerland) with a soil/water ratio of 1:5. Soil organic matter was measured with a CHN Elemental Analyzer (Euro Vector S.P.A EA3000, Milan, Italy; dry combustion temperature of 900 °C). Prior to the metal analysis, the soil was ground further to pass through a 0.147-mm nylon sieve (100 mesh) and digested with an acid mixture containing HF, HNO3, and HClO4; metal concentrations were then determined by inductively coupled plasma optical emission spectrometry (ICP-OES, IRIS Intrepid II XSP, Thermo Electron, USA).

2.3 Data processes and spatial analysis methods

2.3.1 Analysis methods

Exceptional values can lead to data discontinuity, which would violate the assumptions of geostatistical theory (Liu et al. 2006). In this study, outliers were defined as raw data values \( {X_i} \) for which \( {X_i} > \overline{X}+3\mathrm{SD} \) or \( {X_i} < \overline{X}-3\mathrm{SD} \), where \( \overline{X} \) is the mean and SD the standard deviation of the data set. The outliers were substituted with the maximum or minimum value of the dataset, correspondingly. The mean, maximum, and minimum values, the standard deviation (SD), and the coefficient of variation (CV) were then calculated. The one-sample Kolmogorov–Smirnov test (K-S test) confirmed the normal distribution of all data sets except the set for Ni, which was normally distributed after log transformation. Thus, all statistical analyses were performed on data sets that were normally distributed. To test whether element contents differed significantly between analysis groups, analysis of variance (ANOVA) was applied. SPSS 13.0 was employed in all the descriptive analyses.
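The three-SD outlier rule and the CV calculation can be sketched in Python as follows. This is only an illustration and not the SPSS procedure used in the study: the function names and sample values are hypothetical, the sample SD is assumed, and "maximum or minimum value of the dataset" is read here as the extreme of the remaining non-outlier values.

```python
from statistics import mean, stdev

def replace_outliers(values):
    """Replace values outside mean +/- 3 SD with the nearest retained extreme."""
    m, sd = mean(values), stdev(values)  # sample SD (assumption)
    lower, upper = m - 3 * sd, m + 3 * sd
    # Values inside the interval define the replacement extremes (assumption).
    kept = [v for v in values if lower <= v <= upper]
    lo, hi = min(kept), max(kept)
    return [hi if v > upper else lo if v < lower else v for v in values]

def coefficient_of_variation(values):
    """CV in percent: SD divided by the mean."""
    return 100 * stdev(values) / mean(values)

# Hypothetical concentrations; the last value is an obvious outlier.
raw = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
       10, 11, 9, 10, 12, 8, 10, 11, 9, 100]
cleaned = replace_outliers(raw)
print(cleaned[-1], round(coefficient_of_variation(cleaned), 1))
```

In this toy data set only the last value lies beyond the upper limit, so it alone is replaced by the largest retained value.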

2.3.2 Multivariate and spatial analysis methods

Principal component analysis is a multivariate statistical technique that reduces the original multidimensional space to a new lower dimensional space (principal components, PCs), which is then used to explain relationships and associations among objects and variables (Wold et al. 1987; Jolliffe 2002). Since the original variables showed ambiguities in the component matrix, varimax rotation was applied to clarify the loadings of heavy metals. The PCA was also carried out in SPSS 13.0.
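For readers without SPSS, PCA on the correlation matrix followed by varimax rotation can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the SPSS implementation: the data are synthetic, the function names are hypothetical, and the rotation uses the common SVD-based iteration without Kaiser normalization, so the numbers would not match SPSS output exactly.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Unrotated loadings from the correlation matrix of X (rows = samples)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(eigvals), eigvals

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (SVD-based iteration, no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)
        grad = loadings.T @ (rotated ** 3 - rotated * col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return loadings @ rotation

# Synthetic data: 149 samples, 6 variables driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(149, 2))
mixing = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.0],
                   [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]])
X = latent @ mixing.T + 0.3 * rng.normal(size=(149, 6))

loadings, eigvals = pca_loadings(X, n_components=2)
rotated = varimax(loadings)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of loadings across components becomes easier to read.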

For unsampled areas, Kriging is considered an optimal method of spatial prediction that provides the best linear unbiased estimator for quantities that vary in space (Goovaerts 1999). To explore the spatial distribution of the PCA results, the Kriging method was applied with the software ArcGIS 9.2.
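The idea behind Kriging as a linear unbiased estimator can be illustrated with a small ordinary Kriging sketch in NumPy. The variogram model and its parameters, the coordinates, and the score values below are all hypothetical; the study does not state which variogram model was fitted in ArcGIS, so this is only an illustration of the technique.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, practical_range=10.0):
    """Exponential semivariogram without nugget (assumed model and parameters)."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / practical_range))

def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Estimate the value at `target` as a weighted sum of the sampled values."""
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system: semivariances between samples plus the unbiasedness row/column.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(dists)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # last entry is the Lagrange multiplier
    return float(weights @ values), weights

# Hypothetical sample locations (km) and component scores.
coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0], [2.0, 3.0]])
scores = np.array([0.8, 0.3, -0.2, -0.9, 0.1])
estimate, weights = ordinary_kriging(coords, scores, np.array([2.5, 2.5]))
print(round(estimate, 3), round(float(weights.sum()), 6))
```

The weights sum to one, which is the unbiasedness constraint, and with no nugget the estimator reproduces the observed value at a sampled location.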

3 Results

3.1 Descriptive statistics

Soil pH, SOM concentrations, and the concentrations of nine metal elements in soils of the study area are shown in Table 2. To facilitate evaluation and comparison of the results, the regional background values of soil metals (Heilongjiang Province) and the limiting concentrations derived from the State Environment Protection Administration (SEPA) of China are also presented. Except for Cd, the concentrations of all the metal elements were clearly in the range of their regional average background values. Soil Cd content (mean value of 0.241 mg kg−1) was much higher than its background value (0.086 mg kg−1) but did not exceed the SEPA limiting concentration of 0.30 mg kg−1.

Table 2 Descriptive statistics of soil properties and metal concentrations (mg kg−1; g kg−1 for Fe and Mn, % for SOM)

In order to explore the spatial variation of soil Cd, the distribution map of its concentration was interpolated using the Kriging method and is presented in Fig. 2 with the land use map (year 2009) of the study area. The soil Cd concentration was considerably higher in the southwestern part than in the northeastern part, where land use types differed. Consequently, to explore the variation of soil metals in diverse land use types represented by various crop fields in the study area, mean concentrations of all metals in rice paddies, maize fields, and soybean fields were compared (Table 3). According to the results of the ANOVA, mean soil Fe, Mn, Zn, Cd, and Co concentrations from dry fields (maize and soybean fields) were significantly higher than concentrations in paddy lands (rice fields). This was not the case for the mean concentrations of Cu, Cr, and Ni. Between the different crops in drylands (maize and soybean fields), there was no significant difference.

Fig. 2 Spatial distribution of soil Cd concentration in top soils of study area

Table 3 Mean concentrations of soil metals in different crop fields

3.2 Correlation matrix of soil heavy metals

Interrelationships among heavy metal elements can provide much information about their sources and pathways (Oliva and Espinosa 2007). The correlation matrix shown in Table 4 was used to investigate the dataset. The relationship between two variables can be recognized from the r value. When r > 0.7, 0.4 < r < 0.7, and r < 0.4, the two variables involved are considered to be highly correlated, moderately correlated, and weakly correlated, respectively. In this respect, the pH value was moderately correlated with SOM but showed little correlation with the metals. SOM, however, was moderately correlated with soil Fe, Cu, Pb, and Co. Among the metals, high correlations were found for Fe, Mn, Pb, Cd, and Co. In addition, close relationships of Ni with Cu and Cr were found, with r values of 0.715 and 0.658, respectively. The significance of the relationship among these three elements will become more apparent from further analysis.
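The classification thresholds can be written as a small helper. This is an illustrative sketch: the function name is hypothetical, the absolute value of r is used so that negative correlations are also classified (an assumption), and the boundary values 0.7 and 0.4, which the text leaves undefined, are assigned to the upper class (also an assumption). The three r values are those reported in this paper.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the 0.4 and 0.7 thresholds."""
    strength = abs(r)  # assumption: sign is ignored
    if strength >= 0.7:
        return "high"
    if strength >= 0.4:
        return "moderate"
    return "weak"

# r values reported in the text (Ni-Cu, Ni-Cr) and in Section 4.2.2 (Cu-Zn).
reported = {("Ni", "Cu"): 0.715, ("Ni", "Cr"): 0.658, ("Cu", "Zn"): 0.189}
labels = {pair: correlation_strength(r) for pair, r in reported.items()}
for pair, label in labels.items():
    print(pair, label)
```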

Table 4 Correlation matrix of soil pH, SOM, and metal elements in topsoil

3.3 Principal component analysis

Principal component analysis, the results of which are shown in Table 5, was applied to explore whether there are any hidden structures in the dataset. Three components with eigenvalues higher than 1.00 (before and after rotation) were extracted. The analysis thus reduced the initial dimension of the dataset to three components, which together explain 78.8 % of the data variation. According to the initial (unrotated) component matrix, PC1, PC2, and PC3 explained 40.1, 24.6, and 14.1 % of the total variance, respectively.
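The link between the reported variance shares and the eigenvalue criterion can be checked with a few lines of arithmetic. The sketch assumes that the PCA was run on the correlation matrix of 11 standardized variables (pH, SOM, and the nine metals), in which case each eigenvalue equals the variance share multiplied by the number of variables; the paper does not state this explicitly.

```python
N_VARIABLES = 11  # pH, SOM, and nine metals (assumption: all entered the PCA)

# Unrotated variance shares (%) reported in the text.
variance_share = {"PC1": 40.1, "PC2": 24.6, "PC3": 14.1}

# For a correlation-matrix PCA the eigenvalues sum to the number of variables.
eigenvalues = {pc: share / 100 * N_VARIABLES for pc, share in variance_share.items()}
cumulative = sum(variance_share.values())
retained = [pc for pc, ev in eigenvalues.items() if ev > 1.0]

print({pc: round(ev, 3) for pc, ev in eigenvalues.items()})
print(round(cumulative, 1), retained)
```

Under this assumption all three implied eigenvalues exceed 1.00, which is consistent with the extraction of three components, and the shares add up to the reported 78.8 %.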

Table 5 Total variance explained and factor loadings

The loadings of the components and rotated components were then calculated to examine the dataset structure; these are also presented in Table 5. With the rotated components, the data groups were much clearer, as can easily be seen in the loading plots (Fig. 3). According to the loading values of the three components PC1, PC2, and PC3, Fe, Mn, Pb, Zn, Cd, and Co showed their maximum loading in the first component and were consequently associated with PC1. Similarly, the second component PC2 included Cu, Ni, and Cr. SOM and pH were categorized into the last component PC3.

Fig. 3 Loading plots for three rotated components of PCA analysis

4 Discussion

4.1 Heavy metal accumulations in soils

The geological background level of heavy metals is low in China (Cheng 2003). Located in one of the main crop production areas of China, the study area has less developed local industry than other parts of the country. In addition, the border location of the study area limited the possibility of atmospheric deposition of metals derived from heavy traffic. Therefore, the concentrations of heavy metals in the study area generally indicate low levels of contamination, except for soil Cd. Compared to some industrialized areas of China such as Beijing, Shandong, Guangzhou, and Hangzhou (see Table 2), most of the heavy metals in soils of the study area showed lower concentrations, especially Pb and Zn.

The risk of cadmium contamination in arable soils was revealed in the study area. The maximum concentration of soil Cd (0.36 mg kg−1) exceeded the national standard limit (0.30 mg kg−1). Phosphate fertilizers are well known as the major external source of soil Cd (Mortvedt 1987; Baerug and Singh 1990). Different farming practices and cultivation histories have therefore probably led to the different levels of soil Cd in drylands and paddy lands. The effect of land use on soil metal concentrations is discussed further in the analysis of the sources of soil metals.

4.2 Possible sources of accumulated heavy metals

Three components were extracted from the results of the PCA. Among the components, PC3 represented factors related to soil properties, of which there were only two, pH and SOM. Consequently, focus was placed on PC1 and PC2 to explore possible sources of heavy metals.

4.2.1 PC1 (Fe, Mn, Pb, Zn, Cd, and Co)

Six metals, Fe, Mn, Pb, Zn, Cd, and Co, comprised the first component, PC1, which explained 40.1 % of the total variance. Since Cd accumulation in agricultural soils is caused by long-term use of phosphate (P) fertilizers, the other metals in PC1, which are highly related to Cd, may also be strongly influenced by farming activities.

Of all the sources of heavy metal pollution in agricultural soils, fertilizer and agrochemicals could be the major ones (Huang et al. 2007; Atafar et al. 2010). Commercial phosphate fertilizers, which contain small amounts of heavy metal contaminants that come from their raw material, phosphate rock, have been generally reported to be the major source of heavy metals among all inorganic fertilizers (Mortvedt 1996; Nziguheba and Smolders 2008). Because the farming area has had several decades of intensive tillage, long-term fertilizer application might be a major source of accumulated heavy metals in soils of the study area. Wastewater irrigation, which can be another important source of heavy metal contamination in water-limited and industrialized areas of China (Cheng 2003), cannot be a main source in this study because all water for drylands came from natural rainfall. In addition, water from the Wusuli River (boundary river of China and Russia) and groundwater, both of which were quite clean, were used for rice irrigation. Overall, PC1 can be interpreted as an “anthropogenic” factor, one that is more heavily influenced by human activities (fertilizer application) than is the case for the other components.

In order to explore the spatial characteristics of PC1, a geostatistical method was employed to map the spatial distribution of PC1 scores. The PC1 scores were plotted as contours on the land use map (year 2009) of the study area (Fig. 4). Land use can be one of the main factors that affect the concentration of soil metal elements (Bai et al. 2010). This observation was supported in this study, since all the metal concentrations in the PC1 component were significantly higher in drylands than in paddy lands. The PC1 scores were generally above zero for the soils in dryland areas, and below zero for soils in paddy lands. This can be explained by the fact that since the 1980s much more fertilizer, especially phosphorus fertilizer, was applied on drylands than on paddy fields (see Table 1). Additionally, for mechanized sowing, fertilizing, and harvesting, heavy diesel and petrol machines driving in or around the dry fields introduced the possibility of atmospheric deposition, especially of Pb and Zn from leaded gasoline combustion and zinc dust from vehicle tire wear.

Fig. 4 Contour distribution map of PC1 factor scores

4.2.2 PC2 (Cu, Ni, and Cr)

Cr and Ni, highly correlated with each other in this study, are commonly associated in a number of rock types and, thereby, in soils formed from these rock strata (Spurgeon et al. 2008). In addition, anthropogenic inputs of Cr and Ni in fertilizers, limestone, and manure have lower concentrations than those already present in the soil (Facchinelli et al. 2001). Consequently, Ni and Cr have been found to be present in the parent materials of soils with little temporal and spatial variation in rural soils worldwide (Salonen and Korkka-Niemi 2007; Wu et al. 2010).

Cu and Zn in natural soil are known to exhibit similar geochemical affinities as iron family elements (Spurgeon et al. 2008), but this was not observed in this study area, where their correlation coefficient was only r = 0.189. Cu was highly correlated with Cr and Ni, with all three elements belonging to PC2, while Zn was included in PC1. It can be deduced that soil Zn in the study area was more or less influenced by human activities, but Cu was not. Therefore, geology can be considered the predominant factor affecting Cu, Cr, and Ni concentrations. This PC2 component was a “lithogenic” factor, in which the variability of the metals appears to be controlled by parent rocks.

5 Conclusions

The present study examined the heavy metal contents and their sources in topsoils in a farming area of northeast China. Except for Cd, the mean values of all the heavy metal concentrations were less than the average local background values, and Pb and Zn contents were lower than those of agricultural soils from more industrialized regions. Soil Cd content was almost three times its background value from 20 years ago. According to the PCA, Fe, Mn, Pb, Zn, Cd, and Co were grouped in the first component PC1, which explained 40.1 % of the total variance. This component was interpreted as an “anthropogenic” factor. With little industrial development or sewage irrigation, the exogenous metal elements mostly came from fertilizer and agrochemicals (principally phosphorus fertilizer). Owing to differing fertilizer rates, metal contents were significantly higher in drylands than in paddy lands. Concentrations of Cu, Ni, and Cr were mostly controlled by parent rock materials, leading to the conclusion that these elements comprise a “lithogenic” factor. Generally, the soils in the study area were not contaminated with the heavy metals that were studied, but the risk of Cd accumulation from fertilizers requires further attention and monitoring.