Introduction

Heavy metal accumulation in soil is a threat to ecosystems (Chen et al. 2018; Mishra et al. 2019; Shi et al. 2018). The reduction of soil quality (SQ) and the soil degradation caused by excessive concentrations of heavy metals in soil have been well studied (e.g. Khalid et al. 2017; Wang et al. 2018; Zhang et al. 2018). High concentrations of heavy metals in soil are a global concern because of the detrimental and toxic effects of these elements (Herath et al. 2017; Timofeev et al. 2019). Heavy metals in soil, water, air and sediment can originate from natural (geogenic) or anthropogenic sources (Alvarez et al. 2017), and the geological composition of the Earth’s surface can affect their concentrations (Liu et al. 2015).

Volcanic eruptions, weathering and erosion of mineral deposits, and pedogenic processes are the main geogenic factors causing heavy metal pollution (HMP) in soil (Alvarez et al. 2017; Guan et al. 2018; Hu et al. 2018; Yalcin et al. 2007). Industrial, agricultural and urban activities are the three major anthropogenic sources of intensive soil pollution (SP). Smelters, industrial complexes, mining, traffic and cities are the most frequently reported sources of HMP in soil and the environment (Cao et al. 2010; Hou et al. 2013; Jamal et al. 2018). Distinguishing anthropogenic from geogenic contributions to soil heavy metals is complicated, because the composition of parent materials and anthropogenic activity can both affect the accumulation of heavy metals in a region. Multivariate analyses such as principal component analysis (PCA), together with evaluation of the variation of elements, are practical methods for determining the sources of heavy metals (e.g. Cunha et al. 2019; Ding et al. 2017; Esteki et al. 2017; Rodríguez et al. 2008). Soil properties such as organic matter content, pH and clay content can affect the accumulation of heavy metals in soil (Fernández et al. 2018). Some of these properties are controlled by anthropogenic activity, so correlation analysis of heavy metals with soil attributes can also help to verify the source of HMP.

SP is quantified from laboratory measurements and the calculation of soil pollution indices. Commonly used indices for assessing SP include the integrated pollution index (IPI), potential ecological risk index (RI), hazard index, geo-accumulation index and carcinogenic risk (Jiang et al. 2018; Khalid et al. 2017; Singh et al. 2018; Wu et al. 2018). All conventional pollution indices are calculated with respect to specific parameters, such as the geochemical background concentration of heavy metals, a pre-industrial reference, specific toxic-response coefficients or the daily intake of elements into the human body (Kowalska et al. 2018; Sun et al. 2019), and they can be unreliable when these parameters are not appropriately determined (Mazurek et al. 2017). In addition, determining these parameters for all pollutant elements usually requires considerable time and cost. Accordingly, a quick and reliable framework for indexing HMP based on multivariate analysis could provide a practical tool for monitoring and controlling the level of heavy metals in soil.

Accurate data collection is considered the most important step for indexing and monitoring soil threats (Askari and Holden 2014). An appropriate minimum data set (MDS) provides rapid and precise data for a cost-effective assessment of SP (Li et al. 2019; Raiesi 2017). PCA is commonly used to determine an MDS in soil research (Askari and Holden 2015). Discriminant analysis (DA) also has the potential to identify essential variables (Nosrati 2013). DA has mostly been used to discriminate soil variables among land uses (e.g. Hamidi Nehrani et al. 2020; Nosrati 2013), while its efficiency for assessing the pollution risk of soil heavy metals has not been investigated. Evaluating the efficiency of multivariate techniques for identifying the most critical pollutant elements is therefore required for a reliable assessment of HMP in soil.

Zanjan Province in northwestern Iran is an important region for agricultural production. Several studies have revealed considerable HMP in Zanjan’s soil that can pose a major threat to the health of humans and animals (e.g. Naderi et al. 2017; Zamani et al. 2015). Zn, Cd, Pb, Ni, Co, Cu and Cr are commonly reported as causes of SP in this province (e.g. Jamal et al. 2018; Maleki et al. 2014; Naderi et al. 2017; Zamani et al. 2015). A flexible framework for indexing SQ was employed by Askari and Holden (2014, 2015) and Masto et al. (2008), and this research evaluated the potential to deploy that framework to index the threat of HMP in soil. In this framework, SP is quantified by selecting and interpreting soil variables and integrating them into a single index using linear and non-linear functions (Askari and Holden 2015). To the best of our knowledge, the ability of this framework to quantify HMP in polluted land, such as Zanjan Province, has not been investigated. Moreover, despite considerable literature on the use of multivariate analysis, especially PCA, no study has compared the efficiency of PCA and DA for identifying critical heavy metals. In this study, the efficiency of multivariate analysis was evaluated for indexing SP and for identifying the geogenic and anthropogenic sources of heavy metals. The objectives were (i) to select the better multivariate technique (DA or PCA) for identifying the critical pollutant elements as an MDS, (ii) to evaluate the capability of multivariate analysis for determining the potential source of heavy metals (anthropogenic or geogenic) in the soil and (iii) to select the best scoring and integrating methods for indexing SP.

Materials and methods

Site characterization and experimental design

The study was conducted in Zanjan Province in northwestern Iran, between 36° 20′ N and 36° 45′ N latitude and 48° 19′ E and 48° 54′ E longitude (Fig. 1; ca. 2000 km²). The average annual precipitation is 310 to 360 mm, and the mean daily temperature is 15.7 °C. The soils of the study area are classified as Inceptisols according to USDA Soil Taxonomy (Soil Survey Staff 2014). Agriculture (irrigated and rainfed farming), rangeland and urban areas are the major land uses in Zanjan Province, so soil samples were collected under these three land uses. Sampling was carried out on two grids, at 3 km intervals for agricultural land and rangeland and at 1.5 km intervals in the urban area; the wider spacing reflects the larger extent of the first two land uses (Fig. 1). A total of 241 samples were collected from a depth of 0–10 cm: 137 under agriculture, 77 under rangeland and 27 in the urban area.

Fig. 1

Location of study area and sampling points

Laboratory analysis

Among the potential pollutant elements in soil, Cd, Zn, Pb, Ni, Cr, Co and Cu have been reported in the study area (Eslami et al. 2007; Khodadadi et al. 2013; Parizanganeh et al. 2012; Zamani et al. 2015). The concentrations of these seven elements were therefore determined according to USEPA Method 3050B (USEPA 1996). Soil samples were air-dried and sieved prior to chemical analysis. An atomic absorption spectrometer (Perkin-Elmer AA 200) was used to determine Zn, Pb, Co, Cu, Cr and Ni, and a Rayleigh WF-1E graphite furnace atomic absorption spectrometer was used to determine Cd. Soil organic carbon (SOC) was determined by the Walkley–Black wet dichromate oxidation method (Nelson and Sommers 1996), and particle size distribution by the hydrometer method (Gee and Or 2002). An electrical conductivity (EC) meter and a pH meter were used to determine soil EC (Rhoades 1996; Amanifar et al. 2019) and soil pH (Thomas 1996), respectively. Cation exchange capacity (CEC) was measured by the ammonium acetate method (Chapman 1965). All results are the means of three replicates.

Critical pollutant elements

The efficiency of PCA was compared with that of DA for identifying a minimum data set for assessing SP. PCA was performed on standardized values of heavy metals to determine the high pollutant data set (HPDS). Principal components (PCs) with eigenvalues greater than one were retained to determine the critical elements (Askari and Holden 2014). The interpretability of the components was increased by Varimax rotation (Govaerts et al. 2006), an orthogonal rotation that maximizes the variance of the squared loadings within each component, so that each variable loads strongly on as few PCs as possible (Forina et al. 1989; Hamidi Nehrani et al. 2020). On each retained PC, elements whose absolute loadings were within 10% of the highest loading were selected (Rezaei et al. 2006). When selected elements on the same PC were highly correlated, the element with the lower loading was eliminated as redundant (Hamidi Nehrani et al. 2020; Raiesi 2017), and the remaining elements formed the first HPDS (HPDS-1).
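The selection rule above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the SPSS workflow used in the study: the function names are hypothetical, Varimax is the standard SVD-based iteration applied without Kaiser normalization (which SPSS applies by default), and the redundancy threshold `r_max = 0.7` is inferred from the Cr–Ni and Pb–Zn decisions reported in the results rather than stated in the methods.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonal Varimax rotation of a p x k loading matrix (SVD iteration)."""
    p, k = loadings.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        # Gradient of the Varimax criterion (variance of squared loadings per PC)
        grad = loadings.T @ (lam**3 - (gamma / p) * lam @ np.diag(np.sum(lam**2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        crit_old, crit = crit, np.sum(s)
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
    return loadings @ rot


def select_hpds_pca(X, names, window=0.10, r_max=0.7):
    """Hypothetical helper: pick high-loading, non-redundant elements per retained PC."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # standardize the metals
    R = np.corrcoef(Z, rowvar=False)                        # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]                        # sort PCs by eigenvalue
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0                                     # eigenvalue > 1 criterion
    rotated = varimax(eigvec[:, keep] * np.sqrt(eigval[keep]))

    selected = []
    for j in range(rotated.shape[1]):
        a = np.abs(rotated[:, j])
        # Candidates: loadings within `window` (10%) of the highest loading on this PC
        candidates = [i for i in np.argsort(a)[::-1] if a[i] >= (1.0 - window) * a.max()]
        chosen = []
        for i in candidates:                                # highest loading first
            if all(abs(R[i, c]) <= r_max for c in chosen):  # drop redundant elements
                chosen.append(i)
        selected += [names[i] for i in chosen if names[i] not in selected]
    return selected
```

Applied to a samples × elements matrix of Cd, Zn, Pb, Ni, Cr, Co and Cu concentrations, the function returns one or more elements per retained component; which elements are returned depends entirely on the data.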

To perform DA on the measured heavy metals, soil samples were classified into three levels of pollution risk (low, moderate and high) according to the critical limits for heavy metals (Table 1) suggested by the Department of Environment in Iran (DEIRI 2013). The pollution risk levels were used as grouping categories, and the measured heavy metals were the independent variables. DA was performed with a quadratic discriminant function and within-group covariance matrices in SPSS v. 21.0 (SPSS Inc.). A stepwise approach was applied to the normalized values of the elements to identify the variables discriminating among pollution risk levels (Nosrati 2013), using Wilks’ lambda as the selection criterion. The elements that differed significantly among pollution risk classes (p value < 0.05) and minimized Wilks’ lambda were identified as the second HPDS (HPDS-2).
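The Wilks’ lambda criterion behind the stepwise selection can be illustrated with a short sketch. It assumes NumPy and SciPy, uses hypothetical function names, and implements forward entry only (SPSS’s stepwise method also tests entered variables for removal); the quadratic classification step is not reproduced. The group labels are assumed to be the Table 1 risk levels already assigned to each sample.

```python
import numpy as np
from scipy.stats import f as f_dist


def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the variables in `cols`."""
    Xs = X[:, cols]
    centred = Xs - Xs.mean(axis=0)
    T = centred.T @ centred                               # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        d = Xs[y == g] - Xs[y == g].mean(axis=0)
        W += d.T @ d                                      # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)


def forward_select_wilks(X, y, names, alpha=0.05):
    """Hypothetical helper: enter the variable that most reduces Wilks' lambda."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, p = X.shape
    g = len(np.unique(y))
    selected, lam_cur = [], 1.0
    while len(selected) < p:
        q = len(selected)
        best = None
        for j in range(p):
            if j in selected:
                continue
            lam_new = wilks_lambda(X, y, selected + [j])
            partial = lam_new / lam_cur                   # partial lambda of candidate j
            F = (1.0 - partial) / partial * (n - g - q) / (g - 1)
            pval = f_dist.sf(F, g - 1, n - g - q)         # F-to-enter test
            if best is None or lam_new < best[1]:
                best = (j, lam_new, pval)
        if best[2] >= alpha:                              # no significant improvement
            break
        selected.append(best[0])
        lam_cur = best[1]
    return [names[j] for j in selected]
```

With a single variable, the F-to-enter reduces to the one-way ANOVA F statistic, which is why elements that separate the risk classes strongly enter first.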

Table 1 Soil pollution risk levels of heavy metals. These levels were determined by the Department of Environment in Iran (DEIRI 2013)

Heavy metal sources

An exploratory analysis was conducted in three steps to identify the sources of heavy metals (anthropogenic or geogenic). First, the interrelationships among elements were explored using the principal components with eigenvalues greater than one, as calculated in “Critical pollutant elements”, together with a plot of the loadings on the selected components and the correlation matrix of the heavy metals (Rodríguez et al. 2008). Second, the coefficient of variation (CV) of each element was calculated as an indicator of its variability; elements with CVs greater than 50% (large variability) are usually affected by anthropogenic sources (Qishlaqi et al. 2009). Finally, the relationships between soil properties and heavy metals were evaluated.
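The CV screening in the second step amounts to a simple threshold rule. The sketch below assumes NumPy, uses hypothetical helper names and the 50% threshold from the text; it is only one line of evidence, which the study combines with the PCA loadings and the correlations.

```python
import numpy as np


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()


def screen_by_cv(data, threshold=50.0):
    """Hypothetical helper: flag elements with CV above the threshold as likely anthropogenic."""
    return {
        element: ("likely anthropogenic" if cv_percent(v) > threshold else "likely geogenic")
        for element, v in data.items()
    }
```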

Soil pollution indices

The SP indices were developed using five data sets: HPDS-1 (identified using PCA), HPDS-2 (identified using DA), the anthropogenic data set (ADS), the geogenic data set (GDS) and all measured elements as the total data set (TDS). The indexing approach for developing each soil pollution index (SPI) is summarized in Fig. 2. A “less is better” curve was used to score the variables (Andrews et al. 2004), with both non-linear and linear equations. The non-linear scoring was done using Eq. 1:

$$ {S}_{NL}=\frac{1}{1+{\left(x/{x}_0\right)}^{b}} $$
(1)

where \( {S}_{NL} \) is the non-linear score of an element, x is the measured concentration of the heavy metal, \( {x}_0 \) is the mean concentration of that element and b is the slope (+2.5) of the “less is better” curve (Askari and Holden 2014).

Fig. 2

The procedure for developing soil pollution indices using five data sets (TDS, HPDS-1, HPDS-2, ADS and GDS)

The linear scores were calculated using Eq. 2:

$$ {S}_L=1-\left(\left(x-l\right)/\left(h-l\right)\right) $$
(2)

where \( {S}_L \) is the linear score, x is the element value, and h and l are the maximum and minimum values of that heavy metal, respectively (Askari and Holden 2015).
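Both “less is better” scoring functions (Eqs. 1 and 2) are simple enough to sketch directly. The Python below is a minimal illustration with hypothetical function names; by default it takes \( {x}_0 \) as the sample mean and l and h as the observed minimum and maximum of each element, following the definitions above.

```python
import numpy as np


def score_nonlinear(x, x0=None, b=2.5):
    """Eq. 1: S_NL = 1 / (1 + (x / x0)^b); x0 defaults to the mean of x."""
    x = np.asarray(x, dtype=float)
    x0 = x.mean() if x0 is None else x0
    return 1.0 / (1.0 + (x / x0) ** b)


def score_linear(x, low=None, high=None):
    """Eq. 2: S_L = 1 - (x - l) / (h - l); l and h default to the observed min and max."""
    x = np.asarray(x, dtype=float)
    low = x.min() if low is None else low
    high = x.max() if high is None else high
    return 1.0 - (x - low) / (high - low)
```

With b = 2.5, a sample at the element’s mean scores 0.5 and one at twice the mean scores about 0.15 (1/(1 + 2^2.5)), whereas the linear score maps the least and most contaminated samples to 1 and 0.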

Additive (Eq. 3, applied to all five data sets; Fig. 2) and weighted additive (Eq. 4, applied to TDS and HPDS-1; Fig. 2) methods were used to integrate the element scores into indices (Andrews et al. 2002).

$$ {SPI}_A={\sum}_{i=1}^n{S}_i/n $$
(3)
$$ {SPI}_W={\sum}_{i=1}^n{W}_i{S}_i $$
(4)

where \( {SPI}_A \) is the additive index, \( {SPI}_W \) is the weighted additive index, \( {S}_i \) is the score of element i, n is the number of elements in each data set and \( {W}_i \) is the weight of heavy metal i (Askari and Holden 2015).
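Eqs. 3 and 4 reduce to a row mean and a dot product over a samples × elements score matrix. The sketch assumes NumPy and uses hypothetical function names; the weights are expected to sum to one.

```python
import numpy as np


def spi_additive(scores):
    """Eq. 3: mean of element scores; rows are samples, columns are elements."""
    return np.asarray(scores, dtype=float).mean(axis=1)


def spi_weighted(scores, weights):
    """Eq. 4: weighted sum of element scores; weights are expected to sum to 1."""
    return np.asarray(scores, dtype=float) @ np.asarray(weights, dtype=float)
```

For example, with the HPDS-1 weights of Eq. 9 (Cr 0.408, Pb 0.218, Zn 0.218, Cu 0.154) and hypothetical scores of 0.9, 0.5, 0.6 and 0.8, `spi_weighted([[0.9, 0.5, 0.6, 0.8]], [0.408, 0.218, 0.218, 0.154])` gives about 0.730.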

For the TDS, each element was weighted by the ratio of its communality to the sum of communalities from the PCA (Askari and Holden 2014). The elements in HPDS-1 were weighted by the variance explained by their selected component in the PCA, normalized to unity (Liu et al. 2018). In total, fourteen SPIs were developed in this study (Fig. 2).
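Both weighting schemes are normalizations of PCA outputs. In this sketch (NumPy, hypothetical function names) the communality of an element is taken as the sum of its squared loadings on the retained PCs; how a component’s weight is shared when several elements come from the same PC is not specified in the text, so the second function returns one weight per component only.

```python
import numpy as np


def communality_weights(loadings):
    """TDS weights: each element's communality divided by the sum of communalities.

    `loadings` is a p x k matrix of loadings on the retained PCs; the communality
    of element i is the sum of its squared loadings."""
    h = np.sum(np.asarray(loadings, dtype=float) ** 2, axis=1)
    return h / h.sum()


def component_weights(explained_variance):
    """HPDS-1 weights: variance explained by each selected PC, normalized to sum to 1."""
    v = np.asarray(explained_variance, dtype=float)
    return v / v.sum()
```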

Validation of the SPI

RI (Hakanson 1980) and IPI (Chen et al. 2005), two of the best-known pollution indices and conventional tools for risk assessment of soil heavy metals (Tume et al. 2018), were used to validate the SPIs developed in this study. RI was calculated using Eqs. 5, 6 and 7:

$$ {C}_f^i=\frac{C_0^i}{C_n^i} $$
(5)
$$ {E}_r^i={T}_r^i\times {C}_f^i $$
(6)
$$ RI={\sum}_{i=1}^n{E}_r^i $$
(7)

where \( {C}_f^i \) is the pollution factor of element i, \( {C}_0^i \) is the measured concentration of element i and \( {C}_n^i \) is the corresponding background value (Kusin et al. 2017; Shen et al. 2017). \( {E}_r^i \) is the potential ecological risk factor of element i, and \( {T}_r^i \) is its toxic-response factor (Zn: 1, Pb: 5, Cr: 5, Ni: 5, Cd: 30) (Suresh et al. 2012).
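A minimal sketch of Eqs. 5 to 7 follows, computing each pollution factor as measured concentration divided by background, as defined above. The toxic-response factors are the ones listed in the text and the background values are those reported in the Results; the text gives no factors for Cu and Co, so this hypothetical helper simply skips them.

```python
# Toxic-response factors as listed in the text (Suresh et al. 2012)
TOXIC_RESPONSE = {"Zn": 1, "Pb": 5, "Cr": 5, "Ni": 5, "Cd": 30}

# Background values (mg/kg) reported in the Results for the study area
BACKGROUND = {"Cd": 0.25, "Pb": 57.80, "Zn": 91.80, "Cu": 26.99,
              "Ni": 40.74, "Cr": 19.99, "Co": 24.18}


def potential_ecological_risk(conc, background=BACKGROUND, tr=TOXIC_RESPONSE):
    """Return per-element E_r (Eqs. 5-6) and their sum RI (Eq. 7).

    Elements without a listed toxic-response factor (here Cu and Co) are skipped."""
    er = {}
    for element, value in conc.items():
        if element in tr:
            cf = value / background[element]   # Eq. 5: concentration / background
            er[element] = tr[element] * cf     # Eq. 6
    return er, sum(er.values())                # Eq. 7
```

A hypothetical sample with 0.75 mg kg−1 Cd, 115.6 mg kg−1 Pb and 183.6 mg kg−1 Zn has pollution factors of 3, 2 and 2, giving \( {E}_r^i \) values of 90, 10 and 2 and an RI of 102.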

IPI was determined by averaging the pollution index (PI) values of the individual elements, calculated using Eq. 8:

$$ {PI}_i=\frac{C_i}{B_i} $$
(8)

where \( {C}_i \) is the measured concentration of heavy metal i, \( {B}_i \) is its background value and \( {PI}_i \) is the pollution index of element i (Chen et al. 2005). Background concentrations were estimated following the procedure of Cabrera et al. (1999): 53 samples were collected from natural areas far from human activities and industrial zones, and the geometric mean of each heavy metal in these samples was taken as its background value (Azimzadeh and Khademi 2013). IPI values were classified as non-pollution (IPI ≤ 1), moderate pollution (1 < IPI ≤ 2), high pollution (2 < IPI ≤ 5) and extremely high pollution (IPI > 5) (Chen et al. 2005; Meza-Montenegro et al. 2012).
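The background estimate, Eq. 8 and the IPI classes can be sketched with the Python standard library. The function names are hypothetical, and the class boundaries are those of Chen et al. (2005) as given above.

```python
from statistics import geometric_mean


def background_values(natural_samples):
    """Geometric mean of each metal over natural-area samples (dict: metal -> list)."""
    return {m: geometric_mean(v) for m, v in natural_samples.items()}


def integrated_pollution_index(conc, background):
    """Eq. 8 for each metal, averaged into IPI."""
    pi = {m: c / background[m] for m, c in conc.items()}
    return pi, sum(pi.values()) / len(pi)


def ipi_class(ipi):
    """IPI classes of Chen et al. (2005)."""
    if ipi <= 1:
        return "non-pollution"
    if ipi <= 2:
        return "moderate"
    if ipi <= 5:
        return "high"
    return "extremely high"
```

For instance, PI values of 3 for Cd and 2 for Pb average to an IPI of 2.5, which falls in the high-pollution class.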

Values of the potential ecological risk factor and index (\( {E}_r^i \) and RI) were also categorized by ecological risk (Men et al. 2018), as presented in Table 2. Finally, the SPIs were verified by assessing their correlations with RI and IPI, and the best SPI was used to evaluate HMP. In addition, the ability of the SPIs to differentiate among the pollution risk classes in Table 1 was compared.

Table 2 The risk classification of the potential ecological risk factor (\( {E}_r^i \)) and potential ecological risk index (RI) (Men et al. 2018)

Statistical analysis and spatial distribution of heavy metals

Histograms and the Kolmogorov–Smirnov test were used to examine the normality of the heavy metal data, and non-normal variables were log-transformed. Homogeneity of variance was tested using Levene’s test. Analysis of variance (ANOVA) was carried out in SPSS 20.0 (Ho 2013) to compare means at the 95% confidence level using the least significant difference (LSD) test. Scoring and indexing were performed in Microsoft Excel (Frye 2015). The spatial distribution of the SPI was mapped by ordinary kriging with an exponential semivariogram model in GIS software.
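Ordinary kriging with an exponential semivariogram can be sketched in NumPy. This is a hypothetical minimal implementation, not the algorithm of the GIS software used: the nugget, total sill and range are assumed to have been fitted beforehand to the empirical semivariogram, the range is treated as the practical range (the 3h/a convention), coordinates are assumed to be projected (e.g. km), and no search neighbourhood is used.

```python
import numpy as np


def exponential_semivariance(h, nugget, sill, a):
    """Exponential model with total sill `sill` and practical range a; gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / a))
    return np.where(h > 0.0, gamma, 0.0)


def ordinary_kriging(xy, z, targets, nugget, sill, a):
    """Predict at `targets` from samples (`xy`, `z`) by ordinary kriging."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(z)
    # Kriging system: semivariances between samples, bordered by the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_semivariance(dist, nugget, sill, a)
    A[n, n] = 0.0
    preds = np.empty(len(targets))
    for k, t in enumerate(targets):
        b = np.ones(n + 1)
        b[:n] = exponential_semivariance(np.linalg.norm(xy - t, axis=1), nugget, sill, a)
        weights = np.linalg.solve(A, b)[:n]
        preds[k] = weights @ z
    return preds
```

Because the weights are constrained to sum to one, a prediction at a sampled location with zero nugget returns the observed value.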

Results

The statistical parameters of soil properties measured in this study are presented in Table 3. Pb ranged from 40 to 1358 mg kg−1, Zn from 86 to 1354 mg kg−1, Cu from 11 to 353 mg kg−1, Cd from 0.24 to 4 mg kg−1, Co from 17 to 36 mg kg−1, Ni from 13 to 87 mg kg−1 and Cr from 7 to 66 mg kg−1. The mean concentrations of heavy metals were in the order of Zn > Pb > Ni > Cu > Co > Cr > Cd and were higher than their corresponding natural background concentrations. The background contents of 0.25, 57.80, 91.80, 26.99, 40.74, 19.99 and 24.18 mg kg−1 were determined for Cd, Pb, Zn, Cu, Ni, Cr and Co, respectively.

Table 3 Statistical parameters of measured soil properties

Identifying critical elements using PCA

Three PCs with eigenvalues greater than one explained 74.3% of the total variance of the original variables (Table 4). The most effective elements in each component were identified from the loading matrix of the selected components. In PC1, which explained 38.8% of the variance, the loadings of Cr and Ni were within 10% of the highest value, and the two elements were significantly correlated (Table 5; r = 0.74). Thus, only Cr, which had the higher loading, was chosen from PC1. In PC2, which explained 20.8% of the total variance, Pb and Zn were within 10% of the highest loading, and their correlation coefficient was less than 0.7 (Table 5); therefore, both Pb and Zn were selected from PC2. Cu was the only element within 10% of the highest loading on PC3 and was therefore selected. Cr, Pb, Zn and Cu were thus identified as HPDS-1.

Table 4 Result of principal component analysis
Table 5 Correlation matrix of heavy metals

Identifying critical elements using DA

DA identified two significant functions for discriminating among the pollution risk levels of the soil samples (Table 6). The first discriminant function (DF) explained 94.4% of the variance, and the second DF explained 5.6%. Cu, Ni, Cr and Co were excluded by the stepwise procedure, and Zn, Pb and Cd, which minimized Wilks’ lambda and were highly correlated with the DFs, were identified as the elements with the highest pollution risk (Table 7). Therefore, HPDS-2 comprised Cd, Zn and Pb. The canonical discriminant function coefficients of the elements are presented in Table 8; Cd had a higher discriminant coefficient than Zn and Pb.

Table 6 Summary of discriminant functions
Table 7 Absolute correlation between each variable and discriminant functions
Table 8 Canonical discriminant function coefficients

Source of heavy metals

The measured elements were categorized into two groups based on the loading plot of the three PCs (Fig. 3). Group 1 comprised Cd, Zn, Pb and Cu, and group 2 comprised Ni, Cr and Co. The elements in group 2 had low, negative correlations with those in group 1. Cadmium correlated significantly with Cu, Zn and Pb (Table 5; r ≥ 0.4). High correlations were also noted between Ni and Cr (Table 5; r = 0.74) and between Cr and Co (r = 0.59). Zn, Cd and Cu correlated significantly and positively with EC and organic carbon (Table 9; r > 0.47). The CVs of Cu, Cd, Zn and Pb exceeded 50% (Table 3), indicating considerable variability and suggesting that these elements were affected by anthropogenic activity (Fan and Wang 2017; Yongming et al. 2006). Ni, Cr and Co had lower CVs (less than 50%), which suggests a geogenic rather than anthropogenic influence (Table 3). Therefore, Ni, Cr and Co were assigned to the GDS, and Zn, Pb, Cd and Cu to the ADS.

Fig. 3

The loading plot of three principal components

Table 9 Correlation coefficients between heavy metals and soil properties

Soil pollution indices

The procedure for developing the fourteen SPIs from the five data sets (TDS, HPDS-1, HPDS-2, ADS and GDS) is summarized in Fig. 2. For the weighted SPIs developed using the TDS (Fig. 2; SPI-2 and SPI-4), Cu had the highest weight (Table 4; 0.158) and Cd the lowest (Table 4; 0.118). For the weighted SPIs calculated from HPDS-1 (Fig. 2; SPI-6 and SPI-8), Cr had the highest weight (Table 4; 0.408) and Cu the lowest (Table 4; 0.154). SPI-6 and SPI-8 were calculated according to Eq. 9, in which Cr, Pb, Zn and Cu denote the scores of these elements.

$$ \mathrm{SPI}=\left(0.408\kern0.5em \mathrm{Cr}\right)+\left(0.218\kern0.5em \mathrm{Pb}\right)+\left(0.218\kern0.5em \mathrm{Zn}\right)+\left(0.154\kern0.5em \mathrm{Cu}\right) $$
(9)

The SPIs ranged from 0 to 1. Higher SPI values indicate better soil condition and less HMP, whereas values closer to zero indicate a high risk of HMP.

Validating and mapping the SPI

The average values of PI, \( {E}_r^i \), IPI and RI in each land use are presented in Table 10. IPI values ranged from 0.96 to 3.37 in agricultural land, 1.01 to 3.21 in rangeland and 2.15 to 6.65 in the urban area. RI values ranged from 47.66 to 374.82 in agricultural land, 65.31 to 511.15 in rangeland and 196.56 to 518.15 in the urban area. The urban area had the highest mean IPI (3.35), followed by agricultural land (1.58) and rangeland (1.54). Mean PI values > 2 were observed for Cd, Zn, Pb and Cu in the urban area and for Cd in both agricultural land and rangeland. A medium level of pollution (Table 10; 1 < PI < 2) was noted for the other metals in agricultural land and rangeland. Cd showed a considerable potential ecological risk (\( {E}_r^i>80 \)) under all three land uses. Based on RI, the urban area had a high ecological risk (mean RI > 300), whereas agricultural land and rangeland had a low ecological risk (mean RI < 150).

The correlations of the SPIs with IPI and RI are shown in Table 11. SPI-1, SPI-2, SPI-10, SPI-11 and SPI-12 were highly correlated with IPI (Table 11; r > 0.9), and SPI-5 and SPI-9 also correlated well with IPI (Table 11; r > 0.8). SPI-10 and SPI-12 were highly correlated with RI (r > 0.9), and the correlation coefficients of SPI-9 and SPI-11 with RI were also higher than 0.8.

The ability of the SPIs to differentiate among the pollution risk levels determined according to DEIRI (2013) is summarized in Table 12. Except for SPI-13 and SPI-14 (developed using the GDS), all indices differed significantly among SP levels (Table 12; p < 0.01). Higher F values were obtained for the indices developed using HPDS-2 and ADS, particularly those calculated with the linear function (Table 12; SPI-10 and SPI-12).
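The two validation checks, correlation with IPI and RI and the F test across risk levels, can be sketched as follows. This assumes NumPy and SciPy and uses a hypothetical helper name; it omits the log transformation and LSD post hoc comparisons described in the methods. Because higher SPI values mean less pollution while IPI and RI increase with pollution, the sign of r is expected to be negative, so its magnitude is what matters when ranking indices.

```python
import numpy as np
from scipy.stats import f_oneway


def validate_spi(spi, ipi, ri, risk_level):
    """Pearson r of an SPI with IPI and RI, plus one-way ANOVA F across risk levels."""
    spi = np.asarray(spi, dtype=float)
    levels = np.asarray(risk_level)
    r_ipi = np.corrcoef(spi, ipi)[0, 1]
    r_ri = np.corrcoef(spi, ri)[0, 1]
    groups = [spi[levels == lv] for lv in np.unique(levels)]
    anova = f_oneway(*groups)
    return r_ipi, r_ri, anova.statistic, anova.pvalue
```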

Table 10 The average values of PI, \( {E}_r^i \), IPI and RI under each land use
Table 11 The correlation matrix of SPIs with IPI and RI
Table 12 Mean comparison of SPI among soil pollution risk levels

SPI-10 had the best discriminating capability (F value = 103) and the highest correlations with both IPI and RI (r > 0.92). Therefore, SPI-10 is suggested as a reliable index for evaluating HMP in Zanjan Province. HMP in the study area was mapped (Fig. 4) using the 95% confidence intervals of SPI-10 within each pollution risk level. A value of 0.79 was identified as the SPI-10 cut-off between high and medium risk of SP, and 0.92 as the cut-off between medium and low HMP. Thus, values ≥ 0.92 were taken to indicate good soil condition (low HMP), and values < 0.79 poor soil condition (high HMP).
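Applying the SPI-10 cut-offs and tallying class shares can be sketched in NumPy, assuming an equal-area raster of kriged SPI-10 values; the helper names are hypothetical.

```python
import numpy as np


def classify_spi10(spi, high_cut=0.79, low_cut=0.92):
    """Map SPI-10 values to HMP classes: < 0.79 high, 0.79 to < 0.92 medium, >= 0.92 low."""
    spi = np.asarray(spi, dtype=float)
    return np.where(spi < high_cut, "high HMP",
                    np.where(spi < low_cut, "medium HMP", "low HMP"))


def area_shares(classes):
    """Percentage of cells in each class, assuming equal-area raster cells."""
    labels, counts = np.unique(np.asarray(classes), return_counts=True)
    return {str(lab): 100.0 * c / counts.sum() for lab, c in zip(labels, counts)}
```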

Fig. 4

Soil pollution map of study area produced based on cut-off points of SPI-10

Discussion

Seven soil heavy metals (Cd, Zn, Pb, Cr, Ni, Cu and Co), previously identified as potential causes of SP in Zanjan Province, Iran (Eslami et al. 2007; Jamal et al. 2018; Khodadadi et al. 2013; Maleki et al. 2014; Zamani et al. 2015), were measured and used to develop SPIs for rangeland, urban areas and agricultural land, the three dominant land uses in Zanjan. A lead and zinc smelter, mines, industrial complexes, traffic and agricultural activities were the most likely sources of HMP in the study area (Naderi et al. 2017; Zamani et al. 2015).

Critical elements identified by PCA and DA

Three PCs with eigenvalues greater than one, which explained 74% of the total variance, were used to identify Cr, Pb, Zn and Cu as HPDS-1. PCA is based on the correlation matrix of the soil variables, and the interrelationships among elements are the main basis for removing less important variables (Raiesi 2017; Rezaei et al. 2006). Although PCA is a conventional approach for removing redundant data (Askari and Holden 2015), failures have been reported when statistical parameters are interpreted to identify the most appropriate variables (Rossi et al. 2009). For instance, PCA does not consider the importance of each element or its degree of pollution risk, which may cause essential elements to be missed when indexing SP.

DA indicated the relative importance of Cd > Pb > Zn in HPDS-2 for assessing SP (Table 8). These results are consistent with those of Parizanganeh et al. (2012) and Zamani et al. (2015), who found that Cd, Zn and Pb were key indicators for the evaluation of SP in Zanjan Province. The maximum values of Cd, Zn and Pb (Table 3) exceeded the maximum allowable thresholds set by the World Health Organization and the Food and Agriculture Organization, reported as 3, 100 and 300 mg kg−1 for Cd, Zn and Pb, respectively (Khalid et al. 2017). Cd had the highest level of pollution (PI > 3) and ecological risk (\( {E}_r^i \) > 90) under all three land uses, and the urban and industrial areas were highly polluted by Cd, Zn and Pb (Table 10). These results support the superiority of DA over PCA for identifying the most polluting elements in Zanjan soil.

Sources of heavy metals

The loading plot of the three PCs (Fig. 3) showed Cr, Co and Ni grouped closely together, with higher loadings on PC1 (Table 4). These elements had small coefficients of variation (Table 3; CV < 40%) and were highly correlated (Table 5). Zn, Pb, Cu and Cd also plotted close together (Fig. 3), with the higher loadings of Zn and Pb on PC2 and of Cu and Cd on PC3 (Table 4). Similar groupings of soil heavy metals were reported by Rodríguez et al. (2008) and Qishlaqi et al. (2009). Cd, Cu, Zn and Pb were significantly correlated and had CVs > 70%. High correlations are usually reported among elements that share a common pollution source (Hu et al. 2018; Li and Feng 2012; Zhang et al. 2018).

Heavy metals of geogenic origin have relatively smaller CVs than elements that accumulate in soil through anthropogenic activity (Yongming et al. 2006), and many studies have used a high CV to identify heavy metals with anthropogenic sources in soil (Ding et al. 2017; Fan and Wang 2017). Because Cr, Co and Ni had low, negative correlations with Pb, Zn, Cd and Cu, the total concentrations of these two sets of heavy metals were likely controlled by different factors. In addition, Zn, Cd and Cu correlated significantly with EC and organic carbon (Table 9; r > 0.47), and SOC and EC are more likely to be affected by human activity than the other soil properties measured in this study (Husson et al. 2018; Schweizer et al. 2018). The contents of Cr, Ni and Co in soil are usually controlled by pedogenic and geogenic factors, such as weathering of calcareous parent material, and are less affected by anthropogenic factors (Qishlaqi et al. 2009; Rodríguez et al. 2008).

Comparing the heavy metal concentrations with the standard ranges in the Iranian soil resource quality guidelines (DEIRI 2013) and the globally accepted standard values for non-polluted soils (Sherameti and Varma 2015) showed that Cu, Cd, Pb and Zn concentrations were higher than their normal ranges for non-polluted soil. High pollution by Cd, Zn, Pb and Cu was also confirmed by their \( {E}_r^i \) and PI values (Table 10). These results support the efficiency of the techniques used in this study to identify anthropogenic and geogenic sources of heavy metals. Lead and zinc smelting and mining could be the main sources of Pb and Zn accumulation, whereas the increases in Cu and Cd might be related to urban and agricultural activities such as traffic and the application of phosphorus fertilizers (Li et al. 2009). Nicholson et al. (2003) concluded that agricultural activity results in the accumulation of Cd and Cu in soil.

Indexing and mapping soil pollution

Cost, reliability and the simplicity of sampling and measurement are important factors affecting the practicality of methods for monitoring soil condition (Askari and Holden 2014, 2015). This study evaluated a simple yet comprehensive indexing framework that can identify the critical elements and integrate them into an SPI. Unlike conventional soil pollution indices, this framework avoids the risk of an unsuitable choice of specific factors such as the geochemical background concentration, pre-industrial reference or specific toxic-response coefficient, particularly when determining these factors is costly, inaccurate or difficult. The framework was also able to distinguish anthropogenic and geogenic sources of heavy metals in soil, so it could be used to monitor the anthropogenic activities that cause heavy metal accumulation.

For a practical assessment of SP, it is important to identify the critical pollutants in the soil as essential indicators for indexing SP. Accordingly, all elements reported as potential pollutant heavy metals in the study area were considered as the TDS for evaluating HMP. Although indices developed from all potential pollutant elements might give a more comprehensive result (Askari et al. 2015), an appropriate MDS can reduce the time and cost of SP assessment and remove collinearity and data redundancy (Bünemann et al. 2018). Therefore, fourteen SPIs were calculated from five data sets (TDS, HPDS-1, HPDS-2, ADS and GDS). Validation of the SPIs was essential to ensure that the elements had been selected appropriately. Inappropriate omission of elements from an SPI causes uncertainty in SP evaluation and reduces its comprehensiveness; on the other hand, improper inclusion of elements may reduce the ability of the SPI to assess precisely the pollution risk and the ecological risk of the critical elements.

Different assessment methods were used to evaluate the reliability of the SPIs, based on the best-known pollution indices (\( {E}_r^i \), PI, IPI and RI) and the pollution risk classes (Table 1). The degree of pollution risk of each individual element was evaluated using \( {E}_r^i \) and PI, and HMP in soil was assessed holistically using IPI and RI, which combine all analysed elements. RI and \( {E}_r^i \) have been suggested as reliable indices for evaluating the ecological risk of soil HMP (Men et al. 2018; Mohseni-Bandpei et al. 2017), and PI and IPI have been tested as practical approaches for assessing the pollution level of heavy metals (Chen et al. 2005; Meza-Montenegro et al. 2012). The ability of the SPIs to differentiate among the pollution risk levels (Tables 1 and 12) was also examined, as a complementary approach, to determine objectively whether the SPIs reflected the actual risk of HMP in the studied lands. Linear indices (SPI-10 and SPI-12) gave better validation results than non-linear indices (SPI-9 and SPI-11), which is consistent with the findings of Askari and Holden (2015) but contrary to those of Masto et al. (2008) and Andrews et al. (2002), who reported better results with non-linear indices for indexing soil condition. The greatest accuracy was obtained for SPI-10, developed using HPDS-2 and the linear scoring function.

Ordinary kriging with an exponential semivariogram model was used to map the SP of heavy metals because it gave the lowest mean standard error and RMSE (Naderi et al. 2017). Kriging interpolation is a conventional approach for mapping heavy metal distributions (Alyazichi et al. 2015; Moore et al. 2016). The SP map based on the SPI-10 cut-off values (Fig. 4) indicates that 3.8% of the study area had excessive accumulations of heavy metals and could be classified as having poor soil quality, 31.5% had moderate HMP and 64.7% had a low level of HMP. The indexing approach used here can therefore provide a simple and reliable method for assessing HMP in soil.

Conclusion

With regard to the research objectives:

1. Cr, Pb, Zn and Cu were identified as HPDS-1 using PCA, and Cd, Zn and Pb were identified as HPDS-2 using DA. DA yielded a better data set for indexing SP and showed the highest pollution risk for Cd in Zanjan.

2. A combination of PCA, variation assessment and evaluation of the interrelationships among soil variables yielded a reliable approach for identifying heavy metal sources. The exploratory multivariate analysis used in this study indicated that Cd, Zn, Pb and Cu had accumulated in the soil from anthropogenic sources, and Cr, Ni and Co from geogenic sources.

3. The highest accuracy for indexing HMP was obtained using a minimum data set of Cd, Zn and Pb, identified as HPDS-2 using DA. The linear scoring function and the additive integration method provided the best approach for indexing SP.

The validation results confirmed the efficiency of the suggested multivariate approaches for reliable identification of critical pollutant elements, source apportionment and indexing of HMP. The excessive accumulation of Cd > Pb > Zn > Cu in Zanjan soil, particularly in urban and agricultural areas, poses a serious risk to the health of humans and animals.