Introduction

Heavy metals (HMs) are of great concern in soil environments, owing to their high ecotoxicity and their resistance to degradation, which allows them to persist once they accumulate in soils (Hu et al., 2020; Marchant et al., 2017; Saby et al., 2006; Xia et al., 2019; Xie et al., 2018). HMs originally derive from the natural background, i.e. so-called endogenous sources such as the complex weathering of parental rock (Hu et al., 2017; Ogunlaja et al., 2019; Schneider et al., 2016). However, as industrialization and urbanization have progressed, anthropogenic activities, known as ‘exogenous sources’ (e.g. industrial enterprises emitting solid wastes and sewage, mine exploitation, fertiliser inputs, and traffic emissions), have become major contamination sources (Wei & Yang, 2010). The combination of endogenous and exogenous contamination has resulted in complex spatial variability, making soil remediation difficult to carry out (Uría et al., 2009; Marchant et al., 2011). Without knowledge of the contamination sources, removing pollutants from the soil environment can only be a temporary measure that consumes significant time and resources (Proshad et al., 2019; Song et al., 2018; Sungur et al., 2019). Thus, identifying and apportioning the contamination sources in a soil environment is necessary and highly valuable (Chen et al., 2018; Dong et al., 2019).

Source identification and apportionment generally refers to the use of multivariate models to identify the contamination sources of HMs in soils and to apportion their contributions. It not only contributes to the efficient remediation of soil contamination, but also assists decision-makers in the relevant management (Luo et al., 2015; Manoli et al., 2002). According to recent research, the approaches can be divided into qualitative and quantitative approaches (Zhi et al., 2016). The qualitative approaches include multivariate statistical analysis techniques, such as principal component analysis (PCA) (Dai et al., 2018; Gülten 2019), cluster analysis (CA) (Facchinelli et al., 2001), factor analysis (FA) (Dong et al., 2019), and support vector machines (SVM) (Chen et al., 2013), as well as tracing techniques such as the stable isotope ratio technique (Glaser et al., 2005). Multivariate statistical methods, which are based on various data analysis algorithms, are widely used owing to their low cost and ease of use (Davies, 1997; Nikos et al., 2012). However, these models also have flaws, such as sensitivity to abnormal values and mismatches between the apportioned source and the real emitting source (Huang et al., 2015). Stable isotope tracing techniques can follow the complete cycle of a contaminant and thereby determine its integrated routes, but their application can be limited by, e.g. high costs, low concentrations of HMs, or minor differences between isotope ratios (Zhu et al., 2017). As a result, more comprehensive profile analysis methods are required.

A finite mixture distribution model (FMDM) is a stochastic approach that separates an overall data distribution into several sub-distributions, and estimates the proportions of natural and anthropogenic sources of HMs in soils by comparing the parameters of the sub-distributions with soil background values (Lin et al., 2009; Shao et al., 2018). Raster-based principal components analysis (RB-PCA) is an improved version of PCA that combines geostatistical interpolation with a statistical approach, instead of interpolating each component through scores calculated by dimensionality reduction (Wang et al., 2018). RB-PCA replaces the original points with the interpolated raster map of each HM as the data source and calculates the score of each raster cell, yielding a map of higher precision that helps to identify contribution sources more efficiently.

As a main type of quantitative apportionment, receptor models, including positive matrix factorization (PMF) (Mehr et al., 2017), absolute principal component score/multiple linear regression (Huang et al., 2018), and ‘UNMIX’ (Liao et al., 2020), do not require prior knowledge of source profiles. Instead, they quantitatively apportion contamination sources by outputting the contribution of every source to every HM, which compensates for the drawbacks of the qualitative methods. A PMF model considers the bias and uncertainties during the entire calculation process (Liang et al., 2017). The non-negativity constraints on the factor contributions and factor profiles also make it more plausible than other multivariate statistical models. Therefore, it can be the preferred choice for apportioning potential contamination sources (Hu et al., 2019; Lv et al., 2018).

Overall, this study proposed an integrated approach combining qualitative and quantitative methods to identify and apportion potential contamination sources of HMs (Cd, Hg, As, Pb, Cr) in soils in the study area. The main objectives of this study were to: (1) assess the overall ecological risk level of the study area through a potential ecological risk index (RI); (2) qualitatively identify the natural and anthropogenic sources, and map the spatial location of each source using the FMDM and RB-PCA, assisted by natural and social auxiliary datasets; and (3) quantitatively apportion each source by its factor contribution to every HM, using PMF.

Materials and methods

Sampling and chemical analysis

Located in the southeast of Zhejiang Province (27°03′–28°36′ N, 119°37′–121°18′ E), Wenzhou City covers a total land area of 11,784 km² (Fig. 1). With a topography that descends from west to east, Wenzhou City consists of low hills and plains near the sea. Wenzhou City is a vital port, not only for Zhejiang Province but also for China as a whole, and has a long history of industrial development and business trade.

Fig. 1 Maps of study area and sampling sites

In this study, a total of 1474 topsoil (0–20 cm) samples were collected in the study area. The sampling design can be summarised as: (1) dividing the study area into strata according to land use type; and (2) applying a grid to set the sample sites within each stratum. For land use types other than residential land, one sample point was collected within each 1 km × 1 km square, whereas for residential land where enterprises were clustered, the number of samples within a square was increased to ensure that the sample statistics were efficient. Each soil sample was taken at a grid intersection point and was a composite of five subsamples collected from five locations within 5 m. The coordinates of the sampling locations were recorded with a GPS.

Auxiliary variables

Several types of auxiliary datasets were collected and categorised as natural environmental datasets and socio-economic datasets. Information regarding the soil parental material in the study area was obtained from a 1:20,000 soil map of Zhejiang Province published by the National Soil Survey of China. Among the anthropogenic factors, industrial enterprises and ore mines generally have the most serious impacts, as they cause multi-environmental pollution through contaminated airflows or sewage irrigation during processing (Men et al., 2018; Tian et al., 2018; Zhang et al., 2018). In this study, data on 1846 enterprises within the study area were collected from the General Survey of Key Pollution Sources in Zhejiang Province in 2016, and the enterprises were categorised into four main categories according to the National Industry Classification standard (GB/T 4754–2011). In addition, 285 mines were surveyed and divided into seven categories according to their quantity and impacts on soil HMs.

Fertiliser inputs and traffic emissions also have important impacts on the HMs in soils. Different types of fertilisers contain certain amounts of HMs and can influence the aggregation and transfer processes, as well as the forms in which the HMs exist in soil (Yan, 2008). In detail, nitrogen fertilisers such as urea contain Cd, Pb, Cu and Zn; phosphate fertilisers such as superphosphate contain Cd, Hg, As, Pb, and Cr; and potash fertilisers such as potassium chloride contain Cd, Pb, Cu and Zn. Moreover, organic fertilisers can also affect HMs, because they interfere with the viability of HMs in the soil and can therefore result in contamination. In this study, we collected the fertiliser data from the Survey of Heavy metals in Agricultural Products Areas in Zhejiang Province, carried out alongside the collection of the sample points, and calculated the inputs of four fertilisers (nitrogen fertiliser, phosphate fertiliser, potash fertiliser, and organic fertiliser) on the farmland of each town or village in Wenzhou City. In addition, particulate matter such as dust containing HMs generated by vehicle fuels and brake pad wear can enter the soil on both sides of the road (Aminiyan et al., 2018; Li et al., 2017; Nabulo et al., 2006; Zhu et al., 2015). In this study, traffic system vector datasets including different levels of roads were obtained from the Institute of Geography and Natural Resources Research of the Chinese Academy of Sciences for further analysis.

Methodology

Buffer areas and fishnets construction and analysis

As a prerequisite for the quantitative analysis of the anthropogenic auxiliary datasets, fishnets and buffer areas were constructed to help analyse the impacts of potential contamination sources. For industrial enterprises, we first created a fishnet of 1.5 km units covering the study area and counted the samples falling into each unit. The HM contents of the sample points were then spatially joined to the units, so that each unit carried the attributes of every point falling within it, and the mean HM contents of each unit were calculated. Similarly, for ore mines, we constructed buffer areas with a radius of 2 km (Kosharna & Korzhnev, 2018), spatially joined the HM contents of the sample points to the buffer areas of the different kinds of ore mines, and calculated the mean HM contents within the buffer areas of each kind of mine. For traffic emissions, we first set different threshold distances for the different levels of road as the radii of the line buffer polygons (Pirsaheb et al., 2016), then spatially joined the HM contents of the sample points to the buffer polygons, and finally calculated the mean HM contents for the different levels of road.
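The fishnet and buffer averaging described above can be sketched in a few lines. This is only an illustrative sketch, not the ArcGIS workflow used in the study: it assumes projected coordinates in metres, treats mines as points, and uses hypothetical sample values; the function names `fishnet_means` and `buffer_mean` are invented for this example.

```python
import math
from collections import defaultdict

def fishnet_means(samples, cell_size):
    """Mean value per square fishnet unit.

    samples: iterable of (x, y, value) in projected metres (assumed).
    Returns {(col, row): mean value of the samples falling in that unit}.
    """
    cells = defaultdict(list)
    for x, y, value in samples:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

def buffer_mean(samples, sources, radius):
    """Mean value of the samples lying within `radius` of any source point."""
    inside = [
        value
        for x, y, value in samples
        if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sources)
    ]
    return sum(inside) / len(inside) if inside else None

# Hypothetical coordinates (m) and Cd contents (mg/kg), for illustration only.
samples = [(200, 300, 0.20), (900, 1200, 0.40), (1800, 400, 0.30), (5200, 5100, 0.10)]
mines = [(1000, 1000)]

grid = fishnet_means(samples, cell_size=1500)    # 1.5 km fishnet units
near = buffer_mean(samples, mines, radius=2000)  # 2 km buffer around mines
```

In this toy case the first two samples share one fishnet unit, and three of the four samples fall inside the 2 km buffer; a real analysis would use polygon buffers and a GIS spatial join.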

Potential ecological risk index

The potential ecological risk index (RI) is regarded as an accurate approach for assessing the ecological risk caused by HMs (Kashif et al., 2020). It considers the concentrations, categories, levels of toxicity, and sensitivities of HMs together, and classifies them into different potential ecological risk levels using a quantitative methodology. Therefore, it has been widely used in ecological studies. The calculation is shown in the equation below (Hakanson, 1980):

$$RI= \sum_{i=1}^{m}{E}_{r}^{i}=\sum_{i=1}^{m}{T}_{r}^{i}\times \frac{{C}^{i}}{{C}_{n}^{i}}$$
(1)

\({E}_{r}^{i}\) is the potential ecological risk coefficient of HM i, and \({T}_{r}^{i}\) is the toxicity index of HM i. For Cd, Hg, As, Pb, and Cr, the toxicity indexes are 30, 40, 10, 5, and 2, respectively (Hakanson, 1980). \({C}^{i}\) denotes the concentration of HM i, \({C}_{n}^{i}\) denotes the background value of HM i, and m is the number of HMs. The classification of the potential ecological risk (\({E}_{r}^{i}\)) and hazard quotients of the HMs can be seen in Table 1.
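To make Eq. (1) concrete, the following minimal sketch computes the per-metal coefficients and the RI for a single sample. The toxicity indexes are those given in the text; the background values and the sample concentrations are hypothetical placeholders, and the function name `ecological_risk` is invented for this example.

```python
# Toxicity indexes T_r^i for Cd, Hg, As, Pb and Cr, as given in the text.
TOXICITY = {"Cd": 30, "Hg": 40, "As": 10, "Pb": 5, "Cr": 2}

def ecological_risk(conc, background):
    """Return (E_r per metal, RI) for one sample following Eq. (1)."""
    er = {hm: TOXICITY[hm] * conc[hm] / background[hm] for hm in TOXICITY}
    return er, sum(er.values())

# Hypothetical background values and one hypothetical sample (mg/kg).
background = {"Cd": 0.2, "Hg": 0.1, "As": 10.0, "Pb": 40.0, "Cr": 60.0}
sample = {"Cd": 0.4, "Hg": 0.1, "As": 5.0, "Pb": 40.0, "Cr": 30.0}

er, ri = ecological_risk(sample, background)
```

With these made-up inputs, Cd at twice its background dominates the index, which illustrates how the high toxicity indexes of Cd and Hg amplify even modest enrichment.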

Table 1 Classification of potential ecological risk (\({E}_{r}^{i}\)) and hazard quotient (risk index, RI)

Finite mixture distribution model (FMDM)

Given that for a random variable x, a mixture distribution consists of m components, and the distribution of the ith individual component is determined by a specific probability density function (pdf) \(f_{i}(x)\), we can express the general pdf \(f(x)\) for the mixture distribution as (Lin et al., 2009):

$$f\left(x\right)=\sum_{i=1}^{m}{\pi }_{i}{f}_{i}\left(x\right)={\pi }_{1}{f}_{1}\left(x\right)+\dots +{\pi }_{m}{f}_{m}\left(x\right)$$
(2)
$$\sum_{i=1}^{m}{\pi }_{i}=1\left(0\le {\pi }_{i}\le 1\right)$$
(3)

Here, \(\pi_{i}\) represents the mixing weight of the ith sub-distribution.

Most natural processes, and the contents of HMs in soils in particular, usually follow a normal or a log-normal distribution (Zhi et al., 2016). Therefore, we used the log-normal distribution to describe the contents of HMs in soils coming from different sources, which can be expressed as follows:

$$f_{i}\left(x\mid \mu_{i},\sigma_{i}\right)=\frac{1}{\sqrt{2\pi}\,\sigma_{i}x}\,e^{-\frac{\left(\ln x-\mu_{i}\right)^{2}}{2\sigma_{i}^{2}}},\quad x>0$$
(4)

In the above, \(\mu_{i}\) and \(\sigma_{i}\) denote the mean and standard deviation (on the logarithmic scale) of the ith sub-distribution. These parameters were estimated using the expectation maximization algorithm. We then applied a Chi-square goodness-of-fit test to the null hypothesis H0, to ensure that the assumed model fit the observed distribution. The cut-off value \(a_{0}\) between the ith and (i + 1)th components is the value that satisfies:

$$\pi _{i} \int\limits_{{a_{0} }}^{{ + \infty }} {f_{i} \left( x \right)dx = \pi _{{i + 1}} } \int\limits_{{ - \infty }}^{{a_{0} }} {f_{{i + 1}} \left( x \right)dx}$$
(5)
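The fitting and cut-off steps can be sketched as follows. This is a minimal stdlib illustration under stated assumptions, not the R `mclust` workflow used in the study: the EM routine fits a Gaussian mixture to ln x (equivalent to a log-normal mixture in x), the starting values are a crude split of the sorted data, the synthetic data and their parameters are hypothetical, and the cut-off is found by bisection on ln a0. All function names are invented for this example.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def fit_lognormal_mixture(x, m=2, iters=200):
    """EM estimates (weights, means, sds) of an m-component mixture on y = ln(x)."""
    y = sorted(math.log(v) for v in x)
    n = len(y)
    # Crude start: split the sorted data into m equal chunks.
    chunks = [y[k * n // m:(k + 1) * n // m] for k in range(m)]
    mu = [sum(c) / len(c) for c in chunks]
    sigma = [max(1e-3, math.sqrt(sum((v - u) ** 2 for v in c) / len(c)))
             for c, u in zip(chunks, mu)]
    pi = [1.0 / m] * m
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in y:
            w = [pi[k] * normal_pdf(v, mu[k], sigma[k]) for k in range(m)]
            total = sum(w) or 1e-300
            resp.append([wk / total for wk in w])
        # M-step: update weights, means and standard deviations.
        for k in range(m):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, y)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, y)) / nk
            sigma[k] = max(1e-3, math.sqrt(var))
    return pi, mu, sigma

def cutoff(pi_i, mu_i, sigma_i, pi_j, mu_j, sigma_j):
    """Solve Eq. (5) for a0 by bisection on t = ln(a0)."""
    def g(t):
        upper_tail_i = 1.0 - normal_cdf(t, mu_i, sigma_i)
        lower_tail_j = normal_cdf(t, mu_j, sigma_j)
        return pi_i * upper_tail_i - pi_j * lower_tail_j
    lo = min(mu_i, mu_j) - 10.0 * max(sigma_i, sigma_j)
    hi = max(mu_i, mu_j) + 10.0 * max(sigma_i, sigma_j)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Synthetic data: 80% "background" group and 20% "enriched" group (hypothetical).
random.seed(0)
data = [random.lognormvariate(math.log(0.12), 0.25) for _ in range(800)]
data += [random.lognormvariate(math.log(0.36), 0.25) for _ in range(200)]

pi, mu, sigma = fit_lognormal_mixture(data, m=2)
a0 = cutoff(pi[0], mu[0], sigma[0], pi[1], mu[1], sigma[1])
```

Because the left-hand side of Eq. (5) decreases and the right-hand side increases with a0, the bisection has a single root, which lies between the two component medians when the components are well separated.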

Raster-based principal components analysis (RB-PCA)

Traditional PCA uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables (Shao et al. 2018). Here, we improved the accuracy of this procedure by replacing the original point variables (i.e. the content of each HM) with inverse distance weighted (IDW) prediction maps (\({z}_{1},{z}_{2},\cdots ,{z}_{k}\)). Because IDW predictions stay close to the measured values at the sample points, replacing individual point scores with continuous raster scores is expected to give a high validation accuracy.
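The IDW step that produces each prediction map can be illustrated with a short sketch. It is not the `gstat` routine used in the study: the power of 2, the use of all points as neighbours, the grid, and the sample values are all assumptions made for this example.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: list of (px, py, value). The measured value is returned when
    (x, y) coincides with a sample point, so the surface honours the data.
    """
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def idw_raster(points, xs, ys, power=2.0):
    """Evaluate IDW on a regular grid; rows follow ys, columns follow xs."""
    return [[idw(points, x, y, power) for x in xs] for y in ys]

# Hypothetical Cd contents (mg/kg) at three sample sites (coordinates in km).
cd_points = [(0.0, 0.0, 0.2), (2.0, 0.0, 0.4), (0.0, 2.0, 0.6)]
raster = idw_raster(cd_points, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0])
```

Every interpolated cell is a weighted average of the measured values, so the raster never leaves the observed range; one such raster per HM forms the input \(z_{1},\dots ,z_{k}\) of the component score.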

$$\mathrm{f}={a}_{1}{z}_{1}+{a}_{2}{z}_{2}+\cdot \cdot \cdot +{a}_{k}{z}_{k}$$
(6)

Here, \({a}_{1},{a}_{2},\cdots ,{a}_{k}\) denote the coefficients of the respective raster maps in the component score f.
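As an illustration of Eq. (6), the sketch below treats every raster cell as one observation, standardises each HM raster, and extracts the first principal component by power iteration on the correlation matrix. It is a simplified stand-in for the `psych`-based analysis in the study: only the first, unrotated component is computed, the rasters are tiny hypothetical grids, and the function names are invented for this example.

```python
import math

def standardise(values):
    """Centre to zero mean and scale to unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]  # assumes the raster is not constant

def first_component(rasters, iters=500):
    """Loadings a_1..a_k of PC1 and its per-cell score raster f (Eq. 6)."""
    # Flatten each raster so that every cell becomes one observation.
    z = [standardise([c for row in r for c in row]) for r in rasters]
    k, n = len(z), len(z[0])
    corr = [[sum(p * q for p, q in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration; a slightly uneven start avoids an unlucky orthogonal start.
    a = [1.0 + 0.01 * i for i in range(k)]
    for _ in range(iters):
        b = [sum(corr[i][j] * a[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in b))
        a = [v / norm for v in b]
    scores = [sum(a[i] * z[i][c] for i in range(k)) for c in range(n)]
    ncols = len(rasters[0][0])
    nrows = len(rasters[0])
    return a, [scores[r * ncols:(r + 1) * ncols] for r in range(nrows)]

# Hypothetical 2 x 2 prediction rasters; Pb is constructed to track Cd exactly.
cd = [[1.0, 2.0], [3.0, 4.0]]
pb = [[3.0, 5.0], [7.0, 9.0]]
as_ = [[3.0, 1.0], [4.0, 2.0]]

loadings, score_raster = first_component([cd, pb, as_])
```

In this constructed example Cd and Pb load equally on the first component while the third raster, which is uncorrelated with them, receives a loading near zero, mirroring the way shared loadings are read as a shared source.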

Positive matrix factorization (PMF)

As a multivariate FA tool that was originally intended to address chemical mass balance problems, the PMF model can be summarised as (Pekey & Dogan, 2013):

$${\text{X}} = {\text{GF}} + {\text{E}}\;{\rm or }\;x_{{ij}} = \sum\limits_{{k = 1}}^{p} {g_{{ik}} f_{{kj}} + e_{{ij}} }$$
(7)

In the above, \(\mathrm{X}(n\times m)\) is the data matrix, in which \({x}_{ij}\) is the concentration of HM j in sample i; \(\mathrm{G}(n\times p)\) is the factor contribution matrix, in which \({g}_{ik}\) represents the contribution from factor k to receptor (sample) i; \(\mathrm{F}(p\times m)\) is the factor profile matrix, in which \({f}_{kj}\) represents the concentration of HM j in factor k; and \(\mathrm{E}(n\times m)\) is the residual error matrix, in which \({e}_{ij}\) represents every residual error item. Here, n is the number of samples, m is the number of HMs, and p is the number of factors.

$${\text{Q}} = \mathop \sum \limits_{{i = 1}}^{n} \mathop \sum \limits_{{j = 1}}^{m} \left[ {\frac{{x_{{ij}} - \mathop \sum \nolimits_{{k = 1}}^{p} g_{{ik}} f_{{kj}} }}{{u_{{ij}} }}} \right]^{2}$$
(8)

The PMF model seeks the factor contributions and profiles that minimise the objective function Q, in which \({u}_{ij}\) is the uncertainty of \({x}_{ij}\). Two versions of Q are reported: (1) Q(true) is the goodness-of-fit parameter calculated by including all points; and (2) Q(robust) is the goodness-of-fit parameter calculated by excluding points not fit by the model, defined as samples for which the uncertainty-scaled residual is greater than four.

Under the non-negativity constraint, the objective function Q is minimised by the Multilinear Engine-2 (ME-2) to produce the factor contributions and profiles. In the calculation process, the uncertainty was obtained from the element-specific method detection limit and the error percentage measured using standard reference material (Norris and Duvall 2014).
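The uncertainty and goodness-of-fit calculations can be sketched as below. The uncertainty rule follows the equation-based option described in the EPA PMF 5.0 user guide (5/6 of the detection limit at or below the limit, otherwise the root sum of squares of the error fraction times the concentration and half the detection limit); whether the study used exactly this option is an assumption. The sketch only evaluates Q for given matrices and does not perform the constrained minimisation itself; all numbers are hypothetical.

```python
import math

def uncertainty(conc, mdl, error_fraction):
    """Equation-based uncertainty for one concentration (same units as conc)."""
    if conc <= mdl:
        return 5.0 / 6.0 * mdl
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)

def q_values(x, g, f, u, limit=4.0):
    """Return (Q_true, Q_robust) per Eq. (8).

    x, u: samples in rows, HMs in columns; g: samples x factors;
    f: factors x HMs. Q_robust skips points whose scaled residual exceeds `limit`.
    """
    q_true = q_robust = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            fitted = sum(g[i][k] * f[k][j] for k in range(len(f)))
            scaled = (x_ij - fitted) / u[i][j]
            q_true += scaled ** 2
            if abs(scaled) <= limit:
                q_robust += scaled ** 2
    return q_true, q_robust

# Hypothetical one-factor example with one badly fitted point.
x = [[1.0, 2.0], [2.0, 9.0]]
g = [[1.0], [2.0]]
f = [[1.0, 2.0]]
u = [[1.0, 1.0], [1.0, 1.0]]

q_true, q_robust = q_values(x, g, f, u)
u_low = uncertainty(0.05, mdl=0.1, error_fraction=0.1)
u_high = uncertainty(1.0, mdl=0.1, error_fraction=0.1)
```

Here the single point with a scaled residual of five contributes to Q(true) but is left out of Q(robust), which is why a large gap between the two values points to poorly fitted samples.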

Combined mechanism of three models

The three models interpret the source identification and apportionment of HMs in soil through different mechanisms, and are complementary to each other. The FMDM analysis starts from the statistical distribution of each element (Hu & Cheng, 2013). It identifies contamination sources for each HM, and is especially suited to separating natural from anthropogenic sources, whereas RB-PCA takes a qualitative approach to identifying potential contamination sources within the study area. By combining the high scores in the spatial distribution of each principal component with the analysis of multiple auxiliary variables, the sources can be determined (Dong et al., 2018; Qu et al., 2013). In this study, the combination of the FMDM and RB-PCA not only separates natural and anthropogenic sources statistically within the study area, but also provides further qualitative insights into the spatial patterns of the different contamination sources. In addition to these two approaches, PMF adds quantitative results to the apportionment scheme by outputting the contributions of the defined sources to every element (Liang et al., 2017). As a result, an integrated contamination source identification and apportionment is achieved (Fig. 2).

Fig. 2 The flowchart of combination of models in the study

Data analysis

In this study, all of the statistical analyses and correlation tests were conducted using the Statistical Package for Social Sciences (SPSS) Statistics 25 (IBM Corp.; 2016; IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY: IBM Corp). The construction of fishnets and buffer areas, the mapping of spatial distributions, the spatial joins, and the interpolation of auxiliary variables were all carried out in ArcGIS 10.3 (ESRI, ArcGIS 10.3, Redlands, CA, USA). In the RB-PCA, the IDW interpolation of every element was performed using the gstat package, whereas the PCA was conducted using the psych package in RStudio. The FMDM was fitted in RStudio with the mclust package, and the PMF model was run in the US-EPA PMF 5.0 program (Norris and Duvall 2014).

Results and discussion

Summary statistical analysis of heavy metals (HMs) in soils

The summary statistics of Cd, Hg, As, Pb, and Cr for the 1474 sample points are given in Table 2. The average concentrations of the five HMs were 0.28, 0.16, 8.73, 37.13, and 59.09 mg/kg, respectively. With the exception of Pb, the mean concentrations of the HMs were all above the background values (Wang et al., 2007), indicating that, apart from natural factors such as parental materials, anthropogenic activities also influence the spatial heterogeneity of HMs in soils (Dong et al., 2018). The coefficient of variation (CV) reflects the degree of dispersion of the data (Brock, 1953). The CVs of Cd, Hg, and As all exceeded 100%, indicating the presence of extremely high values for these HMs.
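For reference, the CV used above is the standard deviation divided by the mean, expressed as a percentage. The short sketch below shows how a single extreme value pushes the CV above 100%; the Hg values are hypothetical and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical Hg contents (mg/kg); one extreme value inflates the CV.
hg = [0.10, 0.12, 0.11, 0.13, 1.50]
cv = coefficient_of_variation(hg)
cv_without_extreme = coefficient_of_variation(hg[:-1])
```

Dropping the extreme value in this toy series brings the CV down to roughly a tenth of its original size, which is why CVs above 100% are read as a sign of localised hotspots.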

Table 2 Summary statistics for HMs in topsoil (mg/kg)

Potential ecological risk assessment of HMs in soils

The potential ecological coefficient \({E}_{r}^{i}\) and hazard quotient RI were calculated and are summarised in Table 3. According to the mean values of \({E}_{r}^{i}\), the potential ecological risks of the HMs can be ranked as Hg > Cd > As > Pb > Cr. In particular, the mean values of \({E}_{r}^{i}\) for Hg and Cd were 57.44 and 50.33, accounting for 44.26% and 38.78% of the RI, respectively. Thus, Hg and Cd can be recognised as the greatest contributors to ecological contamination, with a moderate ecological risk in soils. The other HMs were below the threshold (40) and can therefore be treated as being of low ecological risk in soils. The RI ranged from 57.71 to 4192.53, with a mean value of 129.78. According to the hazard quotient classification, the mean value was below the first threshold value of 150, indicating that the overall ecological risk of the study area was low. Nevertheless, special attention should be paid to the maximum value of the RI, which reached 4192.53, far above the highest threshold of 600. Cd and Hg contributed 2865.17 and 881.89 to this value, respectively, suggesting that these two HMs had abnormally high extreme values. In that regard, anthropogenic activities such as industrial emissions, mine exploitation, urban life wastes, and agricultural inputs could all be potential sources (Han et al. 2016) (Table 4).
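As a quick consistency check on the shares quoted above (a worked calculation using only the reported values), the contributions of Hg and Cd to the mean RI follow from:

$$\frac{57.44}{129.78}\times 100\%\approx 44.26\%,\qquad \frac{50.33}{129.78}\times 100\%\approx 38.78\%$$

Applying the same calculation to the sample with the maximum RI gives \((2865.17+881.89)/4192.53\approx 89.4\%\), i.e. Cd and Hg together account for roughly nine-tenths of that extreme value.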

Table 3 Potential ecological coefficient (\({E}_{r}^{i}\)) and hazard quotient (RI) of HMs
Table 4 Mean values for HMs’ contents in diverse land use cover (mg/kg)

Auxiliary variables analysis

Natural variables analysis

Land use cover and population density

The map of land use cover and population density is shown in Fig. 3. Woodland and arable land predominated, accounting for 62.15% and 23.33% of the total area of Wenzhou City, respectively. Grassland was scattered over the northwest, commonly associated with woodland, and accounted for 2.38% of the area. The residential areas were located mostly along the coast and near rivers, and accounted for 8.60% of the area. According to the P values, the contents of most HMs differed significantly among the land use types. The quantitative analysis showed that Cd contents in residential land were on average higher than in other regions, which reflected that anthropogenic influences, particularly from human activities and life wastes, had certain impacts on these values (Jiang et al., 2017).

Fig. 3 Map of land use cover and population density in the study area

Soil parental materials

The spatial distributions of the parental materials and their subgroups are shown in Fig. 4. Slope wash was dominant within the entire study area, whereas the other parental materials were mainly distributed in the coastal areas. According to Table 5, the P values of Cd and Pb showed significant differences among parental materials, indicating that the contents of both HMs vary markedly with the type of parental material. In particular, Cd had a relatively high content in fluvial facies deposits, lacustrine facies, and slope wash, whereas Pb had a higher content in mud facies and channel facies. Other parental materials also contributed differently to the various HMs. Slope wash, especially that derived from sand shale, contributed to the largest number of HMs (Hg, As, and Pb).

Fig. 4 Maps of soil parental materials and subgroups of parental materials

Table 5 Mean values for HMs’ contents in different parental materials and their subgroups (mg/kg)

Anthropogenic variables analysis

Industrial enterprises and mines

The map of the spatial distribution of enterprises (Fig. 5) showed that most enterprises were clustered in the coastal and main city areas. The kernel density map clearly showed the degree of aggregation; the highest-density areas were also located along the riverside and in urban areas. The fishnet maps of the four main categories (Fig. 6) delivered accurate information on the quantity and aggregation of the different types of enterprises (Li et al., 2019). Figure 6a shows that the chemical manufacturing enterprises were mainly distributed along the river and the south coast, with most grid units containing between one and five enterprises. Figure 6b illustrates that metal products enterprises showed a spatial distribution similar to that in Fig. 6a, but with more enterprises in each grid unit. In Fig. 6c, life wastes enterprises were scattered over the entire study area, whereas Fig. 6d shows that a small number of other enterprises were clustered in the central-eastern coastal areas.

Fig. 5 Maps of enterprises' spatial distribution and kernel density

Fig. 6 Fishnet maps of enterprises divided as four main categories: a chemical manufacturing, b metal products, c life wastes, d others

A statistical analysis of the average HM contents for the different enterprise categories is given in Table 6. According to the P values, the contents of most HMs also differed significantly among the buffer areas of the different enterprise categories. Compared with the 942 samples that had no industrial influence, the average HM contents for the chemical manufacturing, metal products, and life wastes enterprises were all higher, confirming that enterprises generating environmental wastes had impacts on the HMs in soils. It was also clear that the HMs accumulated more in the soil as the number of enterprises in a cluster increased. In particular, for metal products, the increasing trend of HM content with enterprise quantity was evident for Cd, Hg, and As, which were also the main elements in the industrial emissions.

Table 6 Mean values for HMs’ contents in buffer areas of different enterprise categories (mg/kg)

The map of mine distribution (Fig. 7) shows the spatial location of every mine. In contrast to the clustering of enterprises, the mines in Wenzhou City were generally scattered across the western side of the study area. Table 7 also shows that nearly every type of mine influenced the HMs in soils to some degree, as the mean HM contents near mines were generally well above those far from mines. In particular, judging from the largest gaps between the average contents near mines and those far from mines, it is reasonable to conclude that, among the five elements, mine exploitation had the greatest impact on Cd, As, and Cr.

Fig. 7 The map of mine distribution of different categories

Table 7 Mean values for HMs’ contents in buffer areas of different mine categories (mg/kg)

Fertiliser inputs and traffic emissions

The map of the inputs of the different fertilisers in all towns and villages (Fig. 8) revealed that high inputs of the four fertilisers were mainly concentrated in the northern and central counties of the study area, whereas the southwest parts received very little potash fertiliser. Table 8 confirms the influence of the fertiliser inputs on the HMs in soils, since the P values of the HMs also showed significant differences among regions with different fertiliser inputs. As the value intervals of the different fertilisers increased, the contents of the HMs also increased. For traffic emissions, the number of samples falling into the line buffer areas was so limited that the impacts of traffic emissions (Table 9) on the HMs in soils could not be summarised clearly in our study. Accordingly, we excluded this factor from the discussion.

Fig. 8 The map of contents of different fertilisers in towns or villages

Table 8 Mean values for HMs’ contents in different fertilisers value interval (mg/kg)
Table 9 Mean values for HMs contents in different road thresholds (mg/kg)

Source identification and apportionment

FMDM

The FMDM fitting results showed that Cd and Pb conformed to a single log-normal distribution, whereas Hg, As, and Cr conformed to log-normal mixture distributions. All five HMs passed the goodness-of-fit test (P > 0.05). Cd and Pb fit the single log-normal distribution model shown in Fig. 9a, d, and each presented as a single group. According to Table 10, the mean value of Cd was slightly higher than the background value but still within the background interval, revealing that Cd came mainly from natural parental materials, but might have been slightly influenced by anthropogenic activities (Hu et al., 2018). As the mean value of Pb was lower than the background value, most Pb in the study area can be attributed to natural sources.

Fig. 9 Finite mixture distribution model (FMDM) fitting of Cd a, Hg b, As c, Pb d, Cr e in soils

Table 10 The parameters, goodness of fit, and cut-off values for finite mixture distribution model (FMDM)

In contrast, Hg fit the double log-normal mixture distribution model shown in Fig. 9b. The FMDM identified two main groups for Hg, one accounting for 81% and the other for 19%. As the mean value of the first group was 0.12, i.e. below the background value of 0.13, this source can also be identified as a natural source. The mean value of the other group was 0.36, which was not only higher than the background value but also outside the background interval, showing that this group was seriously influenced by a possible anthropogenic source.

As and Cr were identified as triple log-normal mixture distributions, as shown in Fig. 9c, e, with three groups identified for each of these two elements. For As, the groups accounted for 41%, 42%, and 17%, respectively. Following the same reasoning as above, three potential sources could be identified (one natural source and two anthropogenic sources). In addition, as the mean value of the second group was still within the background interval, whereas that of the third group was outside it, the second group for As can be identified as moderately polluted and the third group as severely polluted. The same analysis applies to Cr, i.e. it can be attributed to one natural source accounting for 48%, and two moderately polluting sources, each accounting for 26%.
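The decision rule applied to each FMDM group above can be summarised in a small sketch. It is one reading of the text rather than the authors' stated procedure: a group whose mean does not exceed the background value is treated as natural, a group within the background interval as moderately influenced, and a group beyond it as severely influenced. The Hg weights, group means, and background value are taken from the text; the upper bound of the background interval is a hypothetical placeholder, since Table 10 is not reproduced here.

```python
def classify_group(mean, background, interval_upper):
    """Label one FMDM sub-distribution by comparing its mean with the background."""
    if mean <= background:
        return "natural"
    if mean <= interval_upper:
        return "within background interval (moderate influence)"
    return "outside background interval (severe influence)"

HG_BACKGROUND = 0.13  # mg/kg, background value quoted in the text
HG_UPPER = 0.30       # mg/kg, hypothetical upper bound of the background interval

# (weight, mean) of the two Hg groups reported in the text.
hg_groups = [(0.81, 0.12), (0.19, 0.36)]
labels = [(w, classify_group(m, HG_BACKGROUND, HG_UPPER)) for w, m in hg_groups]
```

With these inputs the 81% group is labelled natural and the 19% group falls outside the interval, matching the interpretation given for Hg.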

RB-PCA

Derived from the HM contents of the 1474 sample points, the IDW prediction maps of the five HMs provided a reliable basis for RB-PCA, as the low mean error (ME) and root mean square error (RMSE) values of the five HMs indicated good prediction accuracy (Huang, 2020). The cumulative variance showed that the five HMs could be divided into two main components (> 85%) (Wang et al., 2018). However, the results also showed that the contributions of Hg to both components were so minor that they could be ignored. We therefore increased the number of components from two to three; the rotation matrix is shown in Table 11. The three components accounted for 35.46%, 55.38%, and 9.16% of the total, respectively.
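The cumulative-variance criterion mentioned above can be sketched as follows. The 85% threshold comes from the text; the three shares reported for the final solution are used purely as illustrative input, because the shares of the initial two-component solution are not given here. The function name `components_needed` is invented for this example.

```python
def components_needed(shares, threshold=85.0):
    """Smallest number of components whose cumulative share exceeds `threshold` (%)."""
    total = 0.0
    ordered = sorted(shares, reverse=True)
    for count, share in enumerate(ordered, start=1):
        total += share
        if total > threshold:
            return count, total
    return len(ordered), total

# Shares (%) of the three components reported in the text, as illustrative input.
count, cumulative = components_needed([35.46, 55.38, 9.16])
```

On this input two components already pass the threshold, which illustrates why a purely variance-based rule would stop at two and why the third component was added on interpretive grounds (to capture Hg).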

Table 11 Raster-based principal component analysis (RB-PCA) rotation matrix parameters

The first principal component (PC1) had factor loadings of 0.79 for both Cd and Pb, indicating that these two elements came from the same pollution source. The OK spatial interpolation map (Fig. 10a) showed that high scores of PC1 were generally scattered all over the study area. The second principal component (PC2) accounted for 55.38%, with high factor loadings of 0.83 and 0.63 for As and Cr, respectively. Figure 10b shows that the high-scoring clusters were mainly distributed in the western part of the study area. The third component (PC3) was associated with Hg and Cr, with factor loadings of 0.90 and 0.48, respectively. Figure 10c shows high scores generally centred in the northern and central parts of the study area, where the residents and industrial enterprises were located.

Fig. 10 Raster interpolation maps of three principal components

PMF

Based on experiments in which the number of factors in the PMF model was set from 1 to 5, and in view of the results of the previous two models, we set the number of factors to three, which gave the highest accuracy, and then validated the model fit and uncertainty. The basic model fitting parameters of the PMF are shown in Table 12. Both the general linear fit and the uncertainty diagnostics of the five HMs were satisfactory, with the highest \(R^{2}\) (for Hg) reaching 0.99, and the lowest (for Pb) still above 0.73. The KS test P values were all above 0.05, indicating an accurate fit of the PMF model (Lv et al. 2018). Within 50 bootstrap runs, the DISP LD in Q and %dQ remained at a low level, revealing a stable model output (Norris and Duvall 2014). In addition, to make the contribution gap clearer, we rotated the positive matrix by setting Fpeak = −0.5 (Han et al., 2016).

Table 12 The fitting parameters and uncertainty of the positive matrix factorization (PMF) model

According to Table 13 and Fig. 11, Hg was dominated by factor 1, which contributed 80.77%, with comparatively low contributions from the other factors. Factor 2 contributed 89.22% and 84.81% to Cd and Pb, respectively, and 7.31%, 35.84%, and 27.42% to Hg, As, and Cr, respectively. Factor 3 did not dominate any HM, but its contributions were highest for As and Cr, at 48.23% and 46.95%, respectively. The elements to which each factor contributed most thus matched the results of the previous two models.

Table 13 Source profile and contribution of the positive matrix factorization (PMF) model
Fig. 11 The contribution percentage stacked bar chart
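As a consistency check on the PMF apportionment, the following minimal sketch tabulates the contribution percentages reported in this section, confirms that the three factors sum to 100% for each element, and picks out the dominant factor per element. The dictionary layout and the helper name `dominant_factor` are illustrative assumptions. The numbers are those given in the text, with the factor 1 and factor 3 values for the remaining elements taken from the source-by-source PMF percentages reported in this section.

```python
# PMF contributions (%) per element and factor, as reported in the text.
contributions = {
    "Cd": {"factor 1": 2.94, "factor 2": 89.22, "factor 3": 7.84},
    "Hg": {"factor 1": 80.77, "factor 2": 7.31, "factor 3": 11.92},
    "As": {"factor 1": 15.93, "factor 2": 35.84, "factor 3": 48.23},
    "Pb": {"factor 1": 4.79, "factor 2": 84.81, "factor 3": 10.40},
    "Cr": {"factor 1": 25.63, "factor 2": 27.42, "factor 3": 46.95},
}


def dominant_factor(shares):
    """Return the name of the factor with the largest contribution."""
    return max(shares, key=shares.get)


# Sum of the three factor contributions for each element (should be 100%).
totals = {el: round(sum(s.values()), 2) for el, s in contributions.items()}

# Factor with the highest contribution for each element.
dominant = {el: dominant_factor(s) for el, s in contributions.items()}

for element in contributions:
    print(element, totals[element], dominant[element])
```

The per-element totals close at 100%, and the dominant factors group the elements as Cd and Pb (factor 2), Hg (factor 1), and As and Cr (factor 3), the same grouping as the PCA loadings.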

Comprehensive interpretation of contamination sources

By combining the results of the three models, we identified three main contamination sources. The first was identified as a natural lithogenic source, with especially high contributions to Cd and Pb. The qualitative PCA results showed that the spatial distribution of PC1 (Fig. 10a) was scattered all over the study area; that is, it showed no clustering trend, whereas clustering was regarded as the signature of anthropogenic activities. Moreover, the quantitative analysis (Table 5) showed that slope wash (containing tuffaceous tuffs and sand shale), which made up the largest share of parent materials in the study area (Fig. 4), had high values of Cd and Pb, matching the spatial analysis result. The FMDM results also showed a single log-normal distribution for these two elements. As their mean values were close to the background values, the first contamination source was confirmed to be a natural lithogenic source. According to the quantitative PMF results, this source contributed 89.22% and 84.81% to Cd and Pb, respectively, and 7.31%, 35.84%, and 27.42% to Hg, As, and Cr, respectively.

The second contamination source was attributed to mixed industrial and agricultural pollution, especially Hg-related pollution. In the PCA results, the high scores in the OK interpolation map of PC3 (Fig. 10c) clustered in the urban areas, where most of the metal products and chemical manufacturing enterprises were located (Fig. 6a and Fig. 6b); the distribution of fertiliser inputs (Fig. 8) was also similar to that in Fig. 10c. In addition, the quantitative statistical analysis of industrial enterprises (Table 6) revealed that metal products enterprises had severe impacts on Hg. The FMDM suggested that Hg had two main sources: a natural source, as discussed previously, and industrial Hg contamination, mostly from metal products and chemical manufacturing enterprises, among others. Because Hg was also influenced by potash and organic fertiliser inputs (Fig. 8c, d), and the areas of high input also matched the urban areas, fertiliser inputs probably acted together with enterprise emissions. According to the quantitative PMF results, mixed pollution from industrial emissions and fertiliser inputs contributed 80.77% to Hg, and comparatively little to Cd, As, Pb, and Cr, at 2.94%, 15.93%, 4.79%, and 25.63%, respectively.

The third contamination source was dominated by mine exploitation activities. In the PCA results (Fig. 10b), the high scores were mainly clustered in the west part of the study area, generally matching the locations of the mines shown in Fig. 7. Table 5 also shows that mine exploitation had significant impacts on As and Cr; therefore, PC2 can be regarded as mainly reflecting mine exploitation. The FMDM results also showed that As and Cr had a triple log-normal mixed distribution, reflecting the influence of mine exploitation in addition to the natural and industrial emission sources discussed previously. The PMF results showed that this factor contributed 48.23% and 46.95% to As and Cr, respectively, and 7.84%, 11.92%, and 10.40% to Cd, Hg, and Pb, respectively.

Conclusion

This study employed qualitative and quantitative approaches (including FMDM, RB-PCA, and PMF) and applied multiple auxiliary variables to identify and apportion potential contamination sources of HMs (Cd, Hg, As, Pb, and Cr) in soils in Wenzhou City, China. The potential ecological risk assessment showed that the general ecological risk level of the entire study area was low, but that Hg and Cd could pose a threat, as they were the main contributors to the risk and had extremely high maximum values. Three main sources were identified and apportioned. Parent materials contributed to all elements, especially Cd and Pb, with contributions of 89.22% and 84.81%, respectively, and contributions to Hg, As, and Cr of 7.31%, 35.84%, and 27.42%, respectively. Industrial emissions mixed with fertiliser inputs were another vital contamination source in the study area, owing to the large proportion of metal products enterprises and related fertiliser inputs. This source contributed 80.77% to Hg, and 2.94%, 15.93%, 4.79%, and 25.63% to Cd, As, Pb, and Cr, respectively. Mine exploitation was the third important contamination source; it contributed 48.23% and 46.95% to As and Cr, respectively, and 7.84%, 11.92%, and 10.40% to Cd, Hg, and Pb, respectively. These results can help decision-makers make more efficient and scientifically grounded decisions in regulating HM pollution.