Introduction

Soil, the core element connecting the atmosphere, hydrosphere, biosphere, and lithosphere, provides the natural support and biological barrier for nearly all ecosystems on the earth’s surface. Soil quality is inextricably linked to humans and the environment (Lehmann et al., 2020). In recent years, due to the rapid expansion of cities, contamination of soil by heavy metals (HMs) has become increasingly severe worldwide (Árvay et al., 2017; Li et al., 2014). The accumulation of HMs affects not only the ecological environment but also biological entities, i.e., animals and plants. HM contaminants directly or indirectly threaten human health when metal elements enter the body through environmental exposure pathways such as drinking water and the food chain (Huang et al., 2021; Jiang et al., 2022a; Yang et al., 2018).

In general, HMs such as Cd, Cr, Hg, Pb, Cu, Zn, and Ni, together with the metalloid arsenic (As), are derived from both geogenic and anthropogenic sources (Kebonye et al., 2021). They are released naturally through weathering of geological parent materials and/or from various human activities including agriculture, transportation, mining and smelting, and the disposal of wastes rich in metal residues (Li et al., 2021; Ungureanu et al., 2017). HMs are treated as priority contaminants for control because of their potential toxicity, persistence, and irreversibility (Lin et al., 2018; Man et al., 2010). HMs in soils threaten human health when they are converted from solid forms to ionic or organometallic moieties (Madrid et al., 2002). To improve soil environmental quality and protect human health, it is necessary to differentiate the natural and anthropogenic sources of HMs, quantify their contributions, and assess the impact of HM exposure on the ecological environment and human health (Cheng et al., 2020; Huang et al., 2021).

Past studies have examined the distribution, source apportionment, and risk assessment of topsoil HMs under various human activities. However, most of this research focused on a single human activity or a single receptor medium, such as agricultural land (Rodriguez et al., 2008; Zhang et al., 2020), urban parks (Gu et al., 2017; Liu et al., 2020; Wang et al., 2019), industrial parks (Li et al., 2021; Long et al., 2021), and mining and smelting areas (Árvay et al., 2017; Tian et al., 2018; Xiao et al., 2017). The distribution of topsoil HMs, the inter-relationships among them, the quantitative apportionment of their sources, and the associated health risks under the influence of mixed anthropogenic activities (e.g., long-term mining and smelting of polymetallic ores and intensive agricultural activities) remain poorly understood.

Multivariate statistical analyses such as Principal Component Analysis (PCA), Chemical Mass Balance (CMB), and Cluster Analysis (CA) have been widely employed to reveal the distribution and possible sources of HMs (Davis et al., 2009; Facchinelli et al., 2001; Li et al., 2004). However, these methods are limited in identifying multiple sources simultaneously, capturing nonlinear correlations, and classifying different variables and samples, especially for high-dimensional and complex datasets (Astel et al., 2007; Kim et al., 2020). Unlike these conventional models, positive matrix factorization (PMF) was developed to cope with uncertainties and error propagation (Huang et al., 2021; Zanotti et al., 2019). Recently, PMF has been widely applied to identify pollution sources and apportion their contributions in various environmental media (Brinkman et al., 2006; Zhang et al., 2018). The success of its application is affected by errors in the sample data, the model structure, and the parameter representation, and the conclusions rely on the interpretation of background data in the study area (Huang et al., 2021; Li et al., 2020; Mao et al., 2023). In addition, because soil heavy metal occurrences are complex and uncertain, the data must be reduced and classified before the relationships between variables can be explained accurately. To overcome the shortcomings of traditional classification methods, the self-organizing map (SOM), which can reveal local relationships between variables and classify nonlinear and dispersed data (Jiang et al., 2022a; Kebonye et al., 2021; Lee et al., 2019), has been increasingly adopted for data classification in earth and environmental science (Agyeman et al., 2022a; Nakagawa et al., 2020; Qu et al., 2021). Moreover, SOM uses reference vectors to provide an informative picture of the data that clearly shows the interdependencies between variables (Pearce et al., 2011; Wang et al., 2020; Zhu et al., 2020).

In the current study, the primary topsoil HMs (Cu, Pb, Zn, Cr, Ni, Cd, As, Hg, and Mn) in an area with typical multi-mineral resources and intensive agricultural activities were analyzed to evaluate the pollution level. The distribution and potential sources of the contaminants were investigated by combining PMF and SOM with Pearson correlation analysis, and the potential ecological and health risks were evaluated. This study provides a strategy and scientific basis for preventing, controlling, and remediating HM contamination in areas undergoing mining and agricultural activities.

Study area

Mountainous, low mountainous, and plain areas make up the majority of the physiognomy in the study area. In terms of topography and land-use type, the study area is divided into mountainous, mining, agricultural, and residential areas. The region with strong agricultural activity (i.e., farmland zones) is primarily located at the upper piedmont sloping plain in the southwest, followed by the riverbed or riparian ribbon alluvial plain in the southern or eastern areas, or the valleys of the northwestern mountain (Fig. 1). There are also many agricultural activities in the central or northern region even though some farmlands are distributed in the valleys and mountains. Additionally, large-scale vegetable planting bases, modern agricultural industrial parks, and fruit industrial parks are found. There are abundant coal, iron, and gold ore resources, along with lead and zinc, phosphorite, and graphite mineral resources, while the other mineral resources are relatively small in size or distribution of ore spots (Jiang et al., 2022b).

Fig. 1 Sampling location of HMs in topsoil samples under different land-use types in the study area

The coal resources are primarily located in the southwest and east mining areas of the study area. Iron mines are mainly located in the south and northwest area, followed by the central area, where hematite, magnetite, and titanium-magnet minerals of large scale are developed. The non-ferrous mineral resources (mainly lead–zinc ore, with a few copper and molybdenum ore) controlled by the metallogenic geological conditions are distributed in the southeast. The gold ores (49 deposits) are the mineral resources with the broadest distribution area and the largest mining scale in this region (around 1500 km²). Additionally, silver ore resources in the study area are less abundant, and most silver deposits are formed by the symbiotic or associated combination of Ag and other beneficial ore-forming elements (e.g., Pb and Zn).

Materials and methods

Soil sampling and analyses

A total of 101 samples were collected (Fig. 1). Of these, 91 were topsoil samples (0 to 20 cm) collected vertically with soil drills, comprising 67 residential farmland soils (RF), 17 mining farmland soils (MF), and 7 woodland–grassland soils (WG). The remaining 10 were mining or tailing/slag waste samples (MS) collected in the mining factory, smelting plant, and tailings pond. Each sample was taken with a clean sampler to reduce cross-contamination, stored in a polyethylene bag, and sent to the laboratory for analysis. After plant root residues, gravel, and other debris were removed, all samples were air-dried at 20 °C to constant weight, then crushed and passed through a 2-mm sieve.

The samples were digested in sterile, dry Teflon crucibles with a concentrated acid mixture (HNO3-HF-HClO4) and a quantified aqua regia solution (HCl-HNO3) on an electric heating plate in a fume cupboard. The plate was heated to 260 °C until the liquid had evaporated completely. Then, 5 ml of HNO3 was added to the samples, which were held at 130 °C in a drying oven to dissolve the residue. Once the crucibles had cooled, the samples were transferred to a 50-ml volumetric flask and diluted with ultrapure water for testing. The concentrations of Cu, Pb, Ni, and Cd were determined by Inductively Coupled Plasma Mass Spectrometry (ICP-MS) (Thermo X series II, Thermo Fisher Scientific, USA), and those of As and Hg were analyzed by Atomic Fluorescence Spectrometer (AFS, HJ-680–2013, China). The concentrations of Zn, Cr, and Mn were measured by inductively coupled plasma atomic emission spectrometry (ICP-AES) (iCAP6300, Thermo Scientific, Waltham, USA). Quality assurance/quality control (QA/QC) for HMs was performed using duplicate samples, blanks, and standard substances (GSS-24) from the Centre of National Standard Reference Material of China. The recoveries of substrate samples and the standard substances ranged from 88% to 106% and from 98% to 103%, respectively. The method detection limits (MDLs) of Cu, Pb, Zn, Cr, Ni, Cd, As, and Hg were 0.6, 0.5, 0.03, 0.2, 0.6, 0.05, 0.2, and 0.005 mg/kg, respectively. The relative standard deviation of the duplicate samples was less than 5%. pH was measured by the glass electrode method using a pH meter (Thermo, USA).

Evaluation method of pollution levels

Geochemical methods including geo‑accumulation index (Igeo) and enrichment factor (EF) can provide relative levels of soil heavy metal contamination.

Geo‑accumulation index (Igeo)

Igeo is an intuitive geochemical standard for quantitative evaluation of heavy metal contamination levels which takes into account the influence of background values caused by natural geological processes as well as exogenous HMs produced by human activities:

$$I_{{{\text{geo}}}} = \log_{2} [C_{i} /(K \cdot B_{i} )]$$
(1)

where Ci represents the measured concentration of heavy metal i (mg/kg), Bi denotes the geochemical background value of corresponding element i (mg/kg), and K is the correction coefficient, generally 1.5 (Chen et al., 2019; Long et al., 2021).
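As a minimal sketch of Eq. (1), the function below evaluates Igeo for one sample. The function name and the example concentrations are illustrative assumptions and are not values from this study; K defaults to the 1.5 stated above.

```python
import math


def igeo(c_i, b_i, k=1.5):
    """Geo-accumulation index, Eq. (1): log2(C_i / (K * B_i))."""
    if c_i <= 0 or b_i <= 0:
        raise ValueError("concentrations must be positive")
    return math.log2(c_i / (k * b_i))


# Hypothetical example: 60 mg/kg measured against a 20 mg/kg background.
print(igeo(60.0, 20.0))  # log2(60 / 30) = 1.0
```

A sample whose concentration equals K times the background gives Igeo = 0, so positive values indicate concentrations above that corrected background.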

Enrichment factor (EF)

EF is usually used to evaluate the enrichment degree of metal elements in topsoil and can also help to distinguish an anthropogenic source from natural sources:

$${\text{EF}} = \frac{{(C_{i} /C_{{{\text{ref}}}} )_{{{\text{sample}}}} }}{{(C_{i} /C_{{{\text{ref}}}} )_{{{\text{background}}}} }}$$
(2)

where Ci represents the concentration of element i, and Cref is the concentration of the reference element used for normalization (mg/kg). Mn is adopted as the reference element because it is more stable in the earth's crust than other metals (Cheng et al., 2018; Wu et al., 2010). The classifications of Igeo and EF are shown in Table S1.
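Eq. (2) can be sketched in the same way. The argument names and the example numbers below are hypothetical; in this study the reference element would be Mn.

```python
def enrichment_factor(c_sample, ref_sample, c_background, ref_background):
    """Enrichment factor, Eq. (2): sample ratio divided by background ratio."""
    if min(c_sample, ref_sample, c_background, ref_background) <= 0:
        raise ValueError("concentrations must be positive")
    return (c_sample / ref_sample) / (c_background / ref_background)


# Hypothetical example: the element is doubled in the sample while the
# reference element is unchanged, so EF is about 2.
ef = enrichment_factor(50.0, 500.0, 25.0, 500.0)
print(round(ef, 6))
```

Normalizing by a conservative reference element means that an EF close to 1 points to crustal proportions, while larger values point to additional inputs.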

Self‑organizing map (SOM)

The self-organizing map (SOM) is an unsupervised competitive learning neural network method. Similar sample points in high-dimensional space could be mapped to the neighboring neurons in the two-dimensional output layer (2D) after the systematic analysis of nonlinear complex data (Haselbeck et al., 2019; Mao et al., 2021). Additionally, the global and local relationships between variables are both displayed in the unified distance matrices (U-matrix) and median distance matrices (D-matrix) of SOM (Lee et al., 2019; Mao et al., 2021), as described by Jiang et al. (2022a) and Kim et al. (2020). The calculations and visualization of SOM were executed using MATLAB software, and a more detailed description is shown in Text S1.

Positive matrix factorization model (PMF)

In this study, the PMF receptor model 5.0 developed by the USEPA (2014) was used to identify the sources of HMs in topsoil. The PMF model decomposed the original matrix through several calculations and obtained the optimal matrix G and F to minimize the objective function Q:

$$X_{ij} = \mathop \sum \limits_{k = 1}^{p} G_{ik} F_{kj} + E_{ij}$$
(3)
$$Q = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{m} \left( {\frac{{E_{ij} }}{{U_{ij} }}} \right)^{2}$$
(4)

where i, j, and k index the samples, elements, and pollution sources, respectively, and n, m, and p are the corresponding totals. Xij is the concentration of the jth chemical component of the ith sample (mg/kg), Fkj is the contribution concentration of the jth chemical component of the source k (mg/kg), Gik is the contribution of the source k to the ith sample, and Eij is the corresponding element of the residual matrix.

$$U_{ij} = \begin{cases} {\text{MDL}}, & C \le {\text{MDL}} \\ \sqrt {\left( {{\text{Error fraction}} \times C} \right)^{2} + \left( {0.5 \times {\text{MDL}}} \right)^{2} } , & C > {\text{MDL}} \end{cases}$$
(5)

where Uij is the uncertainty of the jth chemical component of the ith sample, MDL is the method detection limit, the error fraction is the percentage of measurement uncertainty, and C is the measured concentration of the element.
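The sketch below evaluates the uncertainty rule of Eq. (5) exactly as written above and the objective function Q of Eq. (4) for given matrices. It is not a PMF solver and does not reproduce the USEPA PMF 5.0 software; the function names and the small matrices are illustrative assumptions.

```python
import math


def uncertainty(c, mdl, error_fraction):
    """Uncertainty of one measurement, following Eq. (5) as written."""
    if c <= mdl:
        return mdl
    return math.sqrt((error_fraction * c) ** 2 + (0.5 * mdl) ** 2)


def q_value(x, g, f, u):
    """Objective function Q, Eq. (4), with residuals E = X - G F from Eq. (3)."""
    q = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            modelled = sum(g[i][k] * f[k][j] for k in range(len(f)))
            q += ((x_ij - modelled) / u[i][j]) ** 2
    return q


# Hypothetical toy case: one sample, two elements, two sources.
x = [[3.0, 5.0]]                 # observed concentrations
g = [[1.0, 2.0]]                 # source contributions to the sample
f = [[1.0, 1.0], [1.0, 1.5]]     # source profiles
u = [[1.0, 0.5]]                 # uncertainties
print(q_value(x, g, f, u))       # residuals are 0 and 1, so Q = (1 / 0.5)^2 = 4.0
```

In an actual PMF run, G and F are constrained to be non-negative and adjusted iteratively until Q is minimized; here they are simply supplied.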

Potential ecological‑health risks assessment

Potential ecological risks assessment

The potential ecological risk index was developed to evaluate the potential impact of contaminations on the ecological environment according to the toxicity of HMs and the response of the environment:

$${\text{RI}} = \mathop \sum \limits_{i = 1}^{n} E_{r}^{i} = \mathop \sum \limits_{i = 1}^{n} T_{r}^{i} \times (C_{m}^{i} /C_{R}^{i} )$$
(6)

where RI is the comprehensive potential ecological risk index for all HMs; \(E_{r}^{i}\) represents the potential ecological risk index of a single heavy metal i; \(T_{r}^{i}\) stands for the biological toxic response coefficient of heavy metal i, with values, in increasing order, of 1 (Zn), 1 (Mn), 2 (Cr), 5 (Cu), 5 (Ni), 5 (Pb), 10 (As), 30 (Cd), and 40 (Hg) (Agyeman et al., 2022b; Hakanson, 1980; Jiang et al., 2020; Li et al., 2021; Zuo et al., 2022); \(C_{m}^{i}\) and \(C_{R}^{i}\) are the measured concentration and the reference value of heavy metal i, respectively (mg/kg). The assessment standards for \(E_{r}^{i}\) and RI are presented in Table S2.
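A short sketch of Eq. (6) follows, using the toxic response coefficients listed above. The measured and reference concentrations in the example are hypothetical and are not taken from Table 1.

```python
# Biological toxic response coefficients T_r as listed in the text.
TOXIC_RESPONSE = {
    "Zn": 1, "Mn": 1, "Cr": 2, "Cu": 5, "Ni": 5,
    "Pb": 5, "As": 10, "Cd": 30, "Hg": 40,
}


def single_risk(metal, measured, reference):
    """Single-metal potential ecological risk: E_r = T_r * (C_m / C_R)."""
    return TOXIC_RESPONSE[metal] * measured / reference


def risk_index(measured, reference):
    """Comprehensive index RI, Eq. (6): the sum of E_r over all metals."""
    return sum(single_risk(m, measured[m], reference[m]) for m in measured)


# Hypothetical sample in which Hg and Cd are both twice their reference values.
measured = {"Hg": 0.10, "Cd": 0.30}
reference = {"Hg": 0.05, "Cd": 0.15}
print(round(risk_index(measured, reference), 6))  # 40*2 + 30*2 = 140
```

Because the Hg and Cd coefficients are large, modest exceedances of these two metals can dominate RI even when the other metals are near background.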

Human health risks assessment (HRA)

In general, direct oral ingestion, dermal contact, and inhalation absorption are three major pathways for human long-term exposure to HMs in topsoil (Chen et al., 2015; Wang et al., 2019). The HRA model proposed by the United States Environmental Protection Agency (USEPA, 2011) was employed to quantitatively evaluate the potential health risk of hazardous substances to the human body.

Exposure doses through oral ingestion, dermal contact, and inhalation can be calculated as follows:

$${\text{ADI}}_{{{\text{ing}} - i}} = \, C_{i} \times \frac{{{\text{Ing}}R \times {\text{EF}} \times {\text{ED}}}}{{{\text{BW}} \times {\text{AT}}}} \times {\text{CF}}$$
(7)
$${\text{ADI}}_{{{\text{dermal - }}i}} = \, C_{i} \times \frac{{{\text{SA}} \times {\text{AF}} \times {\text{ABS}} \times {\text{EF}} \times {\text{ED}}}}{{{\text{BW}} \times {\text{AT}}}} \times {\text{CF}}$$
(8)
$${\text{ADI}}_{{{\text{inh}} - i}} = \, C_{i} \times \frac{{{\text{Inh}}R \times {\text{EF}} \times {\text{ED}}}}{{{\text{PEF}} \times {\text{BW}} \times {\text{AT}}}}$$
(9)

where ADIing, ADIdermal, and ADIinh represent the average daily intake from soil ingestion, dermal, and inhalation absorption, respectively (mg/kg·day). The significance and exact values of each parameter are shown in Table S3.
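Eqs. (7)–(9) can be sketched as three small functions. The parameter values in the example are placeholders chosen only for illustration and are not the values of Table S3; the default conversion factor of 1e-6 kg/mg is likewise an assumption. The exposure frequency is named `exposure_freq` here to avoid confusion with the enrichment factor.

```python
def adi_ingestion(c, ing_r, exposure_freq, ed, bw, at, cf=1e-6):
    """Average daily intake by ingestion, Eq. (7)."""
    return c * (ing_r * exposure_freq * ed) / (bw * at) * cf


def adi_dermal(c, sa, af, abs_factor, exposure_freq, ed, bw, at, cf=1e-6):
    """Average daily intake by dermal contact, Eq. (8)."""
    return c * (sa * af * abs_factor * exposure_freq * ed) / (bw * at) * cf


def adi_inhalation(c, inh_r, exposure_freq, ed, pef, bw, at):
    """Average daily intake by inhalation, Eq. (9)."""
    return c * (inh_r * exposure_freq * ed) / (pef * bw * at)


# Placeholder adult scenario (assumed values, not from Table S3):
# 100 mg/kg in soil, 100 mg/day ingested, 350 days/year for 24 years,
# 70 kg body weight, averaging time of 24 * 365 days.
adi = adi_ingestion(100.0, 100.0, 350.0, 24.0, 70.0, 24 * 365)
print(f"{adi:.3e}")  # about 1.37e-04 mg/(kg·day)
```

Each function is linear in the soil concentration, so doubling Ci doubles the corresponding intake.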

Non-carcinogenic risk assessment

The non-carcinogenic risk of a single element in metal-contaminated soil is usually characterized by the hazard quotient (HQ), whereas the hazard index (HI) represents the cumulative non-carcinogenic risk summed over all indicators and exposure pathways:

$${\text{HI}} = \sum {\text{HQ}}_{i} = \sum \frac{{{\text{ADI}}_{i} }}{{{\text{RfD}}_{i} }}$$
(10)

where the RfDi is the corresponding reference toxicity threshold dose of element i [mg/(kg·day)]. For non-carcinogenic risk, if HI > 1, there will be obvious potential adverse health effects for the exposed individual. Otherwise, it is considered to be an acceptable level (Gu et al., 2017; Zhang et al., 2020).
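A minimal sketch of Eq. (10) is given below. The dictionary keys and all ADI and RfD values are hypothetical; the actual RfD values are those of Table S4.

```python
def hazard_index(adi, rfd):
    """Hazard index, Eq. (10): the sum of HQ = ADI / RfD over all entries."""
    return sum(adi[key] / rfd[key] for key in adi)


def is_acceptable(hi):
    """HI > 1 indicates potential adverse effects; otherwise acceptable."""
    return hi <= 1.0


# Hypothetical intakes and reference doses for one metal, mg/(kg·day).
adi = {"ingestion": 2e-4, "dermal": 1e-5}
rfd = {"ingestion": 1e-3, "dermal": 1e-4}
hi = hazard_index(adi, rfd)
print(round(hi, 6), is_acceptable(hi))  # HQ values of 0.2 and 0.1 sum to 0.3
```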

Carcinogenic risk assessment

Generally, carcinogenic risk (CR) refers to the probability of developing any type of cancer over an individual lifetime due to carcinogenic exposure in the environment. Similarly, total carcinogenic risks (TCR) are obtained by summing the individual cancer risks across all carcinogens and/or exposure pathways, as follows:

$${\text{CR}} = {\text{ADI}} \times {\text{SF}}$$
(11)
$${\text{TCR }} = \sum {\text{CR}}_{i} = \sum {\text{ADI}}_{i} \times {\text{ SF}}_{i}$$
(12)

where SFi represents the cancer slope factor of element i under different exposure pathways [(kg·day)/mg]. The values of SF and RfD taken from the literature are shown in Table S4. Generally, the tolerable level of carcinogenic risk ranges from 1 × 10⁻⁶ to 1 × 10⁻⁴, whereas a TCR below 1 × 10⁻⁶ is considered negligible. If the TCR exceeds 1 × 10⁻⁴, the carcinogenic risk is high and poses a potential threat to humans (Chen et al., 2015; Li et al., 2014).
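Eqs. (11)–(12) and the thresholds stated above can be sketched as follows. The intake and slope factor values are hypothetical, and the three category labels are shorthand for the ranges described in the text.

```python
def total_cancer_risk(adi, sf):
    """Total carcinogenic risk, Eq. (12): the sum of CR = ADI * SF."""
    return sum(adi[key] * sf[key] for key in adi)


def classify_tcr(tcr):
    """Map a TCR value onto the ranges described in the text."""
    if tcr < 1e-6:
        return "negligible"
    if tcr <= 1e-4:
        return "tolerable"
    return "high"


# Hypothetical intakes, mg/(kg·day), and slope factors, (kg·day)/mg.
adi = {"ingestion": 2e-5, "inhalation": 1e-8}
sf = {"ingestion": 1.5, "inhalation": 15.0}
tcr = total_cancer_risk(adi, sf)
print(f"{tcr:.2e}", classify_tcr(tcr))  # about 3e-05, inside the tolerable range
```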

Source-oriented HRA

The contribution of different sources to health risk was quantified by combining the PMF with HRA model. The concentration Ci of different HMs in Eqs. (7)–(9) was replaced by \(C_{ki}^{n}\), as follows:

$$C_{ki}^{n} = C_{i} \times C_{k}^{n}$$
(13)

where \(C_{k}^{n}\) represents the calculated contribution rate of heavy metal i at sample k originating from identified source n. The source-oriented HRA was then repeated using Eqs. (10)–(12).
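The substitution in Eq. (13) amounts to splitting each measured concentration by the PMF contribution rates before the exposure equations are applied. A minimal sketch, with hypothetical source names and rates:

```python
def source_concentrations(c_i, contribution_rates):
    """Eq. (13): concentration attributed to each source, C_i * rate."""
    return {source: c_i * rate for source, rate in contribution_rates.items()}


# Hypothetical contribution rates of four sources to one metal in one sample.
rates = {"natural": 0.10, "mixed": 0.60, "agricultural": 0.20, "mining": 0.10}
by_source = source_concentrations(0.5, rates)
print(by_source)
```

When the rates sum to 1, the source-specific concentrations sum back to the measured value, so the source-oriented risks add up to the total risk for linear models such as Eqs. (7)–(12).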

Results and discussion

Characteristics and distributions of HMs

The geochemical background values (GBVs) of soil environmental quality and the risk control standard values (RSVs) for soil contamination of agricultural land (GB 15618-2018) were employed for contrastive analysis of the content and distribution characteristics of HMs. The HM contents in the topsoil samples varied greatly and depended on land-use type (Table 1). Although the contents of HMs such as Cu, Cr, Pb, and Cd in a few mining waste residues were above the RSVs of the corresponding soil quality standards, the HM contents in most soil samples were lower than the RSVs (Table 1). However, with the exception of Cr and As (74.26% and 64.36% of samples, respectively), the contents of Cu, Pb, Zn, Ni, Cd, and Hg in most samples were greater than the GBVs (Table 1). Specifically, the exceedance rates of Pb (86.14%) and Cu (83.17%) were the highest, followed by Zn, Ni, Cd, and Hg (> 50%), implying that Cu and Pb may be the main contaminants.

Table 1 Heavy metal content statistics of soil samples in the study area (mg/kg)a

Understanding the spatial distribution of HMs helps to uncover their sources (Huang et al., 2021; Jin et al., 2019). The spatial distribution of the investigated HM contents exhibited regional differences (Fig. S1), which were likely affected by both natural and human causes. Samples with Cu contents exceeding the RSVs were mainly distributed in the east and southeast of the study area (Fig. S1a). Samples with high Cu contents (1.5- to 2-fold the GBVs) were found in the south and southwest of the study area. In the same areas, there were samples with high Cr and Ni contents (exceeding the RSVs or twofold the GBVs) (Fig. S1d and S1e). Most samples also had high levels of Pb and Hg (exceeding twofold the GBVs) (Fig. S1b and S1h), and high Zn and Cd concentrations were observed in a few samples (Fig. S1c and S1f). By comparison, apart from a few mining residue and farmland samples, As was much less abundant (Fig. S1d and S1g). In contrast, the samples from the northern mountain valleys of the study area had relatively low HM contents (Fig. S1).

On the whole, the distributions of Cu, Pb, Zn, Ni, Cd, and Hg displayed a similar pattern, with higher values in the piedmont sloping plain and valley in the south of the study area and lower values in the mountain valleys in the northwest/northeast. Because the south area had intensive human activities, this result implied that their sources were likely associated with human activities. Relative to the topsoils, the maximum contents of HMs in the mining waste or tailing/slag waste samples (MS) were higher than the GBVs and even several times greater than the RSVs (Table 1). The wastes in the mine tailings pond and concentrators could pose a threat to the surrounding pristine topsoils, likely resulting in potential health risks to residents in the mining area.

Pollution assessment of HMs

The contamination of topsoil samples ranged from the uncontaminated level to the extremely contaminated level, as determined by the classification of Igeo (Fig. 2a, Table S1). Except for As and Ni, the other HMs mostly reached the heavily contaminated level or higher. In the cases of Zn, Cr, and Cd, the heavily contaminated samples appeared among the MS samples in the mining area. Generally, the contamination of HMs in topsoil followed the order Hg > Cu > Pb > Cd > Ni > Zn > Cr > As (Fig. S1). Differences in the intensity of anthropogenic activities such as industrial production, mining, and agriculture could have contributed to the soil contamination (Chen et al., 2015, 2019; Marrugo-Negrete et al., 2017; Wang et al., 2019). Woodland–grassland (WG) samples appeared to be the least contaminated by HMs, i.e., their Igeo values were lower than those of the other land-use types. For the residential farmland (RF) samples, the Igeo values of Zn, Cd, Hg, Pb, and As were the highest, while the highest Igeo values for Cu, Ni, and Cr were observed in mine wastes (MS).

Fig. 2 Box diagrams of (a) Igeo and (b) EF in different sample types

The EF of HMs ranged from the minimal enrichment level to extremely high enrichment (Table S1, Fig. 2b). Consistently, a majority of samples had significant enrichment of Cu, Zn, Cd, and Hg, whereas only a few samples had significant enrichment of Ni, Pb, As, and Cr. As with Igeo, Hg was the most enriched element, with samples at the moderate enrichment level or above accounting for 25.74% of the total (Fig. S3). In addition, samples at the same level accounted for 14.85% (Pb) and 9.9% (Cu) of the total, so these two metals were also considered prime contaminants in topsoils. By comparison, the other HMs were mostly at the level of minimal enrichment or even lower. Among the land-use types, excluding the MS samples, the EF values of Zn, Cd, Hg, Pb, and As in RF samples were higher than those in the other two land-use types, in line with the Igeo result. In contrast, the EF values of Cu, Ni, and Cr were the highest in MF samples located in mining farmland areas (Fig. 2b), and high EF values of Hg were also observed in both WG and RF samples. These results indicate that the influence of anthropogenic activities on the accumulation of HMs such as Hg, Cu, and Pb in the study area should be given prime attention.

Source investigation of HMs

Self-organizing map (SOM) for source identification

A total of 54 (6 × 9) neurons and 4 clusters were selected according to heuristic rules and the Davies–Bouldin index (Fig. 3a). The samples in the same cluster likely had similar distributions of HMs; thus, these samples might have similar underlying sources. Each U-matrix map represented an index value obtained after dimension reduction, as marked by shades of blue to yellow (Lee et al., 2019; Zhu et al., 2020). The neurons with high values were shown in yellow, while the neurons with low values were indicated in blue (Fig. 3b). Afterward, informative and qualitative relations among the parameters, including HMs and sample types, were shown intuitively by comparing SOM graphs according to the color gradient.

Fig. 3 (a) The SOM matrix map of topsoil samples: the clustering pattern in the SOM, where different colors represent different clusters and the number in a hexagon denotes the sample number. (b) The SOM visualization of corresponding variables. The blue and yellow colors correspond to low and high values, respectively, which can reveal the correlation between variables

The color change gradients of Cu, Cr, Ni, and Mn were similar, indicating that these HMs have positive inter-correlations (Fig. 3b). Similarly, Hg, Cd, Pb, and Zn had nearly consistent color change gradients, hinting that the possible factors affecting their contents and distributions might be close. In contrast, As had a unique color change gradient, indicating that the factors affecting its distribution and source were different from those of other HMs. The D-matrix of SOM was then divided into three clusters, each of which likely represented different sources of HMs.

Cluster 1, characterized by the As (other metals did not fall within this area), was located on the left side of the D-matrix map in the SOM results. It was composed of approximately 75% topsoil samples in the study area. However, except for a few samples, the content of As in the topsoil samples was low (Fig. S1g) as also evidenced by the low Igeo and EF values (Fig. 2). These topsoil samples were widely distributed in the study area (Fig. S4), suggesting that the source of As was possibly affected by geogenic sources, e.g., the weathering of rocks or soil parent materials.

Cluster 2, characterized by Hg, Cd, Pb, and Zn, was located at the upper-right corner of the D-matrix. A total of 8 RF samples and 3 MF samples were scattered in the upper part of the alluvial plain in the southwestern mountain of the study area, and some of them were close to the MS samples (Fig. S4). Based on the results of Igeo and EF (Fig. 2), the distribution and sources of these HMs (cluster 2) were likely strongly affected by anthropogenic activities, such as mining and agricultural activities.

Cluster 3, characterized by Cu, Cr, Ni, and Mn, was located in the lower-right part of the D-matrix and included 7 MS, 5 MF, and 2 RF samples. The samples were mainly located near the mining region in the southeastern mountainous part of the study area (Fig. S4), suggesting that these HMs in topsoil were potentially affected by mining activities.

Pearson correlation and PMF model

A combined analysis using Pearson linear correlation and the PMF 5.0 model was employed to investigate the potential sources of HMs (Fig. 4). Pearson correlation analysis was first used to determine the pairwise correlations of HMs in topsoil, which can reveal a potential common source or shared geochemical characteristics (Cheng et al., 2020; Huang et al., 2021; Zuo et al., 2022). Then, the factor contributions were compared with the correlation coefficients of the variables to verify the results of the PMF model (Huang et al., 2021; Mao et al., 2023; Zuo et al., 2022). Nine variables and 101 samples were selected to investigate the sources of HMs and their contributions with the PMF model. Considering the results of SOM, 3–6 factors were examined with 20 base runs in random seed mode to find the “optimal solution” (Cheng et al., 2020; Zhang et al., 2018). Four factors (Fig. 4a) were determined based on the minimum and stable objective function Q, which was used to constrain the residual matrix E (Cheng et al., 2020; Salim et al., 2019; Zanotti et al., 2019). The signal-to-noise ratios (S/N) of all HMs were larger than 4, defined as “strong,” and most coefficients of determination (R²) between the observations and predictions were larger than 0.6. These results indicated that PMF was suitable for identifying the sources and that the results were reliable (Jiang et al., 2020; Mao et al., 2023; Norris et al., 2014; Salim et al., 2019).
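For reference, the pairwise Pearson coefficient used in the first step can be computed as in the sketch below. The function name and the two short concentration series are hypothetical and serve only to show the calculation; they are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical Cd and Zn contents (mg/kg) that rise together across samples.
cd = [0.1, 0.2, 0.3, 0.4]
zn = [50.0, 70.0, 90.0, 110.0]
print(round(pearson_r(cd, zn), 6))  # perfectly linear, so r is 1.0
```

A coefficient near +1 for a pair of metals is consistent with a shared source or behavior, but it does not by itself identify that source, which is why the correlations are compared with the PMF factor profiles.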

Fig. 4 Source apportionment of HMs. (a) The contribution of each factor in the results of the PMF model. (b) Factor profiles of HMs in different types of topsoil derived from the PMF model. (c) The Pearson correlations between HMs and the relationships with different sources using the PMF model. The correlation coefficient is represented by the color gradient in pairwise comparisons of HMs. Factor contribution in the PMF model is related to each HM

Factor 1, accounting for 24.9% of the contribution rate (Fig. 4a), was mainly explained by As (75.91%, Fig. 4b), followed by Cu, Pb, Zn, Cr, and Mn (below 30%) (Fig. 4c). As illustrated by the Igeo, EF, and SOM results, the source of As could be recognized as geogenic. As and Cr are widely distributed in the earth’s crust (Facchinelli et al., 2001; Šajn et al., 2011) and might be associated with natural sources (Jin et al., 2019; Zhang et al., 2018). Although the sources of Cu, Pb, Zn, and Cr were related to intense anthropogenic activities in many environments (Cai et al., 2019; Marrugo-Negrete et al., 2017), the metallogenic geological background conditions of the non-ferrous mineral resources (mainly lead–zinc ore, with a few copper ore) in the study area could contribute to their enrichment. Thus, Factor 1 could represent the effect of some natural sources (e.g., the weathering of rocks or soil parent materials). However, we cannot exclude the possibility that a few topsoil samples were influenced by both anthropogenic and geogenic inputs.

Factor 2, with 6.6% of the total contribution rate, was mainly associated with Hg (79.35%, Fig. 4b), followed by Pb (Fig. 4c). Previous studies have shown that Hg in topsoils can come from diverse sources, such as the mining and smelting of gold or mercury mines (Árvay et al., 2017; Csavina et al., 2012), copper ore smelting (Tomiyasu et al., 2017; Xiao et al., 2017), tailpipe or fossil fuel burning emissions (Jiang et al., 2006; Pacyna et al., 2010), and mercury pesticide or sludge fertilizer (Dong et al., 2017; Huang et al., 2021). The mine wastes (MS) had the lowest Igeo and EF values for Hg, indicating that mining activities were not the main source of Hg in topsoil. Because Hg was significantly correlated with Pb (Fig. 4c), the sources of Hg and Pb might be similar. Pb is a major marker of traffic sources: Pb particles emitted in vehicle exhaust settle with dust and contaminate the soil (Duzgoren-Aydin et al., 2004; Fei et al., 2022; Kadi, 2009; Sun et al., 2019). Moreover, the spatial distribution showed that samples with high Hg content were mainly distributed in areas with high population density and frequent anthropogenic activities (Fig. S1h). Therefore, Factor 2 might be interpreted as mixed sources caused by multiple anthropogenic factors.

Factor 3, accounting for a 22.6% contribution rate, was mainly characterized by Cd and Zn (~ 50%, Fig. 4b), followed by Pb, Mn, Cu, and Cr (Fig. 4c). Cd exists in phosphate fertilizer (Hu et al., 2018; Nan et al., 2002; Nicholson et al., 2003) and is commonly regarded as a hallmark of agricultural practice (Baltas et al., 2020; Sun et al., 2013). Zn and Cu are found in almost all agricultural inputs, including fertilizers, pesticides, fungicides, and manures (Li et al., 2006; Marrugo-Negrete et al., 2017). As such, agricultural activities would be one of the main sources of Cd, Zn, and Cu (Hu et al., 2018; Lu et al., 2012; Nogueirol et al., 2010). As shown by the Igeo and EF results (Fig. 2), some farmland soils, e.g., RF samples, were enriched in Cd and Zn. Meanwhile, Cd and Zn in topsoil showed similar spatial distributions in the study area (Fig. S1c and S1f), and their contents were significantly correlated (Fig. 4c). Factor 3 might therefore be closely related to agricultural activities.

Factor 4 with a contribution rate of 45.9% (Fig. 4a) was primarily characterized by Ni and Cr (~ 60%, Fig. 4b), followed by Cu and Mn (Fig. 4c). Generally, the earth's crust parent materials and pedogenesis were considered to be the main sources of Cr, Ni, and Mn (Liu et al., 2020; Micó et al., 2006; Wang et al., 2019). However, several studies have indicated that Cr and Ni in topsoils were derived from industrial activities, including perennial mining, ore smelting, coal consumption, and steel production (Fei et al., 2022; Li et al., 2014; Yang et al., 2018). Meanwhile, iron ore, chromite, gold, and other mining activities were also potential sources of Cr and Ni in soils (Luo et al., 2010; Xue et al., 2000). The Cu, which contributed 45.86% in Factor 4, may be related to the smelting and processing of metals (Liu et al., 2020). Moreover, Cr and Ni were significantly correlated (p < 0.05), and Cu was also correlated with Ni. Furthermore, Cr, Ni, Cu, and Mn were negatively correlated with As that was initially determined as a geogenic product (Fig. 4c). Because the MF and MS samples located in the mining areas had high Igeo and EF values of Cr, Ni, and Cu (Fig. 2; Fig. S1), this factor would be assigned to mining activities.

Potential ecological risks assessment

The above analyses suggested that HM contamination potentially posed threats to local human health. The Er values of Hg and Cd in the RF and MF samples were higher than those of the other HMs (Fig. 5a). In the RF samples, approximately 61.19% of Er values for Hg and 32.84% for Cd were at the moderate-risk level or higher, whereas only 2.99% for Pb and 1.49% for Cu reached the same level. In the MF samples, 41.18% of Er values for Hg and 23.53% for Cd were at the moderate-risk level or higher. The very-high-risk level (Er > 320) was found in both MF and RF samples. In the WG samples, up to 85.71% of Er values for Hg were at the moderate-risk level or higher. To sum up, the potential ecological risk in the study area was mainly caused by Hg, followed by Cd.

Fig. 5 Statistical analysis of Er and RI for HMs in topsoil samples

For the total ecological risk, 62.69% of RF, 52.94% of MF, 71.43% of WG, and 50% of MS samples presented a moderate-risk level or higher according to the RI classification (Fig. 5b, Table S2). In particular, 7 RF samples and 3 MF samples were at the quite strong risk level, and 2 RF samples and 1 MS sample showed extremely strong risk. Hence, the potential ecological risks of HMs decreased in the order of RF > MF > WG > MS. Owing to their high ecotoxicity, HMs can still pose high ecological risks even at a low contamination level (Chen et al., 2019; Huang et al., 2021; Li et al., 2021). Consistent with the Er values, the potential ecological risks were mainly caused by Hg and Cd.
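The Er and RI values discussed above follow the general form of Hakanson's potential ecological risk index, in which each metal's risk factor is its toxic-response factor multiplied by the ratio of measured to background concentration, and RI is the sum of the individual factors. The sketch below illustrates that calculation only. The toxic-response factors and the Er class boundaries are values commonly used with this index and are assumptions here, as are the function names and the sample concentrations; the classification actually applied in this study is given in Table S2.

```python
# Toxic-response factors commonly paired with Hakanson's index.
# Assumed for illustration; not taken from this study.
TOXIC_RESPONSE = {
    "Hg": 40, "Cd": 30, "As": 10, "Pb": 5, "Cu": 5,
    "Ni": 5, "Cr": 2, "Zn": 1, "Mn": 1,
}

# Upper bounds of the Er classes (assumed conventional thresholds).
# Values of 320 or more fall into the "very high" class.
ER_CLASSES = [(40, "low"), (80, "moderate"), (160, "considerable"), (320, "high")]


def classify_er(er):
    """Return the risk class label for a single-metal risk factor Er."""
    for upper, label in ER_CLASSES:
        if er < upper:
            return label
    return "very high"


def ecological_risk(concentrations, backgrounds):
    """Compute Er per metal and the summed index RI for one sample.

    Both arguments map metal symbol -> concentration in the same unit.
    """
    er = {
        metal: TOXIC_RESPONSE[metal] * conc / backgrounds[metal]
        for metal, conc in concentrations.items()
    }
    return er, sum(er.values())


# Hypothetical sample and background values (mg/kg), for illustration only.
sample = {"Hg": 0.30, "Cd": 0.40, "Pb": 30.0}
background = {"Hg": 0.10, "Cd": 0.20, "Pb": 30.0}
er, ri = ecological_risk(sample, background)
labels = {metal: classify_er(value) for metal, value in er.items()}
```

Because Hg and Cd carry the largest toxic-response factors in this scheme, a modest enrichment of either metal can dominate RI, which is consistent with the pattern reported above.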

Probabilistic health risks assessment

Except for two Pb samples for children, the maximum non-carcinogenic hazard quotients (HQ) of the HMs (Cu, Pb, Zn, Ni, Hg, and Mn) for both children and adults in each land-use type were within the acceptable risk level (HQ < 1) (Fig. S5). Additionally, the non-carcinogenic hazard index (HI) for children exceeded the acceptable threshold (HI > 1) in only 3 topsoil samples (Fig. S5a), and all samples posed negligible health risks for adults (Fig. S5b). Nonetheless, the HI values for children, ranging from 0.364 to 0.488 (Table S5), were higher than those for adults and close to the unacceptable risk level, in line with previous studies (Huang et al., 2021; Long et al., 2021). As such, the non-carcinogenic risk to children from Pb deserves close attention, particularly in areas with frequent anthropogenic activities.
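For readers unfamiliar with the HQ and HI metrics, the following minimal sketch shows how they are typically derived: an average daily dose is divided by a reference dose to give HQ, and HI is the sum of HQ over metals, with 1 as the threshold. It covers the soil-ingestion pathway only and is deterministic rather than probabilistic. The function names, the exposure parameters, and the reference dose are illustrative placeholders, not the values used in this study.

```python
def ingestion_dose(conc, ing_rate, exp_freq, exp_dur, body_weight, avg_time):
    """Average daily dose via soil ingestion, in mg/kg/day.

    conc: metal concentration in soil (mg/kg)
    ing_rate: soil ingestion rate (mg/day)
    exp_freq: exposure frequency (days/year)
    exp_dur: exposure duration (years)
    body_weight: body weight (kg)
    avg_time: averaging time (days)
    The factor 1e-6 converts mg of soil to kg of soil.
    """
    return conc * ing_rate * exp_freq * exp_dur / (body_weight * avg_time) * 1e-6


def hazard_index(doses, reference_doses):
    """Return (HQ per metal, HI), where HQ = dose / reference dose."""
    hq = {metal: doses[metal] / reference_doses[metal] for metal in doses}
    return hq, sum(hq.values())


def exceeds_threshold(value, threshold=1.0):
    """True when an HQ or HI value is above the acceptable level."""
    return value > threshold


# Placeholder child exposure scenario and reference dose, for illustration.
child_dose = ingestion_dose(
    conc=100.0, ing_rate=200.0, exp_freq=350.0,
    exp_dur=6.0, body_weight=15.0, avg_time=6.0 * 365.0,
)
hq, hi = hazard_index({"Pb": child_dose}, {"Pb": 3.5e-3})
```

Children usually show higher HQ and HI than adults under this formulation because a higher assumed ingestion rate is divided by a lower body weight.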

The potential carcinogenic hazards to both children and adults caused by Cd, As, and Cr were at high levels of risk (CR > 10⁻⁶) (Fig. 6). The overall CR followed the order: Cr > As > Cd. Cr had the highest potential carcinogenic health risk for both children and adults, especially in MF samples (Fig. 6). Cr(VI), classified as a class A carcinogen (Park et al., 2004; WHO, 2017), is a known carcinogen and toxicant and has been identified as a cause of gastric, liver, lung, oral, kidney, and urinary cancers (Tseng et al., 2018; Huang et al., 2021; Zhang et al., 2021). The carcinogenic risk caused by As was the second highest, and arsenic exposure can lead to chronic arsenic poisoning (WHO, 2017, 2020). The risk caused by Cd was the lowest (tolerable or close to acceptable, CR < 10⁻⁶), except in some RF samples. Although contamination by Cr, As, and Cd was insignificant (Fig. 2), these HMs still pose significant carcinogenic hazards to both children and adults. Pb and Ni are also classified as carcinogenic by the IARC (2023) and WHO, but their carcinogenic risks were negligible (CR < 10⁻⁶).
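The CR values above can be read against a simple banding scheme. The sketch below assumes the conventional formulation in which CR is the average daily dose multiplied by a cancer slope factor, with 10⁻⁶ as the negligible level (as stated in the text) and 10⁻⁴ as the unacceptable level. The 10⁻⁴ bound, the function names, and the numeric inputs are assumptions made for illustration and are not drawn from this study.

```python
def carcinogenic_risk(dose, slope_factor):
    """CR = average daily dose (mg/kg/day) x slope factor ((mg/kg/day)^-1)."""
    return dose * slope_factor


def classify_cr(cr, negligible=1e-6, unacceptable=1e-4):
    """Band a CR value; the 1e-4 upper bound is an assumed convention."""
    if cr < negligible:
        return "negligible"
    if cr <= unacceptable:
        return "tolerable"
    return "unacceptable"


# Hypothetical dose and slope factor, for illustration only.
cr_example = carcinogenic_risk(dose=2e-5, slope_factor=0.5)
band = classify_cr(cr_example)
```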

Fig. 6 Box diagram of carcinogenic risk value of HMs in topsoil samples: a and b represent the carcinogenic risk for children and adults of each land-use type

The health risks of HMs under different land-use types are shown in Table S5. The overall total carcinogenic risk (TCR) of the land-use types followed the order: MS > MF > RF > WG, suggesting that mining activities could elevate the health risks associated with the surrounding soils. The TCR of the HMs for children exceeded the negligible level (10⁻⁶), and 8 topsoil samples exceeded the unacceptable risk value. The TCR of the HMs for adults was lower than that for children. The results of the source-oriented HRA revealed that natural sources, mining activities, and agricultural activities collectively had significant impacts on human health risks (Fig. 7). In addition to the natural sources, agricultural activities contributed a larger proportion of the non-carcinogenic risk, whereas mining activities were more relevant to the carcinogenic risk. Consequently, in a comprehensive health risk assessment, especially for children, the potential carcinogenic health risks caused by exposure to Cr, As, and Cd should not be ignored. Targeted risk management and remediation strategies should therefore be implemented to avoid long-lasting impacts on the local ecology and residents (Zhang et al., 2021; Zhang et al., 2022).
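One common way to obtain source-oriented risks of the kind summarized above is to weight each metal's risk by the share of that metal attributed to each source in the PMF factor profile, and then sum over metals. The sketch below shows this weighting under that assumption; it is not necessarily the exact procedure used in this study, and the function name, risks, and source shares are hypothetical.

```python
def apportion_risk(metal_risk, source_share):
    """Split per-metal risks among sources.

    metal_risk: metal -> risk value (e.g. CR or HQ)
    source_share: metal -> {source -> fraction}, fractions summing to 1
    Returns source -> summed risk attributed to that source.
    """
    totals = {}
    for metal, risk in metal_risk.items():
        for source, share in source_share[metal].items():
            totals[source] = totals.get(source, 0.0) + risk * share
    return totals


# Hypothetical carcinogenic risks and PMF-style source shares.
risks = {"Cr": 4e-5, "As": 2e-5}
shares = {
    "Cr": {"mining": 0.75, "natural": 0.25},
    "As": {"natural": 0.75, "agricultural": 0.25},
}
by_source = apportion_risk(risks, shares)
```

Because the shares for each metal sum to 1, the risks apportioned to the sources add up to the total risk across metals, so the decomposition redistributes the risk without changing its magnitude.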

Fig. 7 Conceptual scheme of PMF results (illustrating the different sources of contamination) and the corresponding health risks (non-carcinogenic risk and carcinogenic risk)

Conclusion

The spatial distribution of the investigated HM contents in the topsoil exhibited regional differences. The topsoil samples in the study area were largely contaminated with Hg and Cu as a result of anthropogenic activities. A comprehensive analysis combining the SOM method, the PMF model, and Pearson correlation analysis suggested that natural, mixed, agricultural, and mining sources were the pivotal driving forces for the accumulation of HMs, with mining activities accounting for the highest proportion among the four sources owing to the long-term mining history. Because of their high ecotoxicity, HMs can pose high ecological risks even at low contamination levels. The potential ecological risk in the study area was mainly caused by Hg, followed by Cd. Compared with the limited non-carcinogenic risk associated with agricultural activities, the potential carcinogenic health risks caused by mining activities warrant monitoring.