Introduction

Coal is one of the most important energy sources in many parts of the world, particularly in developing countries like Bangladesh (Habib & Khan, 2021; Tozsin, 2014). Although Bangladesh has only minimal reserves of gas and coal, the effect of coal use on the ecosystem through water, air, and soil pollution has always been a significant concern (Bhuiyan et al., 2010; Bilgen, 2014). With the rapid development of the coal industry and urbanization, as well as coal production and combustion activities, soil pollution by toxic metal(loid)s (TMLs, hereinafter) has attracted global attention because of their toxicity, wide distribution, and long-term persistence (Adimalla, 2020; Liu et al., 2021; Parzentny & Róg, 2021). Coal mining may degrade the quality of the surrounding soils (Wang et al., 2020). TMLs enter the environment via geogenic sources, including weathering and pedogenic processes (Duan et al., 2020). Anthropogenic sources such as coal combustion activities (Singh et al., 2018), coal-fired power plants (Zhang et al., 2021), agrochemical practice (Huang et al., 2018), industrial emissions (Wang et al., 2021), and coal exhaust (Yang et al., 2019) have also released a significant fraction of TMLs into soils. In addition, TMLs can pose a health risk to humans via external exposure, inhalation, and the food web (Guan et al., 2018; Liu et al., 2019). TMLs may harm human health because some (e.g., As, Cd, Hg, and Cr) are carcinogenic and mutagenic (Qi et al., 2020). To lessen the health hazard of TMLs, it is crucial to identify and understand their abundance, distribution, origin, and health effects in the natural ecosystem (Yang et al., 2021). Therefore, assessing the source-specific health risks caused by TMLs is a prerequisite for developing targeted risk mitigation strategies (Sun et al., 2021).

To identify the sources of TMLs in soil, several methods (such as chemometric techniques and receptor models) have been used in recent years (Duan et al., 2020; Guan et al., 2018; Guo et al., 2021). The USEPA-recommended positive matrix factorization (PMF) model has been used for metal(loid) pollution source apportionment (Khan et al., 2023; USEPA, 2018) and reliably identifies metal pollution sources (Islam et al., 2020; Kumar et al., 2021). PMF differs from the multivariate curve resolution weighted alternating least-squares and chemical mass balance models. Unlike those models, it employs a weighted least-squares fit that uses the known error estimates of the metal(loid) measurements to generate composition and contribution profiles, and it ensures that all profile values are non-negative (Zhang et al., 2020). The model can therefore quickly identify TML sources and support the design of metal(loid) remediation solutions. The PMF model has been used to classify metal(loid) sources in several environmental systems (Lv, 2019; Xiao et al., 2020; Yang et al., 2019, 2021).

Recently, a new chemometric technique, the absolute principal component score-multiple linear regression (APCS-MLR) model, has been introduced to quantify pollution sources (Chen et al., 2016). APCS-MLR, like the PMF model, can categorize samples and identify pollution sources by reducing a large dataset into indicators with fewer dimensions (Lv, 2019; Proshad et al., 2021). Several earlier studies found that the APCS-MLR model could produce results similar to those of the PMF method when identifying TML sources and suggested that it might be a better option for pollution source identification studies (Duan et al., 2020; Proshad et al., 2021; Zhang et al., 2020). One recent study indicated that PMF, with its non-negative values, was more accurate for source identification than APCS-MLR (Guo et al., 2021), whereas another found APCS-MLR to be better than PMF (Su et al., 2021). Thus, applying both techniques yields more consistent source distribution and identification, with different ranking and contribution rates (Proshad et al., 2021). Early research focused on the contributions of pollution sources to soil metal(loid) abundance but overlooked source-related risks. Because TMLs have a wide range of spatial patterns, toxicity levels, and environmental hazards, various studies have shown that high TML concentrations in soils do not necessarily harm the ecosystem and living things (Guan et al., 2018; Huang et al., 2018). Source-oriented risk assessment improves targeted risk reduction since risk levels vary by source (Yang et al., 2019). Few studies have examined the source-oriented health risk of TMLs by combining source apportionment and health risk evaluation (Duan et al., 2020; Guo et al., 2021). The literature lacks a scientific understanding of the combined nexus between source apportionment and soil TML risk assessment, and studies on source-specific health risks of TMLs in soil surrounding the chosen coal basin in northwest Bangladesh are still scarce.

The research region has undergone rapid urbanization and industrial and agricultural activity. The coal industrial region of Barapukuria may threaten ecology, the environment, and human health since coal production and electricity generation harm soil, water, and agriculture. Despite the coal industry's fast growth, the area's economy remains dominated by agriculture, with significant crop output (BBS, 2018). Several studies have found moderate to severe TML contamination of soil, water, sediments, crops, and plants caused by coal processing, preparation, and utilization activities and by combustion residuals (e.g., Bhuiyan et al., 2010; Habib et al., 2019a; Halim et al., 2015; Hossen et al., 2021; Rahman et al., 2017). This research hypothesized that agrochemical application, air deposition, and industrial pollutants may harm population health, and that source-oriented risk apportionment of soil TMLs might confirm this. The prime objectives are to (1) determine the soil physicochemical properties and TML concentrations and their spatial variations in the coal mine region using principal coordinate analysis (PCoA) and a self-organizing map (SOM); (2) investigate the co-occurrence relationships of metal(loid)s using network inference; (3) apportion the possible sources of TMLs by comparing the PMF and APCS-MLR models, along with their ranking and contribution rates; (4) measure the source-specific health risks of soil TMLs by coupling their source apportionment with a probabilistic human health risk model; and (5) address the probability distribution of carcinogenic risk factors using Monte Carlo simulation to lessen the uncertainties of health risk appraisal. Source apportionment and health risk assessment of TMLs in the soil surrounding the designated coal basin in northwest Bangladesh are yet to be scientifically understood.
This study is the first to determine soil metal(loid)s contamination in the designated coal basin using both PMF and APCS-MLR models and the source-specific health risks of TMLs in a coal basin-derived agricultural soil from northwest Bangladesh. This research will help identify toxic element sources in coal industry-associated soils and how to regulate toxic element contamination. Finally, this study will help scientists, particularly geologists, planners, and politicians, manage coal mining soil in Bangladesh, reduce TML health concerns, and meet the UN SDGs.

Material and methods

Study area

Structurally, the sampling area is situated within the southern slope of the Rangpur Saddle, which is a part of the stable platform of the Bengal Basin (Habib et al., 2019a) (Fig. 1). The area is located in a humid subtropical region (mean annual precipitation: 1104–2985 mm) and is a moderately to densely populated (823 people per km²), agriculture-dominated farming area in the NW region of Bangladesh (Barapukuria, Dinajpur) (Fig. 1). Municipalities (Parbatipur and Phulbari) and the Maddhapara Granite Mine lie within the sampling area. Physiographically, the area comprises dominantly Pleistocene Barind Clay Residuum (terrace deposits) and Holocene alluvial flood plains of the Tista-Karotoya River system (major tributaries of the Brahmaputra-Jamuna River). It is level to gently sloping southward and comparatively well-drained, and its elevation ranges from 27 to 31 m above mean sea level. The soils contain light-brown to orange, light-red, yellowish-grey to bluish-grey, and various shades of yellow, red, orange, and white non-calcareous sticky silty clay, with a subordinate, minor proportion of grey to yellowish-grey fine-grained sand having root tubes, burrows, vugs, fractures, and Fe-oxide nodules; a considerable fraction consists of grey soils primarily used for rice cultivation. Most samples belong to the silty loam or silt classes (Habib et al., 2019a). According to USDA taxonomy, the soil may correlate to Aeric Haplaquept. It belongs to the Amnura, Belabo, Noadda, and Jagdal soil series association. The predominant wind direction is from east to west (40%), followed by west to east (25%) and north-east (18%), and the wind speed is relatively moderate. The nearby active rivers, namely the Atrai, Banglai, Jabuneswari, Kala, Kharkhari, Tilai, Chirnai, and Little Jamuna-Tulsiganga (distributaries of the large river Tista), flow over the area from north to south (Habib & Khan, 2021; Habib et al., 2019b).

Fig. 1

Map showing the study area and soil sampling sites from coal mine and hard rock mine region, Bangladesh with the regional context

Sample collection and processing

At the selected sites, topsoils (uncultivated and agricultural) (n = 149) were collected around an operational coal mine and power plant (Fig. 1) following the systematic random sampling technique (Ćujić et al., 2016), with regularly spaced intervals carefully ensured. The sampling sites were separated from each other by ~ 2 km, and the total sampling area covers approximately 1000 km², extending in all directions from the coal mine and coal-based power plant (Fig. 1). A 10 cm × 10 cm sampling auger (length: 30 cm) was used to collect the soil samples. At each site, three sub-samples were taken from the same area to a depth of 10–20 cm and thoroughly blended to form a representative composite sample of that particular spot (Rodriguez-Iruretagoiena et al., 2015). After sampling, the samples were immediately sealed in labelled polythene bags to avoid weathering and cross-contamination, transported to the laboratory, and stored in a refrigerator until further analysis (Özkul, 2016). All samples (except those for Hg analysis) were then dried at room temperature, mixed well, and milled with a pre-cleaned mortar and pestle, ensuring no cross-contamination, followed by sieving and homogenization.

Analytical processes and quality control

Physicochemical parameters, e.g., pH of an aqueous suspension of soil samples (soil:distilled water = 1:2.5, w/v), were determined with a pH meter (Inesa, PHS-2F, China; Habib et al., 2019b), and total organic matter (OM) was measured as the mass loss upon ignition of dried samples in a furnace at 550 °C for at least six hours. Additionally, the traditional pipette method was applied for particle size analysis. Elemental analysis of Cr, Mn, Fe, Co, Zn, and As was conducted using instrumental neutron activation analysis (INAA), whereas Pb, Cd, Ni, Cu, and Se were measured by inductively coupled plasma-optical emission spectrometry (ICP-OES: Optima 800, Perkin Elmer Instruments, USA). Briefly, in INAA, ~ 60 mg of sample was irradiated by neutrons (TRIGA Mark II research reactor, Bangladesh Atomic Energy Commission), followed by γ-ray counting (high-purity Ge detector) with systematic nuclear decay schemes (Khan et al., 2021). For ICP-OES measurements, a sample solution digested with HNO3 + HClO4 + HF + HCl (Khan et al., 2015) was used (Rodriguez-Iruretagoiena et al., 2015). A mercury analyzer (FIMS 200, Perkin Elmer, USA) was used to determine the Hg contents of the samples. Triplicate measurements of a reference material (IAEA-Soil-7) were used to check the data quality of the different analytical techniques. The analytical data for Soil-7 obtained in this study agreed with the certified values to within 2–15%, and reproducibility (RSD) was within 10%. Detection limits (3σ) of Pb, Cd, Ni, Cr, Mn, Fe, Co, Zn, Cu, As, Se, and Hg were 1.2, 0.005, 0.08, 1.5, 10, 0.005, 0.05, 1.0, 0.2, 0.01, 0.05, and 0.003 mg/kg, respectively (Fe in %).

Source contributions to soil TMLs

PMF model

In this study, the USEPA PMF 5.0 (USEPA, 2014) was used to identify the source distribution of soil metal(loid)s. Under the non-negativity constraint, the original data matrix was decomposed into a contribution matrix and a factor profile matrix (Frie et al., 2017). The performance of this model is based on precise factorization algorithms that calculate the source profiles and inputs, and it is often used to categorize metal(loid) sources in soils (Proshad et al., 2021). As a receptor model, it uses uncertainty analysis to help balance the dataset and does not need a source profile (Guo et al., 2021). The computational procedure and the mathematical function of PMF 5.0 are outlined in the Supplementary Material Text S1.
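As a rough illustration of the factorization described above (the full formulation is in Text S1), the sketch below evaluates the objective that PMF is generally understood to minimize: the sum of squared residuals of X ≈ G·F, each scaled by its measurement uncertainty. The function name `pmf_q` and the toy numbers are assumptions for illustration only; this is not the EPA PMF 5.0 solver and uses no data from this study.

```python
def pmf_q(X, G, F, U):
    """Sum of squared, uncertainty-scaled residuals for X ≈ G·F.

    X: samples x species concentrations
    G: samples x factors contribution matrix (must be non-negative)
    F: factors x species profile matrix (must be non-negative)
    U: samples x species measurement uncertainties
    """
    if any(v < 0 for row in G for v in row) or any(v < 0 for row in F for v in row):
        raise ValueError("PMF requires non-negative G and F")
    q = 0.0
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            modelled = sum(G[i][k] * F[k][j] for k in range(len(F)))
            q += ((x - modelled) / U[i][j]) ** 2
    return q


# Hypothetical toy data: 2 samples, 2 species, 1 factor
X = [[10.0, 4.0], [20.0, 9.0]]
G = [[1.0], [2.0]]
F = [[10.0, 4.0]]
U = [[1.0, 0.5], [2.0, 0.5]]
q_value = pmf_q(X, G, F, U)
```

In a multi-seed workflow, a function of this kind would be evaluated for each candidate solution so that the run with the lowest objective value can be retained.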

APCS-MLR model

The source distribution of soil TMLs was also computed using the APCS-MLR model, which combines absolute principal component scores (APCS) with multiple linear regression (MLR), both built on the PCA approach. PCA was used to extract components from the measured parameters through dimensionality reduction. The rotated factor loadings of the metal(loid)s were used as a starting point for pinpointing their sources (Guo et al., 2020). Because PCA scores are standardized, they cannot be used directly to estimate the contribution rates of metal(loid) sources; hence, they were converted to non-standardized absolute scores using the APCS technique (Zhang et al., 2021). MLR then takes the metal(loid) contents in soils as the dependent variable and the APCS as independent variables, and the agreement between predicted and measured values is used to check the model's accuracy (Proshad et al., 2021). The APCS-MLR model is described in detail in the Supplementary Material Text S2.

After the appropriate model was chosen, the mass contribution of soil metal(loid)s in each sampling location from definite sources was determined as follows:

$${C}_{ij}^{k}= {C}_{ij}^{k*} \times {C}_{i}$$
(1)

where \({C}_{ij}^{k}\) is the mass content of the jth TML from the kth source in the ith sample (mg/kg); \({C}_{ij}^{k*}\) is the computed contribution of the jth TML from the kth source in the ith sample (mg/kg); and \({C}_{i}\) is the concentration of the soil TMLs in the ith sample (mg/kg).

The PMF and APCS-MLR models were both run 20 times with different starting seeds, and the solution with the lowest objective function value was chosen as the final solution (Zhang et al., 2020). The uncertainty of the solution was analysed using the bootstrap (BS), displacement (DISP), and bootstrap-displacement (BS-DISP) tools pioneered by Paatero et al. (2014).

Source-oriented health risk evaluation of soil TMLs

In this study, health risk evaluation and the source distribution were integrated to measure the source-oriented health risks to children (< 6 years old) and adults (> 18 years old) (USEPA, 2011). The average daily dose (ADD) of the jth TML from the kth source at the ith sample location through ingestion (ADDing), inhalation (ADDinh), and the dermal route (ADDder) was computed using Eqs. 2–4 (USEPA, 2011).

$${\mathrm{ADD}}_{\mathrm{ijing}}^{k}=\frac{{C}_{ij}^{k}\times \mathrm{IngR}\times \mathrm{EF}\times \mathrm{ED}}{\mathrm{BW}\times \mathrm{AT}}\times \mathrm{CF}$$
(2)
$${\mathrm{ADD}}_{\mathrm{ijinh}}^{k}=\frac{{C}_{ij}^{k}\times \mathrm{InhR}\times \mathrm{EF}\times \mathrm{ED}}{\mathrm{PEF}\times \mathrm{BW}\times \mathrm{AT}}$$
(3)
$${\mathrm{ADD}}_{\mathrm{ijder}}^{k}=\frac{{C}_{ij}^{k}\times \mathrm{SA}\times \mathrm{ABS}\times \mathrm{EF}\times \mathrm{ED}}{\mathrm{BW}\times \mathrm{AT}}\times \mathrm{CF}$$
(4)
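Equations 2–4 can be sketched as three small functions. The parameter values below are placeholders chosen only to make the example run; they are not the values of Table S1, and the default conversion factor of 1 × 10⁻⁶ kg/mg is an assumption.

```python
CF_KG_PER_MG = 1e-6  # assumed unit conversion factor (kg/mg)


def add_ingestion(c, ing_r, ef, ed, bw, at, cf=CF_KG_PER_MG):
    """Eq. 2: average daily dose via ingestion, mg/(kg day)."""
    return c * ing_r * ef * ed / (bw * at) * cf


def add_inhalation(c, inh_r, ef, ed, pef, bw, at):
    """Eq. 3: average daily dose via inhalation, mg/(kg day)."""
    return c * inh_r * ef * ed / (pef * bw * at)


def add_dermal(c, sa, abs_f, ef, ed, bw, at, cf=CF_KG_PER_MG):
    """Eq. 4: average daily dose via dermal contact, mg/(kg day)."""
    return c * sa * abs_f * ef * ed / (bw * at) * cf


# Hypothetical adult scenario (placeholder values, not from Table S1)
c = 40.0          # source-specific TML content, mg/kg
ef, ed = 350, 24  # exposure frequency (days/year), exposure duration (years)
bw = 70.0         # body weight, kg
at = ed * 365     # averaging time, days

add_ing = add_ingestion(c, ing_r=100.0, ef=ef, ed=ed, bw=bw, at=at)
add_inh = add_inhalation(c, inh_r=20.0, ef=ef, ed=ed, pef=1.36e9, bw=bw, at=at)
add_der = add_dermal(c, sa=5700.0, abs_f=0.001, ef=ef, ed=ed, bw=bw, at=at)
```

Each dose scales linearly with the source-specific content, so the same functions can be reused for every source, sample, and metal(loid).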

The hazard quotient (\({\mathrm{HQ}}_{ij,p}^{k}\)) represents the non-carcinogenic risk of the jth TML from the kth source in the ith sample through the pth exposure route. The hazard index (HI) is the sum of the HQs over the exposure routes, and the total hazard index (THI) is computed as the sum of the HIs (USEPA, 2011), as given in Eqs. 5–6:

$$\mathrm{HI}=\sum {\mathrm{HQ}}_{\mathrm{ij},p}^{k}=\frac{{\mathrm{ADD}}_{\mathrm{ijing}}^{k}}{{\mathrm{RfD}}_{\mathrm{ing}}}+ \frac{{\mathrm{ADD}}_{\mathrm{ijinh}}^{k}}{{\mathrm{RfD}}_{\mathrm{inh}}}+\frac{{\mathrm{ADD}}_{\mathrm{ijder}}^{k}}{{\mathrm{RfD}}_{\mathrm{der}}}$$
(5)
$$\mathrm{THI}=\sum \mathrm{HI}$$
(6)

where RfDing, RfDinh, and RfDder (mg/(kg day)) are the reference doses of resultant TMLs through ingestion, inhalation, and dermal route, respectively. If HQ, HI and THI are > 1, there is a possible detrimental impact on human health (USEPA, 2009).
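A minimal sketch of the hazard calculation follows: each route-specific dose is divided by its reference dose to give an HQ, the HQs are summed to an HI, and the HIs are summed to the THI. The doses and reference doses are invented for illustration and are not taken from Table S2.

```python
def hazard_index(add_by_route, rfd_by_route):
    """HI for one metal(loid): sum of ADD / RfD over the exposure routes."""
    return sum(add_by_route[route] / rfd_by_route[route] for route in add_by_route)


def total_hazard_index(hi_values):
    """THI: sum of the hazard indexes of all metal(loid)s considered."""
    return sum(hi_values)


# Hypothetical doses and reference doses, mg/(kg day)
add = {"ing": 2e-4, "inh": 1e-8, "der": 5e-7}
rfd = {"ing": 1e-3, "inh": 1e-4, "der": 1e-4}

hi = hazard_index(add, rfd)
thi = total_hazard_index([hi, 0.05])  # second HI value is also a placeholder
exceeds_threshold = thi > 1           # > 1 flags a possible adverse effect
```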

The carcinogenic risk (CR) of the jth TMLs in the ith sample from kth source could be computed based on Eq. 7 (USEPA, 2011). Total lifetime carcinogenic risk (TCR) could be computed by the sum of CR according to Eq. 8 (USEPA, 2011).

$$\text{CR}=\sum {\mathrm{CR}}_{\mathrm{ij},p}^{k}={\mathrm{ADD}}_{\mathrm{ijing}}^{k}\times {\mathrm{SF}}_{\mathrm{ing}}+{\mathrm{ADD}}_{\mathrm{ijinh}}^{k}\times {\mathrm{SF}}_{\mathrm{inh}}+{\mathrm{ADD}}_{\mathrm{ijder}}^{k}\times {\mathrm{SF}}_{\mathrm{der}}$$
(7)
$$\mathrm{TCR}= \sum \mathrm{CR}$$
(8)

where SFing, SFinh, and SFder are the slope factors ((mg/(kg d))⁻¹) of soil metal(loid)s for the ingestion, inhalation, and dermal routes, respectively. If CR and TCR exceed 1 × 10⁻⁴, there is a noteworthy carcinogenic risk, whereas values < 1 × 10⁻⁶ indicate no substantial carcinogenic impact. Values between 1 × 10⁻⁶ and 1 × 10⁻⁴ indicate an acceptable risk of cancer (USEPA, 2009). The detailed variables adopted in the human health risk evaluation are listed in Table S1. The standard reference doses and slope factors for each metal(loid) are given in Table S2.
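The carcinogenic risk of Eq. 7 and the three risk bands described above can be sketched as follows. The doses and slope factors are placeholders, and the band labels are names chosen here for illustration.

```python
def carcinogenic_risk(add_by_route, sf_by_route):
    """Eq. 7: CR as the sum of ADD x SF over the exposure routes."""
    return sum(add_by_route[route] * sf_by_route[route] for route in add_by_route)


def classify_risk(cr):
    """Map a CR or TCR value onto the bands stated in the text."""
    if cr > 1e-4:
        return "significant"
    if cr < 1e-6:
        return "negligible"
    return "acceptable"


# Hypothetical doses, mg/(kg day), and slope factors, (mg/(kg day))^-1
add = {"ing": 2e-5, "inh": 1e-9, "der": 1e-7}
sf = {"ing": 1.5, "inh": 15.0, "der": 3.0}

cr = carcinogenic_risk(add, sf)
band = classify_risk(cr)
```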

Source-oriented health risk model

To report the source-specific risk assessment, the source distribution outcomes of both the PMF and APCS-MLR models are integrated with equal weight into the non-carcinogenic risk (NCR) and CR models. First, the contents of soil TMLs contributed by each source recognized by PMF and/or APCS-MLR are computed using Eqs. 9–11.

$$\left(\genfrac{}{}{0pt}{}{{\mathrm{CF}}_{ij}^{k1}={f}_{ij}^{k1}\times {C}_{ij}}{\sum_{k1=1}^{p}{f}_{ij}^{k1}=1}\right)$$
(9)
$$\left(\genfrac{}{}{0pt}{}{{\mathrm{CF}}_{ij}^{k2}={f}_{ij}^{k2}\times {C}_{ij}}{\sum_{k2=1}^{p}{f}_{ij}^{k2}=1}\right)$$
(10)
$${\mathrm{CF}}_{ij}^{k} ={\mathrm{CF}}_{ij}^{k1}\otimes {\mathrm{CF}}_{ij}^{k2}$$
(11)

where CFk1 and CFk2 denote the content fraction of the jth TML in the ith sample attributed to the kth source by the PMF and APCS-MLR models, respectively; fk1 and fk2 are the corresponding contribution fractions of the kth pollution source to the jth TML in the ith sample; Cij denotes the measured content of the jth TML in the ith sample; p is the number of recognized pollution sources; CFij is the coupled estimate of the content fraction from both models (k = 1, 2, …, p); and ⊗ is the composition operator (i.e., the arithmetic mean in our research).
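Equations 9–11 can be illustrated with a short sketch in which the composition operator is the arithmetic mean, as stated above. It assumes that the sources of the two models have already been matched and listed in the same order; the function name and the fractions are hypothetical.

```python
def couple_contributions(c_ij, f_pmf, f_apcs, tol=1e-9):
    """Couple PMF and APCS-MLR source contents for one TML in one sample.

    c_ij: measured content (mg/kg); f_pmf and f_apcs: contribution fractions
    of the p sources from each model, each summing to 1 (Eqs. 9-10).
    Returns the coupled content per source (Eq. 11, arithmetic mean).
    """
    if len(f_pmf) != len(f_apcs):
        raise ValueError("both models must report the same number of sources")
    for name, fractions in (("PMF", f_pmf), ("APCS-MLR", f_apcs)):
        if abs(sum(fractions) - 1.0) > tol:
            raise ValueError(f"{name} fractions must sum to 1")
    cf_pmf = [f * c_ij for f in f_pmf]    # Eq. 9
    cf_apcs = [f * c_ij for f in f_apcs]  # Eq. 10
    return [(a + b) / 2 for a, b in zip(cf_pmf, cf_apcs)]  # Eq. 11


# Hypothetical example: 40 mg/kg split across three matched sources
coupled = couple_contributions(40.0, [0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
```

Because both sets of fractions sum to one, the coupled contents still add up to the measured content of the sample.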

Subsequently, the obtained content fractions of soil TMLs for each of the pollution sources in each sample are integrated into the NCR and CR models as represented in earlier Eqs. (2)–(8). Particularly, Cij is substituted by CFij and the total carcinogenic risk (TCR) attributed by the kth identified source can be computed in the following Eq. 12:

$${\mathrm{TCR}}_{j}^{k}=\sum_{i=1}^{n}{\mathrm{CR}}_{ij}^{k}$$
(12)

Statistical analyses

The TML concentrations were correlated using Spearman's rank correlation. Co-occurrence network analysis, performed in Gephi software (ver. 0.9.2), was used to show the relationships among TMLs; only significant correlations (r > 0.2, p < 0.01) were evaluated (Islam et al., 2020). PCA was used to evaluate the best number of components for soil TML source distribution. All statistical tests were considered significant at p ≤ 0.05, and the data were normalized using OriginPro 2020 (OriginLab, Northampton, MA). PCoA based on log(x + 1)-transformed data was used to indicate the overall TML change across sample sites, and PRIMER (ver. 7.0.17) was used for this analysis (Clarke & Gorley, 2015). Source distribution was carried out using PMF 5.0 (USEPA, 2014), and APCS-MLR was carried out using SPSS (ver. 26.0). Both models were applied to obtain a more consistent source distribution and identification with various ranking and contribution rates (Khan et al., 2023). Human body variability creates uncertainty in risk assessments (Chen et al., 2016; Qi et al., 2020). Monte Carlo simulation was used to lower the uncertainty of the human health risk assessments (Islam et al., 2020). Table S1 lists the probabilistic variables used in this study. The Monte Carlo analysis was carried out in Crystal Ball software (ver. 11.1.2.3) with 10,000 iterations. To project the multi-dimensional dataset onto a lower-dimensional representation, a self-organizing map (SOM), a non-linear approach, was applied to the soil TMLs; it offers a graphical representation of the relationships between parameters. The SOM was drawn in MATLAB (ver. R2019b) using a neural clustering toolbox. The detailed process of the SOM model can be found in the Supplementary Text S3.
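The idea behind the Monte Carlo step can be sketched in plain Python: the exposure variables are drawn repeatedly from probability distributions, the ingestion-route carcinogenic risk is recomputed for each draw, and percentiles of the resulting distribution are reported. The distributions, their parameters, and the function names are assumptions made for this sketch; they do not reproduce Table S1 or the Crystal Ball configuration.

```python
import random


def simulate_cr(c, sf, n=10_000, seed=42):
    """Return n sorted draws of ingestion-route carcinogenic risk."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    ed = 24.0                  # exposure duration, years (placeholder)
    at = 70.0 * 365.0          # averaging time, days (placeholder)
    risks = []
    for _ in range(n):
        bw = rng.lognormvariate(4.2, 0.2)           # body weight, kg (assumed)
        ing_r = rng.triangular(50.0, 200.0, 100.0)  # ingestion rate, mg/day (assumed)
        ef = rng.uniform(180.0, 365.0)              # exposure frequency, days/year (assumed)
        add = c * ing_r * ef * ed / (bw * at) * 1e-6
        risks.append(add * sf)
    return sorted(risks)


def percentile(sorted_values, q):
    """Nearest-rank style percentile of an already sorted list (q in 0-100)."""
    idx = min(len(sorted_values) - 1, int(q / 100 * len(sorted_values)))
    return sorted_values[idx]


risks = simulate_cr(c=5.7, sf=1.5)  # hypothetical content (mg/kg) and slope factor
median_cr = percentile(risks, 50)
p95_cr = percentile(risks, 95)
```

Reporting an upper percentile alongside the median gives a conservative risk estimate while still showing the central tendency.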

Results and discussion

Descriptive statistics of physicochemical properties and TMLs in the soil

Table 1 provides an overview of the basic soil properties, including pH, EC, OM, and particle-size composition, together with the Pb, Cd, Ni, Cr, Mn, Fe, Co, Zn, Cu, As, Se, and Hg contents of the soils. The pH, minerals (typically aluminosilicates, carbonates, clays, arsenopyrite, mica, pyrites, chlorite, biotite, feldspars, and Fe–Mn oxides), redox states, organometallic complexes, and organic matter (OM) play a significant role in controlling the leaching and abundance of toxic chemical species of concern in the soil. TMLs are accessible to crops when the soil pH is acidic (Wang et al., 2018). The pH value ranged from 3.9 to 8.7 with a mean of 5.8 (Table 1). Most of the samples in the present investigation were acidic; 85% of them had pH values below 7.0, 14% were between 7.0 and 8.0, and just 1% were over 8.0. Acidic soils have developed in the region as a consequence of the large and widespread usage of nitrogen-containing fertilizers (Islam et al., 2021), and the production of H+ ions by vegetation in the root zone has also contributed to the acidity of the topsoil (Geetha et al., 2017). Moreover, OM affects the absorption of TMLs in the soil through the cation exchange capacity of organics and the subsequent generation of carbonic acid (Islam et al., 2015).

Table 1 Descriptive summary of the TMLs in soils of coal mining region, Bangladesh (n = 149, mg/kg)

Owing to the cation exchange capacity of organics and the ensuing generation of organic acids, OM has also been shown to change the absorption of metal(loid)s in soils (Islam et al., 2015). The samples had modest OM contents, averaging 2.7% and ranging from 0.50 to 12.4%. The low to medium levels of OM in the soils suggest that TMLs are likely bound to OM as complexes, which may also decrease TML availability to living organisms in the surrounding ecosystem. Islam et al. (2014) reported organic carbon of 0.62–3.6 mg/kg in soils of moderately farmed fields near the industrial area in Dhaka City, Bangladesh, somewhat higher than the level identified in the current study. According to Cai et al. (2015), OM in agricultural soil ranged from 0.48 to 7.33% with a mean value of 2.72%, which is comparable to our finding. EC reflects the amount of mobile/labile salts present in soils. The EC values of the samples ranged from 0.01 to 1.40 dS/m, with a mean ± SD of 0.20 ± 0.23 dS/m (Table 1). The granulometric analysis reveals that the samples belong to the silty clay class according to the US soil classification. Sand, silt, and clay contents averaged 10.25, 71.14, and 21.95%, respectively (Table 1). The finer-grained fractions (clay and silt) seem to affect TML solubility and mobility, weathering-oxidation, and leaching more than the coarser fraction (sand). The ranking of sand, silt, and clay in terms of adsorption illustrates the great affinity of TMLs and other metals for the finest fraction.

Table 1 and Fig. 2 show the descriptive statistics of the TMLs and the relevant literature data; TML concentrations varied distinctly among the studied samples. The average contents of these potentially toxic metals were 39.7 ± 30.4, 0.80 ± 0.92, 47.1 ± 32.6, 71.7 ± 33.7, 533.6 ± 285.0, 2.5 ± 1.1, 12.7 ± 8.0, 92.5 ± 76.8, 42.1 ± 25.1, 5.7 ± 5.0, 1.9 ± 0.97, and 0.54 ± 1.2 mg/kg for Pb, Cd, Ni, Cr, Mn, Fe, Co, Zn, Cu, As, Se, and Hg, respectively (Fe in %). The highest coefficients of variation, for Cd and Hg (115% and 213%, respectively), indicate that those metals exhibited the most variability in the samples, to which anthropogenic influences may have contributed (Zhang et al., 2018). With the exceptions of Mn, Cu, and Se, the TMLs had skewness values over one and were thus severely skewed. The majority of the samples were clustered at low values, which resulted in extremely acute kurtosis (Islam et al., 2019).

Fig. 2

Comparison of selected potentially toxic elements concentrations (µg/g) found in coal industry-associated soils in this study and other contaminated soils worldwide. Safe-eco-toxicological limit, Kabata-Pendias (2000); World median, Bowen (1979); Ledo-Tinsukia (India), Reza et al. (2015); Huainan (China), Tang et al. (2018); Oltu (Turkey), Tozsin (2014); Ptolemais (Greece), Modis et al. (2013); Douro (Portugal), Ribeiro et al. (2010); Yongcheng (China), Shi et al. (2022); Tula-Moscow (Russia), Komnitsas and Modis (2006); Palapye (Botswana), Zhai et al. (2009); Jharia-Dhanbad (India), Masto et al. (2017); Smolnica (Poland), Pietrzykowski et al. (2014); Coalfields (n = 13, global), Xiao et al. (2020)

The concentrations of potentially toxic elements in coal industry-associated soils in the current study are compared with the trace elements in soil in Bangladesh and in other studies worldwide (Fig. 2 and Table S3). Concentrations of Cd (8.9 times), Hg (10.8 times), and Pb (2.3 times) in soils of the current study are considerably higher than the typical upper continental crust (UCC) values proposed by Rudnick and Gao (2014), whereas the levels of the other elements are close to the UCC values. The two most toxic elements, Cd (2.3 times) and Hg (9.0 times), are also considerably higher than the typical world median values of trace elements in soil (Bowen, 1979), indicating that the coal industry-associated soils are polluted by trace elements, especially Cd and Hg (Fig. 2 and Table S3). Our values are higher for Mn (3.8 times), Ni (2.2 times), and Cd (4.0 times) than those from Douro, Portugal; for Zn (2.6 times) and Cd (26.7 times) than those from Oltu, Turkey; for Ni (4.7 times) and Cr (4.1 times) than those from Ptolemais, Greece; and for Cd (5.0 times) and Hg (6.8 times) than those from Yongcheng, China; Cu was also higher (2.3 times) (Fig. 2 and Table S3). Ribeiro et al. (2010) observed elevated levels of As, Cd, Cu, Pb, Ni, and Zn in soils of the Serrinha coal waste pile site, Douro Coalfield, Portugal. They also confirmed that acid mine drainage and the associated leaching of heavy metals are responsible for the elevated levels of toxic elements in the soil. Tozsin (2014) observed that Cr, Ni, and Hg concentrations in soil samples collected from Turkey's Oltu coal mine district were above the limit values, and concluded that only the Cr contamination of the Oltu coal mine soil was related to the leachate derived from coal samples. Modis et al. (2013) analyzed trace elements in soils from the lignite opencast mining and industrial area of Greece and concluded that the generation of acidic leachates from the mining waste is responsible for the solubilization and mobilization of inorganic contaminants. Tang et al. (2018) determined the contamination levels of several important toxic heavy metals in coal gangue-reclaimed soils in China and observed high levels of Cu, Pb, and Zn in soil from coal gangue. The element concentrations of the current study were higher than those of the above-mentioned studies, indicating severe contamination by trace elements in coal mine soil in Bangladesh. Compared to non-contaminated soils, the observed amounts of Mn are two times higher (Kabata-Pendias, 2000). Most metal(loid) concentrations did not surpass the European Communities soil quality regulations (ECD, 1986).

The concentrations of the studied elements in soil samples were higher than the background values of trace elements in the soil of Bangladesh (Kashem & Singh, 1999). Kashem and Singh (1999) concluded that soils from abandoned sites with industrial waste are highly contaminated by toxic elements, of which As, Cd, and Pb are the most serious. The current study's findings also indicate the contamination of coal industry-associated soils by toxic trace elements in the study area. Recent studies show that the soils around the Barapukuria mining zone are extensively polluted with Fe, Mn, Zn, Pb, As, Cr, Ni, Cd, and Cu (Bhuiyan et al., 2010; Hossain et al., 2015; Halim et al., 2015; Siddique et al., 2020; Hossen et al., 2021). Pollution was worse around mines and power plants. Mining, combustion, leaching, weathering (alteration-dissolution), coal-water mixing/interactions, emission, and deposition may release a large amount of toxic mobile heavy metal(loid)s from the coal matrix (Cao et al., 2021). Compared to soil-enriched elements, Cd, Fe, Co, As, Se, and Hg averaged around their local background values. The samples had TML concentrations similar to Bangladeshi and global soil levels (Table S3).

Co-occurrence relationships of TMLs in soils

Co-occurrence correlations, obtained through network inference using the non-parametric Spearman's rank coefficient (Barberán et al., 2012), and PCoA were used to describe the total variability in the physicochemical variables and TMLs of the samples (Fig. 3a, b). Co-occurrence network correlations were thus used to assess the measured metal(loid)s in soils. Network analysis uses topological properties such as the clustering coefficient and modularity index to interpret complex relationships, associations, and common occurrences among complicated objects, variables, and attributes. Spearman's coefficients above 0.2 were taken to indicate significant linkages in the TML co-occurrence correlations. The average clustering coefficients were 0.62–0.95, indicating strong TML relationships (Islam et al., 2020). The raw data thus revealed relationships in several network characteristics, indicating TML co-occurrence and clustering in the studied soils of Bangladesh.
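The edge selection behind such a co-occurrence network can be sketched in plain Python: Spearman's coefficient is computed as the Pearson correlation of average ranks, and a pair of elements is kept as an edge when the coefficient exceeds the threshold. The significance test (p < 0.01) is omitted for brevity, and the function names and concentrations are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes neither variable is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def cooccurrence_edges(data, threshold=0.2):
    """Return (element_a, element_b, rho) for every pair with rho > threshold."""
    names = sorted(data)
    edges = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            rho = spearman(data[a], data[b])
            if rho > threshold:
                edges.append((a, b, rho))
    return edges


# Hypothetical concentrations at five sites
toy = {"Cr": [1, 2, 3, 4, 5], "Pb": [2, 4, 5, 9, 10], "Hg": [5, 4, 3, 2, 1]}
edges = cooccurrence_edges(toy)
```

An edge list of this form (source, target, weight) is the kind of table that network tools such as Gephi can import.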

Fig. 3

a Co-occurrence network of toxic metal(loid)s (p < 0.01; r > 0.2); node size is based on concentration and line strength on correlation value. b Principal coordinate analysis (PCoA); each dot represents a sampling point, and vector overlays are shown for variables with r > 0.3 to a PCoA axis

Figure 3a displays the strong co-occurrence correlations (Spearman's coefficient r > 0.2) among the TMLs found in the samples. Islam et al. (2020) suggest using Cu, Zn, and As to identify lithogenic, atmospheric (coal flying dust, fly-ash), and external agrochemical factors. Since no positive correlation between reference metals and TMLs in soils was found, Liu et al. (2015) concluded that such a region was mostly influenced by external contamination. According to the co-occurrence network analysis, Cr and Pb are strongly linked through lithogenic/mineralogic and exogenous factors. Ni and Co were strongly positively correlated (Fig. 3a). TML co-occurrence correlations were abnormally high, especially for industrial contaminants such as Cr, Pb, Co, and Ni. Tamim et al. (2016) attributed elevated element concentrations in the study location to human activities together with geo-environmental and pedogenic processes. The negative linkages between other components (Fig. 4a) show a strong relationship between most TMLs. Liu et al. (2015) found a similar result.

Fig. 4

a SOM analysis of toxic metal(loid)s in the soils; b Clusters estimated by the DBI index; c Clusters of soil sampling sites from SOM analysis

The PCoA explained 27.9% of the TML variation among soil samples (Fig. 3b). Principal coordinate 1 (PCoA1) was composed of Cr, Cu, Mn, Cd, As, and Fe with similar weights. According to the rotated component matrix, Pb, Zn, Co, Hg, Ni, and Se were strongly associated with PCoA2. Singh and Kumar (2017) attributed a similar grouping of Cr, Cu, Mn, Cd, As, and Fe to industrial and agricultural inputs. PCoA1 accounted for 16.2% of the variance and may be influenced by human activities such as mining, smelting, and burning. Ca-, Na-, and Pb-arsenates, which are commonly employed in agrochemicals to boost pesticide effectiveness for local crop production, deserve special attention. According to Chen et al. (2016), frequent and excessive pesticide usage is the major cause of As and Cu soil pollution.

The soil matrix absorbed Cd, As, and Fe (Lv et al., 2015). PCoA2 was dominated by Pb, together with Zn, Co, Hg, Ni, and Se; it accounted for 11.72% of the variance and may represent extrinsic factors. In this investigation, the average levels of the soil chemical species were elevated. Hg, Zn, and Pb had high coefficients of variation, showing that extrinsic inputs are the major sources of those TMLs in soils, even though the average levels of the examined TMLs varied little. PMF and APCS-MLR models were therefore also used to investigate soil TML interactions and to apportion their sources.

Spatial pattern recognition of TMLs in soils using SOM model

SOM component planes are shown in Fig. 4. The planes are drawn as colored hexagonal cells (resembling benzene rings) to show the significance of particular variables for each SOM class. The more similar the samples, the smaller the interior of the hexagon. The colors of the planes signify positive and negative TML connections. Figure 4a shows the component planes of the twelve TMLs, which fell into five color patterns corresponding to metal(loid) groups. In the first pattern, neuron values increased from the top to the bottom left; Se, Cu, and Pb followed this pattern, suggesting a common source for these metal(loid)s. In the second pattern, neuron values increased from the bottom left to the top right; Zn and Co matched it, suggesting that both metals are governed by the same combination of basic controls. In the third pattern, attributed to geogenic inputs, neuron values were ordered from left to right; Mn and Fe matched this spatial arrangement. In the fourth pattern, the highest and lowest neuron values were in the top right and bottom left corners, respectively; Cd, Hg, and As fitted this pattern, consistent with pesticides used in agro-farming, air deposition, electroplating, painting, and other operations that regulate coal basin soil quality. In the fifth pattern, the highest and lowest neuron values were in the bottom left and top right corners, respectively; Cr and Ni matched this pattern, which may be due to large amounts of human-induced, untreated waste and sewage in the studied areas. As discussed in the following section, the SOM groupings are comparable to the sources apportioned by the PMF and APCS-MLR models.

After the training phase, the DBI values for SOM analysis using 1 to 6 clusters were obtained and are shown in Fig. 4b to help choose the best number of clusters. With the optimal cluster number, the SOM map yielded two distinct clusters of sample locations (Fig. 4c). Cluster I contained 109 samples and cluster II contained 40 samples. Distinctly colored zones marked the cluster borders. Color variations within the same cluster indicate a semi-qualitative link between the measured TMLs. The clusters separate sample sites by contamination level, from high to low pollution (Kumar et al., 2021). Consequently, the SOM map is a useful tool for assessing the TML properties of soil samples.
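The cluster-number selection step can be illustrated with a short sketch of the Davies-Bouldin index (DBI), where a lower value indicates more compact and better separated clusters. The points and cluster assignments below are hypothetical, the function names are assumptions, and the index is only defined for two or more clusters; this is not the SOM implementation used in the study.

```python
from math import dist


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centres = [centroid(c) for c in clusters]
    # Mean distance of each cluster's members to its own centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centres)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centres[i], centres[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two hypothetical ways of splitting the same four points into two clusters.
candidates = {
    "compact": [[(0, 0), (0, 2)], [(10, 0), (10, 2)]],
    "mixed": [[(0, 0), (10, 0)], [(0, 2), (10, 2)]],
}
scores = {name: davies_bouldin(c) for name, c in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The same comparison, applied to clusterings with different numbers of groups, gives the kind of DBI curve shown in Fig. 4b.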

Comparisons of APCS-MLR with PMF models for apportioning soil TMLs

APCS-MLR and PMF models were used to quantify the sources of soil TMLs, and the results are shown in Table 2 and Fig. 5. PCA was used to confirm the model's complexity before the PMF model was applied. TMLs with a strong positive loading on a factor were regarded as the characteristic TMLs of the corresponding source. The APCS-MLR model yielded five factors, as shown in Table 2 and Fig. 5a. The first factor (F1) was dominated by Cu, Se, Fe, Hg, and Pb, with average contributions of 67.77%, 29.10%, 23.74%, 17.19%, and 16.20%, respectively. The second factor (F2) had the greatest contributions to Zn, Co, and As (51.77%, 51.62%, and 49.42%, respectively). The third factor (F3) explained 70.14% of Mn and 37.31% of Fe. The fourth factor (F4) mainly contributed to Cd, Pb, Ni, and Hg (47.33%, 33.70%, 33.15%, and 35.29%, respectively). The fifth factor (F5) explained 65.44% of Cr and 30.92% of Pb, but only small shares of the other TMLs.

Table 2 Contribution of each factor derived from the APCS-MLR and PMF models

Fig. 5

Factor profiles of toxic metal(loid)s in the soils derived from a the APCS-MLR model and b the PMF model, and average contributions of the five pollution sources detected by c the APCS-MLR model and d the PMF model. (Industrial emission: IE, Agrochemical practice: AP, Geogenic source: GS, Coal power plant: CPP, Atmospheric deposition: AD)

The PMF approach also identified five factors based on the lowest and most stable Q values (Table 2; Fig. 5b). All TMLs were classified as "strong" because their signal-to-noise ratios (S/N) were greater than 4, supporting the reasonableness of the model's predictions. In addition, the coefficients of determination (R²) of the PMF fit for each metal(loid) were greater than 0.61, showing an acceptable fit (Fig. S1). Notably, the residual values of most of the analyzed samples ranged from −3 to 3 (Duan et al., 2020). Cu, Cr, and Fe had the highest attribution to the first factor (F1), as shown in Table 2, with average values ranging from 48.77% to 79.23%. The second factor (F2) accounted for 74.09% of Co and 91.56% of Zn. Cd had the highest contribution (86.10%), which was mostly due to the third factor (F3). The largest average contributions in the fourth factor (F4) were for Mn and Hg, with 74.46% and 75.55%, respectively. The fifth factor (F5) explained 73.53% of the Se content but less of the other TMLs.
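The diagnostics mentioned above (the Q value and residuals within −3 to 3) can be sketched as follows. This minimal example assumes the usual PMF formulation, in which the data matrix X is approximated by non-negative contributions G times profiles F and each residual is scaled by its measurement uncertainty U. It only evaluates a given solution; it does not perform the fitting itself. All matrices are hypothetical and the function names are assumptions.

```python
def reconstruct(G, F):
    """Multiply contributions G (samples x factors) by profiles F (factors x species)."""
    return [[sum(g[k] * F[k][j] for k in range(len(F))) for j in range(len(F[0]))]
            for g in G]


def scaled_residuals(X, G, F, U):
    """Residual of each measurement divided by its uncertainty."""
    model = reconstruct(G, F)
    return [[(x - m) / u for x, m, u in zip(xr, mr, ur)]
            for xr, mr, ur in zip(X, model, U)]


def q_value(residuals):
    """Objective Q: sum of squared scaled residuals."""
    return sum(r * r for row in residuals for r in row)


def fraction_within(residuals, limit=3.0):
    """Share of scaled residuals whose magnitude does not exceed the limit."""
    flat = [r for row in residuals for r in row]
    return sum(abs(r) <= limit for r in flat) / len(flat)


def is_non_negative(G, F):
    """PMF constrains every contribution and profile value to be >= 0."""
    return all(v >= 0 for row in G + F for v in row)


# Hypothetical two-factor solution for three samples and two species.
G = [[1, 0], [0, 1], [1, 1]]
F = [[2, 1], [1, 3]]
X = [[2.1, 1.0], [0.9, 3.2], [3.0, 4.5]]
U = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]

res = scaled_residuals(X, G, F, U)
print(round(q_value(res), 2), round(fraction_within(res), 2), is_non_negative(G, F))
```

In practice, Q would be compared across candidate factor numbers, and species with many residuals outside the ±3 band would be examined more closely.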

Both the APCS-MLR and PMF models identified five factors and successfully resolved the source categories. Factors 2 and 4 retrieved by the PMF model, identified as the agrochemical practice and coal-related sources, are comparable to factors 2 and 4 of the APCS-MLR model, as shown in Fig. 5. Overall, the two models produced broadly comparable results. The APCS-MLR model recognized atmospheric deposition (AD), agrochemical practice (AP), geogenic source (GS), coal power plant (CPP), and industrial emission (IE) as potential contributors, with average contributions of 16.3%, 28.2%, 19%, 19.3%, and 17.2%, respectively (Fig. 5c). These findings suggest that the two models are consistent in apportioning the sources of TMLs in the studied samples. Agrochemical practice was identified as a key factor influencing the distribution of metal(loid)s in soils, with contributions of 22.2% (PMF) and 28.2% (APCS-MLR). This is consistent with the fact that Bangladesh has been and continues to be heavily dependent on the agricultural sector (Bhuiyan et al., 2010; Halim et al., 2015; Hossen et al., 2021; Rahman et al., 2017).

Table 2 lists the R² values of the detected soil TMLs. All TMLs had high R² values, ranging from 0.61 to 0.99, indicating that both the PMF and APCS-MLR models performed satisfactorily. However, the R² values of most TMLs were higher in the PMF model than in the APCS-MLR model, indicating that the PMF model is better suited to the source apportionment of TMLs in this research. In addition, negative contributions, which reduce the accuracy of the results, were observed in the APCS-MLR model. Several prior studies (Guo et al., 2021; Zhang et al., 2021) likewise found that the PMF model outperformed APCS-MLR in terms of R² when resolving a variety of sources. Both models use a number of mathematical constraints to identify the likely sources of pollution. Compared to the APCS-MLR model, the PMF model identified more sources of pollution with a low percentage error. According to Deng et al. (2018), the optimal number of sources in the PMF model is the one with the lowest Q value. Therefore, the optimal number of sources was five. Because the two models rest on different theoretical principles, the apportionment and contribution rates of the prospective sources differed between them. In addition, the APCS-MLR model does not provide prior estimates of the error scale for uncertain measurements and places weaker constraints on the data.

In contrast, the PMF model may provide more reliable results because it weights the data by their uncertainty estimates (USEPA, 2014; Luo et al., 2019). Notably, an uncertainty analysis was performed to evaluate the robustness and coherence of the PMF results. Thus, compared to the APCS-MLR model, the PMF model may resolve the sources of soil TMLs more precisely in complex agricultural districts affected by coal contamination.

Source apportionment of soil TMLs

Factor 1 (F1) was responsible for 25.1% of the total contribution and was predominantly loaded on Cu (79.23%), Cr (49.68%), Fe (48.77%), and Pb (35.45%) (Fig. S1). Correlation analysis indicated significant associations between Cu and Cr (r = 0.523, p < 0.05) and between Cr and Pb (r = 0.491, p < 0.05). Yang et al. (2019) and Duan et al. (2020) also noted that Cu, Cr, and Pb might be related to atmospheric deposition. Earlier literature has shown that Pb and Cr are present in brakes, tires, and mine waste (Huang et al., 2015). These toxic metals can be released into the ambient atmosphere and then deposited onto topsoils. In addition, power-plant emissions are a potential contributor of Pb in atmospheric deposition. Lin et al. (2018) also stated that Pb might enter soils via atmospheric deposition. Thus, F1, characterized by Cu, Cr, Fe, and Pb, was identified as atmospheric deposition (coal flying dust, fly-ash).

Factor 2 (F2) accounted for 22.2% of the total contribution and was mostly loaded on Co (74.09%), Zn (91.56%), and As (27.56%) (Fig. S1). Significant positive associations were identified between Zn and As (r = 0.490, p < 0.01) and between Co and Zn (r = 0.61, p < 0.05). Previous research showed that the concentrations of these elements are related to the application of agrochemical products (Cai et al., 2019; Islam et al., 2020). Lin et al. (2018) stated that Zn and As are closely linked to agrochemical products applied to improve pesticide efficiency and crop yield. Similarly, Xiao et al. (2020), Jiang et al. (2020), and Yang et al. (2019) observed increased concentrations of As, Co, and Zn in soils, which may be attributed to agrochemical inputs. Some studies pointed out that agrochemical fertilizers also cause high accumulation of Zn and As in soil (Duan et al., 2020; Xiao et al., 2020). Therefore, Factor 2 was attributed to an agrochemical practice source.

Factor 3 (F3) made up 13.4% of the total contribution and was dominated mainly by Cd (86.10%) and Cr (16.66%). Earlier studies noted that Cd and Cr originate from industrial activities (e.g., battery production, metal manufacturing, mineral exploration and utilization, effluents, and electroplating) (Islam et al., 2015). The mean concentration of Cd exceeded the background value of soil in Bangladesh (Table S3), and the Cd hot spot was linked with industrial contributions. In addition, Cd is a well-known fingerprint of industrial and municipal contamination, where unplanned activities may contribute considerable Cd to soils (Ogundele et al., 2020). Previous investigations have stated that Cd in such samples is likely derived from emissions, including smelting and metal and alloy processing (Jiang et al., 2020). According to the field survey, metal manufacturing plants, disposal sites for combustion-based and other residuals/byproducts, and mineral and coal processing and preparation plants (coal flying dust, fly-ash) of different scales are present in the area. Therefore, F3 was identified as industrial emission.

Factor 4 (F4) accounted for 75.55% and 74.46% of the total content of Hg and Mn, respectively (Table 2), and was responsible for 24.3% of the total input. Descriptive statistics (Tables 1 and S3) showed that the levels of Mn and Hg in the studied samples were higher than in soils around coal industries worldwide. In addition, the areas with elevated Mn and Hg were located in the Barapukuria coal mining region, Bangladesh, which is further supported by a significant relationship between Mn and Hg (r = 0.531, p < 0.01). Previous studies have reported that coal-power-plant emissions contain elevated Hg (Cai et al., 2019), which is volatile and may migrate to the surrounding soils (Men et al., 2020). Thus, coal-power-plants can potentially affect the surrounding soil environment through inputs of Mn and Hg. Mercury is generally associated with industrial activities (e.g., coal-based brickfields and power production, municipal waste, and medical waste) (Xiao et al., 2021). Hence, F4 was regarded as the coal-power-plant input (typically carbon nanotubes, nano-minerals, and ultrafine particulates in coal flying dust and coal combustion fly-ash).

Factor 5 (F5) was characterized by Se and As in the PMF method, with loadings of 73.53% and 21.48%, respectively, and was responsible for 15.1% of the total contribution. There was a considerable relationship between Se and As (r = 0.598, p < 0.01). Selenium is mainly discharged to water bodies via mine effluents, coal-based power plant discharges, surface runoff, weathering and erosion, and wastewater. Se and As are released into soil primarily by leaching, mobilization, weathering, and dissolution of the parent bedrock/mineral matrix, although Se and As levels also increase in soils receiving inputs from the ambient atmosphere. Some studies have demonstrated that Se and As in soils might be affected by geologic and pedogenic processes such as erosion, oxidation, and weathering (Bhuiyan et al., 2010; Halim et al., 2015). Generally, TMLs associated with the composition of the soil parent rock materials have a low degree of pollution (Wu et al., 2020). Furthermore, Se and As in F5 are considered representative TMLs associated with the soil parent granite rock. Thus, F5 was recognized as a geogenic source (rock-hosted minerals).

Overall, atmospheric deposition, agrochemical practice, geogenic source, coal-power-plant, and industrial emission were identified as the probable contamination contributors of TMLs in the analyzed samples in the selected region (Fig. 5d). Atmospheric deposition (25.1%) of soil particles (i.e., coal waste, mine tailings, and coal ash residuals) contributed most to the soil TMLs, followed by coal-power-plant (24.3%) and agrochemical practice (22.2%). The contributions of atmospheric deposition and coal-power-plant were similar (25.1% and 24.3%, respectively).

Source-oriented health risk evaluation of soil TMLs

The source-specific health hazards linked to distinct exposure pathways to TMLs in the soils were analyzed using a probabilistic technique based on the source apportionment outputs of the PMF and APCS-MLR models. As outlined in Eqs. (2)–(4), the concentration percentages of soil TML contamination obtained from the two models for the five sources (agrochemical practice, atmospheric deposition, industrial emission, coal-power-plant, and geogenic source) were integrated into the non-carcinogenic risk (NCR) and carcinogenic risk (CR) models (8). The uncertainty of the health risk assessment was quantified using Monte Carlo simulation. In addition, the risk values for the estimated concentration, i.e., total factors, were calculated in this study. Using probability density functions, Fig. 6 depicts the calculated risk presented by TMLs in soils based on the total factors and the five quantified contamination sources. The mean NCR and CR values of the soil TMLs for the total factors (estimated concentrations) are lower than their respective guideline values, as illustrated in Fig. 6a, b, suggesting a negligible risk. The findings are in good agreement with earlier works reporting that the health risk of soil TMLs was insignificant (Guo et al., 2021; Jiang et al., 2021).
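A probabilistic risk estimate of this kind can be sketched with a small Monte Carlo simulation. The example below assumes the commonly used USEPA-style ingestion dose formula and covers a single element and a single pathway; the distributions, parameter values, and function names are illustrative placeholders, not the inputs or equations of this study.

```python
import random
import statistics


def hazard_quotient(conc, ing_rate, ef, ed, bw, rfd):
    """Ingestion hazard quotient: average daily dose divided by reference dose.

    conc in mg/kg, ing_rate in mg/day, ef in days/year, ed in years,
    bw in kg, rfd in mg/kg/day. The factor 1e-6 converts mg of soil to kg.
    """
    at = ed * 365  # averaging time (days) for non-carcinogenic effects
    add = conc * ing_rate * ef * ed / (bw * at) * 1e-6
    return add / rfd


def simulate(n=10000, seed=42):
    """Draw n hazard quotients from assumed input distributions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        conc = rng.lognormvariate(1.5, 0.4)       # assumed soil concentration
        ing_rate = rng.triangular(50, 200, 100)   # assumed soil ingestion rate
        bw = max(rng.gauss(60, 8), 30.0)          # assumed body weight, floored
        draws.append(
            hazard_quotient(conc, ing_rate, ef=350, ed=24, bw=bw, rfd=3.0e-4)
        )
    return draws


hq = simulate()
mean_hq = statistics.fmean(hq)
p95 = statistics.quantiles(hq, n=20)[-1]       # 95th percentile
exceed = sum(v > 1.0 for v in hq) / len(hq)    # share of draws above the guideline
print(f"mean={mean_hq:.3g}, p95={p95:.3g}, P(HQ>1)={exceed:.3g}")
```

Summing such quotients over elements and pathways, separately for each apportioned source, would give source-specific hazard indices whose distributions can be plotted as in Fig. 6.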

Fig. 6

Probability density function (PDF) for non-carcinogenic risk (NCR) of a children and b adults and carcinogenic risk (CR) of c children and d adults

For NCR, the total THI values of TMLs were 1.66E−01 for adults and 1.28E−01 for children, suggesting a negligible NCR for both age groups in the study region (Table 3). For Pb, Cr, Fe, and Cu, atmospheric deposition was the primary source of HI, with values of 5.30E−03, 2.28E−02, 1.46E−02, and 6.31E−04 for adults, and 4.20E−03, 1.72E−02, 1.17E−06, and 5.03E−04 for children, respectively. For As, Zn, Co, and Ni, agrochemical practice was the major contributor to HI, with values of 8.44E−03, 3.09E−04, 3.66E−02, and 1.24E−03 for adults and 6.72E−03, 2.45E−04, 2.92E−02, and 9.67E−04 for children, respectively. For Cd, the highest HI came from industrial emission sources (6.87E−04 for adults and 4.96E−04 for children). For Mn and Hg, the coal-power-plant source gave the highest HI, accounting for 4.42E−03 and 4.09E−03 for adults and 2.73E−03 and 3.27E−03 for children, respectively. For Se, the geogenic source (2.37E−04 for adults and 1.89E−04 for children) was the key source of HI (Table S5). The largest contributor to the THI was agrochemical practice (adults: 5.22E−02; children: 4.14E−02), followed by atmospheric deposition (adults: 4.33E−02; children: 3.34E−02), coal-power-plant (adults: 3.05E−02; children: 2.32E−02), geogenic source (adults: 2.44E−02; children: 1.85E−02), and industrial emission (adults: 1.51E−02; children: 1.15E−02). The THI of the agrochemical practice source was the largest and was linked with agricultural fertilizers and pesticides, which can introduce significant amounts of toxic metals such as As, Zn, and Cu that pose a potential threat to local inhabitants. The total THI values for adults were notably lower than those from other studies in China (2.81) (Guo et al., 2021) and India (13.8) (Singh et al., 2018).

For CR, the total TCR values for children (5.45E−05) and adults (5.37E−05) were within the acceptable range of 1E−06 to 1E−04, indicating a permissible risk level.

Table 3 Human health risk estimation posed by the soil TMLs from total factors and the five potential pollution sources detected

The contributions of five carcinogenic TMLs (As, Cr, Cd, Ni, and Pb) to CR were computed as outlined in Table 3. For Cr, atmospheric deposition was the key source of the elevated CR (2.51E−05 for children and 3.30E−05 for adults). Similarly, for Pb, atmospheric deposition gave the highest CR (1.29E−08 for children and 1.76E−08 for adults). For As and Ni, the agrochemical practice source had the highest CR (3.02E−06 for children and 3.79E−06 for adults for As; 3.82E−07 for children and 4.80E−07 for adults for Ni). For Cd, industrial emission was the main source, with a CR of 1.64E−07 for children and 2.15E−07 for adults. Hence, atmospheric deposition should be controlled to lessen the CR. Based on the Monte Carlo simulation results for CR, Fig. S4 shows that Cr was the largest contributing metal, owing to its highest variance and high rank correlation, for the industrial emission, coal-power-plant, and atmospheric deposition sources, while As was the largest contributing metalloid, owing to its largest variance and high rank correlation, for the agrochemical practice and geogenic sources (Fig. S2). The TCR of the apportioned pollution sources decreased in the following order: atmospheric deposition (adults: 3.56E−05; children: 2.72E−05), coal-power-plant (adults: 1.29E−05; children: 9.88E−06), industrial emission (adults: 1.02E−05; children: 7.82E−06), agrochemical practice (adults: 6.22E−06; children: 4.89E−06), and geogenic source (adults: 6.14E−06; children: 4.78E−06) (Table S6).

In general, the NCR and CR values for adults and children in this study were considerably lower than those reported by other researchers (Sun et al., 2022; Yang et al., 2019). Spatially, in line with the distribution pattern of concentration levels, the NCR of soil TMLs is commonly higher in the western and southwestern parts and much lower in the eastern and north-central parts of the study region (Fig. 7). Similarly, the CR of soil TMLs is usually higher in the central and western parts and much lower in the eastern part of the investigated region (Fig. 8). Among the TMLs, As, Cr, and Cd pose elevated risks owing to their enhanced concentrations (Table 3). Analogous findings were reported previously for soils in many countries (Duan et al., 2020; Islam et al., 2020; Wang et al., 2018). Ingestion is the prime exposure route, contributing most of the NCR and CR compared to the other two routes (inhalation and dermal contact) (Table S7). This observation is consistent with earlier works reporting that the health risk posed by ingestion was many times higher than that via the inhalation and dermal pathways (Chen et al., 2016; Guo et al., 2020; Zhang et al., 2018). Overall, adults show marginally higher NCR and CR from these five sources than children, implying that adults are more susceptible to the health risk of soil TMLs (Zhang et al., 2021) (Fig. 7; Table 3). This difference may be related to differences between the age groups in body weight, hand-to-mouth behavior, skin surface area, exposure duration, and respiration rate (Huang et al., 2018; Jiang et al., 2021).

Fig. 7

Cumulative density function (CDF) with normal distribution fitting for a non-carcinogenic risk and b carcinogenic risk of children and adults for most contributed metal(loid)s according to Monte Carlo Simulation

Fig. 8

Spatial distributions of the health risks of soil toxic metal(loid)s from the total factor (measured concentrations) and the five pollution sources detected for non-carcinogenic risk of a children and b adults and carcinogenic risk of c children and d adults

Furthermore, similar trends of NCR and CR from the five sources were observed for both age groups (Fig. S3). The source-specific health risk reveals that agrochemical practice (AP) was the key contributor to NCR, accounting for 32.37% for adults and 31.54% for children, which is slightly higher than its contribution to the TML contents of soils (22.2% for PMF and 28.2% for APCS-MLR). On the other hand, industrial emission contributed the least (13.4% for PMF and 17.2% for APCS-MLR) to the deposition of TMLs on the soil surface and therefore accounted for only a comparatively small percentage of the NCR (adults: 9.1%; children: 8.96%). This might be explained by agricultural inputs containing Zn, As, and Cu, whose elevated toxicity leads to a higher NCR compared to other TMLs (Bhuiyan et al., 2010; Islam et al., 2021; Yang et al., 2019). Besides, the elevated SF and low RfD of As might explain its high NCR compared with other TMLs (USEPA, 2011). Previous studies confirmed that bio-organic fertilizers can be used instead of agrochemical fertilizers, which can efficiently lessen the deposition of TMLs in soils (Jiang et al., 2020). Reducing the use of agrochemical fertilizers and pesticides is therefore expected to reduce the health risks to inhabitants. To limit the human health risks of soil TML contamination in the study region and safeguard public health, specific attention needs to be paid to agrochemical practice, particularly the fertilizers and pesticides used in agricultural crop fields in rural areas. In support of the sustainable development goals, this study recommends the application of low-toxicity bio-organic fertilizers in place of agrochemical fertilizers.

Conversely, atmospheric deposition was the primary contributor to CR, accounting for 50.11% for adults and 49.83% for children, which is about two times higher than its contribution to the abundance of TMLs in the soils (25.1% for PMF and 16.3% for APCS-MLR). Although geogenic sources contributed moderately (15.1% for PMF and 19% for APCS-MLR) to the deposition of toxic metal(loid)s in the soil layers, they accounted for only a small share of the CR (adults: 8.63%; children: 8.76%). This might be explained by the high SF and low RfD of Cr, and hence its high CR, compared with other TMLs (USEPA, 2011). Many earlier studies have reported high human health risks caused by As, Pb, and Cr in soils (Guo et al., 2021; Habib et al., 2019a). It is worth mentioning that there is a large difference between the source distribution and the source-oriented carcinogenic risk assessment, which is possibly caused by the differing toxicities of the elements in soil. Hence, to acquire comprehensive information for controlling the probable carcinogenic risk of TMLs, it is crucial to focus on source-oriented human health risk appraisal, since the risk linked with different sources will differ as the contents and geochemical species vary from source to source and with distance (Jiang et al., 2020; Sun et al., 2021; Zhang et al., 2021). For instance, Guo et al. (2021) revealed that the contributions of contamination sources to the total health risk can differ from their contributions to the total mass contents. Thus, this study highlights that community health might not be well protected if remedial measures target only the TML contents of soils. To better safeguard human health, agrochemical practice and atmospheric deposition need to be prioritized for risk control.
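The percentage shares of CR quoted above can be recomputed from the source-specific TCR values reported earlier in this section (Table S6). The short sketch below performs that arithmetic; the dictionary layout and the function name are assumptions made for illustration, and the results agree with the quoted shares to within rounding.

```python
# Source-specific total carcinogenic risk (TCR) values quoted in the text.
# AD: atmospheric deposition, CPP: coal power plant, IE: industrial emission,
# AP: agrochemical practice, GS: geogenic source.
tcr = {
    "adults": {"AD": 3.56e-5, "CPP": 1.29e-5, "IE": 1.02e-5, "AP": 6.22e-6, "GS": 6.14e-6},
    "children": {"AD": 2.72e-5, "CPP": 9.88e-6, "IE": 7.82e-6, "AP": 4.89e-6, "GS": 4.78e-6},
}


def percent_shares(by_source):
    """Express each source's risk as a percentage of the summed risk."""
    total = sum(by_source.values())
    return {source: 100 * value / total for source, value in by_source.items()}


shares = {group: percent_shares(values) for group, values in tcr.items()}
for group, result in shares.items():
    print(group, {source: round(p, 2) for source, p in result.items()})
```

The same function applied to the source-specific THI values gives the corresponding NCR shares.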

Uncertainty assessment

Uncertainty in the source apportionment and risk assessment may exist in this investigation. To characterize the uncertainty of the PMF outputs, both base model bootstrap (BS) and base model displacement (DISP) analyses were used. In the BS evaluation, the factors generated in each bootstrap run were mapped to the base run factors based on the relationships among the factor contributions (USEPA, 2014). The BS mapping for the five factors exceeded 90% in this study, suggesting that the base run's factor profiles are consistent. In addition, Fig. S5 shows the variability of the source contributions to the soil TMLs obtained using the BS model. The BS model was run with a total of 100 trials and 80 bootstrap random seeds, and the results suggest that the PMF model's outputs are quite reliable. The base run values of Fe and As in factor 1, Cr, Fe, and Co in factor 4, and Cr, Fe, Se, and Zn in factor 5 fell outside the inter-quartile range, showing that several soil metal(loid)s could affect the PMF outcomes. The DISP method was an effective screening method for the robustness and dependability of the PMF model. There were no factor swaps for the lowest Q (percent Q) for five factors in the DISP evaluation, indicating that the PMF results were credible. In addition, Fig. S6 shows the variability of the source apportionment contributions to soil TMLs derived from the DISP model. Except for Mn in factor 2, the base run values of soil metal(loid)s were all within the inter-quartile range, indicating no significant rotational uncertainty. Overall, the results of the BS and DISP models indicated that the PMF findings are reliable and acceptable (Fig. S7).
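The inter-quartile check described above can be sketched as follows. This minimal example flags species whose base run contribution falls outside the inter-quartile range of the bootstrap results for a given factor; the numbers and function names are hypothetical and do not reproduce the outputs of the PMF software.

```python
import statistics


def within_iqr(base_value, bootstrap_values):
    """True if the base run value lies between the bootstrap Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(bootstrap_values, n=4)
    return q1 <= base_value <= q3


def flag_outside_iqr(base_run, bootstrap_runs):
    """Return the species whose base run value falls outside the bootstrap IQR."""
    return sorted(
        species
        for species, base_value in base_run.items()
        if not within_iqr(base_value, bootstrap_runs[species])
    )


# Hypothetical percentage contributions of one factor to two species.
base_run = {"Cu": 15.0, "Fe": 25.0}
bootstrap_runs = {
    "Cu": [10, 12, 14, 16, 18, 20, 22],
    "Fe": [10, 12, 14, 16, 18, 20, 22],
}
print(flag_outside_iqr(base_run, bootstrap_runs))
```

A species flagged in this way would be one whose apportionment is more sensitive to the particular set of samples included in the run.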

Additionally, uncertainty existed in the human health risk appraisal. For instance, only twelve TMLs were considered in this research, and their total factors and potential health risks were computed as the sum of the possible risk of each metal(loid) (Augustsson et al., 2018). Besides, the bioavailability of TMLs was not accounted for, which might have caused an overestimation or underestimation in the probabilistic health risk model (Islam et al., 2020; Mehta et al., 2020). These drawbacks might have influenced the precision of the outcomes to some extent. Thus, the bioavailability of and relationships among the metal(loid)s should be incorporated into the PMF method in future studies. Generally, this research highlights that the largest source of pollution should not be confused with the leading source of human health risk in probabilistic health risk appraisal. Nevertheless, this is the first comprehensive and integrated effort in the country to help control the sources of soil pollution, target TMLs, and aid policy-makers in formulating policies to lessen coal field management costs, mitigate further soil pollution, and reduce the associated health risk. The control factors linked with specific human-induced inputs, including pH, traffic emission, gross domestic product, population density, industrial growth, and land use/land cover, deserve further investigation in the source apportionment of TMLs to help local authorities make precise decisions.

Conclusions

This research reports, for the first time, the source characterization and source-oriented health risk evaluation of TMLs in topsoil affected by the coal basin in northwest Bangladesh, combining advanced techniques and models to analyze the TML content of the agricultural soils. The mean concentrations of the analyzed metal(loid)s were in the descending order of Mn > Zn > Cr > Ni > Cu > Pb > Co > As > Fe > Se > Cd > Hg. Five sources, namely agrochemical practice, industrial emission, coal-power-plant, geogenic source, and atmospheric deposition, were recognized and quantified using the PMF and APCS-MLR models, aided by co-occurrence networks, PCoA, and SOM analysis. Agrochemical practice was the critical pollution source for NCR, whereas atmospheric deposition was the largest pollution source for CR, for both children and adults. The model comparison and uncertainty assessment indicated that the PMF outcomes were more valid and suitable than those of the APCS-MLR model. In future studies, the bioavailability and exchange of toxic soil metal(loid)s should be coupled with the PMF method to obtain more precise source-oriented risk outcomes. The study outcomes give a novel insight into preventing and controlling soil TMLs in the coal mine basin. Also, in its first application in Bangladesh, the integrated approach proposed here provides a helpful tool for revising risk assessment information and enabling subsequent mitigation schemes. Hence, importance should be given to controlling and managing agrochemical practice and atmospheric deposition to safeguard the study basin's aquatic ecosystem and human health.