1 Introduction

Soil pollution by metals is now recognized as a substantial environmental concern, primarily because metals are non-biodegradable and persist in the environment (Gupta et al., 2021; Duffus, 2002; Pourret, 2018). They also act as contaminants in agricultural soil, where elevated levels can harm crop health and productivity (Pourret & Bollinger, 2018; Pourret & Hursthouse, 2019). Various anthropogenic activities (such as excessive fertilizer and pesticide use and improper manure and compost application) and natural sources (soil erosion, lithogenesis, weathering, and geological processes) contribute to metal pollution in agricultural soil (Bolan et al., 2013; Borgese et al., 2013; Cao et al., 2010). Research on metals in agricultural soil is therefore critically important: metals can enter the agricultural food chain, and their accumulation in soil can lead to groundwater pollution, impairing drinking water quality and, consequently, human health (Adimalla, 2018; Adimalla & Wang, 2018). In India, rice, a staple crop with high consumption rates, has a propensity to accumulate toxic metals such as cadmium (Cd), chromium (Cr), nickel (Ni), copper (Cu) and lead (Pb) (Mandal et al., 2021; Zhao & Wang, 2020). The state of Bihar, a prominent agricultural region in India, has been a focal point for studying metal pollution (Kumari et al., 2019). Owing to poor irrigation methods and industrial pollution, large areas of calcareous soil are affected by salinity and metals. These soils are calcareous by nature since their pH is higher than 8.5 and they contain more than 35% carbonate (Brady & Weil, 1999). The presence of calcium carbonate efficiently immobilizes metals by offering adsorption or nucleation surfaces and by stabilizing pH within the range where metal hydrolysis and precipitation occur (Fiorito et al., 2022). Thus, metal pollution from natural or anthropogenic activities in the vicinity of agricultural land raises greater concern (Zhao & Wang, 2020).

A crucial aspect is the prediction of the solubility of toxic elements in soil and their transfer to plants, which is instrumental in assessing the risk of metal pollution in the soil. Simple approaches like the free ion activity model (FIAM) have been successfully used by several authors to estimate the hazard quotient (HQ) and evaluate the potential uptake of metals by plants (Kumari et al., 2021). The content of metals in soil pore water plays a significant role in determining the risk by influencing the bioavailability of metals to crops (Golui et al., 2020). Given the continuous transfer of metals from soil to crops and their eventual ingestion by humans through food consumption, health risk analysis becomes an essential tool for assessing the level of interference. To evaluate health risks related to metal intake through dietary exposure, the severity-adjusted margin of exposure (SAMOE), commonly known as the 'Risk Thermometer,' and the SAMOE-target cancer risk (SAMOE-TCR) were employed (Chowdhury et al., 2020).

The health hazards associated with metals, owing to their elevated biotoxicity and enduring presence in the environment, have been assessed via multiple exposure routes, encompassing ingestion, inhalation, and dermal contact (Nebab et al., 2021). Numerous indices, including the pollution load index (PLI), contamination factor (CF), ecological risk factor (ERF), and geo-accumulation index (Igeo), have been developed to evaluate soil pollution caused by metals (Proshad et al., 2022). To effectively mitigate health risks associated with metals, it is crucial to pinpoint and quantify potential soil pollution sources (USEPA, 2021). The positive matrix factorization (PMF) model can assess the contributions of individual sources and allocate them to each element (Tian et al., 2018). Integrating PMF with the self-organizing map (SOM), which is proficient in element recognition and classification, strengthens the findings by attributing contributions to various sources. Multi-model approaches are employed to identify risk management options for metals and to assess whether the risk surpasses predefined threshold values (Tong et al., 2019). Such approaches aid in the identification of priority pollution sources and in their effective control and management, all aimed at safeguarding ecosystems and the well-being of living organisms.

To the best of our knowledge, no significant research has addressed the effects of metals in calcareous soil on the agricultural ecosystem or the associated human health risks in the surrounding area. Therefore, the main objectives of this study were: i) to determine the pollution levels of metals in the agricultural soil and their spatial distribution pattern in the study area using SOM and the PMF model; ii) to predict health risks from various aspects of metals, including their sources, patterns, content, and both carcinogenic and non-carcinogenic effects, using multi-model approaches; and iii) to evaluate metal transfer from soil to rice plant parts using the FIAM approach and to conduct a dietary exposure analysis using SAMOE. This study offers vital insights into enhancing soil environment policies and assessing the prevalence of anthropogenic health risks, both carcinogenic and non-carcinogenic.

2 Materials and Methods

2.1 Study Area

Bihar is one of the major agricultural states in India. Muzaffarpur, a district in northern Bihar known for its agricultural activity, covers 3132 sq. km and is located at 26.11°N, 85.39°E. A total of 92 soil samples (Fig. S1) were collected from two polluted (metal-rich) blocks of Muzaffarpur, namely Baruraj (zone 1) and Kanti (zone 2). Besides its agricultural importance, the land is also polluted with metals (Cd, Cr, Ni, Cu, Pb).

2.2 Collection, Preparation, and Characterization of Soil and Plant Samples

To assess the presence of metals in the agricultural soil within the calcareous soil zone, a total of 92 soil and rice plant samples were collected from two different polluted sites following a zig-zag pattern. These soil and rice plant samples were gathered, dried, and properly labelled in sterilized plastic bags prior to analysis. The soil pH and electrical conductivity (EC) were measured in a soil–water suspension at ratios of 1:2.5 and 1:5, respectively, employing a digital pH meter and an EC meter (Page et al., 1982). Additionally, the organic carbon (OC) content was determined through the Walkley and Black method (Walkley & Black, 1934). Available nitrogen, phosphorus and potassium contents were determined using the standard procedures outlined by Page et al. (1982).

2.3 Determination of Different Forms of the Metals in Soil

The soil samples were placed into a platinum crucible, followed by the addition of adequate amounts of concentrated HNO3, H2SO4, HClO4, and HF. The mixture was then heated on a hot plate until white fumes were observed. Subsequently, HClO4 and distilled water were added to the residue, followed by further heating. Finally, 3 M H2SO4 and distilled water were added and heated until white fumes appeared. After cooling, the solution was filtered through Whatman 42 filter paper and the volume was adjusted to 100 ml (Page et al., 1982). The total metals in the resulting solution were quantified using atomic absorption spectroscopy (AAS) (Model no. 816, Systronics, India). Metal fractions were extracted from the soil following the sequential procedure initially proposed by Tessier et al. (1979). The bioavailable form of metals was extracted using DTPA (diethylene triamine penta-acetic acid) solution, and the respective contents were measured via AAS (Lindsay & Norvell, 1978). Quality control was rigorously maintained by incorporating certified reference material SRM 2710 into the process, along with a blank extract to account for any potential contamination.

2.4 Contamination Factor (CF) and Pollution Load Index (PLI)

In this study, we used the contamination factor (CF) and pollution load index (PLI) to evaluate the level of metal pollution in the soil (Islam et al., 2017). CF and PLI are calculated as follows:

$$\text{CF}=\frac{{\text{C}}_{\text{i}}}{{\text{C}}_{\text{b}}}$$
(1)
$$\text{PLI}=\sqrt[\text{n}]{{\text{CF}}_1\times{\text{CF}}_2\times{\text{CF}}_3\times\cdots\times{\text{CF}}_\text{n}}$$
(2)

Here, \({\text{C}}_{\text{i}}\) is the content of an individual metal, \({\text{C}}_{\text{b}}\) is the background value for that metal, and n represents the number of metals.
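Equations (1) and (2) can be illustrated with a short Python sketch. The measured and background contents below are hypothetical placeholders chosen for easy arithmetic, not values from this study, and the function names are ours.

```python
import math


def contamination_factor(measured, background):
    """Eq. (1): CF = C_i / C_b for a single metal."""
    if background <= 0:
        raise ValueError("background content must be positive")
    return measured / background


def pollution_load_index(cfs):
    """Eq. (2): n-th root of the product of the n CF values."""
    cfs = list(cfs)
    if not cfs or any(cf <= 0 for cf in cfs):
        raise ValueError("CF values must be positive and non-empty")
    # Geometric mean computed through logarithms for numerical stability.
    return math.exp(sum(math.log(cf) for cf in cfs) / len(cfs))


# Hypothetical contents (mg/kg); NOT data from this study.
measured = {"Cr": 90.0, "Ni": 40.0, "Cd": 0.3}
background = {"Cr": 45.0, "Ni": 20.0, "Cd": 0.6}

cfs = {m: contamination_factor(measured[m], background[m]) for m in measured}
pli = pollution_load_index(cfs.values())
print(cfs, round(pli, 3))
```

Because PLI is a geometric mean, a single metal with a CF far below 1 can pull the index down even when other metals are strongly enriched, so CF and PLI are best read together.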

2.5 Geo-Accumulation Index (Igeo)

The geo-accumulation index was employed to determine the degree of metal pollution in the soil system using the following formula:

$${I}_{geo}={log}_{2}\left({C}_{n}/1.5{B}_{n}\right)$$
(3)

In this context, Cn signifies the measured content of a metal, Bn denotes the background content of the same metal, and 1.5 is the correction factor for the background matrix.
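As a minimal sketch of Eq. (3), the index can be computed as below. The input contents are hypothetical and serve only to show the arithmetic.

```python
import math


def geo_accumulation_index(measured, background, correction=1.5):
    """Eq. (3): Igeo = log2(C_n / (1.5 * B_n))."""
    if measured <= 0 or background <= 0:
        raise ValueError("contents must be positive")
    return math.log2(measured / (correction * background))


# Hypothetical example: a measured content three times the background
# gives log2(3 / 1.5) = log2(2).
igeo = geo_accumulation_index(measured=3.0, background=1.0)
print(igeo)
```

A measured content equal to 1.5 times the background gives Igeo = 0, so negative values indicate contents that sit within the natural variability allowed for by the correction factor.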

2.6 Ecological Risk Factor (ERF)

The ecological risk factor (ERF) was employed to assess variations in toxic elements using the following formula outlined by Yang et al. (2009):

$$Ecological\;risk\;factor\;\left(ERF\right)=\sum {T}_{i}\times {C}_{n}/{B}_{n}$$
(4)

where Cn signifies the measured content of a metal, Bn denotes the background content of the same metal, and Ti represents the toxic response factor for each specific metal; the values of Ti for Cr, Ni, Cd, Pb, and Cu were 2, 2, 30, 5, and 5 mg kg−1, respectively (Ferreira et al., 2022).
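The calculation can be sketched as follows, assuming the conventional orientation of the ratio (measured content over background, the same orientation as CF in Eq. (1)), so that each metal contributes its toxic response factor multiplied by its enrichment. The Ti values are those listed above; the contents are hypothetical.

```python
# Toxic response factors as listed in the text (Ferreira et al., 2022).
TOXIC_RESPONSE = {"Cr": 2, "Ni": 2, "Cd": 30, "Pb": 5, "Cu": 5}


def ecological_risk(measured, background):
    """Return per-metal risk terms T_i * C_n / B_n and their sum.

    Assumption: the ratio is measured over background.
    """
    per_metal = {
        metal: TOXIC_RESPONSE[metal] * c_n / background[metal]
        for metal, c_n in measured.items()
    }
    return per_metal, sum(per_metal.values())


# Hypothetical contents (mg/kg); NOT data from this study.
measured = {"Cr": 100.0, "Cd": 0.5}
background = {"Cr": 50.0, "Cd": 1.0}

per_metal, total = ecological_risk(measured, background)
print(per_metal, total)
```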

2.7 Health Risk

2.7.1 Hazard Quotient and Hazard Index

The hazard quotient (HQ) and hazard index (HI) are assessed by computing exposures through ingestion, inhalation, and dermal routes, employing established equations (Chabukdhara & Nema, 2013) as follows:

$$HQ={Intake}_{\left(inhalation\;or\;ingestion\;or\;dermal\right)}/Reference\;dose\;\left({R}_{f}D\right)$$
(5)
$$HI=\sum {HQ}_{exP}={HQ}_{inhalation}+{HQ}_{ingestion}+{HQ}_{dermal}$$
(6)

Intake was calculated using the formulas given in the supplementary table (Table S1) (De Miguel et al., 2007). The reference dose (RfD) is the safe limit (for Cr: 0.3, Ni: 0.02, Cd: 0.001, Pb: 0.0035, and Cu: 0.5) for human exposure to metals over a lifetime. The subscript "exP" refers to the three exposure pathways considered in the assessment.
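A minimal sketch of Eqs. (5) and (6) is given below. It takes the pathway intakes as inputs, because the intake formulas themselves are in Table S1 and are not reproduced here. The intake values are hypothetical, and applying the single RfD quoted above to all three routes is a simplifying assumption of this sketch.

```python
# Reference doses as quoted in the text.
RFD = {"Cr": 0.3, "Ni": 0.02, "Cd": 0.001, "Pb": 0.0035, "Cu": 0.5}


def hazard_quotient(intake, rfd):
    """Eq. (5): HQ = intake / RfD for one exposure pathway."""
    return intake / rfd


def hazard_index(intakes_by_pathway, rfd):
    """Eq. (6): HI = sum of HQ over ingestion, inhalation and dermal routes."""
    return sum(hazard_quotient(v, rfd) for v in intakes_by_pathway.values())


# Hypothetical Pb intakes in the same units as the RfD; NOT study data.
intakes_pb = {"ingestion": 7.0e-4, "inhalation": 3.5e-5, "dermal": 3.5e-6}

hq_pb = {k: hazard_quotient(v, RFD["Pb"]) for k, v in intakes_pb.items()}
hi_pb = hazard_index(intakes_pb, RFD["Pb"])
print(hq_pb, hi_pb)
```

In this hypothetical case ingestion dominates the index, and the total stays below the safe limit of 1.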

2.7.2 Cancer Risk Assessment

The assessment of cancer risk (CR) involves evaluating the likelihood of developing cancer following exposure to a carcinogen. On the other hand, the total carcinogenic risk (TCR) quantifies overall risk of developing cancer over the lifetime. Carcinogenic risk (CR) and total carcinogenic risk (TCR) can be determined using the following equations (Weissmannová & Pavlovský, 2017):

$$CR=\sum ADD\times SF$$
(7)
$$TCR=\sum CR$$
(8)

where ADD is the average daily dose (mg kg−1 body weight day−1) of a specific metal and SF is the corresponding cancer slope factor.
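Equations (7) and (8) can be sketched as below, reading the sum in Eq. (7) as running over exposure routes and the sum in Eq. (8) as running over metals. Both the doses and the slope factors are hypothetical placeholders, and a single slope factor per metal is assumed for simplicity.

```python
def cancer_risk(add_by_route, slope_factor):
    """Eq. (7): CR = sum over routes of ADD * SF."""
    return sum(add * slope_factor for add in add_by_route.values())


def total_cancer_risk(cr_by_metal):
    """Eq. (8): TCR = sum of CR over all metals."""
    return sum(cr_by_metal.values())


# Hypothetical average daily doses (mg/kg/day) and slope factors.
add = {
    "Cd": {"ingestion": 1.0e-6, "inhalation": 1.0e-7, "dermal": 1.0e-8},
    "Ni": {"ingestion": 2.0e-6},
}
slope_factor = {"Cd": 6.0, "Ni": 1.0}  # placeholder values only

cr = {metal: cancer_risk(add[metal], slope_factor[metal]) for metal in add}
tcr = total_cancer_risk(cr)
print(cr, tcr)
```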

2.8 Monte Carlo Simulation Model and Sobol Sensitivity Analysis

Monte Carlo simulation (MCS) was used for sensitivity and uncertainty analysis using Crystal Ball software. To pinpoint critical input parameters and gauge their influence on the variability of exposure outcomes, we employed Sobol sensitivity analysis (SSA), as introduced by Sobol (2001). Sobol sensitivity indices (SSIs) measure the relative contributions of individual variables to the overall variance in model results. The first order sensitivity index (FOSI) captures the contribution of a single variable to the variance, the second order sensitivity index (SOSI) characterizes the influence of interactions between variables on the variance, and the total order sensitivity index (TOSI) captures the combined influence of a variable on the final variance. This methodology facilitates the identification of input variables that have significant, noteworthy, or negligible influence in the research context (Mukherjee & Singh, 2022).
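The idea behind the first and total order indices can be illustrated with a small pure-Python sketch. It does not reproduce the Crystal Ball workflow used in this study: it uses plain random sampling with commonly used Saltelli-type (first order) and Jansen-type (total order) estimators, omits second order indices, and relies on a hypothetical exposure model with hypothetical input ranges.

```python
import random


def sample_inputs(rng):
    """Draw one set of exposure inputs from hypothetical uniform ranges."""
    return [
        rng.uniform(10.0, 50.0),   # metal content in soil, mg/kg
        rng.uniform(50.0, 200.0),  # soil ingestion rate, mg/day
        rng.uniform(50.0, 80.0),   # body weight, kg
    ]


def exposure_model(x):
    """Hypothetical daily dose (mg/kg/day) from soil ingestion."""
    content, ingestion_rate, body_weight = x
    return content * ingestion_rate * 1e-6 / body_weight


def sobol_indices(model, sampler, n_inputs, n_samples, seed=0):
    """Estimate first and total order Sobol indices by Monte Carlo."""
    rng = random.Random(seed)
    a = [sampler(rng) for _ in range(n_samples)]
    b = [sampler(rng) for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    mean = sum(f_a) / n_samples
    variance = sum((y - mean) ** 2 for y in f_a) / (n_samples - 1)
    first, total = [], []
    for i in range(n_inputs):
        # Matrix A with only column i replaced by the values from B.
        f_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        s1 = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab))
        st = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab))
        first.append(s1 / n_samples / variance)
        total.append(st / (2 * n_samples) / variance)
    return first, total


first_order, total_order = sobol_indices(exposure_model, sample_inputs, 3, 5000)
for name, s1, st in zip(["content", "ingestion", "weight"], first_order, total_order):
    print(f"{name}: first order = {s1:.2f}, total order = {st:.2f}")
```

A gap between the total and first order index of a variable signals that part of its influence comes through interactions with other inputs.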

2.9 Estimation of Total Metals from Rice Plant Parts

To determine the total metal content in the root, shoot, and grain of rice plants, samples were digested on a hot plate using a 4:1 (v/v) mixture of concentrated HNO3 and HClO4 until the color changed (Li et al., 2018). Metal content in the digested samples was estimated following the procedure outlined in Sect. 2.3.

2.9.1 Risk Assessment through FIAM Model

To predict the uptake of metals by rice grains in cultivated agricultural soil, an integrated solubility model called the free ion activity model (FIAM) was implemented using the following equation, which combines the transfer factor (TF) and the free ion activity (Mn+), as detailed by Golui et al. (2020):

$$p\left[{M}_{Plant}\right]=C+{\beta }_{1}\,p\left[{M}_{c}\right]+{\beta }_{2}\left[pH\right]$$
(9)

Here, C is derived from k1/nF-logTF, β1 is calculated as 1/nF, and β2 is determined as k2/nF. k1 and k2 are empirical constants that are specific to the respective metals. The coefficients C, β1, and β2 are specific to the metal(loid) and plant in question. The equation was parameterized through non-linear error minimization using the "SOLVER" function in Microsoft Excel 2019.
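How Eq. (9) is evaluated, and what an error-minimizing fit would minimize, can be sketched as follows. The sketch reads p[·] as a negative base-10 logarithm, in the manner of pH, which is an assumption on our part; the coefficients and observations are hypothetical, and the function names are ours. It does not reproduce the Excel SOLVER fit itself.

```python
def fiam_p_plant(c, beta1, beta2, p_mc, ph):
    """Eq. (9): p[M_plant] = C + beta1 * p[M_c] + beta2 * pH."""
    return c + beta1 * p_mc + beta2 * ph


def plant_content(p_plant):
    """Convert p[M_plant] back to a content, assuming p = -log10."""
    return 10.0 ** (-p_plant)


def sum_squared_error(params, observations):
    """Objective an error-minimizing fit would reduce.

    observations: iterable of (p_mc, ph, observed_p_plant) tuples.
    """
    c, beta1, beta2 = params
    return sum(
        (fiam_p_plant(c, beta1, beta2, p_mc, ph) - p_obs) ** 2
        for p_mc, ph, p_obs in observations
    )


# Hypothetical coefficients and soil conditions; NOT fitted values.
params = (1.0, 0.5, 0.1)
soils = [(4.0, 8.0), (5.0, 8.5), (4.5, 8.2)]
observations = [(p_mc, ph, fiam_p_plant(*params, p_mc, ph)) for p_mc, ph in soils]

print(fiam_p_plant(*params, 4.0, 8.0))
print(sum_squared_error(params, observations))        # zero at the true parameters
print(sum_squared_error((1.2, 0.5, 0.1), observations))  # larger when C is off
```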

To assess the potential health risks associated with the consumption of rice grains grown in metal-polluted soil, the FIAM-HQ (FIAM-hazard quotient) was implemented following the guidelines provided by the USEPA (2021) as outlined in IRIS (2020):

$$FIAM-HQ=ADD/{R}_{f}D$$
(10)

Here, ADD (average daily dose) is expressed in mg kg−1 body weight day−1 (Banerjee et al., 2022). The reference dose (RfD) is metal-specific, with values taken from the IRIS (2020) and WHO (1996) guidelines.

A hazard quotient (HQ) greater than 1 indicates that consumption of rice grain poses potential health risks (Kumar et al., 2021).

2.9.2 Evaluating Cancer Risk using the Severity-Adjusted Margin of Exposure (SAMOE)

In accordance with the guidance provided by the Swedish National Food Agency, the risk thermometer, an effective tool for risk characterization, was developed as detailed by Sengupta et al. (2021). Risk thermometers, based on the methodology outlined by Chowdhury et al. (2020), make it possible to anticipate metal exposure in food using the health-based tolerable daily intake (TDI) as the basis, while also factoring in the critical target cancer risk (TCR) calculated using the formula by Bhattacharyya et al. (2021) in dietary risk assessment.

2.9.3 Statistical and Geostatistical Analysis

Statistical analysis, including the creation of violin plots, correlation plots, and self-organizing maps (SOM), was conducted using RStudio. The spatial distribution plots of metals and the factor scores obtained from the PMF analysis were generated using the inverse distance weighted (IDW) interpolation method in ArcGIS software. Additionally, this study employed seven commonly used machine learning (ML) algorithms: linear regression, decision tree regressor, random forest regressor, k-nearest neighbors regressor, support vector regressor, extreme gradient boost regressor, and multivariate adaptive regression splines. These algorithms, along with performance metrics, were implemented using a Python distribution.
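The principle of IDW interpolation can be illustrated with the short sketch below. It is not the ArcGIS implementation: the distance exponent of 2 is an assumed, commonly used choice, all sample points are used rather than a search neighbourhood, and the coordinates and contents are hypothetical.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    points: iterable of (px, py, value) sample tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a sample point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample point is required")
    return numerator / denominator


# Hypothetical sample locations and metal contents (mg/kg).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(samples, 1.0, 0.0))  # equidistant from both samples
print(idw(samples, 0.0, 0.0))  # coincides with the first sample
```

Because the weights decay with distance, the interpolated surface always stays within the range of the sampled values, which is worth bearing in mind when reading hot spots on the resulting maps.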

3 Results and Discussion

3.1 Characterization of Soil Properties

The physicochemical properties of the polluted agricultural soil in the two zones (zone 1 and zone 2) are presented in Table 1. The pH was 8.74 ± 0.32 in zone 1 and 8.20 ± 0.39 in zone 2. Such high pH levels decrease the mobility of metals (Acosta et al., 2011). The two zones displayed statistically significant differences (p < 0.05) in electrical conductivity (EC) and organic carbon (OC), with zone 1 exhibiting higher EC than zone 2. Organic carbon plays a crucial role in enhancing soil structure, promoting drainage capacity, facilitating aeration, and reducing nutrient leaching (Zeng et al., 2011). Moreover, when metals bind to organic carbon, they become less accessible to plants (Dey et al., 2021). Levels of available macronutrients (nitrogen, phosphorus and potassium) were low in both zones, making them nutrient deficient (Kizildag et al., 2013; Wandruszka, 2006) (Table 1). The mean total content of the various metals in the two zones is presented in Table 1 and Fig. S2. Owing to the occurrence of toxic metals, elevated pH levels, and low organic carbon content in the agricultural soil, there is a significant risk of Cr and Ni pollution in the calcareous soil zone of India. This situation is a growing concern for the health of animals, plants, and humans alike (WHO, 1996) (Table S2).

Table 1 Mean values of soil samples physicochemical attributes, total and bioavailable form of metals

3.2 Spatial Distribution, Pattern and Source-Oriented Assessment of Metals

The distribution of the total content of metals across the two zones was mapped (Fig. S3) using the inverse distance weighting (IDW) interpolation method, and the resulting raster patterns were categorized for five metals (Cr, Cd, Ni, Pb, Cu). These categories reveal the spatial distribution of metals in relation to anthropogenic factors (fertilizer and pesticide use, rapid industrialization, etc.) and environmental factors (soil erosion, lithogenesis, weathering, and geological processes) that might influence it. For Cu, the map suggests that the north-eastern region of zone 1 has the highest pollution, whereas in zone 2 the northern and western parts are more affected. For Pb, the southern part of zone 1 and the northern and southern parts of zone 2 showed the highest content. Similarly, Cr pollution is high in the north-western part of zone 1, whereas the southern and eastern parts of zone 2 showed the highest Cr content. The spatial distribution map showed that Cd content is high throughout zone 1 and zone 2. The northern and south-western parts of zone 1 have the highest Ni content, while the central part of zone 2 showed the highest Ni content. Overall, our findings suggest that zone 1 is more polluted by metals than zone 2.

A self-organizing map (SOM) can be used to interpret large amounts of information, determine the zone-wise distribution pattern of metals, and classify the sources of pollution. In the SOM, colour-ranked plots were constructed for each variable in each hexagon to analyse the partial similarity between samples on the basis of the spacing of the hexagons. The unified distance matrix (U-matrix) combines the distances between each neuron's weight vector and those of its adjacent neurons. In component planes, comparable gradients signify a positive correlation, while opposing gradients indicate a negative correlation. Samples separated by shorter hexagonal distances share more similar characteristics. Figure 1a illustrates the component planes of each SOM unit, and Fig. 1b describes the unit variables zone-wise. Pb, Ni, and Cd contents are high in the upper left corner neurons, whereas Cu is high in the lower left and Cr is moderately high in the lower right corner neurons (Fig. 1a-e). In the zone-wise maps, neurons running from the lower left corner through the middle to the lower right corner show high metal content for zone 1, whereas for zone 2 neurons from the middle to the upper right corner show high metal content (Fig. 1b). Additionally, a zone-wise clustered U-matrix was prepared by the SOM algorithm (Fig. 1c).
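The U-matrix described above can be sketched for an already trained map as follows. For brevity the sketch assumes a rectangular grid with four neighbours per neuron, whereas the maps in this study use hexagonal units; the weight vectors are hypothetical and the training step is omitted.

```python
import math


def u_matrix(weights):
    """Mean distance from each neuron's weight vector to its grid neighbours.

    weights[r][c] is the weight vector of the neuron at row r, column c.
    Assumes a rectangular grid (a simplification of a hexagonal SOM).
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            distances = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distances.append(math.dist(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(distances) / len(distances) if distances else 0.0
    return result


# Hypothetical 2 x 2 map with two-dimensional weight vectors: the left
# column and the right column form two distinct clusters.
weights = [
    [[0.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [3.0, 4.0]],
]
print(u_matrix(weights))
```

High U-matrix values mark boundaries between clusters, while low values mark neurons that sit inside a group of similar samples.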

Fig. 1

(a) Distribution pattern of each metal in polluted soils through self-organizing map (SOM); (b) zone wise metal concentration distribution maps; (c) Clustering of U-matrix denotes two sampling zones

The PMF model was employed to identify metal sources in this area. The model was evaluated with up to six factors and, on the basis of the lowest 'Q' value, a four-factor solution was selected as the most stable (Chai et al., 2021). The strong correlation coefficients among metals support the reliability of this model. Figure 2a-c shows the contributions of metals to the PMF model along with the factor profiles and correlation matrix. The spatial distribution plots of metal content using GIS, based on factor scores, are depicted in Fig. 3a–d, illustrating the distribution of sources. In our study, the PMF model indicated that factor 1 contributed mainly to Ni (59.6%) and Cd (53.5%); factor 2 contributed mainly to Cr (62.2%); factor 3 mainly accounted for Cu (65.2%); and factor 4 accounted for Pb (46.7%) (Fig. 3a). Factor 1 explained 32.93% of the variation and was dominated by Ni, Cd and Pb. Numerous earlier studies have indicated that Ni present in soils might be influenced by the underlying geological material from which the soil originates, i.e., a natural source (Li et al., 2018). Moreover, it is widely held that metals associated with the composition of the parent soil material tend to be elements with minimal pollution impact (Fei et al., 2022). Consequently, factor 1 appears to be influenced primarily by natural sources rather than human activities, and it is reasonable to deduce that it represents the inherent geological composition of the soil.

Fig. 2

Source allocation of metals in polluted soils of the study location (a) the contribution percentage of each factor by PMF; (b) PMF model factor profiles of metals in polluted soils; (c) PMF model integrated with Pearson correlation analysis to detect the correlation between metals

Fig. 3

An illustration of the spatial distribution of factors (a–d) derived from the PMF model

Factor 2 accounted for 25.1% of the source contribution (Fig. 2a). The spatial distribution map indicated that the high-value areas of Cr were located in the northern and southern parts of the sampling area, where agricultural land was polluted with metals (Fig. 3b). The high Cr content is mainly due to the vigorous application of fertilizers, herbicides, insecticides and pesticides (Chai et al., 2021; Men et al., 2019). Phosphate fertilizers are commonly used in local agriculture, and their long-term use has led to increased accumulation of Cr. Thus, factor 2 can be attributed to agricultural sources.

Further, factor 3 accounted for 65.2% of Cu, followed by Cd (23.7%), and for 24.56% of the total source contribution (Fig. 2a). In addition, the high-value areas of Cu and Cd were located in the northern part of the sampling area (Fig. 3c), where the effect of intense industrial activity dominates. Typically, Cu and Cd exhibit strong connections with anthropogenic activities, including sectors such as petrochemicals, cement manufacturing, and various other industrial processes (Jin et al., 2019; Xiao et al., 2019). These industrial operations have the potential to discharge substantial quantities of waste residues, wastewater, and gases, which subsequently introduce Cu, Cd and Ni into the soil through direct or indirect means (Cai et al., 2019). Given this analysis, factor 3 may be interpreted as indicative of industrial origins.

The fourth and final factor accounted for 17.46% of the overall influence, contributing to 46.7% of the lead (Pb) content, trailed by copper (Cu) at 16.2%, and chromium (Cr) at 13.3% (Fig. 2a). Numerous research studies have consistently indicated that lead (Pb) is often linked to transportation, with vehicle exhaust emissions being the primary pathway for Pb to infiltrate the soil (Adamiec et al., 2016; Men et al., 2018). When considering the spatial distribution plots of Pb, regions with heightened values tend to coincide with major highways such as national and provincial routes within the study area. Consequently, factor 4 can be reasonably attributed to sources related to traffic emission. This outcome demonstrates the capability of the PMF model to assess the specific origins of metals within cultivated soil in a quantitative manner, yielding results that are notably more logical. For zone 1 and zone 2, the prevailing concerns have revolved around the pivotal roles played by natural and industrial pollution as the primary sources of pollution by metals, exerting a substantial influence on potential health risks for humans.
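The 'Q' value used to choose the number of factors can be made concrete with a small sketch. In the standard PMF formulation, the data matrix X (samples by metals) is approximated by the product of a contribution matrix G and a profile matrix F, and Q is the sum of squared residuals scaled by the measurement uncertainties. The matrices below are hypothetical, the factorization step itself is omitted, and the percentage helper is one simple way of expressing factor shares rather than the exact procedure of the PMF software.

```python
def pmf_q(x, g, f, u):
    """Q = sum over samples i and metals j of ((x_ij - sum_k g_ik f_kj) / u_ij)^2."""
    q = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            modelled = sum(g[i][k] * f[k][j] for k in range(len(f)))
            q += ((x_ij - modelled) / u[i][j]) ** 2
    return q


def factor_shares(g, f, metal):
    """Percentage of the modelled content of one metal assigned to each factor."""
    totals = [
        sum(g[i][k] for i in range(len(g))) * f[k][metal]
        for k in range(len(f))
    ]
    grand_total = sum(totals)
    return [100.0 * t / grand_total for t in totals]


# Hypothetical two-sample, two-metal, two-factor example.
g = [[1.0, 0.0], [0.0, 1.0]]   # factor contributions per sample
f = [[1.0, 2.0], [3.0, 4.0]]   # factor profiles per metal
x = [[1.0, 2.0], [3.0, 5.0]]   # observed contents (one entry off by 1)
u = [[0.5, 0.5], [0.5, 0.5]]   # measurement uncertainties

print(pmf_q(x, g, f, u))
print(factor_shares(g, f, metal=0))
```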

3.3 Evaluation of Different Indices for the Determination of Soil Pollution

In response to elevated levels of metals in the agricultural soils, the study focuses its efforts on assessing a range of indices and the potential risks to human health. Notably, the pollution load index (PLI) serves as a valuable tool for evaluating the extent of metal pollution in the soil. Table S3 represents the pollution load index (PLI), with zone 1 showing higher PLI values (1.28) compared to zone 2 (1.16), indicating significantly higher pollution levels in both zones (above the permissible limit PLI > 1) (Tomlinson et al., 1980). The predominant soil pollution in the sampling zones is from Cr, followed by Ni, Pb, Cu, and Cd.

Furthermore, the contamination factor (CF) is used to quantify the pollution level in each zone. Table S3 gives the average CF values for Cr, Ni, Pb, Cu, and Cd, which were 5.76, 4.58, 2.79, 0.97, and 0.29 in zone 1 and 5.43, 4.40, 2.66, 0.67, and 0.27 in zone 2, respectively. The high CF values of Cr, Ni, and Pb in both zones therefore indicate a high degree of pollution. Statistical analysis indicates significant variations (p < 0.05) between the two zones due to the presence of different metals in the soil, with zone 1 showing higher content than zone 2. The decreasing order of CF is Cr > Ni > Pb > Cu > Cd.

Table S3 displays the ecological risk factor (ERF) for all metals in the soil, with ERF values for Cr, Ni, Pb, Cu, and Cd of 28.82, 22.90, 13.97, 4.89, and 1.47 for zone 1, and 27.18, 22.04, 13.33, 3.35, and 1.35 for zone 2, respectively. The average ERF of Cr and Ni indicates low to moderate ecological risk. The total ERF for zone 1 (72.06) is higher than that for zone 2 (67.27). Overall, the calculated ERF values suggest a moderate ecological risk from Cr, Ni, and Pb pollution in the soil, posing an increased threat to the surrounding ecosystem of the agricultural land.

The geo-accumulation index (Igeo) is a valuable indicator for evaluating the potential accumulation pattern of metals in the ecosystem. In the study area, the Igeo values (Table S3) indicate varying levels of pollution, ranging from low to extreme (0 ≤ Igeo ≤ 5) (Alves et al., 2018), in both zones due to the presence of Cr, Ni, Pb, Cu, and Cd. The average Igeo for Cr and Ni indicates moderate to high pollution, Pb shows low to moderate pollution, and Cu and Cd show low pollution in the soils. The findings are consistent with previous research by Wang et al. (2022).

3.4 Assessment of Health Risk

3.4.1 Non-Carcinogenic Risk

The hazard quotient (HQ) and hazard index (HI) are used to determine the potential for non-carcinogenic health hazards, and values for both zones are provided in Table 2. Table 2 shows that adults and children exhibit similar trends in HQ and HI. The combined HQ values for metal exposure pathways in both subpopulations within this study area decrease in the following sequence: ingestion > inhalation > dermal contact. This suggests that ingestion is the primary pathway for metal exposure and has the greatest impact on human health, followed by inhalation, while skin contact has the least impact. These findings are consistent with a previous study (Jiang et al., 2021) that reported similar results. All HQ and HI values for the metals examined in the two zones were found to be below the safe limit (set at 1) for both adults and children.

Table 2 Carcinogenic and non-carcinogenic risk parameters (hazard quotients and hazard indexes) for adults and children

The ranking of HI values for zone 1 and zone 2 was found to be Pb > Cu > Ni > Cr > Cd for both adults and children. However, the total HI values for both groups were significantly lower than the safe limit of 1. Therefore, we can dismiss the potential non-carcinogenic health risks for these groups. Children exhibited a higher health risk as compared to adults due to their habitual hand-to-mouth behavior, higher food and beverage consumption, and greater inhalation of air relative to their body weight. Additionally, children's bodies are still developing, and they have limited abilities to metabolize, detoxify, and eliminate toxins compared to adults. This finding has been supported by previous studies conducted by researchers (Jiang et al., 2021). However, it's important to note that the health risk associated with only five metals was assessed in this study, and there are many other metals known to be harmful to human health. Therefore, further research is recommended.

3.4.2 Carcinogenic Risk

In this study, we estimated cancer risks associated with Pb, Cr, Cd, and Ni in agricultural soil for two zones, taking into account ingestion, inhalation, and skin contact pathways for both children and adults (Table S4). The findings of this study suggest that there are no significant carcinogenic health hazards for adults and children through ingestion, skin contact, and inhalation exposure routes in both zones. Nevertheless, cadmium (in zone 1) and nickel (in zone 2) present a relatively higher potential carcinogenic health risk compared to other metals for both adults and children. The TCR values of adults and children for zone 1 and zone 2 were 6.53E-05, 6.2E-05, and 7.63E-05, 7.27E-05 respectively, indicating the overall cancer risk (Table S4). Children, due to their physiological characteristics such as higher respiration rates per unit body weight and behavior, are more susceptible to carcinogenic risks from metal exposure in soil compared to adults. This result is consistent with previous studies (Wang et al., 2021). Hence, it is imperative to enact essential measures to alleviate the cancer risk linked to metal exposure.

3.5 Predictable Health Risk Assessment through Content of Metals

Moreover, the Monte Carlo simulation (MCS) method was employed to enhance the accuracy of the health risk assessment, avoiding potential errors stemming from overestimation or underestimation of deterministic parameters (Fig. 4). In agricultural soil areas, the carcinogenic and non-carcinogenic effects of metals were assessed through three different pathways (ingestion, inhalation, and dermal exposure) for both adults and children (Tables S3 and S4). Non-carcinogenic risks were found to be negligible for both adults and children, registering values below 1. The average hazard index (HI) values for adults and children were 4.70E-02 and 4.31E-01 for zone 1, and 4.41E-02 and 4.03E-01 for zone 2, respectively (Table 2). These HI values remained below the guideline limits set by USEPA (2021). However, the total carcinogenic risk (TCR) values for both adults and children surpassed the acceptable threshold of \(10^{-6}\), signifying a moderate to high potential for cancer risk. Notably, soil ingestion was identified as the primary pathway for carcinogenic risk in both adults and children, with children displaying heightened vulnerability. The predominant metals responsible for carcinogenic risk (CR) in both age groups were Cd and Ni (Wang et al., 2022). This underscores the necessity for heightened attention to metal exposure, particularly among children, in the agricultural land areas of the calcareous zone in India.

Fig. 4

Monte Carlo sensitivity analysis showed total carcinogenic risk (TCR) assessment in (a) adults and (b) children

3.6 Metal Content in Rice Plant Parts

The total content of metals in various parts of the rice plant, including roots, shoots, and grains, is depicted in Fig. S4 and Fig. S5 for zone 1 and zone 2, respectively. Zone 1 showed a higher metal content in the various parts of rice plants than zone 2. Furthermore, the findings indicate that the content of metals in all plant parts surpassed the acceptable threshold in both zones (WHO, 1996) (Table S2). A notable positive correlation was observed between the total, water-soluble, exchangeable, and oxide-bound metal fractions and the uptake of metals by rice plants (roots, shoots, and grains), as depicted in Fig. S6. Comparable findings were reported in prior studies by Nawab et al. (2021) and Khan et al. (2018). The findings indicated that zone 1 is more polluted than zone 2, making agricultural land in zone 1 more affected and less suitable for crop cultivation, particularly rice.

3.7 Assessing Risk through FIAM-HQ

The uptake of metals by rice grains in the calcareous soil zone of India was estimated using the solubility-free ion activity model (FIAM). This involved crop-specific model parameters (C, β1, and β2) and the predictive coefficient of the FIAM, as shown in Fig. S7 and Table S5. The results demonstrated that the variability of metals in rice grains could be effectively explained by the FIAM on the basis of soil pH, OC, and DTPA-extractable metals. This finding aligns with previous research by Meena et al. (2016), which emphasized the significance of soil pH and OC in governing metal solubility in polluted areas. The values of β1 and β2 were positive for Cr and Cd but negative for Ni (β2), Pb (β1, β2), and Cu (β1) (Table S5). Where these parameters are positive, they indicate that metal transfer from soil to rice grains is adversely affected by increasing pH or decreasing OC. The potential human health risk linked to metal uptake by rice grains was also assessed using the hazard quotient (HQ) derived from the FIAM. Given that rice is a staple food in the Indian diet, the safe threshold for FIAM-HQ in staple foods (rice) was set at 0.5. The average FIAM-HQ values for Cr, Ni, Pb, Cu, and Cd were all below this threshold (FIAM-HQ < 0.5) (Table S5; Raj et al., 2022). On this basis, rice grown in the agricultural soils of the studied calcareous zone can be considered acceptable for human consumption.
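As a rough illustration of how a FIAM prediction feeds into a hazard quotient, the sketch below assumes a log-linear model in soil pH and DTPA-extractable metal per unit of organic carbon. This functional form, the coefficient values, and the intake, reference dose, and body-weight figures are all assumptions made for the example; the fitted parameters of this study are those in Table S5 and are not reproduced here. Only the 0.5 threshold comes from the text.

```python
import math

SAFE_FIAM_HQ = 0.5  # safe threshold for a staple food, as stated in the text


def predict_grain_metal(c, beta1, beta2, ph, oc_g_per_kg, dtpa_mg_per_kg):
    """Predicted grain metal (mg/kg) from an ASSUMED log-linear FIAM form:

    log10(grain) = C + beta1 * pH + beta2 * log10(DTPA metal / organic carbon)
    """
    labile_per_oc = dtpa_mg_per_kg / oc_g_per_kg
    log_grain = c + beta1 * ph + beta2 * math.log10(labile_per_oc)
    return 10.0 ** log_grain


def hazard_quotient(grain_mg_per_kg, intake_kg_per_day, rfd_mg_per_kg_day, body_weight_kg):
    """HQ = average daily dose from rice / oral reference dose."""
    daily_dose = grain_mg_per_kg * intake_kg_per_day / body_weight_kg
    return daily_dose / rfd_mg_per_kg_day


def is_within_safe_limit(hq):
    """True when the FIAM-derived HQ is below the staple-food threshold."""
    return hq < SAFE_FIAM_HQ


if __name__ == "__main__":
    # Hypothetical coefficients and soil properties, for illustration only.
    grain = predict_grain_metal(
        c=-1.0, beta1=-0.1, beta2=0.5, ph=8.5, oc_g_per_kg=5.0, dtpa_mg_per_kg=0.5
    )
    hq = hazard_quotient(
        grain, intake_kg_per_day=0.3, rfd_mg_per_kg_day=0.001, body_weight_kg=70.0
    )
    print(f"grain = {grain:.4f} mg/kg, FIAM-HQ = {hq:.3f}, safe = {is_within_safe_limit(hq)}")
```

Because the prediction depends only on routinely measured soil properties, a calculation of this kind can be repeated for any sampled field once the crop-specific coefficients are known.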

3.8 Assessment of Risk Thermometer and Dietary Exposure

To assess potential human health risks related to food consumption, a 'Risk Thermometer' was applied. Figure 5 shows the 'Risk Thermometer' for metals in rice grains, together with the SAMOE values and concern levels. The scale quantifies the toxicity of metals with respect to the intake of rice grains and reveals that Cr poses an elevated health risk to humans (CrSAMOE = 0.006; Class 5). Ni content in rice grains presents a moderate to high health risk (NiSAMOE = 0.03), while Cd and Pb pose low health risks (CdSAMOE = 0.82; PbSAMOE = 0.24) (Table S5). In contrast, Cu shows no health risk (CuSAMOE = 45.09). This indicates that the consumption of rice grains may not be safe for humans because of the elevated levels of Cr and Ni in both the grain and soil system (Kumar & Maiti, 2015). Additionally, the SAMOE-target cancer risk (SAMOE-TCR) associated with the ingestion of metal-rich rice grains is presented in Table S5. SAMOE-TCR values for Cr (5.90E-03) and Ni (7.58E-03) exceeded the tolerable limit of 10⁻⁴, whereas those for Cd (1.16E-05) and Pb (1.15E-05) remained below it (Li et al., 2020). The cancer risk associated with rice grains from the calcareous soil zone therefore falls within the moderate to high range, corroborating previous research findings (Kormoker et al., 2022).

Fig. 5 Risk thermometer scale shows the risk of metals (Cr, Ni, Cd, Pb, and Cu) through consumption of rice grown on contaminated agricultural soil
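The mapping from a SAMOE value to a concern level can be sketched as a simple lookup. The band edges below are inferred from the concern levels quoted in the text for the five metals; the 1 to 10 band, its label, and all class numbers other than Class 5 are assumptions, and `classify_samoe` is a hypothetical helper name.

```python
# Upper band edges (exclusive), class label, concern level.
# Edges are inferred from the reported values; the 1-10 band is an assumption.
RISK_BANDS = [
    (0.01, "Class 5", "elevated"),
    (0.1, "Class 4", "moderate to high"),
    (1.0, "Class 3", "low"),
    (10.0, "Class 2", "no to low"),
]

# SAMOE values for rice grain as reported in the text (Table S5).
REPORTED_SAMOE = {"Cr": 0.006, "Ni": 0.03, "Cd": 0.82, "Pb": 0.24, "Cu": 45.09}


def classify_samoe(samoe):
    """Return (risk class, concern level); a lower SAMOE means a higher concern."""
    for upper, risk_class, concern in RISK_BANDS:
        if samoe < upper:
            return risk_class, concern
    return "Class 1", "none"


if __name__ == "__main__":
    # List metals from highest to lowest concern.
    for metal, value in sorted(REPORTED_SAMOE.items(), key=lambda item: item[1]):
        risk_class, concern = classify_samoe(value)
        print(f"{metal}: SAMOE = {value:g} -> {risk_class} ({concern})")
```

Sorting by SAMOE places Cr and Ni at the top of the list, which matches the ranking of concern described for Fig. 5.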

3.9 Sensitivity Analysis in Rice Plant Parts Employing Sobol Model

Sobol sensitivity analysis (SSA) was employed to analyse the influence of the different metals in soil with a view to reducing risk (Fig. 6). Sensitivity analysis allows risk assessors to identify the critical and influential parameters that contribute to the overall risk. The total-order sensitivity index (TOSI) values of each metal were considered as input variables in the model. SSA revealed that the TOSI values of Cr, Cu, Ni, and Cd (but not Pb) were higher in roots and shoots than in grains. In contrast, the TOSI of Pb was higher in grain than in root and shoot (Fig. 6e). This indicates that roots are more sensitive to the accumulation of metals. The pairwise sensitivity of the input variables was also investigated by assessing the first-order sensitivity index (FOSI) together with the second-order sensitivity index (SOSI) for the root-shoot, shoot-grain, and root-grain interactions. The results suggest that these interactions are important for tracking the transport of metals from root to shoot to grain and, in turn, the associated sensitivity. All input pairs (root-shoot, root-grain, shoot-grain) were highly sensitive, and the SOSI values indicated a significant interaction between root and shoot for all of the above metals. These findings suggest that the root and shoot components are highly sensitive to metals, indicating a significant level of soil pollution in this area after the decomposition of root and shoot material. Thus, necessary action must be taken to reduce the pollution risk.

Fig. 6 Sobol sensitivity analysis based on the total metal(s) in soil, considering the first-order effect, total effect, and pair-wise interactions
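To make the distinction between first-order and total-order indices concrete, the following sketch estimates both by Monte Carlo sampling, using the Saltelli (2010) estimator for the first-order index and the Jansen estimator for the total-order index. It assumes independent inputs scaled to the unit interval; the function `toy_uptake` is a hypothetical stand-in with a deliberate root-shoot interaction and is not the model used in this study. Pairwise (second-order) indices are not computed here; the gap between the total-order and first-order index of an input is the share of variance it contributes through interactions.

```python
import random


def sobol_indices(model, n_inputs, n_samples=20_000, seed=0):
    """Estimate first-order and total-order Sobol indices for inputs ~ U(0, 1)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]

    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]

    # Centre the outputs; this lowers the variance of the first-order estimator.
    mean_y = sum(f_a + f_b) / (2 * n_samples)
    f_a = [y - mean_y for y in f_a]
    f_b = [y - mean_y for y in f_b]
    var_y = sum(y * y for y in f_a + f_b) / (2 * n_samples)

    first, total = [], []
    for i in range(n_inputs):
        # Evaluate the model on sample A with column i replaced by column i of B.
        f_ab = [
            model(row_a[:i] + [row_b[i]] + row_a[i + 1:]) - mean_y
            for row_a, row_b in zip(a, b)
        ]
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / n_samples
        vt_i = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)) / (2 * n_samples)
        first.append(v_i / var_y)
        total.append(vt_i / var_y)
    return first, total


def toy_uptake(x):
    """Hypothetical response with a root-shoot interaction and a weak grain term."""
    root, shoot, grain = x
    return root * shoot + 0.1 * grain


if __name__ == "__main__":
    first, total = sobol_indices(toy_uptake, n_inputs=3)
    for name, s1, st in zip(("root", "shoot", "grain"), first, total):
        print(f"{name}: first-order = {s1:.3f}, total-order = {st:.3f}")
```

For a purely multiplicative response of two uniform inputs, the first-order index of each input is 3/7 and the total-order index is 4/7, which provides a convenient analytical check on the estimator.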

3.9.1 Performance of Model

The Taylor diagram is a visual representation used to assess the performance of the seven regression models applied to predict metal content with machine learning techniques (Fig. S8). The analysis identifies the most suitable regression model by comparing its predictions with observed values, relying primarily on two metrics: the correlation coefficient (r) and the root mean square error (RMSE). The ideal model exhibits a high r value and the lowest RMSE. Considering each metal individually across the two zones, the highest r values were obtained with LR for Cr (0.53), DT for Cd (0.52), MARS for Ni (0.63), RF for Cu (0.60), and LR for Pb (0.69). The average RMSE values of these best-fitted models were 5.11 (LR), 0.86 (DT), 4.65 (MARS), 1.78 (RF), and 5.48 (LR) for Cr, Cd, Ni, Cu, and Pb, respectively. In contrast, XGB, SVR, and KNN performed worse than the other models. The Taylor diagram provides a comprehensive assessment of the reliability and validation of the best-fit model, emphasizing the importance of a high r value and a low RMSE, ideally lying close to the observed reference line. This suggests that metal content fluctuates across the two zones.
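The two metrics used to compare the models can be computed directly from paired observed and predicted values. The sketch below is a minimal illustration with made-up numbers; the observed values, the predictions, and the helper `rank_models` are hypothetical and do not reproduce the fitted models of this study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_models(observed, predictions):
    """Rank models by descending r, breaking ties by ascending RMSE."""
    scores = {
        name: (pearson_r(observed, pred), rmse(observed, pred))
        for name, pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1][0], item[1][1]))


if __name__ == "__main__":
    # Hypothetical metal contents (mg/kg) and predictions from two models.
    observed = [10.0, 12.0, 15.0, 11.0, 14.0]
    predictions = {
        "LR": [11.0, 12.0, 14.0, 11.5, 14.5],
        "KNN": [12.0, 11.0, 13.0, 13.0, 12.0],
    }
    for name, (r_value, error) in rank_models(observed, predictions):
        print(f"{name}: r = {r_value:.2f}, RMSE = {error:.2f}")
```

A Taylor diagram additionally plots the standard deviation of each model's predictions, so that r, RMSE, and spread can be read from a single point per model.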

4 Conclusions

This study aimed to assess the sources, accumulation, and human health risks associated with metals in the soil of India's calcareous soil zone. The results revealed that the content of metals, especially Cr and Ni, significantly exceeded WHO limits in soil and rice plants (roots, shoots, and grains) in both zone 1 and zone 2. The spatial distribution pattern obtained with the PMF model revealed that natural sources (32.92%) and agricultural sources (65.2%) are the main contributors to pollution in the study area. Various indices (PI, CF, Igeo, ERF) consistently showed higher pollution in zone 1 than in zone 2. Furthermore, exposure to toxic metals, mostly through ingestion, posed substantial health hazards, especially for children. The combination of MCS and SSA indicated acceptable non-carcinogenic risks but high carcinogenic risks. SOM maps facilitated the classification of pollution levels, distinguishing highly polluted from less polluted areas. Similarly, the assessment using FIAM-HQ and SAMOE-TCR emphasized the potential human health risk, particularly cancer, due to elevated Cr and Ni levels in rice grains. Based on these findings, it is recommended that relevant authorities continue monitoring polluted areas, implement effective strategies, and raise awareness in the community about the risks associated with toxic metal pollution in the food chain.