Introduction

India is the second-largest producer of coal in the world and possesses substantial coal reserves. It produced 893.13 million metric tons of coal during 2022–2023, reinforcing its significant role in global coal production. Coal plays a pivotal role in India's energy landscape, accounting for 55% of the country's energy needs, with the power sector receiving 89% of the extracted coal, followed by steel, cement, sponge iron, and other industries (Ministry of Coal, 2023). Despite the advantages of coal production, raw coal, which contains contaminants such as various metals and sulphur, presents significant environmental challenges (Gopinathan et al., 2022a, 2022b, 2022c; Kumar et al., 2023; Nath et al., 2023). Sulfide-bearing rocks oxidize upon contact with atmospheric oxygen and water, generating acidic liquid waste known as Acid Mine Drainage (AMD). AMD primarily contains sulfurous/sulfuric acid and elevated concentrations of metals, which makes it hazardous (Gopinathan et al., 2022a, 2022b, 2022c). The generation and release of AMD can have profound repercussions on habitation patterns, soil quality, water quality, and topography (Avagyan, 2017; Gopinathan et al., 2022a, 2022b, 2022c; Varol & Tokatlı, 2022). Soil, with its high metal retention capacity, serves as a major repository for toxic substances. However, the leaching of these metals into the environment is a critical factor in contamination. Elevated levels of toxic elements (TEs) in the soil can harm soil ecosystems, reduce microbial diversity, damage plants, and hinder agricultural productivity (Saikia et al., 2014; Wang et al., 2013; Zhai et al., 2018). Additionally, the excessive accumulation and non-biodegradable nature of TEs such as Cr, Ni, Pb, Co, Cu, Cd, Zn, Fe, Mn, and As may lead to their localization in human tissues through the circulatory system. This presents significant health risks, including skin irritation, respiratory issues such as black lung, potential carcinogenic effects, neurological damage, gastrointestinal problems, and developmental issues (Habib et al., 2023; Jiwan & Ajay, 2011; Liu et al., 2013; Palma et al., 2015; Theophanides & Anastassopoulou, 2002; Xiao et al., 2020).

Several researchers have evaluated the extent of soil contamination in proximity to coal mining regions. The study conducted by Siddiqui et al. (2020) on the Jharia coalfield revealed that TEs such as Cd, Co, Cr, Cu, Ni, Pb, and Zn fell within the moderate risk category based on the potential ecological risk index (PERI). In contrast, Raj et al. (2017) demonstrated higher ecological risks in the same area, primarily attributed to elevated levels of Hg, Cd, and As. Manna and Maiti's (2018) exploration of the Raniganj coalfield demonstrated that TEs like Sr, Zr, Ca, Cu, Mn, Zn, and Ni exert a relatively low toxic impact on both the soil and biotic environments. The application of pollution indices, including Contamination Factor, Enrichment Factor, Geo-accumulation Index, and PERI, placed the Raniganj coalfield at moderate risk levels. Singh et al. (2023) reported Cr pollution levels ranging from moderate to severe at the Jagannath opencast coal mine. Coal mine soils have similarly been appraised through pollution indices by Khan et al. (2017), Maiti and Rana (2017), Masto et al. (2017), and Rana and Maiti (2018). Apart from pollution assessment, it is essential to explore and comprehend the future accumulation of TEs, which provides valuable insights for early warnings of soil quality and helps establish an environmental protection plan for soil (Xia et al., 2024). Researchers have identified a mass balance strategy for predicting future accumulations, demonstrating its reasonable performance in forecasting metal accumulation (Mi et al., 2023; Shi et al., 2019). Nevertheless, field research to obtain authentic input/output data involves high monitoring costs and consumes a significant amount of time (Salmanighabeshi et al., 2015). Empirical equations are very useful for estimating the future accumulation trends of TEs in soil, but the accuracy of the prediction depends on the assumptions made while formulating the equation and the assessment (Xia et al., 2024).

To date, many studies have reported on the concentration, spatial distribution, and source apportionment of targeted TEs in soil environments in proximity to coal mining areas, coal chemical plants, and thermal power plants (Xiao et al., 2020). Despite the efforts summarized above to determine the contamination status of different mines using pollution indices, a significant research gap remains in assessing the health risks associated with Indian coal mines. Such assessments are crucial for understanding the potential impacts of these mines on human health, soil quality, and aquatic ecosystems in their surrounding areas. Furthermore, predicting soil environmental quality is essential, as it aids in estimating contamination levels and potential health risks posed to humans. This information is crucial for implementing stringent laws and environmental protection plans before any actual danger occurs to the local community. However, most of these studies focused on either a single site or a limited number of locations. This is the first overview assessment of coal mining sites in India, covering all pollution indices, including a health risk assessment of 10 TEs in the soil environment, along with insights into future accumulation trends. To achieve this objective, the present study had the following scope: (i) collecting existing literature on metal concentrations in soils of various coal mines; (ii) assessing soil pollution using pollution indices and various factors; (iii) evaluating the health risks of TEs to humans through different exposure pathways; (iv) identifying the sources of TEs using multivariate statistical analysis; and (v) predicting soil environmental quality near mining regions.

Data collection and processing

Data on TEs in contaminated coal mine soils in India were searched and analysed by two researchers. The literature search was performed in different databases (Scopus, Web of Science, PubMed), covering January 2008 to December 2023. The search used keywords such as soil pollution, metals, coal mining, and India, and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, as presented in Fig. 1. The titles, abstracts, and full texts of the publications were independently examined by two researchers, with any disagreements resolved through discussion and mutual agreement. The following criteria were used to include articles in the present study: (1) Concentrations of TEs reported in field research related to coal mining, published in the English language. (2) Soil samples collected at a depth of 5–20 cm. (3) Methods of analysis with stringent quality assurance and quality control measures. (4) Articles containing descriptive statistics (mean, min, max, standard deviation). Finally, 17 of the 123 articles available in the databases met the above criteria (Fig. 1). Moreover, sample size plays a pivotal role in evaluating data quality within research papers, as it directly impacts the reliability and applicability of the conclusions drawn from the analyses. In the 17 selected peer-reviewed papers, sample sizes were reported as follows: 44, 58, 75, 32, 44, 18, 15, 25, 17, 83, 12, 128, 83, 30, 18, 20, and 10.

Fig. 1 PRISMA flowchart for inclusion and exclusion of articles

Study area

The present study concentrates on a geographical region spanning seven states, which collectively contribute approximately 80% of India's total coal production. These states are Assam, Chhattisgarh, Jharkhand, Madhya Pradesh, Odisha, Uttar Pradesh, and West Bengal. Figure 2 illustrates the distribution of coal mines across these states, with numbers representing the total number of papers considered for the study. In these regions, gneiss, sandstone, and shale with coal seams are the predominant rock types. Further details regarding the rock strata, temperature, annual rainfall, and population surrounding these mines are provided in Table 1. The mines are located at a slight distance from agricultural fields. Nevertheless, agricultural areas are indirectly affected by contaminated groundwater in these regions.

Fig. 2 Study area of various coal mines

Table 1 Geo-Spatial characterization of selected Indian coal mines

Assessment of soil pollution using pollution indices

Different methods were utilized to digest soil TEs, and the most commonly applied digestion procedure for soil samples involved a mixture of HNO3: H2SO4: HClO4 in a ratio of 5:1:1. Following the digestion of soil samples, the concentrations of the filtered sample were measured using an Atomic Absorption Spectrophotometer. Figure 3 explains the procedure adopted to assess the soil contamination of coal mining areas in India.

Fig. 3 Methodology adopted to assess soil contamination

Single pollution indices

(a) Geo-accumulation index

The geo-accumulation index (Igeo) assesses the contamination level of a single TE by comparing the measured concentration to the background concentration (Muller, 1969). In this study, the world background soil concentrations (mg/kg) used as reference concentrations are 47000, 571, 70.9, 67.8, 6.9, 28.2, 0.49, 28.4, 17.8, and 11.4 for Fe, Mn, Cr, Zn, Co, Cu, Cd, Pb, Ni, and As, respectively (Ahmadi Doabi et al., 2019; P. K. Sahoo et al., 2016; Weissmannová & Pavlovský, 2017). The Igeo value is calculated using Eq. (1).

$${\text{I}}_{{{\text{geo}}}} = {\text{log}}_{{2}} \left( {\frac{{{\text{CF}}}}{1.5}} \right);{\text{ CF}} = \frac{{{\text{C}}_{{\text{s}}} }}{{{\text{C}}_{{{\text{BG}}}} }}$$
(1)

where CF is the contamination factor; Cs and CBG are the sample concentration and world background soil concentrations for individual TEs. The correction factor of 1.5 is applied in this study to mitigate the effects of fluctuations in the background that may arise due to terrigenous influences. In this study, the classification of Igeo is structured into six distinct categories, each delineating varying degrees of soil contamination illustrated in Table 2.
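As an illustration of Eq. (1), the following minimal Python sketch computes CF and Igeo from the background concentrations listed above. The function names and the example concentration are assumptions made for illustration only; they are not part of the original study.

```python
import math

# World background soil concentrations (mg/kg), as listed in the text.
BACKGROUND = {
    "Fe": 47000, "Mn": 571, "Cr": 70.9, "Zn": 67.8, "Co": 6.9,
    "Cu": 28.2, "Cd": 0.49, "Pb": 28.4, "Ni": 17.8, "As": 11.4,
}


def contamination_factor(conc, element):
    """CF = Cs / CBG for one element (both in mg/kg)."""
    return conc / BACKGROUND[element]


def igeo(conc, element):
    """Igeo = log2(CF / 1.5), following Eq. (1)."""
    return math.log2(contamination_factor(conc, element) / 1.5)


# Hypothetical site value (not from the reviewed papers): 141.8 mg/kg Cr,
# i.e. twice the background concentration.
print(contamination_factor(141.8, "Cr"), igeo(141.8, "Cr"))
```

Note that an Igeo computed from an averaged concentration generally differs from the average of per-site Igeo values, because the logarithm is not linear.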

Table 2 Classification of single pollution indices

(b) Ecological risk index

The ecological risk index (Er) is a numerical representation that quantifies the potential risk associated with a specific contaminant. This index is determined through the application of Eq. (2) (Hakanson, 1980).

$${\text{E}}_{{\text{r}}} = {\text{ T}}_{{\text{f}}} \times {\text{ CF}}$$
(2)

where Tf is the toxic response factor for a given element. The Tf values for Fe, Mn, Cr, Zn, Co, Cu, Cd, Pb, Ni, and As are 1, 1, 2, 1, 5, 5, 30, 5, 5, and 10, respectively (Abrahim & Parker, 2008; Tisha et al., 2021; Zhang & Liu, 2014). The resulting Er values are categorized into the five pollution levels listed in Table 2 to facilitate interpretation.
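Equation (2) can be sketched in a few lines of Python using the Tf and background values quoted in the text. The function name is a hypothetical choice for illustration, and the Er class boundaries of Table 2 are deliberately not encoded here.

```python
# Toxic response factors (Tf) and world background concentrations (mg/kg),
# both as listed in the text.
TF = {"Fe": 1, "Mn": 1, "Cr": 2, "Zn": 1, "Co": 5,
      "Cu": 5, "Cd": 30, "Pb": 5, "Ni": 5, "As": 10}
BACKGROUND = {"Fe": 47000, "Mn": 571, "Cr": 70.9, "Zn": 67.8, "Co": 6.9,
              "Cu": 28.2, "Cd": 0.49, "Pb": 28.4, "Ni": 17.8, "As": 11.4}


def ecological_risk(conc, element):
    """Er = Tf x CF, with CF = conc / background (Eq. 2)."""
    cf = conc / BACKGROUND[element]
    return TF[element] * cf


# Example inputs: the mean Ni and Cd concentrations reported in this review.
for element, conc in (("Ni", 147.23), ("Cd", 1.23)):
    print(element, round(ecological_risk(conc, element), 2))
```

Because Er is linear in concentration, applying it to a mean concentration gives a value close to the average of per-site Er values when the same sites report that element.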

Integrated pollution indices

(a) Potential ecological risk index

The potential ecological risk index (PERI) is similar to the ecological risk index, wherein the degree of contamination is characterized by the summation of Er values for the analyzed TEs in the soil. This method allows for a comprehensive evaluation of the combined impact of multiple contaminants, providing a more holistic perspective on the potential environmental risk posed by the analyzed TEs in the soil (Hakanson, 1980).

$${\text{PERI}} = \sum\limits_{i = 1}^{n} {\text{E}}_{{\text{r}}}^{i}$$
(3)

The classification of PERI (Kamani et al., 2018) is categorized into five levels. A PERI value < 90 indicates a low risk, while a value in the range of 90 to 180 suggests a moderate risk. Within the interval of 180 to 360, the PERI denotes a high risk, emphasizing a more substantial potential impact. In the range of 360 to 720, the PERI signifies a very high risk. If the PERI ≥ 720, it is classified as extremely high, highlighting an elevated and critical level of ecological risk.
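The summation in Eq. (3) and the five PERI classes described above can be sketched as follows. The function names are hypothetical, and the treatment of the shared boundaries (180 and 360 assigned to the higher class) is an assumption, since the text does not specify which class they belong to.

```python
def peri(er_values):
    """PERI is the sum of the Er values of all analysed TEs (Eq. 3)."""
    return sum(er_values)


def classify_peri(value):
    """Map a PERI value to the five risk classes described in the text.

    Assumption: each class includes its lower bound.
    """
    if value < 90:
        return "low"
    if value < 180:
        return "moderate"
    if value < 360:
        return "high"
    if value < 720:
        return "very high"
    return "extremely high"


# Hypothetical Er values for one site (illustrative only).
site_er = [41.4, 75.1, 10.0, 5.5]
total = peri(site_er)
print(total, classify_peri(total))
```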

(b) Pollution load index

The pollution load index (PLI) serves as a comprehensive indicator of the overall contamination status of TEs at the sampled locations. It is calculated using Eq. (4) (Tomlinson et al., 1980).

$${\text{PLI}} = \sqrt[n]{{\text{CF}}_{1} \times {\text{CF}}_{2} \times {\text{CF}}_{3} \times {\text{CF}}_{4} \times \cdots \times {\text{CF}}_{n}}$$
(4)

where n represents the number of TEs examined. A PLI exceeding one indicates that the soil is contaminated with metals, whereas a PLI below one signifies that the examined area is not contaminated by metals.
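Equation (4) is the geometric mean of the contamination factors. A minimal Python sketch, with a hypothetical function name and illustrative inputs, is:

```python
import math


def pollution_load_index(cfs):
    """PLI = n-th root of the product of n contamination factors (Eq. 4)."""
    cfs = list(cfs)
    if not cfs:
        raise ValueError("at least one contamination factor is required")
    return math.prod(cfs) ** (1.0 / len(cfs))


# Hypothetical contamination factors for one site (illustrative only).
pli = pollution_load_index([3.2, 0.4, 1.5, 8.3])
print(pli, "contaminated" if pli > 1 else "not contaminated")
```

Because the geometric mean multiplies the factors, a single very low CF can pull the PLI below one even when other elements are enriched.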

Health risk assessment

Exposure assessment

The health risk calculation focuses solely on TEs in the soil. Factors such as diet, air pollution, water pollution, and genetics, which are typically important in risk assessment, are not accounted for in the present study's calculations. Exposure to TEs from the soil environment occurs through ingestion, dermal contact, and inhalation. These pathways of exposure to contaminated soil are commonly observed in both rural and urban areas (Baltas et al., 2020; Varol et al., 2020; Wang et al., 2017). The present study analyzed these exposure routes for children, adult males, and adult females. The Chronic Daily Intake (CDI) of TEs in the soil is calculated using Eqs. (5), (6), and (7).

$${\text{CDI}}_{{{\text{ing}}}} = \frac{{{\text{C}} \times {\text{ED}} \times {\text{EF}} \times {10}^{ - 6} \times {\text{IR}}_{{\text{ing }}} }}{{{\text{AT}} \times {\text{BW}}}}$$
(5)
$${\text{CDI}}_{{{\text{inh}}}} = \frac{{{\text{C}} \times {\text{ED}} \times {\text{EF}} \times {\text{IR}}_{{\text{inh }}} }}{{{\text{PEF}} \times {\text{AT}} \times {\text{BW}}}}$$
(6)
$${\text{CDI}}_{{{\text{derm}}}} = \frac{{{\text{C}} \times {\text{ED}} \times {\text{EF}} \times 10^{ - 6} \times {\text{AF}} \times {\text{ABS}} \times {\text{SA}}}}{{{\text{AT}} \times {\text{BW}}}}$$
(7)
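Equations (5) to (7) translate directly into code. In the sketch below the parameter names follow the symbols in the equations; their interpretation (for example ED as exposure duration and BW as body weight) follows common USEPA usage and is an assumption here, since the definitions are given in Table S1. All numeric inputs are hypothetical placeholders, not the reference values used in the study.

```python
def cdi_ingestion(c, ed, ef, ir_ing, at, bw):
    """Eq. (5): chronic daily intake via soil ingestion."""
    return (c * ed * ef * 1e-6 * ir_ing) / (at * bw)


def cdi_inhalation(c, ed, ef, ir_inh, pef, at, bw):
    """Eq. (6): chronic daily intake via inhalation of soil particles."""
    return (c * ed * ef * ir_inh) / (pef * at * bw)


def cdi_dermal(c, ed, ef, af, abs_factor, sa, at, bw):
    """Eq. (7): chronic daily intake via dermal contact."""
    return (c * ed * ef * 1e-6 * af * abs_factor * sa) / (at * bw)


# Hypothetical placeholder inputs (NOT the values from Table S1).
common = dict(c=226.14, ed=6, ef=350, at=6 * 365, bw=15)
print(cdi_ingestion(ir_ing=200, **common))
print(cdi_inhalation(ir_inh=7.6, pef=1.36e9, **common))
print(cdi_dermal(af=0.2, abs_factor=0.001, sa=2800, **common))
```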

Non-carcinogenic risk assessment

The Hazard Quotient (HQ) is used to estimate the non-carcinogenic risk associated with exposure to environmental pollution. The HQ for each exposure pathway, the HQ for each metal, and the Hazard Index (HI) for each site are calculated from the following equations (Chen et al., 2022).

$${\text{HQ }}_{{{\text{exposures}}}} { } = \frac{{{\text{CDI}}}}{{{\text{RfD}}}}$$
(8)
$${\text{HQ}}_{{{\text{for}}\;{\text{metal}}}} = {\text{HQ}}_{{{\text{ing}}}} + {\text{HQ}}_{{{\text{inh}}}} + {\text{HQ}}_{{{\text{derm}}}}$$
(9)
$${\text{HI}}_{{{\text{for}}\;{\text{site}}}} = \sum {\text{HQ }}\left( {{\text{for Cr}},{\text{ Mn}},{\text{ Ni}},{\text{ Pb}},{\text{ Cu}},{\text{ As}},{\text{ Fe}},{\text{ Zn}},{\text{ Cd}},{\text{ Co}}} \right)$$
(10)
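The aggregation in Eqs. (8) to (10) can be sketched as below. The nested-dictionary layout, function names, and numbers are illustrative assumptions; the actual reference doses (RfD) are those given in the supplementary tables.

```python
PATHWAYS = ("ing", "inh", "derm")


def hazard_quotient(cdi, rfd):
    """Eq. (8): HQ for one exposure pathway."""
    return cdi / rfd


def hq_for_metal(cdi_by_path, rfd_by_path):
    """Eq. (9): sum of the pathway HQs for one metal."""
    return sum(hazard_quotient(cdi_by_path[p], rfd_by_path[p]) for p in PATHWAYS)


def hazard_index(cdi, rfd):
    """Eq. (10): sum of the metal HQs for one site.

    cdi and rfd are dictionaries of the form {metal: {pathway: value}}.
    """
    return sum(hq_for_metal(cdi[m], rfd[m]) for m in cdi)


# Hypothetical values for two metals (illustrative only).
cdi = {"X": {"ing": 1.0, "inh": 2.0, "derm": 3.0},
       "Y": {"ing": 4.0, "inh": 0.0, "derm": 0.0}}
rfd = {m: {p: 2.0 for p in PATHWAYS} for m in cdi}
print(hazard_index(cdi, rfd))
```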

Carcinogenic risk assessment

Carcinogenic risk (CR) refers to the possibility of acquiring cancer as a result of human exposure (Baltas et al., 2020). The CR of Cr, Cu, Cd, and As was determined using Eqs. (11), (12), and (13); the CR for each metal is the sum of the cancer risk from the three exposure pathways.

$${\text{CR}}_{{{\text{exposures}}}} = {\text{CDI}} \times {\text{SF}}$$
(11)
$${\text{CR}}_{{{\text{for}}\;{\text{metal}}}} = {\text{CR}}_{{{\text{ing}}}} + {\text{CR}}_{{{\text{inh}}}} + {\text{CR}}_{{{\text{derm}}}}$$
(12)
$${\text{TCR}}_{{{\text{for}}\;{\text{site}}}} = {\text{CR}}_{{{\text{Cr}}}} + {\text{CR}}_{{{\text{Cu}}}} + {\text{CR}}_{{{\text{Cd}}}} + {\text{CR}}_{{{\text{As}}}}$$
(13)
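A minimal sketch of Eqs. (11) to (13) follows the same pattern as the hazard index. The function names, the data layout, and the numbers are hypothetical; the slope factors (SF) used in the study are those of the supplementary tables.

```python
PATHWAYS = ("ing", "inh", "derm")
CARCINOGENS = ("Cr", "Cu", "Cd", "As")  # metals included in Eq. (13)


def cancer_risk(cdi, sf):
    """Eq. (11): CR for one exposure pathway."""
    return cdi * sf


def cr_for_metal(cdi_by_path, sf_by_path):
    """Eq. (12): sum of the pathway CRs for one metal."""
    return sum(cancer_risk(cdi_by_path[p], sf_by_path[p]) for p in PATHWAYS)


def total_cancer_risk(cdi, sf):
    """Eq. (13): sum of the metal CRs over Cr, Cu, Cd, and As for one site."""
    return sum(cr_for_metal(cdi[m], sf[m]) for m in CARCINOGENS)


# Hypothetical inputs: the same CDI and SF for every metal and pathway.
cdi = {m: {p: 1e-5 for p in PATHWAYS} for m in CARCINOGENS}
sf = {m: {p: 0.5 for p in PATHWAYS} for m in CARCINOGENS}
print(total_cancer_risk(cdi, sf))
```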

Definitions and reference values for the health risk index parameters are given in Table S1 and Table S2.

Data analysis and source Identification

In the analysis, violin plots were utilized to examine the distribution of data, visualize relationships, handle large data sets, and identify outliers across various mines. These visualizations provide insights into key statistical metrics such as the mean, mode, and values within the interquartile range (25th to 75th percentile). The plots were generated, and the correlation between TEs in coal mines and their potential origins was analysed, using Origin Pro software (version 2023b, learning edition). The analytical approach consists of statistical techniques such as Pearson's correlation coefficients and Principal Component Analysis (PCA), which are commonly employed in environmental research (Meza-Figueroa et al., 2007; Tahri et al., 2005; Yongming et al., 2006). Pearson's correlation coefficients were utilized to measure the strength of relationships between different TEs, enabling an assessment of how closely these metals are associated with the coal mine environment. PCA is a well-established method for data reduction and extraction of Principal Components (PCs) to analyze correlations among observed variables. We utilized PCA with Varimax normalized rotation to enhance data interpretation. This rotation method maximizes the variance of the factor loadings, making it easier to interpret relationships between variables (Loska & Wiechuła, 2003).
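For readers who want to reproduce the correlation step outside Origin Pro, Pearson's coefficient can be computed with a few lines of standard-library Python. This is only a sketch of the correlation step (PCA and Varimax rotation are not shown); the function name and the example series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant series gives zero variance and raises ZeroDivisionError.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical concentrations (mg/kg) of two TEs across five sites.
ni = [35.0, 80.0, 120.0, 150.0, 210.0]
cr = [60.0, 110.0, 190.0, 230.0, 300.0]
print(pearson_r(ni, cr))
```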

Prediction of soil environmental quality for early warnings

Environmental pollution poses a critical challenge to sustainable economic growth, with the mining sector being a prominent industry in developing countries (Sumaira & Siddique, 2023). The expansion of industries, notably in countries like India and China, has become a significant contributor to pollution. In these regions, mining stands out as a primary sector that exerts substantial pressure on environmental resources, leading to soil, surface water, and groundwater contamination through rainfall and leaching processes associated with mining activities (Worlanyo & Jiangfeng, 2021). To address this situation, stringent laws and effective strategies are essential. Understanding the future accumulation of TEs in soil is crucial for developing appropriate measures. This prediction study assumes that metals have accumulated in the soil over the past years due to mining activities, rising from background concentrations to their current values at a constant rate. In the default scenario, pollution behavior continues with constant input/output fluxes while predicting future concentrations (Xia et al., 2024). Equation (14) is employed to predict the accumulation of metal concentrations in soil.

$${\text{C}}_{{\text{t}}} = \left\{ {\begin{array}{*{20}l} {{\text{C}}_{0} \times \left[ {\frac{{{\text{C}}_{0} }}{{{\text{C}}_{B} }}} \right]^{\frac{t}{50}} } \hfill & {{\text{C}}_{0} \ge {\text{C}}_{B} } \hfill \\ {{\text{C}}_{0} } \hfill & {{\text{C}}_{0} < {\text{C}}_{B} } \hfill \\ \end{array} } \right.$$
(14)

where Ct is the concentration of a TE (mg/kg) at time t (years), C0 is the present concentration of the metal (mg/kg), and CB is the background soil concentration (mg/kg).
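Equation (14) can be evaluated with the short Python sketch below. The function and parameter names are hypothetical; the default of 50 for `period` simply mirrors the constant in the exponent of Eq. (14), and the example inputs are the mean Cr concentration and world background value quoted earlier, used purely for illustration.

```python
def predict_concentration(c0, cb, t_years, period=50.0):
    """Predicted concentration Ct (mg/kg) after t_years, following Eq. (14).

    c0: present concentration, cb: background concentration (both mg/kg).
    period: the constant 50 in the exponent of Eq. (14).
    """
    if c0 < cb:
        # Below background: Eq. (14) keeps the concentration unchanged.
        return c0
    return c0 * (c0 / cb) ** (t_years / period)


# Illustration with the mean Cr value (226.14 mg/kg) and background (70.9 mg/kg).
for t in (0, 10, 25, 50):
    print(t, round(predict_concentration(226.14, 70.9, t), 2))
```

Under this form the enrichment ratio C0/CB is applied again over every 50-year span, so the projection grows geometrically rather than linearly.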

Results and discussion

Trace elements concentrations in coal mine soils

Table 3 presents the statistical analysis of the data distribution observed in the selected papers. The concentrations of these metals follow a discernible descending order, with Fe having the highest concentration, followed by Mn, Cr, Zn, Ni, Cu, Pb, Co, As, and Cd. The average concentrations (mg/kg) of TEs were 226.14 (Cr), 19267.96 (Fe), 846.93 (Mn), 147.23 (Ni), 100 (Cu), 212.06 (Zn), 1.23 (Cd), 80.53 (Pb), 27.42 (Co), and 6.59 (As). A study by Islam et al. (2023) on metal concentrations in coal mining regions reported average concentrations of Cr (49.80 mg/kg) and Pb (53.65 mg/kg) that were lower than the values observed in the present study. This suggests that Indian coal mining soil is highly contaminated with elevated levels of Cr and Pb. Moreover, the mean concentrations in coal mine soil for Cr, Mn, Ni, Cu, Zn, Pb, and Co exceeded the Indian natural background soil by 2, 4.05, 5.32, 1.77, 9.6, and 6.15 times, respectively (Kuhad, 1989; Srinivasa Gowd et al., 2010). Cr and Ni concentrations are higher than the soil guidelines of China, Canada, Poland, and Sweden (Bhagure & Mirgane, 2011; Kumar et al., 2019; Wcisło, 2012), Cu and Zn concentrations are higher than the Canadian soil guidelines (Kumar et al., 2019), and Co concentrations are higher than the Polish and Swedish soil guidelines (Bhagure & Mirgane, 2011; Wcisło, 2012). Excessive metal concentrations in soil can inhibit plant growth, pollute the food chain, and harm soil microorganisms and wildlife (Chibuike & Obiora, 2014; Vallverdú-Coll et al., 2016; Zhang et al., 2016).

Table 3 Descriptive statistical summary of the TEs in coal mining regions (n = 17, mg/kg)

Status of soil pollution with single indices

The geo-accumulation index is a widely used method for assessing the degree of contamination in the soil environment. Figure 4a presents violin plots illustrating the Igeo values for TEs at various coal mine sites in India. The average Igeo values are 0.25 for Cr, −2.78 for Fe, and −1.08 for Mn; Ni (1.33) is moderately contaminated; Cu (0.51), Zn (0.19), Cd (0.15), and Pb (1.11) fall in the uncontaminated to moderately polluted class; Co (−1.86) is reported as moderately polluted; and As (−2.8) is uncontaminated. The Igeo values for Cr vary from −2.48 to 2.96, indicating that 38.5% of the coal mine sites are categorized as unpolluted, 23.1% fall into the unpolluted to moderately polluted category, 23.1% are considered moderately polluted, and the remaining 15.4% are characterized by a moderate to heavy pollution level, as depicted in Fig. 5. Notably, all samples analyzed for Fe, Mn, and As have Igeo values consistently below zero, confirming the absence of contamination by these elements in the soil environment. The analysis reveals that 27.3%, 14.3%, and 7.7% of the coal mine samples are heavily polluted with Ni, Co, and Cu, respectively. Furthermore, the calculated values for Cu (15.4%), Pb (18.8%), and Zn (15.4%) indicate that these shares of coal mines fall within the category of moderate to heavy pollution. Hence, Co, Ni, and Cu are the main contributors to the presence of TEs in the coal mining region.

Fig. 4 Single pollution indices for various toxic elements present in the coal mine soils

Fig. 5 The proportions of geo-accumulation levels of trace elements

The results of the ecological risk assessment for different TEs across soil samples from the mine sites are presented in Fig. 4b. The average Er values for Ni and Cd are 41.36 and 75.06, respectively, indicating a moderate ecological risk. For the other TEs, the average Er values are below 40, signifying a negligible ecological risk. The Er ranges are: Cr (0.54 to 23.50), Fe (0.02 to 0.85), Mn (0.08 to 6.02), Ni (3.20 to 160.4; 9.09% of sites pose a high risk and 18.18% a moderate risk), Cu (2.01 to 61), Zn (0.29 to 6.57), Cd (3.67 to 160; 33% of sites indicate considerable risk and 25% are classified as high risk), Pb (2.01 to 54.75), Co (4.97 to 66.73), and As (2.77 to 13.51). The toxic response factors of Cd (30) and Ni (5) are higher than those of several other metals, which indicates that the high ecological risk is primarily attributable to Cd and Ni. Moreover, studies conducted by Chandra and Ghosh (2023) and Kumari et al. (2023) have also indicated that Cd is a primary contributor in coal mining regions.

Status of pollution with integrated indices

Figure 6 depicts the integrated indices assessment for various coal mine soils and the respective contributions from TEs. The PERI values span from 11.09 to 316.14 (Fig. 6a), reflecting substantial variability among different coal mining sites, with risks ranging from low to high. The average PERI value for all coal mine soils is 113.94, indicating a moderate risk. A closer examination of the calculated PERI values reveals that 47.05% of the coal mining sites fall into the low-risk category, 17.63% are classified as moderate risk, and 35.32% are associated with high risk, as depicted in Fig. 6a. The TEs contribution to PERI is in the order of Cd > Ni > Cu > Pb > Co > Cr > Zn > As > Mn > Fe, with Cd (37.3%) as the primary contributor, followed by Ni (22.6%), as shown in Fig. 6b.

Fig. 6 Integrated indices assessment: (a) PERI values, (b) TEs contribution to PERI, (c) PLI values for all coal mine sites

The Pollution Load Index (PLI) for different coal mine sites in India is depicted in Fig. 6c. The PLI values range from 0.31 to 9.68, indicating low to high contamination levels. Of the sites assessed, 70.5% are classified as contaminated (PLI > 1), and 29.4% show no contamination. The average PLI value is 2.45, indicating that the coal mine sites are, on average, contaminated. A study conducted by Kumari et al. (2023) also supports the moderate pollution levels observed in coal mine regions.

Stepwise regression analysis

Integrated indices were computed by combining individual pollution indices for various TEs, yet not all metals exerted a discernible influence on the overall outcome. To refine the assessment, insignificant TEs were eliminated from the calculation of the indices. Specifically, a stepwise regression analysis was conducted in Microsoft Excel (Office 2021) to derive a linear regression equation for PERI, retaining only TEs that are significant (p < 0.05). The initial regression step yielded a relationship with an R2 value of 0.966. However, the p value of the Mn coefficient (0.82) exceeded the 0.05 threshold, rendering the Mn concentration statistically insignificant for PERI computation with the existing data. Subsequent steps (Step 2 to Step 6) excluded Co, Fe, As, Cu, and Cr because their p values (0.70, 0.82, 0.40, 0.11, and 0.12, respectively) fell outside the significant range. In the final Step 7, the refined regression equation retained only Ni, Zn, Cd, and Pb as TEs with a statistically significant relationship (p < 0.05) to the PERI. The resulting regression equation, Eq. (15), encapsulates these influential TEs and serves as a more focused tool for calculating the PERI in the specific context of the existing data. Similarly, the PLI linear regression involved several elimination steps (Step 1 to Step 6), in which Mn, Ni, Fe, Zn, Co, and Cr were excluded because the p values of their coefficients (0.99, 0.92, 0.22, 0.47, 0.09, and 0.99, respectively) were insignificant. In the final step, the regression equation retained the significant elements Cu, Cd, Pb, and As, as represented in Eq. (16). This regression equation simplifies the data, facilitating the calculation of the PLI for a particular site. The R2 value of this relation is 0.959, indicating a strong fit between the variables.

$${\text{PERI}} = 0.335\left( {{\text{Ni}}} \right) + 0.086\left( {{\text{Zn}}} \right) + 48.337\left( {{\text{Cd}}} \right) + 0.455\left( {{\text{Pb}}} \right) - 1.426 \quad \left( {{\text{R}}^{2} = 0.969} \right)$$
(15)
$${\text{PLI}} = 0.019\left( {{\text{Cu}}} \right) + 0.543\left( {{\text{Cd}}} \right) + 0.013\left( {{\text{Pb}}} \right) - 0.213\left( {{\text{As}}} \right) \quad \left( {{\text{R}}^{2} = 0.959} \right)$$
(16)
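The two regression equations can be applied to a site's measured concentrations as in the sketch below. The function names are hypothetical, the inputs are assumed to be concentrations in mg/kg, and the equations are only expected to hold within the range of the data they were fitted to.

```python
def peri_from_regression(ni, zn, cd, pb):
    """Estimate PERI from Ni, Zn, Cd, and Pb concentrations using Eq. (15)."""
    return 0.335 * ni + 0.086 * zn + 48.337 * cd + 0.455 * pb - 1.426


def pli_from_regression(cu, cd, pb, as_):
    """Estimate PLI from Cu, Cd, Pb, and As concentrations using Eq. (16)."""
    return 0.019 * cu + 0.543 * cd + 0.013 * pb - 0.213 * as_


# Hypothetical site concentrations in mg/kg (illustrative only).
print(peri_from_regression(ni=60.0, zn=120.0, cd=0.8, pb=40.0))
print(pli_from_regression(cu=45.0, cd=0.8, pb=40.0, as_=3.0))
```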

Health risk assessment

Non-carcinogenic risk assessment

The non-carcinogenic risk assessment considered three pathways: oral ingestion, dermal contact, and inhalation. Figure 7 illustrates the HI values for the three population categories and their associated exposure pathways. Oral ingestion contributed 98.8% of the HI value for adult males, 98.79% for adult females, and 99.61% for children (Fig. 7b). The contributions from the other pathways, dermal contact and inhalation, are less than 1%. Oral ingestion is therefore the predominant pathway contributing to the HI value for all categories, which is consistent with the primary exposure to TEs in soil being through the consumption of crops (Chandra & Ghosh, 2023; Chen et al., 2022). According to a 2001 report by the United States Environmental Protection Agency (USEPA), an HI below one indicates a low risk from exposure to a specific combination of harmful substances. Conversely, an HI greater than one suggests that cumulative exposure to toxic substances could potentially lead to non-carcinogenic health effects (Baltas et al., 2020).

Fig. 7 HI values for different categories and their contribution towards exposure pathways

The HI values across different coal mine sites range from 0.007 to 0.464 for adult males, from 0.008 to 0.543 for adult females, and from 0.064 to 4.23 for children. While there is considerable variation in these values, no significant risk has been identified for either adult males or adult females. However, in the case of children, 29.4% of the sites exceed the value of 1, indicating a non-carcinogenic risk. On average, the HI values for adult males, adult females, and children are 0.121, 0.142, and 1.105, respectively (Fig. 7a). This analysis underscores that children are more susceptible to non-carcinogenic risks than adult males and females. The difference in HI values between adult males and adult females may be connected to variations in average body weight and skin surface area between gender groups (Yang et al., 2018). Children are particularly vulnerable to environmental pollutants due to increased respiration rates per unit body weight, hand-to-mouth contact with soil, and faster gastrointestinal absorption of certain toxic components (Ma et al., 2018). A study conducted by Gopinathan et al. (2023) likewise found that young children encounter greater health risks than adults. In conclusion, the findings of this study suggest that children living in coal mine areas are at an increased risk of experiencing non-carcinogenic health effects as a result of exposure to TEs. It is crucial to implement measures to reduce exposure to TEs in these areas, particularly for young children, who are more vulnerable to the harmful effects of these pollutants.

The study's results revealed a consistent ranking order for the HI among eight TEs (Cr, Pb, Mn, As, Ni, Cu, Cd, and Zn). Among these elements, Cr was the primary contributor to HI, followed by Pb, Mn, As, Ni, Cu, Cd, and Zn, as shown in Fig. 8. Notably, in the case of children, Cr, Pb, and Mn pose the most significant non-carcinogenic risks, contributing 54%, 20.3%, and 12.8% of the total HI value, respectively. The calculated HI values of Cr, Pb, and Mn were 0.60, 0.23, and 0.14, respectively. Their sum (0.97) nearly reaches 1, indicating that children are particularly vulnerable to non-carcinogenic risks associated with these metals. Pb, Cr, and Mn can enter the human body through contaminated food, water, air particles, and skin absorption. Inhaling Cr can lead to lung cancer, respiratory issues, and skin irritation, while long-term exposure can harm the liver, kidneys, and circulatory system. Pb can cause brain and nervous system damage, kidney damage, anemia, high blood pressure, and reproductive issues, especially in children (Chen et al., 2022; Rahman & Singh, 2019). Elevated Mn levels in soil can lead to health issues, especially in children, including neurological abnormalities, behavioral problems, and learning difficulties when they ingest or inhale the contaminated soil (Bjørklund et al., 2017). Therefore, it is vital to monitor and regulate environmental levels of Pb, Cr, and Mn to prevent potential health risks from exposure.

Fig. 8 Trace metals contributions to HI

Carcinogenic risk assessment of Cr, Cu, Cd, and As

Cr, Cd, and As are commonly acknowledged as carcinogenic, whereas Cu is not typically classified as a primary carcinogen. However, when its concentration exceeds the world background soil concentration, Cu can potentially exhibit carcinogenic effects in humans (Theophanides & Anastassopoulou, 2002). In the present study, Cu levels were 1.77 times higher than the world background soil concentration, and Cu was therefore included in the carcinogenic risk calculation. Chen et al. (2022) and Xiao et al. (2020) also treated Cu as carcinogenic when calculating risks. Figure 9 shows the carcinogenic risk assessment for all coal mine soils in India. Total CR (TCR) values between 1E−6 and 1E−4 suggest a tolerable carcinogenic risk, whereas values above 1E−4 indicate a serious health concern (Tume et al., 2018). The calculated TCR ranges from 4.04E−6 to 2.81E−4 for adult males, from 1.25E−4 to 3.6E−4 for adult females, and from 9.2E−6 to 6.97E−4 for children. Accordingly, 23.53%, 23.53%, and 47.06% of sites exceed the 1E−4 threshold for adult males, adult females, and children, respectively. The average values for adult males, adult females, and children are 0.83E−4, 1.04E−4, and 2.02E−4, respectively (Fig. 10). This indicates that the carcinogenic risk is tolerable for adult males, whereas adult females and children are prone to carcinogenic risk (TCR > 1.0E−4). Furthermore, the TEs contributing to the carcinogenic risk maintain a consistent ranking across adult males, adult females, and children, with Cu being the predominant contributor, followed by Cr, Cd, and As. Specifically, Cu accounts for 57.6% and 57.86% of the total CR value for adult females and children, respectively, while Cr contributes 39.37% for adult females and 39.11% for children (Fig. 10). The individual CR values for Cr and Cu are 6.9E−4 and 1.0E−3, respectively, for adult females and 1.3E−3 and 1.98E−3, respectively, for children. These values surpass the 1E−4 threshold, indicating the presence of carcinogenic risks. This analysis underscores that these two metals pose a carcinogenic risk for both adult females and children.
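The threshold logic used to interpret total CR values can be illustrated with a short Python sketch. The two cut-offs and the three average values are taken from the text; the function name, the class labels, and the label for values below 1E−6 (for which the text gives no interpretation) are assumptions made for illustration.

```python
TOLERABLE_LOWER = 1e-6  # lower bound of the tolerable band (Tume et al., 2018)
SERIOUS_ABOVE = 1e-4    # values above this indicate a serious health concern


def classify_tcr(tcr):
    """Map a total carcinogenic risk value to a qualitative class."""
    if tcr > SERIOUS_ABOVE:
        return "serious"
    if tcr >= TOLERABLE_LOWER:
        return "tolerable"
    # Assumption: the text gives no label for values below 1E-6.
    return "below tolerable band"


# Average total CR values reported in the text.
average_tcr = {
    "adult males": 0.83e-4,
    "adult females": 1.04e-4,
    "children": 2.02e-4,
}

for group, tcr in average_tcr.items():
    print(f"{group}: {tcr:.2e} -> {classify_tcr(tcr)}")
```

Applied to the reported averages, only adult males fall within the tolerable band, which matches the interpretation given above.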

Fig. 9 Carcinogenic risk assessment

Fig. 10 Trace metals contribution towards total carcinogenic risk for different categories

Comparison with other studies

The average concentrations of TEs in coal mine soils from different countries are presented in Table 4. The concentrations of Cu, Ni, Pb, Zn, and Cd in the present study are notably higher than those reported for other countries, and they exceed the levels found in the world's background soil. South Africa reported higher concentrations of Cr and Co, while China reported elevated levels of As. These comparisons underscore the severity of coal mining contamination in the Indian soil environment, which is exacerbated by inadequate regulatory measures, extensive mining activities, and potentially a lack of environmental protection plans.

Table 4 Average concentrations of TEs in coal mine soils from different countries (mg/kg)

Identification of pollution sources through multivariate statistical analysis

Correlation analysis

Correlation analysis serves as an effective geostatistical tool for uncovering inter-element relationships within multivariate distributions of TE data obtained from soil samples (Varol et al., 2020). The Pearson correlation analysis conducted for TEs in the coal mine soil samples, as illustrated in Fig. 11, revealed several noteworthy relationships. Co showed no significant correlation with the other analyzed TEs, suggesting its relatively independent presence in the soil. Fe displayed a strong and significant correlation with Zn and Cu, suggesting homologous characteristics derived from a similar source. Furthermore, the pairs Cr and Pb, Mn and Ni, and Cd and As each exhibited a strong positive correlation, implying a potential common source or environmental influence on their presence. These findings provide valuable insights into the elemental composition and potential sources of contamination in the soil, with implications for environmental and health assessments. To gain a deeper understanding of the origins of contamination in the soil, the data were further analysed using principal component analysis.
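As a minimal sketch of how such a Pearson correlation matrix is obtained, the Python snippet below computes pairwise coefficients from per-site concentrations. The concentrations are hypothetical values chosen only to illustrate a strongly correlated pair and a weakly related element; they are not data from this study, and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(data):
    """Pairwise Pearson correlations for a dict of element -> concentrations."""
    elements = list(data)
    return {
        (a, b): pearson(data[a], data[b])
        for a in elements
        for b in elements
    }


# Hypothetical concentrations (mg/kg) at five sites, for illustration only.
samples = {
    "Fe": [21000.0, 34000.0, 28000.0, 45000.0, 39000.0],
    "Zn": [60.0, 95.0, 82.0, 130.0, 110.0],
    "Co": [12.0, 9.0, 15.0, 10.0, 13.0],
}

matrix = correlation_matrix(samples)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"r({a}, {b}) = {r:.2f}")
```

A coefficient close to 1 for a pair of elements is what the text interprets as evidence of a shared source, while a coefficient near 0 suggests an independent origin.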

Fig. 11 Pearson correlation analysis

Principal component analysis

Principal component analysis (PCA) was employed to differentiate between TEs in coal mine soil originating from lithogenic sources and those influenced by anthropogenic sources. This method aids in identifying the environmental composition of the soil, providing insights into the presence and impact of TEs in coal mines (Sun et al., 2010). The Kaiser–Meyer–Olkin (KMO) test and Bartlett's sphericity test were conducted to evaluate the data's suitability, yielding a KMO value of 0.69 and a significance level lower than 0.001. This indicates that the dataset is suitable for PCA. The analysis was performed with varimax rotation, retaining factors with eigenvalues greater than 1. Four principal components (PCs) with eigenvalues exceeding one were identified, collectively explaining 85.25% of the total variance (Table 5). PC 1, which accounted for 42% of the overall variance, exhibited the highest loadings for Fe, Cu, and Zn. Fe primarily originates from the oxidation of sulfide minerals that become exposed during mining operations, leading to the formation of AMD and eventual accumulation within the soil layers (Yilmaz et al., 2019). Bhuiyan et al. (2010) reported that Zn could potentially be liberated from minerals like sphalerite, melnikovite, and mispickel, which are commonly found in association with coal seams. Furthermore, Fe exhibits a robust positive correlation with both Cu and Zn at a correlation value of 0.89 (Fig. 11). This reinforces the conclusion that PC 1 can be attributed to coal mining activities. PC 2, accounting for 19.83% of the variance, displayed the highest positive loadings for Mn and Ni. Fang et al. (2021) reported that Mn and Ni are discharged into the environment as a result of mining operations, the utilization of heavy machinery, and inadequate handling of coal stockpiles during transportation. PC 3, accounting for 11.94% of the variance, had prominent positive loadings for Pb and Cr. Zhang et al. (2021) highlighted that Cr predominantly emanates from coal dust, a byproduct of coal mining, thus indicating its anthropogenic origin. A study conducted by Karan et al. (2024) also provides support for the release of Cr from coal fly ash. Moreover, the substantial positive correlation (0.6) observed between Cr and Pb (Fig. 11) suggests a shared source for these elements, namely coal dust. Lastly, PC 4, which accounted for 11.48% of the total variance, exhibited significant positive loadings for Cd. Notably, the Cd concentration was found to be lower than the background soil levels and reference soils, suggesting that it likely originates from the inherent characteristics of the soil itself.
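The cumulative variance explained by the four retained components can be verified with a few lines of Python. This sketch only re-adds the per-component percentages reported in the text; it does not reproduce the PCA or the varimax rotation, and the variable names are assumptions.

```python
from itertools import accumulate

# Variance explained (%) by each retained component, as reported in the text.
explained = {"PC1": 42.0, "PC2": 19.83, "PC3": 11.94, "PC4": 11.48}

# Running total of explained variance across the retained components.
cumulative = dict(zip(explained, accumulate(explained.values())))

for pc, total in cumulative.items():
    print(f"{pc}: {explained[pc]:.2f}% (cumulative {total:.2f}%)")

unexplained = 100.0 - cumulative["PC4"]
print(f"Variance left unexplained: {unexplained:.2f}%")
```

The running total reaches the reported 85.25% at PC 4, leaving roughly 15% of the variance to components that were not retained.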

Table 5 Results of varimax-rotated principal component analysis

Anticipating the future contamination levels

Soil contamination elevates the risk of groundwater contamination through runoff and leaching, posing a significant concern in India, where groundwater serves as the primary source for domestic and irrigation needs. The use of contaminated water not only reduces crop yields but also increases the risk of food contamination (Raj et al., 2017; Reza et al., 2018; Singh et al., 2023). Anticipating soil concentrations is pivotal for effective environmental management, adherence to regulatory standards, and safeguarding the well-being of both ecosystems and human health. This study examines the anticipated trends in TEs such as Cr, Cu, Ni, and Cd in soil resulting from coal mining activities. The underlying assumption is a consistent accumulation of pollutants in coal mining areas, with concentrations rising from the background level to the current value. Figure 12 depicts the anticipated average concentrations of TEs in the soil environment. The trend demonstrates average increments of 43% in Cr, 39% in Cu, 88% in Ni, and 38% in Cd in the soil environment compared to the previously calculated values. In line with the Section "Trace elements concentrations in coal mine soils", a comparison with Indian natural background soil and other reference soils reveals that the predicted concentrations of the targeted metals surpass these reference levels. For a more comprehensive understanding, additional assessments such as the ecological risk index and health risk assessment are necessary.

Fig. 12 Predicted average metal concentrations in coal mine soil

Figure 13 presents the projected ecological risk index alongside the percentage contribution of each metal from 2025 to 2075. The ecological risk index increases progressively from 190 (high risk) to 2156 (extremely high risk), according to the classification in Table 2. The contributions of Cr, Cu, and Cd to the ecological risk index decrease, whereas the Ni contribution increases from 33.8% to 70.6% over the prediction period. A likely reason is that Ni shows the highest percentage increase in predicted concentration. A high rate of Ni intake may pose a serious threat to humans, such as cancer and reduced lung function (El-Naggar et al., 2021). Moreover, the health risk assessment revealed an increase in the HI values for all three groups (adult males, adult females, and children) during the prediction period. Notably, the HI value for children increased rapidly, signifying more pronounced non-carcinogenic risks. In detail, the HI values for males remain below the safety threshold of 1 throughout the prediction period, whereas the HI value for females surpasses 1 in 2075, reaching 1.12. For children, the HI values escalate from 1.12 to 8.79 (Fig. 14a), indicating that children living near coal mining areas face high risks. TCR values range from 1.37E−4 to 7.55E−4 for males, from 1.61E−4 to 8.8E−4 for females, and from 3.13E−4 to 1E−3 for children (Fig. 14b). These values indicate that all three groups are prone to carcinogenic risks, with Cr and Cu being the dominant contributors during the prediction period.

Fig. 13 Prediction of ecological risk assessment from 2025 to 2075

Fig. 14 Prediction of health risk assessment

This analysis was conducted with a limited dataset, and the reported results are based on a constrained set of information. The predicted concentrations carry uncertainties, and the quality of the predictions has not been assessed against real-world data. The accuracy of the predictions is significantly influenced by the quality of the input data, particularly the rates of pollution accumulation and attenuation. In some cases, metal concentrations are underestimated because of the conditions assumed in formulating the empirical equation. These uncertainties limit the applicability of the present study for accurately predicting metal concentrations. However, the predicted concentrations indicate a general trend of increasing metal levels, serving as a foundation for discussing and implementing cost-effective remediation strategies, as detailed below.

  1. The predictions provide valuable insights for early warnings of soil quality and support the establishment of an environment protection plan for soil.
  2. Predicting soil environment quality near mining regions highlights the need to safeguard human and ecological health.
  3. These results underscore the need for waste management and environment protection plans to reduce the negative impacts on the local community.
  4. Moreover, through collaborative efforts among scientists, policymakers, industrial stakeholders, and local communities, soil pollution concerns can be effectively tackled, ensuring a safe and healthy environment for both present and future generations.

Conclusion

This study comprehensively reviewed the present and future contamination status of the soil environment in the vicinity of coal mines in India. The geo-accumulation index highlighted Co, Ni, and Cu as primary contributors to the accumulation of TEs in coal mining regions, with Ni and Cd posing significant ecological risks. The non-carcinogenic risk assessment showed oral ingestion as the main exposure pathway for TEs, with children at risk. Additionally, the carcinogenic risk results indicate that both adult females and children are at risk. The source identification results revealed that TEs originated from coal mining activities. Predictions of the soil environment indicate that rising concentrations of TEs result in heightened ecological and health risks. The evaluation was conducted with limited data, and its accuracy may be lower than that of an assessment based on real field data. However, these findings provide insights for identifying the contamination status of similar industries. They empower policymakers, industry stakeholders, and researchers to implement regulatory measures and sustainable treatment approaches, thereby safeguarding public health and mitigating ecological risks. The future scope of the present study is discussed below.

  1. Conducting additional field monitoring studies in proximity to coal mining regions is imperative to ensure the reliability of qualitative data obtained through reviews.
  2. Exploring the seasonal characteristics of Acid Mine Drainage (AMD) and its influence on the soil environment.
  3. Exploring the potential of phytoremediation as a sustainable and cost-effective approach for remediating contaminated soil.