1 Introduction

To meet the energy requirements of a growing population, businesses, and industries, coal production and the number of coal mines have increased tremendously in India, which ranks third among the top ten coal-producing nations in the world (World Coal Association 2011). Jharia coal field (JCF) in Jharkhand state is the most important coalfield in the country, holding the biggest reserve of coking coal in India. Intensive coal exploration commenced in this coalfield in 1925 (Ghose and Majee 2007). Soon after the exploration period, JCF became a leading producer of prime coking coal, accounting for more than 30 % of total Indian coal production (Ghose and Majee 2007). Intensive coal mining and mine fires in the region have brought about various geo-environmental hazards such as soil and water pollution (Rout et al. 2014).

Trace element pollution is covert, persistent, and irreversible; it not only degrades the quality of the atmosphere, water bodies, and food crops, but also threatens human health (Dong et al. 2011; Li et al. 2014; Sharma et al. 2007). Since soil is a prominent sink for trace elements discharged from anthropogenic sources, understanding the occurrence and accessibility of trace elements in soils is of utmost importance for environmental health (Hu and Cheng 2013). Trace elements in the soil are derived from the parent material (lithogenic source) and from various anthropogenic sources such as urbanization and industrialization, including mining and agriculture (Dantu 2009; Facchinelli et al. 2001). Among anthropogenic sources of trace elements, coal mining plays a significant role. Many toxic trace elements are released during coal mining and the burning of coal (Rout et al. 2013). Coal mining inevitably involves the excavation of huge amounts of waste rock, which is also of serious environmental concern because of its potential to produce acidic drainage with high element content. Burning coal gives rise to residues such as fly ash and bottom ash, which contain trace elements that settle or wash out of the atmosphere onto the land and thus pollute soil and water.

Trace element pollution in air around JCF, particularly in dust, has been reported in recent years (Singh 2011; Rout et al. 2014; Pandey et al. 2014). However, the only study of trace element pollution in soil reported for the Jharia region was conducted in an old reclaimed coal mine and overburden dumps, with garden soil as a control. The dust generated by coal mining activities ultimately blends with the soil, causing spatial changes in soil quality. Background concentrations of hazardous trace elements such as Cd, Cr, Cu, Ni, Zn, Pb, Fe, and Mn in soil around JCF have not been reported earlier, and even the spatial variations in trace element contamination of soil are not known. Source apportionment studies are also not available for coal mining areas in India. In view of the above, an extensive field study was conducted to assess the levels of trace element contamination in soil together with their enrichment factor (EF), contamination factor (CF), and pollution load index (PLI). Multivariate statistical approaches, namely principal component analysis (PCA) and modeling on absolute principal component scores (APCS) followed by multiple linear regression (MLR), were employed for source apportionment of trace elements in the soil. Hierarchical agglomerative cluster analysis (HCA) and a heat map were applied to understand the behavior of trace elements and sites.

The objectives of this field study were as follows: (1) to evaluate the levels of soil contamination relative to average shale concentrations of toxic trace elements in the region, (2) to assess environmental quality with respect to trace elements in the soil, (3) to determine the associations among different trace elements and their spatial distribution, and (4) to identify and quantify the potential sources of contamination for each element in order to explain the spatial patterns of soil pollution in the area and their relation to emission sources.

2 Study area

Jharia coal field is situated in Dhanbad district of Jharkhand, India. It lies between latitudes 23° 39ʹ and 23° 48ʹ N and longitudes 86° 11ʹ and 86° 27ʹ E, at 222 m above mean sea level (Fig. 1). The coalfield is about 40 km long and approximately 12 km wide, covering an area of about 450 km². JCF is an important coalfield of the Damodar Valley basin and the most heavily mined. JCF consists of about 100 mines, of which 68 are operating, while 25 have been closed due to mine fires. JCF is affected by almost 70 mine fires covering an area of nearly 18 km². The region is encompassed by metamorphic rocks including granites, granite-gneisses, quartzite, amphibolites, and mica schists occurring at different depths (Singh et al. 2012). The Damodar Valley basin comprises fireclays, coal seams, sandstones, siltstones, shales, and conglomerates.

Fig. 1 Location map of the study area

The annual wind rose diagram shows that the dominant wind directions in the area are SE/SSE, with a low frequency of calm conditions (5.3 %) and wind speeds mostly between 4 and 6 m h−1 (Fig. 1b). Monthly average relative humidity, rainfall, and temperature in the study area are shown in Fig. 1c.

3 Materials and methods

3.1 Sample collection

Twenty-five sites were selected around JCF for soil sampling. At each site, three sub-sites were selected and soil samples were collected to a depth of 15 cm with an auger during April 2010. Sites were selected within the coal mining area as well as up to 10 km around it. Ten sites were selected from coal mining areas (sample prefixes MA and MB), ten between 2 and 3 km from the coal mines (sample prefixes NA and NB), and five between 8 and 10 km from the coal mining areas (sample prefix C). Sites far from the mining area are believed to be less affected by pollution from mining activities.

3.2 Physico-chemical analyses

The pH of the soil samples was measured in a 1:5 soil-to-water suspension using a Cyberscan digital pH meter (Model 510, Eutech Instruments, Singapore). In the same suspension, the conductivity of the soil samples was measured with a conductivity meter (Model 306, Systronics, India). The total organic carbon (TOC) content of the soil samples was measured by Walkley and Black’s method, using FeSO4 after digestion of samples with K2Cr2O7–H2SO4 solution (Allison 1986).

3.3 Trace element analyses

An air-dried, sieved (<2 mm) soil sample weighing 1.0 g was digested with HNO3 and HClO4 in a 9:4 ratio at 80 °C until a clear, transparent solution was obtained (Gaidajis 2003). The digested solution was filtered through Whatman No. 42 filter paper, made up to 25 mL with double-distilled water, and stored in an inert glass bottle until analyzed. The concentrations of trace elements were measured with an atomic absorption spectrophotometer (Model AAnalyst 800, PerkinElmer, USA).
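As a worked illustration of the digestion step, a concentration measured in the digest can be converted back to a soil concentration. The sketch below assumes the instrument reading is expressed in μg mL−1, that no dilution beyond the 25 mL final volume was applied, and that blank correction has already been done; the function name and the example reading are hypothetical and are not taken from the study.

```python
def soil_concentration_ug_per_g(reading_ug_per_ml,
                                final_volume_ml=25.0,
                                sample_mass_g=1.0):
    """Convert a digest reading (ug/mL) to a soil concentration (ug/g).

    Defaults follow the text: 1.0 g of soil made up to 25 mL.
    Assumes no additional dilution and a blank-corrected reading.
    """
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    return reading_ug_per_ml * final_volume_ml / sample_mass_g


# Hypothetical reading of 2.0 ug/mL in the digest
print(soil_concentration_ug_per_g(2.0))
```

With the default volume and mass, every 1 μg mL−1 in the digest corresponds to 25 μg g−1 in the soil.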

Accuracy and precision of the trace element analysis were confirmed through repeated analysis of samples against National Institute of Standards and Technology Standard Reference Material for all the trace elements. The results were within ±2 % of the certified values. Quality control steps were adopted to evaluate contamination and the reliability of the data. After five determinations, blank and drift standards (Sisco Research Laboratories Pvt. Ltd., India) were run to calibrate the instrument. The coefficient of variation of replicate analyses was determined to assess analytical precision; variations below 10 % were considered acceptable.
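The precision criterion can be sketched as a simple check on replicate readings. This is only an illustration: the source does not state whether the sample or population standard deviation was used, so the sample standard deviation (n − 1) is an assumption here, and the function names and replicate values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Coefficient of variation (%) of replicate readings.

    Uses the sample standard deviation (n - 1); this choice is an assumption.
    """
    return 100.0 * stdev(replicates) / mean(replicates)


def is_acceptable(replicates, limit_percent=10.0):
    """True when the CV is below the 10 % limit quoted in the text."""
    return coefficient_of_variation(replicates) < limit_percent


# Hypothetical triplicate readings for one sample
readings = [27.1, 28.4, 27.9]
print(round(coefficient_of_variation(readings), 2), is_acceptable(readings))
```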

3.4 Background value consideration

Background values provide the basis for measuring anthropogenic pollution. Because of the long duration of mining and other anthropogenic activities in JCF, it is not judicious to consider the native soils of this area as unpolluted. Average concentrations of individual elements in surface rocks have been proposed as background values (Bhuiyan et al. 2010). Element concentrations of fossil argillaceous sediments (average shale) have also been used as background values (Gowd et al. 2010). The geology of JCF indicates that soils of this area have been derived from erosion and weathering of pre-existing rocks, mainly shale and sandstone (Singh et al. 2013). Therefore, the elemental concentrations of shale and sandstone are a close approximation of the element content of the clean soil of this area and have been taken as the background values. Black shale and sandstone were collected from two sites and analyzed for trace element concentrations after crushing and digestion as per the method described in Section 3.3.

3.5 Soil pollution quantification

3.5.1 Enrichment factor

The enrichment factor (EF) expresses the concentration of the element of interest relative to a reference element, compared with the same ratio in the background material. A geochemically characteristic element that occurs at high concentration in the environment and shows no antagonism or synergism towards the examined element can be used as the reference element (Gonzalez-Macias et al. 2006). The reference element employed in this study is iron (Fe), which is the most widely used for normalization (Bhuiyan et al. 2010; Gowd et al. 2010). EF is therefore calculated using the following relationship.

$$ \mathrm{EF} = \frac{{\left(\mathrm{Element\ concentration}/\mathrm{Fe\ concentration}\right)}_{\mathrm{Sample}}}{{\left(\mathrm{Element\ concentration}/\mathrm{Fe\ concentration}\right)}_{\mathrm{Shale}}} $$

EF values close to 1.0 indicate a crustal origin, values below 1.0 suggest a probable mobilization or depletion of the element, and values >1.0 indicate that the element is of anthropogenic origin. EFs greater than 10 are considered to be of non-crustal origin (Buat-Menard and Chesselet 1979).
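The EF relationship can be illustrated with a short calculation. This is a minimal sketch, not the authors' procedure: the function name is hypothetical and the concentrations in the example are invented placeholders rather than values from the study.

```python
def enrichment_factor(element_sample, fe_sample, element_shale, fe_shale):
    """EF = (element/Fe) in the sample divided by (element/Fe) in the background shale.

    All four concentrations must share the same unit (e.g. ug/g).
    """
    return (element_sample / fe_sample) / (element_shale / fe_shale)


# Hypothetical concentrations in ug/g (placeholders, not measured data)
ef = enrichment_factor(element_sample=120.0, fe_sample=40000.0,
                       element_shale=80.0, fe_shale=40000.0)
print(ef)  # a value above 1 would point to enrichment relative to shale
```

Because both ratios are normalized to Fe, a sample with higher Fe than the background needs a proportionally higher element concentration to reach the same EF.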

3.5.2 Contamination factor

Contamination factor (CF) reflects the anthropogenic input in elemental pollution and is widely used as a measure of overall contamination of soil in Asian countries (Dantu 2009; Bhuiyan et al. 2010). CF is calculated by dividing the concentration of element in the soil by the background concentration (Hakanson 1980).

$$ \mathrm{C}\mathrm{F}=\frac{C_i}{B_i} $$

Here C_i is the concentration of the examined element i, and B_i is the geochemical background value of that element. The contamination grades, in increasing order of contamination, are rated from 0 to 6 (0 = none, 1 = none to medium, 2 = moderate, 3 = moderate to strong, 4 = strongly polluted, 5 = strong to very strong, 6 = very strong) (Varol 2011).

3.5.3 Pollution load index

The pollution load index (PLI) for a set of n polluting elements is defined as the geometric mean of the contamination factors of those elements. PLI is calculated by the following expression given by Tomlinson et al. (1980).

$$ \mathrm{PLI} = {\left({\mathrm{CF}}_1 \times {\mathrm{CF}}_2 \times {\mathrm{CF}}_3 \times \cdots \times {\mathrm{CF}}_n\right)}^{1/n} $$

A PLI value greater than 1 indicates pollution, while a value below 1 indicates no pollution load.
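The CF and PLI definitions can be sketched in a few lines. The function names and the example concentrations are hypothetical; the geometric mean is computed through logarithms, which assumes every CF is strictly positive.

```python
import math


def contamination_factor(concentration, background):
    """CF = C_i / B_i for one element (same unit for both values)."""
    return concentration / background


def pollution_load_index(cfs):
    """PLI = geometric mean of the contamination factors (all must be > 0)."""
    if not cfs or any(cf <= 0 for cf in cfs):
        raise ValueError("need at least one strictly positive CF")
    return math.exp(sum(math.log(cf) for cf in cfs) / len(cfs))


# Hypothetical soil and background concentrations (ug/g) for three elements
soil = {"Pb": 30.0, "Zn": 150.0, "Cd": 0.6}
background = {"Pb": 20.0, "Zn": 100.0, "Cd": 0.3}

cfs = [contamination_factor(soil[e], background[e]) for e in soil]
pli = pollution_load_index(cfs)
print(cfs, round(pli, 3), "polluted" if pli > 1 else "no pollution load")
```

Using the geometric rather than the arithmetic mean keeps a single very high CF from dominating the index.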

3.5.4 Statistical analyses

The PCA/APCS approach is a multivariate receptor model that seeks to identify the probable sources, their composition, and their contribution to each observation. PCA/APCS followed by MLR yields quantitative source contributions (Thurston and Spengler 1985; Shi et al. 2009; Li et al. 2012). PCA was conducted using an orthogonally rotated factor matrix.

In short, PCA is used to obtain the factor scores. It reduces the collinearity and dimensionality of a data set and reveals hidden correlations or structures (Wentzell et al. 1997). The matrix of observed trace element concentrations (TE) is decomposed into the product of a matrix of factor scores (FS) and a matrix of factor loadings (L), plus a matrix of residuals (R) (Schaefer et al. 2010):

$$ {\mathrm{TE}}_{ab} = {\mathrm{FS}}_{cb}{L}_{ac}+{R}_{ab} $$

where a, b, and c are the number of features, the number of objects, and the number of components, respectively. PCA was executed on the covariance matrix of all features, and factors were extracted on the basis of the eigenvalue > 1 criterion. The factor scores derived by PCA are used to calculate the APCS for each sample point. As PCA is based on normalized data, the scores of each component are centered on zero rather than on a true zero concentration. Therefore, the absolute zero for each component score was computed by introducing an artificial sample with concentrations equal to zero for all variables, using the formula (Thurston and Spengler 1985; Schaefer et al. 2010; Qu et al. 2013):

$$ {\left({\mathrm{AZ}}_0\right)}_x = \frac{0-{M}_{\mathrm{x}}}{{\mathrm{SD}}_{\mathrm{x}}} $$

where M_x is the mean, SD_x is the standard deviation, and (AZ0)_x is the normalized Z score of a concentration equal to zero for element x. The APCS for each component was then estimated by subtracting the factor score of this artificial sample from the factor scores of the original data.
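The absolute-zero correction can be sketched as follows. This is an illustration under stated assumptions: factor scores are taken to be linear combinations of the Z scores through a score coefficient matrix (as exported by typical statistics packages), the sample standard deviation is used, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def zero_sample_z(concentrations):
    """(AZ0)_x = (0 - M_x) / SD_x for each element x.

    concentrations: {element: [value per sample]}
    """
    return {el: (0.0 - mean(v)) / stdev(v) for el, v in concentrations.items()}


def factor_scores(z_by_element, coefficients):
    """Scores of one sample: sum over elements of Z_x * coefficient[x][component].

    coefficients: {element: [coefficient per component]} (assumed score
    coefficient matrix; not given in the source).
    """
    n_comp = len(next(iter(coefficients.values())))
    return [sum(z_by_element[el] * coefficients[el][c] for el in z_by_element)
            for c in range(n_comp)]


def absolute_scores(sample_scores, zero_scores):
    """APCS = factor score of each sample minus the score of the zero sample."""
    return [[s - z0 for s, z0 in zip(row, zero_scores)] for row in sample_scores]


# Hypothetical one-element, one-component example
data = {"Pb": [10.0, 20.0, 30.0]}
coeff = {"Pb": [0.5]}
z0 = zero_sample_z(data)
zero_scores = factor_scores(z0, coeff)
print(absolute_scores([[0.0], [1.0]], zero_scores))
```

By construction, a sample whose concentrations are all zero receives an APCS of zero on every component, which is what makes the scores usable as regressors for absolute contributions.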

MLR of the APCS against the total mass of trace elements provides the regression coefficients, which are applied to transform the APCS into the mass contributed by each source using the MLR model equation (Schaefer et al. 2010; Qu et al. 2013):

$$ {Y}_i = {\beta}_{0i} + {\displaystyle \sum_{j=1}^n}\left({\mathrm{APCS}}_j \times {\beta}_{ji}\right) $$

where Y_i is the measured concentration of trace element i, β_0i is the constant term of the MLR for trace element i, β_ji is the coefficient of source j for trace element i, n is the number of sources, and APCS_j is the value of the rotated factor j for the treated samples. APCS_j × β_ji stands for the contribution of source j to Y_i. The average of APCS_j × β_ji over all samples is the absolute contribution of source j to trace element i. Source impact on trace element profiles can also be calculated.
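The regression step can be sketched with an ordinary least-squares fit written in plain Python. This is not the SPSS procedure used in the study; it is a small illustration that solves the normal equations directly, assumes the APCS columns are not collinear, and uses made-up numbers and hypothetical function names.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(apcs, y):
    """Fit Y_i = b0 + sum_j b_j * APCS_j; returns [b0, b1, ..., bn]."""
    x = [[1.0] + list(row) for row in apcs]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def mean_source_contributions(apcs, betas):
    """Average of APCS_j * b_j over all samples, one value per source j."""
    n = len(apcs)
    return [betas[j + 1] * sum(row[j] for row in apcs) / n
            for j in range(len(apcs[0]))]


# Hypothetical APCS for two sources at four sites and one element's concentrations
apcs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 4.0, 6.0, 8.0]
betas = fit_mlr(apcs, y)
contrib = mean_source_contributions(apcs, betas)
print(betas, contrib)
```

The constant term b0 is the part of the mean concentration not attributed to any identified source; dividing each mean contribution by the mean of Y_i gives percentage contributions.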

Pearson’s correlation matrix was also used to recognize the relationship among trace elements and soil physico-chemical parameters to support the results obtained by multivariate analysis.

HCA was employed to group the sites, identify different geochemical associations, and cluster the samples with similar trace element contents. The two-way HCA heat map was constructed using Ward's algorithm, with the squared Euclidean distance as the measure of distance between clusters of similar trace element contents and similar sites. Data were standardized (Z scores) before performing HCA.
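The two inputs to the clustering, standardization and the squared Euclidean distance, can be sketched as below. The sketch stops short of Ward's agglomeration itself, which the study ran in PAST; the use of the sample standard deviation, the function names, and the values are assumptions for illustration only.

```python
from statistics import mean, stdev


def standardize(columns):
    """Convert each variable to Z scores.

    columns: {variable: [value per site]} -> {variable: [Z score per site]}
    """
    out = {}
    for name, values in columns.items():
        m, s = mean(values), stdev(values)
        out[name] = [(v - m) / s for v in values]
    return out


def squared_euclidean(z, i, j):
    """Squared Euclidean distance between sites i and j across all variables."""
    return sum((vals[i] - vals[j]) ** 2 for vals in z.values())


# Hypothetical concentrations (ug/g) at three sites
z = standardize({"Zn": [80.0, 120.0, 160.0], "Mn": [400.0, 600.0, 800.0]})
print(squared_euclidean(z, 0, 2), squared_euclidean(z, 0, 1))
```

Standardizing first keeps elements with large absolute concentrations, such as Fe, from dominating the distances between sites.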

Source apportionment modeling and correlation matrix analyses were performed by using IBM SPSS Statistics 20 software and two-way HCA heat map was formed by using PAST statistical software version 2.17.

4 Results and discussion

4.1 Soil physico-chemical characteristics, trace element concentrations, and their correlations

The extensive coal mining in the region has altered soil physico-chemical characteristics such as pH, Ec, and TOC (Table 1 and Supplementary data 1, Electronic Supplementary Material - ESM). Coal mining activities also increased the concentrations of potentially toxic trace elements, which varied considerably among locations according to their distance from the mining sites (Table 1 and Supplementary data 1, ESM). The values of TOC, pH, and Ec ranged from 1.1 to 3.7 %, 4.4 to 7.1, and 157 to 654 μS m−1, with mean values of 2.2 %, 5.68, and 419.3 μS m−1, respectively (Table 1). The mean concentrations of Pb, Ni, Cu, Mn, Fe, Zn, Cd, and Cr were 27.8; 64.1; 66.3; 634; 39,662; 127; 0.42; and 43 μg g−1, respectively (Table 1). Fe, Mn, and Zn showed the highest concentrations, in the ranges 23,947–50,691; 353–993; and 77.5–188.4 μg g−1, respectively, with mean values 1.5, 1.2, and 1.9 times the world normal averages (Kabata-Pendias 2010). Mean Zn and Cu concentrations were higher than the critical soil concentrations (Table 1). Critical soil concentrations of trace elements are the threshold concentrations in the soil beyond which detrimental effects on vegetation are evident (Bhuiyan et al. 2010; Kabata-Pendias 2010; Alloway 2013).

Table 1 Descriptive statistics of total organic carbon (TOC), pH, electrical conductivity (Ec), and selected trace element concentrations in soil and black shale, for all sites

The overall trace element concentrations in the soil of mining areas are the sum of the concentrations resulting from lithogenic processes (i.e., the composition of native minerals) and the additions from all anthropogenic sources (i.e., atmospheric deposition of aerosol particles and raindrops containing trace elements or gaseous forms of elements). Waste generated during mining and ash from coal fires can also contaminate the soils. Trace elements are also lost through off-take in harvested or grazed plants, erosion of soil particles by wind or water, leaching down the soil profile in solution, and volatilization of gaseous forms of elements. Coal mining involves excavating the buried coal by removing the overburden with heavy vehicles and mechanical devices, leading to the piling up of huge quantities of mine spoil and to dust emission along with the coal. Banerjee (2000) reported high concentrations of trace elements in coal of JCF (Pb, 32; Ni, 48; Cu, 28; Mn, 84; Fe, 57; Zn, 60; Cd, 1.3; and Cr, 70 ppm). The spoils are a form of trace element-rich industrial solid waste (Sebestova et al. 1996). Natural weathering may further degrade these exposed spoils into small clay-sized particles; through this process, huge quantities of fine particles enriched with trace elements are released into the environment within a few weeks (Masto et al. 2011). The high concentrations of Cu, Mn, Fe, and Zn in the area may be ascribed to the minerals associated with the geological formations of the Damodar basin, including mica, fire clay, bauxite, limestone, kaolinite, barite, pyrite, and iron stone (Singh et al. 2012). However, the considerable spatial variations in trace element concentrations suggest anthropogenic sources related to coal mining and mine fires.

TOC correlated positively (p < 0.01) with Mn, Zn, Cd, and Fe. Soil pH showed significant correlations (p < 0.01) with Pb, Fe, Cd, Mn, and Cu, and Ec with Pb, Ni, Cu, Fe, and Cr (Supplementary data 2, ESM). These relations can be explained by the fate of trace elements in natural soil. The high positive correlations of trace elements with TOC may be explained by the fact that soil organic carbon is one of the major elemental adsorbents in the soil (Twardowska and Kyziol 2003). Humic acid and fulvic acid are two major components of organic matter with a strong ability to complex trace elements (Tang et al. 2014). Soil pH has a strong influence on the solubility of elements such as Pb, Cd, and Cu (Usman 2008). The sorption of elements on soil organic matter, Fe and Mn oxyhydroxides, and clay minerals depends on pH (Alloway 2013). Elements are more mobile in soils with lower pH values. Ec correlated positively with all trace elements except Mn, Zn, and Cd. The correlation matrix indicates strong positive correlations (r² = 0.01) among Cu–Ni–Cr, Mn–Pb, Fe–Pb, Fe–Cu, Zn–Mn, Cd–Pb, Cd–Cu, Cd–Fe, and Cr–Cd, and weak positive correlations (r² = 0.5) among Cu–Pb, Fe–Ni, Fe–Mn, Zn–Pb, Zn–Cu, and Cd–Mn. The significant positive correlations between these trace elements suggest common origins and sinks in the soil of the JCF mining area (Supplementary data 2, ESM). The presence of Fe and Mn in the soils has a bearing on the accumulation of other trace elements, as these elements influence the adsorption characteristics of Pb, Cu, Cd, Ni, and Cr (Young 2013). Most of the geochemical associations, such as Cr–Cu–Zn and Cu–Fe–Cd–Cr (Supplementary data 2, ESM), show that these elements are deposited from anthropogenic sources, since no known geogenic sources can contribute to this type of association in this area.

4.2 Pollution indices

EF values varied among sites, ranging from 0.92 to 2.28 for Cd, 0.74 to 2.12 for Zn, 0.88 to 1.62 for Pb, 0.57 to 1.86 for Cu, 0.61 to 1.46 for Ni, 0.37 to 1.17 for Mn, and 0.34 to 1.00 for Cr (Table 2). Average EF values decreased in the order Cd > Zn > Pb > Cu > Ni > Mn > Cr. The highest EF for Pb occurred at site MA4, while two sites (NA2 and NB3) showed EF values lower than one. Cd showed its highest EF at MA3 and its lowest at C1 (Supplementary data 3, ESM). EF values between 0 and 1 suggest that the element derives entirely from crustal sources or natural processes. Pb, Cd, Cu, and Zn showed EF values greater than one at most of the study sites, indicating their anthropogenic origin (Zsefer et al. 1996). The higher enrichment of trace elements near the coal mining area points to soil pollution caused by coal mining activities.

Table 2 Descriptive statistics of enrichment factor, contamination factor and pollution load index for all sites

The CFs of the trace elements of environmental interest in the study area ranged from 0.88 to 1.97 for Pb, 0.47 to 1.68 for Ni, 0.47 to 2.05 for Cu, 0.45 to 1.27 for Mn, 0.68 to 1.43 for Fe, 0.87 to 2.13 for Zn, 0.79 to 2.95 for Cd, and 0.29 to 1.11 for Cr (Table 2). Cd showed CF values in the moderate category at ten sites and approached the moderate to strong contamination category at three sites. PLI showed that the soils near the coal mining area were contaminated by all the investigated trace elements (Table 2, Supplementary data 3, ESM). The sampling sites MA3 and MB4 were the most polluted, as they showed the highest PLI values within the study area (Supplementary data 3, ESM).

4.3 Source apportionment (PCA-APCS-MLR)

PCA expresses the total variation in the original trace element data with the smallest number of factors. The first principal component is the weighted linear combination of the original variables that accounts for the highest variability. Each subsequent component accounts for less variability than the previous one (Fig. 2). By assessing the loadings on the components, the sources responsible for each component can be estimated. Principal component (PC) loadings for trace element concentrations with the corresponding eigenvalues and variances are given in Fig. 2, and the loading factors of trace elements in three-dimensional space are presented in Supplementary data 4, ESM. An eigenvalue >1 was the criterion for selecting factors, and a factor loading of 0.5 was selected as the lowest level of significance within a factor. Three PCs with eigenvalues of 2.94, 2.57, and 1.96 were extracted, together explaining 93.37 % of the variance across the sites.

Fig. 2 Rotated principal components loading for trace elements in the soil of JCF

PC1 (eigenvalue 2.94), explaining 36.74 % of the variance, was loaded with Ni (0.975), Cr (0.939), and Cu (0.873) and was the most important component. Ni contamination of soils results from anthropogenic activities such as mining, pulverized fuel ash, smelting, and plating works (Tian et al. 2012). The anthropogenic sources of Cr in the area may be the solid wastes generated during coal mining (Dhal et al. 2013). The potential sources of the third heavily loaded element, Cu, are non-ferrous metal production and coal and oil combustion in the area (Oorts 2013). On the basis of a critical analysis of the existing sources of trace elements in the study area, PC1 may be defined as a component of coal mining, coal burning, and mine fires. Long-term coal mining activities (since 1894), coal burning, and mine fires (about 70 active mine fires), represented by PC1, may have been the primary contributors of Ni, Cr, and Cu contamination in the soil. Pandey et al. (2014) also attributed the high concentrations of Ni and Cu in PM10 around Jharia coal mines to processes related to mining and mine fires.

PC2 (eigenvalue 2.57), which has high positive loadings of Pb, Cd, and Fe and moderate positive loadings of Cu, Mn, and Cr, accounts for 32.16 % of the variance (Fig. 2). Leaded gasoline is no longer used in vehicles, but re-suspension of Pb-laden dust by vehicle movement and atmospheric deposition may increase the Pb concentration in the soil (Pandey et al. 2014). The higher Fe concentration may be ascribed to the high levels of Fe present in loosely bound coalfield dust and in atmospheric deposition (Mahowald et al. 2005). High solubility of atmospheric Fe in the soil has been reported (Chen et al. 2012). The sources of soil Cd via atmospheric deposition are associated with various human activities, including oil and coal burning (Smolders and Mertens 2013). On the basis of the above, PC2 indicates mixed sources from both atmospheric deposition and windblown dust. Dubey et al. (2012) and Pandey et al. (2014) also found high levels of Pb and Cd in PM10 due to re-suspension of dust and atmospheric deposition.

PC3 (eigenvalue 1.96) has high loadings of Zn and Mn and accounts for 24.47 % of the variance (Fig. 2). This component can be explained by the presence of Mn. Zn and Mn are closely associated with organic matter (Zhang et al. 2009). Sulfide minerals derived from mine drainage contribute to the enrichment of the soil with Zn and Mn as a result of geochemical weathering (Bhuiyan et al. 2010). Considering the above, the loadings of PC3 may have been derived from crustal sources, and PC3 can hence be defined as a crustal component.

Factor scores (Supplementary data 5, ESM) representing coal mining activities/mine fires, windblown dust, and crustal sources were regressed against the standard normalized deviate (z) of the sum of trace elements (∑ TE). The mean contributions were 40 % from coal mining activities/mine fires, 23 % from windblown dust, and 24 % from crustal sources, while 13 % remained unexplained (Fig. 3a).

Fig. 3 Percentage contributions of different sources responsible for soil contamination (a); contribution of each source for the elements included in the source apportionment modeling (b)

After identifying and quantifying the three major sources, we considered their impact on each element by calculating the contribution of each polluting source to the element concentration. The results in Fig. 3b show that the first major pollution source, coal mining activities/mine fires, was the largest contributor to Ni, Cu, and Cr, accounting for 96, 77, and 89 % of their concentrations. Crustal sources were the largest contributor to Mn and Zn, accounting for 84 and 92 % of these elements. The third major pollution source, windblown dust, was the largest contributor to Pb, Fe, and Cd, being responsible for 84, 53, and 17 % of their concentrations.

4.4 Two-way hierarchical cluster analysis (HCA) heat map

Heat maps are an efficient way of visualizing complex data sets arranged as matrices, and they perform two actions on a matrix. First, the rows and columns are reordered so that those with similar profiles lie close to each other, making the profiles more visible. Second, every entry in the data matrix is displayed as a color, making it possible to view the patterns (Key 2012). Clustering by Ward’s method minimizes the total within-cluster variance. The HCA results for trace elements and soil physico-chemical characteristics (bottom cluster) are shown in Fig. 4 as a dendrogram. Figure 4 displays five clusters: (1) Pb–Cd–Fe; (2) Ec; (3) Ni–Cu–Cr; (4) Mn–Zn–TOC; (5) pH, in agreement with the correlation matrix and PCA results.

HCA of the sampling sites (right-side dendrogram) distinguished groups of geochemically similar sites. The sampling points C1, C2, C3, C4, and C5 are clustered in group 1. Group 2 contains MA1, NA4, MA5, MB5, and MB2. The sampling sites MA4, MB3, MB1, MA2, MA3, and MB4 are included in group 3. Group 4 contains NB4, NB2, NB1, and NA1, while group 5 is formed by NB3, NA3, NA2, NA5, and NB5.

The first group is formed by sites in non-mining areas and reflects similar behavior due to low levels of trace element contamination. The heat map shows that the concentrations of Ni, Cu, and Cr were at their minimum (blue color) at these sites, while pH was at its maximum (near red color) at C1, C2, C3, C4, and C5. Because they have the highest number of blue spots (low Z score), these sites can be ranked as the least polluted. Sites MA1, NA4, MA5, MB5, and MB2 formed a group and showed similar behavior. In this group, all sites except NA4 are located in the coal mining area. NA4 showed similarity with MA1 mainly because of Cd and Cr (similar color pattern in the heat map). Sites of group 3 (MA4, MB3, MB1, MA2, MA3, and MB4) were highly loaded with Mn and Zn. TOC also showed its highest values at these sites, and the highest number of red spots (high Z score) in the heat map suggests that they are the most polluted. Sites NB4, NB2, NB1, and NA1 showed a higher load of Ni and Cr, while sites NB3, NA3, NA2, NA5, and NB5, with no red or blue spots, showed moderate concentrations of trace elements relative to the other groups. High concentrations of trace elements at any site depend on the feasibility and availability of the respective pollution sources.

Fig. 4 Two-way hierarchical clustering of trace elements and different sites

5 Conclusions

Distinct spatial variations of trace elements in the soil were recorded around coal mining areas, with high enrichment near the mines. EF values greater than 1 suggest that Cd, Zn, and Pb were enriched near coal mining areas in JCF, mostly through anthropogenic sources. Coal mining activities/mine fires (40 %), crustal sources (24 %), and windblown dust (23 %) are the chief contributors to soil pollution in the area. Coal mining activities/mine fires are the main contributing source of soil Ni, Cu, and Cr, while crustal input was mainly represented by Mn and Zn and windblown dust by Pb, Fe, and Cd. HCA of the sites reveals that high concentrations of trace elements at any site depend on the feasibility and availability of the respective pollution sources. The findings of this study may provide a comprehensive database for framing an appropriate strategy for mitigating and preventive measures. This study clearly highlights the need for immediate control measures against the serious trace element pollution in the study area; the soils in the area require remediation to minimize the rate of contamination and the extent of future pollution problems.