1 Introduction

Arsenic contamination of groundwater is a major obstacle to the provision of safe drinking water for millions of people in Asia and around the world (The World Bank 2005). Arsenic (As) is a hazardous element that occurs in over 200 minerals in nature (Ravenscroft et al. 2009), and it can be released into groundwater under certain biogeochemical and hydrogeological conditions (Guo et al. 2011; Ghosh 2022). Because of its toxicity and the number of individuals exposed, arsenic is one of the most dangerous environmental contaminants (Hudson-Edwards et al. 2004) and is associated with higher risks of illness and mortality throughout the world (Hopenhayn 2006). The problem has raised a wide range of concerns about water quality and quantity over the last three decades (Mazumder et al. 2010). In India, arsenic is derived mostly from eroded Himalayan sediments, and it is thought to enter solution through reductive release from solid phases under anaerobic conditions (Polizzotto et al. 2008; Rahman et al. 2014). Within basins, arsenic mobilization from Himalayan-derived sediments begins at the time and place where the aerobic–anaerobic transition occurs (Fendorf et al. 2010). However, the pathways remain uncertain, largely because groundwater flow paths in the Ganges–Brahmaputra delta, which have been strongly affected by intensive irrigation pumping, are poorly understood (Fendorf et al. 2010). In addition, several authors have reported arsenic-rich pyrite in sand samples from the Ganges delta in West Bengal (Chakraborti et al. 2001).

Although many studies of arsenic-contaminated groundwater used for household consumption have been conducted (Chakraborti et al. 2003), only a handful have produced hazard assessment maps showing the geographical extent of groundwater arsenic for entire districts or states in India (Buragohain and Sarma 2012; Ghosh et al. 2019). Spatial modelling is also widely used in environmental science, as evidenced by effective models of atmospheric, terrestrial and subsurface, hydrological, biological, and ecological systems (Zhang et al. 2001). Logistic regression (Bretzler et al. 2017), Thiessen polygons (Ghosh et al. 2019), ordinary kriging (Sovann and Polya 2014), regression kriging (Sovann and Polya 2014), and random forest models (Podgorski et al. 2018; Wu et al. 2021) are some of the spatial and geostatistical methods used to predict the spread of groundwater pollutants. Groundwater can become contaminated by many types of human activity, including residential, municipal, commercial, industrial, and agricultural uses. Geostatistical analysis has proved useful for characterizing water-quality variables in space and time (Nas 2009). A geographic information system (GIS) offers considerable potential for illustrating the real-world problem of metal contamination in groundwater. GIS goes beyond editing and interpreting geographical data and beyond the limits of a paper map, helping planners make informed decisions for the future (Al-Ramadan 2004).

The physiography of the study area is generally flat, with a few sites of gentle slope. The entire region is connected by a network of rivers and streams, all of which flow north to south, following the overall slope of the ground surface (Mazumder et al. 2010). A few isolated wetlands serve as sources of filler earth and alluvial soil, with Ganga alluvium as the parent material; this mixture of sand, silt, and clay can be used to make bricks or as fill. Local villagers use these wetlands primarily for agriculture. The affected area in West Bengal is part of the Ganga-Brahmaputra delta, which has deposits of various thicknesses. One possible source of arsenic is the coal fields, from which arsenic minerals could be transported from mine workings to the sediments. Arsenic-rich sediments transported from the Chotonagpur-Rajmahal highlands are thought to be the source of arsenic in groundwater in the lower Gangetic delta (Santra 2017). Some researchers argue that intensive use of groundwater for irrigation is responsible for arsenic leaching into groundwater. Summer paddy (boro) cultivation expanded in seven districts of West Bengal during the 1980s, resulting in a significant transformation of the irrigation sector (Chowdhury et al. 2001). Boro farming relies on tube wells to obtain groundwater, and boro irrigation has a rapid effect on groundwater levels. The principal source of arsenic in groundwater is thought to be arsenic sulphide minerals formed with the clay in a confined environment, and the contamination occurs primarily in the shallow zone (20–60 m). Geostatistical analysis and predictive models have previously been applied to groundwater arsenic. This study examines the spatial distribution of groundwater arsenic and compares several deterministic and stochastic prediction methods to better understand the geospatial distribution of arsenic in aquifers.

2 Study Area

Karimpur Blocks I and II are situated in the extreme north of Nadia district, in the Gangetic delta deposited by the holy River Ganga and its tributaries (Fig. 1). Karimpur II covers an area of 224.38 km². The elevation of Karimpur Block I varies from 19 to 22 m above mean sea level (MSL). Physiographically, the region is covered by alluvial plains lying to the east of the Hooghly River. The wet season in Karimpur is humid, oppressive, and overcast, while the dry season is mild and largely clear. With an average low of 12.22 °C and a high of 23.89 °C, January is the coldest month of the year. The terrain is largely level, with an elevation change of 105 feet within 10 miles and only minor elevation changes (331 feet) within 50 miles. The area is a vast alluvial plain that stretches southwards from near the head of the delta, created by the successive rivers through which the Ganges has dispersed itself over time. The entire region is a maze of dormant rivers and streams, but the Bhagirathi, Jalangi, and Mathabhanga are the only three that have been identified as the ‘Nadia Rivers’ for over a century. According to the 2011 Indian Census, Karimpur-I Block had a total population of 183,556, with 160,895 people living in rural areas and 5,867 in urban areas. Karimpur-II Block had a total population of 217,136 people, all of whom lived in rural areas; males made up 111,418 (51%) and females 105,648 (49%) of the population.

Fig. 1 Location map of Karimpur block, Nadia district (West Bengal, India)

3 Materials and Methods

Village-wise arsenic contamination data were collected from the sub-district laboratory of Karimpur block for 120 villages in the study area, covering the period between 2016 and 2019. The village boundary map of the study area was obtained from the 2011 Census of India (District Census Handbook, Nadia 2011), and a digital database was generated on a GIS platform. The village database was georeferenced to the Universal Transverse Mercator (UTM) coordinate system, zone 45 North, with the World Geodetic System (WGS) 84 datum. A polygon layer of village boundaries was generated from the georeferenced raster data by on-screen digitizing, and the arsenic contamination data for each village were integrated into the GIS layer. Groundwater depth data were taken from the Central Groundwater Board booklet (CGWB 2019) for 55 locations in the Karimpur block, covering the pre-monsoon and post-monsoon seasons between 2017 and 2019. Field visits were conducted, and the geographic location of each observation point was recorded with a Global Positioning System (GPS) receiver. From these locations, a point layer with attributes was generated on the GIS platform. Finally, this information was integrated into the village database through a spatial join tool.
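The final spatial join step can be illustrated with a minimal sketch. It assumes that each observation point is attached to the village polygon that contains it, tested by ray casting; the village names, polygon coordinates, and well locations below are hypothetical and are not taken from the study database, and the join rule used by the GIS tool itself may differ.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, villages):
    """Attach each point to the first village polygon that contains it."""
    joined = {}
    for point_id, (x, y) in points.items():
        joined[point_id] = next(
            (name for name, ring in villages.items()
             if point_in_polygon(x, y, ring)),
            None,  # point falls outside every village polygon
        )
    return joined


# Hypothetical village polygons and well locations in UTM metres.
villages = {
    "village_A": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    "village_B": [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)],
}
wells = {
    "well_1": (250.0, 400.0),
    "well_2": (1500.0, 700.0),
    "well_3": (2500.0, 500.0),
}
print(spatial_join(wells, villages))
```

Projected (UTM) coordinates are assumed here so that the test can be carried out with plane geometry.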

3.1 Areal Interpolation

The areal interpolation technique is used to downscale the data and predict the arsenic-affected villages within the study area (Krivoruchko et al. 2011). It extends kriging theory to data averaged or aggregated over polygons, here the arsenic-affected villages. To estimate the average pollution levels for the Karimpur block, a smooth prediction surface for individual points is created from the source polygons, and the prediction surface is then aggregated back to the target polygons. The prediction (or standard error) surface is produced for the value of the Gaussian variable at all villages. Predictions and standard errors were calculated for all villages, within and between the affected and non-affected villages, and were then reaggregated to a new set of affected/non-affected villages. The lag size was set to 2950 m and the number of lags to 12. With these settings, most of the empirical covariances fall within the 90% confidence intervals (Anselin 1995). The root-mean-square error (RMSE) is calculated for cross-validation of the predictive model, together with the root-mean-square standardized error, for which a value close to 1 indicates that the prediction standard errors are well calibrated.
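The lag settings and the cross-validation statistics can be sketched as follows. This is only an illustration under stated assumptions: distances are assumed to be binned into equal-width lags, the standardized statistic is taken as the root mean square of the errors divided by their prediction standard errors, and the observed values, predictions, and standard errors are hypothetical.

```python
import math


def lag_index(distance_m, lag_size_m=2950.0, n_lags=12):
    """Return the lag bin of a separation distance, or None if out of range."""
    index = int(distance_m // lag_size_m)
    return index if index < n_lags else None


def cross_validation_stats(observed, predicted, std_errors):
    """Return (RMSE, root-mean-square standardized error)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    rmss = math.sqrt(sum((e / s) ** 2 for e, s in zip(errors, std_errors)) / n)
    return rmse, rmss


# Hypothetical cross-validation results (mg/l), not values from the study.
observed = [0.010, 0.020, 0.030]
predicted = [0.012, 0.018, 0.033]
std_errors = [0.002, 0.002, 0.003]

rmse, rmss = cross_validation_stats(observed, predicted, std_errors)
print(lag_index(3000.0), rmse, rmss)
```

A standardized value above 1 would suggest that the standard errors understate the prediction variability, and a value below 1 that they overstate it.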

3.2 Cluster-Outlier Analysis

The local Moran’s I statistic (I) is used to identify spatial clusters of villages with high or low arsenic levels, as well as spatial outliers (Anselin 1995). For this analysis, the Euclidean distance method is used to calculate the distances between high and low arsenic-contaminated villages. The local Moran’s I value is calculated as:

$$I_{i} = \frac{{x_{i} - \overline{X}}}{{S_{i}^{2} }}\sum\limits_{j = 1,j \ne i}^{n} {w_{i,j} } \left( {x_{j} - \overline{X}} \right)$$

where, \({x}_{i}\) is an attribute for feature ‘i’, \(\overline{X}\) is the mean of the corresponding attribute, \(w_{i,j}\) is the spatial weight between feature i and j, and

$$S_{i}^{2} = \frac{{\sum\nolimits_{j = 1,j \ne i}^{n} {\left( {x_{j} - \overline{X}} \right)^{2} } }}{n - 1}$$

with ‘n’ equating to the total number of features.

For each feature, the tool calculates a local Moran’s I value, a z-score, a p value, and a code reflecting the cluster type. The z-scores and p values represent the statistical significance of the computed index values. In this research, permutations are not used, so a standard p value is calculated rather than a pseudo p value. A positive value of I implies that a village has neighbouring villages with similarly high or low levels of arsenic contamination; such a village is part of a cluster. A negative value of I implies that a village has neighbours with dissimilar levels of contamination; such a village is an outlier. In both cases, the p value must be small enough for the cluster or outlier to be considered statistically significant.
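A minimal sketch of the local Moran’s I calculation and the cluster/outlier labelling is given below. It rests on several assumptions: the scaling term is taken as the variance of the other features’ values, computed without spatial weights; the weights are simple binary contiguity weights along a hypothetical chain of five villages; and the significance test is omitted, so the labels show only the sign logic. The arsenic values are illustrative.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each feature, given an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    result = []
    for i in range(n):
        # Scaling term computed from all features except i.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Spatially weighted sum of the neighbours' deviations.
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        result.append(dev[i] / s2 * lag)
    return result


def cluster_type(value, mean, local_i):
    """Label a feature as HH, LL, HL or LH from the sign of I (no p value test)."""
    if local_i > 0:
        return "HH" if value > mean else "LL"
    if local_i < 0:
        return "HL" if value > mean else "LH"
    return "none"


# Hypothetical arsenic levels for five villages arranged in a chain.
arsenic = [5.0, 5.0, 1.0, 5.0, 5.0]
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

i_values = local_morans_i(arsenic, chain)
mean_arsenic = sum(arsenic) / len(arsenic)
labels = [cluster_type(v, mean_arsenic, i) for v, i in zip(arsenic, i_values)]
print(labels)
```

In practice a label would be reported only for features whose p value passes the chosen significance level.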

3.3 Identification of Arsenic Affected Village

The Getis-Ord \(G_{i}^{*}\) statistic is calculated by the Hotspot Analysis tool for each village in the Karimpur block. The Getis-Ord \(G_{i}^{*}\) statistic (Getis and Ord 1992) can be calculated as:

$$G_{i}^{*} = \frac{\sum\nolimits_{j = 1}^{n} w_{i,j} x_{j} - \overline{X}\sum\nolimits_{j = 1}^{n} w_{i,j}}{S\sqrt{\frac{n\sum\nolimits_{j = 1}^{n} w_{i,j}^{2} - \left( \sum\nolimits_{j = 1}^{n} w_{i,j} \right)^{2}}{n - 1}}}$$

where \({x}_{j}\) is the attribute value for contaminated village j, \({w}_{i,j}\) is the spatial weight between features i and j, n is the total number of features, and:

$$\overline{X} = \frac{{\sum\nolimits_{j = 1}^{n} {x_{j} } }}{n}$$

$$S = \sqrt {\frac{{\sum\nolimits_{j = 1}^{n} {x_{j}^{2} } }}{n} - \left( {\overline{X}} \right)^{2} }$$

Since the \(G_{i}^{*}\) statistic is a z-score, no additional calculations are needed.

The resulting z-scores and p values indicate where villages with high or low arsenic contamination cluster spatially (Scott and Warmerdam 2005). The tool examines each village within the context of its neighbouring villages. A village with a high level of contamination is interesting, but it may not be a statistically significant hot spot. To be a statistically significant hot spot, a village must have high arsenic contamination and be surrounded by other villages with high contamination. The z-scores and p values are measures of statistical significance that indicate, village by village, whether or not the null hypothesis should be rejected. A high positive z-score with a small p value implies spatial clustering of high values, and a low negative z-score with a small p value implies spatial clustering of low values. The higher (or lower) the z-score, the more intense the clustering. A z-score near zero implies no apparent spatial clustering.
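The calculation can be sketched in a few lines. The sketch uses the standard form of the Getis-Ord statistic, in which the sum of squared weights in the denominator is multiplied by n, and converts each z-score to a two-sided p value under the normal distribution. The weight matrix (binary contiguity along a chain, with each village counted as its own neighbour) and the arsenic values are hypothetical.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-score for each feature; weights[i][i] should include the feature."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        weighted_x = sum(w * v for w, v in zip(weights[i], values))
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append((weighted_x - mean * w_sum) / denominator)
    return z_scores


def two_sided_p(z):
    """Two-sided p value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical arsenic levels for five villages arranged in a chain.
arsenic = [10.0, 10.0, 1.0, 1.0, 1.0]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

z = getis_ord_gi_star(arsenic, weights)
p = [two_sided_p(value) for value in z]
print(z, p)
```

In this toy example the first village and its neighbour both have high values and yield a positive z-score, while the last village yields a negative one.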

3.4 Spatial Mapping of Groundwater

To generate a smooth surface of groundwater depth over the block, the radial basis function (RBF) method was used in ArcGIS software (Rusu and Rusu 2006). A completely regularized spline kernel was used to calculate the groundwater depth from the points located in the moving window. The kernel parameter was optimized to 0.004873, the value that gives the model with the lowest root-mean-square error.
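The idea of an exact RBF interpolator whose kernel parameter is chosen by cross-validation can be sketched as follows. Note the substitutions: a multiquadric kernel is used in place of the completely regularized spline (which requires special functions), all points are used rather than a moving window, and the parameter is picked from a short candidate list by leave-one-out RMSE. The point coordinates, depths, and candidate values are hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def kernel(r, c):
    """Multiquadric basis function (stand-in for the regularized spline)."""
    return math.sqrt(r * r + c * c)


def fit_rbf(points, values, c):
    """Solve for the coefficients that make the surface pass through the data."""
    phi = [[kernel(math.dist(p, q), c) for q in points] for p in points]
    return solve_linear(phi, values)


def predict_rbf(points, coeffs, c, target):
    """Evaluate the fitted surface at a target location."""
    return sum(k * kernel(math.dist(p, target), c) for p, k in zip(points, coeffs))


def loo_rmse(points, values, c):
    """Leave-one-out root-mean-square error for kernel parameter c."""
    squared = 0.0
    for i in range(len(points)):
        pts = points[:i] + points[i + 1:]
        vals = values[:i] + values[i + 1:]
        coeffs = fit_rbf(pts, vals, c)
        squared += (predict_rbf(pts, coeffs, c, points[i]) - values[i]) ** 2
    return math.sqrt(squared / len(points))


# Hypothetical observation wells (UTM metres) and depths to water (m).
points = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0), (500.0, 500.0)]
depths = [4.7, 5.2, 6.1, 6.8, 5.5]
candidates = [100.0, 500.0, 1000.0]

best_c = min(candidates, key=lambda c: loo_rmse(points, depths, c))
coeffs = fit_rbf(points, depths, best_c)
print(best_c, predict_rbf(points, coeffs, best_c, (250.0, 750.0)))
```

Because the interpolator is exact, the surface reproduces the measured depth at every observation point, so accuracy has to be judged by cross-validation rather than by the fit itself.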

4 Results and Discussion

Karimpur blocks I and II are entirely covered by a thick alluvial formation made up of various grades of sand, silt, and clay. The aquifers are largely made up of various grades of sand (coarse to fine). Gravel, which is the most essential component of aquifers in general, does not play a significant role in this area. Three aquifer systems have been identified from the exploratory wells drilled by the CGWB and the State departments. The shallow aquifer (Aquifer-I) has a depth range of 5 to 150 m. The next aquifer system (Aquifer-II) is found in the depth range of 150 to 200 m, while the deepest (Aquifer-III) is found in the depth range of 215–295 m (Table 1). Clay barriers of varying thicknesses separate these aquifer systems. The Karimpur blocks are classified as ‘Semi-critical’ based on groundwater resource calculations and pre- and post-monsoon water level trends. To estimate the dynamic groundwater resources of the semi-confined to confined aquifers in the study area, the average storativity is estimated as 1.55 × 10⁻³, and the average fluctuation of water level is computed as 0.84 m and 0.77 m in Karimpur Block I and Block II, respectively.
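As a rough illustration of how these quantities combine, the change in storage of a confined aquifer is commonly approximated as storativity × water-level fluctuation × area. The sketch below applies that relation to Karimpur Block II, reading 0.77 m as the average water-level fluctuation in that block and assuming that the aquifer underlies the whole block area of 224.38 km² given in the Study Area section. The result is illustrative arithmetic only and is not a figure reported by the study.

```python
def storage_change_m3(storativity, fluctuation_m, area_km2):
    """Approximate change in groundwater storage (m^3) for a confined aquifer."""
    area_m2 = area_km2 * 1_000_000  # convert km^2 to m^2
    return storativity * fluctuation_m * area_m2


# Values quoted in the text; using the full block area is an assumption.
volume_block_2 = storage_change_m3(
    storativity=1.55e-3,
    fluctuation_m=0.77,
    area_km2=224.38,
)
print(f"{volume_block_2:.0f} m^3")
```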

Table 1 Groundwater characteristics of Karimpur-I and Karimpur-II block

Figure 2 shows the spatial distribution of arsenic concentration in the Karimpur block. Higher concentrations of arsenic are found in the central part and in small pockets in the north-east corner of the block. Most of the contaminated area has concentrations between 0.01 and 0.026 mg/l. Areas of medium arsenic concentration occur in small patches in the east and west of the block. At the crosshair location, the predicted value is 0.007471325, obtained with the fitted covariance model 0.00035234 × Exponential(4328.9). The calculated root-mean-square standardized value is 1.3723.

Fig. 2 Areal distribution of arsenic contamination in Karimpur block

The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of highly arsenic-contaminated villages (HH), a cluster of low arsenic-affected villages (LL), an outlier in which a high arsenic-affected village is surrounded predominantly by low arsenic-affected villages (HL), and an outlier in which a low arsenic-affected village is surrounded predominantly by high arsenic-affected villages (LH). Statistical significance is determined at the 95 percent confidence level (Fig. 3). Two HH clusters (light red colour) are identified in Karimpur-II block, and one small HH cluster is found in Karimpur-I block. An LH cluster (blue colour) is identified in the central part of the Karimpur block, and a small LH cluster is identified in the south of Karimpur-I block.

Fig. 3 Cluster-outlier analysis of arsenic contamination

Figure 4 shows the spatial distribution of arsenic-contaminated hotspot and cold spot villages. Most of the hotspot villages are observed in the south and north-east of Karimpur-II block. In Karimpur-I block, the hotspot villages are found in the east of the study area. Cold spot villages are mainly observed in the south and south-west of Karimpur-I block.

Fig. 4 Hotspot and cold spot analysis of arsenic contamination

Contamination caused by naturally occurring high quantities of arsenic in deeper layers of groundwater is known as groundwater poisoning. The RMSE of the RBF model is 1.0047, with a regression function of −0.157743650101092 × x + 6.8688612263091. According to hydrograph station data, the pre-monsoon depth to water level (DTW) in the shallow aquifer (Aquifer I) ranges between 4.71 and 6.1 m below ground level (m bgl) in the central and south-west parts of Karimpur I and II blocks, and between 6.8 and 8.8 m bgl in the west, north-west, and south-west parts (Fig. 5). The pre-monsoon water table map of Aquifer I reveals groundwater ‘mounds’ in the extreme north and in small pockets of the west and south-west, covering parts of Karimpur I and II blocks, with a maximum elevation varying from 4.2 to 5.4 m bgl (Table 2), and groundwater ‘troughs’ in the central and southern parts of the blocks, with a maximum depression of 1.6–3.0 m bgl (Fig. 6). The direction of groundwater flow varies from place to place. Long-term trend analysis shows declining water levels in the Karimpur blocks in both the pre-monsoon and post-monsoon seasons: the water level is falling by 0.9 cm/year in the pre-monsoon season and by 2.2 cm/year in the post-monsoon season (CGWB 2019).
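A long-term water level trend of this kind can be expressed as the slope of a least-squares line fitted to depth-to-water readings over time. The sketch below shows that calculation under the assumption that the trend is linear; the yearly readings are hypothetical and do not reproduce the CGWB series, and the method used by CGWB may differ.

```python
def linear_trend(years, depths_m):
    """Least-squares slope (m/year) and intercept of depth to water over time."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(depths_m) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, depths_m))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical pre-monsoon depth-to-water readings (m bgl) at one station.
years = [2015, 2016, 2017, 2018, 2019]
depths = [5.00, 5.02, 5.01, 5.04, 5.05]

slope_m_per_year, _ = linear_trend(years, depths)
# A positive slope means the depth to water is increasing, i.e. the level is falling.
print(f"{slope_m_per_year * 100:.1f} cm/year")
```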

Fig. 5 Spatial correlation between groundwater depth (in pre-monsoon) and arsenic contamination

Table 2 Summary of areal interpolation technique

Fig. 6 Spatial correlation between groundwater depth (in post-monsoon) and arsenic contamination

In the study area, arsenic is the most common pollutant in the shallow and, in some cases, intermediate aquifer groups; concentrations in the shallow aquifers occasionally exceed the acceptable limit (0.01 mg/l). The highest concentration of arsenic in groundwater, 1.18 mg/l, was found at Mahisbathan in Karimpur II block. The spatial correlation analysis between arsenic contamination and groundwater depth shows that, in the post-monsoon season, the most heavily arsenic-contaminated villages are concentrated where groundwater is shallow (less than 3.6 m bgl). In the pre-monsoon season, the largest concentration of arsenic-contaminated villages lies in the area where groundwater depth is 6.1 m bgl (Table 3). The study area is irrigated primarily by groundwater, and to a lesser extent by surface water, for intensive agriculture. Paddy and rabi vegetables are two of the most important crops grown by farmers in the area, and for the most part farmers rely solely on groundwater to cultivate them throughout the year (Kumar et al. 2021). Any decrease in tube well yield owing to groundwater depletion will therefore reduce food grain production. In the upper aquifer system (Aquifer I and II), groundwater occurs under geographically widespread unconfined conditions within a depth of 150 m bgl (with local variation). This aquifer system has high natural potential, stores fresh water, and serves irrigation, agriculture, and industry (Shrivastava et al. 2014). The groundwater level is slowly falling in most of the irrigated areas, as well as in some urban wells. Competent aquifer management is therefore required to ensure the continued operation of tube wells tapping Aquifer I and Aquifer II.

Table 3 Descriptive statistics of pre-monsoon and post-monsoon characteristics of Karimpur block

According to the Ground Water Resource Assessment, the block is in Semi-critical condition, with a Stage of Ground Water Development (SOD) of 116.54 percent. Irrigation from the unconfined aquifer is therefore not recommended. There is also little scope for the growth of small-scale industries (Das et al. 2021). Given the current situation, an arsenic-free aquifer should be tapped, with the wells properly sealed. An arsenic removal plant could be installed ahead of the water supply. Arsenic levels in tube wells must be monitored regularly in the field. In arsenic-affected areas, artificial recharge is required to dilute arsenic concentrations in unconfined aquifers used for irrigation, as well as to prevent arsenic from entering the food chain. Rainfall recharge during the monsoon season can be estimated with greater accuracy by the water table fluctuation method, and the trend of the water table over the pre-monsoon and post-monsoon intervals can likewise be evaluated with greater accuracy.

5 Conclusion

In the present work, the spatial extent and magnitude of arsenic contamination are illustrated at the micro level using GIS and geostatistical tools, and an association between groundwater depth and arsenic contamination is described. The results indicate that the majority of people living in arsenic-affected areas had poor socioeconomic status, were illiterate, and worked in agriculture or physical labour. The central part and small areas in the north-east of the Karimpur block have the highest concentrations of arsenic. The majority of the hotspot villages are found in the south and north-east of Karimpur-II block. In the Karimpur block, the spatial variability of groundwater arsenic is severe at very shallow depths, with variability increasing at shorter distances in alluvial aquifers. The spatial patterns of arsenic concentrations in shallow wells suggest that these local-scale patterns may be influenced by regional geologic and geomorphic properties of the Karimpur block. The current study addresses only the state of arsenic pollution in groundwater during the study period. Owing to a lack of systematic data at the micro level, it cannot describe in depth the region’s health concerns and other issues. A detailed geostatistical investigation of arsenic distributions would provide more information on the spatial correlation and variability of arsenic concentrations in alluvial aquifers. More thorough statistical investigations should also be carried out to identify other continuous and categorical factors that are strongly correlated with arsenic and could be added to geostatistical arsenic prediction models at both the local and regional levels.