Introduction

Groundwater (GW) is vital for life on earth, and its demand has increased tremendously owing to population growth and rising agricultural, industrial, and domestic use (FAO 2011; Wang et al. 2020; Su et al. 2020). Globally, GW is reported to supply about 42%, 36%, and 27% of the water used for agricultural, domestic, and industrial purposes, respectively (Taylor et al. 2013; Ren et al. 2021). Owing to projected changes in global climate and the ever-increasing demand for GW, quantitative approaches to assess GW availability and demand must be adopted urgently (Chen et al. 2015). In India, nearly 90% of the rural and 30% of the urban population depend on GW to meet their basic needs (Agarwal and Garg 2016). Further, overutilization of GW sources is prevalent in the absence of adequate scientific plans and regulatory mechanisms (Rodell et al. 2009). In recent years, many states of India have experienced a rapid decline in GW levels due to increasing overexploitation (CWC and CGWB 2016). Many researchers have studied GW dynamics and their impacts on agriculture under changing climate scenarios (Song and Choi 2012), as well as the overall sustainability of GW (Woo 2013). To formulate GW management policies and priorities, policymakers need reliable datasets on groundwater potential (GWP) and productivity (Das 2017; Etikala et al. 2019). The geological characteristics of a region, including the type and porosity of rock formations, primarily determine GW availability and occurrence (Reddy et al. 1994). The underlying lithological formations control percolation and the extent of GW recharge (Shaban et al. 2006). Many authors have employed traditional geological, hydrogeological, and geophysical techniques to delineate groundwater potential zones (GWPZs) (Chatterji et al. 1978; Rashid et al. 2011; Kumar et al. 2014).
However, these conventional approaches to delineating GWPZs are complex, uneconomical, and time-consuming, and each has its own limitations (Jha et al. 2010).

Therefore, precise identification of GWPZs and the development of appropriate management strategies using advanced tools and techniques such as remote sensing, Geographic Information System (GIS), and statistical models are important for sustainable development, especially in agrarian countries like India. Globally, various techniques have been used to delineate GWPZs, such as the integration of remote sensing and resistivity data in GIS (Selvarani et al. 2016), the influence factor technique (Selvam et al. 2015; Magesh et al. 2012), statistical methods (Falah et al. 2017), the analytical hierarchy process (Dar et al. 2020; Saranya and Saravanan 2020), and GW modelling (Sashikkumar et al. 2017). Clark and Fritz (1997) reported the use of hydrogen (²H and ³H) and oxygen (¹⁸O) isotopes in GW studies and in assessing GW dynamics at the basin level. In India, stable isotope studies have also been conducted to assess GW recharge (Shivanna et al. 2004; Saha et al. 2013). Lee and Lee (2011) developed a decision tree model to study GW vulnerability and changes in environmental parameters such as temperature and rainfall patterns. However, satellite remote sensing and GIS techniques have been widely adopted for identifying and mapping GWPZs through the integration of thematic databases on topography, soils, lithology, drainage patterns, and lineaments (Nayak et al. 2017; Mokadem et al. 2018; Şener et al. 2018). In recent years, probabilistic models have also been used to model and map GWPZs through ‘multi-criteria decision analysis’ and weights-of-evidence modelling (Mousavi et al. 2017; Sahoo et al. 2017). Machine-learning and related approaches, such as decision tree algorithms, fuzzy logic, and numerical modelling, have also been widely used to model GWP and have reportedly yielded better results than conventional methods (Barzegar et al. 2018; Golkarian et al. 2018). Rana et al. (2018) and Sharma et al. (2020) performed Hierarchical Cluster Analysis (HCA) to identify GW pollution zones in Himachal Pradesh, India. Many studies have used remotely sensed datasets, geophysical surveys, and GIS to map GWPZs (Pradhan 2009; Bera and Bandyopadhyay 2012; Magaia et al. 2018).

GIS-based advanced models such as the Frequency Ratio (FR) model (Naghibi et al. 2014) and the Index of Entropy (IoE) model (Al-Abadi and Shahid 2015) have been successfully applied in the modelling and assessment of GWPZs. Using the FR model, the relationship between independent and dependent variables has been assessed through ‘multi-layer integration of thematic maps’ in GIS (Oh et al. 2011). The FR and logistic regression models provide flexible techniques for identifying GWPZs (Ozdemir 2011). Recently, the FR model has been tested for detecting statistical trends in large datasets (Ding et al. 2017; Hong et al. 2017; Siahkamari et al. 2018), including the identification of GWPZs (Mousavi et al. 2017; Sahoo et al. 2017). In GW studies, entropy expresses the effect of the various GW controlling factors in determining GW potential. It can be used effectively to compute index-system weights (Jaafari et al. 2014) and to demarcate GWPZs and their yields (Naghibi et al. 2014; Hou et al. 2017; Khoshtinat et al. 2019). Since GW is the main source for irrigation, domestic, and industrial use in semi-arid regions, the accurate assessment and identification of GWPZs using advanced techniques assumes greater importance. So far, the literature on the integration of high-resolution satellite data, GIS, and bivariate statistical models (BSM) for GWPZ mapping is limited, especially in hard rock terrains. Therefore, delineating GWPZs using advanced remote sensing, GIS, and BSM is imperative to support GW exploration, increase water availability, and mitigate water scarcity in hard rock terrains of semi-arid regions. Hence, the core objective of this study is to model and delineate GWPZs using high-resolution Sentinel-2 data and GIS by applying the FR and IoE BSMs in the Sarabanga Watershed (SBW) of Salem district, Tamil Nadu (TN) state, southern India.
Comparative and sensitivity analyses of the BSMs were also performed to assess the degree of influence of the input parameters on the identification of GWPZs.

Study area

The SBW lies in the southwestern (SW) part of Salem district, TN state, southern India, and covers five tehsils, namely Omalur, Mettur, Edappadi, Yercaud, and Sankari. It extends between 11° 29′ 27.72″ and 11° 56′ 5.15″ N latitude and 77° 44′ 9.73″ and 78° 13′ 39.2″ E longitude, covering about 1175.3 sq. km (Fig. 1). The Sarabanga river originates on the western slope of the Shevaroy hills at 1630 m above mean sea level (MSL) and flows through Omalur, Tharamangalam, Edappadi, and Thevur tehsils before joining the Cauvery river near Bhavani town. The elevation ranges from 154 to 1641 m above MSL and rises from SW to northeast (NE). The SBW receives 800 to 1600 mm of mean annual rainfall (MAR), mainly during the SW monsoon season; rainfall increases progressively towards the northern, north-eastern, and eastern sections of the SBW, with a maximum in the Yercaud hill region (1594.3 mm) (Arulmozhi and Arulraj 2017). Geologically, the prominent formations of the SBW are hard, consolidated Archaean crystalline rocks, represented by hornblende-biotite gneisses, charnockite, and granites. These formations are associated with recent alluvial and colluvial deposits along the river channels and foothills, while Quaternary alluvium is the prominent formation along the main course of the Sarabanga river. The fractured crystalline rocks and the recent colluvial deposits are the principal aquifer systems in the SBW, with a weathered zone 1 to 25 m thick. GW occurrence is mainly confined to the weathered mantle in the fractured zones. These rocks are porous and permeable owing to secondary openings such as fractures. The thickness of the aquifer systems in the SBW varies from 15 to 60 m. Along the toposequence of the SBW, recharge mainly takes place in the confined aquifers of the uplands, whereas recharge and discharge occur together in the unconfined to semi-confined aquifers of the midland areas.
Discharge takes place predominantly from the unconfined aquifers of the lowland areas. The major crops grown are paddy, groundnut, fodder, and sugarcane.

Fig. 1 Location map of the study area

Materials and methods

Datasets used

Sentinel-2A Multi-Spectral Instrument (MSI) data (10 m) of 3 March 2018 (tiles T43PHP, T43PGN, and T43PHN) were acquired from the Copernicus Open Access Hub, and a Digital Elevation Model (DEM) (12.5 m) was acquired from the ‘Advanced Land Observing Satellite (ALOS) PALSAR’. Lineament data at 1:50,000 scale were obtained from the ISRO Bhuvan portal. Geology and geomorphology data at 1:250,000 scale were obtained for the watershed area from the Bhukosh portal of the ‘Geological Survey of India (GSI)’ and revised using the high-resolution Sentinel-2A and ALOS DEM datasets. Rainfall data for 30 years (1989 to 2018) from 14 rain-gauge stations spread across the watershed were acquired from the ‘State Surface and Groundwater Resources Data Centre (SSGRDC)’, Chennai. Soil data at 1:50,000 scale were acquired for the watershed from the ‘Tamil Nadu Agricultural University (TNAU)’. The input parameters used in the spatial modelling of GWPZs are listed in Table 1.

Table 1 Various input datasets and their characteristics

Georeferenced well inventory data

GIS-based mapping of well locations was conducted to establish the relationships among the geological, geomorphological, and GW characteristics of the wells and to validate the identified GWPZs (Haghizadeh et al. 2017). First, a field survey was conducted to identify and map 135 well locations distributed throughout the watershed across different geological formations using a hand-held ‘Global Positioning System (GPS)’ receiver (Fig. 1). Of the 135 well locations, 94 wells (70%) and 41 wells (30%) were arbitrarily assigned to the calibration and validation datasets, respectively. The calibration dataset was used in the spatial modelling and delineation of GWPZs, whereas the validation dataset was used to validate the GWPZs and assess the accuracy of the models. The calibration wells were superimposed on all contributing factor maps in GIS to understand their inter-relationships.
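A minimal sketch of the 70/30 partition, assuming the wells carry hypothetical integer IDs 1 to 135 and that the arbitrary grouping can be reproduced with a seeded random shuffle (the study does not report how the grouping was drawn):

```python
import random


def split_wells(well_ids, calib_fraction=0.7, seed=42):
    """Shuffle well IDs and split them into calibration and validation sets.

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(well_ids)
    rng.shuffle(shuffled)
    # int() truncates: 135 * 0.7 ≈ 94.5 -> 94 calibration wells
    n_calib = int(len(shuffled) * calib_fraction)
    return shuffled[:n_calib], shuffled[n_calib:]


calibration, validation = split_wells(range(1, 136))
print(len(calibration), len(validation))  # 94 41
```

Truncating rather than rounding the calibration count is what reproduces the 94/41 split reported above.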

Computation of groundwater conditioning factors

GWPZ mapping can be carried out by investigating the factors that control GW storage and occurrence (Tolche 2020). GW occurrence is determined by many parameters, such as rainfall, elevation, geology, geomorphology, slope, drainage density (Dd), lineament density (Ld), soil type, land use/land cover (LU/LC), recharge potential, aquifer transmissivity, and anthropogenic activities (Paul et al. 2020). In the current investigation, eight major contributing factors, namely geology, geomorphology, Dd, Ld, slope, soil texture, rainfall, and LU/LC, were used to identify GWPZs. Thematic rasters of the input factors were developed in ArcGIS at a 10 m × 10 m cell size using the ‘polygon to raster conversion tool’ for their integrated analysis and the mapping of GWPZs with the FR and IoE models.

The LU/LC classes of the SBW were generated through digital interpretation of Sentinel-2A data using a ‘supervised classification algorithm’ in ArcGIS. The watershed boundary and drainage network were generated from the high-resolution ALOS DEM (12.5 m) using the ‘ArcHydro’ tool in ArcGIS, and the drainage network was used to compute Dd. Slope classes were computed from the ALOS DEM using the ‘slope tool’ in ArcGIS. Ld was computed from the derived lineaments as the length of lineaments per unit area. The ‘Inverse Distance Weighted (IDW)’ interpolation technique was used to develop the rainfall raster from the 30-year (1989 to 2018) rainfall data of the 14 rain-gauge stations. The soil texture map was developed in GIS from the legacy soil data at 1:50,000 scale and used in the spatial modelling.
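The IDW step can be sketched as follows. This is an illustration under stated assumptions, not the ArcGIS implementation: it uses an exponent of 2 (a common default; the study does not report the power) and weights all gauges rather than a limited search neighbourhood. The gauge coordinates and rainfall values are hypothetical.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse Distance Weighted estimate at point (x, y).

    stations: list of (xs, ys, value) tuples, e.g. projected gauge
    coordinates in metres and 30-year mean annual rainfall in mm.
    All stations are used (no search radius or nearest-k limit).
    """
    num = 0.0
    den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # cell centre coincides with a gauge
        w = 1.0 / d ** power  # closer gauges get larger weights
        num += w * value
        den += w
    return num / den


def idw_grid(x0, y0, cell, nrows, ncols, stations, power=2.0):
    """Evaluate IDW at cell centres; (x0, y0) is the top-left corner, row 0 at top."""
    return [
        [idw(x0 + (c + 0.5) * cell, y0 - (r + 0.5) * cell, stations, power)
         for c in range(ncols)]
        for r in range(nrows)
    ]


# Hypothetical gauges: (easting m, northing m, mean annual rainfall mm)
gauges = [(0.0, 0.0, 800.0), (1000.0, 0.0, 1000.0)]
print(round(idw(500.0, 0.0, gauges), 6))  # 900.0, equidistant from both gauges
```

Because every estimate is a weighted average of the gauge values, the interpolated raster never leaves the range of the observed rainfall.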

Building the BSMs

The FR model was adopted to establish the probabilistic relationship between the dependent (groundwater) and independent (conditioning) factors (Oh et al. 2011; Naghibi et al. 2016). The model is robust, simple to execute, and its results are easy to understand (Khosravi et al. 2018; Khoshtinat et al. 2019). The FR model was used to determine the effect of each conditioning factor on GW occurrence. The FR of a class is expressed as follows:

$$ FR=\frac{W/TW}{CP/TP} \tag{1} $$

where W is the number of well pixels in a given class of the input factor, TW is the total number of well pixels, CP is the number of pixels in that class, and TP is the total number of pixels in the watershed. To estimate GWP, the FR value of each class of a given factor was taken as the weight of that class. Finally, the GWPZs were computed as (Jaafari et al. 2014; Naghibi et al. 2014):

$$ GWPZ=\sum_{i=1}^{n} FR_i \tag{2} $$

where FRi is the frequency ratio of the class of factor i at a given pixel and n is the total number of input factors. The spatial association between the GW conditioning factors and GW occurrence was analysed using the FR model (Table 2). FR values were determined using Eq. (1), in which an FR of 1 represents the average condition, i.e. the proportion of wells in a class equals the proportion of the watershed area occupied by that class (Moghaddam et al. 2015). An FR greater than 1 indicates a strong association with GW occurrence, whereas an FR less than 1 indicates a weak association (Oh et al. 2011).
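Eqs. (1) and (2) can be illustrated with a minimal pure-Python sketch. It assumes each factor is stored as a mapping from cell ID to class label and that each calibration well occupies one pixel; the layers, class labels, and wells below are hypothetical toy data, not the SBW rasters.

```python
from collections import Counter


def frequency_ratio(class_of_cell, well_cells):
    """FR of every class of one factor, following Eq. (1): (W/TW) / (CP/TP).

    class_of_cell: dict mapping a cell id to the factor class at that cell.
    well_cells: cell ids holding calibration wells (one pixel per well).
    Classes without any well get FR = 0.
    """
    cp = Counter(class_of_cell.values())                # pixels per class (CP)
    tp = len(class_of_cell)                             # total pixels (TP)
    w = Counter(class_of_cell[c] for c in well_cells)   # well pixels per class (W)
    tw = sum(w.values())                                # total well pixels (TW)
    return {k: (w[k] / tw) / (cp[k] / tp) for k in cp}


def fr_index(cell, factors, fr_tables):
    """Eq. (2): sum over factors of the FR of the class found at this cell."""
    return sum(fr_tables[name][layer[cell]] for name, layer in factors.items())


# Toy watershed of 10 cells with two hypothetical factor layers
slope = {i: "0-1%" if i < 4 else "5-10%" for i in range(10)}
geology = {i: "gneiss" if i % 2 == 0 else "charnockite" for i in range(10)}
wells = [0, 1, 5]

factors = {"slope": slope, "geology": geology}
fr_tables = {name: frequency_ratio(layer, wells) for name, layer in factors.items()}
# slope FR: '0-1%' = (2/3)/(4/10) ≈ 1.67, '5-10%' = (1/3)/(6/10) ≈ 0.56
print(round(fr_index(0, factors, fr_tables), 2))  # 2.33, i.e. 5/3 + 2/3
```

Classes that contain no calibration well receive an FR of zero, which is how the zero-FR geological classes discussed in the results arise.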

In the IoE model, entropy expresses the ‘degree of uncertainty’ of a random variable (Ihara 1993). The entropy of a model indicates its ‘degree of variability’ and unreliability (Yufeng and Fengxiang 2009). The following expressions were used to determine the information coefficient (Jaafari et al. 2014):

$$ P_{ij} = FR = \frac{b}{a} \tag{3} $$
$$ (P_{ij}) = \frac{P_{ij}}{\sum_{i=1}^{S_j} P_{ij}} \tag{4} $$
$$ H_j = -\sum_{i=1}^{S_j} (P_{ij}) \log_2 (P_{ij}), \quad j = 1, \dots, n \tag{5} $$
$$ H_{j\max} = \log_2 S_j \tag{6} $$
$$ I_j = \frac{H_{j\max} - H_j}{H_{j\max}}, \quad I_j \in [0, 1], \quad j = 1, \dots, n \tag{7} $$
$$ w_j = I_j \bar{P}_j, \quad \bar{P}_j = \frac{1}{S_j} \sum_{i=1}^{S_j} P_{ij} \tag{8} $$

where b and a are the proportions of wells and of area, respectively, in class i of factor j (i.e. W/TW and CP/TP in Eq. 1); (Pij) denotes the ‘probability density’; Hj and Hjmax are the entropy and maximum entropy values of factor j; Sj is the total number of classes of factor j; Ij is the ‘information coefficient’; and wj is the ‘resultant weight’ of factor j. The values of wj range between 0 and 1. The GWPZs were derived using Eq. (9):

$$ GWPZ = \sum_{j=1}^{n} FR_j \times w_j \tag{9} $$

where FRj is the frequency ratio of the class of factor j at a given pixel and wj is the resultant weight of factor j as a whole.
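The entropy weighting of Eqs. (3) to (9) can be sketched in plain Python. As an explicit assumption, the sketch computes the factor weight as Ij multiplied by the mean of Pij over the classes of the factor, one common reading of Eq. (8) that yields a single weight per factor. The FR values in the usage lines are hypothetical.

```python
import math


def ioe_weight(fr_values):
    """Index-of-entropy weight w_j of one factor from its class FR values.

    Assumes P_ij in Eq. (8) is averaged over the S_j classes, so the
    factor receives a single weight.
    """
    s = len(fr_values)
    if s < 2:
        raise ValueError("a factor needs at least two classes")
    total = sum(fr_values)
    dens = [p / total for p in fr_values]                 # Eq. (4): (P_ij)
    h = -sum(d * math.log2(d) for d in dens if d > 0)     # Eq. (5), 0*log2(0) taken as 0
    h_max = math.log2(s)                                  # Eq. (6)
    info = (h_max - h) / h_max                            # Eq. (7): I_j
    return info * (total / s)                             # Eq. (8): I_j * mean(P_ij)


def ioe_index(cell_frs, weights):
    """Eq. (9): weighted sum of the FR values found at one cell."""
    return sum(fr * w for fr, w in zip(cell_frs, weights))


print(ioe_weight([1.0, 1.0, 1.0, 1.0]))  # 0.0: equal FRs carry no information
print(ioe_weight([2.0, 0.0]))            # 1.0: all wells fall in one class
```

The two usage lines show the limiting cases. A factor whose classes have identical FR values has maximum entropy and zero weight. A factor whose wells all fall in one class has zero entropy and the largest weight.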

Validation and sensitivity analysis of BSMs

In any predictive modelling, assessment of performance, validation, and uncertainty is essential (Dar et al. 2020). Many researchers have used the ‘Area Under the Curve-Receiver Operating Characteristic’ (AUC-ROC) to assess the accuracy of delineated GWPZs (Nhu et al. 2020; Naghibi et al. 2016; Rahmati et al. 2015). The FR and IoE models were validated using AUC-ROC, which measures how well the models discriminate between the presence and absence of GW (Rahmati et al. 2016). To evaluate the influence of the contributing factors and the sensitivity of the models, Variable Importance Analysis (VIA) was performed. VIA represents the statistical significance of each contributing factor with respect to its effect on the generated model (Wei et al. 2015). Random forest regression was used to measure the importance of each input variable for the outputs of the FR and IoE models. The detailed methodology followed is shown in Fig. 2.
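The AUC-ROC statistic can be computed without tracing the curve explicitly by using its equivalence to the Mann-Whitney probability. The sketch assumes validation locations have already been labelled as GW-present or GW-absent and scored with the GWPZ index. The study does not describe how GW depths were converted into such labels, and the scores below are hypothetical.

```python
def auc_roc(scores_pos, scores_neg):
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation.

    Returns the probability that a randomly chosen positive location
    (GW present) has a higher model score than a randomly chosen negative
    location (GW absent); ties count one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative location")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical GWPZ index values sampled at validation locations
present = [30.1, 25.4, 41.0]
absent = [12.2, 26.0]
print(auc_roc(present, absent))  # 5 of 6 pairs correctly ordered, i.e. 5/6
```

The pairwise loop is quadratic, but with only a few dozen validation wells that cost is negligible and the logic stays transparent.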

Fig. 2 Methodology followed in the study

Results and discussion

The analysis shows that the SBW is predominantly underlain by hornblende-biotite gneisses (64.4%), followed by charnockite (23.2%) in the northern regions of the watershed and granites (7.6%), which occur near Pudupalayam village as isolated outcrops surrounded by charnockite (Table 2 and Fig. 3a). Geomorphological features form the basis for understanding the structural features, parent material, and lithological formations that determine GWP, especially in hard rock terrains (Rao et al. 2000). Analysing the nature and extent of geomorphological units greatly helps in delineating GWPZs (Arulbalaji et al. 2019). Seven distinct geomorphological units were identified in the watershed, namely structural hills (0.54%), dissected hills (10.27%), residual hills (1.15%), linear ridges (0.7%), pediments (15.8%), pediplain (60.5%), and upper valley (8.8%) (Table 2 and Fig. 3b). The structural, dissected, and residual hills occur in the north, in the south, and at places in the eastern parts of the watershed; these units are normally unfavourable for GW occurrence (Kumar et al. 2020). Moderately weathered pediplain with medium to coarse gravel is quite common in the watershed. The pediments are distinguished by moderately steep topography extending below the hilly regions and are associated with low permeability and low GWP (Rajaveni et al. 2015). Linear ridges associated with barren lands occur east of the Sarabanga river, whereas isolated, rounded residual hills with steep slopes occur in the northern parts of the watershed.

Table 2 FR and IoE model values of groundwater conditioning factors
Fig. 3 Contributing parameters in modelling of GWPZs: (a) Geology, (b) Geomorphology, (c) Drainage density (Dd), (d) Lineament density (Ld), (e) Slope, (f) Soil texture, (g) Rainfall, and (h) LU/LC

Dd expresses the closeness of spacing of drainage channels, i.e. the total length of drainage channels per unit area (Horton 1932). It is closely related to the permeability (Agarwal et al. 2013; Kanagaraj et al. 2019), slope, geomorphology, and LU/LC conditions of the terrain (Paul et al. 2020); hence, it was considered an important parameter in the identification of GWPZs. GIS-based techniques are widely applied to delineate drainage channels and compute Dd in hydrological models (Reddy et al. 2018). Low Dd indicates porous soils, thick vegetation cover, and low relief, whereas high Dd implies the opposite (Reddy et al. 2002). The Dd of the watershed was categorized as very low (<1.5 km/km²), low (1.5 to 2.5 km/km²), moderate (2.5 to 3.5 km/km²), high (3.5 to 4.5 km/km²), and very high (>4.5 km/km²) (Table 2 and Fig. 3c). Lineaments, which originate from structural/tectonic processes and provide secondary porosity, contribute significantly to GW accumulation (Suganthi et al. 2013) and are important in GW exploration (Şener et al. 2018). Ld in the SBW ranges from 0 to 3.8 km/km² and was classified into five sub-categories: 0 to 1, 1 to 1.5, 1.5 to 2, 2 to 2.5, and >2.5 km/km² (Table 2 and Fig. 3d).
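A minimal sketch of how a density value and its class could be derived, assuming density is measured as line length per unit area within a sampling unit. GIS tools typically evaluate this in a moving search window whose size the study does not report, and the lengths and area below are hypothetical. Values that fall exactly on a break are assigned to the upper class; this is an assumption, since the text does not say how ties are handled.

```python
import bisect


def line_density(total_length_km, area_km2):
    """Density as total line length per unit area (km/km²), as for Dd and Ld."""
    return total_length_km / area_km2


def classify(value, breaks, labels):
    """Label of the interval holding value; a value on a break goes to the upper class."""
    return labels[bisect.bisect_right(breaks, value)]


# Class breaks (km/km²) as listed in the text
DD_BREAKS = [1.5, 2.5, 3.5, 4.5]
DD_LABELS = ["very low", "low", "moderate", "high", "very high"]
LD_BREAKS = [1.0, 1.5, 2.0, 2.5]
LD_LABELS = ["0-1", "1-1.5", "1.5-2", "2-2.5", ">2.5"]

# Hypothetical unit: 12 km of streams and 5 km of lineaments in a 4 km² window
dd = line_density(12.0, 4.0)   # 3.0 km/km²
ld = line_density(5.0, 4.0)    # 1.25 km/km²
print(classify(dd, DD_BREAKS, DD_LABELS), classify(ld, LD_BREAKS, LD_LABELS))
# moderate 1-1.5
```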

As slope significantly influences surface runoff, it was considered an important criterion in the identification of GWP (Patra et al. 2018). Following the slope criteria defined by Singh et al. (2016), nine slope classes were identified: 0 to 1% (level to nearly level), 1 to 3% (very gently sloping), 3 to 5% (gently sloping), 5 to 10% (moderately sloping), 10 to 15% (moderately steeply sloping), 15 to 25% (steeply sloping), 25 to 33% (very steeply sloping), 33 to 50% (strongly sloping), and above 50% (very strongly sloping). Very gentle (31.9%) and gentle slopes (25.9%) are the dominant classes in the SBW (Table 2 and Fig. 3e). Soil texture determines water infiltration, which affects the recharge, occurrence, and circulation of GW (Das 2017). Hence, soil texture was considered an important contributing factor in estimating infiltration rates (Lee and Lee 2015) and identifying GWPZs. Eight distinct soil textural classes were identified in the SBW, i.e. sandy clay loam, sandy loam, loamy sand, clay loam, sandy clay, clay, silty clay, and loam (Table 2 and Fig. 3f). Sandy clay loam is the dominant textural class, covering 448.4 sq. km (38.1%), whereas sandy loam and sandy clay cover 315.5 sq. km (26.8%) and 117.6 sq. km (10%), respectively.
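Percent slope and its class can be illustrated with Horn's 3 × 3 finite-difference estimator, which is widely used in GIS slope tools; whether it matches the exact settings of the tool used in this study is an assumption. The class breaks follow the nine classes above, with the upper limit of the ‘strongly sloping’ class taken as 50% so that it joins the ‘above 50%’ class. The elevation window is hypothetical.

```python
import bisect
import math


def horn_slope_percent(window, cellsize):
    """Percent slope at the centre of a 3x3 elevation window (Horn's method).

    window: three rows of three elevations (m), north row first.
    cellsize: cell size in metres, assumed equal in x and y.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cellsize)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cellsize)
    return 100.0 * math.hypot(dz_dx, dz_dy)


SLOPE_BREAKS = [1, 3, 5, 10, 15, 25, 33, 50]
SLOPE_LABELS = [
    "level to nearly level", "very gently sloping", "gently sloping",
    "moderately sloping", "moderately steeply sloping", "steeply sloping",
    "very steeply sloping", "strongly sloping", "very strongly sloping",
]


def slope_class(percent):
    """Map percent slope to one of the nine classes (ties go to the upper class)."""
    return SLOPE_LABELS[bisect.bisect_right(SLOPE_BREAKS, percent)]


# Hypothetical plane rising 1 m per 12.5 m cell towards the east
plane = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
s = horn_slope_percent(plane, 12.5)
print(s, slope_class(s))  # 8.0 moderately sloping
```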

The amount, intensity, and distribution of rainfall play an important role in GW recharge and determine GWP depending on the hydrogeological characteristics of the terrain (Shi et al. 2016). Since rainfall varies in time and space, it is essential to ascertain its role in the identification of GWPZs (Patra et al. 2018). The SBW receives 800 to 1600 mm of MAR, most of which falls during the SW monsoon season. The rainfall raster was reclassified into six classes: < 350, 350 to 450, 450 to 550, 550 to 650, 650 to 750, and > 750 mm/year (Table 2 and Fig. 3g). The LU/LC pattern influences GWP and the rate of infiltration of water through the soil (Singha et al. 2019). Nine LU/LC classes were identified in the watershed through interpretation of Sentinel-2A satellite data, namely single crop, double crop, current fallows, degraded forest, deciduous forest, scrublands, wastelands, settlements, and waterbodies. Single crop is the predominant class, covering 410.8 sq. km (35%), while forest land covers approximately 147.9 sq. km (12.6%) of the area (Table 2 and Fig. 3h).

Assessment of GWPZs using FR model

The analysis shows that LU/LC classes such as double crop, single crop, and settlements have high FR values of 1.26, 1.2, and 1.8, respectively, indicating that these classes favour GW occurrence. For Dd, the 1.5 to 2.5 km/km² and 3.5 to 4.5 km/km² classes have higher FR values of 1.5 and 1.3, respectively, indicating their greater influence on GW occurrence. Terrain with slope < 1% shows the highest FR value (1.8), whereas the 1 to 3% and 3 to 5% slope classes show FR values of 0.9 and 1.3, respectively. The lowest FR value (0.7) was found in the 5 to 10% slope class. In the SBW, the FR value broadly decreases as the slope gradient increases. The Ld class of 2 to 2.5 km/km² shows an FR value of 0.8, indicating a low probability of GW occurrence, whereas the 1 to 1.5 km/km² class shows an FR value of 1.4, indicating a high probability. In the geomorphology layer, the highest FR value was found for the pediplain (1.3), followed by the upper valley (1.1) and pediments (0.9), which indicates the relationship between geomorphological structures and GW occurrence. As far as geology is concerned, higher FR values are associated with amphibolite, dunite, granite, and fissile hornblende-biotite gneisses (8.9, 2.3, 1.3, and 1.2, respectively), indicating their higher potential for GW occurrence. The remaining geological classes, with FR values of zero, indicate a low possibility of GW occurrence. FR values are high for the clay loam (2.2), clay (1.3), and loamy sand (1.2) classes and relatively low for the remaining texture classes. Higher FR values suggest higher infiltration rates in these textural classes, and higher infiltration in turn favours GW recharge.
For rainfall, the 350 to 450 mm/year (2.0), 450 to 550 mm/year (1.6), and < 350 mm/year (1.3) classes show FR values greater than 1, which reflect local topographic conditions. In the high-rainfall areas of the Yercaud hills, steep slopes and sparse forest cover promote runoff and reduce the likelihood of water infiltration. The final GWPZs were delineated by applying the FR model (Eq. 2) under the assumption that all input GW variables have equal influence on GWP (Fig. 4a). The resulting GWPZ values range from 3.91 to 67.29 and were classified, based on a mean and standard deviation (SD) classification scheme, into five classes: very poor (< 13.05), poor (13.05 to 21.64), good (21.64 to 30.23), very good (30.23 to 38.82), and excellent (> 38.82). The very poor to poor classes cover 15.41% (181.17 km²) of the area, the good and very good classes cover 75.23% (884.18 km²), and the excellent class covers only 9.36% (109.95 km²) (Table 3).
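The study does not spell out the mean and SD scheme, but the FR breakpoints above are equally spaced (8.59 apart). This spacing is consistent with breaks at the mean ± 0.5 SD and ± 1.5 SD, implying a mean of about 25.94 and an SD of about 8.59. The sketch below assumes that form. The use of the population SD and the handling of values that fall exactly on a break are further assumptions.

```python
import bisect
import statistics


def mean_sd_breaks(mean, sd):
    """Four break values at mean - 1.5 SD, - 0.5 SD, + 0.5 SD and + 1.5 SD."""
    return [mean + k * sd for k in (-1.5, -0.5, 0.5, 1.5)]


def breaks_from_values(values):
    """Compute the breaks directly from GWPZ index values (population SD assumed)."""
    return mean_sd_breaks(statistics.mean(values), statistics.pstdev(values))


GWPZ_LABELS = ["very poor", "poor", "good", "very good", "excellent"]


def gwpz_class(value, breaks):
    """Five-class label; a value on a break goes to the upper class."""
    return GWPZ_LABELS[bisect.bisect_right(breaks, value)]


# Mean and SD implied by the reported FR breaks (an inference, not a reported value)
fr_breaks = mean_sd_breaks(25.935, 8.59)
print([round(b, 2) for b in fr_breaks])  # [13.05, 21.64, 30.23, 38.82]
print(gwpz_class(35.0, fr_breaks))       # very good
```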

Fig. 4 GWPZs derived from (a) frequency ratio (FR) and (b) index of entropy (IoE) models

Table 3 Area under different GWPZ’s derived from FR and IoE models

Assessment of GWPZs using IoE model

The results of applying the IoE model to the assessment of GWPZs are shown in Table 3. The computed weights are 0.11 for geomorphology, 0.12 for geology, 0.13 for LU/LC, 0.13 for Dd, 0.12 for slope, 0.13 for Ld, 0.14 for soil texture, and 0.12 for rainfall. The final GWPZ map was computed using Eq. (9). The resulting GWPZ values range from 3.49 to 27.47 and, based on the mean and SD, were classified into five classes: very poor (< 8.67), poor (8.67 to 12.23), good (12.23 to 15.89), very good (15.89 to 19.35), and excellent (> 19.35) (Fig. 4b). The very poor to poor classes cover 15.55% (182.71 km²) of the area, the good and very good classes cover 75.08% (882.47 km²), and the excellent class covers 9.37% (110.12 km²) (Table 3). The very good and excellent classes are concentrated mainly in the SE part of the watershed (Fig. 4b).

Validation of FR and IoE models

Validation is a critical stage of any modelling; without it, models have little scientific value (Nampak et al. 2014). AUC-ROC was adopted to validate the prediction rate of the models (Pourghasemi et al. 2012; Tehrany et al. 2017). The AUC-ROC values were estimated by comparing the GW depth of wells recorded during the field surveys with the GWPZs derived from the FR and IoE models. The quantitative relationship between AUC-ROC and prediction accuracy was grouped into five categories: excellent (0.9–1), very good (0.8–0.9), good (0.7–0.8), average (0.6–0.7), and poor (0.5–0.6). The AUC-ROC values estimated for the FR and IoE models using the GW depth of the 41 validation wells are 0.7313 and 0.7084, respectively (Fig. 5). Both models therefore provide good estimates, although the FR model performed slightly better than the IoE model. The study demonstrates that BSMs can be applied effectively as realistic and simple methods for modelling and evaluating GWP. The strengths of BSMs are their simple implementation, reasonable accuracy in spatial prediction, and the potential to distinguish the contribution of input factors or combinations of input factors in the evaluation of GWP (Khoshtinat et al. 2019).
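The five-category scheme can be applied programmatically to the reported AUC values as a quick check. Values on a category boundary are assigned to the higher category; this is an assumption, since adjacent ranges above share their end points.

```python
import bisect

AUC_BREAKS = [0.6, 0.7, 0.8, 0.9]
AUC_LABELS = ["poor", "average", "good", "very good", "excellent"]


def auc_category(auc):
    """Map an AUC value in [0.5, 1] to the five qualitative categories."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("the scheme covers AUC values from 0.5 to 1")
    return AUC_LABELS[bisect.bisect_right(AUC_BREAKS, auc)]


for model, auc in [("FR", 0.7313), ("IoE", 0.7084)]:
    print(model, auc, auc_category(auc))
# FR 0.7313 good
# IoE 0.7084 good
```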

Fig. 5 Validation of FR and IoE models using AUC-ROC

Sensitivity analysis

Sensitivity analysis performed through VIA shows that, of the eight input contributing factors, geology, slope, and rainfall are the three most important factors contributing to the identification of GWPZs through the FR model. In other words, biotite gneiss and dunite-pyroxenite formations with slopes of less than 3% and rainfall of 550 to 750 mm contribute most to the identification of GWPZs (Fig. 6a). In the IoE model, drainage density, slope, and rainfall were found to be the three most influential factors in the identification of GWPZs (Fig. 6b). Lowland regions with slopes of less than 3% and rainfall of 550 to 750 mm were found to contribute most to the identification of GWPZs. Besides geology, slope, rainfall, and Dd, other contributing factors such as geomorphology and soil texture, especially pediplain areas associated with sandy loam and sandy clay soils, also contribute to GW recharge and the identification of GWPZs.

Fig. 6 Rank of variables by their importance: (a) frequency ratio (FR) and (b) index of entropy (IoE) models

Conclusions

In this study, geology, geomorphology, Dd, Ld, slope, soil texture, rainfall, and LU/LC were considered the major contributing factors for identifying GWPZs through GIS and BSM. The GWPZs were determined by adopting the FR and IoE models, which established the relationships among the input GW conditioning factors and identified the most critical classes in each conditioning factor. Both the FR and IoE models performed well in modelling and assessing GWP, with AUC-ROC values of 0.7313 and 0.7084, respectively, both within the ‘good’ category. Compared with the IoE model, the FR model not only performed slightly better but also has a simpler computational procedure. The validation results confirm that the FR and IoE models are suitable for assessing GWP and for simulating the relationships between GW occurrence and GW conditioning factors. The results are useful for the systematic evaluation and development of GW exploration and environmental strategic plans by government agencies and policymakers. Sensitivity analysis carried out through VIA indicates that slope and rainfall were important influencing factors in both models, with geology additionally important in the FR model and Dd in the IoE model. The study demonstrates the potential of high-resolution Sentinel-2 data and the robustness of BSMs in obtaining accurate, reliable, and cost-effective results for GIS-based spatial modelling of GWPZs in hard rock terrains of semi-arid ecosystems.