Introduction

Groundwater constitutes the largest freshwater source in arid and semi-arid regions across the globe. Its importance increases further where the underlying aquifers consist of hard-rock formations. In such hard-rock aquifers of semi-arid regions, the groundwater potential remains restricted to shallow weathered and fractured zones (Machiwal and Jha 2014).

Groundwater has been an important source of water supply in India, accounting for 50–80 % of domestic water use and 45–50 % of irrigation (Kumar et al. 2005; Mall et al. 2006). At present, overexploitation of this vital resource, together with other adverse factors such as severe droughts, high temperatures and relatively low rainfall, has lowered groundwater levels in several parts of the country (CGWB 2011). The hard-rock aquifers of the Ahar River catchment, Udaipur, India are no exception and are suffering from groundwater depletion (Machiwal et al. 2012). There are no systematic guidelines for identifying new sites with good groundwater potential for drilling productive wells in the area. Studies relating productive wells to the features that affect groundwater potential are absent, and hence sites for groundwater supply are still selected by conventional field methods based on known water-yielding sites. Therefore, there is a need to identify areas with adequate potential for future groundwater supplies. Identifying areas of good groundwater potential will also reduce the large costs incurred in exploring such sites in hard-rock areas.

Modern technologies such as remote sensing (RS) and geographical information systems (GIS) have proved to be useful tools for delineating groundwater potential zones (Chowdhury et al. 2009, 2010; Jha et al. 2010; Singh et al. 2011; Adiat et al. 2012; Lee et al. 2012; Amer et al. 2013; Deepika et al. 2013; Nag and Ghosh 2013). RS offers a powerful and cost-effective means of obtaining spatio-temporal information over large areas in a short time, while GIS provides an excellent framework for handling large and complex spatial datasets. There are two fundamental approaches to GIS-based analysis of groundwater potential: knowledge-driven and data-driven methods (Bonham-Carter 1994). The knowledge-driven approach is subjective: the model functions are controlled by rules developed from prior information and understanding of the modeled system. In the data-driven approach, the governing rules of the model are derived through empirical analysis of the data. In the literature, the knowledge-driven or parametric approach has been widely used for RS and GIS-based assessment of groundwater potential in hard-rock areas (Krishnamurthy et al. 1996; Lachassagne et al. 2001; Solomon and Quiel 2006; Subba Rao 2006; Madrucci et al. 2008; Machiwal et al. 2011a; Pothiraj and Rajagopalan 2013; Singh et al. 2013; Shekhar and Pandey 2014). However, few studies have employed the data-driven or non-parametric approach in hard-rock areas (Sander 1997; Moore et al. 2002; Ettazarini 2007; Nampak et al. 2014). Data-driven techniques impose fewer assumptions on the model functions and are therefore more flexible than the usual knowledge-driven techniques for groundwater potential zoning.
The literature review also reveals that multi-criteria decision making (MCDM) techniques have seen limited use in groundwater potential studies for selecting proper weights for the thematic layers and their features in hard-rock areas (Machiwal et al. 2011a). Likewise, multivariate statistical techniques such as principal component analysis (PCA) have found few applications in groundwater potential studies, despite the usefulness of PCA in RS and GIS-based groundwater potential assessment (Machiwal et al. 2011a). Therefore, the present study delineates groundwater potential zones in the hard-rock terrain of the Ahar River catchment, Rajasthan, India, and is novel in integrating, for the first time, both knowledge-driven and data-driven approaches by employing RS, GIS and MCDM techniques. In this integrated parametric and non-parametric scheme, initial estimates of the groundwater potential index (GPI) are first generated with the knowledge-driven MCDM approach. These initial GPI values are then used to compute refined GPI values with the data-driven multiple linear regression (MLR) approach. The major advantage of the MCDM technique is that it reduces the subjectivity/bias of the knowledge-driven approach (Machiwal et al. 2011a), while the MLR technique helps learn previously unknown mappings (or dependencies) between a model's input and output from the available data (Mitchell 1977). The results of both approaches are verified against well yields at 18 sites and by graphical and statistical performance indicators. Moreover, PCA is employed to determine the major exogenous and endogenous factors affecting groundwater potential in the delineated zones.

Materials and methods

Study area and hydrogeologic settings

The Ahar River catchment (study area) is located in Udaipur district, Rajasthan, India (Fig. 1), between 73°36′51″ and 73°49′46″E and 24°28′49″ and 24°42′56″N. The catchment partly covers the Girwa and Badgaon blocks of Udaipur district and encompasses an area of 348 km2. It is enclosed by an almost continuous girdle of hills and has a sub-tropical, sub-humid to semi-arid climate. The general topographic slope of the area is from northwest to southeast. Surface water resources in the catchment are available mainly as rivers and lakes. The Ahar, the main river of the area, drains the entire catchment; it enters from the northwest and flows southeast to Udaisagar lake. The other two major rivers are the Kotra and Amarjok. All three rivers are seasonal, so perennial flow is absent in the area. The lakes in the study area are Pichhola, Fatehsagar, Udaisagar, Goverdhansagar, Lakhawali and Roopsagar (Fig. 1); of these six, Fatehsagar, Pichhola and Udaisagar have the largest surface spread areas. All the lakes are man-made, and their storage is filled mostly by runoff harvested from their respective catchments. Lake water levels fluctuate greatly, and the lakes often dry up entirely during drought years. The mean annual rainfall is 60.90 cm, about 90 % of which falls during the rainy season (June–October).

Fig. 1 Location map of the study area along with groundwater monitoring sites

The geology of the hard-rock catchment is represented by the Aravalli and post-Aravalli systems, consisting of gneiss, schist, phyllite-schist and combinations of these rock formations. The upper weathered strata of the hard rocks form the aquifer, which holds shallow groundwater mainly under unconfined conditions (Machiwal et al. 2011b). The weathered strata have very little primary porosity, so groundwater moves mainly through secondary openings created by joints and fractures. Thus, weathered strata with a high fracture density offer large potential for groundwater occurrence in the hard-rock aquifers. Groundwater is extracted mainly by hand pumps, dug wells, tube wells and stepwells, which account for 29.35, 68.52, 1.62 and 0.51 % of the existing extraction structures, respectively (Singh 2002). The well density, i.e., the number of groundwater-extracting wells per km2 of land area, is relatively high in the southeast portion of the catchment, while lower well densities occur in the northeast and central parts (Singh 2002). The highest numbers of production wells relative to both land area and population coincide in and around the industrial area in the southeast portion of the catchment.

Database preparation using RS and GIS

This study integrated the following data into a geographical information system to prepare the spatial database:

(a) Geodetic toposheets (45H9, 45H10, 45H11, 45H14 and 45H15) of the Survey of India, Dehradun, India, at 1:50,000 scale, used to extract the extent and geomorphological features of the catchment. The catchment boundary was delineated in GIS using the Universal Transverse Mercator (UTM) projection with the Everest India 1956 ellipsoid and the Indian (India Nepal) datum. All GIS analyses and operations were performed using the Integrated Land and Water Information System (ILWIS) software, version 3.2 (ILWIS 2001).

(b) IRS-P6 (Indian Remote Sensing Satellite) image from the Linear Imaging Self-Scanning (LISS)-III sensor (February 8, 2004) at 1:50,000 scale, obtained from the Indian Institute of Remote Sensing (IIRS), Dehradun, India. The image was geocoded, and a false color composite (FCC) was generated from bands 2, 3 and 4. The FCC was used to prepare the land use/land cover map and to complement the geomorphologic map of the area.

(c) Soil map of Udaipur district at 1:250,000 scale from the National Bureau of Soil Survey and Land Use Planning, Regional Research Station, Udaipur (Jain et al. 2005). The map was digitized and classified into soil classes based on soil texture.

(d) Digital Elevation Model (DEM) of the catchment, extracted from Shuttle Radar Topography Mission (SRTM) data (elevations acquired in February 2000) at 3 arc-second resolution (about 90-m pixel size), downloaded from the US Geological Survey website (USGS 2004). The SRTM DEM dataset was ‘finished’, meaning that ‘sink holes’ had been removed. The DEM was used to prepare the topographic elevation and slope maps of the catchment.

(e) Monthly groundwater level data for 50 sites from May 2006 to July 2009, recorded with a TLC (temperature level conductivity) meter made by Solinst, Canada; data for November 2006, February 2007 and October 2007 are missing, leaving 36 monthly records. Groundwater levels were measured to the nearest 1 mm. The location coordinates of the monitoring sites were recorded with a Trimble Global Positioning System receiver and expressed in the UTM coordinate system; the sites are shown in Fig. 1.

(f) Well yields and aquifer parameters (transmissivity and specific yield), determined by conducting pumping tests at 18 selected sites in the catchment. Transmissivity values were interpolated with the inverse distance weighted technique to prepare a thematic raster layer for delineating groundwater potential zones, while specific yield values were used to estimate net recharge with the water level fluctuation technique. The well-yield database was used to evaluate the groundwater potential zone map.
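The inverse distance weighted interpolation used for the transmissivity and specific yield layers can be sketched as follows. This is a minimal illustration, not the ILWIS implementation: the power parameter of 2, plain Euclidean distance on projected (UTM) coordinates and the function name `idw` are assumptions, and the site values are hypothetical.

```python
import numpy as np


def idw(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimates at query points.

    xy_known : (n, 2) projected coordinates of sampled sites (e.g. UTM, m)
    values   : (n,) values at those sites, e.g. transmissivity (m2/day)
    xy_query : (m, 2) coordinates where estimates are wanted (cell centres)
    power    : distance exponent; 2 is an assumed, commonly used default
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)

    # Distance from every query point (rows) to every sampled site (columns)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)

    estimates = np.empty(len(xy_query))
    for i, row in enumerate(dist):
        exact = row == 0.0
        if exact.any():
            # A query point lying on a sampled site takes that site's value
            estimates[i] = values[exact][0]
        else:
            w = 1.0 / row**power
            estimates[i] = np.sum(w * values) / np.sum(w)
    return estimates


# Hypothetical example: three pumping-test sites and two query points
sites = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
transmissivity = [100.0, 300.0, 600.0]
print(idw(sites, transmissivity, [(0.0, 0.0), (500.0, 500.0)]))
```

The second query point is equidistant from all three sites, so its estimate is their plain mean; in a raster workflow the query points would be every cell centre of the catchment grid.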

Geomorphologic features of the catchment were digitized from the toposheets and subsequently complemented with the IRS-P6 image. Geomorphology, the science of landforms (Thornbury 1986), has direct and indirect effects on the occurrence and movement of groundwater (Brown 1996). Topographic elevation and slope were derived separately from the SRTM DEM. The slope map of the catchment was generated in ILWIS by applying two differential gradient filters \(\left( {\frac{\partial }{\partial x}{\text{ and }}\frac{\partial }{\partial y}} \right)\) to the DEM.
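Deriving percent slope from the two gradient components can be illustrated with central differences. This sketch assumes a regular grid of square 90-m cells and uses `numpy.gradient` as a stand-in; the exact filter kernels applied by ILWIS are not reproduced, and the function name and DEM values are hypothetical.

```python
import numpy as np


def slope_percent(dem, cell_size=90.0):
    """Slope (%) of a gridded DEM from x and y elevation gradients.

    dem       : 2-D array of elevations (m) on a regular grid
    cell_size : pixel size (m); about 90 m for 3 arc-second SRTM (assumed square)
    """
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns d/d(axis 0) then d/d(axis 1): here dz/dy and dz/dx.
    # Central differences inside the grid, one-sided differences on its edges.
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    # Slope as rise over run, expressed in percent
    return 100.0 * np.hypot(dz_dx, dz_dy)


# Hypothetical 4 x 5 DEM rising 9 m per 90-m column: a uniform 10 % slope
dem = [[0.0, 9.0, 18.0, 27.0, 36.0]] * 4
print(slope_percent(dem))
```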

Drainage density is defined as the closeness of spacing of surface water channels in an area (Haan 2002). To prepare the drainage density map of the catchment, drainage lines were first extracted and digitized from the SRTM DEM and checked against remote sensing imagery and the 1:50,000-scale toposheets of the catchment. Second, sub-watersheds were delineated from the raster DEM using the standard ILWIS procedure. Cells likely to form part of the stream network have high flow accumulation values, while cells near the catchment boundaries, or where overland flow dominates, have relatively low flow accumulation values. Stream branches are created according to a user-specified threshold on the number of grid cells contributing to each branch. By default, the downstream end of each stream branch is taken as a sub-watershed outlet. The sub-watersheds were then delineated by tracing the overland flow direction from each grid cell until either an outlet cell or the edge of the DEM grid was reached. Finally, the drainage density (\(D_{\text{d}}\)) was computed as given by Haan (2002):

$$D_{\text{d}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {{\text{SL}}_{i} } \right)} }}{A}$$
(1)

where \(D_{\text{d}}\) is the drainage density (km km−2), \(\text{SL}_{i}\) is the length (km) of the ith stream within area A, A is the area (km2), and n is the total number of streams within area A.
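Eq. (1) and the three drainage density classes used for the drainage density map reduce to simple arithmetic. The sketch below is illustrative only: in practice the stream lengths would come from the DEM-derived drainage network, the function names and sub-watershed values are hypothetical, and assigning values of exactly 0.8 or 1.2 km km−2 to the middle class is an assumption.

```python
def drainage_density(stream_lengths_km, area_km2):
    """Eq. (1): sum of stream lengths within an area divided by the area (km km-2)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return sum(stream_lengths_km) / area_km2


def drainage_density_class(dd):
    """Map classes <0.8, 0.8-1.2 and >1.2 km km-2 (boundary handling assumed)."""
    if dd < 0.8:
        return "<0.8"
    if dd <= 1.2:
        return "0.8-1.2"
    return ">1.2"


# Hypothetical sub-watershed: three stream segments in a 5 km2 area
dd = drainage_density([2.5, 1.5, 1.0], 5.0)
print(dd, drainage_density_class(dd))
```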

The surface waterbodies in the catchment were digitized from the toposheets and checked against the RS imagery and the DEM.

Furthermore, 3-year (2006–2008) pre-monsoon and post-monsoon groundwater level data for the 50 sites were subjected to geostatistical modeling to develop spatially distributed groundwater level maps for the catchment. First, separate experimental variograms were computed for each seasonal groundwater level dataset; second, theoretical geostatistical models were fitted to the experimental variograms by adjusting the model parameters (nugget, sill and range); and finally, the best-fit model was selected based on goodness-of-fit criteria. The exponential model was judged the best fit for developing spatial maps of the pre- and post-monsoon seasons of all 3 years, and the 3-year average was then computed for each season. Specific yield and transmissivity of the aquifer were determined by analyzing the pumping test data, and their spatial distributions were obtained from the point estimates at the 18 sites using the inverse distance weighted technique. The water level fluctuation method was then applied to the GIS-based raster maps of the pre- and post-monsoon seasons to estimate the mean annual net recharge.
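The variogram steps described above (computing an experimental variogram, then fitting a theoretical model) can be sketched for the exponential model. This is a minimal illustration under stated assumptions: it uses the classical Matheron estimator with fixed-width lag bins and SciPy's `curve_fit` for the least-squares fit, neither of which the text specifies; the exponential model is written as γ(h) = c0 + c(1 − exp(−h/a)), whose practical range is about 3a; and all function names are hypothetical. Kriging itself is not shown.

```python
import numpy as np
from scipy.optimize import curve_fit


def experimental_variogram(xy, z, lag_width, n_lags):
    """Classical (Matheron) semivariance per distance bin: half the mean
    squared difference of all site pairs whose separation falls in the bin."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # each site pair once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)      # pair separation distances
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    lags, gammas = [], []
    for k in range(n_lags):
        in_bin = (h >= k * lag_width) & (h < (k + 1) * lag_width)
        if in_bin.any():                           # skip empty bins
            lags.append(h[in_bin].mean())
            gammas.append(half_sq[in_bin].mean())
    return np.array(lags), np.array(gammas)


def exponential_model(h, nugget, partial_sill, a):
    """Exponential variogram; sill = nugget + partial_sill, practical range ~ 3a."""
    return nugget + partial_sill * (1.0 - np.exp(-h / a))


def fit_exponential(lags, gammas):
    """Least-squares fit of nugget, partial sill and a, constrained to be non-negative."""
    p0 = [0.5 * gammas.min(), gammas.max(), lags.max() / 3.0]
    params, _ = curve_fit(exponential_model, lags, gammas, p0=p0,
                          bounds=([0.0, 0.0, 1e-9], [np.inf, np.inf, np.inf]))
    return params
```

In use, `experimental_variogram` would be applied to the seasonal groundwater depths at the monitoring sites, and competing models (spherical, Gaussian and so on) would be fitted the same way and compared by goodness of fit.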

Multispectral supervised classification of the IRS-P6 (LISS-III) satellite imagery was performed to prepare a land use/land cover (LULC) map of the catchment. The LULC map was developed in three stages: training, classification and accuracy assessment (Machiwal et al. 2010).
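For the accuracy assessment stage, overall accuracy and Cohen's kappa are common summaries of a classification confusion matrix. The text does not state which measures were used, so the following is only a sketch under that assumption; the function name and the example matrix are hypothetical.

```python
import numpy as np


def accuracy_assessment(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are taken as reference (ground-truth) classes and columns as mapped
    classes; both measures are unchanged if the convention is swapped.
    """
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    observed = np.trace(c) / n                               # overall accuracy
    expected = np.sum(c.sum(axis=0) * c.sum(axis=1)) / n**2  # chance agreement
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class check: 90 of 100 reference pixels correctly labelled
print(accuracy_assessment([[45, 5], [5, 45]]))
```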

Assignment and normalization of weights

In this study, international expert hydrogeologists were interviewed for their opinions on the relative importance of the hydrologic variables influencing groundwater occurrence. Most past studies have not sought experts' opinions when selecting weights for different themes and their features, although such an approach is strongly recommended for RS and GIS-based groundwater prospecting studies (Saaty 1980). Local subject-matter specialists were also invited to give their views on the relative importance of one theme over another and of their features. The experts assigned weights to the themes and their features on Saaty's scale ranging from 0 to 9 (Saaty 1980), with the maximum weights given to the themes/features of highest groundwater potential and the minimum weights to those of lowest potential. The weights were then finalized based on the experts' opinions and personal experience/knowledge of the study catchment. The final weights of the themes and their features were normalized using the analytic hierarchy process (AHP), an MCDM technique proposed by Saaty. AHP normalizes the assigned weights with the eigenvector technique, which reduces the bias/subjectivity in the assigned weights. The normalized weights were checked for consistency by computing the consistency ratio for the themes and their features. Saaty (1980) suggested that the assigned weights are consistent only when the consistency ratio is within 10 %; otherwise, the weights should be reconsidered to remove the inconsistency. The consistency ratio was computed in three steps: (1) the principal eigenvalue (\(\lambda_{ \hbox{max} }\)) was computed by the eigenvector technique; (2) the consistency index C was calculated as (Saaty 1980):

$$C = \frac{{\lambda_{ \hbox{max} } - n}}{n - 1}$$
(2)

where n is the number of factors; and (3) the consistency ratio was calculated from the following expression:

$$C_{\text{ratio}} = \frac{C}{C_{\text{random}}}$$
(3)

where \(C_{\text{random}}\) is the random consistency index, whose values were obtained from the standard table provided in Saaty (1980).
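The AHP steps (principal-eigenvector weights, Eqs. (2) and (3)) can be sketched with NumPy. This is a minimal illustration, not the authors' worksheet: the function name and example matrices are hypothetical, the pairwise matrix is assumed reciprocal with positive entries, and the random index values are the ones commonly tabulated from Saaty for matrices of order 1 to 11.

```python
import numpy as np

# Random consistency index (RI) by matrix order n, as commonly tabulated from Saaty
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}


def ahp_weights(pairwise):
    """Principal-eigenvector weights and consistency ratio of a reciprocal
    pairwise comparison matrix (a_ij = importance of i relative to j)."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))          # index of principal eigenvalue
    lambda_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                           # normalized weights sum to 1
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0    # Eq. (2)
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0                      # Eq. (3)
    return w, lambda_max, ci, cr


# Hypothetical 3-theme matrix; the weights pass Saaty's rule if cr <= 0.10
w, lam, ci, cr = ahp_weights([[1, 3, 5], [1/3, 1, 3], [1/5, 1/3, 1]])
print(w.round(3), round(cr, 3))
```

For a perfectly consistent matrix (a_ij = w_i / w_j) the principal eigenvalue equals n, so C and the consistency ratio are zero; any inconsistency pushes the eigenvalue above n.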

Delineation of groundwater potential zones by knowledge-driven method

The methodology for delineating groundwater potential zones using remote sensing, GIS and MCDM techniques is illustrated in Fig. 2. The zones were delineated with the weighted linear combination (WLC) technique suggested by Malczewski (1999): all 11 raster thematic layers were integrated in a GIS environment to develop the groundwater potential index (GPI) map using Eq. (4).

Fig. 2 Flowchart for delineating groundwater potential zones by knowledge-driven method

$${\text{GPI}} = \sum\limits_{t = 1}^{m} {\sum\limits_{f = 1}^{n} {\left( {w_{t} \times w_{f} } \right)} }$$
(4)

where GPI is the groundwater potential index, \(w_{t}\) is the normalized weight of the tth theme, \(w_{f}\) is the normalized weight of the fth feature of the theme, m is the total number of themes, and n is the total number of features of a theme.
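A pixel-wise reading of Eq. (4) can be sketched as follows. Here we assume that, in each pixel, \(w_f\) is the normalized weight of the feature of theme t that occurs in that pixel, so each theme contributes one product \(w_t \times w_f\); the dictionary-based layout, class codes, weights and function name are all hypothetical.

```python
import numpy as np


def gpi_wlc(theme_rasters, theme_weights, feature_weights):
    """Pixel-wise weighted linear combination of classified thematic rasters.

    theme_rasters   : {theme: 2-D integer array of feature-class codes}
    theme_weights   : {theme: normalized theme weight w_t}
    feature_weights : {theme: {class code: normalized feature weight w_f}}
    All rasters must share the same shape and grid alignment.
    """
    gpi = None
    for theme, raster in theme_rasters.items():
        codes = np.asarray(raster)
        lookup = feature_weights[theme]
        # Replace each pixel's class code with that feature's normalized weight
        wf = np.vectorize(lookup.__getitem__, otypes=[float])(codes)
        term = theme_weights[theme] * wf             # w_t x w_f for this theme
        gpi = term if gpi is None else gpi + term    # sum over themes
    return gpi


# Hypothetical 2 x 2 example with two themes (not the study's weights)
rasters = {"slope": [[1, 2], [2, 1]], "soil": [[1, 1], [2, 2]]}
w_theme = {"slope": 0.6, "soil": 0.4}
w_feat = {"slope": {1: 0.7, 2: 0.3}, "soil": {1: 0.6, 2: 0.4}}
print(gpi_wlc(rasters, w_theme, w_feat))
```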

Verification of the delineated groundwater potential zones

The groundwater potential zone map was verified against well yields at 18 sites in the catchment by comparing the mean discharges of the existing wells in the different potential zones. The map was further verified by plotting, for each potential zone, the cumulative frequency of wells against their well yields.
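The two verification checks (mean yield per zone, and cumulative frequency of wells against yield) can be sketched in plain Python. Zone labels, yield values and the function name are hypothetical, and the cumulative frequency is taken as the ranked cumulative percentage of a zone's wells (an assumption; tied yields would need a rule of their own).

```python
from collections import defaultdict


def yield_summary(zones, yields):
    """Mean well yield per potential zone plus the points of a cumulative
    frequency curve (yield, ranked cumulative % of that zone's wells)."""
    by_zone = defaultdict(list)
    for zone, q in zip(zones, yields):
        by_zone[zone].append(float(q))
    summary = {}
    for zone, qs in by_zone.items():
        qs.sort()
        n = len(qs)
        curve = [(q, 100.0 * (i + 1) / n) for i, q in enumerate(qs)]
        summary[zone] = {"mean_yield": sum(qs) / n, "curve": curve}
    return summary


# Hypothetical wells: zone labels and yields in arbitrary but consistent units
print(yield_summary(["good", "good", "poor", "poor"], [10, 20, 2, 4]))
```

If the zoning is sound, the curves of higher-potential zones should lie to the right of (i.e., reach a given cumulative percentage at higher yields than) those of lower-potential zones.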

Development of groundwater potential index by data-driven technique

In the data-driven approach, the groundwater potential index (GPI) values computed by the knowledge-driven approach were used as input data after being verified. In the MLR analysis, natural conditions are transformed on a logical basis into numerical values according to how favorable they are for groundwater occurrence. The known GPI values at 47 of the 50 sites shown in Fig. 1, together with the values of the quantitative variables at those sites (pre- and post-monsoon groundwater levels, net recharge, topographic elevation, slope, proximity to surface waterbodies, transmissivity and drainage density), were used for the MLR analysis. For the remaining thematic layers, i.e., geomorphology, soil and LULC, the features present at the 47 sites were digitized and, with the help of GIS, converted into comparative numerical values according to their control on groundwater occurrence. The MLR model, expressing GPI in terms of the 11 thematic layers, is as follows:

$$\begin{gathered} {\text{GPI}}_{\text{MLR}} = \alpha_{ 1} {\text{GM}}\;{ + }\;\alpha_{ 2} {\text{SO}}\;{ + }\;\alpha_{ 3} {\text{TE}}\;{ + }\;\alpha_{ 4} {\text{SL}}\;{ + }\;\alpha_{ 5} {\text{DD}}\;{ + }\;\alpha_{ 6} {\text{SWB}}\;{ + }\;\alpha_{ 7} {\text{GD}}_{\text{pre}} \;{ + }\; \hfill \\ \quad \quad \;\;\alpha_{ 8} {\text{GD}}_{\text{post}} \;{ + }\;\alpha_{ 9} {\text{NR}}\;{ + }\;\alpha_{ 1 0} {\text{TR}}\;{ + }\;\alpha_{ 1 1} {\text{LU}}\;{ + }\;\alpha_{ 1 2} \hfill \\ \end{gathered}$$
(5)

where \(\text{GPI}_{\text{MLR}}\) is the groundwater potential index based on the MLR technique, GM is geomorphology, SO is soil, TE is topographic elevation, SL is slope, DD is drainage density, SWB is proximity to surface waterbodies, \(\text{GD}_{\text{pre}}\) and \(\text{GD}_{\text{post}}\) are the depths to groundwater in the pre- and post-monsoon seasons, respectively, NR is net recharge, TR is transmissivity, and LU is land use/land cover. \(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{10}, \alpha_{11}\) are the regression coefficients (relative weights) of the respective thematic layers computed by the MLR technique, and \(\alpha_{12}\) is the constant of the regression model.
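Fitting Eq. (5) is an ordinary least-squares problem with a constant term. The sketch below uses `numpy.linalg.lstsq`; the function names are hypothetical, and the input matrix is assumed to hold one row per site with the 11 theme values (categorical themes already converted to numbers) in the order of Eq. (5). At least 12 sites are needed to estimate the 12 coefficients; the study used 47.

```python
import numpy as np

# Order of the explanatory layers in Eq. (5)
THEMES = ["GM", "SO", "TE", "SL", "DD", "SWB", "GDpre", "GDpost", "NR", "TR", "LU"]


def fit_mlr(X, gpi):
    """Ordinary least-squares fit of Eq. (5).

    X   : (n_sites, 11) theme values per site, columns ordered as THEMES
    gpi : (n_sites,) knowledge-driven GPI at the same sites
    Returns (alpha_1..alpha_11, alpha_12).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(gpi, dtype=float)
    if X.shape[1] != len(THEMES):
        raise ValueError("expected one column per theme")
    design = np.column_stack([X, np.ones(len(y))])   # column of ones -> alpha_12
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_mlr(X, alphas, constant):
    """GPI_MLR for new rows (e.g. every raster cell or the well-yield sites)."""
    return np.asarray(X, dtype=float) @ alphas + constant
```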

The computed coefficients were used to generate a raster map of the groundwater potential index for the entire catchment in a GIS environment. The \(\text{GPI}_{\text{MLR}}\) map was classified so that the number of delineated potential zones and their respective areal extents match those of the knowledge-driven GPI map, which allows the GPI maps prepared by the two approaches to be compared.
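One way to classify the \(\text{GPI}_{\text{MLR}}\) raster so that its zone areas match the knowledge-driven map is to place the class boundaries at quantiles of the pixel values. The text does not say how the thresholds were chosen, so this quantile approach, the function name and the example values are assumptions; tied pixel values can make the match inexact.

```python
import numpy as np


def classify_to_match_areas(gpi_mlr, area_fractions):
    """Zone a GPI_MLR raster so each zone's pixel share matches a target.

    gpi_mlr        : array of GPI_MLR values (no NoData cells assumed)
    area_fractions : share of the catchment in each knowledge-driven zone,
                     ordered from lowest to highest potential, summing to 1
    Returns integer zone codes 0..k-1 (0 = lowest potential).
    """
    values = np.asarray(gpi_mlr, dtype=float)
    # Interior cumulative shares give the quantiles used as class boundaries
    cumulative = np.cumsum(area_fractions)[:-1]
    thresholds = np.quantile(values.ravel(), cumulative)
    return np.digitize(values, thresholds)


# Hypothetical 10-pixel raster split 20 / 50 / 30 % like a reference map
zones = classify_to_match_areas(np.arange(10.0), [0.2, 0.5, 0.3])
print(np.bincount(zones))
```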

The procedure for the development of groundwater potential index map using data-driven approach is illustrated in Fig. 3.

Fig. 3 Flowchart for developing groundwater potential index using data-driven approach

Verification of the data-driven approach

The MLR technique is based on the least-squares method: the coefficients and the constant are chosen jointly so as to minimize the sum of squared errors. The multiple correlation coefficient of the best-fit combination of coefficients indicates whether the coefficients are appropriate. In addition, the coefficients computed by MLR for the 11 thematic layers were used to back-estimate the groundwater potential index (\(\text{GPI}_{\text{MLR}}\)) for the 47 sites whose data were used to develop the model. The \(\text{GPI}_{\text{MLR}}\) estimates were verified both graphically, by plotting the original GPI against \(\text{GPI}_{\text{MLR}}\) for the 47 sites together with the 1:1 line, and statistically, by computing the coefficient of determination (\(R^2\)). Furthermore, \(\text{GPI}_{\text{MLR}}\) estimates were calculated for 18 additional sites where well-yield data were available, and the same graphical and statistical performance indicators were used to evaluate and further confirm the results.
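The coefficient of determination has two common definitions, and they differ when estimates are biased. The sketch below shows both, since the text does not say which was used: scatter about the 1:1 line (1 − SS_res/SS_tot) and the squared Pearson correlation. For the in-sample least-squares fit on the 47 sites with a constant term the two coincide; for the 18 additional sites they need not. Function names and values are hypothetical.

```python
import numpy as np


def r2_about_1to1(observed, estimated):
    """R^2 as 1 - SS_res / SS_tot, measuring scatter about the 1:1 line."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(estimated, dtype=float)
    ss_res = np.sum((o - e) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r2_pearson(observed, estimated):
    """R^2 as the squared Pearson correlation between observed and estimated."""
    return float(np.corrcoef(observed, estimated)[0, 1] ** 2)


# Hypothetical biased estimates: perfectly correlated but offset by +1
gpi = [1.0, 2.0, 3.0]
gpi_mlr = [2.0, 3.0, 4.0]
print(r2_pearson(gpi, gpi_mlr), r2_about_1to1(gpi, gpi_mlr))
```

In this example the squared correlation is 1 while the 1:1-line measure is negative, which is why the plot against the 1:1 line is a useful companion to the statistic.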

Identifying major factors controlling groundwater occurrence

Understanding the major exogenous and endogenous factors responsible for groundwater occurrence in the delineated potential zones is essential for augmenting and managing groundwater resources. Multivariate statistical techniques such as principal component analysis (PCA) are very effective in evaluating the major factors influencing groundwater occurrence in hard-rock aquifer systems (Machiwal et al. 2011a). Thus, PCA was applied to the values of the 11 thematic variables, together with the corresponding GPI values, at the 47 sites where groundwater level data and information on all themes were available, in order to identify the most influential variables. PCA was applied separately to the standardized datasets of the sites falling in each groundwater potential zone. PCA is a multivariate statistical tool for discriminating which variables (or parameters) are more important in a group of samples and for discriminating groups or families of statistical units that could potentially be related (Dillon and Goldstein 1984). PCA was performed on the correlation matrix of these standardized variables, and principal components with eigenvalues greater than 1 were retained to explain the sources of variance in the data.
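The PCA step (correlation matrix, eigen-decomposition, retention of components with eigenvalue > 1) can be sketched with NumPy. This is a minimal illustration, not the software used in the study: the function name is hypothetical, the input is assumed to be a sites × variables table for one potential zone, and loadings are scaled as eigenvector × √eigenvalue, one common convention. Component signs are arbitrary.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on the correlation matrix of a (sites x variables) table.

    Returns all eigenvalues (descending), loadings of the retained components
    (eigenvalue > 1, Kaiser criterion) and the fraction of variance explained.
    """
    x = np.asarray(data, dtype=float)
    # Correlation matrix = covariance matrix of the standardized variables
    r = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)          # r is symmetric
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    # Loadings scaled as eigenvector * sqrt(eigenvalue): variable-PC correlations
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals / eigvals.sum()
    return eigvals, loadings, explained
```

Variables with large absolute loadings on the retained components would then be read as the dominant factors for that zone.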

Results and discussion

Thematic layers of hydrologic/hydrogeologic variables and their features

Geomorphology

On the basis of physiographic characteristics, the landforms of the study area can be classified into six units: deep buried pediment, inselberg, residual hill, shallow buried pediment, structural hill and waterbody (Fig. 4). A pediment is a gently inclined slope of transportation and/or erosion that truncates rock and connects eroding slopes or scarps to areas of sediment deposition at lower levels (Oberlander 1989). Pediments covered by alluvial or weathered materials are termed ‘buried pediments’ and are considered good to moderate sources of groundwater. Deep buried pediment, covering 59.69 km2, occurs mainly in the southeast portion of the study area, with small portions in the northwest and west along the river courses (Fig. 4). As these pediments are buried under weathered surface material, they may be favorable for groundwater storage. An inselberg is an isolated, massive hill rising abruptly above the surrounding plains; it forms a runoff zone and a barrier to groundwater movement, and its recharge potential and groundwater prospects are lower than those of deep buried pediment. Inselbergs are tiny outcrops present at a few places, covering only 1.01 km2. Residual hills occur in small patches, encompassing about 6.2 km2. Shallow buried pediment covers the largest proportion of the catchment, i.e., 56 % (187.58 km2). As the study area is surrounded by the Aravalli hills, structural hills, which have the lowest chance of groundwater occurrence, occupy 80.16 km2 along the edges of the catchment boundary. Surface waterbodies that store adequate water for a few months of the year offer opportunities for substantial recharge through their beds. Thus, the waterbodies, extending over 13.67 km2, are likely sources of significant recharge and prospective groundwater resources.

Fig. 4 Geomorphology map of the catchment

Soil texture

The soil thematic layer (Fig. 5) reveals six main soil texture classes, viz., coarse loamy sand, coarse to fine loam, fine loam, fine loamy rock outcrop, loamy skeletal rock outcrop and skeletal fine loam. Fig. 5 shows that fine loam and coarse loamy sand dominate most of the study area (270 km2), with the other soil types covering the remaining 22 %. Coarse loamy sand, followed by coarse to fine loam and fine loam, allows relatively high infiltration and percolation rates owing to its large macropores compared with the other soil types, which may favor recharge. Soil types associated with rock outcrops may restrict the entry of surface water into subsurface zones, resulting in relatively low groundwater prospects in those parts of the catchment.

Fig. 5 Soil map of the study area

Topographic elevation and slope

The topographic elevation map derived from the SRTM DEM is classified into five classes (Fig. 6a). Land elevations are highest (>600 m MSL) near the catchment boundary and lowest in the southeast portion of the study area, where the catchment outlet, i.e., Udaisagar lake, lies. Over the major portion (154.37 km2) of the catchment, the land is elevated 600–700 m MSL (Table 1). The relatively high elevations in the northeast portion and close to the boundary induce high runoff, and thus less opportunity for rainfall infiltration. By contrast, the relatively low-lying southeast portion receives all the runoff generated within the catchment and therefore has ample scope for recharging and augmenting groundwater resources.

Fig. 6 Maps showing (a) topographic elevation and (b) slope in the catchment

Table 1 Assigned and normalized weights of different features of 11 thematic layers used for groundwater potential zoning

The topographic slope map of the catchment (in %) is classified into six classes (Fig. 6b). Steep slopes (>15 %) prevail over 78.14 km2 near the boundary and around the structural and residual hills of the catchment (Table 1). Steep slopes generate large amounts of runoff that quickly leave the area because of the high flow velocity, so there is little scope for recharge on steeply sloping land. In contrast, slopes are less than 4 % over 164.21 km2 of nearly flat terrain enclosed by the catchment boundaries (Fig. 6b), where the low velocity of runoff enhances the possibility of rainfall recharge.

Drainage density

The drainage density map of the study catchment (Fig. 7) is classified into three classes: <0.8, 0.8–1.2 and >1.2 km km−2. High drainage density in an area means that the generated runoff escapes easily and quickly, leaving less chance for infiltration and groundwater recharge; conversely, low drainage density allows more recharge. Thus, areas with low drainage density may have conditions favorable for groundwater occurrence. Over the major portion of the catchment (149.66 km2), the drainage density ranges from 0.8 to 1.2 km km−2 (Table 1).

Fig. 7 Drainage density map of the catchment

Proximity to surface waterbodies

Proximity to surface waterbodies was classified into three classes, (1) <100 m, (2) 100–500 m and (3) >500 m, using suitable buffer distances (Fig. 8). Because the surface waterbodies are small relative to the whole study catchment, the <100 m and 100–500 m classes cover only 8 and 13 % of the area, respectively (Table 1).

Fig. 8 Proximity to waterbodies in the area

Pre- and post-monsoon groundwater levels

The kriged mean pre-monsoon groundwater depth varies from 2 to more than 23 m below ground surface (bgs) (Fig. 9a), whereas the kriged mean post-monsoon groundwater depth ranges from 2 to 14 m bgs (Fig. 9b). Over the major portion of the catchment (243.87 km2, or 70 % of the area), the mean pre-monsoon groundwater depth is 8–14 m bgs (Fig. 9a), while over 290.57 km2 (83 %) the mean post-monsoon depth is 2–8 m bgs (Fig. 9b). Mean groundwater levels remain shallow in the central and western portions of the catchment, and relatively deep levels occur at and near the catchment boundaries. Groundwater levels are also more variable in the pre-monsoon season (8 classes) than in the post-monsoon season (4 classes).

Fig. 9 Spatially distributed kriged mean groundwater depth during (a) pre-monsoon and (b) post-monsoon seasons

Net recharge

The mean annual net groundwater recharge computed by the water level fluctuation technique varies from less than 5 to more than 40 cm (Fig. 10). The catchment is divided into six recharge classes: (1) <5 cm, (2) 5–10 cm, (3) 10–20 cm, (4) 20–30 cm, (5) 30–40 cm and (6) >40 cm. Fig. 10 shows that the net recharge potential is high (>40 cm) in the northern portion of the catchment, while the lowest recharge potential (<5 cm) occurs in the southern and southeast portions. The mean recharge potential is relatively high (10 to more than 40 cm) over 151.76 km2 (44 % of the area) in the upper catchment, compared with 196.58 km2 (56 %) in the lower catchment, where it is less than 10 cm.
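The water level fluctuation calculation and the six recharge classes can be sketched per pixel. This assumes the simplest form of the method, net recharge = specific yield × (pre-monsoon depth − post-monsoon depth), with depths in m bgs; the text does not give the exact formula, and the function names and pixel values are hypothetical.

```python
import numpy as np

# Class edges (cm) for the six recharge zones: <5, 5-10, 10-20, 20-30, 30-40, >40
RECHARGE_EDGES_CM = [5.0, 10.0, 20.0, 30.0, 40.0]


def wlf_net_recharge_cm(pre_depth_m, post_depth_m, specific_yield):
    """Net recharge (cm) = Sy x water-table rise, where the rise is the
    pre-monsoon depth minus the post-monsoon depth (both m below ground)."""
    rise_m = np.asarray(pre_depth_m, dtype=float) - np.asarray(post_depth_m, dtype=float)
    return 100.0 * np.asarray(specific_yield, dtype=float) * rise_m


def recharge_class(recharge_cm):
    """Zone index 0..5 (0 = <5 cm, 5 = >40 cm); a value on an edge joins the upper class."""
    return np.digitize(recharge_cm, RECHARGE_EDGES_CM)


# Hypothetical pixels: depths fall from 12 to 6 m and from 9 to 8 m, Sy = 0.03
r = wlf_net_recharge_cm([12.0, 9.0], [6.0, 8.0], 0.03)
print(r, recharge_class(r))
```

Applied to the averaged pre- and post-monsoon rasters and the interpolated specific yield layer, the same arithmetic yields a recharge raster that can be zoned into the six classes above.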

Fig. 10

Net recharge distribution in the catchment

Transmissivity

Fig. 11 shows the classified spatial distribution of transmissivity computed from the pumping test data. Transmissivity values, ranging from 70 to more than 600 m² day⁻¹, are grouped into five classes. The hard-rock aquifer is highly transmissive in the northern part of the catchment, where transmissivity exceeds 600 m² day⁻¹. Over most of the catchment (206.33 km², 59 % of the area), in the southern, southeastern and southwestern parts, transmissivity is relatively low, between 150 and 300 m² day⁻¹ (Table 1).
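This section does not state which method was used to analyze the pumping tests. One common choice for hard-rock wells is the Cooper–Jacob straight-line method, T = 2.303 Q / (4π Δs), where Δs is the drawdown per log cycle of time. The sketch below applies that method as an assumption, not as the study's procedure. It is valid only for late-time data, and the discharge and drawdown values are hypothetical.

```python
import numpy as np

def cooper_jacob_transmissivity(time_min, drawdown_m, q_m3_per_day):
    """Cooper-Jacob straight-line estimate T = 2.303 Q / (4 pi ds), in m^2/day.

    ds is the drawdown change over one log10 cycle of time, i.e. the slope of a
    straight line fitted to drawdown vs log10(time). Only the number of log
    cycles matters, so time may be in any unit; Q must be in m^3/day.
    The constant 2.303 is ln(10), used here exactly.
    """
    slope, _intercept = np.polyfit(np.log10(np.asarray(time_min, dtype=float)),
                                   np.asarray(drawdown_m, dtype=float), 1)
    return np.log(10.0) * q_m3_per_day / (4.0 * np.pi * slope)

# Hypothetical late-time data: drawdown rises 0.5 m per log cycle at Q = 500 m^3/day
t = np.array([10.0, 30.0, 100.0, 300.0, 1000.0])
s = 1.2 + 0.5 * np.log10(t)
print(round(cooper_jacob_transmissivity(t, s, 500.0), 1))
```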

Fig. 11

Transmissivity distribution map of the catchment

Land use/land cover

The catchment contains five land use/land cover (LULC) classes (Fig. 12). Forest land covers 80 km² (23 % of the area) and occurs mainly near the catchment boundary, where structural hills are present. Waterbodies, which have high potential for underlying groundwater, have the smallest extent in the catchment: 6.15 km² (2 %). Agricultural land, with moderate groundwater potential, lies close to waterbodies such as rivers and lakes and covers 68.79 km² (20 %). Most of the catchment (170.36 km², 49 %) is rangeland, which is mostly confined between agricultural fields and forests.

Fig. 12

Land use/land cover map of the catchment

Normalized weights for themes and their features

The relative weights of the 11 thematic layers and their features were assigned on Saaty’s scale and normalized using the MCDM technique AHP with the eigenvector method. The assigned weights for the 11 themes are given in Table 2, and the normalization procedure and normalized weights for the thematic layers are given in Table 3. The weights assigned to the features/classes of each of the 11 thematic layers were normalized in the same way using Saaty’s AHP technique and are summarized in Table 1. The consistency ratios of the weights assigned to the 11 themes and their features were all within 10 %, indicating that the pairwise judgments were acceptably consistent.
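The eigenvector normalization and the consistency check can be sketched as follows. The normalized weights are the principal eigenvector of the pairwise comparison matrix, CI = (λmax − n)/(n − 1), and CR = CI/RI, where RI is Saaty's random index. The 3 × 3 comparison matrix is hypothetical and is not taken from Table 2.

```python
import numpy as np

# Saaty's random consistency index (RI) by matrix order n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}

def ahp_weights(pairwise):
    """Return normalized weights (principal eigenvector), lambda_max and CR.

    pairwise[i][j] is the Saaty-scale importance of criterion i over j, with
    pairwise[j][i] = 1 / pairwise[i][j].
    """
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))              # principal (Perron) eigenvalue
    lam_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                                # weights sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0    # consistency index
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, lam_max, cr

# Hypothetical comparison of three themes
m = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
weights, lam, cr = ahp_weights(m)
print(np.round(weights, 3), round(lam, 3), round(cr, 3))
```

A CR below 0.1 (10 %) is the usual acceptance threshold. A perfectly consistent matrix (a_ij = w_i / w_j) gives λmax = n and CR = 0.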

Table 2 Assigned relative weights to different thematic layers on Saaty’s scale
Table 3 Normalization of the assigned weights to 11 thematic layers by eigenvector technique

Knowledge-driven groundwater potential zoning

The groundwater potential zone map of the catchment produced by the GIS-based WLC technique (Fig. 13a) shows three distinct zones of good, moderate and poor groundwater potential. The good groundwater potential zone covers 90.94 km² (26 %) and mainly exists in deep buried pediments and in agricultural land around surface waterbodies. This indicates sufficient groundwater availability and a hard-rock terrain suitable for groundwater storage, so new productive wells drilled in this zone should have a low risk of failure. The poor groundwater potential zone extends over 122.36 km² (35 %) and mainly covers the area adjoining the catchment boundary, where structural hills and forests occur at relatively high elevations and on steep slopes. The moderate groundwater potential zone is the largest, covering 135 km² (39 %), and lies between the good and poor zones; a large part of it occurs in the northern portion of the area.
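A weighted linear combination computes, for each cell, the sum of each theme's normalized weight multiplied by the normalized weight of the feature present in that cell. The sketch below assumes this generic form (the exact expression of Eq. (5) is not reproduced in this section) and uses two hypothetical themes. The zone break values are also hypothetical, since the thresholds used for Fig. 13a are not given here.

```python
import numpy as np

def weighted_linear_combination(theme_weights, feature_weight_layers):
    """GPI = sum_i W_i * w_ij for every raster cell.

    theme_weights: normalized weight W_i of each thematic layer (length m).
    feature_weight_layers: m rasters, each holding the normalized weight w_ij
    of the feature/class present in that cell.
    """
    w = np.asarray(theme_weights, dtype=float)
    layers = np.stack([np.asarray(x, dtype=float) for x in feature_weight_layers])
    return np.tensordot(w, layers, axes=1)       # sums over the theme axis

def classify_zones(gpi, breaks):
    """Split GPI into poor / moderate / good using two (hypothetical) break values."""
    labels = np.array(["poor", "moderate", "good"])
    return labels[np.digitize(gpi, breaks)]

# Two hypothetical themes on a 2 x 2 grid
gpi = weighted_linear_combination([0.6, 0.4],
                                  [[[1.0, 0.0], [0.5, 1.0]],
                                   [[0.0, 1.0], [0.5, 1.0]]])
print(gpi)
print(classify_zones(gpi, breaks=(0.45, 0.7)))
```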

Fig. 13

a Groundwater potential zones delineated by knowledge-driven technique and b frequency distribution of well yield in good, moderate and poor groundwater potential zones

Evaluating the knowledge-driven groundwater potential zone map

To evaluate the knowledge-driven groundwater potential zone map, yield data from 18 wells (locations shown in Fig. 11) were analyzed graphically and statistically. The correlation coefficient between well yield and the groundwater potential index (GPI) at the 18 sites is 0.65. This indicates a significant relationship between them and supports the delineated groundwater potential zones. In addition, the mean yields of wells in the good, moderate and poor groundwater potential zones are 1,539, 443 and 434 m³/day, respectively. Wells in the good zone therefore yield more than three times as much on average as those in the other two zones, although mean yields in the moderate and poor zones are similar.
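The significance of a Pearson correlation can be checked with the statistic t = r√(n − 2)/√(1 − r²), which has n − 2 degrees of freedom. For the reported r = 0.65 and n = 18, t ≈ 3.42. This exceeds the two-tailed critical value of about 2.92 for 16 degrees of freedom at the 1 % level. The helper names below are illustrative.

```python
import math
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Values reported in the text: r = 0.65 between well yield and GPI for n = 18 wells
t = t_statistic(0.65, 18)
T_CRIT_16DF_1PCT = 2.921   # two-tailed critical t, alpha = 0.01, 16 df (from t tables)
print(round(t, 2), t > T_CRIT_16DF_1PCT)
```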

Moreover, the cumulative frequency of well yields in the good zone is higher than that in the moderate and poor zones (Fig. 13b), and the cumulative frequency in the moderate zone is higher than that in the poor zone. Thus, the groundwater potential zone map delineated by the knowledge-driven approach can be considered reliable.

Data-driven groundwater potential index

Because the GPI values from the knowledge-driven approach were verified against well yields, they were used to develop a groundwater potential index based on the multiple linear regression (MLR) technique. The weight coefficients of the 11 thematic variables computed by MLR are listed in Table 4. The resulting GPIMLR model is:

Table 4 Regression or weight coefficients for 11 thematic variables computed by MLR technique
$$\begin{aligned} \text{GPI}_{\text{MLR}} ={}& 0.3895\,\text{GM} - 0.1080\,\text{DD} - 0.0174\,\text{GD}_{\text{pre}} - 0.2117\,\text{GD}_{\text{post}} + 0.0109\,\text{NR} + 0.4375\,\text{LU} \\ & - 0.0695\,\text{SL} + 0.2645\,\text{SO} - 0.0197\,\text{TE} + 0.0026\,\text{TR} - 0.0001\,\text{SWB} + 26.637 \end{aligned} \tag{6}$$

where GPIMLR is the groundwater potential index computed by the MLR technique, and all other abbreviations are as defined in Eq. (5).

The GPIMLR model can be used to estimate the groundwater potential at any point within the catchment from the values of the 11 hydrologic/hydrogeologic variables at that point. The classified raster map of GPIMLR for the entire catchment is shown in Fig. 14a. Comparison of Figs. 13a and 14a shows that the good, moderate and poor groundwater potential zones delineated by the knowledge- and data-driven approaches are largely similar.
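Applying Eq. (6) at a point is a weighted sum of the 11 variables plus the intercept. The sketch below simply encodes the published coefficients. It assumes the inputs are expressed in the same form and units used to fit the model (defined with Eq. (5)). The dictionary keys mirror the equation's abbreviations, and the input values in the usage lines are hypothetical.

```python
# Regression coefficients of Eq. (6), keyed by the variable abbreviations
GPI_MLR_COEFFS = {
    "GM": 0.3895, "DD": -0.1080, "GD_pre": -0.0174, "GD_post": -0.2117,
    "NR": 0.0109, "LU": 0.4375, "SL": -0.0695, "SO": 0.2645,
    "TE": -0.0197, "TR": 0.0026, "SWB": -0.0001,
}
GPI_MLR_INTERCEPT = 26.637

def gpi_mlr(values):
    """Evaluate Eq. (6) at one point; `values` must supply all 11 variables.

    A missing variable raises KeyError rather than being silently treated as 0.
    """
    return GPI_MLR_INTERCEPT + sum(c * values[k] for k, c in GPI_MLR_COEFFS.items())

# Hypothetical inputs, for illustration only
point = dict.fromkeys(GPI_MLR_COEFFS, 0.0)
point.update({"GM": 1.0, "NR": 20.0})
print(round(gpi_mlr(point), 4))
```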

Fig. 14

a Groundwater potential index map generated by data-driven technique and b comparison of the GPI estimates on 1:1 line

Accuracy of the data-driven GPI

The standard errors of the best-fit regression coefficients computed by the MLR technique are not large (Table 4). The multiple R and R² values are 0.945 and 0.893, respectively, indicating that the regression coefficients estimate the groundwater potential index (GPI) adequately. Furthermore, the GPIMLR values of 47 sites computed by the data-driven method were compared with the GPI values calculated by the knowledge-driven approach on a 1:1 plot (Fig. 14b). The straight line fitted to these points deviates negligibly from the 1:1 line, which further indicates that the GPIMLR model provides satisfactory GPI estimates. The GPIMLR model was then used to estimate the GPI at 18 additional sites where well-yield data are available (locations shown in Fig. 11). On the 1:1 plot of the data- and knowledge-driven GPI values for these 18 sites, the fitted straight line again lies very close to the 1:1 line (Fig. 14b), with an R² of 0.86. This confirms that the MLR model estimates the GPI with reasonable accuracy.
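The quantities reported here (coefficients, their standard errors, R² and multiple R) come from an ordinary least squares fit. A minimal NumPy sketch follows. The data are synthetic, with three hypothetical predictors instead of the study's 11 thematic variables. For OLS with an intercept, multiple R = √R², which matches the reported 0.945² ≈ 0.893.

```python
import numpy as np

def fit_mlr(x, y):
    """Ordinary least squares fit y = b1*x1 + ... + bp*xp + b0.

    Returns (coefficients with the intercept last, their standard errors,
    R^2, multiple R).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    a = np.column_stack([x, np.ones(n)])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    sigma2 = sse / (n - p - 1)                     # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(a.T @ a)))
    return beta, se, r2, np.sqrt(r2)

# Synthetic example: 47 sites, 3 hypothetical predictors, small noise
rng = np.random.default_rng(0)
x = rng.normal(size=(47, 3))
y = x @ np.array([0.4, -0.2, 0.1]) + 26.6 + rng.normal(scale=0.01, size=47)
beta, se, r2, multiple_r = fit_mlr(x, y)
print(np.round(beta, 3), round(r2, 4), round(multiple_r, 4))
```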

Factors influencing groundwater potential

The PCA revealed 4, 5 and 5 significant principal components (PCs; eigenvalue >1) in the good, moderate and poor groundwater potential zones, respectively, delineated by the knowledge-driven approach. The factor loadings of the first five PCs for each zone are presented in Table 5. The 11 thematic factors were evaluated using the criterion proposed by Liu et al. (2003) to identify those with the greatest influence on groundwater potential in the hard-rock aquifer system of the catchment. Under this criterion, PC loadings are classified by absolute value into three levels of significance: (1) strong (>0.75), (2) moderate (0.5–0.75), and (3) weak (<0.5). The first two PCs explain 51.90, 41.37 and 62.12 % of the total variance in the good, moderate and poor zones, respectively, and were therefore plotted on a unit circle for each zone (Fig. 15). Fig. 15 shows that, in all three zones, the GPI loads more heavily on factor 1 than on factor 2. The GPI is therefore represented mainly by the first factor, and other factors that influence the GPI should also have moderate to strong loadings on factor 1.
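The retention rule and the loading classification can be sketched as a PCA on the correlation matrix. PCs with eigenvalue >1 are retained, and their loadings (eigenvector × √eigenvalue, i.e. variable–PC correlations) are graded with the Liu et al. (2003) thresholds. This sketch assumes unrotated components (no rotation is mentioned in this section) and uses synthetic data. The sign of each component is arbitrary, which is why loadings are graded by absolute value.

```python
import numpy as np

def pca_loadings(data):
    """Unrotated PCA on the correlation matrix of `data` (sites x variables).

    Returns all eigenvalues (descending), loadings of the PCs with
    eigenvalue > 1, and the percentage of total variance explained by each PC.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)              # symmetric matrix
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                                  # eigenvalue > 1 criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # variable-PC correlations
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, loadings, explained

def loading_significance(loading):
    """Grade a loading by absolute value: >0.75 strong, >0.5 moderate, else weak."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a > 0.5:
        return "moderate"
    return "weak"

# Synthetic data: 60 sites, 3 variables sharing one latent factor plus 2 noise variables
rng = np.random.default_rng(1)
f = rng.normal(size=(60, 1))
data = np.hstack([f + 0.3 * rng.normal(size=(60, 3)), rng.normal(size=(60, 2))])
eigvals, loadings, explained = pca_loadings(data)
print(round(float(explained[:2].sum()), 2))
print([loading_significance(v) for v in loadings[:, 0]])
```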

Table 5 Factor loadings of the significant principal components
Fig. 15

Principal component loadings of the two major factors affecting groundwater occurrences in a good, b moderate, and c poor potential zones

In the good groundwater potential zone, Fig. 15a shows that factor 1 has strong loadings of the GPI (0.87), net recharge (0.81) and transmissivity (0.76), and moderate loadings of proximity to surface waterbodies (−0.62) and land use/land cover (0.53). Hence, the endogenous factors (net recharge and transmissivity) strongly influence the occurrence of good groundwater potential in the catchment, whereas the exogenous factors (proximity to surface waterbodies and LULC) have a moderate influence.

In the moderate groundwater potential zone (Fig. 15b), the loading of topographic elevation on factor 1 is strong (0.75), while the loadings of transmissivity (0.64), land use/land cover (0.61), geomorphology (0.55) and drainage density (0.54) are moderate. Thus, moderate groundwater potential in the catchment is influenced mainly by exogenous factors, with a moderate contribution from transmissivity.

In the poor groundwater potential zone (Fig. 15c), factor 1 has strong loadings of four factors: post-monsoon groundwater depth (−0.92), proximity to surface waterbodies (−0.88), pre-monsoon groundwater depth (−0.83) and geomorphology (−0.76). It also has a moderate loading of drainage density (0.57). Thus, both exogenous and endogenous factors are responsible for the poor groundwater potential in the area.

Conclusions

In hard-rock aquifer systems, field investigations for locating new productive wells are costly, time-consuming and laborious. This study proposed the integrated use of remote sensing, geographical information systems, a multi-criteria decision making technique and multiple linear regression to develop a groundwater potential index map and delineate groundwater potential zones. The integrated approach is demonstrated through a novel application of both knowledge-driven and data-driven methods to delineate groundwater potential zones in the hard-rock terrain of the Ahar River catchment, India. The results of both approaches were validated against existing well-yield data and found to be comparable. Good, moderate and poor groundwater potential zones cover 90.94 km² (26 %), 135 km² (39 %) and 122.36 km² (35 %) of the area, respectively. Good groundwater potential in the hard-rock terrain is strongly influenced by endogenous factors (net recharge and transmissivity). Moderate groundwater potential is mostly affected by exogenous factors (topographic elevation, land use/land cover, geomorphology and drainage density), and poor groundwater potential is characterized by both endogenous and exogenous factors.

Moreover, the groundwater potential zone map prepared in this study may serve as a first estimate for locating favorable sites for new productive wells in the study area without large exploration expenses. It should therefore be useful to water managers and policymakers in planning and formulating appropriate strategies for the long-term sustainable use of groundwater resources in the study area. The proposed methodology of integrating knowledge- and data-driven approaches could also be applied in other hard-rock regions of the world. It is particularly suited to developing countries, where detailed hydrogeologic data are often lacking. However, groundwater potential index values computed with MCDM techniques should always be verified against the yields of existing wells.