Introduction

Drought is a multifaceted phenomenon that is not as widely recognized as other natural hazards, owing to various influential factors operating at various temporal and spatial scales (Kiem et al. 2016). It is a widespread and highly destructive natural hazard occurring in nearly all geographical regions (Kiafar et al. 2020). The primary cause of drought is often a deficiency in rainfall, though in certain instances, anomalies in variables such as temperature and evapotranspiration can also contribute (Cook et al. 2014; Livneh and Hoerling 2016). Additionally, human activities, such as changes in land use and the exploitation of reservoirs, have the potential to modify hydrological processes and impact the development of drought (Van Loon et al. 2015). The intricate interplay of meteorological anomalies, land surface processes, and human activities plays a crucial role in the initiation and progression of drought (Hao et al. 2018). Droughts manifest in various types, including meteorological, hydrological, groundwater, agricultural, and economic-social categories. In general, drought is rooted in the amount of rainfall. At first, a meteorological drought occurs, and if it continues, other types of droughts, including hydrological and groundwater droughts, will occur, which will have a significant impact on water resources (Livneh and Hoerling 2016; Luo et al. 2017).

The groundwater drought index indicates the critical condition of groundwater during a long-term meteorological drought, which renders groundwater resources unavailable or reduced for human use (Villholth et al. 2013). The study of groundwater drought is important because groundwater is the main source of water supply for domestic use, irrigation, and industry in many countries, especially in arid and semiarid regions. According to Melaku and Wang (2019), groundwater supplies water needs for over 1.5 billion people and almost 40% of irrigation water. However, changes in land use, population growth, and overexploitation of resources lead to groundwater depletion, which can hinder sustainable development (Sheikha-BagemGhaleh et al. 2023). Therefore, there is an increasing need to predict groundwater drought. Forecasting tools for drought are deemed essential for effective risk management, water resource engineering, and strategic planning (Alsumaiei and Alrashidi 2020).

Today, under the influence of various climatic and human factors, groundwater resources are always exposed to critical conditions, which requires further investigation of their possible impacts (Milan et al. 2023). Many studies indicate the undesirable impact of drought on groundwater resources, showing that more study is required in this field. Estimating and predicting the groundwater drought index is one of the first steps in investigating the effects of drought on groundwater resources (Asadzadeh et al. 2016; Karunakalage et al. 2024). In simulating groundwater drought, methods such as physical, mathematical, and data-based approaches are available. Physical and mathematical methods often receive less attention from researchers due to limitations such as high cost, time consumption, requirement of expertise, and more information (Mosavi et al. 2018). Data-based methods or artificial intelligence models, on the other hand, are more popular among researchers due to advantages such as not requiring initial and boundary conditions, high simulation accuracy, and cost efficiency (Seo and Lee 2019; Elbeltagi et al. 2023; Aghelpour and Varshavian 2021; Farzin et al. 2022; Almikaeel et al. 2022).

Machine learning (ML) is a subset of artificial intelligence (AI) focused on identifying patterns and regularities. Predictive models built on ML hold promise as they are more straightforward to develop and demand fewer inputs (Hashemi et al. 2014; Esmaili et al. 2021). In comparison with physical models, ML models offer simpler implementation and faster training, validation, testing, and evaluation processes (Mosavi et al. 2018; Kiafar et al. 2017). Several researchers have demonstrated the applicability of these models in hydrological studies (Arya Azar et al. 2023; Kayhomayoon et al. 2023; Jamnani et al. 2024; Milan et al. 2023). Shamshirband et al. (2020) used ML models including support vector regression (SVR), gene expression programming (GEP), and model trees (MT) to predict various drought indicators, and their results showed that these models are highly capable of predicting drought indicators. Akter et al. (2023) used several ML models to simulate two drought indices, SPI and SPEI. Other researchers (Bidabadi et al. 2024; Malik et al. 2020; Malik et al. 2019; Piri et al. 2023; Tian et al. 2018) have used different ML models to simulate different drought indicators, but very few studies have simulated the groundwater resource index (GRI) using ML models. Examples include the prediction of the groundwater drought index using ANFIS and Bayesian networks (Gocić et al. 2015) and groundwater level prediction using wavelet-SVM (Pham et al. 2022). Also, in recent years, some ML models such as the group method of data handling (GMDH) and SVR have been applied to a limited extent, with promising results. Despite the large number of ML models, further investigation in this field is still needed because their performance differs.

One of the models used in this research is the multivariate adaptive regression splines (MARS) model, a regression-based model that works like stepwise linear regression. Its main advantage is improving the understanding of complex relationships between the target and predictor variables (Adnan et al. 2020). This method constructs adaptable regression models to predict the target variable by partitioning the problem space into intervals based on predictor (input) variables and fitting a spline (basis function) within each interval (Zhang and Goh 2016). Additionally, the least squares support vector machine (LS-SVM) model is employed in this research. This model is based on statistical learning theory, which aims to minimize risk. Unlike SVM, it solves a set of linear equations instead of a quadratic programming problem, yet it has been reported to achieve higher computational accuracy than classical SVM (Leong et al. 2021).

The accuracy of both the MARS and LS-SVM models has been confirmed in various hydrology and environmental studies (Rezaei-Balf et al. 2017; NajafZadeh et al. 2022a, 2022b; Arya Azar et al. 2021a; Leong et al. 2021; Amiri et al. 2023). As Iran is located in an arid and semiarid region, each region is exposed to different types of drought. In recent years, due to changes in precipitation patterns and the existence of climate change phenomena, which cause increased temperature and decreased rainfall in most regions, there has been a decrease in runoff and river flow, as well as excessive extraction of groundwater sources. There has been significant pressure on the groundwater resources in different regions of Iran, making it necessary to investigate groundwater drought in these areas. Therefore, the aim of this research is to monitor groundwater drought. The study area is located in Ajabshir, northwest of Iran.

Previous research on groundwater drought indicators has predicted the indicator from its own record using ML models, without considering the conditions of the aquifer in the model. Therefore, in this study, for the first time, the state of groundwater resources was simulated as part of the modeling. Then, the GRI was predicted using the simulation results as well as the hydrological indicator. The novelties of this research therefore include, besides the use of new ML models, the consideration of aquifer conditions and the use of hydrological indices to estimate the GRI. In addition to improving GRI prediction performance, this approach supports scenario analysis, since the groundwater drought index is predicted from the groundwater level. In this work, groundwater resources in the area were first simulated using MODFLOW. The amounts of discharge and recharge of groundwater resources were evaluated along with the status of the groundwater level. Next, the groundwater drought index was calculated using groundwater level data, and groundwater drought was then simulated using rainfall data, groundwater levels, the streamflow drought index (SDI), and the standardized precipitation index (SPI). LS-SVM and MARS models were used for this purpose. Several input scenarios were defined for the models, and the performance of the models was evaluated under these scenarios. Finally, the most influential parameters affecting groundwater drought and the best predictive model were determined.

Materials and methods

Study area

The study area of Ajabshir (45° 43′ E, 36° 46′ N), with an area of 1508 km2 and an altitude of 1385 m above sea level, covers approximately 2.9 percent of the entire watershed of Urmia Lake. Out of the total area, 249 km2 belongs to the plain, while 1259 km2 is composed of elevated regions (Fig. 1). The main river flowing through the study area is Ajabshir Chai, which flows in a south-north direction. A review of long-term precipitation data from the Ajabshir synoptic station reveals that the maximum annual rainfall in this region is 678.4 mm, while the minimum is 173.4 mm. Temperature plays a significant role in the region’s climate, and Ajabshir City, being located in the highlands, enjoys favorable weather conditions. The average minimum and maximum temperatures recorded in Ajabshir between 1984 and 2021 are 6.5 and 19.0 °C, respectively.

Fig. 1 Location of the Ajabshir watershed

Despite having a favorable climate situation, this region has always faced challenges in meeting the water needs of the area, which is done usually through dams and groundwater resources. Lake Urmia, located in the neighborhood of this aquifer, is drying up today, which also needs serious attention. Meanwhile, the water resources in the surrounding areas can be used to improve the condition of the lake. This requires preliminary investigation of the water resources of the area, especially the groundwater sources. Therefore, it seems necessary to investigate and simulate the Ajabshir aquifer and investigate its potential under climatic factors such as drought and climate change. Therefore, this region can be a suitable representative for other areas in the implementation of the proposed approach in this study.

Research method

Figure 2 depicts the flowchart of this research, illustrating the prediction of the GRI values using ML models. The initial step involved groundwater simulation, accomplished using the MODFLOW code. In this stage, simulation was conducted for both steady and transient states. Subsequently, the interaction between surface and groundwater was calculated. Following that, meteorological and hydrometric drought indices were computed. Utilizing the flow data representing the interaction between surface and groundwater, as well as precipitation, SPI, and SDI, various input scenarios were developed for GRI prediction. The calculated GRI index served as the output of the models, which were estimated by employing ML models, namely LSSVR and MARS models. For the models, 70% of the available data were utilized for training purposes, while the remaining 30% was used for testing. The results of the models were evaluated using predefined criteria. If the evaluation criteria yielded satisfactory results, the GRI prediction was considered valid. Otherwise, the process was repeated.

Fig. 2 A schematic view of the proposed method

The characteristics of drought include severity, duration, extent, and frequency, all of which are determined using drought evaluation indices. Given the significance of meteorological and hydrological drought, our research utilizes two indices: SPI for meteorological drought and SDI for hydrological drought. The groundwater drought index is first calculated and subsequently predicted using ML. The following sections first present the groundwater simulation of the Ajabshir aquifer and then explain the methodology for calculating the drought indices.

Groundwater simulation

To check the groundwater condition of the Ajabshir aquifer, modeling and forecasting are required. Initially, the aquifer was modeled separately using the MODFLOW code, which was executed in the GMS software environment. This code is a suitable tool for measuring and simulating groundwater. A conceptual model is defined within this code, encompassing the aquifer area, groundwater inflow and outflow, water sources, discharge wells, and observation wells of the aquifer.

The initial step in groundwater simulation involves constructing a conceptual model that integrates fundamental details about the aquifer. This encompasses information such as the aquifer’s geographical extent, observation wells, exploitation wells, surface recharge to the aquifer, rivers, hydraulic conductivity, as well as the topography and bedrock (bottom elevation) of the aquifer. To establish this, the topographic data and the aquifer area were determined utilizing the digital elevation model (DEM) of the region. Bedrock data, another crucial modeling parameter, were derived from reports of pumping tests, well data, and the aquifer’s thickness in various locations. The aquifer thickness ranged from 80 to 250 m, considering a cell size of 300 × 300 m.

Roughly 14% of the monthly precipitation was designated for infiltration and aquifer recharge. Additionally, considering the primary purpose of the wells, the return water from wells was estimated at approximately 65, 70, and 20% for drinking water, industrial, and agricultural wells, respectively (Milan et al. 2023). As illustrated in Fig. 3, a substantial amount of aquifer discharge (around 266 MCM) occurs in exploitation wells. The model was simulated for October 2010 in the steady state and from October 2010 to September 2013 in the unsteady state, employing a monthly time step. The majority of exploitation wells are situated in the central section of the aquifer, primarily catering to urban and agricultural needs. Out of the 56 observation wells within the aquifer, 45 were utilized for calibration in both steady and transient states. Under steady-state conditions, where the hydraulic head remains constant over time, Eq. (1) is employed for simulation

$$ \frac{{\partial^{2} h}}{{\partial x^{2} }} + \frac{{\partial^{2} h}}{{\partial y^{2} }} + \frac{{\partial^{2} h}}{{\partial z^{2} }} = 0 $$
(1)

where h represents the groundwater level. In the unsteady state, Eq. (2) represents the spatial and temporal distribution of the piezometric head under confined conditions

$$ \frac{\partial }{\partial x}\left( {K_{xx} \frac{\partial h}{{\partial x}}} \right) + \frac{\partial }{\partial y}\left( {K_{yy} \frac{\partial h}{{\partial y}}} \right) + \frac{\partial }{\partial z}\left( {K_{zz} \frac{\partial h}{{\partial z}}} \right) = S_{s} \frac{\partial h}{{\partial t}} $$
(2)

where Kxx, Kyy, and Kzz are the hydraulic conductivities in the x, y, and z directions, respectively, and Ss is the specific storage of the aquifer. The meteorological, hydrological, and groundwater drought indices used in this study are described in the following sections.

Fig. 3 The conceptual model of the aquifer
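To make the steady-state condition concrete, the following Python sketch solves a two-dimensional form of Eq. (1) on a small rectangular grid by Jacobi iteration. It is only an illustration under simplifying assumptions (homogeneous, isotropic aquifer; fixed heads on the left and right edges; no-flow top and bottom edges) and is not the MODFLOW/GMS workflow used in the study. The function name `solve_steady_head`, the grid size, and the boundary heads are hypothetical.

```python
def solve_steady_head(n_rows, n_cols, left_head, right_head,
                      tol=1e-10, max_iter=100_000):
    """Jacobi iteration for the 2D Laplace equation on a regular grid.

    Assumptions (illustrative only): n_rows >= 2, fixed heads on the left
    and right columns, no-flow (mirrored) top and bottom rows.
    """
    mean_head = 0.5 * (left_head + right_head)
    h = [[mean_head] * n_cols for _ in range(n_rows)]
    for row in h:
        row[0], row[-1] = left_head, right_head

    for _ in range(max_iter):
        new_h = [row[:] for row in h]
        max_change = 0.0
        for r in range(n_rows):
            # Mirror the neighbour across a no-flow edge.
            up = r - 1 if r > 0 else r + 1
            down = r + 1 if r < n_rows - 1 else r - 1
            for c in range(1, n_cols - 1):
                # Each interior head is the average of its four neighbours.
                new_h[r][c] = 0.25 * (h[up][c] + h[down][c]
                                      + h[r][c - 1] + h[r][c + 1])
                max_change = max(max_change, abs(new_h[r][c] - h[r][c]))
        h = new_h
        if max_change < tol:
            break
    return h


# Hypothetical boundary heads (m) on a 4 x 6 grid.
heads = solve_steady_head(4, 6, left_head=1380.0, right_head=1353.0)
```

With these boundary conditions the head varies linearly between the two fixed-head edges, which gives a simple check on the iteration.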

Groundwater drought index

Proposed by Mendicino et al. (2008), the GRI is used to investigate groundwater drought. It is considered one of the most reliable indicators of drought as it expresses the condition of groundwater resources in terms of drought. One of the notable capabilities of this index is its high correlation with average runoff in certain rivers, enabling the prediction of summer droughts. Therefore, the index holds significant importance as an indicator of drought. Equation (3) describes how the GRI is calculated

$$\text{GRI}_{ij}=\frac{{G}_{ij}-{m}_{i}}{{\sigma }_{i}}$$
(3)

where G represents the water level in month i in observation well j, and m and σ represent the mean and standard deviation, respectively, of the water level data in month i. This index is an indirect measure of water table recharge and an indirect indicator of groundwater drought, expressing the decline in groundwater level. It takes both positive and negative values: negative values indicate drought, and positive values indicate non-drought conditions. The intensity of drought varies with the value obtained from Eq. (3). Table 1 shows the drought severity corresponding to the obtained values.

Table 1 Different values of GRI and categories of drought severity
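As a minimal sketch of Eq. (3), the Python function below standardizes each water-level reading by the mean and standard deviation of all readings from the same calendar month. The function name `groundwater_resource_index`, the input layout (a list of month and level pairs), and the use of the sample standard deviation are assumptions made for illustration; the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def groundwater_resource_index(records):
    """Return the GRI for each (month, water_level) record.

    records: list of (month, level) pairs, month in 1..12. Each calendar
    month needs at least two readings with non-zero spread.
    """
    by_month = defaultdict(list)
    for month, level in records:
        by_month[month].append(level)

    # Mean and sample standard deviation of the levels for each month.
    stats = {m: (mean(v), stdev(v)) for m, v in by_month.items()}

    return [(level - stats[month][0]) / stats[month][1]
            for month, level in records]


# Hypothetical October levels (m) from three different years.
gri = groundwater_resource_index([(10, 1360.0), (10, 1362.0), (10, 1364.0)])
```

In this toy series the lowest level yields a GRI of -1 and the highest a GRI of +1.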

Meteorological drought index

SPI is a standard normal variate whose cumulative probability equals the cumulative probability of the observed precipitation under a fitted Gamma distribution (Deo et al. 2017). SPI for a region is calculated from its long-term precipitation record. The initial step involves fitting an appropriate statistical distribution to the long-term precipitation data. Subsequently, the cumulative distribution function is transformed into a standard normal distribution using equal probabilities. This normalization standardizes the index, resulting in a mean of zero for each region and period under examination (Zou et al. 2020; Salimi et al. 2021). SPI is calculated using Eq. (4)

$$\text{SPI}=\frac{{P}_{0}+\sum_{i=1}^{n-1} {P}_{-i}-{\mu }_{n}}{{\delta }_{n}}$$
(4)

where n is the number of months over which precipitation is accumulated, P0 and P−i are the normalized precipitation in the current month and in the i-th preceding month, respectively, μn is the mean cumulative precipitation over n months, and δn is the standard deviation of cumulative precipitation over n months. Table 2 shows the categorization of SPI values. According to the table, a wet condition occurs if SPI is greater than 1, and a drought condition occurs if it is less than −1.

Table 2 Categorization of SPI (Deo et al. 2017)
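The standardization in Eq. (4) can be sketched in a few lines of Python: accumulate precipitation over a moving window of n months and express each total as a z-score. This sketch covers only the standardization step; the Gamma fitting and equal-probability transformation described above are not included. The function name `spi_zscore` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def spi_zscore(precip, n):
    """Standardized n-month accumulated precipitation, following Eq. (4).

    precip: monthly precipitation values in time order.
    Returns one value per month from the n-th month onward.
    """
    # Current month plus the n - 1 preceding months.
    totals = [sum(precip[i - n + 1: i + 1]) for i in range(n - 1, len(precip))]
    mu_n, delta_n = mean(totals), stdev(totals)
    return [(total - mu_n) / delta_n for total in totals]


# Hypothetical monthly precipitation (mm) with a 2-month accumulation window.
spi_2 = spi_zscore([10.0, 20.0, 30.0, 40.0], n=2)
```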

Hydrological drought index

As an index used to study the drought of a region, SDI is calculated using river discharge data (Eq. 5) (Nalbantis et al. 2009; Pathak et al. 2016)

$$ \begin{aligned} & SDI_{i,k} = \frac{V_{i,k} - \overline{V}_{k}}{S_{k}} \\ & V_{i,k} = \mathop \sum \limits_{j = 1}^{3k} Q_{i,j} , \quad i = 1, 2, 3, \ldots \\ & \quad k = 1,2,3,4 \quad j = 1,2, \ldots , 12 \\ \end{aligned} $$
(5)

where i is the hydrological year, j is the month within the hydrological year (1 for October and 12 for September), and k indexes the reference period, which covers the first 3k months of the hydrological year. Q is the monthly streamflow volume, V is the cumulative streamflow volume over the reference period, and \(\overline{V}_{k}\) and \(S_{k}\) are the mean and standard deviation of the cumulative streamflow volumes, respectively (Pathak et al. 2016). Table 3 shows the categorization of drought situations based on the SDI (Nalbantis et al. 2009). According to this table, wet conditions start at SDI values greater than 1, and drought conditions start at SDI values less than −1.

Table 3 Categorization of SDI index
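Eq. (5) can be illustrated with a short Python sketch that sums the first 3k months of streamflow in each hydrological year and standardizes the resulting volumes across years. The function name `streamflow_drought_index`, the input layout (one 12-value list per hydrological year, October first), and the sample values are assumptions for illustration only.

```python
from statistics import mean, stdev


def streamflow_drought_index(monthly_flow_by_year, k):
    """SDI for reference period k (1..4), one value per hydrological year.

    monthly_flow_by_year: list of 12-value lists, ordered October..September.
    """
    if k not in (1, 2, 3, 4):
        raise ValueError("k must be 1, 2, 3, or 4")

    # Cumulative streamflow volume over the first 3k months of each year.
    volumes = [sum(year[: 3 * k]) for year in monthly_flow_by_year]
    v_mean, v_std = mean(volumes), stdev(volumes)
    return [(v - v_mean) / v_std for v in volumes]


# Three hypothetical hydrological years with constant monthly flow.
years = [[1.0] * 12, [2.0] * 12, [3.0] * 12]
sdi_oct_dec = streamflow_drought_index(years, k=1)
```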

Meteorological and hydrological drought indicators, along with precipitation, groundwater discharge, and groundwater level, were used to predict the groundwater drought index. To this end, four input scenarios were compiled with different combinations of these parameters, as shown in Table 4. According to this table, the fourth scenario includes all input parameters, while the first and second scenarios have fewer input parameters for GRI prediction. Each scenario was implemented using the LSSVR and MARS models, and the results were evaluated accordingly.

Table 4 Input scenarios compiled for GRI prediction

Machine learning models

ML models are highly regarded because they do not rely on the specific nature of the data and encompass a variety of methods (Amanabadi et al. 2019). Among the existing methods, two models, MARS and LS-SVM, are particularly interesting as they have demonstrated good results in predicting systems in various fields. The following is a description of the structure and relationships utilized in these methods.

Multivariate adaptive regression spline

Presented by Friedman (1991), MARS is a nonparametric technique designed for flexible modeling of high-dimensional data. It is capable of uncovering nonlinear relationships that may be hidden within a dataset and achieving optimal solutions quickly (Zhang et al. 2016). The method determines the mutual effects among the explanatory variables and allows for distinguishing between effective and non-effective variables based on their influence on the dependent variable. Therefore, the general model of MARS can be expressed as Eq. (6).

$$ f\left( x \right) = \beta_{0} + \mathop \sum \limits_{m = 1}^{M} \beta_{m} h_{m} \left( x \right) $$
(6)

Starting from \({h}_{0}\left(x\right)=1\), the basis functions are iteratively added to the model. For each \({h}_{m}(x)\), there are two choices

$$ \begin{aligned} & \left( {x - t} \right)_{ + } = \left\{ {\begin{array}{*{20}l} {x - t} \hfill & {\quad if\, x > t,} \hfill \\ {0, } \hfill & { \quad otherwise,} \hfill \\ \end{array} } \right. \\ & \left( {t - x} \right)_{ + } = \left\{ {\begin{array}{*{20}l} {t - x} \hfill & {\quad if\, t > x,} \hfill \\ {0, } \hfill & {\quad otherwise,} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(7)

where x and t are called the variable and the node (knot), respectively. For each variable, the identified node is different, and the coefficient \({\beta }_{m}\) differs for each of the two forms \({h}_{m}\left(x-t\right)\) and \({h}_{m}(t-x)\). The number and location of the nodes are determined through forward and backward steps. In the forward step, a large number of nodes are generated, while in the backward step, nodes that contribute less to the overall fit are omitted. In each forward step, the basis function that most reduces the model’s lack of fit is selected from all candidates. Basis functions that have the least impact on the model are eliminated in the backward steps. The optimal model is then chosen based on the lack-of-fit index, which is evaluated using the generalized cross-validation (GCV) criterion defined in Eq. (8).

$$ {\text{GCV}}\left( M \right) = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{{\left( {1 - \frac{C\left( M \right)}{n}} \right)^{2} }} $$
(8)

where \(y_{i}\) is the observed value, \(\hat{y}\) is the model output, n is the number of observations, M is the number of non-constant terms in the model, and C(M) is the penalty function, defined as C(M) = M + cd, where c is the penalty factor for optimizing the basis functions and d is the effective degrees of freedom, which equals the number of independent basis functions.
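The building blocks in Eqs. (6) to (8) can be sketched in Python as follows. This is not a full MARS implementation (there is no forward or backward knot search); it only evaluates the hinge basis functions, a model with single-variable hinge terms, and the GCV score with C(M) = M + cd as defined in the text. The function names, coefficients, and knot values are hypothetical.

```python
def hinge(x, t, direction=1):
    """Eq. (7): (x - t)_+ when direction is +1, (t - x)_+ when it is -1."""
    return max(0.0, direction * (x - t))


def mars_predict(x, beta0, terms):
    """Eq. (6) for one input variable.

    terms: list of (beta, knot, direction) triples, one per basis function.
    """
    return beta0 + sum(beta * hinge(x, knot, direction)
                       for beta, knot, direction in terms)


def gcv(y_obs, y_pred, n_terms, c, d):
    """Eq. (8) with the complexity term C(M) = M + c * d."""
    n = len(y_obs)
    sse = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    penalty = (1.0 - (n_terms + c * d) / n) ** 2
    return sse / (n * penalty)


# Hypothetical model: 1 + 2 * (x - 1)_+ + 0.5 * (1 - x)_+
terms = [(2.0, 1.0, 1), (0.5, 1.0, -1)]
prediction = mars_predict(3.0, beta0=1.0, terms=terms)
score = gcv([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], n_terms=1, c=1.0, d=1.0)
```

A lower GCV score indicates a better trade-off between fit and model complexity, which is the basis for pruning in the backward step.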

Least squares support vector regression model

LSSVR is a variant of the SVR model proposed by Suykens and Vandewalle (1999) to improve prediction accuracy. Compared to SVR, the LSSVR model has the same limitations, yet it has lower computational complexity and higher accuracy and speed. For a set of training data \({\{{x}_{k}, {y}_{k}\}}_{k=1}^{N}\) with input data \({x}_{k}\in {R}^{N}\) and output data \({y}_{k}\in R\), the regression function is defined as Eq. (9) (Xie et al. 2013)

$$ y(x) = W^{T} \varphi (x) + b $$
(9)

where W is the weight vector, the superscript T denotes the transpose, b is the bias of the regression function, and y(x) is the output. \(\varphi \left(x\right)\) is used for a nonlinear mapping of inputs into a high-dimensional feature space. The nonlinear regression problem can be solved through the optimization in Eq. (10).

$$ \min \,j(w,e) = \frac{1}{2}W^{T} W + \frac{1}{2}\gamma \sum\limits_{k = 1}^{N} {e_{k}^{2} } $$
(10)

subject to the constraints:

$$ y_{k} = W^{T} \varphi (x_{k} ) + b + e_{k} {\kern 1pt} ,\quad k = 1,2,...,N $$
(11)

where γ is the regularization parameter and \(e_{k}\) is the error term. The solution is obtained from the Lagrangian form of the objective function, given in Eq. (12):

$$ L(w,\,b,\;e,\;\alpha ) = j(w,e) - \sum\limits_{k = 1}^{N} {\alpha_{k} } \left\{ {W^{T} \varphi (x_{k} ) + b + e_{k} - y_{k} } \right\} $$
(12)

where \(\alpha_{k}\) is the Lagrange multiplier. Based on the Karush–Kuhn–Tucker conditions, the LS-SVM model is written as Eq. (13).

$$ y(x) = \sum\limits_{k = 1}^{N} {\alpha_{k} } K(x,\,x_{k} ) + b $$
(13)

where \(K\left(x, {x}_{k}\right)\) is called the kernel function. In this research, the radial basis function (RBF) kernel is used (Eq. 14), in which σ is the kernel width parameter.

$$ K(x,\;x_{k} ) = \exp \left( { - \frac{{\left\| {x - x_{k} } \right\|^{2} }}{{\sigma^{2} }}} \right) $$
(14)
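To show how Eqs. (9) to (14) reduce to a set of linear equations, the following Python sketch fits an LSSVR model by solving the standard dual system: the first row enforces that the multipliers sum to zero, and the remaining rows are the kernel matrix plus 1/γ on the diagonal. It is a minimal illustration, not the implementation used in the study; the function names, the plain Gaussian elimination solver, and the values of γ and σ are assumptions.

```python
import math


def rbf_kernel(a, b, sigma):
    """Eq. (14): exp(-||a - b||^2 / sigma^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / sigma ** 2)


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lssvr_fit(X, y, gamma, sigma):
    """Return (b, alpha) from the LSSVR dual linear system."""
    n = len(X)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k_ij = rbf_kernel(X[i], X[j], sigma)
            row.append(k_ij + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve_linear(system, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvr_predict(x, X, alpha, b, sigma):
    """Eq. (13): weighted sum of kernel values plus bias."""
    return sum(a * rbf_kernel(x, xk, sigma) for a, xk in zip(alpha, X)) + b


# Hypothetical one-dimensional training data.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 2.0, 3.0]
b, alpha = lssvr_fit(X_train, y_train, gamma=1e6, sigma=1.0)
```

A large γ makes the fit follow the training targets closely, while a smaller γ gives a smoother function; in practice both γ and σ would be tuned.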

Error evaluation criteria

To assess the effectiveness of the MODFLOW code and ML models across different scenarios, error evaluation metrics were employed. The dataset was randomly split into two sets, with 75% of the data allocated for model training and the remaining portion for model validation (Mostafa et al. 2024). Performance metrics, including root-mean-square error (RMSE), mean absolute percentage error (MAPE), Nash–Sutcliffe efficiency (NSE), coefficient of determination (R2), and mean absolute error (MAE), were computed (Arya Azar et al. 2021b).

$$ {\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{o,i} - x_{p,i} } \right)^{2} }}{n}} $$
(15)
$$ {\text{MAPE}} = \frac{1}{n} \mathop \sum \limits_{i = 1}^{n} \left| {\frac{{x_{o,i} - x_{p,i} }}{{x_{o,i} }}} \right| $$
(16)
$$ {\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{o,i} - x_{p,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{o,i} - \overline{x}_{o} } \right)^{2} }} $$
(17)
$$ {\text{R}}^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{p,i} - x_{o,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{o,i} - \overline{x}_{o} } \right)^{2} }} $$
(18)
$$ {\text{MAE}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {x_{p,i} - x_{o,i} } \right|}}{n} $$
(19)

where xo is the observed value, xp is the predicted (simulated) value, and n is the number of samples.
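The criteria in Eqs. (15) to (17) and (19) translate directly into code. The Python sketch below is one straightforward way to compute them from paired lists of observed and predicted values; the function names and sample values are hypothetical. MAPE is returned as a fraction, as written in Eq. (16), and requires non-zero observations.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, Eq. (15)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error as a fraction, Eq. (16)."""
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (17)."""
    obs_mean = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual / variance


def mae(obs, pred):
    """Mean absolute error, Eq. (19)."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "NSE": nse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```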

Results and discussion

Results of groundwater simulation

Groundwater simulation was conducted and calibrated in both steady and unsteady states, and the outcomes are depicted in Fig. 4. The figure shows that the highest groundwater levels occur in the northern regions of the aquifer. Furthermore, the groundwater level ranges from 1353 to 1380 m across the Ajabshir aquifer, indicating substantial variation in groundwater levels in this aquifer. In contrast, the southern areas of the aquifer, situated at lower elevations, exhibit lower groundwater levels.

Fig. 4 Groundwater simulation results

Figure 5 displays the calibrated hydraulic conductivity values for the aquifer. The hydraulic conductivity values within the aquifer span from 1 to 15 m/day, with the northern region of the aquifer demonstrating the highest values (approximately 12 to 15 m/day). Additionally, the south and southeast areas of the aquifer show lower hydraulic conductivity values, ranging from 1 to 5 m/day. However, in the deeper central part of the aquifer, the hydraulic conductivity varies between 7 and 10 m/day.

Fig. 5 Calibrated values of hydraulic conductivity

The error evaluation criteria values for the aquifer in both steady and unsteady states indicate that the simulations exhibit satisfactory accuracy, as shown in Table 5. The RMSE values for the aquifer in the two states were 0.77 m and 0.86 m, respectively. Moreover, coefficients of determination exceeding 0.98 indicate a highly favorable agreement between the actual and simulated data in both states. Next, changes in groundwater level were investigated according to the simulation results. Then, the groundwater level, which represents the state of the aquifer, along with other variables, was used to estimate the groundwater drought index.

Table 5 Error evaluation criteria for numerical simulation of the aquifer

The results of drought indicators

The values of SPI and SDI, calculated using precipitation and runoff data, are presented in Fig. 6. According to the figures, the study area experienced a situation close to normal during the years 2001 to 2005. However, starting from 2005, severe droughts in some months of 2006, 2007, 2010, and 2019 can be observed due to irregular precipitation patterns. Throughout the study period, particularly in the later years, with increasing rainfall, the SPI indicates a return to normal conditions, indicating wetter years. The SDI values also indicate that from 2001 to 2007, the study area experienced wet year conditions. A hydrological drought was observed between 2007 and 2011. Normal to wet hydrological conditions were observed in the later years of the study period (Fig. 7).

Fig. 6 The results for SPI and SDI

Fig. 7 Groundwater resource index (GRI) during the study period

The performance of the models in predicting groundwater drought indicators

Both the MARS and LSSVR models yielded promising results in predicting the GRI. As mentioned earlier, four input scenarios were used for simulation purposes, which included various combinations of precipitation, runoff, meteorological drought index, and hydrological drought index. The results of the error evaluation criteria (Table 6) showed that using LSSVR with the third scenario, which included groundwater level, meteorological index, and precipitation index as input values, resulted in RMSE, MAE, NSE, and MAD values of 0.60, −0.03, 0.80, and 0.37, respectively, for the test data. Similarly, using the MARS model, the fourth scenario, which included all input variables, produced RMSE, MAE, NSE, and MAD values of 0.37, −0.19, 0.83, and 0.30, respectively, for the test data. This indicates that the MARS model performed better than LSSVR. The first scenario exhibited the lowest prediction performance in both models, as it only included the groundwater level and SPI variables. Therefore, this combination alone does not yield adequate GRI prediction performance. Similar results were obtained with the second scenario, indicating that having the groundwater level and only one of the meteorological or hydrological drought indicators is insufficient for accurate GRI prediction. Hence, using the third scenario in LSSVR and the fourth scenario in MARS allows for reliable prediction of GRI values.

Table 6 The performance of predictive models for the training and test data

The time series of the GRI values calculated and predicted by the models are shown in Fig. 8. According to the figure, both models provide an acceptable estimate of the trend of the GRI. However, in Fig. 8a, it can be observed that the LSSVR model’s error ranges from −1.5 to 1, with more noticeable deviations at the beginning and end of the modeling period. The histogram results also indicate that the error follows a normal distribution, with a standard deviation of 0.422. On the other hand, Fig. 8b shows slightly better performance for MARS, where the error in each step falls within a range of −1.5 to 0.5. Additionally, the figure illustrates that the estimation error follows a normal distribution with a standard deviation of 0.37, signifying the model’s proper performance.

Fig. 8 Time series of the observed and predicted values using the a LSSVR and b MARS models

Plotting the observed versus predicted GRI data by the models shows low scattering from the regression line (Fig. 9). The coefficients of determination of the LSSVR and MARS models were 0.80 and 0.83, obtained for the third and fourth scenarios, respectively.

Fig. 9 The observed and predicted values of GRI obtained by a MARS and b LSSVR

For further investigation, the performances of the models were compared using Taylor’s diagram (Fig. 10). In this diagram, the vertical axis, arcs inside the quarter circle, and arc of the quarter circle show the standard deviation, RMSD, and correlation coefficient, respectively. According to the diagram, models positioned closer to the observation data demonstrate a better estimate of the drought indicators. In Fig. 10, the correlation coefficient for both models is approximately 0.90, with the MARS model showing a slightly higher value compared to LSSVR. Furthermore, the position of the MARS model is closer to the observational data. The standard deviation values for MARS and LSSVR are 1 and 0.8, respectively, indicating that the MARS model aligns more closely with the standard deviation values of the observed data. Despite this, both models have RMSD values below 0.5, suggesting their appropriate performance. Consequently, the MARS model exhibits better prediction of the GRI. The results of Taylor’s diagram support the findings of the error evaluation criteria and indicate the models’ overall performance.

Fig. 10 Taylor’s diagram of the study models
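The three quantities on a Taylor diagram are linked by the law of cosines: the centered RMSD satisfies E'² = σ_m² + σ_o² − 2·σ_m·σ_o·R, where σ_m and σ_o are the model and observed standard deviations and R is their correlation. The sketch below applies this relation to the values read from Fig. 10. It assumes an observed standard deviation of 1.0 and a correlation of exactly 0.90 for both models; neither is stated explicitly in the text, so the result is only a rough consistency check.

```python
import math


def centered_rmsd(std_model, std_obs, corr):
    """Centered RMSD from the Taylor-diagram relation (law of cosines)."""
    squared = std_model ** 2 + std_obs ** 2 - 2.0 * std_model * std_obs * corr
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding


STD_OBS = 1.0  # assumption: the observed standard deviation is not given in the text
CORR = 0.90    # "approximately 0.90" for both models, per the text

for name, std_model in (("MARS", 1.0), ("LSSVR", 0.8)):
    print(name, round(centered_rmsd(std_model, STD_OBS, CORR), 3))
```

Under these assumptions, both models come out at roughly 0.45, which is consistent with the statement that both RMSD values lie below 0.5. The slightly higher correlation reported for MARS would lower its value further.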

Discussion

Groundwater is a vital source of water supply in many regions of the world, particularly in arid and semiarid areas. Droughts, climate change, and intensive agricultural activities aimed at ensuring food security have caused a notable decline in both the quantity and quality of groundwater. Consequently, simulating groundwater and examining the status of the aquifer are crucial first steps in managing this valuable resource effectively. The results of the groundwater simulation in the study area indicate that, despite the presence of suitable surface water resources, groundwater is still at risk of excessive extraction, which calls for further comprehensive studies in this region. The performance of MODFLOW in aquifer simulation was evaluated, and the results demonstrated its suitability for managing surface water and groundwater resources. According to the obtained results, the groundwater resources in the area have suffered a significant decrease in water volume, which requires the implementation of appropriate management scenarios. One applicable scenario is the optimal integrated exploitation of groundwater resources and dams, which are not currently operated in an integrated way. In addition, the agricultural water transmission and distribution system, which is mainly supplied from groundwater resources, is traditional and somewhat outdated, causing considerable water wastage; improving this system would help prevent that wastage. Agriculture is one of the largest consumers of groundwater in the region, and most irrigation methods there are still traditional. The cultivation pattern has also not been changed to relieve pressure on groundwater resources. Therefore, with a reduction of about 5–30% in aquifer withdrawal in this sector, along with changes to the irrigation system and cultivation patterns, one can hope to improve the condition of the aquifer in a relatively short time.

The implementation of ML models for GRI prediction yielded promising results. The integration of SPI and SDI with groundwater levels resulted in accurate GRI predictions, which indicates that groundwater drought can be predicted using both groundwater levels and drought indices. Studies that predicted the GRI and other drought indices using ANNs (Banadkooki et al. 2021) and ANFIS (Gocić et al. 2015) likewise reported promising performance. Therefore, employing ML models is highly recommended in groundwater resource management, as they serve as useful tools for predicting various types of droughts. These findings align with the research conducted by Milan et al. (2023).

The results indicated that although the structures of the models differed, their performance was relatively similar. The MARS model has a more complex structure than LSSVR, resulting in higher computational complexity and requiring a greater understanding of modeling principles. Additionally, MARS produces numerous output functions, which may pose challenges in their application, whereas LSSVR does not produce such a set of nonlinear functions and is relatively simple to code. As a result, based on the model results and considering their structures, LSSVR may be preferred over MARS for GRI modeling. The data used in this research comprised precipitation and runoff flow rate, but other meteorological information, such as evaporation and temperature, could also be employed to estimate index values. Furthermore, it is advisable to explore alternative ML models in future studies for estimating SPI, SDI, and other drought indicators. Comparing the results of other models with those obtained in this research would be a useful step in selecting an appropriate ML model in this field.
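The remark about LSSVR’s simpler coding can be illustrated with the standard least-squares SVR formulation, in which training reduces to solving a single linear system for the bias and the dual weights. The sketch below is a generic textbook version with an RBF kernel and is not the implementation used in this study. The function names, the hyperparameter values `GAMMA` and `SIGMA`, and the synthetic sine data are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma, sigma):
    """Solve the LSSVR system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(system, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]  # bias, dual weights alpha


def lssvr_predict(X_train, bias, alpha, X_new, sigma):
    """Predict with f(x) = sum_i alpha_i * k(x, x_i) + bias."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + bias


# Synthetic one-dimensional data for illustration only (not the study data).
GAMMA, SIGMA = 100.0, 1.0  # assumed hyperparameters
X = np.linspace(0.0, 2.0 * np.pi, 30).reshape(-1, 1)
y = np.sin(X).ravel()

bias, alpha = lssvr_fit(X, y, GAMMA, SIGMA)
fitted = lssvr_predict(X, bias, alpha, X, SIGMA)
print("max abs training error:", float(np.max(np.abs(fitted - y))))
```

MARS, by contrast, builds its model from a searched set of hinge basis functions of the form max(0, x − t), which is the source of the many output functions mentioned above.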

Conclusion

Groundwater is one of the main sources used to meet the water demands of the industrial, agricultural, and drinking-water sectors because of its accessibility, low operational cost, and acceptable quality. However, recent droughts have resulted in the excessive exploitation of aquifers. In this research, MODFLOW was used to simulate groundwater and investigate the water resources situation in Ajabshir, located in the northwest of Iran. Subsequently, ML models were employed to predict the groundwater drought index of the aquifer. The input variables included groundwater flow, precipitation, discharge flow, SPI, and SDI. The results of the groundwater simulation indicated a relatively stable groundwater condition in the area. Furthermore, the hydrograph of the simulated aquifer revealed an increasing trend in groundwater withdrawal, which may lead to excessive use of groundwater resources in the vicinity. Meteorological and hydrological drought indicators also showed the prevalence of severe drought in the years 2006, 2007, 2010, and 2011. However, in the final years of the research period, the situation approached normal conditions and indicated a wet year.

The performance evaluation of the MARS and LSSVR models demonstrated their capability in estimating the GRI, making them applicable for predicting groundwater drought indices in similar areas. Additionally, the prediction results suggested that combining the groundwater flow, SPI, and SDI variables can enhance prediction accuracy. Such predictions can greatly assist decision-makers in this field. Considering the significant negative impacts of climate change, particularly in arid and semiarid regions, it is recommended to incorporate the influence of this phenomenon when modeling the GRI. Moreover, various metaheuristic evolutionary algorithms are available that can enhance the performance of base ML algorithms, thus improving the results of individual models. Such an approach can be beneficial for estimating other challenging drought indicators. Finally, management scenarios such as modifying the cultivation pattern, upgrading the agricultural water distribution and transmission system and the irrigation system, and reducing agricultural water withdrawal from the aquifer are among the solutions that can be implemented in this area and in similar areas neighboring Lake Urmia to improve the status of groundwater resources.