Introduction

Drought is a periodic natural hazard that negatively influences water resources (Nemati et al. 2019; Scanlon et al. 2017). This phenomenon is typically divided into different categories in terms of water deficit type, including hydrological that occurs when surface and subsurface water levels are significantly below normal (Barker et al. 2016), agricultural which considers soil moisture deficiency (Yan et al. 2017), socio-economical that accounts for water resources system deficit resulting from other types of drought (Huang et al. 2016), and meteorological droughts which account for precipitation shortage (Hameed et al. 2018). Hydrological drought is generally caused by an uninterrupted meteorological drought (Wu et al. 2017) with various effects on the economy, agriculture, and ecology, including reduced hydraulic power production and irrigation water supply (Mishra and Singh 2010). In other words, various types of droughts can occur at the same time, making it difficult to distinguish between them. As a result, univariate drought indices (rely on single variable) are insufficient for characterizing the complicated effects and conditions of droughts, and composite indices have been suggested to overcome this difficulty (Yang et al. 2018). Therefore, it is necessary to have efficient indices to simultaneously monitor hydrological and meteorological droughts.

In recent years, researchers have tried to introduce new multivariate indices by combining drought indices using various methods to gain more information on the features and situation of this climatic phenomenon (Fahimirad and Shahkarami 2021; Li et al. 2014; Motevali Bashi Naeini et al. 2021; Nazeri Tahroudi et al. 2020; Hosseini et al. 2023).

Numerous current research efforts have monitored and analyzed the types of droughts using interpolated grids or observational data (Henriques and Santos 1999; Santos et al. 2010; Shen and Tabios 1995). Meteorological hydrometric stations can monitor variabilities in precipitation and runoff and accurately estimate precipitation and hydrological fluxes across basins (Fekete et al. 2002). But, there are few or no stations in specific locations worldwide due to the high cost of gauge installation and maintenance, and the existing records may be lacking or related to smaller time frames (Salio et al. 2015; Sun et al. 2018). To overcome the data scarcity in developing regions, alternative sources of global gridded datasets have been developed by different institutions—such as Modern-Era Retrospective Analysis for Research and Applications (Rienecker et al. 2011) and Global Land Data Assimilation System (Rodell et al. 2004)—that benefit the monitoring of water resources in regions with no or poor gauge systems. Each dataset offers certain advantages concerning the provided variables, temporal and spatial resolution, geographic extent, and time period. The gridded datasets were divided into three categories: namely, reanalysis, satellite-based, and gauge-based products (Hosseini-Moghari et al. 2018; Saemian et al. 2022). The reanalysis products assimilate remote sensing and in situ observations as numerical models of the global atmosphere and land surface (Dee et al. 2011; Saha et al. 2010). Estimations provided by these datasets are used for many research and applied cases, such as trend analysis (Balling et al. 2016; Toride et al. 2018), drought monitoring (Ahmadebrahimpour et al. 2019; Golian et al. 2019; Liu et al. 2020), flood modeling (Dis et al. 2016; Nhi et al. 2018; Yuan et al. 2019), and stream-flow simulation (Try et al. 2020; Yuan et al. 2017).

Droughts can be predicted using various models, for example, data-driven models (Dikshit and Pradhan 2021; Park et al. 2017; Zhang et al. 2019; Nejatian et al. 2023), autoregressive integrated moving average (Mishra and Desai 2006, 2005), Markov Chain (Alam et al. 2014; Rahmat et al. 2017), and hydrological (e.g., (Trambauer et al. 2013; Xing et al. 2020). However, as is the case with the rest of the statistical and linear models, complex drought conditions cannot be predicted by single data-driven models. Multi-model combination/fusion-based method, as a new approach, improves the credibility of data-driven models and resolves their shortcomings. Drought can be assessed through fusion more accurately than models based on single data sources. This approach was proposed by Dasarathy (1997), and it allows for gaining greater insight than single data sources. This method has been employed for streamflow forecasting (Modaresi et al. 2018), river-level forecasting (See and Abrahart 2001), and flood frequency analysis (ensemble model) (Shu and Burn 2004). This approach can predict drought by merging single forecasts. It has been suggested that combining remotely sensed datasets provides better forecasts than in-situ observations ( Feng et al. 2019; Fooladi et al. 2021; Jiao et al. 2019; Park et al. 2017). A review of the previous research showed that various single artificial intelligence (SAI) models have been employed for drought modeling (Jalalkamali et al. 2015; Naderi et al. 2022). Meanwhile, fusion-based methods have outperformed single-modal methods in drought modeling.

Previous studies mostly focused on the evaluation and validation of various precipitation products worldwide (Rahmati Ziveh et al. 2022; Raziei and Sotoudeh 2017), whereas runoff products were validated in very limited regions (Qi et al. 2020). Meanwhile, few studies have directly investigated the application of fusion-based methods in fusing multiple datasets to improve runoff and precipitation trend assessment compared to in-situ measurements (Fooladi et al. 2023) and to promote accuracy, reliability, and stability of SAIs (Alizadeh and Nikoo 2018).

Therefore, the overall aim of this study is to fill this gap by assessing and inter‐comparing various precipitation and runoff gridded products through gauge observation and fusion-based methodologies to model meteorological and hydrological drought in 24 large river basins in Iran. Different gridded precipitation and runoff products were employed to estimate Non-Parametric Standardized Precipitation Index (NSPI) and Non-Parametric Standardized Runoff Index (NSRI) as meteorological and hydrological drought indices, respectively, against local observations. The new composite drought index namely the Fusion-based Hydro-Meteorological Drought Index (FHMDI) was developed which is composed of the mentioned indices. In this case, four SAI models—including Support Vector Regression (SVR), Ensemble, Gaussian Process Regression (GPR), and Feedforward Neural Networks (FNN)—were used to estimate FHMDI in different time scales. Considering the different accuracy of SAI models, four fusion-based methods—namely, Wavelet-Based (WB), Weighted Majority Voting (WMV), Extended Kalman Filter (EKF), and Entropy Weight (EW)—were applied to improve the results of SAIs. Overall, the research objectives are to:

  1. (i)

    Assessing multiple remotely sensed monthly runoff and precipitation datasets individually, as well as employing a fusion-based approach including WB, EKF, WMV, and EW to evaluate multi-source runoff and precipitation remote sensing datasets, in order to enhance estimation accuracy

  2. (ii)

    Calculate the FHMDI compound drought index in simultaneous hydrological and meteorological drought estimations at different time scales using various gridded products and compare the results with ground-based FHMDI

  3. (iii)

    Compare four SAI models including Ensemble, GPR, FNN, and SVR in estimating FHMDI with remote sensing datasets under six scenarios (model input selection) and employing a fusion-based approach including WB, EKF, WMV, and EW to improve the accuracy combined drought estimation

The rest of the paper is organized as follows: “Material and methods” briefly describes the study area datasets and the methodology used for the study, “Results” discusses the results, “Discussion” provides discussion, and “Conclusion” presents conclusions.

Material and methods

Study area

Iran is located in the southwest of Asia with an area of about 1.7 million km2 (Fig. 1). About 60% of the country is covered by two mountain ranges: the Zagros range extends southward from the northwest to the shores of the Persian Gulf, while the Alborz chain extends from the northwest to the northeast along the southern edge of the Caspian Sea. The elevation of Iran ranges from less than − 297 m at the Caspian sea to 5597 m in the Damavand peak of Alborz Mountain chain. Precipitation values in Iran are also very diverse so that the average annual precipitation during the study varied from 52.6 mm at Zabol station to 1694.7 mm at Anzali station. Also, mean annual runoff is on average among the basins about 39 mm, but is much lower, 0–25 mm, in 18 of the 30 basins (Moshir Panahi et al. 2020). Iran has six main water basins and 30 river basins. Table 1 shows the specification of the basins and their hydrometric and synoptic stations (Fig. 1).

Fig. 1
figure 1

Location of river basins and their synoptic and hydrometric stations across Iran

Table 1 General information on main river basins, their area, and selected synoptic and hydrometric stations

Datasets

This study used the following datasets:

  • -Observational gauge data

The validation of the data was done using monthly precipitation and runoff datasets sourced from the Iran Meteorological Organization and Iran’s Ministry of Energy from 24 selected synoptic and hydrometric stations (for each river basin). The research data were obtained over the 1987–2019 period, which was determined based on the availability of consistent data records with suitable quality and range (Table 1). Observational data were not available for six political basins (red areas in Fig. 1, including Aras, Attak, West-border, Hamun Hirmand, Hamun Mashkel, and Khaf river basins). Therefore, they were excluded from the research.

  • -Remotely sensed datasets

Five different datasets on precipitation and runoff from 1987 to 2019 were used here, including:

  • 1) GLDAS-2.1

GLDAS was produced with the cooperation of several institutions. This dataset accurately estimates the weather data by mixing ground-based and satellite measurements of precipitation, humidity, temperature, radiation, wind speed, etc. Moreover, Mosaic, Noah, CLM (Community Land Model), and VIC (Variable Infiltration Capacity) models were used to simulate land-surface states and fluxes (GES DISC Dataset: GLDAS Catchment Land Surface Model L4 monthly 1.0 × 1.0 degree V2.1 (GLDAS_CLSM10_M 2.1).

  • 2) ERA5

ERA5 is the 5th generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis and has greatly surpassed ECMWF Reanalysis Interim (ERA-Interim) in terms of horizontal resolution, spatiotemporal resolution, design simulation, numerical modeling, observational absorption, output frequency, and better model-level display of details (https://cds.climate.copernicus.eu/#!/search?text=ERA5).

  • 3) G-RUN ENSEMBLE (GRUN)

GRUN is a runoff reconstruction based on observation. It employs machine learning algorithms to estimate global runoff (Ghiggi et al. 2019). One of its shortcomings is that it is based on a single atmospheric forcing dataset (GSWP3; (Kim 2017)). Later on, Global RUNoff ENSEMBLE (G-RUN ENSEMBLE), as a new publicly-available global runoff reconstruction, was produced with up to 525 ensemble members through 21 atmospheric forcing datasets (https://figshare.com/articles/dataset/GRUN_Global_Runoff_Reconstruction/9228176).

  • 4) TerraClimate (TERRA)

TerraClimate is a high-spatial-resolution dataset (1/24°, ~ 4 km) of monthly climate and climatic water balance for global terrestrial surfaces. It uses climatically aided interpolation to combine the high-spatial-resolution climatological normal (using the WorldClim dataset) with other coarser resolution and time-varying (i.e., monthly) datasets. This is to generate a monthly dataset of wind speed, precipitation, vapor pressure, solar radiation, and max/min temperature. TerraClimate also creates monthly surface water balance datasets using a water balance model incorporating precipitation, reference evapotranspiration, temperature, and interpolated plant-extractable soil water capacity (https://climate.northwestknowledge.net/TERRACLIMATE).

  • 5) MERRA2

The Modern-Era Retrospective Analysis for Research and Applications, version 2, is a global atmospheric reanalysis generated by the NASA Global Modeling and Assimilation Office. It provides a regularly-gridded (0.5-degree latitude by 5.8-degree longitude) and homogenous record spanning the satellite observing era from 1980 to the present. Compared to MERRA, MERRA2 allows for assimilating modern hyperspectral radiance and microwave observations, along with GPS-Radio Occultation and NASA ozone datasets. It is accessible at https://disc.gsfc.nasa.gov/datasets?project=MERRA-2.

The overall specifications of these products are listed in Table 2. All gridded datasets were re-sampled to 0.5° × 0.5° through the nearest-neighbor interpolation method to match the gridded data from ground observations.

Table 2 Remotely sensed precipitation and runoff products (P, precipitation; R, runoff)

Methodology

The following is an overview of the research methodology (Fig. 2).

Fig. 2
figure 2

The research flowchart

As shown in Fig. 2, the research data were sourced from the ERA5, GLDAS, TERRA, MERRA2, and GRUN products. The precipitation and runoff data of mentioned products were validated against observational gauge data. Moreover, fusion-based precipitation and runoff data were assessed to enhance the accuracy of runoff and precipitation evaluations at all river basins. Single and composite indices were calculated using observational data at different time scales for each river basin. For better drought modeling, the AGNES clustering technique as a classification algorithm was employed to cluster ground river basins based on FHMDI into different groups with similar features. Then, for each cluster, four SAI models, namely, GPR, FNN, SVR, and Ensemble were developed under six scenarios to estimate and predict drought using a remotely sensed dataset, and their error indices were determined. In the following, four fusion methods including WB, WMV, EKF, and EW are used for fusing estimations from the four SAI estimator models.

Calculation of runoff based on remotely sensed products

The sum of overland flow, interflow, and groundwater equals runoff in a grid (q grid):

$${q}_{{\text{grid}}}={q}_{{\text{sur}}}+{q}_{g+i}$$
(1)

where \({q}_{{\text{sur}}}\) is the overland flow (m3/s) and \({q}_{g+{\text{inter}}}\) is the sum of interflow and groundwater (m3/s). The monthly runoff of various products at gauge sites is the sum of the upstream monthly runoff:

$${q}_{{\text{gauge}}}=\sum {q}_{{\text{grid}}}$$
(2)

where \({q}_{{\text{gauge}}}\) is the calculated monthly runoff at gauge sites. This is a popular routing method (Crooks et al. 2014; Wang et al. 2016) as it is nonparametric (Eq. 2) with no parameter uncertainty.

Three evaluation criteria were used here, namely, modified Kling–Gupta efficiency (KGE), normalized root mean square error (NRMSE), and mean bias error (MBE) because they are commonly utilized in uncertainty evaluations as follows (Knoben et al. 2019):

$${\text{KGE}} = 1-\sqrt[2]{{\left(r-1\right)}^{2}+{\left(\beta -1\right)}^{2}+(\gamma -1)^2}$$
(3)
$${\text{NRMSE}}=100\times \frac{\sqrt{\frac{{\sum }_{i=1}^{N}{({C}_{pi}-{C}_{ei})}^{2}}{N}}}{{C}_{e{\text{max}}}-{C}_{e{\text{min}}}}$$
(4)
$${\text{MBE}}=\frac{{\sum }_{1}^{N}({C}_{pi}-{C}_{ei})}{N}$$
(5)

where N is the sample size,\({C}_{e}\) is the observed values, \({C}_{p}\) is the calculated value, r is Pearson’s linear correlation (measuring the temporal dynamics), \(\gamma\) is the variability ratio (measuring the relative dispersion between simulations and observations), and \(\beta\) is the bias ratio (measuring the total volume) (a value greater than 1 represents an overestimation of the simulations while a value less than 1 represents an underestimation).

Calculation of NSPI, NSRI, and FHMDI indices using ground-based and remotely sensed products

In this research, three drought indices, NSPI, NSRI, and FHMDI, were used for drought analysis. The Standardized Precipitation Index (SPI) is a popular index for assessing meteorological drought. It owes its popularity to its standardized nature, simplicity, and flexibility (Hayes et al. 1999; Sahoo et al. 2015). The Standardized Runoff Index (SRI) was introduced to investigate hydrological drought through the same method used by Mckee et al. (1993), to define SPI, which is the unit standard normal deviation associated with the percentile of hydrologic runoff accumulated over a timescale (Shukla and Wood 2008). Deriving SPI and SRI necessitates fitting a suitable parametric probability distribution function to precipitation data, which may not always be the best distribution function (Angelidis et al. 2012; Guttman 1999). This issue can be resolved using a nonparametric framework to derive NSPI and NSRI and better describe drought.

Regarding the necessity of nonparametric methods in deriving SPIs and SRIs, different statistical tests were used to determine the best probability distribution function for precipitation and runoff data in each ground station. The nonparametric method was employed here considering that different parametric probability distribution functions were fitted to precipitation and runoff data (Alizadeh and Nikoo (2018); Fooladi et al. 2021). Therefore, the probabilities of these data were calculated through the empirical Gringorten plotting position (Farahmand and AghaKouchak 2015):

$$p\left({x}_{i}\right)=\frac{i-(44/100)}{n+12}$$
(6)

where \(p({x}_{i})\) is the empirical probability, i is the rank of non-zero precipitation data, and n represents the sample size. Here, the Standardized Index can be obtained using Eq. 7:

$$\text{SI}=\varphi^\wedge(-1)\left(p\right).$$
(7)

where φ is the standard normal distribution function and p is computed empirical probability.

Then, the FHMDI is developed using NSPI, NSRI, and WB fusion-based method s(that will explain at “Fusion-based models”) to monitor compound hydrological and meteorological drought for different timescales (6, 9, and 12 months). It should be noted that this new index was calculated on the basis of reanalysis data.

AGglomerative NESting (AGNES) clustering algorithm

Drought can be analyzed and estimated more accurately by using AGNES to classify ground-based observation data into different groups based on similar features. FHMDI (calculated based on observational data) was considered as a clustering criterion to classify the data into various groups based on their feature resemblance. Agglomerative clustering, as a hierarchical clustering algorithm, is an unsupervised machine learning technique dividing the population into clusters with data points being more similar in the same cluster and vice versa (Li et al. 2022).

Single artificial intelligence (SAI) models

Four SAI models, namely, FNN, GPR, Ensemble, and SVR, were employed as estimator models for drought modeling using remotely sensed precipitation and runoff data under six scenarios (as model input) through FHMDI based on river basin’s observational data (as output model) that will explain them separately at “Scenarios of simulation.” According to AGNES results, the corresponding data for each river basin in different clusters were used to develop SAI models. These SAI models were trained (calibration) and verified (validation) using 80% and 20% of the dataset, respectively. The data were split through shuffled sampling. In addition, different percentage ratios—such as 70:30, 75:25, 80:20, and 85:15—were tested, and the model performance was analyzed in the training and validation stages. Finally, the best output (estimated FHMDI) with the least estimation error was determined for each SAI model based on different statistical error indices. The method for drought estimation and prediction through SAI models is presented in Fig. 3. To enhance accuracy, reliability, and stability of SAIs, different fusion-based models are used to combine them which are presented in the following.

Fig. 3
figure 3

The component of proposed modeling in this study

Fusion-based models

Data fusion means aggregation and combination of multi-source data (the individual outputs of different estimation models) to obtain more accurate and reliable solutions as opposed to using single-source data (Dasarathy 1997). Here, four different fusing methods, namely, WB, WMV, EKF, and EW, were used to estimate 6, 9, and 12-month FHMDI using remotely sensed data. Accordingly, the best outputs (FHMDI) of the four SAI estimator models (Ensemble, GPR, SVR, and FNN) were fused. A summary of the four methods is given as follows.

  • a) Wavelet-Based (WB)

Wavelet transform is an important mathematical linear transformation that is used in various fields of science (Xu et al. 2004). The main contribution of this method is to resolve the shortcomings of the Fourier transform. Wavelet transform has a high resolution in both the time domain and the frequency domain. This transformation not only determines the frequency number in the signal but also determines when those frequencies occur in the signal. The wavelet transform enables this by functioning at different scales. In wavelet transform, the signal is first considered with a large scale/window, and its large features are analyzed. In the next step, the signal is examined with small windows, and the small features of the signal are obtained. Suppose there are sensors in a multi-sensor system for observing unknown quantities. Then, the output of the sensor j is:

$${Y}_{j}\left(t\right)=Y\left(t\right)+{n}_{j}\left(t\right)$$
(8)

where \({n}_{j }(t)\) represents the white noise added to the original signal Y(t) in the output \({Y}_{j }(t)\). The variance \({n}_{j }(t)\) is defined as σ_j2 = En_i2 (t)⌋, and E[x] is the mathematical expectation of X. If the observations are independent and unbiased, the measure can be estimated using the following LMS estimator:

$$\widehat{Y}=\sum\nolimits_{j=1}^{N}{W}_{j}{Y}_{j}$$
(9)

where \({W}_{j}\) is the weight applied to \({Y}_{j}\) and \(\sum_{j=1}^{N}{W}_{j}=1\). The estimation variance is also defined as Eq. 10:

$${\sigma }^{2}=\sum\nolimits_{j=1}^{N}{W}_{j}^{2}{\sigma }_{j}^{2}$$
(10)

It represents the jamming noise variance of the \({j}_{{\text{th}}}\) observation sequence. The observation sequence is referred to as the data sequence obtained from the \({j}_{{\text{th}}}\) sensor. Hence,\({\sigma }_{j}^{2}\) is simply called the variance of the \({j}^{{\text{th}}}\) observation or sensor sequence.

If the weight of all observation sequences is the same, that is,\({W}_{j}=1/N\). For all j and \(\widehat{{\text{Y}}}\) are estimated from Eq. 9, then, the arithmetic mean of observations will be N. The variance of this estimate is calculated by Eq. 11:

$${\sigma }_{{\text{avg}}}^{2}= \frac{1}{{N}^{2}}\sum\nolimits_{j=1}^{N}{\sigma }_{j}^{2}$$
(11)

Although the arithmetic mean is widely used for estimating measurements from multiple independent observations, the estimates are not optimal regarding the least mean squared error. Therefore, to minimize the polynomial in Eq. 10, the optimal weights should be equal to 1 (\(\sum_{j=1}^{N}{W}_{j}=1\)) which can be calculated through Eq. 12:

$${W}_{j}=\frac{1}{{\sigma }_{j}^{2}\sum_{j=1}^{N}\frac{1}{{\sigma }_{j}^{2}}}$$
(12)

Therefore, the minimum variance estimated from Y will be calculated as Eq. 13:

$${\sigma }_{{\text{min}}}^{2}=\frac{1}{\sum_{j=1}^{N}\frac{1}{{\sigma }_{j}^{2}}}.$$
(13)

In this case, the estimator is consistent and unbiased. It can also be proven that \({\sigma }_{{\text{min}}}^{2}\) is not only smaller than the variance of any observed sequence but also smaller than Eq. 11. If it is possible to obtain \({\sigma }_{{\text{min}}}^{2}\)=\(\frac{1}{\sum_{j=1}^{N}\frac{1}{{\sigma }_{j}^{2}}}\) in advance, relations 9 and 12 can be used to obtain the optimal estimation of the measurement in terms of the minimum mean squared error.

  • b) Weighted Majority Voting (WMV)

Majority voting is a simple and highly effective linear hybrid group learning method. For example, in real-world research on a particular issue, in case some experts are more competent than others, weighting the decisions of those qualified experts may improve the overall performance more so than mere multi-voting (Ekbal and Saha 2012, 2011). The same problem exists in combining results when using SAI algorithms because finding the best weight is important. If the weight coefficients are not chosen correctly, it will yield poor results. Now, to combine the dataset, the weight combination is used according to Eq. 14, which is derived from the single-layer perceptron neural network (de Almeida et al. 2020).

$${W}_{O}={\left({H}^{T}.H+\lambda .I\right)}^{-1}.{H}^{T}.T$$
(14)

where H is the input data, T is the target vector, λ is a small fixed value, and I is the singular matrix, which is also known as the penalty because it penalizes large weights in the optimization process. Now, having the optimal weights, by multiplying it with the data, a combined and optimal target vector can be obtained through Eq. 15.

$$T=H*WO$$
(15)
  • c) Extended Kalman Filter (EKF)

The EKF algorithm is used to merge the input data (taken from different datasets). This algorithm is applied in two phases (Kaczmarek et al. 2022). Suppose all system information is available up to sample k − 1. Then, based on the system’s mathematical model and the available data up to time k − 1, an initial estimate of the state variables in sample k is calculated. We call this step of the Kalman filter the prediction process. Now, naturally, a new measurement is made by the sensors at time k, that is, \({y}_{k}\) becomes available. In the second step of the Kalman filter, which is called the update phase, the prediction made in the first step is improved using the newly available data in order to obtain the final estimated states in the K sample. The prediction phase is performed through the following two equations:

$${\widehat{x}}_{k|k-1}=f({\widehat{x}}_{k-1},{u}_{k-1})$$
(16)
$${P}_{k|k-1}={(F}_{k-1}{P}_{k-1}{F}_{k-1}^{T})+{Q}_{k-1}$$
(17)

In these relations,\({\widehat{x}}_{k-1}\) is the vector of estimated states in the sample \(k-1\); \({u}_{k-1}\) is the control vector in sample \(k-1\); \({Q}_{k-1}\) is the noise matrix in sample \(k-1\); \({P}_{k-1}\) is the covariance matrix in the sample \(k-1\); \({\widehat{x}}_{k|k-1}\) is the prediction of the state variable in sample k based on the information from the system up to the sample \(k-1\); \({P}_{k|k-1}\) is the prediction of the covariance matrix in sample k based on information from the system up to the sample \(k-1\); finally, \({F}_{k}\) and \({F}_{k-1 }^{T}\) are the linearized dynamics obtained through Taylor linearization and the T mark on it indicates that it is a transpose.

$${F}_{k}={\text{frac}} \;{\partial }_{f}{\partial }_{x}{|}_{xk-1},$$
(18)

The update phase of the EKF is performed as follows:

$${{K}_{k}={P}_{k|k-1}H}^{T}{({H}_{k}{P}_{k|k-1}{H}^{T}+{R}_{k})}^{-1},$$
(19)
$${\widehat{x}}_{k}={\widehat{x}}_{k|k-1}+{K}_{k}\left[{z}_{k}-h\left({\widehat{x}}_{k|k-1}\right)\right],$$
(20)
$${P}_{k}=\left(1-{K}_{k}{H}_{k}\right){P}_{k|k-1},$$
(21)

Here, \({H}_{k}\) is the measured linearized dynamics, which is used in the update phase. This matrix is also calculated as follows.

$${H}_{k}=\mathrm{frac }{\partial }_{h}{\partial }_{x}{|}_{{\widehat{x}}_{k|k-1}}$$
(22)

In the formula, \({z}_{k}\) is the output measured by the sensors. Therefore, in the first step, the Kalman interest \({(K}_{k})\) is calculated and then the final estimate of the state variables \({(\widehat{x}}_{k})\) and the covariance matrix \({(P}_{k})\) in the sample k are calculated. Phrase \(\left[{z}_{k}-h\left({\widehat{x}}_{k|k-1}\right)\right]\) also indicates the difference between the actual output measured by the sensors and the output obtained in the prediction phase.

  • d) Entropy Weight (EW)

EW is a linear method that helps to weigh indicators (Zhu et al. 2018). In this method, the indicator-derived matrix is first standardized and the entropy of every indicator is calculated:

$${H}_{j }=\frac{- \sum_{i=1}^{m}{f}_{i,j}*{\text{ln}}({f}_{i,j})}{{\text{Ln}} (m)}$$
(23)

In which

$${f}_{ij}=\frac{{\text{abs}}({r}_{i,j})}{\sum_{i=1}^{m}{\text{abs}}({r}_{i,j})}$$
(24)

where \({f}_{ij}\) is the value of the index matrix, j is the variable, m is the year, and i is the month. The indicator weight is calculated using Eqs. 25 and 26:

$${D}_{j}=1-{H}_{j}$$
(25)
$${w}_{j}=\frac{{D}_{j}}{\sum_{j=1}^{n}{D}_{j}}$$
(26)

where \({w}_{j}\) is the assigned weight of each variable, which should be equal to 1 (\(\sum_{j=1}^{n}{w}_{j}=1)\); \({D}_{j}\) represents a measure of the entropy for the jth variable.

Scenarios of simulation

To make input model, six scenarios are considered as follows:

  • S1 (MI-8 V): Model input including four datasets of precipitation (TEERA, MEERA2, GLDAS, and ERA5) and four datasets of runoff (TEERA, MEERA2, GLDAS, and ERA5)

  • S2 (MI-2 V-F1): Two variables as representative of precipitation and runoff with datasets of each variable combined by WB fusion method

  • S3 (MI-2 V-F2): Two variables as representative of precipitation and runoff with datasets of each variable combined by WMV fusion method

  • S4 (MI-2 V-F3): Two variables as representative of precipitation and runoff with datasets of each variable combined by EKF fusion method

  • S5 (MI-2 V-F4): Two variables as representative of precipitation and runoff with datasets of each variable combined by EW fusion method

  • S6 (MI-2 V): Two variables as representative of precipitation and runoff selected based on the results of single- and multi-product models versus local observations)

Results

Temporal and spatial assessment of monthly runoff and precipitation gridded products at the river basins

First, the upstream basin of the selected hydrometric station was determined at each river basin. Then, upstream runoff for each product was measured against observational monthly point-based data from the selected stations. The result of the evaluation criteria for runoff at the monthly scale can be observed for all river basins in Fig. 4. As it is clear, all of the products showed that there is a higher correlation coefficient (KGE between 0 and 1) in winter (Jan-Feb-Mar) (40%) and spring (Apr-May-Jun) (30%) than in other seasons (Fig. 4a). GRUN and ERA5 products had the highest correlation (50%, 42%, 40%, and 30% above 0.5, respectively) than other products. It should be noted that all negative KGE values are related to high bias ratio values. The NRMSE was the lowest for GRUN year-round (with an average value of 33%) (Fig. 4b). GRUN had the highest accuracy (lower than 1) in the Mand, Gavkhuni, Gharesu Gorgan, Karun, Karkheh, Haraz-Ghareso, and Talesh-Mordab Anzali river basins, with the average NRMSE being the lowest in Karkheh (0.7) and the highest in Mand (0.99) river basins. All of the products had the lowest error in Karun and Gavkhuni river basins and the highest error in the Central Desert (even GRUN product). According to MBE values, GLDAS overestimated runoff in all of the seasons (80% mean values), whereas GRUN, ERA5, and TERRA underestimated runoff compared to observational values (with average values of 52%, 55%, and 62%, respectively) (Fig. 4c) among all river basins. All products mostly overestimated runoff during four seasons in the Siahkooh, Central Desert, Ghareghoom, Namke Lake, Tasht Bakhtegan, and Haraz-Sefidrood river basins, but not in Helle, Hamun Jazmurian, Karun, and Gavkhuni river basins.

Fig. 4
figure 4

The results of evaluation criteria for runoff datasets with observed runoff in different months and river basins (ER: ERA5; GR: GRUN; GL: GLDAS; TE: TERRA) (1987–2019)

Figure S1 (Supplementary Material) shows the result of the evaluation criteria for precipitation at the monthly scale for all river basins. As can be observed, all of the products showed a higher correlation coefficient (KGE between 0 and 1) in winter (Jan-Feb-Mar) (80%) and spring (Apr-May-Jun) (82%) than in other seasons (Figure S1-a). All products had a KGE value higher than 0.5 in 70% of the cases. NRMSE was the lowest and highest for MERRA2 and ERA5 products during four seasons (with an average value of 60% and 30%, respectively) (Figure S1-b). MERRA2 had the highest accuracy (lower than 1) in Haraz-Ghareso, Haraz-Sefidrood, and Gharesu-Gorgan (with an average value of 65%). According to MBE values, GLDAS over- and under-estimates precipitation in all of the seasons and river basins (50% mean values), while MERRA2, ERA5, and TERRA under-estimate precipitation compared to observational values (with average values of 72%, 77%, and 64%, respectively) (Figure S1-c).

Fusion-based assessment of runoff and precipitation gridded products at the river basins

Considering the various climate zones across Iran, the precipitation and runoff datasets do not have the same accuracy for different river basins, which makes it difficult to choose the best product for a specific area. The fusion-based approaches can be used to overcome this issue by increasing the accuracy of remotely sensed datasets. In this section, the combined remotely sensed products were investigated using four fusion-based methods. The Taylor diagrams in Fig. 5 demonstrate the data accuracy in each dataset and model compared against local observations at some river basins as an example. As can be seen, WMV had the closest performance to local observation in the Talesh river basin (Fig. 5a). WMV and WB methods had a good performance for precipitation and runoff, respectively, in Haraz-Ghareso. WMV showed the highest accuracy (~ 60%) compared to single-source precipitation and runoff products. In addition, WMV (50%) and EKF (20%) outperformed WB and EW methods in merging data in all river basins. Figure S2 provides information on some statistical metrics, including KGE, NRMSE, and MBE obtained from each precipitation and runoff product and four fusion-based models. It can be seen that all products and fusion-based models yielded different outputs compared to local observations. For instance, EKF fusion-based model outperformed other methods (with the lowest NRMSE 0.7 and highest KGE 0.83) for both precipitation and runoff in Helle, while MERRA2 and GRUN products had a good performance for precipitation and runoff, respectively, in the Lut Desert river basin. This difference may be due to the poor performance of products, which were not improved after fusion.

Fig. 5
figure 5

Taylor diagrams of precipitation data on remotely sensed datasets and two fusion-based models compared to local observations in the 1987–2019 period (left: precipitation, right: runoff)

Fusion-based Hydro-Meteorological Drought Index (FHMDI)

Individual indices including hydrological and meteorological drought indices were computed for the selected stations and different time scales (e.g., 6, 9, and 12 months). NSRI and NSPI appear to be good solutions for the problem of probability distribution fitting, because a fixed distribution may not always be the best choice. According to Table 3, the best probability distribution functions were used for precipitation and runoff data of each ground station through the Kolmogorov–Smirnov test. Results showed no relevance between the datasets and the probability distribution functions. Hence, NSPI and NSRI were used as nonparametric meteorological and hydrological drought indices in the river basins of Iran. Then, a composite drought index, called FHMDI, was calculated to monitor compound hydrological and meteorological drought using WB method.

Table 3 Top probability distribution functions for precipitation and runoff data of selected synoptic and hydrometric station

Compound drought modeling based on SAI models

Here, 24 river basins were classified into different clusters aiming to attain better composite drought prediction and estimation results. The AGNES model provided four clusters based on their feature similarity (FHMDI) (Table 4). Figure 6 shows the value of FHMDI12 in all clusters (for all river basins) as an example. FHMDI12 variation in monthly time series from 1987 to 2019 in all river basins shows that the highest drought severity (less than − 2) and extent (100% of region or all pixel) occurred in the 1999–2000 period (24 months) in all clusters across Iran.

Table 4 Results of AGNES algorithm for river basin classification
Fig. 6
figure 6

The change in the monthly FHMDI 12 in all clusters (for each river basin) in monthly time series from 1987 to 2019

Table 5 demonstrates the statistical error metrics for SAI models in FHMDI predictions at a 12-month timescale for different clusters in the calibration and validation stages, as an example. It can be seen that the error index differences at the validation stage are negligible for the four models in clusters 1, 3, and 4, suggesting that they are suitably trained and verified. Although none of the models is entirely superior to the others, GPR slightly outperformed other models under scenario 1 (MI-8 V) with the highest KGE (0.94) and lowest NRMSE (0.06). In cluster 2, the ensemble model with the average values of KGE = 0.95 and NRMSE = 0.07 had the lowest estimation error under scenario 1 (MI-8 V). Generally, drought modeling was satisfactory under scenario 1 in all clusters, while it was better in clusters 1 and 2 than in clusters 3 and 4.

Table 5 The evaluation criteria for SAI models in estimating 12-month FHMDI for calibration and validation stages in all clusters under different scenarios

Based on the results, scenario 2 (MI-2 V and WB method) also had a good performance (after scenario 1) in clusters 1 and 2 (KGE = 0.87 and NRMSE = 0.09), while there was no other satisfactory scenario in clusters 3 and 4. In other words, other scenarios performed poorly in clusters 3 and 4. Moreover, the results of the four SAI models at different time scales showed that the model performance improved by increasing the time scale (Table S1 and Table S2; respectively, 6- and 9-month timescales). The scatter plots of the estimated and observed 12-month FHMDI under scenario 1 are presented in Fig. 7. As can be seen, Ensemble and GPR models had a better correlation than other models in all clusters.

Fig. 7
figure 7

The performance of SAI models in estimating 12-month FHMDI in different clusters (cluster 1: South Baluchestan, cluster 2: Mehran-Kal, cluster 3: Lake Namak, and cluster 4: Haraz-Ghareso river basins) for all data (1987–2019) under scenario 1

Compound drought modeling based on fusion-based methods

Four fusion-based methods including WB, EKF, WMV, and EW were used for improving the SAI results under scenario 1 (since drought modeling had the best performance under this scenario). Table 6 presents the results of fusion-based methods in predicting FHMDI at the 12-month time scale for the calibration and validation stages in four clusters. WMV had the lowest error for all clusters. WMV also had the best results in cluster 3 with NRMSE of 0.02 and KGE of 0.99. Based on the NRMSE indicator, WMV enhanced the precision of the predicted FHMDI by 48%, 60%, 56%, and 45% compared to the best SAI model for clusters 1, 2, 3, and 4, respectively. Tables S5 and S6 show that fusion methods performed poorly in the short-term (6) compared to the long-term (12) timescale.

Table 6 The evaluation criteria for four fusion models in estimating 12-month FHMDI for calibration and validation stages under scenario 1

Figure 8 demonstrates the predicated FHMDI at a 12-month time scale using WMV for four ground stations in four clusters. As can be seen, WMV with \({R}^{2}=0.99\) had the best performance in cluster 3. Figure 9 shows that FHMDI estimations in the Hamun river basin (as an example) performed better in the 12-month (\({R}^{2}=0.9783\)) than in the 6-month (\({R}^{2}=0.8033\)) timescale.

Fig. 8
figure 8

Predicted 12-month FHMDI using remotely sensed datasets based on WMV in four river basins from different clusters for all data (1987–2019)

Fig. 9
figure 9

Predicted 6, 9, and 12-month FHMDI using remotely sensed datasets based on WMV in Karun river basin from different clusters for all data (1987–2019)

Discussion

Precise prediction of precipitation and runoff is a significant challenge in the study of drought characterization and monitoring (Nassaj et al. 2022). Additionally, drought modeling is essential for mitigating its impacts, informing the public about its consequences, and planning water resources (Docheshmeh Gorgij et al. 2022).

Several studies have evaluated the performance of different gridded precipitation datasets over Iran, unlike runoff. Among these studies, one conducted by Saemian et al. (2021) demonstrated the high accuracy of MERRA2 over Iran, which aligns with the findings of this research. Similar conclusions have been reported by other studies regarding the accuracy of MERRA2 in various regions worldwide such as Chen et al. (2019), Kuswanto and Naufal (2019), Le et al. (2020), Odon et al. (2019), and Ullah et al. (2021).

Our findings regarding the combination of multiple precipitation datasets align with the results reported by Beck et al. (2017), Xu et al. (2020), and Fooladi et al. (2023). These studies also demonstrated that combining multiple precipitation datasets resulted in higher accuracy compared to relying on a single dataset alone. It is important to note that there is currently no existing study specifically focused on the combination of runoff datasets for Iran or any other region. Therefore, our research on the assessment of runoff, particularly in Iran, represents a novel contribution to the field.

The performance of FHMDI at all river basins was also measured against numerous drought events in Iran during 1998–2001(Morid et al. 2006; Hosseini et al. 2023) which confirmed their efficiency (composite drought index) in drought monitoring (as can be observed at the next section).

Previous studies such as Mishra and Desai (2006), Morid et al. (2007), Bacanli et al. (2009), and Marj and Meijerink (2011) have provided evidence supporting the effectiveness of machine learning models in accurately forecasting drought. These studies highlight the potential of machine learning models to capture the complex relationships between meteorological and hydrological variables and drought occurrence. Among the various models used in research studies, the Gaussian Process Regression (GPR) model has demonstrated high performance in several research studies (Sihag et al. 2017; Mishra and Kushwaha 2019; Shabani et al. 2020; Ghasemi et al. 2021) similar to our results.

Our findings regarding the effectiveness of fusion-based models in accurately meteorological forecasting drought at different time scales are consistent with the results reported by Alizadeh and Nikoo (2018) and Fooladi et al. (2021). In other words, the fusion-based model application improved the accuracy of forecasting compared to all SAI models alone.

Conclusion

This study investigated four fusion-based methods for improving the accuracy of precipitation and runoff evaluation with low uncertainty and estimation of FHMDI using different remotely sensed precipitation and runoff products. In addition, the study compared the performance of single and composite products and single and composite models under different scenarios based on the four proposed fusion-based models using remotely sensed data versus ground-based FHMDI estimations at different timescales. The results indicated that while single products showed a good performance in some river basins, some fusion-based models improved the accuracy of dataset products compared to local measurements. However, the fusion-based assessment showed that fusing datasets did not perform satisfactorily in some river basins due to the high uncertainty of some single products. SAI models showed acceptable performance in composite drought prediction, especially at the 12-month timescale in clusters 1 and 2 under scenario 1. Fusion-based models significantly improved the accuracy of FHMDI estimation and research results compared to those of SAI models based on local measurement, especially in cluster 3. The WMV fusion-based method, with the lowest error, had a good performance in comparison with other products and other SAI and fusion-based models. In addition, the estimation of FHMDI 12 had higher precision than FHMDI 6 as shown in other research. Overall, the WMV model was the best fusion framework with good performance in predicting FHMDI using remotely sensed datasets compared to ground observations, so it can be adopted as a reliable and accurate method for composite drought modeling.