1 Introduction

Deficiency of water is larges problems in worldwide expensive pressures to ecologies, an especially in various areas with different climatic types (Mokhtar et al. 2021; Yaseen et al. 2021; Kushwaha et al. 2022a). Dry is a spatiotemporal occurrence that develops when rainfall levels fall beyond normal for a prolonged period of time, presenting an ecology risks (Ihinegbu and Ogunwumi 2021; Kushwaha et al. 2022c). It crucially affects the different sectors related ti ecological components as a water resources availability, vegetation, agricultural, industrial production, wildlife, river basin systems, and socio-economic structures (Ahmed et al. 2019; Abdo 2020; Hateffard et al. 2021). However, global literatures on drought assessment indicates that extreme drought episodes are predicted to an increase problems about the climate change (Deo et al. 2018; Malik et al. 2021). Therefore, the process of predicting and modelling characteristics of drought, including duration, intensity, and frequency, which is a vital and urgent mechanism in the context of adequate water resources management to agricultural, industrial and service development.

Since the last five decades, current area are always facing the biggest problem of metrological and agriculture drought in the semi-arid region. Overall climate change factors impact on the entire global climate, and main important factors affected on the increasing temperature and decreasing agriculture crop (World Bank 2003). During the year 2013, severe drought is occurred due to low rainfall measured during monsoon season in Maharashtra, India. It was worst in the last 40 years. Almost 60% of agriculture depend on the rain water, and no any other source of water for crops during a drought situation in India. Indeed, "Indians know that Monsoon is India's real finance minister," the ecological activist Sunita Narian has been claimed. India faces its worst-ever water crisis in a drought period (Udmale et al. 2014). NITI Aayog, the leading policy investigation centre, says that demand for water in the country is projected to double its supply by 2030. Millions of farmers are now struggling to live with drought and crop failure in different parts of the country. Drought has been declared to be affected by more than 80% of districts in the Karnataka state and 70% in Maharashtra (Udmale et al. 2014; Sreekesh et al. 2019).

Researchers and scientist have developed and available many mathematical indicators in worldwide, which is measuring and an evaluating drought incidents, and crop moisture index (CMI) (Palmer 1968); Palmer Drought Severity Index (PDSI) (Palmer 1965); and Standardized Precipitation Index (SPI) (McKee et al. 1995; Shukla and Wood 2008). SPI is one of the most reliable indicators are used to globally assessment of meteorological drought, since its suitability and capabilities to assess on multiple temporal scales (1-, 3-, 6-, 9-, and 12-month timescales) with relevant constructive interpretations (Dutta et al. 2013; Bonaccorso et al. 2015; Ionita et al. 2016; Komasi et al. 2018). However, SPI is a probabilistic index that used rainfall variability as a crucial indicator influencing drought severity and duration, ranges between -3 (severe drought) to + 3 (wet conditions) (Dutta et al. 2013; Barker et al. 2016; Saada and Abu-Romman 2017; Ali et al. 2019b).

SPI is a non-linear method based on the drought prediction by conventional statistical methods with great uncertainty. Recently, the implementation of machine learning (ML) algorithms was used to SPI estimation and based on the support vector regression (SVR) (Belayneh et al. 2014; Borji et al. 2016), an extreme learning machine (ELM) (Park et al. 2016; Ali et al. 2019a; Liu et al. 2021), linear genetic programming (LGP) (Mehr et al. 2014), adaptive regression spline (MARS) (Deo et al. 2017), an extremely randomized tree (ERT) (Rhee and Im 2017), adaptive neuro-fuzzy inference system (ANFIS) (Nguyen et al. 2015; Gocić et al. 2015), artificial neural network (ANN) (Deo and Şahin 2015; Banadkooki et al. 2021), M5 Tree (M5T) (Yaseen et al. 2021; Elbeltagi et al. 2022b), and random forest (RF) (Masroor et al. 2021; Elbeltagi et al. 2022b) have presented huge potentials with promising results for an estimating and modeling drought compared with conventional methods.

Super vector machine (SVM) is a novel machine-learning algorithm has been reported as a reliable method dealing with complex sorting issues (Suess et al. 2015; Borji et al. 2016; Kushwaha et al. 2021). Moreover, the implementation of SVM has essential features such as advanced validation, geometric explanation, and precise statistical tracking with a low number of a training data sets (Zhang and Wu 2012; Achirul Nanda et al. 2018). In the SVM models, the kernel functions are presented a worthy tool because it aids in determining the dataset to a more an excellent proportional environment to acquire a reliable an explanation at classification method (Achirul Nanda et al. 2018; Sihag et al. 201.8). prediction of drought is must for the understanding of the intensity of drought in future, which plans best for controlling and decreasing less impact by drought condition and climate changes in future. In this context, developed some models for forecasting of drought in the semi-arid region (Sakaa et al. 2022). These machine learning models can be also good predicated the information by using the correct information of input variables (Buttafuoco et al. 2015). In India every year, any part of country is suffering drought but no one given the future predication of drought and intensity, this is a so long research gap in the drought research and also forecasting models (Pande et al. 2022). Machine learning models can be given development of various models for forecasting of drought in the semi- arid region. Various models present but researchers are not applied for the development of drought forecasting in the India study area (Pande et al. 2022). Some of researchers is not believed on machine learning models but in future such kind of study can be very helpful for quick development of machine learning model for forecasting of values (Tan et al. 2019). Now in this adopted this model have been given more accurate results depend on the 30 year SPI data (Rhee and Im 2017). Compared with previous studies and current study directly shown a very big research gap related to forecasting of drought and machine learning models in the semi-arid condition. This current research results are a cover an entire research gap filled in the view of drought forecasting and development of ML model to prediction of metrological and agriculture drought. Despite the wide variety of drought prediction algorithms, the generalization of a model suitable for all global and regional climatic characteristics is still a necessary procedure. Importantly, studies related to testing the potential of Kernal functions in modeling and predicting drought are still limited. Novelty of this research, we have developed the forecast model for semi-arid condition based on the 30-year rainfall data and processing entire datasets and development model in the python programming. We have found the best models for forecasting of drought particularly in the dry land area. Such type of research works first time introduced and suggested to policy makers, future plans and governments. However, projected results can be used to preparation of drought policy and agriculture purposes in the semi-arid region. This is main novelty and objective has been achieved for sustainable development planning and management. The present study focuses to improve the performance of SVM algorithm using sequential minimal optimization (SMO) under various kernel functions i.e., SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK (Pearson Universal Kernel) and SMO-SVM RBF (radial basis function) kernel for prediction multiscale droughts at the upper Godavari River basin in central India, affected by several droughts.

2 Material and Methods

2.1 Study area

The upper Godavari River basin is located in the central India of Maharashtra state. It is located between 19°00′00′′ and 20°30′00′′ N latitudes and 73°20′00′′ and 75°40′00′′ E longitudes (Fig. 1). This basin area is in the semi-arid zone with basaltic rock. The average rainfall is 700-850 mm with minimum and maximum temperature observed is 25° to 40 °C. The agricultural land is most dependent on rainwater and groundwater. The weather stations information is presented in Table 1. It comes from Trimbakeshwar, which is located at a height of 1067 m in Maharashtra's of Nashik district. It mostly passes through to the eight states of Maharashtra, Telangana, Andhra Pradesh, Karnataka, Madhya Pradesh, Chhattisgarh, Odisha, and Puducherry as it flows into the Bay of Bengal. The Upper Godavari river basin, which is wholly inside the state of Maharashtra, is also a part of the study region. The basin is usually dry in other seasons because the monsoon season is when it rains the most. The basin of water resources is primarily used to agriculture purposes. In this study area has been found basaltic rock overall basin. This river basin area is most important for agriculture crops such as onion, sugarcane, vegetables, wheat and horticulture production, however most of the crops are cash crops in surrounding basin area.

Fig. 1
figure 1

Location map of the study stations

Table 1 Details description of the study stations

2.2 Methodology

The present study was evaluated the best subset regression model and sensitivity analysis, which is selected best combinations of variables as an input to build of machine learning models. Four machine-learning algorithms were considered to develop in MATLAB software varying in various kernel functions: poly kernel, Normalized poly kernel, PUK and RBF. The adopted methodology is presented in Fig. 2.

Fig. 2
figure 2

Flowchart of drought estimation methodology in the study area

2.2.1 Support Vector Machine Algorithm

Support vector machine (SVM) is a classification and regression supervised learning technique. SVM is determined the extrication hyperplane that separations vector space in vectors sub-set assign to a collection of p-dimensional vectors space; every divided sub-set (data set) is allocated to one class. The SVM algorithm was introduced and created by Vladimir Vapnik and Alexey Chervonenkis in 1963 (Cortes and Vapnik 1995). SVM has been proposed through kernel technique on largest boundary hyperplanes to create non-linear classifiers (Boser et al. 1992; Cortes and Vapnik 1995). Linear SVP is given in Eq. (1)

$${x}_{1},{y}_{1}\quad\quad{x}_{n},{y}_{n1}$$
(1)

where \({y}_{1},{y}_{1}\) are also 1 and -1, for every showing the type, which the point \({x}_{1},{x}_{1}\) be appropriate. Every \({x}_{1},{x}_{1}\) is a \(\rho \rho\)-dimensional actual vector. We wish to discover the highest hyperplane that splits the class of points \({x}_{1},{x}_{1}\) for where \({y}_{1},{y}_{1}=1\) from the class of sites for which \({y}_{1}=-1\), which is specified so that the distance between the hyperplane and the nearest point \({x}_{1},{x}_{1}\) by either category is greatest.

Any hyperplane is written as the set of points xx filling with Eq. (2)

$${w}^{T}x-b=0$$
(2)

where is the normal vector to the support vectors (not necessarily normalized), this is similar to Hesse normal form, only it isn't always a unit vector.

During the training of support-vector machines, the quadratic programming (QP) problem can be solved using the sequential minimum optimization (SMO) approach (SVM). Support vector machines (SVMs) are frequently trained using SMO, which is implemented by the well-liked LIBSVM tool. The SMO algorithm was announced in 1998, and the SVM community was quite happy about it because earlier methods for SVM training were far more difficult and expensive third-party QP solvers were necessary. The aforementioned optimization problem is addressed by the iterative SMO algorithm. SMO breaks down this problem into a number of its tiny sub-problems, and then analytically resolves each one. Due of the linear equality requirement involving them, the lowest problem involves two Lagrange multipliers.

2.2.2 Poly kernel

The kernel functions are used to integrate the original pure (linear/non-linear) dataset into a higher dimensional space is required to convert it to a linear dataset.

A polynomial kernel is given in below

$$K\left(x, y\right)={\left({x}^{T} y\right)+c)}^{d}$$
(3)

wherever x and y are input vector space, i.e. feature vectors derived using training or testing samples, c ≥ 0 can factor interchange off an effect of greatest-order against lower-order define in the polynomial. When c = 0, the kernel is called as homogeneous (Shashua 2009).

An additional general poly kernel xTyxTy by a user-particular scalar factor

$$K\left(x,y\right)=(\varphi \left(x\right),(y))$$
(4)

An example of the nature φ can be observed. If we set Let d = 2, we become the quadratic kernel's different case. After reforming and multinomial theorem (double-the outermost use is the binomial theorem).

$$\begin{aligned}K\left(x,y\right)&=({\sum }_{i=1}^{n}{x}_{i},{y}_{i}+{C)}^{2}={\sum }_{i=1}^{n}{(x}_{i}^{2}) {(y}_{i}^{2})\\&\quad+ {\sum }_{i=2}^{n}{\sum }_{j=2}^{i-1}(\sqrt{2}\left({x}_{i},{x}_{j}\right)\sqrt{2 ({y}_{i},{y}_{j}}\\&\quad+{\sum }_{i=1}^{n}\left(\sqrt{2{cx}_{i}}\right)\sqrt{2{cy}_{i})}+{c}^{2}\end{aligned}$$
(5)

From this, it monitors that the feature is provided by:

$$\begin{aligned}\varphi (x)&=({x}_{n}^{2},\dots \dots \dots , {x}_{1}^{2}\sqrt{2 {x}_{n}}{x}_{n-1}, \sqrt{2 {x}_{n}}{x}_{1},\sqrt{2 {x}_{n-1}}{x}_{n-2}\dots \sqrt{{2x}_{n-1}{x}_{1}}, \\&=\sqrt{{x}_{n-1}}{x}_{1},\dots \dots \sqrt{{2x}_{n}{x}_{n-1}},\sqrt{{2x}_{n}{x}_{1}},\sqrt{{2Cx}_{1}X,})\end{aligned}$$
(6)

2.2.3 Normalize Poly Kernel

Normalize a poly kernel by dividing every element by the sum over all kernel elements, resulting in a normalized kernel with a sum including all elements equal to unity. This assures that the average pixel in the changed image is the same brightness as the original data.

Normalize poly kernel is expressed as

$$\mathrm{K}(\mathrm{m},\mathrm{n})=\frac{k(m,n)}{\sqrt{k\left(m,m\right),k(n,m)}}$$
(7)

2.2.4 PUK (Pearson Universal Kernel)

The Pearson VII function for the fitted graph (Sihag et al. 2018) is given by;

$$\mathrm{f}\left(\mathrm{x}\right)=\frac{H}{\{1+((2(x-{x}_{0}]\sqrt{\frac{1}{2\omega -1}/{\delta }^{2}]}}\omega$$
(8)

where H is peak height at its center, x0x0, and x is the single variable. The half-width and investigation parameters of the peak are controlled by the parameters x and ω. A function, on the other hand, is considered legitimate if the kernel matrix relating to it is homogeneous and positive semi-fixed. Üstün et al. (2006) rewrote the preceding equation into a function of two vectors to show that the PUFK does indeed satisfy these constraints (Zhang and Ge 2013; Sihag et al. 2018).

$$k\left({x}_{i},{y}_{j}\right)=\frac{1}{[1+((2\sqrt{{({x}_{i},{y}_{j})}^{2}\sqrt{{2}^{{(^1/_\omega-1^)/_\delta)}^{2\omega }}]}}}$$
(9)

where \({x}_{i},{x}_{i}\) and \({x}_{j}{x}_{j}\) are both vector arguments. The peak off-set define \({x}_{0}{x}_{0}\) in Eq. (8) Without losing generality, the maximum height H is eliminated and simply replaced by 1. For every random pair of numbers, the Pearson VII function kernel will result in a matrix with ones on the diagonal as well as all values range of 0 and 1 (\({x}_{i},{x}_{i}\) and \({x}_{j}{x}_{j}\)). It is a strong equivalent or even more mapping power than conventional kernel functions, resulting in SVM performance that is equal or better than standard kernel functions.

2.2.5 RBF (Radial Basis Function) Kernel

The kernel function is used in several machine learning algorithms. The concepts underpinning kernel functions are quite complex, thus understanding them is essential. Various kernel functions are regularly utilized. The radial basis function (RBF) kernel is most prevalent (at least for this study). The RBF is expressed as an equation:

$$k\left(x,{x}^{{\prime}}\right)=\mathrm{exp}(-\frac{\Vert x-{x}^{{\prime}}\Vert }{2{\sigma }^{2}})$$
(10)

Here, K view points for RBF kernel. The x and x’ are vectors where x’ is a “point of reference” vector and the || term is Euclidean distance. The RBF kernel function returns a number that is always between 0.0 and 1.0. RBF gives 1.0 if x and x' are equal. The lesser the value of RBF becomes as x and x' become more apart, approaching, but not quite reaching, 0. As a result, the RBF kernel is a similarity measure.

The Partial Auto Correlation Function (PACF), which also regresses the values of the time series at all shorter lags, provides the partial correlation between a stationary time series and its own lagged values, which can be used to choose the best input variables for the developed machine learning models. The autocorrelation function, in contrast, does not take into account extra lags. The examination of data to determine the amount of lag in an autoregressive (AR) model depends on this function. This function was initially used to find the required lags by graphing the partial auto correlative functions in the Box-Jenkins technique for modeling time series.

2.3 Performance Indicators

The actual data of SPI and modeled values were compared through 30 year datasets. To evaluate the accuracy of developed hybrid support vector machine models, the following statistical indicators have been used, i.e., Root Mean Square Error (RMSE), Coefficient of Determination (R2), and Mean Absolute Error (MAE) (Elbeltagi et al. 2022a, c; Kushwaha et al. 2022b). This indicators are well-defined as:

  1. 1.

    Root Mean square error (RMSE)

    $$\mathrm{RMSE }=\sqrt{\frac{1}{\mathrm{N}}{\sum }_{i=1}^{N}{{(SPI}_{A}^{i}-{SPI}_{P}^{i})}^{2}}$$
    (11)
  2. 2.

    Coefficient of determination (R.2)

    $${\mathrm{R}}^{2}= {\left[\frac{{\sum }_{i=1}^{N}{(SPI}_{A}^{i}-{\overline{SPI} }_{A}){(SPI}_{P}^{i}-{\overline{SPI} }_{P})}{\sqrt{{\sum }_{i=1}^{N}{{(SPI}_{A}^{i}-{\overline{SPI} }_{A})}^{2}}\sqrt{{\sum }_{i=1}^{N}{{(SPI}_{P}^{i}-{\overline{SPI} }_{P})}^{2}}}\right]}^{2}$$
    (12)
  3. 3.

    Mean absolute error (MAE)

    $$\mathrm{MAE}=\frac{1}{\mathrm{N}}{\sum }_{i=1}^{N}{|SPI}_{P}^{i}-{SPI}_{A}^{i}|$$
    (13)

In which, \({SPI}_{A}^{i}\) is an observed or actual value, \({SPI}_{P}^{i}\) is simulated or forecasted value,\(\overline{{SPI }_{A}}\) and \(\overline{{SPI }_{P}}\) are the mean values of observedand forecasted samples, and N is the total number of data points.

3 Results and Discussion

3.1 Best Subset Regression and Sensitivity Analysis

3.1.1 Input Selection Using Best Subset Model for the SPI 3, 6, and 12 Months of Selected Stations

The finest input grouping was certain by six statistics analysis norms (MSE, determination coefficients (R2), Adjusted R2, Mallows' Cp, Akaike's AIC and Amemiya's PC) at two stations and outputs are shown in Tables 2(A–C) and 3(A–C) for SPI-3, SPI-6, and SPI-12 respectively at Paithan and Silload stations. From Table 2(A), it can be seen that at Paithan station, the bolded blue of input combination is found as a better combination input variables, it is given best results of arithmetical norms like MSE, determination coefficients (R2), Adjusted R2, Mallows' Cp, Akaike's AIC and Amemiya's PC as 0.517, 0.726, 0.720, 2.180, -151.13, -130.29, 0826, respectively of among all input combinations for prediction of SPI-3 model. For SPI 6, the lowest values of MSE (0.436) and lowest values of Amemiya's PC of 0.172 among all input combinations. This input combination was achieved a high equal value for R2 (0.831) and a high Adj- R2 of 0.834. Furthermore, the lowest values of Mallows' Cp (-0.837) and lowest values of Amemiya's PC of 0.065 among all input combinations in SPI 12.

Table 2 The best subset regression analysis for determining the best input combinations at the Paithan station
Table 3 The best subset regression analysis for determining the best input combinations at the Silload station

Similarly, at the Silload station, the bolded blue input grouping having the lowest value of Mallows' Cp (0.363) and lowest value of Amemiya's PC (0.280) among all selected input combinations as the best input combination for SPI-3 and it has a high equal value for R2 (0.725) and a high Adj- R2 of 0.729. For SPI-6, the lowest values of Mallows' Cp (2.66) and lowest values of Amemiya's PC of 0.179 among all input combinations. This input combination was achieved a high equal value for R2 (0.824) and a high Adj- R2 of 0.828. Furthermore, the lowest values of Mallows' Cp (-2.02) and lowest values of Amemiya's PC of 0.073 among all input combinations for SPI-12. We have used 30-years data sets are divided into two sections i.e. 75% training and 25% testing dataset, which are measured for validation of the ML models.

3.1.2 Sensitivity Analysis

In this analysis of input combinations are obtained using regression investigation for classify the more important input factors used to forecast of multiscale SPI by machine learning models. The acquired result from regression proces at the Paithan station can be seen in Table 4(A–C) and Fig. 3a–c. The outputs of accomplished regression process based on whole input variables has been showed of SPI (t-1), SPI (t-4), SPI (t-5), SPI ((t-6), and SPI (t-9) have the highest absolute standard coefficients (β = 0.892, -0.103, 0.134, -0.615 and 0.101) recognized as more important input variables for simulation of SPI-3. In SPI 6 simulation, the conclusions of done regression analysis based on input variables have been proved that SPI (t-1), SPI (t-2), SPI (t-6) and SPI (t-7) by having highest absolute standard coefficients (β = 0.993, -0.071, -0.140 and 0.128) were observed as more significant input parameters, respectively. In addition, our results stated that SPI (t-1), SPI (t-3) and SPI (t-6) by having the highest absolute standard coefficients (β = 1.000, 0.086 and -0.157) were identified as the more significant input variables, which is used to estimating of SPI-12.

Table 4 The regression analysis for identifying the most effective parameters at the Paithan station
Fig. 3
figure 3

The standardized coefficients of input variable for sensitivity analysis Paithan station for a SPI-3, b SPI-6, and c SPI-12

The Table 5(A–C) and Figs. 4a–c listed the outcomes of regression analysis to identify the most effective input parameters in predictive data-driven models at the Silload station. The results of performed regression analysis on all input parameters proved that SPI (t-1), SPI (t-3), SPI (t-4), and SPI (t-10) having the highest absolute standard coefficients (β = 0.946, -0.269, 0.105, and 0.087) were found as the most significant initial parameters, respectively for simulation of SPI-3. As well, for SPI 6 simulation, an observed of performed regression process on whole input variables have been proved of SPI (t-1), SPI (t-2), SPI (t-3), SPI (t-5), and SPI (t-9) by having highest absolute standard coefficients (β = 1.002, -0.144, 0.082, -0.107, and 0.086) are found as most powerful input factors. Similarly, the result showed that SPI (t-1) and SPI (t-8) by having the highest absolute standard coefficients (β = 1.012 and -0.092) were found as the important input factors, which is computed of SPI-12.

Table 5 The regression analysis for identifying the most effective parameters at the Silload station
Fig. 4
figure 4

The standardized coefficients of input variable for sensitivity analysis at Silload station for a SPI-3, b SPI-6, and c SPI-12

3.2 Evaluation Machine Learning Models

The multiscale SPI at both various stations were computed by using applied four ML models. The performances of used to models were assessed with associated using different performance metrics (i.e., MAE. RMSE, RAE, RRSE and r). The model with a high r and lowest value of MAE, RMSE, RAE, and RRSE with close to 0 is considered the larger accurateness for assessment of SPI. The common trend MAE. The models performance indicators R, RMSE, RAE, RRSE are obtainable in Tables 4 and 5(A–C). The algorithm SMO-SVM with two kernels i.e. SMO-SVM poly kernel and SMO-SVM PUK kernel has possible to assess the SPI with better accuracy as associated with another selected kernels.

3.2.1 Estimation of Multiscale SPI at Paithan Station

The algorithm was used to SMO-SVM with four kernel functions, i.e. Poly kernel, normalized poly kernel, PUK kernel and RBF kernel, was calculated using statistics indicators (MAE, RMSE, RAE, RRSE, and r) at Paithan station and obtainable in Table 6(A–C). It is clear from table of SMO-SVM poly kernel model performed better during the training and testing period in predicting SPI-3. The results of r, MAE, RMSE, RAE (%) and RRSE (%) for model SMO-SVM poly kernel were obtained as 0.773 and 0.905, 0.699 and 0.329, 0.979 and 0.493, 75.38 and 40.52, 70.52 and 42.88, respectively during training and testing stages for SPI-3. In case of SPI-6, the punk kernel improves the performance of SMO-SVM as it gave the best performance criteria values, MAE = 0.387, RSME = 0.555, RAE = 47.17%, RRSE = 40.10%, r = 0.920 during the testing period. Similarly, for SPI-12, the SMO-SVM algorithm with Poly kernel was superior over the other selected kernel function and gave the values of r, MAE, RMSE, RAE (%) and RRSE (%) as 0.961 and 0.984, 0.254 and 0.187, 0.543 and 0.361, 26.22 and 18.77, 28.70 and 20.01, respectively during training and testing phases. Therefore, it revealed that the SMO-SVM algorithm performed better with poly kernel for SPI-3 and SPI-12, while PUK kernel improved the performance of the SMO-SVM model during the prediction of SPI-6.

Table 6 MAE, RMSE, RAE, RRSE and r for Machine learning algorithms-based models during the training and testing span at Paithan station

The temporal variation between estimated and observed multiscale SPI and their scatter plots are presented in Fig. 5a–c. In scatter graphs, the regression line delivered the highest coefficient of determination (R2) as 0.819 for the SMO-SVM poly kernel (SPI-3), 0.846 for the SMO-SVM PUK kernel (SPI-6), and 0.968 for the SMO-SVM poly kernel (SPI-12), respectively [Fig. 5a–c]. The regression line (RL) was above the best fit (1:1) for all the applied models, which means these models overestimated the multiscale SPI values concerning the observed SPI values. The SMO-SVM poly kernel and the SMO-SVM PUK kernel models provided that RL is good fitted line and presented a superior performance.

Fig. 5
figure 5

Line plot (left) and scatter plot (right) of observed vs estimated SPI values by the SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK kernel and SMO-SVM RBF kernel during testing at Paithan station for a SPI-3, b SPI-6, and c SPI-12

In addition to the above, the relative analysis of models was done with help of Taylor diagram. According to Fig. 6a–c, the SMO-SVM RBF kernel model was located furthest. The SMO-SVM poly kernel model for SPI-3 & SPI-12 and SMO-SVM PUK kernel model for SPI-6 were located nearest the observed point depending on the standard deviation, correlation, and RMSE. This showed SMO-SVM RBF kernel as the worst model and SMO-SVM poly kernel and SMO-SVM PUK kernel as the best model among the selected models for multiscale SPI estimation.

Fig. 6
figure 6

Line plot (left) and scatter plot (right) of observed vs estimated SPI values by the SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK kernel and SMO-SVM RBF kernel during testing at Silload station for a SPI-3, b SPI-6, and c SPI-12

3.2.2 Estimation of Multiscale SPI at Silload Station

The results obtained for multiscale SPI estimation at Silload station is summarized in Table 7(A–C). It revealed that the SMO-SVM poly kernel model was superior over the other models with r = 0.858, MAE = 0.421, RMSE = 0.644, RAE = 45.17% and RRSE = 51.60% during the testing period in the estimation of SPI-3. Similarly, Table 7(B) summarized that the best performance of SMO-SVM poly kernel model was for projecting the SPI-6 at Silload station with the best performance criteria values, MAE = 0.359, RSME = 0.486, RAE = 45.11%, RRSE = 39.71%, r = 0.917 during the testing period. As noted from Table 7(B), the SMO-SVM performance was observed to be the best with PUK kernel functions for SPI-12 and having the best performance indicator values (MAE = 0.191, RSME = 0.338, RAE = 16.64%, RRSE = 18.71%, r = 0.987) during the testing span for projection of drought condition.

Table 7 MAE, RMSE, RAE, RRSE and R2 for Machine learning algorithms-based models during the training and testing span at Silload station

The temporal disparity between the projected and calculated values for multiscale SPI by the SMO-SVM with four kernel functions i.e. Poly kernel, normalized poly kernel, PUK kernel and RBF kernel models during the testing span at Silload station are presented in Figs. 7a–c. As seen from these Figures, the line of regression of SMO-SVM poly kernel was quite near the best-fit line and the highest coefficient of determination (R2) as 0.736 and 0.841 for SPI-3 and SPI-6 respectively. The SMO-SVM PUK kernel showed the regression line close to the best-fit line for SPI-12 and gave the value of R2 of 0.975.

Fig. 7
figure 7

Taylor diagrams of SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK kernel and SMO-SVM RBF kernel during testing span at Paithan station for a SPI-3, b SPI-6, and c SPI-12

Accordingly, the spatial design of projected and calculated values of multi-scalar SPI for the SMO-SVM with four kernel functions i.e. Poly kernel, normalized poly kernel, PUK kernel and RBF kernel models were measured by utilizing the Taylor diagram (TD) as a polar graph for accomplishing a graphical judgment of model performance based on standard deviation, correlation coefficient, and RMSE. Figures 8a–c shows the TD of applied models for the testing span. Consequently, Fig. 8a–b shows that the SMO-SVM with the chosen poly kernel can be utilized for SPI projection at 3 and 6-month time spans and Fig. 8c displays that the SMO-SVM with PUK kernel for the 12-months.

Fig. 8
figure 8

Taylor diagrams of SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK kernel and SMO-SVM RBF kernel during testing span at Silload station for a SPI-3, b SPI-6, and c SPI-12

4 Discussion

The performance of the SMO-SVM algorithm and four various kernel functions was assessed of both stations, (Paithasn and Silload) for the multiscale prediction of SPI (i.e. 3, 6 and 12 month). The results have been obtained highlighted the suitability of machine learning algorithms for the prediction of multiscale SPI, especially. Figures 5, 6 and 7 represented the temporal variation between predicted and calculated SPI values and their scatter plots for SPI-3, SPI-6, and SPI-12 months. In scatter graphs, the regression line has given the high value of the coefficient of determination (R2) regarding the SMO-SVM with poly kernel and PUK kernel at both stations for SPI-3, SPI-6, and SPI-12 months. As seen from the Figures, the meta-heuristics algorithms performed better in a longer time scale. The Taylor Diagrams (Figs. 6, 7 and 8) showed a more comparable depiction of model performance in predicting monthly SPI values. The SMO-SVM RBF kernel model was located furthest, and the SMO-SVM poly kernel model was located nearest to the observed point based on the standard deviation, correlation, and RMSE for SPI-3, 12 at Paithan and SPI-3, 6 at Silload stations. This showed that the SMO-SVM poly kernel has higher accuracy in predicting monthly SPI than other kernel-based predictions.

The findings from this study are also associated with the recent work shown in a different location in the world (Deo et al. 2017; Rahmati et al. 2020; Malik et al. 2021; Elbeltagi et al. 2022b). Deo et al. (2017) applied machine learning models for predicting SPI values for eastern Australia. The result showed that M5Tree models were the best model and this model is a lower value of RSME and MAE and a high R2 value. A similar attempt was made by Rahmati et al. (2020) and compared the machine learning algorithms random forests (RF), multivariate adaptive regression splines (MARS), flexible discriminant analysis (FDA), boosted regression trees (BRT), and support vector machines (SVM) for spatial modelling of agricultural droughts in the south-east region of Australia. They found RF models yielded the highest accuracy than the other models. Malik et al. (2021) predicted SPI using Multi-layers Perceptron Neural Network (MLPNN), Co-Active Neuro-Fuzzy Inference System (CANFIS), and Multiple Linear Regression (MLR), in the Garhwal, Uttarakhand, India. They found that CANFIS model has higher accuracy in the prediction of SPI-3, 6 and 12 in Dehradun; SPI-1, 3, 6, 9 and 12 months in Chamoli and Tehri Garhwal; SPI-1, 3, 6, and 9 months in Haridwar and Pauri Garhwal; SPI-1, 3, and 6 months Rudraprayag and SPI-3 in Uttarkashi station, respectively. Özger et al. (2020) modelled the self-calibrated Palmer Drought Severity Index (SC-PDSI) values through the application of machine learning techniques (i.e., ANFIS, SVM, and M5 models) and their hybrid with empirical mode decomposition (i.e., EMD-ANFIS, EMD-SVM, EMD-M5) and wavelet decomposition (i.e., WD-ANFIS, WD-SVM, WD-M5). They reported that WD hybrid models such as WD-ANFIS, WD-SVM, WD-M5 have better performance than the EMD hybrid models. Aiyelokun et al. (2020) assessed the presentation of Support Vector Machine (SVM) with various kernel classes with ANN depending on the various neural network structures used for the forecasting of multiscale SPI values at Minna, North Central Nigeria. The SVM with Radial Basis kernel outperformed other SVM types with R2 (0.64) and ANN with two hidden layers gave the best result. The present study also unravelled that of all the kernel types considered for the SMO-SVM algorithm, the algorithm with poly kernel and PUK kernel has high accuracy than the Normalized and RBF kernels in the prediction of monthly SPI at both the stations. The study further revealed that the performance of Poly kernel and PUK kernel based-SMO-SVM were close to each other in the prediction of long time droughts (i.e., SPI-6 and SPI-12). Low precipitation, significant oscillations in average rainfall, and climate change, particularly as a result of regional and global warming, all have an impact on the occurrence of drought. Therefore, it is imperative to create drought management policies and successfully carry them out with the help and support of public and private entities. With the use of three months' worth of SPI data, the findings of the current study can be used to understand the water availability for the entire year in advance, which can then be used to develop an effective climate-smart agriculture strategy with ramifications for the world economy.

5 Conclusion

Assessment of extreme conditions in the semi-arid region plays an important role in water resource management. One of the most expensive risks to environments worldwide is drought, particularly in areas with complex climate patterns. Drought is a spatial–temporal phenomenon that happens when precipitation levels fall below average for a given time period and become dangerously low. The study was evaluated the improved support vector machine based on sequential minimal optimization (SVM-SMO) under various kernel functions (i.e., SMO-SVM poly kernel, SMO-SVM Normalized poly kernel, SMO-SVM PUK kernel and SMO-SVM RBF kernel) in predicting the SPI-3,6, and 12 months at the upper Godavari river basin, India. Data with extra variability have been selected and used for training and testing model performance. Ten input combinations were used and the best combinations were selected using the method of subset regression and sensitivity analysis. The performance was assessed using statistical criteria and visual graphics. The results revealed that at Paithan station, the SVM-SMO poly kernel model performed well in capturing SPI-3 (RSME = 0.493, and r = 0.905) and SPI-12 (RSME = 0.361, and r = 0.984), while predicting SPI-6, the SMO-SVM at PUK kernel performed superior (RSME = 0.555, and r = 0.920). Similarly, at the Silload station, the SVM-SMO with poly kernel function showed the best result for SPI-3 (r = 0.858) and SPI-6 (r = 0.917) and SVM-SMO algorithm with PUK kernel precisely predicted SPI-12 with a lower value of RMSE (0.338) and high value of correlation coefficient (0.987). This showed the efficacy of applied algorithms in predicting the SPI drought index. Based on the result obtained, it can be concluded that the SVM-SMO algorithm with poly and PUK kernels showed competitive results in the prediction of SPI- 6 and SPI-12. While predicting SPI-3, the SVM-SMO algorithm with a poly kernel was on the top during the testing period at both stations. It has been observed that the performance of the applied algorithms with various kernel functions was increased when predicting the long-term drought conditions. Finally, the poly and PUK kernels function significantly enhanced the performance of SMO-SVM algorithms by improving the correlation coefficient and decreasing the MAE and RMSE values in the study stations. This study provides simple information and will help policymakers or resource managers in the preparation of droughts mitigation strategies. However, it becomes more challenging due to uncertainties, heterogeneity, linked datasets, methodologies, scenarios, models with a lot of algorithmic parameters, search space, and optimization process. Thus, future study should concentrate on reducing these uncertainties and enhancing optimization performance. Additionally, because just two stations were chosen for this investigation, it is possible to evaluate the applicability of the suggested SMO-SVM model at many sites with variable agro-climatic conditions under various sceneros in order to reach a more generalized conclusion.