1 Introduction

Energy shortage is a real challenge for many countries. The development and future of each country rely on the control of energy. As fossil fuel consumption increases, energy reserves decrease and environmental pollution increases (Zhang et al. 2020). Today, renewable and clean energies such as wind and solar are excellent alternatives to fossil fuels for energy production (Samadianfard et al. 2020). Renewable energy has several advantages: it reduces some types of air pollution, and it creates economic development and manufacturing jobs (Liu et al. 2020).

A unique method of desalination using solar energy has been proposed, and its potential effectiveness has been assessed through a proven model of humidification–dehumidification (Abedi et al. 2023a). It has been demonstrated that if a turbine is used to generate electricity for the desalination system, the plant can supply freshwater to approximately 800 homes. Numerous machine learning regression approaches were utilized to build a surrogate model based on data from the dehumidifier component (Abedi et al. 2023b).

Wind energy, a principal renewable energy, is cost-effective and creates jobs. It supports the development of industries and economies and can be used to generate power without environmental pollution (Kumar 2020). Wind power generation, which converts wind energy to electric energy, is an important technology for producing power and developing the industries and economies of different countries (Xu et al. 2021). Wind power generation can enhance supplies and facilitate the reclamation of degraded land. To better manage wind power generation, research works have focused on predicting wind power and wind speed (W.S.). W.S. prediction is complex because of the chaotic fluctuations of W.S. There are different methods for predicting W.S., such as physical models, soft computing models (SCMs), and spatial correlation models (Liu et al. 2012). Physical models require geographic and geomorphic data, and spatial correlation models require data from several measurement stations. Modelers may encounter challenges in accessing these types of data, including climate, geographic, and geomorphic data. To address this issue, some researchers have turned to SCMs to predict various variables. SCMs offer several advantages, such as high accuracy, the ability to handle complex systems, flexibility in coupling with different models and algorithms, and ease of implementation (Ehteram et al. 2021). Notably, researchers have also explored the potential of SCMs for W.S. prediction. For example, one study demonstrated that combining a multi-layer perceptron (MLP) model with self-organizing feature maps enhanced the model's accuracy (Gnana Sheela and Deepa 2013); the coupled model outperformed the standalone MLP model for W.S. prediction. Similarly, coupling particle swarm optimization (PSO) with the support vector machine (SVM) produced greater accuracy than the standalone SVM model for predicting W.S. (Kong et al. 2015). Additionally, an artificial neural network (ANN) has been shown to predict W.S. accurately, with a mean absolute percentage error of 6.48%, using altitude, solar radiation, air pressure, and air temperature (Ramasamy et al. 2015).

In Kumar and Malik (2016), researchers explored the effectiveness of both the MLP and Generalized Regression Neural Network (GRNN) for predicting wind speed (W.S.). The findings showed that the GRNN outperformed the MLP. Meanwhile, Zhang et al. (2016) conducted a study to evaluate the potential of Gaussian Process Regression (GPR) for W.S. prediction. The results demonstrated that the GPR is more accurate than ANN and SVM techniques.

In Ahmed et al. (2016), the researchers combined the adaptive neuro-fuzzy inference system (ANFIS) with the krill optimization algorithm to predict wind speed (W.S.). They utilized the krill algorithm to optimize the parameters of the ANFIS model and demonstrated that the combined ANFIS-krill approach was more accurate than the standalone ANFIS model. In another study, Liu et al. (2018) applied a Convolutional Long Short-Term Memory (CLSTM) network to predict W.S. They showed that the CLSTM performed better than the convolutional neural network.

Researchers in Yu et al. (2018) examined the capacity of a hybrid SCM for predicting wind speed (W.S.). The method involved decomposing the original wind speed history using wavelet transform and using a Recurrent Neural Network (RNN) to extract the more profound features of the data, which were then fed into the SVM model. The results showed that the hybrid model had significant accuracy in W.S. prediction. Similarly, Samadianfard et al. (2020) applied MLP with genetic and whale optimization algorithms to boost the performance of predicting W.S. The optimized MLP models were tested across different climatic regions of Iran, where it was observed that the MLP-Whale Optimization Algorithm (WOA) hybrid model outperformed the standalone MLP model.

Research in Navas et al. (2020) focused on comparing the predictive accuracy of different models, including MLP, Radial Basis Function Neural Network (RBFNN), and Categorical Regression, for predicting wind speed (W.S.). The study revealed that the MLP was more accurate than the other models. Sun et al. (2020), in turn, examined the performance of a coupled Multi-Kernel Least Square SVM (MLKSSVM) and Gravitational Search Algorithm (GSA) for W.S. prediction. The GSA was utilized to optimize the parameters of the MLKSSVM. The results demonstrated that the optimized MLKSSVM model increased the accuracy of W.S. prediction.

While various SVMs have been demonstrated to have a high capacity for predicting wind speed (W.S.), challenges still exist. One major challenge is that the SVMs’ structure contains parameters that need to be accurately identified to ensure model accuracy. To address this issue, robust training algorithms are necessary to obtain precise parameter values. Another issue is that most previous studies have focused on comparing the models’ performance without exploring how different models could be integrated to achieve improved accuracy. Finally, it is crucial for the ideal model to predict the target variable within a short computational time.

This study aims to tackle the previously mentioned issues through various efforts, including:

  1. The use of a water strider algorithm (WSA) to train multilayer perceptron (MLP) models capable of predicting daily wind speed (W.S.) values at five different locations in Malaysia. In their introduction of the WSA, Kaveh and Dadras (2020) explained that it was inspired by the behavior of water striders, a type of insect known for its remarkable ability to walk on the surface of water. According to their findings, this algorithm is highly accurate in solving complex problems and also possesses an ideal balance between exploration and exploitation while rapidly converging toward optimal solutions. Due to these benefits, the current study employs the WSA.

  2. If modelers have access to a variety of climate data, predicting W.S. can be relatively straightforward. However, in some cases, particularly in developing countries, data on climate patterns may be limited to just time series data on W.S. In this scenario, modelers must rely on lagged wind speed data to predict wind speed at the current time. This presents a unique challenge as the goal is to establish a highly effective model for predicting wind speed using only limited input data. To tackle this issue, the present study employs an MLP-WSA algorithm that utilizes lagged W.S. data to accurately predict daily wind speed values.

  3. A new hybrid Gamma test is introduced as a novel approach to select the optimal input combination when utilizing lagged W.S. data. The WSA is combined with the Gamma test to find the most suitable set of inputs for the MLP model, thereby improving the overall accuracy of predictions.

  4. To comprehensively evaluate the predictive capabilities of the MLP-WSA, it is compared with several other variants of MLP such as MLP-Sine Cosine Algorithm (MLP-SCA), MLP-Salp Swarm Algorithm (MLP-SSA), MLP-PSO, and traditional MLP. An integrated multi-model approach is employed to leverage the strengths of each individual model and enhance the accuracy of predictions.

  5. To increase the efficiency of the MLP variants developed in this study, an approach is adopted to identify and remove redundant weights that do not significantly impact predictions. This helps to reduce computation time and improve overall model performance. A fuzzy reasoning concept is applied to identify and eliminate the unnecessary weights from the MLP models.

This study proposes four key innovations:

  1. Developing a new hybrid MLP-WSA model for predicting daily wind speed (W.S.).

  2. Creating a novel hybrid Gamma test to select the optimal input variables.

  3. Presenting a comprehensive multi-model approach for enhancing the accuracy of predictions by integrating various models.

  4. Using the fuzzy reasoning concept to reduce the computational time of the Multi-Layer Perceptron (MLP) models.

Section 2 discusses the materials and methods. In Sect. 3, a case study is presented along with its relevant details. In Sect. 4, we present the results of this study. Finally, in Sect. 5, we draw conclusions based on the findings.

2 Materials and methods

2.1 Multilayer perceptron (MLP)

MLP is a significant type of ANN. The basic units of computation in the MLP are neurons, which connect to the next layer via weighted connections (Muslim et al. 2020b). Incoming data are received by the first layer and processed using activation functions in the hidden layers, and the final layer produces the overall result according to the following equation:

$${\text{Out}}_{k}= {f}_{\text{out}}\left[\sum_{j=1}^{N}{w}_{kj} \times {f}_{h}\left[\sum_{i=1}^{n}{w}_{ji} {\text{in}}_{i }+ {B}_{j}\right]+{B}_{k}\right],$$
(1)

where i: index of inputs, j: index of hidden nodes, k: index of outputs, \({w}_{ji}\): the weight connecting the input layer to the hidden layer, \({w}_{kj}\): the weight connecting the hidden layer to the output layer, \({B}_{k}\): the bias of the output layer, \({B}_{j}\): the bias of the hidden layer, \({\text{in}}_{i}\): the inputs, N: number of hidden nodes, n: number of inputs, \(f_h\): activation function of the hidden layer, and \({f}_{\text{out}}\): activation function of the final layer. Given the success of the sigmoid function (SIG) in prior research studies (Banadkooki et al. 2020; Ehteram et al. 2020; Najah Ahmed et al. 2019), it was chosen as the activation function:

$$ f\left( {{\text{SIG}}} \right) = \frac{1}{{1 + {\text{e}}^{ - {\text{SIG}}} }}. $$
(2)

The MLP models in this study are trained through backpropagation, in which the received signals (SIG) are passed through the activation function. Initially, weights and biases are randomly assigned, and the signals are fed into the first layer to generate output values. Subsequently, the error function is calculated to assess the difference between actual and predicted values. In the backward pass, the weights and biases are updated to decrease the error function. While the backpropagation algorithm is effective for this process, it may converge too slowly or become stuck in local optima. Therefore, optimization algorithms are used to improve the performance of the MLP. Figure 1 shows the structure of the MLP model.

Fig. 1 The structure of the MLP model

The hyperparameters of the MLP are as follows:

Batch size is 16.

Hidden layers’ number is 1.

Hidden layer’s nodes are 32.

Activation function in hidden layers is Sigmoid.

Activation function in output layer is Linear.

Optimizer is Stochastic Gradient Descent (SGD).

Loss function is Mean Squared Error (MSE).
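As an illustration of Eqs. (1) and (2), the forward pass of a single-hidden-layer MLP can be sketched as follows. This is a minimal sketch rather than the implementation used in the study: the function names, the random initialization range, and the five-input example are assumptions made for illustration, while the 32 hidden nodes, the sigmoid hidden activation, and the linear output follow the hyperparameters listed above.

```python
import math
import random


def sigmoid(z):
    """Eq. (2): logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Eq. (1) with a sigmoid hidden layer and a linear output layer.

    w_hidden[j][i] links input i to hidden node j;
    w_out[k][j] links hidden node j to output k.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Illustrative setup (assumed values): 5 lagged inputs, 32 hidden nodes, 1 output.
rng = random.Random(0)
n_in, n_hidden = 5, 32
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
b_out = [0.0]
prediction = mlp_forward([2.1, 1.8, 2.4, 2.0, 1.9], w_hidden, b_hidden, w_out, b_out)
```

With untrained random weights the prediction is meaningless; the sketch only shows how the weights and biases enter Eq. (1).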

2.2 Water strider optimization algorithm (WSA)

Water striders are insects that live on the surface of water. They claim ownership of specific areas known as territories, which they protect to ensure access to food and potential mates (Kaveh and Dadras 2020). Water striders communicate socially through ripples, which they can produce with different amplitudes. The ripples serve different purposes, such as sex discrimination and prey location. The females seek food, while the males seek mates (Kaveh and Dadras 2020). When a female receives signals from a male, she responds with attraction or repulsion signals. Males skate in the females' areas, because the females seek the best locations for finding food. In the first stage, the initial location of each water strider is identified as follows:

$$ X_i^o = {\text{LB}} + {\text{rand}}\left( {{\text{UB}} - {\text{LB}}} \right), $$
(3)

where \(X_i^o\): the initial location of the water striders, UB: the upper bound, and LB: the lower bound. In the next stage, the water striders create territories. At this stage, the objective function is computed for each water strider, and the water striders are sorted by their objective function values. The water striders are divided into \(\frac{\text{number of water striders}}{\text{number of territories}}\) groups. In the next stage, the mating behavior is modeled. If p is the probability that a female responds positively to a male for mating, (1 − p) is the probability that she ignores him. If the female is not willing to mate, she drives the male away. The water striders update their locations after mating as follows:

$$ X_i^{t + 1} = \begin{cases} X_i^t + R \cdot {\text{rand}}, & \text{if mating happens} \\ X_i^t + R \cdot \left( {1 + {\text{rand}}} \right), & \text{otherwise} \end{cases} $$
(4)

where \(X_i^{t + 1}\): the new location of the water strider, \(R\): the radius of the ripple wave, and rand: a random value between 0 and 1. The radius is computed as

$$ R = X_F^{t - 1} - X_i^{t - 1} , $$
(5)

where \(X_F^{t - 1}\): the female water strider location and \(X_i^{t - 1}\): the male water strider location. When a water strider updates its position, the objective function is calculated for its new location. If the new location does not have a better objective function value than the previous location, the water strider moves toward the best location for finding food as follows:

$$ X_i^{t + 1} = X_i^t + 2.{\text{rand}}\left( {X_{{\text{best}}} - X_i^t } \right), $$
(6)

where \(X_{{\text{best}}}\): the best location for the water strider. If the water strider in the new location still does not have a better objective function value than in the previous location, it dies and a larva is generated. The location of the larva is as follows:

$$ X_i^{t + 1} = {\text{LB}}_j^t + 2{\text{rand}}\left( {{\text{UB}}_j^t - {\text{LB}}_j^t } \right), $$
(7)

where \({\text{LB}}_j^t\): the lower values of the water striders' positions inside the jth territory and \({\text{UB}}_j^t\): the upper values of the water striders' positions inside the jth territory. Figure 2 shows the WSA flowchart.

Fig. 2 The WSA flowchart for optimization problem
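The position updates of Eqs. (3) to (7) can be sketched as small helper functions. This is a hedged illustration, not the authors' code: the function names are hypothetical, a separate random number is drawn for each dimension (an assumption), and the initial position is assumed to be drawn between the lower and upper bounds. The larva position follows Eq. (7) as written, including its factor of 2.

```python
import random


def init_position(lb, ub, rand=random.random):
    """Eq. (3): initial location, assumed to lie between the bounds."""
    return [l + rand() * (u - l) for l, u in zip(lb, ub)]


def mating_update(x_male, x_female, mating_happens, rand=random.random):
    """Eqs. (4) and (5): move the male along the ripple radius R."""
    new = []
    for xm, xf in zip(x_male, x_female):
        ripple = xf - xm  # Eq. (5): R is the female minus the male location
        step = rand() if mating_happens else 1.0 + rand()
        new.append(xm + ripple * step)
    return new


def foraging_update(x, x_best, rand=random.random):
    """Eq. (6): move toward the best known location to find food."""
    return [xi + 2.0 * rand() * (xb - xi) for xi, xb in zip(x, x_best)]


def larva_position(lb_j, ub_j, rand=random.random):
    """Eq. (7) as written: a larva placed relative to the territory bounds."""
    return [l + 2.0 * rand() * (u - l) for l, u in zip(lb_j, ub_j)]
```

Passing `rand` as an argument keeps the helpers deterministic in checks, for example `mating_update([0.0], [2.0], True, rand=lambda: 0.5)`.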

2.3 Salp swarm algorithm (SSA)

This algorithm is utilized for various tasks such as feature selection (Tubishat et al. 2021), engineering optimization problems (Salgotra et al. 2021), training SVM (Li et al. 2020), and training ANFIS (Mohamadi et al. 2020). Salps live in groups. Each group has a leader and followers, and the leader guides the followers. The leader updates its location as follows:

$$ {\text{sa}}_j^1 = \begin{cases} {\text{food}}_j + \sigma_1 \left( {\left( {{\text{up}}_j - {\text{lo}}_j } \right)\sigma_2 + {\text{lo}}_j } \right), & \sigma_3 \ge 0 \\ {\text{food}}_j - \sigma_1 \left( {\left( {{\text{up}}_j - {\text{lo}}_j } \right)\sigma_2 + {\text{lo}}_j } \right), & \sigma_3 < 0 \end{cases} $$
(8)

where \({\text{sa}}_j^1\): the location of leader, \(\sigma_1\), \(\sigma_2\), and \(\sigma_3\): random numbers, \({\text{food}}_j\): the location of food source, \({\text{up}}_j\): the upper bound, and \({\text{lo}}_j\): the lower bound. A balance is provided between the exploration and exploitation as follows:

$$ \sigma_1 = 2{\text{e}}^{ - \left( \frac{4l}{L} \right)^2 } , $$
(9)

where l: the current iteration and L: the maximum number of iterations. In each iteration, a follower changes its location as follows:

$$ {\text{sa}}_j^i = \frac{1}{2}\left( {{\text{sa}}_j^i + {\text{sa}}_j^{i - 1} } \right), $$
(10)

where \({\text{sa}}_j^i\): the location of the ith follower in the jth dimension. Figure 3 shows the SSA flowchart for optimization problems.

Fig. 3 The SSA flowchart for optimization problem
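A minimal sketch of the leader and follower updates in Eqs. (8) to (10) is given below. The function names are hypothetical. The `threshold` parameter defaults to 0, following Eq. (8) as written; with \(\sigma_3\) drawn from [0, 1] this always selects the first branch, so a threshold of 0.5 may be preferable in practice (an assumption, not a statement from the source).

```python
import math
import random


def sigma1(l, L):
    """Eq. (9): coefficient balancing exploration and exploitation."""
    return 2.0 * math.exp(-((4.0 * l / L) ** 2))


def leader_update(food, lo, up, l, L, threshold=0.0, rand=random.random):
    """Eq. (8): the leader moves around the food source."""
    s1 = sigma1(l, L)
    new = []
    for f, lo_j, up_j in zip(food, lo, up):
        s2, s3 = rand(), rand()
        step = s1 * ((up_j - lo_j) * s2 + lo_j)
        new.append(f + step if s3 >= threshold else f - step)
    return new


def follower_update(current, previous_salp):
    """Eq. (10): a follower moves to the midpoint with the salp ahead of it."""
    return [0.5 * (c + p) for c, p in zip(current, previous_salp)]
```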

2.4 Sine cosine algorithm (SCA)

The SCA is based on the sine and cosine functions and has been utilized for different optimization problems such as biomedical signal reconstruction (Daoui et al. 2021), engineering applications (Dhiman 2021), optimal multi-robot path planning (Paikray et al. 2021), feature selection (Neggaz et al. 2020), and image segmentation (Ewees et al. 2020). First, random solutions are created. Then, the new position of each solution is found based on its current location and the destination point:

$$ {\text{so}}_i^{t + 1} = \begin{cases} {\text{so}}_i^t + r_1 \times \sin \left( {r_2 } \right) \times \left| {r_3 {\text{de}}_i^t - {\text{so}}_i^t } \right|, & r_4 \le 0.50 \\ {\text{so}}_i^t + r_1 \times \cos \left( {r_2 } \right) \times \left| {r_3 {\text{de}}_i^t - {\text{so}}_i^t } \right|, & r_4 > 0.50 \end{cases} $$
(11)

where \({\text{so}}_i^{t + 1}\): the new location of the ith solution at iteration t + 1, \({\text{de}}_i^t\): the location of the destination point, and r1, r2, r3, and r4: random numbers. The r1 parameter is responsible for the transition from exploration to exploitation:

$$ r_1 = 2 - 2 \times \left( \frac{t}{T} \right), $$
(12)

where t: current iteration and T: total iterations. Figure 4 shows the SCA flowchart.

Fig. 4 The SCA flowchart for optimization problem
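The update of Eqs. (11) and (12) can be sketched as follows. The function names are hypothetical, and the sampling ranges of r2 (0 to 2π) and r3 (0 to 2) are assumptions, since the text only states that they are random numbers. The sine branch is used when r4 ≤ 0.5 and the cosine branch otherwise.

```python
import math
import random


def r1_schedule(t, T):
    """Eq. (12): r1 decreases linearly from 2 to 0 over the iterations."""
    return 2.0 - 2.0 * (t / T)


def sca_update(so, de, t, T, rand=random.random):
    """Eq. (11): move a solution relative to the destination point."""
    r1 = r1_schedule(t, T)
    new = []
    for s, d in zip(so, de):
        r2 = 2.0 * math.pi * rand()  # assumed range [0, 2*pi]
        r3 = 2.0 * rand()            # assumed range [0, 2]
        r4 = rand()
        trig = math.sin(r2) if r4 <= 0.5 else math.cos(r2)
        new.append(s + r1 * trig * abs(r3 * d - s))
    return new
```

At the final iteration (t = T) the coefficient r1 is zero, so the solutions stop moving, which reflects the shift from exploration to exploitation.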

2.5 Particle swarm optimization (PSO)

The PSO operates on the principle of sharing information among particles, which makes it a straightforward approach with numerous benefits, such as easy implementation, computational efficiency, and simplicity of concept. Due to its effectiveness, PSO has been utilized in various problem-solving contexts, including but not limited to environmental economic dispatch (Xin-gang et al. 2020), ANN training (Darwish et al. 2020), sports image detection (Lei et al. 2021), and particle filter noise reduction (Chen et al. 2020). Initially, the starting positions of the particles and the random parameters of PSO are defined. The objective function is then computed for each particle, and the location and velocity of the particles are updated according to the following equations:

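$$ {\text{po}}_{ij}^{t + 1} = {\text{po}}_{ij}^t + {\text{ve}}_{ij}^{t + 1} $$
(13)
$$ {\text{ve}}_{ij}^{t + 1} = w\,{\text{ve}}_{ij}^t + \theta_1 r_1 \left( {{\text{po}}_{ij}^{p\left( t \right)} - {\text{po}}_{ij}^t } \right) + \theta_2 r_2 \left( {{\text{po}}_{j}^{g\left( t \right)} - {\text{po}}_{ij}^t } \right), $$
(14)

where \({\text{po}}_{ij}^t\) and \({\text{ve}}_{ij}^t\): the position and velocity of the ith particle in the jth dimension at iteration t, w: the inertia weight, \(\theta_1\) and \(\theta_2\): the acceleration coefficients, \(r_1\) and \(r_2\): random numbers, \({\text{po}}_{ij}^{p\left( t \right)}\): the best position found by the ith particle, and \({\text{po}}_{j}^{g\left( t \right)}\): the best position found by the swarm.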
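One PSO step can be sketched as below. This is a minimal sketch of the standard PSO update, assuming that the first attraction term points to the particle's personal best and the second to the swarm's global best; the function name and the default values of `w`, `theta1`, and `theta2` are illustrative assumptions, not values reported in the study.

```python
import random


def pso_step(po, ve, p_best, g_best, w=0.7, theta1=1.5, theta2=1.5,
             rand=random.random):
    """Update velocity (Eq. 14) and then position (Eq. 13) of one particle."""
    new_ve = [
        w * v + theta1 * rand() * (pb - x) + theta2 * rand() * (gb - x)
        for x, v, pb, gb in zip(po, ve, p_best, g_best)
    ]
    new_po = [x + v for x, v in zip(po, new_ve)]
    return new_po, new_ve
```

For example, `pso_step([0.0], [1.0], [2.0], [4.0], w=0.5, theta1=1.0, theta2=1.0, rand=lambda: 0.5)` gives a velocity of 0.5 + 1.0 + 2.0 = 3.5 and therefore a new position of 3.5.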

2.6 Inclusive multiple model

The hybrid models of the current study are considered as competitive models. Previous research works utilized different models for predicting W.S. and identified the best and worst models. If modelers create synergy among multiple models, the final outputs draw on the advantages of each model, and all of the information required for predicting W.S. can be extracted through the contribution of all models. In this study, W.S. is first predicted with the different hybrid MLP models and the standalone MLP. Then, the outputs of each MLP model, as the lower-order modeling results, are used as the inputs to an ANN acting as the inclusive multiple model (IMM). The IMM has been applied successfully in previous studies for predicting groundwater level and CO2 emission (Shabani et al. 2021; Khatibi et al. 2017). Figure 5a shows the IMM structure.

Fig. 5 a The structure of the IMM, b Fuzzy reasoning conception, c The location of case study
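The data flow of the IMM, in which the outputs of the lower-order models become the inputs of a second-level model, can be sketched as follows. Note the substitution: for brevity, the second-level model here is a linear least-squares combiner, not the ANN used in the study. All function names and the toy numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_combiner(base_predictions, observed):
    """Fit one weight per base model plus a bias (linear stand-in for the ANN)."""
    rows = [list(p) + [1.0] for p in zip(*base_predictions)]  # one row per sample
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, observed)) for i in range(k)]
    return solve_linear(ata, aty)


def combine(coeffs, base_predictions):
    """Apply the fitted weights and bias to the base-model outputs."""
    return [
        sum(c * p for c, p in zip(coeffs, sample)) + coeffs[-1]
        for sample in zip(*base_predictions)
    ]


# Toy example: two base models whose errors have opposite signs.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model_a = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
model_b = [0.5, 2.5, 2.5, 4.5, 4.5, 6.5]
coeffs = fit_combiner([model_a, model_b], observed)
combined = combine(coeffs, [model_a, model_b])
```

In this toy case the combiner learns to average the two models, so the combined output matches the observations more closely than either base model alone.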

2.7 Fuzzy reasoning

The weak weights in the standalone and hybrid MLP structures are considered redundant weights. To identify these weights, three rules are used, as shown in Fig. 5b. At the start of the simulation process, the values of the weights are small. Thus, the learning cycle rule is proposed to avoid removing these weights prematurely. If this rule is satisfied, the second rule, based on the RMSE, is applied: if the RMSE does not decrease, the weights are considered redundant. The third rule, which involves the weight values, is employed because the first and second rules are less effective in dealing with complex data and high levels of noise. As a result, increasing the number of weak weights can lead to redundancy, which necessitates their removal. For the first, second, and third rules, monotonically increasing, decreasing, and decreasing sigmoid functions, respectively, serve as membership functions. The minimum of these membership function values is selected and multiplied by the weight being removed.

3 Case study

In this study, five meteorological stations in the upper part of Peninsular Malaysia were chosen for predicting W.S.: Cameron Highlands (CH) (4° 28′ N, 101° 22′ E), Alor Setar (AS) (6° 12′ N, 100° 24′ E), Kota Bharu (KB) (6° 10′ N, 102° 18′ E), Bayan Lepas (BL) (5° 18′ N, 100° 16′ E), and Ipoh (IP) (4° 34′ N, 101° 06′ E). Figure 5c shows the location of the stations. Peninsular Malaysia has a typical tropical climate: it is warm and humid throughout the year, with relatively lower wind speed in its upper part (Hwang et al. 2019). Except for the Cameron Highlands station, the stations are located near or inside the airports of their respective areas and therefore represent the wind speed conditions in the lowlands of the upper part of Peninsular Malaysia. Figure 6 shows the W.S. time series. The AS station has a tropical monsoon climate according to the Köppen climate classification, with average high and low temperatures of 32 °C and 23 °C, respectively. The climate of BL is tropical, with an average temperature of 26 °C and an annual rainfall of 2552 mm. CH has a tropical rainforest climate, and its mean annual temperature is 18 °C. IP also has a tropical rainforest climate; its average temperature is 28 °C, and its wettest and driest months are October and January, respectively. The KB station has a tropical monsoon climate and experiences heavier rainfall from August through January.

Fig. 6 The W.S. time series for a Alor Setar, b Bayan Lepas, c Cameron Highlands, d Ipoh, and e Kota Bharu

3.1 Input sensitivity with Gamma test

As observed in Table 1, \(2^{11} - 1\) input combinations can be formed from the lagged input values for predicting W.S. Thus, it is necessary to choose the best input combination based on the lagged W.S. values. The Gamma test is a powerful preprocessing method for choosing the best input combination. It has been utilized in different domains such as predicting evaporation (Allawi et al. 2019), predicting groundwater level (Sharafati et al. 2020), estimating evapotranspiration (El-Shafie et al. 2013), estimating solar radiation (Jumin et al. 2021), and predicting streamflow (AlDahoul et al. 2023). In the Gamma test, the relationship between the inputs and outputs is as follows:

$$ y = f\left( x \right) + r, $$
(15)

where x: input, y: output, f(x): smooth function, and r: the error term. The \(\Gamma\) in the Gamma test describes the variance of observations. The Gamma test acts on the kth closest neighbor of the ith input (N[i,k], 1 ≤ k ≤ p, p: the maximum number of neighbors). To compute \(\Gamma\), the value of \(\xi_M \left( k \right)\) should first be computed by

$$ \xi_M \left( k \right) = \frac{1}{M}\sum_{i = 1}^M {\left| {x_{N\left[ {i,k} \right]} - x_i } \right|}^2 , $$
(16)

where M: number of observations. In the next step, the value of \(\gamma_M \left( k \right)\) is computed by

$$ \gamma_M \left( k \right) = \frac{1}{2M}\sum_{i = 1}^M {\left| {y_{N\left[ {i,k} \right]} - y_i } \right|}^2 , $$
(17)

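where \(y_{N\left[ {i,k} \right]}\): the output value corresponding to the kth nearest neighbor of \(x_i\). Finally, \(\Gamma\) is obtained as the intercept of the regression line fitted through the p points \(\left( {\xi_M \left( k \right),\gamma_M \left( k \right)} \right)\), where A is the slope of the line: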

$$ \gamma = A\xi + \Gamma . $$
(18)
Table 1 The input and output data to the MLP models

Another index in the Gamma test is Vratio

$$ V_{{\text{ratio}}} = \frac{\Gamma }{\sigma \left( y \right)}, $$
(19)

where \(\sigma \left( y \right)\): the output variance. The lowest values of Vratio and \(\Gamma\) indicate the best input combination. However, it is difficult to compute \(\Gamma\) for all \(2^{11} - 1\) input combinations. To ease the selection of the best input combination, the WSA is coupled with the Gamma test. First, the names of the input variables are inserted as the initial population of the WSA. Then, random combinations of the inputs are generated from the initial population, so that each water strider represents a random combination of inputs. Next, \(\Gamma\) is computed for each member as the objective function. The operators of the WSA described in Sect. 2.2 are used to update the agents. The optimization process continues until \(\Gamma\) converges to its lowest value.
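The Gamma statistic that serves as the objective function can be sketched as follows. This is a minimal sketch, assuming the commonly used form of the Gamma test in which \(\gamma_M(k)\) is half the mean squared difference of the neighbor outputs; the function name, the brute-force neighbor search, and the default p = 10 are assumptions made for illustration. It requires more than p samples.

```python
def gamma_test(x, y, p=10):
    """Return (Gamma, A, V_ratio) following Eqs. (16)-(19)."""
    m = len(x)
    xi = [0.0] * p      # Eq. (16), one value per neighbor rank k
    gamma = [0.0] * p   # Eq. (17), one value per neighbor rank k
    for i in range(m):
        # Squared distances from sample i to all other samples, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x[i], x[j])), j)
            for j in range(m) if j != i
        )
        for k in range(p):
            d2, j = dists[k]
            xi[k] += d2 / m
            gamma[k] += (y[j] - y[i]) ** 2 / (2.0 * m)
    # Eq. (18): least-squares line gamma = A * xi + Gamma.
    mean_xi = sum(xi) / p
    mean_gamma = sum(gamma) / p
    sxx = sum((v - mean_xi) ** 2 for v in xi)
    sxy = sum((v - mean_xi) * (g - mean_gamma) for v, g in zip(xi, gamma))
    slope = sxy / sxx
    intercept = mean_gamma - slope * mean_xi
    # Eq. (19): ratio of Gamma to the output variance.
    mean_y = sum(y) / m
    var_y = sum((v - mean_y) ** 2 for v in y) / m
    return intercept, slope, intercept / var_y


# Noise-free toy data y = 2x: Gamma should be close to zero.
x_demo = [[0.1 * i] for i in range(50)]
y_demo = [2.0 * row[0] for row in x_demo]
gamma_value, slope_value, v_ratio = gamma_test(x_demo, y_demo)
```

In a coupled WSA and Gamma test, each agent would encode a subset of lagged inputs, and `gamma_test` would be evaluated on the columns selected by that agent.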

3.2 Hybrid MLP and optimization algorithms

In this research, the optimization algorithms are used to set the MLP parameters as follows:

  1. First, the data are split into 70% for training and 30% for testing. This split is used because it yields the lowest RMSE. In this study, the RMSE is considered as the objective function. The data were collected from January 2000 to September 2009.

  2. The weights and biases are initialized.

  3. The MLP runs for the training data.

  4. If the stop criterion is satisfied, the MLP is used for the testing stage; otherwise, it is hybridized with the optimization algorithm.

  5. The initial population of the algorithm is initialized based on random values of the weights and biases.

  6. The RMSE for each agent of the optimization algorithm is calculated as the objective function.

  7. The operators of the algorithm are utilized to change the values of the weights and biases.

  8. If the convergence criterion is met, move to step 3; otherwise, move to step 6.
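The data preparation and the objective function used in the steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the split is chronological (the source does not say how the 70/30 split is drawn), and the layout of an agent vector (hidden weights, hidden biases, output weights, output bias) is an assumption.

```python
import math


def make_lagged(series, n_lags):
    """Build inputs WS(t-1), ..., WS(t-n_lags) and the target WS(t)."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append([series[t - lag] for lag in range(1, n_lags + 1)])
        targets.append(series[t])
    return inputs, targets


def split_train_test(inputs, targets, train_fraction=0.7):
    """Chronological split into training and testing parts."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


def rmse_objective(agent, inputs, targets, n_hidden):
    """Decode a flat agent vector into MLP parameters and return the RMSE."""
    n_in = len(inputs[0])
    w_h = [agent[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    offset = n_hidden * n_in
    b_h = agent[offset:offset + n_hidden]
    w_o = agent[offset + n_hidden:offset + 2 * n_hidden]
    b_o = agent[offset + 2 * n_hidden]
    total = 0.0
    for x, y in zip(inputs, targets):
        hidden = [
            1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(w_h, b_h)
        ]
        pred = sum(w * h for w, h in zip(w_o, hidden)) + b_o
        total += (y - pred) ** 2
    return math.sqrt(total / len(targets))
```

Each agent of WSA, SSA, SCA, or PSO would then be a list of `n_hidden * n_in + 2 * n_hidden + 1` numbers, scored by `rmse_objective` on the training part.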

In this work, the indexes used to evaluate the models are as follows:

1. Nash Sutcliffe efficiency (Yafouz et al. 2021)

$$ {\text{NSE}} = 1 - \frac{{\sum_{i = 1}^N {\left( {{\text{WS}}_{{\text{ob}}} - {\text{WS}}_{{\text{es}}} } \right)^2 } }}{{\sum_{i = 1}^N {\left( {{\text{WS}}_{{\text{ob}}} - {\text{W}}{\overline{\text{S}}}_{{\text{ob}}} } \right)^2 } }}. $$
(20)

2. Root-mean-square error (RMSE) (Osman et al. 2021)

$$ {\text{RMSE}} = \sqrt {{\frac{1}{N}\sum_{i = 1}^N {\left( {{\text{WS}}_{{\text{ob}}} - {\text{WS}}_{{\text{es}}} } \right)^2 } }} . $$
(21)

3. Mean absolute error (MAE) (Abba et al. 2020)

$$ {\text{MAE}} = \frac{1}{N}\sum_{i = 1}^N {\left| {{\text{WS}}_{{\text{es}}} - {\text{WS}}_{{\text{ob}}} } \right|} . $$
(22)

4. Scatter index (S.I.) (Muslim et al. 2020a)

$$ {\text{SI}} = \frac{{{\text{RMSE}}}}{{{\text{W}}{\overline{\text{S}}}_{{\rm ob}} }} $$
(23)

(SI < 0.10: excellent performance, 0.10 < SI < 0.20: good performance, 0.20 < SI < 0.30: fair performance, SI > 0.30: poor performance).

5. Uncertainty with 95% confidence level (U95) (Jumin et al. 2020)

$$ U_{95} = \sqrt {{\left( {{\text{SD}}^2 + {\text{RMSE}}^2 } \right)}} , $$
(24)

where SD: the standard deviation of the difference between the observed and estimated values, WSes: estimated values, WSob: observed values, \({\text{W}}{\overline{\text{S}}}_{{\text{ob}}}\): average of the observed values, and N: number of samples. The highest values of NSE are ideal, and the lowest values of MAE, RMSE, SI, and U95 are ideal.
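The five indexes can be computed together as sketched below. This is a hedged sketch: the function name is hypothetical, the NSE uses squared differences (its standard definition), SD is taken as the population standard deviation of the errors (an assumption), and U95 follows Eq. (24) as written.

```python
import math


def evaluate(observed, estimated):
    """Return NSE, RMSE, MAE, SI, and U95 for paired observations and estimates."""
    n = len(observed)
    mean_ob = sum(observed) / n
    errors = [o - e for o, e in zip(observed, estimated)]
    sse = sum(err ** 2 for err in errors)
    nse = 1.0 - sse / sum((o - mean_ob) ** 2 for o in observed)  # Eq. (20)
    rmse = math.sqrt(sse / n)                                    # Eq. (21)
    mae = sum(abs(err) for err in errors) / n                    # Eq. (22)
    si = rmse / mean_ob                                          # Eq. (23)
    mean_err = sum(errors) / n
    sd = math.sqrt(sum((err - mean_err) ** 2 for err in errors) / n)
    u95 = math.sqrt(sd ** 2 + rmse ** 2)                         # Eq. (24)
    return {"NSE": nse, "RMSE": rmse, "MAE": mae, "SI": si, "U95": u95}


scores = evaluate([2.0, 4.0], [1.0, 5.0])
```

For this two-sample example the errors are +1 and −1, so RMSE and MAE are both 1, NSE is 0, SI is 1/3, and U95 is the square root of 2.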

4 Results and discussion

4.1 Optimization algorithms’ parameters

Obtaining precise values of the random parameters is critical for achieving optimal performance. This requires computing the variation of the objective function in relation to variations in the random parameters. An analysis was conducted on the variation of the objective function across different domains of random parameters at the AS station, and the results are presented in Table 2. Based on these results, it was concluded that the best population size for the WSA is 40, at which the RMSE was the lowest. Furthermore, the ideal value for the maximum number of iterations in the WSA was found to be 200, producing the lowest RMSE value. Similarly, for the SSA, SCA, and PSO algorithms, the best population sizes were found to be 40, 40, and 60, respectively. The optimal random parameters for the other stations were also established, as shown in Table 3.

Table 2 Determining random parameters in the AS station
Table 3 Determining random parameters in the different stations

4.2 The best input tuning for predictive models

Table 4 shows the first to third-best input combinations for each station. The combination WS (t − 1), …, WS (t − 6) was found to be the optimal input combination for predicting wind speed at the AS and CH stations. Similarly, the combination WS (t − 1), …, WS (t − 5) was found to be the best input combination for the BL, KB, and IP stations.

Table 4 Selection of the best input combination based on improved Gamma test

The coupling of the Gamma test with an optimization algorithm provides a convenient way to automatically determine the best input combination for predicting various target variables, without the need for manual computation of different input combinations. Therefore, this hybridized Gamma test proves to be a highly effective tool for selecting optimal inputs in models.

4.3 Accuracy comparison for various models

As shown in Fig. 7a–d, when examining the testing results of the models in the AS station, the MLP-WSA model outperformed the MLP-SSA, MLP-SCA, MLP-PSO, and MLP models in terms of accuracy. The U95 of the MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models were 17%, 19%, 20%, 22%, and 24%, respectively. The comparison between the accuracy of the models and the IMM model demonstrated that the IMM improved the accuracy and decreased the RMSE by 1.5%, 3.2%, 5.9%, 8.03%, and 23.7%, compared to MLP-WSA, MLP-SCA, MLP-SSA, MLP-PSO, and MLP, respectively. Both the IMM and MLP-WSA models showed the highest NSE values, and the MLP model obtained the highest value of the U95.

Fig. 7 Comparison between various models in terms of RMSE, MAE, NSE, and U95% in five stations

Figure 7a–d presents the outcomes of the evaluation stage at the B.L. site, revealing that the MLP-WSA model achieved an RMSE of 2.78 (m/s), whereas the MLP-SSA, MLP-SCA, MLP-PSO, and MLP models yielded RMSE values of 3.88 m/s, 4.12 m/s, 4.78 m/s, and 4.98 m/s, respectively. Furthermore, it was determined that the MLP-WSA model exhibited superior performance compared to other models. Moreover, the implementation of the IMM model demonstrated that it could enhance the accuracy of all models by incorporating information from each model. The NSE value of the IMM model was found to be 0.92, whereas the NSE values for the MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models were 0.90, 0.86, 0.82, 0.80, and 0.78, respectively.

Figure 7a–d shows the outcomes of the models during the testing phase at the C.H. station. Notably, the IMM model outperformed the other models, reducing the MAE by 6%, 22%, 25%, 39%, and 41% compared to the MLP-WSA, MLP-SCA, MLP-SSA, MLP-PSO, and MLP models, respectively. Furthermore, the IMM model achieved the highest NSE value, while the MLP model attained the lowest NSE. The MLP model also recorded the highest U95, indicating the lowest accuracy among the models.

Figure 7a–d presents the accuracy during the testing phase at the I.P. station. The IMM model had the lowest RMSE of 1.22 m/s, which indicates its high accuracy compared to the other models. The MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models had higher RMSE values of 1.45 m/s, 1.76 m/s, 1.89 m/s, 2.23 m/s, and 2.35 m/s, respectively. The IMM and MLP-WSA models recorded the highest NSE and the lowest U95 values, respectively.
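The percentage improvements quoted for the stations can be read as relative differences between a reference model's error and the IMM error. The short sketch below applies that reading to the I.P. station RMSE values listed above. The helper name `percent_reduction` and the definition itself (reference minus improved, divided by reference) are assumptions made for illustration.

```python
def percent_reduction(reference, improved):
    """Relative decrease of `improved` with respect to `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference error must be positive")
    return 100.0 * (reference - improved) / reference


# RMSE values (m/s) quoted above for the I.P. station.
rmse_ip = {
    "MLP-WSA": 1.45,
    "MLP-SSA": 1.76,
    "MLP-SCA": 1.89,
    "MLP-PSO": 2.23,
    "MLP": 2.35,
}
rmse_imm = 1.22

reductions = {
    name: percent_reduction(value, rmse_imm) for name, value in rmse_ip.items()
}
for name, value in reductions.items():
    print(f"IMM vs {name}: {value:.1f}% lower RMSE")
```

Under this definition, the IMM reduces the RMSE at the I.P. station by roughly 16% relative to the MLP-WSA and by roughly 48% relative to the standalone MLP.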

Figure 7a–d presents the performance of the models during the testing phase at the K.B. station. The RMSE of the IMM model was significantly lower than that of the other models: the IMM model achieved a 17%, 22%, 44%, 54%, and 55% reduction in RMSE compared to the MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models, respectively. Additionally, the NSE value of the MLP-WSA model (0.90) was higher than those of the MLP-SSA, MLP-SCA, MLP-PSO, and MLP models, which had NSE values of 0.88, 0.86, 0.85, and 0.82, respectively. Therefore, the hybrid MLP models outperformed the standalone MLP model.

Figure 8 illustrates the scatterplots for the testing stage at the A.S., B.L., C.H., I.P., and K.B. stations. In Fig. 8a, the IMM model demonstrated the best performance with a testing R² value of 0.9891, while the MLP-WSA model achieved the best accuracy among the hybrid and standalone MLP models, with an R² value of 0.9816. In Fig. 8b, the IMM and MLP models exhibited the best and worst accuracy, with testing R² values of 0.989 and 0.9451, respectively. In Fig. 8c, the IMM and MLP-WSA models showed the highest testing R² values of 0.9894 and 0.9860, respectively, while the MLP-SSA, MLP-SCA, MLP-PSO, and MLP models recorded lower R² values. Figure 8d shows that the IMM model reached a testing R² value of 0.9923. In Fig. 8e, the IMM and MLP-WSA models also exhibited the best testing R² values.
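The coefficient of determination reported on such scatterplots is often the squared Pearson correlation between observed and predicted values. The sketch below uses that definition, which is an assumption here since the exact formula is not restated in this section, and the function name `r_squared` is illustrative.

```python
def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        raise ValueError("R^2 is undefined for constant series")
    return cov ** 2 / (var_o * var_p)


# Toy data for illustration only (not taken from the study).
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Because this form measures only linear association, a model with a systematic bias can still score close to 1, which is one reason to read it together with error-based indices such as RMSE and NSE.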

Fig. 8 Scatterplots for different models at the (a) A.S. station, (b) B.L. station, (c) C.H. station, (d) I.P. station, and (e) K.B. station

Figure 9 displays the Scatter Index (S.I.) values of the models under evaluation. At the A.S. station, the S.I. values for the testing stage were 0.09, 0.11, 0.17, 0.21, 0.24, and 0.25 for the IMM, MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models, respectively. Based on these findings, the IMM, MLP-WSA, and MLP-SSA models achieved excellent, good, and good accuracy ratings, respectively. The MLP-SCA, MLP-PSO, and MLP models achieved fair accuracy, indicating room for improvement. At the B.L. station, the accuracy ratings for the IMM, MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models were excellent, good, good, fair, fair, and fair, respectively. Furthermore, at the C.H., I.P., and K.B. stations, the IMM model’s performance was deemed excellent, while the MLP models achieved fair accuracy.
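The ratings quoted above are consistent with a commonly used banding of the scatter index: below 0.1 excellent, 0.1 to 0.2 good, 0.2 to 0.3 fair, and above 0.3 poor. The sketch below assumes that banding and the usual definition of the index as RMSE divided by the mean observed value. Both are assumptions rather than details confirmed in this section, and the function names are illustrative.

```python
import math


def scatter_index(obs, pred):
    """RMSE normalized by the mean of the observations."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))


def si_rating(si):
    """Map a scatter index to a qualitative rating (assumed thresholds)."""
    if si < 0.1:
        return "excellent"
    if si < 0.2:
        return "good"
    if si < 0.3:
        return "fair"
    return "poor"


# S.I. values quoted above for the A.S. station (testing stage).
si_as = {
    "IMM": 0.09,
    "MLP-WSA": 0.11,
    "MLP-SSA": 0.17,
    "MLP-SCA": 0.21,
    "MLP-PSO": 0.24,
    "MLP": 0.25,
}
ratings = {name: si_rating(value) for name, value in si_as.items()}
print(ratings)
```

Applied to the A.S. values, the assumed thresholds reproduce the ratings given in the text: excellent for the IMM, good for the MLP-WSA and MLP-SSA, and fair for the remaining models.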

Fig. 9 S.I. values for the different models at the different stations

Figure 10 presents the heat maps depicting the relative error of the various models. The relative error of the IMM model ranged from 0 to 5% at all stations, whereas the relative error of the MLP model ranged from 20 to 25%. The relative error of the MLP-WSA model ranged from 0 to 10%, 0 to 10%, 5 to 10%, 0 to 10%, and 0 to 10% at the A.S., B.L., C.H., I.P., and K.B. stations, respectively.
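A heat map of this kind can be built from pointwise relative errors grouped into bands. The sketch below assumes the absolute relative error in percent, |predicted − observed| / |observed| × 100, and 5% wide bands. The function names are illustrative, and observations equal to zero are skipped because the ratio is undefined for them.

```python
def relative_errors(obs, pred):
    """Absolute relative error in percent; zero observations are skipped."""
    return [100.0 * abs(p - o) / abs(o) for o, p in zip(obs, pred) if o != 0]


def error_band(err, width=5.0):
    """Return the (lower, upper) bounds of the band containing `err`."""
    lower = int(err // width) * width
    return (lower, lower + width)


# Toy data for illustration only (not taken from the study).
errors = relative_errors([2.0, 4.0, 0.0], [2.1, 3.6, 1.0])
print([error_band(e) for e in errors])
```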

Fig. 10 Relative error for the different models

The CPU time was calculated for the various models. For the A.S. station, the CPU time of the IMM model was 230 s with fuzzy reasoning and 260 s without it. Similarly, for the B.L. station, the MLP-WSA model had a CPU time of 250 s with fuzzy reasoning and 282 s without it. Across stations, utilizing fuzzy reasoning yielded a lower CPU time than running without it.
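One simple way to obtain comparable CPU times is to measure process time around each run. The sketch below is illustrative only: `cpu_time` is a hypothetical helper, and `toy_training` is a placeholder workload standing in for model training, not the procedure timed in this study.

```python
import time


def cpu_time(func, *args, **kwargs):
    """Run func and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = func(*args, **kwargs)
    return result, time.process_time() - start


def toy_training(n_iter):
    """Placeholder workload standing in for model training (hypothetical)."""
    total = 0.0
    for i in range(n_iter):
        total += (i % 7) ** 0.5
    return total


_, t_short = cpu_time(toy_training, 10_000)
_, t_long = cpu_time(toy_training, 200_000)
print(f"short run: {t_short:.4f} s, long run: {t_long:.4f} s")
```

`time.process_time` counts CPU time of the current process and excludes time spent sleeping, which makes it less sensitive to other activity on the machine than wall-clock timing.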

4.4 Concluding discussion

The results show that the WSA optimization algorithm enhanced the accuracy of the MLP, so that the MLP-WSA outperformed the other MLP models, namely the MLP-SSA, MLP-SCA, MLP-PSO, and standalone MLP. Hence, the developed MLP-WSA model can be regarded as an effective means of predicting various climate and hydrological variables. Additionally, the IMM model has demonstrated its ability to enhance the accuracy of MLP models by aggregating the outputs of multiple MLP models.

The study’s outcomes are consistent with the findings of earlier research. Liu et al. (2013) found that utilizing optimization algorithms such as particle swarm optimization and genetic algorithms could enhance the MLP model’s precision for forecasting W.S. Liu et al. (2015) combined fast ensemble decomposition and optimization algorithms with the MLP to predict W.S., and the resulting hybrid MLP models outperformed standalone MLP models. Moreover, Samadianfard et al. (2020) merged optimization algorithms with the MLP model and reported that the whale optimization algorithm boosted the MLP models’ accuracy by utilizing advanced operators.

Future research could investigate the use of the WSA in combination with other soft computing models, such as the radial basis function neural network and SVM models, to forecast W.S. Further research could also examine how uncertainty in model parameters and inputs affects the models’ outputs. While the MLP-WSA showed superior performance in this study, future research can utilize multi-criteria decision-making methods to determine the most appropriate model based on different analyses.

Future studies could explore the possibility of defining multiple objective functions for tuning the MLP parameters, which would allow the best input combination and the MLP parameters to be identified simultaneously. Such an approach would not require additional preprocessing methods like the Gamma test to identify the optimal inputs. To achieve this, two objective functions could be defined. The first objective function would focus on identifying the optimal MLP parameters, while the second objective function would concentrate on finding the best input combination. It would therefore be necessary to modify the WSA into a multi-objective optimization algorithm capable of solving such problems.
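A multi-objective optimizer typically ranks candidate solutions by Pareto dominance instead of a single score. The sketch below shows only that building block, under stated assumptions: the two objectives (a validation error and the number of selected inputs, both minimized) and the candidate values are hypothetical, chosen for illustration, and a multi-objective WSA is not implemented here.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Both arguments are tuples of objective values to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (label, objectives) pairs that no other candidate dominates."""
    return [
        (label, obj)
        for label, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


# Hypothetical candidates: (validation error, number of inputs).
candidates = [
    ("A", (0.30, 2)),
    ("B", (0.25, 3)),
    ("C", (0.25, 4)),
    ("D", (0.40, 3)),
    ("E", (0.20, 5)),
]
front = pareto_front(candidates)
print([label for label, _ in front])
```

In this toy set, candidates C and D are dominated (by B and A, respectively), so the front keeps A, B, and E as alternative trade-offs between error and input count.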

In situations where climate data are unavailable, alternative input combinations like latitude, longitude, and the number of available data points can still provide useful insights into predicting W.S. This approach can be particularly useful for scenarios where the availability of data is limited. Although fuzzy reasoning can reduce computational time, it is worth noting that optimization algorithms can also be effective at reducing the computational time required. These algorithms may be especially beneficial when they converge more quickly.

5 Conclusion

Wind energy helps mitigate the environmental pollution resulting from the consumption of fossil fuels. Accurately predicting wind speed is essential for managing energy and generating power. This work utilized an optimization algorithm, the WSA, to train an MLP at five stations in Malaysia. Additionally, the outputs of several MLP models, including the MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP, were supplied to the IMM. A Gamma test was conducted to find the best input combination. The results showed that the MLP-WSA outperformed the other MLP models, with the lowest RMSE of 3.95 m/s. In terms of accuracy, the IMM model had the highest NSE at the B.L. station, whereas the MLP model had the lowest NSE at the same station. During testing, the MAE of the IMM model was recorded at 2.55 m/s, compared with MAE values of 2.55, 2.98, 3.44, 3.98, and 4.12 m/s for the MLP-WSA, MLP-SSA, MLP-SCA, MLP-PSO, and MLP models, respectively. At the C.H. station, the MAE of the IMM model was 6%, 22%, 25%, 39%, and 41% lower than the MAE of the MLP-WSA, MLP-SCA, MLP-SSA, MLP-PSO, and MLP models, respectively. The IMM and MLP-WSA also performed better in terms of NSE at the I.P. station, and their accuracy was superior to that of the other models at the K.B. station. Furthermore, incorporating fuzzy reasoning in the modeling process reduced the CPU time required for analysis. Overall, combining the water strider algorithm with the MLP model and using the hybrid Gamma test based on fuzzy reasoning provided the most accurate prediction of wind speed.

This paper focused specifically on the MLP and combined it with optimization algorithms. This limitation can be addressed in future work. The study opens the door to exploring other neural network architectures, such as CNNs, LSTMs, and transformers, trained with various optimization algorithms to improve prediction performance.

List of acronyms

Abbreviation   Definition
SGD            Stochastic gradient descent
MSE            Mean squared error
W.S.           Wind speed
MLP            Multilayer perceptron
WSA            Water strider algorithm
MLP-SCA        MLP-sine cosine algorithm
MLP-SSA        MLP-salp swarm algorithm
MLP-PSO        MLP-particle swarm optimization
IMM            Inclusive multiple model
ANN            Artificial neural network
SVM            Support vector machine
SCMs           Soft computing models
GPR            Gaussian Process Regression
GRNN           Generalized Regression Neural Network
ANFIS          Adaptive neuro-fuzzy inference system
CLSTMN         Convolutional Long Short-Term Memory Network
RNN            Recurrent Neural Network
RBFNN          Radial Basis Function Neural Network
WOA            Whale Optimization Algorithm
MLKSSVM        Multi-Kernel Least Square Support Vector Machine
GSA            Gravitational Search Algorithm
A.S.           Alor Setar
B.L.           Bayan Lepas
C.H.           Cameron Highlands
I.P.           Ipoh
K.B.           Kota Bharu
RMSE           Root-mean-square error
S.I.           Scatter Index
NSE            Nash–Sutcliffe efficiency
U95            Uncertainty with 95% confidence level
MAE            Mean absolute error