1 Introduction

In recent years, with rapidly growing energy demand and dwindling supplies of non-renewable energy resources, wind energy has proliferated and gained a great deal of attention as one of the most environmentally and economically sustainable green energy resources [1]. However, because wind speed is naturally stochastic, designing an accurate wind energy model for electrical power systems is difficult. Moreover, the variability of wind speed can significantly affect the safety and stability of micro-grid scheduling and wind turbine control, which in turn affects the balance between load demand and supply for the wind farm as well as energy quality [2]. Thus, in energy conversion and management, accurate wind speed prediction models can provide a stable basis for the generation and transmission of wind energy and reduce the operating costs of the power system.

Over the last few decades, various forecasting techniques have been developed to predict wind speed time series. Typically, such methods fall into three groups: physical strategies, statistical methods, and artificial intelligence algorithms [3]. Physical strategies are explicit approaches that use meteorological and terrain information such as density, temperature, roughness, obstacles, and atmospheric pressure [4]. A common technique, numerical weather prediction (NWP), uses mathematical models based on these physical data to forecast wind speed. Nonetheless, this numerical method is often impractical because collecting such physical data is not straightforward, particularly for short-term wind speed forecasting.

Statistical methods form the second group used by researchers to forecast wind speed time series. The modeling of different natural phenomena has been studied using several data analysis techniques, such as statistical and mathematical modeling comprising time series analysis, regression modeling, optimization, and numerical analysis [5,6,7,8,9]. For wind speed forecasting, the most well-known statistical methods are auto-regressive (AR) models, auto-regressive moving average (ARMA) models, and auto-regressive integrated moving average (ARIMA) models. Lydia et al. [10] adopted linear and nonlinear AR models to forecast wind speed from 10 min up to 1 h ahead for a wind energy center in India. Their method uses the Gauss–Newton algorithm to tune the AR parameters, and they measured the accuracy of the proposed model using three performance metrics. In another work, Ailliot et al. [11] proposed non-homogeneous Markov-switching auto-regressive (MS-AR) models to forecast wind speed for an island in France, and analyzed different weather types with their method. Torres et al. [12] used ARMA models to forecast the hourly average wind speed for horizons from 1 h up to 10 h ahead. The data for this work were gathered over a period of 9 years at five locations with different topographic characteristics in Navarre (Spain). They showed that the ARMA models have better forecasting performance than the persistence model. Yunus et al. [13] developed an ARIMA model that can cost-effectively capture the probability distribution and time correlation of wind speed data. Their simulation results show that the technique outperforms most persistence models for short-term horizons. As stated in [14], because of their pre-assumed linear form, many statistical methods cannot cope well with the nonlinear characteristics of wind speed.

With the rapid growth of feature selection and machine learning approaches [15,16,17,18,19,20,21,22], numerous artificial intelligence (AI) strategies have been applied to several real-world problems [23,24,25,26,27,28,29,30] and have been successfully designed to address the non-stationary and random nature of wind speed time series. Generally, the existing AI-based wind speed forecasting methods can be classified into two categories: traditional machine learning algorithms and deep learning methods [31]. The support vector machine (SVM) is one of the prominent traditional machine learning algorithms and has a strong generalization potential [32,33,34]. In recent work, Kong et al. [35] optimized the parameters of a specific type of SVM called the reduced support vector machine (RSVM) using the particle swarm optimization (PSO) algorithm for wind speed prediction. In another work, Yu et al. [36] successfully integrated an SVM algorithm with recurrent neural network methods to forecast wind speed.

Artificial neural network (ANN) algorithms including backpropagation (BP), Elman neural network (ENN) [37], extreme learning machine (ELM) [38, 39], and radial basis function (RBF) [40] are the most commonly used traditional machine learning algorithms in many areas including the forecasting of wind speed time series. Cadenas and Rivera [41] utilized several BP models to forecast the short-term wind speed of Oaxaca city in Mexico. They showed the structure used for BP has acceptable accuracy for the energy supplier in Oaxaca. In [42], Guo et al. presented a hybrid algorithm based on the BP algorithm and seasonal exponential adjustment (SEA), in which the proposed algorithm was utilized to forecast the daily wind speed 1 year ahead for an area in China from 2001 to 2006. For ENNs, Wang et al. [43] proposed a novel algorithm optimizing these neural networks’ weights and thresholds using a multi-objective whale optimization algorithm for wind speed forecasting. In another work, a multi-objective satin bower-bird optimizer algorithm was employed by [44] to optimize and enhance the forecasting performance of the ENNs based on two real wind farms of China. Salcedo et al. [45] developed a combined wind speed forecasting model using coral reefs optimization algorithm based on a feature selection problem for training ELM [46, 47] networks. In [48], RBF neural networks were trained and optimized by a two-step novel mechanism, including the K-means clustering algorithm and non-dominated sorting genetic algorithm-II (NSGA-II) to maximize the coverage probability of the constructed prediction intervals for a wind speed dataset.

Deep neural network algorithms have gained substantial attention as another successful artificial intelligence category [49,50,51]. Chen et al. [52] developed a novel nonlinear hybrid ensemble of deep LSTM models for forecasting wind speed time series. Their hybrid method was evaluated through two case studies using data from a Chinese wind farm. Liu et al. [53] proposed a new model for multi-step wind speed forecasting using deep LSTM networks combined with the empirical wavelet transform and ELM [54, 55] algorithms. In another work, Pei et al. [56] proposed a hybrid algorithm that combines a new cell update LSTM with the empirical wavelet transform for wind speed forecasting, evaluated on four different datasets. In addition, Khodayar et al. [57] presented a rough deep learning architecture combining a stacked denoising autoencoder (SDAE) and a stacked autoencoder (SAE) to forecast wind speed for ultra-short-term and short-term horizons. Several of these studies and other applications of deep learning have shown that deep learning approaches are more accurate than traditional machine learning methods [58, 59]. In general, deep learning has demonstrated tremendous promise as an advanced and efficient machine learning paradigm for wind speed forecasting.

In [60], the authors introduced a hybrid method called VMD-DE-ESN, combining variational mode decomposition [61], differential evolution, and an echo state network for wind speed forecasting. The proposed algorithm showed efficient performance on data from four stations of a wind farm in northwestern Spain. In [62], a deep learning approach based on the gated recurrent unit (GRU) was coupled with wavelet soft threshold denoising to predict wind speed series. By tuning the GRU parameters with a cross-validated grid-search strategy, this hybrid model achieved high adaptability across several case studies. In [63], the authors presented a novel day-to-day wind speed forecasting model based on deep CNNs designed with Taguchi’s orthogonal array. The experimental findings indicate that the proposed CNN outperforms other existing benchmark models.

Among the deep learning approaches [64, 65], the LSTM neural network generally performs strongly because of its ability to cope with long-term time series problems [66, 67]. As a derivative of recurrent neural networks (RNNs), LSTMs can learn temporal and long-term dependencies from time-series data and mitigate the vanishing gradient problem of traditional RNNs [68]. These characteristics motivated us to base the deep learning strategy in this work on the LSTM neural network.

Nonetheless, there is no established empirical rule for selecting the hyperparameter values of an LSTM neural network, and these hyperparameters affect its forecasting ability. Therefore, we introduce a novel deep neuroevolution method based on an enhanced version of the grasshopper optimisation algorithm (GOA) to optimize these hyperparameters and improve wind speed forecasting performance. GOA is a recent, promising optimization algorithm inspired by the swarming behavior of grasshoppers. It has already been applied to many stochastic and continuous optimization problems, where it has outperformed common meta-heuristics such as differential evolution [69], whale optimizer [70], particle swarm optimization [71], and genetic algorithm [72]. Saxena et al. [73] introduced an improved version of GOA based on ten chaotic maps and examined the performance of these variants on several unimodal and multimodal benchmark functions. Xu et al. [74] implemented two techniques, namely orthogonal learning and chaotic exploitation, in the traditional GOA to obtain a more reliable trade-off between the exploration and exploitation phases. Their analytical findings demonstrate that the modified version can alleviate the shortcomings of GOA and provide higher-quality solutions. An annealing-behaved GOA with boosted exploratory and exploitative patterns was proposed by Yu et al. [75] for solving global optimization problems. For a comprehensive review of GOA, please refer to the work of Abualigah and Diabat [76]. As noted in the works discussed above, the standard GOA has some weaknesses: it can easily become trapped in local optima and shows a slow convergence rate on challenging problems.

To further improve the performance quality of GOA, we add two powerful evolutionary operators into the basic GOA for the first time. These operators are based on chaos theory [77] and levy-flight technique, aiming to enhance meta-heuristic evolutionary algorithms’ performance for optimization problems. We name this improved version of basic GOA as enhanced GOA (EGOA).

As discussed by Jalali et al. [26], selecting appropriate hyperparameters for DNN algorithms is of great importance, since their performance depends on the values of such hyperparameters. Owing to their decentralized and relational feature representations, deep neural networks can learn nonlinear structures that are deeper and more dynamic than those of traditional machine learning models such as BP, ENN, ELM, and RBF neural networks and SVM [78] algorithms. In addition, deep LSTM, as a prominent deep neural network, has been successfully deployed to solve various real-world time series problems [53, 79]. The architecture of LSTM neural networks has mostly been designed manually, which is a costly and time-consuming procedure [80]. In the field of wind speed forecasting in particular, few works have optimized the architecture of LSTM algorithms; in most studies that used deep learning for wind speed forecasting, the authors designed the architecture manually [81, 82]. Therefore, this paper aims to predict wind speed with the highest possible accuracy using a novel optimization algorithm that automatically and efficiently designs the LSTM architecture.

In summary, the principal contributions of this paper are as follows:

  1.

    We introduce an LSTM-based deep neuroevolution time series forecasting algorithm for exploring the implicit knowledge in wind speed time series. Moreover, the mutual information (MI) algorithm is used for input variable selection. The features obtained by MI help to select the most fitting size of the LSTM input window.

  2.

    Whereas references such as [67, 83, 84] select the deep LSTM hyperparameters by a time-consuming trial-and-error procedure, we efficiently optimize the hyperparameters of each layer of the deep LSTM neural network with an enhanced version of the GOA evolutionary algorithm, which we name EGOA. This modification improves GOA through chaos theory and Levy flight strategies to obtain a faster convergence speed and a more efficient balance between the exploitation and exploration phases in the search space.

  3.

    To the best of our knowledge, this work is the first study to utilize an enhanced version of the GOA evolutionary algorithm to optimize the hyperparameters of LSTM neural networks for wind speed forecasting.

  4.

    Our proposed deep hybrid optimization algorithm shows excellent forecasting performance compared to seven competitive classical and state-of-the-art methodologies for wind speed forecasting.

The superiority of the proposed model is demonstrated for two forecasting horizons: ultra-short-term wind speed forecasting for 30 min ahead and short-term wind speed forecasting for 1 h ahead. The datasets used for our experiments were collected from two wind sites near Las Vegas and Denver in the USA. We compare our novel algorithm with several standard and hybrid state-of-the-art time series forecasting algorithms, including back propagation (BP) [41], convolutional neural network (CNN) [85], long short-term memory (LSTM) [80], XGBoost [86], empirical mode decomposition and genetic algorithm-BP neural network (EMD-GABP) [87], differential evolution-LSTM (DE-LSTM) [88], and ensemble empirical mode decomposition–GA–particle swarm optimization wavelet neural network (EGP-WNN) [89]. The experimental results show that the proposed model is significantly superior to the compared models.

The remainder of this study is arranged as follows: Sect. 2 presents the basic formulation of the proposed method. The experimental procedures for the two US datasets at two different time-step horizons, together with a discussion of the experimental results, are given in Sect. 3. Finally, the major findings and future work are summarized in Sect. 4.

2 Proposed method

This section describes in detail how our enhanced GOA evolutionary algorithm is developed to optimize the structure of the LSTM neural networks.

2.1 Structure of basic GOA

Saremi et al. [72] proposed the swarm-based GOA, which imitates the behavior of grasshopper swarms in nature to find optimal or near-optimal solutions to complex multimodal or composite hybrid problems. After initialization, the updating rule combines three components: social interaction, gravity force, and wind advection. The position of the ith agent is denoted by \(X_{i}\) and described by

$$\begin{aligned} X_{i}=S_{i}+G_{i}+A_{i} , \end{aligned}$$
(1)

where \(S_i\) denotes the social interaction, \(G_i\) represents the gravity force, and \(A_i\) denotes the wind advection. Social interaction is the most influential component because of its impact on the motion patterns, and it is determined as follows:

$$\begin{aligned} S_{i}= & {} \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^{N} s\left( d_{i j}\right) \widehat{d_{i j}} \end{aligned}$$
(2)
$$\begin{aligned} d_{i j}= & {} \left| x_{j}-x_{i}\right| \end{aligned}$$
(3)
$$\begin{aligned} \widehat{d_{i j}}= & {} \left( x_{j}-x_{i}\right) / d_{i j} \end{aligned}$$
(4)
$$\begin{aligned} s(r)= & {} f e^{-r / l}-e^{-r}, \end{aligned}$$
(5)

where \(d_{ij}\) represents the distance between the ith and jth agents, and \(\widehat{d_{i j}}\) denotes the unit vector from the ith agent to the jth agent. The function s defines the social forces, whose shape is controlled by the parameters f (intensity of attraction) and l (attractive length scale). The distance between agents should be mapped into the interval [1,4]. The gravity force of an agent can be expressed as follows:

$$\begin{aligned} {G_{i}=-g \widehat{e}_{g}}, \end{aligned}$$
(6)

where g is the gravitational constant and \(\widehat{e}_{g}\) is a unit vector pointing towards the center of the earth. The wind advection can be computed as follows:

$$\begin{aligned} A_{i} =u \widehat{e_{w}}, \end{aligned}$$
(7)

where u denotes a constant drift and \(\widehat{e_{w}}\) is a unit vector in the wind direction. Substituting these components into Eq. (1) gives:

$$\begin{aligned} X_{i} =\sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^{N} s\left( \left| x_{j}-x_{i}\right| \right) \frac{x_{j}-x_{i}}{d_{i j}}-g \widehat{e}_{g}+u \widehat{e}_{w} \end{aligned}$$
(8)

where N denotes the number of agents. In practice, the gravity force is considered weak enough to be ignored, and the wind direction (A component) is implicitly assumed to always point towards the best solution \(\widehat{T_{d}}\). The logical model between the agents is also demonstrated in Fig. 1.

Fig. 1 Primitive patterns between the agents in an update of GOA
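To make the attraction and repulsion behaviour of the social force s(r) in Eq. (5) concrete, the following minimal Python sketch evaluates it at a few distances. The values f = 0.5 and l = 1.5 are assumptions (commonly used GOA settings) and are not specified in this section. Under these assumed values, s(r) is negative (repulsion) at short distances, zero at a comfort distance, and positive (attraction) beyond it.

```python
import math


def social_force(r, f=0.5, l=1.5):
    """Social force s(r) = f*exp(-r/l) - exp(-r) from Eq. (5).

    f (intensity of attraction) and l (attractive length scale) are
    assumed values, not taken from this paper.
    """
    return f * math.exp(-r / l) - math.exp(-r)


def comfort_distance(f=0.5, l=1.5):
    """Distance at which s(r) = 0, solved in closed form.

    f*exp(-r/l) = exp(-r)  =>  r = ln(1/f) / (1 - 1/l),
    which is valid for 0 < f < 1 and l > 1.
    """
    return math.log(1.0 / f) / (1.0 - 1.0 / l)


if __name__ == "__main__":
    # Sample one distance on each side of the comfort distance.
    for r in (1.0, comfort_distance(), 3.0):
        print(f"r = {r:.3f}, s(r) = {social_force(r):+.4f}")
```

With the assumed parameters, the comfort distance lies inside the [1,4] interval, which is consistent with mapping inter-agent distances into that range.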

Finally, the position update is formulated as follows:

$$\begin{aligned} X_{i}^{d}=c\left( \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^{N} c \frac{u b_{d}-l b_{d}}{2} s\left( \left| x_{j}^{d}-x_{i}^{d}\right| \right) \frac{x_{j}-x_{i}}{d_{i j}}\right) +\widehat{T_{d}}, \end{aligned}$$
(9)

where \(u b_{d}\) and \(lb_{d}\) are the upper and lower boundaries of the dth dimension, \(\widehat{T_{d}}\) is the dth dimension value of the best solution obtained so far, and the parameter c is a decreasing coefficient that gradually reduces exploration and increases exploitation as the iterations proceed, according to the following equation:

$$ c = c_{{\max }} - l\frac{{c_{{\max }} - c_{{\min }} }}{L}, $$
(10)

where the maximum value is represented by \(c_{\rm max}\), the minimum value is denoted by \(c_{\mathrm{min}}\), l corresponds to the current iteration, and L denotes the maximum iteration number.
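The update rule of Eqs. (9) and (10) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the values of f, l, c_max, and c_min are assumed defaults, the function names are hypothetical, the normalization of distances into [1,4] is omitted for brevity, and the final clipping to the bounds is an added safeguard that does not appear in Eq. (9).

```python
import numpy as np


def social_force(r, f=0.5, l=1.5):
    """s(r) from Eq. (5); f and l are assumed values."""
    return f * np.exp(-r / l) - np.exp(-r)


def shrink_coefficient(iteration, max_iteration, c_max=1.0, c_min=1e-5):
    """Linearly decreasing coefficient c from Eq. (10).

    c_max and c_min are assumed defaults, not given in the text.
    """
    return c_max - iteration * (c_max - c_min) / max_iteration


def goa_step(X, target, lb, ub, c):
    """One GOA position update following Eq. (9).

    X      : (N, D) array of agent positions
    target : (D,) best solution found so far
    lb, ub : (D,) lower and upper bounds per dimension
    c      : current value of the shrinking coefficient
    """
    n_agents, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(n_agents):
        social = np.zeros(dim)
        for j in range(n_agents):
            if j == i:
                continue
            diff = X[j] - X[i]
            dist = np.linalg.norm(diff) + 1e-12  # guard against division by zero
            unit = diff / dist
            # Inner c scales the social force; s acts on per-dimension gaps.
            social += c * (ub - lb) / 2.0 * social_force(np.abs(diff)) * unit
        # Outer c shrinks the step around the target; clipping is an added safeguard.
        X_new[i] = np.clip(c * social + target, lb, ub)
    return X_new
```

In a full optimizer, `shrink_coefficient` would be evaluated once per iteration and passed to `goa_step`, and `target` would be replaced whenever a better agent is found.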

2.2 Chaotic-population initialization

Improving the balance between exploration and exploitation in swarm-based methods such as GOA is an essential part of the optimization process. For example, advanced variants of several other evolutionary and swarm intelligence methods, such as the boosted moth-flame optimizer (LGCMFO) [90], chaotic random spare ant colony optimization [91], the biogeography-based whale optimizer [92], the double adaptive moth-flame optimizer [93], the orthogonal learning grey wolf optimizer [94], and the Gaussian bare-bones fruit fly optimizer [95], have found applications in many areas by stabilizing the balance between exploration and exploitation in their core processes. In this regard, the quality of the initial population can significantly affect the convergence speed and solution accuracy of evolutionary algorithms, which optimize through repeated population iteration [96, 97]. The basic GOA typically initializes the population randomly, which makes it hard to guarantee population diversity and can lead to weak search performance. Therefore, it is essential to enhance the diversity of the initial population. Generally speaking, chaos is a pseudo-random motion produced by a deterministic mechanism that is sensitive to its initial value and generates many pseudo-random patterns [98, 99]. It has the attributes of non-linearity, randomness, and consistency. These characteristics can help the algorithm escape local optima when solving function optimization problems, preserving population diversity and increasing global search efficiency [100, 101]. Among the various chaotic maps with different function optimization abilities, the tent map has shown better performance than the other maps [102]. Therefore, we use the tent map to initialize the agent population, which can be formulated as

$$\begin{aligned} x_{i + 1} = \left\{ {\begin{array}{*{20}l} {2 \times x_{i} ,} &{} {0 \le x_{i} \le 1/2;} \\ {2 \times \left( {1 - x_{i} } \right) ,} &{} {1/2 \le x_{i} \le 1.} \\ \end{array} } \right. \end{aligned}$$
(11)

Let D represent the search space dimension and N denote the population size. The tent map sequence \(x_{ij} \left( {i = 1,2, \ldots ,N;\quad j = 1,2, \ldots ,D} \right) \) is generated by Eq. (11). Based on Eq. (12), the initial population \(P_{0} = \left\{ {X_{ij} } \right\} \) is then mapped into the search space as follows:

$$\begin{aligned} X_{ij} = x_{ij} \times \left( {X_{\mathrm{max} j} - X_{\mathrm{min} j} } \right) + X_{\mathrm{min} j}, \end{aligned}$$
(12)

where the maximum and minimum of the jth dimension are represented by \(X_{\mathrm{max} j}\) and \(X_{\mathrm{min} j}\), respectively.
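The chaotic initialization of Eqs. (11) and (12) can be sketched as below. The text does not specify how the sequence \(x_{ij}\) is seeded, so this sketch assumes one uniformly random seed per agent and iterates the tent map once per dimension; the function names and the `seed` argument are hypothetical. Note that the slope-2 tent map degenerates to zero after roughly 50 iterations in binary floating point, so this sketch is only suitable for low-dimensional search spaces such as the four-dimensional one used later.

```python
import numpy as np


def tent_map(x):
    """Tent map of Eq. (11), applied element-wise to values in [0, 1]."""
    return np.where(x < 0.5, 2.0 * x, 2.0 * (1.0 - x))


def tent_init(n_agents, dim, lower, upper, seed=0):
    """Chaotic population initialization following Eqs. (11) and (12).

    lower, upper : (dim,) arrays with the search-space bounds.
    Seeding each agent with a uniform random number is an assumption.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_agents)  # one chaotic seed per agent
    chaos = np.empty((n_agents, dim))
    for j in range(dim):
        x = tent_map(x)          # next element of each agent's chaotic sequence
        chaos[:, j] = x
    # Eq. (12): map chaotic values from [0, 1] into the search space.
    return chaos * (upper - lower) + lower
```

A call such as `tent_init(30, 4, lower, upper)` would return a 30 by 4 array of initial positions inside the given bounds.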

2.3 Levy flight

Levy flight (LF) was initially proposed in 1937 by Paul Levy, a French mathematician. Many artificial and natural phenomena have been described in terms of Levy statistics [103]. LF is a well-known class of non-Gaussian random walks whose step lengths follow a stable Levy distribution. The Levy distribution is given as follows:

$$\begin{aligned} \mathrm{Levy }(\beta ) \sim u=t^{-1-\beta }, \end{aligned}$$
(13)

where \(\beta \) is the Levy index that controls the stability of the distribution. The Levy random number is determined using the following equation:

$$\begin{aligned} \mathrm{Levy }(\beta ) \sim \frac{\varphi \times \mu }{|v|^{1 / \beta }}, \end{aligned}$$
(14)

where \(\mu \) and v are drawn from standard normal distributions, \(\Gamma \) denotes the standard Gamma function, the value of the \(\beta \) parameter is set to 1.5, and \(\varphi \) is computed as follows:

$$\begin{aligned} {\varphi =\left[ \frac{\Gamma (1+\beta ) \times \sin (\pi \times \beta / 2)}{\Gamma \left( \frac{1+\beta }{2}\right) \times \beta \times 2^{\frac{\beta -1}{2}}}\right] ^{1 / \beta }}. \end{aligned}$$
(15)

To achieve a suitable trade-off between the exploration and exploitation capabilities of evolutionary algorithms, the LF approach is employed to update the position of each agent, which is calculated as follows:

$$\begin{aligned} X_{i}^{\mathrm{levy }}=X_{i}+r \oplus {\text {levy}}(\beta ), \end{aligned}$$
(16)

where \(X_{i}^{\mathrm{levy }}\) represents the new position of the ith agent \(X_i\), r denotes a random vector in the [0,1] interval, and \(\oplus \) denotes entry-wise multiplication.
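The Levy step of Eqs. (14)–(16) can be sketched with the Mantegna-style sampling below. The sketch uses the standard form of \(\varphi \), in which \(\Gamma ((1+\beta )/2)\) is multiplied by \(\beta \times 2^{(\beta -1)/2}\) in the denominator. The function names are hypothetical, and passing a NumPy random generator explicitly is a design choice made here for reproducibility.

```python
import math

import numpy as np


def levy_phi(beta=1.5):
    """Scale factor phi of Eq. (15) in its standard Mantegna form."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(shape, beta=1.5, rng=None):
    """Levy-distributed random numbers following Eq. (14)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.standard_normal(shape)  # numerator sample
    v = rng.standard_normal(shape)   # denominator sample
    return levy_phi(beta) * mu / np.abs(v) ** (1 / beta)


def levy_move(x, beta=1.5, rng=None):
    """Position update of Eq. (16): x + r (entry-wise) levy(beta)."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform(0.0, 1.0, size=x.shape)  # random vector in [0, 1]
    return x + r * levy_step(x.shape, beta, rng)
```

Because the distribution is heavy-tailed, most steps are small while occasional steps are very large, which is the property exploited to leave local optima.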

2.4 Enhanced GOA

This section outlines the proposed enhanced GOA (EGOA) in detail. In EGOA, we first adopt chaos theory to improve the quality of the initial population, as described in Sect. 2.2. We then incorporate the Levy flight strategy, whose fundamental principles are defined in Sect. 2.3, into GOA to address the original algorithm’s drawback and achieve a more appropriate balance between the exploration and exploitation phases. As is well known for evolutionary algorithms, the diversity of the search agents is crucially important, since diversity enables the population to search effectively for the global optimum. The Levy flight component is therefore used to improve the population diversity of GOA. To this end, once the position of the ith search agent \(X_i\) is updated, the Levy flight component is applied to generate a new candidate solution. The modified mathematical equation for the enhanced GOA is defined as follows:

$$\begin{aligned} \mathrm {X}_{i}^{\mathrm{levy }}= \, & {} \mathrm {X}_{i}^{*}+{\text {rand}}(d) \oplus {\text {levy}}(\beta ) \end{aligned}$$
(17)
$$\begin{aligned} \mathrm {X}_{i}^{t+1}= & {} \left\{ \begin{array}{ll}{\mathrm {X}_{i}^{\mathrm{levy }}} &{} {\text { fitness }\left( \mathrm {X}_{i}^{\mathrm{levy }}\right) >\text { fitness }\left( \mathrm {X}_{i}^{*}\right) } \\ {\mathrm {X}_{i}^{*}} &{} {\text { otherwise }},\end{array}\right. \end{aligned}$$
(18)

where \({X}_{i}^{*}\) represents the agent position after the GOA update, and rand(d) is a random d-dimensional vector in the interval [0,1]. Since Levy flight is a randomized procedure in which the jump size follows the Levy probability distribution, the new candidate solution obtained via Levy flight has a significant chance of escaping a local optimum and reaching a superior solution. Search agents with better fitness are preserved in the population to guarantee its quality. Therefore, the Levy flight mechanism can cause competitive agents to move faster towards the global optimum. Since incorporating the chaotic theory and Levy flight strategies helps to enhance the capabilities of GOA, we name the proposed method enhanced GOA (EGOA).
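The greedy selection of Eqs. (17) and (18) can be sketched as follows. This is an illustrative sketch, not the authors' code: `levy_refine` and its arguments are hypothetical names, the fitness callable is assumed to return larger values for better solutions, and handling of candidates that leave the search bounds is omitted.

```python
import math

import numpy as np


def levy_step(shape, rng, beta=1.5):
    """Levy-distributed random numbers (Mantegna-style sampling)."""
    phi = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
           / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
           ) ** (1 / beta)
    mu = rng.standard_normal(shape)
    v = rng.standard_normal(shape)
    return phi * mu / np.abs(v) ** (1 / beta)


def levy_refine(x_star, fitness, rng, beta=1.5):
    """Apply Eq. (17) to the updated position and keep the better one (Eq. 18).

    x_star  : (d,) position produced by the GOA update
    fitness : callable returning a scalar, larger is better (assumption)
    """
    r = rng.uniform(0.0, 1.0, size=x_star.shape)      # rand(d) in Eq. (17)
    x_levy = x_star + r * levy_step(x_star.shape, rng, beta)
    # Eq. (18): accept the Levy candidate only if it is strictly fitter.
    return x_levy if fitness(x_levy) > fitness(x_star) else x_star
```

Because the candidate is accepted only when it improves the fitness, the Levy perturbation can never make an agent worse, which is why large exploratory jumps are safe to attempt at every iteration.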

2.5 LSTM

The LSTM neural network is a deep learning algorithm with time-varying inputs and targets. It performs very well in time-series data processing thanks to its ability to handle long-term dependencies. The cornerstone of the LSTM neural network is the memory cell, which preserves the temporal state. Information is added to or removed from the cell state through three gates: the input gate, the forget gate, and the output gate. Figure 2 shows a sample unit of an LSTM network. The key stages of this neural network are as follows:

  1.

    The input gate regulates the input activation. When the input gate is activated, new input information is admitted into the memory cell.

  2.

    The forget gate discards unimportant content. Thus, the past cell state is forgotten when the forget gate is enabled.

  3.

    The output gate regulates the output activation. Thus, the current cell output is propagated to the final state when the output gate is enabled.

The three gates are sigmoid units whose outputs lie in the interval [0, 1]. The standard logistic sigmoid function is specified as follows:

$$\begin{aligned} \sigma (x)=\frac{1}{1+e^{-x}}. \end{aligned}$$
(19)

The input gate \(i_t\) regulates the input information that passes into the memory cell, resulting in the following:

$$\begin{aligned} i_{t}=\sigma \left( w_{x i} x_{t}+w_{h i} h_{t-1}+b_{i}\right) . \end{aligned}$$
(20)

Forget gate \(f_t\) regulates forgetting cell information, in which

$$\begin{aligned} f_{t}= \sigma \left( w_{x f} x_{t}+w_{h f} h_{t-1}+b_{f}\right) . \end{aligned}$$
(21)

Output gate \(o_t\) regulates the output information that flows from the cell, deriving from the following equation:

$$\begin{aligned} o_{t}=\sigma \left( w_{x o} x_{t}+w_{h o} h_{t-1}+b_{o}\right) . \end{aligned}$$
(22)

For the time t, a tanh function quantifies the input characteristics by inputting \(x_t\) and the previous hidden state \(h_{t-1}\) as follows:

$$\begin{aligned} g_{t}=\tanh \left( w_{x c} x_{t}+w_{h c} h_{t-1}+b_{c}\right) . \end{aligned}$$
(23)

Here, the memory cell is updated through regulated input features and the partial forgetting of previous memory cell, which provides

$$\begin{aligned} {c_{t}=f_{t} * c_{t-1}+i_{t} * g_{t}}. \end{aligned}$$
(24)

The hidden output status \(h_t\) is eventually determined by the output gate \(o_t\) and the memory \(c_t\), where

$$\begin{aligned} {h_{t}=o_{t} * \tanh \left( c_{t}\right) }. \end{aligned}$$
(25)

Therefore, the LSTM output \(y_t\) is determined as follows:

$$\begin{aligned} {y_{t}=\sigma \left( w_{h y} h_{t}+b_{y}\right) }. \end{aligned}$$
(26)

In Eqs. (20)–(26), \(w_{xi}\), \(w_{xf}\), \(w_{xo}\), and \(w_{xc}\) are the input weight matrices, \(w_{hi}\), \(w_{hf}\), \(w_{ho}\), and \(w_{hc}\) are the recurrent weight matrices, and \(w_{hy}\) denotes the hidden-to-output weight matrix. The corresponding bias vectors are represented by \(b_i\), \(b_f\), \(b_o\), \(b_c\), and \(b_y\).

Fig. 2 Structure of the deep LSTM neural network block
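A single forward step of the LSTM cell in Eqs. (20)–(25) can be written directly in NumPy, as in the sketch below. It is meant only to show how the gates interact; the parameter dictionary layout, the helper `init_params`, and the initialization scale are assumptions made for illustration, and a practical model would rely on a deep learning framework instead.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid of Eq. (19)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hidden, rng, scale=0.1):
    """Random weights and zero biases for the four gate blocks (assumed layout)."""
    p = {}
    for gate in "ifoc":
        p[f"w_x{gate}"] = scale * rng.standard_normal((n_hidden, n_in))
        p[f"w_h{gate}"] = scale * rng.standard_normal((n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    i_t = sigmoid(p["w_xi"] @ x_t + p["w_hi"] @ h_prev + p["b_i"])  # Eq. (20)
    f_t = sigmoid(p["w_xf"] @ x_t + p["w_hf"] @ h_prev + p["b_f"])  # Eq. (21)
    o_t = sigmoid(p["w_xo"] @ x_t + p["w_ho"] @ h_prev + p["b_o"])  # Eq. (22)
    g_t = np.tanh(p["w_xc"] @ x_t + p["w_hc"] @ h_prev + p["b_c"])  # Eq. (23)
    c_t = f_t * c_prev + i_t * g_t                                  # Eq. (24)
    h_t = o_t * np.tanh(c_t)                                        # Eq. (25)
    return h_t, c_t
```

Processing a window of wind speed values amounts to calling `lstm_step` once per time step while carrying `h_t` and `c_t` forward.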

2.6 Proposed EGOA-LSTM Method

This section presents the proposed wind speed forecasting method, called EGOA-LSTM. The method uses the improved GOA algorithm to optimize the hyperparameters of the LSTM neural network and thereby improve the accuracy of the wind speed forecasting model. Before applying EGOA, two issues should be considered: the representation of solutions and the calculation of the fitness function. Four hyperparameters, namely batch size, learning rate, maximum epoch, and number of neural units, are optimized by the EGOA algorithm. Therefore, each solution in EGOA is represented as a four-dimensional vector in which each dimension corresponds to one of the four hyperparameters. The learning rate is a continuous hyperparameter, so EGOA can search for its optimal value directly. In contrast, batch size, maximum epoch, and number of neural units take discrete values. As EGOA explores the solution space in continuous mode, the values found for these hyperparameters must be converted to their corresponding discrete values. To this end, each real value is converted to an integer value using the following equation:

$$\begin{aligned} y_{ij} = \left\lfloor b_j \times \frac{x_{ij} - lb}{ub - lb}+0.5\right\rfloor , j=1, \ldots , n, \end{aligned}$$
(27)

where \(b_j\) is the number of discrete levels available for the jth dimension, \(x_{ij}\) is the real number corresponding to the jth dimension of the solution \(X_i\), \(y_{ij}\) is the converted integer value, and lb and ub are the lower and upper bounds of the search space, respectively.
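The conversion in Eq. (27) is a scale-and-round operation, sketched below. The function name `to_discrete` and the list of candidate batch sizes are hypothetical and serve only to show how the resulting integer could index a set of allowed values. Note that the result ranges from 0 to \(b_j\) inclusive.

```python
import math


def to_discrete(x, b, lb, ub):
    """Map a real value x in [lb, ub] to an integer in {0, ..., b} (Eq. 27)."""
    return math.floor(b * (x - lb) / (ub - lb) + 0.5)


if __name__ == "__main__":
    # Hypothetical candidate batch sizes; b = 4 gives the indices 0..4.
    batch_sizes = [16, 32, 64, 128, 256]
    index = to_discrete(0.6, b=len(batch_sizes) - 1, lb=0.0, ub=1.0)
    print(batch_sizes[index])
```

Adding 0.5 before taking the floor rounds to the nearest integer, so each level except the first and last covers an equally wide slice of the continuous range.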

In the proposed EGOA-LSTM method, the initial population with n solutions is first initialized with the chaotic tent map of Sect. 2.2 using Eqs. (11) and (12). Each solution is denoted by a four-dimensional vector \(X_{ij}, i=1,\ldots ,n\) and \(j=1,\ldots ,4\), where each dimension j corresponds to one of the four LSTM hyperparameters. After initialization, new solutions are obtained by repeatedly updating the solutions’ current positions using Eq. (9). Moreover, the Levy flight strategy is applied to the updated positions using Eqs. (17) and (18) to balance exploration and exploitation. The procedure repeats until the termination criterion is reached, and the best solution obtained is taken as the final result, which gives the optimal values of the LSTM hyperparameters. To evaluate the quality of each solution, we need to define a fitness function. To this end, the input time series data is divided into two sets: training and test. The training set is used to optimize the LSTM hyperparameters using EGOA, while the test set is used to evaluate the performance of the final wind speed forecasting model. Suppose that \(\vec {y}\) is a vector denoting the historical wind speed time series data for M time steps, expressed as follows:

$$\begin{aligned} \vec {y} =(y_{(0)}, y_{(1)}, \ldots y_{(M-1)}), \end{aligned}$$
(28)

where \(y_{(t)}\) denotes the actual wind speed value at time step t. The purpose of the proposed wind speed forecasting model is to predict the wind speed values of the next N time steps using the LSTM neural network. These predicted values can be represented as follows:

$$\begin{aligned} \vec {\hat{y}} = \left(\hat{y}_{(M)}, \hat{y}_{(M+1)}, \ldots , \hat{y}_{(M+N-1)}\right), \end{aligned}$$
(29)

where \(\hat{y}_{(t)}\) denotes the predicted wind speed value at time step t. Each solution in EGOA is used to configure an LSTM model based on the obtained hyperparameter values. Therefore, the performance of the configured LSTM model in forecasting wind speed values serves as the fitness function. To this end, the input vectors of the LSTM model are represented using Eq. (28) based on the training data. The LSTM model is then used to predict the wind speed values of the next N time steps, which are represented using Eq. (29). To calculate the fitness value of each solution in EGOA, the mean square error is used as follows:

$$\begin{aligned} MSE = \frac{1}{n}\sum _{i=1}^{n} (y_i - \hat{y}_i)^2, \end{aligned}$$
(30)

where \(y_i\) is the actual wind speed value and \(\hat{y}_i\) is the predicted wind speed value obtained by the LSTM neural network. A solution with a lower MSE value has a higher fitness value, and vice versa. Therefore, the proposed method aims to obtain the solution with the lowest MSE value (i.e., the highest fitness value), which contains the optimal values of the LSTM hyperparameters. The goal is an LSTM model that achieves the best possible performance when forecasting the wind speed values in the test set. After the optimal values of the LSTM hyperparameters are determined using EGOA, the configured LSTM model is used to predict the wind speed values in the test set. Algorithm 1 presents the overall steps of the proposed EGOA-LSTM method. Figure 3 illustrates the whole procedure of the proposed deep model, and Fig. 4 depicts the flowchart of the proposed wind speed forecasting model.
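The fitness evaluation described above can be sketched as follows. This is only an illustration under stated assumptions: `train_and_predict` is a hypothetical callable standing in for "configure and train an LSTM with these hyperparameters, then forecast", and `toy_model` is a placeholder that involves no neural network at all.

```python
def mse(actual, predicted):
    """Mean square error of Eq. (30)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fitness(solution, train_and_predict, actual):
    """Lower is better: MSE of the model configured by `solution`.

    `solution` is (batch_size, learning_rate, max_epoch, neural_units).
    `train_and_predict` is a hypothetical callable returning forecasts.
    """
    return mse(actual, train_and_predict(solution))


actual = [0.30, 0.35, 0.40]


def toy_model(solution):
    # Placeholder only: pretends the forecast bias equals the learning rate.
    _, learning_rate, _, _ = solution
    return [v + learning_rate for v in actual]


candidates = [(32, 0.1, 50, 20), (64, 0.01, 100, 40)]
best = min(candidates, key=lambda s: fitness(s, toy_model, actual))
print(best)  # (64, 0.01, 100, 40)
```

In an actual run, the optimizer would call `fitness` once per candidate solution per iteration, which is why the cost of training each configured LSTM dominates the search time.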

Fig. 3 The schema of deep EGOA-LSTM model for wind speed forecasting

figure a
Fig. 4 Flowchart of the proposed EGOA-LSTM model for wind speed forecasting

3 Experimental results

Fig. 5 Location of wind speed site for Las Vegas case study

Fig. 6 Location of wind speed site for Denver case study

3.1 Data

In contrast to several studies such as [53, 56, 82], which used a small amount of wind speed data (usually fewer than one thousand samples) to show the efficiency of their proposed deep learning algorithms, this study uses historical samples recorded at 30-min intervals at two wind stations in the US for the whole year 2012. The wind speed time series estimated for two wind sites in Las Vegas and Denver, taken from the Western Wind Dataset [104] created by the National Renewable Energy Laboratory (NREL) and 3TIER, are used to examine the efficiency of the proposed EGOA-LSTM algorithm. The locations of these two wind sites are shown in Figs. 5 and 6. In total, there are 17520 wind speed values measured at intervals of 30 min for each of the two wind stations. Thus, sufficient data are available to train and test our proposed deep learning method. Similar to [80], 70% of each dataset is used for training, 10% for validation, and the remaining 20% for testing. At the beginning of the experiments, the raw datasets were normalized into the interval [0,1] using Eq. (31) to improve the forecasting efficiency. The final goal is to predict the wind speed for the next 30 min (one-step ahead) and the next 1 h (two-step ahead).

$$\begin{aligned} z = \frac{z - z_\mathrm{min}}{z_\mathrm{max} - z_\mathrm{min}}. \end{aligned}$$
(31)
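The preprocessing described above can be sketched in a few lines of Python. Two points are assumptions rather than details stated in the text: the split is taken in chronological order, and the minimum and maximum in Eq. (31) are computed over the whole series. The synthetic `raw` series is a stand-in for the real data.

```python
def min_max_scale(values):
    """Normalize values into [0, 1] following Eq. (31)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(values, train_pct=70, val_pct=10):
    """Split a series into training, validation, and test parts, in order."""
    n = len(values)
    n_train = n * train_pct // 100  # integer arithmetic avoids rounding drift
    n_val = n * val_pct // 100
    return (values[:n_train],
            values[n_train:n_train + n_val],
            values[n_train + n_val:])


# Synthetic stand-in for one year of 30-min samples (17520 values).
raw = [3.0 + (k % 48) * 0.1 for k in range(17520)]
scaled = min_max_scale(raw)
train, val, test = chronological_split(scaled)
print(len(train), len(val), len(test))  # 12264 1752 3504
```

Splitting in order, rather than shuffling, keeps every test sample later in time than every training sample.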

3.2 Evaluation metrics

Four metrics are employed to assess the prediction performance of the proposed model on the wind speed values: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R squared (\(R^2\)). For RMSE, MAE, and MAPE, lower values indicate higher forecasting accuracy, whereas for \(R^2\) higher values indicate a better fit. The formulas of the performance evaluation metrics are as follows:

$$\begin{aligned} \mathrm{RMSE}= & {} \sqrt{\left( \frac{1}{n}\right) \sum _{i=1}^{n}(y_{i}' - y_{i})^{2}} \end{aligned}$$
(32)
$$\begin{aligned} \mathrm{MAE}= \, & {} \left( \frac{1}{n}\right) \sum _{i=1}^{n}| y_{i}' - y_{i}|\end{aligned}$$
(33)
$$\begin{aligned} \mathrm{MAPE}= \, & {} \left( \frac{1}{n}\right) \sum _{i=1}^{n} \left| \frac{y_{i}' - y_{i}}{y_{i}}\right| \end{aligned}$$
(34)
$$\begin{aligned} R^2= \, & {} 1 - \frac{\sum _{i=1}^{n}(y_{i}' - y_{i})^2}{\sum _{i=1}^{n} (y_{i} - \widetilde{y}) ^2 }, \end{aligned}$$
(35)

where \(y_i'\) represents the predicted wind speed value corresponding to the actual value \(y_i\), \(\widetilde{y}\) is the mean of the actual wind speed values, and n indicates the number of data points in the test set.
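For reference, the four metrics can be computed with the standard library alone, as in the sketch below. The function names are illustrative. MAPE is returned as a fraction, following Eq. (34) as written (multiply by 100 for a percentage), and \(R^2\) uses the conventional total sum of squares around the mean of the actual values.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    # Undefined when any actual value is exactly zero (division by zero).
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(actual, predicted))  # 0.5
print(mae(actual, predicted))   # 0.25
print(mape(actual, predicted))  # 0.0625
print(round(r_squared(actual, predicted), 6))  # 0.8
```

Because min-max normalization maps the smallest observation to exactly zero, MAPE computed on normalized data needs care around zero-valued targets.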

3.3 Input feature selection

Input feature selection [105,106,107] is a fundamental and crucial consideration in determining the optimal structure of data-driven models. In the literature, several studies such as [108] have used the auto-correlation function (ACF) to measure the correlation of a wind speed time series with itself at various time lags. As the ACF can only capture the linear dependency of a variable on its own past values, and wind speed data are highly nonlinear in nature, mutual information (MI) is an effective alternative that captures both the nonlinear and linear correlations in the data. Let X and Y be two random variables. The entropy of X, represented by H(X), is a measure of its uncertainty, and the joint entropy of X and Y is denoted by H(X, Y). The conditional entropy, calculated by \(H(Y|X) = H(X,Y) - H(X)\), indicates the uncertainty of Y that remains after the variable X is observed. MI measures the amount of information acquired about one variable when the other variable is observed. It is determined by \(I(X,Y) = H(Y) - H(Y|X)\), which is the reduction in the uncertainty of variable Y due to the observation of variable X, and vice versa.
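The entropy identities above translate directly into a simple estimator. The sketch below assumes an equal-width histogram discretization, which is one common choice; the text does not state which MI estimator was used, and the helper names and bin count are illustrative.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=10):
    """I(X,Y) = H(Y) - H(Y|X), with H(Y|X) = H(X,Y) - H(X)."""
    bx, by = discretize(x, bins), discretize(y, bins)
    h_x, h_y = entropy(bx), entropy(by)
    h_xy = entropy(list(zip(bx, by)))
    h_y_given_x = h_xy - h_x
    return h_y - h_y_given_x


def lagged_mi(series, lag, bins=10):
    """MI between observations of one series separated by `lag` steps."""
    return mutual_information(series[:-lag], series[lag:], bins)


print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1], bins=2))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))  # 0.0
```

Identical variables share all of their entropy (1 bit here), while the second pair is independent and shares none.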

Let v(t) be the wind speed value at time t. The MI between \(v(t - l + 1)\) and \(v(t + 1)\) is calculated with l as the time lag. To select the most relevant inputs for our deep EGOA-LSTM algorithm, the wind speed values at time lags with an MI greater than \(x= 0.4\) are included in the input sets of the two wind datasets for the 30-min and 1-h ahead forecasting horizons. Figure 7 illustrates the MI for lags \(l=1\) to \(l=200\) of the Las Vegas dataset for the 30-min ahead interval. As this figure indicates, the dependency among the wind speed observations weakens as the time lag grows. As a result, time lags from \(l=1\) to \(l=29\) are incorporated. Assume the current time is t and we want to predict the wind speed values for a future time horizon. Then, our input set is a 29 + 28 = 57 dimensional vector comprising the 29 lagged values \(v(t - 28), \ldots , v(t)\) and the 28 sequential differences \(\Delta v(t - 27), \ldots , \Delta v(t)\), where \(\Delta v(t) = v(t) - v(t - 1)\).
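A minimal sketch of the lag selection and of the 57-dimensional input construction is given below. The ordering of the vector (all lagged values first, then all differences) is an assumption made for illustration, as are the function names and the toy MI values.

```python
def select_lags(mi_by_lag, threshold=0.4):
    """Return the lags whose mutual information exceeds the threshold."""
    return sorted(lag for lag, mi in mi_by_lag.items() if mi > threshold)


def build_input(series, t, window=29):
    """Lagged values v(t-window+1), ..., v(t) followed by their first differences."""
    start = t - window + 1
    if start < 0 or t >= len(series):
        raise IndexError("not enough history for the requested window")
    values = list(series[start:t + 1])
    # Differences exist only from the second value onward: window - 1 of them.
    diffs = [b - a for a, b in zip(values, values[1:])]
    return values + diffs


print(select_lags({1: 0.9, 2: 0.5, 3: 0.3}))  # [1, 2]

series = [0.01 * k for k in range(100)]  # hypothetical normalized series
features = build_input(series, t=50)
print(len(features))  # 57
```

The count of 28 differences follows from the definition: \(\Delta v(t - 28)\) would require \(v(t - 29)\), which lies outside the 29-lag window.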

Fig. 7 Mutual information of various time-lags for Las Vegas dataset

3.4 Parameter settings

In this section, we describe the default configuration used to run our proposed EGOA-LSTM algorithm. For EGOA, we set the population size to 30, the maximum number of iterations to 20, and the number of runs to 20 for each dataset. The two main parameters of GOA are \(C_\mathrm{max}\) and \(C_\mathrm{min}\), whose values are set to 1 and 0.00004, respectively. These values are selected based on the recommendations in the literature [72]. There are four key hyperparameters for training the deep LSTM neural network, namely maximum epoch (\(M_\mathrm{e}\)), neural units in the hidden layer (\(N_\mathrm{u}\)), batch size (\(B_\mathrm{s}\)), and learning rate (\(L_\mathrm{r}\)), which are fed to EGOA. The ranges of these hyperparameters are shown in Table 1. Previous works [109,110,111] have used more limited ranges of hyperparameters; in this study, we have chosen wider ranges to train the LSTM. Moreover, the number of layers in the LSTM architecture is set to three. To further assess the predictive ability of our approach, the proposed deep neuroevolution model is compared with recently proposed deep learning models. The single and hybrid algorithms listed below are used as comparison models to highlight the efficiency of EGOA-LSTM. These models are backpropagation (BP) [41], convolutional neural network (CNN) [85], long short-term memory (LSTM) [80], Xgboost [86], empirical mode decomposition and genetic algorithm-BP neural network (EMD-GABP) [87], differential evolution–LSTM (DE-LSTM) [88] and ensemble empirical mode decomposition–GA–particle swarm Optimization Wavelet Neural Network (EGP-WNN) [89] algorithms. The configurations of these single and hybrid approaches follow the recommendations in their respective literature.
The proposed EGOA-LSTM model is implemented in the Python programming language [112] version 3.7, TensorFlow 1.15, CUDA 10.1, cuDNN 8.0.5 and executed on an NVIDIA GTX 1080 Ti GPU, RAM of 32 GB, and Intel Core i7 machine with 3.7 GHz 12 cores CPU.
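To give a feel for the role of \(C_\mathrm{max}\) and \(C_\mathrm{min}\), the sketch below evaluates the linearly decreasing coefficient of the original GOA over 20 iterations. It assumes the standard GOA schedule; the enhanced variant may define or perturb this coefficient differently, so it should be read as an illustration of the two bounds rather than as the exact schedule used in the experiments.

```python
def goa_coefficient(iteration, max_iterations, c_max=1.0, c_min=0.00004):
    """Linearly decreasing coefficient c of the original GOA (assumed schedule)."""
    if not 1 <= iteration <= max_iterations:
        raise ValueError("iteration must lie in [1, max_iterations]")
    return c_max - iteration * (c_max - c_min) / max_iterations


# One value per iteration for a run of 20 iterations.
schedule = [goa_coefficient(l, 20) for l in range(1, 21)]
print(len(schedule))  # 20
```

Under this schedule the coefficient shrinks towards \(C_\mathrm{min}\) at the last iteration, moving the search from exploration to exploitation.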

Table 1 The hyperparameters of the deep LSTM network during the evolution

3.5 Analysis of the results and discussion

In this section, we report the results of experiments for two case studies with two forecasting horizons. We then discuss these results in detail.

3.5.1 Las Vegas case study

In this case study, the wind speed data recorded every 30 min are used as the dataset. We consider this dataset for utmost short-term forecasting 30 min ahead (one-step ahead) and short-term forecasting 1 h ahead (two-step ahead).

Tables 2 and 3 report the forecasting performance of the different prediction algorithms for the 30-min ahead and 1-h ahead wind speed data, respectively. Moreover, Fig. 8 shows the actual and predicted values of the different forecasting algorithms for the 30-min ahead horizon of the Las Vegas dataset. The blue and red colors in this figure represent the actual and predicted values, respectively. The convergence curves for the two horizons of the Las Vegas dataset are shown in Fig. 9. Also, the violin plots of the four hyperparameters involved in optimizing LSTMs using our novel deep neuroevolution method are illustrated in Figs. 10 and 11.

From Table 2 and Fig. 8, it is noteworthy that the proposed EGOA-LSTM performs better than the compared forecasting techniques, with the minimum RMSE of 0.033647, MAE of 0.019135, and MAPE of 24.42821, and the maximum \(R^2\) of 0.956096 for the next 30-min wind speed prediction. Among the compared predictive models, the best algorithm is EGP-WNN, with an RMSE of 0.037143, MAE of 0.025895, MAPE of 51.28683, and \(R^2\) of 0.946497, whereas Xgboost is the worst, with an RMSE of 0.158511, MAE of 0.147968, MAPE of 418.574722, and \(R^2\) of 0.467719. Fig. 8 suggests that EGOA-LSTM fits the actual wind speed time series more closely than the other forecasting models.

Table 3 shows that EGOA-LSTM achieves better performance than the compared forecasting algorithms for 1-h ahead wind speed forecasting, with the minimum RMSE of 0.064619, MAE of 0.038741, and MAPE of 66.02486, and the maximum \(R^2\) of 0.838482. Among the compared models, EGP-WNN is the leading algorithm, with the lowest RMSE, MAE, and MAPE and the highest \(R^2\). The convergence profile of the proposed EGOA-LSTM algorithm for the Las Vegas dataset over the two forecasting horizons is shown in Fig. 9. As this figure shows, the prediction error for 1-h ahead forecasting is much higher than that for 30-min ahead forecasting. Moreover, our proposed method converges properly by the maximum iteration number for both forecasting horizons.

The violin plots of the four hyperparameters evolved by the EGOA-LSTM algorithm for 30-min and 1-h ahead wind speed forecasting are illustrated in Figs. 10 and 11. These two figures show that EGOA-LSTM assigns values to the deep LSTM hyperparameters that do not incur a high computational load (usually less than the maximum value of the interval). For instance, for both the 30-min and 1-h intervals of the Las Vegas dataset, most of the assigned batch size values are around or below the median (the line shown in the figure). The same interpretation applies to the other three hyperparameters and indicates the high capability of the proposed evolutionary search algorithm in setting the hyperparameters of the LSTM neural network.

To evaluate the performance of the proposed algorithm statistically, Figs. 12 and 13 present the boxplots of the RMSE values of the proposed EGOA-LSTM and the other benchmarks on the Las Vegas dataset for the two forecasting horizons. As these two figures show, the dominance of the proposed deep EGOA-LSTM is evident for both forecasting horizons.

Table 2 Error estimated results of the predictions of the 30-min ahead wind speed time series for Las Vegas dataset. The bold values represent the best performance evaluation metric
Table 3 Error estimated results of the predictions of the 1-h ahead wind speed time series for Las Vegas dataset. The bold values represent the best performance evaluation metric
Fig. 8 The wind speed forecasting results of 30-min ahead obtained by different algorithms on Las Vegas case study

Fig. 9 The convergence profile of EGOA-LSTM algorithm for two forecasting horizons of Las Vegas case study

Fig. 10 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Las Vegas case study of the 30-min ahead forecasting

Fig. 11 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Las Vegas case study of the 1-h ahead forecasting

Fig. 12 RMSE box plots of all models for Las Vegas 30-min ahead forecasting

Fig. 13 RMSE box plots of all models for Las Vegas 1-h ahead forecasting

3.5.2 Denver case study

This section investigates 30-min and 1-h ahead forecasting for the wind speed time series of the Denver dataset. Tables 4 and 5 display the performance results of the compared forecasting algorithms. The actual and predicted points of the test set for the different forecasting algorithms are visualized in Fig. 14.

Table 4 demonstrates that the proposed EGOA-LSTM model dominates the other compared forecasting methods, with the minimum RMSE of 0.042213, MAE of 0.028105, and MAPE of 40.930122, and the maximum \(R^2\) of 0.916746 for utmost short-term 30-min wind speed forecasting. Among the compared prediction algorithms, the best forecasting performance belongs to EGP-WNN, with an RMSE of 0.045463, MAE of 0.030756, MAPE of 43.49489, and \(R^2\) of 0.896977. Fig. 14 shows that the actual and predicted points of our novel deep neuroevolution method match closely. Table 5 shows the same dominance of our proposed method for 1-h ahead wind speed forecasting, with the maximum \(R^2\) of 0.746495 and the minimum RMSE of 0.071538, MAE of 0.049332, and MAPE of 73.45831, while the best predictive algorithm among the compared models is EGP-WNN, with an RMSE of 0.073322, MAE of 0.052651, MAPE of 85.09159, and \(R^2\) of 0.74649. As can be seen in Fig. 14, the wind speed predicted by the EGOA-LSTM model is more similar to the actual data points and produces fewer errors in the Denver case study.

Figure 15 shows the convergence curves of the EGOA-LSTM algorithm for the 30-min and 1-h ahead horizons of the Denver case study. As in the Las Vegas case study, EGOA-LSTM converges within the maximum number of iterations (20), and it produces lower error values for 30-min ahead prediction than for the 1-h ahead horizon. Besides, the values that EGOA selects for the four LSTM hyperparameters involve low computational loads, as shown in Figs. 16 and 17. For example, for the learning rate in both cases, the proposed algorithm mostly chooses values that are close to the beginning of the interval or to the median, indicating that the algorithm is effective in setting the LSTM hyperparameters. Moreover, the boxplots for the two forecasting horizons of the Denver case study are illustrated in Figs. 18 and 19. These two figures show that the novel EGOA-LSTM performs better than all single and hybrid benchmarks. Finally, Table 6 presents the best architectures obtained by the proposed algorithm for both datasets in the 1-step (30-min) and 2-step (1-h) ahead horizons. As an example, for the 30-min ahead prediction of the Denver case study, the algorithm selects maximum epoch = 50, number of units = 23, batch size = 50, and learning rate = 0.0001, which results in an RMSE of 0.033562.

Table 4 Error estimated results of the predictions of the 30-min ahead wind speed time series for Denver dataset. The bold values represent the best performance evaluation metric
Table 5 Error estimated results of the predictions of the 1-h ahead wind speed time series for Denver dataset. The bold values represent the best performance evaluation metric
Fig. 14 The wind speed forecasting results of 30-min ahead obtained by different algorithms on Denver case study

Fig. 15 The convergence profile of EGOA-LSTM algorithm for two forecasting horizons of Denver case study

Fig. 16 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Denver study of the 30-min ahead forecasting

Fig. 17 Violin diagram of the values obtained from the four hyperparameters used in the EGOA-LSTM algorithm for the Denver case study of the 1-h ahead forecasting

Fig. 18 RMSE box plots of all models for Denver 30-min ahead forecasting

Fig. 19 RMSE box plots of all models for Denver 1-h ahead forecasting

Table 6 The best architectures obtained by EGOA-LSTM based on RMSE error metric

In what follows, we briefly discuss and compare the proposed EGOA-LSTM model with the other forecasting algorithms. The experimental findings for utmost short-term and short-term wind speed forecasting in both case studies show that EGOA-LSTM is superior in all prediction indicators (RMSE, MAE, MAPE and \(R^2\)) to the comparative forecasting algorithms, including CNN, LSTM, BP, Xgboost, EMD-GABP, DE-LSTM, and EGP-WNN. Moreover, the predictions of the proposed EGOA-LSTM model match most of the actual points in both case studies.

The convergence curves for the two case studies show that short-term wind speed forecasting is more costly and challenging than utmost short-term wind speed forecasting: the prediction error rises when the time horizon is lengthened from 30 min to 1 h. Besides, to assess the effectiveness of the proposed EGOA-LSTM from a statistical point of view, we evaluated it using boxplots for the different horizons of the two datasets. The results indicate that the novel EGOA-LSTM is superior to the other benchmarks used in the experiments.

Based on the evaluation error results, EGP-WNN itself shows a robust performance among the compared benchmarks. Our proposed deep neuroevolution approach, which optimizes the four key hyperparameters of LSTM networks, improves the generalization, robustness, and competency of single LSTMs. In addition, the optimized hyperparameters visualized in the violin plots for the two forecasting horizons of the two datasets show that EGOA-LSTM did not choose complex, computationally heavy values during the optimization of the LSTMs. This behavior indicates the low computational cost of the EGOA-LSTM algorithm. According to the discussion in this section, we conclude that the EGOA-LSTM algorithm proposed in this study is efficient and promising, and it can be considered a reliable alternative strategy for wind speed time series forecasting.

4 Conclusions and future directions

Wind speed forecasting is an essential problem in the conversion, consumption, and operation of wind energy, and it has received much attention in recent years. This paper presented a novel deep neuroevolution approach for wind speed forecasting, in which the LSTM deep learning time series algorithm is optimized by an enhanced version of GOA (EGOA) that involves chaotic theory and Lévy flight operators. Incorporating these two powerful operators into the original GOA adjusts and balances its exploration and exploitation phases. In this study, EGOA was used to optimize the hyperparameters of LSTM neural networks so that they can learn and predict wind speed time series data. To confirm the feasibility of the proposed EGOA-LSTM, two case studies with data from two wind stations near Las Vegas and Denver in the USA were used to forecast wind speeds over the utmost short-term (30-min ahead) and short-term (1-h ahead) horizons. We used mutual information as the feature selection strategy to determine the optimal inputs of our proposed deep learning model. Compared to other prominent forecasting methods such as LSTM, CNN, BP, Xgboost, EMD-GABP, DE-LSTM, and EGP-WNN, our novel EGOA-LSTM algorithm obtained the best prediction performance, with the minimum values of RMSE, MAE, and MAPE and the maximum value of \(R^2\). Furthermore, the analysis of the evolved hyperparameters showed that the LSTM hyperparameters optimized by EGOA incur a low computational cost. The proposed EGOA-LSTM algorithm achieved adequate wind speed forecasting performance based on the nonlinear learning features of LSTMs and EGOA.

In this paper, we analyzed univariate time series prediction for wind speed forecasting. In future work, researchers can study multivariate time series prediction for more complicated wind speed forecasting problems, based on more advanced deep neuroevolution models that use more interdependent attributes such as power system statuses and weather conditions. Moreover, an attempt can be made to develop better deep learning algorithms to promote the forecasting of green energy resources. A further valuable direction might be to expand the size of the datasets, which would make the training process more robust against over-fitting. An analysis of wind speed forecasts for the next few hours and for multiple days ahead could be undertaken as another future work. The deep neuroevolution strategy proposed in this work may also be used to obtain probabilistic forecasts that quantify the corresponding uncertainties in the wind speed datasets. Also, the new GOA-based model can be applied to areas such as neural network-based robotic systems [113].