1 Introduction

As one of the most promising green energy sources, wind power is growing rapidly around the world [1]. Over the past 3 years, the cumulative installed capacity of wind power generation has doubled. It is estimated that in 2020 approximately 12% of the world's total electricity demand will be supplied by wind power [2]. The large-scale penetration of wind energy into power grids poses a number of challenges because of the intermittent and stochastic nature of wind speed fluctuations [3, 4]. These uncertainties can put system reliability and power quality at risk [5]. To deal with this problem and improve the utilization efficiency of wind energy, accurate wind speed prediction is indispensable [6].

In general, wind speed prediction falls into two main categories according to time scale: short-term prediction and long-term prediction [7]. Short-term prediction covers time scales of minutes, hours and days, whereas long-term prediction covers months and years. Improving the precision of short-term prediction is very important for guaranteeing the safety of the power grid and reducing the cost of wind power generation [8]. Improving the precision of long-term prediction is beneficial for planning windmills [9]. This study focuses on the prediction of short-term wind speed.

Extensive efforts have been devoted to enhancing the prediction of short-term wind speed in recent years. According to the existing literature, there are two main categories of short-term wind speed forecasting methods [10,11,12]. The first category comprises conventional statistical models, such as the autoregressive moving average (ARMA) model, the Kalman filter, stochastic models, and Markov chains [13,14,15,16,17]. These approaches use historical wind speed data to construct prediction models that describe the statistical regularities of wind speed fluctuations, and they have been successfully used to predict short-term wind speed. For example, Lalarukh and Yasmin used the ARMA model to predict the short-term wind speed in Quetta, Pakistan [13]. Louka et al. [14] used the Kalman filter technique to predict the short-term wind speed. Bivona et al. [16] presented a stochastic model to forecast the short-term wind speed. Shamshad et al. [17] presented Markov chain models to forecast the short-term wind speed. The empirical results of these studies indicate that conventional statistical models are suitable for forecasting short-term wind speed. However, their prediction performance deteriorates when the nonlinear features of wind speed fluctuations are pronounced. In other words, these statistical models may be insufficient to capture the hidden nonlinear features in wind speed.

To overcome the drawbacks of conventional statistical models, artificial intelligence (AI) techniques with powerful nonlinear self-learning capacities, including artificial neural networks (ANNs), support vector regression (SVR), and fuzzy logic methods, have become increasingly popular for wind speed forecasting [18,19,20,21,22,23]. For example, Cadenas and Rivera [18] used an ANN to forecast the wind speed in the region of La Venta, Oaxaca, Mexico, and the empirical results indicated that the ANN could improve the prediction of short-term wind speed. Li and Shi [19] applied three ANNs to hourly wind speed forecasting. Flores et al. [20] used a back-propagation neural network (BPNN) to forecast the wind speed and concluded that the BPNN could increase forecasting accuracy. Zhou et al. [21] applied an SVR with fine-tuned parameters to forecast the wind speed and concluded that the proposed model outperformed the persistence model for short-term prediction. Hu et al. [22] presented a control algorithm based on \(\upnu \)-support vector regression and the augmented Lagrange multiplier method for wind speed forecasting, and the results indicated that the algorithm could increase forecasting effectiveness. Kavousi-Fard et al. [23] proposed a fuzzy-based prediction interval for wind power prediction. Further discussion of AI methods can be found in [24,25,26,27,28,29,30,31].

Even though AI methods (e.g., ANN, SVR, and fuzzy logic) show a great deal of promise, they also suffer from a number of shortcomings, such as time-consuming training, slow convergence, local minima, and the risk of model over-fitting [32, 33]. To overcome these drawbacks, intelligent optimization algorithms, mainly the genetic algorithm (GA) and particle swarm optimization (PSO), have been successfully used to optimize the parameters of AI models and enhance their prediction performance [34,35,36]. For instance, Hu et al. proposed a hybrid short-term traffic flow forecasting method based on PSO and SVR, and the experimental results showed that the hybrid method produced more accurate forecasts than the individual models [34]. Cao and Parry [35] examined the relative effectiveness of a hybrid model based on ANN and GA in forecasting future earnings per share. Gu et al. [36] proposed a housing price forecasting method based on GA and SVR.

Because wind is a climate-driven renewable resource, seasonal variations and trend variations are the two most commonly encountered phenomena in wind speed fluctuations. Generally speaking, these two kinds of variation are intertwined, and the seasonal information of wind speed is neglected in most existing research, which causes large deviations in wind speed prediction. According to Zhang and Qi [37], seasonal information extraction (SIE) can greatly reduce the prediction error for many seasonal time series. SIE decomposes a seasonal time series into seasonal and trend components, which helps to extract the seasonal information and makes forecasting more efficient. Therefore, to increase the accuracy of wind speed forecasting, this study uses the SIE technique to extract the seasonal information from wind speed.

In addition, wind speed fluctuations commonly exhibit multiple patterns, because wind is a climate-driven energy source influenced by many meteorological parameters. Accurate forecasting is therefore difficult if the forecasting models are built directly on the original wind speed series. To improve predictive accuracy, the multi-pattern characteristics of wind speed fluctuations must be considered and analyzed, which makes a multi-pattern decomposition technique indispensable for constructing a suitable wind speed prediction model [38]. As a relatively novel multi-pattern decomposition technique, the wavelet decomposition algorithm (WDA) can decompose a complicated multi-pattern signal into an approximate part associated with low frequency and a detailed part associated with high frequency, which show the local and global dynamic properties of the signal at specific timescales [39, 40]. It has been widely applied to prediction problems [41]. Thus, this study uses WDA to construct a suitable forecasting model for wind speed.

Based on the above considerations, this study proposes a new method named SIE–WDA–GA–SVR for forecasting short-term wind speed, which combines SIE and WDA with a hybrid model that integrates the GA into the SVR. First, the proposed approach uses SIE to decompose the original wind speed into seasonal and trend components and to calculate the seasonal indices. Second, it uses WDA to decompose the trend component into approximate and detailed scales. Third, it uses GA–SVR to forecast the approximate and detailed scales separately. The prediction values of the trend component are then obtained by combining the prediction values of the approximate scale with those of the detailed scale. By integrating the seasonal indices into the prediction values of the trend component, we obtain the final forecasting results for the original wind speed. Moreover, the partial autocorrelation function (PACF) is used to determine the input dimension of the SVR, and the GA is used to select the parameters of the SVR. Four real wind speed datasets are used as test samples to verify the proposed approach. Experimental results indicate that the proposed SIE–WDA–GA–SVR model outperforms the benchmark models on four statistical error measures and can improve the prediction of short-term wind speed.

The main contributions of this paper can be summarized as follows:

  1. Considering the seasonal variation and multi-pattern characteristics of wind speed fluctuations, a comprehensive signal preprocessing technique is proposed to extract the useful information in wind speed fluctuations.

  2. Instead of determining the input-output relationship of the wind speed series by experience, this study employs the partial autocorrelation function (PACF) to find the lag length of the wind speed series and determine the input dimension of the SVR.

  3. A novel approach named SIE–WDA–GA–SVR is proposed to predict short-term wind speed. The proposed strategy provides higher prediction accuracy than the traditional methods.

The rest of this paper is organized as follows. Section 2 describes the formulation of the novel SIE–WDA–GA–SVR model. Section 3 presents the error criteria and the numerical results obtained from four real datasets. Finally, the conclusions and future research are summarized in Sect. 4.

2 Proposed Approach

This section presents a novel approach named SIE–WDA–GA–SVR for wind speed forecasting, which feeds the outputs of SIE and WDA into a hybrid model that combines GA and SVR. The proposed approach is briefly described as follows, and the flowchart is shown in Fig. 1.

  1. Step 1: Decompose the original wind speed series into seasonal and trend components, and calculate the seasonal indices by SIE.

  2. Step 2: Decompose the trend component by WDA into an approximate part associated with low frequency and a detailed part associated with high frequency.

  3. Step 3: Employ the PACF to find the input-output relation of the wind speed series, and determine the input dimension of the SVR model for both the approximate part and the detailed part.

  4. Step 4: Train the SVR models using the optimal parameters obtained from the GA.

  5. Step 5: Forecast both the low-frequency and high-frequency parts using the constructed SVR models.

  6. Step 6: Sum the prediction values of the approximate part and the detailed part to obtain the forecasting values of the trend component.

  7. Step 7: Aggregate the prediction values of the seasonal component with the prediction values of the trend component to obtain the final forecasting result of the wind speed.

Fig. 1 SIE–WDA–ESVR flowchart

2.1 Seasonal Information Extraction (SIE)

Because wind is a climate-driven renewable resource, seasonal variations and trend variations are the two most commonly encountered phenomena in wind speed. The SIE technique decomposes the original wind speed dataset into seasonal and trend components and calculates the seasonal indices. Generally, either an additive or a multiplicative model is used to combine the seasonal and trend components. According to Zhang and Qi [37], the multiplicative decomposition is widely used to extract the seasonal information in real datasets. Therefore, the multiplicative model is adopted in this study to extract the seasonal information of the original wind speed. The procedure can be described as follows [42]:

Assume that \(T=m \times l\), where m and l denote the number of cycles and the number of data items in each cycle, respectively. Let \(x_t \) denote the wind speed at time \(t\,(t=1, 2, \ldots , T)\), and let \(S_{\cdot j} \) and \(Tr_{ij} \) represent the seasonal and trend components, respectively. Then \(x_{ij}\) denotes the j-th datum of the i-th cycle \((i=1,2,\ldots ,m\, ; j=1,2,\ldots ,l)\), and

$$\begin{aligned} x_{ij} =Tr_{ij} \times S_{\cdot j} , \end{aligned}$$
(1)

Then, the seasonal index \(S_{\cdot j} \) can be obtained by

$$\begin{aligned} S_{\cdot j} =x_{ij} /Tr_{ij} . \end{aligned}$$
(2)

Because the trend component \(Tr_{ij} \) is unknown, it needs to be approximated by the average of \(x_{ij} \) in each cycle.

The average of the i-th cycle is computed as follows:

$$\begin{aligned} {\bar{x}}_{i} =(x_{i1} +x_{i2} +\cdots +x_{il} )/l \quad (i=1,2,\ldots ,m) . \end{aligned}$$
(3)

If \(S_{ij} \) denotes the normalized value of the item \(x_{ij} \), then

$$\begin{aligned} S_{ij} =\frac{x_{ij} }{\bar{x}_{i} }\quad (i=1,2,\ldots , m;j=1,2,\ldots ,l). \end{aligned}$$
(4)

Then, \(S_{\cdot j} \) can be defined as follows:

$$\begin{aligned} S_{\cdot j} =\frac{S_{1j} +S_{2j} +\cdots +S_{mj} }{m}\quad (j=1,2,\ldots ,l). \end{aligned}$$
(5)

This definition of \(S_{\cdot j} \) is consistent with the normalization, because the seasonal indices sum to l:

$$\begin{aligned} \sum _{j=1}^l {S_{\cdot j} } =\frac{1}{m}\sum _{i=1}^m {\sum _{j=1}^l {S_{ij} } } =\frac{1}{m}\sum _{i=1}^m {\left( \sum _{j=1}^l {x_{ij} / {\bar{x}}_i } \right) } =\frac{1}{m}\sum _{i=1}^m l = l . \end{aligned}$$
(6)

Then, the trend component can be obtained as follows:

$$\begin{aligned} Tr_{ij} =\frac{x_{ij} }{S_{\cdot j} }\quad (i=1,2,\ldots ,m;j=1,2,\ldots ,l). \end{aligned}$$
(7)

Considering the periodicity of the wind speed data, this paper takes \(l=24\) records as one cycle and sets \(m=[T/l]\).
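To make Eqs. (3)–(7) concrete, the following is a minimal Python sketch of the multiplicative decomposition. The function name, the toy series and the cycle length of 4 used in the example are illustrative assumptions rather than part of the original study, and the sketch assumes strictly positive wind speeds so that the divisions are well defined.

```python
def seasonal_information_extraction(x, l=24):
    """Multiplicative SIE sketch following Eqs. (3)-(7).

    x: list of positive wind speeds; only the first m*l values are used,
       where m = len(x) // l is the number of complete cycles.
    Returns (seasonal_indices, trend) with len(trend) == m*l.
    """
    m = len(x) // l
    if m == 0:
        raise ValueError("series is shorter than one cycle")
    cycles = [x[i * l:(i + 1) * l] for i in range(m)]
    # Eq. (3): average of each cycle
    means = [sum(c) / l for c in cycles]
    # Eq. (4): normalized items S_ij = x_ij / mean_i
    normalized = [[v / mu for v in c] for c, mu in zip(cycles, means)]
    # Eq. (5): seasonal index S_.j averaged over the m cycles
    seasonal = [sum(normalized[i][j] for i in range(m)) / m for j in range(l)]
    # Eq. (7): trend component Tr_ij = x_ij / S_.j
    trend = [cycles[i][j] / seasonal[j] for i in range(m) for j in range(l)]
    return seasonal, trend


# Toy example (illustrative values): two cycles of length 4 with levels 4.0 and 6.0.
pattern = [0.5, 1.0, 1.5, 1.0]
series = [level * p for level in (4.0, 6.0) for p in pattern]
seasonal, trend = seasonal_information_extraction(series, l=4)
```

In this toy case the recovered seasonal indices equal the generating pattern and sum to the cycle length, as Eq. (6) requires, while the trend is the constant level of each cycle.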

2.2 Wavelet Decomposition Algorithm (WDA)

As a relatively novel signal processing technique, the wavelet decomposition algorithm (WDA) can decompose a complex signal into low-frequency and high-frequency components. These components, which are associated with the approximate part and the detailed part, respectively, show the local and global dynamic properties of the signal at specific timescales [39, 40]. The WDA has therefore been widely used for signal decomposition and complex data processing [43,44,45,46,47,48]. In this section, the WDA is used for wind speed decomposition, and a brief introduction is given below.

In general, the WDA can be classified into two categories: continuous wavelet decomposition (CWD) and discrete wavelet decomposition (DWD). Let \(\psi (t)\) denote a mother wavelet function. The CWD is then defined as follows [49]:

$$\begin{aligned} CWT_x^\psi (b,a)=\phi _x^\psi (b,a)=\frac{1}{\sqrt{\left| a \right| }}\int {x(t)\cdot \psi ^{{\bullet }}}\left( \frac{t-b}{a}\right) dt \end{aligned}$$
(8)

where \(\psi ^{{\bullet }}(t)\) denotes the complex conjugate of \(\psi (t)\), and a and b denote the scale and translational parameters, respectively. When \(a=1/{2^{s}}\) and \(b=k/{2^{s}}\), a discrete version of Eq. (8) can be described as follows:

$$\begin{aligned} DWT_x^\psi (k,s)=\phi _x^\psi \left( \frac{k}{2^{s}},\frac{1}{2^{s}}\right) =\int _{-\infty }^{\infty } {x(t)\cdot \psi ^{{\bullet }}} \left( \frac{t-k/{2^{s}}}{1/{2^{s}}}\right) dt, \end{aligned}$$
(9)

where s and k satisfy the following constraints:

$$\begin{aligned} \left\{ {{\begin{array}{lll} {T=2^{s}+k} \\ {0\le s\le \log _2 T,\quad s\in Z} \\ {k\in Z} \\ \end{array} }} \right. \end{aligned}$$

In general, the mother wavelet function needs to be chosen before the WDA is applied. Because of its high computing efficiency, the Daubechies wavelet of order 3 (db3) is widely applied in data analysis. Considering these advantages, this study selects db3 as the mother wavelet function and generates a set of wavelet basis functions \(\{\psi (t)\}_{s,k}\) by scaling and translating it. In general, the number of these basis functions depends on the length of the signal. In this study, \(\log _2 T\) wavelet basis functions can be used to decompose the complex wind speed signal into low-frequency and high-frequency components, which show the local and global dynamic properties of the wind speed at specific timescales. The low frequency is associated with the approximate part and reveals the trend of the wind speed, whereas the high frequency is associated with the detailed part and tends to be related to the effect of exogenous variables. For more detailed information about WDA, please refer to [43,44,45,46,47,48,49].
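The split into an approximate part and a detailed part can be illustrated with a short Python sketch. Note the substitution: the study uses db3, whereas the sketch below swaps in the much simpler Haar wavelet at a single decomposition level, purely so that the example stays self-contained. The function name and the toy values are assumptions made for illustration.

```python
def haar_split(signal):
    """One-level Haar split into same-length approximate and detailed parts.

    This is a simplified stand-in for the db3 decomposition used in the paper.
    """
    if len(signal) % 2:
        raise ValueError("this sketch expects an even number of samples")
    approx, detail = [], []
    for k in range(0, len(signal), 2):
        pair_mean = (signal[k] + signal[k + 1]) / 2.0   # low-frequency content
        half_diff = (signal[k] - signal[k + 1]) / 2.0   # high-frequency content
        approx += [pair_mean, pair_mean]
        detail += [half_diff, -half_diff]
    return approx, detail


# Toy trend component (illustrative values only).
trend_component = [4.0, 6.0, 5.0, 9.0, 7.0, 3.0]
approx_part, detail_part = haar_split(trend_component)
rebuilt = [a + d for a, d in zip(approx_part, detail_part)]
```

The two parts add back to the original signal sample by sample, which is the property that allows the forecasts of the approximate and detailed parts to be summed later when the trend forecast is assembled.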

2.3 Support Vector Regression (SVR)

As a novel machine learning technique, SVR has been superior in minimizing the expected error of a learning machine and reducing the problem of over-fitting [50]. This algorithm has been widely applied in prediction issues [21, 22, 24, 34, 36]. The notion of an SVR model can be briefly described as follows.

Consider a data set \(\{X_i ,y_i \}_{i=1}^{sn} \), where \(X_i \in R^{n}\) is the input vector, \(y_i \in R\) is the actual output value, and sn is the number of samples. The basic idea of the SVR is to map the input vector space into a higher-dimensional feature space via a nonlinear mapping \(\varphi (X_i )\), and to find a linear function in that feature space that represents the nonlinear relationship between the input data and the output data. This linear function \(f(X_i )\), called the SVR function, can be written as

$$\begin{aligned} f(X_i )=\omega ^{T}\varphi (X_i )+b, \end{aligned}$$
(10)

where \(\omega \) and b are the coefficients.

The values of coefficients \(\omega \) and b can be estimated by minimizing the following penalty function \(\hbox {R}(\hbox {C}, \varepsilon )\)

$$\begin{aligned} \hbox {R}(\hbox {C},\varepsilon )=\frac{1}{2}\left\| \omega \right\| ^{2}+C\cdot \frac{1}{sn}\sum _{i=1}^{sn} {\left| {y_i -f(X_i )} \right| _\varepsilon }, \end{aligned}$$
(11)

where \(C\) is the penalty parameter, \(\varepsilon \) is the non-sensitivity coefficient, which denotes the radius of the tube located around the regression function \(f(X_i )\), and

$$\begin{aligned} \left| {y_i -f(X_i )} \right| _\varepsilon =\left\{ \begin{array}{ll} 0,&{}\quad \left| {y_i -f(X_i )} \right| \le \varepsilon \\ \left| {y_i -f(X_i )} \right| -\varepsilon ,&{}\quad \text {otherwise} \\ \end{array} \right. \end{aligned}$$
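As a small numerical illustration of Eq. (11) and the \(\varepsilon \)-insensitive loss above, the following Python sketch evaluates the penalty function for given coefficients and model outputs. The function names and the numbers in the example are hypothetical and serve only to show how errors inside the tube contribute nothing to the loss.

```python
def eps_insensitive_loss(y, f, eps):
    """|y - f|_eps: zero inside the tube of radius eps, linear outside it."""
    return max(0.0, abs(y - f) - eps)


def penalty_function(omega, C, eps, targets, outputs):
    """Eq. (11): 0.5 * ||omega||^2 + C * (mean eps-insensitive loss)."""
    sn = len(targets)
    regularizer = 0.5 * sum(w * w for w in omega)
    mean_loss = sum(
        eps_insensitive_loss(y, f, eps) for y, f in zip(targets, outputs)
    ) / sn
    return regularizer + C * mean_loss


# Illustrative values: the first error (0.05) lies inside the tube, the second (0.5) does not.
risk = penalty_function([3.0, 4.0], C=10.0, eps=0.1,
                        targets=[1.0, 2.0], outputs=[1.05, 2.5])
```

Here the regularizer equals 12.5 and the mean loss equals 0.2, so the penalty is 14.5.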

By introducing two slack variables \(\xi _i\) and \(\xi _i^*\), the optimization problem in Eq. (11) can be rewritten in the following constrained form

$$\begin{aligned}&\min (\omega ,b,\xi ,\xi ^{*})=\frac{1}{2}\left\| \omega \right\| ^{2}+C\sum _{i=1}^{sn} {(\xi _i +\xi _i^*)}\nonumber \\&\text {subject to }\left\{ \begin{array}{ll} y_i -\omega ^{T}\varphi (X_i )-b&{}\quad \le \varepsilon +\xi _i^*\\ -\,y_i +\omega ^{T}\varphi (X_i )+b&{}\quad \le \varepsilon +\xi _i \\ \xi _i ,\xi _i^*&{}\quad \ge 0 \end{array} \right. \end{aligned}$$
(12)

Let \(\alpha _i\) and \(\alpha _i^*\) denote the Lagrange multipliers. By using the Lagrange equation, the maximal dual function can be described as

$$\begin{aligned}&\max (\alpha _i ,\alpha _i^*)=\sum _{i=1}^{sn} {y_i (\alpha _i -\alpha _i^*)-} \varepsilon \sum _{i=1}^{sn} {(\alpha _i +\alpha _i^*)} -\frac{1}{2}\sum _{i,j=1}^{sn} {(\alpha _i -\alpha _i^*)(\alpha _j -\alpha _j^*)k(X_i ,X_j )}\nonumber \\&\quad \hbox { subject to }\sum _{i=1}^{sn} {(\alpha _i -\alpha _i^*)} =0 \hbox { and } \alpha _i ,\alpha _i^*\in [0,C]. \end{aligned}$$
(13)

By exploiting the optimality constraints, the SVR function can be obtained as follows,

$$\begin{aligned} f(X_j )=\sum _{i=1}^{sn} {(\alpha _i -\alpha _i^*)k(X_i ,X_j )+b}, \end{aligned}$$
(14)

where \(k(X_i ,X_j )\) is the kernel function. In general, the Gaussian radial basis function (RBF) is the most frequently adopted for SVR modeling. In this study, the RBF function is also chosen as the kernel function of SVR model, which is defined as follows:

$$\begin{aligned} k(X_i ,X_j )=\exp \left( -\left\| {X_j -X_i } \right\| ^{2}/2\sigma ^{2}\right) . \end{aligned}$$
(15)

where \(\sigma \) denotes the width of the RBF. Thus, three parameters need to be chosen when building an SVR model: the penalty parameter C, the RBF width \(\sigma \), and the non-sensitivity coefficient \(\varepsilon \). In this study, the GA is used to determine appropriate values for these parameters.
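For illustration, Eqs. (14) and (15) can be evaluated directly once the dual coefficients are known. The Python sketch below assumes that the differences \(\alpha _i -\alpha _i^*\), the stored training inputs and the bias b have already been obtained from the dual problem; the function names and the numbers in the example are made up and do not come from a trained model.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """Eq. (15): Gaussian RBF kernel with width sigma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x_new, train_inputs, dual_coefs, b, sigma):
    """Eq. (14): dual_coefs[i] stands for (alpha_i - alpha_i^*)."""
    return sum(
        coef * rbf_kernel(x_i, x_new, sigma)
        for x_i, coef in zip(train_inputs, dual_coefs)
    ) + b


# Hypothetical coefficients, for illustration only.
train_inputs = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], train_inputs, dual_coefs, b=0.5, sigma=1.0)
```

With these values the prediction equals \(2 - e^{-1} + 0.5\), since the kernel is 1 at zero distance and \(e^{-1}\) for a squared distance of 2 with \(\sigma =1\).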

2.4 The Forecasting Model of GA–SVR

In the modeling process of SVR, the performance of the model is influenced by the choice of three parameters: the penalty parameter C, the RBF width \(\sigma \), and the non-sensitivity coefficient \(\varepsilon \). Selecting appropriate parameters for the SVR model is a challenging task [21, 34]. The genetic algorithm (GA), which is based on the mechanisms of natural selection and Darwinian evolution, has been successfully applied to optimization problems [35, 36]. In this study, the GA is used to optimize the three parameters of the SVR, and the partial autocorrelation function (PACF) is used to determine the input dimension of the SVR. The structure of the proposed GA–SVR model for wind speed prediction is shown in Fig. 2, and the operational process of the model is described as follows.

Fig. 2 The structure of the proposed GA–SVR model

  1. Step 1: Divide the wind speed data into training samples and test samples. The training samples are used to train the forecasting model, and the test samples are used to evaluate its performance and effectiveness.

  2. Step 2: Determine the input dimension of the prediction model. Generally, the input dimension directly influences the performance of AI techniques. As one such technique, the SVR is also affected by the input dimension, and selecting an appropriate value is a challenging task. The partial autocorrelation function (PACF) is used to determine the input dimension of the SVR model.

  3. Step 3: Randomly generate an initial population of chromosomes. In the GA–SVR model, three parameters need to be chosen: the penalty parameter C, the RBF width \(\sigma \), and the non-sensitivity coefficient \(\varepsilon \). In this study, the initial population contains 20 chromosomes, and each chromosome consists of three segments corresponding to the three parameters of the SVR model.

  4. Step 4: Calculate the fitness function, which is used to identify the optimal chromosome. In this study, the fitness function is defined as \(2\big /{\sum _{i=1}^{ts} {(y_i -\hat{y}_i )^{2}} }\), where ts denotes the number of training samples, and \(y_i \) and \(\hat{y}_i \) represent the actual value and the validation value at time i, respectively.

  5. Step 5: Generate a new population of chromosomes by the selection, crossover and mutation operations. In this study, roulette wheel selection is used to choose the fittest chromosomes for reproduction. The crossover probability and the mutation rate are set to 0.8 and 0.01, respectively.

  6. Step 6: If the stopping criteria have not been met, return to Step 4.
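The loop in Steps 3 to 6 can be sketched in Python as follows. The population size (20), crossover probability (0.8), mutation rate (0.01), roulette wheel selection and the fitness form 2/SSE are taken from the steps above. Everything else is an assumption made to keep the example self-contained: the real-valued encoding, the single-point crossover, the parameter bounds, the fixed number of generations used as the stopping criterion, and above all `placeholder_sse`, a hypothetical stand-in for the squared error of an SVR trained with the candidate parameters.

```python
import random

# Assumed search ranges for (C, sigma, epsilon); not given in the paper.
BOUNDS = [(0.1, 100.0), (0.01, 10.0), (0.001, 1.0)]


def placeholder_sse(chrom):
    """Hypothetical stand-in for the SVR sum of squared errors."""
    target = (10.0, 1.0, 0.1)
    return sum((g - t) ** 2 for g, t in zip(chrom, target)) + 1e-12


def fitness(chrom):
    return 2.0 / placeholder_sse(chrom)  # fitness form from Step 4


def roulette(pop, fits, rng):
    """Roulette wheel selection: pick proportionally to fitness."""
    pick = rng.uniform(0.0, sum(fits))
    acc = 0.0
    for chrom, fit in zip(pop, fits):
        acc += fit
        if acc >= pick:
            return chrom
    return pop[-1]


def run_ga(generations=50, pop_size=20, p_cross=0.8, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # assumed stopping criterion
        fits = [fitness(c) for c in pop]
        children = []
        while len(children) < pop_size:
            p1, p2 = roulette(pop, fits, rng), roulette(pop, fits, rng)
            c1, c2 = p1[:], p2[:]
            if rng.random() < p_cross:  # single-point crossover (assumed)
                cut = rng.randint(1, 2)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for g, (lo, hi) in enumerate(BOUNDS):
                    if rng.random() < p_mut:  # mutation resamples one gene
                        child[g] = rng.uniform(lo, hi)
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # remember the best chromosome seen
    return best


best_params = run_ga()
```

In an actual GA–SVR run, `placeholder_sse` would be replaced by training an SVR with the chromosome's parameters and measuring its squared error, which is the expensive part of each generation.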

2.5 Partial Autocorrelation Function (PACF)

In time series analysis, the correlation between a series and its lags can be measured by the partial autocorrelation function (PACF). Accordingly, this study uses the PACF to determine the input dimension of the SVR model instead of relying on experience. A brief introduction to the PACF is given below [51, 52].

If \(x_t \,(t=1,2,\ldots ,T)\) is the wind speed at time t and \(\gamma _k\) denotes the covariance at lag k, then the estimate \(\hat{{\gamma }}_k \) of \(\gamma _k \) is obtained as follows:

$$\begin{aligned} \hat{{\gamma }}_k =\frac{1}{T}\sum _{t=1}^{T-k} {(x_t -\bar{{x}})} (x_{t+k} -\bar{{x}}),\quad k=0,1,\ldots ,L \end{aligned}$$
(16)

where \(\bar{{x}}\) is the average value of time series \(x_t\), T is the data size, and L is the maximum lag. The choice of L depends on the length of the data. In general, \(L=T/4\).

If \(\rho _k\) denotes the autocorrelation function (ACF) at lag k, then we can get the estimation value \(\hat{{\rho }}_k\) of \(\rho _k\) as follows:

$$\begin{aligned} \hat{{\rho }}_k =\frac{\hat{{\gamma }}_k }{\hat{{\gamma }}_0 } \end{aligned}$$
(17)

If \(\beta _{k,k} \) denotes the PACF at lag k, then the estimate \(\hat{{\beta }}_{k,k} \) of \(\beta _{k,k}\) can be computed recursively as follows:

$$\begin{aligned}&\hat{\beta }_{1,1} =\hat{{\rho }}_1 \nonumber \\&\hat{\beta }_{k+1,k+1} =\frac{\hat{{\rho }}_{k+1} -\sum _{h=1}^k {\hat{\rho }_{k+1-h} \hat{{\beta }}_{k,h}}}{1-\sum _{h=1}^k {\hat{\rho }_h \hat{{\beta }}_{k,h} } } \nonumber \\&\hat{\beta }_{k+1,h} =\hat{{\beta }}_{k,h} -\hat{{\beta }}_{k+1,k+1} \hat{{\beta }}_{k,k-h+1}\quad (h=1,2,\ldots ,k) \end{aligned}$$
(18)

where \(k=1,2,\ldots ,L\).

Confidence intervals are widely used to assess the significance of the correlation at each lag. In this study, the 95% confidence interval is employed to determine the optimal lags of wind speed for all models. Its bounds are defined as follows:

$$\begin{aligned} r^{+}_{0.95}&= +\,\frac{2}{\sqrt{T}} \nonumber \\ r^{-}_{0.95}&= -\,\frac{2}{\sqrt{T}} \end{aligned}$$
(19)

where T is the data size, and \(r^{+}_{0.95} \) and \(r^{-}_{0.95}\) denote the upper and lower critical values, respectively. If \(\hat{{\beta }}_{k,k} \) lies outside the interval \((r^{-}_{0.95} , r^{+}_{0.95} )\), the partial autocorrelation at lag k is significant and \(x_{t-k} \) is selected as one of the input variables. Otherwise, it is not.
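The lag selection can be sketched in Python as follows. The sketch estimates the autocovariances and autocorrelations as in Eqs. (16) and (17), computes the PACF with the standard Durbin-Levinson recursion, and treats a lag as significant when its estimated PACF falls outside the \(\pm 2/\sqrt{T}\) band, which is the usual reading of such a confidence band. The function names and the synthetic AR(1) test series are assumptions for illustration and are not the wind speed data of the study.

```python
import math
import random


def pacf(x, max_lag):
    """Return [beta_11, beta_22, ..., beta_LL] for L = max_lag."""
    T = len(x)
    mean = sum(x) / T
    # Eq. (16): autocovariance estimates
    gamma = [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(T - k)) / T
        for k in range(max_lag + 1)
    ]
    rho = [g / gamma[0] for g in gamma]  # Eq. (17)
    beta = {1: rho[1]}                   # beta[h] holds beta_{k,h} for the current k
    values = [rho[1]]
    for k in range(1, max_lag):
        num = rho[k + 1] - sum(rho[k + 1 - h] * beta[h] for h in range(1, k + 1))
        den = 1.0 - sum(rho[h] * beta[h] for h in range(1, k + 1))
        new = num / den                  # beta_{k+1,k+1}
        nxt = {h: beta[h] - new * beta[k - h + 1] for h in range(1, k + 1)}
        nxt[k + 1] = new
        beta = nxt
        values.append(new)
    return values


def significant_lags(x, max_lag):
    """Lags whose PACF lies outside the +/- 2/sqrt(T) band of Eq. (19)."""
    bound = 2.0 / math.sqrt(len(x))
    return [k for k, b in enumerate(pacf(x, max_lag), start=1) if abs(b) > bound]


# Synthetic AR(1) series (illustrative only): x_t = 0.8 * x_{t-1} + noise.
rng = random.Random(1)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
selected = significant_lags(series, max_lag=10)
```

For such a series the lag-1 partial autocorrelation is large and is selected, whereas higher lags are expected to fall mostly inside the band.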

3 Experimental Design and Comparison Results

3.1 Evaluation Criteria

To evaluate the performance of all the prediction models involved, four error measures are adopted: the mean absolute error (MAE), the root mean-square error (RMSE), the mean absolute percentage error (MAPE) and the standard deviation (SD) of the errors. These error measures are defined as follows:

$$\begin{aligned} MAE&= \frac{1}{fs}\sum _{i=1}^{fs} {\left| {e_i } \right| } , \end{aligned}$$
(20)
$$\begin{aligned} RMSE&= \sqrt{\frac{1}{fs}\sum _{i=1}^{fs} {e_i ^{2}} },\end{aligned}$$
(21)
$$\begin{aligned} MAPE&= \frac{1}{fs}\sum _{i=1}^{fs} {\left| {\frac{e_i }{y_i }} \right| } , \end{aligned}$$
(22)
$$\begin{aligned} SD&= \sqrt{\frac{1}{fs}\sum _{i=1}^{fs} {(e_i -\mu )^{2}} } \end{aligned}$$
(23)

where \(e_i =y_i -\hat{{y}}_i \), \(\mu =\frac{1}{fs}\sum _{i=1}^{fs} {e_i } \), and fs denotes the number of forecasting samples. \(y_i \) and \(\hat{{y}}_i \) represent the actual value and the forecast value of wind speed at time i, respectively.
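The four measures in Eqs. (20)–(23) translate directly into code. The following Python sketch uses a hypothetical function name and made-up values, and it reports the MAPE as a fraction, exactly as Eq. (22) is written.

```python
import math


def error_measures(actual, forecast):
    """MAE, RMSE, MAPE and SD of the forecast errors, Eqs. (20)-(23)."""
    errors = [y - f for y, f in zip(actual, forecast)]
    fs = len(errors)
    mae = sum(abs(e) for e in errors) / fs
    rmse = math.sqrt(sum(e * e for e in errors) / fs)
    mape = sum(abs(e / y) for e, y in zip(errors, actual)) / fs  # requires y != 0
    mu = sum(errors) / fs
    sd = math.sqrt(sum((e - mu) ** 2 for e in errors) / fs)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SD": sd}


# Illustrative values only: the errors are 1, 0 and -2.
scores = error_measures([4.0, 5.0, 8.0], [3.0, 5.0, 10.0])
```

For these values the MAE is 1, the RMSE is \(\sqrt{5/3}\), the MAPE is 1/6 and the SD is \(\sqrt{14/9}\).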

3.2 Datasets

Mean hourly wind speed datasets from a wind farm in the province of Gansu, China, are collected to evaluate the proposed model. To further verify the generalization ability of the proposed model, four wind speed Cases, from May 2010, August 2010, October 2010, and January 2011, are randomly selected to represent the four seasons of a year. Each dataset has 744 wind speed records. In the modeling process, the first 80% of each dataset (about 600 wind speed records) is used as the training dataset to train the proposed model, and the remaining 20% (about 144 wind speed records) is used as the validation dataset to evaluate the performance and effectiveness of the proposed model. Figure 3 shows the four real mean hourly wind speed datasets. Table 1 shows the statistical measures of the four Cases.

Fig. 3 Four wind speed Cases: (a) Spring, (b) Summer, (c) Fall, and (d) Winter

Table 1 Statistical measures of four cases

Fig. 4 SIE results in Spring and Summer

Fig. 5 SIE results in Fall and Winter

3.3 SIE of Wind Speed Datasets

Because wind is a climate-driven renewable resource, seasonal variations and trend variations are the two most commonly encountered phenomena in wind speed. In this study, the SIE is used to decompose each seasonal time series into seasonal and trend components and to extract the seasonal information of wind speed fluctuations. Figures 4 and 5 show the SIE results for the four wind speed datasets. From Figs. 4 and 5, we can see that the seasonal and trend variations of each Case are obtained by SIE.

3.4 WDA of Trend Components

Due to the intrinsic complexity and multi-patterns of wind speed fluctuations, this study adopts the WDA to decompose the complex trend components of original wind speed signals into both the low and high frequencies. The low and high frequencies which are associated with the approximate part and detailed part, respectively, show the local and global dynamic properties of the wind speed at specific timescales. Figures 6 and 7 show the decomposition process of four Cases by WDA. From Figs. 6 and 7, we observe that the complex trend component of each wind speed signal has been decomposed into both low frequency and high frequency, which are used to establish the corresponding SVR model.

Fig. 6 WDA results in Spring and Summer

Fig. 7 WDA results in Fall and Winter

Fig. 8 Plots of PACF against the lag length in four seasons

3.5 The Modeling Process of GA–SVR

3.5.1 Input Structure Determination

The input dimension of the SVR has an important influence on forecasting performance. Ignoring the relationships within the wind speed series leads to poor prediction performance and slow convergence. To enhance the prediction ability of the SVR model, we adopt the PACF method to extract the relationships within the wind speed series and determine the input dimension of the SVR model. The plots of PACF against the lag length for the four Cases are shown in Fig. 8. The input dimension of each SVR is obtained from these plots and is listed in Table 2. The sample pairs can then be determined from the prediction horizon and the input dimension of the model. This study focuses on one-step-ahead wind speed prediction, so the prediction horizon is set to one. As an example, the sample pairs of the low frequency in Spring are shown in Fig. 9. As shown in Fig. 9, 738 sample pairs are obtained for the low frequency in Spring. The sample pairs for the other wind speed data are obtained in a similar way.
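The construction of sample pairs from a series, an input dimension and a prediction horizon can be sketched in Python as follows. The function name is hypothetical, and the input dimension of 6 used in the example is an assumption chosen only because it reproduces the 738 pairs from 744 records mentioned above; the values actually used are those listed in Table 2.

```python
def make_sample_pairs(series, n_inputs, horizon=1):
    """Build (input vector, target) pairs from lagged values.

    Each input holds n_inputs consecutive values, and the target is the
    value `horizon` steps after the last input value.
    """
    pairs = []
    for t in range(n_inputs, len(series) - horizon + 1):
        inputs = series[t - n_inputs:t]
        target = series[t + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Placeholder series of 744 records (indices stand in for wind speeds).
records = list(range(744))
pairs = make_sample_pairs(records, n_inputs=6, horizon=1)
```

With 744 records, 6 inputs and a horizon of one, the first pair uses records 1 to 6 to predict record 7, and the last pair predicts record 744.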

Table 2 The number of the input dimension for the SVR model by the PACF

Fig. 9 The sample pairs of the low frequency in Spring

3.5.2 Model Parameters Determination

Three parameters of the SVR models need to be chosen: the penalty parameter C, the RBF width \(\sigma \), and the non-sensitivity coefficient \(\varepsilon \). In this study, the GA is used to determine appropriate values for these parameters. In the modeling process, the sample pairs are first determined according to the prediction horizon and the input dimension of the model. These sample pairs are then randomly partitioned into a training set (80%) and a validation set (20%). The training set is used to train the GA–SVR, and the validation set is used to test the prediction performance of the model. Note that the training and validation sets are randomly re-partitioned in each simulation, so they differ from one simulation to another. Each simulation establishes a prediction model and yields a set of model parameters. In this study, the mean of these parameters over 30 repeated simulations is used to establish the final model, in order to obtain stable wind speed prediction results. Table 3 shows the final parameters of the GA–SVR models for each wind speed series.

Table 3 The final parameters of GA–SVR models for each wind speed series

Fig. 10 The final forecasting results of the proposed SIE–WDA–ESVR model

3.6 Final Prediction Results of Wind Speed

Using the SVR models built in the previous section, both the low-frequency and high-frequency parts are predicted. The prediction values of the trend component are then obtained by summing the prediction values of the low-frequency and high-frequency parts. By aggregating the prediction values of the seasonal component with those of the trend component, we obtain the final forecasting results of the wind speed. The final forecasting results of the proposed SIE–WDA–GA–SVR model are shown in Fig. 10. From Fig. 10, we can see that the prediction values of the proposed approach approximately describe the characteristics of the four wind speed datasets.
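The recombination described in this subsection can be sketched in Python as follows. The sum of the two frequency parts follows the text directly. How the seasonal component is aggregated is not spelled out, so the sketch assumes that, in line with the multiplicative model of Eq. (1), each trend forecast is multiplied by the seasonal index of its position in the cycle. The function name, the `start_index` argument and the numbers are hypothetical.

```python
def recombine(approx_pred, detail_pred, seasonal_indices, start_index):
    """Combine part-wise forecasts into final wind speed forecasts.

    start_index is the 0-based position in the original series of the first
    forecast, used to pick the matching seasonal index.
    """
    l = len(seasonal_indices)
    final = []
    for n, (a, d) in enumerate(zip(approx_pred, detail_pred)):
        trend_hat = a + d                                  # sum of both parts
        season = seasonal_indices[(start_index + n) % l]   # index within the cycle
        final.append(trend_hat * season)                   # assumed multiplicative step
    return final


# Illustrative values only, with a cycle of length 4.
seasonal_indices = [0.5, 1.0, 1.5, 1.0]
final_forecast = recombine([4.0, 4.0], [0.5, -0.5], seasonal_indices, start_index=3)
```

In this example the trend forecasts are 4.5 and 3.5, the matching seasonal indices are 1.0 and 0.5, and the final forecasts are 4.5 and 1.75.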

3.7 Model Comparisons

In wind speed forecasting, the back-propagation neural network (BPNN) is a benchmark predictor that is often selected as a reference for assessing other forecasting methods. Generally, a novel model is first compared with the BPNN to assess its forecasting ability. In the modeling process of the BPNN model, or of the BPNN part of a hybrid model, the number of hidden nodes is determined by the Kolmogorov theorem, and the logsig and purelin functions are selected as the activation functions of the hidden layer and the output layer, respectively. The learning rate is set to its default value of 0.01.

Table 4 The comparison results of these different models

In this study, four models, namely BPNN, SVR, SIE–BP (a hybrid of SIE and BPNN) and SIE–SVR, are selected as benchmarks to evaluate the forecasting ability of the proposed SIE–WDA–GA–SVR model. The input dimension of all models is determined by PACF. The comparison results of these models are shown in Table 4. From Table 4, we can clearly observe that the proposed model attains the lowest values of all four statistical errors among the compared models. These results can be explained as follows. First, the proposed model has better prediction performance than the BPNN and SVR, which indicates that the proposed approach captures the seasonal information and the different patterns of wind speed fluctuations. In addition, the proposed model considers both the linear and nonlinear structures of wind speed fluctuations and has higher prediction precision than SIE–BP and SIE–SVR. Thus, it is concluded that the SIE–WDA–GA–SVR model can enhance the forecasting of wind speed and is an effective approach.

4 Conclusions

To guarantee the security of wind energy utilization and lower the cost of wind power generation, it is essential to enhance the prediction of wind speed fluctuations. In consideration of the intrinsic complexity and multi-pattern features of wind speed, a novel SIE–WDA–GA–SVR model is proposed for forecasting short-term wind speed, which feeds the outputs of SIE and WDA into a hybrid model that combines GA and SVR. The performance of the SIE–WDA–GA–SVR model is evaluated using four real wind speed prediction cases and compared with a number of benchmark models. Experimental results indicate that the proposed approach outperforms the benchmark models on four statistical error measures and is effective in improving the forecasting accuracy of wind speed.