Introduction

In recent years, with the rapid development of national economy and society, the demand for electric energy is gradually expanding, and the traditional fossil energy has caused serious pollution, such as global warming, heat pollution, air pollution, and ecological damage. Countries all over the world have reached a consensus on green development and put forward green development plans, hoping to accelerate the transformation of energy structure and establish a cleaner and more flexible clean energy system. China, the USA, Japan, and other countries have put forward the goal of carbon neutrality in the middle of this century. The transition to a low carbon society is dependent on renewable energy-based electrification (Nazir et al. 2020a). Wind energy is an important substitute for traditional fossil energy (Nazir et al. 2020b). According to the latest data, in 2019, the global installed 176 GW of new renewable energy (just under 2018 new 179 GW), accounting for 72% of the world’s new generating capacity, renewable energy share in the global electricity generation rose to 34.7% from 33.3% at the end of 2018 (International Renewable Energy Agency’s 2020).

Wind energy is inexhaustible, with low development cost and clean (Chaurasiya et al. 2019). Wind power in China has been given full attention by the government and developed rapidly. According to the latest data of the national energy administration, in the first three quarters of 2020, the new installed capacity of wind power in China will be 13.92 million kilowatts, including 12.34 million kilowatts on land and 1.58 million kilowatts on the sea. By the end of the third quarter of 2020, the total installed capacity of wind power in China will be 223 million kilowatts, including 216 million kilowatts of onshore wind power and 7.5 million kilowatts of offshore wind power. At present, China has become the largest wind power market in the world (National Energy Administration 2020).

With the rapid development of the wind power industry, the height of wind turbines has gradually increased. Although the height limitation of mast technology has been solved, its installation cost and maintenance have become expensive and cumbersome. Thus, SODAR and LIDAR technologies are used to replace wind turbines in complex regions to measure wind speed, which improves the credibility of wind speed data and reduces power generation costs. At the same time, the nonlinear relationship between wind power and wind speed determines the accuracy of wind speed modeling and has a direct impact on wind power generation (Prem Kumar Chaurasiya et al. 2017; Chaurasiya et al. 2018a). However, wind speed has high random fluctuation and intermittency, which leads to high random fluctuation of wind power. Large-scale wind power grid connection will bring challenges to power system safety and power quality. And accurate wind speed prediction can help control wind power output, power plant wind power safety scheduling, and other issues; therefore, wind speed prediction has very important theoretical significance and practical value.

Currently, wind speed prediction methods can be generally divided into physical model method (Higashiyama et al. 2018; Harty et al. 2019), statistical model method (Erdem and Shi 2011; Nourani Esfetang and Kazemzadeh 2018; Chaurasiya et al. 2018b), and artificial intelligence (Sharifian et al. 2018; Chitsaz et al. 2015). The physical model method makes wind speed prediction by calculating the terrain, air pressure, climate, and other information around the wind farm, and the data obtained is generally used as input for other statistical models. The statistical model method uses the historical wind speed data to establish the statistical model and obtain the wind speed time series; artificial intelligence method uses ANN, SVM, and other methods to predict wind speed according to the characteristics of historical wind speed data. Nowadays, BP (Zhao et al. 2014; Wang et al. 2019; Li et al. 2019a; Glowacz and Glowacz 2018), LSTM (Gu et al. 2020; Shahid et al. 2020), RBF (Tang et al. 2020; Cao et al. 2019; Zhang et al. 2015), Elman (Yu et al. 2017; Li et al. 2019b), and so on are widely used and have excellent effect. Zendehboudi et al. (2018) used SVM as the basic model. Wang et al. (2016) used BP neural network as the basic model. Wu et al. (2015) used RBF neural network as the basic model for wind speed prediction. The prediction results were generally better than statistical model method (Yuan et al. 2015; Zafirakis et al. 2019). To improve the algorithm of single neural network is a widely used method to improve the accuracy of prediction (Zhang et al. 2019a; Tian et al 2019). Therefore, in this paper, IPSO is used to improve the LSTM neural network prediction model. All kinds of wind speed prediction methods have their own advantages and disadvantages. Compared with the single prediction model, the combined forecasting model based on artificial intelligence algorithm will be more used in the research of wind speed prediction. Peng et al. (2020) proposes a negative correlation learning-based regularized extreme learning machine ensemble model (NCL-RELM) integrated with optimal variational mode decomposition (OVMD) and sample entropy (SampEn) for multi-step ahead wind speed forecasting. Zhang et al. (2020a) combined artificial intelligence methods with statistical knowledge and proposed an optimized radial basis function model and interval prediction model of the Fourier distribution of wind speed based on FCBF. Zhang and Pan (2020) proposed an RBF-ARMA model based on wind speed characteristics by combining statistical model method and artificial intelligence method. An autoregressive integral moving average model based on repeated wavelet transform is used to improve the accuracy of ARIMA model for short-term wind speed prediction (Bri-Mathias Hodge et al. 2011). A combined forecasting model is established, which is composed of three main modeling steps: quadratic decomposition, integration method, and error correction (Liu et al. 2018a). A new wind speed multi-step prediction framework based on WPD, CEEMDAN, and ANN is proposed (Liu et al. 2018b). It is proved that the performance of the combined model is better than that of the corresponding neural network model.

Wind speed is random and non-stationary. From the perspective of wind speed characteristics, the commonly used signal processing methods include principal component analysis (Skittides and Früh 2014), wavelet transform (Dang et al. 2013; Zhang et al. 2020b), EMD (Wang et al. 2016; Dragomiretskiy and Zosso 2014), and its improved EEMD, WPD, and VMD (Zhang et al. 2019b). In the process of obtaining the decomposed components, VMD determines the frequency center and bandwidth of each component by iteratively searching for the optimal solution of the variational model, so as to realize the frequency domain segmentation of the signal and the effective separation of each component adaptively. Because VMD overcomes the endpoint effect and modal aliasing of EMD and has good decomposition effect, it has been applied to wind speed sequence decomposition. Zhang et al. (2020c) used VMD to decompose the original wind speed and compared with EMD method to verify that VMD has a better decomposition effect. In view of the inconvenient choice of mode number K in VMD, Chen (2020) selects the appropriate decomposition times K by judging the value of sample entropy.  From the perspective of energy, Chen (2020) proposed an energy difference component selection method to calculate the number of comparative mode decomposition. Combined with the singular value decomposition (SVD) difference spectrum processing error, a hybrid HMD data processing method is constructed. This method improves the quality of decomposition and overcomes the defect of random mode number selection. Experiments show that HMD method is better than VMD method.

Therefore, based on the improved VMD method and IPSO-LSTM neural network, a new wind speed prediction scheme is designed. The specific process is as follows:

Firstly, data preprocessing is used to extract data features. The SVD differential spectrum is used to denoise, MIV and PCA are used for multi factor analysis, and the energy difference mode number selection method is used to optimize the VMD mode number, and VMD decomposition is carried out. After the decomposition, WT is used to decompose the modal components again.

Then the decomposed time series are classified, and the nonlinear time series are trained and predicted by LSTM neural network, and the linear time series are trained and predicted by ARMA model. In this paper, fitting interpolation is used to predict the noise sequence.

Finally, each predicted value is reconstructed to obtain the final prediction result.

The rest of this paper is arranged as follows: Section 2 introduces the improved VMD method used in this paper; Section 3 introduces IPSO-LSTM model and ARMA model; Section 4 introduces the forecast flow of the wind speed forecast scheme described in this paper; Section 5 uses this prediction scheme to predict the real wind farm data, compares the results of China’s Changma wind farm and Spain’s Sotavento wind farm, and makes comparisons with other schemes; Section 6 draws the conclusions.

The improved VMD method

Singular value difference spectrum

Let the time series V(t) = [v(1), ⋯, v(t)] with noise constitute Hankel matrix of m × n order, and make the matrix as square as possible, so as to achieve better noise reduction effect.

$$ H=\left(\begin{array}{cc}\begin{array}{cc}v(1)& v(2)\\ {}v(2)& v(3)\end{array}& \begin{array}{cc}\cdots & v(n)\\ {}\cdots & v\left(n+1\right)\end{array}\\ {}\begin{array}{cc}\vdots & \vdots \\ {}v(m)& v\left(m+1\right)\end{array}& \begin{array}{cc}\ddots & \vdots \\ {}\cdots & x(t)\end{array}\end{array}\right) $$
(1)

By singular value decomposition of V, we can get the following result:

$$ H={U}_{m\times n}{\Sigma}_{n\times n}{V}_{n\times n}^T $$
(2)

where Um × n, Vn × n is an orthogonal matrix, Σ is a nonnegative diagonal matrix, and the non-zero value on the diagonal of Σ is the singular value of H.

There are k singular values, which are sorted as follows

$$ \sigma =\left({\sigma}_1,\cdots, {\sigma}_k\right) $$
(3)

where σi is the i-th singular value after sorting (i is from 1 to k).

The singular value mutation of noisy sequences can be well expressed by using differential spectrum:

$$ {d}_i={\sigma}_i-{\sigma}_{i-1},i=1,2,\cdots, r-1 $$
(4)

The order P of reconstruction can be determined by the point with the largest singular value mutation, and the denoising sequence and noise sequence can be obtained finally.

$$ V(t)={x}_1(t)+{r}_1(t) $$
(5)

where x1(t) is the denoising sequence and r1(t) is the noise sequence.

Variational mode decomposition

The VMD method was proposed by Dragomiretskiy and others. It was originally used to analyze the early failure of rolling bearings. Because this method overcomes a series of shortcomings of EMD mode aliasing and endpoint effect and has good decomposition effect and good noise robustness, it is introduced into wind speed prediction to decompose wind speed time series. According to the center frequency and bandwidth of each decomposition component, the frequency domain decomposition of the signal and the effective separation of each component are realized (Glowacz 2018; Glowacz 2019).

The VMD first constructs the variational problem. Let the original time series be decomposed into k components to ensure that the decomposed sequence is a modal component with a finite bandwidth and a central frequency. At the same time, the sum of estimated bandwidth of each mode is the minimum. The constraint condition is that the sum of all modes is equal to the original signal:

$$ \left\{\begin{array}{c}\underset{\left\{{u}_k\right\}\left\{{w}_k\right\}}{\mathit{\min}}\left\{{\sum}_k{\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt}\right\Vert}_2^2\right\}\\ {}s.t.{\sum}_k{u}_k(t)=x(t)\end{array}\right. $$
(6)

where K is the number of modes to be decomposed, {uk} {wk} is the k − th mode component and center frequency after decomposition, t is the pair function φk(t) find the partial derivative of t, δ(t) is Dirichlet function,∗is convolution, and x(t) is the original input signal.

Then, by introducing the Lagrange multiplication operator λ, the constrained variational problem is transformed into an unconstrained variational problem.

Then, Eq. (6) is solved, and the Lagrange multiplication operator λis introduced to transform the constrained variational problem into a non-constrained variational problem. The expression of augmented Lagrange is:

$$ L\left(\left\{{u}_k\right\},\left\{{\omega}_k\right\},\lambda (t)\right)=\alpha {\sum}_k{\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{w}_kt}\right\Vert}_2^2+{\left\Vert x(t)-{\sum}_k{u}_k(t)\right\Vert}_2^2+\left[\lambda (t),x(t)-{\sum}_k{u}_k(t)\right] $$
(7)

In the formula, α is the quadratic penalty factor to reduce the interference of Gaussian noise. Alternate wind direction multiplier iterative algorithm combined with Parseval/Plancherel and Fourier equidistance transform is used to optimize to obtain each modal component and center frequency and to search the saddle points of the augmented Lagrange function.

The expressions of {uk}, {ωk}, λ(t) after alternating optimization iteration are shown in Formula (8), (9), and (10):

$$ {\hat{u}}_k^{n+1}\left(\omega \right)=\frac{\hat{x}\left(\omega \right)-{\sum}_{i<k}{\hat{u}}_i^n\left(\omega \right)+\frac{\hat{\lambda}\left(\omega \right)}{2}}{1+2\alpha {\left(\omega -{\omega}_k\right)}^2} $$
(8)
$$ {\omega}_k^{n+1}=\frac{\int_0^{\infty}\omega {\left|{\hat{u}}_k^{n+1}\left(\omega \right)\right|}^2 d\omega}{\int_0^{\infty }{\left|{\hat{u}}_k^{n+1}\left(\omega \right)\right|}^2 d\omega}\kern0.5em $$
(9)
$$ {\hat{\lambda}}^{n+1}\left(\omega \right)={\hat{\lambda}}^n\left(\omega \right)+\gamma \left(\hat{f}\left(\omega \right)-{\sum}_k{\hat{u}}_k^{n+1}\left(\omega \right)\right)\kern0.5em $$
(10)

If the convergence satisfies the condition (11), the iteration is finished:

$$ {\sum}_k{\left\Vert {\hat{u}}_k^{n+1}-{\hat{u}}_k^n\right\Vert}_2^2/{\left\Vert {\hat{u}}_k^n\right\Vert}_2^2<\varepsilon $$
(11)

where ε is the set precision.

Modal number selection method

In order to make up for the deficiency in subjective selection of influencing parameters, Bing (2020) proposed a modal number selection strategy to overcome the defects caused by random selection of modal numbers, and strict experiments proved that the use of HMD was superior to VMD method.

From the optimal theoretical result of VMD decomposition, the sum of the energies of each component is equal to the original signal. When k value is too large, the generation of the fictitious component will cause the energy sum of the components to be too high. Based on this principle, according to the calculation (Formula (12)) of wind speed signal energy and energy difference, when the value of θk, k − 1is small, the signal will be underdecomposed. With the increase of the value of θk, k − 1, VMD is overdecomposed obviously. Therefore, with the increase of parameters k, there will be decomposition phenomenon, and the corresponding valueθk, k − 1 will increase significantly. In this case, it can be regarded k − 1 as the optimal mode number of VMD decomposition.

$$ \left\{\begin{array}{l}{E}_k^l=\sqrt{\frac{\sum_{i=1}^n{x_l}^2(i)}{n}},l=1,2,\cdots k\\ {}{E}_k={\sum}_{l=1}^k{E}_k^l\kern7.5em \\ {}{\theta}_{k,k-1}=\frac{\left|{E}_k-{E}_{k-1}\right|}{E_{k-1}}\kern5.25em \end{array}\right. $$
(12)

where Ekis the energy sum of all components when the mode number is k, Ek − 1is the sum of component energies after the last decomposition, θk, k − 1represents the energy difference, xl(i) is the l − th component sequence under the current mode number, and n is the number of sampling points.

IPSO-LSTM model and ARMA model

Improved particle swarm optimization

The LSTM model is based on the deep neural network, which is very complex to set and optimize the hyperparameters. In this paper, the particle swarm optimization (PSO) algorithm is used to optimize the super parameters in the model. The PSO algorithm has the advantages of simple principle, small consumption of computing resources, good convergence, and high computational efficiency.

The speed and position update formula of PSO algorithm is as follows:

$$ \left\{\begin{array}{c}{V}_{id}^{t+1}=\omega {V}_{id}^t+{c}_1{r}_1\left({P}_{id}-{X}_{id}^t\right)+{c}_2{r}_2\left({Q}_{id}-{X}_{id}^t\right)\\ {}{X}_{id}^{t+1}={V}_{id}^{t+1}+{X}_{id}^t\kern14.5em \end{array}\right. $$
(13)

where ω is the inertia weight, c1, c2 is the learning factor, r1, r2 is an independent random number distributed between [0,1], \( {V}_{id}^t \) represents the velocity component of the ith particle in d dimension in the t iteration, \( {X}_{id}^t \) represents the position component of the ith particle in d dimension in the t iteration, Pid represents the individual optimal value of the ith particle in ddimension in the t iteration, and Qid represents the population global optimal value of the ith particle in d dimension in the t iteration.

But PSO algorithm has the phenomenon of local optimum, premature convergence, or stagnation, so it needs to be improved. This paper proposes that the learning factors change linearly with evolution, such as Formula (14). In this way, the particles can move in the whole search space in the early stage of optimization and move around the optimal solution in the late stage of optimization, so as to improve the convergence rate of the optimal solution. The inertia weight decreases nonlinearly with evolution, as shown in Formula (15). In this way, the search speed is gradually reduced, and the algorithm convergence is convenient.

$$ \left\{\begin{array}{c}{c}_1={a}_1+{a}_2l/{L}_{max}\\ {}{c}_2={a}_3-{a}_4l/{L}_{max}\end{array}\right. $$
(14)
$$ \omega (l)={\omega}_{max}-\left({\omega}_{max}-{\omega}_{min}\right)\ast \mathit{\arcsin}\frac{l}{L_{max}}\ast \frac{2}{\pi } $$
(15)

where l is the number of iterations, Lmax is the maximum number of iterations allowed, and a1, a2, a3, a4 are constants.

Long- and short-term memory neural network

The RNN has memory, parameter sharing, and Turing completeness, which has certain advantages in learning the nonlinear characteristics of sequences. With the development of deep learning technology, the concept of time series has also been applied in the structural design of RNN. Therefore, RNN has good time series analysis capability. As an improved RNN, LSTM inherits RNN’s ability to analyze time series data while enhancing long-term memory. LSTM neural network model has been applied in the field of wind speed prediction because it can keep memory for a long time effectively.

All RNNs have a chain structure of repetitive neural network modules, which is a simple structure in standard RNN. But there are three gates in the memory unit of LSTM neural network, which are input gate, forget gate, and output gate. The calculation process of LSTM update unit is as follows:

Input gate

The information passed first is input at t, and then the value of input layer of memory unit is updated to it:

$$ {i}_t={\alpha}_i\left({R}_i\cdotp \left[{h}_{t-1},{x}_t\right]+{b}_i\right) $$
(16)

The candidate value of hidden layer state is updated as \( {\overset{\sim }{C}}_t: \)

$$ {\overset{\sim }{C}}_t=\mathit{\tanh}\left({R}_C\cdotp \left[{h}_{t-1},{x}_t\right]+{b}_C\right) $$
(17)

Forgetting gate

The value of forgetting layer in memory unit is ft:

$$ {f}_t={\alpha}_f\left({R}_f\cdotp \left[{h}_{t-1},{x}_t\right]++{b}_f\right) $$
(18)

At this point, the status update value of the hidden layer is Ct:

$$ {C}_t={f}_t\ast {C}_{t-1}+{i}_t\ast {\overset{\sim }{C}}_t $$
(19)

Output gate

The value of the output layer and the output value of the final memory unit are updated to ot, ht:

$$ {o}_t={\alpha}_o\left({R}_o\cdotp \left[{h}_{t-1},{x}_t\right]+{b}_o\right) $$
(20)
$$ {h}_t={o}_t\ast \mathit{\tanh}\left({C}_t\right) $$
(21)

where xt is the input vector, ht is the output vector, α represents the gate activation function,∗ is the element multiplication between two vectors, R represents the corresponding weight matrix, and b is the relevant partial vector.

The cell update process diagram is shown in Fig. 1.

Fig. 1
figure 1

Cell update process of LSTM

Process of IPSO optimizing LSTM

  1. Step 1:

    Initialize LSTM neural network. The input and output data are selected and normalized to create LSTM neural network. The number of neurons in each layer, learning rate, and training iteration times are determined.

  2. Step 2:

    Initialization of particle swarm. The particle swarm size is determined, and the particle velocity v0, position x0, inertia weight ω, learning factors c1, c2, maximum iteration number Lmax, minimum inertia weight ωmin, maximum inertia weight ωmax, minimum velocity vmin, and maximum velocity vmax are initialized.

  3. Step 3:

    Construct LSTM prediction model and determine the optimization range of parameters. The number of neurons in each layer, learning rate, and training iteration times are taken as parameter variables to determine the optimization range of parameters. The original data is divided into training samples and test samples.

  4. Step 4:

    Define the fitness value fi of LSTM model as:

$$ {f}_i=\frac{1}{2}\left(\frac{1}{P}{\sum}_{p=1}^p\frac{y_p-{\hat{y}}_p}{y_p}+\frac{1}{Q}{\sum}_{p=1}^p\frac{y_q-{\hat{y}}_q}{y_q}\right) $$
(22)

where P, Q are the number of training samples and test samples, \( {y}_p,{\hat{y}}_p \) are the true value of the training sample and the test paper, and \( {y}_q,{\hat{y}}_q \) are the true value and predicted value of the validation sample.

  1. Step 5:

    Update and calculate the velocity and position of each particle, generate the next generation of particle swarm, and calculate the individual and global extremum of the next generation particle swarm. The fitness value of particles is calculated. If the current fitness value of particles is less than the individual extreme value of the previous generation, the individual extreme value and individual optimal position are updated; if the minimum fitness value of all particles is less than the current global optimal value, the global optimal value and global optimal position are updated.

  2. Step 6:

    Judge whether the termination conditions are met. If the training reaches the maximum number of times or the training error meets the accuracy requirements, the iteration is terminated, and the particle position corresponding to the final global optimal solution is output. Otherwise, go to step 5.

  3. Step 7:

    Using the optimized parameters of LSTM neural network model, a new LSTM neural network model is created. Input the training set and calculate the final wind speed prediction value.

The flow chart of IPSO-LSTM is shown in Fig. 2:

Fig. 2
figure 2

IPSO-LSTM flow chart

ARMA model

ARMA model is a basic model for time series analysis.

Let the time series {Yt} is a linear difference equation,

$$ {Y}_t-{\phi}_1{Y}_{t-1}-{\phi}_2{Y}_{t-2}-\cdots -{\phi}_p{Y}_{t-p}={e}_t-{\theta}_1{e}_{t-1}-{\theta}_2{e}_{t-2}-\cdots {\theta}_q{e}_{t-q} $$
(23)

{et} is white noise, ϕ1ϕp,θ1θq is regression parameter, and p and q are autoregressive coefficient and moving average order. If ϕ1ϕp,θ1θq is not all 0, then the sequence {Yt} is called autoregressive moving average mixing process, which is called ARMA (p, q).

Combination forecasting scheme

In this paper, 600 data from China’s Changma wind farm and Spain’s Sotavento wind farm were used, 500 data of which were intercepted as training sets, and the last 100 data were used as test sets. The specific steps are as follows:

  1. Step 1.

    The SVD differential spectrum is used to decompose the original wind speed data for the first time, and the original wind speed data is changed into two parts: noise reduction sequence x1(t) and noise sequence r1(t).

  2. Step 2.

    MIV screening and PCA were used for multivariate analysis. Use the energy difference modal number optimization method to calculate the energy difference between different modal numbers and select k value.

  3. Step 3.

    The SVD noise reduction sequence is decomposed into k components, namely, IMF1, ⋯, IMFk. At the same time, the noise r2(t) is proposed.

$$ {r}_2(t)={x}_1(t)- IMF1-\cdots - IMF k $$
(24)
  1. Step 4.

    Wavelet decomposition is used to decompose each component again to get the denoising sequence x3(t) and noise sequence r3(t).

  2. Step 5.

    Analysis of denoising sequence x3(t). Using IPSO-LSTM neural network to train nonlinear sequence and ARMA model to train linear series, the prediction result of denoising sequence is y1, ⋯, yk.

  3. Step 6.

    The noise sequence is fitted with probability distribution. The results show that the data obey normal distribution. The result of noise analysis yr can be made up of random numbers from the normal distribution of error sequence.

  4. Step 7.

    Reconstruct the predicted value to get the final prediction result.

$$ y={y}_1+{y}_2+\cdots +{y}_k+{y}_r $$
(25)

The flow chart is shown in Fig. 3:

Fig. 3
figure 3

Combination forecasting scheme

Experiment

Data analysis

In this paper, wind speed data of Changma wind farm in China and Sotavento wind farm in Spain were used for analysis. The data of Sotavento wind farm include wind speed, wind direction, air temperature, air pressure, specific volume, specific humidity, and surface roughness; the environmental factors of China Changma wind farm include wind direction, air temperature, motor speed, and pitch angle. The hub height of Sotavento wind farm and Changma wind farm is 45m and 65m, respectively. Six hundred data were selected for training with a sampling interval of 10min. The first 500 data were taken as training samples, and the last 100 data were taken as test sets.

The original wind speed diagram is shown in Fig. 4. Figure 4a shows the wind speed data of the Changma wind farm in China, and Fig. 4b shows the wind speed data of the Sotavento wind farm in Spain. It can be seen from the figure that the wind speed of Changma Wind Farm in China has a strong random volatility and no obvious periodic characteristics within the range of 4m/s, 16m/s, while the wind speed of Sotavento in Spain has a strong random volatility and no obvious periodic characteristics within the range of 4m/s, 17m/s.

Fig. 4
figure 4

(ab) Wind speed data of different wind farms. (a) Wind speed of Changma Wind Farm in China. (b) Wind speed of Sotavento wind Farm in Spain

Figure 5 shows the wind rose for two different wind farms. The wind rose diagram shows the wind direction and the wind frequency in this direction, which is convenient for readers to understand the wind data. In this wind data, it can be seen from Fig. 5a that the main wind directions of Sotavento wind farm are west wind and south wind. It can be seen from Fig. 5b that west wind is the main wind direction in Changma wind farm.

Fig. 5
figure 5

(ab) Wind rose chart of different wind farms. (a) Wind rose chart of Sotavento wind Farm in Spain. (b) Wind rose chart of Changma wind farm in China

Data processing

Firstly, SVD difference spectrum is used for the first error decomposition to extract the periodic features in the original wind speed data and obtain the denoising sequence and noise term.

Then use the formula to calculate the energy of the wind speed sequence after preliminary to remove noise and the results are shown in Fig. 6. It can be seen from Fig. 6a that the energy at k = 8 is higher than that at k = 7, and the change range is obvious. Therefore, the mode number of the wind speed sequence VMD of Changma wind farm is 7. It can be seen from Fig. 6b that the energy at k = 6 increases significantly compared with that at k = 5. Therefore, the mode number of VMD of the wind speed series of Sotavento wind farm in Spain is selected as 5.

Fig. 6
figure 6

(ab) Sum of wind speed energy decomposed by VMD of different wind farms. (a) Sum of wind speed energy decomposed by VMD in Changma wind farm. (b) Sum of wind speed energy decomposed by VMD in Sotavento wind farm

It can be seen from Table 1 that when the modal number changes from 5 to 6 in Sotavento wind farm, the energy of the denoising sequence after decomposition changes greatly; that is, the energy difference becomes larger. Therefore, the mode number of VMD is 5. Similarly, in Changma wind farm, when the mode number changes from 7 to 8, the energy difference becomes larger. Therefore, the mode number of VMD is 7.

Table 1 Energy difference between different modes (×10−4)

In order to improve the prediction accuracy, we use MIV method to deal with multiple factors. Firstly, the results of MIV among multiple factors are shown in Table 2 and Table 3. We filter out the environmental factors that have little influence on the wind speed prediction results of Sotavento and Changma wind farms, that is, temperature, specific volume, and surface roughness of Sotavento wind farm and pitch angle of Changma wind farm.

Table 2 MIV value of Sotavento wind farm
Table 3 MIV value of Changma wind farm

Then, the other factors are analyzed by principal component analysis, and all the principal components whose cumulative variance contribution rate reaches 85% are selected as the fixed inputs of the prediction model. The results are shown in Table 4. Both wind farms choose the first three principal components to participate in the prediction.

Table 4 Results of PCA

After determining the mode number of VMD decomposition, the wind speed series after noise removal is directly decomposed into trend sequence part and fluctuation sequence part, as shown in Fig. 7. Figure 7a shows the VMD decomposition data of Changma wind farm. Figure 7b shows the decomposition data of VMD wind speed of Sotavento wind farm.

Fig. 7
figure 7

(ab) VMD decomposition results of different wind farms. (a) VMD decomposition data of Changma wind farm. (b) VMD decomposition data of Sotavento wind farm

Wavelet decomposition is used to decompose the noise of the decomposed IMFs for the last time. The final wind speed sequence which can be used for training is obtained.

Wind Speed prediction

A unit of LSTM neural network contains three gates. This structure can effectively alleviate the gradient disappearance problem and is more suitable for solving nonlinear time-varying problems. Therefore, IPSO-LSTM is used to predict the trend part of the wind speed sequence. The ARMA model is used to train the linear sequence.

Figure 8 shows the variation of fitness values when using IPSO and PSO to optimize the parameters of LSTM neural network using two different wind farm data. It can be seen from Fig. 8a that IPSO converges faster. When using the same parameters, it can be seen from Fig. 8b that PSO has fallen into obvious local convergence. Therefore, the improvement of PSO is necessary and successful.

Fig. 8
figure 8

Fitness curves of different wind farms. (a) Fitness curve of Sotavento Wind Farm. (b) Fitness curve of Changma Wind Farm

The wind speed prediction results of Changma wind farm are shown in Fig. 9a, and the wind speed prediction results of Sotavento wind farm are shown in Fig. 9b:

Fig. 9
figure 9

(a–b) Wind speed prediction results of different wind farms. (a) Prediction results of Changma Wind Farm. (b) Sotavento wind Farm wind speed prediction results

It can be seen from the figure that the IPSO-LSTM-ARMA-E combination prediction is closer to the real value. IPSO-LSTM can predict the trend part of wind speed series better than LSTM. LSTM-ARMA model can predict the peak and trough of wind speed fluctuation more accurately than LSTM neural network. The prediction results of HMD-IPSO-LSTM-ARMA-E model are better than HMD-IPSO-LSTM-E model.

In order to better judge the influence of various variables on the final prediction results and verify the conclusion obtained from direct observation in Fig. 9, it is necessary to conduct accurate error analysis on the prediction results.

In this paper, MAPE, MAE, and MSE are selected to evaluate the model (Hu et al. 2020; Zhang et al. 2021). The calculation formula is Formula (26)

$$ \left\{\begin{array}{c} MAPE=\frac{1}{m}{\sum}_{i=1}^n\left|\frac{x_i-{\hat{x}}_i}{x_i}\right|\\ {} MAE=\frac{1}{m}{\sum}_{i=1}^n\mid {x}_i-{\hat{x}}_i\mid \\ {} MSE=\frac{1}{m}{\sum}_{i=1}^n{\left|{x}_i-{\hat{x}}_i\right|}^2\end{array}\right. $$
(26)

where xi \( {\hat{x}}_i \) is the actual value and predicted value of wind speed at time t and the sample size is m.

The error analysis is shown in Table 5.

Table 5 Prediction error analysis table

It can be seen from the error analysis table that the conclusion obtained from direct observation of Fig. 9 is that HMD-IPSO-LSTM-ARMA-E combination prediction is more accurate and can effectively reduce the error. It can be seen from the table that MAE, MSE, and RMSE are reduced in varying degrees after adding noise prediction. Although the prediction results of HMD-IPSO-LSTM-E model and HMD-IPSO-LSTM model generally have large errors, the noise prediction still effectively reduces the final prediction error, which fully proves the necessity of noise prediction.

To sum up, the prediction results of HMD-IPSO-LSTM-ARMA-E scheme proposed in this paper are closer to the actual values. LSTM-ARMA was used to analyze the differences of different types of time series. IPSO-LSTM optimizes the parameters of LSTM model. Adding error analysis in the reconstruction of prediction results also reduces the final prediction error. The scheme is feasible.

We take data from Sotavento wind farms in Spain. The first 450 data were taken as training samples, and the last 150 data were taken as test sets. LSTM, RBF, Elman, and BP neural networks are used to predict the data, and the prediction results are as follows (Fig. 10, Table 6):

Fig. 10
figure 10

Comparison of the results of different prediction methods

Table 6 Prediction error analysis table

It can be seen from the error analysis table that LSTM neural network is more suitable for this system. The MAE of LSTM neural network model is 22.5%, 23.5%, and 33.6% less than that of RBF, Elman, and BP neural network model, respectively. The MSE of LSTM neural network model is 38.9%, 41%, and 60.2% less than that of RBF, Elman, and BP neural network model, respectively. The RMSE of LSTM neural network model is 21.9%, 23.4%, and 37.0% less than that of RBF, Elman, and BP neural network model, respectively. Thus, in this experiment, the advantage of LSTM neural network in learning nonlinear sequence has been fully reflected.

Wind speed prediction based on artificial intelligence

When forecasting wind speed, it needs a lot of historical and weather information. Artificial intelligence is the best tool to deal with a lot of data. Due to the lack of space-time data density of the existing data, the effect of traditional prediction methods is poor. Artificial intelligence has the ability to infer incomplete and uncertain information and can achieve high-precision wind speed prediction. In addition, artificial intelligence can also summarize the knowledge and experience of experts, improve the average prediction level, and use the abstract prediction knowledge that cannot be used in statistical and numerical models to comprehensively consider multiple influencing factors, so as to improve the prediction level and operation efficiency of the model.

In recent years, in the research field of wind speed prediction at home and abroad, the application of artificial intelligence has increased significantly, and it shows a trend from traditional machine learning to deep learning. Wind speed is fluctuant and random, which makes wind speed and wind power series highly nonlinear. Compared with traditional machine learning methods, deep learning has obvious advantages in massive data processing and nonlinear space-time prediction. It has great application potential in various technical links of wind speed simulation and prediction and provides technical support for faster, more efficient, and more accurate wind speed prediction.

Conclusion and future work

In order to reduce carbon emissions and solve environmental problems such as air pollution and ecological damage, countries around the world are developing renewable energy technologies to reduce their dependence on fossil energy. With the energy revolution facing many problems, wind power has become an important source of renewable energy, and wind forecasting technology is developing rapidly. New optimization methods and combination schemes are constantly produced. High-precision wind speed prediction method provides a scientific basis for realizing smart power grid and improving economic benefits of wind farms.

In order to achieve high-precision wind power prediction, this paper proposes an HMD-IPSO-LSTM-ARMA-E prediction scheme based on the improved VMD method and IPSO-LSTM neural network, which is verified by the wind speed data of Changma wind farm in China and Sotavento wind farm in Spain. The main experimental results are as follows:

  1. (1)

    The HMD method is used to decompose the non-stationary wind speed time series, and the relatively stationary components and residual terms are obtained, which effectively solves the problems of the insignificant periodic characteristics of the wind speed time series and the difficulty in choosing the VMD mode number.

  2. (2)

    Considering that as an improved RNN, LSTM neural network not only has the ability of analyzing time series data but also has the ability of long-term memory, using IPSO to optimize the parameters of LSTM neural network which can improve the prediction accuracy of neural network.

  3. (3)

    The ARMA model can take into account the dependence of time series and the disturbance caused by random fluctuation in the prediction and has a high accuracy in the short-term trend prediction. In this paper, LSTM neural network and ARMA model are specifically used to train and forecast the decomposed nonlinear wind speed sequence and linear wind speed sequence, respectively, and the prediction effect is significantly improved.

The results of the improved VMD method, the optimization of LSTM neural network, the difference analysis of different sequences, and the multiple noise decomposition are satisfactory. The combined model highlights the advantages of each algorithm, and the wind speed prediction system has more perfect wind speed processing and smaller prediction error.

In the follow-up prediction work, consider improving the quality of model input data, using NWP data with higher resolution and combining with geographic information system to correct wind speed, fully consider the impact of geographical conditions on wind speed, improve accuracy, further study wind speed series prediction based on other deep learning algorithms, try to use more optimization algorithms to improve the performance of the model, consider that the combination model combined with physical model, dynamic model, and fluid mechanics model is constructed to improve the prediction accuracy and efficiency.