1 Introduction

Global forecast systems are based on the numerical integration of systems of differential equations, which can describe atmospheric processes starting from meteorological observations. Statistical methods, which apply historical time-series to predict several hours ahead, are excellent at forecasting idiosyncrasies in local weather [4]. Model Output Statistics (MOS) post-process the first outputs of complex Numerical Weather Prediction (NWP) models, which forecast large-scale weather patterns, using regression equations to reduce systematic NWP errors and resolve surface weather details [7]. Adaptive methods can improve conventional statistical corrections and also eliminate random forecast errors of NWP models, induced by uncertain initial conditions and computational limitations. However, their applications are limited to cases with a restricted number of input data for specific model resolutions [8]. Artificial Neural Networks (ANN) can approximate any continuous nonlinear function, which offers an effective alternative to more traditional regression techniques. Polynomial Neural Networks (PNN) can adapt some mathematical principles of Partial Differential Equation (PDE) substitution to decompose and solve the general linear PDE, able to describe the local weather dynamics properly [10]. Extended PNNs, trained on the actual local weather conditions to model the relevant fluctuating data relations of the last few days, can process numerical model outcomes of the same data types (replacing the unknown observations) to refine the target forecast series with respect to specific local situation features.

$$ Y = a_{0} + \sum\limits_{i = 1}^{n} {a_{i} x_{i} } + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {a_{ij} x_{i} x_{j} + } } \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {\sum\limits_{k = 1}^{n} {a_{ijk} x_{i} x_{j} x_{k} + \cdots } } } $$
(1)

n … number of input variables \( x\;(x_{1} ,x_{2} , \ldots ,x_{n} ) \)

a \( (a_{1} ,a_{2} , \ldots ,a_{m} ) \) … vectors of parameters

Differential polynomial neural network (D-PNN) is a new neural network type, which adopts some principles of the GMDH (Group Method of Data Handling) theory, created by the Ukrainian scientist Aleksey Ivakhnenko in 1968. The GMDH decomposes the complexity of the Kolmogorov-Gabor general polynomial (1), which can describe the connections between input and output system variables, into many simpler relationships, each described by a low-order polynomial (2) for every pair of input variables. The GMDH network polynomials can approximate any stationary random sequence of observations and can be computed either by adaptive methods or by a system of Gaussian normal equations. A typical PNN maps a vector input x to a scalar output Y, which is an estimate of the searched true function [6]. The PNN is a flexible architecture, whose structure is developed through learning. The number of layers of the PNN is not fixed in advance but is determined dynamically, meaning that this self-organizing network grows over the training period.

$$ y = a_{0} + \, a_{1} x_{i} + \, a_{2} x_{j} + \, a_{3} x_{i} x_{j} + \, a_{4} x_{i}^{2} + a_{5} x_{j}^{2} $$
(2)
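As a minimal sketch, the Gaussian normal equations solution of the low-order polynomial (2) can be obtained with ordinary least squares; the function names below are illustrative, not from the paper:

```python
import numpy as np

def fit_gmdh_polynomial(x1, x2, y):
    """Least-squares fit of the GMDH polynomial (2):
    y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def eval_gmdh_polynomial(a, x1, x2):
    """Evaluate polynomial (2) with parameter vector a."""
    return (a[0] + a[1] * x1 + a[2] * x2 + a[3] * x1 * x2
            + a[4] * x1**2 + a[5] * x2**2)

# Usage: recover the parameters of a known quadratic surface.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 200)
x2 = rng.uniform(-1, 1, 200)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + 3.0 * x1 * x2
a = fit_gmdh_polynomial(x1, x2, y)
```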

D-PNN defines and solves the general PDE to model an unknown searched function, producing a sum series of fractional substitution derivative terms in all the PNN layer nodes. The D-PNN extends the basic PNN structure and functionality to decompose the general PDE analogously to the way the GMDH decomposes the general connection polynomial (1). In contrast with common ANN functionality, each neuron (i.e. each substitution PDE term) can be directly included in the total network output, calculated as the sum of the selected (active) neuron values [10]. The merits of the D-PNN become evident in the modeling of uncertain dynamic systems (including weather variables), which a PDE can preferably describe and which are too complex, requiring masses of input data, to be solved by standard regression or soft-computing methods [9].

2 Short-Term Wind Speed Forecasting

One of the main problems of wind speed forecasting is the frequent continuous fluctuation of the wind. NWP models smooth and average the orographic and landscape characteristics, which leads to a weak representation of local effects on the airflow, emphasized by possible errors in the initial and lateral boundary conditions [6]. Usually only rough 24–48 h prognoses are provided by a meso-scale meteorological model driven by a global NWP system, which does not take into account the local obstacles and terrain asperity that can influence the wind speed (power) to a great extent. The wind power is primarily a result of the current wind speed and is much less affected by other conditions, e.g. the unstable wind direction, speed changes, dust or turbine operating temperature. Wind flow prediction inaccuracies become about 3 times larger when the wind speed is converted into power through the characteristic curve of a wind turbine. The potential benefits of forecasting the wind energy production are obvious in power control and load scheduling. Wind or power forecast models usually apply lagged values of the averaged data measurements together with some relevant meteorological variables. Forecasting methods can generally be divided into two main groups based on:

  • the physics of the atmosphere, applied in NWP models

  • statistical considerations, which take into account historical time-series only

Statistical methods can apply ANNs, wavelets or classical time-series analysis [5] combined with neuro-fuzzy or ARIMA models. Spatial correlation takes the relationships of the wind speed at different sites into account, and time-series of the predicted point and its neighboring sites are employed in the prediction. Forecasting models developed for one location usually do not match another site, for a variety of reasons such as changes in terrain, different wind speed patterns and atmospheric factors. The physical method has advantages in long-term prediction, while the statistical method does well in short-term hourly forecasts. Iterative training can provide slightly different model results; therefore the final solution is usually the average of several runs. Most of the existing methodologies show some drawbacks, such as over-fitting or dependence on the particular local conditions. Hybrid methods combine the unique features of different models to overcome the negative performance of a single one and try to improve the final forecast. These methods as a rule need a pre-processing of the input data to show a better performance. Spatial averaging and temporal interpolation can derive specific prediction intervals to improve local wind speed forecasts. On-line forecast systems can monitor actual wind speed NWP errors and calculate several-hour-ahead intra-day corrections [2]. Most models combine time-series observations with NWP outputs to relate the measured data to the forecasts [3]. Adaptive post-processing methods as a rule require a substantial reduction of the number of input variables necessary to model detailed weather data relations [8]. They may equal or exceed the MOS, starting from the given NWP output that does not include local characteristics. The proposed revision methods process final NWP model outcomes, which were already corrected for systematic forecast errors by several MOS and secondary data analysis models.
The D-PNN forms daily correction models that can adopt corresponding NWP output data to recalculate the target 24-hour wind speed forecast series in consideration of the trained actual weather frame and local specifics.
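The daily revision workflow can be sketched as follows, with a simple linear least-squares correction standing in for the trained D-PNN model (the function name and the stand-in model are illustrative assumptions):

```python
import numpy as np

def daily_correction(nwp_train, obs_train, nwp_next24):
    """Sketch of the daily revision workflow: fit a correction model on
    the last few days' aligned NWP forecasts and observations, then
    recalculate the next 24-hour NWP forecast series.
    nwp_train, obs_train: aligned hourly series from the last 2-6 days
    nwp_next24: the NWP forecast values to revise"""
    A = np.column_stack([np.ones(len(nwp_train)), nwp_train])
    coef, *_ = np.linalg.lstsq(A, obs_train, rcond=None)  # fit obs ~ a + b*nwp
    return coef[0] + coef[1] * np.asarray(nwp_next24)     # revised series
```

In the paper the correction model is the D-PNN itself, retrained daily; the linear fit here only illustrates the train-on-recent-days, apply-to-next-day structure.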

3 The General PDE Decomposition and Substitution

The key idea of the D-PNN is to define and substitute for the general linear PDE, whose exact form is not known in advance and which can describe any data relations, a sum series of selected relative polynomial derivative terms (3). The searched function u, which can be calculated as the sum of its derivative terms (plus a bias) (3), may be expressed in the form of a sum series (5), consisting of convergent series arising from the competent partial derivatives (3) in the case of 2 input variables.

$$ a + bu + \sum\limits_{i = 1}^{n} {c_{i} \frac{\partial u}{{\partial x_{i} }}} + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {d_{ij} \frac{{\partial^{2} u}}{{\partial x_{i} \partial x_{j} }}} } + \cdots = 0 \qquad u = \sum\limits_{k = 1}^{\infty } {u_{k} } $$
(3)

u \( (x_{1} ,x_{2} , \ldots ,x_{n} ) \) … unknown function of n input variables

a, B \( (b_{1} ,b_{2} , \ldots ,b_{n} ) \), C \( (c_{11} ,c_{12} , \ldots ) \) … polynomial parameters

Substitution PDE terms (3) are built from the GMDH polynomials (2) according to the adapted Similarity Dimensional Analysis (SDA), which applies various formal adaptations of a PDE or data units to form dimensionless characteristic groups of variables. It replaces the original mathematical operators and symbols of a PDE by ratios of the corresponding variables [1].

$$ y_{i} = \frac{{\left( {a_{0} + \sum\limits_{i = 1}^{n} {a_{i} x_{i} } + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {a_{ij} x_{i} x_{j} } } + \ldots } \right)^{m/n} }}{{b_{0} + b_{1} x_{1} + \ldots }} = \frac{{\partial^{m} f(x_{1} , \ldots ,x_{n} )}}{{\partial x_{1} \partial x_{2} \ldots \partial x_{m} }} $$
(4)
n … combination degree of a complete polynomial of n variables

m … combination degree of the polynomial denominator

f … polynomial substitution for the unknown function u

The complete combination polynomials (2) of the numerators substitute for the partial unknown functions \( u_{k} \) of the derivative term sum series u (5), while the reduced polynomials of the denominators represent the relative derivative parts (5) that result from the competent derivatives.

$$ \left( {\sum {\frac{{\partial u_{k} }}{{\partial x_{1} }},} \sum {\frac{{\partial u_{k} }}{{\partial x_{2} }},\sum {\frac{{\partial^{2} u_{k} }}{{\partial x_{1}^{2} }},} \sum {\frac{{\partial^{2} u_{k} }}{{\partial x_{1} \partial x_{2} }},} } \sum {\frac{{\partial^{2} u_{k} }}{{\partial x_{2}^{2} }}} } \right) $$
(5)

The root function of the numerator (4), which takes the complete polynomial to the competent combination degree, may or may not be applied. The general PDE (3) can be converted into an ordinary differential equation with only time derivatives (6) that describes 1-variable time-series. It is solved analogously, using the same multi-variable PDE substitution terms (4) with time-series. Most models take a combined form of the ordinary-time and PDE solutions [10].

$$ a + bs + \sum\limits_{i = 1}^{m} {c_{i} \frac{{ds(x_{i} )}}{dt}} + \sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {d_{ij} \frac{{ds(x_{i} ,x_{j} )}}{dt}} } + \ldots + \sum\limits_{i = 1}^{m} {cc_{i} \frac{{d^{2} s(x_{i} )}}{{dt^{2} }}} + \ldots = 0 $$
(6)
s(x) … function of independent time-series observations \( x\;(x_{1} ,x_{2} , \ldots ,x_{m} ) \)

Blocks (extended nodes) of the D-PNN (Fig. 1) form substitution sum DE terms (neurons in this context) with the same input variables, one for each fractional polynomial derivative combination (4). Each block also contains a single output polynomial (2) without a derivative part. Neurons do not affect the block output but can be directly included in the total network output sum, which calculates a DE solution. Each block has 1 vector of adjustable parameters a, and each neuron has 2 vectors a, b.

Fig. 1.

D-PNN blocks form simple and composite (CT) substitution PDE terms (neurons)

$$ F\left( {x_{1} ,x_{2} ,u,\frac{\partial u}{{\partial x_{1} }},\frac{\partial u}{{\partial x_{2} }},\frac{{\partial^{2} u}}{{\partial x_{1}^{2} }},\frac{{\partial^{2} u}}{{\partial x_{1} \partial x_{2} }},\frac{{\partial^{2} u}}{{\partial x_{2}^{2} }}} \right) = 0 $$
(7)

where \( F(x_{1} ,x_{2} ,u,p,q,r,s,t) \) is a function of 8 variables

Using 2 input variables, the 2nd order partial DE may be expressed in the form (7), which involves derivative terms formed with respect to all the GMDH polynomial (2) variables. Each D-PNN block forms 5 corresponding simple derivative neurons with respect to the single \( x_{1} \), \( x_{2} \) (8), squared \( x_{1}^{2} \), \( x_{2}^{2} \) (9) and combination \( x_{1} x_{2} \) (10) derivative variables, whose combination sum can directly solve and substitute for the 2nd order partial DE (7), most often used to model physical or natural system non-linearities.

$$ y_{1} = \frac{{\partial f(x_{1} ,x_{2} )}}{{\partial x_{1} }} = w_{1} \frac{{\left( {a_{0} + a_{1} x_{1} + a_{2} x_{2} + a_{3} x_{1} x_{2} + a_{4} x_{1}^{2} + a_{5} x_{2}^{2} } \right)^{1/2} }}{{b_{0} + b_{1} x_{1} }} $$
(8)
$$ y_{4} = \frac{{\partial^{2} f(x_{1} ,x_{2} )}}{{\partial x_{2}^{2} }} = w_{4} \frac{{a_{0} + a_{1} x_{1} + a_{2} x_{2} + a_{3} x_{1} x_{2} + a_{4} x_{1}^{2} + a_{5} x_{2}^{2} }}{{b_{0} + b_{1} x_{2} + b_{2} x_{2}^{2} }} $$
(9)
$$ y_{5} = \frac{{\partial^{2} f(x_{1} ,x_{2} )}}{{\partial x_{1} \partial x_{2} }} = w_{5} \frac{{a_{0} + a_{1} x_{1} + a_{2} x_{2} + a_{3} x_{1} x_{2} + a_{4} x_{1}^{2} + a_{5} x_{2}^{2} }}{{b_{0} + b_{1} x_{11} + b_{2} x_{12} + b_{3} x_{11} x_{12} }} $$
(10)
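A hedged sketch of how one block could evaluate its 5 simple neurons (8)–(10); the parameter layout, the unit weights and the absolute value guarding the square root are assumptions, not specified in the paper:

```python
import numpy as np

def block_neurons(x1, x2, a, b, w):
    """Evaluate the 5 simple substitution neurons (8)-(10) of one D-PNN
    block. a: 6 numerator parameters of polynomial (2); b: denominator
    parameter vectors per neuron; w: 5 term weights. Names illustrative."""
    p = a[0] + a[1]*x1 + a[2]*x2 + a[3]*x1*x2 + a[4]*x1**2 + a[5]*x2**2
    # abs() keeps the root real for negative polynomial values (an assumption)
    y1 = w[0] * np.sqrt(np.abs(p)) / (b["x1"][0] + b["x1"][1]*x1)          # d/dx1
    y2 = w[1] * np.sqrt(np.abs(p)) / (b["x2"][0] + b["x2"][1]*x2)          # d/dx2
    y3 = w[2] * p / (b["x1sq"][0] + b["x1sq"][1]*x1 + b["x1sq"][2]*x1**2)  # d2/dx1^2
    y4 = w[3] * p / (b["x2sq"][0] + b["x2sq"][1]*x2 + b["x2sq"][2]*x2**2)  # d2/dx2^2
    y5 = w[4] * p / (b["x1x2"][0] + b["x1x2"][1]*x1
                     + b["x1x2"][2]*x2 + b["x1x2"][3]*x1*x2)               # d2/dx1dx2
    return [y1, y2, y3, y4, y5]
```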

The Root Mean Squared Error (RMSE) is calculated for the polynomial parameter optimization and neuron combination selection (11).

$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{M} {\left( {Y_{i}^{d} - Y_{i} } \right)}^{2} }}{M}} \to \hbox{min} $$
(11)
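The criterion (11) is a plain RMSE minimized over the M training samples; a direct implementation:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error (11), minimized during the polynomial
    parameter optimization and neuron combination selection."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```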

4 Backward Differential Polynomial Neural Network

Multi-layer networks form composite functions (Fig. 2). Composite Terms (CT), which substitute for derivatives with respect to variables of previous layers, are calculated according to the partial derivation rules (13) of composite functions (12).

Fig. 2.

N-variable D-PNN selects from 2-variable combination blocks in each hidden layer

$$ F\left( {x_{1} , \, x_{2} , \ldots , \, x_{n} } \right) = f\left( {z_{1} ,z_{2} , \ldots , \, z_{m} } \right) = f(\phi_{1} (X),\phi_{2} (X), \ldots ,\phi_{m} (X)) $$
(12)
$$ \frac{\partial F}{{\partial x_{k} }} = \sum\limits_{i = 1}^{m} {\frac{{\partial f(z_{1} ,z_{2} , \ldots ,z_{m} )}}{{\partial z_{i} }} \cdot \frac{{\partial \phi_{i} (X)}}{{\partial x_{k} }}} \quad k = 1, \ldots ,n $$
(13)

Each D-PNN block can form 5 simple neurons (8)–(10), and the blocks of the 2nd and subsequent hidden layers produce additional CTs using composite substitution derivatives with respect to the output and input variables of the back-connected previous layer blocks; e.g. 3rd layer blocks can form linear CTs with respect to the 2nd (14) and 1st layer (15). The number of block neurons, i.e. CTs that include composite function derivatives, doubles with each previous back-connected layer. Thus the activation probabilities \( P_{A} \) of the CTs, formed with respect to the derivative input variables of previous layer blocks, must halve with the increasing number of hidden layers they backward comprise in the network tree-like structure (Fig. 2) [10].

$$ y_{2} = \frac{{\partial f(x_{21} ,x_{22} )}}{{\partial x_{11} }} = w_{2} \frac{{(a_{0} + a_{1} x_{21} + a_{2} x_{22} + a_{3} x_{21} x_{22} + a_{4} x_{21}^{2} + a_{5} x_{22}^{2} )^{1/2} }}{{x_{22} }} \cdot \frac{{x_{21} }}{{b_{0} + b_{1} x_{11} }} $$
(14)
$$ y_{3} = \frac{{\partial f(x_{21} ,x_{22} )}}{{\partial x_{1} }} = w_{3} \frac{{(a_{0} + a_{1} x_{21} + a_{2} x_{22} + a_{3} x_{21} x_{22} + a_{4} x_{21}^{2} + a_{5} x_{22}^{2} )^{1/2} }}{{x_{22} }} \cdot \frac{{(x_{21} )^{1/2} }}{{x_{12} }} \cdot \frac{{x_{11} }}{{b_{0} + b_{1} x_{1} }} $$
(15)

The square and combination derivative terms are formed analogously. A D-PNN with an increased number of input variables must select from the possible block nodes in each hidden layer (analogously to the GMDH combinatorial explosion), as the number of input combination couples grows exponentially in each next hidden layer. The D-PNN need not define all the possible PDE terms (using a deep multi-layer structure); e.g. 4–5 hidden layers can form an optimal model for 10–20 input variables (Fig. 3).

Fig. 3.

3 simultaneous separate processes of the structure and parameter optimization

$$ Y = \frac{{\sum\limits_{i = 1}^{k} {y_{i} } }}{k}\quad k = \text{actual number of active neurons} $$
(16)

Only some of all the potential combination PDE terms (neurons) may be included in a PDE solution, despite each having an adjustable term weight (w i ). A specific neuron combination, which forms a PDE solution, cannot accept the disturbing effect of the remaining neurons (which may form other solutions) during the parameter optimization. The D-PNN total output Y is the arithmetic mean of the active neuron output values, so as to prevent their changeable number (in a combination) from influencing the total network output value (16) [10].
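The output rule (16) can be sketched directly; representing the selected combination as a list of active flags is an assumption:

```python
def network_output(neuron_values, active):
    """Total D-PNN output (16): the arithmetic mean of the active
    (selected) neuron values, so the size of the combination does not
    bias the output magnitude."""
    selected = [y for y, on in zip(neuron_values, active) if on]
    return sum(selected) / len(selected)
```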

The 2 simultaneous, gradually finishing random processes of selecting the optimal block 2-input couples and neuron combinations are the principal initial phase of the D-PNN structure formation and PDE composition, performed along with the continual polynomial parameter adjustment using the Gradient Steepest Descent (GSD) method (Fig. 4). Binary Particle Swarm Optimization (PSO), able to solve large combinatorial problems, may perform the optimal neuron selection. Simulated Annealing (SA), which accepts not only changes that decrease the objective function (in a minimization problem) but also some changes that increase it, can improve the block input reconnection process.
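A minimal simulated-annealing sketch of the binary neuron combination selection, assuming a linear cooling schedule and a scalar objective on the mean output (16); the paper leaves both open, so every name here is illustrative:

```python
import math
import random

def anneal_neuron_selection(neurons, target, steps=2000, t0=1.0, seed=0):
    """Flip one neuron's active bit per step; keep the flip if it lowers
    the objective, or with probability exp(-delta/T) if it raises it.
    neurons: candidate neuron output values; target: desired output."""
    rng = random.Random(seed)
    active = [rng.random() < 0.5 for _ in neurons]
    if not any(active):
        active[0] = True                       # keep at least one neuron on

    def objective(act):
        sel = [y for y, on in zip(neurons, act) if on]
        return abs(sum(sel) / len(sel) - target) if sel else float("inf")

    current = objective(active)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9    # linear cooling schedule
        i = rng.randrange(len(neurons))
        active[i] = not active[i]                  # propose a single-bit flip
        cost = objective(active)
        if cost <= current or rng.random() < math.exp(-(cost - current) / temp):
            current = cost                         # accept the move
        else:
            active[i] = not active[i]              # undo the flip
    return active, current
```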

Fig. 4.

29.1.2014, Helena - RMSE: NOAA = 6.79, D-PNN = 4.31

5 Wind Speed NWP Local Revisions

The National Oceanic and Atmospheric Administration (NOAA) provides the free National Weather Service (NWS), which includes among others tabular 4-day forecasts of the hourly average temperature, relative humidity and wind speed at a selected locality [11]. Daily wind speed correction models processed the last 3 time-series values (lags) of the above 3 locally forecasted quantities, i.e. 9 input variables in total, in Helena, Montana [11] (Figs. 4, 5 and 6). The D-PNN was trained with the corresponding hourly observation series (of the same data types) for a period of 2 to 6 previous days (48–144 data samples). The training data can be downloaded from the complete free NOAA historical archives [12], which the Weather Underground (WU) shares [13], or from the current daily tabular NOAA observations [14] at many land-based stations.
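How the 9-input rows might be assembled from the 3 hourly forecast series is sketched below; the paper only states that 3 lags of 3 quantities are used, so the column ordering is an assumption:

```python
import numpy as np

def build_training_matrix(temp, humidity, wind, lags=3):
    """Arrange hourly series of the 3 forecasted quantities into the
    9-input rows used by the daily correction models: the last `lags`
    values of temperature, relative humidity and wind speed per hour,
    with the next-hour wind speed as the target."""
    temp, humidity, wind = (np.asarray(s, dtype=float)
                            for s in (temp, humidity, wind))
    rows = []
    for t in range(lags, len(wind)):
        rows.append(np.concatenate([temp[t - lags:t],
                                    humidity[t - lags:t],
                                    wind[t - lags:t]]))
    X = np.array(rows)
    y = wind[lags:]          # target: next-hour wind speed
    return X, y
```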

Fig. 5.

30.1.2014, Helena - RMSE: NOAA = 3.89, D-PNN = 2.59

Fig. 6.

31.1.2014, Helena - RMSE: NOAA = 5.90, D-PNN = 3.77

Settled weather periods of several days allow training the D-PNN with historical observations whose actual fluctuating relevant data relations do not change essentially in the forecasted days (Fig. 6). It is infeasible to revise the NOAA wind speed NWP in the sporadic days of an overnight break change in the weather; however, intervals of more or less settled stable conditions tend to prevail (Fig. 7). The NOAA North American Meso-scale (NAM) forecast model may also be accurate enough to allow successful revisions using the proposed procedure.

Fig. 7.

The weekly average wind speed prediction RMSE: NOAA = 4.65, D-PNN = 2.99, Helena

Figure 7 shows the daily wind speed prediction RMSEs of the original NOAA and the minimal-error D-PNN correction models over a week period; the x-axis represents the ideal (real) approximation.

6 Conclusions

Meteorological conditions mostly do not change fundamentally within short time periods, which allows forming models that represent the actual data relations of the forecasted day. An appropriate NWP data analysis can detect overnight weather changes, followed by days on which out-of-date models show large prediction errors, in order to reject the failed revisions. The results of the proposed method are naturally bound to the accuracies of the input NWP model outcomes, which are not completely valid and which enter the correction models. The presented D-PNN output wind speed predictions were compared with the observations over the complete 24-hour forecasted interval, which are clearly not known in real time, to select the best PDE correction solutions. It is necessary to estimate the optimal daily training errors of the D-PNN, or to test its models with the previous day's several-hours-past forecasts and corresponding last observations, to eliminate the weather dynamics and the NWP output inaccuracy in real predictions. The approximate training or testing parameters will somewhat degrade the presented minimal prediction errors; however, some improvements in the NWP are feasible according to the experiments.