
1 Introduction

Electricity is of vital importance to every region as an essential energy resource in daily life, and it is all the more precious in underdeveloped areas where supply is deficient. Short-term electric load forecasting (STLF) is a fundamental tool for transmission dispatching, unit commitment and other public utilities [1]. During the past decades, various models based on mathematics or artificial intelligence [2–7] have been introduced for electric load forecasting. All of these methods improve forecasting performance to a certain degree. However, individual methods are too simple to deal comprehensively with the forecasting problem. As Moghram and Rahman [8] stated, all of the individual methods failed to yield desirable forecasting performance. Forecasting errors have a tremendous impact on economic benefits; in other words, a small increase in forecasting accuracy may save millions of dollars, which is significant to the electricity market. In order to enhance forecasting performance, emphasis has been laid on hybrid or combined models. By integrating multiple methods, a forecasting model can process data with different characteristics and overcome the flaws of the individual methods. Amina et al. [9] proposed a novel fuzzy wavelet neural network model for the Greek island of Crete; by using subtractive clustering optimized with the Expectation-Maximization algorithm, the results indicated that it provided significantly better forecasts. Xiao et al. [10] introduced a hybrid forecasting model combining BP and GRNN, in which data preprocessing was applied ahead of forecasting to improve accuracy. A hybrid method with empirical mode decomposition (EMD), extended Kalman filter, extreme learning machine (ELM) and particle swarm optimization (PSO) was utilized by Liu et al. [11] to forecast short-term load. Examinations of combined methods were proposed in [12–14]. All of these hybrid and combined methods enhance forecasting performance to a large extent, which makes them the preferred choice when forecasting accuracy is the main concern.

In this paper, a combined model based on the generalized regression neural network (GRNN), the Elman neural network, the least squares support vector machine (LSSVM) and simulated annealing (SA) is introduced. GRNN is a kind of radial basis function (RBF) network with strong nonlinear mapping capacity and a high degree of fault tolerance; its network structure is simple and its calculation can achieve global convergence. Xia et al. [15] applied GRNN to short-term load forecasting and virtual instrument design. Chelgani et al. [16] used GRNN to predict the effects of microwave irradiation pretreatment and peroxyacetic acid desulfurization of coal; the results showed that GRNN performed well at prediction. Li et al. [17] employed a hybrid model based on GRNN optimized by the fruit fly optimization algorithm. Elman is a dynamic neural network with a short-term memory, which makes it well suited to time-variant characteristics. Kelo et al. [18] proposed a wavelet Elman neural network for short-term load prediction under the influence of temperature. Song [19] studied the weight convergence of Elman networks. Li et al. [20] utilized chaotifying linear Elman networks. LSSVM is a machine learning method with strong nonlinear data processing ability; it reduces the complexity of the calculation and improves the solving speed. Shayeghi et al. [21] proposed a hybrid model to forecast day-ahead electricity prices, in which a chaotic gravitational search algorithm was developed to find the optimal parameters of LSSVM. Zhang et al. [22] utilized an unbiased LSSVM with a polynomial kernel. Xie et al. [23] applied a clustering-LSSVM to forecast electricity prices. SA is a stochastic optimization algorithm based on a Monte Carlo iterative solution strategy; it can effectively avoid falling into local minima and finally approaches the global optimum. In this paper, SA not only optimizes the parameters of GRNN but also determines the weight coefficients of the three individual methods. Hong [24] introduced a seasonal SVR optimized by simulated annealing to forecast traffic flow. Yuan et al. [25] proposed a cloud theory-based simulated annealing algorithm and its application. Pai et al. [26] utilized a support vector machine optimized by simulated annealing for electricity load forecasting. In the proposed model, the three individual methods are first applied to forecast respectively. Then, SA is employed to determine the weight coefficient of each individual method. At last, by multiplying the three forecasting results with the corresponding weights, the final result of the combined model is attained.

The rest of this paper is organized as follows. Section 2 introduces the theory of GRNN, Elman, LSSVM and SA, which constitute the proposed model. The implementation process of the proposed combined model is described in Sect. 3. In Sect. 4, a simulation of electric load forecasting on the Australian electricity market is presented, and the comparison results demonstrate the superiority of the proposed method. Finally, Sect. 5 concludes the paper.

2 Methodologies

All the individual methods are introduced in this section, including the generalized regression neural network, the Elman neural network, the least squares support vector machine and the simulated annealing algorithm.

2.1 Elman Neural Network

The Elman neural network (ElmanNN), first proposed by Elman in 1990 [27], is a feed-forward network with local feedback whose structure includes an input layer, a particular context layer, a hidden layer (middle layer) and an output layer. Linear or nonlinear functions are applied as the transfer functions of the Elman network. The connections between layers are similar to those of a feed-forward network, and the context layer, which can be seen as a one-step delay operator, records the previous outputs of the hidden layer. The framework of the ElmanNN is shown in Fig. 1 and its state space is expressed as:

Fig. 1. The framework of a feedback ElmanNN with a three-layer structure.

$$ y\left( k \right) = g\left( {\omega^{3} x\left( k \right)} \right) $$
(1)
$$ x\left( k \right) = f\left( {\omega^{1} x_{c} \left( k \right) + \omega^{2} u\left( {k - 1} \right)} \right) $$
(2)
$$ x_{c} \left( k \right) = x\left( {k - 1} \right) $$
(3)

In Fig. 1, \( y \) is the m-dimensional output vector; \( x \) is the n-dimensional hidden layer unit vector; \( u \) is the r-dimensional input vector; \( x_{c} \) is the n-dimensional feedback state vector; \( \omega^{3} \) is the weight from the hidden layer to the output layer; \( \omega^{2} \) is the weight from the input layer to the hidden layer; \( \omega^{1} \) is the weight from the context layer to the hidden layer; \( b_{1} \) and \( b_{2} \) are the threshold values of the hidden layer and the output layer, respectively; \( g\left( * \right) \) and \( f\left( * \right) \) are the transfer functions of the output neurons and the hidden neurons, respectively.
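
To make the recursion of Eqs. (1)–(3) concrete, the following minimal NumPy sketch runs the Elman forward pass through time. The dimensions, the random weights and the choice of tanh for \( f \) and the identity for \( g \) are illustrative assumptions, not the trained network used later in the paper.

import numpy as np

rng = np.random.default_rng(0)
r, n, m = 48, 15, 48                      # input, hidden and output sizes (assumed)

# Random stand-ins for trained weights: w1 context->hidden,
# w2 input->hidden, w3 hidden->output.
w1 = rng.normal(scale=0.1, size=(n, n))
w2 = rng.normal(scale=0.1, size=(n, r))
w3 = rng.normal(scale=0.1, size=(m, n))

def elman_step(u_prev, x_c):
    """One recurrent step of Eqs. (1)-(3)."""
    x = np.tanh(w1 @ x_c + w2 @ u_prev)   # Eq. (2), f taken as tanh
    y = w3 @ x                            # Eq. (1), g taken as identity
    return y, x                           # x becomes x_c at the next step, Eq. (3)

x_c = np.zeros(n)                         # initial context state
for u in rng.normal(size=(5, r)):         # toy input sequence
    y, x_c = elman_step(u, x_c)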

2.2 General Regression Neural Network

The general regression neural network (GRNN), proposed by Donald F. Specht in 1991 [28], is a variation of the radial basis neural network designed for function approximation and regression. A GRNN is an effective feed-forward neural network model with a standardized dot-product weight function, and it consists of four layers: the input layer, the pattern layer, the summation layer and the output layer. A radial basis function (RBF) is used for the transformation from the input layer to the hidden layer, while a special linear transformation is used from the hidden layer to the output layer. In addition, the network can largely avoid the impact of subjective human assumptions on the prediction result, because GRNN has only one manually adjusted parameter, the smoothing factor, and network learning depends entirely on the data samples. Figure 2 shows the structure of the GRNN.

Fig. 2. The structure of the general regression neural network.

The first layer is the input layer, whose number of neurons equals the number of input parameters. The second layer is the RBF hidden layer, whose number of neurons equals the number of training samples; the Gaussian function, which contains the smoothing factor, is normally used as its transfer function. The smaller the smoothing factor, the stronger the function approximation ability. The third layer is a simple linear output layer.
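
As a rough sketch of this layer structure, the following function computes GRNN predictions directly from the training samples: a Gaussian pattern layer followed by a normalized summation layer, with the smoothing factor sigma as the single tunable parameter. The toy data and the value of sigma are assumptions for illustration.

import numpy as np

def grnn_predict(X_train, Y_train, X_test, sigma):
    """GRNN regression: Gaussian pattern layer, normalized summation layer."""
    preds = []
    for x in X_test:
        d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to all samples
        k = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer activations
        preds.append(k @ Y_train / np.sum(k))     # summation and output layers
    return np.array(preds)

# Toy usage: a smaller sigma fits the training data more tightly.
rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 3))
y = X.sum(axis=1)
print(grnn_predict(X, y, X[:5], sigma=0.1))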

2.3 Least Squares Support Vector Machine

The least squares support vector machine (LSSVM), developed on the basis of statistical learning theory, is a newer kind of machine learning technique. Suykens et al. [29] proposed LSSVM by changing the constraint condition and risk function of the SVM, so that the quadratic programming problem is directly converted into a set of linear equations. This means that LSSVM accelerates the solving process and reduces the use of computing resources, which improves the convergence speed.

For a given sample set \( \left\{ {x_{i} ,y_{i} } \right\} \), \( i = 1,2, \cdots ,N \), where \( x_{i} \in R^{n} \) is the input vector and \( y_{i} \in R \) is the output, the structural risk minimization principle leads to the following constrained optimization problem:

$$ \hbox{min} :J\left( {\omega ,{\text{e}}} \right) = \frac{1}{2}\left\| \omega \right\|^{2} + \frac{1}{2}C\sum\limits_{i = 1}^{N} {e_{i}^{2} } $$
(4)
$$ s.t.:y_{i} = \omega^{T} \varphi \left( {x_{i} } \right) + b + e_{i} \,i = 1,2, \cdots ,N $$
(5)

where \( e_{i} \) denotes the error and \( C \) is the penalty coefficient, which controls the degree to which the sample errors are penalized.

The optimization problem can be solved by introducing the Lagrange function:

$$ L\left( {\omega ,b,e,\alpha } \right) = J(\omega ,e) - \sum\limits_{i = 1}^{N} {\alpha_{i} [\omega^{T} \varphi (x_{i} ) + b + e_{i} - y_{i} ]} $$
(6)

where \( \alpha = (\alpha_{1} ,\alpha_{2} , \cdots ,\alpha_{N} )^{T} \) is the vector of Lagrangian multipliers.

After solving the linear equation, the final expression is as follows:

$$ f\left( x \right) = \sum\limits_{i = 1}^{N} {\alpha_{i} } k\left( {x,x_{i} } \right) + b $$
(7)

where \( k\left( {x,x_{i} } \right) \) is the inner-product kernel function. In this paper, the radial basis function (RBF) kernel is chosen because it is the most effective at dealing with nonlinear regression problems.
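
The linear system mentioned above can be written down and solved directly. The sketch below assembles the standard LSSVM dual system with an RBF kernel and solves it with NumPy; the values of the penalty coefficient C and the kernel width gamma are illustrative assumptions.

import numpy as np

def rbf_kernel(X1, X2, gamma):
    d2 = (np.sum(X1 ** 2, axis=1)[:, None] + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=10.0, gamma=0.5):
    """Solve the LSSVM dual as one linear system (no quadratic programming)."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, gamma) + np.eye(N) / C  # C regularizes the diagonal
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                               # bias b, multipliers alpha

def lssvm_predict(X_train, b, alpha, X_test, gamma=0.5):
    return rbf_kernel(X_test, X_train, gamma) @ alpha + b  # Eq. (7)

# Toy usage on a simple nonlinear target.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, X[:5]))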

2.4 Simulated Annealing Algorithm

Simulated annealing (SA) is a widely used stochastic optimization algorithm based on a Monte Carlo iterative solution strategy. By simulating the annealing process of solid materials in physics, complex problems such as NP-complete combinatorial problems can be solved.

Starting from a random feasible solution and given initial control parameters, SA repeatedly carries out the iterative process of "generate a new state — evaluate — accept/reject". That is to say, while the control temperature \( T \) stays the same, a relatively optimal solution of the combinatorial optimization problem can be obtained by repeating the Metropolis algorithm. The control value of \( T \) is then reduced and the Metropolis algorithm is repeated under each \( T \); when \( T \) approaches zero, the final globally optimal solution of the combinatorial optimization problem is acquired. The four steps of SA are as follows (a minimal code sketch follows the list):

  1. Initialization: choose the initial temperature and the length of the Markov chain, which stands for the number of iterations of the Metropolis algorithm.

  2. Generate a new state: reduce the control temperature according to its attenuation function. Every time the temperature is reduced, a random disturbance generates a new state.

  3. Generate a new solution: determine whether to accept the newly generated state as the new solution according to the acceptance function.

  4. Obtain the optimal solution: the optimal solution is procured once the stop criterion is met.
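
A minimal sketch of these four steps is given below, assuming geometric cooling and a Gaussian random disturbance; the initial temperature, cooling rate and chain length are illustrative values, not the settings used in the simulation.

import numpy as np

def simulated_annealing(cost, x0, T0=1.0, cooling=0.95, chain_len=30, T_min=1e-4):
    """Generic SA following the four steps above."""
    rng = np.random.default_rng(3)
    x = np.asarray(x0, dtype=float)
    fx = cost(x)
    best_x, best_f = x.copy(), fx                        # Step 1: initialization
    T = T0
    while T > T_min:                                     # Step 4: stop criterion
        for _ in range(chain_len):                       # Markov chain at fixed T
            cand = x + rng.normal(scale=T, size=x.shape)  # Step 2: new state
            fc = cost(cand)
            # Step 3: Metropolis acceptance rule
            if fc < fx or rng.random() < np.exp((fx - fc) / T):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = x.copy(), fx
        T *= cooling                                     # attenuation function
    return best_x, best_f

# Toy usage: minimize a simple quadratic.
print(simulated_annealing(lambda v: float(np.sum((v - 3.0) ** 2)), np.zeros(2)))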

3 The Proposed Method

This paper proposes a new combined prediction model based on several power load forecasting methods.

The details of the combined model are shown as follows:

Step 1. Import the training data. Then build the input and output sets of the training set and the input and output sets of the testing set.

Step 2. Build the Elman model according to the input and output sets. Then set the search range for the number of intermediate layer nodes and find the optimum value by iterating over it. Finally, use the training set to train the ElmanNN model and obtain the forecast result set \( \hat{Y}_{1} \).

Step 3. Build the GRNN model according to the input and output sets. Then choose the smoothing factor by using the simulated annealing algorithm, setting the parameters of the algorithm beforehand. Finally, use the training set to train the GRNN model and obtain the forecast result set \( \hat{Y}_{2} \).

Step 4. Build the LSSVM model according to the input and output sets. Then use the training set to train the LSSVM model and obtain the forecast result set \( \hat{Y}_{3} \).

Step 5. Build the combined forecast model as the linear weighted combination of the above three models:

$$ \hat{Y} = \alpha_{1} \hat{Y}_{1} + \alpha_{2} \hat{Y}_{2} + \alpha_{3} \hat{Y}_{3} $$
(8)

where \( \alpha_{1} \), \( \alpha_{2} \) and \( \alpha_{3} \) are the weight coefficients.

Then, use the simulated annealing algorithm to optimize the weight coefficients, again setting the parameters of the algorithm beforehand (a code sketch of this weighting step is given after the step list).

Step 6. Test the trained combined model. Apply the combined model obtained in Step 5 to forecasting.

Step 7. Output the predicted results and calculate the accuracy.
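
As an illustration of Steps 5–7, the sketch below optimizes the weight coefficients of Eq. (8) by minimizing the MSE of the weighted forecast. SciPy's dual_annealing is used here as a convenient stand-in for the paper's SA routine, and the bounds on the weights and the synthetic forecasts are assumptions.

import numpy as np
from scipy.optimize import dual_annealing

def fit_combination_weights(Y1, Y2, Y3, y_true, seed=0):
    """Find (a1, a2, a3) of Eq. (8) that minimize the MSE of the combination."""
    P = np.column_stack([Y1, Y2, Y3])    # one column per individual forecast

    def mse(w):
        return float(np.mean((P @ w - y_true) ** 2))

    res = dual_annealing(mse, bounds=[(0.0, 1.0)] * 3, seed=seed)
    return res.x

# Toy usage: three noisy forecasts of a known 48-point signal.
rng = np.random.default_rng(4)
y = np.sin(np.linspace(0.0, 6.0, 48))
Y1, Y2, Y3 = (y + rng.normal(scale=s, size=48) for s in (0.05, 0.10, 0.20))
print(fit_combination_weights(Y1, Y2, Y3, y))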

4 Simulation

This section presents the simulation of the proposed combined model.

4.1 The Performance Indexes

In this simulation, three generally adopted error indexes are used: the mean square error (MSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE), given as follows:

$$ MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {x_{i} - \hat{x}_{i} } \right)^{2} } $$
(9)
$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {x_{i} - \hat{x}_{i} } \right|} $$
(10)
$$ MAPE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{x_{i} - \hat{x}_{i} }}{{x_{i} }}} \right|} \times 100\% $$
(11)

where \( x_{i} \) is the actual value of sample \( i \), \( \hat{x}_{i} \) is the predicted value, and \( n \) is the number of samples.
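
The three indexes translate directly into code; a minimal NumPy version is shown below.

import numpy as np

def error_indexes(x, x_hat):
    """MSE, MAE and MAPE of Eqs. (9)-(11); x holds the actual values."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)
    mae = np.mean(np.abs(x - x_hat))
    mape = np.mean(np.abs((x - x_hat) / x)) * 100.0   # reported in percent
    return mse, mae, mape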

4.2 The Process of Simulation

The original data used in this paper come from the Queensland (QLD) electricity market of Australia and were gathered once every half hour from 0:30 on October 16, 2014 to 0:00 on October 1, 2015, a total of 16800 data items. The 16464 items gathered from 0:30 on October 16, 2014 to 0:00 on September 24, 2015 form the training set, and the remaining 336 items form the test set. The training data were divided into seven groups, one per day of the week, and a combined model was established for each group. Each combined model uses the data (48 items) from the previous week's Monday to forecast the following Monday's data (48 items), Tuesday to forecast Tuesday, and so on.
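
A sketch of this weekday-wise pairing is given below, assuming the half-hourly series is stored as a flat NumPy array; the weekday indexing (0 taken as the first day in the record) and the synthetic data are assumptions for illustration.

import numpy as np

def weekday_pairs(series, weekday, points_per_day=48, days_per_week=7):
    """Pair each weekday's 48 points with the same weekday one week later."""
    days = series.reshape(-1, points_per_day)          # one row per day
    idx = np.arange(weekday, len(days) - days_per_week, days_per_week)
    X = days[idx]                                      # previous week's day
    Y = days[idx + days_per_week]                      # following week's day
    return X, Y

# Toy usage on a synthetic half-hourly load series of 350 days (16800 points).
series = np.random.default_rng(5).uniform(5000.0, 9000.0, size=350 * 48)
X0, Y0 = weekday_pairs(series, weekday=0)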

Firstly, seven ElmanNN models, each consisting of an input layer, an output layer, a hidden layer and a recurrent layer, were adopted to forecast the whole week's electrical load data. The numbers of input layer nodes and output layer nodes of each ElmanNN model were all set to 48. To achieve better prediction results, the number of hidden layer nodes was searched from 10 to 20, and the number of hidden layer nodes with the minimum value of MAPE was selected. The optimal numbers of intermediate layer nodes of the ElmanNN models are shown in Table 1. The optimal number of middle layer nodes was 15 for the Sunday and Monday models and 16 for the Saturday and Tuesday models; for the Wednesday, Thursday and Friday models it was 11, 17 and 12, respectively.

Table 1. The optimal value of intermediate layer nodes of each ElmanNN model.

Secondly, seven GRNN models, each consisting of an input layer, a pattern layer, a summation layer and an output layer, were adopted to forecast the whole week's electrical load data. The numbers of input layer nodes and output layer nodes were both set to 48. The simulated annealing algorithm with the default parameters and an annealing chain length of 30 was used to find the optimal smoothing parameter.

Thirdly, seven LSSVM models, each with 48 inputs, 48 outputs and the RBF kernel function, were adopted to forecast the whole week's electrical load data.

Finally, the above three models were combined by a linear weighted combination. A combination forecasting model in the form of an appropriate weighted average can improve the accuracy and reliability of the prediction. The main concern of a combined forecast model is how to calculate the weight coefficients so as to make the combination more effective and improve the prediction precision. Therefore, the simulated annealing algorithm was adopted to optimize the weights. First of all, the initial values of the three combination weights were set to 0.33, 0.33 and 0.33. Then the simulated annealing algorithm with the default parameters and an annealing chain length of 30 was adopted to optimize them. The combination weights optimized by the simulated annealing algorithm are given in Table 2, in which \( \alpha_{1} \), \( \alpha_{2} \) and \( \alpha_{3} \) are the combination weights of the ElmanNN, GRNN and LSSVM models, respectively. It can be clearly seen that on Saturday and Sunday the second weight \( \alpha_{2} \) is 0.9956 and 0.9958, respectively, which indicates that the results for those two days depend largely on the GRNN model. The weights for Friday and Monday show that the LSSVM model, with \( \alpha_{3} \) values of 0.9956 and 0.9858, is extremely important for the combined model, and the weight \( \alpha_{1} \) for Thursday shows that the ElmanNN model plays a very significant role there. In addition, for Tuesday and Wednesday, the combined model is jointly affected by the ElmanNN and GRNN models.

Table 2. The optimized combination weights by the simulated annealing algorithm

4.3 The Results and Analysis of Simulation

Figure 3 shows the actual values and the forecasting values of the four methods (Elman, GRNN, LSSVM and the combined model) over the whole week. As seen from Fig. 3, the curve of the combined model is more consistent with the actual data. Although the forecasting errors are larger between 12:00 and 18:00, the figure still reveals that the combined model is better than the other three individual models.

Fig. 3. Final predicted values for each day by the four methods.

Table 3 presents the three indicators of the four forecasting methods numerically. It can be clearly seen that the whole-week average values of the combined model are the lowest: its MAPE is as low as 1.72 %, and its MAE, MSE and MAPE are the lowest of all four methods. This means the proposed combined model has the best forecasting performance. Comparing the three individual methods, GRNN has the largest forecasting errors on Monday and Thursday, LSSVM has the largest on Tuesday, while Elman has the largest on Wednesday and Friday. The results show visually that different forecasting methods yield different forecasts, which sometimes meet the demands and are sometimes unsatisfactory. Table 3 thus confirms the high forecasting performance of the proposed combined model from another perspective.

Table 3. Three statistics measures of the four forecasting methods.

5 Conclusions

Interfered with by various factors, time series data have plenty of complex characteristics. Considering that an individual method cannot deal with all kinds of data, a novel combined model for STLF is presented in this paper. The proposed model combines the generalized regression neural network (GRNN), the Elman neural network and the least squares support vector machine (LSSVM). Through optimization by simulated annealing (SA), each individual method is assigned a weight coefficient, and by multiplying the three forecasting results by the corresponding weight coefficients, the final forecasting results are attained. In order to verify the performance of the combined model, electric load data from Queensland, Australia are utilized. The results show that the average MAPE of the combined model is 1.72 %, which is lower than that of the existing hybrid model named MFES proposed by Zhao et al. [30]; it reduces the MAPE of MFES by 29.04 %. The comparison results demonstrate the excellent performance of the combined model. The reasons why the combined model has higher forecasting accuracy are as follows. Firstly, the proposed model combines two kinds of artificial neural networks with strong forecasting performance; of these, GRNN is optimized by SA, which makes it more accurate. Secondly, instead of the traditional average allocation method, SA is employed to determine the weight coefficient of each individual model, so that the advantages of each model are exploited and higher forecasting accuracy is obtained. In a nutshell, the combined model outperforms the individual models and, with its higher accuracy, is a promising tool for the future.