1 Introduction

In the electricity industry, accurate price prediction [1] is of great importance for market participants, who rely on it to reduce risk in their decision making. Historically, great research effort has been devoted to developing accurate and reliable methods for electricity price prediction [2–5]. State-of-the-art techniques include time-series methods such as the Autoregressive Integrated Moving Average (ARIMA) [6, 7] and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) [8] models, as well as machine learning methods [9]. Among these techniques, machine learning models have attracted the largest share of research attention, mainly because of their strong nonlinear modeling capacity [10, 11]. With respect to machine learning approaches for electricity price prediction and other power system applications, the three main categories of learning algorithms relevant to this paper are as follows.

  • The first category comprises conventional machine learning tools, for instance artificial neural networks (ANNs) [2, 3] and support vector machines (SVMs) [10]. The chief merit of such approaches is their ability to extract nonlinear relationships from input and output data, and they have therefore been developed and widely adopted in the power engineering domain over the past decades. However, the approaches in this category mainly rely on gradient-descent information to guide the training of their forecasting models, and they are often regarded as lacking efficient strategies to escape premature convergence and the many local minima.

  • The second category is the extreme learning machine (ELM) [12–15], a more recent approach. ELM and its variants share one distinguishing capability: fast training speed. They first initialize the hidden nodes randomly and then obtain the output weights via the Moore-Penrose pseudoinverse [12]; that is, the output weight vector is treated as a solution of a system of linear equations. For many ordinary low-dimensional regression or classification problems, this kind of method is sufficient to obtain better results than the traditional ones in both training speed and training accuracy.

  • The third category, evolutionary ELM (E-ELM), uses evolutionary algorithms to evolve an optimal result from a population, usually of constant size, consisting of candidate ELMs. Notably, ELM training depends mainly on a single step that computes a matrix solution instead of training iteratively: the output weight vector is obtained from the hidden-layer output matrix and randomly assigned parameters. This is the key reason why ELM is usually superior to conventional approaches in training speed while still maintaining good learning performance. However, computing the solution from partly random assignments can make the final output unstable, sometimes yielding a better training result and sometimes a worse one, e.g., for high-dimensional problems as in [1]. Moreover, even though ELM computes the output weights quickly, a single computation may fail to find a satisfactory regression or classification result [16]. Overall, this variation in solution quality leaves room for E-ELM to improve.

Although E-ELM can improve the final learning quality, it also increases the time complexity, mainly for two reasons. First, until the final optimum is found, the evolutionary algorithm must evaluate a whole population of ELMs, rather than a single ELM, iteratively for up to the maximum number of generations. Second, the cost of the evolutionary algorithm itself grows as the dimensionality of the decision space expands. According to [17], each random matrix of hidden weights corresponds to a vector of output weights, and better hidden and output weights lead to a lower root mean square error, i.e., a better regression or classification result for ELM. The hidden weights can be concatenated into a single row vector serving as a candidate solution for the objective function of ELM; the optimum is obtained by evaluating each candidate solution and comparing it with the other solutions in the population. For example, if the data have 100 dimensions and the number of hidden nodes is 10, the dimensionality of a solution individual reaches 1000, and the dimensionality of power market data or electricity load forecasting data is usually over 100. The self-adaptive evolutionary extreme learning machine (SaE-ELM) [18] is a representative E-ELM method, which obtains the output weights of a single-hidden-layer feed-forward network (SLFN) with some promising features. However, when training on high-dimensional data, SaE-ELM is also time-consuming in its evolutionary iterations.

To improve both the speed of E-ELM and the robustness of its optimum search in high-dimensional spaces, this research considers whether some rational analysis can be used to guide the search, especially in a high-dimensional decision space with a complicated objective function. As mentioned above, in conventional neural networks the gradient provides rapid search guidance, though it often leads to local optima. Gradient information might therefore be properly exploited in E-ELM to provide rational search directions and so accelerate the whole optimization procedure. Unfortunately, the objective function of E-ELM is too complicated to extract gradient information directly between the decision vector and its fitness value; moreover, the basic framework of ELM itself seems to leave gradient-based ideas behind. Even so, the rest of this paper proposes a new approximation model that not only dynamically provides an approximate mapping to replace the original functional relationship of E-ELM within a comparatively small region, but is also compatible with the differential evolution (DE) framework. Based on this model, a hybrid DE-like algorithm is developed to ensure that the new E-ELM not only enters the globally optimal region faster than a purely stochastic search, but also obtains high-quality solutions more reliably than plain gradient methods. Overall, faster convergence and better solution quality for E-ELM are the two mandatory objectives considered in this paper.

The rest of the paper is organized as follows. Section 2 outlines the relevant background. The new approximation model is presented in Sect. 3. In Sect. 4, a new evolutionary algorithm for E-ELM learning on high-dimensional data is proposed. Experimental results and discussion are given in Sect. 5. Finally, conclusions and future work are provided in Sect. 6.

2 Brief reviews

For completeness, this section briefly reviews the original extreme learning machine (ELM) as well as some other methods used in the rest of this paper.

2.1 Extreme learning machine (ELM)

ELM has shown good performance on generalized single-hidden-layer feed-forward networks (SLFNs) since it was proposed. It differs from traditional neural networks in its hidden layer: the random generation of hidden nodes is the main feature of ELM. Its basic working mechanism is briefly summarized as follows.

Given N training samples \(\left\{ {\left( {x_i ,t_i } \right) } \right\} _{i=1}^N \), which can also be described in matrix form as \(\left\{ {\left( {P,T_{tar} } \right) } \right\} \), where P is the \(D\times N\) real matrix of input data and \(T_{tar} \) is the \(N\times 1\) target vector, let H be an \(L\times D\) real matrix of hidden-layer parameters generated randomly and \(\beta \) the \(L\times 1\) real vector of output weights. Their mathematical relationship can be expressed as Eq. (1)

$$\begin{aligned} f\left( {H \cdot P+Bias} \right) ^{\mathrm{T}} \cdot \beta =T_{tar} \end{aligned}$$
(1)

where Bias is an \(L\times N\) real matrix and \(f\left( \cdot \right) \) is an activation function [11], for instance the log-sigmoid function,

$$\begin{aligned} \sigma \left( t \right) =\frac{1}{1+e^{-c \cdot t}} \end{aligned}$$
(2)

where c is a slope parameter.

Usually, Eq. (1) is written briefly as Eq. (3)

$$\begin{aligned} \hbox {H} \cdot \beta =T_{tar} \end{aligned}$$
(3)

where \(\hbox {H}=f\left( {H \cdot P+Bias} \right) ^{\mathrm{T}}\) is an \(N\times L\) matrix. ELM uses the Moore-Penrose pseudoinverse \(\hat{\mathrm{H}}^{\dagger }\) and the target vector \(T_{tar} \) to obtain a least-squares solution of the linear system in Eq. (3). That is, the least-squares solution for the output weight vector \(\beta \) is analytically determined by Eq. (4)

$$\begin{aligned} \hat{\beta }=\hat{\mathrm{H}}^{\dagger } \cdot T_{tar} \end{aligned}$$
(4)

where

$$\begin{aligned} \hat{\mathrm{H}}^{\dagger }=\left\{ {{\begin{array}{l@{\quad }l} \hbox {H}^{\mathrm{T}}\cdot \left( {\frac{I}{C}+\hbox {H}\cdot \hbox {H}^{\mathrm{T}}} \right) ^{-1},&{} N<L \\ \left( {\frac{I}{C}+\hbox {H}^{\mathrm{T}}\cdot \hbox {H}} \right) ^{-1}\cdot \hbox {H}^{\mathrm{T}},&{} L<N \\ \end{array} }} \right. , \end{aligned}$$
(5)

and C is a trade-off constant; see [12–15] for more details.

Instead of following the traditional gradient-descent approach, ELM minimizes the training cost function of Eq. (6) using the result obtained from Eq. (4).

$$\begin{aligned} RMSE=\sqrt{mse\left( {\hbox {H} \cdot \hat{\beta } -T_{tar} } \right) } \end{aligned}$$
(6)

where \(mse\left( \cdot \right) \) is the function measuring network performance as the mean of the squared errors.
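Putting Eqs. (1)–(6) together, one ELM training pass can be sketched as below (a hedged NumPy sketch; the function names, the uniform initialization range and the choice c = 1 are illustrative assumptions, not details fixed by the paper):

```python
import numpy as np

def elm_train(P, T_tar, L, C=1e6, seed=0):
    """Basic ELM: random hidden layer, then the analytic solution of Eq. (4).

    P : D x N input matrix, T_tar : N x 1 target vector (notation of Sect. 2.1),
    L : number of hidden nodes, C : trade-off constant of Eq. (5).
    """
    rng = np.random.default_rng(seed)
    D, N = P.shape
    Hw = rng.uniform(-1.0, 1.0, size=(L, D))        # random hidden weights H
    b = rng.uniform(-1.0, 1.0, size=(L, 1))
    Bias = b @ np.ones((1, N))                      # L x N bias matrix
    Hmat = 1.0 / (1.0 + np.exp(-(Hw @ P + Bias)))   # log-sigmoid, Eq. (2), c = 1
    Hmat = Hmat.T                                   # the N x L matrix of Eq. (3)
    # Regularized pseudoinverse, Eq. (5): invert the smaller Gram matrix
    if N < L:
        beta = Hmat.T @ np.linalg.solve(np.eye(N) / C + Hmat @ Hmat.T, T_tar)
    else:
        beta = np.linalg.solve(np.eye(L) / C + Hmat.T @ Hmat, Hmat.T @ T_tar)
    return Hw, Bias, beta, Hmat

def rmse(Hmat, beta, T_tar):
    """Cost function of Eq. (6)."""
    return float(np.sqrt(np.mean((Hmat @ beta - T_tar) ** 2)))
```

For example, with 40 hidden nodes this sketch fits a one-dimensional sine curve with a small training RMSE, illustrating the single analytic step that gives ELM its speed.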

2.2 Basic differential evolution framework

Differential evolution (DE) [19] has been applied to many practical engineering problems since it was proposed in 1995, and DE variants with enhanced search performance have been introduced in [21–23]. Especially for multimodal optimization, researchers tend to combine DE with other evolutionary methods, in so-called hybrid DEs, to improve the overall performance of the algorithm.

In the classical DE framework, the remarkably simple trial-vector generation scheme is the main characteristic distinguishing it from other EAs. It processes a scaled difference of vectors originating from a fixed-size population of decision vectors. Usually three evolutionary operators are involved: mutation, crossover and selection. During the g-th generation of basic DE, a trial vector \(u_{i,g} \) is produced by a crossover operation between the old individual \(x_{i,g} \) and a mutant vector \(v_{i,g} =x_{r0,g} +F_i \cdot \left( {x_{r1,g} -x_{r2,g} } \right) \), where \(F_i (F_i >0)\) is a scaling factor and \(x_{r0,g} ,x_{r1,g} ,x_{r2,g} \) are three independent decision vectors selected randomly from the whole population \(P=\left\{ {x_{1,g} ,x_{2,g} ,\ldots ,x_{NP,g} } \right\} \) in decision space. For each vector \(x_{i,g} \in P\) in turn, a corresponding trial vector \(u_{i,g} \) is generated, and the old vector \(x_{i,g} \) is replaced only if its trial vector \(u_{i,g} \) yields a better objective function value; consequently, \(x_{i,g} \) is also called the target vector in the literature. See [24] for more crossover operators and further DE variants.
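The generation scheme just described can be sketched as follows (a minimal DE/rand/1/bin under common default settings; the function name and parameter defaults are illustrative):

```python
import numpy as np

def de_rand_1(fobj, bounds, NP=30, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Classical DE/rand/1 with binomial crossover and greedy selection.

    fobj : objective to minimize; bounds : (low, high) arrays per dimension.
    """
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    D = low.size
    pop = low + rng.random((NP, D)) * (high - low)
    fit = np.array([fobj(x) for x in pop])
    for g in range(max_gen):
        for i in range(NP):
            # mutation: three mutually distinct random indices, all != i
            r0, r1, r2 = rng.choice([j for j in range(NP) if j != i], 3,
                                    replace=False)
            v = pop[r0] + F * (pop[r1] - pop[r2])
            # binomial crossover, forcing at least one mutant component
            mask = rng.random(D) < CR
            mask[rng.integers(D)] = True
            u = np.clip(np.where(mask, v, pop[i]), low, high)
            # greedy selection: the trial replaces the target only if not worse
            fu = fobj(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    return pop[fit.argmin()], float(fit.min())
```

On a 5-dimensional sphere function this sketch converges to near zero well within the default budget, which is the baseline behavior the later experiments compare against.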

2.3 SaE-ELM

The self-adaptive evolutionary extreme learning machine (SaE-ELM) [18] is an upgrade of DE-LM [25] and E-ELM [17]; it chooses its trial-vector generation strategies and the relevant control parameters adaptively. What these methods have in common is that they search over the network input weights and hidden node biases of ELM, aiming to optimize the network output weights. Given a training data set \(X_{D\times N} \), L hidden nodes and an activation function \(f\left( \cdot \right) \), the individuals to be evolved during the g-th generation are encoded as the following vector [18],

\(\theta _{k,g} =\big ( h_{11}^g ,\ldots ,h_{1D}^g ,h_{21}^g ,\ldots ,h_{2D}^g ,\ldots ,h_{L1}^g ,\ldots ,h_{LD}^g ,b_1^g ,\ldots ,b_L^g \big ),\) where \(1\le k\le NP\), NP is the population size, and \(b_i^g ,1\le i\le L\), represents the bias value of the i-th hidden node in generation g.

Based on this encoding, the parameters H and Bias are recovered as follows,

$$\begin{aligned} H=\left[ {{\begin{array}{c} {h_{11}^g ,\ldots ,h_{1D}^g } \\ {h_{21}^g ,\ldots ,h_{2D}^g } \\ \vdots \\ {h_{L1}^g ,\ldots ,h_{LD}^g } \\ \end{array} }} \right] ,P=X_{D\times N} ,Bias=\left[ {{\begin{array}{c} {b_1^g } \\ {b_2^g } \\ \vdots \\ {b_L^g } \\ \end{array} }} \right] \times J_{1\times N}\nonumber \\ \end{aligned}$$
(7)

where \(J_{1\times N} \) is a \(1\times N\) matrix of ones. The corresponding fitness function is then formulated as Eq. (8),

$$\begin{aligned} RMSE=\sqrt{mse\left( {f\left( {H \cdot P_{test} +Bias} \right) ^{\mathrm{T}} \cdot \hat{\beta } -T_{test} } \right) } \end{aligned}$$
(8)

where \(P_{test} \) and \(T_{test} \) are the testing data set and the testing target vector, respectively.

The main aim of such algorithms is to find an optimum H from the population of \(\theta _{k,g} ( 1\le k\le {\textit{NP}})\) over \(g_{max} \) generations. The individuals that survive from generation g to the next must satisfy Eq. (9).

$$\begin{aligned} \theta _{k,g+1} =\left\{ {{\begin{array}{l@{\quad }l} {u_{k,g} ,}&{} {RMSE_{\theta _{k,g} } -RMSE_{u_{k,g} } >\varepsilon \cdot RMSE_{\theta _{k,g} } } \\ {u_{k,g} ,}&{} {|RMSE_{\theta _{k,g} } -RMSE_{u_{k,g} } |<\varepsilon \cdot RMSE_{\theta _{k,g} } } \\ &{}{and\; \Vert \beta _{u_{k,g}}\Vert < \Vert \beta _{\theta _{k,g} } }\Vert \\ {\theta _{k,g} ,}&{} {Otherwise} \\ \end{array} }} \right. \end{aligned}$$
(9)

3 Approximation model

Traditional optimization approaches can be very useful for evolutionary search as long as they are applied properly. In many situations, however, they are difficult to use directly because of a complex functional relationship, which motivates employing simpler approximation models to replace the original fitness function. For example, as shown in Fig. 1, suppose the functional relationship between a variable x and its objective function \(f\left( x \right) \) is complicated, yet in another space a variable vector \(\left( {t_1 ,t_2 } \right) \) maps to the same image \(f\left( x \right) \) through an easier mapping rule \(g\left( {t_1 ,t_2 } \right) \). If a mapping can be found between the space containing x and the space containing \(\left( {t_1 ,t_2 } \right) \), the original problem is simplified, because the complex functional relationship has been replaced by a simpler mapping model. In the following section, a simple first-order approximation model is proposed to imitate this compound mapping between the variables and their objective function. Such an approximate relationship may not be accurate enough to replace the original one over the whole hyper-plane, but within a limited region it can satisfy practical demands [26, 27].

Fig. 1
figure 1

Principle of the approximation model

3.1 First-order approximation model

Without loss of generality, a decision space can be formulated as a hyper-plane defined by one point and two vectors. Let \(x\in { \mathbb {R}}^{n}\) be an arbitrary point in the decision space \({\mathbb {R} }^{n}\), also written as the decision vector \((x_1 ,x_2 ,\ldots ,x_n )^{\mathrm{T}}\), and let L be the hyper-plane. Suppose \(x^{0}\ne x^{1}\ne x^{2}\) are three distinct points selected randomly from \({\mathbb {R}}^{n}\); then any point \(x\in L\) can be written in the form of Eq. (10)

$$\begin{aligned} x=x^{0}+ t_1 \cdot \left( {x^{1}-x^{0}} \right) + t_2 \cdot \left( {x^{2}-x^{0}} \right) \end{aligned}$$
(10)

where \(t_1 ,t_2 \) are two independent real variables.

According to Eq. (10), once three arbitrary yet independent points \(x^{0}\ne x^{1}\ne x^{2}\) are fixed, any \(x\in L\) corresponds linearly to the variable vector \(\left( {t_1 ,t_2} \right) \), because the remaining parameters are constants, i.e., \(x\Leftrightarrow \left( {t_1 ,t_2 } \right) \). In other words, any \(x\in L\) can then be evaluated from the vector \(\left( {t_1 ,t_2 } \right) \) via Eq. (10). Therefore, as the decision vector x approaches its optimum \(x^{*}\), there must exist a corresponding variable vector \(\left( {t_1^{*} ,t_2^{*} } \right) \Leftrightarrow x^{*}\), i.e.,

$$\begin{aligned} x^{{*}}=x^{0}+t_1^{*} \cdot \left( {x^{1}-x^{0}} \right) +t_2^{*} \cdot \left( {x^{2}-x^{0}} \right) \end{aligned}$$
(11)

Likewise, for any pair of fitness function \(f\left( x \right) \) and variable x, there exists a pair of image \(g\left( \cdot \right) \) and variable vector \(\left( {t_1 ,t_2 } \right) \) with the same value, i.e., \(g\left( {t_1 ,t_2 } \right) =f\left( x \right) \); the difference is that the functional relationship of \(g\left( \cdot \right) \) is simpler than that of \(f\left( \cdot \right) \). The conversion between \(g\left( \cdot \right) \) and \(f\left( \cdot \right) \) is defined by Eq. (12)

$$\begin{aligned} g\left( {t_1 ,t_2 } \right) =f\left( x \right) =g^{0}+t_1 \cdot \left( {g^{1}-g^{0}} \right) +t_2 \cdot \left( {g^{2}-g^{0}} \right) \end{aligned}$$
(12)

where \(g^{0},g^{1},g^{2}\) can be treated as constants once \(x^{0}\ne x^{1}\ne x^{2}\) have been fixed as mentioned above. To obtain these constants in a simple way, some special points are considered. Substituting \(\left( {t_1 ,t_2 } \right) \) with the vectors (0,0), (1,0), (0,1), respectively, yields \(g^{0}=f\left( {x^{0} } \right) , g^{1}=f\left( {x^{1} } \right) , g^{2}=f\left( {x^{2} } \right) \) directly from Eqs. (12) and (10). Equation (12) thereby also provides an approximation that can replace the original fitness function, since \(g\left( {t_1 ,t_2 } \right) =f\left( x \right) \). Thus the complicated functional relationship between the decision variable \(x\in {\mathbb {R}}^{n}\) and its original image \(f\left( x \right) \) no longer matters, as long as the new mapping between \( g\left( \cdot \right) \) and \(\left( {t_1 ,t_2 } \right) \) is simpler and sufficiently reliable.

3.2 Direction to optimum

As pointed out above, Eq. (12) provides a linear functional relationship between the variable vector \(\left( {t_1 ,t_2 } \right) \) and its image \(g\left( {t_1 ,t_2 } \right) \). By conventional optimization theory, \(g\left( {t_1 ,t_2 } \right) \) at point \(\left( {t_1 ,t_2 } \right) \) has a vector of first partial derivatives, the gradient \(\nabla g\big ( {t_1 ,t_2 } \big )=\big ( \big ( {g^{1}-g^{0}} \big ),\big ( {g^{2}-g^{0}} \big ) \big )\). Hence a local minimum \(\left( {t_1^{*} ,t_2^{*} } \right) \) is most probably located in the direction opposite to \(\nabla g\left( {t_1 ,t_2 } \right) \).

$$\begin{aligned} \left( {t_1^{*} ,t_2^{*} } \right) =\left( {0,0} \right) -\alpha \cdot \nabla g\left( {t_1 ,t_2 } \right) =-\alpha \cdot \nabla g\left( {t_1 ,t_2 } \right) \end{aligned}$$
(13)

where \(\alpha \) is a step parameter. Overall, any three distinct decision variables \(x^{0}\ne x^{1}\ne x^{2}\) allow the local optimum \(x^{*}\) to be deduced via Eqs. (11)–(13), which can be expressed as Eq. (14),

$$\begin{aligned} x^{*}= & {} x^{0}-\alpha \cdot \Big [ \left( {g^{1}-g^{0}} \right) \cdot \left( {x^{1}-x^{0}} \right) \nonumber \\&+\left( {g^{2}-g^{0}} \right) \cdot \left( {x^{2}-x^{0}} \right) \Big ] \end{aligned}$$
(14)
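In code, the whole construction of Eqs. (10)–(14) collapses to a few lines: evaluate the fitness at the three points and step opposite to the plane-model gradient (a sketch; the function name and the convex test function are illustrative):

```python
import numpy as np

def rational_step(f, x0, x1, x2, alpha=1.0):
    """Local-optimum estimate of Eq. (14) from three distinct points.

    g0, g1, g2 are the constants of Eq. (12); moving x0 against the plane
    model's gradient (g1 - g0, g2 - g0) realizes the descent step of Eq. (13).
    """
    g0, g1, g2 = f(x0), f(x1), f(x2)
    return x0 - alpha * ((g1 - g0) * (x1 - x0) + (g2 - g0) * (x2 - x0))
```

On a convex quadratic, a single such step with a moderate alpha already lowers the fitness, without ever differentiating f itself.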

4 Proposed algorithm

In general, no method is flawless, and neither is the new rational approximation model. Aiming at a tradeoff between global exploration and local exploitation, another DE mutation strategy, ‘DE/current-to-best/1’ [24], is enlisted as well to construct the hybrid Rational and Self-adaptive Mutation (RSM) strategy, whose pseudo-code is shown in Fig. 2.

Fig. 2
figure 2

Pseudo-code of producing hybrid trial vectors

4.1 Historical pool

Experimental results show that fitness values evaluated by the new approximation model are very sensitive to the shape formed by the three input individuals \(x_{r0,g} , x_{r1,g} \) and \(x_{r2,g} \); the basic idea is therefore to avoid excessive similarity between the candidates. Borrowing from JADE [21], the RSM mutation uses a historical pool to temporarily hold some of the individuals sifted out of the population. Each time, one of the three distinct individuals, denoted \({\hat{x}}_{r2,g} \), is picked from the union of the current population and the historical pool, while the other two, \(x_{r0,g} \) and \(x_{r1,g} \), are still selected from the current population. The size of the historical pool is set to a quarter of the population size, and the pool is initially empty. Once the pool is full, an individual eliminated from the current population replaces the worst pool member only if the eliminated one is better.

4.2 Self-adaptive parameters

Motivated by [20–22], the new algorithm extends many control parameters into the solution individuals for self-adaptive control (see Table 1). These parameters evolve simultaneously while the classical population of solutions is processed in the evolutionary procedure.

Table 1 Encoding format of self-adapting individuals

In general, better control parameter values correspond to optimal trial vectors. The proposed algorithm therefore uses statistics of recently successful parameters to guide the generation of parameters for the next generation.

The main self-adaptive control parameters, \(Step_{i,g} \in \left[ {0,2} \right] , CR_i^1 ,CR_i^2 \in \left[ {0,0.9} \right] \) and \(F_{i,g} \in \left[ {0.1,1.0} \right] \), are initialized within their definition domains. Successful parameters survive to the next generation, while unsuccessful ones are replaced by draws from a normal distribution with mean \(P_{m,g} \) and standard deviation sigma, as shown in Eq. (15).

$$\begin{aligned} P_{i,g} =P_{m,g} +sigma \cdot randn_{i,g} \end{aligned}$$
(15)

where \(P_{i,g} \) represents the parameter value of the i-th individual in generation g. The sigma of each parameter equals \({\mathrm{min}}\left( {\left| {P_{m,g} -P_{Ub,g} } \right| ,\left| {P_{m,g} -P_{Lb,g} } \right| } \right) \). The mean values are initialized as \(Step_{m,1} =1.1, F_{m,1} =0.6, CR_{m,1}^k =0.6\,\left( {k=1,2} \right) \). The parameter \(Step_{i,g} \) controls the incremental degree of the mutation shown in Fig. 2,

\(v_{i,g}^1 =x_{r0,g} +s \cdot \left[ {t_1 \cdot \left( {x_{r1,g} -x_{r0,g} } \right) +t_2 \cdot \left( {\hat{x}_{r2,g} -x_{r0,g} } \right) } \right] ,\) where \(s=Step_{i,g} /\sqrt{t_1^2 +t_2^2 }\). At the beginning of the evolutionary procedure, \(Step_i \ge 1\) helps the population converge to the optimum quickly, while \(Step_i <1\) favors effective exploitation, especially for solutions approaching the optimum.
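This mutant computation can be sketched as follows (hedged: here \((t_1, t_2)\) is taken as the negative plane-model gradient of Eq. (13), which is one natural reading of Fig. 2; the function name and the zero-direction guard are illustrative assumptions):

```python
import numpy as np

def rsm_mutant(f, x_r0, x_r1, x_r2_hat, step):
    """Rational mutant v1 of Sect. 4.2 with a normalized step length.

    (t1, t2) points opposite the plane-model gradient (Eq. (13)), and
    s = Step / sqrt(t1^2 + t2^2) rescales the move so its length in
    (t1, t2)-space equals Step_{i,g}.
    """
    g0, g1, g2 = f(x_r0), f(x_r1), f(x_r2_hat)
    t1, t2 = -(g1 - g0), -(g2 - g0)
    s = step / max(np.hypot(t1, t2), 1e-12)   # guard against a zero direction
    return x_r0 + s * (t1 * (x_r1 - x_r0) + t2 * (x_r2_hat - x_r0))
```

With a small step the mutant moves downhill on a convex function; larger steps (Step ≥ 1) overshoot deliberately to aid exploration early in the run.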

4.3 Hybrid strategy for trial vector

In the selection procedure, shown in Fig. 3, the two new trial vectors are handled according to three cases:

  • Case 1: \( f\left( {u_{i,g}^1 } \right)<f\left( {u_{i,g}^2 } \right) <f\left( {x_{i,g} } \right) \)

Both trial vectors are successful, i.e., success(i,1) \(=\) 1 and success(i,2) \(=\) 1, and all their parameters are kept for the next generation.

  • Case 2: \( f\left( {u_{i,g}^1 } \right)<f\left( {x_{i,g} } \right) <f\left( {u_{i,g}^2 } \right) \)

\(u_{i,g}^1 \) is a successful trial vector and success(i,1) is set to 1.

  • Case 3: \( f\left( {x_{i,g} } \right)<f\left( {u_{i,g}^1 } \right) <f\left( {u_{i,g}^2 } \right) \)

Neither trial vector succeeds, which means all the parameters need to be adjusted.

At the end of each generation, the mean of each parameter is adjusted by Eq. (16)

$$\begin{aligned} P_{m,g+1} =0.85 \cdot P_{m,g} +0.15 \cdot mean\left( {P_{success,g} } \right) \end{aligned}$$
(16)

where mean(\(\cdot \)) is the arithmetic mean function.
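Equations (15) and (16) amount to the following two helpers (a sketch; the clipping back to the parameter bounds is an assumption on our part, while the constants 0.85 and 0.15 come from Eq. (16)):

```python
import numpy as np

def refresh_params(params, success, P_m, lb, ub, rng):
    """Eq. (15): keep successful parameter values; redraw the rest from
    N(P_m, sigma) with sigma = min(|P_m - ub|, |P_m - lb|)."""
    sigma = min(abs(P_m - ub), abs(P_m - lb))
    fresh = P_m + sigma * rng.standard_normal(params.shape)
    return np.clip(np.where(success, params, fresh), lb, ub)

def update_mean(P_m, successful_values):
    """Eq. (16): pull the mean 15% toward the mean of successful values."""
    if len(successful_values) == 0:
        return P_m
    return 0.85 * P_m + 0.15 * float(np.mean(successful_values))
```

The 0.85/0.15 weighting keeps the mean stable across generations while still letting a run of successful parameter values shift it steadily in their direction.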

Fig. 3
figure 3

Pseudo-code of selection operator

4.4 RSM-DE algorithm

The main body of the RSM-DE algorithm:

Input:

NP: the size of the population;

Maxgen: the maximum number of iterations;

Fitness function;

D: the dimension of the decision space.

Output:

Optimum of the fitness function.

Step 1 Initialization

Create a random initial population \(\{x_{i,0} |i=1,\ldots ,NP\}\) and initialize the parameters within their definition regions.

For \({{\varvec{g}}}={{\varvec{1}}},\ldots ,{{\varvec{Maxgen}}}\) , do

Step 2 Evolution Items

For \({{\varvec{i}}}=\mathbf{1},\ldots ,{{\varvec{NP}}}\) do

Step 2.1 New Parameter Generation: unsuccessful parameters are refreshed based on Eq. (15).

Step 2.2 Mating: One of the P best individuals, \(x_{best,g}^P \), and three other independent individuals, \(x_{r0,g} \ne x_{r1,g} \ne \hat{x} _{r2,g} \), are picked out. \(\hat{x} _{r2,g} \) comes from the union of the current population and the historical pool, while \(x_{best,g}^P \) comes from the current population; \(P=5\) in this paper.

Step 2.3 Call Function RSM-Trial(): produce two trial vectors using the two strategies, respectively.

Step 2.4 Call Function Selection(): select successful trial vectors and parameters into the next generation.

Step 2.5 Renew Historical Pool: if the historical pool is not full, the eliminated individuals are pushed into the pool; otherwise, the worst one in the pool is replaced when the eliminated one is better.

Step 2.6 Summarize the Statistics of Successful Trial Vectors: evaluate the arithmetic mean of each parameter by Eq. (16).

Step 3 Stopping criteria: when the stopping criterion is satisfied, the algorithm stops and outputs the corresponding results; otherwise, it goes to Step 2.

Fig. 4
figure 4

Pseudo-code of RSM-DE-ELM

Table 2 Six single-objective benchmarks

4.5 RSM-DE-ELM

Given a set of training data, a set of testing data, a candidate range for the number of hidden nodes L and an objective function g(\(\cdot )\), the RSM-DE-ELM algorithm is summarized in Fig. 4. \(RSM\_DE\_ELM\left( \cdot \right) \) denotes the procedure that optimizes ELM with the RSM-DE algorithm and returns the optimum of the net. \(\left[ {L_1 ,L_2 } \right] \) is the candidate range, and Train and Test denote the training set and testing set, respectively.

5 Experimental results

In this section, the DE-like part of the proposed algorithm is first compared with other representative differential evolution approaches on several single-objective benchmarks, picked from different test categories, to examine its capacity for finding optima. Then, to assess the performance of the new E-ELM method in the machine learning area, the integrated algorithm is applied to electricity price prediction, and the experimental results are compared against other state-of-the-art algorithms.

Table 3 Summary of QLD RRP from June 2006 to May 2007 (AUD/MWH)
Fig. 5
figure 5

Comparison of convergence curves of DE/rand/1, JADE and RSM-DE on six representative benchmarks

5.1 Performance on single-objective benchmarks

At the beginning of this section, the six representative single-objective benchmark functions listed in Table 2 are used to test the performance of the RSM-DE algorithm. The indices of the benchmarks are kept the same as in [21] for convenient comparison. Functions \(f_2 ,f_4 \) are continuous unimodal functions, \(f_7 \) is a noisy quartic function, and \(f_8 ,f_9 ,f_{12} \) are multimodal functions whose number of local minima increases exponentially with the dimensionality of the problem. The main control parameters are set as follows: the population size is 100 with a 30-dimensional decision space. JADE is a state-of-the-art self-adaptive algorithm; in [21] it showed better performance on many benchmarks than SaDE [20], jDE [22] and PSO [28], so this paper chooses JADE as the main reference for measuring the new algorithm. The experimental results for RSM-DE, JADE [21] and DE/rand/1 [19] are summarized in Table 4, and Fig. 5 shows a group of semi-log convergence graphs for the 30-dimensional problems with median values over 50 independent runs. From these convergence curves, it is easy to see that the RSM-DE algorithm performs very well in the 30-dimensional space. In particular, the new algorithm is very successful on benchmark \(f_4 \), which clearly shows that the approximation model works well, with both a better convergence rate and more robust reliability. Even in the face of the many local minima of benchmarks \(f_8 ,f_9 ,f_{12} \), RSM-DE still accelerates the optimization procedure with promising results, although its mutation operator employs a greedy strategy. The results indicate that if the diversity of the population is maintained properly during evolution, the rational method plays a positive role rather than becoming a crucial problem.

Table 4 Six 30-dimensional benchmarks with 50 independent runs

5.2 RSM-DE-ELM for market price prediction

In this section, several sequential data series extracted from the Australian Energy Market Operator (AEMO) website [29] are used to test the performance of our new method.

For convenient comparison, the first dataset in our case study is a whole year of RRPs from the QLD market, as in [1]. It includes 17,520 observations in total, covering the period 1 June 2006 to 31 May 2007. Without loss of generality, the dataset can be divided into the four Australian seasons; the main features are summarized in Table 3.

  • The dataset format is constructed as follows,

  1. All observation points in every two weeks are defined as input attributes, and their targets are the observations from the following day; that is, the data from the last day of each fifteen-day window form the target set, and each day contains 48 evenly spaced observation points.

  2. In each season, the data from the last seven successive days are used as the testing dataset. For an efficient comparison, four other representative methods, BPNN, RBFNN, ELM and SaE-ELM, are compared with the proposed approach using three criteria, MAE, MAPE and RMSE, which are listed in Eq. (17).

    $$\begin{aligned} \left\{ {{\begin{array}{l} {MAE=\frac{1}{n}\mathop {\sum }\nolimits _{i=1}^n \left| {y_i -t_i } \right| } \\ {MAPE=\frac{1}{n}\mathop {\sum }\nolimits _{i=1}^n \frac{\left| {y_i -t_i } \right| }{y_i }\times 100\,\% } \\ {RMSE=\sqrt{\frac{1}{n}\mathop {\sum }\nolimits _{i=1}^n \left( {y_i -t_i } \right) ^{2}}} \\ \end{array} }} \right. \end{aligned}$$
    (17)
  • Parameter setting for RSM-DE-ELM:

    The population size is \(\textit{NP}=100\) and the maximum number of generations is 60. Since the number of RSM-DE iterations is not high, the mean values of the parameters are initialized as \(Step_{m,1} =1.1,F_{m,1} =0.7,CR_{m,1}^k =0.75\,\left( {k=1,2} \right) \). The candidate range \([L_1 ,L_2 ]\) of the number of hidden nodes L is set to [10, 150].

  • Results analysis:

Table 5 compares the new approach with four existing methods [1, 18]. All results are mean values over multiple trials, comprising 50 independent forecasts for each season model. From Table 5, the new algorithm achieves most of the lowest testing criteria on the four seasonal datasets among the five approaches. For the Spring and Autumn tests, performance is improved dramatically; for example, the testing MAE for Spring using RSM-DE-ELM is 1.9443, whereas for the other four methods the testing MAEs on this dataset are all greater than 2.2000.
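The three criteria of Eq. (17) can be computed directly (a minimal sketch; the function name is illustrative, and \(y_i\) is read as the actual price and \(t_i\) as the forecast):

```python
import numpy as np

def forecast_errors(y, t):
    """MAE, MAPE (in %) and RMSE of Eq. (17); y is actual, t is forecast."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    mae = float(np.mean(np.abs(y - t)))
    mape = float(np.mean(np.abs(y - t) / y) * 100.0)
    rmse = float(np.sqrt(np.mean((y - t) ** 2)))
    return mae, mape, rmse
```

Note that MAPE divides by the actual value, so it is only meaningful when prices stay away from zero, which can matter for spiky RRP series.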

Table 5 Comparison of five methods on RRP Forecast

Figure 6 shows the prediction results of RSM-DE-ELM on the first half of the 7 × 48 observation points of the testing dataset. The error curve shows that the new algorithm forecasts with a low and stable error rate at most points.

In terms of training time, since our approach falls into the E-ELM category, the training procedure practically consists of several sub-trainings of the basic ELM, so it takes longer than a single basic ELM. However, the proposed approach is definitely faster than SaE-ELM [18], because only two basic DE strategies are included rather than four. Secondly, the rational DE model gives our method fast convergence in addition to promising experimental results; e.g., RSM-DE-ELM can reach better results within 60 generations, whereas SaE-ELM needs more than 100 generations to reach the same magnitude. Moreover, our approach no longer proceeds in the way described in the previous literature [17, 18], where the number of hidden nodes is gradually increased and the one with the best generalization performance is finally adopted; instead, a binary-search frame helps the algorithm both find the optimum in the end and keep the time complexity lower.

Fig. 6
figure 6

Average RRP of forecast by RSM-DE-ELM in spring

6 Conclusion

In this paper, a self-adaptive DE-like frame embedded with a rational approximation operator is proposed to optimize E-ELM with faster speed and better solutions. The benchmark results show that the reliability of rational methods is no longer a crucial problem in optimization, i.e., they are no longer synonymous with local optima or premature convergence; on the contrary, their fast convergence becomes attractive as long as a well-designed scheme is provided. Furthermore, supported by the new rational approach, E-ELM obtains better results than many state-of-the-art methods in the practical application of electricity price prediction. Overall, the experimental results illustrate that mathematical guidance during the evolution toward optima can yield better performance for E-ELM than purely stochastic strategies.