1 Introduction

1.1 Aims and difficulties

Effective and accurate electricity price forecasting is a critical issue for energy-market participants, because it helps them manage risk in competitive electricity markets and thus maximize their profits. Market participants rely on price forecasts to decide on their bidding strategies, allocate assets, and plan facility investment (Taherian et al. 2013). The main goal of electricity-market participants is a transparent and cost-effective market; hence, all market players need an accurate and robust estimate of future electricity prices to set their bidding strategies in the real market and maximize their profit. On the other hand, predicting electricity prices is very difficult, as prices are more volatile in electricity markets than in any other financial market. This is partly because electrical supply and demand must be balanced in real time and, unlike other commodities, it is not financially viable to store large quantities of electricity. In addition, many parameters, such as weather conditions, the availability of relatively inexpensive generation facilities (e.g., nuclear and hydro), and sudden disturbances or faults in generation and transmission systems, can affect electricity market price volatility (Shayeghi et al. 2015). This volatility clearly indicates a strong need in the electric power industry for reasonably accurate methods of forecasting electricity prices.

1.2 Literature review

In deregulated electricity markets, participants involved in trading make extensive use of price prediction techniques, either to bid or to hedge against volatility. With reliable daily price forecasts, producers and energy service companies are able to negotiate good bilateral contracts and make better financial decisions. In recent years, several models have been applied to predict prices in electricity markets (Taherian et al. 2013; Shayeghi et al. 2015; Contreras et al. 2003; Garcia et al. 2005; Mohamed and Bodger 2005; Melek and Derya 2010; Anbazhagan and Kumarappan 2012; Da et al. 2014). The available forecasting models can be broadly classified into three groups: classical, intelligent and hybrid computing methods.

The classical group consists of the auto-regressive integrated moving average (ARIMA) model (Contreras et al. 2003), the generalized auto-regressive conditional heteroskedastic (GARCH) approach (Garcia et al. 2005), dynamic regression (DR) and multiple linear regression (Mohamed and Bodger 2005). These methods usually rely on linear models with limited or no capability to characterize the non-linearity of electricity price patterns. In addition, the stationarity assumed in most of these studies cannot capture the non-stationary features of the price time series.

The intelligent group uses data-driven models in which an input–output mapping is learned from historical examples, for example, fuzzy inference (Melek and Derya 2010), the artificial neural network (ANN) (Anbazhagan and Kumarappan 2012), and the wavelet transform (WT) \(+\) support vector machine (SVM) (Da et al. 2014).

However, a single model may not be able to predict electricity prices accurately. Thus, hybrid models have been widely used in various applications, including price forecasting. The goal of hybrid forecasting is to combine different models and improve the final forecast accuracy. Hybrid methods have been investigated in several studies, such as MI \(+\) WT \(+\) FNN (Osórioa et al. 2014), MI \(+\) WT \(+\) SVM (Shayeghi and Ghasemi 2013), a hybrid intelligent method (Ghofrani et al. 2015) and the WT \(+\) ARIMA \(+\) LSSVM method (Zhang et al. 2012). A hybrid model combining WT with ARIMA and GARCH models was used in Tan et al. (2010). Sousa et al. (2012) proposed a hybrid method that applied a neural network as an auxiliary forecasting tool for predicting electricity market prices; through the analysis of prediction error patterns, the method predicts the expected error for the next forecast and uses it to adapt the actual forecast. Sharma and Srinivasan (2013) combined a FitzHugh–Nagumo model, for mimicking spiky price behavior, with an Elman network, for regulating the former, and a feed-forward ANN, for modeling the residuals; this hybrid model is used for point and interval forecasting in the Australian, Ontario, Spanish and Californian markets. To capture the erratic behavior of electricity prices, a feed-forward neural network with one-dimensional discrete cosine transform (DCT) input features, known as DCT-FFNN, is modeled in Anbazhagan and Kumarappan (2014) and compared with other models for estimating the market clearing prices of mainland Spain. The common disadvantage of most of the previously reviewed methods is that they suffer from over-fitting. Therefore, this paper proposes a new hybrid algorithm for price forecasting that can efficiently tackle these shortcomings.

1.3 Motivation and contribution

The main contributions of this paper are summarized below:

  • As the dimensionality of the data fed into a learning machine (e.g., an ANN) increases, data classification models become correspondingly harder to use. High-dimensional data are a serious difficulty for many classification models because of their high computational cost and memory usage. Dimensionality reduction and best-feature-subset selection are two available methods for reducing the attribute space of an input feature set, which is a significant factor in both supervised and unsupervised regression and classification models (Shayeghi and Ghasemi 2013). MI is a suitable tool (minimizing relative entropy) for evaluating how well the joint probability is approximated by the product of the marginal probabilities. Our review shows that existing MI-based techniques still need refinement to provide a complete, high-performing model: although previous MI techniques can consider relevancy and redundancy simultaneously, they may lose valuable information as the amount of input data grows. Therefore, in this paper, a new three-way feature selection algorithm is proposed that filters out irrelevant and redundant candidate inputs.

  • Specifically, this paper employs the WPT to construct features that are highly informative about the behavior of the price signal. The WPT is chosen because of its ability to deal with stationary, non-stationary, or transitory characteristics of different signals, including abrupt changes, spikes, drifts, and trends.

  • To select the best tree from the WPT output and decrease the computational time, the Shannon entropy criterion is applied in this paper. It measures the predictability of future subset values based on the probability distribution of the values already observed in the data.

  • This paper presents LSSVM-B as the learning engine that models the nonlinear pattern in the price signal. The combination of LSSVM and Bayesian statistics allows both accurate day-ahead price forecasts and an estimate of their uncertainty. In fact, the Bayesian method can calibrate an ensemble forecast to give better forecasts than linear methods. To exploit the full learning potential of LSSVM-B, its control parameters are optimized by the S-OLABC algorithm. In other words, to cope with the disadvantage of the standard ABC method, which is often trapped in local optima when the optimization problem has a large number of variables, two modifications of ABC are proposed based on a self-adaptive model and orthogonal learning.

1.4 Structure of the paper

The remainder of this paper is organized as follows. Section 2 provides the mathematical formulation of the forecasting problem. Section 3 introduces the proposed price forecasting hybrid algorithm. To demonstrate the advantages of the proposed algorithm, three test markets are considered; simulation results and comparisons with previously reported results are discussed in Sect. 4. Finally, the paper is concluded in Sect. 5.

2 Price forecasting problem formulation

2.1 Proposed WPT model

Application of the WT in signal processing is one of the active areas in wavelet studies. In contrast to other transforms, e.g., the cosine or Fourier transforms that work purely in the frequency domain, the DWT decomposes a signal into detail and approximation terms so that it is represented more efficiently and localized in both the time (space) and frequency domains. The mathematical formulation of the DWT is given in Wickerhauser et al. (1992). This paper uses the WPT to decompose a price signal into detail and approximation terms (Wickerhauser et al. 1992). Let resolution \(2^{-j}\) define an orthogonal level on \(L^{2}(R)\), where the scale index j is an integer. \(V_{j}\) is a possible approximation at resolution \(2^{-j}\). The orthogonal projection of a variable x onto the space \(V_{j}\) is the function \(x_{j}\) that minimizes \(\Vert x-x_{j}\Vert \). The details of a signal at resolution \(2^{-j}\) are the difference between the approximations at resolutions \(2^{-j+1}\) and \(2^{-j}\). According to this description and the DWT (Shayeghi and Ghasemi 2013; Wickerhauser et al. 1992), a signal with approximation space \(V_{j}\) can be decomposed into \(V_{j+1}+W_{j+1}\) (approximation \(+\) detail). Therefore, the original orthogonal basis \(\{\phi _j (t-2^jn)\}_{n\in Z} \) can be split into the approximation \(V_{j+1} =\{\phi _{j+1} (t-2^{j+1}n)\}_{n\in Z} \) and detail \(W_{j+1} =\{\varphi _{j+1} (t-2^{j+1}n)\}_{n\in Z} \) spaces. The decompositions of \(\varphi _{j+1}\) and \(\phi _{j+1} \) into \(\{\phi _j (t-2^jn)\}_{n\in Z}\) are achieved by the two filters H (low-pass) and G (high-pass), which can be expressed as follows:

$$\begin{aligned}&H(n)=\left\langle {\frac{\phi (0.5t)}{\sqrt{2} },\phi (t-n)} \right\rangle \end{aligned}$$
(1)
$$\begin{aligned}&\left\langle {\phi _{j+1,p} ,\phi _{j,n} } \right\rangle =\left\langle {\frac{\phi (0.5t)}{\sqrt{2} },\phi (t-n+2p)} \right\rangle =H(n-2p)\nonumber \\ \end{aligned}$$
(2)
$$\begin{aligned}&\phi _{j+1,p} =\sum \limits _{n\in (-\infty ,+\infty )} {H(n-2p)\phi _{j,n} }\end{aligned}$$
(3)
$$\begin{aligned}&G(n)=\left\langle {\frac{\varphi (0.5t)}{\sqrt{2} },\phi (t-n)} \right\rangle \end{aligned}$$
(4)
$$\begin{aligned}&\left\langle {\varphi _{j+1,p} ,\phi _{j,n} } \right\rangle =\left\langle {\frac{\varphi (0.5t)}{\sqrt{2} },\phi (t-n+2p)} \right\rangle =G(n-2p) \nonumber \\ \end{aligned}$$
(5)
$$\begin{aligned}&\varphi _{j+1,p} =\sum \limits _{n\in (-\infty ,+\infty )} {G(n-2p)\phi _{j,n}}\end{aligned}$$
(6)
$$\begin{aligned}&G(n)=(-1)^{-n+1}H(-n+1) \end{aligned}$$
(7)

Note that this theory can be generalized to each space \(D_{j}\) with approximation \(n2^{-j}\), \(n\in Z\). It is clear that the WPT offers a more suitable and flexible analysis than the DWT. Figure 1 shows a wavelet packet decomposition tree with 3 levels. It can be seen that an n-level wavelet packet decomposition produces \(2^{n}\) different sets of coefficients, as opposed to the \(n+1\) sets of the DWT (Wickerhauser et al. 1992).

Fig. 1
figure 1

Wavelet packet decomposition tree
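For illustration, the following is a minimal sketch of a three-level wavelet packet decomposition of an hourly price series, assuming the PyWavelets library, a Daubechies-4 mother wavelet and symmetric padding (the paper does not specify these choices), with synthetic data standing in for real market prices:

```python
import numpy as np
import pywt  # PyWavelets

# Synthetic hourly "price" series standing in for real market data (50 days)
prices = np.random.rand(24 * 50)

# Three-level wavelet packet decomposition (cf. Fig. 1); 'db4' and symmetric
# padding are assumed choices, not taken from the paper
wp = pywt.WaveletPacket(data=prices, wavelet='db4', mode='symmetric', maxlevel=3)

# A 3-level WPT yields 2**3 = 8 terminal nodes, versus 3 + 1 coefficient sets
# for the DWT at the same depth
for node in wp.get_level(3, order='natural'):
    print(node.path, len(node.data))

# The inverse WPT rebuilds the signal from the packet tree (padding may make the
# reconstruction slightly longer than the input)
reconstructed = wp.reconstruct(update=False)[:len(prices)]
```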

2.2 Shannon entropy

The Shannon entropy is a way of quantifying the “disorder” of a random variable and provides a powerful approach for evaluating the contributions of the different components. The Shannon entropy H(p) of a probability distribution \(p\in \hbox {Pr}_c ( A)\) is:

$$\begin{aligned} H(p)=\sum \limits _{x\in A} {F(p( x))} ,\quad F(x) = x \log \frac{1}{x} \end{aligned}$$
(8)

with the convention that \(F(0) = 0\). Given an A-valued random variable X, we have \(H(X) = H(p_{X})\). More details are given in previous work (Shayeghi et al. 2015).
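As a small illustration, the following sketch computes the Shannon entropy of Eq. (8) for an empirical distribution obtained by binning a price-like series; the natural logarithm, the 16-bin histogram and the synthetic data are assumptions made here for demonstration:

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy H(p) = sum_x p(x) log(1/p(x)) (Eq. 8), with F(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > eps]                      # convention F(0) = 0
    return float(np.sum(p * np.log(1.0 / p)))

# Empirical distribution of a discretized (binned) price-like series
values = np.random.rand(1000)
hist, _ = np.histogram(values, bins=16)
print(shannon_entropy(hist / hist.sum()))
```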

2.3 Least squares support vector machine-Bayesian model (LSSVM-B)

The SVM algorithm, developed by Vapnik (1995), is based on statistical learning theory. However, SVM formulates the training process as a quadratic program, which can take too much time. Suykens and Vandewalle (1999) proposed a variant of SVM known as LSSVM, which solves a linear system instead and is therefore faster and more straightforward. In this paper, the Bayesian model is chosen among all learning methods for the artificial network because it shows the best performance when only limited training data are available (Neal 1996). In fact, this paper integrates the Bayesian model with a Gaussian process to enhance the prediction performance of the LSSVM. The proposed method enables the uncertainty in the LSSVM weight estimates to be explicitly accounted for, which in turn enables the generation of probabilistic predictions (Shayeghi and Ghasemi 2013). In other words, the Bayesian method can be used to overcome various limitations that prevent SVM from becoming more widely accepted and reaching its full potential, such as the lack of consideration of forecasting uncertainty, the complexity of estimating appropriate parameter values, the difficulty of selecting the optimum complexity, and the lack of a good method to properly validate the model and interpret the modeled relationship (Melek and Derya 2010). Assume input variables \(x_{i}\) and training targets \(y_{i}= f(x_{i})\) \((i = 1,2,{\ldots },l)\) in a Gaussian process. Let \(y=f({\mathbf {x}})+\delta \) denote the relation between the two variables x and y, where \(\delta \) is the error term. Then the covariance between two input variables \(x_{i}, x_{j}\) with the relevancy factors w (\(w_{k}\) measures the relevance of the kth variable to the prediction of the output variable) is as follows:

$$\begin{aligned} E[f(x_i )f(x_j )]=\hbox {e}^{\left( -\frac{1}{2}\sum \limits _{k=1}^l {\omega _k (x_i^k -x_j^k )^2} \right) } \end{aligned}$$
(9)

Let us collect the relevancy factors w together with the parameters A and \(\varepsilon \) into the hyperparameter vector (denoted by \(\vartheta \)), where \(\varepsilon \) relates to the noise density function. Note that the noise may be biased in the model proposed in this paper. The noise density and the posterior probability of f can then be expressed as follows:

$$\begin{aligned}&p(\delta )=\frac{\exp (-A\chi _{\varepsilon ,\beta } (\delta ))}{\int {\exp (-A\chi _{\varepsilon ,\beta } (\delta ))} \,d\delta }\nonumber \\&\chi _{\varepsilon ,\beta } (\delta )=\left\{ {\begin{array}{ll} -(\varepsilon +\delta )&{}\quad \delta <-N \\ \frac{(\delta +(1-\beta )\varepsilon )^2}{4\beta \varepsilon }&{}\quad -N\le \delta \le -M \\ 0&{}\quad -M<\delta <M \\ \frac{(\delta -(1-\beta )\varepsilon )^2}{4\beta \varepsilon }&{}\quad M\le \delta \le N \\ -(\varepsilon -\delta )&{}\quad N<\delta \\ \end{array}} \right. ,\quad \left\{ {\begin{array}{l} M=(1-\beta )\varepsilon \\ N=(1+\beta )\varepsilon \\ \end{array}} \right. \nonumber \\ \end{aligned}$$
(10)
$$\begin{aligned}&p(f\vert U,\vartheta )=\frac{p(U\vert f,\vartheta )p(f\vert \vartheta )}{p(U\vert \vartheta )} \end{aligned}$$
(11)

where \(\varepsilon > 0\) and \(0<\beta \le 1\). \(p(U\vert f,\vartheta )\) and \(p(f\vert \vartheta )\) are the likelihood and prior probability functions of the dataset U based on the Gaussian process, respectively; then,

$$\begin{aligned}&p(f\vert \vartheta )=\frac{\exp \left( -\frac{1}{2}f^T\Sigma ^{-1}f \right) }{(2\pi )^{\frac{1}{2}l}\sqrt{\vert \Sigma \vert } },\quad f=\left[ {\begin{array}{l} f(x_1 ) \\ f(x_2 ) \\ \vdots \\ f(x_l ) \\ \end{array}} \right] \end{aligned}$$
(12)
$$\begin{aligned}&p(U\vert f,\vartheta )=\frac{\exp \left( -A\sum \nolimits _{k=1}^n {\chi _{\varepsilon ,\beta } (y_i -f(x_i) )}\right) }{\int {\exp ((-A\cdot \chi _{\varepsilon ,\beta } (\delta )))} } \end{aligned}$$
(13)

where \(| \Sigma |\) denotes the determinant of the covariance matrix; in practice, \(| \Sigma |\) is computed from the triangular factors obtained by Gaussian elimination. Using the normalized \(p(f\vert \vartheta )\) in Eq. (11), the inference of f can be obtained. Based on Eqs. (12)–(13) and Bayes’ theorem, the posterior probability can be expressed as follows:

$$\begin{aligned}&p(f\vert U,\vartheta )\nonumber \\&=\frac{\exp \left( -A\sum \nolimits _{k=1}^n \chi _{\varepsilon ,\beta } ( y_k -f(x_k ))-\frac{1}{2}f^T\Sigma ^{-1}f \right) }{\displaystyle \int \exp \left( -A\sum \nolimits _{k=1}^n \chi _{\varepsilon ,\beta } ( y_k -f(x_k ))-\frac{1}{2}f^T\Sigma ^{-1}f \right) \,\hbox {d}f }\nonumber \\ \end{aligned}$$
(14)

If \(h(f)=A\sum _{k=1}^n \chi _{\varepsilon ,\beta } (y_k -f(x_k ))+\frac{1}{2}f^T\Sigma ^{-1}f \), then \(p(f\vert U,\vartheta )\propto e^{-h(f)}\), and the minimization problem can be written as follows:

$$\begin{aligned} \mathop {\min }\limits _f h(f)=\mathop {\min }\limits _f A\sum \limits _{k=1}^n \chi _{\varepsilon ,\beta } (y_k -f(x_k ))+\frac{1}{2}f^T\Sigma ^{-1}f\nonumber \\ \end{aligned}$$
(15)

To solve Eq. (15), define two slack variables \(\zeta _i\) and \(\zeta _i^*\):

$$\begin{aligned} \begin{aligned}&y_i -f(x_i )-(1-\beta )\varepsilon =\zeta _i\\&{-}y_i +f(x_i )-(1-\beta )\varepsilon =\zeta _i^*\end{aligned} \end{aligned}$$
(16)

Consequently, LSSVM-B based on regression can be defined by:

$$\begin{aligned}&\mathop {\min }\limits _{f,\zeta ,\zeta ^*} J=\mathop {\min }\limits _{f,\zeta ,\zeta ^*} A\sum \limits _{k=1}^n (a(\zeta _k^*)+a(\zeta _k ))+\frac{1}{2}f^T\Sigma ^{-1}f \nonumber \\&\hbox {S.t.} \nonumber \\&(1-\beta )\varepsilon +\zeta _k \ge y_k -f(x_k ) \nonumber \\&(1-\beta )\varepsilon +\zeta _k^*\ge -y_k +f(x_k ) \nonumber \\&\zeta _k^*,\zeta _k \ge 0 \end{aligned}$$
(17)

where f is the hyper-plane function, \(y_{k}\) is the kth output of LSSVM-B, and \(x_{k}\) is the kth input data point of the model. Based on the \(\zeta \)-insensitive Vapnik loss function, then (Vapnik 1995):

$$\begin{aligned}&\mathop {\min }\limits _{f,\zeta ,\zeta ^*} A\sum \limits _{k=1}^n (a(\zeta _k^*)+a(\zeta _k ))+\frac{1}{2}f^T\Sigma ^{-1}f \nonumber \\&a(\zeta _k^*)=\left\{ {\begin{array}{ll} \frac{(\zeta _k^*)^2}{4\varepsilon \beta },&{}\quad 0\le \zeta _k^*\le 2\varepsilon \beta \\ -\varepsilon \beta +\vert \zeta _k^*\vert ,&{}\quad 2\varepsilon \beta <\zeta _k^*\\ \end{array}} \right. \end{aligned}$$
(18)

The corresponding Lagrange function can be written as follows:

$$\begin{aligned}&\mathop {\min }\limits _{\alpha ,\alpha ^*} W(\alpha ,\alpha ^*)=\frac{1}{2}(\alpha -\alpha ^*)^T\sum (\alpha -\alpha ^*)\nonumber \\&\quad +\sum \limits _{k=1}^n \left( -(\alpha _k -\alpha _k^*)y_k +(\alpha _k +\alpha _k^*)(1-\beta )\varepsilon \right. \nonumber \\&\quad \left. +\frac{\varepsilon \beta }{A}\left( (\alpha _k +\alpha _k^*)^2-2\alpha _k \alpha _k^*\right) \right) \nonumber \\&\hbox {S.t.} \nonumber \\&\upsilon \in (0,A)\end{aligned}$$
(19)
$$\begin{aligned}&L(\alpha _k ,\alpha _k^*,\zeta ,\zeta ^*,\beta )=J(f,\zeta ,\zeta ^*)-W(\alpha _k ,\alpha _k^*) \end{aligned}$$
(20)

By the Karush–Kuhn–Tucker (KKT) conditions, the solutions using partial differentiation with respect to \(\alpha _k ,\alpha _k^*,\zeta ,\beta \) and \(\zeta ^*\)can be written as:

$$\begin{aligned} \left\{ {\begin{array}{l} \frac{\partial L}{\partial w}=0\Rightarrow J=\sum \limits _{k=1}^n {W(\alpha _k ,\alpha _k^*)} \\ \frac{\partial L}{\partial \zeta }=0\Rightarrow A\sum \limits _{k=1}^n {a(\zeta _k )} =0 \\ \frac{\partial L}{\partial \zeta ^*}=0\Rightarrow A\sum \limits _{k=1}^n {a(\zeta _k^*)} =0 \\ \frac{\partial L}{\partial \beta }=0\Rightarrow -\varepsilon \gamma =0 \\ \frac{\partial L}{\partial \alpha _k }=0\Rightarrow \sum \limits _{k=1}^n {W_{(\alpha _k ,\alpha _k^*)}^{'} \vert _{\partial \alpha _k } =0} \\ \frac{\partial L}{\partial \alpha _k^*}=0\Rightarrow \sum \limits _{k=1}^n {W_{(\alpha _k ,\alpha _k^*)}^{'} \vert _{\partial \alpha _k^*} =0} \\ \end{array}} \right. \end{aligned}$$
(21)

By eliminating w and \(\beta \), the equations can be written as:

$$\begin{aligned}&\left[ {\begin{array}{l} \beta \\ \alpha \\ \end{array}} \right] =\left[ {\begin{array}{ll} 0 &{}\quad I^{T} \\ {I} &{}\quad {\Omega +\gamma ^{-1}I} \\ \end{array}} \right] ^{-1}\left[ {\begin{array}{l} 0 \\ y \\ \end{array}} \right] ,\quad \hbox {where}\nonumber \\&\left\{ {\begin{array}{l} y=[y_1 ,y_2 ,\ldots ,y_l ]^T \\ I=[1,1,\ldots ,1]^T \\ \alpha =[\alpha _1 ,\alpha _2 ,\ldots ,\alpha _l ]^T\rightarrow \text{ Lagrange } \text{ multipliers } \\ \end{array}} \right. \end{aligned}$$
(22)

Finally, the LSSVM-B regression can be calculated by:

$$\begin{aligned} y(x)=\sum \limits _{i=1}^l {\alpha _i K(x,x_i )+\beta } \end{aligned}$$
(23)

where K refers to the kernel function \(K(x_{i}, x_{j})\), and the matrix \(\Omega +\gamma ^{-1}I_l \in {{\mathbb {R}}}^{l\times l}\) is positive definite. Here, \(\Omega \in {{\mathbb {R}}}^{l\times l}\) is defined by its elements \(\Omega _{ij} =\varphi (x_i )^T\varphi (x_j )=K(x_i ,x_j )\) for \(\forall (i,j)\in {{\mathbb {N}}}_l \times {{\mathbb {N}}}_l \), and \(K(\cdot ,\cdot )\) is a kernel function satisfying Mercer’s theorem (Vapnik 1995). \(\beta \) is the bias term. \(\alpha _{i},\beta \in R\) are the solution of Eq. (22), \(x_{i}\) is the training data, x is the new input vector, and y(x) is the output of the LSSVM-B model.
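A minimal sketch of the resulting estimator is given below: it solves the linear system of Eq. (22) and evaluates Eq. (23). The Gaussian (RBF) kernel and the values of the regularization and width parameters are assumptions for illustration; the Bayesian calibration and the biased noise model of Eq. (10) are omitted, and in the full method these parameters would be tuned by the S-OLABC algorithm:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix; an assumed Mercer kernel choice."""
    d2 = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the linear system of Eq. (22) for the bias and Lagrange multipliers."""
    l = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                      # first row:    [0, 1^T]
    A[1:, 0] = 1.0                      # first column: [1; ...]
    A[1:, 1:] = Omega + np.eye(l) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]              # alpha, bias beta

def lssvm_predict(X_train, alpha, beta, X_new, sigma=1.0):
    """Evaluate Eq. (23): y(x) = sum_i alpha_i K(x, x_i) + beta."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + beta

# Toy usage on synthetic data
X = np.random.rand(100, 5)
y = np.sin(X.sum(axis=1)) + 0.05 * np.random.randn(100)
alpha, beta = lssvm_train(X, y)
print(lssvm_predict(X, alpha, beta, X[:3]))
```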

2.4 Theory of MI

Feature selection is a commonly used process in data classification, wherein a subset of the available features is selected for use in forecasting (Lee and Kim 2013). In this section, the concept of MI is briefly reviewed; more details can be found in Lee and Kim (2013). Let \(X= X_{1}\), \(X_{2}\), \({\ldots }\), \(X_{n}\) and \(P(X)= P(X_{1}), P(X_{2}),\ldots , P(X_{n})\) denote a vector of variables and its probability distribution. The entropy H(X) can then be expressed as:

$$\begin{aligned} H(X)=-\sum \limits _{i=1}^n {P(X_i )\log _2 ( {P(X_i )})} \end{aligned}$$
(24)

More generally, let X and Y be discrete random variables with a joint probability distribution P(X, Y); then the joint entropy H(X, Y) can be calculated by:

$$\begin{aligned} H(X,Y)=-\sum \limits _{i=1}^n {\sum \limits _{j=1}^m {P(X_i ,Y_j )\log _2 \left( {P(X_i ,Y_j )}\right) } } \end{aligned}$$
(25)

Moreover, when certain variables are known and others are not, the remaining uncertainty is calculated by the conditional entropy (Akadi et al. 2008):

$$\begin{aligned} H(Y/X)= & {} \sum \limits _{i=1}^n {P(X_i )H(Y/ {X=X_i })}\nonumber \\= & {} -\sum \limits _{i=1}^n {P(X_i )\sum \limits _{j=1}^m {P({Y_j }/ {X_i })\log _2 ( {P({Y_j }/ {X_i })})} } \nonumber \\= & {} -\sum \limits _{i=1}^n {\sum \limits _{j=1}^m {P(X_i ,Y_j )\log _2 \left( {P({Y_j }/ {X_i })}\right) } } \end{aligned}$$
(26)

Then, H(X, Y) and H(X/Y) or H(Y/X) are related by Eq. (27), known as the chain rule (Fig. 2):

$$\begin{aligned} H(X,Y)=H(X)+H(Y/X)=H(Y)+H(X/Y) \end{aligned}$$
(27)
Fig. 2
figure 2

Representation of MI and different entropies

Thus, the mathematical formulation of \(\hbox {MI}(X,Y)\) between two variables X and Y can be expressed as:

$$\begin{aligned} \hbox {MI}(X,Y)=\sum \limits _{i=1}^n {\sum \limits _{j=1}^m {P(X_i ,Y_j )\log _2 \left( {\frac{P(X_i ,Y_j )}{P(X_i )P(Y_j )}}\right) } } \end{aligned}$$
(28)

A large MI value indicates that X and Y are closely related, and vice versa.
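The following sketch estimates MI(X, Y) of Eq. (28) from a two-dimensional histogram; the 16-bin plug-in estimator and the synthetic lagged-price example are assumptions, since the paper does not state how the probabilities are estimated:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Estimate MI(X, Y) of Eq. (28) from histogram-based probability estimates."""
    p_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0                      # convention 0 * log(0) = 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

# Example: MI between a candidate input (24-h lagged price) and the target price
prices = np.random.rand(2000)
print(mutual_information(prices[:-24], prices[24:]))
```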

2.5 Proposed GMI model

Dimensionality reduction of the raw input variable space is a fundamental step in most forecasting tasks. Focusing on the most relevant information in a potentially overwhelming amount of data leads to a better understanding of the data (Lee and Kim 2013). In this paper, a new feature selection method combining feature subset selection (FSS) and dimensionality reduction (DR) is proposed (Akadi et al. 2008).

There are two different kinds of relevance, as shown in Fig. 3:

Strong relevance A feature \(f_{i}\) is strongly relevant if its removal degrades the performance of the Bayes optimum classifier.

Weak relevance A feature \(f_{i}\) is weakly relevant if it is not strongly relevant and there exists a subset of features \(\vec {{F}'}\) such that the performance on \(\vec {{F}'}\cup \{f_i \}\) is better than the performance on just \(\vec {{F}'}\).

The rest of the features will be irrelevant. Among them we could say that there are redundant features and noisy features.

Fig. 3
figure 3

Representation of relevant concept

A different classification of the features consists of:

Relevant features those features which, by themselves or in a subset with other features, have information about the class.

Redundant features those which can be removed because there is another feature or subset of features which already supplies the same information about the class. This feature selection scheme is fully described in previous work (Shayeghi et al. 2015). The conditional mutual information (CMI) between two subsets, e.g., A and B, conditioned on subset C, can be defined as:

$$\begin{aligned} I(A;B\vert C)&\mathop \equiv \limits ^\Delta \sum \limits _{{a,b,c}} {P(a,b,c)\log _2 \left( {\frac{P(a,b\vert c)}{P(a\vert c)P(b\vert c)}}\right) } \nonumber \\&=H(A\vert C)+H(B\vert C)-H(AB\vert C) \nonumber \\&=H(A\vert C)-H(A\vert B,C)\nonumber \\&=H(AC)+H(BC)-H(C)-H(ABC)\nonumber \\ \end{aligned}$$
(29)

In a similar way, the three-way interaction information, which captures the influence of relevancy and redundancy, can be expressed as follows:

$$\begin{aligned} I(A;B;C)&\mathop \equiv \limits ^\Delta I(A;B\vert C)-I(A;B)\nonumber \\&=I(A,B;C)-I(A;C)-I(B;C)\nonumber \\&=H(AB)+H(BC)+H(AC)-H(A)\nonumber \\&\quad -H(B)-H(C)-H(ABC) \end{aligned}$$
(30)

In this paper, the relevance and redundancy of the candidate feature can be calculated by:

$$\begin{aligned} \hbox {GMI}=\arg \max \left( {I(x_i ;C)+\mathop {\min }\limits _{x_s \in S} ( {I(x_i ;x_s ;C)})}\right) \end{aligned}$$
(31)

where \(I(x_{i};x_{s};C)\) is the redundancy term and S is the set of already selected features (Fig. 4).

Fig. 4
figure 4

Flowchart of GMI to select best data for WPT
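A sketch of how the GMI criterion of Eq. (31) could drive a greedy forward selection is given below. The quantile discretization, the plug-in entropy estimates, the forward-selection loop and the helper names are assumptions made here for illustration and are not taken from the paper:

```python
import numpy as np

def discretize(x, bins=8):
    """Map a continuous variable to integer bin labels for entropy estimation."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def joint_entropy(*vars_):
    """Plug-in estimate of H over one or more discretized variables."""
    keys = np.stack(vars_, axis=1)
    _, counts = np.unique(keys, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mi(a, b):
    """MI(A, B) via the entropy identity of Eq. (27)-(28)."""
    return joint_entropy(a) + joint_entropy(b) - joint_entropy(a, b)

def interaction_info(a, b, c):
    """Three-way interaction information I(A;B;C) of Eq. (30)."""
    return (joint_entropy(a, b) + joint_entropy(b, c) + joint_entropy(a, c)
            - joint_entropy(a) - joint_entropy(b) - joint_entropy(c)
            - joint_entropy(a, b, c))

def gmi_select(X, target, n_features):
    """Greedy forward selection using the GMI criterion of Eq. (31)."""
    C = discretize(target)
    cand = [discretize(X[:, j]) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        scores = []
        for j in remaining:
            relevance = mi(cand[j], C)
            redundancy = (min(interaction_info(cand[j], cand[s], C) for s in selected)
                          if selected else 0.0)   # first pick: relevance only
            scores.append(relevance + redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: pick 5 of 20 candidate lagged inputs for the next-hour price
X = np.random.rand(2000, 20)
y = X[:, 0] + 0.5 * X[:, 3] + 0.1 * np.random.randn(2000)
print(gmi_select(X, y, 5))
```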

3 Self-adaptive orthogonal learning artificial bee colony (S-OLABC)

3.1 Overview of standard ABC

In this section, the standard ABC is briefly reviewed. Interested readers are referred to Karaboga and Akay (2009) for more details. The pseudo-code of the standard ABC algorithm is shown in Fig. 5.

Fig. 5
figure 5

The pseudo-code of the ABC algorithm, \(\hbox {fit}_{i} \rightarrow \) fitness value of the solution i, \(\varphi _{ij} \rightarrow \) random number between \((-1,1)\), \(\hbox {SN} \rightarrow \) number of solutions, ik and j \(\rightarrow \) randomly chosen indexes, MCN \(\rightarrow \) maximum Cycle Number, \(v_{ij}\rightarrow \) candidate food position

3.2 Self-adaptive orthogonal learning artificial bee colony (S-OLABC)

Although the standard ABC is a successful algorithm for finding the optimal solution of an optimization problem, it often converges to local optima during the search process. Therefore, to balance the exploration and exploitation capabilities of the search and achieve high optimization performance, two modifications are proposed in this paper. Because the original generation of candidate solutions is governed by the random variable \(\varphi _{ij} \) alone, it is not strong enough to maximize the exploitation capacity. Hence, an extra coefficient \(\mho _{ij}\) is introduced as follows:

$$\begin{aligned} x_{ij} (t+1)=x_{ij} (t)+\mho _{ij} \varphi _{ij} [x_{ij} (t)-x_{kj} (t)] \end{aligned}$$
(32)

where \(\mho _{ij}\) is adaptively updated at each iteration. If \(\mho _{ij}\) is large, then \(\mho _{ij} \varphi _{ij} \) will be large and exploration is enhanced, helping the search escape local optima. Conversely, when \(\mho _{ij}\) is small, then \(\mho _{ij} \varphi _{ij} \) will be small and exploitation becomes more efficient for finding the best solution in the search space. To select the best value of \(\mho _{ij}\) for exploitation and exploration, two variables \(T_{1}<0\) and \(T_{2}>0\) are defined and two random variables \(R_{1}\) and \(R_{2}\) are generated in \((T_{1}, 0)\) and \((0, T_{2})\), respectively. The factors \(\mho _{1j} =2^{R_1 }\) and \(\mho _{2j} =2^{R_2 }\) are then used for exploitation and exploration, respectively. In fact, two population vectors are generated using \(\mho _{1j}\) and \(\mho _{2j} \). Since \(T_{1}<0\), \(R_{1}\) is negative, \(\mho _{1j}\) is small, and exploitation is amplified. Next, the best \(\mho _{ij} \) is chosen according to the best fitness values obtained from the populations generated with \(\mho _{1j}\) and \(\mho _{2j} \). This factor is then updated by:

$$\begin{aligned} \mho _{ij} =\lambda \mho _{ij} +(1-\lambda )\mho _{kj} ,\quad k=1,2 \end{aligned}$$
(33)

where \(\lambda \) is a constant factor and initially \(\mho _{ij} =1\). Another disadvantage of the standard ABC algorithm is that each scout bee uses its historical/neighborhood best information through a simple model. Such a learning approach is easy to implement but is inefficient when searching complex optimization problem spaces. Thus, a novel OL model (Parsopoulos and Vrahatis 2004) is proposed for ABC in this paper to recover more useful information via an orthogonal operator. Let us consider two initial solution vectors \(X_1 =\left\{ {x_1^1 ,x_1^2 ,\ldots ,x_1^D } \right\} \) and \(X_2 =\left\{ {x_2^1 ,x_2^2 ,\ldots ,x_2^D } \right\} \). \(X_{1 }\) and \(X_{2}\) define a search range \(\left[ {\min ( {x_1^j ,x_2^j }),\max ( {x_1^j ,x_2^j })} \right] \) for the jth variable of the design variable vector X. This search range is quantized into N levels, \(\rho _{j,1} ,\rho _{j,2} ,\ldots ,\rho _{j,N} \), as follows:

$$\begin{aligned} \rho _{j,l} =\frac{(N-1)\min (x_1^j ,x_2^j )+(l-1)(\max (x_1^j ,x_2^j )-\min (x_1^j ,x_2^j ))}{N-1}\!\!\!\nonumber \\ \end{aligned}$$
(34)

This quantization cannot be applied directly to the orthogonal array. Thus, the variables are divided into k subset vectors as follows:

$$\begin{aligned} K=\left[ {{\begin{array}{ll} {x_1 ,x_2 ,\ldots ,x_{n_1 } } \\ {x_{n_1 +1} ,x_{n_1 +2} ,\ldots ,x_{n_2 } } \\ {x_{n_2 +1} ,x_{n_2 +2} ,\ldots ,x_{n_3 } } \\ \vdots \\ {x_{n_k +1} ,x_{n_k +2} ,\ldots ,x_D } \\ \end{array} }} \right] ,\quad x_{n_1 } \le x_{n_2 } \le \cdots \le x_D\nonumber \\ \end{aligned}$$
(35)

Finally, the proposed OL solutions can be defined as follows:

$$\begin{aligned}&L_{k1} =[\rho _{n_{k-1} +1,1} ,\rho _{n_{k-1} +2,1} ,\ldots ,\rho _{n_k ,1} ]\nonumber \\&L_{k2} =[\rho _{n_{k-1} +1,2} ,\rho _{n_{k-1} +2,2} ,\ldots ,\rho _{n_k ,2} ]\nonumber \\&\vdots \nonumber \\&L_{kN} =[\rho _{n_{k-1} +1,N} ,\rho _{n_{k-1} +2,N} ,\ldots ,\rho _{n_k ,N} ] \end{aligned}$$
(36)

The algorithm analysis is given in the Appendix.
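The following sketch illustrates the two modifications: the self-adaptive coefficient of Eqs. (32)–(33) and the range quantization of Eq. (34) used by the orthogonal learning step. The values of T1, T2, λ, the minimization convention and the sphere test function are assumptions for illustration; the full orthogonal-array construction of Eqs. (35)–(36) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_candidate(x_i, x_k, U_ij, fitness, T1=-2.0, T2=2.0, lam=0.9):
    """Candidate generation with the self-adaptive coefficient of Eqs. (32)-(33)."""
    phi = rng.uniform(-1.0, 1.0, size=x_i.shape)
    R1 = rng.uniform(T1, 0.0)        # negative exponent -> small step -> exploitation
    R2 = rng.uniform(0.0, T2)        # positive exponent -> large step -> exploration
    best_U, best_x, best_f = None, None, np.inf
    for U_try in (2.0 ** R1, 2.0 ** R2):
        x_new = x_i + U_try * phi * (x_i - x_k)          # Eq. (32)
        f = fitness(x_new)
        if f < best_f:                                   # keep the better trial
            best_U, best_x, best_f = U_try, x_new, f
    U_new = lam * U_ij + (1.0 - lam) * best_U            # Eq. (33)
    return best_x, U_new

def quantize_range(x1, x2, N=5):
    """Quantize each dimension of the range defined by two solutions into N levels
    (Eq. 34), as a prelude to the orthogonal learning step."""
    lo, hi = np.minimum(x1, x2), np.maximum(x1, x2)
    levels = np.linspace(0.0, 1.0, N)[:, None]
    return lo[None, :] + levels * (hi - lo)[None, :]     # shape (N, D)

# Toy usage on a 4-dimensional sphere function
sphere = lambda x: float(np.sum(x ** 2))
x_i, x_k = rng.random(4), rng.random(4)
cand, U_next = adaptive_candidate(x_i, x_k, U_ij=1.0, fitness=sphere)
print(cand, U_next)
print(quantize_range(x_i, x_k, N=3))
```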

4 Application of S-OLABC to the day-ahead electricity price forecasting problem

The process of the day-ahead electricity price forecasting by S-OLABC algorithm can be summarized as:

  • Step 1 Read the price data containing the price series and divide them into training and testing sets, denoted \(T_{r}\) and \(I_{n}\), respectively.

  • Step 2 Set the adjustable S-OLABC parameter values, such as the maximum cycle number, colony size and limit value, to 50, 40 and 10, respectively.

  • Step 3 Decompose the price data (historical data up to hour 24 of day \(d-1\)) via the WPT into a set of four constitutive series \(a_{h}, b_{h}, c_{h}\) and \(d_{h}, h=1,2 ,{\ldots },T\), where T usually ranges from 1 week to 2 months. \(a_{h}, b_{h}, c_{h}\) are detail series with small adjustments and \(d_{h}\) is the approximation series, which is the main component of the transform. Applying the WPT to the original price series can therefore be formulated as:

    $$\begin{aligned} W(p_h ;h=1,\ldots ,T)=\{a_h ,b_h ,c_h ,d_h ;h=1,\ldots ,T\}\nonumber \\ \end{aligned}$$
    (37)
  • Step 4 As a contribution of this paper, and unlike other work in the prediction area, treat both the detail and approximation terms as valuable candidate inputs and send them to the GMI system. In addition, use Shannon entropy to select the best branch of the WPT tree. The GMI system then selects the data with the most relevancy and the least redundancy. The data passed by this system are arranged as a vector {\(x_{1}\), \(x_{2}\), \({\ldots }\), \(x_{n}\)}, which is then sent to Step 5 for training.

  • Step 5 This step is the engine of the machine, being the main part of the forecasting algorithm. When the LSSVM-B is launched with the vector {\(x_{1}\), \(x_{2}\), \({\ldots }\), \(x_{n}\)}, the training process starts. Further details on the learning process of the LSSVM-B model are given in Sect. 2.3. In the first iteration the outputs of the LSSVM-B system may not be adequate; therefore, completing the training process also requires Steps 6 and 7.

  • Step 6 Calculate the fitness value. Sort the initial population as well as all their related data in the descending order of fitness.

    $$\begin{aligned} \hbox {Fitness}=\frac{1}{N}\sum \limits _{i=1}^N {\frac{\hbox {act}_i -\hbox {forc}_i }{\hbox {act}_i }} \end{aligned}$$
    (38)

    where \(\hbox {act}_{i}\) and \(\hbox {forc}_{i}\) represent the actual and forecast values, respectively.

  • Step 7 Use the S-OLABC to find the best agent. The best solution so far is taken as the initial solution; if a newly found solution is better than the previous food source, it replaces it.

  • Step 8 Use the inverse WPT to estimate the hourly prices for day d from the day-d estimates of the constitutive series. The inverse WPT reconstructs the estimated price series, i.e.,

    $$\begin{aligned} W^{-1}(\{a_h^{\mathrm{est}} ,b_h^{\mathrm{est}} ,c_h^{\mathrm{est}} ,d_h^{\mathrm{est}} ;h=T+1,\ldots ,T+24\})=P_h^{W,\mathrm{est}} \end{aligned}$$
    (39)
  • Step 9 Update the positions of the agents based on the S-OLABC scheme. In other words, move the agents through the search area to discover new solutions.

  • Step 10 If the maximum number of iterations is reached, stop. Otherwise, go to Step 2. The graphical process of the proposed algorithm is depicted in Fig. 6.

Fig. 6
figure 6

Flowchart of the proposed forecasting scheme
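To make Steps 3 and 8 concrete, the sketch below decomposes a synthetic hourly history with a two-level WPT (yielding the four constitutive series), extends each node, and reconstructs a 24-h day-ahead estimate with the inverse WPT. A naive extrapolation (repeating each node's last day of coefficients) stands in for the GMI feature selection, LSSVM-B training and S-OLABC tuning of Steps 4–7; the Haar wavelet, periodization mode and synthetic data are assumptions made here for illustration:

```python
import numpy as np
import pywt

# 60 days of hourly history; a synthetic daily pattern stands in for real prices
rng = np.random.default_rng(1)
hours = 24 * 60
history = 30 + 10 * np.sin(2 * np.pi * np.arange(hours) / 24) + rng.normal(0, 1, hours)

# Step 3: two-level WPT gives the four constitutive series a_h, b_h, c_h, d_h
wavelet, level = 'db1', 2            # Haar + periodization keeps lengths exactly L / 4
wp = pywt.WaveletPacket(history, wavelet=wavelet, mode='periodization', maxlevel=level)
leaves = [node.path for node in wp.get_level(level, order='natural')]

# Steps 4-7 (GMI selection, LSSVM-B training, S-OLABC tuning) are replaced here by a
# naive extrapolation: each node is extended by repeating its last day's coefficients.
# 24 hours correspond to 24 / 2**level = 6 coefficients per node.
n_ext = 24 // 2 ** level
new_wp = pywt.WaveletPacket(data=None, wavelet=wavelet, mode='periodization',
                            maxlevel=level)
for path in leaves:
    coeffs = wp[path].data
    new_wp[path] = np.concatenate([coeffs, coeffs[-n_ext:]])

# Step 8: the inverse WPT reconstructs the extended series; its last 24 samples are
# the day-ahead estimate P_h^{W,est} of Eq. (39)
extended = new_wp.reconstruct(update=False)
day_ahead_forecast = extended[-24:]
print(day_ahead_forecast.round(2))
```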

5 Simulation results and discussion

The proposed hybrid algorithm is tested using real data of the hourly Iran electricity price (HIEP), Spanish and New South Wales (NSW) markets. In the proposed algorithm, the adjustable parameters, namely the population size, limit value, \(G_{0}\) and number of iterations, are set to 50, 100, 10 and 200, respectively.

5.1 Evaluating the forecasting error

As indicated in Shayeghi et al. (2015), this paper uses similar error-based indices to compare the forecast accuracy. These indices are:

The daily MAPE can be defined as

$$\begin{aligned}&\hbox {MAPE}=\frac{1}{N}\sum \limits _{i=1}^N {\frac{\vert P_{i\mathrm{ACT}} -P_{i\mathrm{FOR}} \vert }{P_{\mathrm{AVE-ACT}} }}\end{aligned}$$
(40)
$$\begin{aligned}&P_{\mathrm{AVE-ACT}} =\frac{1}{N}\sum \limits _{i=1}^N {P_{i\mathrm{ACT}} } \end{aligned}$$
(41)

where \(P_{i\mathrm{ACT}}\) and \(P_{i\mathrm{FOR}}\) are actual and predicted values, and \(P_{\mathrm{AVE-ACT}}\) denotes the average of actual prices. The FMSE is:

$$\begin{aligned} \hbox {FMSE}_{\mathrm{day}} =\sqrt{\frac{1}{N}\sum \limits _{i=1}^N {( {P_{i\mathrm{ACT}} -P_{i\mathrm{FOR}} })^2} } \end{aligned}$$
(42)

The ESD is given by:

$$\begin{aligned} \left\{ {\begin{array}{l} \hbox {ESD}_{\mathrm{day}} =\sqrt{\frac{1}{N}\sum \limits _{i=1}^N {( {E_i -E_i^{\mathrm{Ave}} })^2} }\\ E_i =P_{i\mathrm{ACT}} -P_{i\mathrm{FOR}} \\ E_i^{\mathrm{Ave}} =\frac{1}{N}\sum \limits _{i=1}^N {E_i } \\ \end{array}} \right. \end{aligned}$$
(43)
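A direct transcription of these indices is sketched below; the toy actual/forecast arrays are placeholders, and the MAPE is returned as a fraction (multiply by 100 for a percentage):

```python
import numpy as np

def mape(actual, forecast):
    """Daily MAPE of Eqs. (40)-(41), normalized by the average actual price."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs(actual - forecast)) / np.mean(actual)

def fmse(actual, forecast):
    """Forecast mean square error of Eq. (42) (a root-mean-square error)."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))

def esd(actual, forecast):
    """Error standard deviation of Eq. (43)."""
    e = np.asarray(actual) - np.asarray(forecast)
    return np.sqrt(np.mean((e - e.mean()) ** 2))

# Example on one forecast day (24 hourly values)
act = np.array([40.1, 38.5, 37.2] + [35.0] * 21)
pred = act * (1 + 0.05 * np.sin(np.arange(24)))
print(mape(act, pred), fmse(act, pred), esd(act, pred))
```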

5.2 Spanish electricity market

To perform forecasting, the target and candidate input variables are linearly normalized to (0, 1). Then, the proposed GMI-based feature selection is employed to select the best input variables for the learning machine. The training inputs and validation samples are constructed from the data of the previous 50 days. Each of the 24 forecasted outputs is trained on its respective 49 training trials and validated on 1 validation trial. The proposed price forecasting approach uses price series data up to the midnight of the previous day. Generally, if the kth forecaster engine requires hourly prices of the forecast day, the price values predicted for those hours by the earlier forecasters are used. For example, \(P_{h-1}\) for the second forecaster is the price predicted by the first forecaster. As shown in Table 1, the results of the proposed algorithm for 4 test weeks of the Spanish electricity market in the year 2002 are compared with some other existing methods in the literature (Amjady and Hemmati 2009; Pindoriya et al. 2008; Catalão et al. 2010; Anbazhagan and Kumarappan 2013; Shafie-khah et al. 2011; Amjady and Keynia 2009, 2010; Amjady and Daraeepour 2009). For a fair comparison, the same test weeks, i.e., the fourth week of February, May, August, and November, in the winter, spring, summer, and fall seasons, respectively, are considered for all calculations (Pindoriya et al. 2008; Catalão et al. 2010; Anbazhagan and Kumarappan 2013; Shafie-khah et al. 2011; Amjady and Keynia 2009, 2010; Amjady and Daraeepour 2009; Keynia et al. 2012).
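A minimal sketch of the linear (min–max) normalization mentioned above is given here, assuming prices are held in a NumPy array; the helper names and the inverse mapping used to bring forecasts back to the original price scale are illustrative assumptions:

```python
import numpy as np

def minmax_normalize(x):
    """Linearly normalize a series to (0, 1); return the scaler for later inversion."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def minmax_denormalize(z, scaler):
    """Map normalized forecasts back to the original price scale."""
    lo, hi = scaler
    return z * (hi - lo) + lo

prices = np.array([42.3, 39.8, 37.5, 41.1, 55.0, 61.2])
z, scaler = minmax_normalize(prices)
print(minmax_denormalize(z, scaler))
```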

Table 1 Weekly MAPE values in terms of percentage (%) for 4 weeks of the Spanish electricity market

According to Table 1, the proposed hybrid algorithm has a better weekly MAPE than the other methods in all seasons. Moreover, the average MAPE of the proposed method is considerably lower than that of all other methods (last row of Table 1). The improvements in average MAPE of the proposed price forecasting approach with respect to the other methods are shown in Fig. 7 and are calculated by Eq. (44). Table 2 shows the variance of the forecasting errors as a measure of uncertainty. This index for the proposed hybrid algorithm is lower than for the other forecasting methods, and its average variance is significantly lower as well. The improvement in average error variance of the proposed method compared to HEA (a recently published method (Osórioa et al. 2014) and the best-performing of the other forecasting methods in Table 1) is about 20 %. The electricity price data of the Spanish market can be found online (Informe de operación del sistema eléctrico 2015).

$$\begin{aligned}&\hbox {Improvement}\nonumber \\&=\frac{\left( {\hbox {MAPE}_{{\mathrm{Other \,Method}}} -\hbox {MAPE}_{{\mathrm{Proposed\, Method}}} }\right) }{\hbox {MAPE}_{{\mathrm{Other\,Method}}} }\times 100\nonumber \\ \end{aligned}$$
(44)
Table 2 Error variance for 4 weeks of the Spanish electricity market
Fig. 7
figure 7

Improvement of the proposed method compared to other methods based on Table 1

Furthermore, sample results for the MI (Shayeghi and Ghasemi 2013) and GMI are presented. For conciseness, results for only one of the test days are shown in Table 3. \(P_{h}\) denotes the price of hour h. In Table 3, the selected features (columns 2 and 4) and normalized MI values (columns 3 and 5) are shown. The results were obtained after the convergence of the iterative search procedure. According to Table 3, out of 600 candidate inputs considered for the Spanish electricity market, 18 inputs are selected by MI, which indicates a filtering ratio (\(600/18=33.33\,\%\)) for that feature selection technique, while the GMI selects only 14 inputs, indicating a higher filtering ratio (\(600/14=42.85\,\%\)) than traditional MI methods. To additionally provide a graphical view of the day-ahead and week-ahead price forecasting accuracy obtained by the proposed hybrid algorithm, the forecast, the actual signal and the forecast error for the day-ahead price forecasting are shown in Fig. 8.

Table 3 Selected features plus their ranks and normalized MI and GMI values for Spanish market
Fig. 8
figure 8

Spring forecast results for the Spanish electricity market; a daily, b weekly

5.3 Iranian electricity market

In this case study, the proposed hybrid forecasting algorithm shown in Fig. 6 is used to forecast day-ahead prices in the Iranian electricity market. The effectiveness of the proposed algorithm is demonstrated by forecasting prices for Iran’s electricity market over March 2012. The data of Iran’s electricity market can be found in Informe de operación electricity market (2015). The simulation mechanism is similar to that of the Spanish market. Price data of Iran’s electricity market in the year 2012 are shown in Fig. 9. For comparison, the multi-layer perceptron (MLP) neural network, WT \(-\) MI \(+\) NN and the proposed hybrid algorithm are used for price forecasting. To provide a graphical view of the forecast accuracy of the proposed hybrid algorithm, results for Iran’s electricity market are shown in Fig. 10a and b. Table 4 compares the results obtained via the proposed hybrid algorithm, MLP and WT \(-\) MI \(+\) NN methods for the 24 h.

Fig. 9
figure 9

Price data of Iran electricity market in year 2012

Table 4 shows the forecast results of the proposed hybrid algorithm, MLP and WT \(-\) MI \(+\) NN methods for 24 h (daily) and 168 h (weekly). The comparison shows that the prediction is occasionally unacceptable, because the error obtained by some methods is above 40 % and the forecasts move away from the actual price signal at price valleys or peaks. One important reason for this shortcoming is the small amount of available historical data. The WT \(-\) MI \(+\) NN method improves neural network (NN) performance considerably, although it shows large deviations in some hours; except for hour 24 in Table 4, its error percentage is always above 15 %. Compared to NN and WT \(-\) MI \(+\) NN, the proposed hybrid algorithm (with errors lower than 10 %) has superior accuracy and more stable predictions.

Fig. 10
figure 10

Spring forecast results for the Iran’s electricity market; a daily, b weekly

Table 4 A sample of 24 forecasted hours in a working day in Iran

In addition, Table 4 compares the proposed method, MLP and WT \(-\) MI \(+\) NN techniques for forecasting electricity prices using the MAPE, FMSE and ESD criteria. As the proposed forecasting algorithm shows lower MAPE, FMSE and ESD values, its predictions are more stable. During the valley hours, the volatility of the spot price increases, and thus the MLP shows considerably larger prediction errors; the forecasts during specific off-peak hours are also less accurate. Furthermore, the overall setup time of the proposed hybrid algorithm, including the WPT-GMI, training of the forecaster (LSSVM \(+\) S-OLABC) and fine-tuning of the adaptable parameters, is about 26 min on a Pentium P4 3.2 GHz PC with 1 GB RAM, which is acceptable within a day-ahead decision-making framework.

Table 5 MAPE (%) analysis of the forecasting error of NSW electricity market in year 2007

5.4 New South Wales (NSW) market

As the final test market, the New South Wales (NSW) region of Australia’s national grid is used to demonstrate the performance of the proposed forecasting algorithm. The historical data for this market in 2007 can be found in Australian Energy Market Operator (2015). To perform forecasting, the target and candidate input variables (\(T_{r}\) and \(I_{n})\) are normalized to the range of 0 to 1. Then, the proposed feature selection algorithm is used to pick the best input candidates. Twelve months were considered for simulation: all days of each month, except the last day, were selected for training and the final day was used for forecast evaluation. The forecast obtained by the proposed hybrid algorithm is compared with other existing methods from the literature in Table 5, namely ARIMA, LSSVM, PLSSVM, ARIMA \(+\) LSSVM, ARIMA \(+\) PLSSVM, and WT \(+\) ARIMA \(+\) LSSVM (Zhang et al. 2012). According to Table 5, the proposed hybrid algorithm has a better daily MAPE than the other methods in all simulations (except for some months, such as April and November, with nearly the same results). Furthermore, the ARIMA predictions were less accurate than those of the proposed hybrid algorithm, with the average MAPE increasing from 2.10 % for the proposed method to 13.63 % for ARIMA.

5.5 Forecasting results for three markets

For a fair comparison between MI and the proposed feature selection (GMI), the data set is kept constant so that the efficiency of the feature selection methods can be compared. The MI technique (in Table 3) is a well-known feature selection method that only considers the relevancy of candidate inputs with the target variable. The proposed GMI method considers both the relevancy and the redundancy of candidate inputs. In earlier feature selection methods, however, the whole common information content of two candidate inputs is treated as redundant information, whereas the GMI algorithm adopts a more accurate formulation of redundancy based on interaction gain, which accounts for the common information content of two candidate inputs with respect to the target. The numerical results of Tables 1, 2, 4 and 5 show that the proposed hybrid forecasting algorithm outperforms all other methods in price forecasting. It is also shown that the proposed algorithm has the lowest MAPE, FMSE and ESD values among all methods of Tables 1, 2, 4 and 5 in all test weeks, indicating better forecast accuracy and stability.

6 Conclusions

To improve the accuracy of electricity price forecasting, a novel hybrid method is proposed in this paper, combining WPT, GMI, and LSSVM-B optimized by the S-OLABC algorithm. The proposed forecasting algorithm is examined using data from the Spanish, NSW, and Iranian electricity markets as three representative electricity markets. The superior performance of the proposed hybrid algorithm can be attributed to three causes. First, the WPT can convert an ill-behaved price signal into subsets of better-behaved signals. Second, the GMI can be computed from a reasonable amount of historical data with a low computational burden while maximizing relevancy and minimizing redundancy, and the LSSVM-B model has nonlinear mapping capabilities that can more easily capture the nonlinear component of electricity prices. Third, the S-OLABC algorithm can tune suitable control variables of the LSSVM-B model, for which inappropriate settings would cause either under- or over-fitting. The proposed method can also be used for other forecasting tasks such as wind power forecasting. The GMI considers three-way interactions; extending this method to include higher-order interactions will be considered in future work.