Abstract
Electricity price forecasting has become an important task for all participants in deregulated electricity markets. Information about future electricity prices helps market participants develop cost-effective bidding strategies that maximize their profit. Accurate price forecasts matter to all market participants, customers and producers alike, in competitive electricity markets. This paper presents a novel hybrid algorithm to forecast day-ahead prices in the electricity market. The hybrid algorithm consists of (a) generalized mutual information (GMI) and the wavelet packet transform (WPT) as pre-processing methods, (b) a least squares support vector machine based on a Bayesian model (LSSVM-B) as the forecasting engine, and (c) a modified artificial bee colony (ABC) algorithm used for optimization. Moreover, orthogonal learning (OL) is used as a global search tool to enhance the exploitation of the ABC algorithm. Hereafter, the proposed hybrid algorithm is referred to as S-OLABC. Numerical simulation results for different cases are compared with those of previously known classical and intelligent methods. In addition, it is shown that GMI based on WPT extracts input features better than classical mutual information (MI).
1 Introduction
1.1 Aims and difficulties
Effective and accurate electricity price forecasting is a critical issue for energy-market participants because it helps them manage risk in competitive electricity markets and thus maximize their profits. Market participants rely on price forecasts to decide on their bidding strategies, allocate assets, and plan facility investment (Taherian et al. 2013). The main goal of electricity-market participants is a transparent and cost-effective market. Consequently, all market players need an accurate and robust estimate of future electricity prices to set the bidding strategies that maximize their profit in the real market. On the other hand, predicting electricity prices is very difficult because prices are more volatile in electricity markets than in any other financial market. This is partly because electrical supply and demand must be balanced in real time and, unlike other commodities, electricity cannot be stored economically in large quantities. In addition, many different factors, such as weather conditions, the availability of relatively inexpensive generation facilities (e.g., nuclear and hydro), and sudden disturbances or faults in generation and transmission systems, can affect electricity price volatility (Shayeghi et al. 2015). This volatility indicates a strong need in the electric power industry for reasonably accurate methods of forecasting electricity prices.
1.2 Literature review
In deregulated electricity markets, participants involved in trading make extensive use of price prediction techniques, either to bid or to hedge against volatility. With reliable daily price forecasts, producers and energy service companies are able to negotiate good bilateral contracts and make better financial decisions. In recent years, several models have been applied to predict prices in electricity markets (Taherian et al. 2013; Shayeghi et al. 2015; Contreras et al. 2003; Garcia et al. 2005; Mohamed and Bodger 2005; Melek and Derya 2010; Anbazhagan and Kumarappan 2012; Da et al. 2014). The available forecasting models can generally be classified into three groups: classical, intelligent and hybrid computing methods.
The classical group consists of auto-regressive integrated moving average (ARIMA) (Contreras et al. 2003), the generalized auto-regressive conditional heteroskedastic (GARCH) approach (Garcia et al. 2005), dynamic regression (DR) and multiple linear regression (Mohamed and Bodger 2005). These methods usually use linear models with limited or even no capability to characterize the non-linearity of electricity price patterns. In addition, the stationarity assumed in most of these studies means that they cannot capture the non-stationary features of the price time series.
The intelligent group uses data-driven models, in which the input–output mapping is learned from historical examples. Examples include fuzzy inference (Melek and Derya 2010), artificial neural networks (ANN) (Anbazhagan and Kumarappan 2012), and wavelet transform (WT) \(+\) support vector machine (SVM) (Da et al. 2014).
However, a single model may not be able to predict electricity prices accurately. Thus, hybrid models have been widely used in various applications, including price forecasting. The goal of hybrid forecasting is to combine different models and improve the final forecast accuracy. Hybrid methods have been investigated in several studies, such as MI \(+\) WT \(+\) FNN (Osórioa et al. 2014), MI \(+\) WT \(+\) SVM (Shayeghi and Ghasemi 2013), a hybrid intelligent method (Ghofrani et al. 2015) and the WT \(+\) ARIMA \(+\) LSSVM method (Zhang et al. 2012). A hybrid model using WT combined with ARIMA and GARCH models was used in Tan et al. (2010). Sousa et al. (2012) proposed a hybrid method that applied a neural network as an auxiliary forecasting tool for predicting electricity market prices. By analyzing prediction error patterns, the method predicts the expected error of the next forecast and uses it to adapt the actual forecast. Sharma and Srinivasan (2013) combined a FitzHugh–Nagumo model, for mimicking the spiky price behavior, with an Elman network, for regulating the latter, and a feed-forward ANN, for modeling the residuals. This hybrid model was used for point and interval forecasting in markets in Australia, Ontario, Spain and California. To capture the erratic behavior of electricity prices, a feed-forward neural network (FFNN) with one-dimensional discrete cosine transform (DCT) input features, known as DCT-FFNN, is modeled in Anbazhagan and Kumarappan (2014). The DCT-FFNN model is compared with other models for estimating the market clearing prices of mainland Spain. The common disadvantage of most of the methods reviewed above is that they suffer from over-fitting. Therefore, this paper proposes a new hybrid algorithm for price forecasting that can efficiently address these shortcomings.
1.3 Motivation and contribution
The main contributions of this paper are summarized below:
- As the dimension of the data fed into a learning machine (e.g., an ANN) increases, data classification models become correspondingly harder to use. High-dimensional data is also a serious difficulty for many classification models because of its high computational cost and memory usage. Dimensionality reduction and best feature subset selection are two available methods for reducing the attribute space of an input feature set, which is a significant factor in both supervised and unsupervised regression and classification models (Shayeghi and Ghasemi 2013). MI is a special tool (based on relative entropy) for evaluating how closely the joint probability is approximated by the product of the marginal probabilities. Our review of MI-based methods indicates that a more complete and better-performing model is needed. Although previous MI techniques can consider relevancy and redundancy simultaneously, they may lose valuable information as the amount of input data grows. Therefore, in this paper, a new three-way feature selection algorithm is proposed that filters out irrelevant and redundant candidate inputs.
- Specifically, this paper employs the WPT to construct candidate input features from the price signal. The WPT is chosen for its ability to deal with stationary, non-stationary, or transitory characteristics of different signals, including abrupt changes, spikes, drifts, and trends.
- To select the best tree from the WPT output and thereby decrease the computational time, the Shannon entropy criterion is applied in this paper. It measures the predictability of future subset values based on the probability distribution of the values already observed in the data.
- This paper presents LSSVM-B as the learning engine that models the nonlinear pattern in the price signal. Combining LSSVM with Bayesian statistics provides both accurate day-ahead price forecasts and an estimate of their uncertainty. In fact, the Bayesian method can calibrate an ensemble forecast method to give better forecasts than linear methods. To exploit the full potential of LSSVM-B in learning, its control parameters are optimized by the S-OLABC algorithm. The standard ABC method is often trapped in local optima when the optimization problem has a large number of variables; to cope with this disadvantage, two modifications to ABC are proposed, based on a self-adaptive model and orthogonal learning.
1.4 Structure of the paper
The remainder of this paper is organized as follows. Section 2 provides the mathematical formulation of the forecasting problem. Section 3 introduces the proposed price forecasting hybrid algorithm. To demonstrate the advantages of the proposed algorithm, three test markets have been considered. Simulation results and comparison with previously reported results are discussed in Sect. 4. Finally, the paper is concluded in Sect. 5.
2 Price forecasting problem formulation
2.1 Proposed WPT model
The application of the WT in signal processing is one of the active areas in wavelet studies. In contrast to other transforms, e.g., the cosine or Fourier transforms, which work in the frequency domain only, the discrete wavelet transform (DWT) decomposes a signal into detail and approximation terms so that it is represented more efficiently and is localized in both the time (space) and frequency domains. The mathematical formulation of the DWT is given in Wickerhauser et al. (1992). This paper uses the WPT to decompose a price signal into detail and approximation terms (Wickerhauser et al. 1992). Let resolution \(2^{-j}\) be defined as an orthogonal level on \(L^{2}(R)\), where the scale index j is an integer. \(V_{j}\) is the approximation space at resolution \(2^{-j}\). The orthogonal projection of x onto \(V_{j}\) is the function \(x_{j}\) that minimizes \(\Vert x-x_{j}\Vert \). The details of a signal at resolution \(2^{-j}\) are the difference between the approximations at resolutions \(2^{-j+1}\) and \(2^{-j}\). According to this description and the DWT (Shayeghi and Ghasemi 2013; Wickerhauser et al. 1992), a signal with approximation space \(V_{j}\) can be defined by \(V_{j+1}+W_{j+1}\) (approximation \(+\) detail). Therefore, the original orthogonal basis \(\{\phi _j (t-2^jn)\}_{n\in Z} \) can be defined by the approximation space \(V_{j+1} =\{\phi _{j+1} (t-2^{j+1}n)\}_{n\in Z} \) and the detail space \(W_{j+1} =\{\varphi _{j+1} (t-2^{j+1}n)\}_{n\in Z} \). The decompositions of \(\varphi _{j+1}\) and \(\phi _{j+1} \) into \(\{\phi _j (t-2^jn)\}_{n\in Z}\) are achieved by the two filters H (low-pass filter) and G (high-pass filter), which can be expressed as follows:
Note that this theory can be generalized to each space \(D_{j}\) with approximation \(n2^{-j}\), \(n\,\in \,Z\). It is clear that the WPT offers a more flexible analysis than the DWT. Figure 1 shows a wavelet packet decomposition tree with 3 levels. An n-level wavelet packet decomposition produces \(2^{n}\) different sets of coefficients, as opposed to \(n+ 1\) sets in the DWT (Wickerhauser et al. 1992).
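The difference in the number of coefficient sets can be illustrated with a small sketch. It assumes the Haar wavelet (the text does not state which mother wavelet is used), a hypothetical 16-sample series, and illustrative function names; it is not the authors' implementation. The WPT splits every node at each level, whereas the DWT splits only the approximation branch.

```python
import math

def haar_split(signal):
    """One Haar analysis step on an even-length list: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet(signal, levels):
    """WPT: split every node at every level and return the leaf nodes."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def dwt(signal, levels):
    """DWT: split only the approximation branch; returns details plus final approximation."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_split(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

prices = [float(v) for v in range(1, 17)]  # hypothetical 16-sample price series
wpt_leaves = wavelet_packet(prices, 3)     # 2**3 = 8 coefficient sets
dwt_bands = dwt(prices, 3)                 # 3 + 1 = 4 coefficient sets
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals that of the input in both decompositions, which is a convenient sanity check.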
2.2 Shannon entropy
The Shannon entropy is a way of quantifying the “disorder” of a random variable and represents a powerful approach for evaluating the contributions of the different channels. The Shannon entropy H(p) of a probability distribution \(p\in \hbox {Pr}_c ( A)\) is:
with the convention that \(F(0) = 0\). For an A-valued random variable X, we have \(H(X) = H(p_{X})\). More details are given in previous work (Shayeghi et al. 2015).
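As a minimal sketch of this quantity, the following assumes the standard definition \(H(p) = -\sum_a p(a)\log p(a)\) with zero-probability terms contributing nothing. The helper `coefficient_entropy` is a hypothetical illustration of how the entropy of a WPT node could be scored from normalized coefficient energies; the paper's exact best-tree cost function is not given in the text.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution; terms with p = 0 contribute 0."""
    total = 0.0
    for p in probs:
        if p > 0.0:
            total -= p * math.log(p, base)
    return total

def coefficient_entropy(coeffs, base=2.0):
    """Entropy of the normalized energy distribution of a coefficient vector.

    Assumption: energies c_i**2 / sum(c**2) are treated as probabilities.
    """
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    return shannon_entropy([c * c / energy for c in coeffs], base)
```

A node whose energy is concentrated in a few coefficients scores low, while a node with evenly spread energy scores high, so a lower value indicates a more compact representation.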
2.3 Least squares support vector machine-Bayesian model (LSSVM-B)
The SVM algorithm, developed by Vapnik (1995), is based on statistical learning theory. However, SVM formulates the training process as a quadratic program, which can be very time-consuming. Suykens and Vandewalle (1999) proposed a novel SVM known as LSSVM, which replaces the quadratic program with a set of linear equations and is therefore faster and more straightforward to solve. In this paper, the Bayesian model is chosen among all learning methods for the artificial network, as it shows the best performance when limited training datasets are available (Neal 1996). In fact, this paper proposes a Bayesian model integrated with a Gaussian process to enhance the prediction performance of LSSVM. The proposed method enables the uncertainty in the LSSVM weight estimates to be explicitly accounted for, which in turn enables the generation of probabilistic predictions (Shayeghi and Ghasemi 2013). In other words, the Bayesian method can be used to overcome various limitations that prevent SVM from becoming more widely accepted and reaching its full potential, such as the lack of consideration of forecasting uncertainty, the complexity of estimating appropriate parameter values, the difficulty of selecting the optimum complexity, and the lack of a good method to properly validate the model and interpret the modeled relationship (Melek and Derya 2010). Assume the input and training variables of the Gaussian process are \(x_{i}\) and \(y_{i}= f(x_{i}) (i = 1,2,{\ldots },l)\), respectively. Let \(y=f({\mathbf {x}})+\delta \) denote the relation between the two variables x and y, where \(\delta \) is the error term. Then, the covariance between two input variables \(x_{i}, x_{j}\) with the relevancy factor w (\(w_{k}\) measures the relevance of the kth variable to the prediction of the output variable) is as follows:
Let us collect the relevancy factors w together with the parameters A and \(\varepsilon \) into the hyperparameter vector (denoted by \(\vartheta \)), where \(\varepsilon \) is the noise density function. The noise might therefore be biased in the model proposed in this paper. The posterior probability of f can then be expressed as follows:
where \(\varepsilon >0\) and \(0<\beta \le 1\). \(p(U\vert f,\vartheta )\) and \(p(f\vert \vartheta )\) are the likelihood and prior probability functions, respectively, based on the Gaussian process of the dataset U; then,
where \(| \Sigma |\) returns the determinant of the square covariance matrix. In other words, \(| \Sigma |\) is computed using the triangular factors obtained by Gaussian elimination. Using the normalized \(p(f\vert \vartheta )\) in Eq. (11), irrelevant deduction of f can be obtained. Based on Eqs. (12)–(13) and Bayes’ theorem, the posterior probability can be expressed as follows:
If \(h(f)=A\sum _{k=1}^{n} \chi _{\varepsilon ,\beta } (y_k -f(x_k ))+\frac{1}{2}f^{T}\Sigma ^{-1}f\), then \(p(f\vert U,\vartheta )\propto e^{-h(f)}\), and the minimization function can be calculated as follows:
To solve Eq. (15), define two slack variables \(\zeta _i\) and \(\zeta _i^*\):
Consequently, LSSVM-B based on regression can be defined by:
where f is the hyper-plane function. \(y_{k}\) is kth output of LSSVM-B, \(x_{k}\) is kth input data set to the model. Based on \(\zeta \)-insensitive Vapnik loss function, then (Vapnik 1995):
Matching Lagrange function can be calculated as follows:
By the Karush–Kuhn–Tucker (KKT) conditions, the solutions obtained by partial differentiation with respect to \(\alpha _k ,\alpha _k^*,\zeta ,\beta \) and \(\zeta ^*\) can be written as:
By eliminating w and \(\beta \), the equations can be written as:
Finally, the LSSVM-B regression can be calculated by:
where K refers to the kernel function \(K(x_{i}, x_{j})\), with the positive definite matrix \(\Omega +\gamma ^{-1}I_l \in {{\mathbb {R}}}^{l\times l}\). Here, \(\Omega \in {{\mathbb {R}}}^{l\times l}\) is defined by its elements \(\Omega _{ij} =\varphi (x_i )^T\varphi (x_j )=K(x_i ,x_j )\) for \(\forall (i,j)\in {{\mathbb {N}}}_l \times {{\mathbb {N}}}_l \), and \(K(\cdot ,\cdot )\) is a kernel function satisfying Mercer’s theorem (Vapnik 1995). \(\beta \) is a variable scanning the space. \(\alpha _{i},\beta \in R\) are the solution of Eq. (22), \(x_{i}\) is the training data, x is the new input vector, and y(x) is the output of the LSSVM-B model.
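For orientation, the following sketch shows plain LSSVM regression in the standard Suykens–Vandewalle form, where training reduces to one linear system built from \(\Omega +\gamma ^{-1}I_l\) and the prediction is a kernel expansion plus a bias. It is an assumption-laden illustration only: the Bayesian layer of LSSVM-B is not reproduced, the RBF kernel and the values of `gamma` and `sigma` are hypothetical choices, and all function names are illustrative.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel, one common Mercer kernel (assumed here)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve_linear(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def lssvm_fit(xs, ys, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [bias; alpha] = [0; y]."""
    l = len(xs)
    system = [[0.0] + [1.0] * l]
    for i in range(l):
        row = [1.0] + [rbf_kernel(xs[i], xs[j], sigma) for j in range(l)]
        row[i + 1] += 1.0 / gamma  # ridge term gamma^{-1} on the diagonal
        system.append(row)
    solution = solve_linear(system, [0.0] + list(ys))
    return solution[0], solution[1:]  # bias, alphas

def lssvm_predict(x, xs, bias, alphas, sigma=1.0):
    """y(x) = sum_i alpha_i K(x, x_i) + bias."""
    return bias + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alphas, xs))

# Hypothetical toy data: nine samples of sin(x) on [0, 2].
train_x = [[i / 4.0] for i in range(9)]
train_y = [math.sin(p[0]) for p in train_x]
bias, alphas = lssvm_fit(train_x, train_y, gamma=1000.0, sigma=0.5)
```

In this formulation the training residual at each point equals \(\alpha _i/\gamma \), so a larger `gamma` fits the training data more tightly at the cost of less regularization.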
2.4 Theory of MI
Feature selection is a commonly used process in data classification, wherein a subset of the features available in the data is selected for use in forecasting (Lee and Kim 2013). In this section, the concept of MI is briefly reviewed; more details can be found in Lee and Kim (2013). Let \(X= X_{1}\), \(X_{2}\), \({\ldots }\), \(X_{n}\) and \(P(X)= P(X_{1}), P(X_{2}),\ldots , P(X_{n})\) denote the vector of variables and its probability distribution, respectively. The entropy H(X) can then be expressed as:
More generally, let X and Y be discrete random variables with a joint probability distribution P(X, Y); then the joint entropy H(X, Y) can be calculated by:
Moreover, when certain variables are known and others are not, the remaining uncertainty is calculated by the conditional entropy (Akadi et al. 2008):
Then, H(X, Y) and H(X | Y) or H(Y | X) are related by Eq. (27), known as the chain rule (Fig. 2):
Thus, the mathematical formulation of \(\hbox {MI}(X,Y)\) between two variables X and Y can be expressed as:
When the MI is large, X and Y are closely related, and vice versa.
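The quantities reviewed in this section can be estimated from discrete samples with a few lines of code. The sketch below assumes the standard textbook definitions, \(H(X|Y)=H(X,Y)-H(Y)\) and \(\hbox {MI}(X,Y)=H(X)+H(Y)-H(X,Y)\), and uses empirical frequencies as probabilities; continuous prices would first need to be discretized, which is not shown.

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (bits) of a sequence of hashable values."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def joint_entropy(xs, ys):
    """H(X, Y) estimated from paired samples."""
    return entropy(list(zip(xs, ys)))

def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y), i.e. the chain rule rearranged."""
    return joint_entropy(xs, ys) - entropy(ys)

def mutual_information(xs, ys):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]              # independent of x in this toy sample
mi_independent = mutual_information(x, y)
mi_identical = mutual_information(x, x)
```

For the independent pair the estimate is zero, and for a variable paired with itself it equals the entropy of that variable, matching the interpretation that a large MI indicates closely related variables.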
2.5 Proposed GMI model
Dimensionality reduction of the raw input variable space is a fundamental step in most forecasting tasks. Focusing on the most relevant information in a potentially overwhelming amount of data is useful for a better understanding of the data (Lee and Kim 2013). In this paper, a new feature selection based on feature subset selection (FSS) and dimensionality reduction (DR) is proposed (Akadi et al. 2008).
There are two different kinds of relevance model as shown in Fig. 3:
Strong relevance: A feature \(f_{i}\) is strongly relevant if its removal degrades the performance of the Bayes optimum classifier.
Weak relevance: A feature \(f_{i}\) is weakly relevant if it is not strongly relevant and there exists a subset of features \(\vec {{F}'}\) such that the performance on \(\vec {{F}'}\cup \{f_i \}\) is better than the performance on just \(\vec {{F}'}\).
The rest of the features will be irrelevant. Among them we could say that there are redundant features and noisy features.
A different classification of the features consists of:
Relevant features: those features which, by themselves or in a subset with other features, carry information about the class.
Redundant features: those which can be removed because another feature or subset of features already supplies the same information about the class. This feature selection is fully illustrated in previous work (Shayeghi et al. 2015). The conditional mutual information (CMI) between two subsets A and B, conditioned on a subset C, can be defined as:
Similarly, the three-way interaction information, which accounts for the influence of both relevancy and redundancy, can be expressed as follows:
In this paper, the relevance and redundancy of the candidate feature can be calculated by:
where \(I(x_{i};x_{s};C)\) is the redundancy term (Fig. 4).
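The conditional and three-way terms used above can be sketched from empirical entropies. The code assumes the standard identities \(I(A;B|C)=H(A,C)+H(B,C)-H(A,B,C)-H(C)\) and \(I(A;B;C)=I(A;B)-I(A;B|C)\); the sign convention of the interaction term varies in the literature, and the paper's exact GMI selection criterion is not reproduced here.

```python
import math
from collections import Counter

def entropy(*columns):
    """Empirical joint entropy (bits) of one or more aligned sample columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())

def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(a, b)

def conditional_mutual_information(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return entropy(a, c) + entropy(b, c) - entropy(a, b, c) - entropy(c)

def interaction_information(a, b, c):
    """I(A;B;C) = I(A;B) - I(A;B|C); positive values indicate redundancy (assumed convention)."""
    return mutual_information(a, b) - conditional_mutual_information(a, b, c)

# Toy case 1: c = a XOR b, so a and b are informative about c only jointly.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c_xor = [0, 1, 1, 0]
synergy = interaction_information(a, b, c_xor)

# Toy case 2: a duplicated feature is fully redundant with respect to the class.
redundancy = interaction_information(a, a, a)
```

Under this convention a duplicated candidate yields a positive interaction term, which is the kind of input a redundancy filter would discard, while the XOR case yields a negative value because the two inputs are only informative together.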
3 Self-adaptive orthogonal learning artificial bee colony (S-OLABC)
3.1 Overview of standard ABC
In this section, the standard ABC is briefly reviewed. Interested readers are referred to Karaboga and Akay (2009) for more details. The pseudo-code of the standard ABC algorithm is shown in Fig. 5.
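Since the pseudo-code figure is not reproduced in the text, the following is a compact sketch of a standard ABC loop in the spirit of Karaboga and Akay (2009): employed bees, fitness-proportional onlookers, and scouts that replace exhausted sources. It is a simplified illustration, not the authors' code. The function name, the fitness mapping \(1/(1+f)\) (which assumes a non-negative objective), and the choice to re-initialize every source that exceeds `limit` are assumptions.

```python
import random

def abc_minimize(objective, bounds, n_sources=20, limit=10, max_cycles=50, seed=0):
    """Minimal artificial bee colony minimizer over box constraints."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(i):
        # v_ij = x_ij + phi_ij * (x_ij - x_kj) for one random dimension j
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        cand = sources[i][:]
        lo, hi = bounds[j]
        cand[j] = min(hi, max(lo, cand[j] + phi * (cand[j] - sources[k][j])))
        return cand

    def try_improve(i):
        # Greedy selection: keep the candidate only if it lowers the cost.
        cand = neighbour(i)
        cost = objective(cand)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = sources[costs.index(best_cost)][:]

    for _ in range(max_cycles):
        for i in range(n_sources):                      # employed bees
            try_improve(i)
        weights = [1.0 / (1.0 + c) for c in costs]      # assumes objective >= 0
        for _ in range(n_sources):                      # onlooker bees
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        for i in range(n_sources):                      # scout bees
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
        cycle_best = min(costs)
        if cycle_best < best_cost:                      # memorize the best source
            best_cost = cycle_best
            best = sources[costs.index(cycle_best)][:]
    return best, best_cost

def sphere(x):
    return sum(v * v for v in x)

best_point, best_value = abc_minimize(sphere, [(-5.0, 5.0)] * 2)
```

The best solution is stored separately from the population, so a scout re-initialization can never discard it.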
3.2 Self-adaptive orthogonal learning artificial bee colony (S-OLABC)
Although the standard ABC is a successful algorithm for finding the optimal solution of an optimization problem, it often converges to local optima during the search process. Therefore, to balance the exploration and exploitation capabilities of the search process and thereby achieve high optimization performance, two modifications are proposed in this paper. In the original design, solution generation is driven by the random variable \(\varphi _{ij} \) alone, which is not strong enough to maximize the exploitation capacity. An extra coefficient \(\mho _{ij}\) is therefore defined as follows:
where \(\mho _{ij}\) can be adaptively updated at each iteration. If \(\mho _{ij}\) is large, then \(\mho _{ij} \varphi _{ij} \) will be large and exploration is enhanced to escape local optima. Conversely, when \(\mho _{ij}\) is small, \(\mho _{ij} \varphi _{ij} \) will be small and exploitation becomes more efficient at finding the best solution in the search space. To select the best value of \(\mho _{ij}\) for exploitation and exploration, two variables \(T_{1}<0\) and \(T_{2}>0\) are defined, and two random variables \(R_{1}\) and \(R_{2}\) are generated in \((T_{1}, 0)\) and \((0, T_{2})\), respectively. The factors \(\mho _{1j} =2^{R_1 }\) and \(\mho _{2j} =2^{R_2 }\) are then used for exploitation and exploration, respectively. In fact, two population vectors are generated with \(\mho _{1j}\) and \(\mho _{2j} \). Because \(T_{1}<0\), \(R_{1}\) is negative, \(\mho _{1j}\) is smaller than 1, and exploitation is amplified. Next, the best \(\mho _{ij} \) is chosen based on the best fitness values obtained from the populations generated with \(\mho _{1j}\) and \(\mho _{2j} \). This factor is then updated by:
where \(\lambda \) is a constant factor and initially \(\mho _{ij} =1\). Another disadvantage of the standard ABC algorithm is that each scout bee uses its historical/neighborhood best information through a simple model. Such a learning approach is easy to employ but is inadequate when searching complex optimization problem spaces. Thus, a novel OL model (Parsopoulos and Vrahatis 2004) is proposed in this paper for ABC to extract more useful information via an orthogonal operator. Let us consider two initial solution vectors \(X_1 =\left\{ {x_1^1 ,x_1^2 ,\ldots ,x_1^D } \right\} \) and \(X_2 =\left\{ {x_2^1 ,x_2^2 ,\ldots ,x_2^D } \right\} \). \(X_{1 }\) and \(X_{2}\) define a search range \(\left[ {\min ( {x_1^j ,x_2^j }),\max ( {x_1^j ,x_2^j })} \right] \) for the jth variable of the design variable vector X. The search range is then quantized into N levels, \(\rho _{k,1} ,\rho _{k,2} ,\ldots ,\rho _{k,N} \), as follows:
This quantization cannot be applied directly to an orthogonal array. Thus, it is divided into k subset vectors as follows:
Finally, the proposed OL solutions can be defined as follows:
The algorithm analysis is given in the Appendix.
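The self-adaptive part of the search described in this section can be sketched as follows. The code assumes the candidate update \(v_{ij}=x_{ij}+\mho _{ij}\varphi _{ij}(x_{ij}-x_{kj})\), draws \(R_1\) from \((T_1,0)\) and \(R_2\) from \((0,T_2)\), and keeps whichever scale yields the lower objective value. The values of `t1` and `t2`, the function name, and the minimization setting are hypothetical; the \(\lambda \)-based update of \(\mho _{ij}\) and the orthogonal-learning step are not reproduced because their equations are not available in the text.

```python
import random

def self_adaptive_step(x_i, x_k, j, objective, t1=-2.0, t2=2.0, rng=random):
    """Try an exploitation scale 2**R1 and an exploration scale 2**R2; keep the better.

    Returns (cost, scale, candidate) for the candidate with the lower cost.
    """
    phi = rng.uniform(-1.0, 1.0)      # the usual ABC random factor phi_ij
    r1 = rng.uniform(t1, 0.0)         # negative exponent: scale between 2**t1 and 1
    r2 = rng.uniform(0.0, t2)         # positive exponent: scale between 1 and 2**t2
    results = []
    for scale in (2.0 ** r1, 2.0 ** r2):
        cand = list(x_i)
        cand[j] = x_i[j] + scale * phi * (x_i[j] - x_k[j])
        results.append((objective(cand), scale, cand))
    return min(results, key=lambda item: item[0])

def sphere(x):
    return sum(v * v for v in x)

demo_rng = random.Random(1)
cost, scale, candidate = self_adaptive_step([1.0, 2.0], [0.5, -1.0], 0, sphere, rng=demo_rng)
```

Evaluating both scales doubles the number of objective evaluations per update, which is the price paid for letting the search choose between a short exploitative step and a long exploratory one.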
4 Application of S-OLABC to the day-ahead electricity price forecasting problem
The process of the day-ahead electricity price forecasting by S-OLABC algorithm can be summarized as:
- Step 1 Read the price data, which contain the price series, and split them into training and testing sets, denoted \(T_{r}\) and \(I_{n}\), respectively.
- Step 2 Set the adjustable S-OLABC parameters: the maximum cycle number, colony size and limit value are set to 50, 40 and 10, respectively.
- Step 3 Decompose the price data (historical data up to hour 24 of day \(d-1\)) via the WPT into a set of four constitutive series, defined as \(a_{h}, b_{h}, c_{h}\) and \(d_{h}, h=1,2 ,{\ldots },T\), where T usually ranges between 1 week and 2 months. \(a_{h}, b_{h}, c_{h}\) are detail series with small adjustments, and \(d_{h}\) is the approximation series, which is the main component of the transform. Applying the WPT to the original price series can therefore be formulated as:
$$\begin{aligned} W(p_h ;h=1,\ldots ,T)=\{a_h ,b_h ,c_h ,d_h ;h=1,\ldots ,T\} \end{aligned}$$ (37)
- Step 4 As a contribution of this paper, and unlike other papers in the prediction area, treat the detail and approximation terms as valuable information and as candidate inputs, and send them to the GMI system. In addition, use the Shannon entropy to select the best branch of the WPT tree. The GMI system then selects the data with the highest relevance and the least redundancy. The data passed by this system are collected as a vector {\(x_{1}\), \(x_{2}\), \({\ldots }\), \(x_{n}\)}, which is sent to step 5 for training.
- Step 5 This step is the engine of the forecasting algorithm and its main part. When the LSSVM-B is launched with the vector {\(x_{1}\), \(x_{2}\), \({\ldots }\), \(x_{n}\)}, the training process starts. Further details about the learning process of the LSSVM-B model are given in Sect. 2.3. In the first iteration, the outputs of the LSSVM-B system may not be adequate; therefore, completing the training process also requires steps 6 and 7.
- Step 6 Calculate the fitness value. Sort the initial population, together with all related data, in descending order of fitness.
$$\begin{aligned} \hbox {Fitness}=\frac{1}{N}\sum \limits _{i=1}^N {\frac{\hbox {act}_i -\hbox {forc}_i }{y_i }} \end{aligned}$$ (38)
where \(\hbox {act}_{i}\) and \(\hbox {forc}_{i}\) represent the actual and forecast values, respectively.
- Step 7 Use the S-OLABC to find the best agent. The best solution is taken as the initial solution. If the best solution found by the chaotic local search is better than the previous food source, it replaces that food source.
- Step 8 Use the inverse WPT to estimate the hourly prices for day d by means of the estimates for day d of the constitutive series. The inverse WPT is used in turn to reconstruct the estimate series for prices, i.e.,
$$\begin{aligned} W^{-1}(\{a_h^{\mathrm{est}} ,b_h^{\mathrm{est}} ,c_h^{\mathrm{est}} ,d_h^{\mathrm{est}} ;h=T+1,\ldots ,T+24\})=P_h^{W,\mathrm{est}} \end{aligned}$$ (39)
- Step 9 Update the velocity and position based on the S-OLABC scheme. In other words, move the agents through the search area to discover new solutions.
- Step 10 If the maximum number of iterations is reached, then finish. Otherwise, go to step 2. The graphical process of the proposed algorithm is depicted in Fig. 6.
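Steps 3, 6 and 8 above can be illustrated with a small sketch: a two-level Haar wavelet packet split into four sub-series, its exact inverse, and a fitness value. Several points are assumptions rather than the authors' method: the Haar wavelet, the decimated (shorter) sub-series, the function names, and the reading of Eq. (38) as a mean absolute relative error with the actual price in the denominator. The price values are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)

def split(x):
    """One Haar analysis step on an even-length list: (low-pass, high-pass)."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high

def merge(low, high):
    """Inverse of split: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * S, (a - d) * S])
    return out

def wpt2(prices):
    """Two-level wavelet packet split into four constitutive series."""
    low, high = split(prices)
    ll, lh = split(low)
    hl, hh = split(high)
    return ll, lh, hl, hh

def inverse_wpt2(ll, lh, hl, hh):
    """Rebuild the price series from the four sub-series."""
    return merge(merge(ll, lh), merge(hl, hh))

def fitness(actual, forecast):
    """Mean absolute relative error (assumed interpretation of Eq. (38))."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

day_prices = [30.0 + 5.0 * math.sin(h / 24.0 * 2.0 * math.pi) for h in range(24)]
sub_series = wpt2(day_prices)               # length must be a multiple of 4
rebuilt = inverse_wpt2(*sub_series)
example_fitness = fitness([100.0, 200.0], [110.0, 180.0])
```

In the full procedure each sub-series would be forecast separately before the inverse transform is applied; here the sub-series are passed through unchanged to show that the transform pair reconstructs the input.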
5 Simulation results and discussion
The proposed hybrid algorithm is tested using real data from the hourly Iran electricity price (HIEP), Spanish and New South Wales (NSW) markets. In the proposed algorithm, the population size, limit value, \(G_{0}\) and number of iterations are set to 50, 100, 10 and 200, respectively.
5.1 Evaluating the forecasting error
As indicated in Shayeghi et al. (2015), this paper uses similar error-based indices to compare the forecast accuracy. These indices are:
The daily MAPE can be defined as
where \(P_{i\mathrm{ACT}}\) and \(P_{i\mathrm{FOR}}\) are actual and predicted values, and \(P_{\mathrm{AVE-ACT}}\) denotes the average of actual prices. The FMSE is:
The ESD is given by:
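As a hedged illustration of the first index, the sketch below computes a daily MAPE that divides each absolute error by the average actual price \(P_{\mathrm{AVE-ACT}}\), as the definition of the symbols above suggests. The error-variance function follows the form commonly used in the Spanish-market literature and is an assumption here; the FMSE and ESD formulas are not reproduced because they are not available in the text. Function names and the sample prices are hypothetical.

```python
def daily_mape(actual, forecast):
    """MAPE in percent, normalized by the average actual price of the period."""
    average_actual = sum(actual) / len(actual)
    errors = [abs(a - f) / average_actual for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

def error_variance(actual, forecast):
    """Variance of the normalized absolute errors around their mean (assumed form)."""
    average_actual = sum(actual) / len(actual)
    errors = [abs(a - f) / average_actual for a, f in zip(actual, forecast)]
    mean_error = sum(errors) / len(errors)
    return sum((e - mean_error) ** 2 for e in errors) / len(errors)

actual_prices = [50.0, 50.0, 50.0, 50.0]
forecast_prices = [45.0, 55.0, 50.0, 50.0]
mape_percent = daily_mape(actual_prices, forecast_prices)
variance = error_variance(actual_prices, forecast_prices)
```

Normalizing by the average price rather than by each hourly price avoids inflated percentages in hours when the actual price is close to zero.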
5.2 Spanish electricity market
To perform forecasting, the target and candidate input variables are linearly normalized to the range (0, 1). Then, the proposed GMI-based feature selection is employed to select the best input variables for machine learning. Using the selected inputs, training and validation samples are constructed from the data of the previous 50 days. Each of the 24 forecast outputs is trained on its respective 49 training samples and validated on 1 validation sample. The proposed price forecasting approach has price series data up to midnight of the previous night. Generally, if the kth forecaster engine requires hourly prices of the forecast day, the price values predicted for these hours by the earlier forecasters are used. For example, \(P_{h-1}\) for the second forecaster is the price predicted by the first forecaster. As shown in Table 1, the results of the proposed algorithm for 4 test weeks of the Spanish electricity market in year 2002 are compared with some other existing methods in the literature (Amjady and Hemmati 2009; Pindoriya et al. 2008; Catalão et al. 2010; Anbazhagan and Kumarappan 2013; Shafie-khah et al. 2011; Amjady and Keynia 2009, 2010; Amjady and Daraeepour 2009). For the sake of a fair comparison, the same test weeks, i.e., the fourth week of February, May, August, and November, in the winter, spring, summer, and fall seasons, respectively, are considered for all calculations (Pindoriya et al. 2008; Catalão et al. 2010; Anbazhagan and Kumarappan 2013; Shafie-khah et al. 2011; Amjady and Keynia 2009, 2010; Amjady and Daraeepour 2009; Keynia et al. 2012).
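The two mechanical steps in this paragraph, linear normalization and the reuse of earlier predictions by later forecasters, can be sketched as follows. The function names, the lag-window representation, and the toy forecasters are hypothetical; in the paper each hourly forecaster would be a trained LSSVM-B model fed with the inputs selected by GMI.

```python
def min_max_normalize(values):
    """Linearly map values to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(values, lo, hi):
    """Map normalized values back to the original price scale."""
    return [lo + v * (hi - lo) for v in values]

def recursive_day_ahead(history, forecasters, n_lags):
    """Run one forecaster per hour; each sees the predictions of the earlier ones."""
    window = list(history[-n_lags:])
    predictions = []
    for model in forecasters:
        price = model(window)
        predictions.append(price)
        window = window[1:] + [price]   # predicted P_{h-1} replaces the unknown actual
    return predictions

# Toy forecasters (placeholders): each predicts the last known price plus one.
toy_forecasters = [lambda w: w[-1] + 1.0] * 24
toy_history = [8.0, 9.0, 10.0]
toy_predictions = recursive_day_ahead(toy_history, toy_forecasters, n_lags=3)
normalized, low, high = min_max_normalize([10.0, 20.0, 30.0])
```

With the toy forecasters the second prediction builds on the first rather than on an observed price, which is exactly the dependency the paragraph describes; it also means forecast errors can accumulate over the 24 hours.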
According to Table 1, the proposed hybrid algorithm has a better weekly MAPE than the other methods in all seasons. Moreover, the average MAPE of the proposed method is considerably lower than that of all other methods (last row of Table 1). The improvements in the average MAPE of the proposed price forecasting approach with respect to the other methods are shown in Fig. 7 and are calculated by Eq. (44). Table 2 shows the variance of the forecasting errors as a measure of uncertainty. This index is lower for the proposed hybrid algorithm than for the other forecasting methods, and its average variance is significantly lower than that of the other methods. The improvement in the average error variance of the proposed method compared to HEA (a recently published method (Osórioa et al. 2014) and the best among all other existing forecasting methods in Table 1) is about 20 %. The electricity price data of the Spanish market can be found on the website (Informe de operación del sistema eléctrico 2015).
Furthermore, sample results for the MI (Shayeghi and Ghasemi 2013) and GMI are presented. For the sake of conciseness, results for only one of the test days are presented in Table 3. \(P_{h}\) denotes the price of hour h. Table 3 shows the selected features (columns 2 and 4) and the normalized MI values (columns 3 and 5). The results were obtained after the convergence of the iterative search procedure. According to Table 3, out of 600 candidate inputs considered for the Spanish electricity market, 18 inputs are selected by MI, which corresponds to a filtering ratio of \(600/18\approx 33.33\) for the feature selection technique. The GMI selects only 14 inputs, which corresponds to a higher filtering ratio (\(600/14\approx 42.86\)) than the traditional MI method. To give a graphical view of the forecasting accuracy obtained by the proposed hybrid algorithm, the forecast, the actual signal and the forecast error for day-ahead price forecasting are shown in Fig. 8.
5.3 Iranian electricity market
In this case study, the proposed hybrid forecasting algorithm shown in Fig. 6 is used to forecast the day-ahead prices in the Iranian electricity market. The effectiveness of the proposed algorithm is demonstrated by forecasting prices for Iran’s electricity market over March 2012. The data of Iran’s electricity market can be found in Informe de operación electricity market (2015). The simulation mechanism is similar to that of the Spanish market. Price data of Iran’s electricity market in year 2012 are shown in Fig. 9. For comparison, a multi-layer perceptron (MLP) neural network, WT \(-\) MI \(+\) NN and the proposed hybrid algorithm are used for price forecasting. To provide a graphical view of the forecast accuracy of the proposed hybrid algorithm, results for Iran’s electricity market are shown in Fig. 10a and b. Table 4 compares the results obtained via the proposed hybrid algorithm, MLP and WT \(-\) MI \(+\) NN methods for the 24 h.
Table 4 shows the forecast results of the proposed hybrid algorithm, MLP and WT \(-\) MI \(+\) NN methods for 24 h (daily) and 168 h (weekly). The comparison shows that the predictions of some methods are occasionally unacceptable: their error exceeds 40 % and they deviate from the actual price signal at price valleys or peaks. One important reason for this shortcoming is the small amount of available historical data. The WT \(-\) MI \(+\) NN method improves on the neural network (NN) performance considerably, although it still shows large deviations in some hours. Except for hour 24 in Table 4, the error percentage is always above 15 %. Compared to NN and WT \(-\) MI \(+\) NN, the proposed hybrid algorithm (with an error lower than 10 %) has superior accuracy and more stable predictions.
In addition, Table 4 compares the proposed method with the MLP and WT \(-\) MI \(+\) NN techniques using the MAPE, FMSE and ESD criteria. Because the proposed forecasting algorithm yields lower MAPE, FMSE and ESD values, its predictions are more stable. During the valley hours, the volatility of the spot price increases, and thus MLP shows considerably larger prediction errors. Forecasts during specific off-peak hours are also less accurate. Furthermore, the overall setup time of the proposed hybrid algorithm, including the WPT-GMI stage, training of the forecaster (LSSVM+S-OLABC) and fine-tuning of the adaptable parameters, is about 26 min on a Pentium P4 3.2 GHz PC with 1 GB RAM, which is acceptable within a day-ahead decision-making framework.
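As a rough illustration of how such error criteria can be evaluated, the Python sketch below computes a mean absolute percentage error and the spread of the hourly errors. It rests on assumptions: each hourly error is normalized by the average actual price over the horizon (a convention often used in price forecasting to avoid very large values when prices approach zero), and the spread is taken as the population standard deviation of those relative errors. These may differ from the exact MAPE and ESD definitions used in this paper, FMSE is not covered, and the function names and sample prices are hypothetical.

```python
from statistics import mean, pstdev


def relative_errors(actual, forecast):
    """Absolute hourly errors divided by the mean actual price (assumed convention)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    avg_price = mean(actual)
    return [abs(a - f) / avg_price for a, f in zip(actual, forecast)]


def mape_percent(actual, forecast):
    """Mean of the relative errors, expressed in percent."""
    return 100.0 * mean(relative_errors(actual, forecast))


def error_std(actual, forecast):
    """Population standard deviation of the relative errors (stability indicator)."""
    return pstdev(relative_errors(actual, forecast))


# Illustrative values only, not market data.
actual_prices = [50.0, 50.0, 50.0, 50.0]
forecast_prices = [45.0, 55.0, 50.0, 50.0]

print(mape_percent(actual_prices, forecast_prices))
print(error_std(actual_prices, forecast_prices))
```

A lower mean error indicates better accuracy, while a lower spread indicates that the errors are distributed more evenly over the hours.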
5.4 New South Wales (NSW) market
As the final test market, the New South Wales (NSW) region of Australia’s national grid is used to demonstrate the performance of the proposed forecasting algorithm. The historical data for this market in 2007 can be found in Australian Energy Market Operator (2015). To perform forecasting, the target and candidate input variables (\(T_{r}\) and \(I_{n}\)) are normalized to the range between 0 and 1. Then, the proposed feature selection algorithm is used to pick the best input candidates. Twelve months were considered for simulation: in each month, all days except the last were used for training, and the final day was used for forecast evaluation. In Table 5, the forecast obtained by the proposed hybrid algorithm is compared with similar methods from the literature, namely ARIMA, LSSVM, PLSSVM, ARIMA \(+\) LSSVM, ARIMA \(+\) PLSSVM, and WT \(+\) ARIMA \(+\) LSSVM (Zhang et al. 2012). According to Table 5, the proposed hybrid algorithm has a better daily MAPE than the other methods in all simulations (except for some months, such as April and November, where the results are nearly the same). Furthermore, the ARIMA predictions were less accurate than those of the proposed hybrid algorithm, with the average MAPE increasing from 2.10 % for the proposed algorithm to 13.63 % for ARIMA.
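The data preparation described above (scaling to the range between 0 and 1, then holding out the last day of the month) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: min-max scaling is assumed as the normalization, `samples_per_day` defaults to 24 and would need to match the market's actual sampling interval, and the helper names and the synthetic series are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant series: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def split_last_day(samples, samples_per_day=24):
    """Return (training part, final day) for a series made of whole days."""
    if len(samples) < 2 * samples_per_day or len(samples) % samples_per_day:
        raise ValueError("series must contain at least two whole days")
    return samples[:-samples_per_day], samples[-samples_per_day:]


# Synthetic three-day price series (illustrative values only, not market data).
prices = [20.0 + (h % 24) + 2.0 * (h // 24) for h in range(72)]
scaled = min_max_normalize(prices)
train, test = split_last_day(scaled)

print(len(train), len(test))
```

In practice, the scaling bounds would typically be taken from the training portion only, so that no information from the evaluation day enters the training data.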
5.5 Forecasting results for three markets
For a fair comparison between MI and the proposed feature selection method (GMI), the data set is kept constant so that the efficiency of the two feature selection methods can be compared. The MI technique (in Table 3) is a well-known feature selection method that considers only the relevancy of candidate inputs to the target variable. The proposed GMI method considers both relevancy and redundancy of candidate inputs. However, the whole common information content of two candidate inputs is considered as the redundant information in this feature selection method. Moreover, a more accurate formulation of redundancy based on interaction gain is proposed in the GMI algorithm, which considers the common information content of two candidate inputs. The numerical results of Tables 1, 2, 4 and 5 show that the proposed hybrid forecasting algorithm outperforms all other methods in price forecasting. The proposed algorithm also has the lowest MAPE, FMSE and ESD values among all methods of Tables 1, 2, 4 and 5 in all test weeks, indicating better forecast accuracy and stability.
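The difference between relevancy-only ranking and a selection that also penalizes redundancy can be illustrated with a small Python sketch. It is not the GMI algorithm of this paper: it uses plain mutual information on discrete symbols and a simple "relevance minus mean redundancy" score (an mRMR-style simplification), whereas GMI relies on an interaction-gain formulation. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def greedy_select(candidates, target, k):
    """Greedily pick k inputs by relevance to the target minus mean redundancy."""
    selected = []
    remaining = dict(candidates)

    def score(name):
        relevance = mutual_information(remaining[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(remaining[name], candidates[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data: the target is determined by two independent bits u and v.
u = [0, 0, 1, 1, 0, 0, 1, 1]
v = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * a + b for a, b in zip(u, v)]
candidates = {"a": u, "b": list(u), "c": v}  # "b" duplicates "a"

print(greedy_select(candidates, target, k=2))
```

All three toy inputs carry one bit of information about the target, so a relevancy-only ranking cannot distinguish the duplicate `b` from the complementary input `c`; the redundancy penalty makes the selection prefer `c` once `a` has been chosen.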
6 Conclusions
To improve the accuracy of electricity price forecasting, a novel hybrid method is proposed in this paper, which combines WPT, GMI, and LSSVM-B optimized by the S-OLABC algorithm. The proposed forecasting algorithm is examined using data from three electricity markets: the Spanish, NSW, and Iranian markets. The better performance of the proposed hybrid algorithm can be attributed to three causes. First, the WPT converts ill-behaved price signals into subsets of better-behaved signals. Second, GMI, which maximizes relevancy and minimizes redundancy, can be computed from a reasonable amount of historical data with a low computational burden, and the LSSVM-B model has nonlinear mapping capabilities that more easily capture the nonlinear component of electricity prices. Third, the S-OLABC algorithm tunes the variables of the LSSVM-B model, for which inappropriate settings cause either under-fitting or over-fitting. The proposed method can also be used for other forecasting tasks such as wind power forecasting. The GMI considers three-way interactions; extension of the method to include higher-order interactions will be considered in future work.
References
Amjady N, Daraeepour A (2009) Design of input vector for day-ahead price forecasting of electricity markets. Exp Syst Appl 36(10):12281–12294
Amjady N, Hemmati H (2009) Day-ahead price forecasting of electricity markets by a hybrid intelligent system. Euro Trans Elect Power 19(1):89–102
Amjady N, Keynia F (2009) Day-ahead price forecasting of electricity markets by mutual information technique and cascaded neuro-evolutionary algorithm. IEEE Trans Power Syst 24(1):306–318
Amjady N, Keynia F (2010) Application of a new hybrid neuro-evolutionary system for day-ahead price forecasting of electricity markets. Appl Soft Comput 10(3):784–792
Anbazhagan S, Kumarappan N (2012) A neural network approach to day-ahead deregulated electricity market prices classification. Elect Power Syst Res 86:140–150
Anbazhagan S, Kumarappan N (2013) Day-ahead deregulated electricity market price forecasting using recurrent neural network. IEEE Syst J 7(4):866–872
Anbazhagan S, Kumarappan N (2014) Day-ahead deregulated electricity market price forecasting using neural network input featured by DCT. Energy Convers Manag 78:711–719
Australian Energy Market Operator (2015) (Online). http://www.aemo.com.au
Catalão JPS, Mariano SJPS, Mendes VMF, Ferreira LAFM (2007) Short-term electricity prices forecasting in a competitive market: a neural network approach. Elect Power Syst Res 77:1297–1304
Catalão JPS, Pousinho HMI, Mendes VMF (2010) Neural networks and wavelet transform for short-term electricity prices forecasting. Eng Intell Syst Elect Eng Commun 18(2):85–92
Catalão JPS, Pousinho HMI, Mendes VMF (2011) Hybrid wavelet-PSO-ANFIS approach for short-term electricity prices forecasting. IEEE Trans Power Syst 26(1):137–144
Conejo AJ, Plazas MA, Espinola R, Molina AB (2005) Day-ahead electricity price forecasting using the wavelet transform and ARIMA models. IEEE Trans Power Syst 20(2):1035–1042
Contreras J, Espínola R, Nogales FJ, Conejo AJ (2003) ARIMA models to predict next-day electricity prices. IEEE Trans Power Syst 18(3):1014–1020
Da L, Dongxiao N, Hui W, Leilei F (2014) Short-term wind speed forecasting using wavelet transform and support vector machines optimized by genetic algorithm. Renew Energy 62:592–597
El Akadi, El Ouardighi A, Aboutajdine A (2008) A powerful feature selection approach based on mutual information. Int J Comput Sci Netw Secur 8:116–121
Garcia RC, Contreras J, Akkeren MV, Garcia JBC (2005) A GARCH forecasting model to predict day-ahead electricity prices. IEEE Trans Power Syst 20:867–874
Garcia-Martos C, Rodriguez J, Sanchez MJ (2007) Mixed models for short-run forecasting of electricity prices: application for the Spanish market. IEEE Trans Power Syst 22(2):544–551
Ghofrani M, Ghayekhloo M, Arabali A, Ghayekhloo A (2015) A hybrid short-term load forecasting with a new input selection framework. Energy 81:777–786
http://infinity77.net/global_optimization/test_functions_nd_L.html (2015)
Informe de operación del sistema eléctrico (2015) Red Eléctrica de España (REE), Madrid, Spain (online). http://www.ree.es/cap03/pdf/Inf_Oper_REE_99b.pdf
Informe de operación electricity market (2015) Tehran, Iran (Online). http://edipg.igmc.ir:8070/edipg
Karaboga D, Akay B (2009) A comparative study of artificial bee colony algorithm. Appl Math Comp 214:108–132
Keynia F (2012) A new feature selection algorithm and composite neural network for electricity price forecasting. Eng Appl Artif Intell 25(8):1687–1697
Lee J, Kim D (2013) Feature selection for multi-label classification using multivariate mutual information. Pattern Recogn Lett 34:349–357
Lora Troncoso A, Riquelme Santos JM, Gómez Expósito A, Martínez Ramos JL, Riquelme Santos JC (2007) Electricity market prices forecasting based on weighted nearest neighbors techniques. IEEE Trans Power Syst 22(3):1294–1301
Martinez-Alvarez F, Troncoso A, Riquelme JC, Aguilar-Ruiz JS (2011) Energy time series forecasting based on pattern sequence similarity. IEEE Trans Knowl Data Eng 23(8):1230–1243
Melek AB, Derya A (2010) An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul stock exchange. Expert Syst Appl 37:7908–7912
Mohamed Z, Bodger P (2005) Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 30(10):1833–1843
Neal RM (1996) Bayesian learning for neural networks. Springer, New York
Osório GJ, Matias JCO, Catalão JPS (2014) Electricity prices forecasting by a hybrid evolutionary-adaptive methodology. Energy Convers Manag 80:363–373
Parsopoulos KE, Vrahatis MN (2004) On the computation of all global minimizers through particle swarm optimization. IEEE Trans Evol Comput 8(3):211–224
Pindoriya NM, Singh SN, Singh SK (2008) An adaptive wavelet neural network-based energy price forecasting, in electricity market. IEEE Trans Power Syst 23(3):1423–1432
Shafie-khah M, Moghaddam MP, Sheikh-El-Eslami MK (2011) Price forecasting of day-ahead electricity markets using a hybrid forecast method. Energy Convers Manag 52(5):2165–2169
Sharma V, Srinivasan D (2013) A hybrid intelligent model based on recurrent neural networks and excitable dynamics for price prediction in deregulated electricity market. Eng Appl Artif Intell 26(5–6):1562–1574
Shayeghi H, Ghasemi A, Moradzadeh M, Nooshyar M (2015) Simultaneous day-ahead forecasting of electricity price and load in smart grids. Energy Convers Manag 95:371–384
Shayeghi H, Ghasemi A (2011) Solving economic load dispatch problems with valve point effects using artificial bee colony algorithm. Int Rev Electr Eng 6(5):2569–2577
Shayeghi H, Ghasemi A (2013) Day-ahead electricity prices forecasting by a modified CGSA technique and hybrid WT in LSSVM based scheme. Energy Convers Manag 74:482–491
Shayeghi H, Ghasemi A (2014) A modified artificial bee colony based on chaos theory for solving non-convex emission/economic dispatch. Energy Convers Manag 79:344–354
Sousa TM, Pinto T, Vale Z, Praca I, Morais H (2012) Adaptive learning in multiagent systems: a forecasting methodology based on error analysis. Adv Intell Soft Comput 156:349–357
Suykens JAK, Vandewalle J (1999) Least squares support vector machine. Neural Process Lett 9(3):293–300
Taherian H, Nazer I, Razavi E, Goldani SR, Farshad M, Aghaebrahimi MR (2013) Application of an improved neural network using cuckoo search algorithm in short-term electricity price forecasting under competitive power markets. J Oper Autom Power Eng 1(2):136–146
Tan Z, Zhang J, Wang J, Xu J (2010) Day-ahead electricity price forecasting using wavelet transform combined with ARIMA and GARCH models. Appl Energy 87(11):3606–3610
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
Wickerhauser MV, Coifman RR, Meyer Y (1992) Wavelet analysis and signal processing. In: Wavelets and their applications. Jones and Bartlett, Boston, pp 153–178
Zhang J, Tan Z, Yang S (2012) Day-ahead electricity price forecasting by a new hybrid method. Comput Ind Eng 63:695–701
Zhang J, Tan Z (2013) Day-ahead electricity price forecasting using WT, CLSSVM and EGARCH model. Elect Power Energy Syst 45:362–368
Additional information
Communicated by V. Loia.
Appendix: Algorithm analysis
In this section, the exploration and exploitation abilities of the proposed S-OLABC algorithm are examined in comparison with orthogonal learning ABC (OLABC) and standard ABC. Langermann’s function (infinity77.net 2015) with two variables, \(X_{1}\) and \(X_{2}\), is selected as a non-convex test problem that has many local optima far from the global optimum as well as flat areas; it is shown in Fig. 11. The mathematical formula can be expressed as:
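A hedged Python sketch of this test function is given here, using the form of Langermann's function commonly listed in benchmark collections: a negated sum of cosine terms damped by an exponential of the squared distance to a set of centres. The coefficient vectors `A`, `B` and `C` and the leading minus sign are assumptions taken from the usual two-variable benchmark definition and are not confirmed by this paper.

```python
from math import cos, exp, pi

# Assumed benchmark coefficients for the two-variable case (not confirmed by the paper).
A = (3.0, 5.0, 2.0, 1.0, 7.0)  # centre coordinates along X1
B = (5.0, 2.0, 1.0, 4.0, 9.0)  # centre coordinates along X2
C = (1.0, 2.0, 5.0, 2.0, 3.0)  # term weights


def langermann(x1, x2, a=A, b=B, c=C):
    """Negated sum of damped cosine terms centred at the points (a_i, b_i)."""
    total = 0.0
    for ai, bi, ci in zip(a, b, c):
        d2 = (x1 - ai) ** 2 + (x2 - bi) ** 2  # squared distance to centre i
        total += ci * exp(-d2 / pi) * cos(pi * d2)
    return -total


print(langermann(2.0, 1.0))
```

The oscillating cosine factor creates many local optima, while the exponential damping flattens the surface away from the centres, which is what makes the function a demanding test of exploration.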
Figure 12 shows the contour plot of Langermann’s function together with the movement of the population during the search process. For the sake of a fair comparison, the initial populations were the same for all three algorithms, as shown in Fig. 12 A\(_{1}\)–A\(_{3}\). The other control parameters were selected based on previously published papers (Shayeghi and Ghasemi 2011, 2014). After the last iteration, all individuals of the proposed algorithm have gathered at the global optimum, whereas some individuals of ABC and OLABC remain away from it. The figure clearly shows that each improvement applied to the original ABC algorithm enhanced the overall performance, so that better results were obtained step by step.
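For orientation, the basic neighbourhood move of the standard ABC algorithm, on which OLABC and S-OLABC build, can be sketched in Python as follows. This is a simplified sketch of the employed-bee phase only, under stated assumptions: each food source is perturbed in one randomly chosen dimension relative to a randomly chosen partner, the candidate is clipped to the bounds, and the better of the old and new positions is kept. Partners are read from the unmodified population, a simplification of the usual sequential update. The orthogonal learning step and the other modifications of S-OLABC are not shown, and the function names and test objective are hypothetical.

```python
import random


def employed_bee_phase(population, objective, bounds, rng):
    """One employed-bee pass of standard ABC with greedy selection (minimization)."""
    new_population = []
    for i, x in enumerate(population):
        # Pick a partner source k != i and a single dimension j to perturb.
        k = rng.choice([idx for idx in range(len(population)) if idx != i])
        j = rng.randrange(len(x))
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(x)
        lo, hi = bounds[j]
        candidate[j] = min(hi, max(lo, x[j] + phi * (x[j] - population[k][j])))
        # Greedy selection: keep the candidate only if it improves the objective.
        better = objective(candidate) < objective(x)
        new_population.append(candidate if better else list(x))
    return new_population


def sphere(x):
    """Simple convex test objective used only for this illustration."""
    return sum(v * v for v in x)


rng = random.Random(0)
bounds = [(0.0, 10.0), (0.0, 10.0)]
population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(6)]
updated = employed_bee_phase(population, sphere, bounds, rng)

print(min(sphere(x) for x in updated))
```

Because of the greedy selection, no food source can get worse in a pass; the quality of the search then depends on how well the perturbation balances exploration and exploitation, which is the aspect the modified variants aim to improve.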
Cite this article
Shayeghi, H., Ghasemi, A., Moradzadeh, M. et al. Day-ahead electricity price forecasting using WPT, GMI and modified LSSVM-based S-OLABC algorithm. Soft Comput 21, 525–541 (2017). https://doi.org/10.1007/s00500-015-1807-1
DOI: https://doi.org/10.1007/s00500-015-1807-1