1 Introduction

Electricity price forecasting is one of the important functions in an electricity market, since electricity has become an essential commodity in modern society. The price signal is non-homogeneous, and its variations show only a weak cyclic property. The electricity price signal is volatile in nature due to volatility in fuel price, load uncertainty, transmission congestion, behaviour of market participants, market manipulations, etc. Because of this significant volatility, it is difficult to make an accurate forecast. Based on the literature, the error of price forecasts is around 10 %, compared to 3 % for load forecasting [24]. However, the accuracy requirement for price forecasting is not as stringent as that for load forecasting. Various electricity markets around the world use different types of time series models [11, 16, 18], and each market uses a forecast model suited to its own method of functioning. As a result, it is necessary to develop an accurate forecasting model relevant to a particular electricity market. Here, we consider an Indian electricity market, namely the Indian Energy Exchange (IEX). In this market, the market clearing price (MCP) is forecasted, which depends on the supply and demand of electricity in the power market. It should also be noted that the accuracy in forecasting MCP depends on intrinsic and extrinsic factors that are specific to each market. Therefore, MCP is a varying price signal that mostly depends on the dynamic behaviour of buyers and sellers, analogous to the demand and supply in the market, respectively. When the electricity MCP is determined, every supplier whose offering price is below or equal to the MCP is picked by the Independent System Operator (ISO) to supply electricity at that hour. These suppliers are paid the same price, the MCP, and not the price they offered. The reason for this is to keep the market fair and to avoid market manipulation [27].

Aggarwal et al. [2] investigated the state of price forecasting methodologies by reviewing 47 papers published between 1997 and November 2006. The review is based on the type of model used for forecasting, time horizon for prediction, input variables, output variables, analysis of results, data points used for analysis, pre-processing employed and the model architecture. They observed that forecasting errors are still high from a risk management perspective and that the results obtained by different price forecasting methodologies are difficult to compare with each other. Catalao et al. [7] proposed a novel approach combining wavelet transform, particle swarm optimization and adaptive-network-based fuzzy inference system to forecast short-term electricity prices in the Spanish market. Dev et al. [10] extensively investigated feed-forward back-propagation neural networks and extreme value distributions to model electricity pool prices for the Australian National Electricity Market from 1998 to 2013. Their work showed that price spikes can be well modelled using extreme value distributions, and that understanding the characteristics of these spikes is an important component of understanding the extraordinary volatility of electricity spot prices. Gao et al. [12] used a three-layered feed-forward back-propagation artificial neural network method to forecast the market clearing price and market clearing quantity for the California day-ahead energy markets. The authors discussed one of the critical issues in training neural networks, namely over-fitting: the network fitted the training set very well, but did not generalize well to new data outside the training set.

Mandal et al. [17] presented a hybrid intelligent algorithm that combines a data filtering technique based on wavelet transform (WT), an optimization technique based on the firefly (FF) algorithm and a soft computing model based on the fuzzy ARTMAP (FA) network. It forecasts day-ahead electricity prices in the Ontario market and was evaluated against PJM market data. Aggarwal et al. [1] and Pindoriya et al. [21] used WT with a neural network (WNN) for short-term, day-ahead prediction of the market clearing price (MCP) in the Ontario, Spain and PJM electricity markets. The performance of WNN was found to be encouraging, since the data had been pre-processed by WT. The benefit of WT for pre-processing is also reported in [26], where it is used with an Extreme Learning Machine for electricity price forecasting. Pousinho et al. [22] proposed a hybrid PSO–ANFIS approach for short-term price forecasting, a combination of particle swarm optimization and adaptive-network-based fuzzy inference system, applied to the electricity market of mainland Spain. The authors demonstrated its effectiveness regarding forecasting accuracy and reduced computation time. Amjady [3] developed a fuzzy neural network model for day-ahead electricity price forecasting (EPF) for the Spanish electricity market and reported the model to be satisfactory.

Sharma and Srinivasan [25] proposed a hybrid intelligent model based on the recurrent neural network (RNN) and excitable dynamics for price prediction in the deregulated electricity market. The developed model consists of three components: a FitzHugh–Nagumo (FHN) model to mimic spiking behaviour, an RNN unit to regulate the FHN and a feed-forward neural network to model the residue of the RNN-FHN. The success of this synergistic combination of RNN and a coupled system of equations presents exciting opportunities for future work on day-ahead prediction in this time series system using multi-scale systems. Short-term electricity price forecasting (i.e. day-ahead hourly electricity price forecasting) for organized power exchanges in developing nations (like India, where the power sector is getting deregulated due to reforms such as the Electricity Act 2003) is one of the directions for future research. The Central Electricity Regulatory Commission issued a discussion paper on setting up a common platform for trading of electricity on February 6, 2007. After much debate and discussion, the plan for setting up power exchanges within the country was formulated. Applications from two exchanges, namely Power Exchange of India and Indian Energy Exchange, were submitted and approved by June and September 2008, respectively [23]. Both these exchanges have active participation from various utilities, and both provide electronic platforms for trading electricity on a day-ahead basis. The Indian electricity market is still at an early stage of development, with considerable resistance to complete deregulation based on market equilibrium. There are very limited studies related to short-term electricity price forecasting for developing countries, particularly India, where electricity markets are getting deregulated. Because the practical relevance and stakes of price forecasting are high for a Generator/Indian Power Producer (IPP)/firm (i.e. next-day price forecasting is a crucial need for producers, consumers and energy service companies), accurate forecasting tools and techniques are valued by power market participants, especially in an emerging economy like India. One of the directions for future research is developing a time series econometric forecasting model that accurately forecasts day-ahead hourly electricity prices in the Indian electricity market. Therefore, the proposed research work mainly focuses on developing ANN-based forecasting tools for the Indian Energy Exchange Ltd. (IEX), launched in June 2008.

In the literature, although the similar-day approach has been used for electricity price forecasting [19], most neural network-based research papers use the day-ahead approach [4, 6, 8, 15]. Here, the electricity price data of the previous day (nth day) are mapped to those of the next day, the (n + 1)th day, while training neural networks. The reason may be that the correlation, both linear and nonlinear, between the nth and (n + 1)th days is stronger. For example, the MCP profile of Monday is mapped to that of Tuesday. So when a test input of the nth day is fed into the forecast model, the MCP of the (n + 1)th day is forecasted. However, it should be noted that the problem formulation is short-term electricity price forecasting, in which the electricity price or MCP is forecasted for a day or a week.

2 Proposed work

The proposed research work focuses on developing a hybrid ANN forecast engine solely for the Indian Energy Exchange (IEX), since only limited work has been carried out on this market. Hybrid ANN models, which use heuristic search algorithms such as genetic algorithms, particle swarm optimization and artificial bee colony algorithms for updating the weights, show somewhat better performance. Among these optimization algorithms, particle swarm optimization (PSO) [13, 14] is regarded as a promising method in several engineering applications, as is evident in the literature. Therefore, the proposed research work develops a novel ANN-based training scheme using PSO for forecasting the MCP. Even though both similar-day and day-ahead approaches are available in the literature, the day-ahead approach is found to be more appropriate. Therefore, this research work investigates the day-ahead training approach for electricity price forecasting. In day-ahead training, the correlation is between the nth and (n + 1)th days. It is also understood that forecasting accuracy is improved by pre-processing the data before training. Among the data processing techniques, the WT method is the prevailing approach [21] due to its easy implementation and its adaptive time-frequency analysis ability. Therefore, WT is used for pre-processing in the proposed work.

3 Indian energy exchange and its historical database

IEX is India’s premier power trading platform. By providing a transparent, neutral, demutualized and automated platform for physical delivery of electricity, IEX enables efficient price discovery and price risk management for participants in the electricity market, including industries eligible for open access.

In this exchange, there are about 4000 participants, including utilities from 29 States and 5 Union Territories (UTs) and 1000+ private generators (both commercial and renewable energy). More than 3000 open access consumers are leveraging the exchange platform to manage their power portfolio in the most competitive and reliable way. A typical market snapshot of IEX can be obtained from http://www.iexindia.com/marketdata/marketsnapshot.aspx. The most influential historical data [20], namely hourly Purchase Bid (PB, in MW) and Market Clearing Price (MCP, in INR), are taken from the website for the proposed work. The historical data considered for performance analysis cover the period from 4 January 2014 to 1 November 2014. The smoothing feature of the Daubechies [9] wavelet of order 4 (db4) is used to remove higher-frequency components. The MCP and PB for the same time frame are shown in Figs. 1 and 2, respectively, without and with pre-processing using the WT technique.

Fig. 1 Market clearing price

Fig. 2 Purchase bid price

The historical dataset is usually not used directly in process modelling of ANNs due to the difference in magnitude of the process variables. Therefore, the data need to be scaled to a fixed range to prevent variables with larger magnitude from dominating those with smaller magnitude and impeding the learning process. The choice of range depends on the transfer function of the output nodes in the ANN: typically, [0, 1] for the sigmoid function and [−1, 1] for the hyperbolic tangent function. However, because a nonlinear transfer function approaches its limits only asymptotically, the range of the dataset is set slightly inside the lower and upper limits. In this work, since the sigmoid function is adopted, the data are normalized to the range [0.1, 0.9]. That is, if \( x_{1} \) and \( x_{2} \) are the maximum and minimum values of the training set, then the normalized data are given by N(x) as in (1).

$$ N\left( x \right) = \left( {\frac{{\left( {x - x_{1} } \right) \times \left( {0.1 - 0.9} \right)}}{{\left( {x_{2} - x_{1} } \right)}}} \right) + 0.9 $$
(1)
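As a minimal sketch of Eq. (1), the following Python snippet scales a series to [0.1, 0.9], mapping the maximum \( x_{1} \) to 0.9 and the minimum \( x_{2} \) to 0.1. The function names `normalize` and `denormalize` and the sample prices are illustrative assumptions, not part of the original work, which was implemented in MATLAB.

```python
def normalize(values, lo=0.1, hi=0.9):
    """Scale values to [lo, hi] following Eq. (1): maximum -> hi, minimum -> lo."""
    x1 = max(values)  # maximum of the training set
    x2 = min(values)  # minimum of the training set
    if x1 == x2:
        raise ValueError("cannot normalize a constant series")
    return [(x - x1) * (lo - hi) / (x2 - x1) + hi for x in values]


def denormalize(scaled, x1, x2, lo=0.1, hi=0.9):
    """Invert Eq. (1) to bring network outputs back to the original scale."""
    return [(n - hi) * (x2 - x1) / (lo - hi) + x1 for n in scaled]


# Hypothetical MCP values in INR, used only to illustrate the mapping.
prices = [2000.0, 3500.0, 5000.0]
scaled = normalize(prices)
restored = denormalize(scaled, max(prices), min(prices))
```

The inverse mapping is needed when the forecasted MCP, produced in the normalized range, has to be compared with the actual prices.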

Here, the total number of input nodes for the feed-forward back-propagation neural network is equal to 48. The first 24 inputs represent the hourly PB of the day, and the next 24 inputs represent the hourly MCP. The total number of output nodes is 24, since the MCP has to be forecasted using day-ahead forecasting. According to the literature, one hidden layer is generally sufficient for most neural network applications. The number of hidden neurons in the hidden layer is fixed either by trial and error or by statistical evaluation.
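The input and target layout described above can be sketched as follows. This is an illustrative Python sketch under the assumption that the hourly PB and MCP are stored as one list of 24 values per day; the function name `build_day_ahead_pairs` and the toy data are hypothetical.

```python
def build_day_ahead_pairs(pb_days, mcp_days):
    """Map day n (24 PB values + 24 MCP values) to the MCP profile of day n + 1."""
    if len(pb_days) != len(mcp_days):
        raise ValueError("PB and MCP must cover the same days")
    inputs, targets = [], []
    for n in range(len(mcp_days) - 1):
        day_pb, day_mcp, next_mcp = pb_days[n], mcp_days[n], mcp_days[n + 1]
        if not (len(day_pb) == len(day_mcp) == len(next_mcp) == 24):
            raise ValueError("each day needs 24 hourly values")
        inputs.append(list(day_pb) + list(day_mcp))  # 48 input nodes
        targets.append(list(next_mcp))               # 24 output nodes
    return inputs, targets


# Toy data for three days (values are placeholders, not IEX data).
pb = [[float(d * 100 + h) for h in range(24)] for d in range(3)]
mcp = [[float(d * 1000 + h) for h in range(24)] for d in range(3)]
X, Y = build_day_ahead_pairs(pb, mcp)
```

With D days of history this yields D − 1 training patterns, so 1 week of source days paired with the shifted week of target days gives seven patterns when eight consecutive days are supplied.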

Generally, during ANN training, the weights are updated using the conventional gradient descent method. The weights can be updated either in incremental or in batch mode. In the incremental mode of training, weights are updated every time an individual pattern in the training set is presented to the network, while in the batch mode of training, the weights are updated once for the entire training set. In order to choose between the two modes of training, a performance analysis of the incremental and batch modes is carried out using historical data of IEX from 12 October 2014 to 1 November 2014. The number of hidden nodes is varied from 1 to 5. The final result is tabulated in Table 1. Table 1 shows that the average mean square error (AMSE) and the time taken are better for the batch mode of training. Therefore, in the proposed model, batch mode training is used.

Table 1 Batch mode versus incremental mode

4 Proposed methodology

The proposed methodology involves three phases of operation (Phase A, Phase B and Phase C). In Phase A, the raw data are pre-processed by removing the high-frequency components, as mentioned in Sect. 3. In Phase B, the feed-forward back-propagation neural network is trained in batch mode using the conventional gradient-based method. Before the start of the training, the weights are initialized randomly, and the training is continued until the training error is minimized and no further improvement is possible. The final weights obtained for the network are then recorded. This phase is repeated for a sufficient number of trials, and the final weights obtained in each of the trials are recorded. In Phase C, the final weights obtained from the trials form the initial population for the ANN-based training using PSO. The population size is fixed by the number of trials carried out in Phase B. Since it involves four techniques used in three phases of operation, the model can be referred to as wavelet-based ANN–ANN-PSO. The PSO algorithm improves upon the training of weights from the point where the conventional gradient-based training stagnates. In this way, the proposed wavelet-based ANN–ANN-PSO model steers the PSO-based training in the right direction. The block diagram in Fig. 3 gives an outline of the proposed model.

Fig. 3 Block diagram of the proposed wavelet-based ANN–ANN-PSO model

4.1 Batch mode training algorithm for FFBPNN

The generalized architecture of FFBPNN (Phase B) is given in Fig. 4.

Fig. 4 Architecture of feed-forward back-propagation neural network

Step 1 Set epoch ep = 1.

Step 2 Initialize the weights (\( V_{n \times h} \), \( W_{h \times y} \)) to small random values between 0 and 1 to ensure that the network is not saturated by large weights. Let I and T be the normalized input and target training vectors from a set of P training patterns.

Step 3 Since training is in batch mode, present the complete input matrix [I] of the training data to the input layer, together with the corresponding target matrix [T].

\( [I]_{P \times M} \): input training set

\( I_{p} = (i_{1} , \ldots ,i_{n} , \ldots ,i_{M} ) \): pth input training vector

\( [T]_{P \times O} \): output target set

\( T_{p} = (t_{1} , \ldots ,t_{y} , \ldots ,t_{O} ) \): pth target training vector

Step 4 Compute the inputs to the hidden layer by multiplying corresponding weights as in (2)

$$ [{\text{sum}}]_{H \times P} = B_{H \times P} + [V]_{M \times H}^{T} \times [I]_{M \times P} $$
(2)

Step 5 Evaluate the hidden layer units’ output using the sigmoidal function in (3)

$$ f({\text{sum}}_{H \times P} ) = \frac{1}{{1 + \exp ( - {\text{sum}}_{H \times P} )}}. $$
(3)

Step 6 Compute the inputs to the output layer by multiplying corresponding weights of synapses as in (4)

$$ [{\text{sum}}]_{O \times P} = B_{O \times P} + [W]_{H \times O}^{T} \times f({\text{sum}}_{H \times P} ) $$
(4)

Step 7 Let the output layer units evaluate the output using sigmoidal function as in (5)

$$ [K]_{O \times P} = f({\text{sum}}_{O \times P} ) = \frac{1}{{1 + \exp ( - {\text{sum}}_{O \times P} )}} $$
(5)

Step 8 Calculate the squared error \( [\varepsilon ]_{O \times P} \), the difference between the network output \( [K]_{O \times P} \) and the desired target \( [T]_{O \times P} \), for all the training pairs as in (6) and then the average mean squared error (AMSE) as in (7), which is calculated for every epoch. Update ep = ep + 1.

$$ [\varepsilon ]_{O \times P} = \left[ {[T]_{O \times P} - [K]_{O \times P} } \right]^{2} $$
(6)
$$ {\text{AMSE}} = \frac{{\sum\nolimits_{p = 1}^{P} {\sum\nolimits_{y = 1}^{O} {[\varepsilon ]_{y,p} } } }}{P \times O} $$
(7)

Step 9 Calculate the update of the weights (\( W_{h \times y} \)) between the hidden layer and the output layer.

$$ \left[ d \right]_{O \times P} = 2 \times s \times \left[ {[T]_{O \times P} - [K]_{O \times P} } \right] \times [K]_{O \times P} \times \left[ {1 - [K]_{O \times P} } \right] $$
(8)
$$ [Y]_{H \times O} = f({\text{sum}}_{H \times P} ) \times \left[ d \right]_{O \times P}^{T} $$
(9)
$$ [YB]_{O \times P} = \left[ d \right]_{O \times P}^{{}} \times [1]_{P \times P} $$
(10)
$$ [\Delta W]_{H \times O}^{ep + 1} = \mu \times [Y]_{H \times O} + \alpha \times [Y]_{H \times O} $$
(11)
$$ [\Delta BW]_{O \times P}^{ep + 1} = \mu \times [YB]_{O \times P} + \alpha \times [YB]_{O \times P} $$
(12)
$$ [W]_{H \times O}^{ep + 1} = \left[ W \right]_{H \times O} \,+\, [\Delta W]_{H \times O}^{ep + 1} $$
(13)
$$ [BW]_{O \times P}^{ep + 1} = \left[ {BW} \right]_{O \times P} \,+\, [\Delta BW]_{O \times P}^{ep + 1} $$
(14)

Step 10 Calculate the update of the weights (\( V_{n \times h} \)) between the input layer and the hidden layer.

$$ [e]_{H \times P} = \left[ W \right]_{H \times O}\, \times \,\left[ d \right]_{O \times P} $$
(15)
$$ \left[ {dx} \right]_{H \times P}\,= [e]_{H \times P} \times f({\text{sum}}_{H \times P} ) \times [1 - f({\text{sum}}_{H \times P} )] $$
(16)
$$ [X]_{M \times H} = [I]_{M \times P} \times \left[ {dx} \right]_{H \times P}^{T} $$
(17)
$$ [XB]_{H \times P} = \left[ {dx} \right]_{H \times P} \times [1]_{P \times P} $$
(18)
$$ [\Delta V]_{M \times H}^{ep + 1} = \mu \times [X]_{M \times H} + \alpha \times [X]_{M \times H} $$
(19)
$$ [\Delta BV]_{H \times P}^{ep + 1} = \mu \times [XB]_{H \times P} + \alpha \times [XB]_{H \times P} $$
(20)
$$ [V]_{M \times H}^{ep + 1} = \left[ V \right]_{M \times H} + [\Delta V]_{M \times H}^{ep + 1} $$
(21)
$$ [BV]_{H \times P}^{ep + 1} = \left[ {BV} \right]_{H \times P} \,+\, [\Delta BV]_{H \times P}^{ep + 1} $$
(22)
$$ [V]^{t + 1} = [V]^{t} + [\Delta V]^{t + 1} $$
(23)
$$ [W]^{t + 1} = [W]^{t} + [\Delta W]^{t + 1} $$
(24)

Step 11 Repeat steps 3 to 10 while ep < TE (total number of epochs). Stop the training when ep reaches TE, when AMSE has reached a desired minimum value, or when the validation error keeps increasing such that the number of validation checks is greater than the validation count (VC).
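The batch-mode steps above can be sketched in plain Python as follows. This is a simplified illustration and not the authors' MATLAB implementation. It makes several assumptions that are not in the original: one bias per neuron, slope s = 1, no momentum term, gradients averaged over the P patterns, a learning rate `mu` of 0.5, a fixed number of epochs as the only stopping rule, and a tiny toy dataset. The output-layer delta uses the sigmoid derivative K(1 − K).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(I, V, BV, W, BW):
    """Steps 4-7: hidden and output activations for every training pattern."""
    hidden = [[sigmoid(BV[h] + sum(V[m][h] * x[m] for m in range(len(x))))
               for h in range(len(BV))] for x in I]
    output = [[sigmoid(BW[o] + sum(W[h][o] * hid[h] for h in range(len(hid))))
               for o in range(len(BW))] for hid in hidden]
    return hidden, output


def amse(T, K):
    """Step 8: squared error averaged over all patterns and output nodes."""
    P, O = len(T), len(T[0])
    return sum((T[p][o] - K[p][o]) ** 2 for p in range(P) for o in range(O)) / (P * O)


def train_batch(I, T, n_hidden, epochs=500, mu=0.5, seed=0):
    rng = random.Random(seed)
    P, M, O, H = len(I), len(I[0]), len(T[0]), n_hidden
    # Step 2: random initial weights and biases in [0, 1).
    V = [[rng.random() for _ in range(H)] for _ in range(M)]
    W = [[rng.random() for _ in range(O)] for _ in range(H)]
    BV = [rng.random() for _ in range(H)]
    BW = [rng.random() for _ in range(O)]
    history = []
    for _ in range(epochs):
        hid, K = forward(I, V, BV, W, BW)
        history.append(amse(T, K))
        # Output-layer delta: error times sigmoid derivative K(1 - K).
        d = [[2 * (T[p][o] - K[p][o]) * K[p][o] * (1 - K[p][o])
              for o in range(O)] for p in range(P)]
        # Hidden-layer delta: back-propagated error times f(1 - f).
        dx = [[sum(W[h][o] * d[p][o] for o in range(O)) * hid[p][h] * (1 - hid[p][h])
               for h in range(H)] for p in range(P)]
        # Batch mode: accumulate over all patterns, then update once per epoch.
        for h in range(H):
            for o in range(O):
                W[h][o] += mu * sum(hid[p][h] * d[p][o] for p in range(P)) / P
        for o in range(O):
            BW[o] += mu * sum(d[p][o] for p in range(P)) / P
        for m in range(M):
            for h in range(H):
                V[m][h] += mu * sum(I[p][m] * dx[p][h] for p in range(P)) / P
        for h in range(H):
            BV[h] += mu * sum(dx[p][h] for p in range(P)) / P
    return (V, BV, W, BW), history


# Toy normalized patterns (placeholders, not IEX data).
I_toy = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
T_toy = [[0.9], [0.1], [0.5]]
weights, history = train_batch(I_toy, T_toy, n_hidden=2)
```

In the proposed scheme, the tuple of final weights returned by each trial would be recorded and later passed to Phase C as one particle of the initial PSO population.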

4.2 Step-by-step algorithm of ANN–ANN-PSO architecture

The iterative approach of PSO for ANN training (Phase C) can be described by the following:

Step 1 Initialize the population size, which is equal to the number of trials performed in Phase B. Retrieve the weights and biases recorded in Phase B. Also, initialize the positions and velocities of the agents

Step 2 Set the current best fitness achieved by particle p as pbest. Set the best of the pbest values as gbest, and store the value

Step 3 Evaluate the desired optimization fitness function \( f_{p} \) for each particle as the AMSE over a given data set

Step 4 Compare the evaluated fitness value \( f_{p} \) of each particle with its pbest value. If \( f_{p} < \) pbest, then pbest = \( f_{p} \) and \( {\text{pbest}}_{xp} = x_{p} \), where \( x_{p} \) is the current coordinates of particle p, and \( {\text{pbest}}_{xp} \) is the coordinates corresponding to the best fitness of particle p so far

Step 5 Calculate the objective function value for the new position of each particle. If a better position is achieved by an agent, its pbest value is replaced by the current value. As in step 2, the gbest value is selected among the pbest values. If the new gbest value is better than the previous gbest value, it replaces the previous value and is stored. That is, if \( f_{p} < \) gbest, then gbest = p, where gbest is the particle having the overall best fitness over all particles in the swarm

Step 6 Change the velocity and location of the particle according to Eqs. (25) and (26), respectively [13, 14]

$$ V_{i} = wV_{i - 1} + {\text{acc}} \times {\text{rand}}() \times ({\text{best}}_{xp} - x_{p} ) + {\text{acc}} \times {\text{rand}}() \times \left( {{\text{best}}_{\text{xgbest}} - x_{p} } \right) $$
(25)

where w is the inertia weight, acc is the acceleration constant that controls how far particles \( p \) move from one another, rand() returns a uniform random number between 0 and 1, \( V_{i} \) is the current velocity and \( V_{i - 1} \) is the previous velocity,

$$ x_{p} = x_{pp} + V_{i} $$
(26)

\( x_{p} \) is the present location of the particle, \( x_{pp} \) is the previous location of the particle, and i is the iteration index. Here, the coordinates \( {\text{best}}_{xp} \) and \( {\text{best}}_{\text{xgbest}} \) are used to pull the particles towards the global minimum

Step 7 Fly each particle p according to Eq. (26)

Step 8 If the predetermined maximum number of iterations is exceeded, then stop; otherwise, go to step 3 and repeat until convergence
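The PSO steps above can be sketched in Python as follows. This is a generic illustration, not the authors' implementation: the inertia weight `w = 0.7`, the acceleration constant `acc = 1.5`, the zero initial velocities and the `sphere` test function are assumptions chosen for the example. In the proposed scheme, each initial position would be the flattened weight vector recorded from one Phase B trial, and `fitness` would return the AMSE of the network with those weights.

```python
import random


def pso_minimize(fitness, initial_positions, iterations=100, w=0.7, acc=1.5, seed=0):
    """Minimize `fitness` starting from the supplied particle positions."""
    rng = random.Random(seed)
    x = [list(p) for p in initial_positions]  # current positions
    v = [[0.0] * len(p) for p in x]           # velocities, assumed zero at start
    pbest_x = [list(p) for p in x]            # best position found by each particle
    pbest = [fitness(p) for p in x]           # best fitness found by each particle
    g = min(range(len(x)), key=lambda k: pbest[k])
    gbest, gbest_x = pbest[g], list(pbest_x[g])
    for _ in range(iterations):
        for p in range(len(x)):
            for j in range(len(x[p])):
                # Eq. (25): inertia + pull towards pbest + pull towards gbest.
                v[p][j] = (w * v[p][j]
                           + acc * rng.random() * (pbest_x[p][j] - x[p][j])
                           + acc * rng.random() * (gbest_x[j] - x[p][j]))
                # Eq. (26): move the particle.
                x[p][j] += v[p][j]
            f = fitness(x[p])
            if f < pbest[p]:                  # Step 4: update personal best
                pbest[p], pbest_x[p] = f, list(x[p])
                if f < gbest:                 # Step 5: update global best
                    gbest, gbest_x = f, list(x[p])
    return gbest_x, gbest


def sphere(pos):
    """Hypothetical stand-in for the AMSE fitness: sum of squares."""
    return sum(c * c for c in pos)


starts = [[2.0, -1.0], [-1.5, 0.5], [0.5, 2.5], [1.0, 1.0]]
best_x, best_f = pso_minimize(sphere, starts)
```

Because gbest is only replaced by a strictly better value, the returned fitness is never worse than the best of the initial particles, which is the property that motivates seeding the swarm with trained weights.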

The flow chart for the proposed wavelet-based ANN–ANN-PSO model is given in Fig. 5.

Fig. 5 Flow chart for the proposed wavelet-based ANN–ANN-PSO model

5 Performance evaluation

The experimental results in this case study are evaluated based on three error indices: mean absolute percentage error (MAPE), normalized mean square error (NMSE) and error variance (EV). The accuracy of the forecasted results is evaluated by MAPE, which is defined by Eq. (27).

$$ {\text{MAPE}} = \frac{1}{\text{NH}}\sum\limits_{t = 1}^{\text{NH}} {\left| {\frac{{P_{{{\text{For}}\left( t \right)}} - A_{{{\text{Ac}}\left( t \right)}} }}{{A_{{{\text{Ac}}\left( t \right)}} }}} \right|} $$
(27)

where \( P_{{\text{For}}(t)} \) and \( A_{{\text{Ac}}(t)} \) are the forecasted and actual data at time t, respectively, and NH is the total number of predictions.
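As a small illustration of Eq. (27), the Python sketch below computes MAPE as a fraction (multiply by 100 to express it in per cent). The function name `mape` and the sample values are assumptions for the example.

```python
def mape(forecast, actual):
    """Mean absolute percentage error of Eq. (27), returned as a fraction."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    return sum(abs((f - a) / a) for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical hourly prices: each forecast is off by 10 % of the actual value.
example_mape = mape([110.0, 90.0], [100.0, 100.0])
```

Because each term is divided by the actual price, hours with very low actual MCP contribute disproportionately to the index.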

The normalized mean square error (NMSE) is an estimator of the overall deviations between predicted and measured values. In the NMSE, the squared deviations are summed instead of the signed differences. For this reason, the NMSE generally shows the most striking differences among models. If a model has a very low NMSE, then it performs well both in space and in time. On the other hand, high NMSE values do not necessarily mean that a model is completely wrong. NMSE [5] is defined by Eq. (28)

$$ {\text{NMSE}} = \left[ {\frac{1}{{\Delta^{2} {\text{NH}}}}\sum\limits_{t = 1}^{\text{NH}} {\left( {P_{{{\text{For}}(t)}} - A_{{{\text{Ac}}(t)}} } \right)^{2} } } \right] $$
(28)

where

$$ \Delta^{2} = \frac{1}{{{\text{NH}} - 1}}\sum\limits_{t = 1}^{\text{NH}} {\left( {A_{{{\text{Ac}}(t)}} - A_{\text{Ave}} } \right)^{2} } $$
(29)

where \( A_{\text{Ave}} \) is the average of the actual data.

An index of the uncertainty of a model is the variability of what is still unexplained after fitting the model, which can be measured through the estimation of the variance of the error term. The smaller the variance, the more precise the prediction of prices [8]. EV [5] is defined by the following Eq. (30)

$$ \sigma^{2} = \frac{1}{\text{NH}}\sum\limits_{t = 1}^{\text{NH}} {\left( {\left| {\frac{{P_{{{\text{For}}\left( t \right)}} - A_{{{\text{Ac}}\left( t \right)}} }}{{A_{{{\text{Ac}}\left( t \right)}} }}} \right| - {\text{MAPE}}} \right)}^{2} $$
(30)
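The two remaining indices can be sketched in Python as follows. The sketch assumes that the normalizing term in Eq. (28) is the sample variance of the actual data (divisor NH − 1), and that EV is the variance of the absolute percentage errors around MAPE as in Eq. (30). The function names and sample values are illustrative assumptions.

```python
def nmse(forecast, actual):
    """Normalized mean square error: squared error scaled by the sample
    variance of the actual data (assumed reading of Eqs. 28 and 29)."""
    nh = len(actual)
    mean_actual = sum(actual) / nh
    variance = sum((a - mean_actual) ** 2 for a in actual) / (nh - 1)
    squared_error = sum((f - a) ** 2 for f, a in zip(forecast, actual))
    return squared_error / (variance * nh)


def error_variance(forecast, actual):
    """Variance of the absolute percentage errors around their mean (Eq. 30)."""
    ape = [abs((f - a) / a) for f, a in zip(forecast, actual)]
    mean_ape = sum(ape) / len(ape)  # this mean is the MAPE of Eq. (27)
    return sum((e - mean_ape) ** 2 for e in ape) / len(ape)


# Hypothetical values used only to exercise the two functions.
actual_prices = [100.0, 200.0, 300.0]
forecast_prices = [110.0, 190.0, 300.0]
example_nmse = nmse(forecast_prices, actual_prices)
example_ev = error_variance(forecast_prices, actual_prices)
```

A perfect forecast gives zero for both indices, while an NMSE near 1 means the model is roughly as informative as always predicting the mean of the actual data.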

6 Optimal ANN architecture and training

In order to obtain an accurate forecast, the selection of a suitable ANN architecture with an optimal size of historical training data becomes important. During training, weights are updated simultaneously only when an epoch is completed. The performance evaluation is accomplished by forecasting the MCP for a particular week using the day-ahead approach. The same week is forecasted with various combinations of hidden neurons and training size. Tables 2, 3 and 4 furnish the statistical evaluation of the performance for selecting the best ANN architecture and training size. The number of trials and the number of epochs performed for a typical ANN architecture are fixed at 25 and 1000, respectively, in order to have a reference platform for comparison. With 1 week of training data, Fig. 6 presents the best forecasted MCP for a week. The italicized values in Tables 2, 3 and 4 indicate the optimal architecture of the neural network, with 1, 3 and 1 hidden neurons, respectively, selected with respect to the average MAPE obtained from all 25 trials.

Table 2 Statistical performance measures for 1-week training data
Table 3 Statistical performance measures for 2-week training data
Fig. 6 Forecasted market clearing price

If the average MAPE is high, the performance is not satisfactory. Therefore, the best architecture is the one with the minimal average MAPE. It is inferred from Tables 2, 3 and 4 that as the size of the training data (i.e. the number of weeks) is increased, the performance does not improve. Therefore, it is best to use only 1 week of training data.

Table 4 Statistical performance measures for 3-week training data

6.1 Implementation

Once the optimal architecture for ANN is finalized, the performance analysis part is carried out in two parts. In the first part, the raw data are taken as such, whereas in the second part, the raw data are filtered for high-frequency components using WT technique. The resolution of the signal, which is a measure of the amount of detailed information, is determined by filtering operations. Discrete WT decomposes a time domain signal into approximations and details by successive low-pass and high-pass filtering. Here, Daubechies wavelet transform is used. This wavelet offers an appropriate trade-off between wavelength and smoothness, resulting in an appropriate behaviour for MCP forecasting. A non-decimated wavelet function of type Daubechies of order 4 (abbreviated as db4) and decomposition level 4 is used as the mother wavelet.

Embedded ANN-PSO simulation in Phase C is carried out using the weights obtained from all the trials from Phase B. These weights are the initial population for ANN-PSO training. The architecture of the ANN-PSO is the same as that of Phase B ANN architecture. Table 5 presents the controlling parameters for the PSO algorithm. The parameter setting for the PSO algorithm is fixed based on trial and error approach. Here, the convergence of the error plot is not as smooth as that of ANN Phase B training. The reason is the stochastic nature of the particle search in finding the optimal weights for minimizing the MAPE while training and also satisfying the validation check. A comparison is also made with Phase C where only ANN-PSO training is carried out with the random initialization of weights for ANN training using PSO.

Table 5 PSO parameter settings

7 Results and discussion

All the simulations are carried out in MATLAB R2011b. The hardware details are as follows: Intel® Core i5-3210M CPU @ 2.50 GHz, 4.00 GB RAM, x64-based processor, running 64-bit Windows 8.1. Various samples are taken in order to validate the performance of the proposed model. All training runs are carried out for 1000 epochs and 100 trial runs. The following five types of ANN-based forecast models are used for the performance study.

(a) ANN
(b) ANN–ANN-PSO
(c) Wavelet-based ANN
(d) Wavelet-based ANN–ANN-PSO
(e) Wavelet-based ANN-PSO (random initialization)

The same data used in finding the optimal architecture of the ANN are again used to compare all five models (a–e). Here, the source and the target for the training data are taken from 27 September to 3 October 2014 and from 28 September to 4 October 2014, respectively. The source and the target for the validation data are the same as those of the training data, and the validation error is evaluated based on MAPE. The verification is carried out by giving a test input from 4 to 10 October 2014, and the forecasted output is compared with the actual MCP from 5 to 11 October 2014, i.e. for a period of 1 week using day-ahead forecasting. The forecasting accuracy is evaluated using the performance indices explained in Sect. 5. Table 6 summarizes the results of all the ANN-based models. Figures 7 and 8 show the convergence plot and the validation plot, respectively, of the best trial for the ANN-based forecast models b, d and e. The wavelet-based ANN–ANN-PSO training (model d) is found to converge better than models b and e. The convergence of ANN-PSO training with random initialization of weights is found to be unsatisfactory, because random initialization does not orient the weights, i.e. the particles, towards the near-global optimum from the start. In contrast, the final weights recorded in Phase B from all trials of the batch mode ANN training initialize the particles of the PSO search so that the weights are oriented towards the near-global optimum. Thereby, fine tuning of the weight updates gives a better trained model without getting trapped in local minima. Hence, the performance of the ANN–ANN-PSO model is better than that of the embedded ANN-PSO model. The same behaviour is also reflected in the validation during training. The validation error is evaluated in terms of MAPE, and the WT-based ANN–ANN-PSO model is able to give a lower validation error of 9.25 % for the best trial.

Table 6 Statistical result
Fig. 7 Error convergence plot for the ANN–ANN-PSO-based models (b, d and e)

Fig. 8 Validation plot for the ANN–ANN-PSO-based models (b, d and e)

Figures 9 and 10 show the forecasted MCP with respect to the actual MCP for the IEX from 5 to 11 October 2014 for the ANN models without wavelet pre-processing (i.e. a and b) and with wavelet pre-processing (i.e. c and d), respectively. However, the ANN-PSO model with random initialization (see Fig. 11) is found to perform poorly, with an average MAPE of around 70 %. The total time (Phase A + Phase B) taken for training of the WT-based ANN (no. of trials = 100; no. of epochs = 1000) is 949 s. The average time (Phase C) taken for training of the WT-based ANN-PSO (no. of trials = 1) is 650 s. Therefore, the total approximate execution time for the proposed model is 1599 s. In order to validate the proposed forecast model, some more samples are considered, and the statistical results are tabulated in the “Appendix” as Tables 7, 8 and 9. Figures 12, 13 and 14 give the forecasted market clearing price for these samples. The resultant forecasted MCP validates the accuracy of the proposed WT-based ANN–ANN-PSO hybrid model.

Fig. 9 Forecasted MCP versus actual MCP for the IEX, without wavelet transform

Fig. 10 Forecasted MCP versus actual MCP for the IEX, with wavelet transform

Fig. 11 Forecasted MCP versus actual MCP for the IEX, with wavelet transform (random)

Fig. 12 Forecasted MCP versus actual MCP for the IEX

Fig. 13 Forecasted MCP versus actual MCP for the IEX

Fig. 14 Forecasted MCP versus actual MCP for the IEX

8 Conclusion

Pre-processing with the wavelet transform, which smooths the raw data by removing higher-frequency components, helps the neural network to train better. The weights obtained from the various trials of ANN training in Phase B give a better start for the PSO search of weights for ANN training in Phase C. The method is simple, and it improves forecasting accuracy. Therefore, the proposed novel sequential wavelet-ANN with embedded ANN-PSO hybrid model can be used in the Indian Energy Exchange for a better estimate of the market clearing price.

9 Appendix

See Tables 7, 8 and 9 and Figs. 12, 13 and 14.

Table 7 Statistical result
Table 8 Statistical result
Table 9 Statistical result