
1 Introduction

Interconnected power systems have developed considerably in terms of reliability and stability. Studies indicate that worldwide population growth drives a steady increase in electricity consumption [1]. Meanwhile, the uncertain nature of electrical demand and of renewable energy sources such as solar and wind imposes limitations on the dynamic stability of large electricity grids [2]. Hence, forecasting of electrical demand becomes ever more critical for assisting power system operators in grid management, whether for short-term analysis or for long-term applications such as economic emission dispatch, unit commitment, and optimal scheduling [3,4,5].

Recently, many scholars have proposed different short- and long-term load forecasting algorithms. In this context, the authors of [6] simulated an aggregated state prediction for the electrical consumption of interconnected power networks with 1% error over 700 h. In [7], a combination of a genetic algorithm and a neural network achieved 98.95% accuracy by expanding a feed-backward neural network for forecasting heterogeneous demand time series over very-short and short time intervals. The application of the support vector machine (SVM) is presented in [8] for one-hour-ahead demand forecasting; this two-phase technique, consisting of an artificial neural network (ANN) and an SVM, demonstrated its speed and accuracy in a detailed experiment on real historical data from 4 July 2012. Son et al. [9] evaluated the application of support vector regression (SVR), fuzzy logic, and particle swarm optimization (PSO), with a mean demand scaling of 149.28754 kW, for short-term electrical demand forecasting. Guo et al. [10] introduced a self-learning algorithm for load forecasting that benefits from economic factors; inserting economic elements into the search process reduces the computational error. Regarding the implementation of automation systems in residential consumption as a demand response strategy, it is found that modified SVR-based algorithms can handle the intermittent nature of internal loads (i.e. cooling, heating, and ventilation); thereby, multi-phase prediction can practically realize demand response strategies in such programs. In addition, Le Cam et al. [11] forecast the total electricity cost of the automation system in a benchmark building by providing a multi-stage prediction model, for which an optimum absolute error of 14.2–22.5% was observed. Li et al. [12] combined a wavelet decomposition technique with an ANN to diminish the negative impact of volatile load data; the resulting advanced intelligent algorithm reached a mean absolute percentage error (MAPE) of 2.4%. The effectiveness of ANNs in forecasting Poland's natural gas consumption is described in [13]; the approach was investigated on a historical time series for the city of Szczecin with 22 inputs, 36 hidden-layer neurons, one output, and 8% MAPE. The authors of [14] used ANNs for hour-ahead prediction of global solar radiation with 7.86% RMSE by filtering the low-frequency components of the input data set. Reference [15] runs two different learning rules on ANNs and observes that integrating back propagation (BP) and the extreme learning method (ELM) to decode the fluctuation of wind speed prediction leads to RMSE values of 1.33% and 1.1965%, respectively. From a similar standpoint, PSO and a genetic algorithm are employed in [16] for the optimal selection of the weight vectors of an ANN in a solar irradiance estimation system; the combination of BP-ANN and PSO yields an RMSE of 0.78 and a mean absolute error (MAE) of 0.685. The authors of [17] applied proper orthogonal decomposition (POD) to ANNs for wind and demand forecasting of high-altitude towers, finding that such a complex algorithm can reach an RMSE of 4% and a mean error of 0.98%, respectively. As mentioned, ANNs support both regression-based and computational methods across various prediction scales. Ramasamy et al. [18] formed a wind power forecasting scheme based on ANNs through a speed estimation experiment in the western Himalayas; to prove its robustness, the output series were compared with real historical data considering environmental and geographical factors such as temperature, air pressure, latitude, and longitude, and the method's resilience to the time-variant nature of the ANN's input parameters was reflected in a MAPE of 6.489%. Yadav and Chandel [19] identified the relevant input variables for predicting 1-min time-step photovoltaic module power using ANNs and multiple linear regression models, with 2.15–2.55% MAPE. da Silva et al. [20] reached the important conclusion that training ANNs with Bayesian Regularization (BR) and Levenberg Marquardt (LM) gives near real-time results compared with other methods for solar power estimation, with MAPE and RMSE equal to 0.02% and 0.11% (BR) and 0.31% and 0.74% (LM), respectively.

This chapter presents a dynamic feed-forward back-propagation ANN-based method for long-term forecasting of electrical demand. The high compatibility and accuracy of the proposed algorithm are demonstrated through a comparison between the forecasted and actual electrical demands of the low-voltage grid of Ontario, Canada, operated by the Independent Electricity System Operator (IESO). The remainder of this chapter is organized as follows: Sect. 2 presents the problem formulation, Sect. 3 provides the simulation results and discussion, and Sect. 4 offers concluding remarks.

2 Problem Formulation

2.1 Artificial Neural Network (ANN)

The artificial neural network (ANN) is a powerful and versatile tool for engineering applications such as fitting, pattern recognition, clustering, and prediction. By applying a weight matrix and bias vectors to the input vectors through simple mathematical operations, the network produces the desired output, which in this case is the demand consumption, as shown in Fig. 1. The aim is to train the ANN to obtain the desired output while updating the error vectors at each step; after several iterations, the optimal weight values for the input vectors are recovered. The network uses a supervised learning rule that benefits from three dynamic training techniques, discussed in the next section. During learning, the weight matrix is updated to minimize the error while maintaining an acceptable output dependency. The ANN is initialized with the load and time series as inputs; the input variables are multiplied by the weight matrix and then added to the bias vectors. In the next step, a specific mathematical (activation) function is applied to carry out the calculation, as in Fig. 2.

Fig. 1
figure 1

The block diagram of proposed algorithm

Fig. 2
figure 2

The operation of the proposed algorithm

where \(x_{i}\), \(w_{i}\), and \(b_{i}\) are the input vectors, weight matrices, and bias vectors, respectively, and \(y_{i}\) is the output of the neural network, as depicted in Eq. (1).

$$y_{i} = F(x_{i} \times w_{i} + b_{i} )$$
(1)

In order to satisfy the convergence condition, the algorithm is constructed on a supervised learning rule. In supervised learning, at each instant k an input \(x(k)\) is applied to the network, the desired network response \(\hat{Y}({\text{k}})\) is given, and the pairs \((x(k),\hat{Y}({\text{k}}))\) belong to a pre-selected learning set. The pairs \(x(i)\) and \(\hat{Y}(i),i = 1 \ldots ,{\text{N}}\) (where N is the number of neurons) are used in the supervised learning rule when \(\hat{Y}({\text{k}}) = \hat{Y}(i)\) and \(x(k) = x(i)\). The desired network is a Multi-Layer Perceptron (MLP), which operates on a group of vectors: the input, the output (validation), and the network response \((Y({\text{k}}))\). The MLP is the computational unit of the ANN architecture and consists of an input layer, a hidden layer, and one output layer. Once the inputs are combined, the calculation process begins, as in Fig. 3.

Fig. 3
figure 3

Different layers of MLP
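As a concrete illustration of Eq. (1) and the layer structure of Fig. 3, the following minimal NumPy sketch performs one forward pass through an MLP. The layer sizes (24 lagged demand inputs, 24 hidden neurons, one output) are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass of the MLP in Fig. 3, i.e. y = F(x*w + b) as in Eq. (1)."""
    hidden = np.tanh(x @ w_hidden + b_hidden)   # hidden layer: weighted sum, bias, tan-sigmoid
    return hidden @ w_out + b_out               # output layer: pure linear combination

# Illustrative dimensions: 24 lagged demand values in, 24 hidden neurons, 1 forecast out.
rng = np.random.default_rng(0)
x = rng.random((1, 24))                         # one input pattern of past demand
w_h, b_h = rng.standard_normal((24, 24)), np.zeros(24)
w_o, b_o = rng.standard_normal((24, 1)), np.zeros(1)
print(forward(x, w_h, b_h, w_o, b_o))           # the network's demand estimate
```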

2.2 Dynamic Artificial Neural Network (DANN)

The dynamic artificial neural network is a computational system in which the continuous interaction between its elements and the training processes improves the prediction capability. As depicted in Fig. 3, the conventional neural network consists of one input layer, one hidden layer, and one output layer, and the relations among the parameters, the layers, and the training steps serve the prediction purpose. Given the nonlinear nature of the input data set, using effective learning rules, which are defined in the following sections, makes the propagation procedure more dependable than a single learning network. Moreover, the computational burden must be considered when the number of samples is large, since it affects convergence; from this viewpoint, the overall conformity of the forecasting algorithm leads to a better understanding of the estimated output.

As is clear from Eq. (1), the weight matrix and bias vectors are the stimulation parameters of the ANN that generalize the diversity of the input vectors; they connect the intermittent inputs, which are the electric demand in this case, to the correcting steps of the prediction. The activation function F is the operator of the network and is involved in both the correcting and training processes; in other words, F delivers the prediction once the gradient process terminates. The connectors, namely the neurons, transfer the optimized weight and bias values to the output layer in a specific order.

The dynamic neural network is defined as the combination of three regression-based learning methods: Levenberg Marquardt (LM), Bayesian Regularization (BR), and Scaled Conjugate Gradient (SCG). First, the learning operator initiates with LM to train the training portion of the input data set (70%) and to allocate the primary weight and bias vectors. After this generalization, the trained output is passed to the second propagation network (BR) to be normalized and to filter the white noise of the set with respect to the error performance (MSE) of the first neural network, again with a 70% training ratio. The final step conjugates the performance of the two aforementioned learning techniques and scales the search for the optimal weight and bias vectors in the hypothesis space; in other words, the scaled conjugate gradient finds the specific vectors that minimize the error (MSE). This algorithm uses the current weight values at each stage and changes them so that the error curve continues to descend. The flowchart of the proposed algorithm for load forecasting is depicted in Fig. 4.

Fig. 4
figure 4

The proposed algorithm
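A minimal sketch of the staged hand-off described above is given below. Only the orchestration, namely the 70% training split and the sequential reuse of each stage's weights, reflects the text; the stage routine is a hypothetical gradient-descent stand-in for the LM, BR, and SCG updates that are detailed in the following sections.

```python
import numpy as np

def train_stage(w, x_tr, y_tr, lr=1e-3, epochs=50):
    """Placeholder for one training stage (LM, BR, or SCG).
    A plain gradient-descent step on the MSE of a linear surrogate model
    is used here purely as a stand-in for the real update rules."""
    for _ in range(epochs):
        grad = x_tr.T @ (x_tr @ w - y_tr) / len(y_tr)
        w = w - lr * grad
    return w

def dynamic_ann(x, y):
    """Sequential LM -> BR -> SCG hand-off on the 70% training portion."""
    n_train = int(0.7 * len(x))                 # 70% training ratio from the text
    x_tr, y_tr = x[:n_train], y[:n_train]
    w = np.zeros(x.shape[1])
    for stage in ("LM", "BR", "SCG"):           # each stage refines the previous weights
        w = train_stage(w, x_tr, y_tr)
    return w
```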

2.3 Back Propagation Technique (BP)

Back propagation is a learning and adjustment method that conveys several partial derivatives through the basic parameters of the neural network. In this method, the objective is to minimize the mean square error (MSE) between the network output and the desired electrical demand output using the dynamic algorithm. The hypothesis search space is large, covering all possible values of the weights, and the ANN tries to optimize the error to reach reasonable states; however, there is no guarantee that it will reach the absolute minimum. Therefore, the training algorithm (DANN) is used to find the weights that minimize the MSE. Broadly, the mechanism of BP is based on a tan-sigmoid function for the hidden layer and a pure linear function for the output layer. In this context, Eqs. (2) and (3) describe the hidden layer structure, while the predicted output of the network is obtained through Eqs. (4) and (5). The algorithm is trained in three steps:

  1. Forward the input data

  2. Compute and propagate the error backward

  3. Update the weights

$$x_{j} = \sum\limits_{i = t - 1}^{t - n} {\sum\limits_{j = 1}^{h} {\omega_{ij} \times y_{i} + b_{j} } }$$
(2)
$$y_{j} = \frac{1}{{1 + \exp ( - x_{j} )}}\quad j = 1,2, \ldots ,h$$
(3)
$$x_{t} = \sum\limits_{j = 1}^{h} {\omega_{jt} \times y_{j} } + a_{t} \quad t = 1, \ldots ,T$$
(4)
$$y_{t} = x_{t} \quad t = 1, \ldots ,T$$
(5)

where,

\(x_{j} ,y_{j}\):

Input and output of the jth node of the hidden layer

\(\omega_{ij}\):

Weight between ith input layer neuron and jth hidden layer neuron

\(b_{j} ,a_{t}\):

Biases of the hidden and output layer nodes, which lie within the range of [−1, 1]

\(n,h,T\):

Number of input, hidden, and output layer nodes

\(x_{t} ,y_{t}\):

Input and output values of the output layer at time horizon t

\(\omega_{jt}\):

Connection weight between the jth hidden-layer neuron and the tth output-layer neuron.
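Using the notation just defined, the following sketch evaluates Eqs. (2)–(5) for one forecasting step: a sigmoid hidden layer fed with the n most recent demand values and a pure linear output layer. All dimensions are illustrative assumptions.

```python
import numpy as np

def network_output(y_lagged, w_ih, b_h, w_ho, a_out):
    """Evaluate Eqs. (2)-(5).
    y_lagged: the n most recent demand values y_{t-1}, ..., y_{t-n}
    w_ih: (n, h) input-to-hidden weights, b_h: (h,) hidden biases
    w_ho: (h, T) hidden-to-output weights, a_out: (T,) output biases."""
    x_hidden = y_lagged @ w_ih + b_h                 # Eq. (2): net input of hidden nodes
    y_hidden = 1.0 / (1.0 + np.exp(-x_hidden))       # Eq. (3): sigmoid hidden responses
    x_out = y_hidden @ w_ho + a_out                  # Eq. (4): net input of output nodes
    return x_out                                     # Eq. (5): linear output y_t = x_t

# Illustrative sizes: n = 24 lags, h = 24 hidden nodes, T = 1 output.
rng = np.random.default_rng(1)
y_lagged = rng.random(24)
out = network_output(y_lagged, rng.standard_normal((24, 24)), np.zeros(24),
                     rng.standard_normal((24, 1)), np.zeros(1))
```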

Training terminates when the mean square error per cycle, or epoch (the total squared error over all learning samples), and the norm of the error gradient fall below predetermined values. BP rests on a gradient-descent search in the weight space, so there is a possibility of being trapped in a local minimum. To avoid this obstacle, stochastic gradient descent with different initial weight values can be used. With this in mind, the weight adjustment in the ith iteration depends on the weights of the previous iteration, and the error to be minimized is given by Eq. (6).

$${\text{MSE}} = \frac{1}{2}\sum\limits_{t = 1}^{T} {({\hat{\text{y}}}_{t} - y_{t} )^{2} }$$
(6)

where \({\hat{\text{y}}}_{t}\) and \(y_{t}\) are the desired (target) output and the predicted output of the neural network, respectively. As a result, trapping in local minima and settling on flat plateaus can be avoided; however, the search speed increases with a gradual increase of the step modification. BP can also reveal undetected features of the input data in the hidden layer of the network. Hence, the adjusting procedure is initiated by Eqs. (7) and (8), which propagate the weights of the hidden and input neurons as follows:

$$\Delta \omega_{jt} \propto - \frac{{\partial {\text{MSE}}}}{{\partial \omega_{jt} }}$$
(7)
$$\begin{aligned} \Delta \omega_{jt} &= - \eta \left( {\frac{{\partial {\text{MSE}}}}{{\partial y_{t} }}} \right)\left( {\frac{{\partial y_{t} }}{{\partial x_{t} }}} \right)\left( {\frac{{\partial x_{t} }}{{\partial \omega_{jt} }}} \right) = \eta ({\hat{\text{y}}}_{t} - y_{t} )\left( {\frac{{\partial \left( {(1 + \exp ( - x_{t} ))^{ - 1} } \right)}}{{\partial x_{t} }}} \right)y_{j} \\ & = \eta ({\hat{\text{y}}}_{t} - y_{t} )y_{t} (1 - y_{t} )y_{j} \\ & {\text{for}}\,j = 1, \ldots ,h,t = 1, \ldots ,T \\ \end{aligned}$$
(8)

In which,

\(\Delta \omega_{jt}\):

Weights of hidden neurons

η:

Learning rate

\(\frac{{\partial {\text{MSE}}}}{{\partial y_{t} }}\):

Derivative of the error with respect to the activation

\(\frac{{\partial y_{t} }}{{\partial x_{t} }}\):

Derivative of the activation with respect to the net input

\(\frac{{\partial x_{t} }}{{\partial \omega_{jt} }}\):

Derivative of the net input with respect to a weight.

Since the error (MSE) is evaluated at each iteration, continuing the algorithm until the error falls below a certain amount can lead to over-fitting. Over-fitting is caused by weight adjustments that do not conform to the overall data distribution: as the number of iterations increases, the complexity of the hypothesis space learned by the algorithm grows until it fits the noise and rare examples in the training set. The remedy is to introduce a held-out collection called the validation set, to stop learning when the error on this set is small enough, and to steer the network toward simpler hypothesis spaces; the magnitude of the weight change in each iteration can then be reduced. After determining the optimized weight values, the error contribution of all nodes is propagated as follows:

$$\Delta \omega_{ij} \propto - \frac{{\partial {\text{MSE}}}}{{\partial \omega_{ij} }}$$
(9)

Consequently,

$$\begin{aligned} \Delta \omega_{ij} &= - \sum\limits_{t = 1}^{T} {\left[ {\left( {\frac{{\partial {\text{MSE}}}}{{\partial y_{t} }}} \right)\left( {\frac{{\partial y_{t} }}{{\partial x_{t} }}} \right)\left( {\frac{{\partial x_{t} }}{{\partial y_{j} }}} \right)} \right]} \left( {\frac{{\partial y_{j} }}{{\partial x_{j} }}} \right)\left( {\frac{{\partial x_{j} }}{{\partial \omega_{ij} }}} \right) \\ & = \eta \sum\limits_{t = 1}^{T} {\left[ {({\hat{\text{y}}}_{t} - y_{t} )y_{t} (1 - y_{t} )\omega_{jt} } \right]} y_{j} (1 - y_{j} )y_{i} \\ & i = t - n, \ldots ,t - 1\quad j = 1, \ldots ,h \\ \end{aligned}$$
(10)
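A minimal NumPy sketch of one back-propagation update following Eqs. (7)–(10) for a single training pattern is given below; the layer sizes, the learning rate, and the use of a sigmoid output (as the derivative in Eq. (8) assumes) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y_desired, w_in, w_out, eta=0.05):
    """One BP update per Eqs. (7)-(10) for a single pattern.
    x: inputs (n,), y_desired: targets (T,),
    w_in: input-to-hidden weights (n, h), w_out: hidden-to-output weights (h, T)."""
    y_hidden = sigmoid(x @ w_in)                       # Eq. (3): hidden responses
    y_out = sigmoid(y_hidden @ w_out)                  # sigmoid output, as Eq. (8) assumes
    err = y_desired - y_out                            # (y_hat_t - y_t)

    delta_out = err * y_out * (1 - y_out)                              # Eq. (8) factor
    delta_hidden = (w_out @ delta_out) * y_hidden * (1 - y_hidden)     # Eq. (10) factor

    w_out = w_out + eta * np.outer(y_hidden, delta_out)   # output-layer weight change
    w_in = w_in + eta * np.outer(x, delta_hidden)          # hidden-layer weight change
    return w_in, w_out
```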

2.4 Levenberg Marquardt Algorithm (LM)

The LM algorithm is a computational approach for data-fitting problems of the NN that involve an uncertain parameter structure. In this premise, LM categorizes the input data set by training the NN algorithm to adapt to the previous parameter state through the expected error (MSE). The method is essentially derived from the popular Gauss-Newton technique [21] for non-singular functions (tansig), as in Eqs. (11) and (12):

$$x^{k + 1} = x^{k} + \Delta x\quad k = 1, \ldots ,N$$
(11)

in which \(x^{k + 1}\), \(x^{k}\), and \(\Delta x\) represent the updated state, the previous state, and the deviation at each step of the time series, respectively. The deviation is modeled in the LM framework, in which the Jacobian of the errors trains each node of the neural network as follows:

$$\Delta x = \left[ {J^{T} J + \eta I} \right]^{ - 1} J^{T} MSE$$
(12)

where \(J\), \(\eta\), and \(MSE\) represent the Jacobian (first derivative) of the errors with respect to the weights obtained through the back-propagation process, the learning (damping) parameter, and the mean square error, respectively. The merit of LM is its convergence speed, which helps escape local minima for the sake of prediction [22]. According to the above equation, the LM method applies a correcting scheme to the error (MSE) instead of using the full Hessian matrix. It should be noted that the key point in the weight adjustment of the NN is the propagation through the hidden-layer neurons, where over-fitting may occur if the covariance of the data set is contaminated with heterogeneous patterns [23]. Hence, the propagation search is described by Eqs. (13) and (14):

$$\omega_{ij}^{k + 1} = \omega_{ij}^{k} + \Delta \omega_{ij} \quad k = 1, \ldots ,N$$
(13)
$$\Delta \omega_{ij} = \left[ {J^{T} J + \eta I} \right]^{ - 1} J^{T} MSE$$
(14)
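A minimal sketch of the update in Eqs. (12)–(14) is shown below; the Jacobian shapes are illustrative assumptions, and the error vector e plays the role of the error term that the chapter writes as MSE.

```python
import numpy as np

def lm_update(jacobian, errors, mu=1e-2):
    """One Levenberg-Marquardt step: delta_w = (J^T J + mu*I)^(-1) J^T e,
    where J holds the derivatives of the errors with respect to the weights."""
    JtJ = jacobian.T @ jacobian
    damping = mu * np.eye(JtJ.shape[0])     # damping blends Gauss-Newton and gradient descent
    return np.linalg.solve(JtJ + damping, jacobian.T @ errors)

# Illustrative use: 100 training residuals, 30 network weights (assumed shapes).
rng = np.random.default_rng(2)
J = rng.standard_normal((100, 30))
e = rng.standard_normal(100)
delta_w = lm_update(J, e)                   # weight correction applied as in Eq. (13)
```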

2.5 Bayesian Regularization (BR)

After the standardizing steps of LM in the propagation process, Bayesian Regularization (BR) is applied to address the over-fitting problem of weight allocation in the NN [24, 25]. BR detects the unregulated weights according to their error (MSE) and accelerates the search for classifying the weights by reducing their probability in the state space. In other words, BR filters out the unbiased weights that were selected randomly; by identifying such weights, which act as the white noise of the NN, the optimum values become more attainable than in the former state. Then, by adding an extra term to the propagation equations, namely the sum of the squared weights of the net, the decision function for learning is described as follows [26]:

$$\begin{array}{*{20}c} {{\text{Min }}\Delta \omega_{ij} = \alpha E(w) + \beta MSE} \\ {\alpha ,\beta { > 0}} \\ \end{array}$$
(15)
$$E({\text{w}}) = \frac{1}{2}\sum\limits_{i = 1}^{N} {\omega_{i}^{2} }$$
(16)

in which MSE, \(E({\text{w}})\), \(\alpha\), and \(\beta\) are the mean square error of the NN, the sum of squared weights, and the regularization (filtering) variables, respectively [27, 28]. Hence, as the probability of the unbiased weights decreases, the convergence of the forecast improves until it becomes a computational unit resistant to local optima. According to the volume of input data and the learning interactions, the training ratio of the BR technique has been set to 70%. As can be inferred from Eq. (15), the propagation procedure is converted into a quadratic optimization problem in which the filtering variables play a regularization role. By solving this problem and finding the minimum point over the feasible values of the variables, the propagation process is improved as follows [29]:

$$\left\{ {\begin{array}{*{20}l} {\alpha^{op} = \frac{\gamma }{{2E_{\omega } (\omega^{op} )}} \, } \hfill \\ {\beta^{op} = \frac{N - \gamma }{2MSE} \, } \hfill \\ {\gamma = K - \alpha {\text{Trace(A)}}^{ - 1} \, } \hfill \\ {\alpha = \frac{1}{{\sigma_{\omega }^{2} \, }}} \hfill \\ \end{array} } \right.$$
(17)

The cooperation of γ, the optimum number of well-determined (regularized) weights, with the covariance of the input data set refines the feasible solutions of the quadratic problem. In Eq. (17), K is the total number of network weights and A is the Hessian matrix of the quadratic problem, which acts as a variance operator to determine the error deviation, while α is inversely related to the diversity of the weights. Note that the effective number γ can vary from 0 to K depending on the input data set. Hence, the set of solutions that best fits the quadratic problem enhances the propagation model by removing the noise terms from Eq. (15).
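The regularized cost of Eqs. (15)–(16) and the re-estimation of the filtering variables in Eq. (17) can be sketched as follows; the roles of the arguments (training residuals as errors, the quadratic problem's Hessian as A) are assumptions consistent with the text.

```python
import numpy as np

def br_objective(weights, errors, alpha, beta):
    """Regularized cost of Eq. (15): alpha*E(w) + beta*MSE."""
    e_w = 0.5 * np.sum(weights ** 2)        # Eq. (16): half the sum of squared weights
    mse = 0.5 * np.sum(errors ** 2)         # squared-error term
    return alpha * e_w + beta * mse

def br_reestimate(weights, errors, hessian, alpha):
    """Re-estimate alpha and beta per Eq. (17); gamma is the effective
    number of well-determined (regularized) weights, between 0 and K."""
    K, N = weights.size, errors.size
    e_w = 0.5 * np.sum(weights ** 2)
    mse = 0.5 * np.sum(errors ** 2)
    gamma = K - alpha * np.trace(np.linalg.inv(hessian))
    return gamma / (2.0 * e_w), (N - gamma) / (2.0 * mse)
```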

2.6 Scaled Conjugate Gradient (SCG)

In this step, the conjugate scope is used to maximize the optimization capability of the dynamic technique. The concept of SCG is based on locating the overall minimum of the quadratic problem so as to decrease the slope of the errors; a gradient operator is considered for both the errors and the gradient of the errors. After the two aforementioned techniques have been applied, SCG initializes with \(x_{0}\) as the starting point of a line-search algorithm over the weights, according to Eq. (18) [30]. The combined gradient condition is checked with Eq. (19).

$$x_{0} \in R, \, f({\text{x}}) \le \, f({\text{x}}_{0} )$$
(18)
$$\begin{aligned} & \left\| {\nabla f(x) - \nabla f(y)} \right\| \le {\text{L}}\left\| {x - y} \right\| \\ & y = g_{k + 1} - g_{k} \\ & g_{k} = \nabla f(x_{k} ) \\ \end{aligned}$$
(19)

in which \(\nabla f(x)\) and \(\nabla f(y)\) are the error gradients at x and y, and L is the Lipschitz constant of the gradient, while x denotes the input weights and y the difference of successive error gradients, respectively [31]. It should be noted that the procedure requires the differentiability of the objective function (15). Under these assumptions, the propagating process of SCG can be expressed as in Eq. (20) [32]:

$$\begin{aligned} & x_{k + 1} = x_{k} + \alpha_{k} d_{k} \to (\omega_{k + 1} = \omega_{k} + \alpha_{k} d_{k} ) \\ & d_{k + 1} = - \theta_{k + 1} g_{k + 1} + \beta_{k} s_{k} \\ & \theta_{k + 1} = \frac{{s_{k}^{T} s_{k} }}{{y_{k} s_{k} }} \\ & s_{k} = x_{k + 1} - x_{k} \\ \end{aligned}$$
(20)

in which \(d_{k}\) and \(\alpha_{k}\) are the search direction and the step size of the search technique [33]. According to the quasi-Newton argument, taking \(\beta_{k} = 1\) increases the likelihood that \(\theta_{k}\) remains positive definite. The first step is therefore initialized as:

$$g_{0} = \nabla f(x_{0} ) , { }d_{0} = - g_{0} ,\alpha_{0} = \frac{1}{{\left\| {g_{0} } \right\|}}$$
(21)

The search algorithm is updated at every iteration until the condition of Eq. (19) is satisfied. Equation (20) indicates that the propagating process of SCG depends entirely on the optimal selection of \(d_{k}\) and \(\alpha_{k}\) [34]; consequently, the step size \(\alpha_{k}\) must be determined carefully to accelerate the computational search. Thereby, the Wolfe conditions are imposed on the objective function for this purpose, as in Eq. (22) [35]:

$$\begin{aligned} & f(x_{k} + \alpha_{k} d_{k} ) - f(x_{k} ) \le \sigma_{1} \alpha_{k} g_{k}^{T} d_{k} \\ & \nabla f(x_{k} + \alpha_{k} d_{k} )^{T} d_{k} \ge \, \sigma_{2} g_{k}^{T} d_{k} \\ \end{aligned}$$
(22)

where \(\sigma_{1}\) and \(\sigma_{2}\) are positive constants satisfying \(0 < \sigma_{1} \le \sigma_{2} < 1\). Finally, the configuration of the three computational units that together drive the propagating search is evaluated in the following section.
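The conjugate-direction recursion of Eqs. (19)–(21) is sketched below on a generic differentiable objective (for instance, the regularized cost of Eq. (15)). For brevity the step size is kept fixed at the initial value of Eq. (21) rather than being chosen through the Wolfe conditions of Eq. (22), and β_k = 1 as discussed above; both are simplifying assumptions.

```python
import numpy as np

def scg_minimize(grad, x0, iters=200, tol=1e-6):
    """Conjugate-direction search following Eqs. (19)-(21).
    grad: gradient of the objective, x0: initial weight vector."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # Eq. (21): initial direction
    alpha = 1.0 / np.linalg.norm(g)          # Eq. (21): initial (here fixed) step size
    for _ in range(iters):
        x_new = x + alpha * d                # Eq. (20): move along the search direction
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:      # stop once the gradient is small enough
            return x_new
        s, y = x_new - x, g_new - g          # auxiliary vectors of Eqs. (19)-(20)
        theta = (s @ s) / (y @ s)            # Eq. (20): scaling factor
        d = -theta * g_new + s               # Eq. (20) with beta_k = 1
        x, g = x_new, g_new
    return x

# Illustrative use on a simple quadratic bowl (assumed test objective).
sol = scg_minimize(lambda w: 2.0 * w, np.array([3.0, -2.0]))
```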

3 Numerical Results and Discussion

3.1 Resiliency of the Proposed Hybrid Strategy

In the conducted study, the set of electrical load demand data is divided into three parts: training, validation, and testing, with tentative ratios of 70%, 15%, and 15%, respectively. To fit the assumptions of the proposed NN technique, the MSE criterion serves as the measure of error distance; this criterion is evaluated at each stage of the DANN until the constraints are satisfied. Moreover, after the aforementioned scaling standardizes the output, the comparison between the real historical demand data set and the predicted set is used to verify the compatibility of the algorithm, as shown in Fig. 5.

Fig. 5
figure 5

The flowchart of the proposed strategy
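A minimal sketch of the 70/15/15 chronological split and of the MSE criterion is given below; the file name for the hourly IESO series is a hypothetical placeholder.

```python
import numpy as np

def split_series(series, ratios=(0.70, 0.15, 0.15)):
    """Chronological split of the demand series into training, validation, and test sets."""
    n_train = int(ratios[0] * len(series))
    n_val = int(ratios[1] * len(series))
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

def mse(y_hat, y):
    """Mean square error criterion used at every stage of the DANN."""
    return np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2)

demand = np.loadtxt("ieso_hourly_demand.csv")   # hypothetical file of hourly IESO load
train, val, test = split_series(demand)
```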

3.2 Robustness and Scalability

The forecasting system is built around the main body of the NN, which benefits from the three learning techniques LM, BR, and SCG. The 4344 network inputs are taken from the IESO of Ontario, Canada, covering the six-month horizons from 1/1/2001–6/30/2001 through 1/1/2009–6/30/2009, i.e. nine years of data imported into the DANN. The structure is configured with 10 hidden layers of 24 hidden neurons each, and 4320 output samples are produced iteratively. Furthermore, the comparison of the actual and forecasted outputs together with the error performance (MSE) is shown in Figs. 6, 7, 8, and 9. According to Figs. 8 and 9, the result agrees well with the actual historical data set. To qualify the contribution of the simulation, the resolution of the compared network outputs is presented in Figs. 10 and 11, which show the measured data set together with the forecasted output; the blue line is the actual measured data from the IESO and the red curve is the output of the simulated approach. According to the error trials of the proposed method, after 1000 epochs the MSE decreases to \(8.803 \times 10^{ - 3}\) (\(\varepsilon\)), which confirms the conformity of the strategy. The construction of the NN and its training performance are depicted in Figs. 12 and 13, respectively, and Figs. 14 and 15 are included to clarify the feasibility of the algorithm. Finally, Fig. 16 presents the linear regression view of the simulated structure fitting the full data set.
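The reported error figures can be reproduced, in principle, with a short evaluation routine such as the following; it assumes that the actual and forecasted series are aligned NumPy arrays of equal length.

```python
import numpy as np

def evaluate(actual, forecast):
    """Error metrics for judging the forecast against the measured IESO series."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    mse = np.mean((actual - forecast) ** 2)                       # criterion used by the DANN
    mape = 100.0 * np.mean(np.abs((actual - forecast) / actual))  # percentage error
    return mse, mape
```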

Fig. 6
figure 6

Actual value of the power consumption from 1/1/2010 to 6/30/2010

Fig. 7
figure 7

The demonstration of predicted set of input

Fig. 8
figure 8

The comparison of actual and estimated power consumption from 1/1/2010 to 6/30/2010

Fig. 9
figure 9

The comparison of actual and estimated power consumption from 1/1/2010 to 2/1/2010

Fig. 10
figure 10

The actual value of input data set for the last week during 6/24/2010 to 6/30/2010

Fig. 11
figure 11

The comparison of actual and estimated power consumption 6/24/2010 to 6/30/2010

Fig. 12
figure 12

The configuration of LM method

Fig. 13
figure 13

The performance of LM-DANN

Fig. 14
figure 14

The autocorrelation of error

Fig. 15
figure 15

The error histogram of proposed architecture

Fig. 16
figure 16

The regression criterion for proposed architecture

4 Conclusion

In summary, the advantages of ANNs have been exploited for the long-term forecasting of electrical consumption and for predicting the desired data set using the error criterion. The intermittent nature of the problem shows that the proposed method is applicable to data sets with uncertain fluctuation. For this purpose, the historical sets reported by the IESO, the operator of Ontario's power network in Canada, were used for the estimation. After determining the composition of the DANN, the regulating steps guided by the training progress of the demand curve were applied to obtain dependable results. Consequently, the simulation performance of the DANN confirms the sensitivity and practical operation of the proposed architecture, achieving a tolerably small MSE.