
1 Introduction

In recent years, two major trends can be observed in the European electricity market: a growing share of renewable energy supply and increasing market interconnection. At the same time, the EPEX SPOT day-ahead market shows a steadily growing trading volume and high price volatility [1]. An accurate price forecast provides a strategic as well as an economic advantage, which matters to all participants of the EPEX SPOT day-ahead market. Current forecasting approaches mainly apply linear regression, GARCH, ARMA and ARIMA models or artificial neural networks [2]. Artificial neural networks in particular are able to capture real-world market processes mathematically, so that market dynamics can be transferred into a market model [3]. Among neural network approaches, three-layer networks, known as multilayer perceptrons (MLPs), are the state of the art. However, deep neural networks have been shown in many cases to approximate complex dynamics better and with fewer units [4]. In this paper, we propose a market modeling approach based on deep neural networks in order to obtain a holistic and robust market model with accurate predictions. The forecasts are issued each day at 11:30 am, leaving sufficient time to place orders at the EPEX SPOT day-ahead market, whose order book closes at 12:00 noon. Only data available by 11:30 am are considered in the modeling process, which makes the whole system real-time capable.
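The 11:30 am data cut-off can be enforced mechanically by checking each input series' publication time before a forecast is built. The following is a minimal sketch under stated assumptions: the function name `usable_inputs` and the series names in the usage example are hypothetical, and the paper does not describe how its input pipeline implements this rule.

```python
from datetime import datetime, time

# Forecasts are issued at 11:30; the day-ahead order book closes at 12:00.
FORECAST_CUTOFF = time(11, 30)

def usable_inputs(available_at: dict[str, datetime], forecast_day: datetime) -> list[str]:
    """Return the (hypothetical) input series published no later than 11:30 on the forecast day."""
    cutoff = datetime.combine(forecast_day.date(), FORECAST_CUTOFF)
    return sorted(name for name, ts in available_at.items() if ts <= cutoff)

# Usage with made-up series names and publication times:
published = {
    "wind_forecast": datetime(2016, 2, 8, 9, 0),
    "load_forecast": datetime(2016, 2, 8, 11, 30),
    "late_series": datetime(2016, 2, 8, 11, 45),   # published too late, excluded
}
print(usable_inputs(published, datetime(2016, 2, 8)))
```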

2 Empiric Market Modeling

2.1 Neural Network Approach

Originally, artificial neural networks were developed to model the biological processes within the human brain. Their ability to capture highly non-linear and complex systems makes them well suited for econometric modeling. The internal information processing within an artificial neural network can be interpreted as a mathematical model of the real-world decision-making process of a trader at the stock exchange: the trader rates all information according to its relevance, aggregates the weighted information and derives a final decision. In the artificial neuron, the first two steps, weighting and aggregating the relevant information, are captured mathematically by multiplying each numerical input by a weight and summing the products: \(\sum _{i=1}^n w_{i}x_{i}\). In a mathematical model, the decision itself would be described by a step function f. Since step functions are not continuously differentiable, which the gradient-based training described below requires, sigmoid functions are applied in the artificial neuron instead: \(f\left( \sum _{i=1}^n w_{i}x_{i}\right) \). Together, these three steps form the information processing within an artificial neuron (see Eq. 1 and Fig. 1, left). An additional threshold \(w_{0}\) is considered here, which acts as a stimulus threshold for the decision.

Fig. 1 Left: Information processing within an artificial neuron; right: Structure of an artificial neural network

Mathematical information process within an artificial neuron:

$$\begin{aligned} y = f\left( \sum _{i=1}^n w_{i}x_{i} - w_{0}\right) \end{aligned}$$
(1)
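Eq. (1) can be evaluated directly. The sketch below uses tanh as the sigmoid-type activation f (the activation reported in Section 3) and, for contrast, a hard step function; the helper names `neuron` and `step` are ours, not the paper's.

```python
import numpy as np

def step(z):
    """Hard decision: 1 if the stimulus reaches the threshold, else 0 (not differentiable at 0)."""
    return np.where(z >= 0.0, 1.0, 0.0)

def neuron(x, w, w0, f=np.tanh):
    """Eq. (1): weight the inputs, aggregate them, subtract the threshold w0, apply f."""
    aggregated = np.dot(w, x)           # weighting and aggregation: sum_i w_i * x_i
    return float(f(aggregated - w0))    # decision via activation f (tanh by default)

x = np.array([1.0, 2.0])
w = np.array([0.5, 0.25])
print(neuron(x, w, w0=0.5))             # smooth output in (-1, 1)
print(neuron(x, w, w0=0.5, f=step))     # hard 0/1 decision
```

The smooth tanh output carries a useful gradient everywhere, whereas the step function's derivative is zero almost everywhere, which is why the sigmoid-type activation is preferred for training.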

Since each artificial neuron can be interpreted as an individual trader, a network of artificial neurons can be seen as a model of the whole market. The right side of Fig. 1 shows a three-layer neural network with four hidden neurons. Every hidden neuron is connected to each input neuron (numerical input information) and to each output neuron. Each link carries a weight that filters the information passing through it. Whereas a real trader weights the available information according to gut feeling or experience, in an artificial neural network a mathematical algorithm adjusts the weights such that the network output approximates the target. This is achieved with the backpropagation algorithm, which computes the gradient of the error function with respect to each weight. A suitable optimization algorithm then uses these gradients to search for the optimal weights.
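To make the backpropagation step concrete, here is a minimal NumPy sketch for the three-layer case of Fig. 1 (tanh hidden layer, linear output, squared error). It is an illustration only: the paper does not name its optimizer, so plain gradient descent with a hypothetical learning rate `lr` stands in for "a suitable optimization algorithm".

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Three-layer network: input -> tanh hidden layer -> linear output."""
    h = np.tanh(W1 @ x - b1)            # hidden neurons, Eq. (1) with thresholds b1
    y = W2 @ h - b2                     # linear output neuron(s)
    return h, y

def loss_and_grads(x, t, W1, b1, W2, b2):
    """Squared error 0.5*||y - t||^2 and its gradient w.r.t. every weight (backpropagation)."""
    h, y = forward(x, W1, b1, W2, b2)
    e = y - t                           # output error
    loss = 0.5 * float(e @ e)
    dW2 = np.outer(e, h)
    db2 = -e                            # because y = W2 h - b2
    delta = (W2.T @ e) * (1.0 - h**2)   # error signal propagated back through tanh
    dW1 = np.outer(delta, x)
    db1 = -delta
    return loss, (dW1, db1, dW2, db2)

def gd_step(params, grads, lr=0.05):
    """One plain gradient-descent update (stand-in for the unspecified optimizer)."""
    return tuple(p - lr * g for p, g in zip(params, grads))
```

A finite-difference check on any single weight is a quick way to confirm that the analytic gradients are correct before training.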

Fig. 2 Topology of the applied deep neural network

2.2 Deep Neural Networks

According to [5], a sufficiently large three-layer neural network can approximate any continuous function on a compact domain to arbitrary accuracy. However, deep neural networks often achieve better results with fewer units when approximating complex functions [4]. We therefore apply three-layer neural networks as well as deep neural networks. In this context, a neural network model is called "deep" if it consists of more than one hidden layer. This can be achieved by simply adding hidden layers to the three-layer model, but this has some major drawbacks. As described in [6], it is not ensured that the lower layers contribute to the final output at all. Additionally, in a tall, tower-like construction, relevant input information might get lost on the long forward path, while the error signal decays as it propagates backward through the large number of hidden layers.

Therefore, the topology depicted in Fig. 2 is applied in this study. Each hidden layer is separately connected to the input layer; because the weight matrix A is shared, every hidden layer receives the same input information. Moreover, each hidden layer is connected to its own output layer, so that on the backward path learning is applied to every single intermediate layer. Information is passed from one hidden layer to the next through the backbone of the model. Via additional highway connections, visualized with dashed lines, information can also bypass intermediate layers. The error, on the other hand, is not propagated through the backbone.
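The forward pass of this topology can be sketched in a few lines of NumPy. This is our reading of Fig. 2, not the authors' code: we assume a shared input matrix `A`, a backbone matrix `B` between consecutive hidden layers, a shared output matrix `C` attached to every hidden layer, and highway connections modeled as identity skips from layer k-2 to layer k. Four hidden layers and a state dimension of 30 follow the setup reported in Section 3; everything else is an assumption.

```python
import numpy as np

def deep_forward(x, A, B, C, n_layers=4, highway=True):
    """Forward pass of the deep topology (sketch).

    A: shared input matrix (state_dim x n_inputs), feeds every hidden layer
    B: backbone matrix (state_dim x state_dim), links hidden layer k-1 to k
    C: shared output matrix (n_outputs x state_dim), one output per hidden layer
    """
    ax = A @ x                               # identical input information for all layers
    states, outputs = [], []
    for k in range(n_layers):
        z = ax.copy()
        if k >= 1:
            z += B @ states[k - 1]           # backbone connection
        if highway and k >= 2:
            z += states[k - 2]               # assumed identity highway skipping one layer
        s = np.tanh(z)
        states.append(s)
        outputs.append(C @ s)                # separate output layer for each hidden layer
    return states, outputs

def deep_loss(outputs, target):
    """Sum of squared errors of all intermediate outputs: every layer gets its own error signal."""
    return sum(0.5 * float((y - target) @ (y - target)) for y in outputs)
```

The paper states that the error is not propagated through the backbone. With an automatic-differentiation framework, one way to express this would be to treat `states[k - 1]` as a constant (stop-gradient) when computing gradients; this NumPy sketch covers only the forward pass and the combined loss.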

In the empirical study, the optimal values of the meta-parameters (size of the training set, size of the validation set, pattern learning sequence, activation function and number of training epochs) as well as the best topology need to be determined.
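One plausible way to organize such a study is an exhaustive grid search scored by validation MAE. The paper does not state how the combinations were explored, so the grid search itself, the candidate values in `SEARCH_SPACE` and the user-supplied `evaluate` function are all hypothetical.

```python
from itertools import product

# Hypothetical candidate values; the paper reports only the winning setup (Section 3).
SEARCH_SPACE = {
    "train_days": [91, 182, 364],
    "val_days": [7, 14],
    "pattern_selection": ["sequential", "permute"],
    "activation": ["tanh", "logistic"],
    "epochs": [25, 50, 100],
    "n_hidden_layers": [1, 2, 3, 4],
}

def grid_search(evaluate, space=SEARCH_SPACE):
    """Try every combination and keep the one with the lowest validation MAE.

    `evaluate` (hypothetical) trains a model with the given meta-parameters
    and returns its validation MAE.
    """
    keys = list(space)
    best_cfg, best_mae = None, float("inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        mae = evaluate(cfg)
        if mae < best_mae:                   # strict '<' keeps the first of equally good setups
            best_cfg, best_mae = cfg, mae
    return best_cfg, best_mae
```

Since every configuration requires a full training run, the size of the grid (here 288 combinations) is the main cost driver.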

2.3 Quantile-Based Bias Correction

By their mathematical construction, artificial neural networks are not able to extrapolate beyond the range covered by the training data. In most cases, it can be observed that high values are more likely to be underestimated, whereas low values are often overestimated. We propose a quantile-based scaling method (QBS) to further reduce the forecast error, especially for rare events. In the QBS process, the cumulative distribution functions (CDFs) of the model output and of the target are computed and their quantiles compared. For each quantile q, the difference between the mean of the target quantile \(\overline{x_{target_{q}}}\) and the mean of the corresponding model quantile \(\overline{x_{model_{q}}}\) is computed and added to the model outputs \(x_{model_{q}}\) in that quantile, so that underestimated high values are raised and overestimated low values are lowered. Since the model and target CDFs show larger deviations only at very high and very low values, only these areas are adjusted. This results in more accurate predictions, especially for the rare events of very high or very low prices.

$$\begin{aligned} QBS:\qquad \widehat{x_{q}}=\left( \overline{x_{target_{q}}}-\overline{x_{model_{q}}} \right) + x_{model_{q}} \end{aligned}$$
(2)
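The correction can be sketched with NumPy as follows. Assumptions not taken from the paper: the offsets are fitted on historical pairs of model output and target, quantiles are grouped into `n_bins` equally probable bins, only the lowest and highest `tail_bins` bins are corrected, and each offset is taken as target mean minus model mean so that the correction moves the model output toward the target.

```python
import numpy as np

def fit_qbs(model_out, target, n_bins=20, tail_bins=1):
    """Per-quantile offsets (mean target quantile minus mean model quantile), tails only."""
    model_out, target = np.asarray(model_out, float), np.asarray(target, float)
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    m_edges = np.quantile(model_out, probs)      # quantile edges of the model CDF
    t_edges = np.quantile(target, probs)         # quantile edges of the target CDF
    offsets = np.zeros(n_bins)                   # middle bins stay uncorrected
    for q in list(range(tail_bins)) + list(range(n_bins - tail_bins, n_bins)):
        m_in = model_out[(model_out >= m_edges[q]) & (model_out <= m_edges[q + 1])]
        t_in = target[(target >= t_edges[q]) & (target <= t_edges[q + 1])]
        offsets[q] = t_in.mean() - m_in.mean()
    return m_edges, offsets

def apply_qbs(x, m_edges, offsets):
    """Add the offset of the model-output quantile bin that each forecast falls into."""
    q = np.clip(np.searchsorted(m_edges, x, side="right") - 1, 0, len(offsets) - 1)
    return x + offsets[q]
```

With `n_bins=20` and `tail_bins=1`, only forecasts below the 5th or above the 95th percentile of the historical model output are shifted, which mirrors the idea of adjusting only the tails.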
Table 1 Final performance errors (MAE [€/MWh])

3 Results and Conclusion

As described above, optimal values for all meta-parameters first need to be determined in an empirical study. For modeling the EPEX SPOT day-ahead market, the underlying dynamics are captured best by the following setup:

$$\begin{array}{ll} \text{Training set:} & \text{4368 patterns (182 days)} \\ \text{Validation set:} & \text{336 patterns (14 days)} \\ \text{Generalization set:} & \text{24 patterns (1 day)} \\ \text{Activation function:} & \tanh \\ \text{Pattern selection:} & \text{Permute} \\ \text{Training epochs:} & 50 \\ \text{State dimension:} & 30 \end{array}$$
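The pattern counts follow from one pattern per delivery hour: 182 × 24 = 4368, 14 × 24 = 336 and 1 × 24 = 24. The sketch below builds these windows from an hourly series. We assume that the windows are contiguous and directly precede the forecast day in chronological order, and we read "Permute" as presenting the training patterns in a new random order each epoch; both are assumptions, and `rolling_split` and `epoch_order` are hypothetical helpers.

```python
import numpy as np

HOURS_PER_DAY = 24                       # one pattern per delivery hour
TRAIN_DAYS, VAL_DAYS, GEN_DAYS = 182, 14, 1

def rolling_split(n_hours, forecast_day):
    """Pattern indices for training, validation and generalization (the forecast day).

    Assumes an hourly series starting at day 0 and windows that directly precede
    the forecast day in chronological order.
    """
    gen_start = forecast_day * HOURS_PER_DAY
    gen_end = gen_start + GEN_DAYS * HOURS_PER_DAY
    val_start = gen_start - VAL_DAYS * HOURS_PER_DAY
    train_start = val_start - TRAIN_DAYS * HOURS_PER_DAY
    if train_start < 0 or gen_end > n_hours:
        raise ValueError("not enough data for the requested windows")
    return (np.arange(train_start, val_start),
            np.arange(val_start, gen_start),
            np.arange(gen_start, gen_end))

def epoch_order(train_idx, rng):
    """'Permute' pattern selection: a new random order of training patterns per epoch."""
    return rng.permutation(train_idx)
```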

Fig. 3 Comparison of final results in an exemplary winter week in 2016

The subsequent modeling study shows that a setup with four hidden layers combined with shared weight matrices A, B, C and highway connections yields the lowest mean absolute error (MAE). Table 1 shows that the neural network approach by far outperforms the naive benchmark of using the previous day's prices. Compared to the best three-layer neural network setup, the optimal deep neural network setup shows superior results, especially in autumn, where it reduces the remaining error by 9%. The quantile-based bias correction reduces the error further. This effect can be seen in Fig. 3: whereas the three models perform quite similarly in the medium price range, larger deviations occur at sharp price peaks such as on 9 February 2016. For these events, the deep neural network architecture and especially the QBS approach reduce the residual errors significantly. As the QBS mostly affects the tails of the distribution (i.e. rare events), the overall error is reduced only slightly. However, this method helps to identify extreme values and can be effectively combined with a trigger function for rare events.
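The comparison metrics can be sketched as follows: the MAE, the naive benchmark that forecasts each hour with the price of the same hour on the previous day, and the relative error reduction (such as the 9% reported for autumn). The helper names are ours, and the Table 1 values are not reproduced here; the usage numbers are toy data.

```python
import numpy as np

def mae(forecast, actual) -> float:
    """Mean absolute error in the units of the prices (e.g. EUR/MWh)."""
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(actual))))

def naive_previous_day(prices: np.ndarray) -> np.ndarray:
    """Benchmark: forecast each hour with the price of the same hour on the previous day."""
    daily = prices.reshape(-1, 24)      # rows = days, columns = delivery hours
    return daily[:-1].ravel()           # forecasts for days 1 .. N-1

def relative_reduction(mae_ref: float, mae_new: float) -> float:
    """Share of the reference error removed by the new model (0.09 means 9 %)."""
    return (mae_ref - mae_new) / mae_ref

# Toy usage: three days of made-up hourly prices.
prices = np.arange(72, dtype=float)
print(mae(naive_previous_day(prices), prices[24:]))
```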

Through the systematic process of understanding and capturing real-world information, extracting relevant parameters in the sensitivity analysis, a large empirical neural network study and the statistical bias correction step, we obtain accurate results from a robust model setup. The findings of this paper can either be used directly to improve trading at the EPEX SPOT day-ahead market or serve as the basis for further research. The proposed approach can easily be transferred to the modeling of other markets, other products or different forecasting horizons. Moreover, it can be applied to analyze the impact of future market developments on the EPEX SPOT price.