1 Introduction

Load forecasting has attracted much attention in research on power systems. It enables utility suppliers to forecast loads to preserve a balance between supply and demand, reduce electricity production costs, and manage operation scheduling and future capacity planning. Mocanu et al. [1] classified load forecasting methods into three categories according to forecast range: short-term from 1 h to 1 week, mid-term from 1 week to 1 year, and long-term 1 year or more. Short-term load forecasting (STLF) is important for managing power markets and for power system operations such as unit commitment and economic dispatch. Therefore, STLF errors could interfere with reliable power system operation and cause economic losses [2].

Load forecasting methods can be classified as time-series methods or causal methods. In time-series methods, the load is modeled as a function of past observed values. Auto-regressive and exponential smoothing methods are common time-series forecasting methods, and they usually use time delays to forecast from past observations. Choosing an appropriate time delay is an important step in eliminating redundant features, so a time-series method should select the range of time delays of the input variables used in model construction. This allows forecasters to increase the accuracy of the method and better understand the underlying process of the time-series data [3,4,5].

Meanwhile, in causal methods, the load is modeled as a function of external factors, especially weather and social variables. The most common causal method is multiple linear regression (MLR). MLR is attractive because forecasters can attach physical interpretations to its components. However, MLR is in essence a linear method, whereas loads are known to be non-linear functions of external factors [6,7,8]. In addition, some studies have applied other forecasting methods such as non-parametric regression [9], multiplicative auto-regression [10], the Kalman filter [11] and ARMAX [12].

Recently, many researchers have applied artificial intelligence techniques to load forecasting, including fuzzy linear regression, random forest, and support vector machine [13, 14], but the method that has received the most attention is neural networks (NNs). NNs have been widely used for forecasting tasks because they can model non-linearity. The basic structure of NNs, the multilayer perceptron, was used to forecast loads from previous data [15, 16]. Various NN structures have been applied to improve the forecasting accuracy [17,18,19,20]. Deep neural networks (DNNs) are NNs with more than one hidden layer. The multiple layers improve the feature abstraction of the network, allowing for efficient learning of complex and non-linear relations [21]. DNNs are reported to perform better than shallow NNs [1].

Intuitively, however, the nature of load is dynamic, not static. Changes in load are affected not only by external factors but also by past and current load conditions. In this sense, a static neural network is a limited load forecasting method, and recurrent neural networks (RNNs) have been proposed to integrate previous load state information into the current load state [22]. Additionally, long short-term memory (LSTM) is a variation of RNNs that was originally developed by Hochreiter et al. [23] to preserve the error signal that is propagated through time and layers. Salah et al. [24] employed an LSTM model for load forecasting that outperformed other machine learning approaches. Zheng et al. [25] developed a hybrid LSTM model using XGBoost and k-means, in which XGBoost is used for feature selection and k-means is used to merge similar days into one cluster.

Because load fluctuates with various non-linear factors, such as social, economic, and weather factors, and has time-series characteristics, various load forecasting methods have been proposed to reflect these characteristics [26]. However, researchers still emphasize the need for more accurate and reliable load forecasting methods. In particular, STLF has become increasingly important in modern power systems since the rise of solar PV and wind power, whose output is intermittent and depends on the weather conditions [27].

The aim of this paper is to contribute to addressing the issues related to day-ahead load forecasting. Load forecasting should reflect both the time-series characteristics of loads and the non-linear correlations of load fluctuation factors. Day-ahead load forecasting also requires prediction data for the forecasting day, such as weather information and the day of the week. To meet this need, a forecasting method based on deep neural networks that can learn and extract rich features from the input is proposed. The proposed method combines a long short-term memory (LSTM) layer with fully-connected (FC) layers, which receive data with different characteristics as input. The LSTM layer, a type of recurrent neural network (RNN), is used to model the variability and dynamics of the historical data. The FC layers are used to project the prediction data and relate them to the output of the LSTM layer.

The proposed method is evaluated on its load forecasting accuracy for the total load of the Korean power system in 2017 and 2018. Additionally, to objectively verify its performance, the proposed method is compared with KPX short-term load forecasting (KSLF), the tool used to forecast day-ahead load in Korea [2]. The forecast results based on the total load data of the Korean power system show that the proposed method, based on deep neural networks using an LSTM layer, is more accurate than the alternative approaches.

The remainder of this paper is organized as follows: Sect. 2 provides brief background on LSTM-based RNNs. Section 3 describes the methodology of the proposed method. Section 4 describes case study results and discusses their validity. Section 5 draws conclusions.

2 Long Short-Term Memory Based on Recurrent Neural Networks

In this section, brief background on LSTM based on RNNs is provided. RNNs are special NNs in which the units of an individual layer are connected in a feedback structure. They are called recurrent because they perform the same operation on every element of the sequence. They make it possible to model data with time-series characteristics, overcoming the limitation of non-recurrent NNs, which assume that the inputs are independent of one another. The rolled and unrolled RNN configurations are shown in Fig. 1.

Fig. 1 The rolled and unrolled RNNs configurations [28]

In Fig. 1, \({\text{A}}\) is the RNN cell, \(x_{t}\) is the element of the input sequence {\(x_{1} ,x_{2} , \ldots ,x_{n}\)} at time \({\text{t}}\), and \(h_{t}\) is the output of the cell, which is also used as input at time \({\text{t}} + 1\). RNNs are able to store memory because the current output of the neurons depends on the previous computations. The output of the cell, \(h_{t}\), is calculated using the recurrence function \({\text{f}}\) as in (1). Therefore, RNNs perform well when the input has time-series characteristics, as sequential data do [28].

$$h_{t} = f\left( {h_{t - 1} , x_{t} } \right)$$
(1)

However, RNNs suffer from the vanishing gradient problem: the gradients propagated backward through the layers and time steps shrink, and thus long-range dependencies cannot be preserved [29]. LSTM was proposed to overcome this problem by introducing a cell state and gates into the RNN cell. In LSTM, the cell state and three gates are added as shown in Fig. 2. Because these gates preserve the information propagated through time and layers, LSTM can handle long-range dependencies [30].

Fig. 2 The structure of LSTM [31]

The output of the LSTM is computed according to (2)–(7):

$$f_{t} = \sigma \left( {W_{f} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)$$
(2)
$$i_{t} = \sigma \left( {W_{i} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right)$$
(3)
$$o_{t} = \sigma \left( {W_{o} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right)$$
(4)
$$\tilde{C}_{t} = \tanh \left( {W_{C} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{C} } \right)$$
(5)
$$C_{t} = f_{t} \odot C_{t - 1} + i_{t} \odot \tilde{C}_{t}$$
(6)
$$h_{t} = o_{t} \odot \tanh \left( {C_{t} } \right)$$
(7)

where \(f_{t}\), \(i_{t}\) and \(o_{t}\) are the forget, input and output gates, respectively. The forget gate \(f_{t}\) determines how much of the previous state is reflected in the current state. The input gate \(i_{t}\) determines the new information used to update the cell state. The output gate \(o_{t}\) determines the information to output based on the cell state. The sigmoid function \(\sigma\) limits the outputs of \(f_{t}\), \(i_{t}\) and \(o_{t}\) to values between 0 and 1, following (2)–(4). The output of these gates depends on the current input \(x_{t}\) and the previous output \(h_{t - 1}\). If a gate output is 0, the values are blocked by the gate, and if it is 1, the values pass through unchanged. \(\tilde{C}_{t}\) is the candidate value used to calculate the current cell state \(C_{t}\) and is computed following (5). \(C_{t}\) serves as an accumulator of the state information and is updated from the previous cell state \(C_{t - 1}\) following (6), where \(\odot\) denotes element-wise multiplication. \(h_{t}\) is the output of the current LSTM cell following (7). \(C_{t}\) and \(h_{t}\) are used as inputs to the next time step, and this process is repeated at every time step. \({\text{W}}\) and \({\text{b}}\) are the weights and biases of the LSTM cell. They are updated through the training process to values that minimize the difference between the output of the LSTM and the target value.
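
To make (2)–(7) concrete, the following is a minimal NumPy sketch of a single LSTM time step. The function names, the dictionary layout of the weights, and the toy dimensions are illustrative assumptions and are not taken from the implementation used in this paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the three gates in Eqs. (2)-(4)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (2)-(7).

    W maps a gate name to a matrix of shape (hidden, hidden + inputs);
    b maps a gate name to a vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (2)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (3)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (4)
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate state, Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde       # cell state, Eq. (6)
    h_t = o_t * np.tanh(c_t)                 # cell output, Eq. (7)
    return h_t, c_t


# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fioC"}
b = {k: np.zeros(n_hidden) for k in "fioC"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):       # a sequence of five inputs
    h, c = lstm_step(x_t, h, c, W, b)        # h and c feed the next step
```

The loop mirrors the statement that \(C_{t}\) and \(h_{t}\) are passed on to the next time step; training of \({\text{W}}\) and \({\text{b}}\) is not shown.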

3 Methodology

In order to forecast load accurately, it is necessary to identify and reflect the factors that influence load. The factors that affect load fluctuation include weather conditions, social events, etc. These factors have a non-linear relationship with load, and thus, it is difficult for load forecasts to reflect the effects of all such factors. Furthermore, load has strong daily, weekly, and monthly time-series characteristics, and load forecasts should reflect all of these characteristics. Therefore, a forecasting method for STLF based on a deep neural network using an LSTM layer is proposed that reflects both the time-series trends of the load itself and the non-linear correlation of load fluctuation factors. The output of the proposed method is the hourly load for the next day.

In this section, the workflow for STLF is described in detail. The process follows six steps, as shown in Fig. 3. The detailed tasks of each step are described below.

Fig. 3 The workflow for the proposed forecasting method

3.1 Data Selection

3.1.1 Historical Load

The hourly loads are approximately periodic. This is because people consume electricity as part of their regular production activities and lifestyles. The proposed method uses historical load as input because it reflects load trends [2]. Prior to using the historical loads, a time-series exploratory analysis of the load should be performed. This can be useful in identifying load trends, patterns and anomalies. For this study, the Korean nationwide load data set, which contains hourly load, is used. Figure 4 shows the hourly load for June 2016.

Fig. 4 Hourly load in June 2016

In Fig. 4, the loads follow a similar pattern every week. This is because load is, in general, closely related to the life patterns of modern society. Because electricity consumption for industrial and commercial activities is low on weekends, the weekend electricity usage patterns generally differ from the weekday patterns. In addition, because the load is lower on weekends, the load on Monday morning, which follows the weekend, is lower than that on other weekday mornings. Figure 5 shows a box plot of weekday and weekend load. In all years, the weekday loads are higher. Because the load patterns differ among Monday, Tuesday to Friday, Saturday, and Sunday, days are classified into day types according to their daily load pattern. These features should be reflected in the STLF.

Fig. 5 Comparison between weekend load and weekday load

People’s behavior patterns on public holidays, especially long holidays, generally differ from their normal daily patterns, and this affects load. Therefore, the data for public holidays are filtered from the inputs [13]. In addition, for each input day, the average hourly load of the latest 2 days of the same day type is used as input.
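
The same-day-type average can be sketched as follows. This is one plausible reading of the description above, namely an hour-by-hour mean over the two most recent non-holiday days that share the day type of the given day; the function names and the dictionary-based data layout are assumptions made for illustration.

```python
from datetime import date, timedelta

# Day-type group for each weekday, indexed by date.weekday() (Monday == 0).
GROUPS = ("Mon", "Tue-Fri", "Tue-Fri", "Tue-Fri", "Tue-Fri", "Sat", "Sun")


def day_group(d):
    """Return the day-type group of calendar day d."""
    return GROUPS[d.weekday()]


def same_type_average(target, daily_loads, holidays=frozenset(), n_days=2):
    """Hour-by-hour mean load of the latest n_days days before `target`
    that share its day type; public holidays are skipped.

    daily_loads maps a date to a list of 24 hourly loads.
    """
    picked = []
    earliest = min(daily_loads)
    d = target - timedelta(days=1)
    while len(picked) < n_days and d >= earliest:
        usable = d in daily_loads and d not in holidays
        if usable and day_group(d) == day_group(target):
            picked.append(daily_loads[d])
        d -= timedelta(days=1)
    if len(picked) < n_days:
        raise ValueError("not enough history for this day type")
    return [sum(hour) / n_days for hour in zip(*picked)]


# Synthetic data: day k of the window has a flat load profile equal to k.
loads = {date(2016, 6, 1) + timedelta(days=k): [float(k)] * 24 for k in range(27)}
avg = same_type_average(date(2016, 6, 27), loads, holidays={date(2016, 6, 13)})
```

Skipping holidays in the search is consistent with filtering public holidays from the inputs, but whether the original implementation does so at this step is not stated.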

3.1.2 Temperature

The weather is known to have a great influence on electricity usage because it affects people’s behavior. People tend to use more electricity when temperatures are uncomfortable, whether too hot or too cold, and to do more activities indoors. The hourly load from January to December 2016 is shown in Fig. 6. It shows patterns related to seasonal human activities. In summer (June to August) and winter (December to February), the loads are higher than in other months because of the effect of temperature.

Fig. 6 The hourly load from January to December in 2016

The effect of temperature on load in 2016 is non-linear, as shown in Fig. 7. Here, the temperatures are weighted averages obtained by multiplying the temperatures of the eight major cities in Korea by their regional weights [32].

Fig. 7 The relationship between temperature and load
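
The weighted average temperature can be illustrated with a short sketch. The city labels and weights below are placeholders, not the regional weights of [32], and normalising by the sum of the weights is an assumption that is harmless when the weights already sum to 1.

```python
def weighted_temperature(city_temps, weights):
    """Weighted mean of the city temperatures for one hour.

    city_temps and weights are dicts keyed by city; the result is
    divided by the total weight so the weights need not sum to 1.
    """
    total = sum(weights[city] for city in city_temps)
    return sum(weights[city] * t for city, t in city_temps.items()) / total


# Placeholder cities and weights, for illustration only.
temps_at_noon = {"city_a": 10.0, "city_b": 20.0}
regional_weights = {"city_a": 1.0, "city_b": 3.0}
t_weighted = weighted_temperature(temps_at_noon, regional_weights)
```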

The correlation coefficient between the hourly load and the hourly temperature is 0.73, which is the yearly average of the absolute values of the monthly correlation coefficients. The load is thus highly correlated with the temperature, and the relationship is non-linear. Additionally, temperature is measured more accurately than other meteorological factors. Therefore, the hourly temperatures are employed as input [33]. In future research, we will review the effects of other meteorological factors on load.
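
The averaging of absolute monthly correlations can be sketched as below. The use of the Pearson coefficient is an assumption, since the type of correlation coefficient is not stated, and the function names are illustrative. Taking absolute values matters because the sign of the relationship can differ between cold and hot months.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def mean_abs_monthly_correlation(months, temps, loads):
    """Average of |correlation| computed separately for each month.

    months[i] is the calendar month (1-12) of hourly sample i.
    """
    by_month = {}
    for m, t, load in zip(months, temps, loads):
        ts, ls = by_month.setdefault(m, ([], []))
        ts.append(t)
        ls.append(load)
    coeffs = [abs(pearson(ts, ls)) for ts, ls in by_month.values()]
    return sum(coeffs) / len(coeffs)


# Synthetic samples: load falls with temperature in January
# and rises with temperature in July.
months = [1, 1, 1, 7, 7, 7]
temps = [-5.0, 0.0, 5.0, 25.0, 30.0, 35.0]
loads = [70.0, 60.0, 50.0, 60.0, 70.0, 80.0]
score = mean_abs_monthly_correlation(months, temps, loads)
```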

3.1.3 Dummy Values

As shown in Fig. 8, the load follows a similar pattern for each day of the week. Therefore, time and day-of-week indices can be useful for forecasting hourly load, and these two types of features are included as input to the forecasting method [27, 34]. Because a day has 24 h, an incremental sequence from 1 to 24 is used as the time index. A number between 0 and 3 is assigned to each day-of-week type to represent this categorical feature: 0 for Monday, 1 for Tuesday through Friday, 2 for Saturday, and 3 for Sunday.

Fig. 8 Patterns of load by day of week
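
The two dummy features can be generated as in the following sketch, which encodes the time index (1 to 24) and the day-type index (0 to 3) exactly as described above. The function name and the tuple layout are illustrative assumptions.

```python
from datetime import date

# Day-type index for each weekday, indexed by date.weekday() (Monday == 0):
# 0 = Monday, 1 = Tuesday to Friday, 2 = Saturday, 3 = Sunday.
DAY_TYPE_INDEX = (0, 1, 1, 1, 1, 2, 3)


def dummy_features(d):
    """Return 24 (time index, day-type index) pairs for calendar day d."""
    day_idx = DAY_TYPE_INDEX[d.weekday()]
    return [(hour, day_idx) for hour in range(1, 25)]


features = dummy_features(date(2016, 6, 6))   # 6 June 2016 was a Monday
```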

3.2 Input Feature Pre-processing

Data pre-processing is an important step for obtaining better performance and accuracy from neural networks. Because neural networks are sensitive to data scale, the inputs have to be normalized before they are used. In load forecasting, the difference in scale between the load and the temperature is large, so accurate training cannot be performed when the inputs are used without normalization. After several normalization methods were reviewed, min–max normalization was selected through analysis [35]. In the proposed method, min–max normalization is used to scale the data to the range [0, 1].
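
A minimal sketch of min–max normalization is given below. Fitting the minimum and maximum on the training data and reusing them for the testing data and for converting forecasts back to load units is a common practice assumed here; it is not stated in the text. The function names and sample values are illustrative.

```python
def fit_min_max(values):
    """Return the (min, max) pair used to scale a feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Invert `scale`, e.g. to express a forecast in load units again."""
    return [v * (hi - lo) + lo for v in values]


# Illustrative load values only.
train_load = [50000.0, 60000.0, 80000.0]
lo, hi = fit_min_max(train_load)
scaled = scale(train_load, lo, hi)
restored = unscale(scaled, lo, hi)
```

Values outside the fitted range, which can occur in the testing set, map outside [0, 1] under this scheme.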

3.3 Structure of Forecasting Method

The pre-processed data are divided into training and testing sets. The training set is used to train the neural networks, and the testing set is fed to the trained networks to forecast the load of the forecasting day. Except for the presence of labels, the basic data structures of the training and testing sets are similar. Here, the label is the target value supplied when training the neural networks, and the proposed method uses the load as the label. The neural networks are trained by comparing the output of the final layer, calculated from the input features, with the target load.

Both the training and testing sets consist of two types of input: historical data and prediction data. Here, historical data are the recent past data used for load forecasting, consisting of the past day of week, time of day, hourly temperature, hourly load, and the average hourly load of the latest 2 days. Prediction data are the data for the target day, such as day of week, time of day, predicted hourly temperature, and the average hourly load of the latest 2 days. Appropriate NN layers are used to learn the relationships in these data and extract rich features from them. As shown in Fig. 9, an LSTM layer is used to extract features from the historical data. In addition to the historical data, the prediction data should be incorporated into the networks. Therefore, the proposed method feeds the prediction data for the target day to an FC layer to relate them to the target load. The outputs of the LSTM and FC layers are combined and used as input to the next FC layer.

Fig. 9 The structure of the proposed forecasting method

In the historical data, D is the day index, H is the time index, T is the temperature, L is the load, and AL is the average hourly load of the latest 2 days. The number of historical data points is determined by the LSTM sequence length. In the prediction data, PD is the day index, PH is the time index, and PT is the temperature predicted by the meteorological office; all prediction data are values for the target day. FL is the forecasted load, which is the output of the final layer.

The difference between the outputs of the final layer and the target values is calculated using the mean squared error, and the NN parameters are optimized until the error becomes sufficiently small. The computationally efficient Adam optimizer [36] showed better results than other optimizers, including steepest gradient descent, Adagrad [37], Adadelta [38] and RMSProp [39]. Therefore, the Adam optimizer is used to train the proposed method.
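
The data flow of the structure described in this section can be sketched as a NumPy forward pass. Several details are assumptions made only for illustration: one sample produces the load of a single hour, the historical input is a sequence of 3 steps with the 5 features (D, H, T, L, AL), the prediction input holds 4 features (PD, PH, PT, AL), the hidden sizes are arbitrary, ReLU is used in the first FC layer, and the weights are random rather than trained. Training with the Adam optimizer is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_output(seq, W, b, n_hidden):
    """Run an LSTM over seq (steps x features) and return the final h_t.

    W stacks the forget, input, output and candidate weights row-wise.
    """
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in seq:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = (sigmoid(z[k * n_hidden:(k + 1) * n_hidden]) for k in range(3))
        c = f * c + i * np.tanh(z[3 * n_hidden:])
        h = o * np.tanh(c)
    return h


def dense(x, W, b, activation=None):
    """Fully-connected layer with an optional ReLU activation."""
    y = W @ x + b
    return np.maximum(y, 0.0) if activation == "relu" else y


def forecast(hist, pred, p):
    """Historical data -> LSTM, prediction data -> FC, merged -> FC -> FL."""
    h_hist = lstm_last_output(hist, p["W_lstm"], p["b_lstm"], p["n_hidden"])
    h_pred = dense(pred, p["W_fc1"], p["b_fc1"], "relu")
    merged = np.concatenate([h_hist, h_pred])
    return float(dense(merged, p["W_out"], p["b_out"])[0])


def mse(forecasts, targets):
    """Mean squared error between forecasts and target loads."""
    diff = np.asarray(forecasts) - np.asarray(targets)
    return float(np.mean(diff ** 2))


# Random, untrained parameters with assumed sizes.
rng = np.random.default_rng(0)
n_hidden, n_fc, n_hist_feat, n_pred_feat = 8, 4, 5, 4
p = {
    "n_hidden": n_hidden,
    "W_lstm": rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_hist_feat)),
    "b_lstm": np.zeros(4 * n_hidden),
    "W_fc1": rng.normal(scale=0.1, size=(n_fc, n_pred_feat)),
    "b_fc1": np.zeros(n_fc),
    "W_out": rng.normal(scale=0.1, size=(1, n_hidden + n_fc)),
    "b_out": np.zeros(1),
}

hist = rng.random((3, n_hist_feat))   # 3 steps of normalized (D, H, T, L, AL)
pred = rng.random(n_pred_feat)        # normalized (PD, PH, PT, AL)
fl = forecast(hist, pred, p)          # forecasted load FL (normalized)
loss = mse([fl], [0.5])               # 0.5 is a made-up target value
```

Keeping the two input types in separate branches lets each layer see only the data it is suited to, which is the design choice motivated in the text.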

4 Case Studies

The proposed method is evaluated based on the results of load forecasting for the years 2017 and 2018. Day-ahead load forecasting is performed only for normal days, excluding special days. The training and testing sets for the forecast are constructed using the moving window method. The Korea Power Exchange (KPX), the power system operator in Korea, performs day-ahead load forecasting on the day before the target day. Because hourly data are not yet available for the day on which the forecasting is performed, the data up to the day before the forecast execution day are used. Therefore, the testing set for day-ahead forecasting consists of several days of data before the forecast execution day. The data from the 1 year ending on the day before the forecast execution day are used as the training data. The load forecasting error rate is based on the mean absolute percentage error (MAPE), as shown in (8):

$${\text{MAPE}}\left( {\text{\%}} \right) = \frac{100}{n} \times \mathop \sum \limits_{t = 1}^{n} \left| {\frac{{L_{t}^{Measured} - L_{t}^{Forecast} }}{{L_{t}^{Measured} }}} \right|$$
(8)

where \(L_{t}^{Measured}\) is the load measured at time t, \(L_{t}^{Forecast}\) is the load forecast at time t, and n is the number of time steps.
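
The evaluation set-up can be sketched as follows: `mape` implements (8), and `moving_window` derives the dates used for one target day. The window arithmetic reflects one reading of the description above (forecast executed on the day before the target day, data available up to the day before that); the 365-day training length, the default of 3 input days, and the function names are assumptions.

```python
from datetime import date, timedelta


def mape(measured, forecast):
    """Mean absolute percentage error in percent, Eq. (8)."""
    n = len(measured)
    return 100.0 / n * sum(abs((m - f) / m) for m, f in zip(measured, forecast))


def moving_window(target_day, seq_days=3, train_days=365):
    """Return (train_start, train_end, input_days) for one target day.

    The forecast is executed on the day before target_day, and the
    latest available data are from the day before the execution day.
    """
    execution_day = target_day - timedelta(days=1)
    last_data_day = execution_day - timedelta(days=1)
    input_days = [last_data_day - timedelta(days=k)
                  for k in reversed(range(seq_days))]
    train_start = last_data_day - timedelta(days=train_days - 1)
    return train_start, last_data_day, input_days


# Illustrative values only.
error = mape([100.0, 200.0], [110.0, 180.0])
train_start, train_end, input_days = moving_window(date(2018, 1, 10))
```

Calling `moving_window` for each successive target day slides both the training period and the input days forward by one day.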

4.1 Configure the Networks

The sequence length must be determined before the LSTM layer can be used. Here, the sequence length means the number of days of historical data used for the forecast. The hyper-parameter tuning of neural networks is usually based on experience and experimentation, so the optimal sequence length for the LSTM layer is selected experimentally. Table 1 shows the yearly average MAPE for 2016 according to the LSTM sequence length. Based on the results, the sequence length is set to 3 in the proposed method.

Table 1 The forecasting results of yearly average MAPE for 2016 according to sequence length of LSTM layer

4.2 Compare with the Results of Previous Study

In order to verify the performance of the proposed method, it is necessary to compare its results with those from previous studies. To do this, the proposed method is compared with KPX Short-term Load Forecasting (KSLF), the day-ahead load forecasting method used by the KPX. As a grid operator, KPX performs day-ahead load forecasting to operate the Korean power system. Exponential smoothing is the main scheme for day-ahead load forecasting in KSLF. It uses the latest 3 days of load as input, and the past load used as input is corrected to reflect the load fluctuation caused by temperature. The detailed operation of KSLF is described in [2]. Tables 2 and 3 show the average MAPE by day of week of each method’s forecasting results for 2017 and 2018.

Table 2 The forecasting results of day of week average MAPE according to each method in 2017
Table 3 The forecasting results of day of week average MAPE according to each method in 2018

Compared with the results of KSLF, the proposed method produces better forecasts for all days of the week. In KSLF, Monday, Tuesday through Friday, Saturday, and Sunday are forecast separately, and the input data and forecasting method differ for each day type. This is intended to reflect the different load characteristics by day of the week. For this reason, when forecasting Monday, Tuesday and the weekend, KSLF cannot reflect the load continuity between adjacent days. Additionally, on Mondays and weekends, the forecasting accuracy is low because data from other weeks are used for forecasting. In the proposed method, forecasting accuracy is improved because the forecasts use adjacent days. As shown in Fig. 10, the proposed method forecasts Monday mornings more accurately than KSLF.

Fig. 10 Comparison of the load forecasting error between KSLF and the proposed method for Monday mornings in 2018

An FC-only network, which uses the same inputs as the proposed method but has no LSTM layer, as shown in Fig. 11, is also used for comparison.

Fig. 11 The difference in structure between the proposed method (left) and the FC layer (right)

Tables 4 and 5 show the monthly average MAPE of each method’s forecasting results. The results suggest that adding an LSTM layer to the FC layers captures the dependencies among the historical data and thereby improves predictability. Because hourly temperature is used, the accuracy of the proposed method is significantly better than that of KSLF in summer and winter. However, the proposed method performs poorly in the period after long holidays such as Lunar New Year’s Day and Chuseok. The data for public holidays are not used as input for the forecast. Therefore, the periods after long holidays are forecast using data from days far from the forecasting day, which leads to a large error in these periods.

Table 4 The forecasting results of monthly average MAPE according to each method in 2017
Table 5 The forecasting results of monthly average MAPE according to each method in 2018

5 Conclusion

A forecasting method based on deep neural networks using an LSTM layer is proposed to perform day-ahead load forecasting. In the proposed forecasting method, input features are processed using different types of layers according to their specific characteristics. A time-series analysis was conducted to analyze the characteristics of load, and a correlation analysis with temperature was performed to select the input features. The selected input features are divided into historical and prediction data to reflect the characteristics of the past data and the information of the forecasting day. An LSTM layer is used to extract features from the past data, and an FC layer takes the data for the forecasting day as input. In order to test the proposed forecasting method, day-ahead load forecasting is performed for normal days (excluding special days) in 2017 and 2018. The moving window method is used to select the training and testing sets. The annual average MAPE of the proposed method is 1.49% for 2017 and 1.52% for 2018. The proposed method performs better than KSLF, and better than the FC-only network, which has the same structure except that it has no LSTM layer.

The forecasting method based on deep neural networks using LSTM layer presented in this paper will contribute to stabilizing power systems and to efficient power market operation. Future research will be carried out to reflect load fluctuations due to solar radiation and wind speed changes in order to take into account fluctuations due to renewable energy.