1 Introduction

Electricity suppliers require forecasts for load balancing and for supply and demand management so that power plant capacity can be utilized up to the required demand level. Forecasting is commonly divided into three categories: (1) long-term, (2) mid-term, and (3) short-term. All three are necessary to smooth load balancing, demand, and supply according to the market's needs, and all three have been investigated in recent years with different parameters and from different perspectives. Such studies can reduce electricity expenditure by analyzing consumption data and patterns to estimate approximate future consumption values. Using these estimates, electricity suppliers can manage electricity prices [1], which in turn benefits the country's economy. Properly planned electricity load and consumption can save money that helps build the economy [2].

Statistical and machine learning models are useful for extracting information about electricity consumption from a dataset. Generally, such consumption datasets have a time series representation, which can be univariate or multivariate. Time series data consist of observations with specified timestamps, ranging from seconds to years, and arise in domains where values change over time, such as the stock market [3]. Statistical and machine learning models can also be applied in other fields: [4] introduces an approach to analyze a churn dataset that also contains geographic locations. Several other studies forecast electricity consumption with different variables and parameters [5,6,7,8,9]. Such models can help in understanding the variation in consumption while increasing accuracy and decreasing computational time.

Auto-regressive and moving-average models are widely used for such forecasting studies; however, artificial intelligence models can also be employed to increase forecast accuracy. ARIMA stands for "Auto-Regressive Integrated Moving Average". It comprises models that predict a value from its own past values, i.e., from its lags and from lagged forecast errors. Any non-seasonal time series that exhibits a pattern and is not pure white noise can be modeled with its help. An ARIMA model is described by three terms, p, d, and q, and understanding the model requires understanding what these terms mean. The goal when building an ARIMA model is to obtain a stationary time series. The auto-regressive part follows a linear regression that uses the series' own lagged values as predictors, and linear regression works best when those predictors are uncorrelated and independent of each other. Hence, to obtain a stationary series, the most common approach is differencing: the difference between the current and the previous value is computed, and depending on the complexity of the series this computation may have to be repeated. The d value is therefore the minimum number of differencing operations needed to make the series stationary. ARIMA has been used in studies such as forecasting wheat production, infant mortality rates, and automated modeling of electricity production [5, 10,11,12], among many others.

Neural network models are inspired by biological neurons, in which neurons and the synapses connecting them carry out the brain's processing. These models can help in a broad spectrum of domains such as decision making, learning, emotion, and language, and are well equipped to handle the experiments in this study. Forecasting electricity can impact industries that are directly linked to electricity production: prices can rise and fall in industries such as the oil and gas industry [13], which supports electricity production. This underlines the importance of the study, as many other factors can affect electricity load balancing, production, supply, and demand.

In this study, the authors investigate a comprehensive set of models for the related dataset. The authors identify the hyperparameters for the models, conduct experiments, and compare results based on the MSE, RMSE, MAE, and MAPE metrics. These models provide enough insight into the dataset to obtain more accurate forecasts. The authors further contribute by combining CNN with Bi-LSTM to create a new model for forecasting and analyzing electricity consumption. The paper is structured as follows: Section 2 provides related work and a deeper understanding of models such as ARIMA and neural networks. Section 3 describes the materials and methods used in this study. Section 4 presents the experiments and discusses the observed results. Finally, future work is discussed and the authors conclude their study.

2 Related work

ARIMA is a well-known forecasting model, especially in stock and finance applications, and [14] discusses in detail its pros, cons, and techniques to improve it. A forecasting model for commercial users in South Korea is studied by [8] for gas, petroleum, electric heat, and renewable energy to accommodate the United Nations' convention on climate change. Likewise, regression and ARIMA models are compared for forecasting electricity production in Turkey [15] and provide reliable results. As ARIMA is popular in stock market analysis, [3] forecasts stock market crises by considering the probability of a stock market crash over various time frames. ARIMA is also explored by [10], who use (2, 1, 2) hyperparameter values for wheat production. The exponential-smoothing method, along with ARIMA, is studied by [16], who extend the application with bagging to determine monthly load and state that their models are suitable for developing countries. The "FB-Prophet" model is becoming popular due to its usability and adaptability and is therefore studied by [17] in a survey of open-source tools and algorithms.

Neural networks are best known for image processing, especially medical image processing, where different models and parameters have to be analyzed to determine the best outcomes; the studies in [18, 26] show promising results. Neural networks can also help in understanding personality dynamics: they can determine whether a personality state is stable [19] and which variables affect it. Photovoltaic (PV) integration can support economic growth; it is a promising renewable energy source and therefore requires prediction and forecasting to support future decisions. Forecasting PV data can be done with the neural network model named LSTM. An LSTM-RNN is analyzed by [20], showing that temporal changes in PV output can be evaluated from an hourly dataset covering one year. Electricity is considered a key player in economics and is thus studied by many researchers with different models and approaches. The approach of [21] combines LSTM with a genetic algorithm to obtain better results and performance on time series data for short-term and long-term forecasting. A further improvement in LSTM forecasting is made in [22], where the authors combine CNN with Bi-LSTM to obtain better electricity forecasts for households. Considering the essential role of electricity, [23] also proposes an LSTM-based model capable of forecasting the load of a single residence, where several other parameters are involved; the proposed framework is evaluated on real residential smart meter data. Residential usage is important, and many researchers are looking deeper for patterns in residential electricity usage. Long-term electricity demand forecasts for residential users are also affected by other variables, such as the number of households in the residential area; the study in [24] considers these variables along with average consumption and the electrification rate. Modeling with fine granularity is quite challenging, as shown by [25]. Modeling long- and short-term temporal patterns with LSTNet is done by [27], where temporal dependencies among variables are extracted from the given time series dataset. The experiments in [28] show the importance of data standardization and data sampling for overcoming the uncertainty associated with training neural networks on time-distributed data.

Credit scoring can also be improved with neural networks, as [29] investigates five models with 10-fold cross-validation on two real datasets. Recurrent neural networks (RNNs) can be used for dynamic modeling of nonlinear data. Data play a vital role in overall modeling and experimentation, so [30] makes a simple modification to the RNN to handle nonlinear spatio-temporal data in forecasting applications. Computational time is also a critical factor in the overall forecasting process; it can be reduced considerably by decreasing the number of variables, as [31] did by relying only on past solar energy consumption data.

3 Materials and methods

In terms of hardware, we used a Core i7 processor and an NVIDIA GPU with 8 GB of memory to reduce computational time. We used Python with an integrated development environment, the TensorFlow and Keras libraries for the neural network models, and the statsmodels library for the statistical models.

The ARIMA forecast equation and the error metrics used to evaluate the models are given below.

$$\hat{Y}_{t} = \mu + \phi_{1} Y_{t-1} + \cdots + \phi_{p} Y_{t-p} - \theta_{1} e_{t-1} - \cdots - \theta_{q} e_{t-q}$$
(1)
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{observed}_{i} - \mathrm{predicted}_{i} \right)^{2}$$
(2)
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_{\mathrm{predicted},i} - x_{\mathrm{measured},i} \right)^{2}}$$
(3)
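For reference, the following minimal NumPy sketch implements Eqs. (2) and (3); the function and variable names are illustrative and are not taken from our code.

```python
import numpy as np

def mse(observed, predicted):
    """Mean squared error, Eq. (2)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return float(np.mean((observed - predicted) ** 2))

def rmse(observed, predicted):
    """Root mean squared error, Eq. (3)."""
    return float(np.sqrt(mse(observed, predicted)))

# Toy example with placeholder values
print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]), rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```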
Table 1 Abbreviation table

3.1 Dataset

The dataset is provided by the Korea electric supply company and has a shape of (3120, 6): 3120 observations for 25 districts collectively. It is divided into six columns, namely household, public, service, industrial, total, and district, with a timestamp as the index for time series prediction. The dataset can be preprocessed into a univariate representation by considering one district at a time. The observations cover ten years, from 1 January 2009 to 1 December 2018. While preprocessing the dataset, we found multiple observations for January, which makes the series uninterpretable for time series analysis. Since the observations for October were missing, we concluded this was a recording mistake and relabeled the second January observation as October. After resolving the duplicate observations, we checked for missing values and found none among the 3120 observations, which simplified further preprocessing.
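The preprocessing described above could be carried out along the lines of the following pandas sketch; the file name and column labels are assumptions, not the actual identifiers of the dataset.

```python
import pandas as pd

# Illustrative sketch only: "electricity_consumption.csv" and the "Date"
# column label are assumptions, not the actual identifiers of the dataset.
df = pd.read_csv("electricity_consumption.csv", parse_dates=["Date"])
df = df.set_index("Date").sort_index()

# The raw data contain a second January row each year and no October row;
# relabel each duplicated January timestamp as October of the same year.
dup = df.index.duplicated(keep="first")
df.index = df.index.where(~dup, df.index + pd.DateOffset(months=9))
df = df.sort_index()

# Confirm there are no missing values among the 3120 observations.
assert df.isna().sum().sum() == 0
```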

Table 2 Dataset description

Table 1 is the abbreviation table, and Table 2 describes the dataset variables. As the interval between observations is one month, this dataset can be analyzed for mid-term forecasting, i.e., over a couple of months. In the districts column, the value labeled "Total" can be excluded, as the monthly total can be calculated separately; this exclusion makes the dataset easier to interpret.

Fig. 1 GURO district's 10-year electricity consumption

Fig. 2 DOBONG district's 10-year electricity consumption

Figure 1 shows the 10-year electricity consumption of the GURO district, while Fig. 2 shows that of the DOBONG district, both in the "Public" category. We chose the GURO district for our analysis, which also makes the time series univariate. Statistical analysis can then be performed on the preprocessed dataset (Table 3).
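Continuing with the same assumed labels, a short sketch of how the univariate GURO series and the summary statistics of Table 3 might be obtained:

```python
import pandas as pd

# Labels are assumptions based on Tables 1 and 2, not the exact identifiers.
df = pd.read_csv("electricity_consumption.csv", parse_dates=["Date"], index_col="Date")
df = df[df["District"] != "Total"]                         # drop the aggregate rows

guro_public = df.loc[df["District"] == "GURO", "Public"]   # univariate series
print(guro_public.describe())                              # statistics as in Table 3
```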

Table 3 Statistical information of the GURO district in the “Total”, “Public”, and “Industrial” sectors

4 Building the forecasting models

We applied both statistical and machine learning algorithms to extract information from the dataset. To evaluate the experiments, we use performance metrics such as MSE and RMSE.

4.1 ARIMA model

ARIMA is a statistical model for forecasting values. Combining the AR and MA model formulas gives the general formula shown in Eq. (1).

Fig. 3 ACF and PACF

Figure 3 indicates that our data are not stationary, so some data preparation is required before the ARIMA model can be used. It can also be observed that several spikes fall outside the critical range. These observations guide the choice of the p and d values for the ARIMA model.
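As an illustration, the following statsmodels sketch reproduces the kind of stationarity check and correlograms shown in Fig. 3; the dummy series is an assumption that stands in for the GURO data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Dummy trending series standing in for the scaled GURO consumption data.
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=120)),
                   index=pd.date_range("2009-01-01", periods=120, freq="MS"))

# Augmented Dickey-Fuller test: a p-value above 0.05 suggests non-stationarity,
# so the series needs differencing (this fixes d).
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# First-order differencing removes the trend; ACF/PACF of the differenced
# series guide the choice of the remaining orders (compare Fig. 3).
diff = series.diff().dropna()
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(diff, ax=axes[0])
plot_pacf(diff, ax=axes[1])
plt.show()
```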

Fig. 4 ARIMA candidate (p, d, q) results for the GURO district

Fig. 5 ARIMA (3, 1, 1) result for the GURO district

Figure 4 provides the results for the different hyperparameter candidates determined from Fig. 3. It can be seen in Fig. 4 that the most suitable values for p, d, and q are 3, 1, and 1, respectively. Figure 5 gives a pictorial view of the ARIMA model with hyperparameter values (p, d, q) = (3, 1, 1), which achieves an MSE of 0.028, showing that the model executed successfully with this hyperparameter setting and yields good results.
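A hedged sketch of this step with the statsmodels ARIMA implementation; the dummy training series and the AIC-based selection criterion are assumptions, since the exact search procedure is not stated above.

```python
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Dummy monthly series standing in for the GURO training data (an assumption).
rng = np.random.default_rng(0)
train = pd.Series(np.cumsum(rng.normal(size=116)),
                  index=pd.date_range("2009-01-01", periods=116, freq="MS"))

# Small grid search over candidate (p, d, q) orders, keeping the lowest-AIC
# fit; Fig. 4 reports results for such candidates (criterion assumed here).
best_order, best_aic = None, np.inf
for order in itertools.product(range(4), range(2), range(3)):
    try:
        res = ARIMA(train, order=order).fit()
    except Exception:
        continue  # skip orders that fail to converge
    if res.aic < best_aic:
        best_order, best_aic = order, res.aic

# Final model with the selected (3, 1, 1) order and a 4-month forecast with
# 95% confidence intervals, as shown in Figs. 5 and 6.
model = ARIMA(train, order=(3, 1, 1)).fit()
forecast = model.get_forecast(steps=4)
print(best_order, forecast.predicted_mean, forecast.conf_int(alpha=0.05), sep="\n")
```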

Fig. 6 4-month prediction with ARIMA (3, 1, 1)

Figure 6 shows the 4-month predicted values: the gray area is the 95% confidence interval and the blue line is the forecast. Figure 7 shows an enhanced view of the same prediction.

Fig. 7 Enhanced 4-month prediction with ARIMA (3, 1, 1)

4.2 LSTM

LSTM requires the data to be reshaped into samples, time steps, and features. The data were reshaped accordingly and fed into two LSTM layers, each configured with 50 neurons to boost the learning process. The ReLU activation function is used, with return_sequences set to true so that data can be passed from one layer to the next. After 200 epochs, we obtained a training score of 0.15 RMSE and a test score of 0.21 RMSE.
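A Keras sketch of the architecture described above; the input shape, the optimizer, and the dummy data are assumptions rather than the exact configuration used.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Assumed reshape: (samples, time steps, features); dummy data stand in for
# the scaled consumption series (70/30 split of 120 observations).
n_steps, n_features = 1, 1
rng = np.random.default_rng(0)
X_train, y_train = rng.random((84, n_steps, n_features)), rng.random(84)
X_test, y_test = rng.random((36, n_steps, n_features)), rng.random(36)

model = Sequential([
    LSTM(50, activation="relu", return_sequences=True,
         input_shape=(n_steps, n_features)),   # passes sequences to the next layer
    LSTM(50, activation="relu"),
    Dense(1),                                  # single-value forecast
])
model.compile(optimizer="adam", loss="mse")    # optimizer assumed
model.fit(X_train, y_train, epochs=200, validation_data=(X_test, y_test), verbose=0)
print(np.sqrt(model.evaluate(X_test, y_test, verbose=0)))  # test RMSE
```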

Fig. 8 LSTM results

Figure 8 shows the LSTM results for the 120 observations with a 70–30% split: 70% for training and 30% for testing. Figure 9 shows the model loss.

Fig. 9 LSTM model loss

4.3 Bi-LSTM model

The Bi-LSTM model also receives normalized data in our experiment. We used two bidirectional LSTM layers with the same configuration as in the plain LSTM model. The ReLU activation function is used with Adamax as the optimizer. To keep the training and testing criteria the same, this model is also trained for 200 epochs, which produces a training score of 0.14 RMSE and a test score of 0.22 RMSE.
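A corresponding Keras sketch of the Bi-LSTM architecture, under the same input-shape assumption as before:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

n_steps, n_features = 1, 1  # same assumed input shape as before

model = Sequential([
    Bidirectional(LSTM(50, activation="relu", return_sequences=True),
                  input_shape=(n_steps, n_features)),
    Bidirectional(LSTM(50, activation="relu")),
    Dense(1),
])
model.compile(optimizer="adamax", loss="mse")  # Adamax, as described above
model.summary()
```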

Fig. 10 Bi-LSTM results

Fig. 11 Model loss

Figure 10 shows the training and testing results, while Fig. 11 shows the model loss. It can be seen that after a certain point the loss increases, which we attribute to the small number of observations available.

4.4 Bi-LSTM: LSTM model

Our experiments also include a model that combines Bi-LSTM and LSTM layers. The configuration is the same as in the previous experiments: we use two Bi-LSTM layers followed by two LSTM layers. After 200 epochs, the training score is 0.15 RMSE and the test score is 0.26 RMSE.
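A Keras sketch of the combined architecture, again under the same assumptions about input shape and optimizer:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

n_steps, n_features = 1, 1  # same assumed input shape as before

model = Sequential([
    Bidirectional(LSTM(50, activation="relu", return_sequences=True),
                  input_shape=(n_steps, n_features)),
    Bidirectional(LSTM(50, activation="relu", return_sequences=True)),
    LSTM(50, activation="relu", return_sequences=True),
    LSTM(50, activation="relu"),
    Dense(1),
])
model.compile(optimizer="adamax", loss="mse")  # optimizer assumed, as before
model.summary()
```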

Fig. 12 Bi-LSTM: LSTM results

Fig. 13 Model loss

Figure 12 shows the training and testing results of the model for the 120 observations, while Fig. 13 shows the model loss. The loss fluctuates more than for the Bi-LSTM model and shows an upward trend, meaning that over time this model would produce a larger loss. Like the other models, it can be re-tuned for mid-term forecasting applications.

4.5 CNN-LSTM model

A convolution layer is introduced in this experiment, with the number of filters and the kernel size both set to 1. We chose the same ReLU activation function to maintain consistency. A MaxPooling layer with a pool size of 1 is used, followed by two LSTM layers with the same configuration as before and one dense layer, since a single output is required. The dataset is split 70/30 into training and testing sets, respectively. After 200 epochs, the model produces 0.13 RMSE for training and 2.06 RMSE for testing. The test score is not as good as anticipated, which suggests that CNN combined with LSTM may not be a good combination for our dataset.
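A Keras sketch of the CNN-LSTM architecture as described, with the input shape and optimizer assumed:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

n_steps, n_features = 1, 1  # same assumed input shape as before

model = Sequential([
    Conv1D(filters=1, kernel_size=1, activation="relu",
           input_shape=(n_steps, n_features)),
    MaxPooling1D(pool_size=1),
    LSTM(50, activation="relu", return_sequences=True),
    LSTM(50, activation="relu"),
    Dense(1),   # single output
])
model.compile(optimizer="adamax", loss="mse")  # optimizer assumed
model.summary()
```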

Fig. 14 CNN-LSTM results

Fig. 15 Model loss

Figure 14 shows the CNN-LSTM results, while Fig. 15 shows the model loss. A downward spike can be seen after the 100th observation in Fig. 14, and around the same observation the loss increases in Fig. 15. We attribute this to the features fed from the CNN layer to the LSTM layers, which needed some time to learn from them.

4.6 CNN-Bi-LSTM model

The main focus of this study is ARIMA and this proposed model, as both can produce better forecasting results. In the proposed model, we combine a CNN layer with Bi-LSTM layers. The CNN layer is configured exactly as in the CNN-LSTM model to keep the evaluation and comparison consistent. After the convolution layer, we add two Bi-LSTM layers; the resulting model produces a training score of 0.14 RMSE and a test score of 0.20 RMSE after 200 epochs.
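A Keras sketch of the proposed CNN-Bi-LSTM architecture, with the same assumptions as in the previous sketches:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Bidirectional, Dense

n_steps, n_features = 1, 1  # same assumed input shape as before

model = Sequential([
    Conv1D(filters=1, kernel_size=1, activation="relu",
           input_shape=(n_steps, n_features)),   # same CNN front end as CNN-LSTM
    MaxPooling1D(pool_size=1),
    Bidirectional(LSTM(50, activation="relu", return_sequences=True)),
    Bidirectional(LSTM(50, activation="relu")),
    Dense(1),
])
model.compile(optimizer="adamax", loss="mse")  # optimizer assumed
model.summary()
```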

Fig. 16 CNN-Bi-LSTM results

Fig. 17 Model loss

Figure 16 shows the training and testing results of the proposed model, while Fig. 17 shows the model loss. From Fig. 17, it can be observed that although the loss oscillates, it still follows a downward trend, which suggests that the model would perform better on a larger dataset.

5 Experimental results

Table 4 Results’ discussion

Table 4 shows the results of the proposed models. ARIMA performs extremely well, but it requires extensive data preprocessing before the data can be analyzed, and its performance can vary when the pattern of incoming data changes. Moreover, the ARIMA model is sensitive to seasonality, trend, and white noise, which makes it difficult to obtain a stationary series in real-world scenarios, and ARIMA does not accept a time series that is not stationary. Meanwhile, CNN-Bi-LSTM also shows good results without extensive data preprocessing, which makes neural networks easier to adopt in real-world applications. RNN/LSTM models in particular make it straightforward to forecast short-, mid-, and long-term electricity consumption. All of our models show good results except the CNN-LSTM model, whose error spikes after the 100th observation because the LSTM layers could not interpret the features coming from the CNN layers with so few observations. In short, we only have 120 observations per district, which explains this behavior.

6 Future directions

In the future, we plan to develop and test automated ARIMA models that can determine the ARIMA order and preprocess incoming data before it is passed to the model. Neural networks also show promising results, and we believe the process can be optimized to obtain more accurate results with less computational time.

7 Conclusions

This study developed and proposed a number of forecasting models for energy consumption prediction. The main focus of the study is to determine whether statistical models or neural networks perform better on the provided dataset. To compare all the models, we kept the hyperparameters consistent so that the comparison is as realistic as possible. ARIMA and CNN combined with Bi-LSTM perform well in our study; ARIMA requires heavy data preprocessing, while neural networks are easier to adopt. Our results show that only the CNN-LSTM combination did not perform well, as discussed in the experiments and in the results and discussion section. The results produced by ARIMA are reliable and can be applied in real-world settings where data patterns rarely change, while CNN combined with Bi-LSTM performs best after ARIMA, with low MSE and RMSE.