1 Introduction

Electrical energy cannot be stored during the generation, transmission, and distribution process. Therefore, generation must be sufficient to meet consumption. A balance must be struck between generation and consumption to ensure effective energy management. Any difference between generation and consumption incurs an imbalance cost. The most effective way to minimize imbalance costs is accurate load forecasting. In Turkey, the Energy Stock Exchange and the Energy Market Operating Company (EPİAŞ) carry out activities related to imbalance costs. The main purpose and activity of EPİAŞ is to plan, establish, develop, and operate the energy markets included in the market operating licenses in order to meet the needs of the energy market in an efficient, transparent, and reliable manner. The day-ahead market, the balancing power market, and the calculation of receivables and debts arising from energy imbalance are the factors that constitute the imbalance cost. The accuracy of load estimates affects market prices and the imbalance costs of distribution companies [1].

Electric load data exhibit nonlinear features that depend on influences such as climatic factors, social activities, and seasonal factors [2]. Many methods have been developed to determine the profile coefficients that represent load estimates [3]. Changes in weather conditions, calendar effects, economic indicators, and similar factors affect power demand [4, 5]. Taking these factors into account when estimating consumption yields more accurate results. Load forecasting is classified as short-term, medium-term, or long-term according to the forecast period. Estimation models with horizons from one hour to one week are called short-term load estimation [6, 7]. There are many papers on load estimation. The most commonly used approaches are ARIMA models from the Box–Jenkins method [8,9,10], exponential smoothing models [11], autoregressive (AR) models [11, 12], and time-sensitive approaches based on a single variable. Accurate estimation of electric load is one of the most important issues for developing countries. However, univariate prediction models are insufficient for estimating the electric load. Therefore, regression models that include weather, holiday, temperature, wind, humidity, and similar variables in addition to the time variable have also been proposed [13]. Vrablecová et al. [14] investigated the suitability, advantages, and disadvantages of the online support vector regression method for short-term load estimation on the publicly available Irish CER dataset. Wang et al. [15] proposed a combined probability density model for medium-term load estimation based on quantile regression, combining three separate models (random forest regression, gradient boosting decision tree, and support vector regression) on a real monthly data set from the USA. Liu et al. [16] proposed a probability density estimation method based on copula theory to obtain the relational diagram of electric load and real-time price with real datasets from Singapore. Zhang et al. [17] proposed an alternative quantile determination method in addition to a parallel and improved load quantile forecasting method, and solved the reliability problem of the structure of direct estimation intervals. Hu et al. [18] made short-term load estimations using weather factors and the periodicity of the short-term load in a generalized regression neural network. Dordonnat et al. [19] proposed a semi-parametric regression model for point load estimation and a multivariate time series simulation model for temperature estimation. Florian [20] applied a quantile regression-based forecasting method that takes both weekly and annual seasonality into account within the framework of probabilistic load prediction, using temperature information only to offset the long-term trend component. Wu et al. [21] estimated daily and weekly data with an exponential adjustment method and a regression model that take seasonal and trend data into account. Yaslan et al. [22] made hourly electric load estimations using a hybrid method that combines empirical mode decomposition and support vector regression. Chen et al. [23] proposed a new support vector regression model that estimates demand using ambient temperature information from two hours before the forecast time. He and Zheng [24] proposed a probability density estimation method based on Yeo–Johnson transformation quantile regression with a Gaussian kernel function, using 1-hour load measurements from Canada recorded in August 2014 for the summer months and December 2014 for the winter months. In [25], the researchers developed a forecast model that incorporates solar capacity to predict the hourly load in southern California 24 h in advance. The forecast model is based on multiple linear regression, random forest, and gradient boosting methods. A regression tree method for short-term load estimation, integrated with the weighted average method, is explained in [26].

ANN methods are also frequently used for short-term load estimation, and there are many studies on them in the literature. Jha et al. [27] proposed LSTM (long short-term memory) and random forest approaches that estimate load using meteorological data, historical loads, and date type. Feng et al. [28] performed medium-term and long-term power load forecasts for an economic development region in Jiangsu Province using the Elman neural network. Chen et al. [29] made household load estimates using a multiple-cycles self-boosted neural network (MultiCycleNet) that includes correlation analysis of electricity consumption patterns over multiple cycles. Velasco et al. [30] analyzed the performance of ANN models for one-hour-ahead electric load prediction using a representative dataset of historical load records from a specific geographic area served by an electric utility. Aly [31] proposed six load estimation models based on clustering techniques for Kalman filtering (KF), wavelet neural network (WNN), and artificial neural network (ANN) schemes, using datasets from two locations in Egypt and Canada. Piazza et al. [32] made hourly electrical energy demand forecasts with an autoregressive artificial neural network, using 7-year meteorological datasets and 4.5-year hourly load demand datasets. If the mathematical model depends on two variables and the data are recorded hourly for a year, the amount of available data is quite large. Although a large amount of data improves estimation performance, it also increases the computational load of complex estimation models and requires additional software tools. Therefore, plain and practical methods such as MRAM can be preferred over more complex methods.

In this paper, hourly load estimation was performed with MRAM based on time and temperature. MFMs that estimate hourly energy consumption were obtained using temperature and residential electrical energy consumption data measured for one year in a certain region of Düzce province. The LPCs calculated from the consumption estimates of the MFMs were compared with the profile coefficients calculated by the local distribution company. The reliability of the MFMs obtained with MRAM was assessed with the \({R}^{2}\), adjusted \({R}^{2}\), MAPE, and RMSE tests. A back-and-forward ANN was applied to one randomly selected month of data to measure the adequacy of the estimation performance of the MFMs. The Levenberg–Marquardt (LM) algorithm was preferred because of the speed and stability it provides in ANN training. The estimation performances of the proposed simple model and the complex ANN model were observed to be similar. This result shows that MRAM is an adequate and reliable estimation method for the data set at hand. Thus, residential electrical energy consumption for Düzce was estimated in a plain and effective way. This article is important in showing that, with sufficient parameters and data, accurate predictions can be reached with well-constructed mathematical models alone, without the need for complex software.

2 Load profile

In Turkey, the profile coefficient describes hourly energy consumption, and it is calculated from the previous year's data sets prepared by distribution companies. The calculated coefficients are used in settlement balancing in the following year. Profile coefficients are calculated separately for groups such as commercial, agricultural irrigation, industry, lighting, and housing. To express daily domestic consumption, profiles are classified as weekday, weekend, and public holiday profiles. The weekday profile covers consumption on Monday, Tuesday, Wednesday, Thursday, and Friday; however, Monday's profile differs from the other weekday profiles. The weekend profile represents consumption on Saturday and Sunday. The holiday profile shows consumption on official holidays and is named "vacation". After this classification, the consumption data for the days in each day type are collected on the basis of the settlement period. Then, the total consumption for each day type is calculated within each month and averaged on a daily basis. The hourly average consumption of each day type is divided by the average of the Monday type to form the hourly profile coefficients [33].

Profile coefficients are announced by EPİAŞ on its official website at the end of each year [1]. Only hourly consumption is taken into account in their calculation. As an example, the residential load profile coefficients (RLPCs) announced by EPİAŞ for residential consumption in the province of Düzce for April are given in Table 1. The hourly graphical representation of these profile coefficients is shown in Fig. 1. Because the day-based curves in Fig. 1 are very similar to each other, they do not allow a detailed examination. Meteorological data are not taken into account in the calculation of the RLPCs announced by EPİAŞ, and these RLPCs are used one year later. The negative effect of this on the imbalance cost calculation in the settlement is not taken into account. However, four climates can be seen in Turkey, and temperature changes during the day vary regionally. Hourly temperature changes play an effective role in electricity consumption. Correlation coefficients were calculated to examine the hour-consumption and temperature-consumption relationships for April; they are given in Table 2. In this paper, detailed mathematical models have been developed to observe the temperature-dependent change of consumption.

Table 1 RLPCs of electrical energy consumptions measured in April

Fig. 1 Hourly change of day-type RLPCs for April

Table 2 Correlation coefficients of electrical energy consumption measured in April
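
The kind of correlation coefficient reported in Table 2 is not specified in the text. As a minimal sketch, assuming the Pearson coefficient is meant, the hour-consumption and temperature-consumption relationships could be computed as follows. The function name `pearson` and the sample values are illustrative assumptions, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations of each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, NOT measurements from the paper.
hours = [0, 6, 12, 18]
temperature = [8.0, 7.0, 15.0, 12.0]       # degrees Celsius
consumption = [55.0, 60.0, 70.0, 90.0]     # kWh

r_hour = pearson(hours, consumption)
r_temp = pearson(temperature, consumption)
print(f"hour-consumption: {r_hour:.3f}, temperature-consumption: {r_temp:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one; a nonlinear dependence, such as the polynomial one modeled later, can be stronger than the Pearson value suggests.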

3 Mathematical models

3.1 Time series method

The regression method, one of the time series methods, starts by plotting the dependent variable against the independent variables. If a meaningful mathematical model can be derived for the plotted graph, this model forms an estimation equation for the time series. The model can take polynomial, sinusoidal, or exponential form [34]. The regression equation is chosen according to which mathematical model the graphical distribution of the time series resembles. If the correct model is obtained, it will give close results for unobserved times. The closer the results are to the measured values, the more successful the model. Residential subscribers considered representative of the whole province of Düzce were selected, and the measurements taken from them were used as the data set in this paper. Temperature was measured synchronously with hourly consumption. The change of electrical energy consumption with time and measured temperature was examined. This change should be examined graphically in order to obtain the right model.

In Table 3, hourly energy consumption and air temperature data for Mondays in April for residents of Düzce province are given. A graphical representation of these data is shown in Fig. 2. The shape of the surface in Fig. 2 indicates that a polynomial model is appropriate. Consequently, the polynomial multiple regression analysis method (PMRAM) has been used in the paper.

Table 3 Hourly temperature and electrical energy consumption measured on Mondays in April

Fig. 2 3D graph of hourly temperature (°C) and consumed electrical energy (kWh) on Mondays in April

The estimation equation obtained from the multiple regression model using the measurements is given in Eq. (1) [35]. Here, \({Y}_{i}\) represents the measurement data (energy consumption), \({X}_{1i}\) and \({X}_{2i}\) the independent variables (hour and temperature), \({a}_{mn}\) the coefficients of the estimation equation, and \(e\) the error of the estimation equation. The performance of the estimation model is increased by minimizing the estimation error. The sum of squared errors is denoted by \(E\) in Eq. (2), where \({\overline{Y} }_{i}\) is the forecasted value. If the derivative of Eq. (2) is taken with respect to each of the estimation coefficients and set equal to 0, the matrix equation given in Eq. (3) is obtained [36, 37]. The estimation coefficients can be calculated from this matrix equation. In this paper, the mathematical models for detailed analysis have been solved by establishing a 5th-order equation. Equation (4) gives the 5th-degree model of the dependent variable in terms of the two independent variables. A higher-order matrix could have been created, but the coefficients of the 5th-order terms were observed to be very close to 0. Therefore, the coefficients of higher-order terms are expected to be almost 0. In addition, increasing the number of coefficients to be calculated would increase the computational load significantly. Therefore, it was decided to build an estimation model up to the 5th order in this paper.

$$ Y_{i} = a_{00} + a_{10} X_{1i} + a_{01} X_{2i} + \cdots + e $$
(1)
$$ E = \mathop \sum \limits_{i = 1}^{n} \left( {Y_{i} - \overline{{Y_{i} }} } \right)^{2} $$
(2)
$$ \begin{aligned}&\left[ {\begin{array}{*{20}c} n & {\sum X_{1i} } & \cdots & {\sum X_{1i} X_{2i}^{4} } & {\sum X_{2i}^{5} } \\ {\sum X_{1i} } & {\sum X_{1i}^{2} } & \cdots & {\sum X_{1i}^{2} X_{2i}^{4} } & {\sum X_{1i} X_{2i}^{5} } \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ {\sum X_{1i} X_{2i}^{4} } & {\sum X_{1i}^{2} X_{2i}^{4} } & \cdots & {\sum X_{1i}^{2} X_{2i}^{8} } & {\sum X_{1i} X_{2i}^{9} } \\ {\sum X_{2i}^{5} } & {\sum X_{1i} X_{2i}^{5} } & \cdots & {\sum X_{1i} X_{2i}^{9} } & {\sum X_{2i}^{10} } \\ \end{array} } \right]\; \\ &\quad \left[ {\begin{array}{*{20}c} {a_{00} } \\ {a_{10} } \\ \vdots \\ {a_{14} } \\ {a_{05} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\sum Y_{i} } \\ {\sum X_{1i} Y_{i} } \\ \vdots \\ {\sum X_{1i} X_{2i}^{4} Y_{i} } \\ {\sum X_{2i}^{5} Y_{i} } \\ \end{array} } \right]\end{aligned} $$
(3)
$$ \begin{aligned} \overline{Y}_{i} &= a_{00} + a_{10} X_{1i} + a_{01} X_{2i} + a_{20} X_{1i}^{2} + a_{11} X_{1i} X_{2i} \\ &\quad + a_{02} X_{2i}^{2} + a_{30} X_{1i}^{3} + a_{21} X_{1i}^{2} X_{2i} + a_{12} X_{1i} X_{2i}^{2} \\ &\quad + a_{03} X_{2i}^{3} + a_{40} X_{1i}^{4} + a_{31} X_{1i}^{3} X_{2i} + a_{22} X_{1i}^{2} X_{2i}^{2} \\ &\quad + a_{13} X_{1i} X_{2i}^{3} + a_{04} X_{2i}^{4} + a_{50} X_{1i}^{5} + a_{41} X_{1i}^{4} X_{2i} \\ &\quad + a_{32} X_{1i}^{3} X_{2i}^{2} + a_{23} X_{1i}^{2} X_{2i}^{3} + a_{14} X_{1i} X_{2i}^{4} + a_{05} X_{2i}^{5} \\ \end{aligned} $$
(4)
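
The normal equations in Eq. (3) can be assembled and solved numerically. The following is a minimal sketch in plain Python, not the authors' implementation: it generates the exponent pairs of Eq. (4) in the same order (21 terms for the 5th degree), builds the sums of Eq. (3), and solves the system by Gaussian elimination. The function names (`poly_terms`, `fit_polynomial`, `predict`, `solve_linear`) are illustrative assumptions.

```python
def poly_terms(degree=5):
    """Exponent pairs (m, n) of X1^m * X2^n, ordered as in Eq. (4)."""
    return [(d - n, n) for d in range(degree + 1) for n in range(d + 1)]


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular; more varied data needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = (aug[r][size] - tail) / aug[r][r]
    return coeffs


def fit_polynomial(x1, x2, y, degree=5):
    """Least-squares coefficients a_mn from the normal equations of Eq. (3)."""
    terms = poly_terms(degree)
    rows = [[u ** m * v ** n for m, n in terms] for u, v in zip(x1, x2)]
    k = len(terms)
    # Left-hand matrix: sums of products of every pair of polynomial terms.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    # Right-hand vector: sums of each polynomial term times the measurement.
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return terms, solve_linear(normal, rhs)


def predict(coeffs, terms, x1, x2):
    """Evaluate the fitted polynomial of Eq. (4) at one point."""
    return sum(a * x1 ** m * x2 ** n for a, (m, n) in zip(coeffs, terms))
```

For example, `fit_polynomial(hours, temperatures, consumption)` would return the 21 term exponents and their coefficients for one data group. One caveat on this design: with raw hour values up to 23 raised to the 10th power, the normal matrix becomes poorly conditioned, so in practice it may be advisable to scale the inputs or to use a QR- or SVD-based least-squares solver instead of solving the normal equations directly.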

3.2 Fitness tests

The estimation methods are subjected to fitness tests to measure the accuracy and acceptability of the estimates. Thus, the estimation performance of the methods is measured. \({R}^{2}\) (the determination coefficient), Adj. \({R}^{2}\) (adjusted \({R}^{2}\)), RMSE (root mean square error), and MAPE (mean absolute percentage error) are commonly used in load forecasting [38,39,40,41,42,43]. \({R}^{2}\) is expressed by Eq. (5). Here, \({Y}_{i}\) denotes the measurements, and the numerator sums, over all measurements, the squared difference between the measured value and the estimate. Adjusted \({R}^{2}\) is shown in Eq. (6), where \(k\) is the number of estimated coefficients excluding the constant term. The RMSE, the square root of the mean of the squared errors, is given by Eq. (7). Here, \(n\) is the total number of measurements. Another fitness test, MAPE, gives the mean absolute percentage error and is obtained as in Eq. (8).

$$ R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i} - \overline{Y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} Y_{i}^{2} - \frac{{\left( {\mathop \sum \nolimits_{i = 1}^{n} Y_{i} } \right)^{2} }}{n}}} $$
(5)
$$ {\text{Adj}}. R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i} - \overline{Y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} Y_{i}^{2} - \frac{{\left( {\mathop \sum \nolimits_{i = 1}^{n} Y_{i} } \right)^{2} }}{n}}}\frac{n - 1}{{n - k - 1}} $$
(6)
$$ {\text{RMSE}} = \sqrt {\frac{{\sum \left( {Y_{i} - \overline{Y}_{i} } \right)^{2} }}{n}} $$
(7)
$$ {\text{MAPE}} = \frac{100}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{Y_{i} - \overline{Y}_{i} }}{{Y_{i} }}} \right| $$
(8)

The determination coefficient takes a value between 0 and 1, and the closer it is to 1, the better the forecasting performance. The smaller the RMSE, the better the prediction performance [44, 45]. For MAPE, Eq. (8) is expected to give a value below 10% for the performance to be considered high [46]. RMSE and MAPE values closer to 0 indicate better forecasting performance.
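
The four fitness tests of Eqs. (5) to (8) can be evaluated together. The sketch below is one possible implementation, with the function name `fitness_metrics` and the sample values chosen here for illustration only; `k` is the number of estimated coefficients excluding the constant term.

```python
import math


def fitness_metrics(measured, forecast, k):
    """Return R^2, adjusted R^2, RMSE, and MAPE as defined in Eqs. (5)-(8)."""
    n = len(measured)
    # Sum of squared errors between measurements and forecasts.
    sse = sum((y - f) ** 2 for y, f in zip(measured, forecast))
    # Total sum of squares, written as in the denominator of Eq. (5).
    sst = sum(y * y for y in measured) - sum(measured) ** 2 / n
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / sst) * (n - 1) / (n - k - 1)
    rmse = math.sqrt(sse / n)
    # MAPE is undefined when a measured value is zero.
    mape = 100 / n * sum(abs((y - f) / y) for y, f in zip(measured, forecast))
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "mape": mape}


# Hypothetical values, NOT results from the paper.
result = fitness_metrics([10, 20, 30, 40], [12, 18, 33, 39], k=1)
print(result)
```

Note that the adjusted \({R}^{2}\) requires \(n > k + 1\); for the 5th-degree model with 20 non-constant terms, each data group therefore needs more than 21 measurements.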

4 Solution of the problem

The mathematical models developed in Section 3 have been solved using MRAM, one of the time series methods. The analyzed data have been divided into similar groups to increase the sensitivity of the observation. First, the data have been divided into monthly groups. The box plot by month is given in Fig. 3. The minimum, maximum, median, first quartile, and third quartile values of the data differ from month to month. Figure 3 shows that, as expected, electricity consumption differs significantly between summer and winter months because of seasonal temperature differences in Düzce. Each month has then been divided into daily periods in order to create more homogeneous groups and examine them in more detail. As an example, the box plot of the April data divided into daily periods is shown in Fig. 4. The minimum, maximum, median, first quartile, and third quartile values of the weekday data differ from each other. Due to the necessities of social life, electricity consumption on weekdays differs in character from consumption on and near weekends. Saturday and Sunday (weekend) data, however, have relatively similar characteristic values.

Fig. 3 Monthly box plot of data

Fig. 4 Daily box plot of April data
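
The five values read from the box plots (minimum, first quartile, median, third quartile, maximum) can also be computed directly for each group. The following is a small sketch using Python's standard `statistics` module; the function name `five_number_summary` and the sample values are assumptions for illustration, and the quartile convention (`method="inclusive"`) may differ from the one used by the plotting software behind Figs. 3 and 4.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, and maximum of one group of consumption values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical hourly consumption values (kWh), NOT data from the paper.
groups = {
    "Monday": [52.0, 61.5, 70.2, 66.8, 58.1],
    "Weekend": [49.3, 57.0, 75.4, 71.9, 60.6],
}
for name, values in groups.items():
    print(name, five_number_summary(values))
```

Comparing these summaries across months or day types gives a numerical counterpart to the visual comparison of the box plots.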

As a result of the box plot observations, all data are grouped into months and, within each month, into 6 day types (Monday, Tuesday, Wednesday, Thursday, Friday, and Weekend) in order to develop more precise forecast models. Thus, 72 MFMs have been created. When calculating the coefficients of the mathematical models, the time of day (\({H}_{i}\)) and the measured temperature (\({T}_{i}\)) have been used as the independent variables (\({X}_{1i}\) and \({X}_{2i}\)), and the energy consumption (\({P}_{i}\)) as the dependent variable (\({Y}_{i}\)). The estimation coefficients have been calculated for each mathematical model by solving the resulting matrix equations. The general formula to which the calculated coefficients are applied is given in Eq. (9). Here, \({\overline{P} }_{i}\), \({H}_{i}\), \({T}_{i}\), and \({a}_{mn}\) denote the energy consumption forecast, the hour, the measured temperature, and the estimation coefficients, respectively. As an example, the estimation coefficients of the 6 day types for April are given in Table 4. The \({a}_{mn}\) values in Table 4 are the coefficients of the estimation models calculated day by day for April. Examination of the measured values showed that daily groups are appropriate for producing the mathematical models. With the obtained models, the amount of electrical energy consumed has been estimated as a function of time and air temperature. Then, \({R}^{2}\), Adj. \({R}^{2}\), MAPE, and RMSE values have been calculated for each model. In this way, the reliability of the mathematical models has been subjected to four different fitness tests.

$$ \begin{aligned} \overline{P}_{i} &= a_{00} + a_{10} H_{i} + a_{01} T_{i} + a_{20} H_{i}^{2} + a_{11} H_{i} T_{i} + a_{02} T_{i}^{2} \\ &\quad + a_{30} H_{i}^{3} + a_{21} H_{i}^{2} T_{i} + a_{12} H_{i} T_{i}^{2} + a_{03} T_{i}^{3} + a_{40} H_{i}^{4} \\ &\quad + a_{31} H_{i}^{3} T_{i} + a_{22} H_{i}^{2} T_{i}^{2} + a_{13} H_{i} T_{i}^{3} + a_{04} T_{i}^{4} \\ &\quad + a_{50} H_{i}^{5} + a_{41} H_{i}^{4} T_{i} + a_{32} H_{i}^{3} T_{i}^{2} + a_{23} H_{i}^{2} T_{i}^{3} \\ &\quad + a_{14} H_{i} T_{i}^{4} + a_{05} T_{i}^{5} \\ \end{aligned} $$
(9)
Table 4 Coefficients of the mathematical day models of April
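
The grouping into 12 months and 6 day types, which yields the 72 models, can be sketched as a simple key function. The code below is an illustration under stated assumptions: the names `group_key` and `group_records` and the record layout `(timestamp, temperature, consumption)` are hypothetical, and public holidays are not treated separately here.

```python
from datetime import datetime

DAY_TYPES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Weekend"]


def group_key(timestamp):
    """Map a timestamp to its (month, day type) model group."""
    weekday = timestamp.weekday()  # Monday is 0, Sunday is 6
    # Saturday (5) and Sunday (6) share the single "Weekend" group.
    return timestamp.month, DAY_TYPES[min(weekday, 5)]


def group_records(records):
    """Collect (hour, temperature, consumption) triples per model group."""
    groups = {}
    for timestamp, temperature, consumption in records:
        sample = (timestamp.hour, temperature, consumption)
        groups.setdefault(group_key(timestamp), []).append(sample)
    return groups


# Hypothetical records, NOT measurements from the paper.
records = [
    (datetime(2024, 4, 1, 8), 9.5, 72.0),    # a Monday
    (datetime(2024, 4, 6, 8), 11.0, 65.0),   # a Saturday
    (datetime(2024, 4, 7, 8), 12.5, 63.0),   # a Sunday
]
print(group_records(records))
```

Each resulting group supplies the \({H}_{i}\), \({T}_{i}\), and \({P}_{i}\) values from which one set of \({a}_{mn}\) coefficients is estimated.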

5 Results and sensitivity analysis

The data were classified into months and 6 day types within each month in order to create more precise estimation models. Hence, 72 residential consumption estimation models based on time and temperature were created. The Monday model for April has been chosen as an example from the 72 mathematical models developed in the paper. The consumption estimates of this model as a function of measured temperature and time are shown in Fig. 5. Similar variations with measured temperature and time can be obtained for the other 71 forecasting models. Currently, local distribution companies calculate profile coefficients by averaging consumption. The profile coefficient for each hour is calculated as a ratio to the total hourly energy value of Monday. Meteorological data are not taken into account in these calculations. The RLPCs calculated by the local distribution company and the RLPCs obtained with MRAM are compared in Table 5. To support the estimation accuracy of MRAM, estimates were also made with an ANN on the same data, and RLPCs were calculated from these estimates. The MRAM and ANN estimation results are close to each other. Consumption estimates were also made for temperature changes of ± 2 °C in order to examine the effect of temperature. Even this small temperature change causes a visible change in the profile coefficients.

Fig. 5 3D graph of hourly temperature (°C) and electrical energy consumption forecast (kWh) on Mondays in April

Table 5 Comparison of RLPCs calculated with different forecasting methods for Monday in April

The performance of another forecasting model (ANN) is also examined to put the MRAM results in context. For this, the measurement data for Fridays in April were considered. The hourly forecasted RLPCs for this day are shown in Fig. 6. As with the Monday model, the Friday model follows the measurement results closely. A noticeable change in the consumption estimates was found when the temperature increases or decreases by 2 °C. If the temperature effect is not taken into account, the estimates used in settlement will deviate, and thus the imbalance cost will increase. This shows that temperature has an effect on consumption.

Fig. 6 Hourly forecasted RLPCs for Friday, April

It is of great importance that the created MFMs pass the reliability tests. The fitness tests widely used in the literature were applied to the estimation results, and the results are given in Table 6. The Friday model for April was the best-performing of the mathematical models. According to its \({R}^{2}\) result, this model can estimate consumption with an accuracy of up to 96%. Its RMSE value is as low as 2.34, and its MAPE value is as low as 2.67%. Similar test performances were observed for the other mathematical models. Consequently, the reliability of the proposed models is acceptable in all fitness tests. A change in temperature affects the electrical energy consumption forecast, as shown in Table 5 and Fig. 6. Therefore, meteorological data should also be taken into account in order to increase consumption forecasting accuracy and reduce the imbalance cost. RLPCs can be obtained more accurately and reliably with mathematical models created using hourly temperature data. Thus, the paper presents an important solution proposal for reducing the imbalance cost.

Table 6 Estimation performances of the developed mathematical models according to months and days

6 Conclusions

The imbalance cost increases in proportion to the error in the consumption estimate. In order to minimize imbalance costs, estimation methods with high accuracy are needed. In this paper, short-term load estimation models were developed with PMRAM, one of the time series methods, using hourly air temperature and residential consumption data measured for one year in a certain region of Düzce. Similar mathematical models can be built for different geographic regions using numerical weather predictions (NWPs) instead of temperature measurements.

Electrical energy consumption forecasts have been made with the mathematical models developed in this paper. To examine the estimation performance of the proposed models, the months, days, and hours with large temperature changes can be considered. For example, an increase of 3.68 °C in ambient temperature was measured between 08:00 and 09:00 on a Friday in April. The consumption measured between these hours was 76.77 kWh, while it was estimated at 76.24 kWh with the proposed model and 76.32 kWh with the ANN. There was a 5.52 °C change in ambient temperature between 00:00 and 01:00 on the same day. Between these hours, the measured energy consumption was 56.65 kWh; it was estimated at 56.95 kWh with the developed model and 56.05 kWh with the ANN. According to these results, the accuracy of the proposed models is also supported by a more complex method, the ANN.
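
The two examples above can be expressed as absolute percentage errors, which makes the comparison between the proposed model and the ANN explicit. The short sketch below only reuses the measured and estimated values quoted in the text; the function name `absolute_percentage_error` is an illustrative choice.

```python
def absolute_percentage_error(measured, estimated):
    """Absolute error of one estimate as a percentage of the measured value."""
    return abs(measured - estimated) / measured * 100


# Measured and estimated consumption (kWh) quoted in the text for a Friday in April.
cases = {
    "08:00-09:00": {"measured": 76.77, "model": 76.24, "ann": 76.32},
    "00:00-01:00": {"measured": 56.65, "model": 56.95, "ann": 56.05},
}

for label, case in cases.items():
    model_error = absolute_percentage_error(case["measured"], case["model"])
    ann_error = absolute_percentage_error(case["measured"], case["ann"])
    print(f"{label}: proposed model {model_error:.2f}%, ANN {ann_error:.2f}%")
```

For these four estimates the errors are about 0.69% and 0.59% (08:00 to 09:00) and about 0.53% and 1.06% (00:00 to 01:00) for the proposed model and the ANN, respectively, so both methods stay close to 1% or below in these two hours.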

It has been determined that increases or decreases in temperature play a decisive role in residential consumption in the measured region (Düzce). Accordingly, taking temperature forecasts into account when estimating electrical energy consumption has been observed to increase forecast performance. This paper shows that the efficient use of MFMs will improve the performance of energy consumption estimation and thus reduce imbalance costs.