
1 Introduction

It is critical for electricity traders [1] to predict electricity consumption in order to balance their electricity purchase and sales portfolios. Electricity is a significant commodity for promoting society’s economic development and raising people’s standard of living [2]. Because electricity is difficult to store, its consumption must be predicted accurately, which calls for appropriate and reliable prediction approaches. Among the available methods, ARIMA is a statistical method for analyzing a time series and building the forecasting model that best represents it by modeling the correlations in the data.

1.1 Autocorrelation Functions

  • Autocorrelation: Autocorrelation, also called lagged correlation or serial correlation, describes how the observations in a time series are related to each other. It is calculated as the simple correlation between the current observation yt and the observation from p periods earlier, i.e. yt−p.

  • Partial Autocorrelation: Partial autocorrelation is the correlation between two observations after removing the linear effect of the observations that lie between them. In other words, it is the correlation between yt and yt−p when the effect of y at the intermediate lags 1, 2, 3, …, p−1 is removed.

  • Both autocorrelations and partial autocorrelations are computed for successive lags of the given time series. The autocorrelation and partial autocorrelation at lag 1 are between yt−1 and yt, those at lag 2 are between yt−2 and yt, and so on for n lags.

  • Autocorrelation function (ACF) and Partial autocorrelation function (PACF): The basic model is identified by observing the patterns of the ACF and PACF (plots of the correlations against the lags). The PACF and ACF serve as tools for finding the orders of the autoregressive and moving average components, i.e. p and q of ARIMA(p, d, q), respectively.
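The definitions above can be made concrete with a short computation. The following is a minimal sketch in plain Python, not the SAS procedure used in this work: `acf` computes the sample autocorrelations, and `pacf` obtains the partial autocorrelations through the Durbin-Levinson recursion. The function names and the sample values are assumptions made for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Hypothetical values, not the consumption data analysed in this paper.
sample = [5, 7, 6, 9, 8, 11, 10, 13]
print(acf(sample, 3))
print(pacf(sample, 3))
```

At lag 1 the two functions agree by construction, because there are no intermediate observations whose effect could be removed.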

1.2 Autoregressive Integrated Moving Average Model

The notation for an ARIMA model is ARIMA(p, d, q), where p, d and q denote the order of autoregression, the degree of differencing and the order of the moving average, respectively. The final prediction depends on these parameters. For example, the prediction equation for an ARIMA(1, 1, 1) process is:

$$ \Delta y_{t} = \beta_{0} + \beta_{1}\Delta y_{t - 1} + \varepsilon_{t} + \lambda_{1}\varepsilon_{t - 1} $$
(1)

Here β0, β1 and λ1 are the parameters of the model, Δyt = yt − yt−1 is the first difference of the series, and εt is the white noise error term (the random part of the data that exhibits no pattern).
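Because the model is written for the differenced series, a forecast in the original units requires Eq. (1) to be expressed in terms of yt. A short derivation, using the symbols of Eq. (1) and substituting the first difference of yt for each Δ term, is:

$$ y_{t} - y_{t - 1} = \beta_{0} + \beta_{1}\left( y_{t - 1} - y_{t - 2} \right) + \varepsilon_{t} + \lambda_{1}\varepsilon_{t - 1} $$

$$ y_{t} = \beta_{0} + \left( 1 + \beta_{1} \right)y_{t - 1} - \beta_{1}y_{t - 2} + \varepsilon_{t} + \lambda_{1}\varepsilon_{t - 1} $$

For a one-step-ahead prediction, the unknown shock εt is replaced by its expected value of zero, while εt−1 is taken as the most recent residual of the fitted model.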

2 Literature Review

The ARIMA model is currently widely used for building forecasting models for time series of electricity consumption. Katara et al. [3] used this method to forecast electricity demand in Tamale, Ghana, using data from the Northern Electricity Department Tamale for the period 1990 to 2013. Their model provides a seven-year forecast of the electricity demand in the city. Kandananond [4] implemented three forecasting techniques: autoregressive integrated moving average (ARIMA), multiple linear regression (MLR) and artificial neural network (ANN), to build a forecasting model of the electricity demand in Thailand. Yasmeen and Sharif [5] compared four competing time series models on their out-of-sample forecast performance, i.e. the minimum forecast standard deviation and the Mean Absolute Percentage Error (MAPE). On the basis of these accuracy measures, the most suitable model to predict electricity consumption in Pakistan was built. Erdogdu [6] used co-integration analysis to estimate electricity demand and ARIMA modeling to predict it. Bianco et al. [7] developed a long-term forecasting model by fitting different regression models to historical electricity consumption, gross domestic product per capita (GDP per capita), gross domestic product (GDP) and population for the period 1970 to 2007.

3 Methodology

The objective of this research is to develop a set of programs using SAS University Edition [8, 9] that fit ARIMA [13,14,15,16] models to monthly, bimonthly and quarterly time series data [10,11,12] and identify the best fitted model for predicting electricity consumption. The three models are compared on Root Mean Square Error (RMSE) and Mean Percentage Error (MPE), and the model with the lowest prediction error is selected for forecasting.

This work is based purely on the ARIMA model. The time series variable is the electricity consumption of a health care institution.

3.1 Data Preprocessing

Monthly data covering the period from April 2005 to February 2016 have been used for this research, and SAS University Edition is used for building the model. Initially, the data were not in a proper format and contained some missing values, so they were not ready for building the forecasting model.

Although the data are monthly, the most suitable time period for predicting future values has to be chosen from three candidates, i.e. monthly, bimonthly and quarterly time series.
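How the bimonthly and quarterly series are derived from the monthly one is not detailed here. As a minimal sketch, assuming that consecutive monthly values are summed and that an incomplete trailing period is dropped, the aggregation could look as follows. The function name `aggregate` and the sample values are hypothetical.

```python
def aggregate(monthly, months_per_period):
    """Sum consecutive monthly values into periods of the given length.

    Assumption: consumption is additive over time, so periods are summed.
    Trailing months that do not fill a complete period are dropped.
    """
    full = len(monthly) - len(monthly) % months_per_period
    return [sum(monthly[i:i + months_per_period])
            for i in range(0, full, months_per_period)]


# Hypothetical monthly values, not the data analysed in this paper.
monthly = [100, 110, 120, 130, 140, 150, 160]
bimonthly = aggregate(monthly, 2)   # [210, 250, 290]
quarterly = aggregate(monthly, 3)   # [330, 420]
```

Under these assumptions, the 131 monthly observations from April 2005 to February 2016 would give 65 complete bimonthly periods and 43 complete quarterly periods.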

The following pre-processing steps have been performed:

3.1.1 Converting the Data in Proper Format

By default, SAS displays date values as numbers, which are difficult to read.

Table 1 shows a sample of the raw data as imported into SAS, with the date values displayed in numeric format. A proper date format is therefore assigned to the variable. Table 2 shows a sample of the final data, ready to be analyzed in SAS.

Table 1. Sample of raw data displaying the date values in numeric format
Table 2. Sample of final data ready to be analyzed
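The numeric values arise because SAS stores a date as the number of days since 1 January 1960. The conversion can be sketched in Python as below; this only illustrates the arithmetic and is not the SAS format statement used in this work. The function names are assumptions.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_date_to_date(sas_value):
    """Convert a SAS date value (days since 1 January 1960) to a date."""
    return SAS_EPOCH + timedelta(days=int(sas_value))


def date_to_sas_date(d):
    """Convert a calendar date back to its SAS date value."""
    return (d - SAS_EPOCH).days


# First month of the study period expressed as a SAS date value.
start = date_to_sas_date(date(2005, 4, 1))
print(start, sas_date_to_date(start).isoformat())
```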

3.1.2 Filling the Missing Information

A missing value occurs when an observation has no value for a particular variable. Missing data are very common and can have a large effect on the analysis; if not handled properly, they can lead to unstable or inaccurate results. Missing data can be filled in various ways, for example with the mean, the median or the previous value. Table 3 shows a sample of the current data set with some missing data. In this paper, each missing value has been filled with the previous value, on the assumption that the current value is similar to the previous one.

Table 3. Sample of data with missing values
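Filling with the previous value can be sketched as follows. This is an illustrative Python version with a hypothetical function name, in which `None` marks a missing observation; a missing value at the very start of the series has no predecessor and is left as it is.

```python
def fill_with_previous(values):
    """Replace each missing value (None) with the last observed value."""
    filled = []
    last_seen = None
    for v in values:
        if v is None:
            v = last_seen      # remains None if nothing was observed yet
        else:
            last_seen = v
        filled.append(v)
    return filled


# Hypothetical readings with gaps, not the data shown in Table 3.
readings = [4200, None, 4350, None, None, 4100]
print(fill_with_previous(readings))
```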

3.2 Seasonal Adjustment of Data

For the purpose of seasonal adjustment, a time series is regarded as consisting of three components: the trend-cycle, the combined seasonal effects, and the irregular component. The main purpose of seasonal adjustment is to identify the seasonal effects and remove them from the time series data. Fig. 1 shows the seasonally adjusted data, while Figs. 2 and 3 depict the trend-cycle and the irregular component of the time series, respectively.

Fig. 1. Seasonally adjusted data

Fig. 2. Trend cycle of data

Fig. 3. Irregular form of data

The resulting series is called the seasonally adjusted series and consists of the trend-cycle component and the irregular component.
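The three components can be illustrated with a classical additive decomposition. The sketch below is a simplified stand-in and not the seasonal adjustment procedure run in SAS for this work. It assumes an additive model and an even seasonal period such as 12, and it needs at least two full seasonal cycles of data. All function names are hypothetical.

```python
def centered_ma(series, period=12):
    """Centered moving average for an even period; None where it does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The two end points get half weight so the window is centered on t.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def decompose_additive(series, period=12):
    """Split a series into trend-cycle, seasonal and irregular components."""
    trend = centered_ma(series, period)
    # Average the detrended values for each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    index = [s - offset for s in raw]  # seasonal effects sum to zero
    seasonal = [index[t % period] for t in range(len(series))]
    adjusted = [y - s for y, s in zip(series, seasonal)]
    irregular = [None if m is None else y - m - s
                 for y, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, adjusted, irregular
```

Here `adjusted` is the seasonally adjusted series, i.e. the original series with the seasonal effects removed, which leaves the trend-cycle and the irregular component.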

3.3 Steps for Building the Model

Following are the steps for building the forecasting model using ARIMA:

3.3.1 Stationarity Checking

The modeled variable is the electricity consumption of the health care institution, observed as a monthly time series from April 2005 to February 2016. Fig. 4 illustrates the stationarity property of the monthly data, i.e. whether the time series is stationary or nonstationary. Visual inspection of the graphs shows that the original time series is nonstationary and needs to be transformed into a stationary time series, which is the first condition for applying an ARIMA model. This is achieved by taking the first difference of the time series variable. Here dB denotes the stationary time series. The blue line represents the nonstationary time series and the red line represents the stationary time series. The data are also prepared in two more formats, i.e. bimonthly and quarterly time series.

Fig. 4. Stationary vs nonstationary monthly time series.
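First differencing, the transformation used to obtain the stationary series, can be sketched in a few lines of Python. The function names and values are hypothetical; `undifference` shows how the original levels are recovered from the differences.

```python
def difference(series):
    """First difference: each value minus the value one period earlier."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def undifference(first_value, diffs):
    """Rebuild the original levels from the first value and the differences."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


# A hypothetical series with a steady upward trend.
levels = [10, 12, 15, 19]
diffs = difference(levels)          # [2, 3, 4]
restored = undifference(10, diffs)  # [10, 12, 15, 19]
```

The differenced series is one observation shorter than the original one.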

Fig. 5 shows the stationarity property of the bimonthly data, while Fig. 6 shows that of the quarterly data.

Fig. 5. Stationary vs nonstationary bimonthly time series

Fig. 6. Stationary vs nonstationary quarterly time series

3.3.2 Model Identification and Model Estimation

By examining the Autocorrelation Function (ACF) plot and the Partial Autocorrelation Function (PACF) plot of the stationary time series, the basic ARIMA(p, d, q) model can be identified; the final models are then estimated by the conditional least squares method. If this estimation reveals a unit root in any of the AR (autoregressive) or MA (moving average) terms, i.e. the sum of the parameter estimates is 1 or close to 1, the current model is stabilized by adding or removing AR or MA terms until the unit root is removed. The most suitable model for each of the three time series is then chosen as the candidate with the lowest AIC and SBC values.

The PACF and ACF plots of the stationary monthly time series are shown in Fig. 7. The value of “p”, the order of the autoregressive component, is estimated from the PACF plot: as the PACF drops to zero after the first two lags, the value of “p” is 2. The value of “q”, the order of the moving average component, is estimated from the ACF plot: the ACF drops to zero after the first three lags, hence the value of “q” is 3. Since the time series is differenced once, the value of “d” is 1. The basic model for the monthly time series of electricity consumption is therefore ARIMA(2, 1, 3).

Fig. 7. PACF and ACF plot for differenced monthly time series.
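Reading the orders off the plots amounts to finding the lag after which the correlations fall inside the significance band. A minimal sketch is given below, assuming the usual approximate 95% band of ±1.96/√n for a white noise series. The function name and the correlation values are hypothetical and chosen only to mirror the pattern described for Fig. 7.

```python
import math


def last_significant_lag(correlations, n_obs, z=1.96):
    """Largest lag k such that lags 1..k all lie outside the band.

    correlations[k] is the ACF or PACF value at lag k; index 0 is lag 0
    and is ignored.
    """
    bound = z / math.sqrt(n_obs)
    order = 0
    for k in range(1, len(correlations)):
        if abs(correlations[k]) > bound:
            order = k
        else:
            break
    return order


# Hypothetical values for a differenced series of 130 observations.
pacf_values = [1.0, -0.45, 0.30, 0.05, -0.02]
acf_values = [1.0, -0.50, 0.35, -0.25, 0.08]
p = last_significant_lag(pacf_values, 130)  # 2
q = last_significant_lag(acf_values, 130)   # 3
```

Such a rule only suggests a starting model; the final orders still depend on the estimation and stability checks.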

Similarly, referring to Fig. 8, the values of “p” and “q” for the differenced bimonthly time series can be estimated as 1 and 2, respectively. As illustrated in Fig. 9, the “p” and “q” values for the differenced quarterly time series can be ascertained as 2 and 1, respectively.

Fig. 8. PACF and ACF plot for differenced bimonthly time series

Fig. 9. PACF and ACF plot for differenced quarterly time series

The basic models for all three series are thus identified by examining the ACF and PACF plots. To ensure that the models are stable, the existence of unit roots is then checked using the conditional least squares estimates.

If any unit root is found, the current model is stabilized by removing or adding AR or MA terms. The parameter estimates of all three models are calculated using the conditional least squares estimation method.
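The unit root check can be sketched as follows. The first function implements the rule of thumb stated in this paper (the sum of the estimates is 1 or close to 1); the tolerance of 0.05 is an assumption. The second function applies the standard stationarity conditions for an AR(2) component and is added only for comparison. Function names and coefficient values are hypothetical.

```python
def sum_close_to_one(estimates, tol=0.05):
    """Rule of thumb: flag a possible unit root if the estimates sum to about 1."""
    return abs(sum(estimates) - 1.0) <= tol


def ar2_is_stationary(phi1, phi2):
    """Standard stationarity conditions for an AR(2) component."""
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)


# Hypothetical AR estimates.
print(sum_close_to_one([0.6, 0.4]), ar2_is_stationary(0.6, 0.4))
print(sum_close_to_one([0.5, 0.3]), ar2_is_stationary(0.5, 0.3))
```

In the first case the estimates sum to 1, so the model would be revised by adding or removing terms; in the second case no unit root is indicated.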

Tables 4, 5 and 6 display the parameter estimates for the monthly, bimonthly and quarterly series, respectively. Note that SAS reports moving average estimates with the opposite sign to the one they take in the prediction equation, so each moving average estimate displayed in the tables enters the equation with its sign reversed. After analyzing the time series and performing the conditional least squares estimation, several possible models have been found for each of the three series. The candidate models for the monthly, bimonthly and quarterly time series are listed in Tables 7, 8 and 9, respectively:

Table 4. Conditional least squares estimation for monthly time series.
Table 5. Conditional least squares estimation for bimonthly time series.
Table 6. Conditional least squares estimation for quarterly time series.
Table 7. Possible models for monthly series.
Table 8. Possible models for bimonthly series.
Table 9. Possible models for quarterly series.
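Ranking candidate models by AIC and SBC can be sketched as below, using the usual definitions AIC = −2 ln L + 2k and SBC = −2 ln L + k ln n, where L is the likelihood, k the number of estimated parameters and n the number of observations. The log-likelihood values are hypothetical and are not those behind Tables 7 to 9; ordering by AIC first and using SBC to break ties is likewise an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_model(candidates, n_obs):
    """candidates maps a model label to (log_likelihood, n_params)."""
    scored = {name: (aic(ll, k), sbc(ll, k, n_obs))
              for name, (ll, k) in candidates.items()}
    # Lowest AIC wins; SBC decides between equal AIC values.
    return min(scored, key=lambda name: scored[name]), scored


# Hypothetical candidates; parameter counts include the constant term.
candidates = {
    "ARIMA(1,1,1)": (-1210.0, 3),
    "ARIMA(2,1,2)": (-1206.0, 5),
    "ARIMA(2,1,3)": (-1200.0, 6),
}
winner, scores = best_model(candidates, n_obs=130)
print(winner)
```

Both criteria reward a better fit and penalize additional parameters, with SBC applying the heavier penalty for series of this length.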

The values of the relative quality measures AIC and SBC are also listed in these tables. For each time series, the best model is the one with the lowest AIC and SBC values. Hence, the final model selected for the monthly series is ARIMA(2, 1, 3):

$$ \Delta y_{t} = \, 1588.1 + 1.72513\Delta y_{t - 1} - \Delta y_{t - 2} + \varepsilon_{t} - 1.99693\varepsilon_{t - 1} + 1.42805\varepsilon_{t - 2} - 0.32658\varepsilon_{t - 3} $$
(2)

The final model selected for the bimonthly series is ARIMA(2, 1, 1):

$$ \Delta y_{t} = \, 3638.5 \, + \, 1.00597\Delta y_{t - 1} - 0.95325\Delta y_{t - 2} + \varepsilon_{t} - 0.68181\varepsilon_{t - 1} $$
(3)

The final model selected for the quarterly series is ARIMA(2, 1, 1):

$$ \Delta y_{t} = 14328.6 \, - \, 0.99001\Delta y_{t - 1} - \Delta y_{t - 2} + \varepsilon_{t} + 0.17119\varepsilon_{t - 1} $$
(4)
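Equations of this form can be turned into multi-step forecasts by applying them recursively and replacing unknown future shocks with zero. The sketch below is a hypothetical Python illustration rather than the SAS forecasting step used in this work. The coefficients are those printed in Eq. (3), whereas the consumption levels and residuals are invented for the example.

```python
def forecast_levels(levels, residuals, const, ar, ma, steps):
    """Recursive forecasts for a model written on the first differences.

    levels:    observed values of the original series
    residuals: in-sample errors, one per differenced observation
    ar, ma:    coefficients on lagged differences and lagged errors,
               with the signs as they appear in the prediction equation
    """
    diffs = [levels[t] - levels[t - 1] for t in range(1, len(levels))]
    errs = list(residuals)
    if len(diffs) < len(ar) or len(errs) < len(ma):
        raise ValueError("not enough history for the requested orders")
    last = levels[-1]
    forecasts = []
    for _ in range(steps):
        d = const
        d += sum(phi * diffs[-i] for i, phi in enumerate(ar, start=1))
        d += sum(theta * errs[-j] for j, theta in enumerate(ma, start=1))
        diffs.append(d)
        errs.append(0.0)   # expected value of a future shock
        last += d          # undo the differencing
        forecasts.append(last)
    return forecasts


# Coefficients as printed in Eq. (3); the data below are hypothetical.
levels = [52000.0, 55500.0, 54000.0, 58200.0]
residuals = [0.0, 300.0, -450.0]
print(forecast_levels(levels, residuals, 3638.5,
                      [1.00597, -0.95325], [-0.68181], steps=3))
```

Because the moving average terms depend on past errors only, their contribution vanishes once the forecast horizon exceeds the moving average order.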

4 Results and Discussion

The actual electricity consumption and the consumption forecasted by the ARIMA method are compared for the period from April 2005 to February 2016. ARIMA-based forecasting of electricity consumption has been performed and applied to the practical power system of the health care institution. The ARIMA-based method is more reliable than the traditional forecasting methods.

The data have been analyzed for all three series. Table 10 shows the calculated RMSE (Root Mean Square Error) and MPE (Mean Percentage Error) of the best ARIMA models selected for the monthly, bimonthly and quarterly time series.

Table 10. RMSE and MPE for monthly, bimonthly and quarterly time series.
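The two accuracy measures can be computed as sketched below, using their usual definitions. The function names and the sample values are hypothetical and unrelated to the figures in Table 10.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the series."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mpe(actual, predicted):
    """Mean Percentage Error, in percent; the sign of each error is kept."""
    pct = [(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(pct) / len(pct)


# Hypothetical actual and forecasted values.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted), mpe(actual, predicted))
```

Since MPE keeps the sign of each error, positive and negative errors can offset each other, which is why it is read together with RMSE.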

As shown in Table 10, the monthly time series model has the lowest values of RMSE and MPE and is therefore the most suitable for forecasting. On the basis of these accuracy measures, the monthly forecasting model is used to perform the final prediction of the electricity consumption of the health care institution.

Figure 10 compares the actual and forecasted observations for the period from April 2005 to February 2016, while Fig. 11 illustrates the prediction of electricity consumption for two years ahead, i.e. for the period from March 2016 to February 2018.

Fig. 10. The actual and forecasted values of monthly time series data

Fig. 11. Prediction of electricity consumption for two years ahead

5 Conclusion

ARIMA-based forecasting of electricity consumption has been performed and applied to the practical power system of the health care institution. From the results it can be concluded that this methodology is efficient, and is more accurate and reliable than single forecasting methods that produce imprecise predictions. Electricity is a crucial requirement of society and, being difficult to store, needs to be predicted in advance. This forecasting will benefit the health care institution in several ways. It will help the management to anticipate the increase in power system capacity required to meet future needs, and to allocate the budget for planning and upgrading the power system.