Abstract
We develop a time series model to forecast weekly peak power demand in three major states of Australia over a one-year horizon, and show the crucial role of environmental factors in improving the forecasts. More precisely, we construct a seasonal autoregressive integrated moving average (SARIMA) model and augment it with exogenous environmental variables, namely maximum temperature, minimum temperature, and solar exposure. The resulting hybrid SARIMA-regression model achieves an excellent mean absolute percentage error (MAPE) of \(3.41\%\). Moreover, our analysis demonstrates the importance of the environmental factors: the hybrid model reduces MAPE by \(46.3\%\) relative to the crude SARIMA model, which uses only the power demand series. To illustrate the efficacy of our model, we compare it with state-of-the-art machine learning forecasting methods; the results show that our model outperforms them.
1 Introduction
Electrical energy is a vital resource to drive industries [1]. Energy demand forecasting is therefore essential to the economic and social functioning of modern society. Accurate forecasts ensure that utilities can meet demand and avoid undesirable events in the network such as blackouts and load shedding. While underestimation is undesirable, overestimation wastes resources. Despite recent advances in storage technologies, demand forecasting models remain critical in power planning [2].
In general, there are four main timescales (or forecast horizons) for power demand modeling [3]:
(i) Long-term load forecasting (LTLF) is used for expansion planning of the network;
(ii) Medium-term load forecasting (MTLF) is used for operational planning;
(iii) Short-term load forecasting (STLF) is used for day-to-day planning and dispatch cost minimization;
(iv) Very short-term load forecasting (VSTLF), on the scale of seconds to minutes, allows the network to respond to the flow of demand.
Australia is a vast and environmentally diverse continent, with climate zones ranging from equatorial to temperate. It is thus important to understand how the dynamics of power demand vary across regions.
In this paper, we develop a seasonal autoregressive integrated moving average (SARIMA) model to forecast weekly peak demand over the medium term (i.e., MTLF). The demand data come from three major Australian states: New South Wales (NSW), Victoria (VIC), and South Australia (SA). To investigate the impact of environmental factors on power demand, we hybridize the SARIMA model with a linear regression model on the exogenous environmental variables, namely maximum temperature, minimum temperature, and solar exposure. Our results reveal that this hybrid model improves forecast accuracy, reducing MAPE by \(46.3\%\) on average over the three states. Furthermore, to demonstrate the efficacy of the hybrid model, we compare its forecasts with those of state-of-the-art machine learning forecasting methods; the hybrid model outperforms them.
The remainder of this paper is organized as follows. Section 2 reviews the literature and motivates the use of a SARIMA-regression model. Section 3 describes the data sources and aggregation, and visualizes the resulting time series. Section 4 details the statistical procedure for fitting a SARIMA model to the weekly peak power demand data. In Sect. 5, we employ secondary environmental time series to construct a hybrid SARIMA-regression model. Section 6 discusses the quality of the 52-week forecasts and compares them with state-of-the-art machine learning forecasting methods. Finally, Sect. 7 discusses our findings and presents conclusions and directions for future research.
2 Literature Review and Motivation
Energy demand is an amalgamation of millions of individual consumer demands, varying with time, weather, population growth, electricity price, and many other economic factors (e.g., see [4] and [5]). The time dependency of demand, together with its seasonality driven by weather patterns over a yearly cycle, suggests time series methods for studying its dynamics.
Box and Jenkins [6] introduced the celebrated SARIMA model for analyzing non-stationary time series that display seasonal effects. A SARIMA model differences the original time series at appropriate lags and fits an autoregressive moving average structure to the resulting stationary series. A hybrid SARIMA-regression approach can be effective if the exogenous independent variables capture the remaining trends in the mean, while the SARIMA component captures the temporal covariance of the series (e.g., see [2, 7]). Although such models were developed more than 40 years ago, their simplicity and broad practicality mean that they remain widely used in theory and practice, particularly in electricity demand forecasting.
Crude SARIMA and hybrid SARIMA-regression models have formed the basis of many power forecasting models, with a focus on STLF to MTLF timescales (i.e., looking days to weeks ahead), in countries such as Nigeria [8], Iraq [5], Malaysia [9], South Africa [2], and Thailand [10]. Focusing on peak demand ensures that demand can be met when the electricity network is under maximum stress. Ghalehkhondabi et al. [11] studied the peak monthly demand in Northern India using two time series methods, SARIMA and exponential smoothing, and showed that SARIMA outperformed exponential smoothing on their data. In Australia, Amaral et al. [12] developed a smooth transition periodic autoregressive model for New South Wales power demand, and As’ad [13] predicted the peak demand for New South Wales at a daily resolution. For a more comprehensive overview of such techniques in power demand modeling and forecasting, see [11].
In time series forecasting, global forecasting methods (GFMs), which learn simultaneously from a collection of time series, are becoming a strong alternative to state-of-the-art univariate statistical forecasting methods such as SARIMA [14, 15]. A GFM is a single model built from a set of related time series, which enables it to exploit key structures, behaviors, and patterns common to the group. More recently, deep learning-based GFMs have shown promising results in forecasting competitions and real-world applications (e.g., see [14,15,16,17,18]).
Although artificial neural networks (ANN) are increasingly popular, simpler models remain competitive: Kandananond [10] compared ANN, multiple linear regression (MLR), and SARIMA models for electricity demand forecasting in Thailand and found no statistically significant difference among the three methods, while MLR and SARIMA were simpler to compute and their coefficients were more easily interpreted.
In this paper, we develop a hybrid SARIMA-regression model to forecast the weekly peak power demand in Australia over an MTLF timescale, that is, a one-year (52-week) horizon. The main contribution of this work is to demonstrate the crucial role of environmental variables in the dynamics of demand. We compare the quality of our forecasts with that of state-of-the-art machine learning techniques; the results show that our model not only outperforms them but is also easier to compute and interpret.
We conclude this section by noting that, since electrical energy is still difficult to store, it is critical that the system can meet peak demand [4]. To the best of our knowledge, this work is the first to investigate the impact of environmental factors on predicting aggregated weekly peak demand over an MTLF timescale.
3 Data: Resources, Aggregation and Visualizing
The power demand data for three major states of Australia, namely New South Wales (NSW), Victoria (VIC), and South Australia (SA), are obtained from the Australian Energy Market Operator [19] and are measured in megawatts (MW). The secondary environmental time series are acquired from the Australian Bureau of Meteorology [20]. We use data from weather stations close to the main population center of each state, namely Sydney, Melbourne, and Adelaide for NSW, VIC, and SA, respectively. Table 1 lists the details of these weather stations.
While the power demand data are given at 15-minute intervals, the environmental data are recorded weekly. The former are therefore aggregated by finding the peak demand for each day and then aggregating these daily peaks on a weekly basis. This aggregated value is referred to as the weekly peak demand (WPD). The weekly data from the first week of January 2011 to the last week of December 2016 (i.e., six years) are used as training data for modeling and parameter estimation. Following the MTLF timescale, the data from the first week of January 2017 to the last week of December 2017 (i.e., 52 weeks) are used as test data to check the accuracy of the model's forecasts.
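The aggregation and train/test split can be sketched as follows. This is a minimal Python illustration, not the authors' R code; it assumes that the weekly value is the maximum of the daily peaks and that weeks follow the ISO calendar (so 1 January 2017, a Sunday, falls in ISO week 52 of 2016). The function names and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_peaks(readings):
    """Map each calendar date to its peak demand (MW).

    `readings` is an iterable of (datetime, MW) pairs, e.g. 15-minute samples.
    """
    peaks = {}
    for ts, mw in readings:
        day = ts.date()
        if day not in peaks or mw > peaks[day]:
            peaks[day] = mw
    return peaks


def weekly_peaks(daily):
    """Aggregate daily peaks to ISO weeks by taking the maximum (an assumption)."""
    weeks = defaultdict(list)
    for day, mw in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].append(mw)
    return {wk: max(vals) for wk, vals in sorted(weeks.items())}


def split_train_test(weekly, last_train_year=2016, test_year=2017):
    """Split weekly peaks into training (<= last_train_year) and test (test_year) sets."""
    train = {wk: v for wk, v in weekly.items() if wk[0] <= last_train_year}
    test = {wk: v for wk, v in weekly.items() if wk[0] == test_year}
    return train, test


# Usage with a handful of made-up readings (not real AEMO data).
readings = [
    (datetime(2017, 1, 1, 17, 0), 130.0),   # Sunday: ISO week 52 of 2016
    (datetime(2017, 1, 2, 9, 0), 100.0),    # Monday: ISO week 1 of 2017
    (datetime(2017, 1, 2, 18, 0), 150.0),
    (datetime(2017, 1, 3, 18, 15), 140.0),
    (datetime(2017, 1, 9, 18, 0), 120.0),   # ISO week 2 of 2017
]
weekly = weekly_peaks(daily_peaks(readings))
train, test = split_train_test(weekly)
print(train)  # {(2016, 52): 130.0}
print(test)   # {(2017, 1): 150.0, (2017, 2): 120.0}
```

If the weekly value were instead defined as the average of daily peaks, only the `max(vals)` line would change.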
The three secondary environmental time series used in this work are “maximum temperature”, “minimum temperature”, and “solar exposure”, denoted by \(\mathtt {Max}_t\), \(\mathtt {Min}_t\) and \(\mathtt {Sol}_t\), respectively. Solar exposure is defined as the amount of solar energy falling on a flat one-square-meter surface, parallel to the ground and exposed to direct sunlight.
Figure 1 displays the time series of WPD from 2014 to 2016 (inclusive). Previous years show similar seasonal trends. Visual inspection of these graphs reveals that the seasonal trends may vary between the states.
Remark 1
All data analysis and graphing were conducted in R using the packages “astsa”, “forecast”, and “tseries”.
4 Crude SARIMA Model: WPD Time Series
We start this section by introducing a formal definition of a SARIMA model.
Definition 1
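(Shumway and Stoffer [21]) A time series \(\{ x_t;\, t=0,1,\ldots \}\) is \(\mathtt {SARIMA}(p,d,q)\times (P,D,Q)_S\) if
\[
\Phi_P(B^S)\,\phi(B)\,\nabla_S^D\,\nabla^d x_t = \delta + \Theta_Q(B^S)\,\theta(B)\,w_t,
\]
where \(\{ w_t;\, t=0,1,\ldots \}\) is a Gaussian white noise series, \(B\) is the backshift operator (i.e., \(B^k x_t = x_{t-k}\)), and
\[
\begin{aligned}
\phi(B) &= 1-\phi_1 B-\cdots-\phi_p B^p, & \theta(B) &= 1+\theta_1 B+\cdots+\theta_q B^q,\\
\Phi_P(B^S) &= 1-\Phi_1 B^S-\cdots-\Phi_P B^{PS}, & \Theta_Q(B^S) &= 1+\Theta_1 B^S+\cdots+\Theta_Q B^{QS},\\
\nabla^d &= (1-B)^d, & \nabla_S^D &= (1-B^S)^D.
\end{aligned}
\]
The autoregressive order p, moving average order q, seasonal autoregressive order P, seasonal moving average order Q, differencing orders d and D, seasonal lag S, autoregressive coefficients \(\phi _i\), moving average coefficients \(\theta _i\), seasonal autoregressive coefficients \(\Phi _i\), seasonal moving average coefficients \(\Theta _i\), and the intercept \(\delta\) are unknown parameters to be estimated.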
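The multiplicative structure of the definition (a seasonal polynomial in \(B^S\) times a non-seasonal polynomial in \(B\)) can be made concrete by expanding the operators. The following Python sketch is illustrative only; the helper names and the parameter values of the hypothetical SARIMA(1,0,1)×(1,0,0)_4 model are assumptions, not estimates from this paper.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B; coefficient lists are indexed by the power of B."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ar_poly(coeffs, step=1):
    """Return 1 - c_1 B^step - c_2 B^(2*step) - ... as a coefficient list."""
    poly = [0.0] * (len(coeffs) * step + 1)
    poly[0] = 1.0
    for k, c in enumerate(coeffs, start=1):
        poly[k * step] = -c
    return poly


def ma_poly(coeffs, step=1):
    """Return 1 + c_1 B^step + c_2 B^(2*step) + ... as a coefficient list."""
    poly = [0.0] * (len(coeffs) * step + 1)
    poly[0] = 1.0
    for k, c in enumerate(coeffs, start=1):
        poly[k * step] = c
    return poly


# Hypothetical SARIMA(1,0,1)x(1,0,0)_4 with phi_1 = 0.5, theta_1 = 0.2, Phi_1 = 0.3.
ar_side = poly_mul(ar_poly([0.3], step=4), ar_poly([0.5]))  # Phi(B^4) phi(B)
ma_side = ma_poly([0.2])                                     # Theta(B^4) theta(B) with Q = 0
print(ar_side)  # coefficients of 1 - 0.5B - 0.3B^4 + 0.15B^5
print(ma_side)  # [1.0, 0.2]
```

The cross term \(0.15B^5\) shows why a multiplicative seasonal model needs only two AR coefficients to produce dependence at lags 1, 4, and 5.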
Box and Jenkins [6] showed that if a time series is non-stationary due to a trend in the mean, it can be detrended and converted into a stationary time series by differencing at appropriate lag(s). This is arguably the main contribution of the SARIMA model in theory and practice.
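As a small illustration of detrending by differencing, the Python sketch below builds a synthetic series with a linear trend and a period-4 seasonal pattern (made-up values, with S = 4 chosen only for brevity; weekly data would use a longer seasonal lag). Seasonal differencing at lag S removes the seasonal pattern and turns the trend into a constant, and a further first difference leaves zeros.

```python
def difference(x, lag=1):
    """Return the lag-`lag` difference x_t - x_{t-lag} (i.e. (1 - B^lag) x_t)."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Synthetic series: linear trend plus a repeating seasonal pattern of period S = 4.
S = 4
pattern = [3.0, -1.0, -4.0, 2.0]
series = [10.0 + 0.5 * t + pattern[t % S] for t in range(16)]

seasonal_diff = difference(series, lag=S)  # (1 - B^4) x_t: seasonality gone, trend becomes 2.0
both = difference(seasonal_diff, lag=1)    # (1 - B)(1 - B^4) x_t: all zeros
print(seasonal_diff)
print(both)
```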
Intuitively, “stationarity” means that the statistical properties of a time series do not vary over time. More precisely, a time series is stationary if its mean function is constant over time and the autocovariance between two observations depends only on the time difference between them (the so-called lag), not on the actual times. A common statistical test of this property is the “Kwiatkowski-Phillips-Schmidt-Shin” (KPSS) test [22], with the hypotheses
\(H_0\): the time series is stationary;
\(H_1\): the time series is not stationary.
Applying the KPSS test to the aggregated WPD data for NSW, VIC, and SA yields p-values below 0.01 in all three cases, so the null hypothesis of stationarity is rejected at the \(1\%\) significance level; that is, none of the three WPD time series is stationary. We therefore estimate appropriate differencing orders d and D and the seasonal lag S for each time series to convert it into a stationary series. The KPSS test results for the original and differenced time series are provided in Table 2.
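The KPSS statistic for level stationarity is simple enough to compute directly. The pure-Python sketch below is an illustration rather than the R `tseries` implementation used in the paper: it regresses the series on a constant, forms the partial sums of the residuals, and scales their sum of squares by a Newey-West long-run variance with Bartlett weights. The truncation rule `int(4 * (n/100)**0.25)` is assumed here as one common short-lag choice, and rejection is judged against the \(1\%\) critical value 0.739 from [22] instead of an interpolated p-value.

```python
# Critical values of the KPSS level-stationarity statistic (Kwiatkowski et al., 1992).
KPSS_LEVEL_CRITICAL = {0.10: 0.347, 0.05: 0.463, 0.025: 0.574, 0.01: 0.739}


def kpss_level_statistic(x, lags=None):
    """KPSS statistic for H0: x is stationary around a constant level."""
    n = len(x)
    mean = sum(x) / n
    resid = [v - mean for v in x]           # residuals from a regression on a constant
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)   # a common short-lag truncation rule (assumed)
    partial, running = [], 0.0
    for e in resid:                         # partial sums S_t of the residuals
        running += e
        partial.append(running)
    # Newey-West long-run variance with Bartlett weights.
    lrv = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        gamma_k = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma_k
    return sum(s * s for s in partial) / (n * n * lrv)


def rejects_stationarity(x, alpha=0.01):
    """True if H0 (level stationarity) is rejected at level alpha."""
    return kpss_level_statistic(x) > KPSS_LEVEL_CRITICAL[alpha]


trend = [float(t) for t in range(100)]                           # clearly non-stationary
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]  # stationary around 0
print(rejects_stationarity(trend))        # True
print(rejects_stationarity(alternating))  # False
```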
To help choose the order parameters p, q, P, and Q, we inspect the autocorrelation and partial autocorrelation plots, which suggest a few candidate orders. The best model (i.e., set of orders) is then selected as the one with the minimum AICc (corrected Akaike information criterion) [23]. AICc-based model selection balances model complexity against the model's ability to extract information from the training data [24]. Furthermore, we restrict the sum of orders (i.e., \(p + q + P + Q\)) to at most five to balance accuracy with complexity. As a final check, we verified that all coefficients are statistically significant. The final fitted models and the estimated parameters, along with their corresponding p-values, are presented in Tables 3 and 4, respectively.
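A sketch of the selection rule, under stated assumptions: AICc is computed as \(\mathrm{AIC} + 2k(k+1)/(n-k-1)\) for k estimated parameters and n observations [23], candidate orders are enumerated under the cap \(p+q+P+Q \le 5\), and the model with the smallest AICc is chosen. In practice each candidate's log-likelihood would come from actually fitting the model; the values below are hypothetical placeholders, and the helper names are illustrative.

```python
from itertools import product


def aicc(loglik, k, n):
    """Corrected AIC: AIC + 2k(k+1)/(n-k-1), for k estimated parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def candidate_orders(max_total=5):
    """All (p, q, P, Q) with non-negative entries and p + q + P + Q <= max_total."""
    return [
        orders
        for orders in product(range(max_total + 1), repeat=4)
        if sum(orders) <= max_total
    ]


def select_by_aicc(fits, n):
    """fits maps (p, q, P, Q) -> (log-likelihood, number of parameters); return the AICc minimiser."""
    return min(fits, key=lambda orders: aicc(*fits[orders], n))


print(len(candidate_orders()))  # 126 order combinations satisfy the cap

# Hypothetical log-likelihoods, for illustration only.
fits = {(1, 0, 1, 0): (-1000.0, 3), (2, 1, 1, 1): (-998.0, 6)}
print(select_by_aicc(fits, n=300))  # (1, 0, 1, 0)
```

Here the larger model has the higher likelihood, but its extra parameters cost more AICc than the likelihood gain, so the smaller model is preferred.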
5 Hybrid SARIMA-Regression Model: Environmental Influence
To construct an appropriate hybrid SARIMA-regression model, we first need to understand the relationship between the primary WPD time series and the three environmental time series: maximum temperature (\(\mathtt {Max}_t\)), minimum temperature (\(\mathtt {Min}_t\)), and solar exposure (\(\mathtt {Sol}_t\)). Figure 1 shows that all three WPD time series possess a strong seasonal component that appears to vary with location. Similarly, Fig. 2 displays comparable temporal and spatial variation in the secondary environmental time series (to save space, only the NSW environmental time series are displayed). This observation suggests a potentially significant relationship between the primary and secondary time series.
Since the inference theory for hybrid SARIMA-regression models with stationary regressors differs fundamentally from that with non-stationary regressors, we first test the stationarity of the environmental time series. The KPSS test is applied to them, and the corresponding p-values are reported in Table 5. This table indicates that, for all three states, stationarity is not rejected at the \(1\%\) significance level for any of the three environmental time series. This outcome is visually supported by Fig. 2.
To investigate possible relationships between these exogenous environmental time series and the primary WPD time series, we use scatter plots. Figure 3 displays the scatter plots for NSW. This figure suggests that the maximum and minimum temperatures have a strong quadratic relationship with WPD, whereas the relationship with solar exposure may be weaker.
These observations suggest 27 (\(=3^3\)) candidate combinations of the environmental variables for the regression component of the hybrid model, with each variable entering as none, linear, or quadratic. Again, AICc is used to select the best combination, now taking the secondary time series into account.
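The 27 candidate regression specifications can be enumerated mechanically. The Python sketch below is illustrative: it assumes that a “quadratic” variable enters with both its linear and squared terms (the text does not spell this out), and the variable labels and weekly values are made up. Each specification's design rows would then be passed to a SARIMA-with-regressors fit and compared by AICc.

```python
from itertools import product

VARIABLES = ("Max", "Min", "Sol")
FORMS = ("none", "linear", "quadratic")


def regressor_specs():
    """All 3^3 = 27 ways of entering Max, Min and Sol into the regression."""
    return list(product(FORMS, repeat=len(VARIABLES)))


def design_row(values, spec):
    """Regressor values for one week under a given spec.

    Assumption: a 'quadratic' variable contributes both its linear and squared term.
    """
    row = []
    for name, form in zip(VARIABLES, spec):
        if form in ("linear", "quadratic"):
            row.append(values[name])
        if form == "quadratic":
            row.append(values[name] ** 2)
    return row


specs = regressor_specs()
print(len(specs))  # 27
week = {"Max": 30.0, "Min": 20.0, "Sol": 25.0}  # made-up weekly values
print(design_row(week, ("quadratic", "linear", "none")))  # [30.0, 900.0, 20.0]
```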
The significance of each coefficient in the AICc-chosen model was assessed, and the final selected combinations are presented in Table 6. This table shows that, while NSW and VIC require the full set of regression variables, SA surprisingly does not appear to benefit sufficiently from the solar exposure time series. The estimated model parameters and their corresponding p-values are presented in Tables 7 and 8.
Model Validation.
The estimated models are checked for statistical validity by analyzing their residuals. Figure 4 shows the autocorrelation function (ACF) and the QQ-plot of the residuals from the hybrid SARIMA-regression model fitted to the NSW WPD data. The residual autocorrelations show no significant structure at any lag, and the vast majority of points in the QQ-plot lie well within the 95% confidence band (shaded gray). Similar results are observed for the other two states.
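The ACF check can be sketched in a few lines of Python. The sketch uses the usual approximate white-noise band \(\pm 1.96/\sqrt{n}\) and reports lags whose sample autocorrelation falls outside it; the function names are hypothetical, and a strongly autocorrelated toy series stands in for residuals that would fail the check.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [
        sum(d[t] * d[t - k] for t in range(k, n)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def lags_outside_band(residuals, max_lag, z=1.96):
    """Lags whose autocorrelation exceeds the approximate 95% white-noise band +/- z/sqrt(n)."""
    bound = z / math.sqrt(len(residuals))
    acf = sample_acf(residuals, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# A strongly autocorrelated (alternating) series fails the check at every lag.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
print(lags_outside_band(alternating, max_lag=3))  # [1, 2, 3]
```

For well-specified residuals, one would expect roughly 5% of lags to fall outside the band by chance.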
6 Medium-term Load Forecasting
The crude SARIMA and hybrid SARIMA-regression models constructed in Sects. 4 and 5 are used to predict the WPD of all three states over the 52 weeks of 2017. The results are displayed in Fig. 5, where the black, red, blue, and green curves show the actual demand, the SARIMA forecasts, the SARIMA-regression forecasts, and the \(99\%\) confidence bounds for WPD, respectively.
The SARIMA-regression model visibly performs much better than the SARIMA model. A more rigorous comparison uses the following two popular measures of forecast accuracy.
Definition 2
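(Willmott and Matsuura [25]) The mean absolute error (MAE) is defined as
\[
\mathrm{MAE} = \frac{1}{h}\sum_{t=1}^{h} \left| f_t - x_t \right|,
\]
where \(f_t\), \(x_t\) and \(h\) are the forecast values, actual values, and prediction horizon, respectively. Analogously, the mean absolute percentage error (MAPE) is given by
\[
\mathrm{MAPE} = \frac{100\%}{h}\sum_{t=1}^{h} \left| \frac{f_t - x_t}{x_t} \right|.
\]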
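Both measures translate directly into code. The Python sketch below implements the MAE and MAPE definitions over a horizon \(h\) equal to the number of test points; the forecast and actual values are made up for illustration.

```python
def mae(forecast, actual):
    """Mean absolute error over the horizon h = len(actual)."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    return sum(abs(f - x) for f, x in zip(forecast, actual)) / len(actual)


def mape(forecast, actual):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    return 100.0 * sum(abs((f - x) / x) for f, x in zip(forecast, actual)) / len(actual)


actual = [100.0, 100.0]    # made-up weekly peaks (MW)
forecast = [110.0, 95.0]
print(mae(forecast, actual))   # 7.5
print(mape(forecast, actual))  # approximately 7.5
```

MAE is in the units of the data (MW here), while MAPE is scale-free, which makes it the natural measure for comparing states with different demand levels.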
Tables 9 and 10 report the MAE and MAPE of the two estimated models, together with the percentage improvement obtained by including the exogenous environmental time series in the model. On average, including the environmental factors improves forecast accuracy by \(46.6\%\) in MAE and \(46.3\%\) in MAPE. These observations strongly support the importance of environmental factors in forecasting Australian peak power demand.
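The percentage improvement can be read as the relative reduction in error, \(100\% \times (E_{\text{SARIMA}} - E_{\text{hybrid}})/E_{\text{SARIMA}}\). The text does not state whether the reported average is taken over per-state improvements or computed from averaged errors; the Python sketch below assumes the former, and the per-state MAPE values are hypothetical.

```python
def improvement(baseline_error, hybrid_error):
    """Percentage reduction in error achieved by the hybrid model over the baseline."""
    return 100.0 * (baseline_error - hybrid_error) / baseline_error


# Hypothetical per-state MAPEs (crude SARIMA, hybrid), for illustration only.
mapes = {"State A": (6.0, 3.0), "State B": (5.0, 3.0)}
per_state = {s: improvement(b, h) for s, (b, h) in mapes.items()}
average = sum(per_state.values()) / len(per_state)
print(per_state)  # {'State A': 50.0, 'State B': 40.0}
print(average)    # 45.0
```

With these numbers, the improvement computed from the averaged MAPEs (5.5 versus 3.0) would be about \(45.5\%\) instead, so the two conventions can differ slightly.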
Machine learning approach.
To compare the performance of our proposed models with other methods, we apply a state-of-the-art machine learning approach to forecast WPD. More precisely, we use the recurrent neural network (RNN)-based GFM proposed in [26]. Table 11 summarizes the optimal hyper-parameter values used in our experiments. Following [26], these hyper-parameters are determined by sequential model-based algorithm configuration (SMAC), a variant of Bayesian optimization proposed in [27]. Furthermore, this framework uses the COntinuous COin Betting (COCOB) optimization algorithm proposed in [28], which does not require tuning of the network learning rate.
The MAE and MAPE of the forecasts generated by this method are reported in Table 12. The hybrid SARIMA-regression model clearly outperforms this GFM benchmark.
Remark 2
Note that, in addition to outperforming the RNN method, the SARIMA-regression model is simpler to compute and its coefficients are more easily interpreted. In practical applications, easily compared model coefficients and specifications are highly desirable. It is also worth noting that an RNN unrolled in time resembles a nonlinear approximation of ARMA models, which can be expressed as a NARMA(p,q) model, where p denotes the order of the autoregressive lags and q the order of the moving average error terms. For more detailed comparisons between RNN and ARIMA models, we refer to [14].
7 Discussion and Conclusion
To the best of our knowledge, this work is the first attempt to investigate the crucial role of environmental factors in the dynamics of Australian electricity demand. More precisely, we developed a SARIMA-regression model for weekly peak power demand in three major states of Australia and empirically demonstrated the significant influence of environmental factors on predictions over a medium-term load forecasting timescale (i.e., 52 weeks). The SARIMA-regression model achieved an average MAPE of \(3.41\%\) over all states, and including the environmental factors reduced MAPE by \(46.3\%\). This MAPE is comparable with those of the other methods listed in Sect. 2. However, such a direct comparison may not be fair (and may favor our model), because there are no other MTLF studies of Australian weekly peak power demand in the literature. These results highlight the potential explanatory power of environmental variables for power demand. Furthermore, we compared our model with state-of-the-art machine learning forecasting methods and demonstrated the superiority of our model.
The weather regression variables used in this work are observed historical values rather than forecasts. This choice maximizes the predictive value of the regressors and so highlights their importance in predicting power demand. To move the model towards practical use, future work could forecast the weather variables and use those predictions in the SARIMA-regression model. While this is expected to reduce prediction accuracy, the weather variables are strongly seasonal and stationary, so they should retain most of their predictive power.
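One simple way to supply future weather regressors is a seasonal climatology, namely the average of the same week across the training years. This is a baseline chosen here for illustration, not a method proposed in the paper. The Python sketch assumes weekly values keyed by hypothetical (year, week) pairs; the temperatures are made up.

```python
from collections import defaultdict


def weekly_climatology(history):
    """history maps (year, week) -> value; return week -> mean over the available years."""
    by_week = defaultdict(list)
    for (year, week), value in history.items():
        by_week[week].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}


def climatology_forecast(history, weeks_ahead):
    """Forecast each requested week of the year by its historical average."""
    clim = weekly_climatology(history)
    return [clim[week] for week in weeks_ahead]


# Made-up weekly maximum temperatures (degrees C) for two training years.
history = {(2015, 1): 30.0, (2016, 1): 32.0, (2015, 2): 28.0, (2016, 2): 30.0}
print(climatology_forecast(history, [1, 2]))  # [31.0, 29.0]
```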
An alternative to using environmental data from a single weather station would be to take data from several sites across each state with different characteristics and combine them using a population-weighted average. This method may help decision makers identify trends in demand and thereby improve the modeling of WPD. A practical drawback is that many weather stations do not report complete data; the regression system would therefore have to impute the missing values, which may introduce additional errors into the model.
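A minimal sketch of the population-weighted average, assuming that missing readings are handled by skipping them and renormalizing the remaining weights (one simple choice among several imputation strategies). The station names and populations are hypothetical.

```python
def population_weighted(readings, populations):
    """Population-weighted average over stations that reported a value.

    Missing readings (None) are skipped and the remaining weights renormalised.
    """
    available = {s: v for s, v in readings.items() if v is not None}
    if not available:
        return None
    total = sum(populations[s] for s in available)
    return sum(populations[s] * v for s, v in available.items()) / total


# Hypothetical stations and populations (not from the paper).
readings = {"station_a": 30.0, "station_b": None, "station_c": 20.0}
populations = {"station_a": 3_000_000, "station_b": 500_000, "station_c": 1_000_000}
print(population_weighted(readings, populations))  # 27.5
```

Renormalizing keeps the average on the right scale, but it silently shifts weight toward the reporting stations, which is one way the additional error mentioned above can arise.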
Our model provides a scaffold for future work in improving the accuracy and utility of forecasts. Incorporating additional environmental explanatory factors such as humidity and wind direction/strength could further improve the model and, consequently, the accuracy of forecasts.
References
Soliman A., & Al-Kandari A. (2010). Electrical load forecasting. Elsevier publishing.
Chikobvu, D., & Sigauke, C. (2012). Regression-SARIMA modelling of daily peak electricity demand in South Africa. Journal of Energy in South Africa, 23(3), 23–30.
Hernandez, L., Baladron, C., Aguiar, J. M., Carro, B., Sanchez-Esguevillas, A. J., Lloret, J., & Massana, J. (2014). A survey on electric power demand forecasting: Future trends in smart grids, microgrids and smart buildings. IEEE Communications Surveys and Tutorials, 16(3), 1460–1495.
Zhu, S., Wang, J., Zhao, W., & Wang, J. (2011). A seasonal hybrid procedure for electricity demand forecasting in China. Applied Energy, 88(11), 3807–3815.
Kareem, Y. H., & Majeed, A. R. (2006). Sulaimany governorate using SARIMA. Building, (April 2003):1–5.
Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: Forecasting and control. Wiley.
Abolghasemi, M., Hurley, J., Eshragh, A., & Fahimnia, B. (2020). Demand forecasting in the presence of systematic events: Cases in capturing sales promotions. International Journal of Production Economics, 230.
Mati, A. A., Gajoga, B. G., Jimoh, B., Adegobye, A., & Dajab, D. D. (2009). Electricity demand forecasting in Nigeria using time series model. The Pacific Journal of Science and Technology, 10(2), 479–85.
Mohamed, N., Ahmad, M. H., & Ismail, Z. (2010). Double seasonal ARIMA model for forecasting load demand. Matematika, 26, 217–31.
Kandananond, K. (2011). Forecasting electricity demand in Thailand with an artificial neural network approach. Energies, 4, 1246–1257.
Ghalehkhondabi, I., Ardjmand, E., Weckman, G. R., & Young, W. A. (2017). An overview of energy demand forecasting methods published in 2005–2015. Energy Systems.
Amaral, L. F., Souza, R. C., & Stevenson, M. (2008). A smooth transition periodic autoregressive (STPAR) model for short-term load forecasting. International Journal of Forecasting, 24(4), 603–615.
As’ad, M. (2012). Finding the best ARIMA model to forecast daily peak electricity demand. Proceedings of the Fifth Annual ASEARC Conference. University of Wollongong.
Bandara, K., Bergmeir, C., & Smyl, S. (2020). Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach. Expert Systems with Applications, 140, 112896.
Smyl, S. (2020). A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1), 75–85.
Bandara, K., Bergmeir, C., & Hewamalage, H. (2020). LSTM-MSNet: Leveraging forecasts on sets of related time series with multiple seasonal patterns. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2020.2985720
Bandara, K., Bergmeir, C., Campbell, S., Scott, D., & Lubman, D. (2020). Towards accurate predictions and causal ‘What-if’ analyses for planning and policy-making: A case study in emergency medical services demand. Presented at the International Joint Conference on Neural Networks, Glasgow.
Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2019). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting.
Australian Energy Market Operator. www.aemo.com.au/Electricity/National-Electricity-Market-NEM/Data-dashboard#aggregated-data. Accessed on 27 Nov 2021.
Australian Bureau of Meteorology. www.bom.gov.au/climate/data/index.shtml. Accessed on 27 Nov 2021.
Shumway, R. H., & Stoffer, D. S. (2011). Time series analysis and its applications with R examples. New York: Springer.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54(1–3), 159–178.
Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297.
Boroojeni, K. G., Amini, M. H., Bahrami, S., Iyengar, S. S., Sarwat, A. I., & Karabasoglu, O. (2017). A novel multi-time-scale modeling for electric power demand forecasting: From short-term to medium-term horizon. Electric Power Systems Research, 142, 58–73.
Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30(1), 79–82.
Hewamalage, H., Bergmeir, C., & Bandara, K. (2020). Recurrent neural networks for time series forecasting: Current status and future directions. International Journal of Forecasting.
Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization (pp. 507–523). Rome, Italy. https://doi.org/10.1007/978-3-642-25566-3_40
Orabona, F., & Tommasi, T. (2017). Training deep networks without learning rates through coin betting. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 2157–2167). Long Beach, California, USA.
Acknowledgements
The authors thank the Advisory Editor and two anonymous reviewers for their invaluable comments that helped improve the previous version of this paper.
Funding
The authors declare no funding was used in support of this research.
Author information
Contributions
All four authors contributed significantly to this paper, including the design of the work; the acquisition, analysis, and interpretation of the data; and drafting and revising the paper.
Ethics declarations
Conflicts of Interest
The authors declare no competing interests.
About this article
Cite this article
Eshragh, A., Ganim, B., Perkins, T. et al. The Importance of Environmental Factors in Forecasting Australian Power Demand. Environ Model Assess 27, 1–11 (2022). https://doi.org/10.1007/s10666-021-09806-1
DOI: https://doi.org/10.1007/s10666-021-09806-1