1 Introduction

Electrical energy is a vital resource to drive industries [1]. Thus, energy demand forecasting is essential to the economic and socioeconomic aspects of modern society. Accurate forecasts ensure that utilities can meet energy demand and avoid undesirable events in the network such as blackouts and load shedding. While underestimation is undesirable, overestimation leads to wasted resources. In spite of recent advances in storage technologies, demand forecasting models are still critical in power planning [2].

In general, there are four main timescales (or, forecast horizons) for power demand modeling [3]:

  (i) Long-term load forecasting (LTLF) is used for expansion planning of the network;

  (ii) Medium-term load forecasting (MTLF) is used for operational planning;

  (iii) Short-term load forecasting (STLF) is used for day to day planning and dispatch cost minimization;

  (iv) Very short-term load forecasting (VSTLF) on the scale of seconds to minutes allows the network to respond to the flow of demand.

Australia is a vast and environmentally diverse continent with climate zones ranging from equatorial to temperate. It is thus important to understand how the dynamics of power demand vary across different regions.

In this paper, we develop a seasonal autoregressive integrated moving average (SARIMA) model to forecast peak weekly demand in the medium-term (i.e., MTLF). The demand data are from three major Australian states: New South Wales (NSW), Victoria (VIC), and South Australia (SA). To investigate the impact of environmental factors on the power demand, we hybridize the SARIMA model with a linear regression model that uses exogenous environmental variables, namely maximum temperature, minimum temperature, and solar exposure. Our results reveal that the hybrid model improves the accuracy of forecasts by an average of \(46.3\%\) over the three states. Furthermore, to demonstrate the efficacy of the hybrid model, its outputs are compared with state-of-the-art machine learning methods in forecasting. The results reveal that the hybrid model outperforms these methods.

This paper is organized as follows: Section 2 provides a review of the literature and establishes the motivation for using a SARIMA-regression model. Section 3 discusses the data resources and aggregation, and visualizes the obtained time series. Section 4 explains the details of the statistical procedure to fit a SARIMA model to the weekly peak power demand data. In Sect. 5, we employ secondary environmental time series to construct a hybrid SARIMA-regression model. Section 6 discusses the quality of 52-week forecasts and compares the outcome with state-of-the-art machine learning methods in forecasting. Finally, Sect. 7 presents a final discussion of our findings, and provides conclusions and directions for future research.

2 Literature Review and Motivation

Energy demand is an amalgamation of millions of individual demand requirements from consumers, varying with time, weather, population growth, electricity price and many other economic factors (e.g., see [4] and [5]). The time dependency of the demand, along with its inherent seasonality driven by weather patterns across a yearly timescale, suggests using time series methods to study the dynamics of the demand.

Box and Jenkins [6] introduced their celebrated SARIMA model for analyzing non-stationary time series that display seasonal effects. Each SARIMA model is a linearly transformed time series constructed by differencing the original time series at proper lags. A hybrid SARIMA-regression approach can be effective if the time covariance of the series is well captured by the SARIMA component and the remaining mean value of trends is captured by the exogenous independent variables (e.g., see [2, 7]). Although it has been more than 40 years since such models were developed, their simplicity and practicality mean that they continue to be widely used in theory and practice, particularly in electricity demand forecasting.

Crude SARIMA as well as hybrid SARIMA-regression models have formed the basis of many power forecasting models with a focus on STLF to MTLF timescales (i.e., looking days to weeks ahead) in several countries, such as Nigeria [8], Iraq [5], Malaysia [9], South Africa [2], and Thailand [10]. Focusing on a metric of peak demand ensures that demand can be met when the electricity network is under maximum duress. Ghalehkhondabi et al. [11] studied the peak monthly demand in Northern India by using two different time series methods including “SARIMA” and “exponential smoothing” models. The authors showed that the SARIMA model outperformed the exponential smoothing model on their data. In Australia, Amaral et al. [12] developed a smooth transition periodic autoregressive model for the New South Wales power demand, and As’ad [13] predicted the peak demand for New South Wales at a daily resolution. For a more comprehensive overview of such techniques in power demand modeling and forecasting, see [11].

In time series forecasting, global forecasting methods (GFMs), which learn simultaneously from a collection of time series, are becoming a strong alternative to state-of-the-art univariate statistical forecasting methods such as SARIMA [14, 15]. In GFMs, a unified model is built using a set of related time series, which enables the model to exploit key structures, behaviors, and patterns common within a group of time series. In fact, more recently, deep learning-based GFMs have shown promising results in forecasting competitions and real-world applications (e.g., see [14,15,16,17,18]).

While artificial neural networks (ANN) are increasing in popularity, Kandananond [10] compared ANN, multiple linear regression (MLR) and SARIMA models for electricity demand forecasting in Thailand. Although they did not find a statistically significant difference between the three methods, MLR and SARIMA were simpler to compute, and the coefficients were more easily interpreted.

In this paper, we develop a hybrid SARIMA-regression model to forecast the weekly peak power demand in Australia over an MTLF timescale, that is, a one-year horizon (52 weeks). The main contribution of this work is to demonstrate the crucial role of novel environmental variables in the dynamics of the demand. The quality of the forecasts is compared with state-of-the-art machine learning techniques. The results show that our model not only outperforms the others, but can also be computed and interpreted more easily.

We conclude this section by noting that, as electrical energy is still difficult to store, it is critical that the system can meet peak demand [4]. To the best of our knowledge, this work is the first attempt to investigate the impact of environmental factors on predicting the aggregated weekly peak demand in an MTLF timescale study.

3 Data: Resources, Aggregation and Visualizing

The power demand data for three major states of Australia, consisting of New South Wales (NSW), Victoria (VIC), and South Australia (SA), are obtained from the Australian Energy Market Operator [19]. They are measured in megawatts (MW). The secondary environmental time series data are acquired from the Australian Bureau of Meteorology [20]. We use the data from those weather stations in close proximity to the primary population center for each state. These major population centers are Sydney, Melbourne, and Adelaide for NSW, VIC, and SA, respectively. Table 1 lists the details of those weather stations.

Table 1 Australian Bureau of Meteorology weather stations

While the power demand data are given at 15-minute intervals, the environmental data are recorded weekly. So the former are aggregated by finding the peak demand for each day and then aggregating on a weekly basis. This aggregated value will be referred to as the weekly peak demand (WPD). The weekly data from the first week of January 2011 to the last week of December 2016 (i.e., six years) are used as the training data for modeling and estimating the parameters. Following the MTLF timescale, the data from the first week of January 2017 to the last week of December 2017 (i.e., 52 weeks) are used as the test data to check the accuracy of forecasts generated by the model.
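The aggregation step can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code. The function name `weekly_peak_demand`, the `(timestamp, MW)` input layout, the use of ISO calendar weeks, and the reading of "aggregating on a weekly basis" as the maximum of the daily peaks are all assumptions made for the example.

```python
from datetime import datetime


def weekly_peak_demand(readings):
    """Reduce (timestamp, demand_mw) readings to one peak value per week.

    Assumptions (not stated in the paper): weeks are ISO calendar weeks,
    and the weekly value is the maximum of the daily peaks.
    """
    # Step 1: peak demand for each calendar day.
    daily_peak = {}
    for ts, mw in readings:
        day = ts.date()
        daily_peak[day] = max(daily_peak.get(day, float("-inf")), mw)

    # Step 2: peak of the daily peaks within each (ISO year, ISO week).
    weekly_peak = {}
    for day, mw in daily_peak.items():
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        weekly_peak[key] = max(weekly_peak.get(key, float("-inf")), mw)

    # Return the weeks in chronological order.
    return dict(sorted(weekly_peak.items()))


if __name__ == "__main__":
    demo = [
        (datetime(2017, 1, 2, 0, 0), 100.0),
        (datetime(2017, 1, 2, 12, 0), 150.0),
        (datetime(2017, 1, 4, 18, 0), 120.0),
        (datetime(2017, 1, 9, 18, 0), 90.0),
    ]
    print(weekly_peak_demand(demo))
```

A different week convention (for example, weeks counted from 1 January) would change only the key computed in step 2.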

The three secondary environmental time series used in this work are “maximum temperature”, “minimum temperature”, and “solar exposure”, denoted by \(\mathtt {Max}_t\), \(\mathtt {Min}_t\) and \(\mathtt {Sol}_t\), respectively. Solar exposure is defined as the amount of solar energy falling on a flat one meter square surface, parallel to the ground and exposed to direct sunlight.

Figure 1 displays the time series of WPD from 2014 to 2016 (inclusive). Previous years show similar seasonal trends. Visual inspection of these graphs reveals that the seasonal trends may vary between the states.

Fig. 1 Time series of the aggregated WPD for NSW, VIC, and SA over all of the training data. For brevity and clarity other graphs in this report will only show the last three years of training data

Remark 1

All data analysis and graphing are conducted in R using the packages “astsa”, “forecast”, and “tseries”.

4 Crude SARIMA Model: WPD Time Series

We start this section by introducing a formal definition of a SARIMA model.

Definition 1

(Shumway and Stoffer [21]) A time series \(\{ x_t;\, t=0,1,\ldots \}\) is \(\mathtt {SARIMA}(p,d,q)\times (P,D,Q)_S\), if

$$\begin{aligned} \Phi _P\left( B^S\right) \phi (B)\nabla ^D_S\nabla ^dx_t&= \delta + \Theta _Q\left( B^S\right) \theta (B)w_t, \end{aligned}$$

where \(\{ w_t;\, t=0,1,\ldots \}\) is a Gaussian white noise series, B is the backshift operator (i.e., \(B^k x_t = x_{t-k}\)), and

$$\begin{aligned} \phi (B)&= 1-\phi _1 B - \phi _2B^2 -\dots - \phi _pB^p, \\ \Phi _P\left( B^S\right)&= 1-\Phi _1 B^S - \Phi _2B^{2S} -\dots - \Phi _PB^{PS}, \\ \theta (B)&= 1+\theta _1B+\theta _2B^2+\dots +\theta _qB^q, \\ \Theta _Q\left( B^S\right)&= 1+\Theta _1B^S+\Theta _2B^{2S}+\dots +\Theta _QB^{QS}, \\ \nabla ^d&= (1-B)^d, \\ \nabla ^D_S&= (1-B^{S})^D. \end{aligned}$$

The autoregressive order p, moving average order q, seasonal autoregressive order P, seasonal moving average order Q, differencing orders d and D, seasonal lag S, autoregressive coefficients \(\phi _i\), moving average coefficients \(\theta _i\), seasonal autoregressive coefficients \(\Phi _i\), seasonal moving average coefficients \(\Theta _i\), and the intercept \(\delta\) are unknown parameters and should be estimated.
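The two differencing operators in Definition 1 are simple to state in code. The following Python sketch applies \(\nabla ^d = (1-B)^d\) and \(\nabla ^D_S = (1-B^S)^D\) to a plain list of numbers. The function names `difference` and `sarima_difference` are hypothetical and chosen only for this illustration.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a sequence of numbers.

    Each pass shortens the series by `lag` observations.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def sarima_difference(series, d, D, S):
    """Apply nabla^d and then nabla_S^D (the two operators commute)."""
    regular = difference(series, lag=1, order=d)
    return difference(regular, lag=S, order=D)


if __name__ == "__main__":
    # Linear trend plus a period-4 seasonal bump: one regular and one
    # seasonal difference remove both components.
    x = [t + (10 if t % 4 == 0 else 0) for t in range(12)]
    print(sarima_difference(x, d=1, D=1, S=4))
```

For weekly data with an annual cycle, a natural candidate for the seasonal lag would be \(S = 52\), although the values actually estimated for each state are those reported by the authors.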

Box and Jenkins [6] showed that if a time series was non-stationary due to a trend in the mean, it could be detrended and converted to a stationary time series by differencing at appropriate lag(s). Perhaps, this is the main contribution of the SARIMA model in theory and practice.

Intuitively, “stationarity” means that the statistical properties of a time series do not vary over time. More precisely, a time series is stationary, if the mean function is constant (with respect to time), and the autocovariance function for two observations of the series depends only on the time difference, the so-called lag, between two observation points, not the actual times. A common statistical test to investigate such property for a given time series is the “Kwiatkowski-Phillips-Schmidt-Shin” (KPSS) test with the following hypotheses [22]:

$$\begin{aligned} {\left\{ \begin{array}{ll} H_0 : \text{ The } \text{ time } \text{ series } \text{ is } \text{ stationary. } \\ H_A : \text{ The } \text{ time } \text{ series } \text{ is } \text{ not } \text{ stationary. } \end{array}\right. } \end{aligned}$$

After applying the KPSS test to the aggregated WPD data for the three states NSW, VIC and SA, we find that all p-values are less than 0.01, implying that the null hypothesis is rejected at a significance level of \(1\%\). Thus, none of the three WPD time series is stationary. However, we estimate appropriate differencing orders d and D and the seasonality lag S for each time series to convert it to a stationary time series. The outcomes of the KPSS test on the time series before and after differencing are provided in Table 2.

Table 2 The KPSS test p-values for time series before and after differencing along with the estimated values of d, D, and S

To assist in choosing the order parameters for the model, including p, q, P, and Q, the autocorrelation and partial autocorrelation plots are used. These plots suggest a few candidate sets of orders. Ultimately, the best model (i.e., set of orders) is selected by finding the set achieving the minimum AICc (corrected Akaike information criterion) [23]. AICc-based model choice enables us to balance the model complexity with the model's ability to extract information from the training data [24]. Furthermore, we restrict the maximum sum of orders (i.e., \(p + q + P + Q\)) to five to balance the model accuracy with complexity. As a final check, all coefficient p-values were assessed to be significant. The final fitted models and the estimated parameters along with their corresponding p-values are presented in Tables 3 and 4, respectively.
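The order restriction and the selection criterion can be made concrete with a short Python sketch. It uses the standard small-sample form \(\text {AICc} = -2\log L + 2k + 2k(k+1)/(n-k-1)\), where k is the number of estimated parameters and n the number of observations. How k is counted differs between software packages, so the values need not match those produced by the R packages used in the paper. The function names are hypothetical, and the exhaustive enumeration is only an illustration: the authors narrow the candidates with ACF and PACF plots first.

```python
from itertools import product


def aicc(log_likelihood, k, n):
    """Corrected AIC: AIC plus the small-sample penalty 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def candidate_orders(max_total=5):
    """All (p, q, P, Q) with non-negative entries and p+q+P+Q <= max_total."""
    values = range(max_total + 1)
    return [o for o in product(values, repeat=4) if sum(o) <= max_total]


if __name__ == "__main__":
    print(len(candidate_orders()))      # size of the restricted search space
    print(aicc(-100.0, k=3, n=50))      # AICc for a made-up log-likelihood
```

In a full workflow, each candidate would be fitted to the differenced series and the one with the smallest AICc retained, subject to the coefficient significance check described above.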

Table 3 Estimated SARIMA model orders
Table 4 The estimates of SARIMA parameters for the crude model with their p-values in brackets underneath

5 Hybrid SARIMA-Regression Model: Environmental Influence

In order to construct an appropriate hybrid SARIMA-regression model, we first need to understand the relationship between the primary time series WPD and the three environmental time series, namely maximum temperature (\(\mathtt {Max}_t\)), minimum temperature (\(\mathtt {Min}_t\)), and solar exposure (\(\mathtt {Sol}_t\)). Figure 1 demonstrates that all three WPD time series possess a strong seasonal component, appearing to vary with the location. Analogously, Fig. 2 displays a similar temporal and spatial variation for the secondary environmental time series (to save space, only the NSW environmental time series are displayed). This observation implies that there could potentially be a significant relationship between the primary and secondary time series.

Fig. 2 Maximum temperature, minimum temperature and solar exposure time series for NSW from 2014 to 2016 (inclusive)

Since the inference theory for hybrid SARIMA-regression models with stationary regressor variables is completely different from that with non-stationary variables, we need to test the stationarity of the environmental time series data at the outset. Therefore, the KPSS test is applied to them and the corresponding p-values are reported in Table 5. This table indicates that the null hypothesis of stationarity is not rejected at a significance level of \(1\%\) for any of the three environmental time series in any of the three states, so we treat them as stationary. Indeed, this outcome is visually supported by Fig. 2.

Table 5 The KPSS test p-values for the environmental time series data

To investigate possible relationships between these exogenous environmental time series and the primary WPD time series, scatter plots are utilized. Figure 3 displays the scatter plots for NSW. This figure suggests that while the maximum and minimum temperatures have a strong quadratic relationship with the WPD data, the relationship may not be as strong for the solar exposure.

Fig. 3 Scatter plot for the NSW data showing the presence of quadratic trends between WPD and the environmental variables

These observations would suggest 27 combinations of the environmental variables (none, linear, and quadratic for each variable) for the “regression” component of the hybrid model. Once again, AICc is used to find the best combination, taking into account the secondary time series data.
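The count of 27 follows from three choices (none, linear, quadratic) for each of three variables, i.e., \(3^3\). A small Python sketch enumerates them. The names `regression_specs` and `design_columns` are hypothetical, and the convention that a "quadratic" choice contributes both the linear and the squared term is an assumption made for this example; the paper does not state it explicitly.

```python
from itertools import product

VARIABLES = ("Max", "Min", "Sol")
FORMS = ("none", "linear", "quadratic")


def regression_specs():
    """Every assignment of a form to each environmental variable."""
    return [dict(zip(VARIABLES, forms))
            for forms in product(FORMS, repeat=len(VARIABLES))]


def design_columns(spec):
    """Regressor columns implied by one specification.

    Assumption: 'quadratic' means the linear term plus the squared term.
    """
    columns = []
    for variable, form in spec.items():
        if form in ("linear", "quadratic"):
            columns.append(variable)
        if form == "quadratic":
            columns.append(variable + "^2")
    return columns


if __name__ == "__main__":
    specs = regression_specs()
    print(len(specs))
    print(design_columns({"Max": "quadratic", "Min": "linear", "Sol": "none"}))
```

Each specification would then be fitted jointly with the SARIMA component and ranked by AICc.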

The significance of each coefficient of the AICc chosen model was assessed and the final selected combinations are presented in Table 6. This table shows that, while NSW and VIC require the full group of regression variables, surprisingly, SA does not seem to obtain sufficient benefit from the solar exposure time series. The estimates of model parameters with their corresponding p-values are presented in Tables 7 and 8.

Table 6 Selected combination of environmental variables based on the minimum value of AICc for each state
Table 7 The estimates of SARIMA parameters for the hybrid model with their p-values in brackets underneath
Table 8 The estimates of regression parameters for the hybrid model with their p-values in brackets underneath (coefficients are rounded to one decimal place for brevity)

Model Validation.

The estimated models are checked for statistical validity by analyzing the residuals. Figure 4 shows the autocorrelation function (ACF) as well as the QQ-plot for the residuals from the hybrid SARIMA-regression model fitted to the NSW WPD data. The residuals show no significant autocorrelation at any lag, and the vast majority of the points in the QQ-plot lie well within the 95% confidence band (i.e., shaded gray). Similar results are observed for the other two states.

Fig. 4 ACF and QQ plots for the residuals from the fitted hybrid SARIMA-regression model for NSW
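The residual autocorrelation check can be sketched numerically. The Python code below computes the sample ACF and flags lags whose autocorrelation falls outside the approximate white-noise band \(\pm 1.96/\sqrt{n}\). The function names are hypothetical, and this band is the usual large-sample approximation, not necessarily the exact one drawn by the R plotting routines.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    if c0 == 0:
        raise ValueError("series has zero variance")
    acf = []
    for h in range(1, max_lag + 1):
        ch = sum((x[t] - mean) * (x[t - h] - mean) for t in range(h, n)) / n
        acf.append(ch / c0)
    return acf


def lags_outside_band(x, max_lag, z=1.96):
    """Lags whose sample autocorrelation exceeds z / sqrt(n) in magnitude."""
    bound = z / math.sqrt(len(x))
    return [h for h, r in enumerate(sample_acf(x, max_lag), start=1)
            if abs(r) > bound]


if __name__ == "__main__":
    # A strictly alternating series is strongly autocorrelated at every lag.
    residuals = [1.0, -1.0] * 50
    print(lags_outside_band(residuals, max_lag=4))
```

For well-behaved residuals the returned list should be empty or nearly so, since about one lag in twenty is expected to fall outside the band by chance.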

6 Medium-term Load Forecasting

The two crude SARIMA and hybrid SARIMA-regression models constructed in Sects. 4 and 5 are used to predict the WPD for all three states over 52 weeks in 2017. The results are displayed in Fig. 5. In this figure, the black, red, blue and green plots are actual demands, forecasts generated by the SARIMA model, forecasts generated by the SARIMA-regression model, and the \(99\%\) confidence boundary for WPD, respectively.

Fig. 5 Comparison of the forecasts for the SARIMA and SARIMA-regression models to the actual WPD data for 2017

It is readily seen that the SARIMA-regression model performs significantly better than the SARIMA model. A more solid comparison can be carried out by finding the following two popular measures to assess the effectiveness of the forecasts.

Definition 2

(Willmott and Matsuura [25]) The mean absolute error (MAE) is defined as:

$$\begin{aligned} \text {MAE}&= \frac{\sum _{t=1}^{h}\left| f_t - x_t\right| }{h}, \end{aligned}$$

where \(f_t\), \(x_t\) and h are the forecast values, actual values, and prediction horizon, respectively. Analogously, the mean absolute percentage error (MAPE) is given by

$$\begin{aligned} \text {MAPE}&= \frac{\sum _{t=1}^{h}\left|\frac{ f_t-x_t }{x_t}\right|}{h}\times 100\%. \end{aligned}$$
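Both measures in Definition 2 translate directly into code. The Python sketch below implements them as written, with \(f_t\) the forecasts and \(x_t\) the actual values. The helper `improvement` is an assumption: it uses the common relative-reduction formula, which may or may not be the exact calculation behind the percentage improvements reported in the paper.

```python
def _check(forecast, actual):
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and equal length")


def mae(forecast, actual):
    """Mean absolute error over the prediction horizon h = len(actual)."""
    _check(forecast, actual)
    return sum(abs(f - x) for f, x in zip(forecast, actual)) / len(actual)


def mape(forecast, actual):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    _check(forecast, actual)
    return 100.0 * sum(abs((f - x) / x) for f, x in zip(forecast, actual)) / len(actual)


def improvement(baseline_error, new_error):
    """Relative reduction in error, in percent (assumed formula)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    actual = [100.0, 100.0]
    forecast = [110.0, 90.0]
    print(mae(forecast, actual), mape(forecast, actual))
```

MAE is expressed in the units of the series (MW here), whereas MAPE is scale-free, which makes it the easier of the two to compare across states with different demand levels.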

Tables 9 and 10 display MAE and MAPE for the two estimated models and show the percentage improvement by employing the exogenous environmental time series into the model. The MAE and MAPE suggest an average \(46.6\%\) and \(46.3\%\) improvement in the accuracy of forecasts when the environmental factors are included in the model, respectively. These observations highly support the importance of environmental factors in forecasting Australian peak power demand.

Table 9 Comparison of MAE for the SARIMA and SARIMA-regression models
Table 10 Comparison of MAPE for the SARIMA and SARIMA-regression models

Machine learning approach.

In order to compare the performance of our proposed models with other methods, we apply a state-of-the-art machine learning approach to forecast WPD. More precisely, we use the recurrent neural network (RNN)-based GFM proposed by [26]. Table 11 summarizes the optimal hyper-parameter values used in our experiments. According to [26], these optimal hyper-parameters are determined by sequential model-based algorithm configuration (SMAC), a variant of Bayesian optimization proposed by [27]. Furthermore, this framework uses the COntinuous COin Betting (COCOB) optimization algorithm proposed by [28], which does not require tuning of the network learning rate.

Table 11 The hyper-parameter values used to train the GFM-based RNN

The MAE and MAPE of forecasts generated by this method are reported in Table 12. We observe that the hybrid SARIMA-regression model thoroughly outperforms the GFM benchmark.

Table 12 The MAE and MAPE for the RNN-based GFM

Remark 2

Note that while the SARIMA-regression model outperforms the RNN method, the former is also simpler to compute and its coefficients are more easily interpreted. In practical applications, easily compared model coefficients and specifications are highly desirable. It is also worth mentioning that an RNN unrolled in time resembles a nonlinear approximation of ARMA models, which can be expressed as a NARMA(p,q) model. Here, p denotes the order of lags in the autoregressive model and q denotes the order of error terms in the moving average model. For more detailed comparisons between RNN and ARIMA models, we refer to [14].

7 Discussion and Conclusion

To the best of our knowledge, this work is the first attempt to investigate the crucial role of environmental factors in the dynamics of the Australian electricity power demand. More precisely, we developed a SARIMA-regression model for the weekly peak power demand in three major states of Australia, and empirically demonstrated the significant influence of environmental factors on predictions over a medium-term load forecasting timescale (i.e., 52 weeks). The results revealed that the SARIMA-regression model generated, on average, an MAPE of \(3.41\%\) over all states, and that including the environmental factors improved the accuracy of forecasts by \(46.3\%\) on average. Such an MAPE is comparable with those of the other methods listed in Sect. 2. However, a direct comparison might not be fair (in favor of our model) due to the lack of other MTLF studies in the literature of Australian weekly peak power demand. This highlights the potential explanatory influence and impact environmental variables may have on power demand. Furthermore, we compared our model with state-of-the-art machine learning methods in forecasting and demonstrated the superiority of the former.

The weather regression variables used within this work are historical data and are provided without forecasting. This was done to maximize the predictive value of the regressors and so highlight their importance in predicting power demand. To move the model towards practical use, future work could forecast the weather variables and use the predictions for the SARIMA regression. While this is expected to reduce the accuracy of the prediction, the weather variables are observed to be strongly seasonal and stationary, and so should maintain the majority of their predictive power.

An alternative to using environmental data derived from a single weather station would be to take the data from several sites across each state with different characteristics, and then use a population-weighted average. This method may help decision makers to identify a trend in demand that could improve the modeling of WPD. A practical drawback of this method is that many weather stations do not report complete data. Hence, the regression system will have to adjust for the missing values, which may bring more errors into the model.
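One way such a population-weighted average could be computed is sketched below in Python. The function name `population_weighted` is hypothetical. Handling a missing station reading by renormalizing the weights over the stations that did report is only one possible choice, adopted here as an assumption. It is not a method proposed in this paper.

```python
def population_weighted(values, populations):
    """Population-weighted mean of station readings for one time step.

    `values` may contain None for stations that did not report; the weights
    are renormalized over the reporting stations (an assumed convention).
    """
    if len(values) != len(populations):
        raise ValueError("values and populations must have equal length")
    reporting = [(v, p) for v, p in zip(values, populations) if v is not None]
    total = sum(p for _, p in reporting)
    if total <= 0:
        raise ValueError("no reporting station with positive population")
    return sum(v * p for v, p in reporting) / total


if __name__ == "__main__":
    # Two hypothetical stations serving populations in a 3:1 ratio.
    print(population_weighted([30.0, 20.0], [3, 1]))
    print(population_weighted([30.0, None], [3, 1]))
```

Renormalization implicitly treats a silent station as if it matched the weighted mean of the others, which is exactly the kind of adjustment that may introduce additional error.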

Our model provides a scaffold for future work in improving the accuracy and utility of forecasts. Incorporating additional environmental explanatory factors such as humidity and wind direction/strength could further improve the model and, consequently, the accuracy of forecasts.