
1 Introduction

Accurate prediction of the electricity demand of buildings is vital for effective and cost-efficient energy management in commercial buildings. It also plays a significant role in maintaining a balance between electricity supply and demand in modern power grids. However, forecasting energy usage during anomalous periods, such as the COVID-19 pandemic, can be challenging due to changes in occupancy patterns and energy usage behavior. One of the primary reasons is the shift in the distribution of occupancy patterns: with many people working or learning from home, residential occupancy increased while occupancy in offices, schools, and most retail establishments decreased. Essential businesses, such as grocery stores and restaurants, might experience a divergence between occupancy and energy usage, as they have fewer dine-in customers but still require energy for food preparation and sales. This has created a need for new forecasting methods that can adapt to changing occupancy patterns.

Online learning has emerged as a promising solution to this challenge, as it enables building managers to adapt to changes in occupancy patterns and adjust energy usage accordingly. With online learning, models can be updated incrementally with each new data point, allowing them to learn and adapt in real time [13].

Furthermore, continual learning methods offer an even more powerful solution by addressing the issue of catastrophic forgetting [6, 17]. These methods allow models to retain previously learned information while accommodating new data, preventing the loss of valuable insights and improving generalization in out-of-distribution scenarios. By combining online learning with continual learning techniques, energy forecasting models can achieve robustness, adaptability, and accuracy, making them well-suited for handling the challenges posed by spatiotemporal data with evolving distributions.

Another solution is to use human mobility data as a proxy for occupancy, leveraging the prevalence of mobile devices to track movement patterns and infer occupancy levels. Human mobility data can be useful in this context as it provides a way to monitor occupancy patterns without relying on traditional sensors or manual data collection methods [28].

In this study, we evaluate the effectiveness of mobility data and continual learning for forecasting building energy usage during anomalous periods. We utilized real-world data from Melbourne, Australia, a city that experienced one of the strictest lockdowns globally [4], making it an ideal case for studying energy usage patterns during out-of-distribution periods. We conducted experiments using data from four building complexes to empirically assess the performance of these methods.

2 Related Work

2.1 Energy Prediction in Urban Environments

Electricity demand profiling and forecasting has been a task of importance for many decades. Nevertheless, only a limited body of work in the literature investigates how human mobility patterns relate directly to urban-scale energy consumption, both during normal periods and during adverse or extreme events. Energy modelling in the literature is done at different granularities: occupant level (personal energy footprinting), building level, and city level. Models used for energy consumption prediction in urban environments are known as Urban Building Energy Models (UBEM). While top-down UBEMs predict aggregated energy consumption in urban areas using macro-economic variables and other aggregated statistical data, bottom-up UBEMs are better suited for building-level modelling of energy by clustering buildings into groups with similar characteristics [2]. Examples include SUNtool, CitySim, UMI, CityBES, TEASER, and HUES. Software modelling (simulation-based) is also a heavily used approach for building-wise energy prediction (e.g., EnergyPlus [7]). Owing to their fine-grained end-user-level modelling, bottom-up UBEMs can incorporate occupant schedules as inputs. Occupant-wise personal energy footprinting systems also exist. However, such occupant-wise energy footprinting requires monitoring infrastructure and sensors for indoor occupant behaviour, which are not always available. Moreover, due to privacy issues, publicly available data at finer temporal resolutions (for both occupancy and energy) can be hard to obtain for modelling at end-user granularity [33]; building-wise energy models face the same problem. Simulation-based models have scalability issues at the city level because one model must be built per building, and the assumptions they make about the data render their outputs less accurate [1]. Consequently, how to conduct energy forecasting under data distribution shifts remains largely an open research area.

2.2 Mobility Data as Auxiliary Information in Forecasting

The study of human mobility patterns involves analysing the behaviours and movements of occupants in a particular area in a spatio-temporal context [28]. The amount of information that mobility data encompasses can be vast, and the behaviour patterns of humans drive decision making in many use cases. In particular, mobility data can act as a proxy for dynamic (time-varying) human occupancy at various spatial densities (building-wise, city-wise, etc.). Such data are therefore leveraged extensively for many tasks in urban environments that depend on human activity, including predicting water demand [31], urban flow forecasting [34], predicting patterns in hospital patient rooms [8], and electricity use [12].

During the COVID-19 pandemic in particular, mobility data proved very useful for disease propagation modelling. For example, the authors of [32] developed a Graph Neural Network (GNN) based deep learning architecture to forecast daily new COVID-19 cases state-wise in the United States. The GNN is constructed such that each node represents one region and each edge represents the interaction between two regions in terms of mobility flow. The daily new case counts, death counts, and intra-region mobility flow are used as node features, whereas the inter-region mobility flow and the flow of active cases are used as edge features. Comparisons against classical models that do not use mobility data have demonstrated the competitiveness of the developed model.

Nevertheless, as [28] state, the existing studies involving human mobility data lack diversity in their datasets in terms of social demographics, building types, locations, etc. Owing to the heterogeneity, sparsity, and difficulty of obtaining diverse mobility data, incorporating them into modelling techniques remains a significant research challenge [2]. The potential of extracting valuable information from such real-world data sources thus remains largely untapped, despite its promise for building smarter automated decision-making systems for urban planning [28].

2.3 Deep Learning for Forecasting

Deep learning has gained significant popularity in the field of forecasting, with various studies demonstrating its effectiveness in different domains [11]. For instance, it has been widely applied in mobility data forecasting, including road traffic forecasting [24,25,26], and flight delay forecasting [30]. In the realm of electricity forecasting, Long Short-Term Memory (LSTM) networks have been widely utilized [21]. Another popular deep learning model for electricity load forecasting is Neural basis expansion analysis for interpretable time series forecasting (N-BEATS) [20].

However, a common challenge faced by these deep learning methods is performance degradation when data distributions change rapidly, especially during out-of-distribution (OOD) periods. Online learning methods have been proposed to address this issue [14, 16, 18], but they can suffer from catastrophic forgetting, where newly acquired knowledge erases previously learned information [28]. To mitigate this, continual learning methods have been developed that aim to retain previously learned information while accommodating new data, thereby improving generalization in OOD scenarios.

One approach to continual learning is Experience Replay [6, 17], a technique that re-exposes the model to past experiences to improve learning efficiency and reduce the effects of catastrophic forgetting. Building upon this idea, the Dark Experience Replay++ algorithm [5] utilizes a memory buffer to store past experiences and a deep neural network to learn from them, employing a dual-memory architecture that allows for the storage of both short-term and long-term memories separately. Another approach is the Fast and Slow Network (FSNet) [22], which incorporates a future adaptor and an associative memory module. The future adaptor facilitates quick adaptation to changes in the data distribution, while the associative memory module retains past patterns to prevent catastrophic forgetting. These continual learning methods have shown promise in mitigating catastrophic forgetting and improving generalization in OOD scenarios.

In the context of energy forecasting, the utilization of continual learning techniques holds great potential for addressing the challenges posed by OOD spatiotemporal data. By preserving past knowledge and adapting to new patterns, these methods enable more robust and accurate energy forecasting even during periods of rapid data distribution shifts.

3 Problem Definition

3.1 Time Series Forecasting

Consider a multivariate time series \(\mathcal {X}\in \textbf{R}^{T\times N}\) comprising mobility data, weather data, and the target variable, the energy consumption. The time series consists of T observations and N dimensions. To perform H-step-ahead time series forecasting, a model f takes as input a look-back window of L historical observations \((\textbf{x}_{t-L+1},\textbf{x}_{t-L+2},...,\textbf{x}_{t})\) and generates forecasts for H future observations of the target variable y, which corresponds to the energy consumption of a building. We have:

$$\begin{aligned} f_{\boldsymbol{\omega }}(\textbf{x}_{t-L+1},\textbf{x}_{t-L+2},...,\textbf{x}_{t}) = (y_{t+1},y_{t+2},...,y_{t+H}), \end{aligned}$$
(1)

where \(\boldsymbol{\omega }\) denotes the parameters of the model.
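As a concrete illustration, the following minimal NumPy sketch (the function and variable names are our own, purely illustrative) constructs the look-back/horizon pairs of Eq. 1 from a \(T\times N\) series, with the target taken to be the last column:

```python
import numpy as np

def make_windows(X: np.ndarray, L: int, H: int, target_col: int = -1):
    """Slice a (T, N) multivariate series into look-back/horizon pairs.

    Returns inputs of shape (num_windows, L, N) and targets of shape
    (num_windows, H), taken from the target (energy) column.
    """
    T = X.shape[0]
    inputs, targets = [], []
    for t in range(L, T - H + 1):
        inputs.append(X[t - L:t])                # (x_{t-L+1}, ..., x_t)
        targets.append(X[t:t + H, target_col])   # (y_{t+1}, ..., y_{t+H})
    return np.stack(inputs), np.stack(targets)

# Day-ahead setting used in this paper: hourly data with L = H = 24.
X = np.random.rand(1000, 3)          # e.g. mobility, temperature, energy
inputs, targets = make_windows(X, L=24, H=24)
print(inputs.shape, targets.shape)   # (953, 24, 3) (953, 24)
```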

3.2 Continual Learning for Time Series Forecasting

In a continual learning setting, the conventional machine learning practice of separating data into training and testing sets with a \(70\%\) to \(30\%\) ratio does not apply, as learning occurs continuously over the entire period. After an initial pre-training phase using a short period of training data, typically the first 3 months, the model continually trains on incoming data and generates predictions for future time windows. Evaluation of the model’s performance is commonly done by measuring its accumulated errors throughout the entire learning process [27].
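A minimal sketch of this protocol, using a toy PyTorch linear model on a synthetic univariate stream (all names and sizes are illustrative, and the pre-training phase is elided): each incoming window is first forecast, its error accumulated, and only then used to update the model.

```python
import torch
import torch.nn as nn

# Toy setup: a linear model forecasting H steps from L lagged values of a
# synthetic univariate stream; all names and sizes are illustrative.
L, H = 24, 24
stream = torch.randn(2000)
model = nn.Linear(L, H)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

errors = []  # accumulated test errors over the whole stream
for t in range(L, len(stream) - H):
    x = stream[t - L:t].unsqueeze(0)   # look-back window
    y = stream[t:t + H].unsqueeze(0)   # future horizon
    with torch.no_grad():
        errors.append(loss_fn(model(x), y).item())  # forecast first ...
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()    # ... then update on the newly
    optimizer.step()                   # revealed observations

print("accumulated MSE:", sum(errors) / len(errors))
```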

Fig. 1. The convolution architecture in TCN.

4 Method

Continual learning presents unique challenges that necessitate specialized algorithms and evaluation metrics. In this context, a continual learner must strike a balance between retaining previously acquired knowledge and facilitating the learning of new tasks. In time-series forecasting, the challenge lies in balancing the need to learn new temporal dependencies quickly against remembering past patterns, a tension commonly referred to as the stability-plasticity dilemma [9]. Building on complementary learning systems theory for dual learning systems [15], a Temporal Convolutional Network (TCN) is utilized as the underlying architecture, pre-trained to extract temporal features from the training dataset. Subsequently, the convolutional layers of the TCN are augmented with a future adaptor and an associative memory module to address the challenges of continual learning. The future adaptor facilitates quick adaptation to changes, while the associative memory module retains past patterns to prevent catastrophic forgetting. In this section, we describe the architecture of FSNet [22] in detail.

4.1 Backbone: Temporal Convolutional Network

FSNet adopts the TCN proposed by Bai et al. [3] as the backbone architecture for extracting features from time series data. Although traditional Convolutional Neural Networks (CNNs) have shown great success in image-processing tasks, their performance in time-series forecasting is often unsatisfactory. This is due to several reasons: (a) the difficulty of capturing contextual relationships using CNNs, (b) the risk of information leakage caused by traditional convolutions that incorporate future temporal information, and (c) the loss of detail associated with pooling layers that extract contour features. In contrast, TCN's superiority over CNNs can be attributed to its use of causal and dilated convolutions, which capture temporal dependencies more effectively.

Causal Convolutions. In contrast to traditional convolutions, which may incorporate future temporal information and violate causality, causal convolutions avoid leaking future data. By considering only information up to and including the current time step, causal convolutions preserve the order in which data is modelled and are therefore well suited for temporal data. To ensure that the output tensor has the same length as the input tensor, zero-padding is required; performing the zero-padding only on the left side of the input tensor guarantees causality. In Fig. 1(a), this left-side zero-padding is shown in light colours. No padding is needed on the right side, because the last element of the input sequence is the latest element on which the rightmost output element depends. The kernel window of the second-to-last output element is shifted one position to the left, so its latest dependency in the input sequence is the second-to-last input element. By induction, the latest dependency of each output element has the same index in the input sequence as the element itself.
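As an illustration, the following PyTorch sketch (channel counts and kernel size are arbitrary, chosen only for this example) shows that padding \(k-1\) zeros on the left only preserves both the sequence length and causality:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

kernel_size = 3
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size)

x = torch.randn(1, 1, 10)                  # (batch, channels, time)
x_padded = F.pad(x, (kernel_size - 1, 0))  # k-1 zeros on the left only
y = conv(x_padded)
print(y.shape)  # torch.Size([1, 1, 10]): length preserved, and output t
                # depends only on inputs up to and including time t
```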

Dilated Convolutions. Dilated convolution is an important component of TCN because causal convolution alone can only access past inputs up to a depth determined by the kernel size of the convolutional layers; in a deep network, the receptive field of the last layer may still not be large enough to capture long-term dependencies in the input sequence. In a dilated convolution, the dilation factor determines the spacing between the values in the kernel. More formally, we have:

$$\begin{aligned} Conv(\textbf{x})_{i} = \sum _{m=0}^{k-1}w_m\cdot \textbf{x}_{i-m\times d} \end{aligned}$$
(2)

where i indexes the output element, w denotes the kernel, d is the dilation factor, and k is the filter size. Dilation introduces a fixed step between adjacent filter taps: if the dilation factor d is set to 1, the dilated convolution reduces to a regular convolution, whereas for \(d > 1\) the filter taps are spaced d units apart, allowing the network to capture longer-term dependencies in the input sequence. A dilated causal convolution architecture is shown in Fig. 1(a).
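Eq. 2 can be implemented directly. The following NumPy sketch (purely illustrative; out-of-range taps are treated as zeros, mirroring the left zero-padding used for causality) shows how the dilation factor spaces the filter taps:

```python
import numpy as np

def dilated_causal_conv(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """Direct implementation of Eq. 2: Conv(x)_i = sum_m w_m * x_{i - m*d}."""
    y = np.zeros_like(x)
    for i in range(len(x)):
        for m in range(len(w)):        # filter taps m = 0, ..., k-1
            j = i - m * d              # reach m*d steps into the past
            if j >= 0:                 # out-of-range taps act as zero-padding
                y[i] += w[m] * x[j]
    return y

x = np.arange(8, dtype=float)
w = np.array([0.5, 0.3, 0.2])
print(dilated_causal_conv(x, w, d=1))  # d = 1: regular causal convolution
print(dilated_causal_conv(x, w, d=2))  # d = 2: taps spaced two steps apart
```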

4.2 Fast Adaptation

FSNet modifies the convolution layers in TCN to achieve fast adaptation and associative memory; the modified structure is illustrated in Fig. 1(b). In this subsection, we first introduce the fast-adaptation module.

To enable rapid adaptation to changes in data streams and effective learning with limited data, Sahoo et al. [27] and Phuong and Lampert [23] propose shallow networks and single layers that can quickly adapt to changes in data streams or learn more efficiently with limited data. Rather than limiting the depth of the network, however, it is more advantageous to let each layer adapt independently. In this research, we adopt an independent monitoring and modification approach for each layer to enhance the learning of the current loss. An adaptor maps the recent gradients of a layer to a smaller, more condensed set of transformation parameters that adapt the backbone. However, in continual time-series forecasting the gradient of a single sample can fluctuate significantly and introduce noise into the adaptation coefficients. As a solution, we use an Exponential Moving Average (EMA) of the gradient to mitigate the noise in online training and capture the temporal information in the time series:

$$\begin{aligned} \hat{g_l} = \gamma \hat{g_l} + (1 - \gamma )\hat{g_l^t}, \end{aligned}$$
(3)

where \(\hat{g_l^t}\) denotes the gradient of the l-th layer at time t, \(\hat{g_l}\) denotes the EMA gradient, and \(\gamma \) represents the momentum coefficient. For brevity, we omit the superscript t in the remainder of this manuscript. We take \(\hat{g_l}\) as input and obtain the adaptation coefficient \(\mu _l\):

$$\begin{aligned} \mu _l = \varOmega (\hat{g_l}; \phi _l), \end{aligned}$$
(4)

where \(\varOmega (\cdot )\) is the chunking operation of [10], which partitions the gradient into uniformly sized chunks. These chunks are then mapped to the adaptation coefficients characterized by the trainable parameters \(\phi _l\). Specifically, the adaptation coefficient \(\mu _l\) comprises two components, a weight adaptation coefficient \(\alpha _l\) and a feature adaptation coefficient \(\beta _l\), with which we conduct weight adaptation and feature adaptation respectively. The weight adaptation coefficient \(\alpha _l\) performs an element-wise multiplication on the corresponding weights of the backbone network, as described in:

$$\begin{aligned} \tilde{\theta _l} = tile(\alpha _l) \odot \theta _l, \end{aligned}$$
(5)

where \(\theta _l\) denotes the weights of all channels in a TCN layer and \(\tilde{\theta _l}\) the adapted weights. The weight adaptor is applied per channel to all filters using the tile function, which repeats a vector along new axes, as indicated by \(tile(\alpha _l)\). Finally, element-wise multiplication is denoted by \(\odot \). Likewise, we have:

$$\begin{aligned} \tilde{h_l} = tile(\beta _l) \odot h_l, \end{aligned}$$
(6)

where \(h_l = \tilde{\theta _l} *\tilde{h}_{l - 1}\) is the output feature map and \(*\) denotes the convolution operation.
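To make the fast-adaptation path concrete, the following condensed PyTorch sketch covers Eqs. 3-6 for a single convolutional layer. It is a simplified stand-in rather than the official FSNet implementation: in particular, the chunking operation \(\varOmega \) is replaced by a plain linear map, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConvSketch(nn.Module):
    """Illustrative stand-in for FSNet's per-layer fast adaptation."""

    def __init__(self, channels: int, kernel_size: int, gamma: float = 0.9):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size - 1)
        self.gamma = gamma
        n_w = self.conv.weight.numel()
        # Simplified adaptor: a linear map from the flattened EMA gradient
        # to per-channel coefficients (alpha, beta), standing in for the
        # chunking operation Omega of Eq. 4.
        self.adaptor = nn.Linear(n_w, 2 * channels)
        self.register_buffer("g_ema", torch.zeros(n_w))

    def forward(self, x):
        # Eq. 3: EMA of the layer's gradient (available after backward()).
        if self.conv.weight.grad is not None:
            g = self.conv.weight.grad.detach().flatten()
            self.g_ema = self.gamma * self.g_ema + (1 - self.gamma) * g
        mu = self.adaptor(self.g_ema)            # Eq. 4: adaptation coeffs
        alpha, beta = mu.chunk(2)
        # Eq. 5: per-channel (tiled) weight adaptation.
        w = alpha.view(-1, 1, 1) * self.conv.weight
        h = F.conv1d(x, w, self.conv.bias, padding=self.conv.padding[0])
        h = h[..., :x.shape[-1]]                 # crop right: keep causality
        return beta.view(1, -1, 1) * h           # Eq. 6: feature adaptation

layer = AdaptiveConvSketch(channels=4, kernel_size=3)
out = layer(torch.randn(2, 4, 24))               # (batch, channels, time)
print(out.shape)                                  # torch.Size([2, 4, 24])
```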

4.3 Associative Memory

In order to prevent a model from forgetting old patterns during continual learning on time series, it is crucial to preserve the appropriate adaptation coefficients \(\mu \), which encapsulate the temporal patterns needed for forecasting. These coefficients reflect the model's prior adaptation to a specific pattern, and thus retaining and recalling the corresponding \(\mu \) can facilitate learning when the pattern resurfaces in the future. Consequently, we incorporate an associative memory to store the adaptation coefficients of recurring events encountered during training. This associative memory is denoted as \(M_l \in \textbf{R}^{N\times d}\), where N is the number of memory slots and d is the dimensionality of \(\mu _l\), set to a default value of 64.

Memory Interaction Triggering. To circumvent the computational burden and noise that arise from storing and querying coefficients at every time step, FSNet proposes to activate this interaction only when there is a significant change in the representation. The overlap between the current and past representations can be evaluated by taking the dot product of their respective gradients. FSNet leverages an additional EMA gradient \(\hat{g'}_l\), with a smaller coefficient \(\gamma '\) than that of the original EMA gradient \(\hat{g}_l\), and measures the cosine similarity between the two to determine when to trigger the memory. We use a hyper-parameter \(\tau \), set to 0.7, to ensure that the memory is activated only by significant pattern changes that are likely to recur. The interaction is triggered when \(cosine(\hat{g}_l, \hat{g'}_l) < - \tau \).

To guarantee that the present adaptation coefficients account for the entire event, which may span an extended period, memory read and write operations are carried out using the EMA of the adaptation coefficients with coefficient \(\gamma '\); the EMA of \(\mu _l\) is computed following the same procedure as Eq. 3. When a memory interaction is triggered, the adaptor retrieves the most similar transformations from the past through an attention read operation, a weighted sum over the memory items:

$$\begin{aligned} \textbf{r}_l = softmax(M_l\hat{\mu }_l), \end{aligned}$$
(7)
$$\begin{aligned} \tilde{\mu }_l = \sum _{i=1}^k TopK(\textbf{r}_l)[i]M_l[i], \end{aligned}$$
(8)

where \(TopK(\cdot )\) selects the top k values from \(\textbf{r}_l\) and [i] denotes the i-th element. Retrieving the adaptation coefficient from memory enables the model to recall past experience of adapting to the current pattern and improves its learning in the present. The retrieved coefficient is combined with the current coefficient through a weighted sum: \(\mu _l = \tau \mu _l + (1 - \tau )\tilde{\mu }_l\). Subsequently, the memory is updated using the updated adaptation coefficient:

$$\begin{aligned} M_l = \tau M_l + (1 - \tau )\tilde{\mu }\otimes TopK(\textbf{r}_l), \end{aligned}$$
(9)

where \(\otimes \) denotes the outer-product operator. In this way, new knowledge is effectively incorporated into the most pertinent locations, as identified by the top-k attention values of \(\textbf{r}_l\). Since the memory is updated in place through this weighted sum, the memory \(M_l\) does not grow as learning progresses.
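The read and write logic of Eqs. 7-9, together with the triggering condition, can be sketched as follows (a simplified illustration with arbitrary sizes, not the official FSNet implementation):

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: N memory slots, coefficient dimension d, top-k read.
N, d, k, tau = 32, 64, 2, 0.7
M = torch.randn(N, d)                        # associative memory M_l
mu_ema = torch.randn(d)                      # EMA of adaptation coefficients

# Trigger only on a strong disagreement between the two gradient EMAs.
g_fast, g_slow = torch.randn(100), torch.randn(100)
if F.cosine_similarity(g_fast, g_slow, dim=0) < -tau:
    r = F.softmax(M @ mu_ema, dim=0)                       # Eq. 7
    vals, idx = torch.topk(r, k)                           # top-k attention
    mu_tilde = (vals.unsqueeze(1) * M[idx]).sum(0)         # Eq. 8: read
    mu = tau * mu_ema + (1 - tau) * mu_tilde               # blend coefficients
    M[idx] = tau * M[idx] + (1 - tau) * torch.outer(vals, mu_tilde)  # Eq. 9
```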

5 Datasets and Contextual Data

This paper is based on two primary data sources: energy usage data and mobility data, as well as two contextual datasets: COVID lockdown dates and temperature data. The statistical summary of the main datasets is provided in Table 1 and visualized in Fig. 2. These datasets were collected from four building complexes in the Melbourne CBD area of Australia between 2018 and 2021.

Table 1 outlines the essential statistical properties of energy usage and mobility data collected from the four building complexes. It is evident from the data that energy usage varies significantly between the buildings, with BC2 having over ten times the average energy usage of BC4. Similarly, the mobility data shows distinct differences, with BC2 having a mean pedestrian count over three times greater than BC4. These differences emphasize the complexity of forecasting energy usage across different building complexes.

Table 1. The summary statistics of the four datasets, each of which represents an aggregated and anonymized building complex (BC).
Fig. 2. Visualizing the four datasets and their features, showing the significant changes in distributions due to lockdowns. Plots in the left column are smoothed with a Gaussian filter with sigma = 24 h. Red areas are lockdowns. (Color figure online)

It is worth noting that lockdown had a more significant impact on mobility than energy usage, as illustrated in Fig. 2. Additionally, both energy usage and mobility started declining even before the start of lockdown.

5.1 Energy Usage Data

The energy usage data was collected from the energy suppliers for each building complex and measures the amount of electricity used by the buildings. To protect the privacy of the building owners, operators, and users, the energy usage data from each building was aggregated into complexes and anonymized. Buildings in the same complex can have different primary uses (e.g., residential, office, retail).

5.2 Mobility Data

The mobility data was captured by an automated pedestrian counting system installed by the City of Melbourne (http://www.pedestrian.melbourne.vic.gov.au/) [19], and provides information on the movement patterns of individuals in and around each building complex. The system records the number of pedestrians passing through a given zone, as shown in Fig. 3. As no images are recorded, no individual information is collected. Some sensors were installed as early as 2009 and others as late as 2021, and some devices were moved, removed, or upgraded at various times. Of the seventy-nine sensors installed, we chose four, one for each building complex, by manually selecting the sensor closest to each complex.

Fig. 3. Diagram of the automated pedestrian counting system. Obtained from the City of Melbourne website [19].

5.3 COVID Lockdown Dates

We used data on the dates of the COVID lockdowns in Melbourne, which were among the strictest in the world. Our datasets coincide with the first lockdown, from March 30, 2020 to May 12, 2020 (43 days), and the second lockdown, from July 8 to October 27, 2020 (111 days). We also divided the time into pre-lockdown and post-lockdown periods, taking the start date of the first lockdown (March 30, 2020) as the boundary. This information was taken from https://www.abc.net.au/news/2021-10-03/melbourne-longest-lockdown/100510710 [4].

5.4 Temperature Data

Temperature records are extracted from the National Renewable Energy Laboratory (NREL) Asia Pacific Himawari Solar Data [29]. As the building complexes are located in close proximity to one another, we utilized the same temperature data for all of them.

5.5 Dataset Preprocessing

For this study, we fixed an observation window of \(L=24\) h and a forecast horizon of \(H=24\) h to mimic a day-ahead forecasting experiment. To accurately link the foot-traffic mobility data with each building, we carefully handpicked the pedestrian counting sensor located in the immediate vicinity of the building and used its corresponding mobility signal. The building's energy usage load, the foot-traffic volume, and the temperature readings were all aligned based on their timestamps.
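A hypothetical pandas sketch of this alignment step follows; the file and column names are invented for illustration and do not reflect the actual dataset schema.

```python
import pandas as pd

# File and column names below are illustrative, not the actual schema.
energy = pd.read_csv("energy_bc1.csv",
                     index_col="timestamp", parse_dates=True)
mobility = pd.read_csv("pedestrian_sensor.csv",
                       index_col="timestamp", parse_dates=True)
temperature = pd.read_csv("himawari_temperature.csv",
                          index_col="timestamp", parse_dates=True)

# Inner join on timestamps keeps only the hours present in all sources.
df = energy.join([mobility, temperature], how="inner").sort_index()
print(df.columns.tolist())   # e.g. ['energy_kwh', 'pedestrians', 'temp_c']
```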

6 Experiments and Results

We conducted two sets of experiments to evaluate the effectiveness of our proposed methods for predicting energy usage during anomalous periods. The first set of experiments evaluated the impact of including mobility contextual data in our models. The second set of experiments assessed the importance of continual learning. In addition, we conducted ablation experiments on FSNet to investigate the impact of different components of the model on the overall performance.

6.1 Experimental Setup

The experiments were conducted on a high-performance computing (HPC) node cluster with an Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz and Tesla V100-SXM2. The software specifications included intel-mkl 2020.3.304, nvidia-cublas 11.11.3.6, cudnn 8.1.1-cuda11, fftw3 3.3.8, openmpi 4.1.0, magma 2.6.0, cuda 11.2.2, pytorch 1.9.0, python3 3.9.2, pandas 1.2.4, and numpy 1.20.0.

The data was split into three months for pre-training, three months for validating the pre-training, and the remainder for the usual continual learning setup. No hyperparameter tuning was conducted; default settings were used. The loss function used is MSE.

6.2 Mobility

Table 2. Performance comparison between different contextual features. Results are averaged over 10 runs with different random seeds; the standard deviation is shown. The algorithm used was FSNet with continual learning. +M is the improvement of adding mobility over no context, +T is the improvement of adding temperature over no context, and T+M is the improvement of adding mobility over temperature only.

To assess the significance of the mobility context in predicting energy usage during anomalous periods, we performed a contextual feature ablation analysis, comparing pre- and post-lockdown performance. Table 2 presents the results of our experiments. Our findings suggest that the importance of mobility context is unclear in pre-lockdown periods, with mixed improvements that are small compared to the standard deviations. Post-lockdown, however, the importance of mobility context is more pronounced, and the best performance was achieved when both mobility and temperature contexts were utilized. Notably, post-lockdown, the improvement brought about by the mobility context is larger than that achieved through temperature alone, as observed in BC1, BC2, and BC4. This could be because temperature has a comparatively simple and regular periodic pattern that deep learning models can deduce from the energy data alone.

6.3 Continual Learning

Table 3. Comparing the performance of different algorithms with and without continual learning (CL). The metric used is MAE. Results are averaged over 10 runs with different random seeds; the standard deviation is shown.

We conducted an experiment to determine the significance of continual learning by comparing the performance of various popular models with and without continual learning.

The models used in the experiment are:

  • FSNet [22]: Fast and Slow Network, described in detail in the method section of this paper. In the ‘no CL’ version, we use the exact same architecture but train it with traditional offline learning.

  • TCN [3]: Temporal Convolutional Network, the offline-learning baseline. It modifies the typical CNN using causal and dilated convolutions, which enhance its ability to capture temporal dependencies more effectively. The next three methods are continual learning methods that use TCN as the backbone.

  • OGD: Ordinary gradient descent, a popular optimization algorithm used in machine learning. It updates the model parameters by taking small steps in the direction opposite to the gradient of the loss function.

  • ER [6, 17]: Experience Replay, a technique used to re-expose the model to past experiences in order to improve learning efficiency and reduce the effects of catastrophic forgetting.

  • DER++ [5]: Dark Experience Replay++ is an extension of the DER (Dark Experience Replay) algorithm, which uses a memory buffer to store past experiences and a deep neural network to learn from them. DER++ improves upon DER by using a dual-memory architecture, which allows it to store short-term and long-term memories separately.

Table 3 displays the results, which demonstrate the consistent importance of continual learning in both the pre- and post-lockdown periods, with improvements multiple times larger than the standard deviations.

7 Conclusion

In this study, we investigated the impact of mobility contextual data and continual learning on building energy usage forecasting during out-of-distribution periods. We used data from Melbourne, Australia, a city that experienced one of the strictest lockdowns during the COVID-19 pandemic, as a prime example of such periods. Our results indicated that energy usage and mobility patterns vary significantly across different building complexes, highlighting the complexity of energy usage forecasting. We also found that the mobility context had a greater impact than the temperature context in forecasting energy usage during lockdown. We evaluated the importance of continual learning by comparing the performance of several popular models with and without continual learning, including FSNet, OGD, ER, and DER++. The results consistently demonstrated that continual learning is important in both pre- and post-lockdown periods, with significant improvements in performance observed across all models. Our study emphasizes the importance of considering contextual data and implementing continual learning techniques for robust energy usage forecasting in buildings.