
1 Introduction

Non-Intrusive Load Monitoring (NILM), also referred to as Energy Disaggregation, was first introduced by George W. Hart, Ed Kern and Fred Schweppe in the early 1980s. Household energy consumption signals are analyzed and decomposed into sub-signals that correspond to the energy consumption of individual appliances. The difficulties of this problem stem from multiple sources of uncertainty, such as background “noise”, several devices with almost identical energy consumption and behavior, and appliances with complex energy profiles (e.g., multiple operating states).

The problem of energy disaggregation can be formulated as follows [1]:

$$\begin{aligned} X_{t}=\sum _{i=1}^{N} y_{t}^i + \sigma (t) \end{aligned}$$

The term \(\sigma (t)\) represents the background noise as well as devices that contribute to the total consumption at time t but are not taken into account. The set \(X = \{X_{1}, X_{2}, ..., X_{T}\}\) contains the aggregate consumption of the N devices at the time instances \(t \in \{1, 2, ..., T\}\). The objective is therefore to recover the contribution \(y_{t}^i\) of the i-th device separately, where \(i \in \{1, 2, ..., N\}\).

NILM is constantly gaining attention due to the development of smart grids and of cheap smart meters with enhanced capabilities, but also due to its undeniable advantages in facing certain challenges. Among them, the following can be distinguished [15, 16]:

  • Detailed information about consumption: the main advantage for customers is that analytical energy consumption feedback allows them to adopt an energy-saving behavior. In addition, real-time information on running devices can be a useful tool, e.g., providing reminders to turn off certain devices before consumers leave home, especially those that are likely to cause serious damage or that require excessive energy.

  • Separate device power consumption: this allows consumers to identify the devices that consume the most energy in their home and, more generally, the contribution of each device to the total energy consumption. In fact, [17] estimates savings of 9% to 20% by implementing an energy consumption strategy based on such power analytics.

  • Detection of dysfunctional devices: a precise device usage archive is useful for checking device status and detecting faulty devices.

  • Illegal load detection: detection of abnormal loads in households becomes more accurate and can be used to report potential energy theft in public and private buildings.

  • Environmental intelligence: allows for other detection approaches without the need to install new sensors. Instead of turning all devices into smart devices, which is very expensive and not environmentally friendly, a single smart meter can provide the necessary information to implement various policies.

1.1 Impact of NILM

Many studies demonstrate the impact of NILM on consumption behavior [2]. This research has shown a theoretical potential saving of 15% in energy consumption. Moreover, a later analysis of 36 studies spanning 15 years [3] shows that providing real-time, daily or weekly feedback can influence people to reduce their consumption by up to 12%, as depicted in Fig. 1.

Fig. 1. Analysis of the impact of different feedback methods on behavior change [3]

Even though the first commercialization of a NILM system took place in 1996 by the company Enetics Inc., widespread use of NILM systems remains low, as the process is time-consuming, costly due to installation difficulties, and also highly inaccurate on a larger scale. For these reasons, NILM is still considered unreliable, which prevents its widespread adoption. Nevertheless, many start-ups and companies are trying to commercialize NILM.

1.2 Our Contributions

In this paper, we process the data generated by a smart meter (designed by Meazon S.A.) installed in the central electricity switchboard of a house. Our goal is to identify the appliances that operate at each time instance and to compute the total energy consumed by each appliance. To this end, experiments are carried out with three different machine learning techniques for energy disaggregation from the aggregate signal, which are detailed in Sect. 4. Our contribution lies in the use of a set of features to train our models rather than using only active power; this set contains “Active Power”, “Angle between V and I”, “Reactive Power” and “Crest Factor”. We measure and compare the accuracy of the predictions of the methods as well as their performance, evaluating the experiments with the usual regression metrics (MSE, MAE, RMSE) and comparing the methods in terms of both effectiveness and efficiency.

2 Related Work

Many different algorithmic approaches have been used for NILM, including machine learning, signal processing and deep learning methods based on artificial neural networks. The latter are increasingly being used, as they have proven their effectiveness in various settings. Initially, among the most widely used techniques were different variants of Hidden Markov Models (HMMs), as in [4, 17, 18, 20], which present clearly satisfactory results. Other approaches include signal processing techniques such as Dynamic Time Warping [5, 21] and Graph Signal Processing [6, 7].

Machine learning techniques have also been used extensively, such as Decision Trees [21], Random Forests [8, 22] and even genetic algorithms [9, 19, 23,24,25]. In addition, Support Vector Machines have been used in [10, 11, 26], and in [11] the authors also experimented with k-NN and Naive Bayes [10]. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have also been used, as in [12,13,14].

Our paper is structured as follows. In Sect. 3 we discuss the dataset and in Sect. 4 we discuss our experimental setup and our methodology with respect to the experimental evaluation. In Sect. 5, we provide our experimental findings while we conclude in Sect. 6.

3 Data Preprocessing

The sampling rate of the smart meter is 1 Hz. In Fig. 2, we show a one-hour sample (60 min) of the energy consumption of all the appliances in the house, as well as the energy consumption of the boiler and the oven separately.

Fig. 2. Energy consumption of appliances from 06/06/21 20:40 to 06/06/21 21:40

The dataset includes NaN (Not a Number) values at many positions. In order for the models to be able to process the data, it is necessary to address these missing values. We tried three different approaches: first, replacing NaN with zero values; then, with the mean value of each column; and finally, with the median of each column. The results of the algorithms were almost the same in all three cases, so we adopted the median approach. The reason all methods had the same effect is that the NaN values occurred only over very short time intervals, and thus their impact on effectiveness was negligible.
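As an illustration, a minimal sketch of the three imputation strategies with pandas is given below; the file name is a placeholder and not the actual path of our dataset.

```python
import pandas as pd

# Load the raw smart-meter measurements (the file name is a placeholder).
df = pd.read_csv("smart_meter_measurements.csv")

# Approach 1: replace NaN values with zeros.
df_zero = df.fillna(0)

# Approach 2: replace NaN values with the mean of each column.
df_mean = df.fillna(df.mean(numeric_only=True))

# Approach 3 (adopted): replace NaN values with the median of each column.
df_median = df.fillna(df.median(numeric_only=True))
```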

The dataset has 2,539,386 measurements (rows) that correspond to 70 days, and 12 dimensions (columns) that correspond to the 3 appliances (mains, oven and boiler) with 4 features each. We manually split the dataset into training, validation and test sets. The training set has 1,523,632 measurements, while the validation and test sets have 507,877 measurements each.
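A minimal sketch of this manual, chronological split (using the imputed DataFrame from the previous step) could look as follows; the boundaries reproduce the set sizes reported above.

```python
# Chronological split into training, validation and test sets.
n_train = 1_523_632   # ~60% of the 2,539,386 measurements
n_val = 507_877       # ~20%

train_df = df_median.iloc[:n_train]
val_df = df_median.iloc[n_train:n_train + n_val]
test_df = df_median.iloc[n_train + n_val:]
```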

4 Experimental Results

The experiments were performed on a 64-bit operating system with an Intel Core i7-10700 processor and with 16 GB RAM.

We set the mains consumption measurements as the input variable X and the data from the boiler (or oven) as the target Y. In order to improve the effectiveness of the algorithms and select the best possible models, tuning experiments were performed to achieve the maximum possible performance. The behavior of each machine learning method is determined by certain parameters.

The tuning experiments aim at selecting the most appropriate values for these parameters, which are assessed using the evaluation metrics, in order to extract the best prediction model for each algorithm. In addition, to avoid overfitting, k-fold cross-validation is used, a resampling procedure for evaluating machine learning models on a limited data sample. The basic parameter tested in the DT and RF methods is minimum samples split, which specifies the minimum number of samples needed to split an internal node. A range of different values for this parameter was initialized for each method. We repeat the procedure so that we obtain one candidate model for each value of minimum samples split; the best model is the one with the lowest value of the evaluation metrics.
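A minimal sketch of this tuning procedure with scikit-learn is shown below, using the Decision Tree grid as an example; the column names are illustrative placeholders, not the actual schema of our dataset.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Input features: the four mains measurements; target: one appliance
# (column names are illustrative placeholders).
feature_cols = ["mains_active_power", "mains_angle_v_i",
                "mains_reactive_power", "mains_crest_factor"]
X_train = train_df[feature_cols].to_numpy()
y_train = train_df["boiler_active_power"].to_numpy()

# One candidate regressor per value of minimum samples split; the best one
# is selected through cross-validated RMSE.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"min_samples_split": list(range(2, 401, 5))},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)
best_dt = search.best_estimator_
```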

4.1 Decision Tree

For DT, the range of values for minimum samples split was between 2 and 400 with step 5. Afterwards, we use the best estimator to make predictions; the results are shown in Fig. 3. The performance of the model is assessed through graphs comparing the actual values of the oven consumption (blue line) with those predicted by the model (red line).

Fig. 3. Real vs Predicted consumption of Decision Trees for the oven.

In general, the model’s predictions are successful and for long time intervals they match the actual consumption, but in some other time intervals the model does not perform well. This may be the result of the complicated behavior of the specific device.

Fig. 4. Real vs Predicted consumption of Decision Trees for the boiler.

In the next experiment, the same model is far more accurate for the boiler, as confirmed by Fig. 4. As we notice, the model’s predictions are quite successful and they match the actual consumption. The first diagram of Fig. 4 depicts the predicted consumption for a complete operation cycle of the appliance (from the ON state until the OFF state). In the second diagram we can observe in more detail the predicted consumption versus the real consumption during the operation of the boiler (this is why we use a different scale on the y axis). The model is highly accurate and generally follows the changes of the real consumption.

4.2 Random Forest

The same set of experiments was also conducted for a Random Forest-based regressor. The only difference is the chosen range of the minimum samples split variable, which is set between 2 and 30 with step 2 for efficiency reasons.
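A minimal sketch of the same tuning step for the Random Forest regressor, reusing the training matrices from the Decision Tree sketch, might look as follows.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Same procedure as for the DT, with a narrower grid for efficiency.
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0, n_jobs=-1),
    param_grid={"min_samples_split": list(range(2, 31, 2))},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
rf_search.fit(X_train, y_train)   # X_train, y_train as in the DT sketch
best_rf = rf_search.best_estimator_
```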

Fig. 5. Real vs Predicted consumption of Random Forest for the oven.

Figure 5 depicts the results of regression with the RF model for the oven. The model’s predictions are successful in general and match the actual consumption in many cases. The results are almost the same as with the DT model, although we set a lower value for minimum samples split.

Figure 6 presents the experimental results for the boiler. The model’s predictions are quite successful and match the actual consumption to a great extent. The first diagram of Fig. 6 depicts the predictions for the whole operation cycle of the boiler (from the ON state until the OFF state). The second diagram depicts in greater detail the predicted versus the actual consumption while the boiler is operating.

Fig. 6. Real vs Predicted consumption of Random Forest for the boiler.

4.3 k-NN

The tuning of the k-NN regressor concerns the number of neighbors, with the value of k ranging between 2 and 30. Each value yields a different estimator, and we keep the best one according to its RMSE value; a minimal sketch of this selection loop is given below. The best value for the oven is \(k=29\). The results of this model for the oven are depicted in Fig. 7 and for the boiler in Fig. 8. The k-NN model performs similarly to the previous two models. The results for the boiler are better, as expected, although the best results were achieved using \(k=20\) neighbors.
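The sketch below assumes that the training and validation feature/target matrices are built in the same way as in the earlier sketches.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_val, y_val: matrices built from the training and
# validation sets as in the earlier sketches.
best_k, best_rmse, best_knn = None, float("inf"), None
for k in range(2, 31):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_val, knn.predict(X_val)))
    if rmse < best_rmse:
        best_k, best_rmse, best_knn = k, rmse, knn
```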

Fig. 7. Real vs Predicted consumption of k-NN for the oven.

Fig. 8. Real vs Predicted consumption of k-NN for the boiler.

5 Results and Performance

In this section we present the performance of our models in terms of the Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). In addition, the execution times of tuning, training and regression are reported. Table 1 provides the error measures for the oven, while Table 2 provides the error measures for the boiler.

Table 1. Performance: Decision Tree vs Random Forest vs k-NN (Oven)
Table 2. Performance: Decision Tree vs Random Forest vs k-NN (Boiler)
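The three error measures reported in the tables can be computed with a short helper such as the following sketch (scikit-learn metrics assumed).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

def report_errors(model, X_test, y_test):
    """Return MSE, MAE and RMSE of a fitted model on the test set."""
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    return mse, mae, rmse
```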

DT and RF are superior to k-NN in the case of the oven, as can be seen from Table 1. The differences between DT and RF are almost insignificant with respect to accuracy. In the case of the boiler, all three models exhibit almost the same behavior, with RF marginally outperforming the others. With respect to execution time, DT is significantly faster than RF and k-NN for both appliances. In the case of the oven, DT requires 18 min and 7 s to train, in contrast to RF, which needs 1 h, 16 min and 46 s, and k-NN, which is the slowest method, requiring 28 h and 15 min. The same holds for the boiler, where DT requires 21 min and 10 s, RF requires 1 h, 39 min and 34 s, and k-NN requires 28 h and 6 min.

6 Conclusions and Future Work

In the present work, we study the regression problem for energy disaggregation on a real-world dataset. More precisely, we used a smart meter to collect measurements of the aggregate consumption of a household. Three different supervised machine learning methods are applied: 1) Decision Tree, 2) Random Forest and 3) k-NN. Our main contribution, and the significant difference from the existing literature, is that we take advantage of a set of features generated automatically by the smart meter to train our models. In particular, we do not only use active power data; our models are also trained with the “Angle between V and I”, “Reactive Power” and “Crest Factor” features. Our findings indicate that regression in energy disaggregation is feasible and that these features enhance the accuracy of our models. The experimental comparison shows that DT and RF are far more accurate than k-NN, although the differences between the two of them seem to be insignificant. Concerning the training time, DT is the superior method, outperforming RF and k-NN.

This research constitutes a first step towards a more profound understanding of regression in energy disaggregation. In the future, we intend to focus on an extensive experimental evaluation of deep learning methods, and more specifically neural networks. Our main focus will be on CNNs and RNNs.
