Keywords

1 Introduction

Digitalization, urbanization, and the reduction of non-renewable energy resources raise concerns on the availability of energy in the future. Increased energy consumption would increase the need for fossil fuels that cause irreparable harm to our environment. According to the report issued by the International Energy Agency Organisation for Economic Co-operation and Development (OECD), energy consumption in the building sector accounts for 40% of total consumption and 36% of CO2 emissions in the EU [1]. In order to ensure the availability of energy and to preserve our environment in the future, we need to find more efficient and intelligent solutions for reducing energy consumption in the building sector. In addition, these solutions can help building managers to make better decisions so as to reasonably control all kinds of equipment

Developing smart houses are an efficient way for energy saving. A smart house is a set of many sensors, controllers, relays and meters, as well as systems for controlling them. The purpose of a smart house is to collect useful data about residents’ behaviour, habits and environment in order to make decisions efficiently for reducing energy usage. With machine learning techniques, a smart house can learn residents’ lifestyles and routines, and thus automates operations. For example, light and temperature can be automatically adjusted in the presence and absence of residents. Moreover, smart home systems can be easily integrated with existing conventional systems and switchboards to optimize the energy consumption of the house [2]. Most smart houses employ the Internet of Things (IoT) for connecting physical and virtual devices in a house to the Internet. IoT allows users to remotely manage, collect data and automate the operation of their device or system. IoT can be utilized in the small scale and simple appliances, such as a refrigerator in a house and a washing machine, or as large as the transport infrastructure of a city.

Energy consumption prediction of a household can very significantly save energy and reduce environmental effect [3,4,5]. Therefore, tremendous efforts have been invested in energy optimization solutions for smart homes. The most important of these solutions is providing minimum energy consumption while maintaining customer comfort [6]. For this purpose, lots of machine learning (ML) algorithms have been proposed for energy prediction of appliances in smart homes [7, 8]. Besides the conventional ML methods like neural networks, multi-layer perceptron, k-nearest neighbor, super vector machines, decision trees, the deep learning, hybrid and ensemble methods have been significantly increased.

In this chapter, we present a framework for energy consumption prediction of a house using machine learning methods. The chapter also addressees the changes in factors such as the presence and absence of residents, as well as environmental impacts such as temperature and humidity, and the effect of ambient luminous intensity on energy consumption. To collect the real data, we used the Arduino micro-controller board and different type of sensors which are installed on the board. The data obtained from the sensors were analysed to determine the correlations between the possible factors influencing energy consumption. The energy consumption prediction was performed using three popular machine learning methods: multi-layer perceptron (MLP) and K-nearest neighbour (K-NN) and long short-term memory (LSTM). The experimental results on real data show that MLP-based prediction model can perform better than K-NN for energy prediction at the current time. In addition, LSTM can forecast hourly energy consumption with high accuracy.

The remainder of the chapter is organized as follows. Section 2 discusses some of the most important related works. The proposed energy consumption prediction framework is introduced in Sect. 3. Section 4 describes the experimental design and setup. Experimental results are presented in Sect. 5. Finally, the conclusion is presented in Sect. 6.

2 Related Work

Designing efficient energy consumption prediction methods can play an important role in improving energy saving and reducing environmental effects. For this reason, various approaches have been proposed for energy consumption prediction in the recent literature. The majority of these approaches employ the historical energy consumption time series data and build a prediction model using ML algorithms. For instance, TSO et al. [6] have utilized linear regression (LR), decision tree (DT) and artificial neural network (ANN) for prediction of energy consumption in government buildings, dwellings and cottages. In addition, they have investigated the effect of the number of residents, the size of the family, the time of construction, the type of housing and the electronic equipment in use. They collected the results for two phases: summer and winter. Their results show that DT has more accuracy compared to ANN and LR.

Similarly, three models based on simple linear regression (SLR) and linear regression multiple layers (LRML) are presented in [9] for energy consumption prediction. In addition, they considered three different cases. In the first case, the training dataset related to energy consumption was grouped on an hourly basis, in the following case each year and lastly on the basis of daily peak hours. Their results show that the SLR-based algorithm was more able to predict hourly and year-based forecasting compared with the LRML.

Truong et al. [10] proposed to predict energy consumption based on a graphical model. They aim to find the relationship between energy consumption and the human actions and routines, and in particular the utilization rate and time of use of electronic household appliances. Olofsson et al. [7] proposed a neural network model for long-term energy demand prediction based on short-term training data. The model parameters include temperature difference between indoor and outdoor environment and energy required for heating and other utilities.

Fayaz et al. [8] proposed extreme learning machine (ELM), adaptive neuro-fuzzy inference system (ANFIS) and ANN to predict energy consumption in residential buildings. The aim of this study was to forecast energy consumption for the following week and the following month. Moreover, they have employed different numbers and types of membership functions to obtain the optimal structure of ANFIS. For ANN, they have also tested several architectures with different number of hidden layers. Kreider et al. [11] reported results of a recurrent neural network on hourly energy consumption data to predict building heating and cooling energy needs in the future, knowing only the weather and timestamp.

Liet al. [12] utilized back propagation ANN, radial basis function neural network (RBFNN), general regression neural network (GRNN) and support vector machine for predicting the annual electricity consumption of buildings. These models are trained on a real data of 59 buildings and tested on nine buildings. Their results show that GRNN and SVM were more applicable to this problem compared with other models. However, the test conducted to all methods indicated that SVM could predict with higher accuracy and was better than the others.

Samuel et al. [13] addressed the energy consumption prediction problem using support vector machine (SVM) and multi-layer perceptron (MLP). They also improved the forecasting for district consumption by clustering houses based on their consumption profiles. Authors demonstrate an empirical plot of normalized root mean squared error against number of customers and show it decreases. This work aggregates up to 782 homes.

SVM is another ML algorithm that is commonly applied for forecasting energy consumption. SVM is increasingly used in research and industry due to its highly effective model in solving non-linear problems [3]. Besides that, since it can be used to solve non-linear regression estimation problems, SVM can be used to forecast time series. This paper [3] reviews the building electrical energy forecasting method using artificial intelligence (AI) methods such as SVM and ANN. Both methods are widely used in the field of forecasting. Dong et al. [14] applied SVM to predict monthly buildings energy consumption in tropical regions. Their results on 3 years data of electricity consumption of show that SVM has a good performance in prediction.

Jain and Satish [15] employed SVM in clustering-based Short-Term Load Forecasting. The forecasting is performed for the 48 half hourly loads of the next day. The daily average load of each day for all the training patterns and testing patterns is calculated and the patterns are clustered using a threshold value between the daily average load of the testing pattern and the daily average load of the training patterns. Their results obtained from clustering the input patterns and without clustering are presented and the results show that the SVM clustering-based approach is more accurate. For the same type of forecasting, Mohandes [16] showed that the SVM result for forecasting was better compared to the Auto Regressive modelling. The RMSE for testing data for SVM was 0.0215, while the value was 0.0376 for AR. The lower RMSE in the analysis indicated that the forecasting was more accurate.

Rahman et al. [17] employed a deep recurrent neural network in order to predict heating demand for a commercial building in the USA. They investigated the performance of the deep model over a medium- to long-term horizon forecasting. Deep learning is a particularly attractive option for longer-term forecasts as they are able to model expressive functions through multiple layers of abstractions [18].

3 The Proposed Energy Consumption Prediction Framework

In this section, we describe the proposed framework for energy consumption prediction. Figure 1 shows an illustration of our whole framework. The framework consists of four main layers as follows:

Fig. 1
figure 1

An overview of the proposed framework for energy consumption prediction

3.1 Data Collection Layer

This layer consists of physical IoT devices including various sensors. The sensors are used to collect the contextual information environmental conditions, circumstances, temperature, humidity and user occupancy. These sensors include:

  • Humidity and temperature sensor provides the indoor temperature real data of the house, as well as the humidity percentage. The humidity and temperature sensor can measure the temperature from − 40C to 125C. In addition, it measures the humidity between 0% and 100%.

  • Illuminance intensity sensor is a digital sensor that can measure the illuminance value between 0.1 and 40.00 Lux. It contains both infrared and visible light spectrum diodes, which help to obtain accurate measurement.

  • Passive Infra-Red (PIR) sensor is used for user occupancy detection by rapid changes in infrared radiation due to human or other influence movement. PIR sensor information is 1 or 0 for representing busy or nor busy state for the past 1 h, respectively.

  • Volume sensor determines the presence or absence of residents based on the sound level. This sensor is based on a simple microphone and LM386 amplifier, which convert the surrounding sound to an analog signal with sensitivity of about 50 dB. It can measure the amplitude of the sound by measuring the analog voltage. The value of this sensor is 0 if it cannot detect any sounds at the house in the past 1 h, otherwise is 1.

Data was collected from this sensor set continuously from November to December 2018. All sensors’ measurements were recorded every 5 s. All selected sensors measure variables that have direct influence on energy consumption. The energy consumption that is obtained by the local electricity utility represents the total value of energy consumption (kilowatt-hour (kWh)) for each hour. For this reason, we consider the average value of lux, temperature, humidity for 1 h. In addition, the maximum value of PIR and volume within 1 h is assumed. Moreover, we got the outdoor temperature data (C) from the local weather bureau which is located in 6 km far from the house. Furthermore, the input variables used from our data include illuminance, indoor temperature, outdoor temperature, humidity, PIR and volume while the output variable is the total energy consumption. The characteristics of the data are shown in Table 1.

Table 1 The data characteristics

Figure 2 can give an idea about the energy consumption distribution of the house per hour for an example day. You can see the energy consumption is low in midnight, early morning and noon whereas the peak energy consumption is observed during afternoon timings. This is due to the fact that residential buildings have maximum occupancy during afternoon times which results in higher energy consumption. This also indicates that there is a strong positive correlation between energy consumption and occupancy in the collected dataset.

Fig. 2
figure 2

Hourly distribution of energy consumption data for 28 November

3.2 Pre-processing Layer

In this layer, we clean the data by removing outliers and fixing the missing values in order to prepare the data in the format required by the upper layer. In addition, we applied data normalization to generate the same range of values for each measurement for using machine learning models. Therefore, the sample data was normalized to fit into the range between 0 and 1 by using the following equation:

$$\displaystyle \begin{aligned} x_{n}=\frac{x_{i}-x_{min}}{x_{max}-x_{min}}, i=1,2,\ldots ,n \end{aligned} $$
(1)

where x n is the normalized values, x i is i of input data. x max and x min represent the maximum and minimum values of data x i, respectively [8]. n is the number of samples.

3.3 Prediction Layer

In the prediction layer, we utilized two well-known machine learning algorithms (K-NN and MLP) for energy consumption prediction at the current time. Moreover, we used LSTM algorithm for 1 h energy consumption prediction for the residential house. Here, we briefly review the machine learning methods which were used in this work for energy consumption prediction.

3.3.1 K-Nearest Neighbour

K-nearest neighbour (K-NN) [19] is one of the most widely used algorithms for regression supported by its simplicity and intuitiveness in finding similar instances in multivariate and large-dimensional feature spaces of arbitrary attribute scales. Suppose the training dataset has m samples that each sample x i is described by n input variables and an output variable y i such as \(x_{i}=\left \{ x_{i1}, x_{i2},\ldots ,x_{in},y_{i} \right \}\). The goal is to learn a function f : x → y known as a regression function that model the relationship between input variables and an output variable. The K-NN regression estimates the function by taking a local average of the training dataset. Locality is defined in terms of the k samples nearest to the estimation sample. As the performance of K-NN algorithm strongly depends on the parameter k, finding the best values of k is essential. A large k value decreases the effect of noise and minimizes the prediction losses. However, a small k value allows simple implementation and efficient queries [20, 21].

Cross validation can employ to estimate the accuracy and validity of the classifier with different k values [21]. Two common schemes of cross validation are k-fold cross validation and leave-one-out cross-validation (LOOCV). In k-fold cross-validation, the dataset is randomly divided into k subsets or folds and repeated k times. Each time, one fold is reserved as a test dataset for validating the model and the remaining k − 1 folds are used for training the model as a training dataset. Then, the classification accuracy across all k trails is computed. LOOCV is a particular type of k-fold cross-validation where k is the number of examples in the dataset. As the size of the dataset is not large, LOOCV is utilized in this paper.

3.3.2 Multilayer Perceptron

Multilayer perceptron (MLP) [22] is an artificial neural network (ANN) for modelling non-linear relationship between input and output variables through multiple layers of interconnected processing elements. ANNs are inspired from the human brain to predict the new observation form the previous observation after executing a learning process. MLP consists of three main layers: input, hidden and output layers. Each interconnection between neurons has associated with a scalar weight. The MLP model is trained by adjusting weights in order to minimize the error. The error in the final layer can be calculated as follows:

$$\displaystyle \begin{aligned} E = {(y_{i}-\hat{y}_{i})^{2}} \end{aligned} $$
(2)

where y and \(\hat {y}\) are the actual or measured output and the predicted output, respectively.

During training the model, the weights of each node j at layer l are updated using stochastic gradient descent as follows:

$$\displaystyle \begin{aligned} {w_{ji}}^{m} = {w_{ji}}^{m}-\alpha(\frac{\partial e}{\partial {w_{ji}}^{m}}) \end{aligned} $$
(3)

where α is the learning rate. The partial derivatives \((\frac {\partial e}{\partial {w_{ji}}^{m}})\) are determined by back-propagation method [18].

However, a gradient descent learning approach poses two problems [23]. First problem is overfitting. It means MLP can adjust weights for performing well on the training dataset but it is not able to produce the accurate response for a new data. This problem can be solved by splitting the training dataset into two parts: one part for training and the part for validation. Then, the algorithms is stopped when the error increases on the validation dataset.

The second problem is local minima for finding the global optimal solution by exploring the search space. One way for avoiding local minima is using adaptive learning rate. An adaptive learning rate dynamically changes the gradient descent step size, such that the step size is larger when the gradient is steep and smaller when the gradient is flat.

3.3.3 Long Short-Term Memory

Long short-term memory (LSTM) [24] networks are a type of recurrent neural networks (RNNs). As RNNs have feedback connections unlike MLP, they are most common to modelling sequences and temporal dependencies. An activation function in LSTM accounts the temporal dependency by considering the previous timestep (t − 1). LSTM consists of input gate i, forget gate f, output gate o and cell state c. Generally, the three gates control the information flow.

At current time t, the input is x t, the hidden layer output is h t and its former output is h t−1, the cell input state is \(\hat {c}_{t}\), the cell output state is c t and its former state is c t−1. For measuring c t and h t, firstly the following equations are calculated in order:

$$\displaystyle \begin{aligned} i_{t}=\sigma (w_{1}^{i}.x_{t}+w_{h}^{i}.h_{t-1}+b_{i}), \end{aligned} $$
(4)
$$\displaystyle \begin{aligned} f_{t}=\sigma (w_{1}^{f}.x_{t}+w_{h}^{f}.h_{t-1}+b_{f}), \end{aligned} $$
(5)
$$\displaystyle \begin{aligned} o_{t}=\sigma (w_{1}^{o}.x_{t}+w_{h}^{o}.h_{t-1}+b_{o}), \end{aligned} $$
(6)
$$\displaystyle \begin{aligned} \hat{c}_{t}=\tanh (w_{1}^{c}.x_{t}+w_{h}^{c}.h_{t-1}+b_{c}), \end{aligned} $$
(7)

where \(w_{1}^{i}\), \(w_{1}^{f}\), \(w_{1}^{o}\) and \(w_{1}^{c}\) are the weights matrices connecting x t to the three gates and the cell input. \(w_{h}^{i}\), \(w_{h}^{f}\), \(w_{h}^{o}\) and \(w_{h}^{c}\) are the weights matrices connecting h t−1 to the three gates and the cell input. b i, b f, b o and b c are the bias terms. σ and \(\tanh \) represent the sigmoid and tangent functions, respectively.Secondly, the cell output state is measured as follows:

$$\displaystyle \begin{aligned} c_{t} = i_{t}\times \hat{c}_{t} + f_{t}\times c_{t-1}, \end{aligned} $$
(8)

Finally, the hidden layer output is calculated as:

$$\displaystyle \begin{aligned} h_{t} = o_{t}\times \tanh(c_{t}) \end{aligned} $$
(9)

3.4 Performance Evaluation Layer

The final layer, the performance evaluation layer, evaluates the performance of the proposed models based on two common performance metrics: mean square error (MSE) and root mean square error (RMSE). They are calculated with the following equations:

$$\displaystyle \begin{aligned} MSE = {\frac{1}{n}\sum_{i=1}^{n}\left ( y_{i} - \hat{y}_{i}\right )^{2}} \end{aligned} $$
(10)
$$\displaystyle \begin{aligned} RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left ( y_{i} - \hat{y}_{i}\right )^{2}} \end{aligned} $$
(11)

where y is the real energy consumption, \(\hat {y}\) is the predicted energy consumption and n is the number of samples.

4 Experimental Setup

We collected a real dataset using a sensor system in a family house that is located in southwest of Finland and was constructed in the year 2018. This house has a structure which is typical for a country such as Finland. The major construction material is concrete with wood and glass windows. The house has an exhaust air heat pump (EAHP) and is heated by underfloor heating. Typically, the house heating is a closed loop control system that is controlled either by inside or outside temperatures. The control system tries to keep the room temperature constant at the preset value by residences (20–22 C).

Energy consumption data is recorded by the electrical utility on the hourly interval. The three major energy consuming systems are: heating, lighting and electrical equipment. For the experiments, we proposed 80% of the whole data for training the models and 20% for testing.

For the hardware implementation, we used Arduino mega micro-controller as it is inexpensive, cross-platform, simple, clear programming environment, open-source and has sufficient digital/analog ports and extensible software support [25]. This micro-controller has 54 digital pins and 16 analog pins. Figure 3 shows an overview of the hardware and software implementations. Several sensors are used to monitor the smart house: the temperature sensor, humidity sensor, illuminance intensity sensor, PIR motion sensor and volume sensor. All sensors are connected to the pin of Arduino. In order to connect Arduino to the Internet for IoT development, we connected Arduino to the Ethernet shield. The Ethernet shield operating voltage is 5 V (supplied from the Arduino Board) and the connection speed is 10/100 Mb. For Arduino programming, we utilized an environment that is called Arduino IDE. Arduino IDE supports C and C++ programming languages. We used the Ubidots platform for collecting the sensor’s output in Cloud.

Fig. 3
figure 3

An overview of the hardware and software implementations

5 Experimental Results

We evaluated two popular ML methods (K-NN and MLP) for the energy consumption prediction at the current time. In addition, LSTM is used for predicting the hourly energy consumption.

The obtained results show that K-NN-based prediction model can get very high accuracy for the test dataset since the RMSE and MSE values are 1.80 and 3.25, respectively. To obtain the best K value, we utilized the leave-one-out cross-validation (LOOCV). We assumed the value of K from 1 to 10 and got the maximum accuracy when K = 2.

As the number of neurons and layers can significantly effect on MLP performance, we test three different MLP models with one, two and three hidden layers. Table 2 shows the number of neurons of the hidden layer in each model.

Table 2 The number of neurons of the hidden layer in each model

Adam optimization [26] is used to train the model with the learning rate 0.00001 and reduction factor of 0.1. The epoch number was 20 and the number of iteration in each epoch was 1000. As you can see in Fig. 4, MLP-based model with three hidden layers can predict the energy consumption with the minimum RMSE (1.62) and MSE (2.63). Generally, the experimental results show that an MLP-based model is better in prediction than the K-NN-based model. As the aim of this study is to develop accurate prediction model, MLP has been selected. Moreover, Fig. 5 illustrates the prediction results of four proposed models in two training and test phases. The results show that the MLP-based models perform better than K-NN. This supports the claim that the MLP can adequately account the relationship between the measured input variables and output (energy consumption).

Fig. 4
figure 4

RMSE and MSE of MLP architectures with different number of hidden layers

Fig. 5
figure 5

The prediction results of (a) K-NN, (b) MLP with one hidden layer, (c) MLP with two hidden layers and (d) MLP with three hidden layers between 23 November 2018 and 10 December 2018 in two phases: training and test

In addition, we evaluate our framework based on LSTM for forecasting the energy consumption for the next 1-h period. We proposed three LSTM layers which each layer has 30 neurons. Moreover, the value of batch size and epochs are 32 and 200, respectively. In addition, the best optimizer for our neural network model in order to learn properly and tune the internal parameter is Adam based on our experiments. Figure 6 demonstrates the predicted energy consumption obtained from LSTM model in comparison with the actual consumption for test dataset.

Fig. 6
figure 6

LSTM prediction results in comparison with actual consumption for test dataset

The results show that both lines almost coincide, which indicates that our prediction algorithm is working nicely. MSE and RMSE for the LSTM model is 0.0062 and 0.0789, respectively.

6 Conclusion

Energy consumption prediction is an efficient way for planning and balancing energy production according to user consumption requirements. The energy provider can handle energy underestimating and overestimating challenges by the energy consumption prediction. Underestimating the need for electricity will lead to potential power outages and an increase in delivery number of costs while overestimation, would lead to potential idle losses. Renewable energy production is highly dependent on changes in the environment and changing conditions, such as the climate. Predicting energy consumption can help balance changes in renewable energy production with fossil energy sources.

For these reasons, we presented a framework for energy consumption prediction in this paper. Our framework utilized two well-known machine learning methods, K-NN and MLP, for predicting the energy consumption at the current time based on sensor data such as temperature, humidity, motion, sound and user occupancy. In addition, our framework can predict the energy consumption for the next 1-h period according using LSTM. The experimental results on real data show that the MLP model is more accurate than the K-NN model.