1 Introduction

Energy modeling is a research field that has been growing in recent years and is an essential step towards increasing efficiency across the energy systems value chain [1].

The building sector is responsible for ca. 40 % of final energy consumption worldwide, which makes it the largest energy consumer [2]. With increasing urbanization, mainly in the poorest countries, buildings are expected to keep a high share of worldwide energy consumption. For the EU, a 20 % increase in energy efficiency is one of the three targets for 2020 [3], with particular emphasis on the building sector, which has the highest energy saving and energy efficiency potential [4].

Increasing energy efficiency in buildings requires planning and implementing energy efficiency measures. Quantifying the impact of a given measure, i.e. the energy savings, requires modeling how energy consumption evolves with the surrounding variables. A complete methodology for quantifying energy efficiency measures can be found in the International Performance Measurement and Verification Protocol (IPMVP), further adapted into eeMeasure for application in European Commission funded projects [5]. These standards suggest linear regression, as it can be easily implemented in spreadsheets. However, linear models do not capture the more complex nonlinear dynamics that can be found in office buildings such as those of a university campus. Thus, the literature suggests modeling tools that can capture nonlinear dynamics, such as fuzzy set models or neural networks, as alternative methods.

We could not find a quantitative comparison between the model most widely accepted by the IPMVP (linear regression) and the nonlinear models, still under development, that are increasingly used in energy modeling: fuzzy set theory and neural networks. This is one of the main purposes of this paper: to understand how much accuracy is gained at the expense of using much more complex models that are difficult to develop and implement without specific software tools.

This paper addresses the implementation of three models (linear regression, fuzzy set theory and neural networks) in each of three experimental spaces (a class amphitheater, a library and a set of offices), assessing the development of the models and their performance.

Section 2 explains the theoretical fundamentals of the three models, highlighting their applications in the energy field. Section 3 describes the methodological approach to data treatment and to the development and performance evaluation of the models. Section 4 presents and discusses the results from the models. Finally, Section 5 closes the paper with general remarks and future work.

2 Energy Consumption Models

Several references can be found in the bibliography tackling energy modeling in buildings. This section describes the methods used in this paper.

2.1 Linear Models

Linear regression quantifies the relation between two data sets by minimizing the quadratic error between the observed points and the fitted model. It yields a linear model whose coefficients try to explain that relation. The usual performance index is the coefficient of determination, \( R^{2} \) [10].

Linear regression models are well accepted for the definition of the baseline, namely according to the IPMVP standards. However, a linear model has limited capacity, as some variables can only be explained with non-linear correlations [11].

A system can be considered linear if the relation between inputs and outputs can be described by linear equations, i.e. if the outputs can be explained by aggregating the inputs, each multiplied by a corresponding coefficient. The principle of superposition is an important theorem that explains the properties of a linear system: the influence of all the inputs acting simultaneously on the system output is the same as the sum of the influences of each input acting alone [6]. If the inputs \( x_{1} \) and \( x_{2} \) individually produce the outputs \( y_{1} \) and \( y_{2} \), then:

$$ c_{1} x_{1} + c_{2} x_{2} \rightarrow c_{1} y_{1} + c_{2} y_{2} $$
(1)

A linear system can be fitted through linear regression. Conceptually, a linear regression minimizes the error sum of squares (SSE). The total sum of squares (SST) is equal to the sum of SSE and the regression sum of squares (SSR).

$$ \begin{aligned} \sum\limits_{i = 1}^{n} (y_{i} - \overline{y} )^{2} & = \sum\limits_{i = 1}^{n} (y_{i} - \widehat{y}_{i} )^{2} + \sum\limits_{i = 1}^{n} (\widehat{y}_{i} - \overline{y} )^{2} \\ SST & = SSE + SSR \\ \end{aligned} $$
(2)

A linear model has the advantage that both the performance and the statistical significance can be easily studied with the coefficient of determination \( R^{2} \) and the p-value, respectively. \( R^{2} \) is the result of:

$$ R^{2} = \frac{{{\text{Explained}}\;{\text{variation}}\;{\text{of}}\;{\text{y}}}}{{{\text{Total}}\;{\text{variation}}\;{\text{of}}\;{\text{y}}}} = \frac{SSR}{SST} $$
(3)

\( R^{2} \) varies in [0; 1], with 1 corresponding to the highest correlation. The p-value has to be equal to or lower than the significance level α, which we have defined as 0.05. If p is higher than 0.05, we cannot reject the null hypothesis, which means that we cannot guarantee that the results were not generated by chance. This, however, only reflects statistical confidence: even when choosing a model whose parameters have p-values higher than α, other studies can be undertaken to see whether the model is actually adequate. As an example, we can check whether the model is biased, i.e. whether it tends to underestimate or overestimate the output, by calculating the median of the model errors.
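The decomposition in Eq. (2) and the ratio in Eq. (3) can be checked numerically. The following is a minimal sketch, assuming NumPy and an ordinary least squares fit with an intercept; the function name `fit_linear` and the data values are hypothetical and are not taken from the measurements in this paper.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns coefficients and fit statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes SSE
    y_hat = A @ coef
    sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
    sse = np.sum((y - y_hat) ** 2)                 # error sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)          # regression sum of squares
    return coef, {"SST": sst, "SSE": sse, "SSR": ssr, "R2": ssr / sst}

# Hypothetical normalized input (one variable) and output.
X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y = np.array([0.2, 0.4, 0.5, 0.7, 0.9])
coef, stats = fit_linear(X, y)
print(coef, stats)
```

Because the fit includes an intercept, SST equals SSE + SSR up to rounding, so \( R^{2} \) computed as SSR/SST stays within [0; 1]. The p-value is not computed here; a statistics package would be needed for that.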

2.2 Fuzzy Set Theory

In the literature, fuzzy models are applied in the energy field both to optimization problems and to the development of energy models. Fuzzy models are adequate for systems where conventional models are not precise enough, where there is a high degree of uncertainty, where there is strong non-linear behavior, or where there are time-varying characteristics [11]. Fuzzy logic can be described as an approximation of human classification and reasoning, which makes this type of model highly interpretable [12].

Dounis and Caraiscos [13] undertook a literature review on energy system analysis and energy demand modeling. Namely, they identified the application of hybrid fuzzy stochastic models to handle uncertainty in regional energy systems planning and management.

Modeling regional energy systems is an important issue, especially for designing regional policy. Beyond this, fuzzy models are also applied to energy modeling at a lower scale. Zhibin and Jiuping [14] applied fuzzy set theory to deal with uncertainty in the cost optimization of a combined heat and power model. Babuska [12] also found very important applications of fuzzy set theory in modeling energy and comfort in a building environment. Moradi et al. [15] developed applications of building energy modeling through fuzzy set theory. Finally, [12, 15] found that combining fuzzy modeling with neural networks, i.e. neuro-fuzzy modeling, allows robust self-learning models to be developed. Overall, fuzzy set theory can be considered important and feasible for the implementation of a more efficient and adequate building energy management system [16, 17].

Fuzzy set theory has been developed in the past years and belongs to a computational philosophy named soft computing, which aims at dealing with complex intelligent systems. In contrast to a crisp set, a fuzzy set assigns to each input a degree of membership in that set. Take a simple example of two glasses of a transparent liquid: the label on the first glass states that the probability of the liquid being a deadly poison is 0.1, and the label on the second states that the liquid has a membership of 0.9 in the set of deadly poisons. While we have a 90 % chance of surviving if we drink from the first glass, we know that we will not die if we drink from the second one, since the liquid does not belong 100 % to the set of deadly poisons; we would get a stomach ache, but not a deadly one.

From the previous example, we understand that fuzzy modeling applies gradual transitions between sets, which can be very helpful when we want to design an intelligent decision based on predetermined antecedent parameters.

A fuzzy system is processed from the inputs (antecedents) to the outputs (consequents). Developing such a system involves a very important step that is common to the development of any model (e.g., linear or neural network): the generation of the inputs. In fact, the quality of the inputs that we feed the model is crucial for the model to be accurate. This step ranges from the intrinsic quality of the measured data (e.g. accuracy of the device, data gaps, outliers) to the clustering and relevance of the inputs. It involves understanding what we are modeling, whether the input data are relevant for the exercise, and whether the variables are related to each other.

Once the input data have been treated, we have to determine the set of variables and the universe of discourse that will be used to model the problem. We also have to define which data will be used for training and which for testing the model. Usually, a 60/40 % or a 50/50 % ratio is used. In this paper, the second one has been chosen.

A further step in building a fuzzy system is fuzzification, which includes the definition of the membership functions and of the fuzzy rules for the rule base. Membership functions are usually built with clustering algorithms. In this paper, we have used fuzzy c-means (FCM) to generate the membership functions used in the inference engine of the fuzzy models. FCM is a data partitioning algorithm that forms overlapping clusters based on pattern similarities. It generalizes hard c-means and is given by the following equations [7, 8].

Given the following data set:

$$ X_{k} = [x_{1k} ,x_{2k} , \ldots ,x_{nk} ]^{T} \in \mathbb{R}^{n} ,\quad k = 1, \ldots ,N $$
(4)

The fuzzy partition matrix U (holding the membership degrees of the objects \( X_{k} \)) and the cluster centers V are found.

$$ U = \left[ {\begin{array}{*{20}c} {\mu_{11} } & \cdots & {\mu_{1N} } \\ \vdots & \ddots & \vdots \\ {\mu_{c1} } & \cdots & {\mu_{cN} } \\ \end{array} } \right],\quad \mu_{ik} \in [0,1] $$
(5)
$$ V = [V_{1} , \ldots ,V_{c} ],\quad V_{i} \in \mathbb{R}^{n} $$
(6)

The algorithm alternates between two steps, starting from an initial U or V. First, the cluster centers are updated, assuming the partition matrix is fixed and with m > 1 as the fuzziness exponent:

$$ V_{i} = \frac{{\sum\nolimits_{k = 1}^{N} \mu_{ik}^{m} X_{k} }}{{\sum\nolimits_{k = 1}^{N} \mu_{ik}^{m} }} $$
(7)

Then the distances from the cluster centers are calculated and the partition matrix is updated, assuming that the cluster centers are fixed.

$$ d_{ik}^{2} = (X_{k} - V_{i} )^{T} (X_{k} - V_{i} ) $$
(8)
$$ \mu_{ik} = \frac{1}{{\sum\nolimits_{j = 1}^{c} \left( \frac{{d_{ik}^{2} }}{{d_{jk}^{2} }} \right)^{1/(m - 1)} }} $$
(9)

The process ends when the stopping criterion is satisfied, for example when the change in the partition matrix between two consecutive iterations falls below a tolerance \( \epsilon \):

$$ \lVert \delta U \rVert < \epsilon $$
(10)
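The iteration in Eqs. (7) to (10) can be sketched in a few lines. This is a minimal illustration assuming NumPy, a random initialization of U, and the Frobenius norm for the stopping criterion; the function name `fuzzy_c_means`, the default parameter values and the toy data are hypothetical and do not reproduce the Matlab implementation used by the authors.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """X has shape (N, n). Returns centers V (c, n) and partition matrix U (c, N)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                                   # memberships of each object sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # Eq. (7): centers, U fixed
        diff = X[None, :, :] - V[:, None, :]             # shape (c, N, n)
        d2 = np.fmax((diff ** 2).sum(axis=2), 1e-12)     # Eq. (8): squared distances, guarded against 0
        # ratio[i, j, k] = (d_ik^2 / d_jk^2) ** (1 / (m - 1))
        ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)                  # Eq. (9): memberships, V fixed
        converged = np.linalg.norm(U_new - U) < eps      # Eq. (10)
        U = U_new
        if converged:
            break
    return V, U

# Hypothetical toy data: two well-separated groups in R^2.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
V, U = fuzzy_c_means(X, c=2)
print(np.round(V, 3))
print(np.round(U, 3))
```

Each column of U holds the membership degrees of one object across the c clusters, so the rows of U can be read as sampled membership functions. The returned centers are those computed in the last center update, which at convergence differ negligibly from centers recomputed with the final U.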

The fuzzy rules of the rule base are fired at this point. By combining the different inputs through their membership degrees in each assigned membership function, the model applies the inference operators to choose the decisions that compose the outputs. Common inference operators are Kleene-Dienes, Lukasiewicz, Mamdani or Sugeno (or Takagi-Sugeno). For this paper, a Sugeno-type inference system was chosen.

While a Mamdani-type fuzzy inference system (FIS) produces a fuzzy set as the consequent of each rule, scaled by the rule strength, and then requires a defuzzification step to reach a crisp value, a Sugeno FIS uses a crisp constant or a linear equation as the output of each rule [7, 8]. The overall output is a weighted average of the individual rule outputs \( y_{k} \), weighted by the rule strengths \( w_{k} \), and is given by:

$$ \hat{y} = \frac{{\sum\nolimits_{k = 1}^{n} w_{k} y_{k} }}{{\sum\nolimits_{k = 1}^{n} w_{k} }} $$
(11)
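The weighted average in Eq. (11) can be illustrated with a one-input Sugeno system. This is a minimal sketch under stated assumptions: Gaussian membership functions are assumed for the antecedents (the paper derives its membership functions from FCM), each rule output is a linear function of the input, and all names and rule parameters are hypothetical.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (an assumed shape)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_output(x, rules):
    """rules: list of (center, sigma, a, b); rule k outputs y_k = a * x + b."""
    w = np.array([gaussian_mf(x, c, s) for c, s, _, _ in rules])  # rule strengths w_k
    y = np.array([a * x + b for _, _, a, b in rules])             # rule outputs y_k
    return float(np.sum(w * y) / np.sum(w))                       # Eq. (11)

# Hypothetical rule base: "input is low -> 0.2" and "input is high -> 0.8".
rules = [(0.0, 0.3, 0.0, 0.2), (1.0, 0.3, 0.0, 0.8)]
print(sugeno_output(0.5, rules))
```

At the midpoint between the two antecedent sets both rules fire with equal strength, so the output is the plain average of the two rule outputs. Near either center, the corresponding rule dominates.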

The development of a fuzzy model finishes with its fine tuning. If the overall output does not reach a satisfactory performance, the model can be adjusted by changing the number of clusters, by making a new selection of variables (leaving some aside or including others that were not previously chosen) or by training the model with a different number of epochs (iterations).

2.3 Neural Networks

Neural networks (NN) try to reproduce the physiological reasoning of the human brain in the development of models. They model the human brain as a continuous-time non-linear dynamic system. By adjusting the weights applied to the artificial neurons, adaptive models can be developed [7].

Babuska [12] and Moradi et al. [15] have also performed a detailed review of the application of several energy models with artificial neural networks, with particular relevance given to the self-learning ability that NN can provide. Further, Kalogirou [18] developed artificial NN to predict energy consumption in a building, which showed faster convergence than dynamic simulation programs.

NN mimic how the human brain reasons, simulating neurons connected to each other (through “synapses”) and iteratively learning the best combination of weights for each input in order to produce the best-fitting output. NN are a powerful instrument for dealing with complex tasks such as perception and pattern recognition, offering the ability to learn from examples, adaptability and fault tolerance [8].

Given the above features, NN are generally used when the input and/or output are multidimensional, when the mathematical structure of the system is unknown and, at the same time, when interpretability of the model is not required. In fact, NN act as a “black box”: they iterate across the nodes (which functionally represent neurons in the human brain), assigning weights to each variable under a structure that is not readily interpretable by the developer [7].

In this paper, we present the application of an adaptive NN with a feedforward architecture, as represented in Fig. 1.

Fig. 1 Representation of a feedforward adaptive network with two hidden layers (adapted from [7])

Jang [7] gives a very complete description of how a NN is designed. As depicted in Fig. 1, an adaptive network is a structure composed of nodes connected by directional links, where each node is a processing unit and each link represents the causal relationship between the nodes it connects. The output of each node depends on its inputs and on the parameters of that node, which gives the network its adaptive capability. The learning rule specifies how these parameters are updated to minimize an error measure between the model output and the real output. We can define the error measure for the pth training sample as the sum of squared errors between the modeled and the desired outputs:

$$ E_{p} = \sum\limits_{k = 1}^{N(L)} (d_{k} - x_{L,k} )^{2} $$
(12)

where \( d_{k} \) is the kth component of the pth real (desired) output, \( x_{L,k} \) is the corresponding modeled output and \( N(L) \) is the number of nodes in the output layer L.

The basic learning rule of adaptive NN is the steepest descent method, which was used by Rumelhart et al. in [9], who named this procedure the backpropagation learning rule.

Figure 1 shows that a feedforward backpropagation network has a unidirectional relationship between inputs and outputs, in contrast to the other possible architecture, the recurrent NN. Given the chosen configuration of parameters and learning rule, the model tries to minimize the distance between its outputs and the real ones. The modeler tries different configurations of the NN in order to reach the most desirable performance, changing the number of hidden layers, the number of nodes (neurons), the learning rule and, as in any other model, the input variables.
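The steepest descent (backpropagation) procedure described in this section can be sketched with a very small network. This is an illustration only, assuming NumPy, a single hidden layer with tanh activations and a linear output node, whereas the networks in this paper use up to two hidden layers and were built in Matlab. The function name, learning rate, number of epochs and training data are all hypothetical.

```python
import numpy as np

def train_feedforward(X, d, hidden=5, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer feedforward network by steepest descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    history = []                                  # summed squared error per epoch
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                  # forward pass: hidden layer
        out = h @ W2 + b2                         # forward pass: linear output
        err = out - d
        history.append(float(np.sum(err ** 2)))
        g_out = 2.0 * err / len(X)                # gradient of the mean squared error
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)     # error propagated back through tanh
        W2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, history

# Hypothetical training data: a smooth non-linear relation on [0, 1].
X_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
d_train = X_train ** 2
predict, history = train_feedforward(X_train, d_train)
print(history[0], history[-1])
```

The error history makes the effect of the number of epochs visible, which is one of the settings the modeler varies, together with the number of hidden layers and neurons.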

3 Methodology

The general goal of this work is to develop the most adequate model to describe how electricity consumption varies with given inputs in three experimental spaces. This section describes the methodology followed to reach that goal.

3.1 Data Treatment

Three experimental spaces are addressed in this paper: a class amphitheater, a library and a set of 11 offices, all in a university building in Lisbon, Portugal. The electricity consumption of all three has been monitored, so it was possible to gather consumption data for the following periods:

  • Class amphitheater: 25-02-2013 to 20-06-2013

  • Library: 18-03-2013 to 05-09-2013

  • Offices: 26-03-2013 to 04-09-2013

The amphitheater has a fixed class schedule from Monday to Friday. The exam period begins on May 25th; it has no classes and therefore no consumption. Days with no occupation and null consumption would artificially inflate the correlation between occupation and consumption, so this period was removed from the data set. Weekends and holidays, in which there is also no consumption and which would likewise bias the model, were eliminated as well. This space has no direct access to the exterior. It has an exterior wall, two interior walls and an interior wall that faces a common lobby of the building.

No data was eliminated for holidays, weekends and exam periods in the library data set, as this space remains open to students and so there is still variable consumption on those days, although two rooms of the library are closed on those days and also from 18h00 to 09h00 from Monday to Friday. However, after the post-exam period, from August 2nd, the library presented very low consumption values because the whole building was closed for two weeks.

The offices are occupied by research staff, comprising PhD students, administrative staff, teachers, researchers and management. With the exception of the administrative staff, the occupants have a certain freedom of schedule, which results in a non-routine occupation pattern.

Seven input variables were considered for the development of the models: day type, occupation, day length, average temperature, solar radiation, and heating and cooling degree days (HDD and CDD), both with a fixed base temperature of 15 °C.

Day type is a variable defined by the authors. It reflects the expected usage intensity of the spaces. For the class amphitheater, this variable reflects the class schedule, which is set before the semester begins. The days are normalized to [0; 1], from the day with the lowest to the day with the highest number of classes:

$$ \hat{x} = \frac{x - min}{max - min} $$
(13)

where \( {\hat{\text{x}}} \) is the normalized value, x the original value, min the minimum value of the variable and max the maximum one. All variables were normalized in this way.
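Eq. (13) translates directly into code. The sketch below assumes NumPy; the function name and the sample values are hypothetical, and the handling of a constant series (returning zeros) is an added safeguard that the paper does not discuss.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a series to [0, 1] following Eq. (13)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                       # constant series: avoid division by zero
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Hypothetical number of classes per day.
classes_per_day = [2, 4, 6, 3]
print(min_max_normalize(classes_per_day))
```

The day with the fewest classes maps to 0 and the day with the most classes maps to 1, which matches the definition of the day type variable for the amphitheater.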

The measured consumption data covers power plugs, illumination and ventilation. Heating and air conditioning for these rooms are provided by a chiller and air treatment units, which were not considered in this study.

3.2 Models

In this paper, we have applied a modeling methodology that comprises data treatment, parameter definition and fine tuning of the models (i.e. iteratively changing their input parameters).

We have applied multivariate linear regressions, trying all possible combinations of inputs and choosing the model with the highest correlation and statistical significance. The fuzzy models are of the Sugeno type; they were fine-tuned by varying the inputs, the number of clusters and the number of training epochs. The NN models have a feedforward backpropagation architecture, with a varying number of hidden layers (0–2) and of neurons in each layer (1–5). Each model was trained with 50 % of the data set, taking entries in alternating order, and tested with the remaining 50 %. The overall performance of the models was quantified with:

Mean absolute error (MAE) quantifies the mean error between the modeled (\( f_{i} \)) and real (\( y_{i} \)) outputs across all n entries of the model, given by:

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} |f_{i} - y_{i} | = \frac{1}{n}\sum\limits_{i = 1}^{n} |e_{i} | $$
(14)

Mean squared error (MSE) is the quadratic loss between the modeled and the real outputs, accounting for both the variance of the estimator and its bias.

$$ MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} (f_{i} - y_{i} )^{2} $$
(15)

The median of the errors gives additional insight into bias. If it differs from zero, the model is biased (overestimating if positive, underestimating if negative). The median m of a random variable X is any real number that satisfies the following:

$$ P(X \le m) \ge \frac{1}{2}\;{\text{and}}\;P(X \ge m) \ge \frac{1}{2} $$
(16)

Absolute error (AE) gives the same information as MAE in absolute terms, providing the total error across the modeled period.

$$ AE = n \cdot MAE = \sum\limits_{i = 1}^{n} |f_{i} - y_{i}| $$
(17)

Variance accounted for (VAF) describes the similarity between two data sets (in this case, the output from the model and the real one).

$$ VAF_{i} = (1 - \frac{{var(y_{i} - \hat{y}_{i} )}}{{var(y_{i} )}})*100\;\% $$
(18)
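The performance indicators in Eqs. (14) to (18), together with the alternated 50/50 split described earlier in this section, can be sketched as follows. This is a minimal illustration assuming NumPy; the function names and the sample values are hypothetical, the error is taken as modeled minus real output so that a positive median indicates overestimation, and the population variance is assumed for VAF.

```python
import numpy as np

def alternate_split(data):
    """Even-indexed entries for training, odd-indexed entries for testing."""
    data = np.asarray(data)
    return data[0::2], data[1::2]

def performance(y, f):
    """Indicators for real outputs y and modeled outputs f."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    e = f - y                                              # positive: overestimation
    return {
        "MAE": np.mean(np.abs(e)),                         # Eq. (14)
        "MSE": np.mean(e ** 2),                            # Eq. (15)
        "median": np.median(e),                            # bias indicator, Eq. (16)
        "AE": np.sum(np.abs(e)),                           # Eq. (17), equals n * MAE
        "VAF": (1.0 - np.var(y - f) / np.var(y)) * 100.0,  # Eq. (18), in %
    }

# Hypothetical outputs: the model overestimates every entry by 0.5.
y_real = [1.0, 2.0, 3.0, 4.0]
f_model = [1.5, 2.5, 3.5, 4.5]
scores = performance(y_real, f_model)
train, test = alternate_split(np.arange(10))
print(scores, train, test)
```

In this example the VAF is 100 % even though every prediction is off by 0.5, because a constant offset has zero variance. This is why the median of the errors is reported alongside VAF: it exposes a systematic bias that VAF does not penalize.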

4 Results and Discussion

This section addresses the main results from the application of the different models to the experimental spaces. Table 1 depicts the input parameters obtained after fine tuning the models, i.e. those that yielded the highest performances.

Table 1 Models parameters: input variables (Day_t—day type; Occp—Occupation/h; Day_l—Day length [h]; \( {\hat{\text{T}}} \)—Average T [°C]), Eps—epochs in the fuzzy models training, Hddn_lyr—hidden layers in the NN models, and Nrn—nr of neurons in each Hddn_lyr

Consumption in the three spaces is better explained with a subset of the inputs; adding the remaining inputs decreases performance. Day type and occupancy are the common inputs, with the exception of the linear regression for the amphitheater, where the p-value of occupancy is above 0.05 and thus we cannot reject the null hypothesis. The class amphitheater is an underground space, so it has no access to natural light and is well insulated from the external temperature. The library makes little use of natural light as well.

Analyzing the linear regression models (Eqs. 19–21), we can see that day type has the highest weight and is therefore the most important input, followed by occupation and, for the offices, average temperature and day length.

$$ kWh_{amphitheater} = 0.21 + 0.66Day\_t $$
(19)
$$ kWh_{library} = 0.12 + 0.71Day\_t + 0.12Occp $$
(20)
$$ kWh_{offices} = 0.33 + 0.37Day\_t + 0.23Occp - 0.12Day\_l - 0.18\hat{T} $$
(21)
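To make the relative weights in Eqs. (19) to (21) concrete, the three regressions can be evaluated for a given set of inputs. The sketch below copies the coefficients from those equations; it assumes that all inputs are normalized to [0; 1] as described in Section 3.1, and the dictionary layout, the function name and the example input values are hypothetical.

```python
# Coefficients copied from Eqs. (19)-(21); keys follow the Table 1 abbreviations,
# with "T" standing for the average temperature.
MODELS = {
    "amphitheater": {"intercept": 0.21, "Day_t": 0.66},
    "library": {"intercept": 0.12, "Day_t": 0.71, "Occp": 0.12},
    "offices": {"intercept": 0.33, "Day_t": 0.37, "Occp": 0.23,
                "Day_l": -0.12, "T": -0.18},
}

def predict(space, inputs):
    """Evaluate the linear regression of one space for normalized inputs."""
    coefs = MODELS[space]
    return coefs["intercept"] + sum(
        value * inputs[name] for name, value in coefs.items() if name != "intercept"
    )

# Hypothetical day: highest day type, mid-range values for the other inputs.
example = {"Day_t": 1.0, "Occp": 0.5, "Day_l": 0.5, "T": 0.5}
for space in MODELS:
    print(space, round(predict(space, example), 3))
```

For the amphitheater and the library, the day type term alone contributes most of the predicted value on a fully scheduled day, which is consistent with day type being the dominant input.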

Training the Sugeno-type fuzzy models for the library and offices improved their performance, as depicted in Table 2, with the highest levels for 5 and 15 epochs, respectively. Finally, the feedforward backpropagation NN models had the highest performances when using two hidden layers, with 5, 10 and again 5 neurons in each layer for the three spaces, respectively.

Table 2 Performance indicators for the developed models (MAE, MSE, AE—absolute error, RE—relative error, median, VAF and R2) with respect to the experimental places (amphitheater, library, and offices)

Table 2 depicts the performance results of the best models for each space, outlining the values for model training and testing. The results can be analyzed together with Fig. 2, which shows the consumption profiles generated by the highest performance models and the real consumption profiles for all spaces. Regarding the amphitheater, further training the fuzzy model with a varying number of epochs leads to significant overfitting and, therefore, considerably lower performances, even resulting in negative VAFs, which means that no similarity exists between the modeled and the real profiles.

Fig. 2 Graphical representation of the modeled and the real electricity consumption for the three experimental spaces, concerning all models

The lowest performances are achieved in the amphitheater consumption models. For the linear regression and fuzzy models, performance is higher in training than in testing, whereas for the NN model it is higher in testing. This may be due to a weak relation between the inputs and the output (electricity consumption). In fact, the main consumers are illumination, a projector and the lecturer’s laptop. Their use may vary with the type of class (with different occupations), but occupation by itself was seen to decrease the performance of the model, which can be explained by the same illumination intensity being used regardless of whether the class has 50 or 10 students. A randomness factor therefore plays an important role in consumption behavior.

The highest performance levels are achieved in the library models, exceeding VAFs of 93 % and reaching an \( R^{2} \) of 91.0 % with the feedforward backpropagation NN. Nevertheless, since this is a space with high consumption intensity, the minimum error that we could achieve was 12.71 kWh/day.

Regarding the offices, quite acceptable performance values have been reached, exceeding VAFs of 90 % for the trained fuzzy model and the NN model, with corresponding \( R^{2} \) of 73.8 and 78.5 %.

All models seem to be only slightly biased, with the models for the offices showing the lowest bias. Generally, we can argue that the linear regression models show the lowest performance for each indicator, while the feedforward backpropagation NN models show the highest and are thus considered the most adequate to model electricity consumption in these experimental spaces.

The relative errors decrease the most in the NN models, which also translates into a lower total error in kWh and in monetary terms. As a simple exercise, considering a rate of 0.10 €/kWh, NN models would yield a decrease of 0.03 €/day, 1.40 €/day and 0.65 €/day for the class amphitheater, library and offices, respectively. The development of NN models requires a higher level of expertise and access to specialized software such as Matlab® (the main software used by the authors), so the investment in these modeling capabilities may only have a return on investment for an energy-intensive service building. Over a year, we estimate that the total decrease in error from linear regressions to NN models would be around 650 kWh for these three experimental spaces alone.

This work builds on the one presented in [19], in which the fuzzy models were not trained, data was not normalized to [0; 1], the offices were not considered and the input variable Day type had not been developed, although it is in fact the most relevant one for this experimental setup. Results improved markedly with these additional steps, leading to models with considerably better performances. The class amphitheater model is still far from what we desire but, as explained before, this may be related to the randomness of consumption behavior. A stronger relation between consumption and the input variables, e.g. between illumination and occupancy, would mean that the space is used more efficiently, as the equipment would be used not at maximum intensity but according to the needs.

5 Conclusions and Future Work

This paper presented the research developments on modeling electricity consumption in three experimental spaces in a Portuguese university building. A previous work presented in [19] serves as a preliminary study on how to model electricity consumption in this building. Results improved considerably over that preliminary approach, mainly due to the introduction of the variable Day type, which is an a priori representation of the expected occupancy of the experimental spaces, taking into consideration the operating schedule of the room and the season (classes, exams or weekends and holidays).

Energy modeling is a rapidly growing research field and several references outline the performance of different models. However, we could not find a work that specifically compares linear regression, fuzzy and NN models under the same experimental setup. This paper took up this challenge and identified NN as the models with the best performance values, reaching VAFs of 69.1, 95.3 and 91.5 %, and \( R^{2} \) of 20.7, 91.0 and 78.5 %, respectively for the amphitheater, library and the offices.

The models developed for the amphitheater still lack accuracy, which is explained by the randomness of consumption behavior. A deeper analysis is needed and, eventually, efficiency measures could be implemented that encourage users to change their consumption patterns according to the studied variables.

We argue that the investment in modeling capabilities to decrease the modeling error may give a feasible return, since the presented results roughly correspond to a decrease in error between linear regressions and NN models worth 650 €/year for the three presented spaces alone. Having developed more accurate models, we can now study the impact of the energy efficiency actions that have already been implemented in these spaces after this experimental procedure.