1 Introduction

According to the International Energy Agency (IEA) [6], in 2017 the building sector represented more than 30% of the total final energy consumption in the world. The report in [1] points out that the energy demand for air cooling in buildings tripled between 1990 and 2016. It is estimated that, if the current scenario does not change, this demand will triple by the year 2050, representing 37% of the increase in buildings’ electricity consumption. The potential increase in demand for artificial cooling in hot-climate countries is even more significant. According to [1], among the 2.8 billion people living in the hottest parts of the world, only 8% have an air conditioning system. In Brazil, in 2016, artificial cooling corresponded to 7.6% of the peak load of electricity grids, and the projection is that this share will reach 30.8% of the peak load by the year 2050 [1].

To ensure users’ thermal comfort without significant energy consumption, it is crucial to understand how thermal variations occur in a building. Analysis during the early design stages of a building informs decisions that are fundamental for thermal performance. In the early-design phase, the optimization potential is significant, and estimates of a building’s comfort and energy performance can be reflected in decision-making [3, 13].

The most advanced method of estimating thermal performance in buildings is through computer simulations. They use models based on physical heat transfer equations, following the principles of energy conservation. However, this process requires the technical knowledge of an expert, as dynamic thermal simulations require detailed models and face several problems, mainly associated with the information needed as input data for the simulated model [4]. In the Brazilian context, the analysis of a building’s thermal performance through computer simulations is a relevant measure, as the lack of accessible data on energy consumption patterns makes it difficult to conduct analyses from databases [2]. An alternative to overcome these issues is the development of models derived from computer simulations, known as metamodels. Through metamodels, it is possible to obtain energy performance results close to those of complex simulations.

Metamodels for energy efficiency in buildings can be developed using different methods [8]. The most appropriate solution depends on the context and purposes of each application. Versage [16] was able to estimate thermal loads for commercial buildings through different metamodeling methods. Melo et al. [7] developed an Artificial Neural Network (ANN) model to estimate the cooling degree-hours and the heating and cooling thermal loads in residential buildings. A Support Vector Machine (SVM) metamodel capable of estimating thermal comfort in commercial buildings was proposed by [12].

Energy consumption for cooling in buildings is significant worldwide, and the expectation is that the energy demand will continue to grow in the coming decades, especially in countries with hot climates. In this context, the use of construction techniques that minimize the need for air conditioning, through specific building components or geometry design, is presented as a solution to mitigate energy consumption. However, the thermal performance of buildings involves complex thermophysical phenomena, so thermal comfort estimates should preferably be considered from the early-design phases. For designers seeking a tool that can help them quickly and simply, metamodels arise as a good option.

Therefore, this work presents a comparison between two machine learning approaches for the development of a metamodel capable of estimating the thermal load in single-family buildings.

2 Review

Designers find it difficult to use building performance simulation tools, which may not be compatible with their needs and working methods. Picco et al. [10] propose to simplify the description of the building and convert a detailed model into a simplified model, with only a limited number of entries. Despite margins of error of around 15% in estimating annual heating and cooling thermal loads, it is observed that model simplifications can help in early design stages, when certain characteristics of the building design are not well defined. In addition to simplifications of models based on physical equations, it is possible to develop models based on statistical functions, which deduce these behaviors. Statistical models work only with inputs and outputs, without correlating cause and effect, although they are more agile. To combine the main strengths of both kinds of model, metamodels are introduced.

Machine learning-based metamodels are mathematical functions that, applied to a significant amount of data, can identify hidden patterns and predict future behavior. According to [18], the most used machine learning methods for predicting a building’s energy performance are ANN and SVM. These models effectively solve nonlinear problems and provide highly accurate predictions as long as the model and its parameters are properly defined. ANN models have been used to analyze various types of building energy consumption under various conditions, such as heating and cooling loads, electricity consumption, sub-level component operation and optimization, and estimation of usage parameters. The use of SVM has been growing in research and industry. In many cases, SVM models outperform ANN models, even with a small amount of training data.

Yigit [17] developed a metamodel using Gradient Boosting Machine (GBM), through the Sklearn library, to estimate cooling and heating loads in residential buildings in Turkey. According to the author, GBM models are based on decision tree models and have already demonstrated high performance and flexibility in several areas. These models can operate with a database of several variables, discrete and/or continuous, without the need for data pre-processing. The GBM method enables robust metamodels regardless of the number or type of parameters. Including parameters with low influence on the dependent variables does not compromise the model’s accuracy either. The author highlights the importance of choosing hyperparameters to ensure better performance of the models. The metamodel developed in [17] was applied in an optimization algorithm to find the most appropriate types of envelopes to minimize energy consumption. The results demonstrate that the use of the optimization system could bring design solutions with greater energy efficiency and low cost. Additionally, a higher-cost building envelope can reduce energy consumption by up to 10%. However, the author comments that the implementation of GBM in building energy optimization problems is still limited.

Versage [16] developed a metamodel to estimate the annual integrated cooling load to evaluate the energy performance of artificially conditioned buildings through the individual performance of their thermal zones. A database of approximately 1.29 million simulated cases was developed, with varied building parameters, for the climate of the Brazilian city of Florianópolis. A data sample was adopted for the elaboration of metamodels with the techniques of multiple linear regression, multivariate adaptive regression splines, Gaussian process, SVM, random forest, and ANN. To evaluate and compare the metamodels, four performance indicators were chosen: training time, coefficient of determination (R²), root-mean-square error (RMSE) and normalized root-mean-square error (NRMSE). The ANN metamodel presented the best performance among those tested. The ANN trained with 1% of the database cases and 72 nodes in the hidden layer presented the best overall performance. It was able to reproduce results with errors smaller than 10% for 99.2% of the cases. On the other hand, the metamodel built from SVM had the worst performance. However, the author highlights that other configurations and data processing could change the performance of the evaluated metamodels.

The literature shows that different machine learning methods can be used to develop metamodels related to energy efficiency in buildings. There is no rule for choosing the method, and the comparison between the performance of metamodels developed by different machine learning approaches may depend on the configuration of the hyperparameters. Furthermore, it may also depend on specific characteristics of the database used, such as the input and output parameters considered. Therefore, it may be pertinent to consider more than one type of machine learning technique when searching for the most suitable method.

3 Method

This work used a database composed of building thermal analyses to develop metamodels using two approaches: ANN and GBM. We compared the performance of the metamodels obtained with both approaches and could observe specific issues in the development of each of them. The model training was conducted using 19 parallel processes with a 3.30 GHz Intel Xeon processor. The code used to develop the metamodels is available at the author’s GitHub profile (see Footnote 1).

3.1 The Database

The database (see Footnote 2) used for this work is composed of simulations from the computer program EnergyPlus [5]. The simulation output data is the thermal load necessary to maintain the air temperature in the occupied thermal zones of the building between 21 °C and 23 °C. The simulations were based on a model (Fig. 1) of a single-family social housing building [15], in which several building characteristics related to the geometry of the envelope and the windows, as well as the construction components, were varied.

Fig. 1 Simulations’ base model. Adapted from [15]

Table 1 presents the quantitative parameters, with the maximum and minimum values considered in the sampling. Table 2 presents the variations of the qualitative parameters. Although qualitative, these variables are represented in the database by integer numbers, ranging from 1 to 6 for the wall, 1 to 4 for the roof, and 0 or 1 for blinds and geometry mirroring.

Table 1 Quantitative parameters considered in the database
Table 2 Database qualitative parameters

The 46 696 cases in the database were sampled using Sobol’s method [14], a quasi-random sampling method that ensures a better distribution of cases in the hyperspace. The distributions of the input variables were uniform. Figure 2 shows the occurrence distributions of the independent variables as well as of the dependent variable. The only input parameter that does not have a uniform distribution is the mean annual temperature. Although the mean annual temperature was uniformly sampled between 10.8 °C and 28.2 °C, the database of weather files in Brazil contains no cities with mean annual temperatures between 10.8 °C and 13.6 °C, and only three cities with mean annual temperatures between 13.6 °C and 15.3 °C. As a result, the sampled cases with temperatures in these ranges were simulated using the weather files whose mean annual temperatures were closest to the sampled value. The output parameter, the air conditioning thermal load, had a non-uniform distribution, which varied between 0 and 235 kWh/(m² year), with a mean of 53 kWh/(m² year) and a median of 43 kWh/(m² year).

Fig. 2 Occurrence distributions of the input parameters
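The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes SciPy’s `scipy.stats.qmc.Sobol` sampler, and the variable names, the bounds of the first two variables and the list of available weather-file temperatures are hypothetical placeholders (only the 10.8 to 28.2 °C temperature range comes from the text).

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical inputs: the first two names and bounds are placeholders,
# not the values of Table 1. Only the temperature range is from the text.
names = ["floor_area_m2", "window_to_wall_ratio", "mean_annual_temp_C"]
lower = [40.0, 0.1, 10.8]
upper = [120.0, 0.6, 28.2]

# Scrambled Sobol sequence: quasi-random points that cover the unit
# hypercube more evenly than independent random draws.
sampler = qmc.Sobol(d=len(names), scramble=True, seed=0)
unit_sample = sampler.random_base2(m=10)       # 2**10 = 1024 points
cases = qmc.scale(unit_sample, lower, upper)   # map to the physical ranges

# Replace each sampled temperature by the closest available weather file.
# These temperatures are invented for illustration.
available_temps = np.array([13.9, 16.5, 19.0, 21.4, 24.8, 27.1])
nearest = np.abs(cases[:, [2]] - available_temps).argmin(axis=1)
cases[:, 2] = available_temps[nearest]

print(cases.shape)
```

Snapping the sampled temperature to the nearest weather file is what makes that one input non-uniform even though it was sampled uniformly.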

3.2 ANN Development

The ANN metamodel was developed using the Sklearn library [9], available for Python [11]. Quantitative variables were standardized by calculating the z-score, according to Eq. 1, so that all parameters had the same order of magnitude.

$$Z = \frac{x-\mu}{\sigma}$$
(1)

where Z is the standardized value of the variable x, μ is the mean of the values of that variable, and σ is its standard deviation. The qualitative variables were transformed into dummy variables, so each column holding a qualitative parameter in the data frame was replaced by new columns that represent the different values of that parameter. For each row of the data frame, the value 1 is assigned to the column corresponding to the value of the qualitative parameter in that row, and the value 0 is assigned to the other columns of that parameter. The number of new columns introduced to replace the original column is equal to the number of unique parameter values minus 1. A case whose value has no column of its own is indicated by the value 0 in all columns of that parameter. It is also essential to observe the hyperparameters (parameters related to the machine learning process) chosen in developing the metamodel, which influence the accuracy of the results. Therefore, we conducted a grid search to find the best combination of hyperparameters.
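The pre-processing described above (z-score of Eq. 1 and dummy coding with one column dropped per parameter) can be sketched with pandas. The data frame, its column names and its values are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not state which estimator was used.

```python
import pandas as pd

# Toy data frame; names and values are illustrative, not the real database.
df = pd.DataFrame({
    "floor_area": [45.0, 60.0, 75.0, 90.0],
    "wall": [1, 3, 6, 3],      # qualitative, coded 1..6
    "roof": [1, 2, 4, 2],      # qualitative, coded 1..4
    "blinds": [0, 1, 0, 1],    # already binary, kept as one column
})
quantitative = ["floor_area"]
qualitative = ["wall", "roof"]

# Eq. 1: subtract the column mean and divide by the standard deviation.
mu = df[quantitative].mean()
sigma = df[quantitative].std(ddof=0)   # assumption: population std
df[quantitative] = (df[quantitative] - mu) / sigma

# Declare every possible level so the columns are stable even when a
# level is absent from the sample.
df["wall"] = pd.Categorical(df["wall"], categories=list(range(1, 7)))
df["roof"] = pd.Categorical(df["roof"], categories=list(range(1, 5)))

# k unique values -> k - 1 dummy columns; the dropped level is encoded
# as 0 in all remaining columns of that parameter.
encoded = pd.get_dummies(df, columns=qualitative, drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

With six wall types and four roof types, this yields five wall columns and three roof columns.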

The varied hyperparameters were:

  • hidden_layer_sizes: the number of hidden layers and number of nodes in those layers;

  • activation: the activation function used;

  • batch_size: the number of cases used in each iteration in the stochastic optimization process;

  • learning_rate_init: the learning rate;

  • tol: minimum value of reduction of the loss function over a specified number of iterations (for this study, the specified number of iterations was equal to 10).

The values applied in the grid to find the optimal combination of hyperparameters are presented in Table 3.

Table 3 Hyperparameters considered in the ANN grid search

Before the metamodel training step, the sample was divided into a data frame for training, with 90% of the cases, and a data frame for testing, with the remaining 10% of the cases. The performance indicator used to compare the different models generated by the grid was the mean of the cross-validation score. The ANN model with the best score in the training stage was chosen, and its performance was then analyzed with the training and test samples. The accuracy indicators of the final metamodel were the R², the RMSE, the mean absolute error (MAE) and the 95th percentile of the absolute error (AE95).
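The workflow above (90/10 split, grid search ranked by the mean cross-validation score, and the four accuracy indicators) can be sketched with scikit-learn’s `MLPRegressor` and `GridSearchCV`. This is an illustration under stated assumptions: the data are synthetic, the grid is a reduced placeholder rather than the values of Table 3, and the number of folds and the R² scoring are choices made here, not taken from the text. The helper `accuracy_indicators` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the pre-processed database (not the real data).
X, y = make_regression(n_samples=400, n_features=6, noise=5.0, random_state=0)
y = (y - y.mean()) / y.std()   # keep the toy target on a small scale

# 90% of the cases for training, 10% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# Reduced placeholder grid; Table 3 is not reproduced here.
param_grid = {
    "hidden_layer_sizes": [(16,), (16, 16)],
    "learning_rate_init": [0.001, 0.01],
}
ann = MLPRegressor(activation="relu", batch_size=32, tol=1e-4,
                   n_iter_no_change=10, max_iter=300, random_state=0)

# Assumption: 3 folds and R2 scoring; the test sample is never seen here.
search = GridSearchCV(ann, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

def accuracy_indicators(model, X_part, y_part):
    """Return R2, RMSE, MAE and AE95 for one sample."""
    pred = model.predict(X_part)
    abs_err = np.abs(y_part - pred)
    return {
        "R2": r2_score(y_part, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_part, pred))),
        "MAE": mean_absolute_error(y_part, pred),
        "AE95": float(np.percentile(abs_err, 95)),
    }

train_scores = accuracy_indicators(search.best_estimator_, X_train, y_train)
test_scores = accuracy_indicators(search.best_estimator_, X_test, y_test)
print(search.best_params_, round(search.best_score_, 3))
print(train_scores)
print(test_scores)
```

Setting `n_jobs` on `GridSearchCV` is one way to spread the grid over parallel processes.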

3.3 GBM Development

The GBM metamodel was also developed using the Sklearn library [9], available for Python [11]. As this approach does not require data pre-processing, quantitative variables were not standardized, and qualitative variables were not transformed into dummy variables. Hyperparameters were also defined through a grid search. The varied hyperparameters were:

  • n_estimators: the number of estimators, or trees;

  • max_depth: the maximum depth of each tree;

  • min_samples_split: the minimum number of cases required to split an internal node;

  • loss: the loss function. ls refers to the least squares regression, lad refers to the least absolute deviation function, and huber refers to a method that combines the previous two;

  • learning_rate: the learning rate.

The values applied in the grid to find the optimal combination of hyperparameters for GBM are presented in Table 4.

Table 4 Hyperparameters considered in the GBM grid search

The sample was also divided into a data frame for training, with 90% of the cases, and a data frame for testing, with the remaining 10% of the cases. As for the ANN, the performance indicator used to compare the different models generated by the grid was the mean of the cross-validation score. The accuracy indicators of the final metamodel were the R², the RMSE, the MAE and the AE95.
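A GBM grid search over the hyperparameters listed above can be sketched with scikit-learn’s `GradientBoostingRegressor`. The data, the response function and the grid values are invented for illustration and have no physical meaning; Table 4 is not reproduced. Only the `huber` loss is used here because, in recent scikit-learn releases, the names `ls` and `lad` were replaced by `squared_error` and `absolute_error`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic inputs used as-is: no standardization, and the qualitative
# wall code is left as an integer column.
rng = np.random.default_rng(0)
n = 500
area = rng.uniform(40.0, 120.0, n)
temp = rng.uniform(14.0, 28.0, n)
wall = rng.integers(1, 7, n)             # coded 1..6
X = np.column_stack([area, temp, wall])

# Made-up response for illustration only.
y = (3.0 * np.clip(temp - 18.0, 0.0, None) + 0.1 * area
     + 2.0 * wall + rng.normal(0.0, 1.0, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# Reduced placeholder grid (2 x 2 x 2 = 8 candidate models).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "min_samples_split": [2, 10],
    "learning_rate": [0.1],
    "loss": ["huber"],
}
gbm = GradientBoostingRegressor(random_state=0)
search = GridSearchCV(gbm, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

print(search.best_params_)
print("mean CV score:", round(search.best_score_, 3))
print("test R2:", round(search.best_estimator_.score(X_test, y_test), 3))
```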

4 Results

4.1 ANN Results

Figure 3 shows the input data distribution after applying the z-score standardization. It is possible to observe that the quantitative input parameters now present values of the same order of magnitude, with a mean equal to zero.

Fig. 3 Occurrence distributions of the standardized input parameters

The parameters Wall, Roof, Blinds and Mirrored geometry were transformed into dummy variables. Therefore, the parameter related to wall components was replaced by five columns, and the parameter related to roof components was replaced by three columns. The parameters related to the blinds and the mirrored geometry, which take only the values 0 or 1, were kept with one column each. With the combination of the grid hyperparameters, 162 models were trained. The total training time was 2 hours and 17 minutes. The chosen model obtained a value of 0.943 for the mean of the cross-validation score and had the training process interrupted after 310 iterations. The hyperparameters of the chosen model are presented in Table 5.

Table 5 Hyperparameters of the final ANN model

Figure 4 presents the scatter plots comparing the predicted cases with the simulated cases for the training and test samples. The dotted line corresponds to the line y = x, so the accuracy of the results is related to the proximity of the points to the line. Similar trends can be observed in both graphs, but the test sample presents a proportionally larger number of points away from the dotted line.

Fig. 4 Comparison between predictions and simulated values for the ANN (a) training and (b) test samples

The performance indicators of the metamodel are presented in Table 6.

Table 6 Performance indicators of the final ANN model for training and testing samples

4.2 GBM Results

With the combination of the grid hyperparameters, 243 models were trained by the GBM. The total training time was 17 hours and 1 minute. The chosen model had the value of 0.991 for the mean of the cross-validation score. The hyperparameters of the chosen model are presented in Table 7.

Table 7 Hyperparameters of the final GBM model

Figure 5 presents the scatter plot comparing the predicted cases with the simulated cases for the training and test samples.

Fig. 5 Comparison between predictions and simulated values for the GBM (a) training and (b) test samples

The performance indicators of the final metamodel are presented in Table 8.

Table 8 Performance indicators of the final GBM model for the training and test samples

5 Discussion

Despite significant differences between the performance indicators of the evaluated metamodels, both machine learning approaches have characteristics that can be considered adequate. The choice of the approach depends on the purpose for which the metamodel is developed. The first difference that stands out is the need to pre-process the input data for the ANN, while for the GBM there is no such need. Defining how to pre-process each input variable of the metamodel could take a significant amount of time, as the ideal processing method varies according to the variable type. In this work, we used only z-score standardization for quantitative variables and the transformation of qualitative variables into dummy variables. However, other types of data pre-processing can be applied to different types of variables.

Another aspect that may be relevant is the training time of the metamodels. The ANN required significantly less training time (2 hours and 17 minutes) compared to GBM (17 hours and 1 minute).

Considering that we trained the metamodels on a machine with above-average processing and parallelization capacity, the need for agility in the development of the metamodel could make the GBM application unfeasible, as its training can take more than 12 hours to complete. This indicates that the ANN can be a more practical metamodel to use in a professional routine when the difference between the indicators’ results is small.

Regarding the metamodels’ performance, the GBM presented indicators considerably better than those of the ANN. Both presented acceptable R² values, for the training sample as well as for the test sample. The difference between the scatter plots of Figs. 4 and 5 shows that the GBM metamodel has better accuracy, with points that are closer to the dotted line along the entire range of thermal load values considered. In the case of the ANN, the model still seems to have a bias in the ranges of higher thermal load values, causing the metamodel to underestimate the output value in the predictions. Despite the differences, in both metamodels the values of MAE and RMSE are small in relation to the estimated output parameter. The AE95 also remains in the same order of magnitude as the MAE and RMSE, which indicates that the accuracy of the metamodel is not significantly compromised for at least 95% of the cases in the test sample.

Both analyzed metamodels present overfitting, as the indicators were less accurate for the test sample than for the training sample. The reasons for overfitting differ between machine learning methods. In the case of the ANN, a number of nodes greater than necessary in the hidden layers can be the cause, as well as the variable tol, which affects the number of iterations in the training process. In the case of the GBM, the increase in the number of estimators can cause overfitting. After a certain number of estimators, there is little improvement in the model’s performance during the training process, at which point overfitting may occur. Since the criterion for choosing the best model within the grid does not consider the test sample, there may be models with a mean cross-validation score close to those of the chosen models that could present superior performance for the test sample. The differences between the indicators obtained from the training and test samples were proportionally larger for the GBM: for the RMSE, MAE, and AE95, the test-sample values were more than twice those of the training sample. However, the performance of the GBM was significantly superior to that of the ANN, since its test-sample indicators were considerably more accurate than those of the ANN.
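The effect of the number of estimators on the gap between training and test errors can be inspected with `staged_predict`, which yields the predictions of a fitted `GradientBoostingRegressor` after each boosting stage. The sketch below uses a synthetic benchmark (`make_friedman1`) and arbitrary hyperparameters, so its numbers say nothing about the metamodels of this work. The helper `rmse` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic benchmark data with noise (not the thermal load database).
X, y = make_friedman1(n_samples=400, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                learning_rate=0.1, random_state=0)
gbm.fit(X_train, y_train)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Error after each boosting stage, for both samples.
train_rmse = [rmse(y_train, p) for p in gbm.staged_predict(X_train)]
test_rmse = [rmse(y_test, p) for p in gbm.staged_predict(X_test)]

best_n = int(np.argmin(test_rmse)) + 1     # stage with the lowest test RMSE
gap = test_rmse[-1] / train_rmse[-1]       # test/train ratio at the last stage

print("estimators with lowest test RMSE:", best_n)
print("test/train RMSE ratio at 300 estimators:", round(gap, 2))
```

A training error that keeps falling while the test error flattens is the pattern described above. Note that using the test sample to pick the number of estimators would leak information into model selection, so a separate validation split would be needed in practice.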

6 Conclusion

The development of metamodels can help building design, facilitating the estimation of thermal load demand. The metamodels presented in this work were developed employing the GBM and ANN methods, implemented with the Python Sklearn library.

The two machine learning approaches presented in this work have specific characteristics that may suit different application contexts. The ANN requires data pre-processing, which can make its application more complex compared to the GBM, which does not require this pre-processing. On the other hand, training GBM models is significantly more computationally expensive than training ANN models.

The use of a grid search made it possible to find the best combination of hyperparameters for the development of the metamodels. However, this method defines the final model without considering the test sample, which resulted in overfitting. A more rigorous adjustment of the hyperparameters that influence overfitting could be explored in future work, leading to more robust metamodels for unseen cases.

According to the performance indicators evaluated (R², RMSE, MAE and AE95), the metamodels were able to adequately estimate the thermal loads. However, the GBM metamodel showed considerably better indicators than the ANN metamodel, both for the training sample and for the test sample. Therefore, the GBM metamodel was the most suitable for the application proposed in this work.

Although the GBM model obtained better performance indicators than the ANN model in this study, other configurations of the hyperparameters and other pre-processing of the input and output data could result in better performance for the ANN model. Since there are no definitive answers to the best configuration for these machine learning methods, the results are limited to the optimization methods applied and the computational power available.