Abstract
Hospitals are among the most energy-consuming commercial buildings in many countries. As highly complex organizations, they require continuous energy use and show great variability in their usage characteristics. The development of machine learning techniques offers opportunities for predicting the energy consumption of hospitals. Taking a hospital building in Norway as a case study and analyzing its characteristics, this paper focuses on predicting energy consumption with machine learning (ML) methods, based on historical weather data and monitored energy use data from the last four consecutive years. A detailed machine learning framework was proposed in six steps: data collecting, preprocessing, splitting, fitting, optimizing and estimating. The results show that, in the Norwegian hospital, electricity was the energy type in highest demand in the main buildings, accounting for 55% of total energy use, higher than district heating and cooling. By optimizing the hyper-parameters, this paper selected specific model parameters to predict electricity use with high accuracy. It is concluded that the Random forest and AdaBoost methods were much better than decision tree and bagging, especially in predicting the lower energy consumption.
Supported by the Norwegian University of Science and Technology (NTNU), the St. Olavs Hospital in Norway, the China Scholarship Council (CSC), the National Key R&D Program of China (No. 2018YFD1100704) and the Graduate Scientific Research and Innovation Foundation of Chongqing (No. CYB17006).
1 Introduction
As buildings have become one of the largest energy consumers in the world, energy saving in buildings, and hence the prediction of their energy use, is of great significance [1]. From the perspective of function, technology, economy, management and procedures, a hospital can be defined as a highly complex organization in which continuous energy use is required, including electricity, heating and cooling [2]. Some researchers have highlighted that hospitals are the second most energy-consuming commercial buildings in many countries, after food service buildings [3].
Because of special requirements for clean air and infection control, hospital buildings require continuous power to support reliable services 24 h per day. According to a survey from the European Union, hospital buildings are responsible for 10% of the total energy use while accounting for about 7% of the non-residential buildings in the whole EU [4]. Based on detailed running data from a large number of devices in a Norwegian hospital, Tarald Rohde [5] concluded that the majority of large medical imaging devices are in use only during the daytime. In that article, the usage patterns of this equipment showed great variability between day and night, as well as between weekdays and weekends, which deserves particular attention. The same results were indicated in the study of K.B. Lindberg [6], which monitored over 100 non-residential buildings from all over Norway. It noted that hospital buildings need to be estimated separately for working days and weekends because of their different heating load patterns.
In the past, there have been many studies on short-term load forecasting [7], often using statistical models [8], state-space methods [9], fuzzy systems [10] and artificial neural networks (ANN) [11]. Much of the literature mentioned by K.B. Lindberg [6] calculates the heat load of a building with a simplified electric equivalent or with a detailed building simulation model based on assumptions about the building’s characteristics. However, according to Richalet et al. [12], a methodology for load and energy predictions should be based on measured energy data (i.e. statistical models), because the real behavior of a building can differ significantly from its design owing to the varying operation of the building’s energy system.
Nowadays, as described by Wenqiang Li et al., with the development of machine learning techniques more new models will be invented, and the rapid developments in big data offer opportunities for the effective use of these models for prediction. As a data-driven approach, machine learning relies mainly on the operating data of building HVAC systems and is increasingly common in predicting building energy consumption [13].
Recently, many researchers have used machine learning methods to predict energy consumption based on weather station data for a case building or HVAC system. Using machine learning, Liu, J.-Y. et al. built a prediction model relating the indoor temperature in a subway station to three parameters (outdoor dry-bulb temperature, passenger flow and supply air temperature). By setting different time delays in the time-series prediction, it was found that support vector regression (SVR) obtained better accuracy than the back propagation neural network (BPNN) and the classification and regression tree (CART), indicating that this data mining model had the best prediction accuracy as well as the highest efficiency [14]. R. Sendra-Arranz et al. used a recurrent neural network (RNN) model, implemented with the Python neural network library PyTorch, to forecast the power consumption in a self-sufficient solar house. The model receives six input variables: three outdoor weather parameters (the outdoor temperature, the relative humidity and the irradiance) and three indoor variables (the indoor CO2 level, the indoor temperature of the house and the reference temperature set by the user). The most accurate model reaches a test Pearson correlation coefficient of 0.797 and a normalized root mean square error (NRMSE) of 0.13 [15].
Additionally, because HVAC systems produce a large amount of running data during operation, much of the literature has focused on the equipment level, predicting energy consumption and even carrying out fault detection, diagnosis and optimization to improve system efficiency on the basis of mature and rich data acquisition systems. Tao Liu et al. applied a machine learning algorithm, namely Deep Deterministic Policy Gradient (DDPG), for the first time to short-term HVAC system energy consumption prediction using three weather parameters (outdoor temperature, relative humidity and wind speed). The results demonstrate that the proposed DDPG-based models can achieve better prediction performance than common supervised models such as the BP neural network and the support vector machine [16]. In order to obtain a better model, Yao Huang et al. proposed an ensemble learning method and selected 10 original variables, including system operating parameters and meteorological parameters, for an energy consumption prediction model for residential buildings with ground source heat pumps (GSHP). Four machine learning methods were included: ELM, MLR, XGB and SVR. The results showed that the proposed prediction model based on ensemble learning could reduce the MAE of the testing set predictions, with reductions ranging from 29.1% to 70% [17]. In addition, focusing on a fan-coil unit, Yaser I. Alamin et al. collected a historical dataset from the CIESOL building in Spain, including the impulse air velocity and the current indoor air temperature, over the period from 2013.05 to 2014.04. An ANN model was developed to predict and assess the fan-coil power demand. The resulting model, called RBFANN, is very simple, and the computational resources required for its application are small and easily available in modern automation systems [18].
Specifically for hospital buildings, A. Bagnasco et al. proposed a multi-layer perceptron ANN, based on a back propagation training algorithm, to forecast electrical consumption in Turin, taking as inputs the loads, the type of day (e.g. weekday/holiday), the time of day and weather data. The good performance achieved (mean MAPE and PE5% values close to 7% and 60%, respectively) suggests that a similar approach could be applied to forecast the energy loads of other building categories (e.g. domestic, industrial). However, that article only predicts the electrical load, which has seasonal variation [2].
Recently, machine learning methods have become relatively mature in the energy prediction area, giving more accurate and faster predictions by adjusting parameters and using simpler features. However, most of the literature uses deep learning methods such as ANN, which are relatively complex and, to achieve higher precision, need to be integrated with other optimization algorithms, which takes a lot of time. Therefore, this paper focuses on hospital energy consumption prediction and aims to build a simple method based on machine learning, with the following three questions to be answered:
- What are the characteristics of hospital energy use in Norway?
- How can the energy consumption be predicted using the novel machine learning method?
- Which is the best model among the selected machine learning methods, and what are its best parameters?
2 Method Development
2.1 Data Collection
The data set consists of two parts: the meteorological data and the hourly energy consumption data monitored by the energy meters of the hospital building. The meteorological data, used as the input parameters for predicting the energy consumption, include outdoor temperature, outdoor relative humidity, wind speed, global radiation and longwave radiation. All meteorological data were collected from the Norwegian Centre for Climate Services (NCCS) through a weather station about 4 km away from the hospital. The energy data, used as the output of the prediction, were downloaded at the same interval from the hospital energy management system. All the HVAC devices, for example the pumps, ventilation and heat pumps, are monitored by electric meters installed by the hospital technicians, and the data are uploaded to the system at an interval of 15 min. This paper downloaded the dataset from 2016.01.01 to 2020.01.01 for St. Olavs Hospital in Trondheim, Norway, as shown in Table 1.
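As a minimal sketch of how the two sources could be brought onto one hourly time base, the example below uses synthetic values and hypothetical column names. It assumes that each meter reading is the energy used in its 15 min interval, so that four readings are summed per hour; if the meters reported power instead, a mean would be the appropriate aggregation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two sources; all column names are hypothetical.
rng = np.random.default_rng(0)
meter_index = pd.date_range("2016-01-01", periods=4 * 48, freq="15min")
meter = pd.DataFrame(
    {"electricity_kwh": rng.uniform(20, 40, len(meter_index))},
    index=meter_index,
)

weather_index = pd.date_range("2016-01-01", periods=48, freq="h")
weather = pd.DataFrame(
    {
        "outdoor_temp": rng.normal(-2, 5, len(weather_index)),
        "rel_humidity": rng.uniform(50, 100, len(weather_index)),
    },
    index=weather_index,
)

# Assumption: readings are energy per 15 min interval, so the four readings
# in each hour are summed to give one hourly value.
hourly_energy = meter.resample("h").sum()

# An inner join keeps only the hours that are present in both sources.
dataset = weather.join(hourly_energy, how="inner")
```

The joined table then has one row per hour, with the weather columns as candidate inputs and the hourly energy column as the prediction target.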
2.2 Model Development
The machine learning methods used in this paper were the decision tree and the ensemble algorithms that optimize it.
(1) Decision Tree
The decision tree methodology is one of the most commonly used data mining methods. It uses a flowchart-like tree structure to segregate a set of data into various predefined classes, thereby providing the description, categorization and generalization of given datasets. As a logical model, a decision tree shows how the value of a target variable can be predicted from the values of a set of predictor variables. More details about this algorithm can be found in [19].
(2) Ensemble Algorithms
Ensemble algorithms aim to construct a set of classifiers or regressors in order to build a more robust and higher-performance classifier or regressor. Bagging and boosting are the two main ensemble methods, and both can be combined with basic learners [20]. The base models in this paper were the decision trees described above.
a. Bagging-decision tree
Bagging, namely bootstrap aggregating, was proposed by Breiman to obtain an aggregated predictor. Bagging combines weak learners trained on different bootstrap samples to reduce the prediction errors [21]. Moreover, Bagging can be used with any base classification technique, and it votes over the classifiers generated from the different bootstrap samples. The core of the Bagging algorithm is majority voting over the results from a substantial number of bootstrap samples [22].
b. Random Forest
In contrast to the bagging mentioned above, Random Forest (RF) randomly selects only a few features during training, thereby achieving a lower error level and requiring less running time than when all the features are used [21].
c. AdaBoost
Boosting is one of the most powerful machine learning algorithms developed in the past few years. The idea of the boosting algorithm is to generate a series of basic learners by re-weighting the samples in the training set. AdaBoost, with an exponential loss function, is the most widely used form of boosting. More details about this algorithm can be found in [23].
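The four methods described above can be set up side by side in scikit-learn. The following is a minimal sketch on synthetic data, not the configuration used in this paper: the hyper-parameter values are placeholders, and the data merely stand in for the weather features and the energy load.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for weather features and energy load.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=300)

# Every ensemble below is built on decision trees as base learners.
# The hyper-parameter values are placeholders, not the paper's settings.
models = {
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "bagging": BaggingRegressor(
        DecisionTreeRegressor(), n_estimators=20, random_state=0
    ),
    "random_forest": RandomForestRegressor(
        n_estimators=20, max_features=2, random_state=0
    ),
    "adaboost": AdaBoostRegressor(
        DecisionTreeRegressor(max_depth=4),
        n_estimators=20,
        loss="linear",
        random_state=0,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R2 on the training data
```

Note that `BaggingRegressor` trains every tree on a bootstrap sample using all features, whereas `RandomForestRegressor` additionally restricts the candidate features through `max_features`, and `AdaBoostRegressor` fits its trees one after another on re-weighted samples.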
2.3 Data Preprocessing
In machine learning, data preprocessing is a vital process which, according to some articles, may account for as much as 80% of the whole work, especially for on-site measurement data from HVAC systems [17]. For most machine learning methods, the extent of data preprocessing determines the accuracy of the model’s predictions. The raw data set was preprocessed in two steps.
- Step 1: data cleaning. By analyzing the data distribution, the missing values, null values and abnormal values, which are very common in data collected from meters, were processed before prediction using pandas, a widely used data analysis toolkit in Python.
- Step 2: feature analysis. Since the data set contains diverse data types, as listed in Table 1, the original data had to be transformed in order to train the model correctly. For string data such as date and time, this paper converted the time information into the periodic fields “month”, “day” and “hour”, which were then encoded manually with sin/cos functions so that the data are mapped onto a period between −2*pi and 2*pi. The field “year”, which has only four category values, was instead transformed into numeric values by creating a binary column for each category (one-hot encoding), which returns a sparse matrix, using the encoder in scikit-learn, a machine learning package in Python. For the five input parameters in Table 1, the daily average and daily variance were calculated as new features in order to capture the seasonal characteristics more intuitively. In addition, each variable was standardized before training by removing the mean and scaling to unit variance [24]. In total, through this preprocessing, including data standardization and data encoding, 25 model input features were derived to achieve a more accurate prediction. The data preprocessing process is shown in Fig. 1.
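The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions rather than the exact pipeline of this paper: the data are synthetic, the column names are hypothetical, cleaning is reduced to dropping missing rows, and `pandas.get_dummies` is used in place of a scikit-learn encoder for brevity. Under these assumptions the feature count comes to 25 (5 raw variables, 10 daily statistics, 6 sin/cos columns and 4 year columns), which matches the stated total, although the text does not spell out its own accounting.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic hourly weather table; the column names are hypothetical.
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-01", "2019-12-31 23:00", freq="h")
weather_cols = ["temp", "humidity", "wind", "global_rad", "longwave_rad"]
df = pd.DataFrame(rng.normal(size=(len(index), 5)), index=index, columns=weather_cols)

# Step 1 (cleaning): dropping rows with missing values is one simple option.
df = df.dropna()
features = df.copy()

# Step 2a: cyclical encoding. Each periodic field is mapped onto the unit
# circle so that, for example, hour 23 and hour 0 end up close together.
for name, values, period in [
    ("month", df.index.month, 12),
    ("day", df.index.day, 31),
    ("hour", df.index.hour, 24),
]:
    angle = 2 * np.pi * np.asarray(values) / period
    features[f"{name}_sin"] = np.sin(angle)
    features[f"{name}_cos"] = np.cos(angle)

# Step 2b: one binary column per year (four categories, 2016 to 2019).
year = pd.Series(df.index.year.to_numpy(), index=df.index)
features = features.join(pd.get_dummies(year, prefix="year").astype(float))

# Step 2c: daily mean and variance of each variable, repeated for every hour.
day_key = df.index.normalize()
for col in weather_cols:
    grouped = df[col].groupby(day_key)
    features[f"{col}_daily_mean"] = grouped.transform("mean")
    features[f"{col}_daily_var"] = grouped.transform("var")

# Step 2d: standardize every column to zero mean and unit variance.
X = StandardScaler().fit_transform(features)
```

Encoding each periodic field as a sine and cosine pair avoids the artificial jump between, say, December and January that a plain integer encoding would introduce.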
2.4 Fit Model and Hyper-Parameters Optimization
Before fitting the models, the data set was divided randomly into training data and testing data. The first 75% of the data was selected for training, while the remaining 25% was used for testing. After the data division, scikit-learn was used to fit the training data. In addition, cross validation and the grid search technique were used to select the best combination of hyper-parameters for each prediction method [16]. The hyper-parameters to be optimized are detailed in Table 2.
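A minimal sketch of the split and the cross-validated grid search is given below, using synthetic data and a hypothetical grid (the grid actually searched is the one in Table 2). The split here is shuffled, which is an assumption; passing `shuffle=False` to `train_test_split` would instead keep the first 75% of the rows for training in their original order.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the preprocessed features and the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.1, size=400)

# 75% / 25% split; shuffle=True gives a random division (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, random_state=0
)

# Hypothetical grid for illustration only.
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}

# Every parameter combination is scored by 3-fold cross validation on the
# training data; the best one is refitted on the full training set.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.best_estimator_.score(X_test, y_test)
```

Because the test set is never seen during the search, `test_r2` gives an estimate of how the selected model performs on unseen data.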
2.5 Evaluation Index of Model Performances
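In this paper, the MSE (mean squared error), MAE (mean absolute error) and R2 score (coefficient of determination) were applied to assess the performance of the predictive models. The indices are formulated as below:
\[ \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( Y_{i} - P_{i} \right)^{2} \]
\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| Y_{i} - P_{i} \right| \]
\[ R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( Y_{i} - P_{i} \right)^{2}}{\sum_{i=1}^{n} \left( Y_{i} - \bar{Y} \right)^{2}} \]
where \( Y_{i} \), \( P_{i} \) and \( \bar{Y} \) represent the measured value, predicted value and average measured value, respectively, and \( n \) is the number of samples. In the best prediction, the predicted values exactly match the measured values, which results in R2 = 1.0.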
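The three indices can be computed directly from their standard definitions. The short sketch below does so with NumPy on a few illustrative numbers (not measured data); the function name `regression_metrics` is hypothetical, and the scikit-learn functions are imported only so that the results can be cross-checked against them.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R2) computed directly from the definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)
    mae = np.mean(np.abs(residual))
    # R2 compares the squared residuals with the spread around the mean.
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, r2


# Illustrative values only.
measured = [100.0, 120.0, 90.0, 110.0]
predicted = [98.0, 125.0, 92.0, 108.0]
mse, mae, r2 = regression_metrics(measured, predicted)
# mse = 9.25, mae = 2.75, r2 = 0.926
```

The MSE penalizes large errors more strongly than the MAE, which is why the two are usually reported together.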
In addition, Python-based visualization methods such as box plots were used to compare the energy distribution of the predicted results in different periods.
2.6 The Framework of Machine Learning to Predict
Following the previous description, the six-step framework of the prediction process is shown in Fig. 1.
3 Case Study
The University Hospital of St. Olav (see Fig. 2) opened on 1 February 2010 and is integrated with the Norwegian University of Science and Technology. St. Olavs is the local hospital for the population of Trøndelag, Norway, with close to 1000 beds. It contains about 20 clinics and departments, including the Lab Center (25556 m2), Gastro Center (31500 m2), Acute, Heart and Lung Center (40093 m2), Woman-Child Center (31427 m2), Knowledge Center (17354 m2) and Movement Center (19304 m2), with a total area of nearly 250000 m2. The hospital has built a very detailed energy management system to collect energy data in order to maintain the HVAC system.
Figure 3 illustrates the composition of the three kinds of energy demand per m2 in each of the six main hospital buildings during 2019. In terms of total energy consumption, the Lab Center and Gastro Center were much higher than the others, since these two centers differ from the rest not only in running time but also in some of their equipment. It was also found that all of the buildings have a larger energy consumption than the requirement for hospitals in the Norwegian Building Technical Regulations (TEK17), which sets a limit value of 225 kWh/m2 [25]. This reflects a large potential for saving energy in hospital buildings. On the other hand, in almost all of the main buildings of the Norwegian hospital, electricity was the energy type in highest demand, at 55% on average, higher than district heating and cooling. The Knowledge Center had a relatively balanced demand across the energy types.
4 Results
According to the description in Sect. 2.1, five parameters were contained in the data set from 2016 to 2020. This section explores the data set and compares the predicted and actual energy consumption. For this comparison, four specific time periods were selected from the data set. Their counts and information are listed in Table 3.
(1) Comparison of the Actual and Predicted Data
Table 4 shows the actual and predicted data for the four machine learning methods. Firstly, it is easy to see that weekday electricity use was clearly higher than weekend use, by about 50 kWh on average. This result is determined by the schedules of hospital staff and patients. In Norway, people prefer to enjoy their weekend time, unlike in some countries where the weekday workload is heavy. The decrease in medical activity brought a decline in the use of energy-consuming devices. Similarly, winter consumption was higher than summer consumption, by about 25 kWh on average, which is affected by a longer winter period compared with other countries.
Secondly, among the different models, decision tree and AdaBoost were much more accurate, not only for the higher electricity use (weekday and winter) and the lower electricity use (weekend), but also for the electricity distribution, as the box plots show. However, the decision tree gave a worse prediction for outlier data, for example the weekend data. In addition, Bagging was a poor estimator: its predictions were higher than the actual data on weekends and lower on weekdays. Traditional prediction methods such as linear regression always achieve a simpler and quicker prediction of the data trend, with normally distributed errors between the actual and predicted data. Indeed, they show good results when the data present a regular distribution. However, when the dataset becomes more complex, for example a time-series dataset, the ML methods make each predicted value as close to the actual value as possible and can thus reach a smaller error.
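A comparison by period of this kind can be sketched with pandas boolean masks. The example below uses synthetic values, and the period definitions (Monday to Friday as weekdays, December to February as winter, June to August as summer) are assumptions made for illustration; the periods actually used are those listed in Table 3.

```python
import numpy as np
import pandas as pd

# Synthetic hourly results; the values and column names are illustrative only.
rng = np.random.default_rng(0)
index = pd.date_range("2019-01-01", "2019-12-31 23:00", freq="h")
results = pd.DataFrame({"actual": rng.uniform(300, 400, len(index))}, index=index)
results["predicted"] = results["actual"] + rng.normal(scale=5, size=len(index))

# Assumed period definitions (not taken from the paper).
masks = {
    "weekday": index.dayofweek < 5,
    "weekend": index.dayofweek >= 5,
    "winter": index.month.isin([12, 1, 2]),
    "summer": index.month.isin([6, 7, 8]),
}

# Mean actual and predicted value per period, one row per period.
summary = pd.DataFrame(
    {
        name: results.loc[mask, ["actual", "predicted"]].mean()
        for name, mask in masks.items()
    }
).T
summary["count"] = [int(mask.sum()) for mask in masks.values()]
```

The same masks can be reused to draw one box plot per period, for example with `results.loc[mask].boxplot()`, so that the distributions and not only the means are compared.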
(2) Evaluation of Results
This paper also used three metrics (MSE, MAE and R2 score) to evaluate the predicted results on the training data and test data by comparing all of the actual and predicted data, see Fig. 4. The Random forest and AdaBoost methods were much better than decision tree and bagging, especially on the test data. Through hyper-parameter optimization, the best estimator for the Random forest method was obtained with (n_estimators: 90; max_features: 5), while for the AdaBoost method the best estimator was (n_estimators: 80; loss: linear). According to the MSE and MAE, better prediction results occurred on the training dataset, except for the Bagging method. For regression problems, the Bagging algorithm usually uses averaging: the regression results obtained by the weak learners are arithmetically averaged to obtain the final model output, which gives close values on the training and testing datasets. However, because of the large variance over the whole year in the electricity data between weekdays and weekends, the Bagging algorithm gives weak prediction results on this sort of dataset.
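As an illustration of how such a comparison between training and test data can be produced, the sketch below fits the two recommended models with the hyper-parameters reported above and collects the three metrics for both splits. The data are synthetic, and all settings other than the reported ones are left at the scikit-learn defaults, which is an assumption; the numbers it produces say nothing about the hospital data.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with 25 features, standing in for the preprocessed inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 25))
y = 50.0 * X[:, 0] + 20.0 * np.sin(X[:, 1]) + rng.normal(scale=2.0, size=600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Reported hyper-parameters; all remaining settings are scikit-learn
# defaults (an assumption). AdaBoost's default base learner is a shallow
# decision tree.
models = {
    "random_forest": RandomForestRegressor(
        n_estimators=90, max_features=5, random_state=0
    ),
    "adaboost": AdaBoostRegressor(n_estimators=80, loss="linear", random_state=0),
}

splits = {"train": (X_train, y_train), "test": (X_test, y_test)}
report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, (X_split, y_split) in splits.items():
        pred = model.predict(X_split)
        report[(name, split)] = {
            "mse": mean_squared_error(y_split, pred),
            "mae": mean_absolute_error(y_split, pred),
            "r2": r2_score(y_split, pred),
        }
```

A large gap between the training and test entries of `report` for one model would point to overfitting, while close values indicate that the model generalizes in a stable way.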
5 Conclusions
A machine learning method was proposed in this paper to predict hospital building energy consumption in the cold Nordic climate. Four methods were used to predict the electricity consumption. The main conclusions are as follows:
(a) All of the buildings have a larger energy consumption than the requirement for hospitals in the Norwegian Building Technical Regulations, which sets a limit value of 225 kWh/m2. Electricity was the energy type in highest demand in the main buildings of the Norwegian hospital, at 55% on average, higher than district heating and cooling.
(b) Because of the fluctuating character of the energy consumption, the predicted data need to match not only the higher actual data but also the small values such as night-time energy consumption. For electricity consumption, the Random forest and AdaBoost methods were much better than decision tree and bagging. Decision tree and bagging gave worse predictions on the weekday and weekend data.
(c) This paper has selected the best combination of hyper-parameters for each prediction method: for electricity, it is recommended to use Random forest (n_estimators: 90; max_features: 5) and AdaBoost (n_estimators: 80; loss: linear).
References
Ürge-Vorsatz, D., Cabeza, L.F., Serrano, S., Barreneche, C., Petrichenko, K.: Heating and cooling energy trends and drivers in buildings. Renew. Sustain. Energy Rev. 41, 85–98 (2015)
Bagnasco, A., Fresi, F., Saviozzi, M., Silvestro, F., Vinci, A.: Electrical consumption forecasting in hospital facilities: an application case. Energy Build. 103, 261–270 (2015)
González, A.G., Sanz-Calcedo, J., Salgado, D.: Evaluation of energy consumption in German hospitals: benchmarking in the public sector. Energies 11, 2279 (2018)
Dobosi, I., Tanasa, C., Kaba, N.-E., Retezan, A., Mihaila, D.: Building energy modelling for the energy performance analysis of a hospital building in various locations. E3S Web of Conferences 111, p. 06073 (2019)
Rohde, T., Martinez, R.: Equipment and energy usage in a large teaching hospital in Norway. J. Healthc. Eng. 6, 419–434 (2015)
Lindberg, K., Bakker, S., Sartori, I.: Modelling electric and heat load profiles of non-residential buildings for use in long-term aggregate load forecasts. Utilities Policy 58, 63–88 (2019)
Chen, Y., Luh, P., Rourke, S.: Short-term load forecasting: similar day-based wavelet neural networks. IEEE Trans. Power Syst. 25(1), 322–330 (2008)
Yan, J., Tian, C., Huang, J., Wang, Y.: Load forecasting using twin gaussian process model. In: Proceedings of 2012 IEEE International Conference on Service Operations and Logistics, and Informatics, pp. 36–41. IEEE (2012)
Yanxia, L., Shi, H.-F.: The hourly load forecasting based on linear Gaussian state space model. In: 2012 International Conference on Machine Learning and Cybernetics 2, pp. 741–747. IEEE (2012)
Khosravi, A., Nahavandi, S.: Load forecasting using interval type-2 fuzzy logic systems: optimal type reduction. IEEE Trans. Ind. Inform. 10(2), 1055–1063 (2013)
Jetcheva, J.G., Majidpour, M., Chen, W.-P.: Neural network model ensembles for building-level electricity load forecasts. Energy Build. 84, 214–223 (2014)
Richalet, V., Neirac, F.P., Tellez, F., Marco, J., Bloem, J.J.: HELP (house energy labeling procedure): methodology and present results. Energy Build. 33(3), 229–233 (2001)
Li, W., Gong, G., Fan, H., Peng, P., Chun, L.: Meta-learning strategy based on user preferences and a machine recommendation system for real-time cooling load and COP forecasting. Appl. Energy 270, 115144 (2020)
Liu, J.Y., Chen, H.X., Wang, J.Y., Li, G.N., Shi, S.B.: Time series prediction of the indoor temperature in the subway station based on data mining techniques. Kung Cheng Je Wu Li Hsueh Pao/Journal of Engineering Thermophysics 39(6), 1316–1321 (2018)
Sendra-Arranz, R., Gutiérrez, A.: A long short-term memory artificial neural network to predict daily HVAC consumption in buildings. Energy Build. 216, 109952 (2020)
Liu, T., Xu, C., Guo, Y., Chen, H.: A novel deep reinforcement learning based methodology for short-term HVAC system energy consumption prediction. Int. J. Refrig. 107, 39–51 (2019)
Huang, Y., Yuan, Y., Chen, H., Wang, J., Guo, Y., Ahmad, T.: A novel energy demand prediction strategy for residential buildings based on ensemble learning. Energy Procedia 158, 3411–3416 (2019)
Alamin, Y.I., Álvarez, J.D., del Mar Castilla, M., Ruano, A.: An Artificial Neural Network (ANN) model to predict the electric load profile for an HVAC system. IFAC-PapersOnLine 51(10), 26–31 (2018)
Yu, Z., Haghighat, F., Fung, B.C.M., Yoshino, H.: A decision tree method for building energy demand modeling. Energy Build. 42(10), 1637–1646 (2010)
Gong, B., Ordieres-Meré, J.: Prediction of daily maximum ozone threshold exceedances by preprocessing and ensemble artificial intelligence techniques: Case study of Hong Kong. Environ. Model Softw. 84, 290–303 (2016)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Wu, Z., et al.: Using an ensemble machine learning methodology-Bagging to predict occupants’ thermal comfort in buildings. Energy Build. 173, 117–127 (2018)
Ridgeway, G.: Generalized boosted models: a guide to the GBM package. Comput. 1, 1–12 (2005)
Scikit-learn. https://scikit-learn.org/. Accessed 2020
Directorate, T.B.Q.: https://dibk.no/byggereglene/byggteknisk-forskrift-tek17/10/innledning. Accessed 2020
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xue, K. et al. (2020). A Simple and Novel Method to Predict the Hospital Energy Use Based on Machine Learning: A Case Study in Norway. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_2
DOI: https://doi.org/10.1007/978-3-030-63820-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)