INTRODUCTION

With sustainable development and a low-carbon economy being advocated worldwide, research on and application of energy-saving, low-carbon technologies have become a major concern for scholars in the industry. The Paris Agreement, adopted by 196 parties on December 12, 2015, made reducing carbon emissions a global development goal. Its long-term goal is to keep the rise in the global average temperature below 2°C above pre-industrial levels and to try to limit the rise to 1.5°C. On 28 November 2018, the European Commission published a long-term vision aimed at achieving carbon neutrality by 2050, with net carbon emissions falling to zero. The Nordic countries of Finland, Sweden, Norway, Denmark and Iceland signed a joint statement on climate change in Helsinki, Finland, in November 2019. In the statement, the five countries said they would work together to step up efforts to combat climate change and to achieve carbon neutrality faster than other countries. In cold regions such as Europe, space heating accounts for more than 80% of heating and cooling consumption [1]. On 22 September 2020, the Chinese government announced at the 75th session of the United Nations General Assembly that China would strive to peak its carbon dioxide emissions by 2030 and to achieve carbon neutrality by 2060.

This paper systematically summarizes and analyzes the state of research on heating load forecasting and offers suggestions and ideas for reducing building energy consumption and carbon dioxide emissions.

DISTRICT HEATING LOAD FORECASTING

With rising living standards and advancing technology, the demand for energy keeps increasing. In 2013, the construction sector accounted for 35 percent of the world’s final energy use and about 17 percent of total direct energy-related carbon dioxide emissions from final energy consumers [2]. In the European Union, district heating covered about 13% of household heating needs in 2010, and it is estimated that this share could increase to 50% by 2050 [3]. In Europe in 2020, the operation of buildings accounted for 40% of energy consumption and 36% of CO2 emissions, and building heating systems accounted for approximately 23% of total primary energy use [4].

In China, heating facilities are installed in areas where the daily average temperature is at or below 5°C for more than 90 days a year. China’s central heating provinces are therefore Heilongjiang, Jilin, Liaoning, Beijing, Hebei, Shanxi, Shandong, Inner Mongolia, Gansu, Qinghai, Ningxia and Xinjiang. By 2019, China’s heating pipelines had reached a length of 392 917 kilometers, the heated area was 925 137 × 10^4 square meters (about 9.25 billion square meters), and the population served was 39 623 × 10^4 (about 396 million). The design heating capacity was 100 943 ton/hour for steam and 550 530 ton/hour for hot water, while the actual total heat supply was 65 067 ton/hour for steam heating and 327 475 ton/hour for hot water heating [5], which indicates a huge energy-saving potential in the heating sector.

Heating Load Forecasting Objectives

Heating load forecasting predicts, during the design or operation stage of a heating system, the heat demand that the system must meet at a certain time or over a certain period in the future, so that energy can be saved. Improving energy utilization efficiency and reducing the carbon emissions of district heating on the basis of heating load forecasting technology is therefore of great significance.

Danish scholars divide the development of district heating into four phases (generations) from the point of view of heating technology [6] (Table 1). The defining characteristic of the 4th generation of district heating is the smart thermal grid: the operation management and control strategies of each part of the district heating system can be formulated rationally through in-depth mining and analysis of the data collected by monitoring the system. Management of the district heating system is the key to efficient energy utilization, and intelligent management depends on a detailed understanding of heat demand at every level, such as heat users, heat stations, and heat and power plants. In the planning period, heating load prediction supports the sizing of the pipe network, the capacity allocation of heat sources and the layout of network equipment. During operation, accurate heating load prediction allows reasonable heat production planning, which ensures the balance of supply and demand at the end users and an optimal operation control strategy for the district heating system. Heating load forecasting is therefore the first and most important step toward intelligent heating: on the basis of the forecast, the heat network is regulated to improve the quality of district heating and reduce energy consumption.

Table 1.   Periods of development of district heating [6]

A certain discrepancy inevitably exists between the predicted and the actual heating load, but it is acceptable as long as the total heat supplied within a period of time is equal to or greater than the heat demand, which relies on the thermal inertia of the building. How to reduce this error and keep the prediction result within the acceptable range has therefore become the research goal of many scholars.

Data Preprocessing

District heating systems are characterized by long time delays, high complexity, non-linearity and uncertainty, so accurate heating load prediction is essential for accurate operation and distribution according to demand. Heating load forecasting relies on a huge amount of heating-related data, and the quality of the heat network data directly affects the forecasting results. In practical engineering, however, the collected data are often abnormal because of sensor faults, data transmission errors, pipe network leakage and other factors. The work [7] showed that data discontinuity and outliers directly affect the prediction accuracy of the model. Preprocessing the collected data is therefore usually the first step in establishing a prediction model.

In the data preprocessing stage, abnormal data are first identified by technical means and then classified by anomaly type. Each type is corrected with a suitable method so that the processed data are as close to the real values as possible.

The authors of [8] proposed a three-step data correction method consisting of anomaly detection, anomaly classification and anomaly replacement. In [9], outliers were eliminated and corrected by calculating the deviation rate of the data with mathematical statistics. The authors of [10] used the average value method and an interpolation algorithm to complete and correct abnormal data. In [11], a Kalman filter was introduced to identify and remove outliers. The authors of [12] used a knowledge weighted moving average to fill in missing data from surrounding values.
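The detect, classify and replace sequence described above can be illustrated with a minimal sketch. It is not the procedure of any of the cited works: the deviation threshold `max_deviation`, the use of the median as the reference value, the function names and the sample data are all assumptions made for illustration.

```python
from statistics import median


def detect_anomalies(series, max_deviation=0.5):
    """Label each sample as 'ok', 'missing' (None) or 'outlier'.

    A sample counts as an outlier when its relative deviation from the
    median of the valid samples exceeds max_deviation (assumed threshold).
    Assumes a non-zero median, which holds for positive heat loads.
    """
    valid = [v for v in series if v is not None]
    reference = median(valid)
    labels = []
    for value in series:
        if value is None:
            labels.append("missing")
        elif abs(value - reference) / abs(reference) > max_deviation:
            labels.append("outlier")
        else:
            labels.append("ok")
    return labels


def replace_anomalies(series, labels):
    """Replace flagged samples by linear interpolation between valid neighbours."""
    good = [i for i, label in enumerate(labels) if label == "ok"]
    cleaned = list(series)
    for i, label in enumerate(labels):
        if label == "ok":
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:          # anomaly at the start: copy next valid value
            cleaned[i] = series[right]
        elif right is None:       # anomaly at the end: copy last valid value
            cleaned[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            cleaned[i] = series[left] + weight * (series[right] - series[left])
    return cleaned


# Hypothetical hourly heat load with one gap and one spike.
load = [10.0, 10.5, None, 11.5, 40.0, 12.5, 13.0]
labels = detect_anomalies(load)
cleaned = replace_anomalies(load, labels)
```

Keeping the anomaly labels separate from the replacement step makes it possible to apply a different correction to each anomaly type, which is the point of the classification step.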

Prediction Methods

Heating load forecasting can be divided into long-term, medium-term, short-term and ultra-short-term heat load forecasting. Ultra-short-term forecasting mainly predicts the heat load one hour or less ahead, and short-term forecasting predicts the change of the heating load over the next day. Both aim to balance heat supply and demand and to support heat supply control. Medium-term forecasting covers a period of 3‒7 days, and long-term forecasting refers to annual load forecasting; both mainly provide a basis for production planning.

Many algorithms can be used to build load prediction models, but not all of them are suitable for heating load prediction. For example, the regression tree algorithm has been shown to be unsuitable for building heating load prediction models [13]. At present, the common heating load prediction methods include the time series method, the parameter regression method, the grey system method, the expert system method, artificial neural networks and support vector machines; their advantages and disadvantages are shown in Table 2.

Table 2.   Common heating load forecasting methods

Besides the methods listed in Table 2, other methods exist, and they also have shortcomings. For example, ANFIS (adaptive network-based fuzzy inference system) can accurately predict the heating load only within a range of one hour [14], and the hidden-layer structure of the random weight neural network is difficult to determine [15]. Even SVM (support vector machine) and DNN (deep neural network), which are generally considered the best-performing single algorithms, cannot avoid such problems. More generally, the goal of supervised machine learning is to learn a model that is stable and performs well in all respects, but in practice we can often obtain only several weak models, each of which performs well in some respects. Therefore, to make up for these defects and further improve the performance of the prediction model, methods such as combined-algorithm prediction [9, 16‒20] and ensemble learning [21‒24] have gradually been applied to heating load prediction. The research results show that these methods can improve the performance of the heating load prediction model to a certain extent and raise the prediction accuracy above 80%.
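One simple way to combine several single models is to weight their forecasts by the inverse of their validation errors. The sketch below illustrates only this generic idea; it is not the scheme used in [9, 16‒20] or [21‒24], and the function names and numbers are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to one.

    Assumes every error is strictly positive.
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(predictions, weights):
    """Weighted average of several forecasts, one time step at a time."""
    return [sum(w * p for w, p in zip(weights, step)) for step in zip(*predictions)]


# Hypothetical validation data: model_a under-predicts, model_b over-predicts.
actual = [100.0, 110.0, 120.0]
model_a = [90.0, 100.0, 110.0]
model_b = [105.0, 115.0, 125.0]

weights = inverse_error_weights([mae(actual, model_a), mae(actual, model_b)])
combined = combine([model_a, model_b], weights)
```

In this toy case the two models err in opposite directions, so the weighted average cancels most of the error. The weights here are fitted and evaluated on the same samples, so in practice they would be estimated on a separate validation period.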

Input Feature Selection

Many factors affect the heating load. They include external factors of the building, such as outdoor temperature, wind speed, solar radiation, primary supply temperature and secondary return temperature, as well as internal factors, such as the building type, building orientation and occupant behavior.

It is very difficult, if not impossible, to analyze every parameter combination statistically in order to account for the influence of all factors on the heating load. Considering too many factors also increases the dimension of the input variables and harms the interpretability and predictive ability of the model: it reduces the generalization ability of the model without greatly improving its accuracy. For example, adding solar radiation and wind speed to the prediction model does not significantly improve the accuracy [8]. Therefore, the critical step in building a heating load prediction model is to find the most influential variables and to identify and eliminate the multicollinearity between the input variables. To save computation, communication, time and data acquisition effort, redundant and uninformative predictors should be discarded, reducing the dimension without reducing the prediction performance. This is known as dimension reduction in data reduction.

Dimension reduction, here in the sense of feature selection, can be divided into three categories according to the evaluation criteria: filter, wrapper and embedded methods. Filter methods rank the input variables by correlation or mutual information criteria and select the highest-ranked ones. Wrapper methods determine and evaluate subsets of input variables by the prediction accuracy they achieve for a given output variable. Embedded methods resemble wrapper methods in that they also evaluate the strengths and weaknesses of different sets of input variables, but the evaluation and selection take place directly during training, which avoids training the model repeatedly for each candidate subset.
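A filter method can be sketched in a few lines: each candidate input is scored by the absolute value of its Pearson correlation with the heating load, and the top-ranked inputs are kept. The variable names and sample values below are hypothetical, and the correlation criterion is only one of the possible ranking criteria mentioned above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def filter_select(features, target, k):
    """Rank features by |correlation| with the target and keep the best k."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical daily samples: three candidate inputs and the heating load.
features = {
    "outdoor_temperature": [-10.0, -5.0, 0.0, 5.0, 10.0],
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0],
    "supply_temperature": [80.0, 76.0, 70.0, 66.0, 60.0],
}
heating_load = [50.0, 40.0, 30.0, 20.0, 10.0]

selected, scores = filter_select(features, heating_load, k=2)
```

A ranking of this kind scores each input separately, so it does not detect multicollinearity: two highly correlated inputs, such as the outdoor and supply temperatures in this toy data, can both be selected.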

As can be seen from Table 3, current models mainly use the outdoor temperature and historical operation parameters of the pipe network as input features (eigenvalues). Depending on the actual engineering conditions, such as the difficulty of data collection, the types of data available and the prediction period, researchers usually add other characteristic parameters, such as social factors and building types, to improve the accuracy and generalization ability of the prediction model. For example, the authors of [34] formulate two different strategies for selecting input features, one aimed at reduction and the other at performance optimization. The authors of [28] show that the water supply temperature is the most influential factor in ultra-short-term load prediction, the flow in short-term load prediction, and the outdoor temperature in medium-term prediction. Selecting the input features of a heating load prediction model is therefore a problem with complex correlations and strong coupling, which must be considered comprehensively in terms of model complexity, the prediction period, the actual engineering situation, the prediction purpose and other aspects.

Table 3.   Input features selected using feature selection methods

SCHEME RESEARCH

After an extensive literature review and investigation of actual engineering practice, we find that many difficulties remain in the practical application of heating load forecasting. This paper puts forward proposals and suggestions in two directions: the imperfect database and the lack of a unified standard for establishing and partitioning the data set.

Complete Database

Heating load forecasting relies on a large amount of real and valid data. According to current research results, however, most district heating databases cannot meet the data demand of heating load forecasting: they suffer from small data volume, insufficient data types and excessive outliers. This leads to over-fitting or under-fitting of the prediction model, so that the prediction accuracy and generalization ability fall short of expectations. Some scholars have proposed applying virtual samples to heating load prediction [34, 35], but this adds a new error relative to real data, namely the sample simulation error. Establishing a high-quality heating load forecasting database is therefore the premise and necessary basis for applying heating load forecasting in practical engineering.

These database problems have various causes, such as the limited range of measuring instruments [36] and inconsistent data collection frequencies. Moreover, the data held by most heat supply companies are incomplete: only the data of the pipe network itself are monitored and collected, while the factors outside the pipe network that influence the heating load, such as meteorological parameters and types of heat users, are rarely recorded. In addition, data anomalies caused by heating interruptions, pipe network maintenance and pipe network failures are not recorded in the database.

We consider it necessary to enrich and improve the heating load forecasting database, which should comprise a heating basic information database, a heating operation information database and a heat user database. The basic information database contains the design parameters of heating facilities such as heat sources, heating points and heat users. In addition to the operation data of heat sources, the pipe network, heat stations and heat users, the operation information database should include outdoor meteorological parameters, pipe network fault records, and pipe network regulation and maintenance records to ensure the accuracy of subsequent data preprocessing. The heat user database should include building types, envelope parameters, occupancy patterns and thermal comfort requirements.

The data that heating load forecasting requires also differ with the forecasting purpose. At the design stage, the required data are the heating area, design supply and return water temperatures, water mass flow, heating range and pipe network layout. At the operation management stage, the database should include the primary supply and return water temperatures, pressures and mass flows, the historical heating load, the indoor and outdoor temperatures, and the heating demand of heat users. On this basis, influencing factors such as the secondary supply and return water temperatures, pressures and mass flows, outdoor meteorological parameters, building types, envelope parameters and the comfort requirements of heat users should be added to the database as far as possible.

Suggestions for the Establishment and Partitioning Standard of the Data Set

The establishment and partitioning of data sets have the most direct impact on the development and evaluation of models [12]. A training sample that is too small leads to overfitting, one that is too large lowers training efficiency, and the size of the test set determines how credible the reported accuracy of the model is. Most studies describe how they divided the data into training and test sets and the time span of these sets, but little attention has been paid to how the data set should best be divided, and there is no consensus in this area. As a result, the size and span of the training and test sets vary from days [37, 38] and months [39] to years [34, 40]. At present, data sets are mainly divided according to the common advice in the computer field, that is, 80% of the data samples are used for training and 20% for testing. However, some researchers have put forward different schemes, such as selecting 70% and 30% of the data of each month as the training set and test set, respectively [18], or keeping the same data distribution in the training set and test set (50 and 50%).
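The two partitioning schemes mentioned above, a global 80/20 split and a 70/30 split within each month, can be compared with a minimal sketch. Whether the cited studies split chronologically or after shuffling is not stated here; the sketch assumes time-ordered samples and no shuffling, and the sample data are hypothetical.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into (train, test) without shuffling."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def monthly_split(samples, month_of, train_fraction=0.7):
    """Apply the chronological split separately inside every month."""
    by_month = {}
    for sample in samples:
        by_month.setdefault(month_of(sample), []).append(sample)
    train, test = [], []
    # dicts keep insertion order, so months stay in their original order
    for month in by_month:
        head, tail = chronological_split(by_month[month], train_fraction)
        train.extend(head)
        test.extend(tail)
    return train, test


# Hypothetical data set: ten daily samples in each of two months,
# represented here only by their (month, day) labels.
samples = [(month, day) for month in (11, 12) for day in range(1, 11)]

train_a, test_a = chronological_split(samples)                      # 80/20 overall
train_b, test_b = monthly_split(samples, month_of=lambda s: s[0])   # 70/30 per month
```

With the global split, the test set contains only late December samples, whereas the monthly split tests on the end of every month, so the two schemes evaluate the model on different parts of the heating season.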

We suggest considering this problem from three angles: validity, timeliness and regularity. If the data set itself contains a large number of outliers and missing values, which is common in practical engineering, the prediction is bound to fall short of expectations; preprocessing the data is therefore an essential step before establishing the prediction model. The timeliness of the data must also be considered. Most district heating systems change over time, and the transformation or expansion of the heating network has an important impact on the heating data, so the data before and after such changes have to be separated. Two further questions arise: is it necessary to include many years of data in the data set, and should the size of the data set remain fixed or be updated? Finally, the heating load data exhibit diurnal cycles and seasonality, so dividing and establishing data sets according to the regularity of the data themselves is another direction to consider.

ENGINEERING APPLICATION

In view of the current research status and application difficulties of heating load prediction, we applied the scheme proposed above to improve heating load prediction for a district heating system in Kaifeng. The district heating system has a total of 247 heating points (HP). It heats the space 24 hours a day, and the heating period lasts 4 months. The pipeline distribution is shown in Fig. 1.

Fig. 1. District heating system pipeline distribution diagram.

In a previous study [41], we designed a heating load prediction method based on PSO-LSSVM (particle swarm optimization‒least squares support vector machine). The population size was 30, the maximum number of iterations was 2000, the regularization coefficient was 176.98, and the kernel width coefficient was 31.82. The input parameters are the outdoor temperature, wind speed, primary supply and return water temperatures of the previous day, primary supply and return water pressures, flow, and heating load of the previous three days; the output parameter is the heating load.
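For readers unfamiliar with LSSVM, the regression step can be sketched as the solution of one linear system. The sketch below is a generic textbook formulation with a Gaussian (RBF) kernel, not the implementation of [41]: the PSO search is omitted, the regularization coefficient and kernel width quoted above are simply plugged in, and the convention k(x, z) = exp(−‖x − z‖²/(2σ²)) for the kernel width as well as the single-input toy data are assumptions.

```python
from math import exp


def rbf(x, z, sigma):
    """Gaussian kernel; sigma is treated as the kernel width (assumed convention)."""
    return exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        system.append([1.0] + [
            rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
            for j in range(n)
        ])
    solution = solve(system, [0.0] + list(y))
    return solution[0], solution[1:]          # bias b, coefficients alpha


def lssvm_predict(x, X, b, alpha, sigma):
    """Evaluate f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


# Toy data (hypothetical): outdoor temperature in °C -> heating load.
X = [[-10.0], [-5.0], [0.0], [5.0], [10.0]]
y = [50.0, 40.0, 30.0, 20.0, 10.0]

b, alpha = lssvm_fit(X, y, gamma=176.98, sigma=31.82)
forecast = lssvm_predict([-2.5], X, b, alpha, sigma=31.82)
```

In a PSO-LSSVM scheme, each particle would encode one candidate pair of regularization coefficient and kernel width, and a fit of this kind would be repeated to score the pair on validation data.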

We applied this model to the district heating system in the 2020 heating season, but the prediction feedback for the first month showed a certain gap between the predicted and actual values (Fig. 2a). Investigation revealed that the data corrected in the preprocessing stage by simple mathematical statistics could not reflect the actual operation of the pipe network. Moreover, the district heating system had been expanded and refitted many times, and the amount of data stored differed from one heating area to another. In particular, a new pressure isolation station built during the most recent expansion replaced the former boiler station, which caused great fluctuations in the heating data.

Fig. 2. One-day heating load forecast at the Kaifeng HP, calculated using the original (a) and improved (b) forecasting model: (1) actual data; (2) forecast.

We therefore re-established the heating load prediction database and preprocessed the data according to the actual causes of the abnormal data, such as heating interruptions, instrument failures, pipe network maintenance, pipe network transformation, hydraulic failures and pipe network failures, instead of filling in and correcting the abnormal data with the previous simple mathematical statistical method. The data sets were divided according to the principle of similar days. The re-established prediction model was applied to the project in the third month of the 2020 heating season, and the prediction was better than in the first month (Fig. 2b).
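The similarity criterion used for the similar-day division is not specified here. As a purely illustrative sketch, historical days could be ranked by their distance to the forecast day in terms of mean outdoor temperature and day type; the distance definition, the day-type penalty of 5 and all dates and values below are hypothetical.

```python
def similar_days(history, target, k=3, day_type_penalty=5.0):
    """Return the k historical days most similar to the target day.

    history maps a date string to (mean outdoor temperature in °C, is_workday);
    target is a (temperature, is_workday) pair for the day to be forecast.
    The distance is the temperature difference plus an assumed penalty
    when the day type differs.
    """
    def distance(day):
        temperature, is_workday = history[day]
        penalty = 0.0 if is_workday == target[1] else day_type_penalty
        return abs(temperature - target[0]) + penalty

    return sorted(history, key=distance)[:k]


# Hypothetical records from one heating season.
history = {
    "2020-11-20": (4.0, True),
    "2020-11-21": (6.0, False),
    "2020-11-22": (1.0, False),
    "2020-11-23": (2.5, True),
    "2020-11-24": (-1.0, True),
    "2020-11-25": (2.0, True),
}

# Forecast day: 2.0 °C mean outdoor temperature, workday.
training_days = similar_days(history, target=(2.0, True))
```

The selected days would then form the training set for the forecast day, so that the model is fitted only on days with comparable operating conditions.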

CONCLUSIONS

(1) Heating load forecasting is the foundation of 4th generation district heating and an important means of achieving energy saving and emission reduction.

(2) The quality of the data sets has an important influence on the prediction accuracy and generalization ability of a heating load forecasting model, and data preprocessing is a prerequisite for heating load forecasting. A heating load prediction model built with a single algorithm cannot meet the needs of actual engineering; improving the prediction accuracy and generalization performance of the model by means of combined algorithms and ensemble learning is the trend.

(3) Researchers have not reached a consensus on which modeling method is most suitable for heating load prediction, how to select an appropriate set of input parameters, or how to establish and divide the data sets so as to achieve high-level prediction results. This is because the superiority of a model in heating load prediction cannot be evaluated simply: many factors affect the performance of the model, so models can only be analyzed and compared from a particular point of view.

(4) To further promote the practical application of heating load prediction technology, we suggest further study in the following two directions:

• Improve the heating load forecasting database. Heating load prediction relies on a large amount of real and valid data. At present, the databases established by most heating companies have problems such as a small amount of data, insufficient data types and excessive outliers. We therefore suggest enriching and improving the heating load forecasting database, which should comprise a heating basic information database, a heating operation information database and a heat user database.

• Establish and divide the data set of heating load prediction from the perspectives of timeliness, validity and regularity. The establishment and division of the data set affect the accuracy and generalization performance of the prediction model, yet no clear standard has so far been given in the field of heating.