
1 Introduction

Accurate prediction of the short-term heat load is a prerequisite for the efficient and stable operation of a district heating system (DHS). Most existing thermal load prediction methods consider only a limited set of influencing factors, such as meteorological and historical parameters, and their prediction accuracy is unstable [1,2,3,4,5,6]. These models usually assume a smooth linear relationship between the load and the weather variables, whereas the actual relationship is highly nonlinear and complex [1]. A. Kusiak et al. used weather forecast data to predict the steam load [2]. Nicolas Perez-Mora et al. used historical heat demand data to predict and manage DHS loads [3]. E. Dotzauer considered both weather forecasts and a model of the social component [4]. H. A. Nielsen et al. obtained a regression equation between meteorological parameters (i.e., outdoor temperature, solar radiation, relative humidity, and wind speed) and building heat consumption [5]. O. Yetemen et al. found that the monsoon circulation influences long-term energy consumption prediction [6].

With the continuous development of machine learning theory, nonlinear prediction methods have been successfully applied to load forecasting. Huang et al. [7] developed the extreme learning machine (ELM), a fast learning algorithm for single-hidden-layer feedforward neural networks with good generalization ability. Sajjadi et al. established a DHS thermal load prediction model using the ELM method and demonstrated its robustness [8].

This paper studies the correlation between the heating load and the historical heating load, historical secondary supply/return temperatures, and outdoor temperature. The selected input variables are used to predict the heat load for the next 6, 12, and 24 h with the ELM method. The proposed method is applied to and analyzed for a DHS in Changchun, China.

2 Data Preprocessing

2.1 Data Outlier Elimination

Measured values containing gross errors are called outliers; they are undesirable and should be removed from the measured data [9]. The PauTa criterion is commonly used to identify gross errors. Its basic idea is that any error exceeding three times the standard deviation is considered a gross error rather than a random error.

When the PauTa criterion is used to identify and eliminate outliers, the mean \(\overline{X}\) and the residual errors \(V_{i} = X_{i} - \overline{X}\) of the independent measurement series \(X_{i}\) (i = 1, 2, 3, …, n) are calculated first. The standard deviation S of the measurement series is then calculated, typically with Bessel's formula \(S = \sqrt{\sum\nolimits_{i = 1}^{n} V_{i}^{2} /(n - 1)}\). If the residual error \(V_{d}\) of a measured value \(X_{d}\) satisfies \(|V_{d}| > 3S\), \(X_{d}\) is considered an outlier and is rejected.
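The screening step can be sketched in Python as follows. This is a minimal single-pass illustration, assuming the readings arrive as a plain list and that S is the Bessel-corrected sample standard deviation; the function name `pauta_filter` is hypothetical, and in practice the criterion is often reapplied until no further outliers are found.

```python
import statistics


def pauta_filter(values):
    """Split readings into (kept, outliers) with the PauTa (3S) criterion.

    Single pass: S is the sample standard deviation (n - 1 denominator),
    and a reading X_d is rejected when |X_d - mean| > 3S.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # Bessel-corrected standard deviation
    kept, outliers = [], []
    for x in values:
        residual = x - mean  # V_i = X_i - mean
        if abs(residual) > 3 * s:
            outliers.append(x)
        else:
            kept.append(x)
    return kept, outliers


# Illustrative series: twenty plausible loads plus one spike
kept, outliers = pauta_filter([50.0, 51.0, 49.0, 50.5, 49.5] * 4 + [90.0])
```

Note that the 3S rule can only flag a value when the series is long enough: with the n − 1 standard deviation, no residual can exceed 3S when n ≤ 10, so the criterion suits long records such as the 10-min measurements used here.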

2.2 Correlation Analysis

The selection of characteristic variables plays a crucial role in a thermal load prediction model. Through correlation analysis, the factors that have the greatest influence on the load can be chosen as input variables of the prediction model to improve its accuracy. In this study, the (Pearson) correlation coefficient r was used to analyze the correlation between two variables; r is calculated by Eq. (1):

$$r = \frac{{\mathop \sum \nolimits_{i = 1}^{n} (X_{i} - \overline{X} )(Y_{i} - \overline{Y} )}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} (X_{i} - \overline{X} )^{2} } }{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} (Y_{i} - \overline{Y} )^{2} } }}$$
(1)

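where X and Y are the two variables, \(\overline{X}\) and \(\overline{Y}\) are their means, and n is the number of samples. The value of r lies in [−1, 1]: a positive r indicates a positive correlation and a negative r a negative correlation. The greater the absolute value of r, the stronger the correlation.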
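Eq. (1) can be transcribed directly into Python. The sketch below is illustrative only; the name `pearson_r` is hypothetical and not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r of Eq. (1) for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)  # mean of X
    my = sum(ys) / len(ys)  # mean of Y
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `pearson_r(outdoor_temp, heat_load)` is expected to be negative for a heating system, since the load rises as the outdoor temperature falls.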

3 Prediction Methods

3.1 Extreme Learning Machine (ELM)

ELM is an artificial neural network model developed as an improvement on single-hidden-layer feedforward networks (SLFNs) [10]; its structure is shown in Fig. 1.

Fig. 1 Schematic of ELM network

For N arbitrary distinct samples \((x_{j}, t_{j})\), where \(x_{j} = [x_{j1}, x_{j2}, \ldots, x_{jn}]^{\text{T}} \in \mathbb{R}^{n}\) and \(t_{j} = [t_{j1}, t_{j2}, \ldots, t_{jm}]^{\text{T}} \in \mathbb{R}^{m}\), a standard SLFN with \(\widetilde{N}\) hidden nodes and activation function g(x) is modeled as:

$$\mathop \sum \limits_{i = 1}^{\widetilde{N}} \beta_{i} g_{i} \left( {x_{j} } \right) = \mathop \sum \limits_{i = 1}^{\widetilde{N}} \beta_{i} g_{i} \left( {a_{i} \cdot x_{j} + b_{i} } \right),j = 1, \ldots ,N$$
(2)

where \(a_{i} = [a_{i1}, a_{i2}, \ldots, a_{in}]^{\text{T}}\) is the weight vector connecting the ith hidden node and the input nodes; \(b_{i}\) is the threshold of the ith hidden node; \(\beta_{i} = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^{\text{T}}\) is the weight vector connecting the ith hidden node and the output nodes; and \(a_{i} \cdot x_{j}\) denotes the inner product of \(a_{i}\) and \(x_{j}\).

If the SLFN can approximate the targets \(t_{j}\) of the N training samples with zero error, there exist \(\beta_{i}\), \(a_{i}\), and \(b_{i}\) such that:

$$\mathop \sum \limits_{i = 1}^{\widetilde{N}} \beta_{i} g_{i} \left( {a_{i} \cdot x_{j} + b_{i} } \right) = t_{j} ,j = 1, \ldots ,N$$
(3)

Equation (3) can be written in matrix form as:

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}$$
(4)

where H is the hidden-layer output matrix of the network: its ith column is the output vector of the ith hidden node with respect to the inputs \(x_{1}, x_{2}, \ldots, x_{N}\), and its jth row is the hidden-layer output vector with respect to input \(x_{j}\). In ELM the input weights \(a_{i}\) and thresholds \(b_{i}\) are randomly assigned and then kept fixed, so H is a deterministic matrix and training the SLFN is equivalent to finding the least-squares solution of \(\mathbf{H}\boldsymbol{\beta} = \mathbf{T}\):

$$\widehat{\boldsymbol{\beta}} = \mathop{\arg\min}\limits_{\boldsymbol{\beta}} \left\| \mathbf{H}\left( a_{1}, \ldots, a_{\widetilde{N}}, b_{1}, \ldots, b_{\widetilde{N}} \right)\boldsymbol{\beta} - \mathbf{T} \right\|$$
(5)

The minimum-norm least-squares solution of Eq. (5) is:

$$\widehat{\boldsymbol{\beta}} = \mathbf{H}^{+} \mathbf{T}$$
(6)

where \(\mathbf{H}^{+}\) is the Moore–Penrose generalized inverse of the hidden-layer output matrix H.
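Equations (2) to (6) translate into a few lines of NumPy. The sketch below is a minimal illustration, not the authors' implementation: the sigmoid activation, the uniform(−1, 1) initialization of \(a_i\) and \(b_i\), the class name `ELM`, and the seed are all assumptions, since the paper does not report them.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine regressor (illustrative sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden                  # number of hidden nodes, N~ in Eq. (2)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid g(a_i . x_j + b_i); row j = sample x_j, column i = hidden node i
        return 1.0 / (1.0 + np.exp(-(X @ self.a + self.b)))

    def fit(self, X, T):
        X = np.asarray(X, dtype=float)
        T = np.asarray(T, dtype=float)
        # Column i of self.a is the input weight vector a_i; a_i and b_i are random and fixed
        self.a = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)                       # hidden-layer output matrix, shape (N, N~)
        self.beta = np.linalg.pinv(H) @ T         # Eq. (6): beta = H^+ T
        return self

    def predict(self, X):
        # Network output H(X) beta for new inputs
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

A call such as `ELM(n_hidden=20).fit(X_train, y_train).predict(X_test)` trains and predicts in one chain. Only the output weights β are learned, which is why training reduces to a single pseudo-inverse instead of iterative back-propagation.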

3.2 Prediction Model Performance Evaluation Criteria

The mean absolute percentage error (MAPE) and root-mean-square error (RMSE) are used to evaluate the performance of the thermal load prediction model, which are relative and absolute indicators, respectively. They can be calculated by Eq. (7):

$$\left\{ {\begin{array}{*{20}c} {{\text{MAPE}} = \frac{1}{n}\mathop \sum \limits_{t = 1}^{n} \left| { \frac{{{\text{observed}}_{t} - {\text{predicted}}_{t} }}{{{\text{observed}}_{t} }}} \right| \times 100\% } \\ {{\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{t = 1}^{n} \left( {{\text{observed}}_{t} - {\text{predicted}}_{t} } \right)^{2} } } \\ \end{array} } \right.$$
(7)

where \(\text{observed}_{t}\) is the actual heat load at time t, \(\text{predicted}_{t}\) is the predicted heat load, and n is the number of samples.
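Eq. (7) can be sketched in Python as below. The function names are hypothetical, and the MAPE assumes that no observed load is zero.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error of Eq. (7), in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def rmse(observed, predicted):
    """Root-mean-square error of Eq. (7), in the same units as the heat load."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

Because MAPE is relative and RMSE is absolute, the two can rank models differently when the load level changes, which is why both are reported.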

4 Results and Discussion

To verify the feasibility and effectiveness of the proposed prediction algorithm, a field test of a DHS station in Changchun City was conducted from October 21 to December 7, 2018. The outdoor temperature \(t_{w}\), supply temperature \(t_{g}\), return temperature \(t_{h}\), and heating load q were recorded every 10 min, giving a total of 6840 records, as shown in Fig. 2. The temperatures \(t_{g}\) and \(t_{h}\) are relatively stable, whereas \(t_{w}\) and q fluctuate more strongly, which may affect the subsequent prediction accuracy. The measured variables were averaged over 6-, 12-, and 24-h intervals to study heat load prediction at different timescales.
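The averaging step can be sketched as follows, assuming each variable is a gap-free list of 10-min readings and that periods are counted from the first sample (the paper does not state how periods are aligned to clock time). `block_average` is a hypothetical helper.

```python
def block_average(samples, interval_min=10, period_h=6):
    """Average consecutive readings into non-overlapping period_h-hour means.

    Trailing readings that do not fill a whole period are dropped (an assumption).
    """
    per_block = period_h * 60 // interval_min  # e.g. 6 h -> 36 readings of 10 min
    n_blocks = len(samples) // per_block
    return [
        sum(samples[k * per_block:(k + 1) * per_block]) / per_block
        for k in range(n_blocks)
    ]
```

If the 6840 records are per variable, this yields 190 six-hour means, 95 twelve-hour means, and 47 complete daily means, so the 24-h model has far fewer samples to learn from.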

Fig. 2 Measured data

4.1 Correlation Analysis

The measured variables were normalized, and their correlation coefficients with the heat consumption were then calculated according to Eq. (1). The results are shown in Tables 1, 2, and 3.
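The paper does not state which normalization was used. Assuming min-max scaling, the short sketch below also illustrates that any positive linear rescaling of each variable leaves the correlation coefficient of Eq. (1) unchanged, so normalization matters for the ELM inputs but not for the coefficients themselves. The data values and the helper `min_max` are made up for illustration.

```python
import numpy as np


def min_max(x):
    """Linearly rescale an array to [0, 1] (assumed normalization, not stated in the paper)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("constant series cannot be min-max normalized")
    return (x - x.min()) / span


# Illustrative made-up values, not measurements from the paper
tw = np.array([-5.0, -8.0, -10.0, -3.0, -12.0])   # outdoor temperature
q = np.array([30.0, 34.0, 37.0, 28.0, 40.0])      # heat load

r_raw = np.corrcoef(tw, q)[0, 1]                  # Pearson r of Eq. (1)
r_norm = np.corrcoef(min_max(tw), min_max(q))[0, 1]
```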

Table 1 Correlation coefficient of 6-h averaged heat load and measured data
Table 2 Correlation coefficient of 12-h averaged heat load and measured data
Table 3 Correlation coefficient of 24-h averaged heat load and measured data

As shown in Tables 1, 2, and 3, for prediction periods of 6, 12, and 24 h, the historical heat consumption and the historical secondary return temperature are strongly correlated with the heating load. Their correlation coefficients with the heating load reach a maximum when the prediction period is 12 h.

For prediction periods of 6, 12, and 24 h, the correlation coefficients between the heating load and the outdoor temperature are −0.485, −0.523, and −0.561, respectively. Although the outdoor temperature is less strongly correlated with the heating load, it is a key factor in updating the heating load prediction model. Therefore, the historical heating load, historical secondary return temperature, and outdoor temperature were selected as input variables for all three prediction periods (6, 12, and 24 h).

4.2 Prediction Analysis

The data set is divided into a training set and a test set at a ratio of 7:3. The ELM method is then used to predict the heating load for periods of 6, 12, and 24 h; the results are shown in Figs. 3, 4, and 5, respectively. For the 6-, 12-, and 24-h periods, the corresponding MAPE values are 4.1%, 6.8%, and 9.3%, and the corresponding RMSE values are 0.941, 1.459, and 2.063. Comparing these results, the model agrees best with the measured load for the 6-h period, second best for the 12-h period, and worst for the 24-h period.
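One way to assemble the model inputs and the 7:3 split is sketched below. The lag structure (all inputs taken from the previous averaging period), the chronological rather than random split, and the helper names `make_dataset` and `split_7_3` are assumptions for illustration; the paper does not specify them.

```python
def make_dataset(load, return_temp, outdoor_temp):
    """Pair inputs from period k-1 (hypothetical lag choice) with the load at period k."""
    X, y = [], []
    for k in range(1, len(load)):
        # Historical load, historical secondary return temperature, outdoor temperature
        X.append([load[k - 1], return_temp[k - 1], outdoor_temp[k - 1]])
        y.append(load[k])
    return X, y


def split_7_3(X, y):
    """Chronological split: first 70 % of samples for training, last 30 % for testing."""
    n_train = len(X) * 7 // 10  # integer arithmetic avoids float rounding
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

Splitting chronologically keeps every test sample later than every training sample, which mirrors how a forecast would be used in operation; a random split would leak future information into training.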

Fig. 3 Next 6-h heating load prediction results

Fig. 4 Next 12-h heating load prediction results

Fig. 5 Next 24-h heating load prediction results

For the 6-h prediction period, the predicted load curve follows the trend of the actual load, and at sample points 1–20 and 35–40 the predicted values are particularly close to the measured values. These results indicate that the ELM method is effective for short-term heating load prediction.

As the prediction period becomes longer, the prediction accuracy decreases gradually. A likely reason is that the data were collected in the early stage of the heating season, when the heating load fluctuates greatly, and that the amount of collected heating load and related data is insufficient; longer averaging periods also leave fewer samples for training.

5 Conclusions

In this paper, an ELM-based heating load prediction method was studied and verified on a heating network in Changchun. Based on the ELM prediction model established here, the following conclusions can be drawn:

  (1) Based on the correlation analysis of the candidate characteristic variables, the MAPE values of the heating load predictions for the next 6 and 12 h are 4.1% and 6.8%, respectively, indicating that the model built on the selected feature set has good prediction performance.

  (2) The accuracy of the 24-h heating load prediction (MAPE 9.3%) is lower than that of the 12- and 6-h predictions, and measures to improve it require further research.