1 Introduction

The rapid development of the global economy and the ongoing process of worldwide industrialization have led to fast-growing energy consumption. As a result, energy security is becoming increasingly significant and urgent for countries and regions [1,2,3,4,5], especially for developing countries such as China and India. Energy consumption forecasts are an important reference for a country or region when making macroeconomic plans. Establishing an accurate energy consumption prediction model is therefore vital for the effective utilization of energy, long-term energy security and sustainable economic development [6,7,8].

Prediction of energy consumption is a worldwide hotspot in the field of energy economics. Energy consumption is influenced by a number of uncertain factors, such as industry structure, technology level, energy price, per capita income, carbon emissions, economic growth and national policy [9,10,11,12], which makes forecasting it a challenging and complex task. Various forecast models have been applied to this problem. In general, these models can be divided into three categories: statistical analysis models, intelligent learning models and grey prediction models. Statistical analysis models, such as regression analysis (RA) [13], time series analysis [14], semi-parametric approaches [15] and non-parametric methods [16], mostly require a large number of sample data and multiple complicated variables to achieve good forecasting results [17]. Meanwhile, the sample data of energy consumption seldom satisfy the required statistical distribution [18]. Intelligent learning models mainly include the artificial neural network [19, 20], the artificial bee colony algorithm (ABC) [21] and the support vector machine (SVM) [22]. The performance of intelligent learning models can be significantly affected by the number of training samples [23, 24]. Under the influence of energy conservation and environmental protection policies, the data sequence of an energy system is often highly non-linear and uncertain. Moreover, the data available for energy consumption forecasts are often limited and subject to large deviations, and many factors exert differing influences, so the requirements of the traditional forecast models mentioned above cannot be satisfied [25, 26]. It is therefore essential to select a forecasting model that requires a relatively small sample size and still achieves high prediction accuracy.

Grey system theory, proposed by Professor Deng in 1989 [27], does not need to satisfy a specific statistical distribution hypothesis and requires only a few sample data to estimate the behaviour of an uncertain system. It thus provides an appropriate alternative tool for short-term energy consumption prediction. As forecasting techniques that handle uncertainty problems with high prediction accuracy, grey prediction models appear to be more reliable and practical than other forecasting methods because they remain applicable even when data are sparse [23, 28]. The GM(1,1) model, the primary time-series prediction model in grey system theory, is the most popular grey prediction model [29, 30]. It is appropriate when the original sample data sequence satisfies, or approximately satisfies, an exponential growth trend and does not change too rapidly. So far, the GM(1,1) model has been successfully employed in medicine [29], economics [28], industry [31, 32], education [33], energy [34,35,36,37,38] and other areas. As mentioned above, the factors that affect energy consumption are uncertain and complicated [25]. The GM(1,1) model is therefore an appropriate method for energy consumption prediction [34, 35].

The GM(1,1) model shows better simulation and prediction performance than some other prediction models when dealing with a homogeneous exponential data sequence [39]. However, it does not always produce satisfactory results if the original data sequence does not conform to homogeneous exponential growth. In particular, practical data sequences often show a more aggressive trend that does not conform to the homogeneous exponential law, which adversely affects the prediction accuracy of the GM(1,1) model. To deal with this challenge, it is necessary to improve the prediction performance of the GM(1,1) model [40, 41]. Modification of the original data sequence is one target for such improvement. Additionally, in the traditional GM(1,1) model, the grey developed coefficient a and the grey controlled variable b are obtained by the least squares method, and their estimates depend on the background value. The background value \(Z^{(1)}(k+1)\) is defined as \(Z^{(1)}(k+1)=\alpha x^{(1)}(k+1)+(1-\alpha )x^{(1)}(k)\). The value of \(\alpha \) is usually set to 0.5, but this is not an optimal setting and it affects the prediction accuracy of the GM(1,1) model [42]. The background value formula, which links the grey difference equation and the whitenization differential equation, is therefore a second target for optimization. In summary, the distribution of the original data sequence and the calculation method of the background value are two of the factors that affect the forecasting performance of the GM(1,1) model, and both can be optimized to improve it [24, 37, 43,44,45,46].

Chung [24] applied an improved GM(1,1) model, NNGM(1,1), a neural-network-based GM(1,1) model, to solve the troublesome problem of background value estimation by automatically determining the grey developed coefficient a and the grey controlled variable b. Zhao and Guo [37] proposed the Rolling-ALO-GM(1,1) model with improved prediction accuracy to forecast the annual electricity consumption in China. Li et al. [43] proposed an improved grey model, PGM(1,1), based on the particle swarm optimization algorithm, and achieved better prediction performance. Li et al. [44] applied AGM(1,1), which optimizes the background value by using incremental weights, to predict short-term electricity consumption in the Asia-Pacific Economic Cooperation region. Wang et al. [45] proposed an improved grey model based on combined optimization of the background value and the initial item. Tien [46] proposed the first-entry GM(1,1) model (FGM(1,1)), which includes the first entry’s messages of the original series, and showed that FGM(1,1) could extract the messages from the data more sufficiently than the existing GM(1,1) model.

Data transformation, an indispensable method in grey system modelling, can smooth the data, weaken its randomness and increase comparability, and thus provides an important means of optimizing the original data sequence. In this study we propose a novel improved GM(1,1) model that first transforms the original data sequence and then optimizes the calculation method for the background value. The improved model is named TBGM(1,1). The experimental results indicate that TBGM(1,1) significantly reduces the forecast error and increases the prediction accuracy relative to the GM(1,1) model. Finally, TBGM(1,1) is applied to predict energy consumption in Shanghai.

The rest of this study is arranged as follows. In Sect. 2, the GM(1,1) model is briefly presented. In Sect. 3, the proposed improved GM(1,1) model is described in detail. A real case, the prediction of Shanghai’s energy consumption, is demonstrated in Sect. 4. The conclusion is presented in Sect. 5.

2 GM(1,1) model

The GM(1,1) model is the basic member of the GM(1,N) family of grey models: a first-order grey model with only one variable. It is characterized by high computational efficiency and by the small number of parameters that must be fitted. As one of the most frequently applied grey forecasting models, it is a non-linear time series forecasting model that requires as few as four raw data points to predict future demands with relatively favourable accuracy.

The procedure of traditional GM(1,1) modelling is described as below:

Step 1: Assume that \(x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots ,x^{(0)}(n))\) denotes a non-negative raw data sequence, where n is the length of the raw data sequence and \( n\geqslant 4 \).

Step 2: Construct the accumulated generating operator.

The data sequence \(x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\dots ,x^{(1)}(n))\) is generated from \(x^{(0)}\) by the accumulated generating operator (AGO), where \(x^{(1)}(k)=\sum _{i=1}^{k}x^{(0)}(i)\), \(k=1,2,3,\dots ,n\). Because \(x^{(0)}\) is non-negative, \(x^{(1)}\) is monotonically non-decreasing, which smooths the randomness of the original data sequence.
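
As a small illustration of the accumulation step, the sketch below computes the AGO sequence and inverts it by differencing. The function names `ago` and `inverse_ago` and the sample values are assumptions made for this example; they are not part of the original study.

```python
from itertools import accumulate

def ago(x0):
    """Accumulated generating operation: x1(k) = x0(1) + ... + x0(k)."""
    return list(accumulate(x0))

def inverse_ago(x1):
    """Undo the AGO by differencing; the first entry is kept as is."""
    return [x1[0]] + [later - earlier for earlier, later in zip(x1, x1[1:])]

raw = [3, 5, 4, 6]                  # hypothetical values, not data from this study
cumulative = ago(raw)               # [3, 8, 12, 18]
restored = inverse_ago(cumulative)  # the original sequence again
```

The raw values fluctuate, whereas the accumulated sequence only grows, which is the smoothing effect referred to above.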

Step 3: Establish the grey difference equation and estimate the grey developed coefficient and the grey controlled variable.

The first-order grey differential equation of GM(1,1) model is defined as

$$\begin{aligned} \begin{aligned} \frac{dx^{(1)}(t)}{dt}+ax^{(1)}(t)=b, \end{aligned} \end{aligned}$$
(1)

where t denotes the independent variable in Eq. (1), a represents the grey developed coefficient, and b is the grey controlled variable of the GM(1,1) model; a and b are the parameters of the GM(1,1) model that need to be estimated. Eq. (1) is also called the whitenization differential equation.

The equation

$$\begin{aligned} \ x^{(0)}(k)+aZ^{(1)}(k)=b \end{aligned}$$
(2)

is called the grey difference equation of the GM(1,1) model; it is the discretized form of Eq. (1).

With \( {\widehat{u}}=[a,b]^{T}\), a and b can be estimated by the least squares method as follows:

$$\begin{aligned} \begin{aligned} {\widehat{u}}=[a,b]^{T}=[B^{T}B]^{-1}B^{T}Y, \end{aligned} \end{aligned}$$
(3)

where \(Y=[x^{(0)}(2),x^{(0)}(3),\dots ,x^{(0)}(n)]^{T}\),

and the matrix \( B= \left[ \begin{array}{cc} -Z^{(1)}(2)&{} 1 \\ -Z^{(1)}(3)&{} 1 \\ \vdots &{}\vdots \\ -Z^{(1)}(n)&{} 1 \\ \end{array} \right] \).

Let \(Z^{(1)}=(Z^{(1)}(2),\dots ,Z^{(1)}(n))\) be the mean value sequence of \(x^{(1)}\), and denote the background value of GM(1,1) model as

$$\begin{aligned} \begin{aligned} \ Z^{(1)}(k+1)=\frac{1}{2}[x^{(1)}(k+1)+x^{(1)}(k)], k=1,2,3,\dots , n-1. \end{aligned} \end{aligned}$$
(4)

Step 4: Obtain the solution of GM(1,1) model and the predicted data.

The solution for \(x^{(1)}\) at time \(k+1\) can be estimated as

$$\begin{aligned} \begin{aligned} \ {\widehat{x}}^{(1)}(k+1)=\left( x^{(0)}(1)-\frac{b}{a}\right) e^{-ak}+\frac{b}{a}. \end{aligned} \end{aligned}$$
(5)

The predicted value \({\widehat{x}}^{(0)}(k+1)\) is recovered from the difference \({\widehat{x}}^{(1)}(k+1)-{\widehat{x}}^{(1)}(k)\), which gives Eq. (6):

$$\begin{aligned} \begin{aligned} \ {\widehat{x}}^{(0)}(k+1)=(1-e^{a})\left( x^{(0)}(1)-\frac{b}{a}\right) e^{-ak}, \end{aligned} \end{aligned}$$
(6)

and \(k=1,2,3,\dots ,n\).
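
The four steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `gm11_fit` and `gm11_predict` and the 5% growth series are hypothetical, and the two-parameter least squares problem is solved in closed form, which is algebraically the same estimate as Eq. (3).

```python
import math

def gm11_fit(x0, alpha=0.5):
    """Estimate (a, b) from Eqs. (2)-(4) by ordinary least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four data points")
    # Step 2: accumulated sequence x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values Z(2), ..., Z(n); alpha = 0.5 reproduces Eq. (4).
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, n)]
    y = x0[1:]
    # Closed-form least squares for y = -a * z + b (same estimate as Eq. (3)).
    m = len(z)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * sum_zz - sum_z ** 2
    a = (sum_z * sum_y - m * sum_zy) / det
    b = (sum_zz * sum_y - sum_z * sum_zy) / det
    return a, b

def gm11_predict(x0, a, b, k):
    """Predicted x0(k+1) from Eq. (6), for k >= 1."""
    return (1 - math.exp(a)) * (x0[0] - b / a) * math.exp(-a * k)

series = [100 * 1.05 ** i for i in range(6)]   # hypothetical 5% growth, not study data
a, b = gm11_fit(series)
next_value = gm11_predict(series, a, b, k=len(series))   # one step beyond the sample
```

For a growing series the fitted a is negative, so the factor \(e^{-ak}\) in Eq. (6) increases with k.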

3 Methodology of the improved GM(1,1) model

Although many efforts have been made to improve the prediction accuracy of the GM(1,1) model [46,47,48,49,50,51], further improvements are still needed to achieve adequate results in certain situations. In this study we propose a novel improved GM(1,1) model, TBGM(1,1), based on transformation of the original data sequence and optimization of the background value. The detailed procedures are as follows.

3.1 Data transformation for the original data sequence

The basic process of data transformation for the original data sequence of GM(1,1) model is as follows:

Step 1: Take the logarithm

Taking the logarithm for the original data sequence \(x^{(0)}\) effectively weakens its fluctuation tendency.

$$\begin{aligned} \begin{aligned} \ x_1^{(0)}(k)=ln({x^{(0)}(k)}) \end{aligned} \end{aligned}$$
(7)

and \(k=1,2,\dots ,n\).

Step 2: Prepend a constant c to the data sequence \(x_1^{(0)}\).

Tien [46] found that adding a constant c at the front of the original series extracts the messages from the data more sufficiently than the existing GM(1,1) model. We choose \(c=1\) in this study. The sequence \(x_1^{(0)}\) is converted to the sequence \(x_2^{(0)}\) as follows:

$$\begin{aligned} \ x_2^{(0)}=\left\{ c,x_1^{(0)}(1),x_1^{(0)}(2)\dots ,x_1^{(0)}(n)\right\} , c\geqslant 0. \end{aligned}$$
(8)

Step 3: Take exponentials.

Take the exponential of the modelled sequence \({\widehat{x}}^{(0)}\) to obtain the predicted value on the original scale:

$$\begin{aligned} {\widehat{x}}_3^{(0)}(k)=exp({\widehat{x}}^{(0)}(k)). \end{aligned}$$
(9)

In all, \({\widehat{x}}^{(0)}\) is obtained as follows. First, the original data sequence is transformed by Step 1 and Step 2 before modelling. Then the background value \(Z^{(1)}(k+1)\) of the traditional GM(1,1) model is replaced by the optimized background value derived in Sect. 3.2, and the other procedures are the same as in Sect. 2.
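
The transformation steps can be sketched as follows. The helper names `forward_transform` and `inverse_transform` and the sample values are assumptions made for illustration. The sketch covers only Eqs. (7) to (9), not the grey modelling carried out between them.

```python
import math

def forward_transform(x0, c=1.0):
    """Eqs. (7)-(8): take logarithms, then put the constant c in front."""
    if any(value <= 0 for value in x0):
        raise ValueError("the logarithm needs strictly positive data")
    return [c] + [math.log(value) for value in x0]

def inverse_transform(x_hat):
    """Eq. (9): map modelled values back to the original scale with exp."""
    return [math.exp(value) for value in x_hat]

data = [2.0, 3.5, 5.0]                    # hypothetical values, not study data
transformed = forward_transform(data)     # length n + 1, first entry is c
recovered = inverse_transform(transformed[1:])
```

Because the prepended constant occupies the first position, the entry that corresponds to the k-th original observation sits at position k+1 of the transformed sequence; this indexing is our reading of Eq. (8).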

3.2 Optimization of the background value

The background value \(Z^{(1)}(k+1)\) in traditional GM(1,1) model is usually estimated approximately by the trapezoidal formula as follows:

$$\begin{aligned} \begin{aligned} \ Z^{(1)}(k+1)=\frac{1}{2}\left[ x^{(1)}(k+1)+x^{(1)}(k)\right] . \end{aligned} \end{aligned}$$
(10)

However, the real background value is

$$\begin{aligned} \begin{aligned} \ Z^{(1)}(k+1)=\int _k^{k+1}x^{(1)}(t)dt. \end{aligned} \end{aligned}$$
(11)

The original GM(1,1) model is therefore biased, because the background value formula is only approximate, and this approximation is one of the main sources of error. In particular, when the AGO of the original modelling data sequence changes sharply, the estimated background value may carry a significant error. The background value formula therefore needs to be improved to obtain a satisfactory result.

According to the solution of the GM(1,1) model, \( {\widehat{x}}^{(1)}(k+1)=\left( x^{(0)}(1)-\frac{b}{a}\right) e^{-ak}+\frac{b}{a} \), the data sequence \( {\widehat{x}}^{(1)}(t)\) can be written as an exponential function as follows:

$$\begin{aligned} \begin{aligned} {\widehat{x}}^{(1)}(t)=B \cdot e^{A \cdot t}+C , \end{aligned} \end{aligned}$$
(12)

where A, B and C are the constants that need solving, and \(k=1,2,\dots ,n\).

Substitute Eq. (12) in Eq. (11), then

$$\begin{aligned} \begin{aligned} \ Z^{(1)}(k+1)&=\int _k^{k+1}x^{(1)}(t)dt {} \\&{} = \frac{{\widehat{x}}^{(1)}(k+1)-{\widehat{x}}^{(1)}(k)}{A}+C.\\ \end{aligned} \end{aligned}$$
(13)

Because

$$\begin{aligned} \begin{aligned} \ \frac{{\widehat{x}}^{(1)}(k+1)-C}{{\widehat{x}}^{(1)}(k)-C} = \frac{B \cdot e^{A \cdot (k+1)}}{B \cdot e^{A \cdot k}} =e^{A}, \end{aligned} \end{aligned}$$
(14)

therefore

$$\begin{aligned} \begin{aligned} \ A=ln[x^{(1)}(k+1)-C]-ln[x^{(1)}(k)-C]. \end{aligned} \end{aligned}$$
(15)

When \( t=k+1 \), Eq. (12) can be written as follows:

$$\begin{aligned} \begin{aligned} \ {\widehat{x}}^{(1)}(k+1)=B \cdot e^{A \cdot (k+1)}+C. \end{aligned} \end{aligned}$$
(16)

According to the relationship between sequence \(x^{(0)}\) and sequence \(x^{(1)}\), we could get

$$\begin{aligned} \begin{aligned} \ x^{(0)}(k+1)&=x^{(1)}(k+1)-x^{(1)}(k) \\&= [{B \cdot e^{A \cdot (k+1)}+C}]-[{B \cdot e^{A \cdot k}+C}] \\&=B \cdot e^{A \cdot (k+1)}-B \cdot e^{A \cdot k} \\&=B \cdot [1-e^{-A}]e^{A \cdot (k+1)}. \end{aligned} \end{aligned}$$
(17)

Because \(x^{(1)}(k)=\sum _{i=1}^{k}x^{(0)}(i)\), then

$$\begin{aligned} \begin{aligned} \ x^{(1)}(k+1)&=\sum _{i=1}^{k+1}x^{(0)}(i) =B \cdot [1-e^{-A}] \cdot \sum _{i=1}^{k+1} e^{A \cdot i} \\&=B \cdot [1-e^{-A}]\frac{e^{A} \cdot [1-e^{A \cdot (k+1)}]}{1-e^{A}} \\&=B \cdot e^{A \cdot (k+1)}-B. \end{aligned} \end{aligned}$$
(18)

By comparing Eq. (16) and Eq. (18), we could get \( B=-C \).

Substitute \( k=0 \) in Eq. (16), then

$$\begin{aligned} \begin{aligned} \ {x}^{(1)}(0+1)={x}^{(1)}(1)=B \cdot e^{A \cdot 1}+C, B=\frac{{x}^{(1)}(1)}{e^{A}-1}. \end{aligned} \end{aligned}$$
(19)

Substitute Eq. (15) and Eq. (16) in Eq. (19), then

$$\begin{aligned} \begin{aligned} \ B=-C =\frac{{x}^{(1)}(1)\cdot [{x}^{(1)}(k)-C]}{{x}^{(0)}(k+1)}&=\frac{{x}^{(1)}(1)\cdot {x}^{(1)}(k)}{{x}^{(0)}(k+1)-{x}^{(0)}(1)}. \end{aligned} \end{aligned}$$
(20)
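
As an intermediate step that the derivation leaves implicit, Eqs. (19) and (20) can be combined to express \(e^{A}\) through the accumulated data alone. The following is a sketch of that step, using \(x^{(1)}(1)=x^{(0)}(1)\) and \(x^{(1)}(k)+x^{(0)}(k+1)=x^{(1)}(k+1)\):

$$\begin{aligned} e^{A}=1+\frac{x^{(1)}(1)}{B}=1+\frac{x^{(0)}(k+1)-x^{(0)}(1)}{x^{(1)}(k)}=\frac{x^{(1)}(k+1)-x^{(1)}(1)}{x^{(1)}(k)}, \end{aligned}$$

so that \(A=ln[x^{(1)}(k+1)-x^{(1)}(1)]-lnx^{(1)}(k)\), which is the logarithmic term appearing in the denominator of Eq. (21).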

To sum up, according to Eqs. (13), (15) and (20), we could estimate the real background value as follows:

$$\begin{aligned} \begin{aligned} \ Z^{(1)}(k+1)&=\frac{x^{(1)}(k+1)-x^{(1)}(k)}{ln[x^{(1)}(k+1)-{x}^{(1)}(1)]-lnx^{(1)}(k)}{} \\&\quad -\frac{{x}^{(1)}(1) \cdot {x}^{(1)}(k)}{{x}^{(0)}(k+1)-{x}^{(0)}(1)}. \end{aligned} \end{aligned}$$
(21)

And Eq. (21) is the optimized background value applied in this study.
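
Eq. (21) can be evaluated directly from the raw sequence. The sketch below is one possible reading of the formula, with a hypothetical function name `optimized_background`; the guard against \(x^{(0)}(k+1)=x^{(0)}(1)\) is our addition, since both the logarithmic term and the second denominator vanish in that case.

```python
import math

def optimized_background(x0):
    """Background values Z(2), ..., Z(n) computed from Eq. (21)."""
    # Accumulated sequence x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    z = []
    for k in range(1, len(x0)):          # 1-based k; this pass builds Z(k+1)
        x1_k, x1_next = x1[k - 1], x1[k]
        if x0[k] == x0[0]:
            raise ValueError("Eq. (21) is undefined when x0(k+1) equals x0(1)")
        # Denominator of the first term: ln[x1(k+1) - x1(1)] - ln x1(k).
        log_term = math.log(x1_next - x1[0]) - math.log(x1_k)
        # Second term: x1(1) * x1(k) / (x0(k+1) - x0(1)), i.e. B = -C in Eq. (20).
        offset = x1[0] * x1_k / (x0[k] - x0[0])
        z.append((x1_next - x1_k) / log_term - offset)
    return z

# Hypothetical check data built from x1(t) = B * (exp(A * t) - 1).
A_true, B_true = 0.1, 10.0
x1_exact = [B_true * (math.exp(A_true * k) - 1) for k in range(1, 7)]
x0_exact = [x1_exact[0]] + [q - p for p, q in zip(x1_exact, x1_exact[1:])]
z_values = optimized_background(x0_exact)
```

For a sequence that follows the exponential form of Eq. (12) with \(B=-C\), these values coincide with the integral in Eq. (11), whereas the trapezoidal formula in Eq. (10) only approximates it.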

4 Forecasts of the energy consumption for Shanghai City in China

Shanghai City is the largest city and also the economic and financial center of China. Constructing the prediction models to forecast the energy consumption for Shanghai City in China and to analyse the forecasting results accordingly is essential both economically and practically. In this section, TBGM(1,1) is applied to forecast Shanghai’s energy consumption in China.

4.1 Modelling procedure of Shanghai’s total energy consumption forecasting

The primitive data sequence of Shanghai’s total energy consumption (tons of standard coal) was collected from the official website of Shanghai City Bureau of Statistics in China. The sample data of annual energy consumption for Shanghai City from 2010 to 2017 are listed in Table 1 and illustrated in Fig. 1. Fig. 1 shows that Shanghai’s total energy consumption grows non-linearly, and the average growth rate of energy consumption over these eight years is about 1.3% per year, although there are slight short-term fluctuations.

Table 1 The energy consumption for Shanghai City in China from 2010 to 2017 (unit: \(10^{4}\) tons SCE)

Fig. 1 The energy consumption for Shanghai City in China from 2010 to 2017

The modelling procedure for Shanghai’s total energy consumption forecasting is as follows. First, TBGM(1,1), GM(1,1), FGM(1,1) [46], RGM(1,1) [50], TGM(1,1) [51], the linear regression (LR) model and the exponential smoothing (ES) model are constructed with the annual total energy consumption data of Shanghai City from 2010 to 2015. Then the prediction accuracies of the seven predictive models are validated and compared using the data from 2016 to 2017. Finally, the superior model is employed to predict Shanghai’s total energy consumption from 2018 to 2022.

A brief introduction of the seven prediction models is given below:

1. GM(1,1): the original GM(1,1) model proposed by Deng in 1989 [27];

2. TBGM(1,1): the novel improved GM(1,1) model proposed in this study, which is based on both data transformation for the original data sequence and optimization of the background value;

3. FGM(1,1): the first-entry GM(1,1) proposed by Tien [46], which is based on the original GM(1,1) but modelled with data including the first entry’s messages of the original series;

4. RGM(1,1): the GM(1,1) model with rolling mechanism [50];

5. TGM(1,1): the transformed GM(1,1) model with an improved background value [51];

6. LR: the linear regression model on time, in which the annual total energy consumption of Shanghai City is the dependent variable and time is the independent variable;

7. ES: the exponential smoothing model.

4.2 Evaluation indices

To assess the performance of the prediction models, three frequently used statistical evaluation indicators are chosen: absolute percentage error (APE), mean absolute percentage error (MAPE) and root mean squared error (RMSE). They are defined by Eqs. (22)–(24), respectively:

$$\begin{aligned} APE= & {} \left| \frac{{\widehat{x}}^{(0)}(i)-x^{(0)}(i)}{x^{(0)}(i)}\right| \times 100\%, \end{aligned}$$
(22)
$$\begin{aligned} MAPE= & {} \frac{1}{n}\sum _{i=1}^{n}\left| \frac{{\widehat{x}}^{(0)}(i)-x^{(0)}(i)}{x^{(0)}(i)}\right| \times 100\%, \end{aligned}$$
(23)

and

$$\begin{aligned} \begin{aligned} RMSE=\sqrt{\frac{1}{T}\sum _{i=1}^{T}({{\widehat{x}}^{(0)}(i)-x^{(0)}(i))}^{2}}, \end{aligned} \end{aligned}$$
(24)

where \( {x} ^{(0)}(i) \) denotes the primitive data sequence, \( {\widehat{x}} ^{(0)}(i) \) denotes the predicted data sequence, and n in Eq. (23) and T in Eq. (24) denote the number of data points being evaluated.
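
The three indices in Eqs. (22)–(24) are straightforward to compute. The sketch below uses hypothetical function names and made-up numbers purely to show the arithmetic; it does not reproduce any value from the tables of this study.

```python
import math

def ape(actual, predicted):
    """Eq. (22): absolute percentage error of each point, in percent."""
    return [abs((p - a) / a) * 100.0 for a, p in zip(actual, predicted)]

def mape(actual, predicted):
    """Eq. (23): mean of the absolute percentage errors, in percent."""
    errors = ape(actual, predicted)
    return sum(errors) / len(errors)

def rmse(actual, predicted):
    """Eq. (24): root mean squared error, in the unit of the data."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

actual = [100.0, 200.0]        # made-up values for illustration only
predicted = [110.0, 190.0]
scores = (mape(actual, predicted), rmse(actual, predicted))
```

In this toy case the APEs are 10% and 5%, so the MAPE is 7.5%, and both absolute errors are 10, so the RMSE is 10. MAPE is scale-free, whereas RMSE carries the unit of the data.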

Table 2 Forecasting results of the energy consumption for Shanghai City in China by the compared models (unit: \(10^{4}\) tons SCE)

4.3 Comparison of the forecasting performances of the seven predictive models

The actual data and the data predicted by the seven models for 2016 and 2017 are listed in Table 2 and shown in Fig. 2. APE, MAPE and RMSE, which serve as the evaluation indices of forecasting performance for the seven predictive models, are listed in Table 3.

Table 2 and Fig. 2 both show that the predicted values by the TBGM(1,1) predictive model are closest to the actual values for both the year 2016 and 2017, thus demonstrating better forecasting performance than the other six predictive models.

The APEs in Table 3 reveal that TBGM(1,1) has the lowest APE, and thus the best forecasting performance, in both 2016 and 2017. As for the MAPEs in Table 3, all seven predictive models are highly accurate (\( MAPE <10\% \)) according to Lewis’ benchmark of accuracy evaluation [52]; TBGM(1,1) yields the lowest MAPE (1.7447%) and therefore the highest prediction accuracy, which again indicates that the newly proposed model is superior to the other six predictive models.

Finally, the RMSE values for the total energy consumption in Table 3 lead to the same finding as the APE and MAPE values: TBGM(1,1) has the smallest RMSE and outperforms the other six models in predicting the energy consumption.

In conclusion, TBGM(1,1) proposed in this study performs better than the other four grey forecasting models, LR model and ES model. The three evaluation indices also confirm that the novel improved GM(1,1) model (TBGM(1,1)), which is based on both data transformation for the original data sequence and optimization of background value, is most suitable for energy consumption forecasting purposes. Therefore, this novel model will be utilized for forecasting the energy consumption for Shanghai City in China from 2018 to 2022.

Fig. 2 Forecasting results of the energy consumption for Shanghai City in China by the compared models

Table 3 Performance measurements of the seven compared models for the energy consumption forecasts

Fig. 3 Prediction results of Shanghai’s energy consumption from 2018 to 2022

4.4 Forecasting the total energy consumption for Shanghai City in China during 2018–2022

Because its forecasting accuracy is superior to that of most existing improved GM(1,1) models, TBGM(1,1) is further applied to predict the energy consumption for Shanghai City from 2018 to 2022. The predicted values are illustrated in Fig. 3, which shows that the total energy consumption for Shanghai City will rise relatively steadily over the following five years and will reach nearly 126.42 million tons SCE by 2022. In other words, the energy consumption for Shanghai City will increase by nearly 8.41 million tons SCE by 2022 relative to 2017. Against the background of worldwide energy shortage, this poses a huge challenge for Shanghai’s energy demand strategy, and the relevant departments need to take appropriate measures in advance to cope with the looming energy shortage.

4.5 Discussion

Tables 2 and 3 show that the MAPEs obtained by GM(1,1), TBGM(1,1), FGM(1,1), RGM(1,1), TGM(1,1), LR and ES are 2.4031%, 1.7447%, 1.7793%, 2.3560%, 2.4025%, 1.8060% and 2.8330%, respectively, and the RMSEs are 284.24, 205.85, 209.93, 278.66, 284.16, 213.17 and 338.35, respectively. Overall, the predictive performance of TBGM(1,1) is better than that of the other six forecasting models. It should be noted, however, that the MAPEs of TBGM(1,1), FGM(1,1) and LR are close and all less than 3.0%: in relative terms, TBGM(1,1) reduces the MAPE by 1.94% compared with FGM(1,1) and by 3.40% compared with LR. Further exploration is therefore needed to obtain better predictive performance from GM(1,1) in certain circumstances.
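
The two percentage reductions quoted above appear to be relative reductions of the MAPE. The short check below recomputes them from the rounded MAPEs reported in this section; the helper name `relative_reduction` is an assumption made for the example.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0

# MAPEs (in percent) as reported in this section.
mape_tbgm, mape_fgm, mape_lr = 1.7447, 1.7793, 1.8060

versus_fgm = relative_reduction(mape_fgm, mape_tbgm)   # about 1.94
versus_lr = relative_reduction(mape_lr, mape_tbgm)     # about 3.39
```

With the rounded inputs the second figure comes out at roughly 3.39%, which agrees with the reported 3.40% up to rounding of the underlying MAPEs.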

In grey system theory, the length of the raw data sequence for grey modelling is usually between 5 and 8, and using too many data may reduce the prediction accuracy [30]. A small data set is usually used for model validation or prediction in grey forecasting [27, 53]. Long-term prediction by the GM(1,1) model may produce a large prediction error. Like the original GM(1,1) model, TBGM(1,1) performs less well in long-term forecasting than in short-term forecasting. Further optimization of the GM(1,1) model should therefore be considered for long-term forecasting.

5 Conclusion

Prediction of energy consumption for a country or region plays a significant role in the economy and in energy security, and is also important for policy makers. Accurate prediction results can facilitate the effective implementation of energy policies, help avoid, to a certain extent, economic losses caused by insufficient energy, and reduce the operating costs and risks of the economy. One of the biggest challenges in predicting energy consumption is the rapidly increasing demand for energy, especially in developing countries [54]. It is therefore highly desirable to develop energy consumption forecasting techniques with improved prediction accuracy, and many scholars have recently paid attention to energy consumption forecasting [1, 2, 34, 35, 55, 56].

GM(1,1) model is one of the most frequently used grey prediction models, because it only requires a limited number of samples to construct a prediction model with relatively high prediction accuracy [27, 28]. And GM(1,1) model has been widely applied in the field of forecasting [29, 57]. However, GM(1,1) model needs to be improved in order to obtain higher prediction accuracy. In this paper, we propose a novel improved GM(1,1) model, which is based on both data transformation for the original data sequence and optimization of the background value, and is therefore abbreviated as TBGM(1,1). Two case studies are carried out to evaluate its simulation and prediction performance. And the results show that TBGM(1,1) has higher prediction accuracy than the traditional GM(1,1) model and some improved GM(1,1) models and has better exploration and exploitation ability. Additionally, application of TBGM(1,1) for total energy consumption forecasting in Shanghai City not only indicates an increasing energy demand in the following five years in Shanghai City but also verifies the adequate predictive performance of TBGM(1,1).

Based on the empirical results, we suggest that the TBGM(1,1), which bears higher prediction precision, could be utilized as an effective and promising forecasting tool in the future. TBGM(1,1) can be utilized in other forecasting fields, such as GDP forecasting, tourism demand forecasting, early disease prevention and control forecasting, peak load forecasting, business forecasting, and water quality prediction in the context of limited data in general.

However, it is worth noting that the prediction accuracy of GM(1,1) model and improved GM(1,1) models may decrease rapidly when the raw data sequence fluctuates dramatically or grows aggressively, thus further improvements will also be needed in such circumstances.