1 Introduction

1.1 Background

With the increase in population and large-scale industrialization, electricity consumption around the world is rising rapidly. Due to the non-storage of power resources and the uncertainty of coal, hydro, wind, and solar power, electricity consumption forecasting is very important for managing power resources successfully and using energy effectively. Excessive electricity supplying will lead to energy investment waste and energy dissipation, while insufficient electricity supplying will hinder economic development and social progress. Hence, accurate and reliable electricity consumption forecasting is critical to sustainable energy management [1]. Moreover, precise electricity consumption forecasting can guide government strategies for future energy usage and development.

1.2 Motivation and related work

There are various factors which affect electricity consumption, such as seasons, holidays, electricity policy, electricity price, human behaviours, social condition, political conditions, economy, technological development level, and so on [2], which makes the task of accurate prediction of electricity consumption more challenging. To tackle such tough problem, lots of forecasting techniques have been developed. These methods mainly include regression models [3,4,5,6], statistical models [7,8,9,10], machine learning methods [11,12,13,14], and so on. However, these widely utilized and relatively mature methods have a common limitation: requiring a great deal of training data. Practically, data for electricity consumption along with its influencing factors appear to be limited. Accordingly, the prediction model we use must be effective under this constraint.

It is worth noting that the grey forecasting model developed by Deng [15] is famous for finding the inner law based on incomplete information and few samples. Grey model (GM) with first order and one variable referred to GM (1, 1) has been utilized widely [16,17,18,19,20,21,22,23]. For example, Akay et al. [16] proposed a grey prediction model of single variable with rolling mechanism to predict the Turkey’s total and industrial electricity consumption. Lee et al. [17] developed an improved grey forecasting model of one variable, which combines residual modification with genetic programming sign estimation. Xiong et al. [18] proposed a novel GM (1, 1) model based on optimizing initial condition according to the principle of new information priority. The optimized model and five other GM (1, 1) models are applied in the modelling of China’s energy consumption and production. Hamzacebi et al. [19] proposed an optimized GM (1, 1) forecasting technique called Optimized GM (1, 1) to predict total electric energy demand of Turkey. The Optimized GM (1, 1) technique is implemented in direct and iterative manners. Zhu et al. [20] proposed to establish a self-adaptive grey fractional weighted model to predict Jiangsu’s electricity consumption, which efficiently enhances the prediction quality of electricity consumption. This model introduced the fractional weighted coefficients to design a novel initial condition, which can capture the dynamic characteristics of the electricity consumption observations. Ding et al. [21] designed a novel optimized grey prediction model based on the principle of “new information priority”, which combines a new initial condition and rolling mechanism. Wang et al. [22] proposed a novel hybrid forecasting model based on an improved grey forecasting mode optimized by multi-objective ant lion optimization algorithm to obtain satisfactory forecasting results with high accuracy. Xu et al. [23] proposed a novel grey model with optimal time response function, in which particle swarm algorithm is used to obtain optimal values of unknown variables for shrinking simulation errors and improving adaptability to characteristics of raw data.

However, realistic and complex systems are often consisted of many variables which are not independent of each other and have mutual correlation among them. Therefore, GM (1, 1) with one variable is not suited to reflect more complex dynamics behaviours of a system. To deal with this issue, research on multivariable grey model (MGM(1, m)) (m represents the number of variables and m ≥ 2) has gained extensive attention [24,25,26,27,28,29,30,31,32]. For example, Wu et al. [24] proposed a novel multivariable grey forecasting model based on fractional-order accumulation. Senapati et al. [25] proposed a multivariable grey model based on convolution integral for solar energy generation prediction. Dang et al. [26] proposed a new electricity demand prediction model which is based on the development trend of multiple driving variables. Zhong et al. [27] proposed a novel multivariable grey theory model for short-term photovoltaic power generation volume forecasting in which particle swarm optimization algorithm is used for background value optimization. Bahrami et al. [28] proposed a new model based on wavelet transform and grey model for electric load forecasting in which the wavelet transform is used to eliminate the high-frequency components of the previous data. Rasheed et al. [29] developed an improved grey forecasting model through combining background value’s interpolation optimization with original data sequence’s data transformation to predict electricity consumption. Wang et al. [30] proposed to employ rolling mechanism and differential evolution algorithm to improve the prediction accuracy of the original grey model. Ayvaz et al. [31] proposed a nonhomogeneous discrete grey model for predicting yearly net electricity consumption in Turkey. Sun et al. [32] established a multivariable grey model by combining particle swarm optimization with improved grey theory to forecast electricity consumption in Bayannur region.

Although the above research on multivariable grey model has obtained great improvement in the performance of electricity consumption prediction, there are still inherent problems to be solved. One of the problems is that in these models the parameters estimation and variables prediction are solved through different approaches, which degrades the accuracy of grey prediction model for some data [33]. If the parameters estimation and variables prediction are derived from the same approach, the prediction error will be greatly reduced. In this paper, based on the difference equation [34], an improved MGM(1, m) is proposed through rewriting the whitenization equation of MGM(1, m) as a first-order linear difference equation with constant coefficients to make parameters estimation and variables prediction derive the same approach; thus, the accuracy will be significantly improved.

1.3 Contribution

The main contributions and novelty of this paper can be summarized as follows: (a) an improved multivariable grey model (IMGM(1, m)) is developed, which solves the problem of the parameter estimation and variable prediction using different approaches.(b) A hybrid forecasting model is developed based on IMGM(1, m) for electricity consumption.

1.4 Article structure

The rest of the paper is organized as follows: Section 2 provides a brief description of existing techniques related to our proposed method, including grey relational analysis, statistical correlation analysis, and MGM(1, m). Section 3 presents the proposed improved multivariable grey model. In Sect. 4, two cases of electricity consumption prediction including Shanxi province and Beijing city are adopted to demonstrate the performance of the proposed approach. Some conclusions and discussion of this study are drawn in Sect. 5. Figure 1 displays the schematic overview of the paper.

Fig. 1
figure 1

The schematic overview of the paper

2 Materials and methods

2.1 Grey relational analysis

Before performing the estimation of variables, correlation analysis is required to do in order to quantitatively determine the degree of correlation among factors. Grey relational analysis introduced by Professor Deng [15] is one of the most widely used methods in grey system theory, which is appropriate for solving complicated interrelationships between multiple factors and variables. In this paper, we use grey relational analysis to quantitatively analyse the degree of association among factors before performing estimation of variables. The steps of grey relational analysis are as follows:

Step 1: Determine the reference sequence and the comparative sequence.

Suppose \( \chi_{0} \left( k \right) \) is the reference sequence; \( \chi_{i} \left( k \right) \) is the comparative sequence, i = 1, 2, …, m; k = 1, 2, …, n; m is the number of factors influencing on the reference variable; n is the number of samples.

Step 2: Normalize reference sequence and comparative sequence.

Since different factors have different physical meanings, the scale of the data is not necessarily the same, which makes later analysis and comparison more difficult. Therefore, data must be normalized before relational analysis. The normalization is calculated as follows:

$$ \tilde{\chi }_{i} \left( k \right) = \frac{{\chi_{i} \left( k \right) - \mathop {\hbox{min} }\limits_{1 \le i \le m} \chi_{i} \left( k \right)}}{{\mathop {\hbox{max} }\limits_{1 \le i \le m} \chi_{i} \left( k \right) - \mathop {\hbox{min} }\limits_{1 \le i \le m} \chi_{i} \left( k \right)}} $$
(1)

in which k = 1, 2, …, n; i = 0, 1, 2, …, m.

Step 3: Determine the grey relational coefficient of reference sequence and comparative sequence.

The grey relational coefficient \( \zeta_{0i} \left( k \right) \) is computed by means of Eqs. (2)–(5):

$$ \zeta_{0i} \left( k \right) = \frac{\Delta \hbox{min} + \rho \Delta \hbox{min} }{{\Delta_{0i} \left( k \right) + \rho \Delta \hbox{max} }} $$
(2)
$$ \Delta_{0i} \left( k \right) = \left| {\tilde{\chi }_{0} - \tilde{\chi }_{i} } \right|,\quad i = 1, 2, \ldots ,m $$
(3)
$$ \Delta \hbox{max} = \mathop {\hbox{max} }\limits_{1 \le i \le m} \mathop {\hbox{max} }\limits_{1 \le k \le n} \Delta_{0i} \left( k \right) $$
(4)
$$ \Delta \hbox{min} = \mathop {\hbox{min} }\limits_{1 \le i \le m} \mathop {\hbox{min} }\limits_{1 \le k \le n} \Delta_{0i} \left( k \right) $$
(5)

in which ρ ∈ [0, 1] is the distinguishing coefficient which is normally set to 0.5; k = 1, 2, …, n; i = 0, 1, 2, …, m; \( \tilde{\chi }_{0} \) is the normalized reference sequence; \( \tilde{\chi }_{i} \) is the normalized comparative sequence.

Step 4: Calculate grey relational degree.

Grey relational degree indicates the numerical measure of similarity between reference sequence and comparative sequence, which is done by averaging the grey relational coefficients through Eq. (6):

$$ \gamma_{0i} = \frac{1}{n}\mathop \sum \limits_{k = 1}^{n} \zeta_{0i} \left( k \right) $$
(6)

in which k = 1, 2, …, n; i = 0, 1, 2, …, m.

2.2 Statistical correlation analysis

To further ascertain the level of association between the reference data and the compatibility data, the statistical correlation analysis is performed, in which correlation coefficient is used to examine the degree of correlation between two variables. The greater the absolute value of correlation coefficients, the stronger the correlation degree is; the closer the correlation coefficient is to 1 or − 1, the stronger the correlation degree is; the closer the correlation coefficient is to 0, the weaker the correlation degree is. Generally, the correlation coefficient was within the range of [0.8, 1.0], and the correlation degree is considered as very strong. In this research, we use Pearson correlation coefficient (PCC) to perform statistical correlation analysis. The two variables can be defined as \( X = [x_{1} ,x_{2} , \ldots ,x_{n} ] \) and \( Y = [y_{1} ,y_{2} , \ldots ,y_{n} ] \), and their PCC is calculated as follows:

$$ \rho \left( {x,y} \right) = \frac{{X \cdot Y - \left( {\mathop \sum \nolimits_{i = 1}^{n} x_{i} \mathop \sum \nolimits_{i = 1}^{n} y_{i} } \right)/n \cdot }}{{\sqrt {\left[ {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} - \left( {\mathop \sum \nolimits_{i = 1}^{n} x_{i} } \right)^{2} /n} \right]*} \left[ {\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} - \left( {\mathop \sum \nolimits_{i = 1}^{n} y_{i} } \right)^{2} /n} \right]}} $$
(7)

where ρ (x, y) is the PCC of variables x and y, and −1  ≤ ρ (x, y) ≤  1. ρ(x, y)  =  −1 or 1 indicates completely negative and positive correlation, respectively, and ρ(x, y)  =  0 indicates no correlation at all. Besides, the larger the value of ρ(x, y) is, the stronger the correlation degree between x and y is.

2.3 Multivariable grey model

Grey prediction theory is often used to handle with a system which has sparse data and incomplete information. The modelling procedures of the multivariable grey model are presented as follows:

Step 1: Obtain the original and transformed sequences for modelling.

Assume that \( X^{\left( 0 \right)} = \left( {X_{1}^{\left( 0 \right)} ,X_{2}^{\left( 0 \right)} , \ldots ,X_{m}^{\left( 0 \right)} } \right) \) is an original nonnegative sequence, where \( X_{i}^{\left( 0 \right)} = \left( {x_{i}^{\left( 0 \right)} \left( 1 \right),x_{i}^{\left( 0 \right)} \left( 2 \right), \ldots ,x_{i}^{\left( 0 \right)} \left( n \right)} \right)^{T} \) is the observation sequence of the ith variable at times 1, 2, …, n. i = 1, 2, …, m; m is the number of variables.

Then, the original sequence is processed using the first-order accumulated generation operator (AGO). The matrix \( X^{\left( 1 \right)} = \left( {X_{1}^{\left( 1 \right)} ,X_{2}^{\left( 1 \right)} , \ldots ,X_{m}^{\left( 1 \right)} } \right) \) is the first-order accumulating generation matrix of \( X^{\left( 0 \right)} \). \( X_{i}^{\left( 1 \right)} = \left( {x_{i}^{\left( 1 \right)} \left( 1 \right), x_{i}^{\left( 1 \right)} \left( 2 \right), \ldots , x_{i}^{\left( 1 \right)} \left( n \right)} \right)^{T} \) is the first-order accumulating generation sequence of \( X_{i}^{\left( 0 \right)} \), where

$$ x_{i}^{\left( 1 \right)} \left( j \right) = \mathop \sum \limits_{k = 1}^{j} x_{i}^{\left( 0 \right)} \left( k \right),\quad j = 1,2, \ldots ,n,\quad i = 1,2, \ldots ,m $$
(8)

Step 2: Establish the multivariable grey model.

The multivariable grey model (MGM(1, m)), which is sometimes called the whitenization equations of MGM(1, m), is established as:

$$ \left\{ {\begin{array}{*{20}c} {\frac{{{\text{d}}x_{1}^{\left( 1 \right)} \left( t \right)}}{{{\text{d}}t}} = a_{11} x_{1}^{\left( 1 \right)} \left( t \right) + a_{12} x_{2}^{\left( 1 \right)} \left( t \right) + \cdots + a_{1m} x_{m}^{\left( 1 \right)} \left( t \right) + b_{1} } \\ {\frac{{{\text{d}}x_{2}^{\left( 1 \right)} \left( t \right)}}{{{\text{d}}t}} = a_{21} x_{1}^{\left( 1 \right)} \left( t \right) + a_{22} x_{2}^{\left( 1 \right)} \left( t \right) + \cdots + a_{2m} x_{m}^{\left( 1 \right)} \left( t \right) + b_{2} } \\ \cdots \\ {\frac{{{\text{d}}x_{m}^{\left( 1 \right)} \left( t \right)}}{{{\text{d}}t}} = a_{m1} x_{1}^{\left( 1 \right)} \left( t \right) + a_{m2} x_{2}^{\left( 1 \right)} \left( t \right) + \cdots + a_{mm} x_{m}^{\left( 1 \right)} \left( t \right) + b_{m} } \\ \end{array} } \right. $$
(9)

Let

$$ X^{\left( 1 \right)} \left( t \right) = \left\{ {x_{1}^{\left( 1 \right)} \left( t \right),x_{2}^{\left( 1 \right)} \left( t \right), \cdots ,x_{m}^{\left( 1 \right)} \left( t \right)} \right\}^{T} $$
(10)
$$ A = \left( {\begin{array}{*{20}c} {a_{11} } & {\begin{array}{*{20}c} {a_{11} } & \cdots \\ \end{array} } & {a_{11} } \\ {\begin{array}{*{20}c} {a_{21} } \\ \cdots \\ \end{array} } & {\begin{array}{*{20}c} {\begin{array}{*{20}c} {a_{22} } \\ \cdots \\ \end{array} } & {\begin{array}{*{20}c} \cdots \\ \vdots \\ \end{array} } \\ \end{array} } & {\begin{array}{*{20}c} {a_{2m} } \\ \cdots \\ \end{array} } \\ {a_{m1} } & {\begin{array}{*{20}c} {a_{m2} } & \cdots \\ \end{array} } & {a_{mm} } \\ \end{array} } \right),\quad B = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {b_{1} } \\ {\begin{array}{*{20}c} {b_{21} } \\ \vdots \\ \end{array} } \\ \end{array} } \\ {b_{m} } \\ \end{array} } \right) $$
(11)

A is a developing grey matrix, and B is an endogenous control grey matrix.

Then, the dynamical model (8) can be rewritten in matrix form as Eq. (12):

$$ \frac{{{\text{d}}X^{\left( 1 \right)} \left( t \right)}}{{{\text{d}}t}} = AX^{\left( 1 \right)} \left( t \right) + B $$
(12)

Step 3: Estimate the model parameters.

To identify parameter matrix A and vector B, Eq. (8) is discretized as

$$ X^{\left( 0 \right)} \left( k \right) = AZ^{\left( 1 \right)} \left( k \right) + B $$
(13)

in which \( X^{\left( 0 \right)} \left( k \right) = \left\{ {x_{1}^{\left( 0 \right)} \left( k \right),x_{2}^{\left( 0 \right)} \left( k \right), \ldots ,x_{m}^{\left( 0 \right)} \left( k \right)} \right\}^{T} \), and \( Z^{\left( 1 \right)} \left( k \right) = \left\{ {z_{1}^{\left( 1 \right)} \left( k \right),z_{2}^{\left( 1 \right)} \left( k \right), \ldots ,z_{m}^{\left( 1 \right)} \left( k \right)} \right\}^{T} \) is the mean generation matrix of \( X^{\left( 1 \right)} \), and the sequence \( Z_{i}^{\left( 1 \right)} = \left( {z_{i}^{\left( 1 \right)} \left( 2 \right),z_{i}^{\left( 1 \right)} \left( 3 \right), \ldots ,z_{i}^{\left( 1 \right)} \left( n \right)} \right)^{T} \) is the generated mean sequence of \( X_{\text{i}}^{\left( 1 \right)} \), in which

$$ z_{i}^{\left( 1 \right)} \left( k \right) = \frac{1}{2}\left( {x_{i}^{\left( 1 \right)} \left( k \right) + x_{i}^{\left( 1 \right)} \left( {k - 1} \right)} \right),\quad i = 2,3, \ldots ,m,\quad k = 2,3, \ldots ,n $$
(14)

By the least square method, the parameter matrix A and B can be obtained as

$$ \left( {A, B} \right)^{T} = (L^{T} L)^{ - 1} L^{T} M $$
(15)

where

$$ L = \left[ {\begin{array}{*{20}c} {z_{1}^{\left( 1 \right)} \left( 2 \right)} & {z_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {z_{m}^{\left( 1 \right)} \left( 2 \right)} & 1 \\ {z_{1}^{\left( 1 \right)} \left( 3 \right)} & {z_{2}^{\left( 1 \right)} \left( 3 \right)} & \cdots & {z_{m}^{\left( 1 \right)} \left( 3 \right)} & 1 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ {z_{1}^{\left( 1 \right)} \left( n \right)} & {z_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {z_{m}^{\left( 1 \right)} \left( n \right)} & 1 \\ \end{array} } \right]\quad M = \left[ {\begin{array}{*{20}c} {X^{\left( 0 \right)} \left( 2 \right)} \\ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {X^{\left( 0 \right)} \left( 3 \right)} \\ \vdots \\ \end{array} } \\ {X^{\left( 0 \right)} \left( n \right)} \\ \end{array} } \\ \end{array} } \right] $$

Step 4: Obtain the time response function for predicting in the transformed domain.

Using the estimated parameters and \( x_{i}^{\left( 1 \right)} \left( 1 \right) = x_{i}^{\left( 0 \right)} \left( 0 \right) \), i = 1, 2, …, m, as the initial condition, the time response function can be achieved:

$$ \hat{x}_{i}^{\left( 1 \right)} \left( k \right) = e^{{ - \hat{A} \cdot \left( {k - 1} \right)}} \left( {x_{i}^{\left( 1 \right)} \left( 1 \right) - \hat{A}^{ - 1} \hat{B}} \right) + \hat{A}^{ - 1} \hat{B}, $$
(16)

where \( \hat{X}^{\left( 1 \right)} \left( k \right) \) represents the prediction vector of \( X^{\left( 1 \right)} \left( k \right) \).

Step 5: Obtain the fitted and forecasted values in the original domain.

By using the first-level inverse accumulating generation operator (I-AGO), the fitted and predicted values in the original domain can be calculated by Eq. (17):

$$ \hat{x}_{i}^{\left( 0 \right)} \left( k \right) = \hat{x}_{i}^{\left( 1 \right)} \left( k \right) - \hat{x}_{i}^{\left( 1 \right)} \left( {k - 1} \right) $$
(17)

in which \( \hat{x}_{i}^{\left( 0 \right)} \left( k \right) \) represents the prediction vector of \( x^{\left( 0 \right)} \left( k \right) \).

3 Proposed methods

3.1 Improved multivariable grey model

Although the grey prediction model has enjoyed high popularity in many predicting applications, there are still inherent limitations to be improved. One of the drawbacks is that the estimation of parameters and the prediction of variables are deriving from different approaches, which degrades the accuracy of grey prediction model for some data. If the parameter estimation and variable prediction use the same approach, the prediction error will be greatly reduced.

In the original multivariable grey model, the whitenization equation of MGM(1, m) (Eq. (9)) is used to predict variable values, while the estimation of parameters uses the discretized equation (Eq. (13)). The estimated parameters by Eq. (13) can only be regarded as approximate values of the parameters in Eq. (9); thus, the accuracy of prediction would be reduced due to adopting different methods to estimate parameters and predict variables.

In fact, Eq. (13) can be written as follows:

$$ \left( {X^{\left( 1 \right)} \left( k \right) - X^{\left( 1 \right)} \left( {k - 1} \right)} \right) = A^{\prime}\left( {X^{\left( 1 \right)} \left( k \right) + X^{\left( 1 \right)} \left( {k - 1} \right)} \right) + B $$
(18)

Namely,

$$ X^{\left( 1 \right)} \left( k \right) =\Gamma X^{\left( 1 \right)} \left( {k - 1} \right) + \psi $$
(19)

in which \( \Gamma = \frac{{1 - A^{\prime}}}{{1 + A^{\prime}}} , \psi = \frac{B}{{1 + A^{\prime} }}, \) \( A^{\prime} = \frac{A}{2} \); Eq. (19) is a first-order linear difference equation with constant coefficients. If both parameter estimation and variable prediction are calculated using the same difference equation, the accuracy will be significantly improved.

Definition 1

\( X^{\left( 1 \right)} \left( {k + 1} \right) = AX^{\left( 1 \right)} \left( k \right) + C\left( k \right) \) is the first-order discrete grey model of constant coefficient nonhomogeneous difference equation.

Theorem 1

If \( X^{\left( 1 \right)} \left( {\text{t}} \right)\;{\text{meets}}\;\hat{X}^{\left( 1 \right)} \left( {k + 1} \right) = A\hat{X}^{\left( 1 \right)} \left( k \right) + C \), then

$$ \hat{X}^{\left( 1 \right)} \left( k \right) = A\hat{X}^{\left( 1 \right)} \left( 0 \right)A^{k} + \Omega $$
(20)

Here, \( \Omega = E - EA^{k} \), \( E = \frac{C}{1 - A} ,\left( {A,C} \right)^{T} = (Q^{T} Q)^{ - 1} Q^{T} P \)

$$ Q = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} & 1 \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} & 1 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 1 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} & 1 \\ \end{array} } \right] $$
(21)
$$ P = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} \\ \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 0 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} \\ \end{array} } \right] $$
(22)

The fitted and forecasted values can be obtained by Eq. (23) in the original domain:

$$ \hat{X}^{\left( 0 \right)} \left( k \right) = \hat{X}^{\left( 1 \right)} \left( k \right) - \hat{X}^{\left( 1 \right)} \left( {k - 1} \right) $$
(23)

Theorem 2

If \( X^{\left( 1 \right)} \left( {\text{t}} \right) \) meets \( \hat{X}^{\left( 1 \right)} \left( {k + 1} \right) = A\hat{X}^{\left( 1 \right)} \left( k \right) + C\left( k \right) \), then

$$ \hat{X}^{\left( 1 \right)} \left( k \right) = A^{k} \hat{X}^{\left( 1 \right)} \left( 0 \right) + \Omega^{{\prime }} $$
(24)

in which \( \Omega^{\prime } = \mathop \sum \nolimits_{r = 1}^{k} A^{r - 1} C\left( {k - r} \right) \).

In particular, (1) when \( C_{k} = B_{0} + B_{1} k \), then

$$ \hat{X}^{\left( 1 \right)} \left( k \right) = A^{k} \hat{X}^{\left( 1 \right)} \left( 0 \right) + \Omega^{\prime \prime } $$
(25)

in which \( \Omega^{\prime \prime } = \mathop \sum \nolimits_{r = 1}^{k} A^{r - 1} \left[ {B_{0} + B_{1} \left( {k - r} \right)} \right], \) \( \left( {A, B_{0} ,B_{1} } \right)^{T} = (Q^{T} Q)^{ - 1} Q^{T} P \)

$$ Q = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} & 1 & 1 \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} & 1 & 2 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 1 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} & 1 & n \\ \end{array} } \right] $$
(26)
$$ P = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} \\ \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 0 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} \\ \end{array} } \right] $$
(27)

(2) When \( C_{k} = B_{0} + B_{1} k + B_{2} k^{2} \), then

$$ \hat{X}^{\left( 1 \right)} \left( k \right) = A^{k} \hat{X}^{\left( 1 \right)} \left( 0 \right) + \Omega^{\prime \prime \prime } $$
(28)

Here, \( \Omega^{{{\prime \prime \prime }}} = \mathop \sum \limits_{r = 1}^{k} A^{r - 1} \left[ {B_{0} + B_{1} \left( {k - r} \right) + B_{2} \left( {k - r} \right)^{2} } \right], \left( {A,B_{0} ,B_{1} ,B_{2} } \right)^{T} = (Q^{T} Q)^{ - 1} Q^{T} P \)

$$ Q = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} & 1 & 1 & 1 \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} & 1 & 2 & 2 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 1 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} & 1 & n & {n^{2} } \\ \end{array} } \right] $$
(29)
$$ P = \left[ {\begin{array}{*{20}c} {x_{1}^{\left( 1 \right)} \left( 1 \right)} & {x_{2}^{\left( 1 \right)} \left( 1 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 1 \right)} \\ {x_{1}^{\left( 1 \right)} \left( 2 \right)} & {x_{2}^{\left( 1 \right)} \left( 2 \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( 2 \right)} \\ \cdots & \cdots & \cdots & \cdots \\ {x_{1}^{\left( 0 \right)} \left( n \right)} & {x_{2}^{\left( 1 \right)} \left( n \right)} & \cdots & {x_{m}^{\left( 1 \right)} \left( n \right)} \\ \end{array} } \right] $$
(30)

The fitted and forecasted values can be obtained by Eq. (31) in the original domain:

$$ \hat{X}^{\left( 0 \right)} \left( k \right) = \hat{X}^{\left( 1 \right)} \left( k \right) - \hat{X}^{\left( 1 \right)} \left( {k - 1} \right) $$
(31)

3.2 Hybrid prediction model

In the part, we integrate correlation analysis test and improved multivariable grey model (IMGM) to construct a hybrid prediction model of electricity consumption. First, correlation analysis test is done using grey relational analysis and statistical correlation analysis before performing the estimation of variables of interest. Then, the values got from correlation analysis test are processed using the first-order accumulated generation operator (AGO). Finally, we input the transformed sequence by AGO into the IMGM to get the fitted and forecasted values. The detailed steps of this hybrid model are shown in Fig. 2.

Fig. 2
figure 2

The proposed hybrid prediction model

4 Experimental analysis

In this part, we predict the electricity consumption using our hybrid IMGM approach to test the effectiveness and performance of our approach. All the experimental data sets are performed in the MATLAB R2016b environment on the Microsoft Windows 7 Pro operating system. The hardware environment is on a computer with Intel(R) Core(TM) i5-4440 3.10 GHz CPU and 8-GB RAM.

The electricity consumption data are driving from Shanxi province and Beijing city of China from 1999 to 2018 shown in Fig. 3, which is available from National Bureau of Statistics of China (http://data.stats.gov.cn/). It is noteworthy that the electricity consumption can be affected by many influencing factors such as gross domestic product (GDP), third industry proportion, urbanization rate, mean annual temperature, consumption level of urban residents, second industry proportion, third industry proportion, total volume of retail sales, permanent residents, fixed assets investment, urbanization rate, and so on [35]. In this research, we consider six influencing factors including GDP, consumption level of urban residents, second industry proportion, total volume of retail sales, permanent residents, fixed assets investment based on the grey relational grade and the statistical correlations. The grey relational grade is to measure the degree of correlation between the two factors according to the similarity or difference of the two factors. The values of grey relational grade within the range of [0.5, 0.9] are considered moderate link among factors [39]. The statistical correlation analyses the degree of connection between the two factors. The degrees of the statistical correlations above 0.9 are marked a strong inter-correlation among the factors of interest [40]. Here, we define the variable \( \chi_{0} \) as the electricity consumption, and \( \chi_{1} \), \( \chi_{2} \), \( \chi_{3} \), \( \chi_{4} \), \( \chi_{5} \), and \( \chi_{6} \) denote consumption level of urban residents, second industry proportion, total volume of retail sales, GDP, permanent residents, and fixed assets investment, respectively, as shown in Table 1. The data of electricity consumption and its influencing factors are shown in Tables 2 and 3.

Fig. 3
figure 3

The electricity consumption in Shanxi province and Beijing city from 1999 to 2019

Table 1 Nomenclature of the reference variable and its influence factors
Table 2 Data of all factors from 1999 to 2018 for Shanxi province
Table 3 Data of all factors from 1999 to 2018 for Beijing city

The data set is divided into two groups: the data from 1999 to 2013 are utilized as the training data set and 2014–2018 as the testing data set. To verify the presented hybrid forecasting model, we compare our proposed improved multivariable grey model (IMGM) with the original multivariable grey model (MGM), generalized regression neural network model with fly optimization algorithm (FOAGRNN) [36], grey prediction model with convolution integral (GMC) [24], and self-adaptive fractional weighted grey model (SFOGM) [20].

The parameters of MGM model are calculated using the least square method. The parameters of IMGM model are estimated by Eqs. (20)–(31). FOAGRNN model uses fly optimization algorithm (FOA) for the parameter selection of the generalized regression neural network (GRNN) model. The main parameters of FOA are the maximum iteration number (maxgen), the population size (sizepop), the initial fruit fly swarm location (X_axis, Y_axis), and the random flight distance range (FR). In this case, suppose maxgen = 100, sizepop = 10, (X_axis, Y_axis) ⊂ [0, 1], FR ⊂ [− 10, 10]. SFOGM model utilizes particle swarm optimization (PSO) to determine the parameters of the self-adaptive fractional parameter and the input time parameter through minimizing the mean absolute percentage error between the simulation value and the actual value. The main parameters of PSO are the acceleration factors (c1 and c2), population size (popsize), iteration times (itetimes), the dimensions (Dim), and inertia weight (w). In this experiment, set c1 = c2 = 2, popsize = 30, itetimes = 1000, Dim = 2.

In multivariable grey model, the parameter estimation and variable prediction are crucial; otherwise, this can lead to performance degradation. The original multivariable grey model uses the whitenization equation of MGM(1, m) (Eq. 9) and discretized equation (Eq. (13)) to estimate variable values and parameter values. The parameters estimated through Eq. (13) are approximate values of the parameters; thus, the accuracy of prediction will be reduced. In this study, a first-order linear difference equation with constant coefficients is introduced to compute both parameter estimation and variable prediction, as the parameter estimation and variable prediction use the same approach, and the accuracy will be significantly improved.

To verify the effectiveness of our method, we use two evaluation criteria including the mean absolute error (MAE) [37] and the mean absolute percent error (MAPE) [38]. MAE is used to reflect the overall level of errors; MAPE is regarded as a measure of the prediction accuracy of a forecasting method in statistics. The evaluation criteria are given as follows:

$$ {\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {y_{i} - \hat{y}_{i} } \right| $$
(32)
$$ {\text{MAPE}} - = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right| \times 100 $$
(33)

in which N is the number of forecasting data points, which is 20 in our simulation, \( y_{i} \) and \( \hat{y}_{i } \) represent the actual and predicted values of electricity consumption, respectively, at time i.

4.1 The correlation test

Before conducting the prediction of electricity consumption, the correlation test is required to be done to determine the variables for modelling. Tables 4 and 6 display the calculation results of the grey relational grade of two cases when the distinguishing coefficient ρ equals 0.5. The overall values change between 0.5 and 0.9 with the minimum and maximum values being 0.5572 and 0.8979, respectively. This indicates moderate link among factors [39]. To further ascertain the level of association between the reference data and the compatibility data, the statistical correlations [40] are computed, as shown in Tables 5 and 7. It is apparent that most correlation degrees are above 0.9 which marks a strong inter-correlation among the factors of interest. Both measures agree well with each other which can therefore suggest that the factors considered in our model are reasonably correlated. Therefore, we select six influencing factors including GDP, consumption level of urban residents, second industry proportion, total volume of retail sales, permanent residents, and fixed assets investment in our prediction model.

Table 4 Results of grey relational grade for Shanxi province
Table 5 Results of statistical correlation for Shanxi province
Table 6 Results of grey relational grade for Beijing city
Table 7 Results of statistical correlation for Beijing city

4.2 Simulation results

According to the proposed IMGM, we first use the first-order accumulated generation operator to process the original sequence. And then based on the multivariable grey model, the first-order linear difference equation with constant coefficients (Eq. 19) is used to calculate the parameters of developing grey matrix and endogenous control grey matrix. Finally, using the estimated parameters and the first-order accumulating generation sequence, the fitted and forecasted values can be obtained by Eq. (31) in the original domain.

Table 8 lists results of MAPE and MAE for IMGM, SFOGM, GMC, FOAGRNN, and MGM models in the example of Shanxi province and Beijing city from 2014 to 2018, where the values in bold represent the smallest values of MAE and MAPE. According to the results of the above-mentioned forecasting models listed in Table 8 and Fig. 4, it can be observed that the proposed hybrid forecasting model obtains the highest prediction accuracy (via the MAE and MAPE criteria). We can see that the developed model significantly outperforms than the four compared models in most indices, except for SFOGM model in the MAPEs index. The MAPEs index of the developed model is 5.1667 and 5.3468% for Shanxi province and Beijing city, respectively. For the original grey model of MGM, the prediction accuracy is poor because this model uses the different approaches to estimate the parameters and variables; thus, the estimated parameters are the approximate values of developing grey matrix and endogenous control grey matrix, which degrades the accuracy of grey prediction model. Among the four compared forecasting methods of SFOGM, GMC, FOAGRNN, and MGM, SFOGM gets higher prediction accuracy when compared with GMC, FOAGRNN, and MGM, because SFOGM uses the particle swarm algorithm to estimate the adjustable fractional weighted coefficients and corresponding time parameter of the initial condition, which promotes the forecasting precision. In addition, this model introduces the fractional weighted coefficients to design the optimized initial condition that captures the dynamic characteristics of the electricity consumption observations. SFOGM and GMC perform better than MGM in both cases. However, when comes to the forecasting models using optimization algorithm, such as FOAGRNN and SFOGM, GMC still shows large forecasting errors. The MAPEs of the SFOGM, FOAGRNN, and GMC are 4.0485, 5.6373, and 6.6547% for Shanxi province and 4.9212, 6.7878, and 10.2873% for Beijing city, respectively. The grey forecasting model using optimization algorithm, i.e. SFOGM, obtained more reliable and accurate electricity consumption forecasting results than GMC and MGM, both of which are not using optimization algorithm. The forecasting results of SFOGM are more consistent with the actual values; however, SFOGM estimates the parameters and variables using different approaches, which degrades the accuracy of grey prediction model. All in all, the proposed model is more suitable than the others to forecast annual electricity consumption.

Table 8 Results of MAPE and MAE for IMGM, SFOGM, GMC, FOAGRNN, MGM models in the example of Shanxi province and Beijing city
Fig. 4
figure 4

The illustration of MAE, MAPE, RMSE of different models for Shanxi province and Beijing city

Table 9 and Fig. 5 give the forecasting results of electricity consumption with the IMGM, MGM, FOAGRNN, GMC, SFOGM models. Table 9 also lists the relative errors of the five forecasting models. According to Table 9 and Fig. 5, it can be clearly seen that all of the five forecasting models capture the changing trend, but the performance of IMGM, SFOGM, GMC, and FOAGRNN is better than the MGM model.

Table 9 Forecasting results of electricity consumption using IMGM, SFOGM, GMC, FOAGRNN, MGM models
Fig. 5
figure 5

Forecasting results of Shanxi’s annual electricity consumption from 2014 to 2018

Table 10 gives the number of the relative errors falling in the scope of [− 3, + 3%] and the number of the relative errors falling in the scope of [− 1, + 1%]. Table 10 also presents the maximum and minimum relative errors. The error range [− 3, + 3%] is always considered as a standard to measure the performance of the forecasting model [41]. Therefore, this paper uses this range to compare the five forecasting models.

Table 10 The maximum and minimum relative errors and the number of the relative error in [− 3, + 3%] and [− 1, + 1%], in which \( rate\_1 = \frac{{{\text{Points}}\;{\text{in}} \left[ { - 1, + 1\% } \right]}}{{{\text{Total}}\;{\text{Points }}}} \), \( rate\_3 = \frac{{{\text{Points}}\;{\text{in}} \left[ { - 3, + 3\% } \right]}}{{{\text{Total}}\;{\text{Points }}}} \)

From Table 10, it can be seen that the relative errors of IMGM, SFOGM, GMC, and FOAGRNN models are in the scope of [− 3, + 3%]. The number of the relative error in [− 1, + 1%] using IMGM, SFOGM, GMC, FOAGRNN model is 3, which means 60% of forecasting points are in [− 1, + 1%]. Using MGM models, the number of the relative error in [− 1, + 1%] is 2, which means 40% of forecasting points are in [− 1, + 1%]. The maximum error using IMGM model is 1.834%, smaller than that of 3 models containing MGM, FOAGRNN, and GMC model. The minimum relative error using IMGM is − 0.157%, better than that of three models containing FOAGRNN, GMC, and SFOGM model.

Figure 6 describes the error analysis of the five forecasting models. From Fig. 6, the deviation between the forecasting results and the actual value can be captured, which shows the performance of IMGM, SFOGM, GMC, FOAGRNN is better than the MGM model.

Fig. 6
figure 6

Error analysis for Shanxi province

Table 11 lists the electricity consumption of Shanxi province and Beijing city in 2019 and 2020 forecasted by the five models.

Table 11 Forecasting results of electricity consumption of Shanxi province and Beijing city in 2019 and 2020

5 Conclusion and discussion

This paper primarily focuses on the prediction of electricity consumption. We use the grey forecasting model since it can achieve prediction based on restricted number of available data. Since realistic and complex systems are often consisted of many variables which are not independent of each other and have mutual correlation among them, we focus on research on multivariable grey model. Different from the original multivariable grey model, which uses different approaches for parameters estimation and variables prediction thus degrading the prediction accuracy, the proposed improved multivariable grey model used the same first-order linear difference equation with constant coefficients for both parameters estimation and variables prediction so that the accuracy is significantly improved.

To investigate the performance of the proposed methods in the prediction and estimation of electricity consumption, other driving factors of the system are first acquired. These include consumption level of urban residents, second industry proportion, total volume of retail sales, GDP, permanent residents, and fixed assets investment. Grey relational analysis and statistical correlation analysis are used to justify their strong relations. Using 1999–2018 historical data, it is found that IMGM, SFOGM, GMC, and FOAGRNN models have superior estimation performance over the MGM model. For training data, the MAPE of traditional MGM model is 10.4578, that of GMC model is 6.6547, that of FOAGRNN is 5.6373, and that of IMGM is 5.1667. IMGM is best. For the out-of-sample data in 2014, the relative error of MGM model is 1.614, that of SFOGM model is 0.807, that of GMC is 1.649, that of SFOGM is 0.552, and that of IMGM is 0.138. IMGM is the smallest. Therefore, it can be used for other real cases for electricity consumption forecasting. In theory, the grey forecasting model is suitable for addressing the limited sample forecasting problems [42]. Limited sample is suitable for short-term projection. In practice, the trends of these relative factors may change or the relationship between the reference series and comparison series may vary in the long term, so the IMGM is also applicable for short-term projection.

Although the proposed model got the competitive performance, one of limitations is that this model cannot deal with the conflict between prediction accuracy and generalization ability well. The first-order linear difference equation with constant coefficients used for both parameters estimation and variables prediction lacks generalization ability. In future studies, we will devote to studying first-order nonhomogeneous difference equation with variable coefficients and second-order nonhomogeneous difference equation with variable coefficients, which can be used to study more complex models.