Introduction

China is a developing country in a period of rapid development. Electricity consumption forecasting is an important part of power economic planning, energy investment, and environmental protection (Lin and Liu 2016), and it has become an important research area in the operation and management of modern power systems (Kavousi-Fard et al. 2014). The accuracy of electricity consumption forecasting (Amber et al. 2018) is of great significance to economic development and power planning. Electricity consumption is affected by a series of factors, such as population (Hussain et al. 2016), economic growth (Lin and Liu 2016), power facilities (Khosravi et al. 2012), and climate (Hernández et al. 2013), making the prediction problem a challenging and complex task. To address these problems, in recent years many experts, scholars, and research institutions at home and abroad have carried out in-depth research on electricity consumption forecasting models. The main methods are nonlinear intelligent models (Bekiroglu et al. 2018; Hernandez et al. 2014), traditional statistical analysis models (Chui et al. 2009; Mohamed and Bodger 2005), and grey prediction models (Xiao et al. 2017).

Ghani and Ahmad (2010) used SPSS software to establish a linear regression model based on the multiple regression method to predict and analyze fish landings, demonstrating the effectiveness and feasibility of the method. Wang et al. (2018) used an autoregressive moving average (ARMA) model based on a time series algorithm to predict short-term wind power. Zhang et al. (2019) combined BP and RBF neural network methods to predict and analyze wind speed, verifying the effectiveness and accuracy of the combined prediction model. The above models usually require a large amount of data, so it is difficult to obtain accurate results with limited data. Grey theory (Julong 1982) is used to solve uncertainty problems with limited data and poor information, and it focuses on building grey prediction models from a small amount of information. Grey theory does not seek the statistical law of time series data; instead, it associates the random process with time and uses the accumulated generating operation to process the original data. This reduces the inherent randomness of the data, transforms irregular data into an approximately exponential form, generates a sequence with strong regularity, and allows the future direction of the data to be predicted. Grey systems are often used to discover the laws hidden in chaotic data. Wang et al. (1819) used the grey management degree and grey theory to establish a grey system model based on basic urban heating data and forecast heating demand. Guo et al. (2013) proposed a new comprehensive adaptive grey model, CAGM(1,N), which can be applied to practical forecasting problems and can obtain higher fitting and prediction accuracy than the traditional GM(1,N) model (Pai et al. 2008; Tien 2008).

These methods make predictions from raw data by maximizing the fitting accuracy, but they do not take into account the complex diversity of the underlying processes. A single grey model produces large errors in forecasting, and it is difficult to achieve the expected accuracy. The Markov model is suitable for predicting random problems; it can better describe the dynamic trend of randomly changing objects and can compensate for the shortcomings of the grey model. Therefore, the two prediction models are combined: the grey model is used to forecast, and the predicted values are corrected by the Markov chain to effectively improve the prediction accuracy, so as to achieve scientific prediction and analysis. So far, the combination of grey theory and Markov models has been applied in many areas of prediction. Yong and Yidan (1992) put forward the grey Markov model for the first time by combining the advantages of the grey model and Markov theory, and it has since been widely used in the prediction of traffic, natural disasters, energy consumption, and other fields. Kumar and Jain (2010) combined the grey Markov and time series models to predict energy consumption in India for the first time, which provided a feasible scheme for forecasting India’s energy consumption. Cao et al. (2019) explored the internal relationship between road transport accidents involving hazardous chemicals and traffic accidents in China based on the grey Markov model, and analyzed the grey Markov combined prediction model for safety accident prediction; the effectiveness and feasibility of the method were verified by experiments.

The above research shows that the grey Markov model has good prediction accuracy and ability. However, in practice it is found that, when faced with the prediction of small-sample data, the model still involves a certain degree of contingency. Therefore, according to the characteristics of China’s electricity consumption data, this paper optimizes and improves the traditional grey Markov model and proposes RGPMM(λ,1,1) to predict China’s electricity consumption more accurately, so as to provide more accurate information for the rational distribution of energy.

Objective

The work of this article is carried out in the following aspects:

  1) Grey power prediction model (Wang et al. 2011). It is a new type of nonlinear grey prediction model. Its power exponent reflects the nonlinear development characteristics of the data, so the model gives good prediction results when the underlying process develops nonlinearly.

  2) The power exponent λ is generally taken as an integer in traditional grey prediction models; for example, λ = 2 gives the Verhulst model. In this paper, λ belongs to the real numbers R, that is, λ may take fractional values, and the value of λ is estimated through optimization theory. A novel grey prediction model can then be established. In the modeling process, the robustness of λ is analyzed.

  3) The introduction of the rolling mechanism (Akay and Atak 2007). In the forecasting process, because data from the distant past have little effect on the forecast, the rolling mechanism is introduced to continuously update the input information. This breaks the constraint of a fixed initial value in the classic grey prediction model and complies with the principle of “new information priority” (Julong 1989).

  4) In this paper, the relative error of the grey power model is used as the index, and the weighted Markov theory (Liu et al. 2018) is used to correct the grey power model, which further improves the prediction accuracy and adaptability of the model.

Organization

The rest of this article is organized as follows: the “Basic knowledge” section briefly introduces the historical background of the grey model and the traditional GM(1,1) model, Markov theory, rolling mechanism, and the grey development zone. The “Methodology of improved grey prediction model” section introduces how to build the RGPMM(λ,1,1). The “Case studies on forecasting the total electricity consumption in China” section illustrates the practicability of the RGPMM(λ,1,1) by experiment, and forecasts the total electricity consumption in the next few years by this model. The “Conclusion” section contains the conclusions and suggestions for future work.

Basic knowledge

Basic GM(1,1)

In 1982, Professor Deng Julong first proposed the concept of the grey system and built the GM(1,1). The process of the GM(1,1) is as follows (Julong 1982; Lin et al. 2012; Zeng et al. 2020):

Step 1: Transforming the original data. Let the non-negative original sequence be \(X^{\left(0\right)}=\left\{x^{\left(0\right)}\left(1\right), x^{\left(0\right)}\left(2\right),\cdots ,x^{\left(0\right)}\left(n\right)\right\}\), \(n\geq 4\). The 1-AGO sequence is given by

$$ X^{\left( 1\right)}=\left\{x^{\left( 1\right)}\left( 1\right),x^{\left( 1\right)}\left( 2\right),\cdots,x^{\left( 1\right)}\left( n\right)\right\} $$
(1)

where, \(x^{\left (1\right )}\left (k\right )={\sum }_{i=1}^{k}x^{\left (0\right )}\left (i\right ), k=1,2,\cdots ,n.\)

Step 2: Based on the sequence \(X^{\left (1\right )}\), the whitening form equation of the prediction model can be established:

$$ \frac{dx^{\left( 1\right)}}{dt}+a\cdot x^{\left( 1\right)}=b. $$
(2)

In Formula (2), a and b are the parameters to be estimated. The grey differential equation is:

$$ x^{\left( 0\right)}\left( k\right)+a\cdot z^{\left( 1\right)}\left( k\right)=b, $$
(3)

where, \(z^{\left (1\right )}\left (k\right )=\frac {1}{2}\cdot \left [x^{\left (1\right )}\left (k\right )+x^{\left (1\right )}\left (k-1\right )\right ]\) is the background value and \(Z^{\left (1\right )}=\left \{z^{\left (1\right )}\left (2\right ),z^{\left (1\right )}\left (3\right ),\cdots ,z^{\left (1\right )}\left (n\right )\right \}\) is the mean sequence of \(X^{\left (1\right )}\) (Xiong et al. 2014).

Step 3: Estimating the model parameters. Set the parameters vector to be estimated as \(\begin {pmatrix}\hat {a}\\\hat {b}\end {pmatrix}\) and solve it according to the least squares method to obtain

$$ \left( \begin{array}{cc}\hat{a}\\\hat{b} \end{array}\right)=\left( \begin{array}{cc}B^{T}B \end{array}\right)^{-1}B^{T}Y, $$
(4)

where \(B=\left(\begin{array}{cc}-z^{\left(1\right)}\left(2\right)&1\\ -z^{\left(1\right)}\left(3\right)&1\\{\vdots} & {\vdots} \\ -z^{\left(1\right)}\left(n\right)&1 \end{array}\right)\), \(Y=\left(x^{\left(0\right)}\left(2\right),x^{\left(0\right)}\left(3\right),\cdots ,x^{\left(0\right)}\left(n\right)\right)^{T}\).

Step 4: Obtaining the time response function. Substituting the estimates from Eq. 4 into Eq. 2 and solving it, the time response function is computed as:

$$ \begin{array}{@{}rcl@{}} \hat{x}^{\left( 1\right)}\left( t+1\right)&=&\left( x^{\left( 0\right)}\left( 1\right)-\frac{b}{a}\right)\cdot e^{-at}+\frac{b}{a},\\ t&=&1,2,\cdots,n,n+1,\cdots. \end{array} $$

Step 5: Obtaining the fitted and predicted values in the original domain. The values in the original domain are restored by the inverse accumulation \(\hat{x}^{\left(0\right)}\left(t\right)=\hat{x}^{\left(1\right)}\left(t\right)-\hat{x}^{\left(1\right)}\left(t-1\right),\) namely,

$$ \begin{array}{@{}rcl@{}} \hat{x}^{\left( 0\right)}\left( t\right)&=&\left[x^{(0)}(1)-\frac{\hat{b}}{\hat{a}}\right]\cdot\left( 1-e^{\hat{a}}\right)\cdot e^{-\hat{a}(t-1)},\\ t&=&2,3,\cdots,n,n+1,\cdots. \end{array} $$

where \(\hat {x}^{(0)}(t) (t\le n)\) are called fitted values, and \(\hat {x}^{(0)}(t)(t>n)\) are called predicted values.
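For concreteness, the following is a minimal Python sketch of Steps 1–5 (NumPy assumed; the function name gm11 and the horizon argument are illustrative choices, not part of the original presentation):

```python
import numpy as np

def gm11(x0, horizon=0):
    """Fit GM(1,1) to a non-negative series x0; return fitted values plus `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                              # Step 1: 1-AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                   # background values z^(1)(k)
    B = np.column_stack([-z1, np.ones(n - 1)])      # Step 3: build B and Y
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]     # (B^T B)^{-1} B^T Y
    t = np.arange(1, n + horizon)                   # Steps 4-5: time response + restoration
    x0_hat = (x0[0] - b / a) * (1 - np.exp(a)) * np.exp(-a * t)
    return np.concatenate(([x0[0]], x0_hat))        # the first value is reproduced exactly

# usage: fit the 2008-2015 data from Table 3 and forecast two further years
# series = [3.45414, 3.70322, 4.19345, 4.70009, 4.97626, 5.42034, 5.63837, 5.80200]
# print(gm11(series, horizon=2))
```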

The flow chart is shown in Fig. 1.

Fig. 1
figure 1

The flow chart of GM(1,1)

Markov process

The Markov process (Zhao et al. 2014) is a theory that studies the states of things and the transitions between them. A Markov process in which both time and state are discrete is called a Markov chain. Markov chain analysis is a statistical analysis method based on probability theory and stochastic process theory, using stochastic mathematical models to analyze the quantitative relationships of objective objects during their development and change. Its characteristic is the absence of after-effect, that is, the next state of the system depends only on the current state and has nothing to do with the earlier states.

Transition probability and transition probability matrix

In Markov process, the transition probability and the transition probability matrix of states need to be calculated, which are defined as follows:

Definition 1

Let \(\{X_{n}, n\in T\}\) be a Markov chain, and call the conditional probability \(p_{ij}=P\left(X_{n+1}=j\mid X_{n}=i\right)\), \(i,j\in T\), the one-step transition probability of the Markov chain \(\{X_{n}, n\in T\}\) at time n, referred to simply as the transition probability. That is, it is the conditional probability that the system is in state i at time n and moves to state j after one step. The matrix composed of the transition probabilities is the transition probability matrix. In a Markov chain, the state transition of the system can be represented by the transition probability matrix P as follows:

$$ P=\left( \begin{array}{cccc}p_{11}&p_{12}&\cdots&p_{1n}\\ p_{21}&p_{22}&\cdots&p_{2n}\\ {\vdots} & {\vdots} &{\ddots} &\vdots\\ p_{n1}&p_{n2}&\cdots&p_{nn} \end{array}\right) $$

The steps of Markov process

Markov process is introduced to obtain the transition probability of residual state, so as to determine the state of the residual when t > n. The steps are as follows:

Step 1: Determine the residual state;

Step 2: Calculate the state transition probability matrix P according to the residual state;

Step 3: Determine the initial state vector;

Step 4: According to the state transition formula, calculate the result of the t-th state transition, and take the state with the highest probability of occurrence as the predicted state.
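To illustrate Steps 1–4, the sketch below estimates a one-step transition matrix from a labelled residual-state sequence and selects the most probable next state (the state labels 0..m−1 and the helper names are illustrative assumptions):

```python
import numpy as np

def transition_matrix(states, m):
    """Estimate the one-step transition probability matrix from a state sequence."""
    P = np.zeros((m, m))
    for i, j in zip(states[:-1], states[1:]):       # count observed i -> j transitions
        P[i, j] += 1
    rows = P.sum(axis=1, keepdims=True)
    return np.divide(P, rows, out=np.zeros_like(P), where=rows > 0)

def predict_next_state(states, m):
    """Step 4: the most probable state after one transition from the current state."""
    P = transition_matrix(states, m)
    return int(np.argmax(P[states[-1]]))

# usage with a toy residual-state sequence of three states labelled 0, 1, 2
# print(predict_next_state([0, 1, 2, 1, 0, 1, 2, 2], m=3))
```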

Methodology of improved grey prediction model

The grey power model

The GPM(λ,1,1) is an extension of the traditional GM(1,1). In this paper, the power exponent of GPM(λ,1,1) is analyzed according to the information covering principle of grey system, and the following definitions are given.

Definition 2

(The grey power model) Assume \(X^{\left(0\right)}\) is a non-negative unimodal raw data sequence, \(X^{\left(1\right)}\) is the 1-AGO sequence of \(X^{\left(0\right)}\), and \(Z^{\left(1\right)}\) is the mean-generated (background) sequence of \(X^{\left(1\right)}\). Then the following nonlinear model, which meets the three conditions of grey modeling, is the grey power model:

$$ x^{\left( 0\right)}\left( k\right)+a\cdot z^{\left( 1\right)}\left( k\right)=b\cdot \left[z^{\left( 1\right)}\left( k\right)\right]^{\lambda} $$

The whitening equation of the grey power model is

$$ \frac{dx^{\left( 1\right)}}{dt}+a\cdot x^{\left( 1\right)}=b\cdot \left[x^{\left( 1\right)}\right]^{\lambda} $$
(5)

Solving the above model by the solution method of GM(1,1), the solution of the whitening equation is

$$ x^{\left( 1\right)}\left( t+1\right) = \left\{e^{-\left( 1-\lambda\right)at}\left[\left( 1 - \lambda\right)\int be^{\left( 1-\lambda\right)at}dt+c\right]\right\}^{\frac{1}{1-\lambda}}. $$
(6)

Parameter analysis of GPM(λ,1,1)

Parameter λ estimation method

The parameter λ is an important coefficient in the GPM(λ,1,1). According to the above formulas, since \(x^{\left(1\right)}\neq 0\), divide both sides of Eq. 5 by \(\left[x^{\left(1\right)}\right]^{\lambda}\) and then take the derivative with respect to t to get Eq. 7 as follows:

$$ \begin{array}{@{}rcl@{}} \frac{d^{2}x^{\left( 1\right)}}{dt^{2}}&\cdot& \left[x^{\left( 1\right)}\right]^{\lambda}-\lambda \cdot \left( \frac{dx^{\left( 1\right)}}{dt}\right)^{2}\left[x^{\left( 1\right)}\right]^{\lambda-1}=-a\left( 1-\lambda\right)\\&\cdot& \left[x^{\left( 1\right)}\right]^{\lambda}\cdot \frac{dx^{\left( 1\right)}}{dt} \end{array} $$
(7)

According to the information coverage principle of grey derivative, we cover \(\frac {dx^{\left (1\right )}}{dt}\) and \(\frac {d^{2}x^{\left (1\right )}}{dt^{2}}\) in Eq. 7 with the first grey derivatives and the second grey derivatives of \(x^{\left (1\right )}\), then we will get

$$ \begin{aligned} &\left[x^{\left( 0\right)}\left( t\right)-x^{\left( 0\right)}\left( t-1\right)\right]\cdot \left[z^{\left( 1\right)}\left( t\right)\right]^{\lambda}\\&-\lambda \cdot \left[x^{\left( 0\right)}\left( t\right)\right]^{2}\cdot\left[z^{\left( 1\right)}\left( t\right)\right]^{\lambda-1}\\ &=-a\left( 1-\lambda\right)\cdot \left[z^{\left( 1\right)}\left( t\right)\right]^{\lambda}\cdot x^{\left( 0\right)}\left( t\right) \end{aligned} $$
(8)

Dividing the Eq. 8 with t = k by the Eq. 8 with t = k + 1, we can eliminate the unknown parameter a and get

$$ \begin{aligned} &\frac{\left[x^{\left( 0\right)}\left( k\right)-x^{\left( 0\right)}\left( k-1\right)\right]\cdot \left[z^{\left( 1\right)}\left( k\right)\right]^{\lambda}-\lambda \cdot \left[x^{\left( 0\right)}\left( k\right)\right]^{2}\cdot\left[z^{\left( 1\right)}\left( k\right)\right]^{\lambda-1}}{\left[x^{\left( 0\right)}\left( k+1\right)-x^{\left( 0\right)}\left( k\right)\right]\cdot \left[z^{\left( 1\right)}\left( k+1\right)\right]^{\lambda}-\lambda \cdot \left[x^{\left( 0\right)}\left( k+1\right)\right]^{2}\cdot\left[z^{\left( 1\right)}\left( k+1\right)\right]^{\lambda-1}}\\ &=\frac{\left[z^{\left( 1\right)}\left( k\right)\right]^{\lambda}\cdot x^{\left( 0\right)}\left( k\right)}{\left[z^{\left( 1\right)}\left( k+1\right)\right]^{\lambda}\cdot x^{\left( 0\right)}\left( k+1\right)} \end{aligned} $$
(9)

It follows from Eq. 9 that

$$ \begin{aligned} \lambda=&\frac{\left[x^{\left( 0\right)}\left( k+1\right)-x^{\left( 0\right)}\left( k\right)\right]\cdot z^{\left( 1\right)}\left( k+1\right)\cdot z^{\left( 1\right)}\left( k\right)\cdot x^{\left( 0\right)}\left( k\right)}{\left[x^{\left( 0\right)}\left( k+1\right)\right]^{2}\cdot z^{\left( 1\right)}\left( k\right)\cdot x^{\left( 0\right)}\left( k\right)-\left[x^{\left( 0\right)}\left( k\right)\right]^{2}\cdot z^{\left( 1\right)}\left( k+1\right)\cdot x^{\left( 0\right)}\left( k+1\right)}\\ &-\frac{\left[x^{\left( 0\right)}\left( k\right)-x^{\left( 0\right)}\left( k-1\right)\right]\cdot z^{\left( 1\right)}\left( k\right)\cdot z^{\left( 1\right)}\left( k+1\right)\cdot x^{\left( 0\right)}\left( k+1\right)}{\left[x^{\left( 0\right)}\left( k+1\right)\right]^{2}\cdot z^{\left( 1\right)}\left( k\right)\cdot x^{\left( 0\right)}\left( k\right)-\left[x^{\left( 0\right)}\left( k\right)\right]^{2}\cdot z^{\left( 1\right)}\left( k+1\right)\cdot x^{\left( 0\right)}\left( k+1\right)} \end{aligned} $$
(10)

From the expression of λ, we can see that it reflects not only the grey derivative of the original data but also the role of the grey integral. When k = 2,3,⋯ ,n − 1, the corresponding \(\left(n-2\right)\) values of λ can be computed, denoted {λk}.

Let \(g\left (\lambda \right )=\sum \limits _{k=2}^{n-1}{\left (\lambda -\lambda _{k}\right )^{2}}\), the value of λ that makes \(g\left (\lambda \right )\) take the minimum value is the constant value to be determined.

Since \(g\left(\lambda\right)\) is an upward-opening parabola, according to the first-order condition for an unconstrained extremum, the optimal value of λ is

$$ \begin{aligned} \hat{\lambda}=\frac{1}{n-2}\sum\limits_{k=2}^{n-1}{\lambda_{k}}. \end{aligned} $$
(11)

In this case, \(g\left (\hat {\lambda }\right )\) takes the minimum value.
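A short Python sketch of Eqs. 10–11, computing the point estimates λ_k for k = 2,…,n−1 and their mean λ̂ (zero-based array indices; the function name is an illustrative assumption):

```python
import numpy as np

def estimate_lambda(x0):
    """Estimate the power exponent lambda as the mean of the point estimates of Eq. 10."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # z^(1)(k) for k = 2..n (index k-2)
    lam = []
    for k in range(2, n):                            # k = 2, ..., n-1 in 1-based notation
        xk, xk1, xkm1 = x0[k - 1], x0[k], x0[k - 2]  # x^(0)(k), x^(0)(k+1), x^(0)(k-1)
        zk, zk1 = z1[k - 2], z1[k - 1]               # z^(1)(k), z^(1)(k+1)
        num = (xk1 - xk) * zk1 * zk * xk - (xk - xkm1) * zk * zk1 * xk1
        den = xk1 ** 2 * zk * xk - xk ** 2 * zk1 * xk1
        lam.append(num / den)                        # lambda_k of Eq. 10
    return float(np.mean(lam))                       # lambda_hat of Eq. 11
```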

Estimates of parameters a and b

After the optimal value of λ is determined, the parameters a and b can be estimated directly by the least squares method. Then, we obtain Theorem 1.

Theorem 1

Assuming \(X^{\left(0\right)}\), \(X^{\left(1\right)}\) and \(Z^{\left(1\right)}\) are as defined in Definition 2, the least squares estimate of the parameter vector in GPM(λ,1,1) is

$$ \begin{pmatrix}\hat{a}\\\hat{b}\end{pmatrix}=\begin{pmatrix}B^{T}B\end{pmatrix}^{-1}B^{T}Y, $$
(12)

where \(B=\begin{pmatrix}-z^{\left(1\right)}\left(2\right)&\left[z^{\left(1\right)}\left(2\right)\right]^{\hat{\lambda}}\\ -z^{\left(1\right)}\left(3\right)&\left[z^{\left(1\right)}\left(3\right)\right]^{\hat{\lambda}}\\{\vdots} & {\vdots} \\ -z^{\left(1\right)}\left(n\right)&\left[z^{\left(1\right)}\left(n\right)\right]^{\hat{\lambda}}\end{pmatrix}\), \(Y=\left(x^{\left(0\right)}\left(2\right),x^{\left(0\right)}\left(3\right),\cdots ,x^{\left(0\right)}\left(n\right)\right)^{T}\).

Solution of GPM(λ,1,1)

According to Eq. 12 and the estimation results of parameters, we can simplify it to get \(\hat {x}^{\left (1\right )}\left (t+1\right )=\left [c\cdot e^{-\left (1-\hat {\lambda }\right )\hat {a}t}+\left . \hat {b} \middle / \hat {a} \right .\right ]^{\frac {1}{1-\hat {\lambda }}}\). If the initial value \(\hat {x}^{\left (1\right )}\left (1\right )=x^{\left (0\right )}\left (1\right )\), then the solution of GPM(λ,1,1) is

$$ \begin{array}{@{}rcl@{}} \hat{x}^{\left( 1\right)}\left( t+1\right)&=&\left\{\left[\left( x^{\left( 0\right)}\left( 1\right)\right)^{1-\hat{\lambda}}-\left. \hat{b} \middle/ \hat{a} \right.\right]\right.\\&&\left.\cdot e^{-\left( 1-\hat{\lambda}\right)\hat{a}t}+\left. \hat{b} \middle/ \hat{a} \right.\right\}^{\frac{1}{1-\hat{\lambda}}}. \end{array} $$
(13)
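Combining Theorem 1 with Eq. 13, a sketch of fitting GPM(λ,1,1) and evaluating its time response might look as follows (it reuses the estimate_lambda sketch above; all names are illustrative):

```python
import numpy as np

def fit_gpm(x0):
    """Estimate (lambda, a, b) of GPM(lambda,1,1) via Eq. 11 and Theorem 1."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * (x1[1:] + x1[:-1])
    lam = estimate_lambda(x0)                       # power exponent lambda_hat
    B = np.column_stack([-z1, z1 ** lam])           # design matrix of Theorem 1
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]
    return lam, a, b

def gpm_x1(x0, lam, a, b, t):
    """Time response of Eq. 13: x1_hat(t+1) for t = 0, 1, 2, ..."""
    c = x0[0] ** (1 - lam) - b / a
    return (c * np.exp(-(1 - lam) * a * t) + b / a) ** (1 / (1 - lam))

# original-domain values follow by differencing: x0_hat(t+1) = x1_hat(t+1) - x1_hat(t)
```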

Rolling modeling mechanism

In practical applications, a model built once on fixed data easily produces unacceptable errors. In order to reduce these errors, the rolling mechanism is introduced. The length of the training data set is set as c, and the prediction period of rolling modeling is set as d. The steps are as follows:

Step 1: The original sequence \(\left \{x^{\left (0\right )}\left (1\right ),x^{\left (0\right )}\right .\left (2\right ),\cdots ,\) \(\left .x^{\left (0\right )}\left (c\right )\right \}\) is used to model and the d-period prediction value \(\left \{\hat {x}^{\left (0\right )}\left (c+1\right ),\hat {x}^{\left (0\right )}\left (c+2\right ),\cdots ,\hat {x}^{\left (0\right )}\left (c+d\right )\right \}\) is obtained;

Step 2: When predicting the sequence \(\{\hat {x}^{\left (0\right )}(c+d+1),\hat {x}^{\left (0\right )}\left (c+d+2\right ),\cdots ,\hat {x}^{\left (0\right )}\left (c+2d\right )\}\), we use the latest c data points \(\left \{\hat {x}^{\left (0\right )}\left (d+1\right ),\hat {x}^{\left (0\right )}\left (d+2\right ),\cdots ,\hat {x}^{\left (0\right )}\left (d+c\right )\right \}\) to predict;

Step 3: Repeat step 2 and use the latest sequence to predict the next set of d data points until the required data points are predicted.
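A sketch of the rolling loop in Steps 1–3, parameterised by the window length c and the prediction period d; it can wrap any single-series forecaster, for example the gm11 sketch above (the exact window handling is an assumption consistent with the steps):

```python
def rolling_forecast(series, forecaster, c, d, n_periods):
    """Roll a window of length c forward, forecasting d points per period (Steps 1-3)."""
    history = list(series[-c:])                     # latest c data points
    forecasts = []
    for _ in range(n_periods):
        window = history[-c:]                       # "new information priority"
        new = list(forecaster(window, horizon=d))[-d:]   # d-step-ahead prediction
        forecasts.extend(new)
        history.extend(new)                         # slide the window forward
    return forecasts

# usage: two one-year-ahead rolling forecasts with the GM(1,1) sketch and c = 8
# print(rolling_forecast(series, gm11, c=8, d=1, n_periods=2))
```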

The flow chart is shown in Fig. 2.

Fig. 2
figure 2

The rolling mechanism

Building the RGPMM(λ,1,1)

Due to the complexity of the real situation, there will always be a certain difference between the fitted values obtained by GPM(λ,1,1) and the real values, so the grey fitting accuracy index is random and non-stationary. In order to correct the prediction results and improve the prediction accuracy of GPM(λ,1,1), the fluctuation of the grey fitting accuracy index is analyzed and predicted by Markov theory in this paper. Combined with the rolling mechanism, future electricity consumption is forecasted. The steps of RGPMM(λ,1,1) are as follows, and the flow chart is shown in Fig. 3.

Fig. 3
figure 3

The flow chart of RGPMM(λ,1,1)

Step 1: Calculate fitted values and predicted values

According to the time series \(X^{\left(0\right)}=\left\{x^{\left(0\right)}\left(1\right),x^{\left(0\right)}\left(2\right),\cdots ,x^{\left(0\right)}\left(n\right)\right\}\), GPM(λ,1,1) is established to obtain

$$ \begin{aligned} \hat{x}^{\left( 0\right)}\left( t+1\right)=&\left\{\left[\left( x^{\left( 0\right)}\left( 1\right)\right)^{1-\hat{\lambda}}-\frac{\hat{b}}{\hat{a}}\right]\cdot e^{-\left( 1-\hat{\lambda}\right)\hat{a}t}+\frac{\hat{b}}{\hat{a}}\right\}^{\frac{1}{1-\hat{\lambda}}}\\ & - \left\{\left[\left( x^{\left( 0\right)}\left( 1\right)\right)^{1-\hat{\lambda}} - \frac{\hat{b}}{\hat{a}}\right]\cdot e^{-\left( 1-\hat{\lambda}\right)\hat{a}(t-1)}+\frac{\hat{b}}{\hat{a}}\right\}^{\frac{1}{1-\hat{\lambda}}} \end{aligned} $$
(14)

where \(\hat {x}^{(0)}(t) (t\le n)\) are called fitted values, and \(\hat {x}^{(0)}(t)(t>n)\) are called predicted values.

Step 2: Calculate the grey fitting accuracy index

The grey fitting accuracy index is set to \(Y\left (t\right )=\left . x^{\left (0\right )}\left (t\right ) \middle / \hat {x}^{\left (0\right )}\left (t\right ) \right .\), which reflects the deviation degree of the data fitted by model from the original data.

Step 3: Division of state interval

Suppose \(Y\left(t\right)\) is divided into m states \(E_{i}=\left[\otimes_{1i}, \otimes_{2i}\right], i=1,2,\cdots ,m\). The grey elements ⊗1i and ⊗2i are the lower and upper bounds of the i-th state, where \(\otimes_{1i}=Y\left(t\right)+a_{i}\times\overline{Y}\), \(\otimes_{2i}=Y\left(t\right)+b_{i}\times \overline{Y}\), \(\overline{Y}=\frac{1}{n}\cdot {\sum}_{i=1}^{n}Y(i)\). Here ai and bi are constants that need to be determined based on experience and the data.

Considering the limited amount of electricity consumption data in this paper, it is more appropriate to use the cluster analysis to determine the number of classification classes and the classification interval.
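A minimal sketch of Steps 2–3: compute the grey fitting accuracy index Y(t) and assign each value to one of m states. The quantile-based bounds used here are only a stand-in for the Q-type cluster analysis adopted in the paper:

```python
import numpy as np

def accuracy_index(x0, x0_hat):
    """Grey fitting accuracy index Y(t) = x^(0)(t) / x_hat^(0)(t)."""
    return np.asarray(x0, dtype=float) / np.asarray(x0_hat, dtype=float)

def divide_states(Y, m):
    """Split Y into m states; quantile bounds stand in for the cluster-based division."""
    bounds = np.quantile(Y, np.linspace(0, 1, m + 1))
    states = np.digitize(Y, bounds[1:-1])           # state labels 0..m-1
    intervals = list(zip(bounds[:-1], bounds[1:]))  # (lower, upper) bound of each state
    return states, intervals
```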

Step 4: Establish a state transition matrix

The transition probability of state Ei to state Ej is

$$ p_{ij}\left( \omega\right)=\frac{M_{ij}\left( \omega\right)}{M_{i}}. $$

where, \(M_{ij}\left (\omega \right )\) is the number of samples \(Y\left (t\right )\) transferred from the state of Ei to the state of Ej through ω steps; Mi is the total number of occurrences of Ei, and satisfies \({\sum }_{j=1}^{m}p_{ij}\left (\omega \right )=1\), i,j = 1,2,⋯ ,m. Therefore, \(p_{ij}\left (\omega \right )\) reflects the probability of transition from Ei to Ej through ω steps.

The state transition probability matrix is

$$ R\left( \omega\right)=\left( \begin{array}{cccc} p_{11}\left( \omega\right)&p_{12}\left( \omega\right)&\cdots&p_{1m}\left( \omega\right)\\ p_{21}\left( \omega\right)&p_{22}\left( \omega\right)&\cdots&p_{2m}\left( \omega\right)\\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ p_{m1}\left( \omega\right)&p_{m2}\left( \omega\right)&\cdots&p_{mm}\left( \omega\right) \end{array}\right) $$

The \(R\left (\omega \right )\) reflects the transfer law between the various states of the system. By examining \(R\left (\omega \right )\) and the current state, we can predict the future development and change of the system.

The autocorrelation coefficient of each order is

$$ r_{\omega}=\frac{\sum\limits_{l=1}^{n-\omega}\left[Y\left( l\right)-\overline{Y}\right]\cdot \left[Y\left( l+\omega\right)-\overline{Y}\right]}{\sum\limits_{l=1}^{n}{\left[Y\left( l\right)-\overline{Y}\right]^{2}}} $$

By normalizing rω, the Markov weight of each order is

$$ \theta_{\omega}=\frac{|r_{\omega}|}{\sum\limits_{\omega=1}^{m}|r_{\omega}|}, \omega \le m. $$

where 𝜃ω is the Markov weight of the ω-th order, and the maximum order m is generally taken as the largest order ω for which |rω| ≥ 0.3.
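A sketch of the weighting computation: the lag-ω autocorrelations r_ω of the accuracy index and the normalised Markov weights θ_ω, with the maximum order taken as the largest lag satisfying |r_ω| ≥ 0.3 (this reading of the cut-off rule is an assumption consistent with the case study below):

```python
import numpy as np

def markov_weights(Y, threshold=0.3, max_lag=None):
    """Autocorrelations r_w of the accuracy index and normalised Markov weights theta_w."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    Ybar = Y.mean()
    denom = np.sum((Y - Ybar) ** 2)
    max_lag = max_lag or n - 1
    r = np.array([np.sum((Y[: n - w] - Ybar) * (Y[w:] - Ybar)) / denom
                  for w in range(1, max_lag + 1)])
    # maximum order m: largest lag whose |r_w| reaches the threshold (all lags if none do)
    m = max((w + 1 for w in range(len(r)) if abs(r[w]) >= threshold), default=len(r))
    theta = np.abs(r[:m]) / np.abs(r[:m]).sum()     # normalised weights theta_w
    return r[:m], theta
```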

Step 5: Calculate a more accurate predicted value

The transition probability matrix is used to predict the state interval Ei of the grey fitting precision index. The interval interpolation is used to determine the predicted value. Therefore, \(\tilde {x}^{\left (0\right )}(n+1)=\hat {x}^{(0)}(n+1)\ast \widehat {Y}\left (n+1\right )\), where

$$ \widehat{Y}\left( n+1\right)=\otimes_{1i} \times {\frac{p_{i-1}}{p_{i-1}+ p_{i+1}}}+\otimes_{2i}\times{\frac{p_{i+1}}{p_{i-1}+ p_{i+1}}} $$
(15)
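A sketch of Steps 4–5: combine the ω-step transition rows with the Markov weights, select the most probable state, and interpolate inside its interval as in Eq. 15. Following the worked example later in the paper, the row of the current state is read from each ω-step matrix; this choice and the helper names are assumptions:

```python
import numpy as np

def transition_matrix_w(states, m, w):
    """Empirical w-step transition matrix p_ij(w) = M_ij(w) / M_i (row-normalised counts)."""
    P = np.zeros((m, m))
    for i, j in zip(states[:-w], states[w:]):       # count E_i -> E_j transitions over w steps
        P[i, j] += 1
    rows = P.sum(axis=1, keepdims=True)
    return np.divide(P, rows, out=np.zeros_like(P), where=rows > 0)

def weighted_markov_correction(states, intervals, theta, x0_hat_next, m):
    """Weighted-Markov correction of the next grey prediction (Steps 4-5)."""
    p = np.zeros(m)
    for w, th in enumerate(theta, start=1):
        p += th * transition_matrix_w(states, m, w)[states[-1]]
    i = int(np.argmax(p))                           # most probable state E_i
    lo, hi = intervals[i]
    p_lo = p[i - 1] if i > 0 else 0.0               # probabilities of the adjacent states
    p_hi = p[i + 1] if i < m - 1 else 0.0
    if p_lo + p_hi == 0:                            # fall back to the interval midpoint
        y_hat = 0.5 * (lo + hi)
    else:
        y_hat = (lo * p_lo + hi * p_hi) / (p_lo + p_hi)   # interpolation of Eq. 15
    return x0_hat_next * y_hat                      # corrected predicted value
```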

Step 6: Calculate more accurate predicted values in the next few years

Through the rolling mechanism, the input data are updated and the RGPMM(λ,1,1) is re-established to forecast the next year’s value. Continue the above process until all desired values have been forecasted.

Evaluation metrics

Three kinds of evaluation metrics are used to evaluate the prediction accuracy. Only when all three metrics are passed can the RGPMM(λ,1,1) be used for prediction and its predicted values be regarded as meaningful references. The three evaluation metrics are as follows:

A: The residual test

Three statistical indicators are determined, namely MAE (Mean Absolute Error) (Hamzaçebi 2007), MAPE (Mean Absolute Percentage Error) (Azadeh et al. 2008), and RMSE (Root Mean Squared Error) (Geem and Roper 2009). The formulas for MAE, MAPE, and RMSE are as follows:

$$ \renewcommand{\arraystretch}{1} \begin{array}{rl} &MAE=\frac{1}{n}\cdot{\sum}_{i=1}^{n}{\left|x^{(0)}(i)-\hat{x}^{(0)}(i)\right|},\\&MAPE=\frac{1}{n}\cdot{\sum}_{i=1}^{n}{\left|\frac{x^{(0)}(i)-\hat{x}^{(0)}(i)}{x^{(0)}(i)}\right|}\\ &RMSE=\sqrt{\frac{1}{n}\cdot{\sum}_{i=1}^{n}{\big(x^{(0)}(i)-\hat{x}^{(0)}(i)\big)^{2}}} \end{array} $$

where x(0)(i) is the original value at time i, and the \(\hat {x}^{(0)}(i)\) is the fitted value at time i. Table 1 shows criteria of forecasting performance.

Table 1 MAPE criteria for model evaluation
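The three residual metrics can be computed directly; a short sketch assuming NumPy:

```python
import numpy as np

def residual_metrics(actual, fitted):
    """MAE, MAPE (in percent) and RMSE of the fitted values against the original data."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    err = actual - fitted
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse
```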

B: The correlation degree

The correlation degree is tested by computing γg:

$$ \begin{array}{@{}rcl@{}} \gamma_{g}&=&\frac{1}{n-1}\cdot{\sum}_{i=1}^{n}\gamma_{g}(i)\\ &=&\frac{1}{n-1}\cdot {\sum}_{i=1}^{n}\frac{\min{\{\Delta(i)\}}+\rho\cdot\max{\{\Delta(i)\}}}{\Delta(i)+\rho\cdot\max{\{\Delta(i)\}}} \end{array} $$

where, ρ = 0.5, \({\Delta }(i)=\left |\hat {x}^{(0)}(i)-x^{(0)}(i)\right |\), i = 1,2,⋯ , n, n is the number of samples.
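A sketch of the correlation degree computation, keeping the 1/(n−1) prefactor exactly as written above:

```python
import numpy as np

def grey_correlation(actual, fitted, rho=0.5):
    """Correlation degree gamma_g between the original and fitted series."""
    delta = np.abs(np.asarray(fitted, dtype=float) - np.asarray(actual, dtype=float))
    gamma_i = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return gamma_i.sum() / (len(gamma_i) - 1)       # 1/(n-1) prefactor as in the text
```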

C: The posterior error test

The process of posterior error test is as follows:

Step 1: Calculate \(S_{0}=\sqrt {\frac {{\sum }_{i=1}^{n}(X^{(0)}(i)-\overline {X}^{(0)})^{2}}{n-1}}\) of {X(0)(i)} and \(S_{1}=\sqrt {\frac {{\sum }_{i=1}^{n}(\varepsilon ^{(0)}(i)-\overline {\varepsilon }^{(0)})^{2}}{n-1}}\) of {ε(0)(i)}, where \(\overline {X}^{(0)}=\frac {1}{n}\cdot {\sum }_{i=1}^{n}X^{(0)}(i)\), \(\varepsilon ^{(0)}(i)=x^{(0)}(i)-\hat {x}^{(0)}(i)\), \(\overline {\varepsilon }^{(0)}=\frac {1}{n}\cdot {\sum }_{i=1}^{n}\varepsilon ^{(0)}(i)\), i = 1,2,⋯ ,n

Step 2: The standard deviation ratio \(C=\frac {S_{1}}{S_{0}}\);

Step 3: The small error probability \(P=P\left\{\left|\varepsilon^{(0)}(i)-\overline{\varepsilon}^{(0)}\right|<0.6745\times S_{0}\right\}\);

Step 4: The discrimination rules are shown in Table 2.

Table 2 The gradation of prediction accuracy
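A sketch of the posterior error test, computing the standard deviation ratio C and the small error probability P defined in Steps 1–3:

```python
import numpy as np

def posterior_error_test(actual, fitted):
    """Posterior error ratio C = S1/S0 and small error probability P."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    eps = actual - fitted                           # residual sequence epsilon^(0)(i)
    S0 = actual.std(ddof=1)                         # standard deviation of the original data
    S1 = eps.std(ddof=1)                            # standard deviation of the residuals
    C = S1 / S0
    P = np.mean(np.abs(eps - eps.mean()) < 0.6745 * S0)
    return C, P
```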

Case studies on forecasting the total electricity consumption in China

Since forecasting electricity consumption is important for the dispatch and operation of the power system, a forecast of total electricity consumption at the national level is carried out. To demonstrate the prediction accuracy of the RGPMM(λ,1,1) developed in the “Methodology of improved grey prediction model” section, it is compared with the GM(1,1), GM(1,1,Xn) (Bahrami et al. 2014; Dang et al. 2004), OICGM(1,1) (Akdi et al. 2020), and NOGM(1,1) (Ding et al. 2018) models.

Experimental data

The experimental data are from the China Statistical Yearbook published by the National Bureau of Statistics of China (Bureau 2019). The experimental data set includes the total electricity consumption from 2008 to 2017, as shown in Table 3.

Table 3 Total electricity energy consumption in China from 2008 to 2017 (10^12 kWh)

Selecting the input data set and determining its length are prerequisites that affect the accuracy of the prediction model. In this paper, the optimized subset method proposed by Wang et al. (2011) is used to determine the optimal length of the input data set, which gives c = 8. Eight data points are used as input, and the data of the last two years are used to test the prediction performance of the model.

The prediction results of experiment

In Table 3, the original sequence is X(0) = (3.45414, 3.70322, 4.19345, 4.70009, 4.97626, 5.42034, 5.63837, 5.80200, 6.12971, 6.48210). The flow chart of using the RGPM(λ,1,1) model to predict China’s electricity consumption is shown in Fig. 4.

Fig. 4
figure 4

The flow chart of forecasting China’s electricity consumption with RGPM(λ,1,1)

According to Theorem 1 and Formula (11) in the “Methodology of improved grey prediction model” section, the parameters λ and \(\left(\begin{array}{cc} a\\b \end{array}\right)\) are estimated twice by MATLAB programming to predict the total electricity consumption in 2016 and 2017. The calculations and results are as follows:

In 2016, the estimates calculated by MATLAB are \(\hat {\lambda }=0.2539\) and \(\left (\begin {array}{cc} \hat {a}\\\hat {b} \end {array}\right )= \left (\begin {array}{cc} 0.0041\\2.4323 \end {array}\right )\).

Substitute the parameter estimates into Eq. 13, the time response sequence function by RGPM(λ,1,1) model in 2016 is

$$ \begin{array}{@{}rcl@{}} \hat{x}^{(1)}(t+1)&=&\left\{\left[\left( x^{(0)}(1)\right)^{(1-0.2539)}-\frac{2.4323}{0.0041}\right]\right.\\&&\left.\ast e^{-(1-0.2539)\ast 0.0041\ast t}+\frac{2.4323}{0.0041}\right\}^{\frac{1}{(1-0.2539)}} \end{array} $$

Similarly, by entering the new initial value, the estimated values of parameters in 2017 are \(\hat {\lambda }=-0.0035\) and \(\left (\begin {array}{cc} \hat {a}\\\hat {b} \end {array}\right )=\left (\begin {array}{cc} -0.0587\\4.0794 \end {array}\right )\).

Then, the time response sequence function of 2017 is

$$ \begin{array}{@{}rcl@{}} \hat{x}^{(1)}(t+1)&=&\left\{\left[\left( x^{(0)}(1)\right)^{(1+0.0035)}-\frac{4.0794}{-0.0587}\right]\right.\\&&\left.\ast e^{(1+0.0035) \ast 0.0587\ast t}+\frac{4.0794}{-0.0587}\right\}^{\frac{1}{(1+0.0035)}} \end{array} $$

The different parameter values of each step indicate that the RGPM(λ,1,1) can make the prediction result dynamic according to the characteristics of the input data. Figure 5 intuitively shows that the forecast results by RGPM(λ,1,1) are in good agreement with the real values.

Fig. 5
figure 5

The predicted value of RGPM(λ,1,1)

The prediction values of each grey model are shown in Table 4 and Figs. 6 and 7.

Table 4 Comparison of the original data and the predicted values of each model
Fig. 6
figure 6

Comparison of the original data and the predicted values of each model

Fig. 7
figure 7

The histogram of actual values and predicted values for each model

The related parameters and run time of each grey model are shown in Table 5.

Table 5 Related parameters and run time of each model

Table 4 and Figs. 6 and 7 show intuitively that the predicted results of RGPM(λ,1,1) fit the real values better than those of GM(1,1), GM(1,1,Xn), OICGM(1,1), and NOGM(1,1). This result can also be verified by the experimental test indicators described in the “Evaluation metrics” section. The prediction effect and the test results of each grey model are shown in Table 6.

Table 6 The absolute error and the inspection result of grey models

According to Table 6, the six detection indices of GM(1,1) are MAE = 0.6041, MAPE = 10.9339%, RMSE = 0.7006, rg = 0.4991, P = 0.9000, and C = 0.3870. Since MAPE = 10.9339% ∈ [10%, 20%], the prediction ability of GM(1,1) is weaker than that of the other four grey prediction models. Moreover, its P = 0.9000 < 1.0000, which further indicates that the predictive ability of GM(1,1) is weaker than that of GM(1,1,Xn), OICGM(1,1), NOGM(1,1), and RGPM(λ,1,1). This result further shows that the grey model with a non-integer power is more suitable for predicting total electricity consumption, and the intervention of the rolling mechanism further improves the accuracy of the prediction results.

In summary, RGPM(λ,1,1) is superior to the other four grey models in all detection indices; that is, its prediction performance is better than that of the other four models. MAE and MAPE are reduced by several percentage points, indicating that the prediction accuracy of RGPM(λ,1,1) is greatly improved compared with the other four grey models. So, RGPM(λ,1,1) is a better choice for forecasting the country’s total electricity consumption. Moreover, the more accurate the prediction of national electricity consumption, the more useful the forecast information provided to power planners. In order to further improve the prediction performance of the grey power model, the weighted Markov model is used to modify it.

The prediction results of RGPMM(λ,1,1)

The RGPM(λ,1,1) is used to obtain the fitted values \(\hat{x}^{(0)}(t)\), t = 1,2,⋯ ,8, where t = 1–8 correspond to the years 2008–2015, respectively. The grey fitting accuracy index is obtained according to \(Y(t)=\frac{x^{(0)}(t)}{\hat{x}^{(0)}(t)}\) presented in the “Methodology of improved grey prediction model” section, as shown in Table 7.

Table 7 Grey fitting accuracy index and state division

From Table 7, it can be found that the grey fitting accuracy index has a strong volatility, and the weighted Markov model can be used to predict the state of the grey fitting accuracy.

The Q-type clustering (Narasimhan et al. 2005) is used to divide the states. According to the cluster diagram, the grey fitting accuracy indices can be divided into 5 states, denoted {E1, E2, E3, E4, E5}. The clustering results are shown in Fig. 8 and Table 8. Then, the Markov state intervals can be obtained as follows:

$$ \renewcommand{\arraystretch}{1.5} \begin{array}{rl} &E_{1}\in \left[95.7350\%,96.98035\%\right),\\&E_{2}\in \left[96.98035\%,99.5000\%\right),\\&E_{3}\in \left[99.5000\%,101.0000\%\right),\\ &E_{4}\in \left[101.0000\%,101.7210\%\right),\\&E_{5}\in \left[101.7210\%,102.2500\%\right). \end{array} $$
Fig. 8
figure 8

The figure of Q-cluster for 2008–2015

Table 8 The correlation coefficients (r) and Q-cluster results during 2008-2015

According to the grey fitting accuracy index sequence, the autocorrelation coefficients of each order can be calculated as r0 = 1.0000,r1 = 0.2306,r2 = − 0.0694,r3 = − 0.0412,r4 = − 0.4134,r5 = − 0.1910,r6 = − 0.0172, r7 = 0.0016. The autocorrelation diagram is shown in Fig. 9 and Table 8.

Fig. 9
figure 9

Autocorrelation coefficients of each order

According to Fig. 9, when m is taken as 4, the condition \(\left|r_{\omega}\right|\ge 0.3\) is satisfied at the largest lag, with r1 = 0.2306, r2 = − 0.0694, r3 = − 0.0412, and r4 = − 0.4134. Then the Markov weights of each order are obtained as follows:

$$ \renewcommand{\arraystretch}{1.5} \begin{array}{l} \theta_{1}=\frac{0.2306}{0.2306+0.0694+0.0412+0.4134}=0.3056,\\ \theta_{2}=\frac{0.0694}{0.2306+0.0694+0.0412+0.4134}=0.0920,\\ \theta_{3}=\frac{0.0412}{0.2306+0.0694+0.0412+0.4134}=0.0546,\\ \theta_{4}=\frac{0.4134}{0.2306+0.0694+0.0412+0.4134}=0.5478. \end{array} $$
Fig. 10
figure 10

The figure of Q-cluster for 2009–2016

The Markov transition probability matrices of each step are calculated as follows:

$$ \begin{array}{cc} p\left( 1\right)=\left( \begin{array}{ccccc} 0&1&0&0&0\\ 0&0&0&1&0\\ 0.5&0&0&0&0.5 \\ 0&0&1&0&0\\ 0&0&0&1&0 \end{array}\right),&p\left( 2\right)=\left( \begin{array}{ccccc} 0&0&0&1&0\\ 0&0&1&0&0\\ 0&0.5&0&0.5&0\\ 0&0&0&0&1\\ 0&0&1&0&0 \end{array}\right), \\ p\left( 3\right)=\left( \begin{array}{ccccc} 0&0&1&0&0\\ 0&0&0&0&1\\ 0&0&0.5&0.5&0\\ 0&0&0&1&0\\ 0&0&0&0&0 \end{array}\right),& p\left( 4\right)=\left( \begin{array}{ccccc} 0&0&0&0&1\\ 0&0&0&1&0\\ 0&0&1&0&0\\ 0&0&1&0&0\\ 0&0&0&0&0 \end{array}\right). \end{array} $$

The accuracy index of grey fitting is in the state of E3 in 2015, so we can get:

The one-step transition probability vector is (0.5,0,0, 0,0.5), and the corresponding Markov weight is 𝜃1 = 0.3056;

The two-step transition probability vector is (0,0.5,0, 0.5,0), and the Markov weight is 𝜃2 = 0.0920;

The three-step transition probability vector is (0,0,0.5, 0.5,0), and the Markov weight is 𝜃3 = 0.0546;

The four-step transition probability vector is (0,0,1, 0,0), and the Markov weight is 𝜃4 = 0.5478.

According to \(p_{i}=\sum \limits _{\omega =1}^{m}{\theta _{\omega }p_{i}(\omega )}\), the weighted Markov prediction probability is calculated, as shown in Table 9.

Table 9 Calculation results of weighted Markov transition probability

From Table 9, the weighted transition probabilities can be obtained. Based on these results, max{pi} = p3 = 0.5751; that is, the probability of being in state E3 in 2016 is the largest, and its adjacent states are E2 and E4. According to Formula (15), performing linear interpolation on the interval of state E3, the predicted value of the grey fitting accuracy is

$$ \begin{array}{@{}rcl@{}} \widehat{Y}(9)\!&=&\!\otimes_{1i} \times {\frac{0.0460}{0.0460+0.0733}}+\otimes_{2i}\times {\frac{0.0733}{0.0460+0.0733}}\\\!&=&\!100.4300\% . \end{array} $$

Through the correction of the grey fitting accuracy index, a more accurate predicted value for 2016 can be obtained as follows:

$$ \widetilde{x}^{(0)}(9)=\hat{x}^{(0)}(9)\ast \widehat{Y}(9)=6.0682\times 100.4300\%=6.0943 $$

In the modeling process, the rolling mechanism allows the model to make full use of the latest information when forecasting the electricity consumption in 2017. Using the latest data from 2009 to 2016, the proposed RGPMM(λ,1,1) model is built to obtain the predicted value for 2017. The results are as follows:

The grey fitting accuracy indices for 2017 are shown in Table 10.

Table 10 Grey fitting accuracy index of 2017

The clustering result and the autocorrelation diagram are shown in Table 11 and Figs. 10 and 11.

Table 11 The correlation coefficients (r) and Q-cluster results during 2009–2016
Fig. 11
figure 11

Autocorrelation coefficients of each order

After the weighted Markov correction, the more accurate predicted value for 2017 is 6.5258.

Comparing the forecasting performance of RGPM(λ,1,1) and RGPMM(λ,1,1)

The prediction performance of the RGPMM(λ,1,1) model is compared with that of the RGPM(λ,1,1) model. MSE and MAPE are selected to evaluate the predictive performance of the models, and their expressions are as follows:

$$ \renewcommand{\arraystretch}{1.5} \begin{array}{l} \mathit{MSE}=\frac{1}{2}\ast \sum\limits_{i=9}^{10}{(x_{i}-\hat{x}_{i})^{2}},\\ MAPE=\frac{1}{2}\ast \sum\limits_{i=9}^{10}{\left|\frac{x_{i}-\hat{x}_{i}}{x_{i}}\right|\times 100\%} \end{array} $$

MSE and MAPE of each model are calculated. The result is in Table 12.

Table 12 Comparison of prediction effects from the models

In Table 12, it is found that the values of MSE and MAPE by RGPMM(λ,1,1) are smaller than RGPM(λ,1,1). The result shows that the RGPMM(λ,1,1) further improves the accuracy.

Forecast the total electricity consumption in the next 6 years

Because RGPMM(λ,1,1) has been shown to provide accurate predictions, we use it to forecast China’s electricity consumption from 2018 to 2023. During this process, the Q-cluster results and the correlation coefficients of each year are given in Tables 13, 14, 15, 16, 17, and 18 and Figs. 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, and 23.

Table 13 The correlation coefficients (r) and Q-cluster results during 2010-2017
Table 14 The correlation coefficients (r) and Q-cluster results during 2011-2018
Table 15 The correlation coefficients (r) and Q-cluster results during 2012–2019
Table 16 The correlation coefficients (r) and Q-cluster results during 2013–2020
Table 17 The correlation coefficients (r) and Q-cluster results during 2014–2021
Table 18 The correlation coefficients (r) and Q-cluster results during 2015–2022
Fig. 12
figure 12

The figure of Q-cluster during 2010-2017

Fig. 13
figure 13

Autocorrelation coefficients of each order for 2018

Fig. 14
figure 14

The figure of Q-cluster during 2011–2018

Fig. 15
figure 15

Autocorrelation coefficients of each order for 2019

Fig. 16
figure 16

The figure of Q-cluster during 2012–2019

Fig. 17
figure 17

Autocorrelation coefficients of each order for 2020

Fig. 18
figure 18

The figure of Q-cluster during 2013–2020

Fig. 19
figure 19

Autocorrelation coefficients of each order for 2021

Fig. 20
figure 20

The figure of Q-cluster during 2014-2021

Fig. 21
figure 21

Autocorrelation coefficients of each order for 2022

Fig. 22
figure 22

The figure of Q-cluster during 2015–2022

Fig. 23
figure 23

Autocorrelation coefficients of each order for 2023

The prediction results modified by Markov theory for 2018–2023 are shown in Figs. 24, 25, 26, 27, 28, and 29.

Fig. 24
figure 24

The predicted value of RGPMM(λ,1,1) for 2018

Fig. 25
figure 25

The predicted value of RGPMM(λ,1,1) for 2019

Fig. 26
figure 26

The predicted value of RGPMM(λ,1,1) for 2020

Fig. 27
figure 27

The predicted value of RGPMM(λ,1,1) for 2021

Fig. 28
figure 28

The predicted value of RGPMM(λ,1,1) for 2022

Fig. 29
figure 29

The predicted value of RGPMM(λ,1,1) for 2023

The parameters and forecasted results are presented in Table 19.

Table 19 The predicted values (10^12 kWh) and the model parameters of RGPMM(λ,1,1) during 2018–2023

It can be seen from the experimental results in Table 19 that the total electricity consumption will maintain a growth trend. It is estimated that by 2023, electricity consumption will increase from 6.48210 × 10^12 kWh in 2017 to 9.9826 × 10^12 kWh, which is nearly 10^13 kWh and almost twice that of 2012. This prediction result is of great significance to energy planning and policy-making. In order to ensure a secure and stable energy and power supply, it is necessary to accelerate the establishment and improvement of auxiliary services for power allocation, continuously improve the capacity of the power system, and ensure sound power infrastructure in the supply chain. Meanwhile, it is worth noting that such huge consumption will place a heavy burden on energy planning and environmental protection. Therefore, the government needs to take appropriate actions and plans to meet the high energy demand in the future, so as to avoid the waste of resources caused by unreasonable resource arrangement.

Conclusion

Electricity consumption forecast is of great significance to the economic development and the guarantee of people’s life. However, due to the increase in electricity demand and in the complexity of the power system, it is becoming more and more difficult to accurately predict the power consumption. Therefore, it is meaningful to design a prediction method which is suitable for the limited data.

Aiming at the problem of forecasting power demand with limited data, a grey power-Markov forecasting model based on a rolling mechanism is proposed. It predicts electricity consumption on the basis of grey theory by introducing the rolling mechanism and Markov state prediction. According to the research results, the following conclusions can be drawn:

  1) In view of the problems faced by power system load forecasting, RGPMM(λ,1,1) is proposed. The example shows that it achieves a better prediction effect than the traditional GM(1,1) models and improves the prediction accuracy to a certain extent. This model provides a new way to forecast electricity consumption. The structure of the new grey prediction model is simple, the modeling process is easy to operate, and it can readily be applied in other fields.

  2) Total electricity consumption is not only an important indicator of economic development, but also an important input for formulating the energy strategy and related environmental protection policies. The forecast results indicate that China’s total electricity consumption will continue to maintain a strong growth trend in the next few years, which provides a useful reference for formulating the energy strategy and related environmental protection policies.

However, the grey prediction model proposed in this paper also has some limitations. It only solves the univariate prediction problem, and no corresponding solution is proposed for the multivariate prediction problem. Therefore, there is still much work to be done in the future: multivariate grey models can be considered, and the new model can also be used to forecast industrial electricity consumption, agricultural electricity consumption, residential electricity consumption, and other categories.