Introduction

Human development is facing major challenges due to global warming and other climate crises caused by greenhouse gas emissions. The primary issues stem from human overreliance on energy sources like fossil fuels, resulting in a continuous increase in greenhouse gas levels within the atmosphere (Anser et al. 2020). In order to cope with this challenge, reducing CO2 emissions, improving energy efficiency, and achieving low-carbon development have become urgent tasks facing the international community. In China, the attainment of “carbon peaking and carbon neutrality” objectives within the region holds significant importance for the nation’s progress towards low-carbon development (Chen et al. 2022, Tian and Qi 2023). The energy and power industry is a key area for CO2 emission reduction, and CO2 emission reduction in the thermal power industry is crucial for China to achieve “carbon peaking and carbon neutrality.” This study uses measured data, which can more accurately predict the CO2 emission trends of key thermal power enterprises in the region, compared with statistical data. It provides a scientific basis for leaders to formulate carbon emission reduction policies and measures. On a global scale, it can also serve as a reference for other countries to formulate CO2 emission targets and strategies for similar regions or scenarios. The purpose of this study is to provide a theoretical basis and decision-making reference for policymakers to formulate CO2 emission targets and measures for thermal power generation areas, contributing to the low-carbon development of global thermal power generation and global climate action.

At present, research in the field of CO2 emission forecasting has produced many valuable academic results. Numerous scholars employ the STIRPAT model, scenario analysis, and various techniques to forecast CO2 emissions. These approaches are frequently integrated with the grey prediction model (Yin et al. 2023), LEAP model (Cai et al. 2023, Wang et al. 2022b), and system dynamic model (Feng et al. 2013) for assessing CO2 emission predictions. Zhao et al. (2022) employed the extended STIRPAT model to assess the factors influencing China’s CO2 emissions and subsequently utilized the model’s regression outcomes to project CO2 emissions for the period spanning 2020 to 2040 across six distinct scenarios. The findings indicated that China’s CO2 emissions might reach their peak in 2029, as per the established baseline scenario. Ma et al. (2019) employed the LEAP model to predict CO2 emissions in the Beijing-Tianjin-Hebei metropolitan areas for highway passenger transport. The results indicate that total regional carbon emissions will continue to increase by 2030. Wang et al. (2022a), Wei et al. (2023), and Rao et al. (2023) applied the STIRPAT model and scenario analysis to forecast the timing of CO2 emissions peaking in Shanghai, Henan Province, and Hubei Province. Their findings suggest that achieving the carbon peak before 2030 is attainable through a low-carbon and green development approach. In addition, Hamzacebi and Karakurt (2015) used a gray prediction model to forecast that Turkey’s energy-related CO2 emissions will reach 496,404 Mt in 2025. Mirzaei and Bekri (2017) employed system dynamic models to predict that Iran’s total CO2 emissions would reach 985 Mt by 2025. Wakiyama and Kuramochi (2017) adopted scenario analysis and time series regression models to predict CO2 emissions from electricity in the Japanese residential sector for the period of 2016 to 2030, demonstrating that Japan can surpass its post-2020 emission reduction target. Babbar et al. (2021) predicted the carbon sequestration in Sariska Tiger Reserve for the year 2035 using Markov chain and InVEST model and found that the area will reduce 0.107 Tg of carbon from 2018 to 2035. Handayani et al. (2022) used the LEAP model to analyze the power industry in the Association of Southeast Asian Nations, indicating that greenhouse gas emissions will reach zero emissions by 2050.

Due to the widespread adoption and utilization of artificial intelligence algorithms, an increasing number of researchers have incorporated machine learning techniques, particularly artificial neural networks, into recent studies on predicting CO2 emissions (Zhang et al. 2020). Machine learning is presently a prominent area of focus in CO2 emission prediction research. Through the training and learning of known data, it can predict CO2 emissions efficiently and accurately (Zhao et al. 2023). It has good adaptability to nonlinear problems and has become a useful tool for predicting CO2 emissions (Acheampong and Boateng 2019, Zhang et al. 2022a). Several scholars have conducted relevant research on CO2 emissions in China as a whole or in some regions. Qiao et al. (2020) introduced a novel hybrid algorithm in conjunction with a genetic algorithm to enhance the traditional least squares support vector machine (LSSVM) model. Their study, which relied on CO2 emission data from developed countries, developing countries, and the global dataset, revealed an upward trend in China’s CO2 emissions from 2018 to 2025. Zhang et al. (2022b) constructed a genetic algorithm (GA) to enhance a BP neural network for forecasting building-related CO2 emissions in Jiangsu Province. Their findings indicate a future decline in the total CO2 emissions from buildings in the province. Su et al. (2022) recommended utilizing the monarch butterfly optimization approach to optimize the GM (1,1) model for predicting total primary energy production and CO2 emissions in Tianjin. The outcomes indicated that Tianjin’s CO2 emissions are projected to peak in 2031 at 65.1009 Mt. Li et al. (2023) applied the random forest model for screening factors affecting CO2 emissions. Subsequently, they utilized the BP neural network to predict China’s CO2 emissions, revealing that a peak in these emissions can be expected in 2030 under the 14th Five-Year Plan scenario. 
Some scholars have also conducted carbon-related research in countries and regions outside China. Ameyaw et al. (2020) predicted the CO2 emissions of West African countries from 2015 to 2030 using a bidirectional long short-term memory recurrent neural network and found that future emission levels will follow an upward trend. Ağbulut (2022) used three machine learning algorithms (deep learning, support vector machine, and artificial neural network) to predict Turkey’s transportation CO2 emissions and found that all three achieve high prediction accuracy. Aryai and Goldsworthy (2023) proposed a particle swarm optimization extreme random tree regression model and showed that it outperforms long short-term memory and extreme learning machine models for day-ahead prediction of emission intensity in the Australian national electricity market. Seo and Park (2023) used six operating parameters of diesel vehicles to predict their CO2 emissions with artificial neural networks and found that engine torque and fuel/air ratio can improve prediction accuracy.

As shown above, numerous scholars have undertaken substantial research of both theoretical significance and practical value in the field of predicting CO2 emissions from thermal power generation. Nonetheless, current research primarily centers on forecasting CO2 emission trends at the national, provincial, or urban levels, with limited attention devoted to CO2 emission predictions for provincial-level enterprises. Meanwhile, CO2 emission studies based on measured activity data are also rare and need to be further expanded.

Gansu province, situated in western China, serves as a vital energy hub with an extensive thermal power generation capacity and robust peaking capabilities. These factors are instrumental in ensuring regional power supply stability and fostering economic development (Shi et al. 2022). Thermal power enterprises are the basic unit of the power industry, and also one of the main sources of CO2 emissions. This study initiates its investigation at the enterprise level within Gansu Province, encompassing a substantial portion of the thermal power enterprises operating in the region. Drawing from the 2021 annual greenhouse gas emission reports and carbon verification data of these enterprises, it employs factor analysis to identify common factors among the indicators influencing CO2 emissions in the thermal power sector in Gansu Province. Subsequently, the study utilizes the CCHZ-DISO assessment system to meticulously evaluate three prediction models: multiple linear regression, support vector regression (SVR), and genetic algorithm-based backpropagation (GA-BP) neural network. The aim is to rigorously select the optimal prediction model for making accurate projections of CO2 emissions in thermal power enterprises within Gansu Province. Ultimately, based on the research results, this paper provides targeted carbon reduction strategies and policy recommendations.

Data and methodology

Data sources

The data for this research is sourced from the 2021 Annual Greenhouse Gas Emission Report and Carbon Verification Materials of 17 thermal power enterprises within Gansu Province. In 2021, the total thermal power generation in Gansu Province was 104.472 billion kWh, and the total power generation of the 17 thermal power enterprises involved in this study was 60.474 billion kWh, accounting for 57.89%, which is representative. These units are all key greenhouse gas emission units. According to the requirements of the Department of Ecology and Environment of Gansu Province, they calculated their greenhouse gas emissions in 2021 according to the “Enterprise Greenhouse Gas Emission Accounting Method and Reporting Guide-Power Generation Facilities” and the “Enterprise Greenhouse Gas Emission Reporting Verification Guide (Trial).” At the same time, they also accepted the on-site and document verification by a third-party verification agency organized by the Gansu Academy of Eco-environmental Sciences, ensuring the authenticity, accuracy, and completeness of the data. The data indicators and their specific sources are presented in Table 1.

Table 1 Data indicators and specific sources involved in this study

Methodology

This paper employs factor analysis to extract the common components of the CO2 emission determinants of thermal power enterprises. These components are then used as input variables for three prediction models (multiple linear regression, SVR, and the GA-BP neural network), whose performance in forecasting total enterprise CO2 emissions (TEE) is compared. To evaluate the overall performance of each model, we employ the DISO index and select the best-performing model for CO2 emission prediction.

Factor analysis

Factor analysis is widely applied to reduce the dimensions of a high-dimensional variable space. It employs a small number of comprehensive variables to depict the connections among numerous initial variables (Bi et al. 2022, Lu et al. 2020). This study adopts factor analysis to analyze the indicators of factors affecting CO2 emissions from thermal power enterprises, transforms the data of several indicators into a few comprehensive variables by dimensionality reduction, and reveals the dominant factors affecting CO2 emissions from thermal power enterprises in Gansu Province by extracting common factors.

In the selection of influencing factors, scholars have shown that economic scale, population size, energy structure, power structure, power generation efficiency, energy consumption, power supply standard coal consumption, and other factors have an impact on CO2 emissions in the power industry (Chen et al. 2021, Wang et al. 2022a). This paper sets the following seven indicators based on the 2021 Annual Greenhouse Gas Emission Report and Carbon Verification Materials of 17 thermal power enterprises in Gansu Province: power generation of enterprise units (PGU), power supply of enterprise units (PSU), fuel consumption by fossil fuel (FCF), lower heating value of fossil fuel (LHV), carbon content per unit calorific value (CCV), coal consumption per unit electricity supply (CCS), and operating hours of enterprise units (OHU). This study develops an indicator framework for factors influencing CO2 emissions within Gansu Province’s thermal power enterprises. Employing SPSS software, it extracts common factors from multiple indicators related to CO2 emission influence, aiming to unveil the underlying relationships and patterns among these factors. Concurrently, this research employs the common factors derived from factor analysis as input variables for the prediction model. This strategic approach serves to diminish the intricacy and dimensionality of the prediction model while enhancing its generalizability and stability.

Prediction model construction

In this study, three prediction models of multiple linear regression, SVR, and GA-BP neural network are constructed for enterprise units. The introduction and specific content of the model construction are as follows:

Multiple linear regression model

Multiple linear regression is a statistical method that describes the relationship between multiple independent variables and a dependent variable through a linear function. The parameters of the linear function are determined by minimizing the error, and the resulting regression equation is then used to predict changes in the dependent variable.

The underlying principle is the assumption of a linear functional relationship between the dependent variable Y and the independent variables x1, x2, x3, …, xk. This relationship takes the general form:

$$Y={\beta}_0+{\beta}_1{x}_1+{\beta}_2{x}_2+{\beta}_3{x}_3+\cdots +{\beta}_k{x}_k+\mu$$
(1)

In the formula, βj(j = 1, 2, 3, …, k) is a linear regression coefficient, β0 is a linear regression constant, μ is a random error, and k represents the number of independent variables. For n observations, the n equations of the form of Eq. (1) can be written in matrix form as follows:

$$Y= X\beta +U$$
(2)

In the formula, Y is the n × 1 vector of the dependent variable data, X is the n × (k + 1) matrix of the independent variable data, whose first column consists of ones so that the constant β0 is included, β is the (k + 1) × 1 vector of the regression constant and coefficients, and U is the n × 1 vector of the random error data.
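As a minimal sketch of how the coefficients in Y = Xβ + U can be estimated, the Python snippet below solves the least-squares normal equations (XᵀX)β = XᵀY with plain Gaussian elimination. The helper names `solve_linear_system` and `fit_ols` and the toy data are illustrative assumptions, not the software or data used in this study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares estimate of beta in Y = X beta + U via the normal equations."""
    # Prepend a column of ones so beta[0] plays the role of the constant term.
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated from y = 2 + 3*x1 - 1*x2 (illustrative values only).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [2 + 3 * a - b for a, b in rows]
beta = fit_ols(rows, y)  # beta[0] is the constant, beta[1:] the coefficients
```

Because the toy data are noise-free, the fitted coefficients recover the generating values; with measured data the same call returns the error-minimizing estimates.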

SVR model

Support vector regression is a machine learning method that effectively addresses function fitting and regression prediction problems. The core of this model is to find a hyperplane such that the distance from all data points to this hyperplane is minimized. For a finite sample, the global optimal solution can theoretically be obtained. In this study, the ε-SVR model is selected to predict the CO2 emissions of thermal power enterprises. The steps are as follows:

For the given training set as in Eq. (3):

$$\left\{\begin{array}{c}T=\left\{\left({x}_1,{y}_1\right),\left({x}_2,{y}_2\right),\cdots, \left({x}_i,{y}_i\right),\cdots, \left({x}_n,{y}_n\right)\right\}\\ {}{x}_i\in {R}^n,{y}_i\in R,i=1,2,\cdots, n\end{array}\right.$$
(3)

In the formula, xi is the input factor, yi is the expected value, and n is the total number of data samples in the training set.

Set a linear function expression on Rn as:

$$y=\omega \bullet x+b$$
(4)

In the formula, ω is the weight vector and b ∈ R is the bias. SVR operates on the principle of structural risk minimization, aiming to identify the optimal regression function. This transforms the function estimation problem into the optimization problem given by Eqs. (5) and (6).

$$\mathit{\min}\frac{1}{2}\parallel \omega {\parallel}^2+C\frac{1}{n}\sum\nolimits_{i=1}^n\left({\xi}_i+{\xi}_i^{\ast}\right)$$
(5)
$$s.t.\left\{\begin{array}{c}{y}_i-\left(\omega \bullet {x}_i\right)-b\le \varepsilon +{\xi}_i,i=1,2,\cdots, n\\ {}\left(\omega \bullet {x}_i\right)+b-{y}_i\le \varepsilon +{\xi}_i^{\ast },i=1,2,\cdots, n\\ {}{\xi}_i,{\xi}_i^{\ast}\ge 0,i=1,2,\cdots, n\end{array}\right.$$
(6)

In the formula, C represents the penalty parameter, ε represents the insensitive coefficient, and \({\xi}_i\) and \({\xi}_i^{\ast }\) represent the slack variables.
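To make the roles of C, ε, and the slack variables concrete, the following sketch evaluates the objective of Eq. (5) for a given linear function, with each slack set to the smallest value that satisfies the constraints in Eq. (6). The function name and the numbers are hypothetical and serve only as an illustration.

```python
def svr_primal_objective(w, b, xs, ys, c, eps):
    """Value of Eq. (5) when every slack takes its smallest feasible value."""
    n = len(xs)
    slack_sum = 0.0
    for x, y in zip(xs, ys):
        residual = y - (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Slack needed for y_i - (w . x_i) - b <= eps + xi_i
        xi_upper = max(0.0, residual - eps)
        # Slack needed for (w . x_i) + b - y_i <= eps + xi_i*
        xi_lower = max(0.0, -residual - eps)
        slack_sum += xi_upper + xi_lower
    return 0.5 * sum(wi * wi for wi in w) + c * slack_sum / n


# Hypothetical one-dimensional example: only the third point lies outside
# the eps-tube, so it alone contributes a slack (0.5 - 0.1 = 0.4).
objective = svr_primal_objective(
    w=[1.0], b=0.0,
    xs=[[0.0], [1.0], [2.0]], ys=[0.05, 1.0, 2.5],
    c=4.0, eps=0.1,
)
```

Points whose residual lies within ±ε add nothing to the objective, which is why ε is described as an insensitive coefficient.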

During the solving process, it is common to convert the model into its dual counterpart. The kernel function K(xi, xj) is utilized to convert the linear regression problem into a non-linear regression problem within the Hilbert space. The constructed ε-SVR model is as follows:

$$\min \frac{1}{2}\sum\nolimits_{i,j=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right)\left({\alpha}_j^{\ast }-{\alpha}_j\right)K\left({x}_i,{x}_j\right)+\varepsilon \sum\nolimits_{i=1}^n\left({\alpha}_i^{\ast }+{\alpha}_i\right)-\sum\nolimits_{i=1}^n{y}_i\left({\alpha}_i^{\ast }-{\alpha}_i\right)$$
(7)
$$s.t.\left\{\begin{array}{c}\sum_{i=1}^n\left({\alpha}_i-{\alpha}_i^{\ast}\right)=0\\ {}0\le {\alpha}_i,{\alpha}_i^{\ast}\le \frac{C}{n},i=1,2,\cdots, n\end{array}\right.$$
(8)

Then, the optimal solution is \(\overline{\alpha}={\left({\overline{\alpha}}_1,{\overline{\alpha}}_1^{\ast },{\overline{\alpha}}_2,{\overline{\alpha}}_2^{\ast },\cdots, {\overline{\alpha}}_n,{\overline{\alpha}}_n^{\ast}\right)}^T\), and the final regression function is shown in Eq. (9).

$$f(x)=\sum\nolimits_{i=1}^n\left({\overline{\alpha}}_i^{\ast }-{\overline{\alpha}}_i\right)K\left({x}_i,x\right)+\overline{b}$$
(9)

In the formula, \(\overline{b}\) is the bias term of the support vector regression model.
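The regression function can be illustrated with a short sketch that evaluates a kernel expansion of this form with a radial basis kernel, where each training point is weighted by the difference of its two dual coefficients. The support points, coefficients, bias, and kernel parameter below are hypothetical placeholders rather than fitted values from this study.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_x, coef, b, gamma):
    """f(x) = sum_i coef_i * K(x_i, x) + b, with coef_i = alpha_i* - alpha_i."""
    return sum(c * rbf_kernel(xi, x, gamma) for xi, c in zip(support_x, coef)) + b


# Hypothetical dual solution with two support vectors.
support_x = [[0.0, 0.0], [1.0, 1.0]]
coef = [0.5, -0.2]   # assumed values of alpha_i* - alpha_i
b = 1.0              # assumed bias term
gamma = 0.8          # assumed kernel parameter

f_at_support = svr_predict([0.0, 0.0], support_x, coef, b, gamma)
f_far_away = svr_predict([100.0, 100.0], support_x, coef, b, gamma)
```

Far from every support vector the kernel terms vanish and the prediction falls back to the bias term, which shows how the kernel localizes the influence of each training point.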

GA-BP neural network model

This study employs a GA-BP neural network to predict CO2 emissions from thermal power enterprises. The GA-BP neural network combines genetic algorithm (GA) and back propagation (BP) neural network techniques. GA optimizes the initial weights and thresholds of BP, enhancing its convergence speed and accuracy.

MATLAB software is used for GA-BP neural network programming, following the flowchart in Fig. 1. CO2 emission and common factor data from Gansu Province’s thermal power enterprises are imported and split 80/20 into training and test sets. The input and output variables of both sets are normalized to the [0, 1] interval to facilitate network training and testing. A three-layer BP neural network is constructed: the number of input-layer nodes equals the number of input variables, the hidden layer has nine nodes, and the output layer corresponds to CO2 emissions. The neural network runs for a maximum of 5000 iterations, with an error threshold of 1e−6 and a learning rate of 0.01. The genetic algorithm optimizes the initial BP weights and thresholds over 50 generations with a population size of 5. Parameters are encoded as real numbers between −1 and 1 with an accuracy of 1e−6. The selection function is normalized geometric selection with a parameter of 0.09, the crossover function is arithmetic crossover with a parameter of 2, and the mutation function is non-uniform mutation with parameters [2 50 3]. Mean square error serves as the fitness function, and the best individual of each generation advances to the next.
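The data preparation described above can be sketched in a few lines of Python. This is an illustration only: the study itself used MATLAB, and the function names, the random seed, and the choice to scale each variable by its own minimum and maximum are assumptions.

```python
import random


def min_max_scale(values):
    """Map values linearly onto [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map normalized predictions back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def split_80_20(samples, seed=0):
    """Shuffle the samples and split them into 80% training and 20% test."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(round(0.8 * len(idx)))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test


# Illustrative values, not measured data.
scaled, lo, hi = min_max_scale([10.0, 20.0, 40.0])
restored = inverse_scale(scaled, lo, hi)
train, test = split_80_20(list(range(17)))
```

Keeping `lo` and `hi` makes it possible to convert network outputs from the [0, 1] interval back to emission units after prediction.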

Fig. 1 GA-BP neural network model flow chart

The genetic algorithm stops when the preset iteration limit is reached or the fitness function value reaches a threshold. The best individual, which encodes the best weights and thresholds, is then assigned to the BP neural network. The network is trained on the training set to calculate the training error and is then evaluated on the test set to compute the test error.
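The genetic search loop can be sketched as follows. This is a simplified illustration, not the MATLAB implementation used in the study: a toy two-parameter linear model stands in for the BP network, the mutation is a Gaussian perturbation whose step shrinks over the generations (a stand-in for the non-uniform mutation operator), and all names and data are hypothetical.

```python
import random


def mse_fitness(genes, data):
    """Mean square error of a toy model y = g0 * x + g1 (stand-in for the BP network)."""
    return sum((genes[0] * x + genes[1] - y) ** 2 for x, y in data) / len(data)


def rank_select(ranked, q, rng):
    """Geometric ranking: selection weight of rank r (0 = best) is q * (1 - q)**r."""
    weights = [q * (1 - q) ** r for r in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]


def run_ga(data, pop_size=5, generations=50, q=0.09, seed=1):
    rng = random.Random(seed)
    # Real-coded individuals with genes in [-1, 1].
    pop = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(pop_size)]
    history = []
    for gen in range(generations):
        ranked = sorted(pop, key=lambda g: mse_fitness(g, data))
        history.append(mse_fitness(ranked[0], data))
        nxt = [ranked[0][:]]  # elitism: the best individual advances unchanged
        while len(nxt) < pop_size:
            p1 = rank_select(ranked, q, rng)
            p2 = rank_select(ranked, q, rng)
            a = rng.random()
            # Arithmetic crossover: convex combination of the two parents.
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            # Simplified mutation whose step size decays with the generation.
            scale = 0.5 * (1 - gen / generations)
            child = [min(1.0, max(-1.0, c + rng.gauss(0, scale))) for c in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda g: mse_fitness(g, data))
    return best, history


# Toy data from y = 0.5 * x - 0.2, so the optimum lies inside the gene range.
data = [(x / 10, 0.5 * x / 10 - 0.2) for x in range(11)]
best, history = run_ga(data)
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness can never get worse. In a GA-BP setup the returned genes would be used as the starting weights and thresholds for backpropagation training.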

CCHZ-DISO Assessment System

In this study, we employed the CCHZ-DISO (Chen, Chen, Hu, and Zhou distance between indices of simulation and observation) assessment system to assess the overall performance of the three prediction models and identify the most suitable model. The DISO index is a comprehensive statistical index constructed by Hu et al. (2019, 2022); it measures the distance between the statistical indices of a simulation and those of the observation, and thereby quantifies the overall performance of the model.

The CCHZ-DISO assessment system addresses the problem of comprehensive quantitative evaluation of big data and models when multiple statistical indicators give contradictory results, and it is highly extensible and flexible in the choice of statistical indicators (Zhou et al. 2021). Deng et al. (2021) constructed a comprehensive statistical index DISO using four statistical indicators (bias, correlation coefficient, root mean square error, and relative standard deviation) to measure the performance of soil moisture and heat transfer simulation. In addition, Kalmár et al. (2021) used the DISO index and the Taylor diagram to evaluate the performance of the RegCM4.5 model in simulating precipitation, showing that the DISO index can capture the overall performance of a simulation and that its evaluation results differ little from those of the Taylor diagram.

The calculation of DISO index can be carried out in the following three steps:

  • Step 1: Calculate the statistical measures of each model against the reference data OBS. With n statistical indicators, the measures take the form \(\left({s}_i^1,{s}_i^2,\dots, {s}_i^n\right)\), where i = 0, 1, …, m and m is the number of models. \(\left({s}_0^1,{s}_0^2,\dots, {s}_0^n\right)\) is the statistical index value of OBS relative to itself.

  • Step 2: Normalize each statistical indicator by dividing it by a scaling value \({p}^j\), taken as the largest of the indicator’s range (maximum minus minimum), the absolute value of its maximum, and the absolute value of its minimum.

    $$\left( nor{s}_i^1, nor{s}_i^2,\dots, nor{s}_i^n\right)=\left(\frac{s_i^1}{p^1},\frac{s_i^2}{p^2},\dots, \frac{s_i^n}{p^n}\right)$$
    (10)

    In the formula, \({p}^j=\max \left(\max \left({s}_i^j\right)-\min \left({s}_i^j\right),\mid \max \left({s}_i^j\right)\mid, \mid \min \left({s}_i^j\right)\mid \right),i=0,1,\dots, m,j=1,2,\dots, n\), and the normalized values lie in the range [−1, 1].

  • Step 3: Use the Euclidean distance between \(\left( nor{s}_i^1, nor{s}_i^2,\dots, nor{s}_i^n\right)\) and \(\left( nor{s}_0^1, nor{s}_0^2,\dots, nor{s}_0^n\right)\) to calculate the DISO index:

    $$\textrm{DIS}{\textrm{O}}_i=\sqrt{{\left( nor{s}_i^1- nor{s}_0^1\right)}^2+{\left( nor{s}_i^2- nor{s}_0^2\right)}^2+\cdots +{\left( nor{s}_i^n- nor{s}_0^n\right)}^2}$$
    (11)

When i = 0, DISO0 = 0 represents the distance between OBS and itself. The DISO index of the simulation model can be obtained by the formula (11). Enhanced overall performance is associated with a lower DISO index in the model. By quantifying the DISO value, one can conveniently and quantitatively assess the overall performance among different models.
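The three steps can be sketched in Python as follows. The indicator values are hypothetical and are not the results reported in this study; the sketch also assumes that no indicator is zero for every row, since the scaling value would then be zero.

```python
import math


def diso(stats):
    """DISO index per row; stats[0] is OBS against itself, stats[i] is model i."""
    n_ind = len(stats[0])
    # Step 2: scaling value p^j for every indicator j.
    scales = []
    for j in range(n_ind):
        col = [row[j] for row in stats]
        scales.append(max(max(col) - min(col), abs(max(col)), abs(min(col))))
    normed = [[row[j] / scales[j] for j in range(n_ind)] for row in stats]
    # Step 3: Euclidean distance of each row to the normalized OBS row.
    ref = normed[0]
    return [math.sqrt(sum((v - r) ** 2 for v, r in zip(row, ref))) for row in normed]


# Step 1 (hypothetical values): indicators are (RMSE, MAE, R-squared).
# OBS compared with itself has zero error and an R-squared of 1.
stats = [
    (0.0, 0.0, 1.00),    # OBS
    (10.0, 8.0, 0.99),   # hypothetical model A
    (20.0, 15.0, 0.95),  # hypothetical model B
]
scores = diso(stats)
```

In this example model A has the smaller distance to OBS and would therefore be ranked as the better model overall.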

Result analysis

Measurement of common factors of CO2 emissions

In this study, the influencing factors (PGU, PSU, FCF, LHV, CCV, CCS, and OHU) were used as the original variables for factor analysis using the analysis software SPSS. Table 2 displays the test results for KMO and Bartlett. The KMO value stands at 0.717, and the significance of the Bartlett spherical test is 0.000. These outcomes signify a correlation among variables and affirm the appropriateness of the selected variables for factor analysis. The Pearson correlation coefficient and its significance level between the seven variables are shown by the correlation matrix in Table 3. The table reveals that the correlation coefficient between 15 variable pairs demonstrates statistical significance at a significance level of p = 0.01. There is a strong positive correlation between PGU and PSU (r = 0.999), while there is no significant linear relationship between CCV and CCS (r = − 0.001). A significant positive correlation exists among the three variables: PGU, PSU, and FCF, with a correlation coefficient (r) exceeding 0.9 and a p value (P) less than 0.01, suggesting the presence of common underlying factors. At the same time, the correlation coefficient between the variable of CCS and other variables has not reached a significant level, which may mean that it has no common potential factor with other variables. CCS is a key indicator measuring the energy efficiency level (Lin and Zhu 2020, Wang et al. 2018, Yu et al. 2023) and constituting the carbon verification data of thermal power enterprises. Based on the results of KMO and Bartlett’s test, this study uses principal component analysis to extract common factors from 7 variables including CCS, and performs factor rotation.

Table 2 Test results of KMO and Bartlett
Table 3 Correlation matrix

From the scree plot of the eigenvalues of the common factors (Fig. 2), it is evident that component F1 is the most significant common factor. The eigenvalues of the first three common factors F1, F2, and F3 are greater than or equal to 1 (4.014, 1.377, and 1.000, respectively). F1 explains 57.34% of the total variance, F2 explains 19.670%, and F3 explains 14.281%, for a cumulative contribution of 91.29%. Therefore, these three common factors can be considered representative of the original data and are used in the subsequent analysis.
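The contribution rates follow from the eigenvalues. As a rough check, and assuming the factors were extracted from the correlation matrix of the seven standardized variables (so that the total variance equals 7), each contribution is the eigenvalue divided by 7:

```python
eigenvalues = [4.014, 1.377, 1.000]  # reported eigenvalues of F1, F2, F3
n_variables = 7                      # total variance of 7 standardized variables

# Percentage of total variance explained by each common factor.
contrib = [100 * ev / n_variables for ev in eigenvalues]
cumulative = sum(contrib)
```

The values recomputed from the rounded eigenvalues agree with the reported contributions to within rounding.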

Fig. 2 Scree plot of common factor eigenvalues

Using the varimax (maximum variance) method, we perform an orthogonal rotation of the three common factors, resulting in the rotated component matrix shown in Table 4. The loadings elucidate the connection between F1, F2, and F3 and the original input variables. From the table, it is evident that the loadings of the variables FCF, PGU, and PSU on F1 exceed 0.9, while their loadings on F2 and F3 are notably lower. This observation suggests that these three variables are primarily influenced by the common factor F1, with limited associations with F2 and F3. The loadings of LHV and CCV on F2 are greater than 0.9, while their loadings on F1 and F3 are very low, indicating that these two variables are mainly affected by F2 and have little relationship with F1 and F3. In addition, OHU has relatively high loadings on both F1 and F2 (0.586 and 0.590, respectively), indicating that this variable is correlated with both factors. The common factor F3 contains only one variable, CCS, whose loading is 1.0.

Table 4 Rotated component matrix

Based on the above analysis and the component score coefficient matrix (Table 5), each common factor is given an appropriate label to reflect the potential concept it represents, as follows. The first common factor, F1, is named “energy consumption and output factor”; it reflects the combined level of four variables (FCF, PGU, PSU, and OHU) and is mainly related to energy use or production. The second common factor, F2, is named “energy quality factor”; it reflects the combined level of three variables (LHV, CCV, and OHU) and is mainly related to the quality and characteristics of the fuels used by thermal power producers. The third common factor, F3, is named “energy efficiency factor”; it mainly reflects one variable, CCS, and is related to fuel utilization, energy saving, and optimization. Next, the three common factors are used as input variables for the multiple linear regression, SVR, and GA-BP neural network models to construct the CO2 emission prediction model.

Table 5 Component score coefficient matrix

CO2 emission prediction of thermal power enterprises

CO2 emission prediction model of thermal power enterprises

In this study, the three common factors F1, F2, and F3 obtained from factor analysis were incorporated as input variables in the prediction model. This was done to reduce model complexity and dimensionality, ultimately enhancing its generalization ability and stability. The results of the three models (multiple linear regression, SVR, and GA-BP neural network) are presented as follows:

Multiple linear regression model

Within the framework of the multiple linear regression model, we employ the three common factors as predictors of the total enterprise carbon emission (TEE), the dependent variable. To evaluate the model’s overall fit, we conducted an analysis of variance (ANOVA), as presented in Table 6. The outcomes indicate a highly statistically significant F value of 5525.041, with a p value less than 0.001. Figure 3 compares the values predicted by the multiple linear regression model with the actual CO2 emission values. There is a strong linear relationship between the actual and predicted values in the figure. The confidence interval represents the range of the prediction error. Most of the points fall within the 95% confidence interval near the diagonal, indicating that the prediction model has high accuracy and calibration. Specifically, the R-squared of the model is 0.97148, the correlation coefficient R is 0.98564, the root mean square error (RMSE) of the data sample is 16,719.568, and the mean absolute error (MAE) is 12,381.481, which indicates that the regression results are satisfactory and can be used for the prediction of emissions.
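For reference, the error measures quoted above can be computed as in the sketch below. It uses the common definitions of RMSE, MAE, and the coefficient of determination; the function name and the numbers are illustrative and are not the study’s data.

```python
import math


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R-squared) for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_a = sum(actual) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Illustrative values only: three exact predictions and one error of 1.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RMSE and MAE are expressed in the units of the dependent variable, whereas R-squared is dimensionless, which is one reason a normalized composite measure is useful when comparing models across indicators.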

Table 6 ANOVA table for multiple linear regression model
Fig. 3 Comparison of predicted and actual CO2 emissions based on the multiple linear regression model

SVR model

In this study, MATLAB software was used to predict the TEE of enterprises with the SVR model, taking the three common factors F1, F2, and F3 as input variables. After normalizing both the training and test set data, we employed the radial basis function as the kernel. The penalty factor C was set to 4.0, the radial basis function parameter g was set to 0.8, and the error tolerance p was set to 0.01. Figures 4 and 5 compare the predicted and actual values of CO2 emissions in the training set and the test set. The figures show that the R-squared values for the training and test sets are 0.98244 and 0.98295, respectively. Additionally, the correlation coefficient R is 0.99120 and 0.98155 for the training and test sets, respectively, signifying a good model fit. Meanwhile, both the training and test sets display low values for MAE and RMSE. This points to the SVR model’s accurate predictive capability for CO2 emissions in Gansu Province’s thermal power enterprises.

Fig. 4 Comparison of predicted and actual CO2 emissions for the training set based on the SVR model

Fig. 5 Comparison of predicted and actual CO2 emissions for the test set based on the SVR model

GA-BP neural network model

In this study, MATLAB software was used to predict the TEE of enterprises based on GA-BP neural network model with three common factors F1, F2, and F3 as input variables. The predicted and actual values of CO2 emissions in the training set and test set are shown in Figs. 6 and 7. The figure illustrates that the GA-BP neural network model demonstrates high prediction accuracy and correlation in both the training and test sets. RMSE and MAE are diminutive, accompanied by Pearson’s r and R-squared values approaching 1. These outcomes point to the model’s adeptness in data fitting and its strong generalization capacity. At the same time, the evaluation index of the model on the test set is slightly better than that on the training set, indicating that the model does not have the problem of overfitting or underfitting. This indicates that the GA-BP neural network model serves as an efficient predictor for CO2 emissions in Gansu Province’s thermal power enterprises.

Fig. 6 Comparison of predicted and actual CO2 emissions for the training set based on the GA-BP neural network

Fig. 7 Comparison of predicted and actual CO2 emissions for the test set based on the GA-BP neural network

Comparative evaluation of models

In order to comprehensively compare and evaluate the three prediction models of the multiple linear regression model, SVR model, and GA-BP neural network, the CCHZ-DISO assessment system was used in this study. RMSE, MAE, and R-squared were selected to calculate the DISO index of each model according to formula (10) and formula (11). The SVR model and GA-BP neural network use the evaluation index of the test set to calculate the DISO index. The model’s superior overall performance is indicated by the smaller DISO index value. Refer to Table 7 for the results. The DISO index calculations reveal values of 0.95, 1.18, and 1.41 for the GA-BP neural network, SVR, and multiple linear regression models, respectively. Among these models, the GA-BP neural network exhibits the lowest DISO value, signifying its superior overall performance. Consequently, it will be employed for CO2 emission prediction.

Table 7 Comparison of DISO values for the three models

CO2 emission prediction based on GA-BP neural network model

CO2 emission scenario setting

This study sets three CO2 emission scenarios: low-carbon, baseline, and high-carbon. Over the past five years (2016–2021), total thermal power generation in Gansu Province increased from 70.418 to 104.472 billion kilowatt hours, an annual growth rate of 8.2%, and the total energy consumption of electricity production and supply increased from 491.69 to 680.56 million tons of standard coal, an annual growth rate of 6.7%. At the same time, “Gansu Province’s 14th Five-Year Energy Development Plan” proposes that installed thermal power capacity will reach 35.58 million kW by 2025, an annual growth rate of 10% from 2020. It can therefore be expected that the energy consumption and output factor F1 will keep growing during the “14th Five-Year Plan” period (2021–2025). For the “15th Five-Year Plan” period (2026–2030), the “Implementation Plan for Carbon Peaking in Gansu Province” states that the region should gradually reduce coal consumption and achieve the goal of peaking CO2 emissions by 2030. The annual growth rate of F1 during the “15th Five-Year Plan” period will therefore be significantly lower than during the “14th Five-Year Plan” period. Taking historical data and existing plans into account, and reflecting the policy effects under the different scenarios, the annual growth rates of the energy consumption and output factor F1 from 2021 to 2025 are set to 3%, 8%, and 12% in the low-carbon, baseline, and high-carbon scenarios, respectively, and the annual growth rates from 2026 to 2030 are set to 1%, 4%, and 6%, respectively.
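The historical growth rates quoted above follow from the compound annual growth rate formula, which the short Python sketch below reproduces. It takes the 2016 value of thermal power generation as 70.418 billion kWh, the figure consistent with the stated 8.2% rate; this reading and the helper name `cagr` are assumptions of the sketch.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# Thermal power generation, billion kWh, 2016 -> 2021 (five annual steps).
generation_rate = cagr(70.418, 104.472, 5)

# Energy consumption of electricity production and supply, same period.
energy_rate = cagr(491.69, 680.56, 5)

print(f"generation: {generation_rate:.1%}, energy consumption: {energy_rate:.1%}")
```

The baseline F1 growth rate of 8% for 2021 to 2025 sits close to the historical generation growth rate, with the low-carbon and high-carbon rates set below and above it.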

Of the 17 thermal power enterprises in Gansu Province studied in this paper, 12 burn general bituminous coal or medium- and high-volatile bituminous coal, and the remaining 5 burn lignite or mixed coal. The average carbon content per unit calorific value in 2021 was 27.2 tC/TJ. To implement Gansu Province’s energy-saving and carbon-reduction plan and establish a green, low-carbon, circular economic system, the thermal power enterprises in the province will gradually improve fuel quality and reduce their dependence on and consumption of low-quality coal (such as lignite and mixed coal). This paper takes the carbon content per unit calorific value of bituminous coal given in the “Provincial Greenhouse Gas Inventory Guidelines (Trial)”, 26.1 tC/TJ, as the reference value for 2030 and uses it to set the growth rate of the energy quality factor F2 from 2021 to 2030: the annual growth rates under the low-carbon, baseline, and high-carbon scenarios are set to −1.0%, −0.5%, and 0%, respectively. In addition, the average power supply coal consumption of the 17 thermal power enterprises was 347.34 g/kWh in 2021. To promote the clean and low-carbon transformation of the power industry and achieve the carbon peak and carbon neutrality goals on schedule, the “Implementation Plan for the Transformation and Upgrading of National Coal-Fired Power Units” states that the national average power supply coal consumption of thermal power will be reduced to below 300 g/kWh by 2025. Based on this standard, and consistent with the actual situation and future goals of thermal power enterprises in Gansu Province, this paper reflects the differences in technological progress and policy regulation across scenarios: the annual growth rates of the energy efficiency factor F3 through 2030 under the low-carbon, baseline, and high-carbon scenarios are set to −5.2%, −3.6%, and −2.0%, respectively.
The scenario settings for CO2 emission prediction of thermal power enterprises in Gansu Province are shown in Table 8.
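The link between the reference targets and the middle-scenario rates can be checked with a short Python sketch. It treats the raw indicators (carbon content per unit calorific value and power supply coal consumption) as proxies for the factors F2 and F3, which is an assumption of the sketch; the helper names are hypothetical.

```python
def required_annual_rate(current: float, target: float, years: int) -> float:
    """Constant annual growth rate that moves `current` to `target` in `years`."""
    return (target / current) ** (1.0 / years) - 1.0


def project(value: float, annual_rate: float, years: int) -> float:
    """Value after compounding `annual_rate` for `years` years."""
    return value * (1.0 + annual_rate) ** years


# Carbon content: 27.2 tC/TJ in 2021 -> 26.1 tC/TJ reference value in 2030.
carbon_rate = required_annual_rate(27.2, 26.1, 9)

# Coal consumption: 347.34 g/kWh in 2021 -> 300 g/kWh target in 2025.
coal_rate = required_annual_rate(347.34, 300.0, 4)

print(f"carbon content: {carbon_rate:.2%} per year")
print(f"coal consumption: {coal_rate:.2%} per year")

# Carbon content in 2030 implied by each scenario's annual rate.
scenario_rates = {"low-carbon": -0.010, "baseline": -0.005, "high-carbon": 0.0}
carbon_2030 = {name: project(27.2, r, 9) for name, r in scenario_rates.items()}
print(carbon_2030)
```

Under these assumptions the required rates round to about −0.5% and −3.6% per year, matching the middle-scenario settings, with the other two scenarios bracketing them.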

Table 8 CO2 emission prediction scenario setting of thermal power enterprises in Gansu Province

CO2 emission scenario prediction results

The prediction results of CO2 emissions under the three scenarios are shown in Fig. 8. Under the baseline scenario, the CO2 emissions of the 17 thermal power enterprises in Gansu Province grow steadily, from 71.17 Mt in 2021 to 79.25 Mt in 2030, an increase of 8.08 Mt, equivalent to 11.36% of the 2021 level, with an average annual growth rate of 1.20%. Under the low-carbon scenario, emissions grow slowly, from 71.17 Mt in 2021 to 71.58 Mt in 2030, an increase of 0.41 Mt, equivalent to 0.58% of the 2021 level, with an average annual growth rate of 0.06%. Growth in this scenario slows further after 2025: the average annual growth rate from 2025 to 2030 is only 0.02%, so emissions remain essentially unchanged. Under the high-carbon scenario, emissions grow rapidly, from 71.17 Mt in 2021 to 87.97 Mt in 2030, an increase of 16.80 Mt, equivalent to 23.61% of the 2021 level, with an average annual growth rate of 2.38%.

Fig. 8 Prediction results of three CO2 emission scenarios: low-carbon, baseline, and high-carbon

Comparing the scenarios, CO2 emissions are highest under the high-carbon scenario in every year, followed by the baseline scenario, and lowest under the low-carbon scenario. By 2030, CO2 emissions under the high-carbon scenario will be 1.23 times those under the low-carbon scenario and 1.11 times those under the baseline scenario. Under the baseline and high-carbon scenarios, energy consumption and output grow faster, which may be the main reason for the rapid growth of CO2 emissions. In contrast, CO2 emissions under the low-carbon scenario are relatively low and grow slowly. This is due to measures such as controlling energy consumption and output, optimizing energy quality, and improving energy efficiency, which effectively slow the growth of CO2 emissions from thermal power enterprises.
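The growth figures and scenario ratios reported above can be recomputed from the 2021 and 2030 endpoints, as in the following Python sketch. The variable and function names are hypothetical, and the last digit of a recomputed percentage can differ slightly from the published one because the endpoints are rounded.

```python
BASE_2021 = 71.17  # Mt CO2 in 2021, common to all scenarios
EMISSIONS_2030 = {"low-carbon": 71.58, "baseline": 79.25, "high-carbon": 87.97}
YEARS = 9  # 2021 -> 2030


def summarise(start: float, end: float, years: int):
    """Return absolute increase, increase as a share of start, and annual rate."""
    increase = end - start
    share = increase / start
    annual_rate = (end / start) ** (1.0 / years) - 1.0
    return increase, share, annual_rate


summary = {
    name: summarise(BASE_2021, value, YEARS)
    for name, value in EMISSIONS_2030.items()
}
for name, (increase, share, rate) in summary.items():
    print(f"{name}: +{increase:.2f} Mt, {share:.2%} of 2021, {rate:.2%} per year")

# Ratios of the high-carbon scenario to the other two in 2030.
ratio_to_low = EMISSIONS_2030["high-carbon"] / EMISSIONS_2030["low-carbon"]
ratio_to_base = EMISSIONS_2030["high-carbon"] / EMISSIONS_2030["baseline"]
print(f"high/low: {ratio_to_low:.2f}, high/baseline: {ratio_to_base:.2f}")
```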

The overall CO2 emissions of the 17 thermal power enterprises in Gansu Province rise during 2022–2030. Although CO2 emissions are effectively controlled under the low-carbon scenario, they do not reach a peak. Owing to practical limitations, this paper does not consider the future deployment of carbon capture, utilization, and storage (CCUS) technology in thermal power plants in Gansu Province, which could reduce CO2 emissions further. CCUS is an important CO2 emission reduction technology, and recent research underscores its pivotal role in curbing CO2 emissions from power generation (Cai et al. 2022). Future work should examine the application of CCUS technology to emissions from Gansu Province’s thermal power sector, helping the province’s power industry advance towards its carbon peak and carbon neutrality goals.

Conclusions and policy implications

Conclusions

In this paper, factor analysis is used to extract the common factors underlying the CO2 emission influencing factors of 17 thermal power enterprises in Gansu Province. Three CO2 emission prediction models (multiple linear regression, SVR, and the GA-BP neural network) are built, and the DISO index is used to evaluate their overall performance. Under three CO2 emission scenarios (low-carbon, baseline, and high-carbon), the CO2 emissions of these 17 thermal power enterprises during 2022–2030 are then predicted with the GA-BP neural network model. The main conclusions of this study are as follows:

  1. Through factor analysis, the three common factors F1 energy consumption and output factor, F2 energy quality factor, and F3 energy efficiency factor can be effectively used to predict the CO2 emissions of thermal power enterprises, and are in line with the current CO2 emission verification work of thermal power enterprises.

  2. Among the three CO2 emission prediction models (multiple linear regression, SVR, and the GA-BP neural network), the GA-BP neural network model has the best overall performance: its DISO value is 0.95, and its RMSE and MAE are 11848.236 and 7880.543, respectively. The SVR model ranks second, with a DISO value of 1.18 and good overall performance. The multiple linear regression model has a DISO value of 1.41 and the poorest overall performance.

  3. The overall CO2 emissions of the 17 thermal power enterprises in Gansu Province rise during 2022–2030. Under the baseline and high-carbon scenarios, the rapid growth of energy consumption and output is the main reason for the growth of CO2 emissions. By 2030, CO2 emissions are expected to reach 79.25 Mt under the baseline scenario and 87.97 Mt under the high-carbon scenario. Under the low-carbon scenario, however, CO2 emissions will reach only 71.58 Mt by 2030, with an average annual growth rate of just 0.06%.

Policy implications

  1. Plan energy consumption and output reasonably and optimize power scheduling. Power generation enterprises should formulate reasonable energy consumption and output plans in line with national and local carbon emission control requirements, and control the growth rate of fuel consumption and power generation. At the same time, the power grid should improve the flexibility and intelligence of its scheduling, accelerate the flexibility retrofit of existing thermal power units in the province, tap the peak-shaving potential of thermal power units, encourage coal-fired units to add efficient energy storage facilities, and establish a priority scheduling system adapted to the characteristics of wind and solar power. In addition, power generation enterprises need to optimize the operation and start-up modes of their units, make rational use of all peak-shaving resources in the system, and give full play to the clean and efficient advantages of large-capacity, high-parameter units of 600 MW and above when they carry base load.

  2. Improve and ensure the fuel quality of power plants. Thermal power enterprises in Gansu Province should optimize their fuel structure, phase out low-quality and high-polluting fuels, and increase the proportion of clean and efficient fuels. At the same time, they should strengthen fuel quality monitoring and management and improve fuel efficiency and safety. Giving priority to the release of high-quality production capacity from coal mine projects and ensuring railway capacity for cross-regional coal transportation will better guarantee coal production and transportation needs. Under the same conditions, priority should be given to ensuring the fuel supply of coal-fired generating units with advanced energy efficiency. The role of the market should be brought into full play to dampen sharp fluctuations in thermal coal prices, ensure that power plants burn their design coal, and avoid, as far as possible, increases in actual operating energy consumption caused by fluctuations in fuel quality.

  3. Enhance energy efficiency, increase policy support for energy-saving and emission-reduction upgrades to power units, and advance research and demonstration of CCUS technologies. Gansu Province should encourage thermal power generation companies to step up their energy conservation and emission reduction efforts. This can be achieved through the adoption of cutting-edge clean technologies and equipment, as well as by enhancing the operational efficiency and reliability of thermal power generation units. Furthermore, operational scheduling strategies should be optimized, a mechanism linking power generation to energy consumption levels should be established, and coal-fired units with low coal consumption should be encouraged to generate more electricity. Additionally, expediting the development of a well-rounded auxiliary service market mechanism and pricing system will ensure that peaking units participating in flexibility enhancements receive appropriate benefits.