Introduction

Since the industrial revolution, accelerating global industrialization has required the consumption of large amounts of coal and oil. Together with rapid population growth and deforestation for economic development, this has caused emissions of strongly heat-absorbing greenhouse gases such as carbon dioxide to increase year by year, and carbon dioxide emissions are the main cause of the greenhouse effect (Shakun et al. 2012). The strengthened greenhouse effect has caused a series of serious problems such as global warming, which has attracted the attention of countries around the world. To collectively address the challenge of climate change and slow the global warming trend, nearly 200 parties jointly adopted the Paris Agreement in December 2015, which sets out arrangements for how the world will address climate change after 2020. As a major carbon emitter and the world’s second largest economy, China has far-reaching influence on climate action. As a signatory to the Paris Agreement, China has committed to peaking its carbon emissions by 2030 and achieving carbon neutrality by 2060. Accurate prediction of China’s carbon dioxide emissions has therefore become a research hotspot.

Many models have been proposed for predicting China’s carbon dioxide emissions, and both the model types and the results they produce differ considerably. Using a hybrid approach combining the logarithmic mean Divisia index (LMDI) and decoupling index methods, Li and Qin found that China may struggle to achieve its goal of peaking CO2 emissions by 2030 (Li and Qin 2019). Zhang et al. conducted a dynamic simulation and projection of China’s carbon dioxide emissions from a system dynamics perspective, and their results show that China will be able to achieve its emission reduction target by 2030 (Zhang et al. 2018). Qiao et al. proposed a hybrid model based on the improved lion swarm optimizer and LSSVM for predicting carbon dioxide emissions; their results show that global carbon emissions will continue to increase in the next few years, and that only Japan is likely to achieve its carbon emissions targets, while other countries, including China, will find it challenging to do so (Qiao et al. 2020). Niu et al. proposed a prediction model combining improved fireworks algorithm (IFWA) optimization and a generalized regression neural network (GRNN), and their results showed that there is still some pressure to achieve the carbon intensity reduction target under the current policy efforts in China (Niu et al. 2020). Because existing models differ widely in type and yield conflicting predictions, it is of great significance to establish an accurate and effective model for forecasting carbon dioxide emissions in China.

The main objective of this research is to use machine learning methods and meta-heuristic algorithms to establish a scientific and effective model for forecasting China’s carbon dioxide emissions, and to use it to predict China’s carbon dioxide emissions over the next five years. The main research question is how to build an accurate forecasting model, and the motivation is to accurately predict the future trend of China’s carbon dioxide emissions. To this end, this study combines a multi-kernel support vector regression model with an improved marine predator algorithm (MPA) to establish a forecast model for China’s carbon dioxide emissions. To make up for the limitations of the traditional MPA, this paper integrates the elite opposition-based learning strategy and the golden sine algorithm into it. Since carbon dioxide emission data are usually small-sample data, this paper chooses the multi-kernel support vector regression model as the prediction model. The multi-kernel support vector regression model forms a weighted combination of different kernel functions, which enhances the learning and generalization ability of the model and improves the accuracy and stability of the prediction. The main contributions of this paper are summarized as follows:

  1. A novel variant of MPA, named EGMPA, is proposed. It introduces the elite opposition-based learning strategy and the golden sine algorithm into MPA to address the low convergence accuracy of the traditional MPA and its tendency to fall into local optima.

  2. A new multi-kernel support vector regression is proposed. The multi-kernel function combines multiple kernel functions in place of a single kernel function, which improves the learning effect of the support vector regression model. EGMPA is used to optimize the main parameters of the multi-kernel support vector regression, which solves its parameter selection problem.

  3. The hybrid model of EGMPA and multi-kernel support vector regression is used to predict carbon dioxide emissions in China. The proposed hybrid model is applied to predict China’s carbon dioxide emissions during the “14th Five-Year Plan” period to assess whether China will be able to achieve its carbon peak target by 2030.

The rest of this paper is organized as follows: “Literature review” reviews the related literature; “Methods” briefly introduces the relevant theories and the hybrid forecasting model framework of this paper; “Algorithm test” uses 23 benchmark functions to test the EGMPA algorithm; “Results and discussion” uses the proposed model to forecast China’s carbon dioxide emissions; “Conclusions” presents the conclusions of this paper and future research directions.

Literature review

Meta-heuristic algorithms

Meta-heuristic algorithms are methods for finding optimal solutions to complex optimization problems based on computational intelligence mechanisms. They can be divided into three categories: evolutionary algorithms, swarm intelligence algorithms, and physics-based algorithms.

Evolutionary algorithms simulate the survival-of-the-fittest law of evolution in nature to improve the population as a whole and thereby find the optimal solution. They are mainly represented by the genetic algorithm (GA) (Holland 1975) and differential evolution (DE) (Storn and Price 1997). GA is a random global search optimization method that simulates the replication, crossover, and mutation phenomena occurring in natural selection and heredity. DE was proposed based on GA. Later, many other well-known evolutionary algorithms were derived, such as the evolution strategy (ES) (Hansen et al. 2003) and biogeography-based optimization (BBO) (Simon 2008).

Swarm intelligence algorithm is an emerging optimization method that simulates various group behaviors of social animals and uses the information interaction and cooperation among individuals in a group to achieve optimization purposes. Particle swarm optimization (PSO) (Eberhart and Kennedy 1995) is a typical representative of swarm intelligence algorithms, which is a global stochastic search algorithm based on swarm intelligence by simulating the migration and clustering behavior of birds during foraging. In addition, many intelligent optimization algorithms simulate the behavior of other swarms. Artificial bee colony (ABC) (Karaboga and Basturk 2007) is an optimization method proposed to mimic the behavior of bees. It only requires comparing the quality of problem solutions. Grey wolf optimizer (GWO) (Mirjalili et al. 2014) is an optimization search method inspired by the hunting activities of grey wolves. Whale optimization algorithm (WOA) (Mirjalili and Lewis 2016) is proposed based on the behavior of whales to round up prey. The local search of the algorithm is realized by the mechanism of shrinking enclosing and spiraling updates, and the global search of the algorithm is realized by the random learning strategy. Salp swarm algorithm (SSA) (Mirjalili et al. 2017) simulates the swarm behavior of the salp chain. In each iteration, the leader guides followers to move towards food in a chain behavior. In the process of movement, leaders explore globally, while followers fully explore locally, which greatly reduces the situation of falling into local optimum.

The physics-based algorithms are mainly inspired by different physical laws in nature. Gravitational search algorithm (GSA) (Rashedi et al. 2009) is a novel optimization algorithm based on the laws of gravity and particle interactions in the universe. Multi-verse optimizer (MVO) (Mirjalili et al. 2016) is an optimization algorithm based on the multiverse theory in physics, which simulates the motion behavior of the multiverse population under the joint action of white holes, black holes, and wormholes. Sine cosine algorithm (SCA) (Mirjalili 2016) uses sine and cosine mathematical models to solve the optimization problem. Golden sine algorithm (Golden-SA) (Tanyildizi and Demir 2017) is inspired by the scanning within the unit circle of the sine function, which is similar to the spatial search of the solution to be optimized. The algorithm has strong local exploitation capabilities.

Marine predator algorithm (MPA) is a new meta-heuristic algorithm proposed by Faramarzi et al. in 2020. The main inspiration for MPA comes from the widespread foraging strategies of marine predators, namely Lévy and Brownian movements, and from the optimal encounter rate policy in the biological interaction between predators and prey (Faramarzi et al. 2020). MPA has been widely studied for its powerful optimization ability and has been applied to many practical problems, such as rolling bearing fault diagnosis (Wang et al. 2021c), rotor angle stability of power systems (Yakout et al. 2021), sensor optimization design (Aly et al. 2021), and COVID-19 confirmed case prediction (Al-Qaness et al. 2020). In addition, in view of the disadvantages of the traditional MPA, such as slow convergence and a tendency to fall into local optima, scholars have made various improvements to it. Zhong et al. combined MPA with the teaching-learning-based optimization algorithm and incorporated mutation and crossover strategies to effectively increase predator diversity and avoid premature convergence (Zhong et al. 2020). Elaziz et al. introduced quantum theory into MPA and applied it to multi-threshold image segmentation (Elaziz et al. 2021). Oszust introduced the local escaping operator (LEO) to balance the exploration and exploitation capabilities of MPA (Oszust 2021). Abdel et al. proposed a binary version of MPA to solve the 0-1 knapsack (KP01) problem (Abdel et al. 2020). Yang et al. proposed a multi-strategy MPA containing a chaotic opposition learning strategy, a neighborhood dimensional learning strategy, adaptive inertia weights, and adaptive step control factors, and applied it to logging oil formation identification (Yang et al. 2021). Wang et al. introduced the location update strategy of PSO to make up for the shortcomings of MPA in global search (Wang et al. 2021b). Houssein et al. introduced an opposition learning strategy into MPA to improve its convergence speed (Houssein et al. 2021a, b). To address the slow convergence, weak local search ability, and tendency to fall into local optima of MPA, this paper combines the elite opposition-based learning strategy and the golden sine algorithm to improve MPA. Firstly, the elite opposition-based learning strategy is introduced in the initialization stage to improve the diversity of the initial population, enlarge the search space, and enhance the global search ability of the algorithm. Secondly, the golden sine algorithm is integrated into the position update of MPA in phase 3 to enhance the optimization performance of MPA.

Carbon dioxide emissions forecast

In recent years, many scholars have proposed different models to predict carbon dioxide emissions. According to the literature, carbon dioxide emission prediction models mainly include econometric and mathematical models, machine learning models, and hybrid models. For econometric and mathematical models, Wang and Ye used a nonlinear grey multivariate model for predictive analysis of China’s carbon dioxide emissions and reported higher prediction accuracy for the proposed model (Wang and Ye 2017). Gao et al. introduced the environmental Kuznets curve hypothesis and the differential information principle into a fractional grey Riccati model, and the results suggest that the USA, China, and Japan will gradually reduce their carbon emissions in the future (Gao et al. 2020). Wang et al. investigated the drivers of carbon emissions from electricity generation in China based on a two-stage LMDI approach (Wang et al. 2021a). Song et al. used the ST-LMDI model to identify potential dynamic trends of China’s carbon emissions under different scenarios (Song et al. 2022b). For machine learning models, Mardani et al. proposed a multi-stage method integrating clustering, machine learning, and dimensionality reduction to predict carbon dioxide emissions based on two important variables, economic growth and energy consumption, and reported that the method can accurately predict carbon dioxide emissions (Mardani et al. 2020). Lin et al. proposed an improved attention mechanism-based long short-term memory (LSTM) neural network (attention-LSTM) to predict the carbon dioxide emissions of different countries or regions (Lin et al. 2021). Sun and Sun combined principal component analysis (PCA) and the regularized extreme learning machine (RELM) to predict carbon dioxide emissions in China (Sun and Sun 2017). To make up for the shortcomings of a single model, many scholars have used hybrid models to predict carbon dioxide emissions. Zhou et al. used support vector machines and improved particle swarm optimization (IPSO) to forecast carbon dioxide emissions in the Chinese power sector (Zhou et al. 2018). Qiao et al. proposed a hybrid model based on the improved lion swarm optimizer and LSSVM for predicting carbon dioxide emissions (Qiao et al. 2020). Niu et al. proposed a prediction model combining improved fireworks algorithm (IFWA) optimization and a generalized regression neural network (GRNN) (Niu et al. 2020). Wen and Yuan combined random forest, PSO, and a BP neural network to establish a hybrid carbon dioxide prediction model (Wen and Yuan 2020). Some scholars have analyzed the impact of COVID-19 on air quality in India and the UK, and the results show that the lockdown measures during the COVID-19 period cannot improve air quality in the long run (Shehzad et al. 2021; Ropkins and Tate 2021).

In addition, many scholars have analyzed the factors influencing carbon dioxide emissions. Sangeetha and Amudha, Heydari et al., and Lin et al. selected historical data on energy sources such as oil, coal, natural gas, and primary energy to predict carbon dioxide emissions, and concluded that the consumption of these energy sources plays a crucial role in carbon dioxide emissions (Sangeetha and Amudha 2018; Heydari et al. 2019; Lin et al. 2021). Song et al. investigated the characteristics of carbon intensity in various regions of China and found that energy intensity was the main factor affecting its spatial differences and temporal dynamics (Song et al. 2019). Niu et al. considered seven categories of influencing factors for carbon dioxide emissions projections: economy, population, economic structure, industrial structure, energy intensity, energy mix, and stringency of environmental policies (Niu et al. 2020). You et al. investigated the relationship between coal-related carbon emissions (CCE) and economic growth, and showed that activity and energy intensity had the greatest effect on CCE (You et al. 2020). Song et al. investigated the temporal and spatial differences in per capita carbon emissions across Chinese provinces and found that energy consumption was the main factor affecting per capita carbon emissions (Song et al. 2022a). The above literature shows that traditional econometric models and machine learning models have been widely used in carbon dioxide emissions prediction. However, a single model may suffer from problems such as poor robustness. Therefore, this paper combines the multi-kernel support vector regression model with the improved MPA algorithm to establish an accurate and effective hybrid model for forecasting China’s carbon dioxide emissions.

Methods

Multi-kernel support vector regression

Support vector regression

Support vector regression (SVR) (Liang and Sun 2003) is the application of support vector machine (SVM) (Cortes and Vapnik 1995) to regression problems. Given the training dataset D = {(x1, y1), (x2, y2), ⋯, (xm, ym)}, SVR constructs a regression function of the following form:

$$f(x)={\omega}^Tx+b$$
(1)

where ω is a coefficient vector, and b is a constant. The optimization problem of SVR can be expressed in Eq. (2):

$$ \underset{\omega, b}{\min}\frac{1}{2}{\left\Vert \omega \right\Vert}^2+C{\sum}_{i=1}^m{\ell}_{\varepsilon}\left(f\left({x}_i\right)-{y}_i\right) $$
(2)

where C is the regularization constant and ℓε is the ε-insensitive loss function. By introducing non-negative slack variables ξi and \({\hat{\xi}}_i\), the constrained optimization problem can be expressed as follows:

$$ {\displaystyle \begin{array}{c}\underset{\omega, b,{\xi}_i,{\hat{\xi}}_i}{\min}\frac{1}{2}{\left\Vert \omega \right\Vert}^2+C{\sum}_{i=1}^m\left({\xi}_i+{\hat{\xi}}_i\right)\\ {}s.t.\kern0.5em \left\{\begin{array}{l}f\left({x}_i\right)-{y}_i\le \varepsilon +{\xi}_i,\\ {}{y}_i-f\left({x}_i\right)\le \varepsilon +{\hat{\xi}}_i,\\ {}{\xi}_i\ge 0,{\hat{\xi}}_i\ge 0,i=1,2,\cdots, m.\end{array}\right.\end{array}} $$
(3)
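As a small illustration of the ε-insensitive loss and of how the slack variables measure deviations outside the ε-tube, the following is a minimal sketch. The function names `eps_insensitive_loss` and `minimal_slacks` are hypothetical and chosen only for this example.

```python
from typing import Tuple


def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def minimal_slacks(prediction: float, target: float, eps: float) -> Tuple[float, float]:
    """Smallest non-negative slacks (xi, xi_hat) that satisfy the two
    inequality constraints of the slack-variable formulation."""
    xi = max(0.0, prediction - target - eps)      # f(x_i) - y_i <= eps + xi
    xi_hat = max(0.0, target - prediction - eps)  # y_i - f(x_i) <= eps + xi_hat
    return xi, xi_hat
```

For any sample, at most one of the two slacks is non-zero, and their sum equals the ε-insensitive loss of the residual, which is why the two formulations of the objective agree.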

By constructing a Lagrangian function, the optimization problem can be transformed into its dual problem, given by Eq. (4):

$${\displaystyle \begin{array}{l}\underset{\alpha, \hat{\alpha}}{\mathit{\max}}\sum\limits_{i=1}^m\left[{y}_i\left({\hat{\alpha}}_i-{\alpha}_i\right)-\varepsilon \left({\hat{\alpha}}_i+{\alpha}_i\right)\right]-\frac{1}{2}\sum\limits_{i=1}^m\sum\limits_{j=1}^m\left({\hat{\alpha}}_i-{\alpha}_i\right)\left({\hat{\alpha}}_j-{\alpha}_j\right){x}_i^T{x}_j\\ {}s.t.\kern0.5em \left\{\begin{array}{l}\sum\limits_{i=1}^m\left({\hat{\alpha}}_i-{\alpha}_i\right)=0,\\ {}0\le {\alpha}_i,{\hat{\alpha}}_i\le C.\end{array}\right.\end{array}}$$
(4)

where αi and \({\hat{\alpha}}_i\) are Lagrange multipliers. Solving the dual problem gives the regression function of Eq. (1) in the following form:

$$f(x)=\sum\limits_{i=1}^m\left({\hat{\alpha}}_i-{\alpha}_i\right){x}_i^Tx+b$$
(5)

Considering the feature mapping φ(·), the corresponding SVR solution can be expressed as:

$$f(x)=\sum\limits_{i=1}^m\left({\hat{\alpha}}_i-{\alpha}_i\right)\varphi {\left({x}_i\right)}^T\varphi \left(x\right)+b$$
(6)

If the inner product φ(xi)Tφ(x) is written in kernel function form κ(x, xi), the final decision function can be written as:

$$f(x)=\sum\limits_{i=1}^m\left({\hat{\alpha}}_i-{\alpha}_i\right)\kappa \left(x,{x}_i\right)+b$$
(7)

Kernel function

Kernel functions take many forms. Common kernel functions are the following:

  1. Linear kernel

$${\kappa}_{{Linear}}\left({x}_i,{x}_j\right)={x}_i^T{x}_j$$
(8)

The linear kernel is the simplest kernel function and is mainly used for linearly separable cases. However, since most problems in practice are nonlinear, nonlinear kernels are more commonly used.

  2. Polynomial kernel

$${\kappa}_{{Poly}}\left({x}_i,{x}_j\right)={\left(\left({x}_i^T{x}_j\right)+R\right)}^d$$
(9)

where d ≥ 1 is the order of the polynomial and R is a constant offset.

  3. RBF kernel

$${\kappa}_{RBF}\left({x}_i,{x}_j\right)=\mathit{\exp}\left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{2{\sigma}^2}\right)$$
(10)

where σ > 0 is the bandwidth of the RBF kernel function, which determines the scope of action of the kernel function.

  4. Sigmoid kernel

$${\kappa}_{{Sigmoid}}\left({x}_i,{x}_j\right)=\mathit{\tanh}\left(\beta {x}_i^T{x}_j+\theta \right)$$
(11)

where tanh is the hyperbolic tangent function, β is the slope, and θ is the intercept.
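The four kernels above can be written directly from Eqs. (8)–(11). The following is a minimal sketch in plain Python; the function names and the default parameter values are assumptions made only for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], z: Sequence[float]) -> float:
    """Inner product x^T z of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    return dot(x, z)                                   # Eq. (8)


def poly_kernel(x: Sequence[float], z: Sequence[float],
                R: float = 1.0, d: int = 2) -> float:
    return (dot(x, z) + R) ** d                        # Eq. (9)


def rbf_kernel(x: Sequence[float], z: Sequence[float],
               sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))     # Eq. (10)


def sigmoid_kernel(x: Sequence[float], z: Sequence[float],
                   beta: float = 1.0, theta: float = 0.0) -> float:
    return math.tanh(beta * dot(x, z) + theta)         # Eq. (11)
```

For example, with x = (1, 2) and z = (3, 4) the inner product is 11, so the linear kernel returns 11 and the polynomial kernel with R = 1 and d = 2 returns 144.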

Multi-kernel support vector regression

In SVR modeling, the selection of the kernel function is very important. A single kernel often cannot achieve good learning results, so multi-kernel learning has attracted extensive attention. Multi-kernel learning can effectively improve the learning effect of the model by using a weighted combination of multiple kernel functions instead of a single kernel function. A multi-kernel function can be expressed as:

$${\displaystyle \begin{array}{l}{\kappa}_{{multiple}}\left({x}_i,{x}_j\right)=\sum\limits_{\alpha =1}^N{m}_{\alpha }{\kappa}_{\alpha}\left({x}_i,{x}_j\right)\\ {}\begin{array}{cc}\sum\limits_{\alpha =1}^N{m}_{\alpha }=1,0<{m}_{\alpha }<1,& \alpha =1,2\cdots, N\end{array}\end{array}}$$
(12)

where mα is the weight of different kernel functions in multi-kernel functions, and the sum of the weights is 1. In this paper, multi-kernel functions are constructed based on linear kernel, RBF kernel, polynomial kernel, and Sigmoid kernel, and the multi-kernel functions are used in the multi-kernel support vector regression model.
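The weighted combination in Eq. (12) can be sketched as a function that takes any list of kernels and returns the combined kernel. The helper names `multi_kernel` and `gram_matrix` are hypothetical. Rescaling arbitrary positive weights so that they sum to 1 is an assumption of this sketch, used here as one simple way to satisfy the constraint in Eq. (12); the paper does not state how the constraint is enforced during optimization.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def multi_kernel(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return the weighted sum of the given kernels, as in Eq. (12)."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights must be strictly positive")
    total = sum(weights)
    m = [w / total for w in weights]  # assumption: rescale so the weights sum to 1

    def combined(x: Sequence[float], z: Sequence[float]) -> float:
        return sum(m_a * k_a(x, z) for m_a, k_a in zip(m, kernels))

    return combined


def gram_matrix(kernel: Kernel, samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Kernel matrix K[i][j] = kernel(x_i, x_j) over the training samples."""
    return [[kernel(a, b) for b in samples] for a in samples]
```

The resulting Gram matrix is what a kernel SVR solver consumes in place of the inner products in the dual problem, so swapping a single kernel for a multi-kernel changes only how this matrix is built.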

Marine predator algorithm

The marine predator algorithm (MPA) is a novel meta-heuristic optimization algorithm proposed by Faramarzi et al. in 2020. It is inspired by the survival-of-the-fittest principle in marine ecosystems, where marine predators select the best foraging strategy by choosing between Lévy and Brownian movements.

According to Darwin’s theory of survival of the fittest, top predators in nature are more talented in foraging. Therefore, an Elite matrix is constructed based on top predators. This matrix supervises the search and finding of prey based on information about its location.

$${Elite}={\left[\begin{array}{cccc}{X}_{1,1}^I& {X}_{1,2}^I& \cdots & {X}_{1,d}^I\\ {}{X}_{2,1}^I& {X}_{2,2}^I& \cdots & {X}_{2,d}^I\\ {}\vdots & \vdots & \vdots & \vdots \\ {}{X}_{n,1}^I& {X}_{n,2}^I& \cdots & {X}_{n,d}^I\end{array}\right]}_{n\times d}$$
(13)

where \({\overrightarrow{X}}^I\) represents the top predator vector, which is replicated n times to construct the Elite matrix. n is the number of search agents, and d is the dimension.

Another matrix with the same dimension as Elite is called Prey, and the predator positions are updated based on this matrix.

$${Prey}={\left[\begin{array}{cccc}{X}_{1,1}& {X}_{1,2}& \cdots & {X}_{1,d}\\ {}{X}_{2,1}& {X}_{2,2}& \cdots & {X}_{2,d}\\ {}\vdots & \vdots & \vdots & \vdots \\ {}{X}_{n,1}& {X}_{n,2}& \cdots & {X}_{n,d}\end{array}\right]}_{n\times d}$$
(14)

where Xi, j represents the jth dimension of the ith prey.

MPA optimization process is described as follows:

  • Step 1: Initialization phase.

Similar to most of the meta-heuristic algorithms, MPA randomly initializes the prey position in the search space and initializes the Prey matrix with Eq. (15):

$${X}_0={X}_{\mathrm{min}}+\mathit{\operatorname{rand}}\left({X}_{\mathrm{max}}-{X}_{\mathrm{min}}\right)$$
(15)

where Xmax and Xmin are the upper and lower bounds of the search space, and rand is a uniform random vector in the range of 0 to 1.
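A minimal sketch of the random initialization in Eq. (15), assuming per-dimension bounds passed as lists. The function name `init_prey` and the use of an explicit `random.Random` instance are choices made for this example only.

```python
import random
from typing import List, Sequence


def init_prey(n: int, lower: Sequence[float], upper: Sequence[float],
              rng: random.Random) -> List[List[float]]:
    """Build an n x d Prey matrix with X = X_min + rand * (X_max - X_min)."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]
```

Passing a seeded generator, for example `init_prey(30, [-100.0] * 50, [100.0] * 50, random.Random(0))`, makes repeated experiments reproducible.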

  • Step 2: MPA optimization phase.

  • Phase 1: This phase is mainly used for global search. When the predator is moving faster than the prey, the prey takes a Brownian motion and the predator does not move. The mathematical model of this rule is applied as:

$$\begin{array}{c}\mathit{Iter}<\frac{1}{3}\mathit{Max}\_\mathit{Iter}\\ \left\{\begin{array}{l}{\mathit{stepsize}}_i={R}_B\otimes \left({\mathit{Elite}}_i-{R}_B\otimes {\mathit{Prey}}_i\right)\\ {\mathit{Prey}}_i={\mathit{Prey}}_i+P\cdot R\otimes {\mathit{stepsize}}_i\end{array}\right.\kern1em i=1,2,\cdots, n\end{array}$$
(16)

where RB is a vector containing random numbers based on normal distribution representing the Brownian motion. P = 0.5 is a constant number. R is a vector of uniform random numbers in [0,1]. Iter is the current iteration, while Max_Iter is the maximum one.
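The Phase 1 update of Eq. (16) can be sketched element-wise as follows. Drawing a fresh normal and uniform random number for every dimension is an assumption of this sketch, consistent with RB and R being described as random vectors; the function name `phase1_update` is hypothetical.

```python
import random
from typing import List, Sequence


def phase1_update(prey: Sequence[Sequence[float]],
                  elite: Sequence[Sequence[float]],
                  rng: random.Random, P: float = 0.5) -> List[List[float]]:
    """Phase 1 of MPA: Brownian-motion exploration, following Eq. (16)."""
    updated = []
    for prey_i, elite_i in zip(prey, elite):
        row = []
        for x, e in zip(prey_i, elite_i):
            r_b = rng.gauss(0.0, 1.0)       # Brownian component R_B
            step = r_b * (e - r_b * x)      # stepsize_i
            row.append(x + P * rng.random() * step)
        updated.append(row)
    return updated
```

Because the step is scaled by the difference between the elite and the perturbed prey, the move size shrinks as the population gathers around the elite.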

  • Phase 2: In this phase, the global search is transformed into local search for the current optimal solution. Half the population is used for exploitation and the other half for exploration. When the predator and prey move at the same speed, the prey is responsible for exploitation based on the Lévy motion strategy, and the predator is responsible for exploration based on the Brownian motion strategy. The mathematical description of exploitation and exploration is as follows:

$$\begin{array}{c}\frac{1}{3}\mathit{Max}\_\mathit{Iter}<\mathit{Iter}<\frac{2}{3}\mathit{Max}\_\mathit{Iter}\\ \left\{\begin{array}{l}{\mathit{stepsize}}_i={R}_L\otimes \left({\mathit{Elite}}_i-{R}_L\otimes {\mathit{Prey}}_i\right)\\ {\mathit{Prey}}_i={\mathit{Prey}}_i+P\cdot R\otimes {\mathit{stepsize}}_i\end{array}\right.\kern1em i=1,2,\cdots, n/2\end{array}$$
(17)
$$\left\{\begin{array}{l}{\mathit{stepsize}}_i={R}_B\otimes \left({R}_B\otimes {\mathit{Elite}}_i-{\mathit{Prey}}_i\right)\\ {\mathit{Prey}}_i={\mathit{Elite}}_i+P\cdot CF\otimes {\mathit{stepsize}}_i\end{array}\right.\kern1em i=n/2,\cdots, n$$
(18)

where RL is a vector of random numbers based on Lévy distribution. \(\mathrm{CF}={\left(1-\frac{{Iter}}{\mathit{\operatorname{Max}}\_{Iter}}\right)}^{\left(2\frac{{Iter}}{\mathit{\operatorname{Max}}\_{Iter}}\right)}\) is considered as an adaptive parameter to control the step size for predator movement.
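The adaptive parameter CF follows directly from its definition above. The text does not specify how the Lévy-distributed numbers in RL are generated, so the sampler below uses Mantegna's method with exponent β = 1.5 purely as an assumption; it is one common way to draw heavy-tailed steps and may differ from the authors' implementation. Both function names are hypothetical.

```python
import math
import random


def adaptive_cf(iteration: int, max_iter: int) -> float:
    """CF = (1 - Iter/Max_Iter) ** (2 * Iter/Max_Iter)."""
    ratio = iteration / max_iter
    return (1.0 - ratio) ** (2.0 * ratio)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """One heavy-tailed step via Mantegna's method (assumed sampler)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    sigma_u = (num / den) ** (1.0 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)
```

CF equals 1 at the first iteration and decays to 0 at the last one, so predator moves scaled by CF become progressively smaller as the search proceeds.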

  • Phase 3: In this phase, the local search for the current optimal solution location is carried out. When the predator is slower than the prey, the predator adopts the exploitation strategy based on Lévy motion. This phase is presented as:

$$\begin{array}{c}\frac{2}{3}\mathit{Max}\_\mathit{Iter}<\mathit{Iter}<\mathit{Max}\_\mathit{Iter}\\ \left\{\begin{array}{l}{\mathit{stepsize}}_i={R}_L\otimes \left({R}_L\otimes {\mathit{Elite}}_i-{\mathit{Prey}}_i\right)\\ {\mathit{Prey}}_i={\mathit{Elite}}_i+P\cdot CF\otimes {\mathit{stepsize}}_i\end{array}\right.\kern1em i=1,\cdots, n\end{array}$$
(19)
  • Step 3: FADs’ effect and Eddy formation.

Fish aggregating devices (FADs) or eddy effects usually alter the foraging behavior of marine predators. Modeling this behavior enables MPA to overcome premature convergence and escape local extremes during the search for an optimum. Its mathematical description is as follows:

$${\mathit{Prey}}_i=\left\{\begin{array}{ll}{\mathit{Prey}}_i+ CF\left[{X}_{\mathrm{min}}+R\otimes \left({X}_{\mathrm{max}}-{X}_{\mathrm{min}}\right)\right]\otimes U,& \mathrm{if}\ r\le \mathit{FADs}\\ {}{\mathit{Prey}}_i+\left[\mathit{FADs}\left(1-r\right)+r\right]\left({\mathit{Prey}}_{r1}-{\mathit{Prey}}_{r2}\right),& \mathrm{if}\ r>\mathit{FADs}\end{array}\right.$$
(20)

where FADs = 0.2 is the probability of FAD effect on the optimization process. U is the binary vector with arrays including 0 and 1. r is the uniform random number in [0,1]. r1 and r2 subscripts denote random indexes of Prey matrix.
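A minimal sketch of the FADs update in Eq. (20), using the first branch when r ≤ FADs and the second branch otherwise. Several details are assumptions of this sketch because the text does not fix them: r is drawn once per call, each entry of the binary vector U is set to 1 with probability FADs, and the indices r1 and r2 are drawn independently for each individual. The function name `fads_effect` is hypothetical.

```python
import random
from typing import List, Sequence


def fads_effect(prey: Sequence[Sequence[float]],
                lower: Sequence[float], upper: Sequence[float],
                cf: float, rng: random.Random,
                fads: float = 0.2) -> List[List[float]]:
    """Apply the FADs / eddy perturbation of Eq. (20) to every individual."""
    n = len(prey)
    r = rng.random()  # assumption: a single draw of r per call
    updated = []
    for row in prey:
        if r <= fads:
            # Long jump inside the search bounds, masked by the binary vector U.
            new_row = []
            for x, lo, hi in zip(row, lower, upper):
                u = 1.0 if rng.random() < fads else 0.0  # assumed construction of U
                new_row.append(x + cf * (lo + rng.random() * (hi - lo)) * u)
        else:
            # Move along the difference of two randomly chosen individuals.
            r1, r2 = rng.randrange(n), rng.randrange(n)
            scale = fads * (1.0 - r) + r
            new_row = [x + scale * (a - b)
                       for x, a, b in zip(row, prey[r1], prey[r2])]
        updated.append(new_row)
    return updated
```

In a full optimizer the returned positions would normally be clipped to the search bounds before the fitness is evaluated; that step is omitted here.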

Elite opposition-based learning strategy

The performance of the meta-heuristic algorithm is affected by the quality of the initial solution.

In order to further improve the population diversity of MPA, this paper applies the elite opposition-based learning strategy to the initial population stage of MPA. The elite individual most probably has more useful information than other individuals. The opposition solution of current elite individuals is generated, and the opposition solution of elite individuals is integrated with the current solution, and excellent individuals are selected to form a new population, which effectively increases the population diversity of MPA, improves the quality of population, and avoids the algorithm falling into local optimum.

The individual \( {X}_e=\left({X}_{e,1},{X}_{e,2},\cdots, {X}_{e,D}\right) \) with the best fitness value in the current iteration is regarded as the elite individual of the population, whose members are \( {X}_i=\left({X}_{i,1},{X}_{i,2},\cdots, {X}_{i,D}\right) \). The elite opposition solution \( {X}_i^{\prime }=\left({X}_{i,1}^{\prime },{X}_{i,2}^{\prime },\cdots, {X}_{i,D}^{\prime}\right) \) can be defined as:

$$ {X}_{i,j}^{\prime }=k\left({lb}_j+{ub}_j\right)-{X}_{e,j},\kern0.5em {\displaystyle \begin{array}{cc}i=1,2,\cdots, n;& j=1,2,\cdots D.\end{array}} $$
(21)

where n is the population size, k is a random number in [0, 1], and lbj and ubj are the dynamic bounds of the jth dimension, with lbj = min(Xi,j) and ubj = max(Xi,j). If the elite opposition solution exceeds these bounds, it is regenerated randomly according to Eq. (22):

$${\displaystyle \begin{array}{cc}{X}_{i,j}^{\prime }=\mathit{\operatorname{rand}}\left({lb}_j,{ub}_j\right),& \mathrm{if}\ {X}_{i,j}^{\prime }<{lb}_j\ \mathrm{or}\ {X}_{i,j}^{\prime }>{ub}_j\end{array}}$$
(22)
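Eqs. (21) and (22) can be sketched as follows, together with the selection step that keeps the best n individuals from the merged population. Drawing one value of k per individual and treating smaller fitness values as better are assumptions of this sketch; the function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def elite_opposition(population: Sequence[Sequence[float]],
                     elite: Sequence[float],
                     rng: random.Random) -> List[List[float]]:
    """Generate one elite opposition solution per individual, Eqs. (21)-(22)."""
    dims = range(len(elite))
    lb = [min(ind[j] for ind in population) for j in dims]  # dynamic lower bounds
    ub = [max(ind[j] for ind in population) for j in dims]  # dynamic upper bounds
    opposite = []
    for _ in population:
        k = rng.random()  # assumption: one k per individual
        row = []
        for j in dims:
            x = k * (lb[j] + ub[j]) - elite[j]       # Eq. (21)
            if x < lb[j] or x > ub[j]:
                x = rng.uniform(lb[j], ub[j])        # Eq. (22)
            row.append(x)
        opposite.append(row)
    return opposite


def select_best(population: Sequence[Sequence[float]],
                opposite: Sequence[Sequence[float]],
                fitness: Callable[[Sequence[float]], float]) -> List[List[float]]:
    """Merge both sets and keep the n individuals with the lowest fitness."""
    merged = [list(ind) for ind in population] + [list(ind) for ind in opposite]
    return sorted(merged, key=fitness)[:len(population)]
```

Since the original individuals remain candidates in the merge, the best fitness in the new population is never worse than before the opposition step.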

Golden sine algorithm

Golden sine algorithm (Golden-SA) is a new meta-heuristic algorithm proposed by Tanyildizi and Demir in 2017. It is inspired by the similarity between scanning inside the unit circle of the sine function and searching the solution space of the problem to be optimized. The golden ratio is used to narrow the search space and approximate the optimal solution. In the iterative process, the golden sine algorithm first generates the initial positions of N individuals and then updates the position of each individual according to Eq. (23):

$$X\left(t+1\right)=X(t)\ast \left|\mathit{\sin}\left({R}_1\right)\right|-{R}_2\ast \mathit{\sin}\left({R}_1\right)\ast \left|{m}_1\ast {X}_{\boldsymbol{best}}(t)-{m}_2\ast X(t)\right|$$
(23)

where Xbest(t) is the target position, i.e., the global optimal position in Golden-SA; R1 ∈ [0, 2π] and R2 ∈ [0, π] are random numbers; and m1 and m2 are coefficients obtained by the golden section method, given by Eqs. (24)–(26). These coefficients effectively narrow the search area and guide individuals to the optimal value. Here a and b are the endpoints of the golden section search interval; their values change as the algorithm iterates, and their initial default values are −π and π, respectively.

$${m}_1=a\ast \left(1-\tau \right)+b\ast \tau$$
(24)
$${m}_2=a\ast \tau +b\ast \left(1-\tau \right)$$
(25)
$$\tau =\frac{\sqrt{5}-1}{2}=0.618033$$
(26)
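As a worked illustration of Eqs. (23)–(26), the following sketch computes the golden section coefficients and applies one position update. The function names are hypothetical, and R1 and R2 are passed in explicitly so that the arithmetic can be checked by hand.

```python
import math
from typing import List, Sequence, Tuple

TAU = (math.sqrt(5.0) - 1.0) / 2.0  # golden ratio coefficient, Eq. (26)


def golden_coefficients(a: float = -math.pi, b: float = math.pi) -> Tuple[float, float]:
    """Coefficients m1 and m2 from Eqs. (24) and (25)."""
    m1 = a * (1.0 - TAU) + b * TAU
    m2 = a * TAU + b * (1.0 - TAU)
    return m1, m2


def golden_sine_update(x: Sequence[float], x_best: Sequence[float],
                       r1: float, r2: float,
                       m1: float, m2: float) -> List[float]:
    """One position update following Eq. (23), applied per dimension."""
    s = math.sin(r1)
    return [xi * abs(s) - r2 * s * abs(m1 * bi - m2 * xi)
            for xi, bi in zip(x, x_best)]
```

With the default interval a = −π and b = π, the two coefficients are symmetric about zero (m1 = −m2 ≈ 0.7416), since m1 + m2 = a + b.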

MPA improvements

To improve the performance of MPA, this paper combines the elite opposition-based learning strategy and the golden sine algorithm. The elite opposition-based learning strategy can effectively improve the population quality of MPA. Firstly, incorporating the strategy improves the diversity of the initial population and enlarges the search space. Secondly, in each iteration, the strategy can generate an opposition solution far from the local extreme point, which leads MPA to jump out of the local extreme point and enhances the global search ability of the algorithm. In addition, the strategy uses a tracking search mode with dynamic boundaries to locate individuals in a gradually shrinking search area, which can effectively improve the convergence accuracy and speed of MPA.

In addition, while retaining the traditional MPA position update methods in phase 1 and phase 2, the golden sine algorithm is used to improve the position update method of MPA in phase 3 to enhance the optimization performance and accuracy of the algorithm. The improved position update method is shown in Eq. (27):

$${{Prey}}_i={{Prey}}_i\times \left|\mathit{\sin}\left({R}_1\right)\right|-{R}_2\times \mathit{\sin}\left({R}_1\right)\times \left|{m}_1\times {{Elite}}_i-{m}_2\times {{Prey}}_i\right|$$
(27)

To simplify the algorithm and improve its operating efficiency, m1 and m2 are set as constants, following Xie et al. (2019). This approach guarantees that the search ability of the agents will not weaken, while increasing the stability of the algorithm. The specific formulas for m1 and m2 are shown in Eqs. (28) and (29):

$${m}_1=-\pi +\left(1-\tau \right)\ast \pi$$
(28)
$${m}_2=-\pi +\tau \ast \pi$$
(29)
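A minimal sketch of the phase-3 update in Eqs. (27) to (29) is given below. It assumes that R1 and R2 are drawn uniformly from their stated intervals once per individual and that the update is applied element-wise to each dimension; neither detail is specified in this section, and the function name `golden_sine_update` is hypothetical.

```python
import math
import random

TAU = (math.sqrt(5) - 1) / 2          # golden section coefficient, Eq. (26)
M1 = -math.pi + (1 - TAU) * math.pi   # constant m1, Eq. (28)
M2 = -math.pi + TAU * math.pi         # constant m2, Eq. (29)

def golden_sine_update(prey, elite, rng=random):
    """Apply Eq. (27) to one prey vector, dimension by dimension."""
    r1 = rng.uniform(0.0, 2.0 * math.pi)  # R1 in [0, 2*pi] (assumed uniform)
    r2 = rng.uniform(0.0, math.pi)        # R2 in [0, pi] (assumed uniform)
    return [
        p * abs(math.sin(r1)) - r2 * math.sin(r1) * abs(M1 * e - M2 * p)
        for p, e in zip(prey, elite)
    ]
```

For example, `golden_sine_update([1.0, 2.0], [0.5, 0.5])` returns a new two-dimensional position; in a full optimizer the result would still need to be checked against the search bounds.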

Based on the above improvements to MPA, the pseudo-code of the proposed EGMPA is shown in Table 1.

Table 1 Pseudo-code of EGMPA

Hybrid forecasting model framework

The specific flow of the hybrid prediction model in this paper is shown in Fig. 1, which consists of four steps: data preprocessing, forecasting model based on multi-kernel support vector regression, improved marine predator optimization model, and model evaluation criteria.

  • Step 1: Data preprocessing

Fig. 1 Hybrid forecasting model framework

To eliminate the adverse effect on prediction of large differences in magnitude among the raw data, the raw data were normalized and restricted to the range [0,1].

  • Step 2: Forecasting Model

Multi-kernel support vector regression is used as the forecasting model of carbon dioxide emissions. The linear kernel, RBF kernel, polynomial kernel, and Sigmoid kernel are selected to form weighted combinations of kernel functions, which are used in the support vector regression model.

  • Step 3: Optimization Model

EGMPA is used to optimize the main parameters of multi-kernel support vector regression: penalty factor C, kernel function parameters g, and kernel function weight ω. The fitness function selected in this paper is the root-mean-square error (RMSE) between the real value and the predicted value.

  • Step 4: Evaluation Criteria

MAE, RMSE, and MAPE are used as evaluation criteria to further evaluate the performance of the model.
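The optimization step can be pictured as a fitness function that maps one candidate solution [C, g, ω] to the RMSE between real and predicted values. The sketch below uses the search ranges reported later in this paper ([0.01, 100] for C and g, [0.001, 0.999] for ω). The callable `fit_predict` is a hypothetical stand-in for training the multi-kernel SVR and returning its predictions, and clamping out-of-range candidates is an assumed boundary-handling rule.

```python
import math

# Search ranges reported in this paper for C, g, and the kernel weight.
BOUNDS = [(0.01, 100.0), (0.01, 100.0), (0.001, 0.999)]

def clip_to_bounds(candidate, bounds=BOUNDS):
    """Clamp each coordinate of a candidate solution to its search range."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(candidate, bounds)]

def rmse_fitness(candidate, fit_predict, y_true):
    """RMSE between real values and the predictions produced by one candidate.

    `fit_predict(C, g, weight)` is a hypothetical callable that trains the
    multi-kernel SVR and returns predictions aligned with `y_true`.
    """
    c, g, weight = clip_to_bounds(candidate)
    y_pred = fit_predict(c, g, weight)
    squared = [(yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

An optimizer such as EGMPA would call `rmse_fitness` once per individual per iteration and keep the candidate with the smallest value.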

Algorithm test

In this paper, 23 benchmark functions including seven unimodal, six multimodal, and ten fixed-dimension multimodal optimization problems are selected to verify the effectiveness of the proposed EGMPA. It is compared with eight classical meta-heuristics, namely, DE, PSO, CS, MVO, SCA, MFO, SSA, and GWO.

Each algorithm is run 30 times independently to ensure fairness. In addition, the maximum number of iterations is set to 500, the dimension to 50, and the population size to 30. The algorithm tests are conducted in Python 3.8 under Windows 10 on a 64-bit 2.30 GHz Intel(R) Core(TM) i7-10510U with 16.0 GB of RAM.
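The test protocol above (repeated independent runs summarized by mean and standard deviation) can be sketched as follows. The sphere function is used as the objective because it is F1 in the commonly used 23-function suite, which is assumed here to be the suite the paper refers to. The `random_search` optimizer and the [−100, 100] search range are placeholders for illustration only; they are not EGMPA or any of the compared algorithms.

```python
import random
import statistics

def sphere(x):
    """Sphere function: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)

def random_search(objective, dim, iterations, pop_size, rng):
    """Placeholder optimizer (NOT EGMPA): keep the best of uniform samples."""
    best = float("inf")
    for _ in range(iterations):
        for _ in range(pop_size):
            candidate = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            best = min(best, objective(candidate))
    return best

def benchmark(optimizer, objective, runs=30, dim=50, iterations=500, pop_size=30):
    """Repeat independent runs and summarize the best fitness values."""
    results = [
        optimizer(objective, dim, iterations, pop_size, random.Random(seed))
        for seed in range(runs)
    ]
    return statistics.mean(results), statistics.stdev(results)
```

Seeding each run separately keeps the runs independent and reproducible; the default arguments mirror the settings stated above, so a quick check is better done with smaller values, for example `benchmark(random_search, sphere, runs=3, dim=2, iterations=5, pop_size=4)`.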

Table 2 shows the performance of nine different meta-heuristic algorithms on 23 benchmark functions. The convergence curves of the benchmark functions are shown in Figs. 2, 3 and 4.

Table 2 Algorithm test results

Fig. 2 Convergence curves of nine algorithms for unimodal test functions (F1, F2, F4, F7)

Fig. 3 Convergence curves of nine algorithms for multimodal test functions (F8, F10, F11, F12)

Fig. 4 Convergence curves of nine algorithms for fixed-dimension multimodal test functions (F15, F21, F22, F23)

As can be seen from Table 2 and Figs. 2, 3 and 4:

  1. EGMPA performs significantly better than the other meta-heuristic algorithms on all functions except F6. SSA performs best on F6; although EGMPA is not the best algorithm on this function, it is second only to SSA.

  2. For unimodal functions F1–F7, EGMPA shows strong optimization ability on all benchmark functions except F6.

  3. For multimodal functions F8–F13, EGMPA is significantly better than the other algorithms on all functions. In particular, on F9 and F11, MPA and EGMPA can reach the actual optimal solution of the function.

  4. For fixed-dimension multimodal functions F14–F23, almost all algorithms can find the optimal solution of functions F16–F19, but the standard deviation of EGMPA is significantly smaller than that of the other algorithms. For F14, F15, and F20–F23, the optimal solution obtained by EGMPA is closer to the actual optimal solution.

  5. The convergence curves show that EGMPA converges more steeply and with higher accuracy on unimodal functions F1, F2, F4, and F7 and on multimodal functions F8, F10, F11, and F12. For fixed-dimension multimodal functions F15, F21, F22, and F23, several meta-heuristic algorithms can find the actual optimal solution, but EGMPA converges faster.

In summary, the convergence speed and convergence accuracy of EGMPA are significantly better than those of the other eight meta-heuristics.

Results and discussion

Data description

In this paper, China’s carbon dioxide emissions from 1965 to 2020 are used as the research object. According to previous studies (Sangeetha and Amudha 2018; Heydari et al. 2019; Lin et al. 2021), energy consumption is a major factor in carbon dioxide emissions, so five important influencing factors, namely coal, oil, natural gas, hydroelectricity, and primary energy, are selected to forecast carbon dioxide emissions. Since historical carbon dioxide emissions affect current emissions, China’s carbon dioxide emissions in the preceding 3 years are also used as input variables to improve the accuracy of prediction. The data come from the “Statistical Review of World Energy 2021 | 70th edition” (https://www.bp.com/) released by BP in 2021. The data from 1965 to 2014 are used as the training set, and the data from 2015 to 2020 are used as the test set. The relevant descriptive statistics are shown in Table 3.

Fig. 5 shows the normalized line graph of the five influencing factors and carbon dioxide emissions. It can be seen that the consumption of coal, oil, natural gas, hydroelectricity, and primary energy, as well as carbon dioxide emissions, shows an increasing trend over the years. Fig. 6 shows the annual growth rate of carbon dioxide emissions in China. The annual growth rate fluctuates considerably and is even negative in some years. Since 2005, the annual growth rate of China’s carbon dioxide emissions has shown an overall downward trend, which suggests that China is increasingly concerned about carbon emissions. The annual growth rate continued to decline in 2019 and 2020 but remained positive, indicating that COVID-19 has had an impact on China’s carbon dioxide emissions.

Because the selected raw data vary widely in magnitude, the min-max normalization in Eq. (30) is used to restrict the data to the range [0,1]:

$${x}_i^{\ast }=\frac{x_i-{x}_{\mathrm{min}}}{x_{\mathrm{max}}-{x}_{\mathrm{min}}}$$
(30)

where xi is the original data, xmin is the minimum value in xi, xmax is the maximum value in xi, and \({x}_i^{\ast }\) is the normalized data.
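Eq. (30) can be sketched in a few lines. The helper names are illustrative, the handling of a constant series is an added convention, and `denormalize` is included only because predictions made on the normalized scale have to be mapped back to Mt before they can be compared with the real values. Whether the minimum and maximum are taken over the full series or only over the training years is not stated in this section.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] following Eq. (30)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant series: Eq. (30) is undefined, so return zeros by convention.
        return [0.0 for _ in values]
    return [(v - x_min) / (x_max - x_min) for v in values]

def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]
```

Each input variable and the emissions series would be normalized separately, each with its own minimum and maximum.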

Table 3 Descriptive statistics of input and output variables

Fig. 5 Normalized carbon dioxide emissions and influencing factors

Fig. 6 Annual growth rate of carbon dioxide emissions

Evaluation criteria

In order to evaluate the performance of the carbon dioxide prediction model, three commonly used evaluation criteria, including MAE, RMSE, and MAPE, are used, and the calculation formulas are as follows:

$$MAE=\frac{1}{T}\sum\limits_{t=1}^T\left|{y}_t-{\overline{y}}_t\right|$$
(31)
$${RMSE}=\sqrt{\frac{1}{T}\sum\limits_{t=1}^T{\left({y}_t-{\overline{y}}_t\right)}^2}$$
(32)
$${MAPE}={100}\times \frac{1}{T}\sum\limits_{t=1}^T\left|\frac{y_t-{\overline{y}}_t}{y_t}\right|$$
(33)

where T represents the number of samples, yt represents the real value of data at time t, and \({\overline{y}}_t\) represents the predicted value of the model at time t.
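The three criteria in Eqs. (31) to (33) translate directly into code. The sketch below is a plain-Python rendering with illustrative function names; MAPE is returned in percent, matching the factor of 100 in Eq. (33), and it assumes that no real value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (31)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error, Eq. (32)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (33)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For real values [100, 200] and predictions [110, 180], the errors are 10 and 20, so MAE is 15, RMSE is the square root of 250, and MAPE is 10%.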

Forecasting results comparison

Multi-kernel support vector regression comparison

This paper adopts the support vector regression model, which is suitable for small-sample modeling, as the prediction model of carbon dioxide emissions. Four different kernel functions (linear kernel, polynomial kernel, RBF kernel, and Sigmoid kernel) are weighted and combined to construct the multi-kernel support vector regression model. The main parameters of multi-kernel support vector regression, including the penalty factor C, the kernel function parameters g, and the kernel function weight ω, are optimized by EGMPA. The maximum number of iterations of EGMPA is set to 100, and the population size is set to 30. The optimization range of the penalty factor C and the kernel function parameters g is set to [0.01,100], and that of the kernel function weight ω is set to [0.001,0.999]. Table 4 shows the error analysis of different multi-kernel functions on the test set, and Fig. 7 shows the prediction results of support vector regression for different multi-kernel functions on the test set.

Table 4 Error analysis results of different multi-kernel support vector regression models

It can be seen from Table 4 and Fig. 7:

Fig. 7 Error analysis results of different multi-kernel support vector regression models

The RMSE, MAE, and MAPE of the EGMPA-optimized single-kernel support vector regression are significantly smaller than those of the unoptimized model, and the linear kernel performs best among the single-kernel models. The error of multi-kernel support vector regression is clearly smaller than that of single-kernel support vector regression, which indicates that multi-kernel support vector regression can obtain a better learning effect. Among the 10 models, the best is EGMPA-RBF-Sigmoid, with an RMSE of 37.43, an MAE of 30.63, and a MAPE of 0.32, which is clearly better than the other models.

This is because the RBF kernel is a typical local kernel with strong learning ability but weak generalization ability, while the Sigmoid kernel is a global kernel with weak learning ability but strong generalization ability. The weighted combination of the two kernel functions can meet the requirements of strong learning ability and strong generalization ability at the same time, and a better learning effect can be obtained. The prediction results of different multi-kernel support vector regression on the test data show that the EGMPA-RBF-Sigmoid model and the EGMPA-Poly-RBF model are closer to the true value than the other models, and the difference with the true value is smaller.
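The weighted combination of a local and a global kernel can be illustrated with a short sketch. It assumes a convex combination K = ω·K_RBF + (1 − ω)·K_Sigmoid and the common parameterizations exp(−g‖x − z‖²) and tanh(g·⟨x, z⟩ + c0); the paper’s own kernel definitions appear in an earlier section and may differ in detail. All function and parameter names below are illustrative.

```python
import math

def rbf_kernel(x, z, g):
    """Local kernel: exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, g, coef0=0.0):
    """Global kernel: tanh(g * <x, z> + coef0)."""
    return math.tanh(g * sum(a * b for a, b in zip(x, z)) + coef0)

def combined_gram(X, Z, g_rbf, g_sig, weight):
    """Gram matrix of weight * RBF + (1 - weight) * Sigmoid (assumed form)."""
    return [
        [weight * rbf_kernel(x, z, g_rbf)
         + (1.0 - weight) * sigmoid_kernel(x, z, g_sig)
         for z in Z]
        for x in X
    ]
```

The RBF term decays quickly with distance, so it mainly shapes the fit near each training point, while the Sigmoid term depends on the inner product and still responds to points far apart. A matrix built this way could be passed to an SVR implementation that accepts precomputed kernels, for example scikit-learn’s `SVR(kernel="precomputed")`.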

In summary, support vector regression optimized by EGMPA is better than the unoptimized model. The error of multi-kernel support vector regression is significantly smaller than that of single-kernel support vector regression. Among the multi-kernel support vector regression models, EGMPA-RBF-Sigmoid has the best learning ability and generalization ability.

Different model comparison

The proposed prediction model is compared with the BPNN, LSTM, RNN, and GRU models to highlight its advantages in predicting carbon dioxide emissions. Table 5 shows the error analysis of different models on the test set, and Fig. 8 shows the prediction results of different models on the test set.

Table 5 Error analysis results of different models

It can be seen from Table 5 and Fig. 8:

Fig. 8 Error analysis results of different models

The RMSE, MAE, and MAPE of the EGMPA-RBF-Sigmoid model are 37.43, 30.63, and 0.32, respectively, which are significantly lower than those of the other models. According to the evaluation criteria, the prediction accuracy of the models is ranked from high to low as EGMPA-RBF-Sigmoid > BPNN > RNN > LSTM > GRU. The predicted values of China’s carbon dioxide emissions show that those of EGMPA-RBF-Sigmoid are closest to the real values; the predictions of GRU and BPNN are significantly higher than the real values, while those of LSTM and RNN are lower than the real values.

In summary, the prediction accuracy of EGMPA-RBF-Sigmoid on carbon dioxide emissions in China is significantly higher than that of BPNN, RNN, LSTM, and GRU models, which further demonstrates the superiority of the proposed model.

China’s carbon dioxide emissions forecast during the “14th Five-Year Plan” period

In this paper, the EGMPA-RBF-Sigmoid model is used to forecast China’s carbon dioxide emissions during the “14th Five-Year Plan” period (2021–2025). According to the “Ten Year Outlook for China’s Energy Revolution (2021–2030),” the growth rate of primary energy demand is expected to decline steadily and slightly during the “14th Five-Year Plan” period, with an average annual growth rate of about 2.5%. In 2025, total energy consumption will exceed 5.5 billion tons of standard coal; with sustained effort, the total proportion of clean energy such as non-fossil energy and natural gas will exceed 30%; oil demand will be about 700 million tons, accounting for nearly 18%; and the proportion of coal will decrease to less than 50%. According to the above policies, the predicted values of the influencing factors of China’s carbon dioxide emissions during the “14th Five-Year Plan” period are determined, as shown in Table 6.

Table 6 Predicted values of influencing factors

According to the predicted values of the influencing factors, the EGMPA-RBF-Sigmoid model is used to predict China’s carbon dioxide emissions during the “14th Five-Year Plan” period. The results are shown in Fig. 9.

Fig. 9 Forecast results of China’s carbon dioxide emissions during the “14th Five-Year Plan” period

The predicted carbon dioxide emissions in 2021 are 9949.14 Mt, and emissions then increase year by year, reaching a predicted 10,258.95 Mt in 2025. Although China’s carbon dioxide emissions gradually increase, the size of the increase gradually decreases, indicating that China is getting closer to its goal of “Carbon Peak.” The above predicted results are consistent with the studies of Niu et al. (2020) and Qiao et al. (2019). Niu et al. predicted that China’s carbon dioxide emissions will continue to increase from 2020 to 2025 under the basic as usual, policy tightening, and market allocation scenarios. Qiao et al. predicted that China’s carbon dioxide emissions will reach 11,521.21 Mt in 2025.
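The pace of growth implied by the two endpoint forecasts can be checked with simple arithmetic. The compound-growth formula and the function name below are illustrative choices and are not part of the paper’s method; the two input values are the 2021 and 2025 forecasts reported above.

```python
def average_annual_growth(start_value, end_value, years):
    """Compound average annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Forecast values reported in this paper (Mt) for 2021 and 2025.
rate = average_annual_growth(9949.14, 10258.95, years=4)
print(f"Implied average annual growth 2021-2025: {rate:.2%}")
```

With these inputs the implied rate is below 1% per year, which is in line with the slowing growth described in the text.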

The “14th Five-Year Plan” is a critical period and a window period for the “Carbon Peak” target. To achieve the “Carbon Peak” by 2030 and “Carbon Neutrality” by 2060, the Chinese government has taken a series of measures in different fields, including actions for the green and low-carbon transformation of energy, industrial carbon peaking, green and low-carbon transportation, and green and low-carbon technological innovation. China is also continuing to promote the energy supply revolution, the energy technology revolution, and the energy system revolution, and is intensifying efforts to develop green energy.

Carbon dioxide emissions are closely related to economic development. Accurately predicting China’s carbon dioxide emissions can provide an important basis for future economic development. Green and circular development will be one of China’s important economic growth drivers in the future, as well as an important direction for developing emerging industries. At the same time, the accurate prediction of carbon dioxide emissions also has certain guiding significance for China’s future low-carbon emission reduction and economic sustainable development.

In summary, during the “14th Five-Year Plan” period, China’s carbon dioxide emissions will continue to grow, but the growth rate will slow down significantly, and China is approaching the goal of “Carbon Peak.”

Conclusions

In this paper, an MPA combining the elite opposition-based learning strategy and the golden sine algorithm (EGMPA) is proposed to optimize the parameters of the multi-kernel support vector regression model, which is then used to predict carbon dioxide emissions in China. The following conclusions can be drawn from the algorithm test and the comparison of carbon dioxide emissions prediction results in China:

  1. Compared with the other eight meta-heuristic algorithms, EGMPA, which combines the elite opposition-based learning strategy and the golden sine algorithm, has stronger global search ability, faster convergence speed, and higher accuracy.

  2. The prediction results of multi-kernel support vector regression models with different kernel functions show that the RMSE, MAE, and MAPE of single-kernel support vector regression after EGMPA optimization are significantly smaller than those of the unoptimized model, and that the error of the multi-kernel support vector regression model is significantly smaller than that of the single-kernel support vector regression model. This shows that the multi-kernel support vector regression model can obtain a better learning effect.

  3. Among the 10 support vector regression models with different kernels, the EGMPA-RBF-Sigmoid model performs best. This is because the RBF kernel is a classical local kernel, while the Sigmoid kernel is a global kernel; the weighted combination of the two kernels can simultaneously meet the requirements of strong learning ability and strong generalization ability and thus achieves better learning results.

  4. The EGMPA-RBF-Sigmoid model is superior to BPNN, LSTM, RNN, and GRU in RMSE, MAE, and MAPE, which further demonstrates the accuracy and effectiveness of the proposed model in carbon dioxide emissions prediction.

  5. The EGMPA-RBF-Sigmoid model, which has the highest prediction accuracy, is used to forecast China’s carbon dioxide emissions during the “14th Five-Year Plan” period. According to the prediction results, China’s carbon dioxide emissions will continue to increase during the “14th Five-Year Plan” period, but the growth rate will slow down significantly, indicating that China is gradually approaching the goal of “Carbon Peak.”

In the future, the carbon dioxide emission prediction model constructed in this paper can be used in more countries and provide a forecasting tool for national policy makers and relevant researchers. At the same time, more factors, such as economic factors, social factors, and energy prices, can be added in future studies to explore the dynamic changes of carbon dioxide emissions.