Introduction

Global warming is closely related to carbon emissions, and the increase in emissions originates from rapid economic development and industrial advancement (Wu et al. 2021; Nazifi and Milunovich 2010). Limiting global warming to 1.5 °C is the most ambitious goal of the Paris Agreement signed in 2015; however, this goal is almost unattainable at the current emission level. According to the International Energy Agency (IEA), global energy-related greenhouse gas emissions reached 41.3 billion tons in 2022, while China’s carbon emissions rose from 7.71 billion tons to 11.477 billion tons between 2009 and 2022. As the global climate issue becomes increasingly severe, China, the world’s largest energy consumer and greenhouse gas emitter, faces severe environmental pressure. To relieve the environmental constraints on high-quality development, the Chinese government officially announced the “carbon peak, carbon neutrality” strategy in September 2020, and the construction of a carbon market has become a concrete measure to reduce carbon emissions. The carbon market plays a crucial role in helping China reduce industrial pollutant emissions (Byun and Cho 2013; Zhang et al. 2019), and the higher the market efficiency, the more pronounced the emission-reduction performance (Chevallier 2009; Schneider et al. 2019). Therefore, studying the price formation and driving mechanism of the carbon market and forecasting the carbon price accurately are keys to achieving the goals of the proposed strategy. However, mainstream carbon price forecasting models based on empirical mode decomposition (EMD) usually suffer from large decomposition errors and subjective identification of signal complexity, which may result in low forecasting accuracy. This paper develops a new secondary decomposition carbon price forecasting model that integrates fuzzy dispersion entropy and the deep learning paradigm; notably, we design an error-correction step to strengthen the deep learning forecasting ability, a step commonly ignored in previous studies. The expected forecasting accuracy can support valuable market decisions and contribute to emission reduction at low cost.

The rest of this paper is organized as follows: the “Literature review” section reviews the relevant literature. The “Methodology” section introduces the basic models and describes the logical framework of the proposed model. The “Empirical discussion” section discusses the out-of-sample forecasting results and tests robustness over different forecasting periods. The “Conclusions” section concludes.

Literature review

Most studies have found that the carbon price exhibits nonlinear, non-stationary, non-normal, and multi-scale characteristics (Zhu et al. 2015; Pan et al. 2023; Zhang et al. 2023; Yue et al. 2023). Unlike volatility modeling techniques, EMD-based technology has become the main method for carbon price forecasting in recent years (Tang et al. 2017; Mao and Zeng 2023). Based on the decomposition differences, the existing literature can be divided into primary decomposition and secondary decomposition carbon price forecasting studies.

Primary decomposition carbon price forecasting studies

Several studies conducted EMD to decompose the carbon price and used the least squares support vector machine optimized by particle swarm optimization (PSO-LSSVM) and the generalized autoregressive conditional heteroskedasticity (GARCH) model for out-of-sample forecasting; the results show that the EMD-PSO-LSSVM and EMD-GARCH models achieve high forecasting accuracy for European carbon futures prices (Jianwei et al. 2021; Zhu et al. 2018; Zhang and Wu 2022). The ensemble empirical mode decomposition (EEMD) and complete EEMD (CEEMD) were developed to primarily decompose the carbon price; the autoregressive integrated moving average (ARIMA), local polynomial prediction (LPP), and particle swarm optimized gray neural network (PSO-GNN) were employed for forecasting, and the EEMD-ARIMA-LPP, EEMD-LPP, and CEEMD-PSO-GNN models were found to be efficient carbon price forecasting tools (Qin et al. 2020; Zhang et al. 2018). Furthermore, the modified EEMD (MEEMD) and variational mode decomposition (VMD) technologies can also effectively reduce decomposition errors and improve decomposition efficiency (Yang et al. 2020; Guo et al. 2022). The mode reconstruction (MR) method was introduced to identify the contribution of each component, and long short-term memory (LSTM) and GARCH models were used for forecasting; the conclusions suggest that the VMD-MR-LSSVM model has the highest forecasting accuracy in China’s Hubei and Shenzhen carbon markets, while the VMD-LSTM-GARCH model has higher forecasting accuracy for the European carbon price (Zhu et al. 2019; Huang et al. 2021). Other work adopted multi-resolution singular value decomposition (MRSVD) to decompose the original carbon price and utilized the extreme learning machine (ELM) optimized by the adaptive whale optimization algorithm (AWOA) to forecast the carbon price; the findings maintain that the forecasting accuracy of the MRSVD-AWOA-ELM model in the Chinese and European carbon markets is better than that of comparative models such as EMD-AWOA-ELM and EMD-AWOA-BP (Sun and Zhang 2018). The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), improved CEEMDAN (ICEEMDAN), and sample entropy (SE) were used to decompose the carbon price and extract its complex signals; the studies suggest that the CEEMDAN-SE-LSTM and ICEEMDAN-type models have stronger robustness and generalization ability for short-term carbon price forecasting in China (Wang et al. 2021; Yun et al. 2023; Hao et al. 2020).

Secondary decomposition carbon price forecasting studies

Although the EMD-type models can decompose the original carbon price into multi-scale time-frequency signals, they may in theory produce larger decomposition errors and mode mixing problems (Nguyen and Phan 2022; Junior et al. 2020), whereas secondary decomposition technology can effectively reduce decomposition errors by recognizing high-complexity signals to some extent (Kong et al. 2022).

For example, some studies employed the EMD, EEMD, CEEMD, CEEMDAN, and VMD technologies for primary and secondary decomposition of the carbon price; classified high-complexity signals by the sample entropy (SE), multiscale fuzzy entropy (MFE), and partial autocorrelation function (PACF) (Li et al. 2021; Yang et al. 2023a, 2023b); and further built deep learning models such as PSO-LSSVM, backpropagation network (BP), gated recurrent unit (GRU), LSTM, and ELM for out-of-sample forecasting (Li et al. 2021; Wang et al. 2022). The results confirm that the secondary decomposition EMD-VMD-BP and EMD-VMD-LSTM models have stable forecasting performance in the Beijing and Shanghai carbon markets (Sun and Huang 2020), while the VMD-EEMD-GRU, CEEMD-VMD-BP, and CEEMDAN-VMD-LSTM models have superior forecasting accuracy in the Hubei and Guangdong carbon markets (Wu and Liu 2020; Zhou and Wang 2021; Liu et al. 2023). Theoretically, ICEEMDAN technology has obvious advantages over other EMD-type models (Liu et al. 2023). Research applying the ICEEMDAN and CEEMD technologies for primary and secondary decomposition of the carbon price and adopting the support vector machine (SVM) and multilayer perceptron (MLP) for price forecasting found that the ICEEMDAN-CEEMD-SVM and ICEEMDAN-CEEMD-MLP models are significantly superior to other primary decomposition models for forecasting China’s carbon price (Li and Liu 2023; Yang et al. 2023a; Yang et al. 2023b).

Research gaps and contributions

In summary, the carbon price forecasting models in previous studies have developed from primary decomposition to secondary decomposition. However, several aspects still deserve further study. (1) The EMD-type technologies have theoretical defects of large decomposition errors, mode mixing, endpoint effects, and mode alignment problems, while the developed CEEMDAN technology may generate spurious component signals during the initial stage of mode decomposition. (2) Complexity recognition methods such as the sample entropy and fuzzy entropy algorithms may suffer from discontinuous entropy values, sensitivity to data length, and high sensitivity to noise interference; a new signal complexity identification method therefore needs to be developed. (3) Existing studies ignore the forecasting performance of the proposed models under different forecasting periods, so the reliability of their conclusions needs to be strengthened. (4) To demonstrate forecasting ability, previous studies mainly focus on comparing the gap between the predicted price and the real price, while the role of the predicted error term in improving the final forecasting results has been ignored in most previous studies.

Based on these special carbon price characteristics, this article proposes a new logic of “primary decomposition - complexity recognition - secondary decomposition - forecasting and summing - error correction” to guide the construction of a hybrid model for forecasting China’s carbon price. The contribution is the design of a new error-corrected secondary decomposition hybrid model integrating fuzzy dispersion entropy and the deep learning paradigm, named the ICEEMDAN-FDE-VMD-PSO-LSTM-EC model. Specifically, first, the ICEEMDAN technology is used for primary decomposition of the original carbon price to address the common problems of large decomposition errors and mode mixing in traditional EMD-type technologies. Second, the fuzzy dispersion entropy (FDE) algorithm is used to identify high-complexity signals, overcoming the discontinuous entropy and noise sensitivity of other entropy algorithms. Third, the VMD technology is used for secondary decomposition of the high-complexity signals recognized by the FDE algorithm. Fourth, the deep learning LSTM model, which has strong time series fitting ability, is used to forecast the obtained low-complexity signals, with the particle swarm optimization (PSO) algorithm optimizing the model parameters to improve forecasting ability; the predicted price is then obtained by summing the predicted components. Finally, the error-correction (EC) method is used to forecast the errors of each model, and the final error-corrected predicted price is calculated by adding the predicted errors to the predicted price.

Methodology

ICEEMDAN model

The improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) model is an improved adaptive decomposition form based on the traditional EMD technology (Colominas et al. 2014). The classical EMD and EEMD technologies may cause larger reconstruction errors and mode mixing (Wu and Huang 2009). In contrast, the core of the CEEMDAN model is to add positive and negative pairs of white noise to the residual of the decomposition process and calculate the average mode information after obtaining the first-order intrinsic mode function (IMF) components (Zhou et al. 2022). This special way of adding noise effectively prevents white noise from transferring from high frequency to low frequency, so the reconstruction errors can be greatly reduced. Different from the CEEMDAN model, the significant feature of the ICEEMDAN model is that it adds EMD modes of white noise to the decomposition process and conducts adaptive empirical mode decomposition (AEMD) to decompose the multi-scale signals. The decomposition of the carbon price by ICEEMDAN proceeds as follows:

Step 1: define the operator Ek(⋅) as producing the k-th mode component, let M(⋅) denote the local mean of the signal to be decomposed, and let 〈⋅〉 denote the ensemble-averaging process. Then, add Gaussian white noise to the original carbon price signal x:

$${x}_1^{(i)}=x+{\varepsilon}_0{E}_1\left({w}^{(i)}\right)$$
(1)

Among them, w(i) denotes the i-th added white noise and ε0 is the noise standard deviation when the first signal is decomposed. E1(⋅) represents the first IMF calculation process. The residual of the first decomposition is then expressed as r1 = 〈M(x1(i))〉.

Step 2: calculate the first mode component IMF1:

$$\textrm{IMF}1=x-{r}_1$$
(2)

Step 3: calculate the second mode component IMF2:

$$\textrm{IMF}2={r}_1-{r}_2$$
(3)

Step 4: calculate the k-th mode component IMFk:

$$\textrm{IMF}k={r}_{k-1}-{r}_k$$
(4)

Step 5: iteration termination condition:

$$\sigma =\frac{{\left\Vert \textrm{IMF}k-\textrm{IMF}\left(k-1\right)\right\Vert}_2}{{\left\Vert \textrm{IMF}k\right\Vert}_2}$$
(5)

where r2 = 〈M(r1 + α1E2(w(i)))〉, rk = 〈M(rk − 1 + αk − 1Ek(w(i)))〉, k = 2, 3, ⋯, N, and σ denotes the standard deviation between two successive IMF components. When k ≥ 2, the first mode decomposition is completed, and decomposition continues until the iteration termination condition is met. This paper adopts the Cauchy criterion as the iterative convergence condition, that is, the iteration terminates when the standard deviation σ < 0.2. The idea of the ICEEMDAN decomposition process is shown in Fig. 1.

Fig. 1
figure 1

The mode decomposition framework of the ICEEMDAN model
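As a compact illustration of the recursion in Eqs. (1)-(5), the following Python sketch implements a simplified ICEEMDAN loop, using PyEMD’s EMD routine as the mode operator Ek(⋅) and the signal minus its first mode as the local mean M(⋅). The noise-scaling scheme, the package availability (PyEMD, i.e., the EMD-signal package), and the default settings are assumptions of this sketch rather than the exact implementation used in this paper.

```python
import numpy as np
from PyEMD import EMD  # pip install EMD-signal; availability is assumed


def _kth_mode(sig, k):
    """k-th EMD mode Ek(.) of a signal (zeros if fewer modes exist)."""
    imfs = EMD().emd(np.asarray(sig, dtype=float))
    return imfs[k - 1] if len(imfs) >= k else np.zeros_like(sig)


def _local_mean(sig):
    """Local mean operator M(.): the signal minus its first EMD mode."""
    return sig - _kth_mode(sig, 1)


def iceemdan(x, n_real=100, eps0=0.3, max_modes=9, seed=0):
    """Simplified ICEEMDAN sketch following Eqs. (1)-(5)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noises = [rng.standard_normal(len(x)) for _ in range(n_real)]
    # Step 1: r1 = < M(x + eps0 * E1(w^(i))) >
    r_prev = np.mean(
        [_local_mean(x + eps0 * np.std(x) * _kth_mode(w, 1)) for w in noises], axis=0)
    imfs = [x - r_prev]                           # IMF1 = x - r1   (Eq. (2))
    for k in range(2, max_modes + 1):
        r_k = np.mean(
            [_local_mean(r_prev + eps0 * np.std(r_prev) * _kth_mode(w, k))
             for w in noises], axis=0)
        imfs.append(r_prev - r_k)                 # IMFk = r_{k-1} - r_k   (Eq. (4))
        r_prev = r_k
    return np.array(imfs), r_prev                 # IMF components and final residual
```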

Fuzzy dispersion entropy

Fuzzy dispersion entropy (FDE) is a reliable indicator to measure the time series complexity by calculating the probability of new signal generation (Rostaghi et al. 2021). The FDE algorithm overcomes the defects of discontinuous entropy data, sensitive data length that commonly exists in the calculation of sample entropy, fuzzy entropy, and other methods. For a given carbon price series x = {x1, x2, ⋯, xN}, the FDE can be calculated as follows:

Firstly, map the carbon price series xj(j = 1, 2, ⋯, N) to yj(j = 1, 2, ⋯, N) by the normal cumulative distribution function \({y}_j=\frac{1}{\sigma \sqrt{2\pi }}{\int}_{-\infty}^{x_j}{e}^{\frac{-{\left(t-\mu \right)}^2}{2{\sigma}^2}} dt\).

Secondly, map yj to the interval of the c categories by the linear transformation \({z}_j^c=c\times {y}_j+0.5\), where c represents the number of categories.

Thirdly, determine the dispersion pattern \({\pi}_{v_0{v}_1\cdots {v}_{m-1}}\left(v=1,2,\cdots, c\right)\) of each embedding vector \({u}_i^{m,c}=\left\{{u}_i^c,{u}_{i+d}^c,\cdots, {u}_{i+\left(m-1\right)d}^c\right\}\), where \({v}_0={u}_i^c\), \({v}_1={u}_{i+d}^c\), ⋯, \({v}_{m-1}={u}_{i+\left(m-1\right)d}^c\), m represents the embedding dimension, d is the time delay, i = 1, 2, ⋯, N − (m − 1)d, and \({u}_i^c=\textrm{round}\left(c\times {y}_i+0.5\right)\), with round(⋅) denoting the rounding function. Each dispersion pattern consists of m values, each taking one of c categories, so there are \({c}^m\) possible patterns.

Fourthly, calculate the probability \(p\left({\pi}_{v_0{v}_1\cdots {v}_{m-1}}\right)\) of each dispersion pattern.

Finally, based on the Shannon entropy definition, the FDE is defined as

$$\textrm{FDE}=\textrm{Fuzzy}\_ DismEN\left(x,m,c,d\right)=-\sum \limits_{\pi =1}^{c^m}p\left({\pi}_{v_0{v}_1\cdots {v}_{m-1}}\right)\times \ln \left(p\left({\pi}_{v_0{v}_1\cdots {v}_{m-1}}\right)\right)$$
(6)

Theoretically, if the entropy value is greater than 1, the input signal has strong complexity and relatively large noise, and a secondary decomposition process is needed to reduce its complexity (Wang et al. 2022).
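To make the calculation concrete, the following Python sketch implements the dispersion-entropy core of the procedure above (normal-CDF mapping, rounding into c classes, pattern counting, and the Shannon entropy of Eq. (6)). The fuzzy membership weighting of the full FDE algorithm is omitted for brevity, so the function is illustrative rather than a faithful FDE implementation.

```python
import numpy as np
from scipy.stats import norm


def dispersion_entropy(x, m=3, c=6, d=1):
    """Dispersion entropy of series x with embedding m, classes c, delay d.
    Simplified sketch of the steps in the text; fuzzy weighting omitted."""
    x = np.asarray(x, dtype=float)
    # Step 1: map the series to (0, 1) with the normal CDF fitted to the data
    y = norm.cdf(x, loc=x.mean(), scale=x.std())
    # Step 2: round into c integer classes 1..c
    z = np.clip(np.round(c * y + 0.5).astype(int), 1, c)
    # Step 3: count dispersion patterns of length m with delay d
    n_patterns = len(z) - (m - 1) * d
    counts = {}
    for i in range(n_patterns):
        pattern = tuple(z[i + j * d] for j in range(m))
        counts[pattern] = counts.get(pattern, 0) + 1
    # Steps 4-5: pattern probabilities and Shannon entropy (Eq. (6))
    p = np.array(list(counts.values())) / n_patterns
    return float(-np.sum(p * np.log(p)))
```

In the workflow of this paper, a component whose entropy exceeds the threshold of 1 would be flagged for secondary decomposition.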

VMD model

The typical advantages of VMD are that it determines the mode number of the original carbon price signals flexibly and effectively avoids the endpoint effect and mode mixing problems of traditional recursive algorithms (Dragomiretskiy and Zosso 2013). The VMD model is therefore theoretically suitable for dealing with complex, nonlinear, and non-stationary time series. Assuming the carbon price is composed of signals with specific center frequencies and finite bandwidths, the VMD model applies the Wiener filter method to adaptively search for the best center frequency and bandwidth during the variational search and solution process. The decomposition steps of VMD are as follows:

Firstly, decompose the carbon price signal S into K IMF components and ensure the decomposed signals are mode components with a specific center frequency and limited bandwidth; the constrained variational expression is defined as:

$$\left\{\begin{array}{l}\underset{\left\{{\mathrm{IMF}}_k\right\},\left\{{w}_k\right\}}{\min }{\left\{\sum \limits_{k=1}^K\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\times {\mathrm{IMF}}_k(t)\right]{e}^{-j{w}_kt}\right\Vert \right\}}_2^2\\ {}s.t.\kern0.5em \sum \limits_{k=1}^K{\mathrm{IMF}}_k=S\end{array}\right.$$
(7)

where {IMFk} = {IMF1, IMF2, ⋯, IMFK} denotes the obtained carbon price multi-scale signals, {wk} = {w1, w2, ⋯, wK} denotes the center frequencies of the mode signals, and δ(t) is the unit impulse signal.

Secondly, introduce the quadratic penalty coefficient α and the augmented Lagrangian function L to solve the above constrained variational problem:

$${\displaystyle \begin{array}{l}L\left[\left\{{\mathrm{IMF}}_k(t)\right\},\left\{{w}_k(t)\right\},\tau (t)\right]=\alpha \sum \limits_{k=1}^K{\left\Vert {\partial}_t\left[\left(\delta (t)+j/\pi t\right)\times {\mathrm{IMF}}_k(t)\right]{e}^{-j{w}_kt}\right\Vert}_2^2\\ {}\kern2em +{\left\Vert S(t)-\sum \limits_{k=1}^K{\mathrm{IMF}}_k(t)\right\Vert}_2^2+\left\langle \tau (t),S(t)-\sum \limits_{k=1}^K{\mathrm{IMF}}_k(t)\right\rangle \end{array}}$$
(8)

where τ is the Lagrange multiplier.

Finally, update each mode component and its center frequency and calculate the final optimal solution by the following formula:

$${\widehat{\mathrm{IMF}}}_k^{n+1}(w)=\frac{\hat{S}(w)-\sum \limits_{i\ne k}{\hat{v}}_i(w)+{\hat{\tau}}_i(w)/2}{1+2\alpha {\left(w-{w}_k\right)}^2}$$
(9)

In the above formula, w represents the frequency, and \({\widehat{\mathrm{IMF}}}_k^{n+1}(w)\), \({\hat{v}}_i(w)\), and \({\hat{\tau}}_i(w)\) are the Fourier transforms of \({\mathrm{IMF}}_k^{n+1}(t)\), vi(t), and τi(t), respectively. \({\widehat{\mathrm{IMF}}}_k^{n+1}(w)\) corresponds to the residual \(\hat{S}(w)-\sum \limits_{i\ne k}{\hat{v}}_i(w)\) after Wiener filtering.

According to the obtained IMF components, the center frequency w of the current mode can be updated by the following transformation:

$${w}_k^{n+1}=\frac{\int_0^{\infty }w{\left|{\mathrm{IMF}}_k^{n+1}(w)\right|}^2 dw}{\int_0^{\infty }{\left|{\mathrm{IMF}}_k^{n+1}(w)\right|}^2 dw}$$
(10)
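In practice, the secondary decomposition can be carried out with an off-the-shelf VMD routine. The snippet below assumes the third-party vmdpy package and illustrative parameter values; only K = 9 mirrors the number of modes used for the secondary decomposition in the empirical section, and the input series is a hypothetical stand-in for the high-complexity IMF1.

```python
import numpy as np
from vmdpy import VMD  # pip install vmdpy; availability and interface are assumed

# Hypothetical high-complexity component standing in for IMF1 from ICEEMDAN
imf1 = np.sin(2 * np.pi * 0.05 * np.arange(500)) + 0.3 * np.random.randn(500)

alpha = 2000      # bandwidth penalty (the quadratic penalty term in Eq. (8))
tau = 0.0         # noise tolerance of the dual ascent
K = 9             # number of modes, matching the setting used in this paper
DC = 0            # do not impose a DC mode
init = 1          # initialize center frequencies uniformly
tol = 1e-7        # convergence tolerance

u, u_hat, omega = VMD(imf1, alpha, tau, K, DC, init, tol)
# u: the K decomposed modes; omega: their estimated center frequencies (Eq. (10))
```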

PSO algorithm

Particle swarm optimization (PSO) is an improved algorithm with low parameter dependence for searching global solutions (Wang et al. 2018). The hidden layers, neurons, and learning rate of a neural network are important factors affecting training performance, so we use the PSO algorithm to optimize these parameters. In this framework, each particle has only two attributes, movement speed and position, and each particle conducts a separate optimization movement to search for local optima. During this process, individual extreme values are shared with the other particles to determine the global optimum, and the other particles adjust their search speed and position accordingly. The optimization process of the PSO algorithm is as follows:

$${v}_{i,t+1}=w\times {v}_{i,t}+{c}_1\times \mathit{\operatorname{rand}}\left(\right)\times \left({pbest}_i-{x}_{i,t}\right)+{c}_2\times \mathit{\operatorname{rand}}\left(\right)\times \left({gbest}_i-{x}_{i,t}\right)$$
(11)
$${x}_{i,t+1}={x}_{i,t}+\lambda {v}_{i,t+1}$$
(12)

Among them, vi, t indicates the current movement speed, rand() represents a random number between 0 and 1, pbesti and gbesti are the individual and global optimal solutions, respectively, w represents the inertia coefficient, c1 and c2 are the learning factors, vi, t + 1 denotes the updated speed at iteration t + 1, and λ is the speed coefficient.
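For illustration, a minimal PSO implementation of Eqs. (11) and (12) is sketched below; the quadratic objective and the parameter bounds are placeholders, and in this paper the decision variables are the hidden layers, neurons, and learning rate of the LSTM.

```python
import numpy as np


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.8, c1=2.0, c2=2.0, seed=0):
    """Minimal particle swarm optimization following Eqs. (11)-(12)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                                # particle speeds
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (11)
        x = np.clip(x + v, lo, hi)                                   # Eq. (12), lambda = 1
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())


# Stand-in usage: minimize a simple quadratic in three dimensions
best, best_val = pso(lambda p: float(np.sum((p - 3.0) ** 2)), bounds=[(-10, 10)] * 3)
```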

PSO-LSTM forecasting model

Based on the research design, the deep learning LSTM model is used for nonlinear fitting of each low-complexity component after the VMD process. One reason for choosing the LSTM model is that it can effectively filter, screen, and update input information through a specially designed gate structure and retain the memory characteristics of the sample series (Hochreiter and Schmidhuber 1997). The training process of the LSTM is as follows:

The function of the forget gate is to screen each carbon price mode component to obtain the filtered output.

$${f}_t=\sigma \left({W}_f\times \left[{h}_{t-1},{x}_t\right]+{b}_f\right)$$
(13)

The input gate updates the forgetting output, so as to calculate the updated characteristics of each mode component by the following function:

$${i}_t=\sigma \left({W}_i\times \left[{h}_{t-1},{x}_t\right]+{b}_i\right)$$
(14)
$$\tilde{C_t}=\mathit{\tanh}\left({W}_C\times \left[{h}_{t-1},{x}_t\right]+{b}_C\right)$$
(15)
$${C}_t={f}_t\times {C}_{t-1}+{i}_t\times \tilde{C_t}$$
(16)

The output gate determines the mode component characteristics that need to be memorized. Thus, the output of this network can be acquired by the activation function.

$${o}_t=\sigma \left({W}_o\times \left[{h}_{t-1},{x}_t\right]+{b}_o\right)$$
(17)
$${h}_t={o}_t\times \mathit{\tanh}\left({C}_t\right)$$
(18)

Among them, ft represents the output screened by the forget gate, it and \(\tilde{C_t}\) are the update features of the input gate and the candidate cell vector, Ct indicates the updated cell state, ot is the output information of the cell, ht is the network output, W and b represent the weights and biases, and σ is the sigmoid activation function.

This article takes the hidden layers, neurons, and learning rate of the LSTM model as the optimization variables for PSO particles. By updating the speed and position of the particles, the fitness value of carbon price forecasting is minimized, and the optimal model parameters can be obtained. The forecasting steps of the PSO-LSTM are as follows:

Firstly, initialize the particle swarm: set an initial position and speed for each particle (i.e., the hidden layers, neurons, and learning rate of the LSTM model). These initial values are usually randomly generated, and their range is determined by the constraint conditions. Secondly, fitness assessment: for each particle, construct an LSTM model with the current parameters and use it to forecast; the forecasting errors are taken as the fitness value of the particle. Thirdly, update particle position and speed based on the individual and global optimal positions. Fourthly, iterative optimization: repeat the above steps until the stopping condition is met; in each iteration, the positions and speeds of the particles are updated and the fitness values are re-evaluated. Finally, model forecasting: after the parameter training steps above, use the obtained global optimal parameters to construct the final LSTM model and forecast the test data.
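As an illustration of the fitness assessment described above, the following sketch builds a two-layer Keras LSTM from a candidate parameter vector (the neurons of the two hidden layers and the learning rate) and returns its validation RMSE as the fitness value. The lag length, epoch count, and other training settings are assumptions of this sketch rather than the exact configuration used in this paper.

```python
import numpy as np
import tensorflow as tf


def make_supervised(series, lag=5):
    """Turn a 1-D series into (samples, lag, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X[..., None], y


def lstm_fitness(params, train, val, lag=5):
    """Fitness used by PSO: validation RMSE of an LSTM built from params.
    params = (neurons of layer 1, neurons of layer 2, learning rate)."""
    n1, n2, lr = int(params[0]), int(params[1]), float(params[2])
    Xtr, ytr = make_supervised(train, lag)
    Xva, yva = make_supervised(val, lag)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(lag, 1)),
        tf.keras.layers.LSTM(n1, return_sequences=True),
        tf.keras.layers.LSTM(n2),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    model.fit(Xtr, ytr, epochs=30, batch_size=32, verbose=0)
    pred = model.predict(Xva, verbose=0).ravel()
    return float(np.sqrt(np.mean((yva - pred) ** 2)))

# The PSO routine sketched earlier could then search bounds such as
# [(2, 256), (2, 256), (0.001, 0.3)] to minimize lstm_fitness.
```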

The proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model

To accommodate the nonlinear, non-stationary, and non-normal characteristics of the carbon price, this paper designs an error-corrected secondary decomposition hybrid model integrating fuzzy dispersion entropy and the deep learning paradigm, that is, integrating the advantages of the ICEEMDAN, FDE, VMD, PSO-LSTM, and EC methods to improve the forecasting accuracy of carbon prices. The idea of the proposed model is shown in Fig. 2.

Fig. 2
figure 2

The logical framework of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model

Step 1: adopt the ICEEMDAN technology to decompose the original carbon price signal, obtaining several IMF signals and a residual term. Step 2: employ the FDE algorithm to calculate the signal complexity of the IMFs acquired by the ICEEMDAN technology. Step 3: conduct the VMD technology to perform secondary decomposition on the high-complexity signals identified in step 2, reducing signal noise and decomposition errors. Step 4: apply the deep learning paradigm of PSO-LSTM to perform one-step-ahead forecasting on the low-complexity signals; the predicted price is obtained by summing the predicted values of each IMF component. Step 5: re-modify the final predicted results with the error-correction method. Specifically, (1) subtract the predicted price obtained in step 4 from the real carbon price to acquire the real errors, (2) use the PSO-LSTM model to forecast the errors to obtain the predicted errors, and (3) calculate the final error-corrected predicted price by summing the predicted errors and the predicted price.
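Step 5 can be summarized schematically as follows; the persistence forecaster used here is only a stand-in for the PSO-LSTM error forecaster of this paper, and the numbers in the usage line are hypothetical.

```python
import numpy as np


def error_corrected(real, predicted, forecast_one_step):
    """Step 5 sketch: one-step-ahead error correction.
    At each point t, the error series observed up to t-1 is forecast one
    step ahead and added to the price forecast for t."""
    real, predicted = np.asarray(real, float), np.asarray(predicted, float)
    errors = real - predicted                            # (1) real errors
    corrected = predicted.copy()
    for t in range(1, len(errors)):
        corrected[t] += forecast_one_step(errors[:t])    # (2) + (3)
    return corrected


# Minimal usage with a persistence error forecaster standing in for PSO-LSTM
final = error_corrected([30.1, 30.4, 30.2], [29.8, 30.0, 30.5],
                        lambda history: history[-1])
```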

To evaluate the carbon price forecasting performance of the proposed model, this paper also constructs hybrid and single comparative models based on the decomposition technologies CEEMDAN, EEMD, and EMD and machine learning models such as GRU and BP.

Evaluation criteria

The root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Pearson correlation (Corr) are used as criteria for evaluating the forecasting performance of the proposed model and its comparative models. The smaller the error indicators, the better the model. Corr reflects the correlation between the final predicted price and the real price: the greater the correlation, the higher the forecasting accuracy is generally recognized to be; otherwise, the forecasting ability is poor.

$$RMSE=\sqrt{\frac{\sum_{i=1}^N{\left({y}_i-{\hat{y}}_i\right)}^2}{N}}$$
(19)
$$MAE=\frac{1}{N}{\sum}_{i=1}^N\left|{y}_i-{\hat{y}}_i\right|$$
(20)
$$MAPE=\frac{1}{N}{\sum}_{i=1}^N\left|\frac{y_i-{\hat{y}}_i}{y_i}\right|$$
(21)
$${Corr}_{y,\hat{y}}=\frac{\operatorname{cov}\left(y,\hat{y}\right)}{\sigma_y{\sigma}_{\hat{y}}}=\frac{E\left[\left(y-{\mu}_y\right)\left(\hat{y}-{\mu}_{\hat{y}}\right)\right]}{\sigma_y{\sigma}_{\hat{y}}}$$
(22)

Among them, yi and \({\hat{y}}_i\) are the real carbon price and the predicted price, respectively, σy and \({\sigma}_{\hat{y}}\) represent the standard deviations, μy and \({\mu}_{\hat{y}}\) indicate the mean values, and N represents the sample size.
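For completeness, the four criteria of Eqs. (19)-(22) translate directly into a short numpy function; this is a minimal sketch and the variable names are illustrative.

```python
import numpy as np


def evaluate(y, y_hat):
    """RMSE, MAE, MAPE, and Pearson correlation of Eqs. (19)-(22)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    mae = np.mean(np.abs(y - y_hat))
    mape = np.mean(np.abs((y - y_hat) / y))
    corr = np.corrcoef(y, y_hat)[0, 1]
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "Corr": corr}

# Example usage: evaluate(real_test_price, predicted_price)
```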

Empirical discussion

Data and basic statistics

Since October 2011, the Chinese government has carried out carbon trading pilots in Beijing, Tianjin, Shanghai, Chongqing, Hubei, Guangdong, Shenzhen, and Fujian. Due to the large differences in economic development, energy consumption, and market regulation among the pilot regions, the trading volume and trading price of each market differ correspondingly. In particular, the Hubei carbon market has become the most representative and active trading pilot. For example, at the end of 2022, the Hubei carbon market had a cumulative trading volume of 375 million tons, accounting for 44.6% of China’s total. Furthermore, with the official establishment of China’s unified carbon market in 2021, the Hubei carbon market plays a guiding role in the market registration and trading settlement of the proposed market (Zhou and Li 2019; Zhang et al. 2020). Therefore, we select the Hubei carbon market as the research object.

The data are sourced from the Hubei carbon emissions trading center (https://www.hbets.cn); the sample is the daily transaction price from April 28, 2014, to May 31, 2023, totaling 2174 observations. Additionally, after the decomposition of the original price by the EMD-type technologies, the first 80% of the obtained components are used for the parameter optimization of the neural network, and the last 20% are used for one-step forecasting.
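The splitting scheme above can be restated in a few lines; the helper name below is hypothetical and the series is a random stand-in for a decomposed component.

```python
import numpy as np


def split_train_test(component, train_ratio=0.8):
    """Split a decomposed component: the first 80% for network parameter
    optimization and the last 20% for one-step out-of-sample forecasting."""
    component = np.asarray(component, dtype=float)
    n_train = int(train_ratio * len(component))
    return component[:n_train], component[n_train:]


# Example with a stand-in series of 2174 observations
train, test = split_train_test(np.random.randn(2174))
```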

According to Fig. 3, the carbon price volatility is high, the nonlinear characteristics are obvious, and its histogram and probability density do not conform to the normal distribution. Specifically, firstly, the average carbon price is 28.14, and the standard deviation and skewness are 10.76 and 0.624 according to Table 1. The positive skewness indicates a right-skewed distribution with outliers in the right tail (Hubert and Vandervieren 2008), and its density function differs from the normal distribution. Secondly, the ADF statistic is 1.714, which is not statistically significant, meaning that the unit-root null hypothesis cannot be rejected and the carbon price is non-stationary. Thirdly, the BDS statistics are significant at the 1% level, suggesting that the carbon price has nonlinear characteristics. Therefore, the original carbon price signal has non-normal, non-stationary, and nonlinear characteristics, which make it suitable for multi-scale decomposition by mode decomposition technology.

Fig. 3
figure 3

The carbon price and its probability density

Table 1 Descriptive statistics (2014.4.28–2023.5.31)

Decomposition of the carbon price signal

Primary decomposition by the ICEEMDAN technology

During the carbon price decomposition process of ICEEMDAN, the ratio of noise deviation to decomposed signal deviation is set to 0.3, the number of noise addition is set to 100, and the maximum iteration is set to 1000.

The results show that 9 IMF components and 1 residual term are obtained, as depicted in Fig. 4, and the signal frequency, average period, and time-frequency characteristics of the obtained IMF signals differ significantly. Specifically, the variance contribution, average period, and correlation of IMF1 are all the smallest according to Table 2, with values of 0.002, 2.812, and 0.059, respectively. These findings mean that the IMF1 signal has a higher fluctuation frequency and a shorter fluctuation period and may hide more decomposition noise. In contrast, the variance contribution of the residual is the highest at 0.796, with a sample period of 2174 days and the highest correlation at 0.873, which indicates that the decomposition noise of the residual term is low; as a result, we regard it as a long-term trend with low complexity. Furthermore, for the other mode signals from IMF1 to IMF9, the variance contribution, average period, and correlation gradually increase, which indicates a decrease in signal noise and an improved ability to explain the carbon price.

Fig. 4
figure 4

The primary decomposition of carbon price based on the ICEEMDAN model

Table 2 The mode statistic of carbon price signal after the ICEEMDAN primary decomposition

Additionally, as a comparison, this article also employs CEEMDAN, EEMD, and EMD to decompose the original carbon price (as shown in Appendix 1, Figs. 13, 14, and 15). The variance contribution, average period, and correlation of the obtained IMF signals also gradually increase, which means that the volatility of the signal components gradually decreases (as shown in Appendix 2, Tables 9, 10, and 11). These findings are consistent with the decomposition results of the ICEEMDAN technology mentioned above.

Secondary decomposition by the ICEEMDAN-FDE-VMD technology

The obvious advantage of FDE is its focus on the threshold determination and mapping probability of complex fuzzy signals (Rostaghi et al. 2021). This article uses FDE to measure the entropy values of the obtained IMF signals. The results (last column of Table 2) show that only the entropy value of IMF1 is greater than 1, at 1.486, while the entropy of the other IMF signals is less than 1. Therefore, we conclude that the IMF1 series hides more chaotic and complex information than the other IMF signals.

Based on this, we further adopt the VMD model for secondary decomposition of IMF1, empirically setting the number of decomposed signals to 9; as a result, 8 mode components and 1 residual term are obtained (as shown in Fig. 5). Although the number of mode components of the ICEEMDAN model increases after secondary decomposition (as shown in Fig. 6), the FDE value of the high-complexity IMF1 is greatly reduced. The entropy values of the secondary decomposition signals are all less than 1, and the overall sample complexity is reduced greatly. For example, according to the results in Table 3, the IMF number of the ICEEMDAN primary decomposition is 10, with a maximum FDE value of 1.4855 and an average FDE value of 0.4378; after secondary decomposition, the maximum and average FDE values are only 0.8822 and 0.3350. Thus, the secondary decomposition process effectively reduces the signal complexity.

Fig. 5
figure 5

The VMD secondary mode decomposition after the ICEEMDAN primary decomposition (IMF0 means the high-complexity mode signal that needs to be decomposed)

Fig. 6
figure 6

The curve of fuzzy dispersion entropy based on the ICEEMDAN and other EMD-type models

Table 3 Comparison of the FDE value before and after the secondary decomposition

Similarly, the mode number of other mode decomposition technologies also significantly increases after secondary decomposition, while the signal complexity gradually decreases. The corresponding results are shown in Appendix 3 of Figs. 16, 17, and 18.

Out-of-sample carbon price forecasting based on the proposed model

Parameter optimization

The PSO algorithm is designed to optimize the network structure of the LSTM-type hybrid carbon price forecasting models. Based on experience, we set the initial particle swarm size to 20, the initial population to 2, the number of iterations to 100, the learning factors c1 and c2 to 2, and the maximum and minimum weights to 1.2 and 0.8, respectively.

For the network design of the LSTM, more hidden layers and neurons can improve training and forecasting performance but may lead to over-fitting or high training costs, while fewer hidden layers and neurons may result in weak predictive ability (Bengio et al. 2013; Shen et al. 2015). Therefore, we use a step-by-step experimental method to select the optimal parameters calculated by the PSO-LSTM experiment and empirically preset the number of hidden layers to 2, testing the validation errors with 2, 4, 8, 16, 32, 64, 128, and 256 neurons, respectively. The parameter optimization is operated on the training set data classified above, with the first 90% of the training data used for parameter optimization of the PSO-LSTM model and the last 10% used to verify the forecasting performance. The results suggest (Table 4) that when the hidden layers have 2 and 256 neurons, the loss errors MAE and RMSE are the smallest, at 1.043 and 1.512, respectively; the correlation coefficient is 0.895, the largest of the whole sample; and the learning rate is 0.15. Therefore, we set the neurons of the PSO-LSTM model to 2 and 256 for out-of-sample forecasting.

Table 4 The parameter training of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM model

Analysis on the superiority of ICEEMDAN-type secondary decomposition models

Secondary decomposition is re-decomposing the high-complexity signals, and its purpose is to reduce the signal complexity and improve the potential forecasting ability. The findings regarding the ICEEMDAN secondary decomposition are as follows:

  1. (1)

    The forecasting errors of the secondary decomposition hybrid carbon price forecasting models are generally lower than those of the primary decomposition models, and the correlation is also high. Therefore, the forecasting accuracy and reliability of the secondary decomposition models are relatively satisfactory, and the reduction of signal complexity and decomposition errors improves the models’ out-of-sample forecasting performance; these findings agree with the research of Li and Liu (2023) and Yang et al. (2023a, 2023b). For example, the forecasting errors RMSE, MAE, and MAPE of the secondary decomposition model ICEEMDAN-FDE-VMD-LSTM are 0.2431, 0.1692, and 0.0038, respectively, which are significantly lower than the errors of the corresponding primary decomposition model ICEEMDAN-LSTM. The forecasting errors of the secondary decomposition model CEEMDAN-FDE-VMD-LSTM are 0.3361, 0.2476, and 0.0058, respectively, which are also lower than the errors of the corresponding primary decomposition model CEEMDAN-LSTM. Similarly, the forecasting errors of the EEMD-FDE-VMD-LSTM and EMD-FDE-VMD-LSTM models are lower than those of their corresponding primary decomposition models. Furthermore, according to Table 5, the forecasting ability of the secondary decomposition carbon price forecasting models based on GRU and BP is also significantly better than that of their corresponding primary decomposition models.

  2. (2)

    The carbon price forecasting models based on ICEEMDAN technology have better forecasting performance, and the secondary decomposition ICEEMDAN-type models in particular present higher forecasting accuracy. The superiority of ICEEMDAN-type hybrid models has been confirmed in previous studies forecasting China’s carbon prices (Li et al. 2022; Zhu et al. 2023). For example, among the secondary decomposition hybrid forecasting models based on LSTM, GRU, and BP, the forecasting errors of the ICEEMDAN-type models are significantly lower than those of the CEEMDAN-type, EEMD-type, and EMD-type models. As shown in Fig. 7, the deviations between the predicted and real prices of the ICEEMDAN-FDE-VMD-LSTM, ICEEMDAN-FDE-VMD-GRU, and ICEEMDAN-FDE-VMD-BP models are relatively small; the predicted price is basically consistent with the actual one, and the difference is small. So, the forecasting performance of the ICEEMDAN-type models is superior to that of the other CEEMDAN-type, EEMD-type, and EMD-type forecasting models.

Table 5 The carbon price forecasting errors and Pearson correlation of the ICEEMDAN-FDE-VMD-PSO-LSTM-EC hybrid model and its comparative models
Fig. 7
figure 7

The carbon price forecasting performance of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model and its comparative models

  3. (3)

    It is interesting to note that although the forecasting performance of the secondary decomposition hybrid models is better than that of the primary decomposition models, the evidence does not show that the forecasting results of the primary decomposition models are always better than those of the single models. For example, the forecasting errors RMSE, MAE, and MAPE of the EEMD-BP model are 3.4916, 2.9202, and 0.0618, respectively, and those of the EMD-BP model are 4.2496, 3.4595, and 0.0730, respectively. In contrast, the forecasting performance of the single LSTM, GRU, and BP models is significantly better than that of the EEMD-BP and EMD-BP primary decomposition models. In particular, the forecasting errors of the single LSTM model are lower than those of most primary decomposition models. One reason is that the primary decomposition process may leave high-complexity signals and thus easily results in lower out-of-sample forecasting accuracy.

Analysis on the superiority of PSO-LSTM hybrid models

  1. (1)

    Compared with the GRU-type and BP-type hybrid and single models, the LSTM-type hybrid and single models have relatively low forecasting errors, which suggests high forecasting accuracy and high correlation with the real price. For example, the forecasting errors RMSE, MAE, and MAPE of the ICEEMDAN-FDE-VMD-LSTM model are 0.2431, 0.1692, and 0.0038, respectively, which are significantly lower than the errors of the ICEEMDAN-FDE-VMD-GRU and ICEEMDAN-FDE-VMD-BP models. The forecasting errors of the ICEEMDAN-LSTM model are also lower than the corresponding errors of the ICEEMDAN-GRU and ICEEMDAN-BP models. Similarly, the forecasting errors of LSTM-type models based on the other CEEMDAN, EEMD, and EMD technologies are lower than those of the GRU-type and BP-type models. These findings show that the LSTM-type models are more suitable for out-of-sample price forecasting in China’s carbon market, which is generally consistent with the conclusion of Zhou et al. (2022).

  2. (2)

    The forecasting performance of the LSTM model optimized by the PSO algorithm is obviously better than that of the models that have not been optimized. This indicates that the optimization of network parameters plays a positive role in improving carbon price forecasting accuracy, consistent with the optimization results reported by Jianwei et al. (2021) and Zhang and Wu (2022). For example, the secondary decomposition models ICEEMDAN-FDE-VMD-PSO-LSTM, CEEMDAN-FDE-VMD-PSO-LSTM, EEMD-FDE-VMD-PSO-LSTM, and EMD-FDE-VMD-PSO-LSTM have lower forecasting errors than the comparative models that are not optimized. In particular, the forecasting errors RMSE, MAE, and MAPE of the ICEEMDAN-FDE-VMD-PSO-LSTM model are 0.2176, 0.1529, and 0.0034, respectively, which are the lowest among all non-error-corrected hybrid models.

Discussions on the advantages of the error-corrected secondary decomposition models

The error-correction process improves the final forecasting performance by exploiting the predicted error term. The results in Table 5 show that the forecasting errors of the error-corrected secondary decomposition models are relatively low compared with other models, and the correlation is also high; consequently, the effectiveness of the error-corrected carbon price forecasting models is proven. For example, the forecasting errors RMSE, MAE, and MAPE of the EMD-FDE-VMD-PSO-LSTM-EC model are 0.1624, 0.0678, and 0.0014, respectively; those of the EEMD-FDE-VMD-PSO-LSTM-EC model are 0.1394, 0.0645, and 0.0014, respectively; and those of the CEEMDAN-FDE-VMD-PSO-LSTM-EC model are 0.1337, 0.0616, and 0.0013, respectively, which are significantly lower than the forecasting errors of the comparative models without error correction.

Discussions on the forecasting results of the ICEEMDAN-FDE-VMD-PSO-LSTM-EC model

Based on the above discussion, we find that, firstly, the hybrid carbon price forecasting models based on secondary decomposition technology have better forecasting accuracy and price correlation than the primary decomposition hybrid models; in particular, the ICEEMDAN-type hybrid models have higher forecasting ability, demonstrating the necessity of selecting the ICEEMDAN and VMD technologies for the primary and secondary decomposition of the original carbon price in this paper. Secondly, the forecasting performance of the deep learning LSTM-type secondary decomposition models is better than that of the GRU-type and BP-type models, and the LSTM-type secondary decomposition models optimized by the PSO algorithm achieve even higher accuracy. Thirdly, the forecasting errors of the error-corrected hybrid forecasting models are significantly lower than those of the other comparative models. Notably, the forecasting errors of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model constructed in this article are the smallest among all the comparative models, and its correlation is the largest. As shown in Fig. 8, the proposed model has relatively small forecasting errors, and the error deviation is relatively stable.

Fig. 8
figure 8

The dynamic MAPE carbon price forecasting errors of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model and its comparative models

Retesting of the proposed model based on forecasting term differences

To test the stability of the proposed model over different forecasting periods, this paper readjusts the test set and retests the out-of-sample forecasting performance of the proposed model. That is, the data from the last 750, 500, and 250 consecutive trading days of the carbon price are taken as the test sets, and the remaining samples are used for training. Several models with better forecasting performance are used to test the long-term, medium-term, and short-term carbon price forecasting effects. The forecasting results are shown in Tables 6, 7, and 8, respectively.

Table 6 The carbon price forecasting errors and Pearson correlation of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model and its comparative models in the long-term periods (750 daily trading data)
Table 7 The carbon price forecasting errors and Pearson correlation of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model and its comparative models in the medium-term periods (500 daily trading data)
Table 8 The carbon price forecasting errors and Pearson correlation of the proposed ICEEMDAN-FDE-VMD-PSO-LSTM-EC model and its comparative models in the short-term periods (250 daily trading data)

The results show that, firstly, for the secondary decomposition hybrid models, the long-term forecasting performance is better, while the short-term forecasting performance is relatively poor (the forecasting performance of each period is shown in Figs. 9, 10, and 11). For example, the average forecasting errors RMSE, MAE, and MAPE and the correlation of the secondary decomposition models in the long-term period are 0.2516, 0.1726, 0.0047, and 0.9996, respectively (as shown in Table 6); in the medium-term period they are 0.2728, 0.3255, 0.0060, and 0.9943, respectively (as shown in Table 7); and in the short-term period they are 0.7645, 0.6656, 0.0228, and 0.9143, respectively (as shown in Table 8). This evidence reflects that the secondary decomposition hybrid models perform better for price forecasting in long-term periods. One possible reason is that the secondary decomposition process can effectively reduce decomposition errors and signal noise (Li et al. 2022; Sun et al. 2022); in particular, the large number of IMF predictions can largely offset the forecasting errors during the price integration stage, thereby improving the forecasting accuracy.

Fig. 9
figure 9

The one-step-forward out-of-sample forecasting performance of the proposed model in the long-term periods (750 daily trading data)

Fig. 10
figure 10

The one-step-forward out-of-sample forecasting performance of the proposed model in the medium-term periods (500 daily trading data)

Fig. 11
figure 11

The one-step-forward out-of-sample forecasting performance of the proposed model in the short-term periods (250 daily trading data)

Secondly, the ICEEMDAN-FDE-VMD-PSO-LSTM-EC model constructed in this paper has the best forecasting performance and accuracy in all periods, especially the long-term period, in which it is significantly superior to the other comparative models. This conclusion is consistent with the findings in the “Discussions on the forecasting results of the ICEEMDAN-FDE-VMD-PSO-LSTM-EC model” section above. From the error curves depicted in Fig. 12, it is obvious that the dynamic errors of the proposed model are relatively small compared with those of the other comparative models; this finding reflects the higher fitting ability between the predicted price and the real price.

Fig. 12
figure 12

The dynamic MAPE carbon price forecasting errors of the proposed model in the long-medium short-term periods

Thirdly, for the primary decomposition models and the single models, the long-term forecasting performance is the worst. For example, in the long-term forecasting period, the average errors RMSE, MAE, and MAPE and the correlation of the ICEEMDAN-LSTM, ICEEMDAN-GRU, LSTM, and GRU models are 0.8757, 0.6308, 0.0157, and 0.9971, respectively. These errors are significantly higher than those of the medium-term and short-term forecasting, and the correlation is also lower than in the other periods. This evidence suggests that, compared with the long-term forecasting advantage of the secondary decomposition models, the primary decomposition carbon price forecasting models are more suitable for medium- and short-term price forecasting.

Conclusions

The carbon price not only reflects the supply and demand of market allowances but also reveals the nonlinear price formation mechanism. As of July 2023, it has been 2 years since China’s power industry was formally included in the national carbon emissions trading system (ETS). With the development of the carbon market, more and more Chinese companies have begun to incorporate emission costs into their daily business decisions. Therefore, this study focuses on the emerging Chinese carbon market and constructs a novel error-corrected secondary decomposition hybrid model integrating fuzzy dispersion entropy and the deep learning paradigm to forecast the carbon price. The main conclusions are as follows:

Firstly, the forecasting performance of the ICEEMDAN-type secondary decomposition hybrid carbon price forecasting models is significantly better than that of the primary decomposition models and the other CEEMDAN-, EEMD-, and EMD-type secondary decomposition hybrid models. These findings show that the signal decomposition efficiency of ICEEMDAN technology is relatively high, which reduces decomposition errors and improves forecasting accuracy. Furthermore, the FDE algorithm plays a positive role in identifying high-complexity signals for price forecasting. These conclusions provide modeling ideas for revealing the formation mechanism of complex carbon prices. Secondly, the deep learning paradigm of LSTM-type models optimized by the PSO algorithm has obvious advantages in fitting and forecasting China’s carbon price, confirming the advantage of the LSTM model in dealing with financial time series. Thirdly, the error-correction method for improving forecasting accuracy achieves satisfactory results. In particular, the ICEEMDAN-FDE-VMD-PSO-LSTM-EC model presents the best forecasting ability, which can provide more accurate modeling technology for investors trading in the carbon market. Finally, the proposed model shows obvious superiority across different forecasting periods; in particular, the long-term forecasting of 750 consecutive trading prices is outstanding. This result shows that the proposed model is more suitable for long-term price forecasting in China’s carbon market.

The above conclusions provide a valuable reference for judging the price characteristics of China’s carbon market and formulating effective market regulations. However, in terms of application, there are some limitations. (1) Due to the limited trading data of China’s national carbon emissions trading system, we only use the Hubei carbon price to represent the whole Chinese carbon market; future studies can be extended as more representative prices become available. (2) Additional factors, such as the price and consumption of fossil fuels, the trading volume of carbon allowances, and the air pollution index, can also be included in future price forecasting studies.