1 Introduction

As the world's most important energy source, crude oil is a pivotal strategic resource. Owing to the COVID-19 pandemic (Li et al., 2021), global crude oil demand fell by an unprecedented 9.3% (9.1 million barrels per day) in 2020. Nevertheless, according to the 2021 BP Statistical Review of World Energy (BP China, 2021), crude oil still accounted for the largest share (31.2%) of global primary energy consumption in 2020. The crude oil price is closely related to socioeconomic conditions, international politics, and national security (Sun et al., 2020). For example, the three oil crises of 1973, 1978, and 1990, each caused by a surge in crude oil prices, led to reduced production and a significant slowdown in economic growth in industrialized countries worldwide (Qin, 2020). Forecasting crude oil prices (COP) is therefore crucial for investors, companies, and governments. It is also challenging, because the COP is influenced by many factors, including inflation, exchange rates, supply and demand, international politics, wars, and pandemics (Chai et al., 2018; You et al., 2021). COP time series exhibit intrinsic nonlinearity, randomness, sudden structural changes, volatility, and chaotic behavior (Cerqueti and Fanelli, 2021; Zhao et al., 2021). With the continuous development of machine learning (ML) models and optimization algorithms, COP forecasts have become increasingly accurate. Because international crude oil futures prices (COFP) are generally regarded as the reference prices for the international COP (Zhang et al., 2021), a growing number of researchers have focused on forecasting the COFP.

Based on a review of existing studies, the forecasting methods applied to the COFP can be divided into two categories: single models and hybrid models.

Single models for COFP forecasting can be divided into traditional time-series analysis models and ML models. Traditional time-series methods such as the vector autoregression (VAR) model (Mirmirani and Li, 2004), the generalized autoregressive conditional heteroskedasticity (GARCH) model (Agnolucci, 2009), and the autoregressive integrated moving average (ARIMA) model (He et al., 2010) are widely used for COP forecasting. Although these models capture the linear information in a time series effectively, they capture its nonlinear information only to a limited extent (Abdollahi and Ebrahimi, 2020). They are therefore not very effective for predicting the COFP, which is nonlinear (Sun et al., 2018). Furthermore, these models rely on strict assumptions (Li et al., 2019) that are often difficult to satisfy in practice. To compensate for these shortcomings, an increasing number of ML methods have been used to forecast the COFP. ML models do not require such strict assumptions and can capture nonlinear information in a time series (Abedin, Chi, et al., 2021; Abedin, Moon, et al., 2021). Early studies used the multilayer perceptron (MLP) (Guotai et al., 2017a), support vector machine (SVM) (Abedin et al., 2019; Abedin et al., 2019; Guo et al., 2012; Li, Chen, et al., 2020; Li, Wen, et al., 2020), back propagation neural network (BPNN) (Mingming and Jinliang, 2012), and extreme learning machine (ELM) (Wang, Athanasopoulos, et al., 2018) to improve the accuracy of COFP forecasts. In recent years, deep learning methods have attracted extensive attention owing to their strong prediction accuracy and stability (Abedin, Chi, et al., 2021; Abedin, Moon, et al., 2021; Lv et al., 2022; Wang et al., 2022; Wang, Athanasopoulos, et al., 2018; Wang, Du, et al., 2018).
Long short-term memory (LSTM) networks (Zhang et al., 2020), deep belief networks (DBN) (Zhang and Ci, 2020), and bidirectional long short-term memory (BiLSTM) networks (Abedin, Moon, et al., 2021) have gradually been adopted for COFP prediction. However, ML models still have drawbacks when predicting the COFP, such as poor generalization ability and a tendency to become trapped in local optima (Wang et al., 2020).

To address these issues, many hybrid models have been proposed for COFP forecasting. A hybrid model combines several single models through an optimization algorithm. It can inherit the advantages of each constituent model, thereby improving prediction accuracy and stability (Guotai et al., 2017b; Hu et al., 2021; Jiang et al., 2020; Khalilpourazari and Doulabi, 2021). In existing studies, the most commonly used optimization algorithms are the genetic algorithm (GA) (Yang et al., 2019), particle swarm optimization (PSO) (Ribeiro et al., 2021), ant lion optimization algorithm (ALO) (P. et al., 2018), frog-leaping algorithm (FLA) (He et al., 2021), and whale optimization algorithm (WOA) (Lin and Zhang, 2021). When these algorithms are used to build hybrid models, only a single optimization objective can be set. To consider several objectives simultaneously, researchers have proposed multiobjective optimization algorithms, such as the multiobjective ant lion optimization algorithm (MOALO) (Wang, Du, et al., 2018) and the multiobjective whale optimization algorithm (MOWOA) (Wang et al., 2017). Because hybrid models constructed with multiobjective optimization algorithms can improve forecasting stability and accuracy, an increasing number of researchers are applying them to COFP forecasting (Abdollahi and Ebrahimi, 2020; Chai et al., 2018; Zhao et al., 2021). The COFP time series has chaotic characteristics (Wang et al., 2020). In the present study, the Lyapunov exponents of the Brent crude oil futures price (BCOFP) and the West Texas Intermediate crude oil futures price (WCOFP) were also calculated; they are 0.0376 and 0.0036, respectively, and because both are positive, they indicate that these time series are chaotic. However, existing studies do not sufficiently extract the chaotic information in a time series to improve COFP prediction accuracy.
The characteristics and representative studies of the COFP forecasting models are listed in Table 1.
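The text does not state which estimator produced these Lyapunov exponents. As one hedged illustration, the sketch below uses Rosenstein's method for the largest exponent: embed the series, pair each phase point with its nearest neighbor outside a Theiler window, follow both trajectories forward, and take the least-squares slope of the mean log-divergence curve. The function names, the embedding parameters, the Theiler window, and the fitting horizon are assumptions chosen for illustration, and the exponent is expressed per sampling step.

```python
import math


def delay_vectors(x, m, tau):
    """Phase points [x[i], x[i + tau], ..., x[i + (m - 1) * tau]]."""
    n = len(x) - (m - 1) * tau
    return [[x[i + g * tau] for g in range(m)] for i in range(n)]


def largest_lyapunov(x, m=2, tau=1, theiler=10, horizon=5):
    """Rosenstein-style estimate of the largest Lyapunov exponent, per sampling step."""
    vecs = delay_vectors(x, m, tau)
    usable = len(vecs) - horizon  # reference points that can be followed for `horizon` steps
    log_div = [[] for _ in range(horizon + 1)]
    for i in range(usable):
        # nearest neighbour outside the Theiler window (skips temporally adjacent points)
        best_j, best_d = None, math.inf
        for j in range(usable):
            if abs(i - j) <= theiler:
                continue
            d = math.dist(vecs[i], vecs[j])
            if 0.0 < d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            continue
        # follow both trajectories and record the log of their separation
        for k in range(horizon + 1):
            d = math.dist(vecs[i + k], vecs[best_j + k])
            if d > 0.0:
                log_div[k].append(math.log(d))
    # least-squares slope of the mean log-divergence curve y(k)
    ks = [k for k in range(horizon + 1) if log_div[k]]
    ys = [sum(log_div[k]) / len(log_div[k]) for k in ks]
    k_bar, y_bar = sum(ks) / len(ks), sum(ys) / len(ys)
    num = sum((k - k_bar) * (y - y_bar) for k, y in zip(ks, ys))
    den = sum((k - k_bar) ** 2 for k in ks)
    return num / den
```

Applied to a chaotic logistic-map series such as \(x_{n+1} = 3.99 x_{n}(1 - x_{n})\), the estimate comes out clearly positive. The horizon should stay inside the initial, roughly linear part of the divergence curve, since separations saturate at the size of the attractor.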

Table 1 Summary of available studies

In this paper, a hybrid prediction model combining time-varying filter-based empirical mode decomposition (TVF_EMD) and a multiobjective slime mold algorithm (MOSMA), called TVF_EMD_MOSMA, is proposed for COFP forecasting. Both point forecast (PF) and interval forecast (IF) results are derived from this framework. First, the COFP time series is decomposed using TVF_EMD, and the redundant noise components are eliminated. Second, the PF results of the COFP are obtained using TVF_EMD_MOSMA. Third, maximum likelihood estimation (MLE) is applied to the fitting errors to estimate the probability density function of the COFP residuals. Thereafter, MOSMA is used to determine the confidence interval adjustment coefficients (CIACs). Finally, the IF results of the COFP are obtained from the PF results, the CIACs, and the optimal distribution. The contributions of this study are as follows:

  1. A new hybrid prediction model (TVF_EMD_MOSMA) for the COFP is presented. Comparative experiments show that TVF_EMD_MOSMA achieves high accuracy and stability in both the PF and the IF of the COFP, indicating that its predictive performance is robust.

  2. A data denoising method (TVF_EMD) is used for COFP data processing. Following the idea of “decomposition and combination,” the COFP time series is reconstructed after the high-frequency noise is removed using TVF_EMD, which improves the accuracy of the prediction model.

  3. MOSMA is applied to COFP prediction for the first time. MOSMA uses an archive to store all non-dominated Pareto solutions and performs multiobjective optimization based on non-dominated sorting and a crowding distance mechanism. Experimental results show that MOSMA effectively enhances the prediction accuracy and stability of the hybrid model for COFP prediction.

  4. To obtain an IF with a narrower width and higher prediction accuracy, CIACs determined by MOSMA are introduced. They balance the trade-off between interval prediction accuracy and prediction interval width, which significantly improves IF performance.

The remainder of this paper is organized as follows. Section 2 introduces the proposed hybrid prediction model and its submodels. Section 3 describes the performance evaluation metrics and the data used in the study. Section 4 presents the experimental results. Section 5 further discusses TVF_EMD_MOSMA and presents the sensitivity analysis, the accuracy and stability improvement ratios, and the forecasting effect analysis. Section 6 discusses practical applications, limitations, and directions for future research. Finally, Section 7 presents the conclusions of this study.

2 Research method

In this section, the basic models, interval estimation theory, and proposed hybrid prediction model are introduced.

2.1 Time-varying filter-based empirical mode decomposition

Empirical mode decomposition (EMD) decomposes a COFP time series into a finite number of intrinsic mode functions (IMFs) according to the time-scale characteristics of the series (Wang, Athanasopoulos, et al., 2018; Wang, Du, et al., 2018). Because the decomposed components facilitate the extraction of time-series features, EMD is extensively used in time-series forecasting (Wang and Wang, 2020). However, EMD suffers from end effects and mode mixing, which reduce the accuracy of the decomposition. Recursive empirical mode decomposition (REMD) has been proposed to alleviate these two problems (Wang and Wang, 2020). More recently, TVF_EMD was proposed to address both the mode-separation and intermittency problems. Moreover, the model parameters of TVF_EMD have a clear physical meaning, which facilitates parameter selection (Wang, Niu, et al., 2021). The calculation procedure of TVF_EMD is as follows.

Step 1: The instantaneous frequency \(f^{\prime}(p)\) and instantaneous amplitude \(\psi (p)\) of the COFP time series \(o\left( p \right)\) are calculated using the Hilbert transform.

$$ f^{\prime}_{1} (p) = \frac{{\varsigma_{1} (p)}}{{2m_{1}^{2} (p) - 2m_{1} (p)m_{2} (p)}} + \frac{{\varsigma_{2} (p)}}{{2m_{1}^{2} (p) + 2m_{1} (p)m_{2} (p)}} $$
(1)
$$ f^{\prime}_{2} (p) = \frac{{\varsigma_{1} (p)}}{{2m_{2}^{2} (p) - 2m_{1} (p)m_{2} (p)}} + \frac{{\varsigma_{2} (p)}}{{2m_{2}^{2} (p) + 2m_{1} (p)m_{2} (p)}} $$
(2)

In Eqs. (1) and (2), \(\varsigma_{1} (p)\) and \(\varsigma_{2} (p)\) are obtained by interpolating \(f^{\prime}(\{p_{\max}\})\,\psi^{2}(\{p_{\max}\})\) and \(f^{\prime}(\{p_{\min}\})\,\psi^{2}(\{p_{\min}\})\). In addition, \(\psi(\{p_{\min}\})\) and \(\psi(\{p_{\max}\})\) are the local minima and maxima of \(\psi(p)\), respectively; \(m_{1}(p) = \left(\mu_{1}(p) + \mu_{2}(p)\right)/2\) and \(m_{2}(p) = \left(\mu_{2}(p) - \mu_{1}(p)\right)/2\) represent the instantaneous mean and the instantaneous envelope, respectively; and \(\mu_{1}(p)\) and \(\mu_{2}(p)\) are obtained by interpolating \(\psi(\{p_{\max}\})\) and \(\psi(\{p_{\min}\})\).

$$ f^{\prime}_{bis} \left( p \right) = \frac{{f^{\prime}_{1} \left( p \right) + f^{\prime}_{2} \left( p \right)}}{2} = \frac{{\varsigma_{2} \left( p \right) - \varsigma_{1} \left( p \right)}}{{4m_{1} \left( p \right)m_{2} \left( p \right)}} $$
(3)

In Eq. (3), \(f^{\prime}_{bis}(p)\) denotes the local cut-off frequency. To address the intermittency problem, \(f^{\prime}_{bis}(p)\) is then readjusted. A signal \(k(p) = \cos\left[\int f^{\prime}_{bis}(p)\,dp\right]\) is defined, and the extrema of \(k(p)\) are used as the knots of a B-spline approximation of the COFP time series, which yields the approximation \(z(p)\).

Step 2: Evaluate the stopping criterion \(\delta(p)\). If \(\delta(p) \le \xi\), \(o(p)\) is regarded as an IMF. Otherwise, set \(o_{1}(p) = o(p) - z(p)\) and repeat Steps 1 and 2.

Step 3: Repeat the above steps to decompose the COFP time series into multiple IMFs.
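Because Eq. (3) is an algebraic consequence of Eqs. (1) and (2), it can be checked numerically. The short sketch below evaluates both local frequencies at a single sample for arbitrary illustrative values of \(\varsigma_{1}, \varsigma_{2}, m_{1}, m_{2}\) (with \(m_{1} \neq \pm m_{2}\)) and compares their average with the closed form \((\varsigma_{2} - \varsigma_{1})/(4 m_{1} m_{2})\) that the algebra yields. The function names are hypothetical; this is a check of the formula, not an implementation of TVF_EMD.

```python
def local_frequencies(s1, s2, m1, m2):
    """Eqs. (1)-(2) at one sample: returns (f'_1, f'_2)."""
    f1 = s1 / (2 * m1**2 - 2 * m1 * m2) + s2 / (2 * m1**2 + 2 * m1 * m2)
    f2 = s1 / (2 * m2**2 - 2 * m1 * m2) + s2 / (2 * m2**2 + 2 * m1 * m2)
    return f1, f2


def bisecting_frequency(s1, s2, m1, m2):
    """Closed form of (f'_1 + f'_2) / 2 obtained by simplifying Eqs. (1)-(2)."""
    return (s2 - s1) / (4 * m1 * m2)


# Illustrative values only (not taken from COFP data).
f1, f2 = local_frequencies(0.7, 1.9, 1.3, 0.4)
print((f1 + f2) / 2, bisecting_frequency(0.7, 1.9, 1.3, 0.4))
```

Both printed values agree to floating-point precision. Combining the \(\varsigma_{1}\) terms of Eqs. (1) and (2) gives \(-1/(2 m_{1} m_{2})\), and combining the \(\varsigma_{2}\) terms gives \(+1/(2 m_{1} m_{2})\), which is where the closed form comes from.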

2.2 Volterra adaptive filter based on phase space reconstruction

To extract the chaotic information in the COFP time series, the Volterra adaptive filter is used. The principle of phase space reconstruction, the determination of the delay time and the optimal embedding dimension, and the principle of the Volterra adaptive filter are introduced below.

2.2.1 Phase space reconstruction

Takens' theorem states that a univariate chaotic time series can be reconstructed into a multidimensional phase space that preserves the chaotic features of the original series. The dynamics and properties of a chaotic time series can therefore be studied in the reconstructed space (Lin and Zhang, 2021).

Suppose that the COFP time series is \(\{o(p),\; p = 1,2,\ldots,N\}\) and that m-dimensional vectors are formed using the delay time: \(O(p) = \left[o(p), o(p+\tau), \ldots, o\left(p+(m-1)\tau\right)\right],\; p = 1,2,\ldots,M\), where \(m\) is the optimal embedding dimension, \(\tau\) is the delay time, \(O(p)\) is a phase point in the m-dimensional phase space, and \(M = N-(m-1)\tau\) is the number of phase points. The set \(\{O(p),\; p = 1,2,\ldots,M\}\) describes the evolutionary trajectory of the dynamical system in phase space; thus, the chaotic behavior of the system can be studied in the reconstructed m-dimensional phase space.
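A minimal sketch of the reconstruction, assuming a plain Python list as input and 0-based indexing (so \(p = 0, \ldots, M-1\) here). The function name `delay_embed` is hypothetical.

```python
def delay_embed(series, m, tau):
    """Return the M = N - (m - 1) * tau phase points O(p) = [o(p), o(p + tau), ...]."""
    n_points = len(series) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for this (m, tau)")
    return [[series[p + k * tau] for k in range(m)] for p in range(n_points)]
```

For example, `delay_embed([1, 2, 3, 4, 5, 6], m=3, tau=2)` gives `[[1, 3, 5], [2, 4, 6]]`, that is, \(M = 6 - 2 \cdot 2 = 2\) phase points.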

2.2.2 Determination of delay time

In this study, the mutual information (MI) method is used to determine the delay time of the COFP time series. The probability of occurrence of \(o(k)\) in the time series \(\{o(p),\; p = 1,2,\ldots,N\}\) is denoted \(P(o(k))\); the probability of occurrence of \(o(k+\tau)\) in the delayed series \(\{o(p+\tau)\}\) is denoted \(P(o(k+\tau))\); and the joint probability of \(o(k)\) and \(o(k+\tau)\) is denoted \(P(o(k), o(k+\tau))\). The marginal probabilities are estimated from the frequencies of occurrence in the respective series, and the joint probability is obtained by counting the points that fall into each grid cell of the \((o(p), o(p+\tau))\) plane. The MI function is as follows, where the sum runs over the grid cells.

$$ I\left( \tau \right) = \sum\limits_{k} {P\left( {o\left( k \right),o\left( {k + \tau } \right)} \right)} \cdot \log_{2} \frac{{P\left( {o\left( k \right),o\left( {k + \tau } \right)} \right)}}{{P\left( {o\left( k \right)} \right) \cdot P\left( {o\left( {k + \tau } \right)} \right)}} $$
(4)

The value of \(\tau\) at which \(I(\tau)\) reaches its first local minimum is taken as the delay time.
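The procedure can be sketched with an equal-width histogram estimator. The number of bins, the search range for \(\tau\), and the function names are illustrative assumptions; the text does not report its binning choice, and MI estimates depend on it.

```python
import math
from collections import Counter


def _bin_indices(values, bins):
    """Map each value to one of `bins` equal-width cells spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]


def mutual_information(a, b, bins=16):
    """Histogram estimate of I(a; b) in bits, as in Eq. (4)."""
    ia, ib = _bin_indices(a, bins), _bin_indices(b, bins)
    n = len(ia)
    pa, pb, pab = Counter(ia), Counter(ib), Counter(zip(ia, ib))
    mi = 0.0
    for (u, v), count in pab.items():  # sum over occupied grid cells
        p_uv = count / n
        mi += p_uv * math.log2(p_uv / ((pa[u] / n) * (pb[v] / n)))
    return mi


def delay_by_first_minimum(series, max_tau=30, bins=16):
    """First tau at which I(tau) has a local minimum; falls back to the global minimum."""
    mi = {tau: mutual_information(series[:-tau], series[tau:], bins)
          for tau in range(1, max_tau + 2)}
    for tau in range(2, max_tau + 1):
        if mi[tau] < mi[tau - 1] and mi[tau] <= mi[tau + 1]:
            return tau
    return min(range(1, max_tau + 1), key=mi.get)
```

As a sanity check, a two-valued series compared with itself using two bins carries exactly 1 bit of mutual information.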

2.2.3 Determination of the optimal embedding dimension

In this study, the false nearest neighbor method (FNN) is used to determine the optimal embedding dimension. The calculation procedure for the FNN is as follows.

  • Step 1: In the embedding space of dimension m, find the nearest neighbor of every point in terms of Euclidean distance. The Euclidean distance between \(O\left( h \right)\) and \(O\left( l \right)\) is calculated using Eq. (5).

    $$ \left\| {O\left( h \right) - O\left( l \right)} \right\| = \left[ {\sum\limits_{g = 0}^{m - 1} {\left( {o\left( {h + g\tau } \right) - o\left( {l + g\tau } \right)} \right)^{2} } } \right]^{1/2} $$
    (5)
  • Step 2: A pair of nearest neighbors is a false nearest neighbor (FNN) pair if it satisfies the following criterion.

    $$ \left[ {\frac{{R_{m + 1}^{2} \left( {h,l} \right) - R_{m}^{2} \left( {h,l} \right)}}{{R_{m}^{2} \left( {h,l} \right)}}} \right]^{1/2} = \frac{{\left| {o(h + m\tau ) - o(l + m\tau )} \right|}}{{R_{m} \left( {h,l} \right)}} \ge R_{tol} $$
    (6)

    In Eq. (6), \(R_{m + 1}^{2} \left( {h,l} \right)\) and \(R_{m}^{2} \left( {h,l} \right)\) are the squared distances between a pair of nearest neighbors in embedding dimensions m + 1 and m, respectively, and \(R_{tol}\) is the threshold.

  • Step 3: Starting from \(m = 1\), calculate the ratio of FNN points to the total number of phase points, and increase m gradually until the ratio falls below 5%. At that point, the geometry of the chaotic attractor is considered fully unfolded, and the current \(m\) is taken as the optimal embedding dimension.
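A brute-force sketch of Steps 1 to 3 follows; it is quadratic in the number of phase points, which is acceptable for a few thousand samples. The threshold \(R_{tol} = 10\), the search limit `max_m`, and the function names are illustrative assumptions, the 5% stopping ratio is taken from Step 3, and Eq. (6) is evaluated in its right-hand form \(|o(h+m\tau) - o(l+m\tau)| / R_{m}(h,l)\).

```python
import math


def fnn_ratio(x, m, tau, r_tol=10.0):
    """Fraction of false nearest neighbours at embedding dimension m (Eqs. (5)-(6))."""
    n = len(x) - m * tau  # vectors that also have an (m+1)-th coordinate x[h + m*tau]
    vecs = [[x[i + g * tau] for g in range(m)] for i in range(n)]
    false_count = counted = 0
    for h in range(n):
        best_l, best_d = None, math.inf
        for l in range(n):
            if l == h:
                continue
            d = math.dist(vecs[h], vecs[l])  # Eq. (5)
            if 0.0 < d < best_d:
                best_l, best_d = l, d
        if best_l is None:
            continue
        counted += 1
        # Eq. (6): growth of the distance when the (m+1)-th coordinate is added
        if abs(x[h + m * tau] - x[best_l + m * tau]) / best_d >= r_tol:
            false_count += 1
    return false_count / counted if counted else 0.0


def embedding_dimension(x, tau, max_m=10, threshold=0.05, r_tol=10.0):
    """Smallest m whose FNN ratio falls below `threshold` (Step 3)."""
    for m in range(1, max_m + 1):
        if fnn_ratio(x, m, tau, r_tol) < threshold:
            return m
    return max_m
```

For a pure sinusoid, a one-dimensional embedding folds the rising and falling branches onto each other (many false neighbors), while a two-dimensional embedding with a delay near a quarter period unfolds it into a closed loop, so the sketch returns \(m = 2\).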

2.2.4 Volterra adaptive filter prediction model

The Volterra adaptive filter prediction model can predict a chaotic time series from a relatively small data sample and can automatically track the chaotic motion trajectory; it has been reported to achieve high prediction accuracy for chaotic time series (Qiao et al., 2020).

Assuming that the input vector is \(O\left( p \right) = \left[ {o\left( p \right),o\left( {p - \tau } \right),...,o\left( {p - \left( {m - 1} \right)\tau } \right)} \right]\) and the output is \(y\left( p \right) = o\left( {p + 1} \right)\), the second-order Volterra adaptive filter model is as follows:

$$ \hat{o}\left( {p + 1} \right) = h_{0} + \sum\limits_{i = 0}^{m - 1} {h_{1} } \left( i \right)o\left( {p - i\tau } \right) + \sum\limits_{i = 0}^{m - 1} {\sum\limits_{j = i}^{m - 1} {h_{2} \left( {i,j} \right)} } {\kern 1pt} o\left( {p - i\tau } \right)o\left( {p - j\tau } \right) $$
(7)

The coefficient and input vectors are expressed through Eqs. (8) and (9), respectively.

$$ H\left( p \right) = \left[ {h_{0} ,h_{1} \left( 0 \right),...,h_{1} \left( {m - 1} \right),h_{2} \left( {0,0} \right),h_{2} \left( {0,1} \right),...,h_{2} \left( {m - 1,m - 1} \right)} \right]^{T} $$
(8)
$$ Z\left( p \right) = \left[ {1,o\left( p \right),o\left( {p - \tau } \right),...,o\left( {p - \left( {m - 1} \right)\tau } \right),o^{2} \left( p \right),o\left( p \right)o\left( {p - \tau } \right),...,o^{2} \left( {p - \left( {m - 1} \right)\tau } \right)} \right]^{T} $$
(9)

Based on Eqs. (8) and (9), Eq. (7) can be expressed as

$$ \hat{o}\left( {p + 1} \right) = {\mathbf{H}}^{T} \left( p \right){\mathbf{Z}}\left( p \right) $$
(10)
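The regressor of Eq. (9) and the prediction of Eq. (10) can be sketched as follows. The text does not specify how \(\mathbf{H}(p)\) is adapted; the normalized least-mean-squares (NLMS) update used here, together with its step size `mu` and regularizer `eps`, is an assumption, and the function names are hypothetical.

```python
def volterra_regressor(window):
    """Z(p) of Eq. (9) for window = [o(p), o(p - tau), ..., o(p - (m-1)tau)]."""
    m = len(window)
    z = [1.0] + list(window)
    # second-order terms o(p - i*tau) * o(p - j*tau) with j >= i, as in Eq. (7)
    z += [window[i] * window[j] for i in range(m) for j in range(i, m)]
    return z


def predict(h, z):
    """Eq. (10): o_hat(p + 1) = H^T Z."""
    return sum(hi * zi for hi, zi in zip(h, z))


def nlms_update(h, z, target, mu=0.5, eps=1e-8):
    """One normalised-LMS step (assumed adaptation rule); returns (new H, prior error)."""
    error = target - predict(h, z)
    norm = eps + sum(zi * zi for zi in z)
    return [hi + mu * error * zi / norm for hi, zi in zip(h, z)], error
```

To run the filter over a series, one would slide a window with the chosen \(m\) and \(\tau\), build `volterra_regressor(window)`, predict, and update `h` once \(o(p+1)\) is observed. With \(0 < \mu < 2\), each NLMS step shrinks the error on the current sample by the factor \(1 - \mu \|Z\|^{2}/(\epsilon + \|Z\|^{2})\).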

2.3 ARIMA model

The ARIMA model is a traditional time-series analysis model that is widely used for time-series forecasting (Ribeiro et al., 2021). It performs well on linear time series that become stationary after differencing.

Here, \(ARIMA\left( {p,k,q} \right)\) is expressed through Eq. (11).

$$ \left\{ {\begin{array}{*{20}l} {\Phi \left( \Lambda \right)\nabla^{k} x_{t} = \Theta \left( \Lambda \right)\varepsilon_{t} } \hfill \\ {E\left( {\varepsilon_{t} } \right) = 0,\;Var\left( {\varepsilon_{t} } \right) = \sigma_{\varepsilon }^{2} ,\;E\left( {\varepsilon_{t} \varepsilon_{s} } \right) = 0,\;s \ne t} \hfill \\ {E\left( {x_{s} \varepsilon_{t} } \right) = 0,\;\forall s < t} \hfill \\ \end{array} } \right. $$
(11)

In Eq. (11), \(\Lambda\) is the lag operator and \(\nabla^{k} = \left( {1 - \Lambda } \right)^{k}\) is the k-th order difference operator; \(\Phi \left( \Lambda \right) = 1 - \phi_{1} \Lambda - \cdots - \phi_{p} \Lambda^{p}\) is the autoregressive coefficient polynomial, \(\Theta \left( \Lambda \right) = 1 - \theta_{1} \Lambda - \cdots - \theta_{q} \Lambda^{q}\) is the moving-average coefficient polynomial, and \(\varepsilon_{t}\) is a white-noise sequence.
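As a simplified sketch, the code below fits the pure autoregressive special case ARIMA(p, k, 0) (no moving-average terms and no intercept) by ordinary least squares on the k-times differenced series, then undoes the differencing to obtain a one-step forecast. Dropping the MA part and using least squares instead of maximum likelihood are simplifications; a full ARIMA(p, k, q) fit would normally be delegated to a statistics library. The function names are hypothetical.

```python
def _difference(x):
    """First difference (1 - Lambda) x_t."""
    return [b - a for a, b in zip(x[:-1], x[1:])]


def _solve(a, b):
    """Solve a small linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def arima_pk0_forecast(x, p=1, k=1):
    """One-step forecast from an ARIMA(p, k, 0) model fitted by least squares."""
    levels = [list(x)]
    for _ in range(k):
        levels.append(_difference(levels[-1]))  # nabla^k x_t
    y = levels[-1]
    rows = [y[t - p:t][::-1] for t in range(p, len(y))]  # [y_{t-1}, ..., y_{t-p}]
    targets = y[p:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    phi = _solve(ata, atb)  # AR coefficients phi_1..phi_p
    nxt = sum(c * v for c, v in zip(phi, y[::-1][:p]))  # forecast of nabla^k x_{n+1}
    for level in reversed(levels[:-1]):
        nxt = level[-1] + nxt  # undo one differencing step
    return nxt
```

For a straight line such as `[0, 2, 4, ..., 20]` with `k=1`, the differenced series is constant, the fitted coefficient is 1, and the forecast continues the line to 22.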

2.4 ELM

ELM is a feedforward neural network with a single hidden layer. Like a standard artificial neural network (ANN), it comprises an input layer, a hidden layer, and an output layer (Lin and Zhang, 2021). Information from the input layer is transformed by the hidden layer through a feature-mapping (activation) function and passed to the output layer, which computes the output value. The input weights and hidden biases are initialized randomly and are not tuned; only the output weights are determined, analytically. Although random initialization of the parameters improves the generalization of the ELM, it also means that a large number of hidden nodes is needed to achieve accurate training. For large samples, so many nodes consume excessive computational resources and may cause overfitting.
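A minimal ELM regressor sketched in plain Python: input weights and biases are drawn at random and frozen, and only the output weights are solved in closed form. The tanh activation, the uniform [-1, 1] initialization, the small ridge term used for numerical stability (in place of an explicit Moore-Penrose pseudo-inverse), and the class and method names are assumptions for illustration.

```python
import math
import random


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a @ w = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


class ELM:
    """Single-hidden-layer ELM: random hidden layer, output weights by ridge least squares."""

    def __init__(self, n_hidden=30, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # tanh of the random affine maps w . x + b (never trained)
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d, n_h = len(xs[0]), self.n_hidden
        self.w = [[self.rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_h)]
        self.b = [self.rng.uniform(-1.0, 1.0) for _ in range(n_h)]
        h = [self._hidden(x) for x in xs]
        # normal equations (H^T H + ridge * I) beta = H^T y
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(n_h)] for i in range(n_h)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_h)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, xs):
        return [sum(bk * hk for bk, hk in zip(self.beta, self._hidden(x))) for x in xs]
```

Because the only fitted quantity is the output weight vector, training reduces to one linear solve, which is the source of the ELM's speed; the number of hidden nodes then controls the trade-off described above.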

2.5 Bidirectional long short-term memory model

The bidirectional long short-term memory (BiLSTM) model consists of two independent LSTMs: the input sequence is fed to one LSTM in forward order and to the other in reverse order for feature extraction. LSTM is a recurrent neural network architecture proposed to alleviate the vanishing and exploding gradient problems of conventional recurrent networks. BiLSTM is designed to capture, at each time step, information from both past and future contexts (Wang et al., 2020). It often achieves better predictive performance than a unidirectional LSTM, at the cost of additional computation.

2.6 MOSMA

The slime mold algorithm (SMA) is a population-based metaheuristic proposed by Li, Chen, et al. (2020). Through a combination of positive and negative feedback, a slime mold can form efficient pathways connecting food sources. SMA therefore adjusts its search path and converges toward the optimum based on a positive and negative feedback mechanism. SMA simulates three behaviors of the slime mold during foraging: finding food, wrapping food, and approaching food. The mathematical model of SMA is as follows:

$$ X\left( {k + 1} \right) = \left\{ {\begin{array}{*{20}l} {R_{1} \cdot \left( {Ub - Lb} \right) + Lb,} \hfill & {if\;R_{1} < 0.03} \hfill \\ {X_{b} \left( k \right) + Vb \cdot \left( {W \cdot X_{A} \left( k \right) - X_{B} \left( k \right)} \right),} \hfill & {if\;R_{2} < p} \hfill \\ {Vc \cdot X\left( k \right),} \hfill & {if\;R_{2} \ge p} \hfill \\ \end{array} } \right. $$
(12)

where

$$ \left\{ {\begin{array}{*{20}l} {W\left( {SmellIndex\left( j \right)} \right) = \left\{ {\begin{array}{*{20}l} {1 + R_{2} \cdot \log \left( {\frac{bF - M\left( j \right)}{{bF - wF}} + 1} \right),} \hfill & {condition} \hfill \\ {1 - R_{2} \cdot \log \left( {\frac{bF - M\left( j \right)}{{bF - wF}} + 1} \right),} \hfill & {others} \hfill \\ \end{array} } \right.} \hfill \\ {SmellIndex = sort\left( M \right)} \hfill \\ {Vb \in \left[ { - l,l} \right]} \hfill \\ {l = \operatorname{arctanh} \left( { - \frac{k}{Max\_t} + 1} \right)} \hfill \\ {p = \tanh \left| {M\left( j \right) - BF} \right|} \hfill \\ \end{array} } \right. $$
(13)

In Eqs. (12) and (13), \(X\) denotes the position of a slime mold, \(k\) the current iteration, \(Max\_t\) the maximum number of iterations, and \(Lb\) and \(Ub\) the lower and upper bounds of the search range. \(X_{b}\) is the best position found so far, and \(X_{A}\) and \(X_{B}\) are two individuals selected at random from the population. \(Vb\) is a vibration parameter drawn from \([-l, l]\), \(Vc\) is drawn from an interval \([-c, c]\) whose half-width \(c\) decreases linearly from 1 to 0, and \(W\) is the weight of the slime mold. \(M(j)\) is the fitness of the \(j\)-th slime mold, \(bF\) and \(wF\) are the best and worst fitness values in the current iteration, and \(BF\) is the best fitness obtained over all iterations. \(SmellIndex\) is the sequence of sorted fitness values (ascending for a minimization problem), and "condition" means that \(M(j)\) ranks in the better half of the population. \(R_{1}\) and \(R_{2}\) are random values in [0, 1].
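The update rules in Eqs. (12) and (13) can be sketched as a single-objective minimizer. The sketch follows the structure of the original SMA (random relocation with probability 0.03, weights from the sorted fitness, \(Vb \in [-l, l]\), and \(Vc\) drawn from an interval shrinking linearly to zero); the base-10 logarithm, the clamping of positions to the bounds, the population size, and the function name `sma_minimize` are assumptions, and the multiobjective archive of MOSMA is not included.

```python
import math
import random


def sma_minimize(fitness, dim, lb, ub, pop=20, max_iter=100, z=0.03, seed=0):
    """Single-objective SMA following Eqs. (12)-(13); returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop)]
    best_pos, best_fit = [0.0] * dim, math.inf
    for k in range(1, max_iter + 1):  # k starts at 1 so arctanh(1 - k/max_iter) is finite
        X = [[min(max(v, lb), ub) for v in x] for x in X]  # keep slime molds in [lb, ub]
        fit = [fitness(x) for x in X]
        order = sorted(range(pop), key=lambda i: fit[i])  # SmellIndex, ascending
        b_f, w_f = fit[order[0]], fit[order[-1]]
        if b_f < best_fit:
            best_fit, best_pos = b_f, X[order[0]][:]
        span = (b_f - w_f) or -1e-300  # avoid 0/0 when all fitness values are equal
        W = [None] * pop
        for rank, i in enumerate(order):
            term = math.log10((b_f - fit[i]) / span + 1)
            sign = 1.0 if rank < pop / 2 else -1.0  # "condition": better half of population
            W[i] = [1 + sign * rng.random() * term for _ in range(dim)]
        l_vb = math.atanh(1 - k / max_iter)  # Vb is drawn from [-l, l]
        c_vc = 1 - k / max_iter  # Vc is drawn from [-c, c], shrinking linearly to 0
        for i in range(pop):
            if rng.random() < z:  # random relocation, first case of Eq. (12)
                X[i] = [rng.uniform(lb, ub) for _ in range(dim)]
                continue
            p = math.tanh(abs(fit[i] - best_fit))
            for j in range(dim):
                if rng.random() < p:  # approach food around the best position
                    A, B = rng.randrange(pop), rng.randrange(pop)
                    X[i][j] = best_pos[j] + rng.uniform(-l_vb, l_vb) * (W[i][j] * X[A][j] - X[B][j])
                else:  # oscillate and contract
                    X[i][j] = rng.uniform(-c_vc, c_vc) * X[i][j]
    return best_pos, best_fit
```

A quick usage check is to minimize the sphere function \(\sum_{j} x_{j}^{2}\) on \([-10, 10]^{2}\); note that the \(Vc \cdot X\) branch contracts toward the origin, so the sphere is an easy case for this operator rather than a demanding benchmark.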

SMA can effectively solve many practical problems (Li, Chen, et al., 2020; Li, Wen, et al., 2020). MOSMA extends SMA to problems with multiple objectives: it uses an elitist non-dominated sorting method to estimate the Pareto-optimal solutions and, to maintain their diversity, adds a crowding distance mechanism that improves the coverage of all objectives (Premkumar et al., 2021). The elitist non-dominated sorting approach proceeds as follows:

  • Step 1: Evaluate the objective functions for all candidate solutions and identify the non-dominated ones.

  • Step 2: Sort the solutions into successive non-dominated fronts.

  • Step 3: Rank all solutions by their non-dominated front to determine the optimal solutions.

The crowding distance \(cd_{i}^{n}\) of solution \(n\) with respect to the \(i\)-th objective is calculated as follows:

$$ cd_{i}^{n} = \frac{{f_{i}^{n + 1} - f_{i}^{n - 1} }}{{f_{i}^{\max } - f_{i}^{\min } }} $$
(14)

In Eq. (14), the solutions are sorted by their values of the \(i\)-th objective, \(f_{i}^{n + 1}\) and \(f_{i}^{n - 1}\) are the \(i\)-th objective values of the solutions adjacent to solution \(n\) in this order, and \(f_{i}^{\max}\) and \(f_{i}^{\min}\) are the maximum and minimum values of the \(i\)-th objective function, respectively. The non-dominated ranking and the crowding distance are used together to determine the optimal solutions.
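The two ingredients, non-dominated sorting and the crowding distance of Eq. (14), can be sketched for a minimization problem as follows. This is a simple variant that rescans the remaining solutions for each front rather than the fast bookkeeping of NSGA-II. Assigning infinite distance to boundary solutions and summing Eq. (14) over the objectives follow the usual NSGA-II convention and are assumptions here, as are the function names.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Split solution indices into successive non-dominated fronts."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(objs, front):
    """Eq. (14) summed over objectives; boundary solutions get infinite distance."""
    cd = {i: 0.0 for i in front}
    for obj in range(len(objs[front[0]])):
        ranked = sorted(front, key=lambda i: objs[i][obj])
        f_min, f_max = objs[ranked[0]][obj], objs[ranked[-1]][obj]
        cd[ranked[0]] = cd[ranked[-1]] = math.inf
        if f_max == f_min:
            continue
        for pos in range(1, len(ranked) - 1):
            i = ranked[pos]
            cd[i] += (objs[ranked[pos + 1]][obj] - objs[ranked[pos - 1]][obj]) / (f_max - f_min)
    return cd
```

For the objective vectors `[(1, 5), (2, 3), (3, 1), (3, 4)]`, the first front is `[0, 1, 2]` and `(3, 4)` forms the second front because `(2, 3)` dominates it; within the first front, the middle solution receives a crowding distance of 2.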

To optimize the prediction accuracy and robustness of the hybrid prediction model, the multiobjective functions are defined as follows:

$$ \left\{ {\begin{array}{*{20}c} {f_{1}^{PF} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{OP_{i} - FP_{i} }}{{OP_{i} }}} \right|} \times 100\% } \\ {f_{2}^{PF} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {P_{i} - \overline{P}} \right)^{2} } } ,\;P_{i} = OP_{i} - FP_{i} } \\ \end{array} } \right. $$
(15)

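In Eq. (15), \(OP_{i}\) and \(FP_{i}\) denote the \(i\)-th actual and predicted COFP values, respectively, and \(\overline{P}\) denotes the mean of \(P_{i}\). Thus, \(f_{1}^{PF}\) is the mean absolute percentage error, which measures accuracy, and \(f_{2}^{PF}\) is the standard deviation of the prediction errors, which measures stability.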
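For each candidate weight vector, MOSMA evaluates Eq. (15) on the forecasts it produces. A direct transcription follows; the function name is hypothetical, and the standard deviation uses the divisor \(n\), as in Eq. (15).

```python
import math


def pf_objectives(actual, forecast):
    """Eq. (15): f1 = MAPE in percent, f2 = standard deviation of P_i = OP_i - FP_i."""
    n = len(actual)
    f1 = sum(abs((op - fp) / op) for op, fp in zip(actual, forecast)) / n * 100.0
    errors = [op - fp for op, fp in zip(actual, forecast)]
    mean_err = sum(errors) / n
    f2 = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)
    return f1, f2
```

For actual values `[100, 200]` and forecasts `[90, 220]`, both relative errors are 10%, so \(f_{1}^{PF} = 10\); the errors are 10 and -20 with mean -5, so \(f_{2}^{PF} = 15\).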

2.7 Interval forecasting method

Although the PF provides an explicit value, it does not indicate how reliable that value is. The IF fills this gap by giving users considerably more information about the forecast results (Sun et al., 2020). Distributions commonly used in energy price forecasting include the Gumbel, generalized extreme value (GEV), and gamma distributions (Jiang et al., 2021; Wang, Niu, et al., 2021). In this study, MLE is used to fit the optimal distribution of the prediction error series. However, narrowing the interval width and increasing the prediction accuracy of the IF are conflicting goals. To obtain an IF with a narrow width and high prediction accuracy, the CIACs \(\zeta_{1}\) and \(\zeta_{2}\) are introduced (Jiang et al., 2021) and determined using MOSMA. The IF for the \(i\)-th day is calculated as follows:

$$ Ub_{i}^{(1 - \alpha)} = FP_{i} + \zeta_{1} \cdot Dist_{1 - \alpha/2}^{*} \cdot \sqrt{Var\left( E_{TVF\_EMD\_MOSMA} \right)} $$
(16)
$$ Lb_{i}^{(1 - \alpha)} = FP_{i} - \zeta_{2} \cdot Dist_{\alpha/2}^{*} \cdot \sqrt{Var\left( E_{TVF\_EMD\_MOSMA} \right)} $$
(17)

where \(Lb_{i}^{(1-\alpha)}\) and \(Ub_{i}^{(1-\alpha)}\) are the lower and upper bounds of the confidence interval at confidence level \(1-\alpha\), \(FP_{i}\) is the PF result for the \(i\)-th day, and \(Dist_{1-\alpha/2}^{*}\) and \(Dist_{\alpha/2}^{*}\) are the \(1-\alpha/2\) and \(\alpha/2\) quantiles of the optimal distribution, respectively. In addition, \(E_{TVF\_EMD\_MOSMA}\) is the prediction error sequence of TVF_EMD_MOSMA, and \(\zeta_{1}\) and \(\zeta_{2}\) are the CIACs determined by MOSMA. The objective functions for the IF in MOSMA are defined as follows:

$$ \left\{ {\begin{array}{*{20}l} {f_{1}^{IF} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{IW_{i}^{(1 - \alpha )} }}{{\max \left\{ {OP_{i} ,i = 1,2,...,n} \right\} - \min \left\{ {OP_{i} ,i = 1,2,...,n} \right\}}}} \right|} } \hfill \\ {f_{2}^{IF} = - \sum\limits_{i = 1}^{n} {A_{i}^{(1 - \alpha )} } } \hfill \\ {A_{i}^{(1 - \alpha )} = \left\{ {\begin{array}{*{20}l} { - 2\alpha \cdot IW_{i}^{(1 - \alpha )} - 4\left( {Lb_{i}^{(1 - \alpha )} - OP_{i} } \right),} \hfill & {if\;OP_{i} < Lb_{i}^{(1 - \alpha )} } \hfill \\ { - 2\alpha \cdot IW_{i}^{(1 - \alpha )} ,} \hfill & {if\;OP_{i} \in \left[ {Lb_{i}^{(1 - \alpha )} ,Ub_{i}^{(1 - \alpha )} } \right]} \hfill \\ { - 2\alpha \cdot IW_{i}^{(1 - \alpha )} - 4\left( {OP_{i} - Ub_{i}^{(1 - \alpha )} } \right),} \hfill & {if\;OP_{i} > Ub_{i}^{(1 - \alpha )} } \hfill \\ \end{array} } \right.} \hfill \\ {IW_{i}^{(1 - \alpha )} = Ub_{i}^{(1 - \alpha )} - Lb_{i}^{(1 - \alpha )} } \hfill \\ \end{array} } \right. $$
(18)

where \(OP_{i}\) and \(FP_{i}\) denote the \(i\)-th actual and predicted COFP, respectively, and \(Lb_{i}^{(1-\alpha)}\) and \(Ub_{i}^{(1-\alpha)}\) are the lower and upper bounds of the confidence interval at confidence level \(1-\alpha\). MOSMA minimizes \(f_{1}^{IF}\) (the normalized interval width) and \(f_{2}^{IF}\) (the negated interval score) simultaneously, and the slime mold position obtained from this optimization gives the values of \(\zeta_{1}\) and \(\zeta_{2}\).
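Eqs. (16) to (18) can be sketched together, since MOSMA scores each candidate \((\zeta_{1}, \zeta_{2})\) by building the bounds and then evaluating both objectives. Several choices here are assumptions: the population standard deviation of the error series stands in for \(\sqrt{Var(E)}\); the \(\alpha/2\) quantile is used through its magnitude so that the lower bound always lies below \(FP_{i}\) (the text does not say whether \(Dist^{*}_{\alpha/2}\) is signed); and the observed value \(OP_{i}\) is compared with the bounds in the interval score, the usual convention, because a point forecast always falls inside its own interval when \(\zeta_{1}, \zeta_{2} \ge 0\). Standard-normal quantiles from `statistics.NormalDist` are only a stand-in for the fitted Gumbel, GEV, or gamma distribution, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def interval_bounds(fp, errors, q_upper, q_lower, zeta1, zeta2):
    """Eqs. (16)-(17): lower/upper bounds around the point forecasts fp."""
    n = len(errors)
    mean = sum(errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)  # sqrt(Var(E)), divisor n assumed
    ub = [f + zeta1 * q_upper * sd for f in fp]
    # abs() keeps the lower bound below FP_i whether the alpha/2 quantile is signed or a magnitude
    lb = [f - zeta2 * abs(q_lower) * sd for f in fp]
    return lb, ub


def if_objectives(actual, lb, ub, alpha):
    """Eq. (18): f1 = mean width over the range of OP, f2 = minus the summed interval score."""
    span = max(actual) - min(actual)
    widths = [u - l for l, u in zip(lb, ub)]
    f1 = sum(abs(w / span) for w in widths) / len(widths)
    total = 0.0
    for op, l, u, w in zip(actual, lb, ub, widths):
        a_i = -2 * alpha * w
        if op < l:  # observation below the interval
            a_i -= 4 * (l - op)
        elif op > u:  # observation above the interval
            a_i -= 4 * (op - u)
        total += a_i
    return f1, -total


# Standard-normal quantiles as a stand-in for the fitted error distribution (alpha = 0.05).
alpha = 0.05
q_hi = NormalDist().inv_cdf(1 - alpha / 2)
q_lo = NormalDist().inv_cdf(alpha / 2)  # negative; interval_bounds uses its magnitude
```

Inside MOSMA, each candidate \((\zeta_{1}, \zeta_{2})\) would be scored by passing the bounds from `interval_bounds` to `if_objectives`; larger coefficients lower the miss penalty in \(f_{2}^{IF}\) but raise \(f_{1}^{IF}\), which is the trade-off the CIACs balance.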

2.8 Framework of proposed TVF_EMD_MOSMA

The proposed TVF_EMD_MOSMA is introduced in this section. The overall framework is shown in Fig. 1. In this study, the point and interval prediction results of WCOFP and BCOFP are derived using TVF_EMD_MOSMA. The detailed steps are as follows:

Fig. 1

Framework of the proposed TVF_EMD_MOSMA

  • Step 1: Data processing

The TVF_EMD method is used to filter out the high-frequency noise in the COFP time series, yielding a smoother series, which is then used as the input of the TVF_EMD_MOSMA prediction model.

  • Step 2: Point forecast

The processed COFP time series is input into the four submodels (Volterra adaptive filter, ARIMA, ELM, and BiLSTM), whose outputs are \(\left\{ {FP_{i}^{\left( 1 \right)} ,FP_{i}^{\left( 2 \right)} ,FP_{i}^{\left( 3 \right)} ,FP_{i}^{\left( 4 \right)} } \right\}\), where \(FP_{i}^{\left( j \right)}\) denotes the prediction of the \(j\)-th submodel. These outputs are combined into the final predicted value \(FP_{i}\) using a set of weights \(\left\{ {\varpi^{\left( 1 \right)} ,\varpi^{\left( 2 \right)} ,\varpi^{\left( 3 \right)} ,\varpi^{\left( 4 \right)} } \right\}\) determined by MOSMA.

  • Step 3: Interval forecast

The optimal distribution of the prediction error series is determined using MLE, and the CIACs \(\zeta_{1}\) and \(\zeta_{2}\) are determined using MOSMA. Finally, confidence intervals at confidence levels of 90%, 95%, and 99% are calculated using Eqs. (16) and (17).
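The data-processing and combination steps of the framework can be sketched as two small helpers. How many IMFs are treated as noise is not specified in this section, so `n_noise` is a hypothetical parameter, and the IMFs are assumed to come from a TVF_EMD implementation that is not shown. Whether the combination weights are constrained (for example, to sum to one) is also not stated, so the sketch leaves them unconstrained; the function names are hypothetical.

```python
def reconstruct_without_noise(imfs, residue, n_noise=1):
    """Step 1: drop the first n_noise (highest-frequency) IMFs and add the rest to the residue."""
    kept = imfs[n_noise:]
    return [r + sum(imf[t] for imf in kept) for t, r in enumerate(residue)]


def combine_forecasts(sub_forecasts, weights):
    """Step 2: FP_i = sum_j w^(j) * FP_i^(j) over the submodel forecast series."""
    if len(sub_forecasts) != len(weights):
        raise ValueError("one weight per submodel is required")
    horizon = len(sub_forecasts[0])
    return [sum(w * series[i] for w, series in zip(weights, sub_forecasts))
            for i in range(horizon)]
```

With `n_noise=0`, `reconstruct_without_noise` simply adds every IMF back to the residue, recovering the full decomposition; in the PF step, MOSMA would search over the weight vector passed to `combine_forecasts` and score each candidate with Eq. (15).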

3 Studied data and performance evaluation metrics

In this section, the studied data (original, training set, and test set data) are presented in detail. The performance evaluation metrics for the PF and IF are also presented.

3.1 Studied data

WCOFP and BCOFP have a large international influence and are often regarded as the reference prices for the international oil spot market. Therefore, the study sample consists of the daily settlement prices of WCOFP and BCOFP from 4 January 2010 to 30 September 2021. These data are available on Investing.com (https://cn.investing.com/commodities/crude-oil-historical-data). The period from 4 January 2010 to 30 September 2020 is used as the training set, and the remaining data (1 October 2020 to 30 September 2021) are used as the test set to evaluate the prediction performance of the model. More details of the dataset are presented in Table 2.

Table 2 Detailed description of the studied data (Data from https://cn.investing.com/commodities/crude-oil-historical-data)
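The chronological split described above can be sketched as follows. This is a minimal illustration assuming the series is held as a list of `(date, price)` pairs; the function name `split_by_date` is hypothetical, and the data download from Investing.com is not shown.

```python
from datetime import date

SAMPLE_START = date(2010, 1, 4)   # first day of the study sample
TRAIN_END = date(2020, 9, 30)     # last day of the training set
TEST_END = date(2021, 9, 30)      # last day of the test set


def split_by_date(series):
    """Split (date, price) pairs into training and test lists.

    Training: 4 Jan 2010 to 30 Sep 2020; test: 1 Oct 2020 to 30 Sep 2021.
    Observations outside the sample window are dropped, and the original
    chronological order is preserved (no shuffling).
    """
    train = [(d, p) for d, p in series if SAMPLE_START <= d <= TRAIN_END]
    test = [(d, p) for d, p in series if TRAIN_END < d <= TEST_END]
    return train, test
```

Splitting by date rather than at random keeps every test observation strictly later than the training data, so the evaluation mimics genuine out-of-sample forecasting.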

3.2 Performance evaluation metrics

This section presents the system of metrics used to evaluate the PF and IF performance (Wang, Niu, et al., 2021; Wang, Wang, et al., 2021; Wang et al., 2021; Wang et al., 2021). The metrics are described in detail in Table 3, and the abbreviations used in this study are listed in "Appendix 1".

Table 3 Description of the metrics used by the model
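Table 3 is not reproduced here, so the following sketch uses the standard textbook definitions of the four PF metrics (MAPE, MdAPE, MAE, and RMSE). It assumes that MAPE and MdAPE are reported in percent, as the magnitudes in Table 5 suggest; `op` and `fp` stand for the actual and predicted series \(OP_{i}\) and \(FP_{i}\).

```python
import math
from statistics import median


def _check(op, fp):
    if not op or len(op) != len(fp):
        raise ValueError("op and fp must be non-empty and of equal length")


def mape(op, fp):
    """Mean absolute percentage error, in percent."""
    _check(op, fp)
    return 100 * sum(abs((o - f) / o) for o, f in zip(op, fp)) / len(op)


def mdape(op, fp):
    """Median absolute percentage error, in percent (robust to outliers)."""
    _check(op, fp)
    return 100 * median(abs((o - f) / o) for o, f in zip(op, fp))


def mae(op, fp):
    """Mean absolute error, in the units of the price series."""
    _check(op, fp)
    return sum(abs(o - f) for o, f in zip(op, fp)) / len(op)


def rmse(op, fp):
    """Root mean squared error; penalises large errors more than MAE."""
    _check(op, fp)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(op, fp)) / len(op))
```

MAPE and MdAPE are scale-free, while MAE and RMSE are in price units; reporting both kinds is why the tables list all four.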

4 Experiments and analysis

In this section, two experiments that test the performance of the proposed TVF_EMD_MOSMA prediction model are described in detail. Experiment I tests the PF performance of the proposed hybrid prediction model through multiple sets of controlled trials. Experiment II determines the optimal distribution of the prediction error series using MLE, calculates confidence intervals at the 90%, 95%, and 99% confidence levels using the CIACs determined by MOSMA, and tests the IF performance of the proposed hybrid prediction model.

4.1 Experiment configuration

4.1.1 Experimental environment

All experiments in this study were conducted in MATLAB R2020a on a laptop computer with a 64-bit 1.80 GHz AMD Ryzen 7 4800U CPU with Radeon Graphics and 16 GB of RAM, running Windows 10.

4.1.2 Model parameter settings

The proposed hybrid prediction model consists of four submodels (Volterra adaptive filter, ARIMA, ELM, and BiLSTM) and the MOSMA optimization algorithm. Three types of control models were developed: (1) Benchmark models: BPNN, LSTM, Volterra adaptive filter, ARIMA, ELM, BiLSTM, TVF_EMD_BPNN, TVF_EMD_Volterra, TVF_EMD_ARIMA, TVF_EMD_ELM, and TVF_EMD_BiLSTM. (2) Models using different data denoising methods: EMD_MOSMA and REMD_MOSMA. (3) Models using different optimization algorithms: TVF_EMD_SFL, TVF_EMD_SMA, TVF_EMD_MOALO, and TVF_EMD_MOWOA. The detailed parameter settings of the models are listed in Table 4.

Table 4 Parameter settings of the model and optimization algorithm

4.2 Experiment I: PF result analysis

Three types of control trials were conducted to demonstrate the superiority of the PF performance of the proposed hybrid prediction model. The results of Experiment I are shown in Fig. 2.

Fig. 2

Results of Experiment I

4.2.1 Comparison with benchmark models

Multiple benchmark models were used as control models to demonstrate that TVF_EMD_MOSMA outperforms the single models. The PF performances of TVF_EMD_MOSMA and the benchmark models are presented in Table 5.

  (1)

    Comparing the models with and without the TVF_EMD denoising method shows that the models using TVF_EMD achieve better PF accuracy and stability. For instance, the evaluation metrics of the ELM using TVF_EMD for WCOFP are \(MAPE_{TVF\_EMD\_ELM}^{WCOFP} = 0.9317\), \(MdAPE_{TVF\_EMD\_ELM}^{WCOFP} = 0.7231\), \(MAE_{TVF\_EMD\_ELM}^{WCOFP} = 0.5431\), and \(RMSE_{TVF\_EMD\_ELM}^{WCOFP} = 0.7114\), whereas those of the ELM without denoising are \(MAPE_{ELM}^{WCOFP} = 1.6058\), \(MdAPE_{ELM}^{WCOFP} = 1.3558\), \(MAE_{ELM}^{WCOFP} = 0.9392\), and \(RMSE_{ELM}^{WCOFP} = 1.2365\). All four metrics of TVF_EMD_ELM are much smaller than those of ELM, indicating that the TVF_EMD denoising method improves the PF performance of ELM.

  (2)

    The benchmark models show different predictive performances for the different datasets. For WCOFP, TVF_EMD_ELM achieved the best PF performance. However, TVF_EMD_ARIMA was the best-performing benchmark model for BCOFP. Therefore, to obtain the optimal prediction performance for different datasets, the proposed hybrid prediction model combines submodels using the optimal weights determined by MOSMA.

  (3)

    For BCOFP, all evaluation metrics of TVF_EMD_MOSMA are smaller than those of every benchmark model (\(MAPE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.8281\), \(MdAPE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.7109\), \(MAE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.5862\), and \(RMSE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.7488\)). The comparison results for WCOFP are similar to those for BCOFP, demonstrating that TVF_EMD_MOSMA achieves better PF performance than all the benchmark models.

Table 5 PF performance of TVF_EMD_MOSMA and benchmark models

4.2.2 Comparison of different denoising methods

To compare the different denoising methods (TVF_EMD, EMD, and REMD), EMD_MOSMA and REMD_MOSMA were set as the control models. The PF performances of TVF_EMD_MOSMA and the control models with different denoising methods are listed in Table 6.

Table 6 The PF performance of TVF_EMD_MOSMA and the control models

A comparison of TVF_EMD_MOSMA, EMD_MOSMA, and REMD_MOSMA shows that all four evaluation indicators (MAPE, MAE, MdAPE, and RMSE) of TVF_EMD_MOSMA are smaller than those of EMD_MOSMA and REMD_MOSMA. In other words, TVF_EMD is better suited to processing COFP data for prediction than the other denoising methods (EMD and REMD).

4.2.3 Comparison of different optimization algorithms

In this section, different optimization algorithms are compared. To demonstrate the superiority of MOSMA, multiple control models (TVF_EMD_SFL, TVF_EMD_SMA, TVF_EMD_MOALO, and TVF_EMD_MOWOA) were constructed by keeping the submodels and denoising methods unchanged and solely changing the optimization algorithm. The parameter settings for these models are listed in Table 4. The PF performances of TVF_EMD_MOSMA and the control models with different optimization algorithms are listed in Table 6.

When TVF_EMD_MOSMA is compared with the control models using different optimization algorithms, the only exception is the MdAPE for BCOFP, for which TVF_EMD_MOSMA is larger than the best model (TVF_EMD_MOALO) by 0.04. In all other cases, TVF_EMD_MOSMA has the smallest evaluation metric values (MAPE, MAE, MdAPE, and RMSE) for both WCOFP and BCOFP. In other words, MOSMA achieves higher stability and accuracy than SFL, SMA, MOALO, and MOWOA. The results of Experiment I indicate that the proposed hybrid prediction model is more effective for the PF of COFP.

4.3 Experiment II: IF result analysis

In this section, the optimal distribution of the prediction error series is determined using MLE. The confidence intervals with confidence levels of 90%, 95%, and 99% and the IF performance of the proposed hybrid prediction model are presented. The results of Experiment II are shown in Fig. 3.

Fig. 3

Results of Experiment II

4.3.1 Selection of the optimal distribution function for COFP prediction error series

Compared with PF, IF provides users with more information about the forecast results. The Gumbel, GEV, and gamma distributions are commonly used in the field of energy price forecasting (Jiang et al., 2021; Wang, Niu, et al., 2021). Therefore, we fitted these distribution functions to the COFP prediction error series using MLE. The goodness of fit is evaluated with two metrics: RMSE and \(R\), the correlation coefficient between each fitted distribution (Gumbel, GEV, or gamma) and the empirical distribution of the observations. The fitting results for the WCOFP prediction error series (WCOFP_E) and the BCOFP prediction error series (BCOFP_E) are listed in Table 7.

Table 7 Results of fitting the distribution functions of WCOFP_E and BCOFP_E
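The two goodness-of-fit measures can be sketched as a comparison between the fitted CDF and the empirical CDF evaluated at the sorted error values. The exact construction behind Table 7 is not given, so this is an assumption: the empirical CDF at the \(k\)-th smallest of \(n\) values is taken as \((k - 0.5)/n\), one common plotting-position convention, and `fit_quality` is a hypothetical helper that accepts any fitted CDF as a callable.

```python
import math


def fit_quality(samples, cdf):
    """Return (R, RMSE) between a fitted CDF and the empirical CDF.

    The empirical CDF at the k-th smallest sample (k = 1..n) is taken as
    (k - 0.5) / n (an assumed plotting-position convention).
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    emp = [(k - 0.5) / n for k in range(1, n + 1)]
    fit = [cdf(x) for x in xs]
    mean_e = sum(emp) / n
    mean_f = sum(fit) / n
    var_e = sum((e - mean_e) ** 2 for e in emp)
    var_f = sum((f - mean_f) ** 2 for f in fit)
    if var_f == 0:
        raise ValueError("fitted CDF is constant over the samples")
    cov = sum((e - mean_e) * (f - mean_f) for e, f in zip(emp, fit))
    r = cov / math.sqrt(var_e * var_f)          # Pearson correlation
    rmse = math.sqrt(sum((f - e) ** 2 for e, f in zip(emp, fit)) / n)
    return r, rmse
```

A candidate distribution is preferred when \(R\) is close to 1 and RMSE is small, which is the selection rule applied to the Gumbel, GEV, and gamma fits.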

For WCOFP_E, the \(R\) values of the Gumbel and gamma distributions are close, and the gamma distribution has the smallest RMSE. Therefore, the gamma distribution is chosen as the distribution function of WCOFP_E, with a shape parameter of 5.2 and an inverse scale parameter of 0.4 estimated using MLE.

For BCOFP_E, the gamma distribution has both the largest \(R\) and the smallest RMSE. Therefore, the gamma distribution is considered the optimal distribution of BCOFP_E, with a shape parameter of 16.63 and an inverse scale parameter of 0.19 estimated using MLE.

4.3.2 IF performance of TVF_EMD_MOSMA

After the optimal distribution function of COFP_E was determined using MLE, the CIACs were added to the hybrid prediction model to obtain interval forecasts with narrow widths and high prediction accuracy. The confidence intervals at the 90%, 95%, and 99% confidence levels were derived from the PF results, the optimal distribution, and the CIACs. The IF performance of the proposed TVF_EMD_MOSMA is presented in Table 8.

Table 8 IF performance of the proposed TVF_EMD_MOSMA

Three metrics are used to assess the IF. The FICP, which is the frequency at which observations fall into the prediction intervals, and the AIS both measure the accuracy of the IF; a larger AIS indicates a higher prediction accuracy. The FINAW measures the width of the confidence interval, with smaller values indicating a better IF performance. As presented in Table 8, the FICP values are extremely close to the corresponding confidence levels, indicating that the IF results in this study are reasonable. At the given confidence levels, the values of FINAW and AIS are quite small, which indicates that the IF results combine high accuracy with narrow interval widths. In other words, the proposed hybrid prediction model achieves excellent IF performance.
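The exact formulas are in Table 3, which is not reproduced here, so the following sketch uses definitions that are common in the interval-forecasting literature and are an assumption: FICP as the fraction of observations inside \([Lb_{i}, Ub_{i}]\), FINAW as the mean interval width normalized by the range of the observations, and AIS as a Winkler-type interval score that is never positive, so a larger (closer to zero) value is better, consistent with the description above. The parameter `alpha` corresponds to the confidence level \(1 - \alpha\).

```python
def ficp(op, lb, ub):
    """Fraction of observations that fall inside [lb, ub]."""
    return sum(l <= o <= u for o, l, u in zip(op, lb, ub)) / len(op)


def finaw(op, lb, ub):
    """Mean interval width normalised by the range of the observations."""
    span = max(op) - min(op)
    if span == 0:
        raise ValueError("observations have zero range")
    return sum(u - l for l, u in zip(lb, ub)) / (len(op) * span)


def ais(op, lb, ub, alpha):
    """Average Winkler-type interval score; closer to zero is better."""
    total = 0.0
    for o, l, u in zip(op, lb, ub):
        score = -2 * alpha * (u - l)      # width penalty
        if o < l:
            score -= 4 * (l - o)          # observation below the interval
        elif o > u:
            score -= 4 * (o - u)          # observation above the interval
        total += score
    return total / len(op)
```

The width penalty and the miss penalty pull in opposite directions, which is the accuracy-versus-width trade-off that the CIACs are tuned to balance.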

5 Discussion

TVF_EMD_MOSMA is further discussed in this section, including the sensitivity analysis, the accuracy and stability improvement ratio, and the forecasting effect analysis.

5.1 Sensitivity analysis

Sensitivity analysis was employed to assess the robustness of the model's predictive performance, that is, how changes in the model parameters affect the prediction results. This study considered the effects of two parameters, the population size and the number of iterations, on the model's prediction performance. The sensitivity analysis indicator \(A_{s}^{M}\) is calculated as follows (Wu et al., 2022).

$$ A_{s}^{M} = \sum\limits_{j = 1}^{m} {\frac{1}{m} \cdot \left( {M_{j} - \overline{M} } \right)}^{2} $$
(19)

where \(M_{j}\) is the evaluation metric (MAPE, MdAPE, MAE, or RMSE) of the \(j\)-th experiment, \(\overline{M}\) is the mean of \(M_{j}\), and \(m\) is the number of experiments.

In this study, the population size \(S_{p}\) takes the values 100, 150, 200*, and 250, and the number of iterations \(N_{i}\) takes the values 50, 100*, 150, and 200, where * marks the optimal parameter of MOSMA. The pattern \(S_{p} \overline{\overline{{N_{i} }}}\) indicates that the population size \(S_{p}\) varies while the number of iterations \(N_{i}\) is fixed at 100. The pattern \(\overline{\overline{{S_{p} }}} N_{i}\) indicates that the number of iterations \(N_{i}\) varies while the population size \(S_{p}\) is fixed at 200. The results of the sensitivity analysis are listed in Table 9.

Table 9 Results of the sensitivity analysis

The values of all sensitivity analysis indicators are extremely small. For example, \(A_{s\_WCOFP}^{MAPE} \left( {S_{p} \overline{\overline{{N_{i} }}} } \right)\) is only 0.0004, whereas the smallest MAPE for WCOFP is 0.7538, nearly 1,900 times as large. Therefore, parameter changes have little effect on the prediction performance; in other words, the prediction performance of the proposed hybrid prediction model is robust.

5.2 Accuracy and stability improvement ratio

The improvement of the proposed TVF_EMD_MOSMA over the control models was quantified using two ratios, \(IR_{a}\) and \(IR_{s}\), which measure the improvements in prediction accuracy and stability, respectively. They are calculated as follows:

$$ IR_{a} = \frac{{MAPE_{control} - MAPE_{TVF\_EMD\_MOSMA} }}{{MAPE_{control} }} \times 100\% $$
(20)
$$ IR_{s} = \frac{{MAE_{control} - MAE_{TVF\_EMD\_MOSMA} }}{{MAE_{control} }} \times 100\% $$
(21)

where \(MAPE_{control}\) and \(MAE_{control}\) denote the MAPE and MAE of the control model, respectively, and \(MAPE_{TVF\_EMD\_MOSMA}\) and \(MAE_{TVF\_EMD\_MOSMA}\) denote the MAPE and MAE of the proposed TVF_EMD_MOSMA. The results of \(IR_{a}\) and \(IR_{s}\) are presented in Table 10.

Table 10 Results of \(IR_{a}\) and \(IR_{s}\)
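Equations (20) and (21) share the same relative-reduction form and differ only in the metric (MAPE for \(IR_{a}\), MAE for \(IR_{s}\)). The sketch below applies one helper (a hypothetical name) to values reported earlier in this paper for WCOFP: ELM without denoising (MAPE 1.6058, MAE 0.9392) against TVF_EMD_MOSMA (MAPE 0.7538, MAE 0.5306).

```python
def improvement_ratio(control, proposed):
    """Relative reduction (%) of an error metric, as in Eqs. (20)-(21)."""
    if control == 0:
        raise ValueError("control metric must be non-zero")
    return (control - proposed) / control * 100


# IR_a uses MAPE and IR_s uses MAE (ELM vs TVF_EMD_MOSMA, WCOFP)
ir_a = improvement_ratio(1.6058, 0.7538)
ir_s = improvement_ratio(0.9392, 0.5306)
```

These inputs give \(IR_{a} \approx 53.06\%\) and \(IR_{s} \approx 43.51\%\); a positive ratio means the proposed model has the smaller error.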

Compared with the control models, the proposed TVF_EMD_MOSMA showed a significant improvement in prediction accuracy and stability. For WCOFP, TVF_EMD_MOSMA improved the prediction accuracy and stability by an average of 39.0965% and 33.4563%, respectively. For BCOFP, the corresponding average improvements were 37.5749% and 32.6658%. The largest improvement in prediction accuracy was achieved relative to LSTM (\(IR_{a}^{LSTM\_WCOFP} = 66.2892\%\) and \(IR_{a}^{LSTM\_BCOFP} = 68.5897\%\)), and the largest improvement in prediction stability relative to Volterra (\(IR_{s}^{Volterra\_WCOFP} = 60.0070\%\) and \(IR_{s}^{Volterra\_BCOFP} = 60.8264\%\)). Overall, compared with the benchmark models, the models using different denoising methods, and the models using different optimization algorithms, the proposed hybrid prediction model significantly improved prediction accuracy and stability.

5.3 Forecasting effect analysis

The first- and second-order forecasting effectiveness (\(TE_{1}\) and \(TE_{2}\), respectively) were introduced to measure the forecasting effect of the model (Wang et al., 2021b, c, d; Wang, Niu, et al., 2021).

$$ \left\{ {\begin{array}{*{20}l} {TE_{1} = e_{1} } \hfill \\ {TE_{2} = e_{1} \left( {1 - \sqrt {e_{2} - \left( {e_{1} } \right)^{2} } } \right)} \hfill \\ {e_{k} = \sum\limits_{i = 1}^{n} {Q_{i} A_{i}^{k} } ,\;Q_{i} = \frac{1}{n}} \hfill \\ {A_{i} = 1 - \left| {b_{i} } \right|} \hfill \\ {b_{i} = \left\{ {\begin{array}{*{20}l} { - 1,} \hfill & {{{\left( {OP_{i} - FP_{i} } \right)} \mathord{\left/ {\vphantom {{\left( {OP_{i} - FP_{i} } \right)} {OP_{i} }}} \right. \kern-\nulldelimiterspace} {OP_{i} }} < - 1} \hfill \\ {{{\left( {OP_{i} - FP_{i} } \right)} \mathord{\left/ {\vphantom {{\left( {OP_{i} - FP_{i} } \right)} {OP_{i} }}} \right. \kern-\nulldelimiterspace} {OP_{i} }},} \hfill & {{{ - 1 \le \left( {OP_{i} - FP_{i} } \right)} \mathord{\left/ {\vphantom {{ - 1 \le \left( {OP_{i} - FP_{i} } \right)} {OP_{i} }}} \right. \kern-\nulldelimiterspace} {OP_{i} }} \le 1} \hfill \\ {1,} \hfill & {{{\left( {OP_{i} - FP_{i} } \right)} \mathord{\left/ {\vphantom {{\left( {OP_{i} - FP_{i} } \right)} {OP_{i} }}} \right. \kern-\nulldelimiterspace} {OP_{i} }} > 1} \hfill \\ \end{array} } \right.} \hfill \\ \end{array} } \right. $$
(22)

where \(OP_{i}\) and \(FP_{i}\) denote the \(i\)-th actual and predicted COFP, respectively, and \(n\) is the number of predicted values. Larger values of \(TE_{1}\) and \(TE_{2}\) indicate a higher forecasting efficiency of the model. The evaluation results regarding the forecasting efficiency of the multiple models are listed in Table 11.

Table 11 Evaluation results of multiple models’ forecasting efficiency
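Eq. (22) translates directly into code: the relative error is clipped to \([-1, 1]\), turned into an accuracy \(A_{i} = 1 - |b_{i}|\), and the first two moments of \(A_{i}\) (with equal weights \(Q_{i} = 1/n\)) give \(TE_{1}\) and \(TE_{2}\). This is a minimal sketch with a hypothetical function name.

```python
import math


def forecasting_effectiveness(op, fp):
    """First- and second-order forecasting effectiveness (TE_1, TE_2), Eq. (22)."""
    n = len(op)
    if n == 0 or n != len(fp):
        raise ValueError("op and fp must be non-empty and of equal length")
    # b_i: relative error (OP_i - FP_i) / OP_i clipped to [-1, 1]; A_i = 1 - |b_i|
    a = [1 - abs(max(-1.0, min(1.0, (o - f) / o))) for o, f in zip(op, fp)]
    e1 = sum(a) / n                  # first moment, Q_i = 1/n
    e2 = sum(x * x for x in a) / n   # second moment
    te1 = e1
    # max(0, ...) guards against tiny negative values caused by rounding
    te2 = e1 * (1 - math.sqrt(max(0.0, e2 - e1 ** 2)))
    return te1, te2
```

\(TE_{2}\) discounts \(TE_{1}\) by the standard deviation of \(A_{i}\), so a model with the same average accuracy but more erratic errors receives a lower second-order score.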

The results in Table 11 demonstrate that the first- and second-order effectiveness (\(TE_{1}\) and \(TE_{2}\), respectively) of the proposed TVF_EMD_MOSMA are the highest. In other words, the proposed hybrid prediction model achieved the highest prediction efficiency among all the models compared. Considering the sensitivity analysis, the accuracy and stability improvement ratios, and the forecasting effect analysis together, the proposed TVF_EMD_MOSMA achieves higher forecasting accuracy, better forecasting stability, and higher forecasting efficiency than the other control models. Therefore, the proposed hybrid prediction model is reliable, valid, and significant.

6 Practical applications and limitations of the model

In this section, the practical applications and limitations of the proposed TVF_EMD_MOSMA, as well as future research in this area, are presented.

6.1 Practical applications

COPs are closely related to socioeconomic factors, international politics, and national security. However, international COPs are volatile and uncertain owing to various factors. The proposed TVF_EMD_MOSMA for COFP prediction is reliable, valid, and significant, and it can provide valuable reference information for investors (Medina–Olivares et al., 2021), companies, and governments.

  (1)

    By considering a combination of the spot prices and the forecast prices of COFP, investors can decide on an investment strategy to achieve their profit goals.

  (2)

    Companies that use crude oil as a feedstock or produce oil for sale can apply hedging operations based on the forecast results of COFP to control their production costs and sales risks.

  (3)

    Governments can decide on the import, export, storage, and use of crude oil according to the forecast results of the COFP to ensure the stability of domestic oil prices. Stable domestic oil prices are crucial for the smooth development of economies and society.

6.2 Limitations and future research

The proposed TVF_EMD_MOSMA uses only the COFP time series to predict future prices. COFP can be influenced by multiple factors, including inflation, exchange rates, supply and demand, international politics, wars, and epidemics. Hence, these factors should be considered in future studies. In the future, the proposed TVF_EMD_MOSMA should be extended to the forecasting of other energy prices, such as coal and natural gas prices.

7 Conclusion

Crude oil is the most important energy source in the world, and fluctuations in crude oil prices can have a significant impact on investors, companies, and governments. Therefore, accurate prediction of COFP is crucial. In this paper, the TVF_EMD_MOSMA prediction model is proposed to improve the accuracy and robustness of COFP prediction. A new data denoising method, TVF_EMD, is used for COFP data processing. A chaotic time-series prediction method (Volterra adaptive filter), a shallow neural network (ELM), a linear prediction model (ARIMA), and a deep learning method (BiLSTM) are adopted as submodels for COFP prediction. The predicted values of the submodels are combined using optimal weights determined by MOSMA. Interval forecasts with narrower widths and higher prediction accuracy were derived by introducing CIACs determined using MOSMA. The conclusions of this study are as follows.

  (1)

    The new data denoising method, TVF_EMD, can significantly improve the prediction accuracy of COFP. Comparison experiments showed that the prediction accuracy of the model using TVF_EMD was significantly higher than that of the models using other denoising methods (EMD and REMD).

  (2)

    The chaotic time-series prediction method, shallow neural network, linear prediction model, and deep learning method were adopted as submodels. Combining their predictions with weights optimized by MOSMA yields accurate and stable prediction results. The MAPE and MAE of TVF_EMD_MOSMA are \(MAPE_{TVF\_EMD\_MOSMA}^{WCOFP} = 0.7538\) and \(MAE_{TVF\_EMD\_MOSMA}^{WCOFP} = 0.5306\) for WCOFP, and \(MAPE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.8281\) and \(MAE_{TVF\_EMD\_MOSMA}^{BCOFP} = 0.5862\) for BCOFP.

  (3)

    The PF performance of the proposed TVF_EMD_MOSMA is highly robust. The sensitivity analysis demonstrated that variations in the model parameters had only a slight effect on the prediction performance. The maximum value of the sensitivity analysis indicator \(A_{s}^{M}\) is 0.0059, which is extremely small.

  (4)

    The IF performance of the proposed TVF_EMD_MOSMA is excellent. By introducing the CIACs determined using MOSMA, the trade-off between prediction accuracy and interval width is balanced.