1 Introduction

Crude oil price forecasting is a significant topic that has drawn a great deal of attention in economics and finance. Because crude oil price series are nonlinear and non-stationary, they violate the statistical assumptions of many classical methods, which makes accurate prediction challenging.

Numerous effective models have been proposed for predicting crude oil price series. Besides classical statistical models [1, 2], many models developed in recent years are based on machine learning techniques [3,4,5,6,7,8,9,10,11,12]. Among the machine learning approaches, the support vector machine (SVM) [3,4,5] is an artificial-intelligence-based method that can capture the volatility of crude oil prices. Another frequently used model is the neural network [6, 7], which captures nonlinear functional relations through its interconnected neurons and training modes. Several variants, such as the radial basis function neural network, the recurrent neural network and the wavelet neural network [8,9,10,11,12], have also been used to predict financial time series. In addition, hybrid methods [13, 14] and neural network models combined with random jumps or with stochastic time effective functions [15,16,17,18] have been widely employed in forecasting.

Although these artificial-intelligence-based techniques have achieved remarkable results, limitations remain. As far as we know, many models pay no attention to the interactions among inputs with different scales of features. For example, most neural network models handle nonlinearities only through their activation functions, and they have been widely used in applications such as image processing, machine translation and speech recognition. The factorization machine (FM) was first introduced by Rendle [19] for collaborative recommendation. FM is an advantageous method for learning feature interactions. In contrast to matrix factorization, which only models the relation between two entities [20], FM is more flexible. FM has also found industrial applications and offers a promising direction for prediction in regression, classification and ranking tasks [21,22,23].

In this paper, we combine the factorization machine with a neural network (FM-NN) for crude oil price forecasting. To account for the effects of stochastic time on data observed at different times, we incorporate a stochastic time effective function [17] that modifies the weights when computing the global error; this allows the weights of the proposed neural network (FM-ST-NN) to be adjusted more effectively. To evaluate the model's performance, we select WTI crude oil data from January 3, 2012 to December 31, 2018, and compare the predictions of the proposed FM-ST-NN model with those of two other neural network models, NN and FM-NN. The performance of the different models is finally evaluated through three accuracy measures.

The rest of this paper is organized as follows. Section 2 presents the prediction model FM-ST-NN: we first construct the model and introduce the required ingredients, and then give the FM-ST-NN algorithm. Section 3 presents the main prediction results of the FM-ST-NN model, which are more accurate than those of the other two neural network models, i.e., NN and FM-NN. Section 4 concludes and gives some directions for future work.

2 Methodology

2.1 FM-ST-NN (Factorization Machine-Neural Network with a Stochastic Time Effective Function)

In this section, we first describe a three-layer neural network incorporating the factorization machine idea, as shown in Fig. 1. Compared with an ordinary neural network, the main structural difference lies in the hidden layer. To be more concrete, the blue nodes receive linear combinations of the input neurons and compute their results through a nonlinear activation function, whereas the red nodes compute their results by applying the activation function to the factorized interactions among the inputs.

The output of an FM models the interactions between every pair of features, i.e., it contains a linear term for each feature and an inner product of latent vectors for each feature pair. The FM estimate is

$$\begin{aligned} \hat{y}(x)= & {} w_0+\sum ^n_{i=1}w_ix_i+\sum ^n_{i=1}\sum ^n_{j=i+1}\langle \varvec{v_i},\varvec{v_j}\rangle x_ix_j \nonumber \\= & {} w_0+\sum ^n_{i=1}w_ix_i+\frac{1}{2}\sum ^{k'}_{f=1}((\sum ^n_{i=1}v_{if}x_i)^2-\sum ^n_{i=1}v^2_{if}x^2_i), \end{aligned}$$
(1)

where \(\varvec{v_i}\) is the latent vector associated with feature \(x_i\) and \(k'\) is its user-specified dimension.
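
For concreteness, the following minimal NumPy sketch evaluates Eq. (1) using the reformulated pairwise term; the function and variable names (fm_output, w0, w, V) are illustrative and not taken from the paper.

```python
import numpy as np

def fm_output(x, w0, w, V):
    """Evaluate the FM estimate of Eq. (1).

    x  : (n,)     input feature vector
    w0 : float    global bias
    w  : (n,)     linear weights
    V  : (n, k')  latent factors; row i is the vector v_i
    """
    linear = w0 + w @ x
    # Reformulated pairwise term:
    # 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    s = V.T @ x                    # (k',) values of sum_i v_if x_i
    s_sq = (V ** 2).T @ (x ** 2)   # (k',) values of sum_i v_if^2 x_i^2
    return linear + 0.5 * np.sum(s ** 2 - s_sq)
```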

Fig. 1 The architecture of factorization machine-neural network

Suppose that there are n input features in the input layer, m neurons in the hidden layer and p output neurons in the output layer. Then, we describe the forecasting process of FM-ST-NN step by step below.

1). The hidden layer: the neurons in the hidden layer are partitioned into two parts:

$$\begin{aligned} y_j= & {} f(\sum ^n_{i=1}v_{ij}x_i),i=1,2,...,n,j=1,2,...,m-k,\end{aligned}$$
(2)
$$\begin{aligned} y_{m-k+j}= & {} f(\frac{1}{2}((\sum ^n_{i=1}\hat{v}_{i,m-k+j}x_i)^2-\sum ^n_{i=1}(\hat{v}_{i,m-k+j})^2x^2_i)), \nonumber \\&i=1,2,...,n,j=1,2,...,k, \end{aligned}$$
(3)

where \(x_i\) is the value of the ith input node, \(y_j\) denotes the value of the jth hidden node, \(v_{ij}\) is the undetermined linear weight connecting the ith input node to the jth hidden node, \(\hat{v}_{ij}\) is the undetermined latent weight corresponding to the ith feature, k is the user-specified number of factorized interaction neurons, and f is the activation function.

2). The output layer:

$$\begin{aligned} O_j=f(\sum ^m_{i=1}w_{ij}y_i),j=1,2,...,p, \end{aligned}$$
(4)

where \(w_{ij}\) is the undetermined connection weight from the ith hidden node to the jth output node, and \(O_j\) is the value of the jth node in the output layer.
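
The forward pass of Eqs. (2)-(4) can be sketched as follows, assuming a sigmoid activation f; the matrix names V, V_hat and W are illustrative stand-ins for the weights defined above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, V, V_hat, W):
    """Forward pass of Eqs. (2)-(4).

    x     : (n,)      input features
    V     : (n, m-k)  weights to the ordinary hidden nodes (blue nodes)
    V_hat : (n, k)    latent weights to the interaction nodes (red nodes)
    W     : (m, p)    hidden-to-output weights
    Returns the hidden vector y (m,) and the output O (p,).
    """
    y_lin = sigmoid(x @ V)                     # Eq. (2)
    s = x @ V_hat
    s_sq = (x ** 2) @ (V_hat ** 2)
    y_fm = sigmoid(0.5 * (s ** 2 - s_sq))      # Eq. (3)
    y = np.concatenate([y_lin, y_fm])
    O = sigmoid(y @ W)                         # Eq. (4)
    return y, O
```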

3). The loss layer: the squared loss is a commonly used loss function, and the error \(e_{i_0}\) of the \(i_0\)th sample is computed as

$$\begin{aligned} e_{i_0}(t)=\frac{1}{2}\phi (t_{i_0})\sum ^p_{j=1}(T_j(t)-O_j(t))^2, i_0=1,2,...,N, \end{aligned}$$
(5)

where \(i_0=1,2,...,N\) indexes the training samples, \(\phi (t_{i_0})\) is the stochastic time effective function [17] given in Eq. (6), and \(T_j\) is the actual value from the output data set.

$$\begin{aligned} \phi (t_{i_0})=\frac{1}{\beta }\exp \{\int ^{t_{i_0}}_{t_0} \mu (t)dt+ \int ^{t_{i_0}}_{t_0} \sigma (t)dB(t) \}, \end{aligned}$$
(6)

where \(\beta \) is the time strength coefficient, \(t_0\) is the time of the newest data in the data training set, and \(t_{i_0}\) is an arbitrary time point in the data training set. \(\mu (t)\) is the drift function, \(\sigma (t)\) is the volatility function, and B(t) is the standard Brownian motion.
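
One possible way to evaluate Eq. (6) numerically is to discretize both integrals, approximating the drift integral by the trapezoid rule and the Itô integral by an Euler sum with simulated Brownian increments. The sketch below makes exactly these assumptions and is not the authors' exact procedure; all names are illustrative.

```python
import numpy as np

def phi(t_i0, t0, mu, sigma, beta=1.0, steps=1000, rng=None):
    """Discretized evaluation of the stochastic time effective function, Eq. (6).

    mu, sigma : callables mu(t), sigma(t) evaluated on a time grid
    beta      : time strength coefficient
    The drift integral uses the trapezoid rule; the Ito integral uses an
    Euler sum with Brownian increments dB ~ N(0, |dt|).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(t0, t_i0, steps + 1)
    dt = (t_i0 - t0) / steps
    mu_vals = np.broadcast_to(np.asarray(mu(t), dtype=float), t.shape)
    sig_vals = np.broadcast_to(np.asarray(sigma(t[:-1]), dtype=float), (steps,))
    drift = np.sum(0.5 * (mu_vals[:-1] + mu_vals[1:]) * dt)
    dB = rng.normal(0.0, np.sqrt(abs(dt)), size=steps)
    diffusion = np.sum(sig_vals * dB)
    return np.exp(drift + diffusion) / beta
```

With the parameter choices of Sect. 3 one would, for example, pass mu = lambda t: 1.0 / (b - t) ** 3 and sigma = lambda t: np.ones_like(t), where b is a constant not specified here.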

Then the total error of the output layer in FM-ST-NN, averaged over the training set, is defined as

$$\begin{aligned} l= & {} \frac{1}{N}\sum ^N_{i_0=1}e_{i_0}(t) \nonumber \\= & {} \frac{1}{N}\sum ^N_{i_0=1}\phi (t_{i_0})\frac{1}{2}\sum ^p_{j=1}(T_j(t)-O_j(t))^2. \end{aligned}$$
(7)
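
Given the network outputs and the time weights, the total error of Eq. (7) reduces to a weighted mean of the per-sample squared errors, as in this minimal sketch (variable names are illustrative):

```python
import numpy as np

def total_error(T, O, phi_vals):
    """Time-weighted total error of Eq. (7).

    T, O     : (N, p) arrays of targets and network outputs
    phi_vals : (N,)   values of the stochastic time effective function
    """
    per_sample = 0.5 * np.sum((T - O) ** 2, axis=1)   # Eq. (5) without the weight
    return np.mean(phi_vals * per_sample)
```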

To optimize the FM-ST-NN model, we use the stochastic gradient descent method to update the weights until convergence. The algorithm of FM-ST-NN is given in the following subsection.

2.2 The Algorithm of FM-ST-NN

The stochastic time effective function [17] reflects the random effects of time: events that happened recently affect investors more heavily, so recent information has a stronger effect than old information. The function is used to modify the weights during training so that the target can be predicted more accurately. The training process of FM-ST-NN is detailed as follows:

(1). Choose five kinds of crude oil price related indexes as the input data: daily opening price, daily highest price, daily lowest price, daily closing price and daily trading volume. Choose the closing price of the next trading day as the output (target) data. So there are five nodes in the input layer and one node in the output layer.

(2). Divide the sample data into training and testing data sets. Initialize the connection weights in the vectors \(\varvec{v}\) and \(\varvec{w}\) from the uniform distribution on \((-1,1)\).

(3). Compute the stochastic time effective function value \(\phi (t)\) according to the chosen parameters. The transfer functions from the input layer to the hidden layer and from the hidden layer to the output layer are both taken to be the sigmoid activation function.

(4). Predefine a minimum training threshold \(\varepsilon \) as the stopping criterion. If l is below \(\varepsilon \), use the input data in the testing data set to compute the prediction result (step (6)). Otherwise, update the connection weights as in the next step.

(5). Update the weights through the backward process, with the goal of minimizing the value l (a single-sample update is sketched after step (6)). The gradients of the weights in the output layer are computed as

$$\begin{aligned} \Delta w_{ij}=-\eta \frac{\partial l}{\partial w_{ij}}, i=1,2,...,m, j=1,2,...,p, \end{aligned}$$
(8)

where \(\eta \) is the learning rate. So the update rule is

$$ w_{ij} \leftarrow w_{ij}-\eta \frac{\partial l}{\partial w_{ij}}.$$

The gradients of the weights in the hidden layer are separately given by

$$\begin{aligned} \Delta v_{ij}=-\eta \frac{\partial l}{\partial v_{ij}}, i=1,2,...,n, j=1,2,...,m-k, \end{aligned}$$
(9)
$$\begin{aligned} \Delta \hat{v}_{i,m-k+j}=-\eta \frac{\partial l}{\partial \hat{v}_{i,m-k+j}},i=1,2,...,n, j=1,2,...,k. \end{aligned}$$
(10)

So the update rules are

$$ v_{ij} \leftarrow v_{ij}-\eta \frac{\partial l}{\partial v_{ij}};$$
$$ \hat{v}_{i,m-k+j} \leftarrow \hat{v}_{i,m-k+j}-\eta \frac{\partial l}{\partial \hat{v}_{i,m-k+j}}.$$

(6). When the error l is below the predefined value \(\varepsilon \), the prediction for the output node p can be derived through the forward computation of Eqs. (2)-(4) with the trained weights.
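
As referenced in step (5), a single-sample stochastic gradient step can be sketched as follows, assuming sigmoid activations throughout; it repeats the forward pass of Eqs. (2)-(4) and applies the updates of Eqs. (8)-(10) with gradients obtained by the chain rule. The variable names are illustrative and the weights are updated in place; this is a sketch under these assumptions, not the authors' exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(x, T, phi_t, V, V_hat, W, eta=0.03):
    """One stochastic-gradient update (Eqs. (8)-(10)) for a single sample.

    x : (n,) input, T : (p,) target, phi_t : scalar weight phi(t_{i0}),
    V : (n, m-k), V_hat : (n, k), W : (m, p); eta : learning rate.
    Returns the weighted sample error of Eq. (5).
    """
    mk = V.shape[1]
    # forward pass, Eqs. (2)-(4)
    y_lin = sigmoid(x @ V)
    s = x @ V_hat
    s_sq = (x ** 2) @ (V_hat ** 2)
    y_fm = sigmoid(0.5 * (s ** 2 - s_sq))
    y = np.concatenate([y_lin, y_fm])
    O = sigmoid(y @ W)
    # backward pass (chain rule with sigmoid derivative s'(z) = s(z)(1 - s(z)))
    delta_O = phi_t * (O - T) * O * (1.0 - O)          # (p,)
    grad_W = np.outer(y, delta_O)                      # (m, p)
    dL_dy = W @ delta_O                                # (m,)
    delta_lin = dL_dy[:mk] * y_lin * (1.0 - y_lin)     # (m-k,)
    grad_V = np.outer(x, delta_lin)                    # (n, m-k)
    delta_fm = dL_dy[mk:] * y_fm * (1.0 - y_fm)        # (k,)
    grad_V_hat = (np.outer(x, delta_fm * s)
                  - (x ** 2)[:, None] * V_hat * delta_fm[None, :])
    # gradient descent updates, Eqs. (8)-(10)
    W -= eta * grad_W
    V -= eta * grad_V
    V_hat -= eta * grad_V_hat
    return 0.5 * phi_t * np.sum((T - O) ** 2)
```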

3 Simulation Results and Evaluation

In this paper, we select the data of WTI crude oil from January 3, 2012 to December 31, 2018. At the beginning of modelling, all the data should be normalized. We use the formula

$$\begin{aligned} X'=\frac{X-\min X}{\max X-\min X} \end{aligned}$$
(11)

to normalize the data to the range [0, 1]. The real value is easily recovered after prediction through the inverse transformation \(X=X'(\max X-\min X)+\min X\).
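
A minimal sketch of this min-max normalization and its inverse, applied column-wise, is given below; function names are illustrative.

```python
import numpy as np

def normalize(X):
    """Min-max normalization of Eq. (11); also returns the extrema
    needed to invert the mapping."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min), x_min, x_max

def denormalize(X_scaled, x_min, x_max):
    """Recover the original scale: X = X'(max X - min X) + min X."""
    return X_scaled * (x_max - x_min) + x_min
```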

We first use the FM-ST-NN model to predict the crude oil price. The normalized data are divided into two parts, training data and testing data, in a proportion of 2:1. In the experiment the number of neurons m in the hidden layer is 20 and the maximum number of iterations is 1000. The learning rate \(\eta \) is set to 0.03 and the minimum preset training threshold \(\varepsilon \) is \(5\times 10^{-5}\). Since we consider the stochastic time effects on prediction, \(\beta \), \(\mu (t)\) and \(\sigma (t)\) are taken as

$$\begin{aligned} \beta= & {} 1, \\ \mu (t)= & {} \frac{1}{(b-t)^3}, \\ \sigma (t)= & {} 1. \end{aligned}$$

Figures 2 and 3 show the prediction results for the training and testing data sets, respectively.

Fig. 2 The prediction results for training data sets (\(k=19\))

Fig. 3 The prediction results for testing data sets (\(k=19\))

Then we compare the prediction results of FM-ST-NN with those of different neural network models, i.e., NN and FM-NN. To analyze and evaluate the prediction performance of the different models, we use the following accuracy measures, where \(d_{i_0}\) denotes the actual value and \(O_{i_0}\) the predicted value.

$$\begin{aligned} MAE= & {} \frac{1}{N}\sum ^{N}_{i_0=1}\left| d_{i_0}-O_{i_0}\right| , \\ RMSE= & {} \sqrt{\frac{1}{N}\sum ^{N}_{i_0=1}(d_{i_0}-O_{i_0})^2}, \\ MAPE= & {} 100\times \frac{1}{N}\sum ^{N}_{i_0=1}\left| \frac{d_{i_0}-O_{i_0}}{d_{i_0}}\right| . \end{aligned}$$
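
These three measures can be computed directly, for example with the following NumPy sketch, where d and O are the arrays of actual and predicted values (names are illustrative):

```python
import numpy as np

def accuracy_measures(d, O):
    """MAE, RMSE and MAPE for actual values d and predictions O (1-D arrays)."""
    err = d - O
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / d))
    return mae, rmse, mape
```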

To see more clearly the effects of the factorization machine and the stochastic time effective function, we evaluate the models through the three accuracy measures listed in Tables 1 and 2, which give the performances of the different prediction models, i.e., NN, FM-NN and FM-ST-NN, on the training data and the testing data, respectively. From the two tables we find that both the factorization machine technique and the stochastic time effective function improve accuracy, and that the FM-ST-NN model performs best on the testing data. Moreover, deeper interactions among the input nodes further improve accuracy.

Table 1 The performances of different predicting models on training data
Table 2 The performances of different predicting models on testing data

4 Conclusion

In this paper, a factorization machine based neural network combined with a stochastic time effective function has been studied to predict the WTI crude oil price. We constructed the proposed neural network model and gave the algorithm to illustrate the method. In the experiments, we compared the forecasting results of the FM-ST-NN model with those of two other neural network models, a plain neural network and a factorization machine based neural network, in order to identify the advantages of the proposed model. According to the three accuracy measures, i.e., the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), the FM-ST-NN model performs better. The innovation of this paper is the combination of the factorization machine and the stochastic time effect with a neural network for prediction. On the one hand, the effect of stochastic time on prediction can be explored more deeply; on the other hand, we believe that combining factorization machine techniques into various algorithm designs is a promising direction of future work for tackling problems arising from different applications.