1 Introduction

With the advent of the big-data era, cloud computing technology has matured, drawing more and more investors into the stock market in an attempt to capture its latent patterns (Hu et al. 2015; Qureshi 2018). Influenced by corporate decisions, government policies, cross-market news and other factors, the stock market is highly volatile and unstable, which makes future price trends difficult to predict (Hu and Qi 2017). Stock price prediction helps us better understand the operating laws of the stock market and grasp the transmission mechanism of monetary policy. In practice, stock forecasting supports the timely selection and implementation of monetary policy when the stock market fluctuates violently, which helps to alleviate the negative impact of stock market instability (Gupta et al. 2016) and further improves the quality of each country's macroeconomic operation (Nuij et al. 2013). Therefore, a good stock forecasting model is very valuable.

A time series is a series of data points arranged by time index (Wang et al. 2019b; Cui et al. 2019). Time series analysis methods can be divided into time domain and frequency domain methods (Wang et al. 2014). Time domain analysis regards a time series as an ordered sequence of points and analyzes their correlation, as in the hidden Markov model (Ahuja and Deval 2018; Sharieh and Albdour 2017) and the ARIMA model (Wang et al. 2015; Clohessy et al. 2017). Frequency domain analysis uses transform algorithms [such as the discrete Fourier transform (Wang et al. 2019a) and EMD decomposition (Wang et al. 2015)] to convert a time series into a spectrum, which can then be analyzed for the characteristics of the original sequence (Zhang et al. 2019). Frequency domain prediction lets researchers look inside a time series, find problems that cannot be seen from the time domain perspective, and mine the deterministic characteristics and motion laws of the series. However, effective frequency domain modeling of time series is still lacking. In view of this, this paper proposes an adaptive wavelet transform model (AWTM) based on the XGBoost algorithm, the wavelet transform and LSTM to mine the frequency domain patterns of stock sequences. The main contributions of this paper are as follows:

  1. The XGBoost model measures the importance of stock features, avoiding both the large prediction errors caused by too few input features and the excessive training time caused by too many.

  2. The wavelet transform fully extracts the high-frequency and low-frequency information of a stock by analyzing the frequency domain of the stock sequence and refining it at multiple scales and from multiple aspects.

  3. The core of this paper is an adaptive layer added after the LSTM, which automatically attends to different frequency components according to the dynamic evolution of the input sequence and mines the frequency patterns of the time series.

  4. AWTM combines the advantages of the above methods and produces better predictions when the stock fluctuates violently.

2 Related work

At present, a variety of statistical models and neural network models have been applied to stock forecasting, and progress has been made in several areas. The autoregressive moving average model (ARMA) estimates its prediction parameters by analyzing the dynamics of stock index returns (Rounaghi and Zadeh 2016). However, ARMA is best suited to stationary linear time series, whereas stock price series are usually highly nonlinear and non-stationary, which limits its practical application to stock prices (Rojas et al. 2008). With the development of machine learning, researchers can build nonlinear prediction models from large amounts of historical data, for example, the gradient boosting decision tree (GBDT) and the convolutional neural network (CNN) (Di and Honchar 2016; Haratizadehn 2019). Through iterative training, these models gradually approach the real data and obtain more accurate predictions than traditional statistical models. However, they do not capture the temporal dependence within stock data. Later, Jordan et al. proposed the recurrent neural network (RNN), which has unique advantages in time series prediction (Rather et al. 2015; Akita et al. 2016). An RNN can make full use of the dependence between stock data points to predict the future trend of the sequence and fit future data. However, as the number of hidden layers increases, it easily suffers from vanishing and exploding gradients, making it unsuitable for long-term prediction. LSTM is an improvement on the recurrent neural network (Gers et al. 1999). Compared with the RNN, LSTM effectively alleviates vanishing and exploding gradients by adding cell-state connections and performs better on longer stock sequences (Chen et al. 2015).

LSTM is an excellent variant of the RNN that can exploit the historical context of a stock and adapts well to time series analysis (Petersen et al. 2019). However, LSTM does not reveal the multi-frequency characteristics of stocks, and the frequency domain of the data cannot be modeled. Zhang et al. (2017) proposed extracting time–frequency information from data with the Fourier transform and combining it with a neural network for prediction. The Fourier transform connects the time domain and frequency domain characteristics of a stock, allowing observation and analysis from both perspectives. But the two domains are completely separated: no time information is retained in the frequency domain, and no frequency information can be found in the time domain (Daubechies 1990). Therefore, the contradiction of time–frequency localization often arises when the Fourier transform is applied to non-stationary sequences. Compared with the Fourier transform, the wavelet transform (Pu et al. 2015) effectively solves this problem by introducing a variable scale factor and a translation factor. Through scaling and shifting, the wavelet transform can refine a time series at multiple scales and from multiple aspects, finally achieving frequency segmentation. It can also adapt automatically to the requirements of stock analysis, focus on any aspect of the stock data, and resolve the difficulty faced by the Fourier transform (Liu et al. 2013).

Combining the wavelet transform with LSTM can fully extract the time–frequency information of the data and model the frequency of the time series. However, the prediction performance is still not optimal: different sequences depend on different frequency patterns, and the wavelet transform and LSTM alone cannot focus on the important frequency components. Therefore, this paper adds an adaptive layer after the LSTM. According to the relative importance of the frequency bands, different weights are assigned to different frequency components to capture the frequency patterns of stocks.

In view of the above discussion, an adaptive wavelet transform model (AWTM) is proposed. AWTM uses the wavelet transform to decompose the time–frequency characteristics of stock data and refine the stock along multiple dimensions, fully extracting the high-frequency and low-frequency information of the data. LSTM is used to model and predict in the frequency domain. At the same time, an adaptive layer added after the LSTM learns different weights for different frequency components, highlighting key frequency patterns and improving the predictive ability of the model. Compared with most existing methods, AWTM can represent trading patterns of different frequencies to infer future trends in stock prices. AWTM also shows strong robustness: it retains strong predictive ability even when the stock fluctuates violently.

3 AWTM model

The goal of stock prediction is to use the data of the previous n days to predict the current value (Qin et al. 2017). The n-step prediction is defined as follows:

$$\begin{aligned} X_T=f(X_{T-n},X_{T-n+1},\ldots ,X_{T-1}) \end{aligned}$$
(1)

where f is the nonlinear mapping from the historical prices of the previous n days to the current price \(X_T\).
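As a concrete illustration of Eq. (1), the following Python sketch builds sliding-window training pairs from a price series; the synthetic series is a placeholder, not data from the paper.

```python
import numpy as np

def make_windows(series: np.ndarray, n: int):
    """Slide a length-n window over the series; each window
    (X_{T-n}, ..., X_{T-1}) is paired with the next value X_T."""
    X, y = [], []
    for t in range(n, len(series)):
        X.append(series[t - n:t])
        y.append(series[t])
    return np.array(X), np.array(y)

prices = np.cumsum(np.random.default_rng(0).normal(size=500))  # stand-in series
X, y = make_windows(prices, n=3)  # a 3-step window, as chosen in Sect. 4.1
```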

In this paper, AWTM is used to establish an n-step prediction model for stock sequence frequency information, and its overall structure is shown in Fig. 1:

Fig. 1 AWTM architecture

AWTM consists of four parts: feature selection and data preprocessing, wavelet decomposition, model prediction, and wavelet reconstruction with data output. The detailed processing of stock sequences in the model is described in the following sections.

3.1 Feature selection and data preprocessing

XGBoost is an improved algorithm based on the gradient boosting decision tree (Zheng et al. 2017) that can efficiently construct boosted trees and run in parallel. Equation (2) describes the XGBoost feature-scoring scheme (Jadad et al. 2019). Its core is to use gradient boosting to build boosted trees whose feature scores indicate the importance of each feature to the training model (Chen and Guestrin 2016).

$$\begin{aligned} W_{x}^{2}(T)=\frac{1}{M}\sum _{m=1}^{M}\tau _{x}^{2}(T_{m}) \end{aligned}$$
(2)

where \(W_{x}^{2}(T)\) is the importance score of prediction feature x, M is the number of decision trees, and \(\tau _{x}^{2}(T_{m})\) is the importance of feature x in the m-th tree \(T_{m}\). The model selects the feature that provides the largest estimated improvement in squared-error risk. Stock sequences have numerous features; a single feature cannot reflect complex application scenarios well, but more features increase the training complexity of the model. Therefore, this paper uses the XGBoost algorithm to measure the importance of stock features and selects the features most important to the prediction target.

The input of the model is \(X=\{X_1,X_2,\ldots ,X_T\} \), which represents the stock sequence over T time steps. Each time step includes the following features: trading volume, yield, price change, Ma5, Ma10, MACD, the 5-day average of trading volume, the 10-day average of trading volume, and PriceChangeRatio. The first step of the model is to use XGBoost to measure the importance of these nine features.
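A minimal sketch of this step using the xgboost Python package is given below. The DataFrame, its column names, and the hyper-parameters are assumptions for illustration; the built-in gain-based scores stand in for Eq. (2), while the paper additionally checks importance by replacing features with random noise (Fig. 2b).

```python
import numpy as np
import pandas as pd
import xgboost as xgb

FEATURES = ["volume", "yield", "price_change", "ma5", "ma10", "macd",
            "volume_ma5", "volume_ma10", "price_change_ratio"]

# Stand-in data; in practice these columns would hold the real stock features
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 10)), columns=FEATURES + ["next_close"])

model = xgb.XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1)
model.fit(df[FEATURES], df["next_close"])

# Rank features by importance score (per-tree scores averaged, cf. Eq. (2))
for name, score in sorted(zip(FEATURES, model.feature_importances_),
                          key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```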

Fig. 2 XGBoost experiment diagram

The experimental results of XGBoost are shown in Fig. 2. Figure 2a shows the test and training error curves of XGBoost; the training error stays within the range of the test error, indicating that the model does not overfit. The importance of a feature is judged by whether the prediction performance changes significantly when the feature is replaced by random noise. For clarity, the nine features above are numbered 0 to 8 in Fig. 2b. The figure shows that the prediction error grows most when feature 3 (Ma5) and feature 5 (MACD) are replaced by random noise, indicating that these two features strongly influence prediction performance and are of high importance. Therefore, this paper takes Ma5 and MACD as the selected features and standardizes them, together with the price information of the stock (closing, opening, high and low prices), using Eq. (3) to accelerate the convergence of the model.

$$\begin{aligned} {\tilde{X}}=\frac{X-{\bar{X}}}{\sqrt{\sum (X-{\bar{X}})^{2}}} \end{aligned}$$
(3)

where \({\tilde{X}}\) is the standardized feature sequence and \({\bar{X}}\) is the mean of the feature sequence. The standardized features are used as the input for wavelet decomposition.
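The standardization of Eq. (3) is straightforward to express in Python; the stand-in price series below is an assumption for illustration.

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    # Eq. (3): centre on the mean, scale by the root of the
    # summed squared deviations
    xbar = x.mean()
    return (x - xbar) / np.sqrt(np.sum((x - xbar) ** 2))

close = np.cumsum(np.random.default_rng(1).normal(size=256))  # stand-in prices
close_std = standardize(close)
```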

3.2 Wavelet decomposition

The wavelet transform is a sequence analysis method with time–frequency localization, whose key operation is wavelet decomposition (Zhang and Benveniste 1992). When a random sequence f(t) is processed by the wavelet transform, the scaling and translation operations are usually carried out at discrete points, i.e., the scaling factor a and translation factor b are discretized; this is the discrete wavelet transform (Ribeiro et al. 2019):

$$\begin{aligned} W_{f}(m,n)=a_{0}^{-\frac{m}{2}}\int f(t)\varPsi (a_{0}^{-m}t-nb_{0})\hbox {d}t \end{aligned}$$
(4)

where \(\varPsi (t)\) is the wavelet basis function and \( W_{f}(m,n)\) is the wavelet transform coefficient.

The second step of the model is to decompose the standardized sequence into frequency components by the discrete wavelet transform. The low-frequency and high-frequency signals generated by decomposition layer i are denoted \({\tilde{X}}_{T}^{L}(i)\) and \({\tilde{X}}_{T}^{H}(i)\), respectively. The low-frequency signal then enters the next layer and is decomposed into \({\tilde{X}}_{T}^{L}(i+1)\) and \({\tilde{X}}_{T}^{H}(i+1)\).

Fig. 3 Wavelet decomposition graph

The decomposition of the feature sequences is illustrated in Fig. 3a. \({\tilde{X}}=\left\{ {\tilde{X}}_{1}, {\tilde{X}}_{2},\ldots ,{\tilde{X}}_{T}\right\} \) denotes the standardized sequence up to time T; C stores the decomposed low-frequency and high-frequency information, and L records the lengths of the decomposed feature sequences in C. Wavelet decomposition is applied to the original sequence or to the low-frequency sequence of the previous layer; the result is one low-frequency sequence and one high-frequency sequence, each roughly half the length of the sequence it was decomposed from. Extensive experiments in this paper show that the optimal number of decomposition layers is 2, and the 2-layer wavelet decomposition of a single feature is shown in Fig. 3b.
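A sketch of this two-layer decomposition using the PyWavelets package follows; the Daubechies-4 mother wavelet and the stand-in sequence are assumptions, since the paper fixes the number of layers but not the wavelet basis.

```python
import numpy as np
import pywt

x = np.cumsum(np.random.default_rng(2).normal(size=256))  # standardized feature stand-in

# wavedec returns [cA2, cD2, cD1]: the layer-2 low-frequency component
# and the layer-2 / layer-1 high-frequency components; each level
# roughly halves the sequence length, matching Fig. 3
low_2, high_2, high_1 = pywt.wavedec(x, "db4", level=2)
```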

Fig. 4 Characteristic frequency components

The decomposition results of the standardized feature sequences are shown in Fig. 4, where Fig. 4a–c correspond, respectively, to the 2-layer low-frequency components, the 2-layer high-frequency components and the 1-layer high-frequency components of the six features. These frequency components retain most of the valid information of the original data.

3.3 Model prediction

LSTM is an excellent time domain prediction model; the network here comprises an input vector, two LSTM hidden layers, a dense layer and an output layer (Nelson et al. 2017). In theory, more LSTM layers give stronger nonlinear fitting ability, but training time grows accordingly, so a configuration balancing accuracy and time is generally chosen. In this paper, a 2-layer LSTM achieves good results in little time. The first LSTM layer has 128 neurons and the second has 64, which reduces the volume of data flowing through the network and the interference of redundant data. The role of the adaptive layer is to assign weights to the frequency components, highlighting the influence of each component on the prediction target.
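The following Keras sketch shows a per-component predictor with this 128/64 stacked-LSTM structure; the optimizer, window size and feature count are assumptions not fixed by the paper.

```python
import tensorflow as tf

def build_component_lstm(window: int = 3, n_features: int = 6) -> tf.keras.Model:
    """Two stacked LSTM layers (128 and 64 units) with a dense output;
    one such network is trained per frequency component."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1),
    ])

model = build_component_lstm()
model.compile(optimizer="adam", loss="mse")  # MSE objective of Eq. (5)
```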

Fig. 5 Prediction model

As shown in Fig. 5, the third step of the model is to feed the low-frequency and high-frequency components obtained in Sect. 3.2 into the LSTM as separate sequences. The model is trained with the following objective:

$$\begin{aligned}&\zeta ^{f}=\frac{1}{T}\sum _{t=1}^{T}({\hat{y}}_{t}-y_{t})^{2} \end{aligned}$$
(5)
$$\begin{aligned}&\varTheta \leftarrow \varTheta -\eta \frac{\partial \zeta (\varTheta )}{\partial \varTheta } \end{aligned}$$
(6)

where \({\hat{y}}_{t}\) is the predicted value produced by the model and \(y_{t}\) is the actual value. Equation (6) is the optimization rule of the model, where \(\varTheta \) denotes the model parameters and \(\eta \) is an adjustable learning rate. The model uses the BPTT algorithm to update the parameters iteratively until each LSTM predicts the frequency information of one group of original sequences. The predicted frequency components then enter the adaptive layer, which weights the different components:

$$\begin{aligned} g=f(W_T{\hat{y}}+b) \end{aligned}$$
(7)

where f is the activation function of the adaptive layer, \({\hat{y}}\) is the vector of predicted frequency components, \(W_T\) and b are the trainable weight and bias, and g is the weighted frequency characteristic.
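A minimal Keras rendering of Eq. (7) is shown below; the sigmoid activation and the three-component input are assumptions for illustration.

```python
import tensorflow as tf

# One scalar prediction per frequency component: [low_2, high_2, high_1]
components = tf.keras.layers.Input(shape=(3,))
g = tf.keras.layers.Dense(3, activation="sigmoid")(components)  # g = f(W y_hat + b)
adaptive_layer = tf.keras.Model(components, g)
```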

The last step of the AWTM model is wavelet reconstruction: the inverse transform of Eq. (4) fuses the predicted frequency components into the predicted stock sequence, which is output as the final result.
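With PyWavelets, reconstruction is the inverse call of the decomposition sketch above; here the true components are reused only to show the call shape, whereas in AWTM the inputs would be the weighted predictions from Sect. 3.3.

```python
import numpy as np
import pywt

x = np.cumsum(np.random.default_rng(3).normal(size=256))  # stand-in sequence
low_2, high_2, high_1 = pywt.wavedec(x, "db4", level=2)

# Inverse discrete wavelet transform: fuse components back into a sequence
reconstructed = pywt.waverec([low_2, high_2, high_1], "db4")
```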

4 Experiment

This section is divided into three subsections. Section 4.1 discusses model complexity, Sect. 4.2 compares AWTM with baseline models, and Sect. 4.3 gives an interpretive analysis of the model. Considering the impact of the 2008 global financial crisis on financial markets, extensive experiments were carried out on stock data from 2009 to date to demonstrate the performance of the model. For quantitative comparison of the results, this paper uses the mean absolute percentage error (MAPE) and root-mean-square error (RMSE) to evaluate the models.
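For reference, the two metrics are computed as in the sketch below (y_true and y_pred are aligned one-dimensional arrays).

```python
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean absolute percentage error, in percent
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Root-mean-square error
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```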

4.1 Complexity analysis

The complexity of the model has a direct impact on the predictive ability. The complexity of the model in this paper can be measured by two parameters: wavelet decomposition layers and time step. In the following sections, we examine the impact of two parameters on predictive performance and reveal some insights into AWTM parameter settings.

Fig. 6 Parameter comparison chart

The effect of the number of wavelet decomposition layers on the model is shown in Fig. 6a. Both the 1-layer decomposition and the 3-layer decomposition can capture the change trend of the sequence, but the 2-layer decomposition has a better prediction effect.

Table 1 AWTM wavelet decomposition layer error analysis

As the error analysis in Table 1 shows, AWTM's predictions under the 1-layer and 3-layer decompositions have large errors. The reason is that the 1-layer decomposition does not extract the frequency information fully, while the extra frequency components generated by the 3-layer decomposition introduce noise and increase the experimental error during prediction and reconstruction. Therefore, the 2-layer decomposition is adopted in this model.

As shown in Fig. 6b, the fitting curves of the 1-step and 3-step predictions basically coincide, but the 5-step prediction curve deviates considerably from the true-value curve. Table 2 lists the MAPE and RMSE values of AWTM for 1-step, 3-step and 5-step prediction; the experimental error of the 3-step prediction is small, while that of the 5-step prediction is large. This is because the amount of information the cells must memorize grows during 5-step prediction, exceeding their capacity.

Table 2 AWTM time step error analysis

The above two experiments show that the wavelet decomposition depth and the time step strongly influence AWTM's prediction results. Based on extensive experiments, the time step is set to 3 and the number of wavelet decomposition layers to 2.

4.2 Baseline model comparison

To further verify the performance of the model, AWTM was compared with the following baseline models: ARMA, CNN, and RNN.

Fig. 7 Baseline comparison chart

Figure 7 plots the predicted and actual values of the above models. All four algorithms can fit the future trend of the sequence, but the degree of fit differs considerably. The prediction curve of AWTM is the most accurate of the four models and is basically consistent with the trend of the actual curve. Table 3 summarizes the errors between the actual and predicted values of the different models: the MAPE and RMSE of AWTM are lower than those of the other three models. It is worth noting that AWTM's fitting curve deviates less at peaks and troughs than the other models, which highlights AWTM's excellent performance on non-stationary sequences.

Table 3 Baseline error analysis

4.3 Model interpretive analysis

Traditional models use a single feature as the model input for prediction. In this paper, the XGBoost algorithm selects high-importance features for multivariate stock prediction. To verify the necessity of multivariate prediction, LSTM and AWTM are each run with single-feature and multi-feature inputs. Figure 8 shows the fitted single-feature and multi-feature prediction curves of these models; the multi-feature predictions are closer to the real values and show less hysteresis. The error analysis in Table 4 shows that the multi-feature prediction error of both LSTM and AWTM is smaller than the corresponding single-feature error, which again reflects the superiority of multi-feature prediction.

Fig. 8 Comparison of single-feature and multi-feature model prediction

Table 4 Characteristic error analysis

To validate the feasibility of adaptive frequency domain modeling, AWTM is compared with LSTM, Wavelet-LSTM (Sugiartawan et al. 2017) and mLSTM (Wang et al. 2018). LSTM is a classical neural network applied to stock prediction. Wavelet-LSTM is a recently proposed method that uses the wavelet transform and LSTM to model frequency information. mLSTM is a new neural network that embeds the wavelet transform into LSTM.

Fig. 9 Model comparison chart

To ensure a fair and valid comparison, all models in this experiment use multi-feature prediction with otherwise identical parameters. As Fig. 9 shows, AWTM has the highest curve-fitting accuracy and the smallest hysteresis, and its curve basically coincides with the real sequence. The error analysis in Table 5 likewise shows that the MAPE and RMSE of AWTM are only 1.6556 and 0.5604, the minimum values among the above models. These experiments demonstrate the feasibility and superiority of AWTM for multi-feature stock modeling.

Table 5 Interpretative error analysis

5 Conclusion and future work

The frequency information of a stock reflects trading patterns of different regularity, and discovering these frequency patterns provides useful clues to future trends. To explore the multi-frequency patterns of stocks, this paper designs an adaptive wavelet transform model (AWTM). AWTM adopts the XGBoost algorithm to realize multi-feature prediction and combines the wavelet transform with LSTM to predict the different frequency components of stocks. The main idea of AWTM is the added adaptive layer, through which the model automatically focuses on different frequency components according to the dynamic evolution of the input sequence and reveals the multi-frequency patterns of stocks. The performance of the model is tested on real S&P 500 market data. Experimental results show that AWTM achieves higher prediction accuracy and less hysteresis than other traditional models.

However, our work still has some shortcomings. For example, only the time–frequency information of stocks is considered; other textual information, such as news and comments, is not mined. Future research will continue to explore the joint use of time–frequency information and textual information to achieve better prediction results.