1 Introduction

Accurate stock market multimedia (chart) prediction is considered impossible by the old school of thought. The Efficient Market Hypothesis states that stock prices reflect all currently available information, so new information leads to unpredictable price movements. Random walk theory similarly concludes that stock prices cannot be accurately predicted from historical values [17].

However, the attraction of good returns has led to a myriad of methods for price prediction, and abundant research has been done in this area over the last three decades. Researchers nevertheless still regard prediction on non-linear, non-stationary financial time series as one of the most challenging tasks. Several mathematical models have been developed, but the results remain unsatisfying [21]. Studies focusing on forecasting the stock markets have mostly been preoccupied with forecasting volatilities [6].

Professional traders use fundamental and technical analysis for price prediction, and there is a plethora of literature suggesting different methods for predicting stock prices and indices. The fundamental approach is the traditional one and relies on company parameters [19]. Technical analysis is based on Dow Theory [19] and uses price history for prediction; it ranges from traditional statistical modelling to methods based on artificial intelligence and machine learning [28]. In the literature, many ANN models have been evaluated against statistical models for stock prediction. ANN has also been compared with different data mining classification algorithms [9, 28], and the comparisons suggest that ANN models give better results [6]. The literature suggests that the first neural network for stock market prediction was given by White [29]. Classical ANNs were mostly used for stock prediction in the later part of the last century, and about ten years ago researchers focused on applying the Multi-Layer Perceptron (MLP) to stock prediction [20, 24]. Of late, different variants of ANN in hybrid models have been applied to stock market prediction. Atsalakis et al. [1] surveyed ANNs applied to stock prediction but did not point out the feature extraction methods used. A Genetic Algorithm has been used to optimise an RNN for stock forecasting [13], and the Artificial Fish Swarm Algorithm (AFSA) has been used to optimise an RBFNN for stock prediction [23]. An extreme learning machine has also been used to construct a decision support system for stock price prediction and trading strategies [25]. Moreover, different feature extraction methods have been combined with ANNs: Curvilinear Component Analysis together with an RBFNN was applied to the Bel20 stock market index [16], a fusion technique was used to model stock data in the Indian context [22], and very recently the combination of feature extraction using 2-Directional 2-Dimensional Principal Component Analysis ((2D)2PCA) with an RBFNN has been applied to stock price prediction [4].

Some studies, however, have shown that ANN has a few drawbacks and is not well suited to stock prediction, because stock market data contains enormous noise and complex dimensionality, and ANN exhibits inconsistent and unpredictable behaviour on such data [23]. Most neural networks have a shallow architecture and are thus designed with one hidden layer, one reason being the lack of a successful training strategy for multi-layer networks. Another problem is that many neural networks tend to fall into a local optimum and thus over-fit [14]. Deep architectures can overcome these problems [14] and have already yielded promising performance in many fields, including language [30], speech [18] and image processing [8, 10, 34]. Hinton first proposed the idea of deep learning and harnessed the power of models beyond three-level nets [7]. According to some recent papers, DNNs can give better approximations to non-linear functions than shallow models [15, 26]. DNNs have been applied to some time series forecasting problems and have shown good results [11, 12].

Recent literature suggests that researchers are attempting to use deep learning for stock prediction. Its successful application in the speech domain [18] has led to the idea that, since speech is time series data and stock data is also a time series, the same methods can be used. However, DNN techniques have not yet been fully harnessed: one paper uses autoencoders to extract features from the input variables for stock trading strategies [27].

In this paper DNN is introduced as a classifier for stock multimedia (chart) trend prediction, and when compared with state-of-the-art techniques [23] applied to stocks it performs better. Section 2 describes the research methodology suggested for stock prediction. Section 3 provides the framework for the proposed model, depicting its working through a block diagram and describing the different features of deep learning used in the model. Section 4 implements the framework and describes the experimental setup. Section 5 presents the results and analysis, showing that DNN performs better than some of the latest techniques applied in this domain. Finally, Section 6 carries the conclusion and future directions.

2 Research methodology

The proposed methodology uses (2D)2PCA for dimensionality reduction [33] and thereafter uses DNN as a predictor. For (2D)2PCA, the N samples in I consist of \( \left\{ I_{11}, I_{12}, \ldots, I_{1n}, \ldots, I_{m1}, I_{m2}, \ldots, I_{mn} \right\} \), where N = m*n, and the covariance can be defined as

$$ Cov(I)=\frac{1}{N}{\displaystyle \sum_{i=1}^m{\displaystyle \sum_{j=1}^n\left({I}_{ij}-\overline{I}\right)*{\left({I}_{ij}-\overline{I}\right)}^T}} $$
(1)

where \( \overline{I}=\frac{1}{N}{\displaystyle \sum_{i=1}^m{\displaystyle \sum_{j=1}^n{I}_{ij}}} \) is the mean of all samples. Singular value decomposition (SVD) is used to compute the projection subspace \( V_d \) corresponding to the d largest eigenvalues. The feature matrix Y for (2D)PCA is obtained by (2):

$$ {Y}_i = {I}^T*\ {V}_d $$
(2)

Next, \( Y^T \) is used as the new training sample in place of I and the process is repeated to compute the output of (2D)2PCA as \( Z_i \). The output of (2D)2PCA is fed to the DNN.
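For concreteness, a minimal sketch of the reduction described by (1) and (2) is given below, using the eigendecomposition of the two directional covariance matrices (equivalent to the SVD step mentioned above). The function and argument names (twoD2pca, d, p) are illustrative and not from the original implementation.

```r
# Illustrative sketch of (2D)^2 PCA, assuming X is a list of training
# sample matrices (one window of technical indicators per sample) and
# d, p are the target column/row dimensions.
twoD2pca <- function(X, d, p) {
  Xbar <- Reduce(`+`, X) / length(X)                     # mean sample, cf. eq. (1)
  # column-direction covariance and its d leading eigenvectors (right projection)
  covR <- Reduce(`+`, lapply(X, function(A) t(A - Xbar) %*% (A - Xbar)))
  V <- eigen(covR, symmetric = TRUE)$vectors[, 1:d, drop = FALSE]
  # row-direction covariance and its p leading eigenvectors (left projection)
  covL <- Reduce(`+`, lapply(X, function(A) (A - Xbar) %*% t(A - Xbar)))
  U <- eigen(covL, symmetric = TRUE)$vectors[, 1:p, drop = FALSE]
  # project each sample from both sides, i.e. eq. (2) applied in both directions
  Z <- lapply(X, function(A) t(U) %*% A %*% V)           # each Z_i is p x d
  list(Z = Z, U = U, V = V)
}
```

With d = p = 10, each window matrix is reduced to a 10 × 10 feature matrix, matching the smallest reduced size used later in the experiments.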

DNN is a multi-layer feed-forward neural network trained with supervised learning, as shown in Fig. 1. Here \( X_i \) are the nodes in the input layer; \( Y_j \) represent the neurons in the 1st hidden layer, which uses the hyperbolic tangent function for computation; and \( Z_k \) represent the neurons in the 2nd hidden layer, which again uses the hyperbolic tangent function. Finally, the output layer has two nodes \( P_l \), which use the softmax function for classification and a linear function for regression. The hyperbolic tangent is the activation function of the network. \( U_{ij} \) are the weights connecting the input and 1st hidden layer and \( b_j \) are the biases of the 1st hidden layer; \( V_{jk} \) are the weights connecting the 1st and 2nd hidden layers and \( c_k \) are the biases of the 2nd hidden layer; and \( W_{kl} \) are the weights connecting the 2nd hidden layer and the output layer, with \( d_l \) the biases of the output layer.

Fig. 1
figure 1

Forward Propagation of a 4 layered DNN

$$ {Y}_j=f\left({X}_i,{U}_{ij},{b}_j\right)= tanh\left\{\left({\displaystyle \sum_{i=1}^3{X}_i*{U}_{ij}}\right)+{b}_j\right\}=\frac{e^{\left\{\left({\displaystyle \sum_{i=1}^3{X}_i*{U}_{ij}}\right)+{b}_j\right\}}-{e}^{-\left\{\left({\displaystyle \sum_{i=1}^3{X}_i*{U}_{ij}}\right)+{b}_j\right\}}}{e^{\left\{\left({\displaystyle \sum_{i=1}^3{X}_i*{U}_{ij}}\right)+{b}_j\right\}}+{e}^{-\left\{\left({\displaystyle \sum_{i=1}^3{X}_i*{U}_{ij}}\right)+{b}_j\right\}}} $$
(3)
$$ {Z}_k={f}_1\left({Y}_j,{V}_{jk},{c}_k\right)= tanh\left\{\left({\displaystyle \sum_{j=1}^4{Y}_j*{V}_{jk}}\right)+{c}_k\right\}=\frac{e^{\left\{\left({\displaystyle \sum_{j=1}^4{Y}_j*{V}_{jk}}\right)+{c}_k\right\}}-{e}^{-\left\{\left({\displaystyle \sum_{j=1}^4{Y}_j*{V}_{jk}}\right)+{c}_k\right\}}}{e^{\left\{\left({\displaystyle \sum_{j=1}^4{Y}_j*{V}_{jk}}\right)+{c}_k\right\}}+{e}^{-\left\{\left({\displaystyle \sum_{j=1}^4{Y}_j*{V}_{jk}}\right)+{c}_k\right\}}} $$
(4)
$$ {P}_l={f}_2\left({Z}_k,{W}_{kl},{d}_l\right)= softmax\left\{\left({\displaystyle \sum_{k=1}^4{Z}_k*{W}_{kl}}\right)+{d}_l\right\}=\frac{e^{\left\{\left({\displaystyle \sum_{k=1}^4{Z}_k*{W}_{kl}}\right)+{d}_l\right\}}}{{\displaystyle \sum_{{l}^{\prime }=1}^2{e}^{\left\{\left({\displaystyle \sum_{k=1}^4{Z}_k*{W}_{k{l}^{\prime }}}\right)+{d}_{{l}^{\prime }}\right\}}}} $$
(5)

Learning occurs when these weights are adapted to minimize the error on labelled training data. The loss (error) function, which is the objective function, is minimized depending on whether the model terminates in linear regression or classification. W is the collection \( \{W_i\}_{1:N-1} \), where \( W_i \) denotes the weight matrix connecting layers i and i + 1 for a network of N layers; B is the collection \( \{b_i\}_{1:N-1} \), where \( b_i \) denotes the column vector of biases for layer i + 1.
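A compact sketch of the forward pass in (3)-(5) is given below for the classification variant with a softmax output (the regression variant simply drops the softmax). The layer sizes in the usage lines mirror Fig. 1 and the equations above and are illustrative only.

```r
# Forward propagation for the 4-layer DNN of Fig. 1: two tanh hidden
# layers (eqs. 3-4) followed by a softmax output layer (eq. 5).
# X: samples x input features; U, V, W: weight matrices; b, c, d: bias vectors.
forward <- function(X, U, b, V, c, W, d) {
  Y <- tanh(sweep(X %*% U, 2, b, "+"))   # eq. (3)
  Z <- tanh(sweep(Y %*% V, 2, c, "+"))   # eq. (4)
  S <- sweep(Z %*% W, 2, d, "+")
  P <- exp(S) / rowSums(exp(S))          # eq. (5): softmax over the output nodes
  list(Y = Y, Z = Z, P = P)
}

# Example with 3 inputs, two hidden layers of 4 units and 2 output nodes
set.seed(1)
X   <- matrix(rnorm(6), 2, 3)
out <- forward(X, matrix(rnorm(12), 3, 4), rep(0, 4),
                  matrix(rnorm(16), 4, 4), rep(0, 4),
                  matrix(rnorm(8),  4, 2), rep(0, 2))
rowSums(out$P)   # each row of P sums to 1
```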

The model given in Fig. 1 is a regression problem. For regression the loss function is given below:

$$ Mean\kern0.5em Squared\kern0.5em Error=L\left(W,\kern0.5em \left.B\right|j\right)=\frac{1}{2}{\displaystyle \sum_{j=1}^n{\left({y}_j-{\widehat{y}}_j\right)}^2} $$
(6)

Here, \( y_j \) is the actual output and \( \widehat{y}_j \) is the predicted output, where j indexes the n training examples. The loss function for classification is given below:

$$ Cross\kern0.5em Entropy=L\left(W,\left.B\right|j\right)=-{\displaystyle \sum_{j=1}^n\left[ ln\left({\widehat{y}}_j\right)*{y}_j+ ln\left(1-{\widehat{y}}_j\right)*\left(1-{y}_j\right)\right]} $$
(7)
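For reference, (6) and (7) translate directly into code; y and yhat are assumed to be vectors of actual and predicted outputs over the training examples.

```r
# Direct transcriptions of the two objective functions.
mse_loss <- function(y, yhat) 0.5 * sum((y - yhat)^2)               # eq. (6)

cross_entropy_loss <- function(y, yhat)                             # eq. (7)
  -sum(y * log(yhat) + (1 - y) * log(1 - yhat))
```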

In order to update the weights and biases of the network, the supervised training algorithm Stochastic Gradient Descent (SGD) is used. The following process is iterated until the convergence criteria are met: first, W and B are initialized, and then they are updated according to the following equations.

$$ {w}_{jm}={w}_{jm}-\alpha *\frac{\partial L\left(W\left.,B\right|j\right)}{\partial {w}_{jm}} $$
(8)
$$ {b}_{jm}={b}_{jm}-\alpha *\frac{\partial L\left(W\left.,B\right|j\right)}{\partial {b}_{jm}} $$
(9)

Here, α is the learning rate and \( w_{jm} \) is the weight for the m-th neuron connecting layers j and j + 1; similarly, \( b_{jm} \) is the bias for the m-th neuron connecting layers j and j + 1. The gradient \( \frac{\partial L\left(W\left.,B\right|j\right)}{\partial {w}_{jm}} \) is computed using backward propagation. The chain rule is used to compute this derivative; for the last (output) layer the computation is shown below:

$$ \frac{\partial L\left(W\left.,B\right|j\right)}{\partial {w}_{jm}}=\frac{\partial L\left(W\left.,B\right|j\right)}{\partial {f}_2\left({Z}_k,{w}_{kl},{d}_l\right)}*\frac{\partial {f}_2\left({Z}_k,{w}_{kl},{d}_l\right)}{\partial \left({\displaystyle \sum_{k=1}^4{Z}_k*{w}_{kl}+{d}_l}\right)}*\frac{\partial \left({\displaystyle \sum_{k=1}^4{Z}_k*{w}_{kl}+{d}_l}\right)}{\partial {w}_{jm}} $$
(10)
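The sketch below illustrates one SGD step, (8)-(10), restricted to the output layer of the regression variant (linear output with the MSE loss of (6)); the gradients for the hidden layers follow from the same chain rule but are omitted for brevity. The name sgd_output_step is illustrative.

```r
# One SGD update of the output-layer weights W and bias d, assuming a
# linear output and the MSE loss of eq. (6). Z holds the activations of
# the 2nd hidden layer (samples x units), y the target values.
sgd_output_step <- function(Z, y, W, d, alpha = 0.01) {
  p     <- as.vector(Z %*% W + d)       # network output
  err   <- p - y                        # dL/dp for L = 0.5 * sum((y - p)^2)
  gradW <- t(Z) %*% err                 # chain rule of eq. (10)
  gradd <- sum(err)
  list(W = W - alpha * gradW,           # eq. (8)
       d = d - alpha * gradd)           # eq. (9)
}
```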

3 Proposed method

In this paper, we have two purposes: one is to introduce DNN as the proposed model for stock prediction, and the other is to demonstrate that this method gives improved results compared to a state-of-the-art method. Figure 2 represents the proposed model.

Fig. 2
figure 2

Framework of the proposed model

3.1 Data collection

The data is collected from NASDAQ and the prediction is done for an individual stock, because index data does not consider firm characteristics and company-wise prediction is more useful for investors [20]. Therefore the data is collected for the stock multimedia (chart) of Google, an American multinational technology company specializing in Internet-related services and products. Our goal is to consider a time period long enough to capture a high diversity of price movements and also to avoid data snooping. The data set used for the experiment runs from August 19, 2004 to December 10, 2015; hence, the model is built for 2843 trading days. The data set is further divided into a training set and a testing set: the training set consists of data from August 19, 2004 to May 31, 2011 and the testing set from June 1, 2011 to December 10, 2015.

In this problem, each record of the data set includes daily information consisting of the closing price, the highest price, the lowest price, and the opening price, denoted at day t as \( x(t) \), \( x_h(t) \), \( x_l(t) \) and \( x_o(t) \) respectively. Other technical analysis parameters used as input include leading, lagging and trend-change indicators, to obtain a composite result.

This paper uses the 36 input variables employed in the literature [23]. The variables \( I_1 \) to \( I_{36} \) are computed according to the equations in Table 1; however, two parameters have been replaced with Bollinger Bands, which compare volatility and relative price levels [28]. When used as input to the model, these variables provide the forecast of the closing price on the next day. The forecast is short-term because data far from the forecasting date provides less and less information useful for forecasting [16].

Table 1 Input variables for the stock market data set
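The exact formulas of Table 1 are not reproduced here; as an illustration of the kind of indicator involved, the sketch below computes the Bollinger Bands mentioned above under their common definition (an n-day moving average of the closing price plus/minus k rolling standard deviations), which may differ in detail from the variant used in the paper.

```r
# Common-definition Bollinger Bands on the closing-price series x(t).
bollinger <- function(close, n = 20, k = 2) {
  roll <- function(f) sapply(seq_along(close), function(t)
    if (t < n) NA else f(close[(t - n + 1):t]))
  mid <- roll(mean)                     # n-day simple moving average
  sdv <- roll(sd)                       # n-day rolling standard deviation
  data.frame(lower = mid - k * sdv, middle = mid, upper = mid + k * sdv)
}
```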

3.2 Dimension reduction

(2D)2PCA is used to reduce the dimensions of the data set [33], as it projects the original raw data matrix into a projection matrix. There is some loss of information due to this method, but the processing time improves and the convergence speed of the model increases manyfold; on a large data set such loss of information does not cause much variation in the output. The output of (2D)2PCA is fed to the DNN input nodes as shown in Fig. 2.

3.3 Forecasting

The forecasting is done in two phases: in the first phase, training is performed to compute the weights W and biases B of the model; in the second phase, testing is done, where W and B are used to compute the output. Before this, the output of the (2D)2PCA is normalized [23] according to (11) to bring it into the range [0, 1].

$$ {Z}_{ij}=\frac{Z_{ij}- min\left({Z}_i\right)}{max\left({Z}_i\right)- min\left({Z}_i\right)} $$
(11)

Here in (11), \( Z_i \) denotes the output of (2D)2PCA and \( Z_{ij} \) is the normalized output, which is used as the input to the DNN.
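A one-line sketch of (11), applied here to each feature vector (taken row-wise) of the (2D)2PCA output:

```r
# Min-max normalization of eq. (11): each row of Z is rescaled to [0, 1].
normalize01 <- function(Z) {
  t(apply(Z, 1, function(z) (z - min(z)) / (max(z) - min(z))))
}
```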

3.4 Regularization

DNN is a complicated network that uses a large number of parameters, and as the complexity of the model increases the bias decreases but the variance increases. In order to balance this bias-variance trade-off, regularization is used; it makes the model simpler. Further, it reduces the variance by constraining the weights and biases and driving a few of them to 0, thus reducing the generalization error (the error rate observed on validation data) and avoiding over-fitting of the model [3, 5, 31].

The model uses the ℓ1 and ℓ2 norms for regularization, which modify the loss function as shown below:

$$ L\hbox{'}\left(W,\left.B\right|j\right)=L\left(W,\left.B\right|j\right)+{\lambda}_1{R}_1\left(W,\left.B\right|j\right)+{\lambda}_2{R}_2\left(W,\left.B\right|j\right) $$
(12)

For ℓ1 regularization, \( R_1(W,B|j) \) is the sum of the absolute values of all weights and biases; for ℓ2 regularization, \( R_2(W,B|j) \) is the sum of the squares of all weights and biases. The constants \( \lambda_1 \) and \( \lambda_2 \) are chosen to be very small, for example \( 10^{-5} \).
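In code, (12) simply adds the two penalties to the base loss; params is assumed to be a list of all weight matrices and bias vectors of the network.

```r
# Elastic-net style penalty of eq. (12) added to the unregularized loss.
regularized_loss <- function(base_loss, params,
                             lambda1 = 1e-5, lambda2 = 1e-5) {
  theta <- unlist(params)                          # all weights and biases
  base_loss + lambda1 * sum(abs(theta)) +          # R1: l1 term
              lambda2 * sum(theta^2)               # R2: l2 term
}
```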

3.5 Advanced optimization

The adaptive learning rate algorithm ADADELTA [32] automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence; it predicts stock prices very fast and gives more accurate results. Learning rate annealing is a heuristic approach; its drawback is that training tends to slow down at local minima instead of moving fast whenever suitable, and moreover the same learning rate is applied to all dimensions of the parameters [32].

Momentum is a per-dimension training method and an improvement over SGD. The gradients along the valley towards the minimum are much smaller, but since they point in the same direction they keep accumulating, hence speeding up the training.

$$ {w}_{t+1}={w}_t+\varDelta {w}_t $$
(13)
$$ {b}_{t+1}={b}_t+\varDelta {b}_t $$
(14)
$$ \varDelta {w}_t=\rho \varDelta {w}_{t-1}-\eta \frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t} $$
(15)
$$ \varDelta {b}_t=\rho \varDelta {b}_{t-1}-\eta \frac{\partial L\left(W,\left.B\right|t\right)}{\partial {b}_t} $$
(16)

Here, ρ is a constant which controls the decay of the previous parameter updates and η is a global learning rate shared by all dimensions. ADAGRAD uses an update rule for the bias b and the weight w which is given in (17)

$$ \varDelta {w}_t=-\frac{\eta }{\sqrt{{\displaystyle \sum_{i=1}^t{\left(\frac{\partial L\left(W,\left.B\right|i\right)}{\partial {w}_i}\right)}^2}}}*\left(\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t}\right) $$
(17)

The ADAGRAD method is sensitive to the choice of the learning rate η, and since the denominator is a continual accumulation of squared gradients the effective learning rate will continue to decay. As suggested by Zeiler [32], ADADELTA is used to overcome these limitations. ADADELTA accumulates the gradient over a certain window, and the denominator uses a local estimate of recent gradients, where at time t the running average is given by (18)

$$ E\left[{\left(\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t}\right)}^2\right]=\rho E\left[{\left(\frac{\partial L\left(W,\left.B\right|t-1\right)}{\partial {w}_{t-1}}\right)}^2\right]+\left(1-\rho \right)*{\left(\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t}\right)}^2 $$
(18)
$$ \varDelta {w}_t=-\frac{\eta }{\sqrt{E\left[{\left(\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t}\right)}^2\right]}+\varepsilon }*\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t} $$
(19)

In order to match the units of the numerator and the denominator, an additional term is introduced in (19), which becomes

$$ \varDelta {w}_t=-\frac{\sqrt{E\left[\varDelta {w_{t-1}}^2\right]}+\varepsilon }{\sqrt{E\left[{\left(\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t}\right)}^2\right]}+\varepsilon }*\frac{\partial L\left(W,\left.B\right|t\right)}{\partial {w}_t} $$
(20)
$$ E\left[\varDelta {w_t}^2\right]=\rho E\left[\varDelta {w_{t-1}}^2\right]+\left(1-\rho \right)*\left(\varDelta {w_t}^2\right) $$
(21)

ADADELTA is thus an improvement over these two methods as it avoids the manual selection of the learning rate hyperparameter. Since it is difficult to estimate suitable learning rates for a DNN with a deep architecture, ADADELTA gives better results for DNN training.
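A minimal sketch of the per-parameter ADADELTA update, (18)-(21), is given below; Eg2 and Edx2 are the running averages of squared gradients and squared updates, both assumed to be initialised to zero matrices of the same shape as W.

```r
# One ADADELTA step for a parameter matrix W given its current gradient.
adadelta_step <- function(W, grad, Eg2, Edx2, rho = 0.99, eps = 1e-8) {
  Eg2  <- rho * Eg2 + (1 - rho) * grad^2                  # eq. (18)
  dW   <- -(sqrt(Edx2) + eps) / (sqrt(Eg2) + eps) * grad  # eq. (20)
  Edx2 <- rho * Edx2 + (1 - rho) * dW^2                   # eq. (21)
  list(W = W + dW, Eg2 = Eg2, Edx2 = Edx2)                # eq. (13)
}
```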

4 Experimental setup

The purpose of this experiment is to predict the stock closing price and to compare the performance of this model with other ANN models. The latest literature shows that (2D)2PCA along with RBFNN has performed the best among all the other dimensionality reduction techniques combined with ANN [23]. Therefore, for this experiment, the dimensionality has been reduced using (2D)2PCA for the RNN, RBFNN and DNN models, in order to bring uniformity across the models. The resultant dimensions of (2D)2PCA have been chosen to be 10 × 10, 15 × 15 and 19 × 35 for a window size of 20, and therefore the 36 × 20 matrix is reduced to the above-mentioned dimensions. The window size is the number of days taken into account for predicting the next day's data; for example, a window size of 20 means data is taken for 20 days and the result is predicted for the 21st day.

Once the dimensionality is reduced, the output is computed for each day based on the deep learning equations. Both the input data and output data of the training set are passed to the deep learning method. The regularization parameters ℓ1 and ℓ2 are set to \( 10^{-5} \); this is found to be the best value over the range \( 10^n \) (n = -5, -6, …, -10). The loss function is set to the mean squared error and the regression stopping criterion is set to 0. For ADADELTA, the best ρ is selected from the three possible values 0.9, 0.99 and 0.999, the best ε value is selected from the range \( 10^n \) (n = -4, -5, -6, …, -10), and the number of epochs is set to 1000.
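The paper does not name the R package used for the DNN, but the hyperparameters above (tanh activation, ℓ1/ℓ2 penalties, ADADELTA's ρ and ε, 1000 epochs) map directly onto, for example, the h2o deep learning interface. The sketch below shows such a configuration under that assumption; the data frame and column names are hypothetical, and the hidden-layer sizes are assumed.

```r
library(h2o)
h2o.init()

train_hex <- as.h2o(train_df)                       # hypothetical training data frame
model <- h2o.deeplearning(
  x = setdiff(names(train_df), "close_next"),       # normalized (2D)^2PCA features
  y = "close_next",                                 # next-day closing price
  training_frame = train_hex,
  activation = "Tanh",
  hidden = c(200, 200),                             # two tanh hidden layers (sizes assumed)
  loss = "Quadratic",                               # mean squared error
  epochs = 1000,
  l1 = 1e-5, l2 = 1e-5,                             # regularization (Section 3.4)
  adaptive_rate = TRUE, rho = 0.99, epsilon = 1e-8) # ADADELTA (Section 3.5)
```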

For the RBFNN model, the Mean Squared Error goal is set to 0, and SPREAD is selected through repeated experiments, according to performance considerations, from the range \( m \times 10^n \) (m = 1, 2, …, 9; n = 1, 2, …, 9); the best SPREAD is found to be \( 6 \times 10^3 \). The maximum number of neurons is set equal to the total number of dimensions, i.e. 100 for 10 × 10. For Elman's RNN [2] the learning rate is 0.1, the number of units in the hidden layer is 10, and the maximum number of learning iterations is 1000. The performance is measured using different error measures such as Root Mean Square Error (RMSE), Hit Rate (HR) and Total Return (TR), which are listed in Table 2.

Table 2 Formula for different error measures
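Table 2 is not reproduced here; for readers who want to recompute the comparison, the sketch below uses the standard definitions of RMSE and Hit Rate (the fraction of days on which the predicted direction of the price change matches the actual direction), which may differ slightly from the paper's exact formulas.

```r
# Standard-definition error measures over the test period.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

hit_rate <- function(actual, pred)
  mean(sign(diff(actual)) == sign(diff(pred)))
```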

The experiment is conducted on a PC with 4 GB RAM and a 2 GHz CPU, running R version 3.2.2. However, since the state-of-the-art RBFNN technique was implemented on the MATLAB 7.10 (R2010a) platform [23], the same platform is used for RBFNN.

5 Results and analysis

The experimental results for different window sizes and dimensions of the model are shown in Fig. 3 and Tables 3, 4, 5, and 6, and Table 7 compares the performance of the proposed DNN with RBFNN and RNN. The results are drawn for window sizes of 20, 40, 60, 80 and 100 to test which window size gives the better result. Along with this, (2D)2PCA is used to reduce the dimensions of the input matrix (36 × 20 for window size 20) to lowest-range, middle-range and last-range dimensions.

Fig. 3
figure 3

Actual and Predicted closing prices for different window sizes and dimensions

Table 3 Errors measured for lowest range dimensions
Table 4 Errors measured for middle range dimensions
Table 5 Errors measured for last range dimensions
Table 6 Errors measured for window size 20
Table 7 Errors measured for various neural networks and DNN

In Fig. 3 the x-axis denotes the normalized closing price and the y-axis denotes the number of days. For window size 20, both the lowest-range dimensions 10 × 10 and the middle range 15 × 15 give better performance, as the actual and predicted lines are quite close; however, for the last-range dimensions 19 × 35 the results are not so satisfactory. For window size 40 the lowest-range dimensions 10 × 10 give better performance, whereas for the middle-range dimensions 25 × 25 and the last range 39 × 35 the results are not so satisfactory. For all the other window sizes the lowest-range dimensions 10 × 10 provide the best result. Our assumption, based on the literature [16], that short-term data provides a better forecast is hence confirmed, and the lowest-range data performs better than the middle range.

The errors are measured for each of the window sizes and reduced dimensions according to the equations given in Table 2. It is found that amongst the lowest-dimension matrices, i.e. 10 × 10, the best performance is for window size 20, as shown in Table 3. Similarly, amongst the middle-range dimension matrices the best performance is for window size 20, as shown in Table 4, and amongst the last-range dimension matrices the best performance is again for window size 20, as shown in Table 5. These results reinforce our assumption from the literature [16] that short-term forecasting is more accurate for stock prediction.

Since window size 20 performs the best amongst all window sizes, the results for its reduced dimensions are compared in Table 6. It is found that the 10 × 10 dimension matrix provides the best result: the total return is 1.36 for 10 × 10 versus 0.41 for 19 × 35, an improvement of more than 200%. Additionally, the forecasting accuracy measured by Hit Rate is 0.68 for 10 × 10 and 0.65 for 19 × 35, which is 4.4% better, and for the remaining error measures the 10 × 10 dimension again gives a better result.

Since window size 20 and dimensions 10 × 10 provide the best result, this configuration is used as input to the DNN, RBFNN and RNN, as displayed in Fig. 4.

Fig. 4
figure 4

Comparison of actual stock price and forecasted values from DNN, RBFNN and RNN

It is found that RNN performs very poorly, whereas the RBFNN and DNN predicted values are very close to the actual values. For perspective, the results of the DNN model for window size 20 and dimensions 10 × 10 show that the model uses a 4-layer network with 100 input units, 200 hidden-layer units and 1 output unit. The model's mean squared error is 6.43e-05 and the training time is 3 min and 51 s.

The comparative error measures for these neural networks are shown in Table 7. It is found that RNN is not a good performer, but DNN and RBFNN are very close. On a closer look, the Hit Rate performance is better for DNN, as it is 4.8% more accurate than RBFNN and 15.6% better than RNN; therefore DNN can be a better predictor for stock market trend prediction. The correlation coefficient between the actual and predicted return is 0.76 for DNN, 0.63 for RBFNN and 0.43 for RNN, demonstrating that DNN is 17.1% more highly correlated than RBFNN and 43.4% better than RNN.

6 Conclusion

This is the first work using deep learning for stock data forecasting. In this paper it is demonstrated that (2D)2PCA + deep learning on the Google dataset can improve the accuracy of stock multimedia (chart) prediction compared with conventional neural network methods combined with (2D)2PCA. The model has also been tested for varying window sizes and dimensions to improve accuracy, and it is found that window size 20 with dimension 10 × 10 gives the best results; for higher dimensions and large window sizes the deep learning method gives limited performance.

Experimental results confirm that the proposed model provides a promising method for stock trend prediction, as its Hit Rate is 4.8% more accurate than RBFNN and 15.6% better than RNN; therefore DNN can be a better predictor of stock market trends. The correlation coefficient between the actual and predicted return for DNN is 17.1% higher than for RBFNN and 43.4% higher than for RNN. It is also found that the proposed model does not give better results for Total Return and RMSE when compared with RBFNN. However, in future these measures could be improved with other deep learning techniques such as Deep Belief Networks, regularization, autoencoders and advanced optimizations. Finally, it would be interesting to investigate the effectiveness of deep learning in portfolio management and trading strategies.