1 Introduction

Accurate forecasting of stock prices provides valuable information for the investment decisions and economic planning of many economic agents. In particular, firms could plan their business activities and development strategies more efficiently given the tendency of their stock price movements. Investors, for their part, would be able to design optimal portfolios that outperform the stock market. The rationale behind such forecasting is that stock market metrics such as prices, trading volumes, transaction costs and timing conceal complex, predictive patterns of stock price movements. To date, several time-series techniques have been extensively used to forecast stock prices. They include, among others, ARIMA, GARCH-based volatility models, and regime-switching models [see Rapach and Zhou (2013) for a detailed literature review on this topic]. Past studies also make use of more complex algorithms such as the Bayesian network (Zuo and Kita 2012), jump-diffusion models (Christensen et al. 2012), fuzzy time-series models (Cheng and Yang 2018), variational mode decomposition (Lahmiri 2016), and data analytical prediction models (Oztekin et al. 2016).

Another important premise for stock price forecasting is whether or not the efficient market hypothesis (EMH) is a reliable paradigm. This cornerstone of financial economics has many implications for modern finance theories and asset pricing models at both national and international levels. As recalled by Ross (2005), the EMH is a neoclassical economic perspective on financial markets. The seminal paper by Fama (1970) defines informational efficiency as the state of a market in which all relevant information is embedded in current asset prices. It then becomes evident that the main element in this definition is the determination of the information set.Footnote 1 Two decades after his seminal paper, Fama (1991) reexamined the abundant evidence collected during that interval. The availability of data and the development of information and communication technologies gave rise to a large body of papers testing different ways of outsmarting the market by creating supposedly profitable trading rules. Fama (1991) thus includes tests for return predictability in the sense of weak-form efficiency. These tests are of diverse nature: economic, computational, or biologically inspired. Ultimately, if the EMH is an adequate description of market behavior, return forecasts (whether based on past returns alone or supplemented with other economic and financial variables) should be ruled out.

Our paper contributes to the related literature by testing the weak form of the EMH for the Istanbul Stock Exchange (also known as Borsa Istanbul), which is the only official stock exchange in Turkey.Footnote 2 It considers high-frequency data on individual stocks at different sampling frequencies and makes return forecasts by means of seven different methodologies: artificial neural networks (ANN), k-nearest neighbors (kNN), logistic regression (logistic), Naïve Bayes (NB), the random forest classifier (RF), the support vector machine (SVM), and extreme gradient boosting (XGBoost).

With these methodologies, we attempt to predict the direction of future prices (upward or downward movement). This prediction allows us to set a trading rule that triggers a buy (sell) signal in case of an upward (downward) forecast. Our main results indicate that trading strategies based on predicted price-direction changes could help investors earn additional returns, which should not be possible under the EMH. In particular, we show that, among the selected methodologies, RF and SVM provide the highest average accuracy rates and ideal profit ratios across different time scales and train/test set partitions. Moreover, kNN performs equally well, especially at the 60-min frequency. We also observe that logistic regression works well, especially at the 10-min sampling frequency, in terms of the ideal profit ratios.

The rest of the paper is organized as follows. Section 2 presents a brief review of recent advanced forecasting methods applied in economics and finance. Section 3 describes our main methodologies. Section 4 explains the sample data whereas Sect. 5 presents the empirical results. Finally, Sect. 6 draws the main conclusions of our research.

2 Stock market forecasts: recent approaches

The use of advanced forecasting techniques has been popular for some time in the context of financial market prediction. McGroarty et al. (2019) introduce an agent-based model that can replicate clustered volatility, autocorrelation of returns, long memory in order flow, concave price impact and the presence of extreme price events. Their model allows for profitable high-frequency trading strategies. Shang et al. (2019), using functional time-series methods with dynamic updating techniques, successfully produce 1-day-ahead forecasts of the VIX index. Fernandes et al. (2019) investigate the predictability of European long-term government bond spreads through the application of heuristic and meta-heuristic support vector regression hybrid structures; the authors show that the sine-cosine LSVR outperforms its counterparts in terms of statistical accuracy. Hudson and Urquhart (2021) use data from two Bitcoin markets and three other popular cryptocurrencies and, testing almost 15,000 technical trading rules, find significant predictability and profitability in each cryptocurrency.

D’Ecclesia and Clementi (2021) estimate the time-varying implied volatility of equity returns using the E-GARCH approach, the Heston model and a novel ANN framework, and show that the ANN approach is the most accurate. Göçken et al. (2016) propose a hybrid model consisting of a harmony search and a genetic algorithm along with an ANN, in order to enhance return forecasts for the Turkish stock market. A hybrid approach was also adopted by Qiu et al. (2016), who forecast the Nikkei 225 index using an ANN with 18 explanatory variables. They combine the ANN gradient search with a genetic algorithm and simulated annealing to avoid bias in the optimization. A drawback of this study, however, is that the inclusion of so many input variables could produce multicollinearity among them. More recently, Iglesias Caride et al. (2018) find that ANNs can be used to forecast stock price movements in the Brazilian market, and that forecasts are better for stocks with low market capitalization than for large-capitalization stocks.

With regard to other machine learning approaches, several expert systems and decision support systems have been introduced. Kyriakou et al. (2021) apply a machine learning technique, namely a fully nonparametric smoother with the covariates and the smoothing parameter chosen by cross-validation, to forecast stock returns in excess of different benchmarks. Kim and Won (2018) design a hybrid expert system that uses a machine-learning approach. Nadkarni and Neves (2018) emphasize the importance of first isolating the most important factors in algorithmic trading. Some systems focus on predicting the future direction of asset prices instead of the returns themselves (Malagrino et al. 2018; Karhunen 2019; Jeong and Kim 2019). Jeong and Kim (2019) provide an advanced machine-learning application in the context of expert systems for quantitative trading. Brasileiro et al. (2017) introduce a piece-wise aggregate approximation approach that outperforms competing methods for US stocks. The model by Feuerriegel and Gordon (2018) successfully reduces forecast errors below baseline predictions from historic lags at a statistically significant level. Nam and Seong (2019) propose a novel machine learning model to forecast Korean stock price movements using financial news and show that their approach outperforms traditional algorithms.Footnote 3 Avci et al. (2019) design a model that empirically tests the impact of agents’ attitudes on their price expectations through their trading behaviour, and show that their hypothesis holds for forecasting day-ahead electricity prices in Turkey.

3 Methodologies

3.1 Artificial neural network

As mentioned earlier, we aim to predict future stock price directions within an intraday (high-frequency) framework. Given the existence of automatic trading (i.e., computer-triggered buys and sells) in several markets, we particularly aim to test whether ANNs are suitable for forecasting price changes 10, 30, and 60 min ahead. This paper could thus be of interest not only to high-frequency traders but also to dealers, designated market-makers, retail day-traders and other market participants as a whole.

In our setup, the forecast is implemented using a resilient propagation ANN (Isasi Viñuela and Galván León 2004). Neurons are organized into layers according to their function: the input layer provides information to the network, the output layer provides the answer and the hidden layers are responsible for carrying out the mapping between input and output (Freeman and Skapura 1991).

In addition, we use a multilayered neural network. It is a fully connected feed-forward network organized in three layers: 10 input neurons, six neurons in a single hidden layer, and one output neuron. This architecture is equivalent to the ones used by Gencay (1999), Lanzarini et al. (2011) and Fernández-Rodríguez et al. (2000). Unlike these studies, however, we change the learning algorithm from the more common back-propagation to resilient propagation. The training is supervised. Resilient propagation updates each weight independently and is not influenced by the magnitude of the derivative but only by the behavior of its sign (Riedmiller and Braun 1993). The network is designed to forecast the market return.

Let \(P=(p_{1}, p_{2}, \ldots , p_{L})\) be the sequence of stock quotes; the instantaneous return at time \(t+1\) is then computed as:

$$\begin{aligned} r_{t+1}=\ln \left( \frac{p_{t+1}}{p_{t}} \right) , \end{aligned}$$
(1)

where \(p_{t+1}\) and \(p_{t}\) are stock quotes at times \(t+1\) and t, and \(r_{t+1}\) is the continuously compounded return in the period \(t+1\).

In order to improve network performance, the trend of the last nine returns (which serve as the first nine inputs to the network) is calculated using ordinary least squares and used as the tenth input to the model. Supervised training implies knowing the expected value for each of the examples used in the training.
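As a minimal illustration (a sketch with our own helper function, not the authors' code), the nine lagged returns and their OLS trend can be assembled into the ten-dimensional input vector as follows:

```python
import numpy as np

def ann_input_vector(prices, j):
    """Build the input vector from the ten consecutive quotes p_j, ..., p_{j+9}:
    nine log returns followed by the OLS slope fitted to those returns."""
    window = np.asarray(prices[j:j + 10], dtype=float)
    returns = np.diff(np.log(window))                          # the nine log returns
    slope = np.polyfit(np.arange(1, 10), returns, deg=1)[0]    # OLS trend of the nine returns
    return np.concatenate([returns, [slope]])
```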

Thus, a set of ordered pairs \(\{(X_{1}, Y_{1}), (X_{2}, Y_{2}), \ldots , (X_{j}, Y_{j}), \ldots , (X_{M}, Y_{M})\}\) is required, where \(X_{j}=(x_{j,1}, x_{j,2}, \ldots , x_{j,10})\) is the input vector, \(x_{j,10}\) is the trend of the last nine returns, and \(Y_{j}\) is the target value that the network is expected to learn for that input vector. In this case:

$$\begin{aligned} x_{j,k}=r_{j+k}=\ln \left( \frac{p_{j+k}}{p_{j+k-1}}\right) , \quad k=1, 2,\dots ,9, \end{aligned}$$
(2)
$$\begin{aligned} Y_{j}=r_{j+1+N}=\ln \left( \frac{p_{j+1+N}}{p_{j+N}}\right) . \end{aligned}$$
(3)

The maximum number of pairs \((X_{j}, Y_{j})\) that can be formed from L stock quotes is \(M = L - 9\). We use 70% (also 80% and 90%) of them to train the network and the remaining data to verify its performance. Once the network is trained, its answer for vector \(X_{j}\) is computed as indicated in Eq. (4).

$$\begin{aligned} Y^{\prime }_{j}=G \left( a_{0}+\sum _{i=1}^{6} a_{i} F \left( b_{0,i}+\sum _{k=1}^{10}x_{j,k} b_{k,i} \right) \right) \end{aligned}$$
(4)

where \(x_{j,k}\) is the value of the k-th input defined in Eq. (2), \(b_{k,i}\) is the weight of the arc that links the k-th input neuron with the i-th hidden neuron, and \(a_{i}\) is the weight of the arc that links the i-th hidden neuron with the single output neuron of the network. Note that each hidden neuron has an additional weight, \(b_{0,i}\), known as the bias term; the output neuron has an analogous bias weight \(a_{0}\).

We perform 30 independent runs for each stock in the sample described in Sect. 4. The maximum number of iterations is capped at 3000. The initial weights are randomly drawn between \(-1\) and 1. The initial update value is set to 0.01, while the update values are bounded between \(10^{-6}\) and 50. The increase and decrease factors for the weight updates are set to 1.2 and 0.5, respectively. The functions F and G used by the neural network are both sigmoidal and are defined in Eqs. (5) and (6), respectively. F is bounded between 0 and 1, and G is bounded between \(-1\) and 1. In this way, F lets hidden neurons produce small values (between 0 and 1), while G allows the network to distinguish between negative (expected negative returns) and positive (expected positive returns) forecasts.

$$\begin{aligned} F(n)=\frac{1}{1+e^{-n}} \end{aligned}$$
(5)
$$\begin{aligned} G(n)=\frac{2}{1+e^{-2n}}-1 \end{aligned}$$
(6)
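For concreteness, a minimal sketch of the forward pass in Eq. (4), with the activations of Eqs. (5) and (6), is given below; the weight containers `a` and `b` are assumed to hold values already learned by resilient propagation (the variable names are ours, not the authors'):

```python
import numpy as np

def F(n):
    return 1.0 / (1.0 + np.exp(-n))               # hidden activation, bounded in (0, 1)

def G(n):
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0   # output activation, bounded in (-1, 1)

def forward(x, b, a):
    """Eq. (4): x is the 10-dimensional input vector, b is an (11, 6) array whose
    first row holds the hidden biases b_{0,i}, and a is a length-7 vector whose
    first entry is the output bias a_0."""
    hidden = F(b[0] + x @ b[1:])                  # six hidden neurons
    return G(a[0] + hidden @ a[1:])               # signed forecast of the next return
```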

3.2 k-Nearest neighbors algorithm

The k-nearest neighbors (kNN) algorithm is a non-parametric, lazy-learning machine learning method used for classification and regression. It is non-parametric because it makes no distributional assumptions about the data, such as normality. It is lazy (instance-based) because the function is only approximated locally and all computation is deferred until classification. The kNN algorithm works as follows: for a given value of k, it computes the distance between the test observation and each row of the training data using a distance metric such as the Euclidean metric (other usable metrics include the city-block, Chebyshev, correlation, and cosine metrics). The distances are sorted in ascending order, the top k elements are extracted from the sorted array, and the most frequent class among these k elements is returned as the predicted class.

In our application of kNN, we again use 70% (also 80% and 90%) of the whole data set as the training set and the remainder as the test set. We label the log-returns as up and down to create the output variable and use nine lags of the returns as the input variables. We apply the algorithm for \(k= 1,\ldots ,20\) with the Euclidean metric and choose the value of k with the highest accuracy rate.
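A minimal scikit-learn sketch of this selection step (assuming hypothetical arrays `X_train`, `y_train`, `X_test`, `y_test` that hold the nine lagged returns and the up/down labels for one stock) could look as follows:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

best_k, best_acc = None, 0.0
for k in range(1, 21):                                      # k = 1, ..., 20
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    if acc > best_acc:                                      # keep the k with the highest accuracy
        best_k, best_acc = k, acc
```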

3.3 Logistic regression

One of the widely used machine learning algorithms for classification is logistic regression, which assigns observations to a discrete set of classes. Logistic regression outputs a probability value through the logistic sigmoid function, and this probability is then mapped to two or more discrete classes. In our case, we have a binary classification problem: identifying the next return as up or down. The logistic regression assigns probabilities to each row of the features matrix X. Let N denote the sample size of the data set, so that there are N rows of input vectors. Given the set of d features, i.e. \(x_i=(x_{1,i},\ldots ,x_{d,i})\), and the parameter vector w, logistic regression solves the following optimization problem:

$$\begin{aligned} \min _{w \in R^d} \sum _{i=1}^{N}{\log (1+ \exp (-y_i f(x_i) ) } + \lambda ||w ||^2 \end{aligned}$$
(7)

where \(f(x) = w^Tx+b\). The logistic regression model has some advantages over other methods owing to its parsimony and speed of implementation. For instance, it is less prone to over-fitting because it has far fewer parameters to estimate than an artificial neural network.

In our application of logistic regression, we again take 70% (also 80% and 90%) of the data set as the training set and the rest as the test set. In the training set we first label the returns as \(+1\) and \(-1\) to produce the output variable y, and then use nine lags of the returns as inputs, in order to make the setup comparable with the ANN methodology. Finally, we use the parameters estimated on the training set, together with the lagged returns in the test set, to predict the price directions in the test set.
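The construction of the labels and lagged features, and the subsequent fit, can be sketched as follows (variable and function names are ours; `returns` is the log-return series of one stock, and scikit-learn's L2 penalty plays the role of the \(\lambda \Vert w\Vert ^2\) term in Eq. (7)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lagged_dataset(returns, n_lags=9):
    """Features: the nine preceding returns; label: sign of the next return (+1/-1)."""
    X = np.column_stack([returns[i:len(returns) - n_lags + i] for i in range(n_lags)])
    y = np.where(returns[n_lags:] > 0, 1, -1)
    return X, y

X, y = lagged_dataset(np.asarray(returns))
split = int(0.7 * len(y))                      # also 0.8 and 0.9 in the other partitions
clf = LogisticRegression().fit(X[:split], y[:split])
direction_forecast = clf.predict(X[split:])    # predicted price directions in the test set
```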

3.4 Naïve Bayes classifier

The Naïve Bayes classifier is another widely used machine learning algorithm; it is based on Bayes’ theorem and therefore belongs to the family of probabilistic machine learning algorithms. The name “Naïve” comes from the assumption that the input features are conditionally independent of each other given the class. Although this assumption is usually violated in practice, Naïve Bayes classifiers have worked quite well in many complex real-world settings. The Naïve Bayes algorithm assigns observations to the most probable class by first estimating the densities of the predictors within each class. As a second step, it computes the posterior probabilities according to Bayes’ rule:

$$\begin{aligned} {\widehat{P}}(Y = k \mid X_1,\ldots ,X_P ) = \frac{\pi (Y = k) \prod _{j=1}^P P(X_j \mid Y=k)}{ \sum _{k=1}^K \pi (Y = k) \prod _{j=1}^P P(X_j \mid Y=k)}, \end{aligned}$$
(8)

where Y is the random variable corresponding to the class index of an observation, \(X_1,\ldots ,X_P\) are the random predictors of an observation, and \(\pi (Y = k)\) is the prior probability that a class index is k. Finally, it classifies an observation by estimating the posterior probability for each class, and then assigns the observation to the class yielding the maximum posterior probability.

In our case, Y takes the value \(+1\) for an upward price movement and \(-1\) otherwise. We use the notation \(X_1,\ldots ,X_9\) to denote the first through ninth lags of the log returns for each stock series. We again use the ratio 0.7/0.3 (also 0.8/0.2 and 0.9/0.1) for the training/test set division.
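Under a Gaussian assumption for the within-class densities (our illustrative choice; the text does not specify the density estimator), a minimal sketch reusing the hypothetical arrays from the previous subsections is:

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()                          # Gaussian class-conditional densities (assumption)
nb.fit(X_train, y_train)                   # y_train in {+1, -1}: up / down
posterior = nb.predict_proba(X_test)       # posterior probabilities as in Eq. (8)
direction_forecast = nb.predict(X_test)    # class with the maximum posterior probability
```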

3.5 Random forest classifier

The random forest classifier is an ensemble algorithm: it combines more than one algorithm of the same or different kind for classifying objects. Decision trees are the building blocks of the random forest model; in other words, the random forest consists of a large number of individual decision trees that operate as an ensemble. The random forest classifier creates a set of decision trees from randomly selected subsets of the training set, and each individual tree makes a class prediction. It then aggregates the votes from the different decision trees to decide the final class of the test object. For instance, assume that there are five points in our training set, \((x_1,x_2,\ldots , x_5)\), with corresponding labels \((y_1,y_2,\ldots ,y_5)\); the random forest may create four decision trees from subsets such as \((x_1,x_2,x_3,x_4)\), \((x_1,x_2,x_3,x_5)\), \((x_1,x_2,x_4,x_5)\) and \((x_2,x_3,x_4,x_5)\). If three of the decision trees vote for “up” against “down”, the random forest predicts “up”. This works well because a single decision tree may be noisy, whereas a large number of relatively uncorrelated trees operating as a committee reduces the effect of noise and yields more accurate results.

More generally, in the random forest method proposed by Breiman (2001), a random vector \(\theta _k\) is generated, independent of the past random vectors \(\theta _1,\ldots ,\theta _{k-1}\) but with the same distribution, and a tree is grown using the training set and \(\theta _k\), resulting in a classifier \(h(x,\theta _k)\), where x is an input vector. In random selection, \(\theta \) consists of a number of independent random integers between 1 and K. The nature and dimension of \(\theta \) depend on its use in tree construction. After a large number of trees are generated, they vote for the most popular class; this procedure is called a random forest. A random forest is thus a classifier consisting of a collection of tree-structured classifiers \(\{h(x,\theta _k ), k=1, \ldots \}\), where the \(\theta _k\)’s are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x.
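A minimal sketch of such a voting ensemble (the number of trees is an illustrative choice, not a value reported in the paper) is:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=500, random_state=0)  # 500 trees: illustrative choice
rf.fit(X_train, y_train)
direction_forecast = rf.predict(X_test)    # majority vote over the individual trees
```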

3.6 Support vector machine classifier

The support vector machine (SVM) is a supervised machine learning algorithm used for both regression and classification tasks. Its objective is to find a hyperplane in the d-dimensional feature space (d being the number of features) that distinctly separates the data points into classes. Hyperplanes can be thought of as decision boundaries: data points falling on different sides of the hyperplane are assigned to different classes. Support vectors are the data points closest to the hyperplane, and they determine its position and orientation; the margin of the classifier is maximized using these support vectors. In more technical terms, the above process can be summarized as follows. Given the training vectors \(x_i\) for \(i=1,2,\ldots ,N\), with a sample size of N observations, the support vector machine classification algorithm solves the following problem:

$$\begin{aligned} \min _{w,b,\xi } \frac{w^T w}{2} + C \sum _{i=1}^{N}{\xi _i} \end{aligned}$$
(9)

subject to \(y_i(w^T \phi (x_i) + b) \ge 1-\xi _i\) and \(\xi _i \ge 0\), \(i=1,2,\ldots ,N\). The dual of the above problem is given by

$$\begin{aligned} \min _{\alpha } \frac{\alpha ^T Q \alpha }{2} - e^T \alpha \end{aligned}$$
(10)

subject to \(y^T\alpha = 0\) and \(0\le \alpha _i \le C\) for \(i=1,2,\ldots ,N\), where e is the vector of all ones and \(C>0\) is the upper bound. Q is an \(N \times N\) positive semi-definite matrix with \(Q_{ij} = y_i y_j K(x_i,x_j)\), where \(K(x_i,x_j) = \phi (x_i)^T \phi (x_j)\) is the kernel. Here the training vectors are implicitly mapped into a higher-dimensional space by the function \(\phi \). The decision function of the support vector machine classifier is given by

$$\begin{aligned} sign\left( \sum _{i=1}^{N}{y_i \alpha _i K(x_i,x) } + \rho \right) . \end{aligned}$$
(11)

The optimization problem in Eq. (9) can be solved globally using the Karush–Kuhn–Tucker (KKT) conditions. Clearly, the solution depends on the choice of kernel function. Our study employs the Gaussian (RBF) kernel, \(\exp (-\gamma \Vert x-x'\Vert ^2)\) with \(\gamma > 0\). When implementing the SVM, we search for optimal values of C and \(\gamma \) for each stock via a grid search over these two parameters.
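A sketch of this grid search with an RBF kernel follows; the candidate values for C and \(\gamma \) are placeholders, as the paper does not report the grid:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}  # illustrative grid
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)                     # chooses C and gamma for each stock
direction_forecast = search.predict(X_test)      # predictions with the best (C, gamma) pair
```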

3.7 Extreme gradient boosting classifier

The extreme gradient boosting (XGBoost) algorithm is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. As noted earlier, an ensemble method combines several base models in order to produce one optimal predictive model. An algorithm is a boosting algorithm if it adds models on top of each other iteratively, with each new predictor correcting the errors of the previous model, until the training data are accurately predicted or reproduced by the model. A method is a gradient boosting method if, instead of assigning different weights to the classifiers after every iteration, it fits the new model to the residuals of the previous prediction and then minimizes the loss when adding the latest prediction; in other words, if the model is updated using gradient descent, it is called gradient boosting. XGBoost improves upon the base gradient boosting framework through systems optimization and algorithmic enhancements, including parallelized tree building, tree pruning using a depth-first approach, cache awareness and out-of-core computing, regularization to avoid over-fitting, efficient handling of missing data, and built-in cross-validation.
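A minimal sketch with the xgboost Python package (the hyperparameters shown are illustrative, not those used in the study) is:

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)  # illustrative settings
xgb.fit(X_train, (y_train == 1).astype(int))     # XGBoost expects {0, 1} class labels
direction_forecast = xgb.predict(X_test)         # 1 = up, 0 = down
```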

3.8 Profitability measures

In order to measure the profitability of our selected forecasting techniques, we benchmark them against a naïve coin-flipping model and against a perfect-forecast model. In the first case, we verify whether the selected methodologies are better at forecasting the direction of price changes than a simple up-or-down fair random process. In the second case, we test the ability of the selected forecasting techniques to capture a substantial share of the return available under a perfect forecast of price changes. The profitability measures used on the pairs \((X_{j}, Y_{j})\) of the training set are the following (a code sketch implementing them appears after the list):

  • Sign prediction ratio: a correctly predicted price direction change is assigned 1, and 0 otherwise.

    $$\begin{aligned}&SPR = \frac{\sum _{j=1+M/2}^{M} matches \big (Y_{j},Y^{\prime }_{j}\big )}{M/2} \end{aligned}$$
    (12)
    $$\begin{aligned}&matches\big (Y_{j}, Y^{\prime }_{j}\big ) = \left\{ \begin{array} {ll} 1, &{}\quad if \ sign(Y_{j})=sign\big (Y^{\prime }_{j}\big ), \\ 0, &{}\quad otherwise \end{array} \right. \end{aligned}$$
    (13)

    where M denotes the size of the set for which the sign prediction ratio is measured and sign is the sign function that maps \(+1\) when the argument is positive and \(-1\) when the argument is negative.

  • The maximum return is obtained by summing the absolute values of all the target returns \(Y_{j}\)

    $$\begin{aligned} MaxReturn = \sum _{j=1}^{N} abs(Y_{j}), \end{aligned}$$
    (14)

    where N denotes the size of the set for which maximum return is computed.

  • The Total Return is computed in the following way

    $$\begin{aligned} TotalReturn = \sum _{j=1}^{N}sign(Y^{\prime }_{j}) * Y_{j}, \end{aligned}$$
    (15)

    where N denotes the size of the set for which total return is computed. Notice that the better the prediction method, the larger the total return is.

  • Ideal profit ratio is the ratio between the total return in Eq. (15) and the maximum return in Eq. (14).

    $$\begin{aligned} IPR = \frac{TotalReturn}{MaxReturn}. \end{aligned}$$
    (16)
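A minimal sketch of these measures (assuming `y_true` holds the realized returns \(Y_{j}\) and `y_pred` the forecasts \(Y^{\prime }_{j}\) on the evaluation set) is:

```python
import numpy as np

def sign_prediction_ratio(y_true, y_pred):
    """Eqs. (12)-(13): share of periods whose price direction is predicted correctly."""
    return np.mean(np.sign(y_true) == np.sign(y_pred))

def ideal_profit_ratio(y_true, y_pred):
    """Eqs. (14)-(16): total return of the sign-following strategy relative to perfect foresight."""
    total_return = np.sum(np.sign(y_pred) * y_true)   # Eq. (15)
    max_return = np.sum(np.abs(y_true))               # Eq. (14)
    return total_return / max_return                  # Eq. (16)
```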

4 Data

We employ tick-by-tick prices (stamped to millisecond precision) for a sample of 27 blue-chip stocks of the Istanbul Stock Exchange (ISE), from 30 November 2015 to 29 April 2016.Footnote 4 The benchmark index of the exchange is the ISE30, which is composed of 30 blue-chip stocks and updated quarterly. The selected 27 stocks are those included in the benchmark index for the whole sample period; we excluded stocks that joined or left the benchmark index during the sample period so as not to be affected by index inclusion–exclusion effects. We label all trades that occur in the continuous session during the day as “continuous trades”, and we construct “all trades” by adding the trades executed during the opening and closing sessions to the continuous trades. The number of observations for continuous and all trades and the ISIN code of each stock are reported in Table 1. In order to mitigate asynchronous trading, we sample our data at 10, 30 and 60 min. There are 763 (872) time intervals for 60 min, 1417 (1635) time intervals for 30 min and 4251 (4637) time intervals for 10 min during the sample period for continuous (all) trades. We have at least one trade per stock in every 30- and 60-min interval, as the selected assets are highly liquid. Even ENKAI, the least liquid stock in terms of the number of 10-min intervals containing at least one trade, has 4228 (4614) such intervals, corresponding to 99.5% of the total number of 10-min intervals for continuous (all) trades.
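As an illustration, with the tick data in a pandas DataFrame the sampling could be done as follows; this is a sketch that assumes the last traded price within each interval is used (the paper does not state the aggregation rule) and that `ticks` is a hypothetical DataFrame indexed by trade timestamp:

```python
import pandas as pd

def sample_prices(ticks, freq):
    """Last traded price in each interval; intervals without trades are dropped."""
    return ticks["price"].resample(freq).last().dropna()

p10 = sample_prices(ticks, "10min")
p30 = sample_prices(ticks, "30min")
p60 = sample_prices(ticks, "60min")
```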

Table 1 Fundamental information regarding sample stocks
Table 2 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 10 min frequency with 0.7/0.3 train/test set partition

The sample period and the selected stocks are important for two main reasons. First, the initial day of the sample period refers to the introduction of the new equity market trading and settlement system (Genium INET) together with the new surveillance, risk management and other surrounding systems (SMARTS) provided by NASDAQ. Second, with the new trading system, high frequency traders joined the Turkish stock market through co-location services. These traders mostly focus on the stocks listed in Table 1 due to their liquidity and market capitalization. Therefore, the findings in this paper are particularly important for this specific group of traders.

Table 3 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 30 min frequency with 0.7/0.3 train/test set partition
Table 4 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 60 min frequency with 0.7/0.3 train/test set partition

5 Empirical results

We sample our original tick data at 10-, 30-, and 60-min intervals and then apply the seven methodologies described in Sect. 3. The training sample is set to 70% (also 80% and 90%) of the total sample, and the remainder is used as the out-of-sample dataset. We report the results for each stock at different time scales, first for the continuous session only and then for all realized trades, in separate tables. We present two key metrics of the methods' performance. The first is the sign prediction ratio, which reflects the proportion of times the related methodology correctly guesses the future price direction (upward or downward). The natural benchmark here is the naïve coin-flipping model: if the underlying process is fully random, we should expect a correct sign prediction ratio of 0.5. On the other hand, correctly guessing the direction of future price changes does not guarantee better results: a forecasting method may correctly guess only small price changes and fail to capture large ones, and would consequently perform poorly. Hence, we also compare the performance of the different forecasting methods against perfect foresight of price changes. For this purpose, we compute the ideal profit ratio, i.e., the ratio of the return generated by a given methodology to that of a perfect sign forecast. Furthermore, mean values, standard deviations, and maximum and minimum values of each performance metric are presented for each machine learning methodology in the corresponding table.

Table 5 Sign prediction and ideal profit ratio of different machine learning algorithms for continuous trades, sampled at 10 min frequency with 0.8/0.2 train/test set partition
Table 6 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 30 min frequency with 0.8/0.2 train/test set partition
Table 7 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 60 min frequency with 0.8/0.2 train/test set partition

It is also important to note that the results for the ANN method are the averages of the 30 runs for each stock. As a robustness check, we also apply the procedure described above to all trades in the sample period. The outcomes are almost exactly the same as the results for the continuous-session trades explained below; hence, the results for the ‘all trades’ scenario are provided in Appendix A (Tables 11, 12, 13, 14, 15, 16, 17, 18 and 19) without further discussion.

The columns in Tables 2, 3, 4, 5, 6, 7, 8, 9 and 10 show the sign prediction accuracy ratios for continuous trades for the different machine learning algorithms across the selected stocks under three different train/test set divisions. These tables also show the ideal profit ratios as an alternative basis of comparison. From Table 2, we observe that at the 10-min frequency, using 70% of the data as the training set, both RF and SVM yield the highest average success ratio, 60%. XGBoost's average accuracy rate (57%) is slightly better than, but very close to, that of the kNN, logistic, and NB algorithms (56%). ANN is the worst-performing method, with eight of the stocks having accuracy rates below 50%. On the other hand, the sign prediction ratio can be as high as 73% for the PETKM returns with the SVM algorithm. From the same table we also observe that, as a linear classification algorithm, logistic regression provides noticeably higher ideal profit ratios than the other, non-linear methods. Ideal profit ratios from the logistic regression can be as high as 39%, with an average value of 13%. RF and SVM have the second-highest average ideal profit ratio, 10%.

Table 8 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 10 min frequency with 0.9/0.1 train/test set partition
Table 9 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 30 min frequency with 0.9/0.1 train/test set partition
Table 10 Sign prediction and ideal profit ratios of different machine learning algorithms for continuous trades, sampled at 60 min frequency with 0.9/0.1 train/test set partition

Table 3 shows the sign prediction and ideal profit ratios for the 0.7/0.3 train/test set division at the 30-min time scale. Similar to the results in Table 2, the highest average sign prediction ratios are obtained by the RF and SVM algorithms, at 56% and 55%, respectively, followed by kNN (54%) as the third best-performing model. As before, ANN is the worst-performing model, with an average value of 50%; hence we can say that there is no difference between a random walk and the ANN algorithm in this framework. In contrast to the ideal profit ratios in the previous table, the highest average ideal profit ratio is attained by RF, which is also the case for the sign prediction ratio. Moreover, kNN is the second-best model (0.06), followed by the SVM algorithm (0.05). In the same framework, Table 4 presents the results for the same train/test set division at the 60-min frequency. We observe results similar to the previous cases, but here kNN performs as well as the RF algorithm in terms of success ratios and even slightly better in terms of ideal profit ratios. Furthermore, as expected, accuracy rates increase as the sampling interval shortens from 60 to 10 min, reflecting the algorithms' stronger predictive ability over shorter horizons.

The sign prediction and ideal profit ratios for the 0.8/0.2 train/test set partition at the 10-, 30-, and 60-min time scales are given in Tables 5, 6 and 7. As in the 0.7/0.3 partition, the best sign prediction rate (60%) is obtained by RF and SVM, while the highest ideal profit ratio is achieved by the logistic regression. Again, RF and kNN are the best-performing models when both success rates and ideal profit ratios are considered, and, as in the previous cases, ANN is the worst-performing algorithm on both evaluation criteria. Interestingly, as the time frequency gets coarser from 30 to 60 min, both the sign prediction and ideal profit ratios improve slightly for RF and kNN, which is not the case for the other algorithms.

Tables 8, 9 and 10 present the sign prediction and ideal profit ratios for the 0.9/0.1 train/test set division over the considered time intervals. The results are robust across the different train/test set partitions: RF and SVM provide the highest accuracy rates, and logistic regression yields the greatest ideal profit ratios. Furthermore, the largest success ratio across all partitions and methodologies, 74%, is obtained for the KRDMD returns with this partition and the RF algorithm. Similarly, the highest ideal profit ratio among all cases, 54%, is attained for HALKB at the 60-min frequency with the 0.9/0.1 partition and the RF methodology.

Considering the time series of continuous trades sampled at 10, 30 and 60 min, we compare the forecasting ability of the different machine learning algorithms. We also uncover the influence of the different train/test set partitions, together with the sampling frequency, on the forecasting power of each algorithm. In general, ANN is the worst-performing method, with an overall average sign prediction ratio of around 50% and an ideal profit ratio of 0.03. In terms of sign prediction ratios, RF and SVM provide the highest average accuracy rates across the different time scales, and kNN performs equally well, especially at the 60-min frequency. The same holds for the ideal profit ratios, with the addition that logistic regression works well especially at the 10-min time scale.

6 Conclusions

The Efficient Market Hypothesis (EMH) in its weak form postulates that the current price of an asset incorporates all the information contained in past prices; consequently, it rules out the possibility of obtaining superior returns based on price history. In this paper, we show that modern forecasting techniques can capture some imperfections in the stock market at the high-frequency level, such that trading based on these techniques' recommendations could constitute a good day-trading strategy for earning extra returns. According to our results, the random forest and support vector machine algorithms are able to capture both future price directions and percentage changes at a satisfactory level. On average, these methods predict the signs of future price changes with a success rate significantly higher than 0.50 for every sampling frequency we consider, and for individual stocks we observe a considerable number of cases where this ratio exceeds 0.70. These methods are also fairly successful in capturing potential future returns, with an average ideal profit ratio of around 10%.Footnote 5 Another important finding is that forecast accuracy is not significantly related to firm size, probably because all sample firms are quite large in terms of market capitalization.

Overall, for the Turkish stock market, the EMH is only an approximate description of the underlying market dynamics. There remain arbitrage opportunities that are not fully offset by market forces, and our results reveal that it is possible to construct consistently profitable trading strategies with the proper use of advanced forecasting techniques. It is important to note, however, that a “one-size-fits-all” approach does not apply in general; therefore, given the sampling frequency and the aim of the strategy, a combination of alternative methodologies should be considered.