1 Introduction

The key to generating a high return on the stock market lies in how well we can predict the future movement of financial asset prices (Huang et al. 2005). A stock market index, as a hypothetical portfolio of selected stocks, is commonly used to measure the performance of both the overall stock market and a particular sector. Consequently, a market trading strategy can be considered effective only if it relies on a precise prediction of the trend of the index value of that particular market (Kara et al. 2011; Wang and Choi 2013). Stock market trend prediction thus poses a scientific challenge both in the choice of methodology and in the theoretical basis of its application.

To address these problems, machine learning models, most prominently Artificial Neural Networks (ANNs) (Crone and Kourentzes 2010; Dai et al. 2012; Kara et al. 2011), Support Vector Machines (SVMs) (Huang et al. 2005; Lee 2009; Ni et al. 2011; Yu et al. 2005; Yuling et al. 2013) and Least Squares Support Vector Machines (LS-SVMs) (Chai et al. 2015; Marković et al. 2015; Yu et al. 2009), have been the most frequently used alternatives to classical statistical models in financial forecasting over the last two decades. Under the weak form of the efficient market hypothesis (EMH) (Fama 1970; Hawawini and Keim 1995), the behavior of financial asset prices is close to a random walk process; a hit rate of approximately 60 % obtained with various machine learning techniques is therefore often considered a satisfactory result for stock market trend prediction (Atsalakis and Valavanis 2009b; Lahmiri 2011).

The determination of sufficient and necessary features is essential for training a good prediction model. If the number of features is insufficient, the prediction accuracy of the model will be poor, and the model may be prone to under-fitting (Stojanović et al. 2014). On the other hand, if we have too many features, the information they provide may be unnecessary or redundant; as a result, the model may generalize poorly and be prone to over-fitting. As stated by He et al. (2013) and Barak and Modarres (2015), the most important issue in the creation of a stock market prediction model is the selection of input features for the predictors, which makes the choice of appropriate methods for feature subset selection highly relevant. In Yuling et al. (2013) and Atsalakis and Valavanis (2009b), it was pointed out that a typical prediction system consists of two parts: feature selection and prediction model design.

According to the selection strategy, feature subset selection can be performed using filter and wrapper approaches, as presented in Guyon and Elisseeff (2003). In filter methods, the selection criterion uses a selection function that is independent of the learning algorithm used for model construction, for example, different methods of variable ranking. In wrapper methods, on the other hand, the selection criterion evaluates feature subsets according to their usefulness to a given learning algorithm. In this way, features that do not contribute to the prediction quality are discarded from the feature set.

In real-world data sets, it is common that different characteristics are more or less relevant to the given problem. However, most learning methods postulate that all the input features are equally relevant. In recent studies, feature weighting has become a very important issue, primarily in the area of clustering algorithms. To increase the effect of relevant features, Giveki (2012) proposed a learning method that employs Mutual Information (MI) as a criterion for assigning weights to features according to their relevance for a specific task. Guo et al. (2008) introduced spectrally weighted kernels as a way of incorporating theoretical knowledge of the non-uniform information distribution into the machine learning method.

In numerous studies which focused on feature selection in the scope of financial time series and stock market trend prediction, input features are selected based on the analysis of the numerical values of financial assets, including index values, trading volume, financial ratios and technical indicators. For example, in Lee (2009), the F score and Supported Sequential Forward Search (F_SSFS) are combined, and the advantages of both filter and wrapper methods are used to select the optimal feature subset from the original feature set. In Ni et al. (2011), a fractal feature selection method is integrated with SVMs to predict the direction of the daily stock price index. Yu et al. (2005) used a hybrid data mining approach with a genetic algorithm (GA) as a feature selection method. A wide range of various feature selection algorithms, such as GA and sequential forward search, were studied by He et al. (2013). A comprehensive literature review on forecasting techniques can be found in the study of Atsalakis and Valavanis (2009b).

Several recent studies (Fung et al. 2002; Mittermayer 2004; Yoo et al. 2005; Zhai et al. 2007) have been based on qualitative data analysis. Overall, their use of event knowledge and time-series data led to increased accuracy of the prediction models. In Fung et al. (2002), Zhai et al. (2007) and Mittermayer (2004), forecasting of stock price trends is done within the framework of text mining techniques.

According to Atsalakis and Valavanis (2009a) and McNelis (2005), accurate stock market prediction should incorporate how stock market experts learn and process information. Accordingly, stock trading is best described as a decision-making process influenced by dynamic market conditions and potential trading risk.

In the multi-criteria decision-making process, where it is necessary both to appropriately evaluate and to rank the selected alternatives, the analytic hierarchy process (AHP) developed by Saaty (1999) has been widely applied. AHP has been introduced in several studies for feature weighting in combination with machine learning algorithms (Liu and Shih 2005; Liu et al. 2013; Wang and Zhang 2013). To the best of our knowledge, however, despite their widespread use, there is insufficient evidence on the possibility of optimizing LS-SVMs for stock market trend prediction through customized kernel weighting.

As can be seen, the choice of the prediction method and the determination of its parameters depend on knowing the properties of the underlying processes. The research presented in this paper is motivated by the work presented in Atsalakis and Valavanis (2009a), Guo et al. (2008), Liu et al. (2013), Omak et al. (2007) and Yuling et al. (2013). We propose an approach to decision making on stock trading that uses AHP for feature ranking and feature selection. The contribution of the paper can be summarized as follows:

First, we propose criteria for AHP evaluation of the relevance of technical indicators through the construction of technical trading strategies as a measure of the success of each technical indicator relied on. In this way, we in essence provide the prediction model with a priori knowledge of the underlying processes of the observed stock market.

Second, the weights obtained by AHP are then used for technical indicator ranking and selection. Additionally, the obtained weights are integrated into the LS-SVM through a weighted kernel (WK).

Finally, the AHP-WK-LSSVM model is proposed for stock market trend prediction and tested with the following data sets: the Belex15 index of the Belgrade Stock Exchange, S&P500 index of the US stock market and FTSE100 index of the London Stock Exchange. The obtained results are then compared with the benchmark results of commonly used classifiers and feature selection algorithms.

The test results indicate that the proposed approach outperforms most of the benchmark models and that the set of feature weights obtained with our approach can also be incorporated into other kernel-based learners, such as SVMs.

The rest of this paper is organized as follows: Sect. 2 presents a brief overview of the theoretical preliminaries. Section 3 introduces feature evaluation criteria and the proposed algorithm for feature ranking and selection. Section 4 gives the data set analysis and presents the experimental results and discussions. Finally, Sect. 5 provides the conclusions.

2 Preliminaries

The following section provides an overview of the theoretical framework of Least Squares Support Vector Machines, weighted kernels and the Analytical Hierarchy Process.

2.1 Least squares support vector machines for binary classification

Least squares support vector machines are commonly used for function estimation and for solving non-linear classification problems (Suykens et al. 2002). Let us define the training set \(\{x_k ,y_k \},k=1,\ldots ,N\), where N represents the overall number of training examples, with the input \(x_k \in R^n\) and the output \(y_k \in \{-1,\,1\}\). We can form a prediction model in the primal weight space using the non-linear mapping \(\phi (\cdot ):R^n\rightarrow R^{n_h }\), which maps the input feature space into a high-dimensional space, defined as:

$$\begin{aligned} y(x)=\mathrm{sign}[ {\omega ^\mathrm{T}\phi (x)+b} ] \end{aligned}$$
(1)

where \(\omega \) represents the weight vector and b defines the bias term.

The optimization problem is formed in the primal space:

$$\begin{aligned} \min \;\mathop {J_p (\omega ,e)}\limits _{\omega ,b,e} =\frac{1}{2}\omega ^\mathrm{T}\omega +\frac{1}{2}\gamma \sum \limits _{k=1}^N {e_k^2 } \end{aligned}$$
(2)

with the following constraints:

$$\begin{aligned} y_k [ {\omega ^\mathrm{T}\phi (x_k )+b} ]=1-e_k ,\quad \;k=1,\ldots ,N \end{aligned}$$
(3)

where \(e_{k}\) are allowed errors during the formation of the prediction model, while \(\gamma \) is a parameter which assigns them with a relative weight.

After solving the optimization problem, the classification model in dual form can be represented as:

$$\begin{aligned} y(x)=\mathrm{sign}\left[ {\sum \limits _{k=1}^N {\alpha _k y_k K(x,x_k )+b} } \right] \end{aligned}$$
(4)

The dot product:

$$\begin{aligned} K(x,x_k )=\phi (x)^\mathrm{T}\phi (x_k ) \end{aligned}$$
(5)

in (4) represents a kernel function, while \(\alpha _k \) are Lagrange multipliers.

When using a radial basis function (RBF) defined by:

$$\begin{aligned} K(x,x_k )=\mathrm{e}^{\frac{-\Vert {(x-x_k )} \Vert ^2}{\sigma ^2}} \end{aligned}$$
(6)

the optimal parameter combination (\(\gamma , \sigma \)) should be established, where \(\gamma \) denotes the relative weight given to the allowed errors \({e}_{k}\) during the training phase, and \(\sigma \) is a kernel parameter. For this purpose, a grid search in combination with k-fold cross-validation is a commonly used method (Arlot and Celisse 2010).
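For illustration, the constrained problem (2)–(3) reduces in the dual to a single linear system in \((b, \alpha)\), whose solution plugs directly into the classifier (4). The following numpy sketch shows this (illustrative helper names of our own; it is not the LS-SVMlab implementation used in the experiments):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K(x, x_k) = exp(-||x - x_k||^2 / sigma^2), as in Eq. (6)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM dual linear system for (alpha, b)."""
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y                      # constraint sum_k alpha_k y_k = 0
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, b

def lssvm_predict(X, Xtr, ytr, alpha, b, sigma):
    # y(x) = sign(sum_k alpha_k y_k K(x, x_k) + b), Eq. (4)
    return np.sign(rbf_kernel(X, Xtr, sigma) @ (alpha * ytr) + b)
```

In practice, `lssvm_train` would be called inside a grid-search loop over candidate (\(\gamma \), \(\sigma \)) pairs, scoring each pair by k-fold cross-validation.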

2.2 Weighted kernel LS-SVMs

In the following section, we present the basics of weighted kernels in relation to LS-SVM theory.

The weighted kernel function is defined as \(K(\theta x_i ,\theta x)\), where \(\theta \) is a weight vector over the data set features. Without presenting the complete mathematical derivation, which can be found in Xing et al. (2009) for SVMs and can be adapted for LS-SVMs, the classification model in dual form with feature weights is formulated in (7), with the note that the feature weights are also included during the computation of \(\alpha _{k}\) and b.

$$\begin{aligned} y(x)=\mathrm{sign}\left[ {\sum \limits _{k=1}^N {\alpha _k y_k K(\theta x,\theta x_k )+b} } \right] \end{aligned}$$
(7)

From (7), it can be seen that the defined weighted kernel is not dependent on the type of kernel function (Yao et al. 2006).

The proposed approach used to determine the weight vector \(\theta =(\theta _1 ,\theta _2 ,\ldots ,\theta _d )^\mathrm{T}\) is based on the AHP method, and it will be introduced in detail in Sect. 3. However, as presented in Guo et al. (2008) and Xing et al. (2009), it should be noted here that the elements of the feature weight vector obey the following two conditions:

$$\begin{aligned} \begin{array}{l} 0\le \theta _i \le 1\quad \;\;\;i=1,\ldots ,d \\ \mathrm{and} \\ \sum \limits _{i=1}^d {\theta _i =1\;\;} \end{array} \end{aligned}$$
(8)

The weighted RBF kernel in (6) can now be rewritten as:

$$\begin{aligned} K(x,x_k )=\mathrm{e}^{\frac{-\left\| {\Theta (x-x_k )} \right\| ^2}{\sigma ^2}} \end{aligned}$$
(9)

where \(\Theta =\mathrm{diag}[\theta _1 ,\theta _2 ,\ldots ,\theta _d ].\)

Further, as in the conventional RBF Kernel, the optimal parameter combination (\(\gamma \), \(\sigma )\) should be established.
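The weighted RBF kernel in (9) simply scales each feature by its weight before the distance is computed, as the following sketch shows (illustrative function name of our own):

```python
import numpy as np

def weighted_rbf_kernel(X1, X2, theta, sigma):
    # K(x, x_k) = exp(-||Theta (x - x_k)||^2 / sigma^2), Eq. (9)
    # Theta = diag(theta): each feature difference is scaled by its weight
    diff = (X1[:, None, :] - X2[None, :, :]) * theta
    return np.exp(-(diff ** 2).sum(-1) / sigma ** 2)
```

With all weights equal to 1 this reduces to the conventional RBF kernel (6), while a zero weight removes a feature's influence on the kernel entirely.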

2.3 Basics of the analytic hierarchy process

AHP is a method for selecting among sets of factors based on their relevance in terms of meeting several, even opposing, criteria. The AHP calculation techniques are applied to a designed pairwise comparison matrix to obtain the eigenvector that represents the relative feature values for the given criterion. The pairwise comparison is expressed using the Fundamental 1–9 Scale, as defined by Saaty (1999). The factors can be classified as of equal importance (denoted by 1), weak importance of one over another (denoted by 3), essential or strong importance (denoted by 5), demonstrated or very strong importance (denoted by 7), and absolute or extreme importance (denoted by 9); the remaining four scale values are intermediate. The successful application of AHP in various empirical data analyses, a result of the clarity of its underlying mathematical principles and its ability to evaluate decision-making consistency, has led to its use on stock market data in this paper.

The AHP calculations can be summarized as follows: compare n elements, \(A_1 \ldots A_n \), and express the significance of \(A_i \) with respect to \(A_j \) by \(p_{ij} \), forming a reciprocal matrix \(P=(p_{ij} )_{n\times n} \) with \(p_{ij} =1/ {p_{ji}}\) for \(i\ne j\) and \(p_{ii} =1\). For precisely measured data, the matrix P is transitive, and the eigenvector \(\omega \) of order n can be calculated such that \(P \omega =\lambda \omega \), where \(\lambda \) is an eigenvalue. Referring to Coyle (2004), in practice, the first step is to provide an initial matrix for the pairwise criteria comparisons to obtain an eigenvector, referred to as the Relative Value Vector (RVV). Next, for each observed criterion, we need a pairwise comparison matrix (PCM) of how well the selected input features perform in terms of that criterion. Then, the evaluation of the Option Performance Matrix (OPM) enables us to present the observed features in terms of the selected criteria. The final step is the multiplication of the RVV and the OPM to obtain the overall ranks.

Due to the inconsistency of the decision-making process, the \(\omega \) vector generally satisfies the equation \(P \omega =\lambda _\mathrm{max} \omega \) and \(\lambda _\mathrm{max} \ge n\). The relationship between \(\lambda _\mathrm{max}\) and n determines the level of (in)consistency of the decisions, where equality between the two is an indication of consistency.

A Consistency Index (CI) is calculated as (\(\lambda _\mathrm{max}-n)/(n-1)\) and needs to be determined in relation to a corresponding Random consistency Index (RI) (Saaty 1999), which leads to the calculation of the Consistency Ratio (CR) as follows: \({\mathrm{CI}}/{\mathrm{RI}}\). It is established that a CR exceeding 0.1 indicates inconsistent decisions, while a CR of 0 indicates perfectly consistent decisions.
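The eigenvector and consistency calculations above can be sketched in a few lines (an illustrative implementation with our own function name; the RI values are those tabulated by Saaty (1999) for matrices of up to nine elements):

```python
import numpy as np

# Random consistency Index (RI) for n = 1..9 (Saaty 1999)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(P):
    """Principal eigenvector of a pairwise comparison matrix P,
    normalized to sum to 1, together with the Consistency Ratio."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    eigvals, eigvecs = np.linalg.eig(P)
    k = np.argmax(eigvals.real)          # lambda_max is the largest eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    ci = (lam_max - n) / (n - 1)         # Consistency Index
    cr = ci / RI[n] if RI[n] > 0 else 0.0  # Consistency Ratio
    return w, cr
```

For a perfectly consistent matrix (every \(p_{ij} = \omega_i / \omega_j\)), \(\lambda_\mathrm{max} = n\) and the returned CR is 0; a CR above 0.1 would flag the pairwise judgments as inconsistent.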

3 The proposed approach for feature ranking and selection

In this section, we explain the proposed feature selection procedure and the algorithm for determining feature weights by applying the analytic hierarchy process.

3.1 The proposed AHP evaluation criteria

First, we introduce the AHP evaluation criteria for assessing the relevance of technical indicators, which in essence provide the model with a priori knowledge of the observed stock market. We suggest the construction of technical trading strategies as a measure of the success of each technical indicator relied on. A technical trading strategy is composed of a set of trading rules used to generate trading signals. In general, commonly used trading systems rely on one or two technical indicators that define the timing of trading signals (Kaufman 2003; Pauwels 2011). The AHP evaluation criteria fall into two groups. The first group consists of two criteria that measure the economic relevance of the selected indicators: cumulative gross return, as a measure of stock market profitability, and systematic risk, as a measure of market volatility. The third criterion compares the trading signals generated by a trading strategy with the signals generated from the actual stock market index values, in terms of the achieved prediction accuracy.

Fig. 1
figure 1

Algorithm for feature ranking and selection based on AHP evaluation

3.1.1 Return evaluation

Returns on investments in the case of a specific stock market index were calculated as the differences between daily index values, expressed in the national currency, multiplied by the trading signal generated for the current day. The gross return was defined as the cumulative capital gain over a specified period of time, as follows:

$$\begin{aligned} R=\sum \limits _{t=1}^n {S_t *(\mathrm{CP}_t -\mathrm{CP}_{t-1} )} \end{aligned}$$
(10)

where \(S_{t}\) represents the trading signal generated by the trading strategy on day t and \(\mathrm{CP}_{t}\) denotes the closing price of the index on day t. The calculated return on investment allows us to compare the selected set of technical indicators. For the evaluation criteria, we created a relative weighting function which ascribes AHP scale values to the obtained returns, taking into consideration the min–max range of the resulting calculations. The same function is applied in the calculations of the following two criteria.
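Equation (10) amounts to a signal-weighted sum of the daily price differences, as the following sketch shows (hypothetical function name):

```python
import numpy as np

def gross_return(signals, close):
    """Cumulative gross return, Eq. (10):
    R = sum_t S_t * (CP_t - CP_{t-1}),
    where signals[t] is the trading signal for day t and
    close[t] the closing price on day t."""
    close = np.asarray(close, dtype=float)
    signals = np.asarray(signals, dtype=float)
    return float(np.sum(signals[1:] * np.diff(close)))
```

A long signal (\(S_t = 1\)) thus earns the day's price change, while a short signal (\(S_t = -1\)) earns its negative.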

3.1.2 Risk evaluation

In this study, in addition to return, risk was introduced into stock market prediction as one of the evaluation criteria in the AHP analysis, since in stock trading the return must be balanced against a proper level of risk (Barak and Modarres 2015; Rabin 2000). Systematic risk, in relation to return, is defined as:

$$\begin{aligned} \sigma =\sqrt{\frac{1}{n-1}\sum \limits _{t=1}^n {(R_t -\bar{R})^2} } \end{aligned}$$
(11)

where \(\bar{R}\) represents the mean value of the gross return R over the selected time period.

3.1.3 Accuracy evaluation

As a general measure for the evaluation of the prediction effect, the Hit Ratio (HR) was used. HR was calculated based on the number of correctly generated trading signals within the test set:

$$\begin{aligned} \mathrm{HR}=\frac{1}{m}\sum \limits _{i=1}^m {\mathrm{PO}_i } \end{aligned}$$
(12)

where PO\(_{i}\) is the prediction outcome for the ith trading day, that is, \(S_t \) for the observed trading strategies: PO\(_{i}\) equals 1 if the generated signal matches the actual trend for the ith trading day, and 0 otherwise; m is the number of examples in the used data set.
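Equation (12) is simply the fraction of matching signals, sketched below (hypothetical function name):

```python
import numpy as np

def hit_ratio(predicted, actual):
    """HR, Eq. (12): fraction of trading days on which the
    predicted trend signal matches the actual trend."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted == actual))
```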

3.2 Determining feature weights by AHP

The proposed approach for the selection of subsets of the features in accordance with the AHP evaluations is shown in Fig. 1.

After forming the initial set of technical indicators, the first step in the proposed algorithm is the calculation of the criterion values for AHP evaluation: for each technical indicator, the values of return, systematic risk and prediction accuracy are computed. The Relative Value Vector is calculated by the methods described in Sect. 2.3. Then three pairwise comparison matrices are constructed, whose weights reflect how the technical indicators perform in terms of each criterion. Following Sect. 2.3, we then create the Option Performance Matrix and multiply the RVV and the OPM to obtain the overall feature weights. The weights (\(\theta \)) determine the relative significance (ranking) of each candidate technical indicator with respect to the criterion values. The next step is to sort the set of technical indicators in descending order of their \(\theta \) values, with the goal of finding a feature subset to be used in the prediction model. More precisely, if one plots the weights, the technical indicator that corresponds to the largest weight adds the most information to the prediction model. At some point the feature relevance drops, producing what is known as an “angle” effect in the plot (see Fig. 3). The estimated weights of the selected features are then rescaled proportionally so that they satisfy the constraints defined in (8). In the last step, kernel weighting is performed by multiplying the features by the rescaled weights within the input feature space.
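The weighting, ranking and rescaling steps above can be sketched as follows (a simplified illustration with hypothetical names; in practice the cut-off n_select is read off the “angle” in the sorted-weight plot rather than fixed in advance):

```python
import numpy as np

def rank_and_select(opm, rvv, n_select):
    """Combine the Option Performance Matrix (features x criteria)
    with the Relative Value Vector to obtain overall feature
    weights, rank the features, keep the top n_select and rescale
    the retained weights to sum to 1, as required by Eq. (8)."""
    theta = np.asarray(opm) @ np.asarray(rvv)   # overall feature weights
    order = np.argsort(theta)[::-1]             # descending rank
    selected = order[:n_select]
    weights = theta[selected] / theta[selected].sum()
    return selected, weights
```

The rescaled `weights` are exactly what is multiplied into the input features to form the weighted kernel (9).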

4 Experimental results and discussion

This section presents the experimental results of applying the proposed approach and discusses them. The goal of the experimental study is to compare the performance of the proposed feature ranking and selection approach, in combination with weighted kernel LS-SVMs, with benchmark models. The section begins with a description of the data sets used in the experiments, followed by the experimental setup. The results are then presented, and a discussion of the results concludes the section.

The experiments were conducted on data for the Belex15, S&P500 and FTSE100 stock market indexes. Each index tracks the prices of the most liquid stocks traded on the regulated market of the corresponding exchange. Each series consists of six time-series values determined for each day: the closing price, the change in the value of the index relative to the previous trading day in percentages, the opening price, the highest price, the lowest price and the trading volume. The data were divided into two groups. The first group consisted of the records required for model training, from 26 October 2005 to 31 December 2012: the Belex15 training set consisted of 1793 samples, the S&P500 training set of 1775 samples and the FTSE100 training set of 1851 samples. For model testing, data from 3 January 2013 to 31 December 2013 were used, a total of 252 trading days for all the data series. The results are obtained for one-day-ahead predictions using data over an extended period of time, one trading year, which exceeds most of the time horizons presented in the literature (Huang et al. 2005; Ni et al. 2011; Yuling et al. 2013).

The stock market trend prediction problem is commonly modeled as a two-class classification problem where the classes are labeled with \(-1\) and 1. Class \(-1\) indicates that the closing price of the current day is higher than the closing price of the following day. The second class indicates the opposite. Figure 2 shows the trend fluctuations.

Fig. 2
figure 2

Trend fluctuations

From Fig. 2, it can be noticed that the trend fluctuates up and down repeatedly, which makes it challenging to predict.

Table 1 Descriptive statistics for the selected inputs features
Table 2 Technical indicators and trading strategies

4.1 Experimental framework

We now consider the set of nine potential input features. In this study, we rely on the most commonly used technical indicators: Exponential Moving Average (EMA, the moving average of the closing price calculated using a smoothing factor that places a higher weight on recent closing prices), Relative Strength Index (RSI, an index that measures the speed and change of price movements), Stochastic Oscillator %K (an indicator that predicts price turning points by comparing a security’s closing price to its price range over a given time period), Stochastic Oscillator %D (the average of the last three %K values calculated daily), Moving Average Convergence–Divergence (MACD, an indicator that measures the strength and direction of the trend and momentum), Rate of Change (ROC, an indicator that shows the percentage change in the closing prices), Commodity Channel Index (CCI, an indicator used to detect cyclical movements in price by measuring stock price variations from the statistical mean), and Parabolic Stop and Reverse (SAR, an indicator that detects the stock price trend direction and determines entry and exit points). Descriptive statistics for the selected indicators, based on the available data sets, were calculated and are shown in Table 1.
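Two of the listed indicators can be computed as follows (a sketch using common textbook formulas; parameter conventions for these indicators vary across trading platforms, and the exact rules used here are given in Table 2):

```python
import numpy as np

def ema(close, period):
    """Exponential Moving Average with smoothing factor
    alpha = 2 / (period + 1), a common convention."""
    close = np.asarray(close, dtype=float)
    alpha = 2.0 / (period + 1)
    out = np.empty(len(close))
    out[0] = close[0]
    for t in range(1, len(close)):
        # recent prices receive a higher weight via alpha
        out[t] = alpha * close[t] + (1 - alpha) * out[t - 1]
    return out

def roc(close, period):
    """Rate of Change: percentage change in the closing price
    over `period` trading days."""
    close = np.asarray(close, dtype=float)
    return 100.0 * (close[period:] - close[:-period]) / close[:-period]
```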

The detailed procedure for calculating these indicators and the rules for generating trading signals are given in Table 2.

The first step is to provide an initial matrix for the pairwise criteria comparisons. The risk and return criteria are evaluated based on the standard economic theory assumption that investors are commonly averse to risk (Levy 2006; Lo 2007). Since the aim of this paper is to improve the precision of the prediction model, the third criterion is evaluated as the most significant one. For our calculations, we used a 4-year trading cycle sub-sample period from the beginning of 2009 to the end of 2012.

Table 3 Pairwise criteria comparison matrix
Table 4 Option performance matrix\(^\mathrm{T}\) \(\times \) RVV\(^\mathrm{T}\) \(=\) feature weights \((\theta )^\mathrm{T}\)
Fig. 3
figure 3

Decreasing order of the obtained feature weights

The eigenvector, the Relative Value Vector, is calculated by the methods described in Sect. 2.3 as RVV \(=\) (0.082, 0.236, 0.682)\(^\mathrm{T}\) (Table 3). These three numbers correspond, respectively, to the relative values of the criteria of return, risk and accuracy. The value 0.682 means that the model values accuracy most of all; 0.236 shows that risk is valued less; and 0.082 shows that return is valued the least. The CR value is 0.09297, which is below the critical limit of 0.1, and thus the model is consistent in its choices. Previously, the terms technical trading strategy and technical indicator were both used. To simplify the notation in the following calculations, the two terms will be treated as synonyms, although in fact the choice is made among technical indicators.

In the next step, using three pairwise comparison matrices, we compare the selected input features in terms of gross return, systematic risk and prediction accuracy. Table 4 presents the summarized option performance matrix for the observed technical indicators.

Based on the final calculation, we obtained a decreasing order of feature weights and Fig. 3 shows a final summary of feature relevance.

After obtaining the feature weights, we performed feature selection by analyzing the results shown in Fig. 3, as described in Sect. 3.2. It can be noticed from Fig. 3 that the indicator weights decrease significantly after the second-ranked indicator for the Belex15 and S&P500 indexes, and after the third-ranked indicator for the FTSE100. As a result, we selected the first two ranked indicators as input features for the Belex15 and S&P500 prediction models, with their rescaled weights incorporated into the LS-SVM kernel; for the FTSE100, we selected the first three ranked indicators. To form the LS-SVM models, LS-SVMlab (Brabanter et al. 2011) was used.

4.2 Experimental evaluation

To assess the increase in the accuracy of the proposed model and its contribution to forecasting research, the accuracy of the model is compared with the results of other classification algorithms: Random Forest (RF) (Breiman 2001), Linear SVM (Chang and Lin 2011) and artificial neural networks (ANN). For the Random Forest, we used 1000 trees, and the number of features considered at each split was set to the square root of the feature dimensionality. For the Linear SVM, the soft margin parameter was fixed to \(C = 1\). For the ANN, we used two hidden layers with 100 neurons each. In addition, we compared the proposed feature selection strategy with several feature selection approaches: mutual information (MI) with forward–backward selection (Gómez-Verdejo et al. 2009), random forest (RF) for feature selection (Genuer 2010), and a linear discriminant classifier (LDC) with sequential forward selection (He et al. 2013).

First, Table 5 presents the comparison of selected features according to different feature selection methods.

Table 5 Comparison of the feature ranking and selection approaches
Table 6 The performance comparison of the individual prediction models

From Table 5, it can be seen that both the number and the composition of the selected input features vary with the feature selection approach. Thus, for testing purposes, we built 10 different models, denoted by the abbreviation of one of the above-mentioned feature selection approaches combined with the prediction model used. Accordingly, MI-LS-SVM is an LS-SVM model trained with features selected by MI. RF-LS-SVM is an LS-SVM model trained with features selected by Random Forest. LDC-LS-SVM is an LS-SVM model trained with features selected by a linear discriminant classifier. AHP-WK-LS-SVM implements the proposed approach for feature selection and the weighted kernel. The AHP-WK-SVM model incorporates the weights obtained from AHP into the SVM kernel. The Random Walk (RW) model uses the current value to predict the future value, assuming that the value in the following period (y\(_{ t+1}\)) will equal the current value (y\(_{ t}\)). The hit rate values in percentages for the observed data sets, according to the initial split of approximately 90 \(\%\) for training and 10 \(\%\) for test data, are shown in Table 6. All of the benchmark prediction models used the same experimental setup across the data series, that is, the same training and test sets for each experimental data set. All of the models were built within Matlab Toolboxes, using additional libraries where necessary: LS-SVMlab (Brabanter et al. 2011), LibSVM (Chang and Lin 2011) and MILCA-MI (Kraskov et al. 2004).

Table 7 Prediction performance depending on the number of training instances (given in %)

From Table 6 it can be observed that, in terms of the hit rate, the proposed AHP-WK-LS-SVM prediction model significantly outperforms all the benchmark models on the BELEX15 and FTSE100 data sets. Compared with the ANN, the hit rate obtained by the AHP-WK-LS-SVM is slightly lower for the S&P500 index (around 1 % less), but significantly higher for the FTSE100 and Belex15 (more than 3 and 7 %, respectively). Besides the AHP-WK-LS-SVM model, we also incorporated the weights obtained from AHP into the SVM kernel. From Table 6, it can be noted that the resulting AHP-WK-SVM model significantly improves on the SVM model: by 3 % for the BELEX15 and FTSE100, and by more than 1 % for the S&P500.

For comparing multiple models on multiple data sets, a two-stage procedure is recommended (Demšar 2006): first, Friedman's test is applied to check whether the compared models differ significantly in overall performance, and, if the null hypothesis is rejected, a post-hoc test is applied at the second stage. Friedman's test is a nonparametric test designed to detect differences among two or more groups. Applying Friedman's test, a p value of 0.0057 is obtained; the null hypothesis is thus rejected at the 5 % significance level, which indicates statistically significant differences in the mean ranks among the compared models. For the post-hoc analysis, the Nemenyi test was used, which indicates no significant differences at the 0.05 level between the prediction models, except between AHP-WK-LS-SVM and RW.
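The first stage of this procedure can be reproduced with scipy. The sketch below uses hypothetical hit-rate values, not the actual results from Table 6:

```python
import numpy as np
from scipy import stats

# Hypothetical hit rates (%) of four models on three data sets;
# rows are data sets, columns are models.
hits = np.array([
    [62.1, 58.3, 57.0, 50.2],
    [59.4, 60.1, 56.5, 49.8],
    [61.0, 57.2, 55.9, 51.0],
])

# Friedman's test compares the models' mean ranks across the data sets
stat, p = stats.friedmanchisquare(*hits.T)
if p < 0.05:
    # null hypothesis of equal performance rejected;
    # proceed to a post-hoc test such as Nemenyi's
    print("significant differences among the models")
```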

Finally, we compared the accuracy of the proposed prediction model with other benchmark classifiers, depending on the number of training instances, and the results are shown in Table 7.

Based on the results presented in Table 7, it can be seen that for all the splits and series, the proposed AHP-based feature ranking and selection approach improved the LS-SVM and SVM prediction accuracy. For the BELEX15 and FTSE100 data series, the proposed model had the highest hit rate among all the benchmark models, while for the S&P500 the recorded hit rate was only slightly lower than that of the ANN; the same trend was noted for all the splits.

5 Conclusion

One possible approach to improving stock market trend prediction is presented in this paper. The proposed methodology is based on the concept of AHP analysis for feature ranking and selection. In addition, we used a weighted kernel to increase the generalization performance of the LS-SVM prediction model, where the kernel is weighted based on the feature relevance obtained by the conducted AHP analysis. The influence of the weighted kernel and feature selection led to a significant increase in the prediction accuracy. In addition, the set of feature weights obtained by the proposed approach can also be incorporated independently into other kernel-based learners besides LS-SVMs.

The improvement in hit rates obtained on test sets containing data for one trading year can be considered significant, given that the stock market trend is predicted for the purpose of optimizing investment strategies on the financial markets. Even a small percentage increase in model precision can lead to a gain in profit, since it results in a greater return and a decrease in the risk involved in trading. Therefore, future improvements will focus on the study of criteria relevant to investors with different risk preferences. Further work should also include the formation of an ensemble model, in which the outputs of several models are combined into a final model by an aggregating scheme.