1 Introduction

The key to generating a high return on the stock market lies in how well we can predict the future movement of financial asset prices (Huang et al. 2005). A stock market index, as a hypothetical portfolio of selected stocks, is commonly used to measure the performance of both the overall stock market and a particular sector. Consequently, a market trading strategy can be considered effective only if it relies on a precise prediction of the trend of the index value of that particular market (Kara et al. 2011; Wang and Choi 2013). Stock market trend prediction thus poses a scientific challenge both in the choice of methodology and in the theoretical basis of its application.

To address these problems, machine learning models, most prominently Artificial Neural Networks (ANNs) (Crone and Kourentzes 2010; Dai et al. 2012; Kara et al. 2011), Support Vector Machines (SVMs) (Huang et al. 2005; Lee 2009; Ni et al. 2011; Yu et al. 2005; Yuling et al. 2013) and Least Squares Support Vector Machines (LS-SVMs) (Chai et al. 2015; Marković et al. 2015; Yu et al. 2009), have been the most frequently used alternatives to classical statistical models in financial forecasting over the last two decades. Under the weak form of the efficient market hypothesis (EMH) (Fama 1970; Hawawini and Keim 1995), the behavior of financial asset prices is close to a random walk process; a hit rate of approximately 60 % obtained with various machine learning techniques is therefore often considered a satisfactory result for stock market trend prediction (Atsalakis and Valavanis 2009b; Lahmiri 2011).

The determination of sufficient and necessary features is essential for training a good prediction model. If the number of features is insufficient, the prediction accuracy of the model will be poor, and the model may be prone to under-fitting (Stojanović et al. 2014). On the other hand, if we have too many features, the information they provide may be unnecessary or redundant; as a result, the model may generalize poorly and be prone to over-fitting. As stated by He et al. (2013) and Barak and Modarres (2015), the most important issue in the creation of a stock market prediction model is the selection of input features for the predictors, which makes the choice of appropriate methods for feature subset selection highly relevant. In Yuling et al. (2013) and Atsalakis and Valavanis (2009b), it was pointed out that a typical prediction system consists of two parts: feature selection and prediction model design.

According to the selection strategy, feature subset selection can be performed using filter and wrapper approaches, as presented in Guyon and Elisseeff (2003). In filter methods, the selection criterion uses a selection function that is independent of the learning algorithm used for model construction, for example, different methods of variable ranking. In wrapper methods, on the other hand, the selection criterion evaluates feature subsets according to their usefulness to a given learning algorithm. In this way, features that do not contribute to the prediction quality are discarded from the feature set.

In real-world data sets, it is common that different characteristics are more or less relevant to the given problem. However, most learning methods postulate that all the input features are equally relevant. In recent studies, feature weighting has become a very important issue, primarily in the area of clustering algorithms. To increase the effect of relevant features, Giveki (2012) proposed a learning method that employs Mutual Information (MI) as a criterion for assigning weights to features according to their relevance for a specific task. Guo et al. (2008) introduced spectrally weighted kernels as a way of incorporating theoretical knowledge of the non-uniform information distribution into the machine learning method.

In numerous studies which focused on feature selection in the scope of financial time series and stock market trend prediction, input features are selected based on the analysis of the numerical values of financial assets, including index values, trading volume, financial ratios and technical indicators. For example, in Lee (2009), the F score and Supported Sequential Forward Search (F_SSFS) are combined, and the advantages of both filter and wrapper methods are used to select the optimal feature subset from the original feature set. In Ni et al. (2011), a fractal feature selection method is integrated with SVMs to predict the direction of the daily stock price index. Yu et al. (2005) used a hybrid data mining approach with a genetic algorithm (GA) as a feature selection method. A wide range of various feature selection algorithms, such as GA and sequential forward search, were studied by He et al. (2013). A comprehensive literature review on forecasting techniques can be found in the study of Atsalakis and Valavanis (2009b).

Several recent studies (Fung et al. 2002; Mittermayer 2004; Yoo et al. 2005; Zhai et al. 2007) have been based on qualitative data analysis. Overall, their use of event knowledge and time-series data led to increased accuracy of the prediction models. In Fung et al. (2002), Zhai et al. (2007) and Mittermayer (2004), forecasting of stock price trends is done within the framework of text mining techniques.

According to Atsalakis and Valavanis (2009a) and McNelis (2005), accurate stock market prediction should incorporate how stock market experts learn and process information. Accordingly, stock trading is best described as a decision-making process influenced by dynamic market conditions and potential trading risk.

In the multi-criteria decision-making process, where it is necessary both to appropriately evaluate and to rank the selected alternatives, the analytic hierarchy process (AHP) developed by Saaty (1999) has been widely applied. AHP has been introduced in several studies for feature weighting in combination with machine learning algorithms (Liu and Shih 2005; Liu et al. 2013; Wang and Zhang 2013). To the best of our knowledge, however, despite their widespread use, there is insufficient evidence on the possibility of optimizing LS-SVMs for stock market trend prediction through customized kernel weighting.

As can be seen, the choice of the prediction method and the determination of its parameters depend on knowing the properties of the underlying processes. The research presented in this paper is motivated by the work presented in Atsalakis and Valavanis (2009a), Guo et al. (2008), Liu et al. (2013), Omak et al. (2007) and Yuling et al. (2013). We propose an approach to decision making on stock trading that uses AHP for feature ranking and feature selection. The contribution of the paper can be summarized as follows:

First, we propose criteria for AHP evaluation of the relevance of technical indicators through the construction of technical trading strategies as a measure of the success of each technical indicator relied on. In this way, we in essence provide the prediction model with a priori knowledge of the underlying processes of the observed stock market.

Second, the weights obtained by AHP are then used for technical indicator ranking and selection. Additionally, the obtained weights are integrated into the LS-SVM through a weighted kernel (WK).

Finally, the AHP-WK-LSSVM model is proposed for stock market trend prediction and tested with the following data sets: the Belex15 index of the Belgrade Stock Exchange, S&P500 index of the US stock market and FTSE100 index of the London Stock Exchange. The obtained results are then compared with the benchmark results of commonly used classifiers and feature selection algorithms.

The test results indicate that the proposed approach outperforms most of the benchmark models and that the set of feature weights obtained with our approach can also be incorporated into other kernel-based learners, such as SVMs.

The rest of this paper is organized as follows: Sect. 2 presents a brief overview of the theoretical preliminaries. Section 3 introduces feature evaluation criteria and the proposed algorithm for feature ranking and selection. Section 4 gives the data set analysis and presents the experimental results and discussions. Finally, Sect. 5 provides the conclusions.

2 Preliminaries

The following section provides an overview of the theoretical framework of Least Squares Support Vector Machines, weighted kernels and the Analytical Hierarchy Process.

2.1 Least squares support vector machines for binary classification

Least squares support vector machines are commonly used for function estimation and for solving non-linear classification problems (Suykens et al. 2002). Let us define the training set \(\{x_k ,y_k \},k=1,\ldots ,N\), where N represents the overall number of training examples, with the input \(x_k \in R^n\) and the output \(y_k \in \{-1,\,1\}\). We can form a prediction model in the primal weight space using the non-linear mapping \(\phi (\cdot ):R^n\rightarrow R^{n_h }\), which maps the input feature space into a high-dimensional space, defined as:

$$\begin{aligned} y(x)=\mathrm{sign}[ {\omega ^\mathrm{T}\phi (x)+b} ] \end{aligned}$$
(1)

where \(\omega \) represents the weight vector and b defines the bias term.

The optimization problem is formed in the primal space:

$$\begin{aligned} \min \;\mathop {J_p (\omega ,e)}\limits _{\omega ,b,e} =\frac{1}{2}\omega ^\mathrm{T}\omega +\frac{1}{2}\gamma \sum \limits _{k=1}^N {e_k^2 } \end{aligned}$$
(2)

with the following constraints:

$$\begin{aligned} y_k [ {\omega ^\mathrm{T}\phi (x_k )+b} ]=1-e_k ,\quad \;k=1,\ldots ,N \end{aligned}$$
(3)

where \(e_{k}\) are allowed errors during the formation of the prediction model, while \(\gamma \) is a parameter which assigns them with a relative weight.

After solving the optimization problem, the classification model in dual form can be represented as:

$$\begin{aligned} y(x)=\mathrm{sign}\left[ {\sum \limits _{k=1}^N {\alpha _k y_k K(x,x_k )+b} } \right] \end{aligned}$$
(4)

The dot product:

$$\begin{aligned} K(x,x_k )=\phi (x)^\mathrm{T}\phi (x_k ) \end{aligned}$$
(5)

in (4) represents a kernel function, while \(\alpha _k \) are Lagrange multipliers.

When using a radial basis function (RBF) defined by:

$$\begin{aligned} K(x,x_k )=\mathrm{e}^{\frac{-\Vert {(x-x_k )} \Vert ^2}{\sigma ^2}} \end{aligned}$$
(6)

the optimal parameter combination (\(\gamma , \sigma \)) should be established, where \(\gamma \) denotes the relative weight given to the allowed errors \({e}_{k}\) during the training phase, and \(\sigma \) is a kernel parameter. For this purpose, a grid search in combination with k-fold cross-validation is a commonly used method (Arlot and Celisse 2010).
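For illustration, the constrained problem (2)–(3) reduces in the dual to a single linear system in \((b, \alpha)\), whose solution plugs directly into the classifier (4). The following numpy sketch shows this (illustrative helper names of our own; it is not the LS-SVMlab implementation used in the experiments):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K(x, x_k) = exp(-||x - x_k||^2 / sigma^2), as in Eq. (6)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM dual linear system for (alpha, b)."""
    N = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y                      # constraint sum_k alpha_k y_k = 0
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]            # alpha, b

def lssvm_predict(X, Xtr, ytr, alpha, b, sigma):
    # y(x) = sign(sum_k alpha_k y_k K(x, x_k) + b), Eq. (4)
    return np.sign(rbf_kernel(X, Xtr, sigma) @ (alpha * ytr) + b)
```

In practice, `lssvm_train` would be called inside a grid-search loop over candidate (\(\gamma \), \(\sigma \)) pairs, scoring each pair by k-fold cross-validation.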

2.2 Weighted kernel LS-SVMs

In the following section, we present the basics of weighted kernels in relation to LS-SVM theory.

The weighted kernel function is defined as \(K(\theta x_i ,\theta x)\), where \(\theta \) is a weight vector over the data set features. Without presenting the complete mathematical derivation, which can be found in Xing et al. (2009) for SVMs and can be adapted for LS-SVMs, the classification model in dual form with feature weights is formulated in (7), with the note that the feature weights are also included during the computation of \(\alpha _{k}\) and b.

$$\begin{aligned} y(x)=\mathrm{sign}\left[ {\sum \limits _{k=1}^N {\alpha _k y_k K(\theta x,\theta x_k )+b} } \right] \end{aligned}$$
(7)

From (7), it can be seen that the defined weighted kernel is not dependent on the type of kernel function (Yao et al. 2006).

The proposed approach used to determine the weight vector \(\theta =(\theta _1 ,\theta _2 ,\ldots ,\theta _d )^\mathrm{T}\) is based on the AHP method, and it will be introduced in detail in Sect. 3. However, as presented in Guo et al. (2008) and Xing et al. (2009), it should be noted here that the elements of the feature weight vector obey the following two conditions:

$$\begin{aligned} \begin{array}{l} 0\le \theta _i \le 1\quad \;\;\;i=1,\ldots ,d \\ \mathrm{and} \\ \sum \limits _{i=1}^d {\theta _i =1\;\;} \end{array} \end{aligned}$$
(8)

The weighted RBF kernel in (6) can now be rewritten as:

$$\begin{aligned} K(x,x_k )=\mathrm{e}^{\frac{-\left\| {\Theta (x-x_k )} \right\| ^2}{\sigma ^2}} \end{aligned}$$
(9)

where \(\Theta =\mathrm{diag}[\theta _1 ,\theta _2 ,\ldots ,\theta _d ].\)

Further, as in the conventional RBF Kernel, the optimal parameter combination (\(\gamma \), \(\sigma )\) should be established.
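The weighted RBF kernel in (9) simply scales each feature by its weight before the distance is computed, as the following sketch shows (illustrative function name of our own):

```python
import numpy as np

def weighted_rbf_kernel(X1, X2, theta, sigma):
    # K(x, x_k) = exp(-||Theta (x - x_k)||^2 / sigma^2), Eq. (9)
    # Theta = diag(theta): each feature difference is scaled by its weight
    diff = (X1[:, None, :] - X2[None, :, :]) * theta
    return np.exp(-(diff ** 2).sum(-1) / sigma ** 2)
```

With all weights equal to 1 this reduces to the conventional RBF kernel (6), while a zero weight removes a feature's influence on the kernel entirely.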

2.3 Basics of the analytic hierarchy process

AHP is a method for selecting among sets of factors based on their relevance in terms of meeting several, even opposing, criteria. The AHP calculation techniques are applied to a designed pairwise comparison matrix to obtain the eigenvector that represents the relative feature values for the given criterion. The pairwise comparison is expressed using the Fundamental 1–9 Scale, as defined by Saaty (1999). The factors can be classified as of equal importance (denoted by 1), weak importance of one over another (denoted by 3), essential or strong importance (denoted by 5), demonstrated or very strong importance (denoted by 7), and absolute or extreme importance (denoted by 9); the remaining four scale values are intermediate. The successful application of AHP in various empirical data analyses, a result of the clarity of its underlying mathematical principles and its ability to evaluate decision-making consistency, has led to its use on stock market data in this paper.

The AHP calculations can be summarized as follows: compare n elements, \(A_1 \ldots A_n \), and express the significance of \(A_i \) with respect to \(A_j \) by \(p_{ij} \), forming a reciprocal matrix \(P=(p_{ij} )_{n\times n} \) with \(p_{ij} =1/ {p_{ji}}\) for \(i\ne j\) and \(p_{ii} =1\). For precisely measured data, the matrix P is transitive, and the eigenvector \(\omega \) of order n can be calculated such that \(P \omega =\lambda \omega \), where \(\lambda \) is an eigenvalue. Referring to Coyle (2004), in practice, the first step is to provide an initial matrix for the pairwise criteria comparisons to obtain an eigenvector, referred to as the Relative Value Vector (RVV). Next, for each observed criterion, we need a pairwise comparison matrix (PCM) of how well the selected input features perform in terms of that criterion. Then, the evaluation of the Option Performance Matrix (OPM) enables us to present the observed features in terms of the selected criteria. The final step is the multiplication of the RVV and the OPM to obtain the overall ranks.

Due to the inconsistency of the decision-making process, the \(\omega \) vector generally satisfies the equation \(P \omega =\lambda _\mathrm{max} \omega \) and \(\lambda _\mathrm{max} \ge n\). The relationship between \(\lambda _\mathrm{max}\) and n determines the level of (in)consistency of the decisions, where equality between the two is an indication of consistency.

A Consistency Index (CI) is calculated as (\(\lambda _\mathrm{max}-n)/(n-1)\) and needs to be determined in relation to a corresponding Random consistency Index (RI) (Saaty 1999), which leads to the calculation of the Consistency Ratio (CR) as follows: \({\mathrm{CI}}/{\mathrm{RI}}\). It is established that a CR exceeding 0.1 indicates inconsistent decisions, while a CR of 0 indicates perfectly consistent decisions.
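The eigenvector and consistency calculations above can be sketched in a few lines (an illustrative implementation with our own function name; the RI values are those tabulated by Saaty (1999) for matrices of up to nine elements):

```python
import numpy as np

# Random consistency Index (RI) for n = 1..9 (Saaty 1999)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(P):
    """Principal eigenvector of a pairwise comparison matrix P,
    normalized to sum to 1, together with the Consistency Ratio."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    eigvals, eigvecs = np.linalg.eig(P)
    k = np.argmax(eigvals.real)          # lambda_max is the largest eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    ci = (lam_max - n) / (n - 1)         # Consistency Index
    cr = ci / RI[n] if RI[n] > 0 else 0.0  # Consistency Ratio
    return w, cr
```

For a perfectly consistent matrix (every \(p_{ij} = \omega_i / \omega_j\)), \(\lambda_\mathrm{max} = n\) and the returned CR is 0; a CR above 0.1 would flag the pairwise judgments as inconsistent.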

3 The proposed approach for feature ranking and selection

In this section, we explain the proposed feature selection procedure and the algorithm for determining feature weights by applying the analytic hierarchy process.

3.1 The proposed AHP evaluation criteria

First, we introduce the AHP evaluation criteria for assessing the relevance of technical indicators, which in essence provide the model with a priori knowledge of the observed stock market. We suggest the construction of technical trading strategies as a measure of the success of each technical indicator relied on. A technical trading strategy is composed of a set of trading rules used to generate trading signals. In general, commonly used trading systems rely on one or two technical indicators that define the timing of trading signals (Kaufman 2003; Pauwels 2011). The AHP evaluation criteria fall into two groups. The first group consists of two criteria that measure the economic relevance of the selected indicators: cumulative gross return, as a measure of stock market profitability, and systematic risk, as a measure of market volatility. The third criterion compares the trading signals generated by a trading strategy with the signals generated from the actual stock market index values, in terms of the achieved prediction accuracy.

Fig. 1
figure 1

Algorithm for feature ranking and selection based on AHP evaluation

3.1.1 Return evaluation

Returns on investments in the case of a specific stock market index were calculated as the differences between daily index values, expressed in the national currency, multiplied by the trading signal generated for the current day. The gross return was defined as the cumulative capital gain over a specified period of time, as follows:

$$\begin{aligned} R=\sum \limits _{t=1}^n {S_t *(\mathrm{CP}_t -\mathrm{CP}_{t-1} )} \end{aligned}$$
(10)

where \(S_{t}\) represents the trading signal generated by the trading strategy on day t and \(\mathrm{CP}_{t}\) denotes the closing price of the index on day t. The calculated return on investment allows us to compare the selected set of technical indicators. For the evaluation criteria, we created a relative weighting function which ascribes AHP scale values to the obtained returns, taking into consideration the min–max range of the resulting calculations. The same function is applied in the calculations of the following two criteria.
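Equation (10) amounts to a signal-weighted sum of the daily price differences, as the following sketch shows (hypothetical function name):

```python
import numpy as np

def gross_return(signals, close):
    """Cumulative gross return, Eq. (10):
    R = sum_t S_t * (CP_t - CP_{t-1}),
    where signals[t] is the trading signal for day t and
    close[t] the closing price on day t."""
    close = np.asarray(close, dtype=float)
    signals = np.asarray(signals, dtype=float)
    return float(np.sum(signals[1:] * np.diff(close)))
```

A long signal (\(S_t = 1\)) thus earns the day's price change, while a short signal (\(S_t = -1\)) earns its negative.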

3.1.2 Risk evaluation

In this study, in addition to return, risk was introduced into stock market prediction as one of the evaluation criteria in the AHP analysis, since in stock trading the return must be balanced against a proper level of risk (Barak and Modarres 2015; Rabin 2000). Systematic risk, in relation to return, is defined as:

$$\begin{aligned} \sigma =\sqrt{\frac{1}{n-1}\sum \limits _{t=1}^n {(R_t -\bar{R})^2} } \end{aligned}$$
(11)

where \(\bar{R}\) represents the mean value of the gross return R over the selected time period.

3.1.3 Accuracy evaluation

As a general measure for the evaluation of the prediction effect, the Hit Ratio (HR) was used. HR was calculated based on the number of correctly generated trading signals within the test set:

$$\begin{aligned} \mathrm{HR}=\frac{1}{m}\sum \limits _{i=1}^m {\mathrm{PO}_i } \end{aligned}$$
(12)

where PO\(_{i}\) is the prediction outcome for the ith trading day, that is, \(S_t \) for the observed trading strategies: PO\(_{i}\) equals 1 if the generated signal matches the actual trend for the ith trading day, and 0 otherwise; m is the number of examples in the used data set.
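Equation (12) is simply the fraction of matching signals, sketched below (hypothetical function name):

```python
import numpy as np

def hit_ratio(predicted, actual):
    """HR, Eq. (12): fraction of trading days on which the
    predicted trend signal matches the actual trend."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted == actual))
```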

3.2 Determining feature weights by AHP

The proposed approach for the selection of subsets of the features in accordance with the AHP evaluations is shown in Fig. 1.

After forming the initial set of technical indicators, the first step in the proposed algorithm is the calculation of the criterion values for AHP evaluation: for each technical indicator, the values of return, systematic risk and prediction accuracy are computed. The Relative Value Vector is calculated by the methods described in Sect. 2.3. Then three pairwise comparison matrices are constructed, whose weights reflect how the technical indicators perform in terms of each criterion. Following Sect. 2.3, we then create the Option Performance Matrix and multiply the RVV and the OPM to obtain the overall feature weights. The weights (\(\theta \)) determine the relative significance (ranking) of each candidate technical indicator with respect to the criterion values. The next step is to sort the set of technical indicators in descending order of their \(\theta \) values, with the goal of finding a feature subset to be used in the prediction model. More precisely, if one plots the weights, the technical indicator that corresponds to the largest weight adds the most information to the prediction model. At some point the feature relevance drops, producing what is known as an “angle” effect in the plot (see Fig. 3). The estimated weights of the selected features are then rescaled proportionally so that they satisfy the constraints defined in (8). In the last step, kernel weighting is performed by multiplying the features by the rescaled weights within the input feature space.
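The weighting, ranking and rescaling steps above can be sketched as follows (a simplified illustration with hypothetical names; in practice the cut-off n_select is read off the “angle” in the sorted-weight plot rather than fixed in advance):

```python
import numpy as np

def rank_and_select(opm, rvv, n_select):
    """Combine the Option Performance Matrix (features x criteria)
    with the Relative Value Vector to obtain overall feature
    weights, rank the features, keep the top n_select and rescale
    the retained weights to sum to 1, as required by Eq. (8)."""
    theta = np.asarray(opm) @ np.asarray(rvv)   # overall feature weights
    order = np.argsort(theta)[::-1]             # descending rank
    selected = order[:n_select]
    weights = theta[selected] / theta[selected].sum()
    return selected, weights
```

The rescaled `weights` are exactly what is multiplied into the input features to form the weighted kernel (9).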

4 Experimental results and discussion

This section presents the experimental results of applying the proposed approach and discusses them. The goal of the experimental study is to compare the performance of the proposed feature ranking and selection approach, in combination with weighted kernel LS-SVMs, with benchmark models. The section begins with a description of the data sets used in the experiments, followed by the experimental setup. The results are then presented, and a discussion of the results concludes the section.

The experiments were conducted on data for the Belex15, S&P500 and FTSE100 stock market indexes. Each index tracks the prices of the most liquid stocks traded on the regulated market of the corresponding exchange. Each series consists of six time-series values determined for each day: the closing price, the change in the value of the index relative to the previous trading day in percentages, the opening price, the highest price, the lowest price and the trading volume. The data were divided into two groups. The first group consisted of the records required for model training, from 26 October 2005 to 31 December 2012: the Belex15 training set consisted of 1793 samples, the S&P500 training set of 1775 samples and the FTSE100 training set of 1851 samples. For model testing, data from 3 January 2013 to 31 December 2013 were used, a total of 252 trading days for all the data series. The results are obtained for one-day-ahead predictions using data over an extended period of time, one trading year, which exceeds most of the time horizons presented in the literature (Huang et al. 2005; Ni et al. 2011; Yuling et al. 2013).

The stock market trend prediction problem is commonly modeled as a two-class classification problem where the classes are labeled with \(-1\) and 1. Class \(-1\) indicates that the closing price of the current day is higher than the closing price of the following day. The second class indicates the opposite. Figure 2 shows the trend fluctuations.

Fig. 2
figure 2

Trend fluctuations

From Fig. 2, it can be noticed that the trend fluctuates up and down repeatedly, which makes it challenging to predict.

Table 1 Descriptive statistics for the selected inputs features
Table 2 Technical indicators and trading strategies

4.1 Experimental framework

We now consider the set of nine potential input features. In this study, we rely on the most commonly used technical indicators: Exponential Moving Average (EMA, the moving average of the closing price calculated using a smoothing factor that places a higher weight on recent closing prices), Relative Strength Index (RSI, an index that measures the speed and change of price movements), Stochastic Oscillator %K (an indicator that predicts price turning points by comparing a security’s closing price to its price range over a given time period), Stochastic Oscillator %D (the average of the last three %K values calculated daily), Moving Average Convergence–Divergence (MACD, an indicator that measures the strength and direction of the trend and momentum), Rate of Change (ROC, an indicator that shows the percentage change in the closing prices), Commodity Channel Index (CCI, an indicator used to detect cyclical movements in price by measuring stock price variations from the statistical mean), and Parabolic Stop and Reverse (SAR, an indicator that detects the stock price trend direction and determines entry and exit points). Descriptive statistics for the selected indicators, based on the available data sets, were calculated and are shown in Table 1.
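Two of the listed indicators can be computed as follows (a sketch using common textbook formulas; parameter conventions for these indicators vary across trading platforms, and the exact rules used here are given in Table 2):

```python
import numpy as np

def ema(close, period):
    """Exponential Moving Average with smoothing factor
    alpha = 2 / (period + 1), a common convention."""
    close = np.asarray(close, dtype=float)
    alpha = 2.0 / (period + 1)
    out = np.empty(len(close))
    out[0] = close[0]
    for t in range(1, len(close)):
        # recent prices receive a higher weight via alpha
        out[t] = alpha * close[t] + (1 - alpha) * out[t - 1]
    return out

def roc(close, period):
    """Rate of Change: percentage change in the closing price
    over `period` trading days."""
    close = np.asarray(close, dtype=float)
    return 100.0 * (close[period:] - close[:-period]) / close[:-period]
```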

The detailed procedure for calculating these indicators and the rules for generating trading signals are given in Table 2.

The first step is to provide an initial matrix for the pairwise criteria comparisons. The risk and return criteria are evaluated based on the standard economic theory assumption that investors are commonly averse to risk (Levy 2006; Lo 2007). Since the aim of this paper is to improve the precision of the prediction model, the third criterion is evaluated as the most significant one. For our calculations, we used a 4-year trading cycle sub-sample period from the beginning of 2009 to the end of 2012.

Table 3 Pairwise criteria comparison matrix
Table 4 Option performance matrix\(^\mathrm{T}\) \(\times \) RVV\(^\mathrm{T}\) \(=\) feature weights \((\theta )^\mathrm{T}\)
Fig. 3
figure 3

Decreasing order of the obtained feature weights

The eigenvector, the Relative Value Vector, is calculated by the methods described in Sect. 2.3 as RVV \(=\) (0.082, 0.236, 0.682)\(^\mathrm{T}\) (Table 3). These three numbers correspond, respectively, to the relative values of the criteria of return, risk and accuracy. The value 0.682 means that the model values accuracy most of all; 0.236 shows that risk is valued less; and 0.082 shows that return is valued the least. The CR value is 0.09297, which is below the critical limit of 0.1, and thus the model is consistent in its choices. Previously, the terms technical trading strategy and technical indicator were both used. To simplify the notation in the following calculations, the two terms will be treated as synonyms, although in fact the choice is made among technical indicators.

In the next step, using three pairwise comparison matrices, we compare the selected input features in terms of gross return, systematic risk and prediction accuracy. Table 4 presents the summarized option performance matrix for the observed technical indicators.

Based on the final calculation, we obtained a decreasing order of feature weights and Fig. 3 shows a final summary of feature relevance.

After obtaining the feature weights, we performed feature selection by analyzing the results shown in Fig. 3, as described in Sect. 3.2. It can be noticed from Fig. 3 that the indicator weights decrease significantly after the second-ranked indicator for the Belex15 and S&P500 indexes, and after the third-ranked indicator for the FTSE100. As a result, we selected the first two ranked indicators as input features for the Belex15 and S&P500 prediction models, with their rescaled weights incorporated into the LS-SVM kernel; for the FTSE100, we selected the first three ranked indicators. To form the LS-SVM models, LS-SVMlab (Brabanter et al. 2011) was used.

4.2 Experimental evaluation

To assess the increase in the accuracy of the proposed model and its contribution to forecasting research, the accuracy of the model is compared with the results of other classification algorithms: Random Forest (RF) (Breiman 2001), Linear SVM (Chang and Lin 2011) and artificial neural networks (ANN). For the Random Forest, we used 1000 trees, and the number of features considered at each split was set to the square root of the feature dimensionality. For the Linear SVM, the soft margin parameter was fixed to \(C = 1\). For the ANN, we used two hidden layers with 100 neurons each. In addition, we compared the proposed feature selection strategy with several feature selection approaches: mutual information (MI) with forward–backward selection (Gómez-Verdejo et al. 2009), random forest (RF) for feature selection (Genuer 2010), and a linear discriminant classifier (LDC) with sequential forward selection (He et al. 2013).

First, Table 5 presents the comparison of selected features according to different feature selection methods.

Table 5 Comparison of the feature ranking and selection approaches
Table 6 The performance comparison of the individual prediction models

From Table 5, it can be seen that both the number and the composition of the selected input features vary with the feature selection approach. Thus, for testing purposes, we built 10 different models, denoted by the abbreviation of one of the above-mentioned feature selection approaches combined with the prediction model used. Accordingly, MI-LS-SVM is an LS-SVM model trained with features selected by MI. RF-LS-SVM is an LS-SVM model trained with features selected by Random Forest. LDC-LS-SVM is an LS-SVM model trained with features selected by a linear discriminant classifier. AHP-WK-LS-SVM implements the proposed approach for feature selection and the weighted kernel. The AHP-WK-SVM model incorporates the weights obtained from AHP into the SVM kernel. The Random Walk (RW) model uses the current value to predict the future value, assuming that the value in the following period (y\(_{ t+1}\)) will equal the current value (y\(_{ t}\)). The hit rate values in percentages for the observed data sets, according to the initial split of approximately 90 \(\%\) for training and 10 \(\%\) for test data, are shown in Table 6. All of the benchmark prediction models used the same experimental setup across the data series, that is, the same training and test sets for each experimental data set. All of the models were built within Matlab Toolboxes, using additional libraries where necessary: LS-SVMlab (Brabanter et al. 2011), LibSVM (Chang and Lin 2011) and MILCA-MI (Kraskov et al. 2004).

Table 7 Prediction performance depending on the number of training instances (given in %)

From Table 6 it can be observed that, in terms of the hit rate, the proposed AHP-WK-LS-SVM prediction model significantly outperforms all the benchmark models on the BELEX15 and FTSE100 data sets. Compared with the ANN, the hit rate obtained by the AHP-WK-LS-SVM is slightly lower for the S&P500 index (around 1 % less), but significantly higher for the FTSE100 and Belex15 (more than 3 and 7 %, respectively). Besides the AHP-WK-LS-SVM model, we also incorporated the weights obtained from AHP into the SVM kernel. From Table 6, it can be noted that the resulting AHP-WK-SVM model significantly improves on the SVM model: by 3 % for the BELEX15 and FTSE100, and by more than 1 % for the S&P500.

For comparing multiple models on multiple data sets, a two-stage procedure is recommended (Demšar 2006): first, Friedman's test is applied to check whether the compared models differ significantly in overall performance, and, if the null hypothesis is rejected, a post-hoc test is applied at the second stage. Friedman's test is a nonparametric test designed to detect differences among two or more groups. Applying Friedman's test, a p value of 0.0057 is obtained; the null hypothesis is thus rejected at the 5 % significance level, which indicates statistically significant differences in the mean ranks among the compared models. For the post-hoc analysis, the Nemenyi test was used, which indicates no significant differences at the 0.05 level between the prediction models, except between AHP-WK-LS-SVM and RW.
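The first stage of this procedure can be reproduced with scipy. The sketch below uses hypothetical hit-rate values, not the actual results from Table 6:

```python
import numpy as np
from scipy import stats

# Hypothetical hit rates (%) of four models on three data sets;
# rows are data sets, columns are models.
hits = np.array([
    [62.1, 58.3, 57.0, 50.2],
    [59.4, 60.1, 56.5, 49.8],
    [61.0, 57.2, 55.9, 51.0],
])

# Friedman's test compares the models' mean ranks across the data sets
stat, p = stats.friedmanchisquare(*hits.T)
if p < 0.05:
    # null hypothesis of equal performance rejected;
    # proceed to a post-hoc test such as Nemenyi's
    print("significant differences among the models")
```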

Finally, we compared the accuracy of the proposed prediction model with other benchmark classifiers, depending on the number of training instances, and the results are shown in Table 7.

Based on the results presented in Table 7, it can be seen that for all the splits and series, the proposed AHP-based feature ranking and selection approach improved the LS-SVM and SVM prediction accuracy. For the BELEX15 and FTSE100 data series, the proposed model had the highest hit rate among all the benchmark models, while for the S&P500 the recorded hit rate was only slightly lower than that of the ANN; the same trend was noted for all the splits.

5 Conclusion

One possible approach to improving stock market trend prediction is presented in this paper. The proposed methodology is based on the concept of AHP analysis for feature ranking and selection. In addition, we used a weighted kernel to increase the generalization performance of the LS-SVM prediction model, where the kernel is weighted based on the feature relevance obtained by the conducted AHP analysis. The influence of the weighted kernel and feature selection led to a significant increase in the prediction accuracy. In addition, the set of feature weights obtained by the proposed approach can also be incorporated independently into other kernel-based learners besides LS-SVMs.

The improvement in hit rates obtained on test sets containing data for one trading year can be considered significant, given that the stock market trend is predicted for the purpose of optimizing investment strategies on the financial markets. Even a small percentage increase in model precision can lead to a gain in profit, since it results in a greater return and a decrease in the risk involved in trading. Therefore, future improvements will focus on the study of criteria relevant to investors with different risk preferences. Further work should also include the formation of an ensemble model, in which the outputs of several models are combined into a final model by an aggregating scheme.