1 Introduction

In the business environment, we desire precise and efficient forecasts of various kinds of financial variables, which can then be used to develop successful strategies and avoid large losses [7]. Over the last three decades, many researchers have considered financial time-series prediction, with the prime objective of beating the financial market. Financial forecasting is an interesting and challenging field, because a huge number of factors (e.g., economic, political, environmental, and psychological) must be considered during the forecasting process. Financial time-series data are intrinsically noisy, non-stationary, and deterministically chaotic [4, 39]. The noise in financial time-series data is caused by a lack of information regarding the historical behavior of financial markets, which makes it hard to map past values to future values. The non-stationary and chaotic nature of the data indicates that the data distribution varies over time and is unpredictable.

Because of successful developments in different computational intelligence techniques, researchers have started to apply computational intelligence approaches to financial markets. Example techniques include artificial neural networks (ANNs), support vector machines (SVMs), genetic algorithms (GAs), particle swarm optimization (PSO), and fuzzy technologies. Vapnik et al. [44] introduced SVM methods to overcome the problems of ANNs (such as getting trapped in local minima, overfitting to training data, and long training times). Since then, several authors have proposed financial instrument pricing using SVMs. For example, Tay and Cao [39] and Cao and Tay [4] developed pricing models for five specific financial futures in the US market using SVMs, and Van Gestel et al. [43] used the LS-SVM (least squares support vector machine) for T-bill (treasury bill) rate and stock index pricing in the United States (US) and German markets. Their experimental results showed that SVMs performed well when applied to financial markets and produced good predictions. Sapankevych and Sankar [38] published an exhaustive survey on SVMs for time-series prediction. They surveyed papers in the areas of financial market prediction, electric utility load forecasting, environmental states, weather prediction, and reliability forecasting. In their survey, they noted that the selection of the free parameters for the SVM and the kernel function had a significant influence on the forecast. The experimental results of Kim [21] showed that SVM predictions are sensitive to these free parameters and that it is important to select optimal values; improper selection of the free parameters can cause over- or under-fitting problems [21]. To select the optimal SVM parameter(s) and kernel function, several studies proposed hybrid models combining an SVM with optimization techniques such as PSO, GAs, artificial bee colonies (ABCs), differential evolution (DE), ant colony optimization (ACO), and simulated annealing (SA) [19, 29, 30, 46, 47]. However, the optimization model used to select the optimal parameter(s) itself introduces additional user-specified controlling parameter(s), making the user’s task even more complex. To remove the influence of the user-specified controlling parameter(s) of the optimization technique on the forecasting result, Das and Padhy [8] proposed a novel hybrid SVM-TLBO model, in which the teaching-learning-based optimization (TLBO) algorithm proposed by Rao et al. [37], which does not require algorithm-specific control parameters, is used to optimize the SVM parameters and kernel function. Their extensive experimental results showed that the novel hybrid SVM-TLBO model outperformed the standard SVM model and the SVM + PSO model proposed by Lin et al. [29].

Das and Padhy [8] proposed a novel hybrid regression model for forecasting the value of the multicommodity futures index (COMDEX) traded on the Multi Commodity Exchange of India Limited (MCX). They considered 17 technical indicators as input variables (features) to the SVM regression model, and predicted the closing values of the futures index 1, 3, and 5 days ahead. The 17 technical indicators in their study were selected from literature surveys based on work by Hsu [12], Huang and Tsai [13], Ince and Trafalis [18], Kim [21], Kim and Han [22], Kim and Lee [23], Lai et al. [25], Liang et al. [27], and Tsang et al. [41], and from feedback from domain experts. A detailed description of these technical indicators can be found in Appendix I. When modeling financial time series using SVM regression, the technical indicators used as input must be very carefully selected and identified. Including irrelevant technical indicators as input to the regression model may introduce noise. Training an SVM model with such noisy data may cause it to fit to undesirable data, resulting in an inappropriate approximation function and a loss of generalization; moreover, the model could under- or over-fit the noisy data [2]. Additionally, almost all stock prediction techniques use approaches such as statistics, mathematical models, machine learning, and artificial intelligence to determine potential future rules. However, these approaches require high-quality features (inputs) before they can learn any knowledge, because inaccurate information can produce erroneous results [6].

To minimize the noise and collinearity in the data and to extract the most relevant information for knowledge discovery, we can use dimensionality reduction techniques such as independent component analysis (ICA), principal component analysis (PCA), kernel principal component analysis (KPCA), and factor analysis (FA). The fundamental objectives of dimensionality reduction are to identify and extract more reliable information and effective features from the original data [40]. Our literature survey identified different dimensionality reduction techniques used in machine learning and showed that the generalization capabilities of these methods are improving. Cao et al. [3] compared PCA, KPCA, and ICA applied to SVM classification. Cai et al. [1] applied dimensionality reduction in SVM using PCA, KPCA, and ICA. Ekenel and Sankur [10] applied ICA and PCA dimensionality reduction methods to facial recognition. Lu et al. [32] used ICA for dimensionality reduction with support vector regression (SVR) for financial time-series forecasting. Ince and Trafalis [17] used KPCA and factor analysis (FA) as dimensionality reduction techniques with SVM to predict stock prices. Lu [31] developed a hybrid model using non-linear independent component analysis (NLICA), SVR, and PSO to forecast the stock index. Zhai et al. [48] used PCA with SVM to improve material identification of loose particles in sealed electronic devices; their experimental results showed that this model was effective. An improved SVM model was proposed by Kuang et al. [24], which integrated SVM, KPCA, and chaotic PSO for intrusion detection.

The focus of this research was to incorporate some well-known dimensionality reduction techniques (feature extraction approaches) into the novel hybrid SVM-TLBO regression model proposed by Das and Padhy [8], to develop an improved prediction model (DR-SVM-TLBO) that forecasts COMDEX. Chang and Wu [6] concluded from their experimental study that dimensionality reduction using feature extraction techniques is superior to feature selection techniques. We considered PCA, a non-linear version of PCA with a kernel-based trick (KPCA), and ICA to reduce the dimensionality of the input variables (features), because these three techniques are well-known methods [1, 3, 17, 33]. More detailed information regarding the PCA, KPCA, and ICA algorithms can be found in [6, 14–17, 20, 31, 32, 42] and Appendix II. We compared standard SVM without dimension reduction and the novel hybrid SVM-TLBO [8] model with our proposed model. The expected benefits of the proposed model are that it can: (1) reduce the dimension of the financial time-series data, (2) reject noisy inputs, (3) reduce the computational complexity, and (4) produce a more generalized model.

The experimental results on our commodity futures index data show that the forecasts of the proposed ensemble model are more accurate than those of the standard SVM regression model and the novel hybrid SVM-TLBO model [8]. To measure and compare the forecasting performance of the models, we used the root mean square error (RMSE), mean absolute error (MAE), normalized mean square error (NMSE), directional symmetry (DS), and the Diebold-Mariano (DM) statistical test. The outcome of this study provides empirical confirmation of the usefulness of the proposed model when forecasting in the financial domain.

The contribution of this study is the design of a new ensemble system that integrates two approaches: dimensionality reduction using feature extraction methods and the novel hybrid SVM-TLBO regression model. A small improvement in forecasting performance can minimize the stakeholders’ risk and lead to a considerable investment profit. The proposed model revealed three key characteristics: (1) it reduced the input features (technical indicators) from seventeen to six features that accounted for more than 95 % of the cumulative variance; (2) its time complexity was lower than that of the benchmark models; and (3) its forecasts were more accurate than those of the benchmark models. The organization of this paper is as follows. Section 2 provides a summary of SVM for regression, the TLBO optimization method, and the novel hybrid SVM-TLBO model. In Section 3, we describe the proposed hybrid model architecture and experiments, followed by the empirical results and discussion in Section 4. Section 5 concludes the study with a brief discussion of our findings and future work.

2 Novel hybrid SVM-TLBO model

2.1 SVM for regression

Vapnik et al. [44] developed the SVM technique for regression. The method is presented in Haykin [11] as follows.

Given a training data set \(\{(x_{1},y_{1}),\ldots,(x_{l},y_{l})\}\), where each \(x_{i}\in X\subseteq R^{n}\) (X denotes the input sample space) with matching target value \(y_{i}\in R\) for \(i=1,\ldots,l\) (where l is the size of the training data), the objective of the regression problem is to find a function \(f:R^{n}\to R\) that can approximate the value of y for x not in the training set.

The estimating function f is defined as

$$ f(x)=(w^{T}{\Phi} (x))+b, $$
(1)

where \(w\in R^{m}\), \(b\in R\) is the bias, and Φ denotes a nonlinear mapping from \(R^{n}\) to the high-dimensional space \(R^{m}\) (m>n). The aim is to find w and b such that the value of f(x) can be determined by minimizing the risk.

$$ R_{\text{reg}} (f)=C\sum\limits_{i=1}^{l} {L_{\in } (y_{i} ,f(x_{i} ))} +\frac{1}{2}\left\| w \right\|^{2}. $$
(2)

Here, \(L_{\in}\) is the ∈-insensitive loss function originally proposed by Vapnik et al. [44], which is defined as

$$ \mathrm{L}_{\in}(\mathrm{y},\mathrm{z})=\left\{\begin{array}{ll} | \mathrm{y}-\mathrm{z} |- \in , &\quad | \mathrm{y}-\mathrm{z} |\ge \in\\ 0,& \quad \text{otherwise} \end{array} \right. $$
(3)
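
To make (3) concrete, the short Python snippet below (an illustrative sketch, not part of the original study) evaluates the ∈-insensitive loss for a few target/prediction pairs; only deviations larger than ∈ incur a penalty.

```python
import numpy as np

def eps_insensitive_loss(y, z, eps=0.1):
    """Vapnik's epsilon-insensitive loss from (3):
    zero inside the eps-tube, linear outside it."""
    residual = np.abs(y - z)
    return np.where(residual >= eps, residual - eps, 0.0)

# Example: only deviations larger than eps contribute to the loss.
y = np.array([1.00, 1.05, 1.30])
z = np.array([1.02, 1.00, 1.00])
print(eps_insensitive_loss(y, z, eps=0.1))  # -> [0.  0.  0.2]
```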

By introducing the slack variables \(\zeta_{i}\) and \(\zeta_{i}^{\ast}\), the problem in (2) can be reformulated to the following.

(P) Minimize \(C[\sum \limits _{i=1}^{l} {(\zeta _{i} +\zeta _{i}^{\prime })} ]+\frac {1}{2}\| w \|^{2}\) subject to

$$ \begin{array}{l} (i)y_{i} -w^{T}{\Phi} (x_{i} )-b\le \in +\zeta_{i} , \\ (ii)w^{T}{\Phi} (x_{i} )+b-y_{i} \le \in +\zeta_{i}^{\prime}, \\ (iii)\zeta_{i} \ge 0, \\ (iv)\zeta_{i}^{\prime}\ge 0, \end{array} $$
(4)

where i=1,…,l, and C is a user-specified constant known as a regularization parameter.

We can solve (P) using the primal dual method to get the following dual problem.

Determine \(\{\alpha_{i}\}_{i=1}^{l}\) and \(\{\alpha_{i}^{\ast}\}_{i=1}^{l}\) (where \(\alpha_{i}\) and \(\alpha_{i}^{\ast}\) are the Lagrange multipliers for constraints (i) and (ii), respectively, of the primal quadratic optimization problem (P) in (4)) that maximize the objective function

$$\begin{array}{@{}rcl@{}} Q(\alpha_{i} ,\alpha_{i}^{\ast })&=&\sum\limits_{i=1}^{l} y{}_{i}(\alpha_{i} - \alpha_{i}^{\ast })-\in \sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })}\\ &&-\frac{1}{2}\sum\limits_{i=1}^{l} \sum\limits_{j=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })} (\alpha_{j} -\alpha_{j}^{\ast })K(x_{i} ,x_{j} ),\\ \end{array} $$
(5)

subject to

  1. (1)
    $$ \sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })} =0,\,\, \text{and} $$
    (6)
  2. (2)
    $$ 0\le \alpha_{i} \le C,0\le \alpha_{i}^{\ast }\le C. $$
    (7)

Here, i=1,…,l, and \(K:X\times X\to R\) is the Mercer kernel defined by

$$ K(x,z)={\Phi} (x)^{T}{\Phi} (z). $$
(8)

The solution of the primal dual method yields

$$ w=\sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast }){\Phi} (x_{i} )}, $$
(9)

where b is calculated using the Karush-Kuhn-Tucker conditions. That is,

$$ \begin{array}{l} \alpha_{i} (\varepsilon +\zeta_{i} -y_{i} +w^{T}{\Phi} (x_{i} )+b)=0, \\ \alpha_{i}^{\ast }(\varepsilon +\zeta_{i}^{\ast }+y_{i} -w^{T}{\Phi} (x_{i} )-b)=0, \end{array} $$
(10)
$$ (C-\alpha_{i} )\zeta_{i} =0\,\,\text{and}\,\,(C-\alpha_{i}^{\ast})\zeta_{i}^{\ast}=0\, ,\,\, \text{where}\,\, i=1,\ldots,l $$
(11)

Because \(\alpha_{i}\cdot\alpha_{i}^{\ast}=0\), \(\alpha_{i}\) and \(\alpha_{i}^{\ast}\) cannot simultaneously be non-zero, and there exists some i for which either \(\alpha_{i}\in(0,C)\) or \(\alpha_{i}^{\ast}\in(0,C)\). Hence, b can be computed using

$$ \begin{array}{l} b=y_{i} -\sum\limits_{j=1}^{l} {(\alpha_{j} -\alpha_{j}^{\ast })K(x_{j} ,x_{i} )} -\varepsilon \quad for\,\,0<\alpha_{i} <C, \\ b=y_{i} -\sum\limits_{j=1}^{l} {(\alpha_{j} -\alpha_{j}^{\ast })K(x_{j} ,x_{i} )} +\varepsilon \quad for\,\,0<\alpha_{i}^{\ast }<C. \end{array} $$
(12)

The \(x_{i}\) corresponding to \(0<\alpha_{i}<C\) and \(0<\alpha_{i}^{\ast}<C\) are called support vectors. Using the expressions for w and b in (9) and (12), f(x) can be computed using

$$f(x)=\sum\limits_{i=1}^{\ell} {(\alpha_{i} -\alpha_{i}^{\ast })({\Phi} (x_{i})^{T}{\Phi} (x))} +b, $$
$$ =\sum\limits_{i=1}^{\ell} {(\alpha_{i} -\alpha_{i}^{\ast })K(x_{i} ,x)} +b. $$
(13)

Note that we do not require the function Φ to compute f(x), which is an advantage of using the kernel.
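
In practice, the dual problem and the kernel evaluations in (13) are handled by standard solvers. The following hedged sketch uses scikit-learn's SVR (rather than the LIBSVM toolbox employed later in this paper) to fit an ∈-SVR with a Gaussian kernel on toy data; the data, parameter values, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y = sin(x) with a little noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(200)

# epsilon-SVR with an RBF (Gaussian) kernel; C, gamma and epsilon play the
# roles of the regularization parameter, the kernel width and the tube width.
model = SVR(kernel="rbf", C=100.0, gamma=0.5, epsilon=0.01).fit(X, y)

# f(x) in (13) is evaluated through kernel values K(x_i, x) only,
# so the mapping Phi is never formed explicitly.
print("support vectors:", model.support_vectors_.shape[0])
print("prediction at x=1.0:", model.predict([[1.0]]))
```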

2.2 Teaching-learning-based optimization technique

Teaching-learning-based optimization (TLBO) is a recently established and effective meta-heuristic, population-based optimization algorithm [37]. TLBO uses a population of solutions to find the global solution, in a similar way to other nature-inspired algorithms such as PSO, GA, and ABC. The TLBO algorithm is based on a simulation of a traditional learning process, that is, the transfer of knowledge within a classroom. The algorithm consists of two stages: (i) learners (students) first acquire knowledge from a teacher (teacher phase); and (ii) they then enhance their knowledge by interacting with their peers (student phase). The TLBO population consists of a group of learners. As in other optimization algorithms, there are decision variables: the different decision variables in TLBO are equivalent to the different subjects offered to students, and the students’ results are analogous to the ‘fitness’ value of the optimization problem.

2.2.1 Steps in the TLBO algorithm

The following steps of the TLBO algorithm were described by Rao et al. [37].

Step 1::

Define the optimization problem and create a solution space: In the initial phase, we identify the decision variable(s) in the problem to be optimized and assign them a range (minimum and maximum of the variable) where we will search for the optimal solution. If the solution spaces and ranges are not properly defined, then there is a chance that the optimization will take more time.

Step 2::

Identify the fitness function: In this step, we design or identify the fitness function, which accurately represents how well the optimized solution fits our problem using a single number. The TLBO algorithm uses the fitness function to evaluate its candidate solutions and obtains the optimal solution by minimizing (or maximizing) f(X) over the range of values of the decision variables (X), where f(X) is the fitness function.

Step 3::

Initialize the learners (or students): Each learner (based on the population size) is initialized using random values for each of the decision variables (within the appropriate ranges). The i-th learner is represented by a row vector \(X_{i}\), defined as

$$ X_{i} =[x_{i,1} ,x_{i,2} ,x_{i,3} ,\ldots,x_{i,D} ],i=1,2,\ldots,N, $$
(14)

where D is the number of decision variables, and N is the total number of learners. Each decision variable \(x_{i,j}\) is randomly assigned a value using

$$ x_{i,j} =x_{j}^{\min } +rand()\ast (x_{j}^{\max } -x_{j}^{\min } ),\quad j=1,2,\ldots,D, $$
(15)

where \(x_{j}^{\min}\) and \(x_{j}^{\max}\) are the minimum and maximum values of the j-th decision variable, and rand() is a function that returns a random number between 0 and 1.

Step 4::

Teacher phase

  1. a.

    Compute the mean value of each of the learners’ decision variables and denote the population mean as

\(X_{mean} =[\bar{x}_{1},\bar{x}_{2},\ldots,\bar{x}_{j},\ldots,\bar{x}_{D}]\), where \(\bar{x}_{j} =\frac{\sum\nolimits_{i=1}^{N} x_{i,j}}{N}\).

  2. b.

Compute the fitness value of each learner X based on the fitness function f(X). The learner with the best fitness value (solution) is identified as the teacher (\(X_{teacher}\)) for the teacher phase.

  3. c.

Now the teacher (\(X_{teacher}\)) transfers their knowledge and tries to improve the fitness of the other learners (\(X_{i}\)) by shifting the mean of the learners towards the teacher using

    $$\begin{array}{@{}rcl@{}} X_{new} &=&X_{i} \!+rand()\ast (X_{teacher} \,-\,(TF)\ast X_{mean} ),\\ for~ i&=&1,2,\ldots,N, \end{array} $$
    (16)

    where,

    $$ TF=round[1+rand(0,1)]. $$
    (17)

Here, TF is the teaching factor (either 1 or 2), and rand() is a random number function that returns a number between 0 and 1.

Note that TF is not a parameter of the TLBO algorithm. The value of TF is not provided as an input to TLBO; rather, its value is randomly chosen with equal probability by the algorithm using (17).

  4. d.

If the updated solution (\(X_{new}\)) is better than the existing solution (\(X_{i}\)), then we accept the new solution; otherwise we reject it.

Step 5::

Student phase

In the student phase, the learners (students) enhance their knowledge by interacting with other peer learners in the classroom. This mutual interaction between learners tends to increase the knowledge of the learner; an individual learner learns something new if the other individual has more knowledge.

  1. a.

Randomly select any two solutions \(X_{i}\) and \(X_{j}\) such that i≠j.

  2. b.

Solution \(X_{i}\) interacts with solution \(X_{j}\). If \(f(X_{i})\), that is, the fitness value of \(X_{i}\), is better (superior) than the fitness value of \(X_{j}\), then we update \(X_{i}\) to \(X_{new}\) using

$$ X_{new} =X_{i} +rand()\ast (X_{i} -X_{j} ) $$
(18)

otherwise, we update it to

$$ X_{new} =X_{i} +rand()\ast (X_{j} -X_{i} ) $$
(19)

Step 6::

Iterate until the termination criteria are satisfied

We then repeat Steps 4 and 5 until our termination conditions are satisfied, i.e., the average value of the fitness function over all learners no longer improves, or we reach the maximum number of generations. The \(X_{i}\) that minimizes \(f(X_{i})\) for a minimization problem (or maximizes \(f(X_{i})\) for a maximization problem) is the final solution to the optimization problem. A compact sketch of these steps is given below.
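
The teacher and student phases can be written compactly in code. The following Python sketch is a minimal, illustrative implementation of Steps 1–6 following (14)–(19); the bound clipping, the sphere test function, and all variable names are assumptions, not part of the original TLBO description.

```python
import numpy as np

def tlbo(fitness, lb, ub, n_learners=15, n_iter=30, seed=0):
    """Minimal TLBO minimizer following Steps 1-6 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    D = lb.size
    X = lb + rng.random((n_learners, D)) * (ub - lb)            # (15): random learners
    f = np.array([fitness(x) for x in X])

    for _ in range(n_iter):
        # ---- Teacher phase ----
        mean = X.mean(axis=0)                                    # X_mean
        teacher = X[f.argmin()]                                  # best learner so far
        TF = rng.integers(1, 3)                                  # (17): TF is 1 or 2
        X_new = X + rng.random((n_learners, D)) * (teacher - TF * mean)   # (16)
        X_new = np.clip(X_new, lb, ub)                           # keep within the ranges
        f_new = np.array([fitness(x) for x in X_new])
        improved = f_new < f                                     # greedy acceptance (Step 4d)
        X[improved], f[improved] = X_new[improved], f_new[improved]

        # ---- Student phase ----
        for i in range(n_learners):
            j = rng.choice([k for k in range(n_learners) if k != i])
            step = (X[i] - X[j]) if f[i] < f[j] else (X[j] - X[i])   # (18) / (19)
            x_new = np.clip(X[i] + rng.random(D) * step, lb, ub)
            f_trial = fitness(x_new)
            if f_trial < f[i]:
                X[i], f[i] = x_new, f_trial

    best = f.argmin()
    return X[best], f[best]

# Example: minimize the sphere function in two dimensions.
x_best, f_best = tlbo(lambda x: float(np.sum(x ** 2)), lb=[-5, -5], ub=[5, 5])
print(x_best, f_best)
```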

A three-dimensional graphical illustration of a single learner \(X_{i}\) searching for the optimal solution is presented in Fig. 1. The initial stage represents the status of the decision variables obtained by the learner for each of the parameters in the optimization problem, as in (15). \(X_{mean}\) and \(X_{teacher}\) represent the mean and the current best status among all the learners (population). The updated \(X_{new}\) is the status of the learner after the teacher phase, updated according to (16). \(X_{new}\) after the student phase represents the learner’s status after interacting with its peers, where the status is updated using either (18) or (19). Note that the fitness value (i.e., the distance between each X and the corresponding f(X)) of each learner improves after each phase.

Fig. 1
figure 1

Three-dimensional graphical illustration of the learner (population) searching for optimal solutions, in the teacher and student phases

2.3 The novel hybrid SVM-TLBO regression model

The novel hybrid SVM-TLBO model proposed by Das and Padhy [8] predicts using SVM regression and uses TLBO to determine the SVM parameters. The hybrid model was designed to work in a two-dimensional solution space, that is, to optimize C and σ, where C is the regularization parameter of the SVM regression model and σ is the bandwidth of the radial basis (Gaussian) kernel function. The values of the different parameters used in the novel hybrid SVM-TLBO model are presented in Table 1(a) and (b). The flow chart of the SVM-TLBO hybrid regression model is shown in Fig. 2. The raw financial time-series data are processed to prepare the input set (features), and then the TLBO algorithm selects the optimal free parameters for the SVM regression model. The fitness function for the optimization algorithm is the RMSE of the SVM regression results. In the training phase, we apply the SVM regression model for each set of parameter values (C and σ) obtained by the TLBO algorithm. These multiple executions of the SVM regression model in the training phase increase the computational time, but this is the only overhead involved in the hybrid model. After determining the optimal parameters on the training data set, we apply the trained model to the test (out-of-sample) data to evaluate the performance of the forecasting model.
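
The coupling between TLBO and the SVM in Fig. 2 amounts to defining a fitness function over the pair (C, σ). A hypothetical sketch is given below; it reuses the tlbo() sketch from Section 2.2.1, substitutes scikit-learn's SVR for LIBSVM, and maps σ to scikit-learn's gamma, so it should be read as an outline rather than the authors' implementation. The parameter ranges in the commented call correspond to those reported in Section 3.3.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def make_fitness(X_train, y_train):
    """Fitness = cross-validated RMSE of an SVR trained with a candidate (C, sigma)."""
    def fitness(params):
        C, sigma = params
        gamma = 1.0 / (2.0 * sigma ** 2)        # assumed mapping from bandwidth sigma to gamma
        svr = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=1e-4)
        mse = -cross_val_score(svr, X_train, y_train, cv=5,
                               scoring="neg_mean_squared_error").mean()
        return float(np.sqrt(mse))              # RMSE, minimized by TLBO
    return fitness

# Usage (assuming X_train, y_train and the tlbo() sketch from Section 2.2.1):
# best_params, best_rmse = tlbo(make_fitness(X_train, y_train),
#                               lb=[0.01, 1e-4], ub=[35000, 32],
#                               n_learners=15, n_iter=30)
```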

Fig. 2
figure 2

Flow chart representation of the novel hybrid SVM-TLBO regression model [8]

Table 1 (a) SVM and (b) TLBO parameters used in experiments [8]

3 Proposed ensemble model architecture and methodology

Fig. 3
figure 3

Flowchart for the proposed DR-SVM-TLBO ensemble model

We propose a new ensemble model called DR-SVM-TLBO for predicting financial time series, particularly the values of the energy commodity futures index. The proposed model is presented as a flowchart in Fig. 3. In the first step, the raw time-series data collected from the market (i.e., MCX COMDEX) are input into the model for preprocessing and are used to calculate the technical indicators (see Appendix I). Detailed explanations of the data collection methodology, data, and preprocessing are given in Section 3.1. To determine the optimal number of features (to reduce dimensionality) from the 17 normalized input technical indicators, we used PCA and required that the retained components explain at least 95 % of the cumulative variance in our data set. The flow chart is then divided into two stages: (1) dimensionality reduction (critical feature extraction), and (2) implementation of the SVM-TLBO hybrid model. In the dimensionality reduction stage, we apply feature extraction using PCA, KPCA, or ICA to the normalized data. The number of critical features to be extracted is equal to the optimal number of input features (N) determined by PCA to account for 95 % of the cumulative variance. After the dimension reduction step, we construct an input dataset containing the extracted (reduced) features. In the SVM-TLBO stage, we apply the hybrid model to the reduced features. Here, the training dataset is used to find the optimal values of the free parameters of the SVM regression model and the kernel function. The SVM-TLBO hybrid model used in the second stage is similar to the model developed by Das and Padhy [8] and presented in Fig. 2; the only difference is that we have omitted the first two blocks (i.e., data preprocessing and input preparation), because these steps are already included in the proposed model. A detailed overview of the computation techniques used in our study is presented in Section 3.3. To forecast the value of the commodity futures index for a new data pattern X, we must first apply the dimension reduction technique to extract the optimal feature values; the trained SVM regression model is then used to predict the value for the new data pattern. The out-of-sample (test) data are historical data, so the desired index values are known and we can easily calculate the forecast performance.
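
The two-stage flow in Fig. 3 can be outlined in a few lines of code. The sketch below is a hypothetical illustration (scikit-learn in place of the R and LIBSVM implementations described in Section 3.3): PCA first fixes the number of components needed to reach the 95 % cumulative-variance threshold, the chosen extractor (PCA, KPCA, or ICA) then reduces the normalized indicators, and the SVM regression of the second stage is trained on the reduced features with the parameters returned by TLBO.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.svm import SVR

def reduce_features(X_norm, method="kpca", var_threshold=0.95):
    """Stage 1: choose n from PCA's cumulative variance, then extract n components."""
    pca = PCA().fit(X_norm)
    n = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_threshold) + 1)
    if method == "pca":
        extractor = PCA(n_components=n)
    elif method == "kpca":
        # Kernel parameter assumed for illustration; the paper reports a Gaussian bandwidth of 0.01.
        extractor = KernelPCA(n_components=n, kernel="rbf", gamma=0.01)
    else:
        extractor = FastICA(n_components=n, random_state=0)
    return extractor.fit_transform(X_norm), extractor

# Stage 2 (illustrative): train SVR on the reduced features with the (C, sigma)
# pair returned by the TLBO search sketched earlier.
# Z, extractor = reduce_features(X_norm, method="kpca")
# svr = SVR(kernel="rbf", C=C_opt, gamma=1.0 / (2 * sigma_opt ** 2), epsilon=1e-4).fit(Z, y)
```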

3.1 Experimental data

To examine the effectiveness of the improved forecasting model, we applied it to real COMDEX data collected from the MCX (http://www.mcxindia.com) [8]. We collected daily trading data points from January 1, 2010, to May 7, 2014, and used them as training and testing data. The total number of data samples in this time frame was 1,332. The time-series data consist of the daily opening price, low price, high price, closing price, and traded date. We used 17 technical indicators as the inputs. The raw daily prices were used to calculate the technical indicators as per the details given in Appendix I. The data period includes many important and significant economic events, so we consider this data appropriate for training the proposed models. Table 2 describes the dataset in terms of high, low, mean, and median values, as well as the standard deviation, kurtosis (a measure of the flatness of the distribution), and skewness (the degree of asymmetry of the distribution about its mean). Table 2 shows that the skewness of the dataset is less than zero, i.e., the dataset is left skewed (most values are concentrated to the right of the mean, with the extreme values to the left) and there are a lot of spikes in the dataset, and that the kurtosis is less than three, i.e., the distribution is platykurtic (flatter than a normal distribution, with a wider peak). After processing the 1,332 raw data points, we obtained 1,307 transformed data points with dates from February 1, 2010 to May 7, 2014. The technical indicators were normalized to the range [0, 1] to minimize forecasting errors and to prevent variables with large numeric ranges from overwhelming the other data. The min-max normalization process was applied to the input technical indicators and the output closing prices. The technical indicators and closing prices were normalized using

$$\begin{array}{@{}rcl@{}} \bar{{x}}_{i}^{d} &=&\frac{{x_{i}^{d}} -\min \left( {x_{i}^{d}} \vert_{i=1}^{N} \right)}{\max \left( {x_{i}^{d}} \vert_{i=1}^{N} \right)-\min \left( {x_{i}^{d}} \vert_{i=1}^{N}\right)},\\ d&=&1,2,\ldots,17\;(\text{number of input variables}), \end{array} $$
(20)

where \(\bar{{x}}_{i}^{d}\) is the normalized value, \({x_{i}^{d}}\) is the original value, \(\min ({x_{i}^{d}} \vert_{i=1}^{N})\) is the minimum value in the original input data, \(\max ({x_{i}^{d}} \vert_{i=1}^{N})\) is the maximum value in the original input data, and N is the total number of trading days.
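
A direct transcription of (20) into Python is shown below as a simple illustration; in the study the same min-max scaling is also applied to the output closing prices.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each technical indicator (column) to [0, 1] as in (20)."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Example with three trading days and two indicators.
X = np.array([[10.0, 200.0],
              [15.0, 250.0],
              [20.0, 300.0]])
print(min_max_normalize(X))   # each column is mapped onto [0, 1]
```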

Table 2 Brief description of the MCX COMDEX dataset

The normalized data were segregated into training and test groups, approximately in the ratio of 5:1. The data were divided into training and testing samples based on previous work [7, 8, 21, 31, 45]. The ratio of training to test data used by Chen et al. [7] was 9:1, Das and Padhy [8] used approximately 5:1, Kim [21] and Lu [31] used 4:1, and Wang and Wang [45] used approximately 6:1. In our case, 1085 data points were used for training with 5-fold cross-validation, and the remaining 222 were used to test the model. We considered three different forecasts of the closing prices: (1) 1 day ahead; (2) 3 days ahead; and (3) 5 days ahead.
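
The data preparation just described can be sketched as follows. The snippet is illustrative only; the variable names and the exact alignment of indicators with future closing prices are assumptions.

```python
import numpy as np

def make_supervised(features, close, horizon=1):
    """Pair each day's indicator vector with the closing value `horizon` days ahead."""
    X = features[:-horizon]
    y = close[horizon:]
    return X, y

def chronological_split(X, y, n_train=1085):
    """Keep time order: the first n_train samples train the model, the rest test it."""
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Usage (assuming `features` is the 1307 x 17 normalized indicator matrix and
# `close` the normalized closing prices):
# X1, y1 = make_supervised(features, close, horizon=1)      # 1-day-ahead targets
# X_tr, y_tr, X_te, y_te = chronological_split(X1, y1)      # roughly 5:1 split
```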

3.2 Performance measures and statistical test

The performance of the proposed model was evaluated using standard parametric statistical metrics: RMSE, MAE, NMSE, and DS [4, 7, 39]. The descriptions and definitions of these performance criteria are given in Table 3. DS (in %) measures the accuracy of the predicted direction; larger DS values indicate a better forecast. These parametric statistical tests require a distributional assumption (i.e., that the data are normally distributed) and are not robust to outliers, so they may occasionally produce ambiguous results. Therefore, we also used a nonparametric technique to evaluate the significance of any differences in the test (out-of-sample) performance of the proposed model compared with the benchmark models. We applied the DM test [9], a nonparametric statistical test extensively used for forecasting model validation, especially in economics and finance. In the DM test, the null hypothesis states that the two forecasting methods have the same forecasting accuracy, while the alternative hypothesis is that the two forecasting methods have different levels of accuracy. The null hypothesis of equal forecasting accuracy is rejected at the 5 % significance level if the computed absolute value of the DM statistic is greater than 1.96 (i.e., |DM value| > 1.96). In this study, we used the squared-error criterion as the loss function in the DM test.
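
For reference, the sketch below gives common textbook forms of the four metrics and a basic one-step DM statistic with a squared-error loss. Because Table 3's exact definitions are not reproduced here, and the DM test for multi-step forecasts normally uses an autocorrelation-corrected variance, these functions should be treated as illustrative assumptions rather than the exact formulas used in the study.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def nmse(y, yhat):
    # Squared error normalized by the variance of the actual series.
    return float(np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))

def directional_symmetry(y, yhat):
    """Share of days (in %) on which the forecast moves in the same direction
    as the actual series (one common definition of DS)."""
    actual_move = np.diff(y)
    predicted_move = yhat[1:] - y[:-1]
    return 100.0 * np.mean(actual_move * predicted_move > 0)

def diebold_mariano(e1, e2):
    """Basic one-step DM statistic for squared-error loss;
    |DM| > 1.96 rejects equal forecasting accuracy at the 5 % level."""
    d = e1 ** 2 - e2 ** 2
    n = d.size
    return float(np.mean(d) / np.sqrt(np.var(d, ddof=1) / n))
```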

Table 3 Performance evaluation metrics and their definitions

3.3 Computation techniques

We implemented Vapnik’s SVM regression technique using LIBSVM, which is an SVM tool box [5]. We used the Gaussian (radial basis) kernel function, because it performs well under general smoothness assumptions. All the experiments were executed on an Intel Core i7 CPU @ 2.10 GHz with 6 GB of primary memory. We wrote our own code to implement TLBO for the SVM-TLBO hybrid regression model. The TLBO algorithm was defined in two dimensions, to optimize the bandwidth σ of the Gaussian kernel and the regularization parameter C of the SVM. In our experimental runs of the TLBO algorithm, there were no significant changes to σ and C after 25–30 iterations when using a population size (learners/students) of 15. Pawar and Rao [34] and Rao and Patel [36] observed that TLBO requires only a small population and few iterations (generations). With this in mind, we fixed the maximum number of iterations for the TLBO to 30, with a population size of 15. According to Tay and Cao [39], SVRs are insensitive to ε (the extent to which deviations are tolerated) provided it is a reasonable value. Cao and Tay [4] observed that the number of support vectors decreases as ε increases. Thus, we chose ε = 0.0001. The range of the SVM parameter C was set to 0.01–35,000, and the range of σ (the bandwidth parameter of the Gaussian kernel) was set to 0.0001–32 [28].

Fig. 4
figure 4

Cumulative variance for the PCA technique

For the proposed ensemble DR-SVM-TLBO model, we designed our own code for dimensionality reduction using the PCA, KPCA, and ICA techniques, which we implemented in R (https://www.r-project.org). The ensemble model was implemented according to the flowchart in Fig. 3. As previously discussed, we determined the optimal number of features from the original 17 technical indicators (features) using PCA. We selected the number of dimensions based on the PCA results that accounted for at least 95 % of the cumulative variance in the dataset. The cumulative variance of the PCA result is presented in Fig. 4. The optimal number of features is six, because the cumulative variance of the first six principal components (PC1 to PC6) was 95.41 %. We therefore set the number of features (components) for all the dimension reduction techniques to six. In the implementation of the KPCA technique, we used a Gaussian (radial basis) kernel with a bandwidth parameter of 0.01, and for ICA we used the FastICA algorithm [15].

Table 4 Model performance with respect to the RMSE and the optimal parameters for standard SVM, SVM-TLBO novel hybrid, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO

The dimensionality reduction in the proposed ensemble model uses the PCA, KPCA, and ICA methods, so there are three variants of the proposed model: (1) DR PCA-SVM-TLBO, the proposed model with PCA for dimensionality reduction; (2) DR KPCA-SVM-TLBO, the proposed model with KPCA for dimensionality reduction; and (3) DR ICA-SVM-TLBO, the proposed model with ICA for dimensionality reduction. We ran the new ensemble DR-SVM-TLBO algorithm as per the flow chart in Fig. 3. The simulation results are shown in Table 4. We compared the results of all three variants of the new ensemble model (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO) with the standard SVM regression model without dimension reduction and the novel hybrid SVM-TLBO model [8]. We used a sequential-optimization-based algorithm to train the SVM regression, because it is fast and efficient for large data sets.

4 Experimental results and discussion

In this section, we present our experimental results regarding the efficiency of the proposed new ensemble model. The RMSE results, the average computational time in milliseconds for all five models in the training (in-sample) and testing (out-of-sample) phases, and the optimal values of C and σ are presented in Table 4. The testing phase (out-of-sample) RMSE, MAE, and NMSE values presented in Table 5 show that the new ensemble DR-SVM-TLBO model (all three variants) outperformed the standard SVM regression and novel hybrid SVM-TLBO models in all three forecasting cases. This is because the parameters (C and σ) of the standard SVM were selected using a traditional grid search method, whereas the optimal values of C and σ for SVM-TLBO, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO were obtained using the TLBO algorithm, starting at random values within the defined solution space. In addition to the selection of the optimal SVM and kernel parameters, the dimension reduction techniques (i.e., PCA, KPCA, and ICA) extracted the input features from the original input set (17 technical indicators). The extracted input features contain less noise and more refined information. This changed the optimal values of C and σ derived by the optimization process and produced superior forecasting models. Table 4 clearly shows that the dimension reduction techniques used in our models reduced the average computational time and increased the accuracy compared with the benchmark models. With respect to the DS performance metric, the standard SVM performed better than the rest of the models in the 1-day-ahead forecasts, DR ICA-SVM-TLBO performed best in the 3-days-ahead forecasts, and SVM-TLBO performed best in the 5-days-ahead forecasts. Financial market practitioners evaluate forecasting models using both the minimum forecast error and the directional accuracy [26]; the aim is to achieve a directional accuracy (DS value) of over 50 % [7]. In our study, the DS values of the benchmark and the proposed new ensemble forecasting models were greater than 50 % in all the forecasting cases. Table 5 clearly shows that the proposed ensemble model with KPCA (i.e., DR KPCA-SVM-TLBO) outperformed the other models considered in this study. This is because the nonlinear kernel-based PCA (i.e., KPCA) can capture more discriminatory information, improving the accuracy of the forecasting model. The numbers in bold correspond to the best performance.

Table 5 Comparison of the out-of-sample results with respect to the RMSE, MAE, NMSE, and DS of the standard SVM, SVM-TLBO novel hybrid, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO ensemble models

Table 6 summarizes the DM statistics, with the p-values for the DM test given in parentheses. We compared the proposed ensemble DR KPCA-SVM-TLBO forecasting model with the two benchmark models (i.e., standard SVM and novel hybrid SVM-TLBO) and the two other variants of the proposed model (i.e., DR PCA-SVM-TLBO and DR ICA-SVM-TLBO) for the 1-, 3-, and 5-days-ahead forecast cases. The results in Table 6 show that the p-values were smaller than the chosen significance level (i.e., 5 %) and the DM test values were greater than 1.96, except for the comparisons with the SVM-TLBO and DR ICA-SVM-TLBO models in the 3-days-ahead forecast. The absolute value of the DM statistic for DR KPCA-SVM-TLBO compared with SVM-TLBO was 0.4223 (p-value: 0.6733), and for DR KPCA-SVM-TLBO compared with DR ICA-SVM-TLBO it was 0.286 (p-value: 0.7752). These values are less than 1.96, so we cannot reject the null hypothesis at the 5 % significance level; that is, the difference between the forecasting performance of these models is not significant and might be due to stochastic variation. From these observations, we can conclude that the proposed DR-SVM-TLBO model (all three variants) yields more accurate predictions than the benchmark models, and that among the proposed ensemble models, DR KPCA-SVM-TLBO performed the best. Table 7 gives the percentage improvements of the proposed ensemble model, for all three variants (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO), over the benchmark novel hybrid SVM-TLBO model for the out-of-sample (test) data with respect to the RMSE and MAE. Figure 5a, b, and c show box plots of the MAE for the 1-, 3-, and 5-days-ahead forecasts, respectively, for the standard SVM, the novel hybrid SVM-TLBO, and the proposed models (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO). The middle square in each box plot represents the MAE. The box plots clearly show that DR KPCA-SVM-TLBO has the smallest range and the smallest standard error (denoted by the lines above and below the box), which indicates that the DR KPCA-SVM-TLBO model outperformed all the other models.

Fig. 5
figure 5

Box plot of MAE values using standard SVM, SVM-TLBO novel hybrid, and our proposed new ensemble models (i.e. DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO): (a) 1-day-ahead forecast, (b) 3-days-ahead forecast, and (c) 5-days-ahead forecast

Table 6 Diebold-Mariano statistic: DM test values and p-values (in parentheses) for the MSE loss function
Table 7 Percentage (%) improvement of the proposed new ensemble DR-SVM-TLBO (all three variants) model over the benchmark novel hybrid SVM-TLBO [8] model for out-of-sample data, with respect to RMSE and MAE

5 Conclusions and future work

In this study, we extended the novel hybrid SVM-TLBO model by incorporating dimension reduction techniques. To reduce the number of input variables (features), we used three well-known dimensionality reduction techniques: PCA, KPCA, and ICA. We used multicommodity futures index data collected from the MCX to examine the feasibility of the proposed ensemble model. Our models performed better than the existing methods. Our conclusions are summarized as follows.

  1. 1.

    The average computational time results (Table 4) suggest that reducing the number of input variables (features) decreased the computational time.

  2. 2.

    Our empirical results show that DR-SVM-TLBO (i.e. DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO) produced better predictions than the standard SVM regression method and the SVM-TLBO hybrid model. Among the three variants of the proposed ensemble model, DR KPCA-SVM-TLBO performed the best.

  3. 3.

    DR KPCA-SVM-TLBO improved the RMSE by 50.23 % (for the 1-day-ahead forecast), 28.43 % (for the 3-days-ahead forecast), and 20.03 % (for the 5-days-ahead forecast), when compared with the SVM-TLBO hybrid regression model. The DR KPCA-SVM-TLBO model also improved the MAE result by 55.45 % (1-day-ahead), 30.63 % (3-days-ahead), and 17.87 % (5-days-ahead), when compared with the SVM-TLBO novel hybrid regression model. There were similar improvements in terms of MAE and RMSE for the other two variants of the proposed model (i.e. DR PCA-SVM-TLBO and DR ICA-SVM-TLBO).

  4. 4.

The results of the DM statistical test (Table 6) show that the DM tests comparing the proposed model (DR KPCA-SVM-TLBO) with the other models yielded values greater than 1.96 (the threshold value at the 5 % significance level), with the corresponding p-values below the 5 % significance level, in all cases except the comparisons of DR KPCA-SVM-TLBO with SVM-TLBO and with DR ICA-SVM-TLBO for the 3-days-ahead forecast. Overall, the DM test confirms that the predictive accuracy of our proposed model is statistically significantly better than that of the benchmark models.

In this study, we selected quantitative technical indicators (features) based on previous research in this area and feedback from a domain expert. We could improve the predictive performance by including non-quantitative factors such as data from breaking news and social media, relevant macroeconomic factors, and psychological factors. One limitation of this study is that we used a relatively small dataset; despite this, we achieved reasonably good forecasts. The proposed hybrid model should provide better forecasting results when applied to larger volumes of data. The successful application of our proposed model to non-linear and highly complex financial time-series data suggests that it may be useful in other domains.