1 Introduction

In the business environment, we desire precise and efficient forecasts of various kinds of financial variables, which can then be used to develop successful strategies and avoid large losses [7]. Over the last three decades, many researchers have considered financial time-series prediction, with the prime objective of beating the financial market. Financial forecasting is an interesting and challenging field, because a huge number of factors (e.g., economic, political, environmental, and psychological) must be considered during the forecasting process. Financial time-series data are intrinsically noisy, non-stationary, and deterministically chaotic [4, 39]. The noise in financial time-series data is caused by a lack of information regarding the historical behavior of financial markets, which makes it hard to map past values to future values. The non-stationary and chaotic nature of the data indicates that the data distribution varies over time and is unpredictable.

Because of successful developments in different computational intelligence techniques, researchers have started to apply computational intelligence approaches to financial markets. Example techniques include artificial neural networks (ANNs), support vector machines (SVMs), genetic algorithms (GAs), particle swarm optimization (PSO), and fuzzy technologies. Vapnik et al. [44] introduced SVM methods to overcome the problems of ANNs (such as getting trapped in local minima, overfitting to training data, and long training times). Since then, several authors have proposed financial instrument pricing using SVMs. For example, Tay and Cao [39] and Cao and Tay [4] developed pricing models for five specific financial futures in the US market using SVMs, and Van Gestel et al. [43] used the LS-SVM (least squares support vector machine) for T-bill (treasury bill) rate and stock index pricing in the United States (US) and German markets. Their experimental results showed that SVMs performed well when applied to financial markets and produced good predictions. Sapankevych and Sankar [38] published an exhaustive survey on SVMs for time-series prediction. They surveyed papers in the areas of financial market prediction, electric utility load forecasting, environmental states, weather prediction, and reliability forecasting. In their survey, they noted that the selection of the free parameters for the SVM and the kernel function had a significant influence on the forecast. The experimental results of Kim [21] showed that SVM predictions are sensitive to these free parameters and that it is important to select optimal values; improper selection of the free parameters can cause over- or under-fitting problems [21]. To select the optimal SVM parameter(s) and kernel function, several studies proposed hybrid models combining an SVM with optimization techniques such as PSO, GAs, artificial bee colonies (ABCs), differential evolution (DE), ant colony optimization (ACO), and simulated annealing (SA) [19, 29, 30, 46, 47]. However, the optimization model used to select the optimal parameter(s) itself introduces additional user-specified controlling parameter(s), making the user’s task even more complex. To remove the influence of the user-specified controlling parameter(s) of the optimization technique on the forecasting result, Das and Padhy [8] proposed a novel hybrid SVM-TLBO model, in which the teaching-learning-based optimization (TLBO) algorithm proposed by Rao et al. [37], which does not require algorithm-specific control parameters, is used to optimize the SVM parameters and kernel function. Their extensive experimental results showed that the novel hybrid SVM-TLBO model outperformed the standard SVM model and the SVM + PSO model proposed by Lin et al. [29].

Das and Padhy [8] proposed a novel hybrid regression model for forecasting the value of the multicommodity futures index (COMDEX) traded on the Multi Commodity Exchange of India Limited (MCX). They considered 17 technical indicators as input variables (features) to the SVM regression model, and predicted the closing values of the futures index 1, 3, and 5 days ahead. The 17 technical indicators in their study were selected from literature surveys based on work by Hsu [12], Huang and Tsai [13], Ince and Trafalis [18], Kim [21], Kim and Han [22], Kim and Lee [23], Lai et al. [25], Liang et al. [27], and Tsang et al. [41], and from feedback from domain experts. A detailed description of these technical indicators can be found in Appendix I. When modeling financial time series using SVM regression, the technical indicators used as input must be very carefully selected and identified. Including irrelevant technical indicators as input to the regression model may introduce noise. Training an SVM model with such noisy data may cause it to fit to undesirable data, resulting in an inappropriate approximation function and a loss of generalization; moreover, the model could under- or over-fit the noisy data [2]. Additionally, almost all stock prediction techniques use approaches such as statistics, mathematical models, machine learning, and artificial intelligence to determine potential future rules. However, these approaches require high-quality features (inputs) before they can learn any knowledge, because inaccurate information can produce erroneous results [6].

To minimize the noise and collinearity in the data and to extract the most relevant information for knowledge discovery, we can use dimensionality reduction techniques such as independent component analysis (ICA), principal component analysis (PCA), kernel principal component analysis (KPCA), and factor analysis (FA). The fundamental objectives of dimensionality reduction are to identify and extract more reliable information and effective features from the original data [40]. Our literature survey identified different dimensionality reduction techniques used in machine learning and showed that the generalization capabilities of these methods are improving. Cao et al. [3] compared PCA, KPCA, and ICA applied to SVM classification. Cai et al. [1] applied dimensionality reduction in SVM using PCA, KPCA, and ICA. Ekenel and Sankur [10] applied ICA and PCA dimensionality reduction methods to facial recognition. Lu et al. [32] used ICA for dimensionality reduction with support vector regression (SVR) for financial time-series forecasting. Ince and Trafalis [17] used KPCA and factor analysis (FA) as dimensionality reduction techniques with SVM to predict stock prices. Lu [31] developed a hybrid model using non-linear independent component analysis (NLICA), SVR, and PSO to forecast the stock index. Zhai et al. [48] used PCA with SVM to improve material identification of loose particles in sealed electronic devices; their experimental results showed that this model was effective. An improved SVM model was proposed by Kuang et al. [24], which integrated SVM, KPCA, and chaotic PSO for intrusion detection.

The focus of this research was to incorporate some well-known dimensionality reduction techniques (feature extraction approaches) into the novel hybrid SVM-TLBO regression model proposed by Das and Padhy [8], to develop an improved prediction model (DR-SVM-TLBO) that forecasts COMDEX. Chang and Wu [6] concluded from their experimental study that dimensionality reduction using feature extraction techniques is superior to feature selection techniques. We considered PCA, a non-linear version of PCA with a kernel-based trick (KPCA), and ICA to reduce the dimensionality of the input variables (features), because these three techniques are well-known methods [1, 3, 17, 33]. More detailed information regarding the PCA, KPCA, and ICA algorithms can be found in [6, 14–17, 20, 31, 32, 42] and Appendix II. We compared standard SVM without dimension reduction and the novel hybrid SVM-TLBO [8] model with our proposed model. The expected benefits of the proposed model are that it can: (1) reduce the dimension of the financial time-series data, (2) reject noisy inputs, (3) reduce the computational complexity, and (4) produce a more generalized model.

The experimental results on our commodity futures index data show that the forecasts of the proposed ensemble model are more accurate than those of the standard SVM regression model and the novel hybrid SVM-TLBO model [8]. To measure and compare the forecasting performance of the models, we used the root mean square error (RMSE), mean absolute error (MAE), normalized mean square error (NMSE), directional symmetry (DS), and the Diebold-Mariano (DM) statistical test. The outcome of this study provides empirical confirmation of the usefulness of the proposed model when forecasting in the financial domain.

The contribution of this study is the design of a new ensemble system that integrates two approaches: dimensionality reduction using feature extraction methods and the novel hybrid SVM-TLBO regression model. A small improvement in forecasting performance can minimize the stakeholders’ risk and lead to a considerable investment profit. The proposed model revealed three key characteristics: (1) it reduced the input features (technical indicators) from seventeen to six features that accounted for more than 95 % of the cumulative variance; (2) its time complexity was lower than that of the benchmark models; and (3) its forecasts were more accurate than those of the benchmark models. The organization of this paper is as follows. Section 2 provides a summary of SVM for regression, the TLBO optimization method, and the novel hybrid SVM-TLBO model. In Section 3, we describe the proposed hybrid model architecture and experiments, followed by the empirical results and discussion in Section 4. Section 5 concludes the study with a brief discussion of our findings and future work.

2 Novel hybrid SVM-TLBO model

2.1 SVM for regression

Vapnik et al. [44] developed the SVM technique for regression. The method is presented in Haykin [11] as follows.

Given a training data set \(\{(x_{1},y_{1}),\ldots,(x_{l},y_{l})\}\), where each \(x_{i}\in X\subseteq R^{n}\) (X denotes the input sample space) with matching target value \(y_{i}\in R\) for \(i=1,\ldots,l\) (where l is the size of the training data), the objective of the regression problem is to find a function \(f:R^{n}\to R\) that can approximate the value of y for x not in the training set.

The estimating function f is defined as

$$ f(x)=(w^{T}{\Phi} (x))+b, $$
(1)

where \(w\in R^{m}\), \(b\in R\) is the bias, and Φ denotes a nonlinear mapping from \(R^{n}\) to the high-dimensional space \(R^{m}\) (m>n). The aim is to find w and b such that the value of f(x) can be determined by minimizing the risk.

$$ R_{\text{reg}} (f)=C\sum\limits_{i=1}^{l} {L_{\in } (y_{i} ,f(x_{i} ))} +\frac{1}{2}\left\| w \right\|^{2}. $$
(2)

Here, \(L_{\in}\) is the ∈-insensitive loss function originally proposed by Vapnik et al. [44], which is defined as

$$ \mathrm{L}_{\in}(\mathrm{y},\mathrm{z})=\left\{\begin{array}{ll} | \mathrm{y}-\mathrm{z} |- \in , &\quad | \mathrm{y}-\mathrm{z} |\ge \in\\ 0,& \quad \text{otherwise} \end{array} \right. $$
(3)
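
To make (3) concrete, the short Python snippet below (an illustrative sketch, not part of the original study) evaluates the ∈-insensitive loss for a few target/prediction pairs; only deviations larger than ∈ incur a penalty.

```python
import numpy as np

def eps_insensitive_loss(y, z, eps=0.1):
    """Vapnik's epsilon-insensitive loss from (3):
    zero inside the eps-tube, linear outside it."""
    residual = np.abs(y - z)
    return np.where(residual >= eps, residual - eps, 0.0)

# Example: only deviations larger than eps contribute to the loss.
y = np.array([1.00, 1.05, 1.30])
z = np.array([1.02, 1.00, 1.00])
print(eps_insensitive_loss(y, z, eps=0.1))  # -> [0.  0.  0.2]
```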

By introducing the slack variables \(\zeta_{i}\) and \(\zeta_{i}^{\ast}\), the problem in (2) can be reformulated to the following.

(P) Minimize \(C[\sum \limits _{i=1}^{l} {(\zeta _{i} +\zeta _{i}^{\prime })} ]+\frac {1}{2}\| w \|^{2}\) subject to

$$ \begin{array}{l} (i)y_{i} -w^{T}{\Phi} (x_{i} )-b\le \in +\zeta_{i} , \\ (ii)w^{T}{\Phi} (x_{i} )+b-y_{i} \le \in +\zeta_{i}^{\prime}, \\ (iii)\zeta_{i} \ge 0, \\ (iv)\zeta_{i}^{\prime}\ge 0, \end{array} $$
(4)

where i=1,…,l, and C is a user-specified constant known as a regularization parameter.

We can solve (P) using the primal dual method to get the following dual problem.

Determine \(\{\alpha_{i}\}_{i=1}^{l}\) and \(\{\alpha_{i}^{\ast}\}_{i=1}^{l}\) (where \(\alpha_{i}\) and \(\alpha_{i}^{\ast}\) are the Lagrange multipliers for constraints (i) and (ii), respectively, of the primal quadratic optimization problem (P) in (4)) that maximize the objective function

$$\begin{array}{@{}rcl@{}} Q(\alpha_{i} ,\alpha_{i}^{\ast })&=&\sum\limits_{i=1}^{l} y{}_{i}(\alpha_{i} - \alpha_{i}^{\ast })-\in \sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })}\\ &&-\frac{1}{2}\sum\limits_{i=1}^{l} \sum\limits_{j=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })} (\alpha_{j} -\alpha_{j}^{\ast })K(x_{i} ,x_{j} ),\\ \end{array} $$
(5)

subject to

  1. (1)
    $$ \sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast })} =0,\,\, \text{and} $$
    (6)
  2. (2)
    $$ 0\le \alpha_{i} \le C,0\le \alpha_{i}^{\ast }\le C. $$
    (7)

Here, i=1,…,l, and \(K:X\times X\to R\) is the Mercer kernel defined by

$$ K(x,z)={\Phi} (x)^{T}{\Phi} (z). $$
(8)

The solution of the primal dual method yields

$$ w=\sum\limits_{i=1}^{l} {(\alpha_{i} -\alpha_{i}^{\ast }){\Phi} (x_{i} )}, $$
(9)

where b is calculated using the Karush-Kuhn-Tucker conditions. That is,

$$ \begin{array}{l} \alpha_{i} (\varepsilon +\zeta_{i} -y_{i} +w^{T}{\Phi} (x_{i} )+b)=0, \\ \alpha_{i}^{\ast }(\varepsilon +\zeta_{i}^{\ast }+y_{i} -w^{T}{\Phi} (x_{i} )-b)=0, \end{array} $$
(10)
$$ (C-\alpha_{i} )\zeta_{i} =0\,\,\text{and}\,\,(C-\alpha_{i}^{\ast})\zeta_{i}^{\ast}=0\, ,\,\, \text{where}\,\, i=1,\ldots,l $$
(11)

Because \(\alpha_{i}\cdot\alpha_{i}^{\ast}=0\), \(\alpha_{i}\) and \(\alpha_{i}^{\ast}\) cannot simultaneously be non-zero, and there exists some i for which either \(\alpha_{i}\in(0,C)\) or \(\alpha_{i}^{\ast}\in(0,C)\). Hence, b can be computed using

$$ \begin{array}{l} b=y_{i} -\sum\limits_{j=1}^{l} {(\alpha_{j} -\alpha_{j}^{\ast })K(x_{j} ,x_{i} )} -\varepsilon \quad for\,\,0<\alpha_{i} <C, \\ b=y_{i} -\sum\limits_{j=1}^{l} {(\alpha_{j} -\alpha_{j}^{\ast })K(x_{j} ,x_{i} )} +\varepsilon \quad for\,\,0<\alpha_{i}^{\ast }<C. \end{array} $$
(12)

The \(x_{i}\) corresponding to \(0<\alpha_{i}<C\) and \(0<\alpha_{i}^{\ast}<C\) are called support vectors. Using the expressions for w and b in (9) and (12), f(x) can be computed using

$$f(x)=\sum\limits_{i=1}^{\ell} {(\alpha_{i} -\alpha_{i}^{\ast })({\Phi} (x_{i})^{T}{\Phi} (x))} +b, $$
$$ =\sum\limits_{i=1}^{\ell} {(\alpha_{i} -\alpha_{i}^{\ast })K(x_{i} ,x)} +b. $$
(13)

Note that we do not require the function Φ to compute f(x), which is an advantage of using the kernel.
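
In practice, the dual problem and the kernel evaluations in (13) are handled by standard solvers. The following hedged sketch uses scikit-learn's SVR (rather than the LIBSVM toolbox employed later in this paper) to fit an ∈-SVR with a Gaussian kernel on toy data; the data, parameter values, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y = sin(x) with a little noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(200)

# epsilon-SVR with an RBF (Gaussian) kernel; C, gamma and epsilon play the
# roles of the regularization parameter, the kernel width and the tube width.
model = SVR(kernel="rbf", C=100.0, gamma=0.5, epsilon=0.01).fit(X, y)

# f(x) in (13) is evaluated through kernel values K(x_i, x) only,
# so the mapping Phi is never formed explicitly.
print("support vectors:", model.support_vectors_.shape[0])
print("prediction at x=1.0:", model.predict([[1.0]]))
```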

2.2 Teaching-learning-based optimization technique

Teaching-learning-based optimization (TLBO) is a recently established and effective meta-heuristic, population-based optimization algorithm [37]. TLBO uses a population of solutions to find the global solution, in a similar way to other nature-inspired algorithms such as PSO, GA, and ABC. The TLBO algorithm is based on a simulation of a traditional learning process, that is, the transfer of knowledge within a classroom. The algorithm consists of two stages: (i) learners (students) first acquire knowledge from a teacher (teacher phase); and (ii) they then enhance their knowledge by interacting with their peers (student phase). The TLBO population consists of a group of learners. As in other optimization algorithms, there are decision variables: the different decision variables in TLBO are equivalent to the different subjects offered to students, and the students’ results are analogous to the ‘fitness’ value of the optimization problem.

2.2.1 Steps in the TLBO algorithm

The following steps of the TLBO algorithm were described by Rao et al. [37].

Step 1::

Define the optimization problem and create a solution space: In the initial phase, we identify the decision variable(s) in the problem to be optimized and assign them a range (minimum and maximum of the variable) where we will search for the optimal solution. If the solution spaces and ranges are not properly defined, then there is a chance that the optimization will take more time.

Step 2::

Identify the fitness function: In this step, we design or identify the fitness function, which accurately represents how well the optimized solution fits our problem using a single number. The TLBO algorithm uses the fitness function to evaluate its candidate solutions and obtains the optimal solution by minimizing (or maximizing) f(X) over the range of values of the decision variables (X), where f(X) is the fitness function.

Step 3::

Initialize the learners (or students): Each learner (based on the population size) is initialized using random values for each of the decision variables (within the appropriate ranges). The i-th learner is represented by a row vector \(X_{i}\), defined as

$$ X_{i} =[x_{i,1} ,x_{i,2} ,x_{i,3} ,\ldots,x_{i,D} ],i=1,2,\ldots,N, $$
(14)

where D is the number of decision variables, and N is the total number of learners. Each decision variable \(x_{i,j}\) is randomly assigned a value using

$$ x_{i,j} =x_{j}^{\min } +rand()\ast (x_{j}^{\max } -x_{j}^{\min } ),\quad j=1,2,\ldots,D, $$
(15)

where \(x_{j}^{\min}\) and \(x_{j}^{\max}\) are the minimum and maximum values of the j-th decision variable, and rand() is a function that returns a random number between 0 and 1.

Step 4::

Teacher phase

  1. a.

    Compute the mean value of each of the learners’ decision variables and denote the population mean as

\(X_{mean} =[\bar{x}_{1},\bar{x}_{2},\ldots,\bar{x}_{j},\ldots,\bar{x}_{D}]\), where \(\bar{x}_{j} =\frac{\sum\nolimits_{i=1}^{N} x_{i,j}}{N}\).

  2. b.

Compute the fitness value of each learner X based on the fitness function f(X). The learner with the best fitness value (solution) is identified as the teacher (\(X_{teacher}\)) for the teacher phase.

  3. c.

Now the teacher (\(X_{teacher}\)) transfers their knowledge and tries to improve the fitness of the other learners (\(X_{i}\)) by shifting the mean of the learners towards the teacher using

    $$\begin{array}{@{}rcl@{}} X_{new} &=&X_{i} \!+rand()\ast (X_{teacher} \,-\,(TF)\ast X_{mean} ),\\ for~ i&=&1,2,\ldots,N, \end{array} $$
    (16)

    where,

    $$ TF=round[1+rand(0,1)]. $$
    (17)

Here, TF is the teaching factor (either 1 or 2), and rand() is a random number function that returns a number between 0 and 1.

Note that TF is not a parameter of the TLBO algorithm. The value of TF is not provided as an input to TLBO; rather, its value is randomly chosen with equal probability by the algorithm using (17).

  4. d.

If the updated solution (\(X_{new}\)) is better than the existing solution (\(X_{i}\)), then we accept the new solution; otherwise we reject it.

Step 5::

Student phase

In the student phase, the learners (students) enhance their knowledge by interacting with other peer learners in the classroom. This mutual interaction between learners tends to increase the knowledge of the learner; an individual learner learns something new if the other individual has more knowledge.

  1. a.

Randomly select any two solutions \(X_{i}\) and \(X_{j}\) such that i≠j.

  2. b.

Solution \(X_{i}\) interacts with solution \(X_{j}\). If \(f(X_{i})\), that is, the fitness value of \(X_{i}\), is better (superior) than the fitness value of \(X_{j}\), then we update \(X_{i}\) to \(X_{new}\) using

$$ X_{new} =X_{i} +rand()\ast (X_{i} -X_{j} ) $$
(18)

otherwise, we update it to

$$ X_{new} =X_{i} +rand()\ast (X_{j} -X_{i} ) $$
(19)

Step 6::

Iterate until the termination criteria are satisfied

We then repeat Steps 4 and 5 until our termination conditions are satisfied, i.e., the average value of the fitness function over all learners no longer improves, or we reach the maximum number of generations. The \(X_{i}\) that minimizes \(f(X_{i})\) for a minimization problem (or maximizes \(f(X_{i})\) for a maximization problem) is the final solution to the optimization problem. A compact sketch of these steps is given below.
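
The teacher and student phases can be written compactly in code. The following Python sketch is a minimal, illustrative implementation of Steps 1–6 following (14)–(19); the bound clipping, the sphere test function, and all variable names are assumptions, not part of the original TLBO description.

```python
import numpy as np

def tlbo(fitness, lb, ub, n_learners=15, n_iter=30, seed=0):
    """Minimal TLBO minimizer following Steps 1-6 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    D = lb.size
    X = lb + rng.random((n_learners, D)) * (ub - lb)            # (15): random learners
    f = np.array([fitness(x) for x in X])

    for _ in range(n_iter):
        # ---- Teacher phase ----
        mean = X.mean(axis=0)                                    # X_mean
        teacher = X[f.argmin()]                                  # best learner so far
        TF = rng.integers(1, 3)                                  # (17): TF is 1 or 2
        X_new = X + rng.random((n_learners, D)) * (teacher - TF * mean)   # (16)
        X_new = np.clip(X_new, lb, ub)                           # keep within the ranges
        f_new = np.array([fitness(x) for x in X_new])
        improved = f_new < f                                     # greedy acceptance (Step 4d)
        X[improved], f[improved] = X_new[improved], f_new[improved]

        # ---- Student phase ----
        for i in range(n_learners):
            j = rng.choice([k for k in range(n_learners) if k != i])
            step = (X[i] - X[j]) if f[i] < f[j] else (X[j] - X[i])   # (18) / (19)
            x_new = np.clip(X[i] + rng.random(D) * step, lb, ub)
            f_trial = fitness(x_new)
            if f_trial < f[i]:
                X[i], f[i] = x_new, f_trial

    best = f.argmin()
    return X[best], f[best]

# Example: minimize the sphere function in two dimensions.
x_best, f_best = tlbo(lambda x: float(np.sum(x ** 2)), lb=[-5, -5], ub=[5, 5])
print(x_best, f_best)
```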

A three-dimensional graphical illustration of a single learner \(X_{i}\) searching for the optimal solution is presented in Fig. 1. The initial stage represents the status of the decision variables obtained by the learner for each of the parameters in the optimization problem, as in (15). \(X_{mean}\) and \(X_{teacher}\) represent the mean and the current best status among all the learners (population). The updated \(X_{new}\) is the status of the learner after the teacher phase, updated according to (16). \(X_{new}\) after the student phase represents the learner’s status after interacting with its peers, where the status is updated using either (18) or (19). Note that the fitness value (i.e., the distance between each X and the corresponding f(X)) of each learner improves after each phase.

Fig. 1
figure 1

Three-dimensional graphical illustration of the learner (population) searching for optimal solutions, in the teacher and student phases

2.3 The novel hybrid SVM-TLBO regression model

The novel hybrid SVM-TLBO model proposed by Das and Padhy [8] predicts using SVM regression and uses TLBO to determine the SVM parameters. The hybrid model was designed to work in a two-dimensional solution space, that is, to optimize C and σ, where C is the regularization parameter of the SVM regression model and σ is the bandwidth of the radial basis (Gaussian) kernel function. The values of the different parameters used in the novel hybrid SVM-TLBO model are presented in Table 1(a) and (b). The flow chart of the SVM-TLBO hybrid regression model is shown in Fig. 2. The raw financial time-series data are processed to prepare the input set (features), and then the TLBO algorithm selects the optimal free parameters for the SVM regression model. The fitness function for the optimization algorithm is the RMSE of the SVM regression results. In the training phase, we apply the SVM regression model for each set of parameter values (C and σ) obtained by the TLBO algorithm. These multiple executions of the SVM regression model in the training phase increase the computational time, but this is the only overhead involved in the hybrid model. After determining the optimal parameters on the training data set, we apply the trained model to the test (out-of-sample) data to evaluate the performance of the forecasting model.
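
The coupling between TLBO and the SVM in Fig. 2 amounts to defining a fitness function over the pair (C, σ). A hypothetical sketch is given below; it reuses the tlbo() sketch from Section 2.2.1, substitutes scikit-learn's SVR for LIBSVM, and maps σ to scikit-learn's gamma, so it should be read as an outline rather than the authors' implementation. The parameter ranges in the commented call correspond to those reported in Section 3.3.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def make_fitness(X_train, y_train):
    """Fitness = cross-validated RMSE of an SVR trained with a candidate (C, sigma)."""
    def fitness(params):
        C, sigma = params
        gamma = 1.0 / (2.0 * sigma ** 2)        # assumed mapping from bandwidth sigma to gamma
        svr = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=1e-4)
        mse = -cross_val_score(svr, X_train, y_train, cv=5,
                               scoring="neg_mean_squared_error").mean()
        return float(np.sqrt(mse))              # RMSE, minimized by TLBO
    return fitness

# Usage (assuming X_train, y_train and the tlbo() sketch from Section 2.2.1):
# best_params, best_rmse = tlbo(make_fitness(X_train, y_train),
#                               lb=[0.01, 1e-4], ub=[35000, 32],
#                               n_learners=15, n_iter=30)
```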

Fig. 2
figure 2

Flow chart representation of the novel hybrid SVM-TLBO regression model [8]

Table 1 (a) SVM and (b) TLBO parameters used in experiments [8]

3 Proposed ensemble model architecture and methodology

Fig. 3
figure 3

Flowchart for the proposed DR-SVM-TLBO ensemble model

We propose a new ensemble model called DR-SVM-TLBO for predicting financial time series, particularly the values of the energy commodity futures index. The proposed model is presented as a flowchart in Fig. 3. In the first step, the raw time-series data collected from the market (i.e., MCX COMDEX) are input into the model for preprocessing and are used to calculate the technical indicators (see Appendix I). Detailed explanations of the data collection methodology, data, and preprocessing are given in Section 3.1. To determine the optimal number of features (to reduce dimensionality) from the 17 normalized input technical indicators, we used PCA and required that the retained components explain at least 95 % of the cumulative variance in our data set. The flow chart is then divided into two stages: (1) dimensionality reduction (critical feature extraction), and (2) implementation of the SVM-TLBO hybrid model. In the dimensionality reduction stage, we apply feature extraction using PCA, KPCA, or ICA to the normalized data. The number of critical features to be extracted is equal to the optimal number of input features (N) determined by PCA to account for 95 % of the cumulative variance. After the dimension reduction step, we construct an input dataset containing the extracted (reduced) features. In the SVM-TLBO stage, we apply the hybrid model to the reduced features. Here, the training dataset is used to find the optimal values of the free parameters of the SVM regression model and the kernel function. The SVM-TLBO hybrid model used in the second stage is similar to the model developed by Das and Padhy [8] and presented in Fig. 2; the only difference is that we have omitted the first two blocks (i.e., data preprocessing and input preparation), because these steps are already included in the proposed model. A detailed overview of the computation techniques used in our study is presented in Section 3.3. To forecast the value of the commodity futures index for a new data pattern X, we must first apply the dimension reduction technique to extract the optimal feature values; the trained SVM regression model is then used to predict the value for the new data pattern. The out-of-sample (test) data are historical data, so the desired index values are known and we can easily calculate the forecast performance.
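
The two-stage flow in Fig. 3 can be outlined in a few lines of code. The sketch below is a hypothetical illustration (scikit-learn in place of the R and LIBSVM implementations described in Section 3.3): PCA first fixes the number of components needed to reach the 95 % cumulative-variance threshold, the chosen extractor (PCA, KPCA, or ICA) then reduces the normalized indicators, and the SVM regression of the second stage is trained on the reduced features with the parameters returned by TLBO.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.svm import SVR

def reduce_features(X_norm, method="kpca", var_threshold=0.95):
    """Stage 1: choose n from PCA's cumulative variance, then extract n components."""
    pca = PCA().fit(X_norm)
    n = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_threshold) + 1)
    if method == "pca":
        extractor = PCA(n_components=n)
    elif method == "kpca":
        # Kernel parameter assumed for illustration; the paper reports a Gaussian bandwidth of 0.01.
        extractor = KernelPCA(n_components=n, kernel="rbf", gamma=0.01)
    else:
        extractor = FastICA(n_components=n, random_state=0)
    return extractor.fit_transform(X_norm), extractor

# Stage 2 (illustrative): train SVR on the reduced features with the (C, sigma)
# pair returned by the TLBO search sketched earlier.
# Z, extractor = reduce_features(X_norm, method="kpca")
# svr = SVR(kernel="rbf", C=C_opt, gamma=1.0 / (2 * sigma_opt ** 2), epsilon=1e-4).fit(Z, y)
```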

3.1 Experimental data

To examine the effectiveness of the improved forecasting model, we applied it to real COMDEX data collected from the MCX (http://www.mcxindia.com) [8]. We collected daily trading data points from January 1, 2010, to May 7, 2014, and used them as training and testing data. The total number of data samples in this time frame was 1,332. The time-series data consist of the daily opening price, low price, high price, closing price, and traded date. We used 17 technical indicators as the inputs. The raw daily prices were used to calculate the technical indicators as per the details given in Appendix I. The data period includes many important and significant economic events, so we consider this data appropriate for training the proposed models. Table 2 describes the dataset in terms of high, low, mean, and median values, as well as the standard deviation, kurtosis (a measure of the flatness of the distribution), and skewness (the degree of asymmetry of the distribution about its mean). Table 2 shows that the skewness of the dataset is less than zero, i.e., the dataset is left skewed (most values are concentrated to the right of the mean, with the extreme values to the left) and there are a lot of spikes in the dataset, and that the kurtosis is less than three, i.e., the distribution is platykurtic (flatter than a normal distribution, with a wider peak). After processing the 1,332 raw data points, we obtained 1,307 transformed data points with dates from February 1, 2010 to May 7, 2014. The technical indicators were normalized to the range [0, 1] to minimize forecasting errors and to prevent variables with large numeric ranges from overwhelming the other data. The min-max normalization process was applied to the input technical indicators and the output closing prices. The technical indicators and closing prices were normalized using

$$\begin{array}{@{}rcl@{}} \bar{{x}}_{i}^{d} &=&\frac{{x_{i}^{d}} -\min \left( {x_{i}^{d}} \vert_{i=1}^{N} \right)}{\max \left( {x_{i}^{d}} \vert_{i=1}^{N} \right)-\min \left( {x_{i}^{d}} \vert_{i=1}^{N}\right)},\\ d&=&1,2,\ldots,17\;(\text{number of input variables}), \end{array} $$
(20)

where \(\bar{{x}}_{i}^{d}\) is the normalized value, \({x_{i}^{d}}\) is the original value, \(\min ({x_{i}^{d}} \vert_{i=1}^{N})\) is the minimum value in the original input data, \(\max ({x_{i}^{d}} \vert_{i=1}^{N})\) is the maximum value in the original input data, and N is the total number of trading days.
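
A direct transcription of (20) into Python is shown below as a simple illustration; in the study the same min-max scaling is also applied to the output closing prices.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each technical indicator (column) to [0, 1] as in (20)."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Example with three trading days and two indicators.
X = np.array([[10.0, 200.0],
              [15.0, 250.0],
              [20.0, 300.0]])
print(min_max_normalize(X))   # each column is mapped onto [0, 1]
```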

Table 2 Brief description of the MCX COMDEX dataset

The normalized data were segregated into training and test groups, approximately in the ratio of 5:1. The data were divided into training and testing samples based on previous work [7, 8, 21, 31, 45]. The ratio of training to test data used by Chen et al. [7] was 9:1, Das and Padhy [8] used approximately 5:1, Kim [21] and Lu [31] used 4:1, and Wang and Wang [45] used approximately 6:1. In our case, 1085 data points were used for training with 5-fold cross-validation, and the remaining 222 were used to test the model. We considered three different forecasts of the closing prices: (1) 1 day ahead; (2) 3 days ahead; and (3) 5 days ahead.
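
The data preparation just described can be sketched as follows. The snippet is illustrative only; the variable names and the exact alignment of indicators with future closing prices are assumptions.

```python
import numpy as np

def make_supervised(features, close, horizon=1):
    """Pair each day's indicator vector with the closing value `horizon` days ahead."""
    X = features[:-horizon]
    y = close[horizon:]
    return X, y

def chronological_split(X, y, n_train=1085):
    """Keep time order: the first n_train samples train the model, the rest test it."""
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Usage (assuming `features` is the 1307 x 17 normalized indicator matrix and
# `close` the normalized closing prices):
# X1, y1 = make_supervised(features, close, horizon=1)      # 1-day-ahead targets
# X_tr, y_tr, X_te, y_te = chronological_split(X1, y1)      # roughly 5:1 split
```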

3.2 Performance measures and statistical test

The performance of the proposed model was evaluated using standard parametric statistical metrics: RMSE, MAE, NMSE, and DS [4, 7, 39]. The descriptions and definitions of these performance criteria are given in Table 3. DS (in %) measures the accuracy of the predicted direction; larger DS values indicate a better forecast. These parametric statistical tests require a distributional assumption (i.e., that the data are normally distributed) and are not robust to outliers, so they may occasionally produce ambiguous results. Therefore, we also used a nonparametric technique to evaluate the significance of any differences in the test (out-of-sample) performance of the proposed model compared with the benchmark models. We applied the DM test [9], a nonparametric statistical test extensively used for forecasting model validation, especially in economics and finance. In the DM test, the null hypothesis states that the two forecasting methods have the same forecasting accuracy, while the alternative hypothesis is that the two forecasting methods have different levels of accuracy. The null hypothesis of equal forecasting accuracy is rejected at the 5 % significance level if the computed absolute value of the DM statistic is greater than 1.96 (i.e., |DM value| > 1.96). In this study, we used the squared-error criterion as the loss function in the DM test.
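
For reference, the sketch below gives common textbook forms of the four metrics and a basic one-step DM statistic with a squared-error loss. Because Table 3's exact definitions are not reproduced here, and the DM test for multi-step forecasts normally uses an autocorrelation-corrected variance, these functions should be treated as illustrative assumptions rather than the exact formulas used in the study.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def nmse(y, yhat):
    # Squared error normalized by the variance of the actual series.
    return float(np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))

def directional_symmetry(y, yhat):
    """Share of days (in %) on which the forecast moves in the same direction
    as the actual series (one common definition of DS)."""
    actual_move = np.diff(y)
    predicted_move = yhat[1:] - y[:-1]
    return 100.0 * np.mean(actual_move * predicted_move > 0)

def diebold_mariano(e1, e2):
    """Basic one-step DM statistic for squared-error loss;
    |DM| > 1.96 rejects equal forecasting accuracy at the 5 % level."""
    d = e1 ** 2 - e2 ** 2
    n = d.size
    return float(np.mean(d) / np.sqrt(np.var(d, ddof=1) / n))
```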

Table 3 Performance evaluation metrics and their definitions

3.3 Computation techniques

We implemented Vapnik’s SVM regression technique using LIBSVM, which is an SVM tool box [5]. We used the Gaussian (radial basis) kernel function, because it performs well under general smoothness assumptions. All the experiments were executed on an Intel Core i7 CPU @ 2.10 GHz with 6 GB of primary memory. We wrote our own code to implement TLBO for the SVM-TLBO hybrid regression model. The TLBO algorithm was defined in two dimensions, to optimize the bandwidth σ of the Gaussian kernel and the regularization parameter C of the SVM. In our experimental runs of the TLBO algorithm, there were no significant changes to σ and C after 25–30 iterations when using a population size (learners/students) of 15. Pawar and Rao [34] and Rao and Patel [36] observed that TLBO requires only a small population and few iterations (generations). With this in mind, we fixed the maximum number of iterations for the TLBO to 30, with a population size of 15. According to Tay and Cao [39], SVRs are insensitive to ε (the extent to which deviations are tolerated) provided it is a reasonable value. Cao and Tay [4] observed that the number of support vectors decreases as ε increases. Thus, we chose ε = 0.0001. The range of the SVM parameter C was set to 0.01–35,000, and the range of σ (the bandwidth parameter of the Gaussian kernel) was set to 0.0001–32 [28].

Fig. 4
figure 4

Cumulative variance for the PCA technique

For the proposed ensemble DR-SVM-TLBO model, we designed our own code for dimensionality reduction using the PCA, KPCA, and ICA techniques, which we implemented in R (https://www.r-project.org). The ensemble model was implemented according to the flowchart in Fig. 3. As previously discussed, we determined the optimal number of features from the original 17 technical indicators (features) using PCA. We selected the number of dimensions based on the PCA results that accounted for at least 95 % of the cumulative variance in the dataset. The cumulative variance of the PCA result is presented in Fig. 4. The optimal number of features is six, because the cumulative variance of the first six principal components (PC1 to PC6) was 95.41 %. We therefore set the number of features (components) for all the dimension reduction techniques to six. In the implementation of the KPCA technique, we used a Gaussian (radial basis) kernel with a bandwidth parameter of 0.01, and for ICA we used the FastICA algorithm [15].

Table 4 Model performance with respect to the RMSE and the optimal parameters for standard SVM, SVM-TLBO novel hybrid, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO

The dimensionality reduction in the proposed ensemble model uses the PCA, KPCA, and ICA methods, so there are three variants of the proposed model: (1) DR PCA-SVM-TLBO, the proposed model with PCA for dimensionality reduction; (2) DR KPCA-SVM-TLBO, the proposed model with KPCA for dimensionality reduction; and (3) DR ICA-SVM-TLBO, the proposed model with ICA for dimensionality reduction. We ran the new ensemble DR-SVM-TLBO algorithm as per the flow chart in Fig. 3. The simulation results are shown in Table 4. We compared the results of all three variants of the new ensemble model (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO) with the standard SVM regression model without dimension reduction and the novel hybrid SVM-TLBO model [8]. We used a sequential-optimization-based algorithm to train the SVM regression, because it is fast and efficient for large data sets.

4 Experimental results and discussion

In this section, we present our experimental results regarding the efficiency of the proposed new ensemble model. The RMSE results, the average computational time in milliseconds for all five models in the training (in-sample) and testing (out-of-sample) phases, and the optimal values of C and σ are presented in Table 4. The testing phase (out-of-sample) RMSE, MAE, and NMSE values presented in Table 5 show that the new ensemble DR-SVM-TLBO model (all three variants) outperformed the standard SVM regression and novel hybrid SVM-TLBO models in all three forecasting cases. This is because the parameters (C and σ) of the standard SVM were selected using a traditional grid search method, whereas the optimal values of C and σ for SVM-TLBO, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO were obtained using the TLBO algorithm, starting at random values within the defined solution space. In addition to the selection of the optimal SVM and kernel parameters, the dimension reduction techniques (i.e., PCA, KPCA, and ICA) extracted the input features from the original input set (17 technical indicators). The extracted input features contain less noise and more refined information. This changed the optimal values of C and σ derived by the optimization process and produced superior forecasting models. Table 4 clearly shows that the dimension reduction techniques used in our models reduced the average computational time and increased the accuracy compared with the benchmark models. With respect to the DS performance metric, the standard SVM performed better than the rest of the models in the 1-day-ahead forecasts, DR ICA-SVM-TLBO performed best in the 3-days-ahead forecasts, and SVM-TLBO performed best in the 5-days-ahead forecasts. Financial market practitioners evaluate forecasting models using both the minimum forecast error and the directional accuracy [26]; the aim is to achieve a directional accuracy (DS value) of over 50 % [7]. In our study, the DS values of the benchmark and the proposed new ensemble forecasting models were greater than 50 % in all the forecasting cases. Table 5 clearly shows that the proposed ensemble model with KPCA (i.e., DR KPCA-SVM-TLBO) outperformed the other models considered in this study. This is because the nonlinear kernel-based PCA (i.e., KPCA) can capture more discriminatory information, improving the accuracy of the forecasting model. The numbers in bold correspond to the best performance.

Table 5 Comparison of the out-of-sample results with respect to the RMSE, MAE, NMSE, and DS of the standard SVM, SVM-TLBO novel hybrid, DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO ensemble models

Table 6 summarizes the DM statistics, with the p-values for the DM test given in parentheses. We compared the proposed ensemble DR KPCA-SVM-TLBO forecasting model with the two benchmark models (i.e., standard SVM and novel hybrid SVM-TLBO) and the two other variants of the proposed model (i.e., DR PCA-SVM-TLBO and DR ICA-SVM-TLBO) for the 1-, 3-, and 5-days-ahead forecast cases. The results in Table 6 show that the p-values were smaller than the chosen significance level (i.e., 5 %) and the DM test values were greater than 1.96, except for the comparisons with the SVM-TLBO and DR ICA-SVM-TLBO models in the 3-days-ahead forecast. The absolute value of the DM statistic for DR KPCA-SVM-TLBO compared with SVM-TLBO was 0.4223 (p-value: 0.6733), and for DR KPCA-SVM-TLBO compared with DR ICA-SVM-TLBO it was 0.286 (p-value: 0.7752). These values are less than 1.96, so we cannot reject the null hypothesis at the 5 % significance level; that is, the difference between the forecasting performance of these models is not significant and might be due to stochastic variation. From these observations, we can conclude that the proposed DR-SVM-TLBO model (all three variants) yields more accurate predictions than the benchmark models, and that among the proposed ensemble models, DR KPCA-SVM-TLBO performed the best. Table 7 gives the percentage improvements of the proposed ensemble model, for all three variants (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO), over the benchmark novel hybrid SVM-TLBO model for the out-of-sample (test) data with respect to the RMSE and MAE. Figure 5a, b, and c show box plots of the MAE for the 1-, 3-, and 5-days-ahead forecasts, respectively, for the standard SVM, the novel hybrid SVM-TLBO, and the proposed models (i.e., DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO). The middle square in each box plot represents the MAE. The box plots clearly show that DR KPCA-SVM-TLBO has the smallest range and the smallest standard error (denoted by the lines above and below the box), which indicates that the DR KPCA-SVM-TLBO model outperformed all the other models.

Fig. 5
figure 5

Box plot of MAE values using standard SVM, SVM-TLBO novel hybrid, and our proposed new ensemble models (i.e. DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO): (a) 1-day-ahead forecast, (b) 3-days-ahead forecast, and (c) 5-days-ahead forecast

Table 6 Diebold-Mariano statistic: DM test values and p-values (in parentheses) for the MSE loss function
Table 7 Percentage (%) improvement of the proposed new ensemble DR-SVM-TLBO (all three variants) model over the benchmark novel hybrid SVM-TLBO [8] model for out-of-sample data, with respect to RMSE and MAE

5 Conclusions and future work

In this study, we extended the novel hybrid SVM-TLBO model by incorporating dimension reduction techniques. To reduce the number of input variables (features), we used three well-known dimensionality reduction techniques: PCA, KPCA, and ICA. We used multicommodity futures index data collected from the MCX to examine the feasibility of the proposed ensemble model. Our models performed better than the existing methods. Our conclusions are summarized as follows.

  1. 1.

    The average computational time results (Table 4) suggest that reducing the number of input variables (features) decreased the computational time.

  2. 2.

    Our empirical results show that DR-SVM-TLBO (i.e. DR PCA-SVM-TLBO, DR KPCA-SVM-TLBO, and DR ICA-SVM-TLBO) produced better predictions than the standard SVM regression method and the SVM-TLBO hybrid model. Among the three variants of the proposed ensemble model, DR KPCA-SVM-TLBO performed the best.

  3. 3.

    DR KPCA-SVM-TLBO improved the RMSE by 50.23 % (for the 1-day-ahead forecast), 28.43 % (for the 3-days-ahead forecast), and 20.03 % (for the 5-days-ahead forecast), when compared with the SVM-TLBO hybrid regression model. The DR KPCA-SVM-TLBO model also improved the MAE result by 55.45 % (1-day-ahead), 30.63 % (3-days-ahead), and 17.87 % (5-days-ahead), when compared with the SVM-TLBO novel hybrid regression model. There were similar improvements in terms of MAE and RMSE for the other two variants of the proposed model (i.e. DR PCA-SVM-TLBO and DR ICA-SVM-TLBO).

  4. 4.

The results of the DM statistical test (Table 6) show that the DM tests comparing the proposed model (DR KPCA-SVM-TLBO) with the other models yielded values greater than 1.96 (the threshold value at the 5 % significance level), with the corresponding p-values below the 5 % significance level, in all cases except the comparisons of DR KPCA-SVM-TLBO with SVM-TLBO and with DR ICA-SVM-TLBO for the 3-days-ahead forecast. Overall, the DM test confirms that the predictive accuracy of our proposed model is statistically significantly better than that of the benchmark models.

In this study, we selected quantitative technical indicators (features) based on previous research in this area and feedback from a domain expert. We could improve the predictive performance by including non-quantitative factors such as data from breaking news and social media, relevant macroeconomic factors, and psychological factors. One limitation of this study is that we used a relatively small dataset; despite this, we achieved reasonably good forecasts. The proposed hybrid model should provide better forecasting results when applied to larger volumes of data. The successful application of our proposed model to non-linear and highly complex financial time-series data suggests that it may be useful in other domains.