1 Introduction

In the business environment, we wish to forecast various kinds of financial variables accurately and efficiently, in order to develop successful strategies and avoid large losses [43]. Researchers have studied financial time series forecasting since the 1980s, with the objective of beating the financial market. A huge number of factors (economic, political, environmental, and psychological) make financial forecasting an interesting and challenging field. Further, financial time series are inherently noisy, nonstationary, and deterministically chaotic [2, 38].

Most conventional prediction techniques rely on statistical methods such as time series and multivariate analyses. However, researchers have started to apply artificial intelligence (AI) methods to financial markets, because of recent successful developments such as artificial neural networks (ANNs), support vector machines (SVMs), particle swarm optimization (PSO), genetic algorithms (GAs), and fuzzy technologies. Refenes et al. [34], Tsibouris and Zeidenberg [40], and Steiner and Wittkemper [37] used ANN models to predict stock prices in the UK, US, and German markets, respectively. Wittkemper and Steiner [42] and Shazly et al. [36] used ANNs with GAs (hybrid models) to predict stock prices and currency exchange rates in Germany, the United Kingdom, Japan, and Switzerland.

Vapnik [41] introduced SVM methods to overcome the problems of ANNs (such as getting trapped in local minima, overfitting to training data, and long training times). Since then, several authors have proposed pricing financial instruments using SVMs. For example, Tay and Cao [38] and Cao and Tay [2] developed pricing models for five specific financial futures in the US market using SVMs, and Gestel et al. [7] used a least squares SVM (LS-SVM) for Treasury bill (T-bill) rate and stock index pricing in the US and German markets. Their simulation results showed that SVMs outperform ANNs. Moreover, SVMs have also been shown to perform better than ANNs and other statistical methods in other domains [23, 24].

Nicholas and Ravi [25] published an exhaustive survey on SVMs for time series prediction. They surveyed papers in the areas of financial market prediction, electric utility load forecasting, environmental state and weather prediction, and reliability forecasting. In their survey, they noted that the choice of the SVM's free parameters is significant. The experimental results of Kim [14] showed that SVM predictions are sensitive to these free parameters, and that it is important to select optimal values; improperly selected free parameters can cause over- or under-fitting problems [14]. Because financial time series data are nonlinear, we use nonlinear kernel functions such as the Gaussian and polynomial functions, which require appropriate, user-chosen parameter(s). Current approaches for choosing these free parameters are typically based on domain knowledge, trial and error, and ergodic search methods [4]. Several studies proposed selecting the optimal free parameters for SVMs/ANNs using PSO, GAs, artificial bee colonies (ABCs), ant colony optimization (ACO), differential evolution (DE), simulated annealing (SA), and so on [3, 11, 12, 21, 22, 44]. However, the optimization model itself introduces additional user-specified controlling parameter(s), making the user's task even more complex. For example, GAs need optimal controlling parameter values for crossover and mutation probabilities; PSO needs specified optimal controlling parameters such as inertia weight and social and cognitive parameters; SA needs a cooling temperature and cooling constants; DE requires a differentiation factor and a crossover probability; and ABC requires optimal controlling parameters for the number of bees (employed, scout, and onlooker), limits, and so on. Variations in the controlling parameters alter the effectiveness of the optimization algorithm.

Rao et al. [30] proposed the teaching–learning-based optimization (TLBO) algorithm, an optimization technique originally developed for mechanical design problems that does not require user-defined parameters. They tested the technique using different benchmark functions. Their results show that TLBO can outperform many optimization algorithms, such as particle evolutionary swarm optimization, ABC, and cultural DE. Similarly, Rao et al. [31] compared TLBO with well-known optimization techniques such as GA, ABC, PSO, HS, DE, and hybrid-PSO, by applying the methods to different benchmark problems [such as Griewank (\(D = 10\)), Hyper Sphere (\(D = 6\)), Rosenbrock (\(D = 1\), \(D = 3\)), Rastrigin, and Ackley]. They considered the effectiveness of TLBO in terms of different performance criteria (such as the average number of function evaluations, success rate, convergence rate, and mean solution). These results also showed that the TLBO method performed better than other nature-inspired optimization techniques for the considered benchmark functions. The TLBO technique developed by Rao et al. [30] has performed well in many studies [26–33, 35, 45].

In this study, we propose an SVM–TLBO hybrid regression prediction model for forecasting the multicommodity futures index (COMDEX) traded on the Multi Commodity Exchange of India Limited (MCX). We use the TLBO algorithm to select the free parameter(s) of the SVM and of the kernel function, and we compare the standard SVM method with the SVM–TLBO hybrid technique. The commodity futures index under consideration is a significant indicator of the performance of the Indian commodities market. MCX COMDEX is composed of futures contracts on 15 physical commodities with three subindices, representing the key commodity sectors within the index: metals, energy, and agriculture. Investors can use MCX COMDEX futures to efficiently hedge commodity and inflation exposure and lay off residual risk [1]. We developed the SVM–TLBO hybrid regression model because the most important consideration when using the standard SVM model is to properly select the free parameters [\(C\) (regularization) and \(\varepsilon\) (insensitive loss function radius)] and the kernel parameter(s) for training the data.

TLBO does not require any user-defined controlling parameter(s), which means that it can effectively determine the free parameter(s) of the SVM without any user input. Our experimental results show that the proposed hybrid SVM–TLBO regression model produces better forecasts than the PSO + SVM hybrid and standard SVM models. The remainder of this paper is structured as follows. In Sect. 2, we provide a summary of SVM regression and the SVM–TLBO hybrid regression model for selecting the optimal free parameters. Section 3 contains the proposed method for predicting the commodity futures index, followed by our results, comparisons, and analysis in Sect. 4. Section 5 concludes the study and outlines some future work.

2 SVM for regression and the SVM–TLBO hybrid regression model

2.1 SVM for regression

Vapnik and his coworkers developed the SVM technique for regression, which can be presented as follows.

Given a training data set \(\{ (x_{1}, y_{1}), \ldots, (x_{l}, y_{l})\}\) (where each \(x_{i} \in X \subset R^{n}\), and \(X\) denotes the input sample space), and matching target values \(y_{i} \in R\) for \(i = 1, \ldots, l\) (where \(l\) corresponds to the size of the training data), the objective of the regression problem is to find a function \(f: R^{n} \to R\) that can approximate the value of \(y\) when \(x\) is not in the training set.

The estimating function \(f\) is defined as

$$f(x) = w^{T} \varPhi (x) + b,$$
(1)

where \(w \in R^{m} ,\,b \in R\) is the bias, and \(\varPhi\) denotes a nonlinear function from \(R^{n}\) to high-dimensional space \(R^{m}\) (\(m\,\, > \,\,n\)). The aim is to find \(w\) and \(b\) such that the value of \(f(x)\) can be determined by minimizing the risk

$$R_{\text{reg}} (f) = C\sum\limits_{i = 1}^{l} {L_{\varepsilon} (y_{i}, f(x_{i}))} + \frac{1}{2}\left\| w \right\|^{2}.$$
(2)

Here, \(L_{\varepsilon}\) is the \(\varepsilon\)-insensitive loss function originally proposed by Vapnik [41], which is defined as

$$L_{\varepsilon}(y, z) = \left\{ {\begin{array}{*{20}c} {|y - z| - \varepsilon,} & {|y - z| \ge \varepsilon} \\ {0,} & {\text{otherwise}} \\ \end{array}} \right.$$
(3)

By introducing the slack variables \(\zeta_{i}\) and \(\zeta_{i}^{*}\), the problem in Eq. (2) can be reformulated as follows.

(P) Minimize \(C\left[ {\sum\limits_{i = 1}^{l} {(\zeta_{i} + \zeta_{i}^{*})}} \right] + \frac{1}{2}\left\| w \right\|^{2}\) subject to

$$\begin{aligned} y_{i} - w^{T} \varPhi (x_{i}) - b & \le \varepsilon + \zeta_{i}, \\ w^{T} \varPhi (x_{i}) + b - y_{i} & \le \varepsilon + \zeta_{i}^{*}, \\ \zeta_{i} & \ge 0, \\ \zeta_{i}^{*} & \ge 0, \\ \end{aligned}$$
(4)

where \(i = 1, \ldots, l\), and \(C\) is a user-specified constant known as the regularization parameter.

We can solve (P) using the primal–dual method to get the following dual problem.

Determine the Lagrange multipliers \(\{\alpha_{i}\}_{i = 1}^{l}\) and \(\{\alpha_{i}^{*}\}_{i = 1}^{l}\) that maximize the objective function

$$Q(\alpha_{i}, \alpha_{i}^{*}) = \sum\limits_{i = 1}^{l} {y_{i} (\alpha_{i} - \alpha_{i}^{*})} - \varepsilon \sum\limits_{i = 1}^{l} {(\alpha_{i} + \alpha_{i}^{*})} - \frac{1}{2}\sum\limits_{i = 1}^{l} {\sum\limits_{j = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*})(\alpha_{j} - \alpha_{j}^{*}) K(x_{i}, x_{j})}},$$
(5)

subject to

$$\sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*} ) = 0,}$$
(6)

and

$$0 \le \alpha_{i} \le C,\,\,\,0 \le \alpha_{i}^{*} \le C.$$
(7)

Here, \(i\,\, = 1,\, \ldots ,\,l\), and \(K:\,X\,\, \times X\, \to \,R\) is the Mercer kernel defined by

$$K(x,z) = \varPhi (x)^{T} \,\varPhi (z).$$
(8)

The solution of the primal dual method yields

$$w = \sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*}) \varPhi (x_{i})},$$
(9)

and \(b\) is calculated using the Karush–Kuhn–Tucker (KKT) conditions. That is,

$$\begin{aligned} \alpha_{i} (\varepsilon + \zeta_{i} - y_{i} + w^{T} \varPhi (x_{i} ) + b) = 0, \hfill \\ \alpha_{i}^{*} (\varepsilon + \zeta_{i}^{*} + y_{i} - w^{T} \varPhi (x_{i} ) - b) = 0, \hfill \\ \end{aligned}$$
(10)
$$(C - \alpha_{i}) \zeta_{i} = 0 \quad {\text{and}} \quad (C - \alpha_{i}^{*}) \zeta_{i}^{*} = 0, \quad {\text{where}}\; i = 1, \ldots, l.$$
(11)

Since \(\alpha_{i} \cdot \alpha_{i}^{*} = 0\), both \(\alpha_{i}\) and \(\alpha_{i}^{*}\) cannot be simultaneously nonzero; there exists some \(i\) for which either \(\alpha_{i} \in (0, C)\) or \(\alpha_{i}^{*} \in (0, C)\), and hence \(b\) can be computed using

$$\begin{aligned} b & = y_{i} - \sum\limits_{j = 1}^{l} {(\alpha_{j} - \alpha_{j}^{*}) K(x_{j}, x_{i})} - \varepsilon \quad {\text{for}}\;\, 0 < \alpha_{i} < C, \\ b & = y_{i} - \sum\limits_{j = 1}^{l} {(\alpha_{j} - \alpha_{j}^{*}) K(x_{j}, x_{i})} + \varepsilon \quad {\text{for}}\;\, 0 < \alpha_{i}^{*} < C. \\ \end{aligned}$$
(12)

The \(x_{i}\) corresponding to \(0 < \alpha_{i} \, < C\) and \(0 < \alpha_{i}^{*} < C\) are called support vectors. Using the expressions for \(w\) and \(b\) in Eqs. (9) and (12), \(f(x)\) can be computed using

$$\begin{aligned} f(x) & = \sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*})(\varPhi (x_{i})^{T} \varPhi (x))} + b \\ & = \sum\limits_{i = 1}^{l} {(\alpha_{i} - \alpha_{i}^{*}) K(x_{i}, x)} + b. \\ \end{aligned}$$
(13)

Note that we do not need the mapping \(\varPhi\) explicitly to compute \(f(x)\), which is an advantage of using the kernel.
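To make this concrete, the following minimal Python sketch evaluates the decision function of Eq. (13) for a new input, given a dual solution. It assumes the multipliers \(\alpha_{i}\), \(\alpha_{i}^{*}\), and the bias \(b\) have already been obtained from a QP solver such as LIBSVM; all names are illustrative.

```python
import numpy as np

def gaussian_kernel(x, z, sigma2):
    """Mercer kernel K(x, z) = exp(-||x - z||^2 / sigma^2), cf. Eq. (8)."""
    return np.exp(-np.sum((x - z) ** 2) / sigma2)

def svr_predict(x, X_train, alpha, alpha_star, b, sigma2):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, cf. Eq. (13).

    Only the support vectors (where alpha_i - alpha_i* is nonzero) actually
    contribute to the sum, so Phi is never needed explicitly.
    """
    coef = alpha - alpha_star
    k = np.array([gaussian_kernel(x_i, x, sigma2) for x_i in X_train])
    return float(coef @ k + b)
```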

Advantages of SVM

SVMs have become a well-established tool within machine learning. Conceptually, they have many advantages, which include the following.

  a. The technique is methodical and derived from statistical learning theory.

  b. The SVM training process requires convex function optimization, so there is a unique optimal solution (a global minimum).

  c. The model depends explicitly on a subset of the data points (the support vectors), which improves model design.

  d. The relatively easy training process is a major strength of SVM.

  e. There are no local optima, unlike ANNs.

  f. The method scales moderately well to high-dimensional data, and the tradeoff between model complexity and error can be explicitly controlled using appropriate optimal parameters.

Disadvantage of SVM

SVMs have the following disadvantage.

  • The training time is roughly between a quadratic and cubic function of the number of samples in the training set.

2.2 Teaching–learning-based optimization technique

TLBO is a recently developed, effective, meta-heuristic, population-based optimization algorithm [30], similar in spirit to PSO, GAs, and ABC. TLBO is modeled on the transfer of knowledge within a classroom, where learners (students) first acquire knowledge from a teacher (teacher phase) and then from their peers (student phase). The population in TLBO consists of a group of learners. As in other optimization algorithms, there are decision variables: the different decision variables in TLBO are equivalent to the different subjects offered to students, and the students' grades are equivalent to the "fitness" in other population-based optimization methods. A flow chart for the TLBO algorithm is presented in Fig. 1.

Fig. 1
figure 1

Flow chart for the TLBO algorithm

Salient features of TLBO

TLBOs have the following features.

  • Similar to other population-based methods (e.g., GAs, PSO, and ABC), TLBO uses a population of solutions to proceed toward the optimal solution.

  • We do not need to tune any additional algorithm-specific controlling parameter.

  • It uses the best solution of the current iteration to modify the existing solution in the population, which increases the convergence rate.

  • The mean value of the population is used to update the solution.

  • A good solution is accepted using a greediness approach.

  • The population is not divided, unlike methods such as the ABC algorithm.

2.2.1 Steps involved in the TLBO algorithm

The following steps of the TLBO algorithm were described by Rao et al. [30].

Step 1: Define the optimization problem and create a solution space

In the initial phase, we identify the decision variable(s) in the problem to be optimized and assign each a range (the minimum and maximum of the variable) within which we will search for the optimal solution. If the solution space and ranges are not properly defined, the optimization may take more time.

Step 2: Identify the fitness function

In this step, we design or identify the fitness function, which accurately represents, as a single number, how well a candidate solution fits our problem. The TLBO algorithm uses the fitness function to evaluate its candidate solutions and obtains the optimal solution by minimizing \(f(X)\).

Step 3: Initializing learners (or students)

Each learner (based on the population size) is initialized using random values for each of the variables (within the appropriate ranges).

The ith learner is represented by row vector \(X_{i}\), defined as

$$X_{i} = \,\left[ {x_{i,1} ,x_{i,2} ,x_{i,3} , \ldots ,x_{i,D} } \right],\,\,\,i\, = \,1,2, \ldots ,N ,$$
(14)

where \(D\) is the number of decision variables, and N is the number of learners. Each decision variable \(x_{i,j}\) is randomly assigned a value using

$$x_{i,j} = x_{j}^{\hbox{min}} + rand()*\left( {x_{j}^{\hbox{max}} - x_{j}^{\hbox{min}}} \right), \quad j = 1,2, \ldots, D,$$
(15)

where \(x_{j}^{\hbox{min}}\) and \(x_{j}^{\hbox{max}}\) are the minimum and maximum values of the jth decision variable, and \(rand()\) is the random number function that returns a number between 0 and 1.
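As an illustration, a minimal Python sketch of this initialization step (Eqs. (14)–(15)) is given below; the function name and the use of NumPy are our own choices, not part of [30].

```python
import numpy as np

def initialize_learners(n_learners, x_min, x_max, rng=None):
    """Step 3: create N learners uniformly at random within the search ranges.

    x_min, x_max: length-D arrays with the per-variable minimum and maximum.
    Returns an (N, D) matrix whose ith row is learner X_i, cf. Eqs. (14)-(15).
    """
    rng = rng or np.random.default_rng()
    x_min, x_max = np.asarray(x_min, float), np.asarray(x_max, float)
    return x_min + rng.random((n_learners, x_min.size)) * (x_max - x_min)
```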

Step 4: Teacher phase

  (a) Compute the mean value of each of the learners' decision variables, and denote the population mean as

$$X_{mean} = \left[ {\bar{x}_{1} ,\bar{x}_{2} , \ldots ,\bar{x}_{j} , \ldots ,\bar{x}_{D} } \right]\,,\,\,{\text{where}}\,\,\,\bar{x}_{j} \, = \,\frac{{\sum\nolimits_{i = 1}^{N} {x_{i,j} } }}{N}.$$
  (b) Compute the fitness value of each learner \(X\) using the fitness function \(f(X)\). The learner with the best fitness value (solution) is identified as the teacher (\(X_{teacher}\)) for the teacher phase.

  (c) The teacher (\(X_{teacher}\)) then transfers their knowledge and tries to improve the fitness of the other learners (\(X_{i}\)) using

$$X_{new} = \,X_{i} + \,rand()\,\,*\,(X_{teacher} \,\, - \,(TF)\,\,*\,X_{mean} )\,\,,\,\,\,\,{\text{for}}\quad\,i\, = \,1,2, \ldots ,N,$$
(16)

where

$$TF = \,round\left[ {1\, + \,rand\,(0,1)} \right]\,.$$
(17)

Here, \(TF\) is the teaching factor (either 1 or 2), and \(rand()\,\) is the random number function that returns a number between 0 and 1.

Note that \(TF\) is not a parameter of the TLBO algorithm. The value of \(TF\) is not provided as input to the TLBO, but its value is randomly chosen by the algorithm using Eq. (17).

  (d) If the updated solution (\(X_{new}\)) is better than the existing solution (\(X_{i}\)), then we accept the new solution; otherwise, we reject it. A sketch of this phase is given below.
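The following minimal Python sketch implements the teacher phase (Eqs. (16)–(17)) with greedy acceptance, continuing the initialization sketch above. Here fitness holds the current \(f(X_{i})\) values and lower is better (minimization); drawing one random number per variable, rather than a single scalar, is a common implementation variant, and bound handling is omitted for brevity.

```python
import numpy as np

def teacher_phase(pop, fitness, f, rng):
    """One teacher-phase pass with greedy acceptance, cf. Eqs. (16)-(17)."""
    x_mean = pop.mean(axis=0)               # population mean of each variable
    x_teacher = pop[np.argmin(fitness)]     # best learner acts as the teacher
    for i in range(pop.shape[0]):
        tf = rng.integers(1, 3)             # teaching factor TF: 1 or 2, Eq. (17)
        x_new = pop[i] + rng.random(pop.shape[1]) * (x_teacher - tf * x_mean)
        f_new = f(x_new)
        if f_new < fitness[i]:              # accept only if the learner improves
            pop[i], fitness[i] = x_new, f_new
    return pop, fitness
```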

Step 5: Student phase

In the student phase, the learners (students) enhance their knowledge by communicating with other learners in the classroom. An individual learner therefore gains knowledge when the other individual has more knowledge.

  (a) Randomly select two solutions \(X_{i}\) and \(X_{j}\) such that \(i \ne j\).

  (b) If the fitness value \(f(X_{i})\) of \(X_{i}\) is better than \(f(X_{j})\), then we update \(X_{i}\) to \(X_{new}\) using

$$X_{new} = \,X_{i} + \,rand()\,\,*\,(X_{i} \,\, - \,X_{j} )\,$$
(18)

otherwise, we update it to

$$X_{new} = \,X_{i} + \,rand()\,\,*\,(X_{j} \,\, - \,X_{i} )\, .$$
(19)

Here, \(rand()\) is the random number function that returns a number between 0 and 1.
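The student phase (Eqs. (18)–(19)) admits a similarly small sketch, again with greedy acceptance and illustrative names:

```python
import numpy as np

def student_phase(pop, fitness, f, rng):
    """One student-phase pass with greedy acceptance, cf. Eqs. (18)-(19)."""
    n = pop.shape[0]
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # partner with j != i
        if fitness[i] < fitness[j]:
            step = pop[i] - pop[j]   # Eq. (18): X_i is better, move away from X_j
        else:
            step = pop[j] - pop[i]   # Eq. (19): X_j is better, move toward it
        x_new = pop[i] + rng.random(pop.shape[1]) * step
        f_new = f(x_new)
        if f_new < fitness[i]:
            pop[i], fitness[i] = x_new, f_new
    return pop, fitness
```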

Step 6: Iterate until the termination criteria are satisfied

We then repeat Steps 4 and 5 until our termination conditions are satisfied, i.e., the average value of the fitness function over all learners no longer improves, or we reach the maximum number of generations. The \(X_{i}\) that minimizes \(f(X_{i})\) is the final solution of the optimization problem.
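Putting Steps 3–6 together, and reusing the helper functions sketched above, a compact driver loop might look as follows; for simplicity it stops after a fixed number of generations rather than also monitoring the average fitness.

```python
import numpy as np

def tlbo(f, x_min, x_max, n_learners=15, max_iter=30, seed=None):
    """Minimize f over the box [x_min, x_max] using TLBO (Steps 3-6)."""
    rng = np.random.default_rng(seed)
    pop = initialize_learners(n_learners, x_min, x_max, rng)
    fitness = np.array([f(x) for x in pop])
    for _ in range(max_iter):             # Step 6: iterate teacher + student
        pop, fitness = teacher_phase(pop, fitness, f, rng)
        pop, fitness = student_phase(pop, fitness, f, rng)
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]       # best learner and its fitness value
```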

2.3 SVM–TLBO hybrid regression model

We propose a hybrid SVM–TLBO regression model, which uses SVM for predictions and TLBO for determining the SVM parameters. SVM can use many kernels, for example, linear, polynomial, sigmoid, wavelet, and Gaussian kernels. We considered the Gaussian (radial basis) kernel function, which produces better financial time series forecasts [2, 38] because the data are complex and nonlinear. An SVM with a Gaussian kernel has three parameters that must be optimized: \(C\) (regularization), \(\sigma\) (kernel width), and \(\varepsilon\) (insensitive loss function radius).

We designed the proposed SVM–TLBO hybrid regression model to work in a two-dimensional solution space, that is, to optimize \(C\) and \(\sigma\). We keep the \(\varepsilon\) parameter constant at a reasonable value (0.0001), because the number of support vectors decreases as \(\varepsilon\) increases beyond 0.01 [2]. A flow chart of the SVM–TLBO hybrid regression model is presented in Fig. 2.

Fig. 2
figure 2

Flow chart representation of the SVM–TLBO hybrid regression model
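To illustrate how the pieces fit together, the sketch below defines a plausible fitness function for the hybrid model: the cross-validated RMSE of an SVR trained with a candidate (\(C\), \(\sigma^{2}\)) pair. We use scikit-learn's SVR (a LIBSVM wrapper) for brevity; the paper does not specify this implementation, and the choice of RMSE as the fitness value is our assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def svm_fitness(params, X, y):
    """Fitness of a candidate [C, sigma2]: 5-fold cross-validated RMSE.

    sklearn parameterizes the RBF kernel as exp(-gamma * ||x - z||^2),
    so gamma = 1 / sigma2; epsilon is held fixed at 0.0001, as in the text.
    """
    C = float(np.clip(params[0], 0.01, 35000.0))     # keep TLBO steps in the box
    sigma2 = float(np.clip(params[1], 0.0001, 32.0))
    model = SVR(kernel="rbf", C=C, gamma=1.0 / sigma2, epsilon=0.0001)
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    return float(np.sqrt(mse))

# TLBO (Sect. 2.2) then searches the two-dimensional solution space, e.g.:
# best, rmse = tlbo(lambda p: svm_fitness(p, X_train, y_train),
#                   x_min=[0.01, 0.0001], x_max=[35000.0, 32.0])
```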

3 Proposed methodology

3.1 Dataset

We applied our forecasting model to real multicommodity futures index (MCX COMDEX) data collected from the MCX (http://www.mcxindia.com). MCX COMDEX is a collection of futures contracts on 15 physical commodities, with a simple weighted average of three subindices (MCX AGRI, MCX METAL, and MCX ENERGY) that represent the key commodity sectors within the index. The index thus incorporates futures contracts drawn on metals, energy, and agricultural commodities traded on the MCX. We collected 1332 daily trading data points from MCX COMDEX, from January 1, 2010, to May 7, 2014. The time series data consist of the daily open, low, high, and closing prices, and the traded date. The raw daily prices were used to calculate our financial technical indicator inputs. The time span covers many important and significant economic events, which we believe makes the data appropriate for training the models. Table 1 describes the data set in terms of high, low, mean, median, standard deviation, kurtosis (a measure of the flatness of the distribution), and skewness (the degree of asymmetry of a distribution around its mean). The raw daily closing prices are plotted in Fig. 3. The data description in Table 1 and the plot in Fig. 3 clearly show that the data are well spread. Therefore, an SVM trained with these data should be a well-generalized model.

Table 1 Description of MCX COMDEX dataset
Fig. 3
figure 3

Closing prices of MCX COMDEX

3.2 Preprocessing of data

We derived 17 financial technical indicators from the collected data, and used these indicators as inputs to the SVM regression model to forecast the closing price of the futures index. The technical indicators were computed using the formulas in Table 2. Financial technical indicators are a class of metrics whose values are derived from generic price activity in financial markets, and are extensively used by traders to predict future price levels of a financial instrument by looking at past patterns. These indicators smooth out random price fluctuations in the market and offer a clearer perspective, because they are trend-following or lagging indicators. The 17 financial technical indicators used in our study are based on previous work by Kim and Han [15], Kim [14], Kim and Lee [16], Tsang et al. [39], Ince and Trafalis [9], Huang and Tsai [8], Liang et al. [19], Lai et al. [17], and Chih-Ming [6], and on feedback from domain experts. The indicators are (1) 10-day moving average, (2) 20-day bias, (3) moving average convergence/divergence (MACD), (4) stochastic indicator %K, (5) stochastic indicator %D, (6) stochastic slow %D, (7) Larry William's %R, (8) rate of change (ROC), (9) relative strength index (RSI), (10) commodity channel index (CCI), (11) psychological line, (12) buying/selling momentum indicator, (13) buying/selling willingness indicator, (14) momentum, (15) disparity 5, (16) disparity 10, and (17) moving average oscillator (MAO). After processing the 1332 raw data points, we obtained 1307 transformed data points with dates from February 1, 2010 to May 7, 2014. The 25 data points from January 1, 2010 to January 31, 2010 are not available because of the definitions of some technical indicators. For example, the buying/selling momentum and willingness indicators require 26 days of data.

Table 2 Technical indicators (features)

We linearly normalized the technical indicators so that they have a range of [0, 1]. This normalization minimizes forecasting errors and prevents variables with larger numeric ranges from dominating those with smaller numeric ranges. We applied this to both the input technical indicators and the output closing prices, which were normalized using

$$Y_{i} = \frac{{\left( {P_{i} - P_{\hbox{min}}} \right)}}{{\left( {P_{\hbox{max}} - P_{\hbox{min}}} \right)}}, \quad {\text{for}}\; i = 1,2,3, \ldots, N,$$
(20)

where \(Y_{i}\) is the normalized value,\(P_{i}\) is the original value, \(P_{\hbox{min} }\) and \(P_{\hbox{max} }\) are the minimum and maximum values in the original data, and \(N\) is the total number of trading days.
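A short NumPy sketch of this normalization, together with the inverse mapping needed to convert normalized forecasts back to price units, is given below (the function names are our own):

```python
import numpy as np

def min_max_normalize(p):
    """Scale a series to [0, 1], cf. Eq. (20)."""
    p = np.asarray(p, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

def denormalize(y, p_min, p_max):
    """Map normalized predictions back to the original price scale."""
    return np.asarray(y) * (p_max - p_min) + p_min
```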

The normalized data were segregated into training and test groups, approximately in the ratio of 5:1. Hence, 1085 data points were used for training with 5-fold cross-validation, and the remaining 222 were used to test the model. We considered three different forecasts of the closing prices: (1) 1 day ahead; (2) 3 days ahead; and (3) 5 days ahead.

In the 1-day-ahead forecasting case, the normalized technical indicators for each trading day from February 1, 2010 to April 30, 2014, and the normalized closing price for the next trading day (from February 2, 2010 to May 1, 2014, 1 day ahead) were partitioned into training and testing sets. The data were split up in a similar way for the 3 and 5-days-ahead forecasts.

3.3 Performance criteria

We evaluated the performance of the proposed model using standard statistical metrics: root mean square error (RMSE), normalized mean squared error (NMSE), mean absolute error (MAE), and directional symmetry (DS) [2, 38, 43]. Detailed descriptions and definitions of these performance criteria are given in Table 3. RMSE, MAE, and NMSE measure the deviation between the actual and forecasted futures index prices, so smaller values are preferred. The accuracy of the direction of the prediction is provided by DS (in %). Larger DS values indicate a better forecast.

Table 3 Performance metrics
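A sketch of these metrics in Python is shown below. The formulas follow the usual definitions in [2, 38, 43] (NMSE normalizes the squared error by the variance of the actual series; DS counts days on which the predicted and actual price changes share a sign); the exact variants used here should be checked against Table 3.

```python
import numpy as np

def forecast_metrics(actual, pred):
    """RMSE, MAE, NMSE, and DS (in %) for a forecast series."""
    a = np.asarray(actual, float)
    p = np.asarray(pred, float)
    err = a - p
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    nmse = float(np.mean(err ** 2) / np.var(a, ddof=1))
    # DS: percentage of days on which the forecast moves in the same
    # direction as the actual series.
    ds = float(100.0 * np.mean((np.diff(a) * np.diff(p)) >= 0))
    return {"RMSE": rmse, "MAE": mae, "NMSE": nmse, "DS": ds}
```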

3.4 Computation techniques

We implemented Vapnik's SVM regression technique using LIBSVM, which is an SVM toolbox [5]. SVMs for financial time series forecasting commonly use the polynomial kernel \(k(x, y) = (x \cdot y + 1)^{d}\) or the Gaussian kernel \(k(x, y) = \exp ( - \left\| {x - y} \right\|^{2} /\sigma^{2})\), where \(d\) is the degree of the polynomial kernel and \(\sigma^{2}\) is the width (bandwidth) of the Gaussian kernel. We used the Gaussian (radial basis) kernel function, because it performs well under general smoothness assumptions. Additionally, the Gaussian kernel has fewer parameters than the polynomial kernel, which produces inferior results and requires more training time [2, 21, 38, 43]. We used an Intel Core i7 CPU, 4 GB memory PC for our simulations.

Traditional procedures for optimizing the parameters of the SVM model and the kernel function use grid search [13] or cross-validation [10] methods. However, both of these methods are computationally expensive and data intensive [12]. Grid search is a local search technique that often becomes trapped in local optima, and it is sometimes hard to determine its search interval [21]. In this study, we used grid search with cross-validation to find the best values of \(C\) and \(\sigma^{2}\) for the baseline model: we considered different pairs of (\(C\), \(\sigma^{2}\)) and then selected the pair that minimized the error, which we then used in our comparisons. In the simulation experiment, we used \(C\) values in the range 0.01 to 35,000, and \(\sigma^{2}\) values between 0.0001 and 32 (Table 4). After determining the final (\(C\), \(\sigma^{2}\)) values for all three forecasting cases (i.e., 1, 3, and 5 days ahead), we trained the model again to generate the final forecasting model. The index prices obtained using the standard SVM regression model are shown in Fig. 4a–c.
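For reference, this baseline tuning step can be reproduced with a standard grid search; the grid below spans the ranges in Table 4, but its spacing is our own choice, since the exact grid used is not reported.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# C in [0.01, 35000] and sigma^2 in [0.0001, 32]; gamma = 1 / sigma^2.
param_grid = {
    "C": np.logspace(np.log10(0.01), np.log10(35000.0), 8),
    "gamma": 1.0 / np.logspace(np.log10(0.0001), np.log10(32.0), 8),
}
search = GridSearchCV(SVR(kernel="rbf", epsilon=0.0001), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
# search.fit(X_train, y_train); search.best_params_ then holds the
# (C, gamma) pair used for the final standard-SVM model.
```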

Table 4 (a) SVM and (b) TLBO parameters used in our experiments
Fig. 4
figure 4

Actual and predicted futures index prices using the SVM regression, PSO + SVM hybrid model, and SVM–TLBO hybrid regression models, for the a 1-day-ahead forecast, b 3-days-ahead forecast, and c 5-days-ahead forecast

We wrote our own code to implement TLBO for the proposed SVM–TLBO hybrid regression model. The TLBO algorithm was defined in two dimensions, to optimize \(\sigma^{2}\) (the bandwidth of the Gaussian kernel) and \(C\) (the regularization parameter of the SVM). In our experimental runs of the TLBO algorithm, there were no significant changes to \(\sigma^{2}\) and \(C\) after 25–30 iterations, when using a population size (learners/students) of 15. Rao and Patel [29], Pawar and Rao [26], Rao et al. [28], and Rao and Waghmare [33] also observed that TLBO requires only a small population and few iterations (generations). With this in mind, we fixed the maximum number of iterations for the TLBO at 30, with a population size of 15 (Table 4). We observed that the value of our objective function decreased when the algorithm went from the teacher to the student phase within the same iteration, and decreased further with the number of iterations. Similar observations were made by Rao et al. [31]. When defining the solution space for TLBO, the range of \(C\) was set to 0.01–35,000, and the range of \(\sigma^{2}\) was set to 0.0001–32 [20]. The hybrid regression model algorithm ran as per the flow chart provided in Fig. 2, and the simulation results are shown in Fig. 4 and Table 8. In these results, the kth test day means the (1085 + k)th day from our reference date (February 1, 2010), because we used the first 1085 days of data for training and the remaining 222 days for testing. We compared the results of our proposed SVM–TLBO hybrid regression approach with standard SVM regression and the PSO + SVM model of Lin et al. [21]. We used a sequential minimal optimization (SMO)-based algorithm to train the SVM regression, because it is fast and efficient for large data sets.

4 Results and discussion

The RMSE results of the SVM regression model in the training and testing phases, and the final values of \(C\) and \(\sigma^{2}\), are presented in Table 5 for all three forecasting cases.

Table 5 Model performance and final parameter settings using the standard SVM regression model

The results for the proposed SVM–TLBO hybrid regression model, and the optimal parameters are summarized in Table 6.

Table 6 Model performance and optimal parameters achieved by proposed SVM-TLBO hybrid regression model

4.1 Comparisons of results

The RMSE, MAE, and NMSE values presented in Table 7 (best performance in bold) show that the SVM–TLBO hybrid regression model outperformed the standard SVM regression and PSO + SVM hybrid approaches in all three forecasting cases. With regard to the DS performance metric, SVM–TLBO performed better than the standard SVM and PSO + SVM models in two forecasting cases (3 and 5 days ahead), but standard SVM performed better for the 1-day-ahead forecast. Financial market practitioners evaluate forecasting models using both minimum forecast error and directional accuracy [18]; the aim is to achieve a directional accuracy of over 50 % [43]. In our study, the DS values for the SVM–TLBO hybrid and standard SVM methods were greater than 50 % in all cases. The DS values for the PSO + SVM hybrid approach were greater than 50 % for the 1-day-ahead and 3-days-ahead forecasts, but fell to 48.15 % for the 5-days-ahead forecast.

Table 7 Comparison of the results of the standard SVM, PSO + SVM hybrid, and SVM–TLBO hybrid regression models

Figure 4 shows the actual futures index prices, and the prices predicted using standard SVM regression, the PSO + SVM hybrid model, and the proposed SVM–TLBO hybrid regression model, for the three types of forecasts. Table 8 presents the forecasting results in terms of index prices for a few data samples using the standard SVM, PSO + SVM hybrid, and SVM–TLBO hybrid regression models. Table 8 clearly shows that the index prices from the proposed SVM–TLBO hybrid model were more accurate than those from the standard SVM, and much better than those from the PSO + SVM hybrid model.

Table 8 Forecasting results using the SVM regression, SVM-TLBO hybrid regression, and PSO + SVM hybrid models

5 Conclusions and future work

In this research, we examined the feasibility of applying the recently developed TLBO algorithm to select optimal free parameters for an SVM regression model of financial time series data. We used multicommodity futures index data collected from the MCX. Our experimental results show that the proposed SVM–TLBO hybrid regression model effectively found the optimal parameters, and produced better predictions than the standard SVM method. Compared with standard SVM regression, the proposed model improved the MAE by 65.87 % (1-day-ahead forecast), 55.83 % (3-days-ahead forecast), and 67.03 % (5-days-ahead forecast), and improved the RMSE by 55.64 % (1 day ahead), 55.74 % (3 days ahead), and 57.3 % (5 days ahead). There were similar improvements in terms of MAE and RMSE when we compared the proposed SVM–TLBO hybrid regression method with the PSO + SVM hybrid model. Moreover, our experiments demonstrate that the proposed SVM–TLBO hybrid regression model is more efficient than the standard SVM and PSO + SVM hybrid models for financial time series forecasting. The proposed model avoids the user-specified control parameters that are required by optimization methods such as PSO, GAs, and ACO.

In our current model, we selected the technical indicators (features) using previous research in this area and expert feedback. We could enhance the accuracy of the forecasts by including relevant macroeconomic features. In future work, the proposed model can also be applied to other domains, to validate and extend it.