1 Introduction

The stock market can be defined as a complex nonlinear dynamic system. Many factors influence the stock market, and complex and subtle interrelations exist among these factors [1, 2]. Therefore, analysis of the stock market has been one of the most interesting topics of stock market research over the years. Especially as stock markets have continued to heat up, accurate stock market prediction has become a key issue for countries all over the world. One of the main ideas behind successful stock market research is providing the best prediction results using the minimum relevant input data [3]. For these purposes, a great deal of attention has been focused on the idea of using soft computing models for stock market prediction, since correctly designed models can easily converge on the optimal result. However, their performance depends on parameter tuning and input variable selection. Variable selection in particular is a central issue for the model developer. It improves the performance of the proposed model. In addition, variable selection not only provides faster and more cost-effective predictors but also ensures a better understanding of the underlying system [4]. Variable selection models are essentially divided into filter models, wrapper models, and embedded models. Filter models choose subsets of input variables as a preprocessing step. Wrapper models use the learning machine of interest as a black box to evaluate input variable subsets according to their predictive power. Embedded models perform input variable selection during training and are normally specific to given learning machines [4]. It should be emphasized that no single variable selection model is universally superior to the others. To specify which variable selection model is best to use and how to use it, one should first determine the purpose of variable selection and then study the characteristics of the dataset at hand [5].

To determine optimal settings of input variables and to tune parameters for stock price prediction, Harmony Search (HS) is used in this study for the Neural Network (NN), Jordan Recurrent Neural Network (JRNN), Extreme Learning Machine (ELM), Recurrent Extreme Learning Machine (RELM), Generalized Linear Model (GLM), Regression Tree (RT), and Gaussian Process Regression (GPR). The selection of input variables strongly influences the complexity and performance of the models, but performance is also affected by other factors such as the number of hidden neurons and the transfer function type. Too few or too many hidden neurons can lead to underfitting or overfitting, respectively, which greatly degrades the generalization capability and leads to significant deviation in prediction. In addition, the choice of transfer functions can strongly affect the complexity and performance of the models. Hence, which transfer functions should be selected is a very important question.

In the literature, many variants of soft computing based architectures have been created to improve the performance of the models. For example, Zhu et al. [6] proposed an evolutionary ELM model using differential evolution (DE), which is utilized to find the optimal input weights and hidden biases, and the Moore–Penrose (MP)-generalized inverse, which is utilized to define the output weights. Evolutionary ELM provides an opportunity to reduce the number of hidden neurons and to improve the generalization performance. Suresh et al. [7] used a real coded genetic algorithm (RCGA) to determine proper values for the free parameters in ELM, whose input weights, bias values, and number of hidden neurons are specified automatically. RCGA-based ELM is able to find a small network to approximate the classifier function, but the time taken to obtain the results is longer. Lan et al. [8] proposed a constructive model for hidden neuron selection to stabilize and handle the architectural design of the ELM network. Basically, the proposed model identifies the significance of each hidden neuron and determines the optimal subset of hidden neurons achieving comparable generalization performance. Saraswathi et al. [9] used particle swarm optimization (PSO), an integer-coded genetic algorithm, and ELM for accurate gene selection and sparse data classification. In that study, the optimal input weights are determined by the PSO algorithm, and the integer-coded genetic algorithm, which reduces the number of genes, is used to reduce the computational effort. The proposed model provides high classification accuracy with just twelve genes. Lahoz et al. [10] assigned the biases and hidden weights of the single-hidden layer feedforward neural network (SLFN) using a bi-objective microgenetic algorithm that gives reasonable results while maintaining the execution time. Huang and Lai [11] employed PSO to optimize the structural risk minimization function for the optimal number of hidden neurons in ELM. To improve the ELM classifier’s generalization performance, Xue et al. [12] proposed a variable-length PSO algorithm to determine the number of neurons in the hidden layer as well as the corresponding input weights and hidden biases. Bazi et al. [13] applied DE to the model selection problem of ELM. Also, orthogonal crossover is used to improve the search ability of DE. The experimental results showed that DE-based ELM is faster than the DE-based support vector machine while classifying accurately and efficiently. Hegazy et al. [14] used a flower pollination algorithm that selects input weights and hidden biases to create a more compact network structure than the traditional ELM model for monthly stock price prediction. The proposed model is able to achieve the lowest error value with little effort since it has few parameters and these can be easily tuned. To improve the balance between explorative and exploitive power, Yang et al. [15] proposed a differential evolution coral reef optimization (DECRO), where DE is utilized to perform the broadcast spawning. It is observed that DECRO-based ELM tends to improve the prediction accuracy of DE-based ELM and coral reef optimization-based ELM. Furthermore, it enhances the prediction speed of ELM. In recent years, RELM has also been used to improve the prediction performance of ELM [16, 17].

Ruxanda and Badea [18] built various configurations of NNs to make predictions on the Bucharest Stock Market Index and then evaluated the NN models in terms of prediction errors. In their study, the number of hidden neurons is varied between 2 and 6, logistic sigmoid and hyperbolic tangent sigmoid are used as activation functions, and gradient descent and the Broyden–Fletcher–Goldfarb–Shanno method are applied as training algorithms. Developing more realistic models basically depends on the NN parameters. To increase the performance of NN models, a significant number of studies (especially on hybrid NN models) have been carried out in this field. Guresen et al. [19] reviewed artificial NN and hybrid models for time series forecasting in detail and evaluated the performance of dynamic artificial NNs, hybrid NNs, and multilayer perceptrons in forecasting time series of market values. Hsieh et al. [20] presented an integrated system for stock price prediction that used wavelet transforms and a Recurrent Neural Network (RNN) whose weights and biases are determined by the artificial bee colony algorithm. To ensure that the application of the model is sufficiently robust, the proposed model is applied to four stock markets. Wei and Cheng [21] used an RNN to build a prediction model and proposed three refinement processes in the hybrid model, where essential technical indicators are selected from popular indicators by a correlation matrix, stepwise regression, and a decision tree. The study showed that selecting the important technical indicators reduces forecasting errors more effectively. Zahedi and Rounaghi [22] used an NN to predict stock prices and applied 20 accounting variables in a principal component analysis model to evaluate their effects on stock prices and to identify effective factors in stock prices using real values. To reduce the computational complexity, Anish and Majhi [23] combined variable selection with a feedback type of functional link artificial NN model trained with a recursive least squares algorithm for efficient prediction of stock prices. Dash et al. [24] presented a hybrid feedforward functional link dynamic neural network model to ensure a trade-off between speed and accuracy.

To summarize, nowadays one is equipped with a multitude of models, but choosing the models that allow for a successful and efficient solution of a particular problem is usually not trivial. The main problem is parameter tuning, since no definite and explicit model is available to determine the optimal parameter setting for a prediction model. In this case, metaheuristics can be used to systematically assign the optimal setting of parameters. In this study, HS is used to tune the optimal parameters of the proposed models. The reasons for selecting HS can be summarized as follows: (1) there is no need to use initial values for the decision variables; (2) derivative information is unnecessary since HS uses a stochastic random search; (3) HS is able to adapt easily to a broad class of optimization problems due to its simplicity [25]. In the light of previous studies, it can also be said that HS does not require parameter fine-tuning to reach high-quality solutions. Thus, it is less sensitive to the values of the chosen parameters [26]. It is hoped that the present paper may help find the best model, or at least filter out the ones that are not promising for stock price prediction. The main aims of this study are summarized as follows:

  • Select the relevant technical indicators for NN, JRNN, ELM, RELM, GLM, RT, and GPR models,

  • Determine the proper number of hidden neurons for NN, JRNN, ELM, RELM models,

  • Determine the proper number of context neurons for JRNN and RELM models,

  • Determine the appropriate transfer function type for NN, JRNN, ELM, RELM models,

  • Create HS-NN, HS-JRNN, HS-ELM, HS-RELM, HS-GLM, HS-RT, and HS-GPR models for 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction,

  • Test the models on three stocks selected from among the BIST-100 companies, namely Eregli Iron and Steel Company (EREGL), Eczacıbası Pharmaceutical Industry (ECILC), and Afyon Cement Industry (AFYON),

  • Compare the forecasting performance of the models using five performance measures, namely Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Theil’s Inequality Coefficient (TheilU), and Directional Prediction Statistics (DS).

The remainder of this paper is organized as follows. The basic structure of the proposed models is described in Sect. 2. In Sect. 3, a detailed analysis of the models is given. Section 4 is devoted to conclusions.

2 Proposed models

The lack of a definitive architecture for a particular model is still a controversial issue. For soft computing models, performance is highly dependent not only on input variable selection but also on many architectural parameters. Therefore, we first used HS to tune the optimal parameters to provide the desired output. Then, we combined the determined parameters and constructed the hybrid models for 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction, using three stocks.

Overall, the dataset is divided into training and testing datasets. The training dataset covers the period from April 17, 2013, up to September 11, 2015, and contains 1200 trading sessions. The testing dataset covers the period from September 14, 2015, up to November 30, 2015, and contains 60 trading sessions. The original dataset is transformed into a normalized dataset to increase convergence ability and decrease noise. In this study, sigmoid normalization is applied using the following equation:

$$x_{\text{new}} = \frac{1}{{1 + e^{{ - \lambda x_{\text{old}} }} }}$$
(1)

where xnew represents the value after normalization and xold denotes the value before normalization, λ = 1/xmax, and xmax is the maximum value of the dataset.
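As a short illustration, the following Python sketch applies the sigmoid normalization of Eq. (1); the function name and the example prices are ours, and λ is computed from the data as stated above.

```python
import numpy as np

def sigmoid_normalize(x, x_max=None):
    """Sigmoid normalization of Eq. (1): x_new = 1 / (1 + exp(-lambda * x_old)),
    with lambda = 1 / x_max taken from the (training) data."""
    x = np.asarray(x, dtype=float)
    if x_max is None:
        x_max = x.max()
    lam = 1.0 / x_max
    return 1.0 / (1.0 + np.exp(-lam * x))

# Example: normalize a short series of closing prices (placeholder values)
prices = np.array([12.4, 12.8, 13.1, 12.9])
print(sigmoid_normalize(prices))
```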

In stock price prediction, the determination of relevant technical indicators is one of the critical steps. A large number of technical indicators can introduce irrelevant information, while a small number of technical indicators may be insufficient for the proposed models. Since a well-chosen subset of relevant variables can better represent the original character of the dataset, prediction with these input variables can increase accuracy and efficiency. In this study, relevant technical indicators are selected from the initial variable pool (Table 1). Note that the initial variable pool is the same as in the study of Göçken et al. [27], except that the today’s close–previous close price indicator was excluded.

Table 1 Initial variable pool

For a particular stock or index, the fitness of a technical indicator is dynamic. Stocks can also demonstrate conflicting trends at different times. The optimization of technical indicators should therefore adapt to changes in fitness and generate new instances of the indicators as required [28]. The relevant technical indicators can change according to the proposed models. In this paper, the number of candidate variables to be selected is fixed at five for each model. When the number of input variables (technical indicators) exceeds five, the computational time increases. In addition, the basic idea of effective forecasting in the stock market is to obtain the best results using the least complex model with the minimum required input data [3]. Therefore, the number of selected technical indicators is restricted to five in HS to improve the performance of the proposed hybrid models. The complete sets of transfer functions available to each model are given in Tables 2 and 3.

Table 2 List of available transfer function types in HS-NN/HS-JRNN models
Table 3 List of available transfer function types in HS-ELM/HS-RELM

It is known that the selection of the transfer function strongly influences complexity, but how to determine a suitable transfer function is still an unresolved problem. In practice, the choice of transfer functions depends on the model type and the problem type. Hence, this study may also be useful for researchers working on transfer functions.

Moreover, we used HS as a simple and efficient model to set the proper number of hidden and context neurons, since the number of neurons directly influences the generalization performance of the models.

2.1 Harmony Search (HS)

HS mimics the improvisation process of musicians. To find the best harmony, each musician plays a note. In the same manner, the decision variables of the cost (or profit) function can be represented by musicians in optimization problems. HS attempts to determine an optimal solution vector of this function, where each decision variable (musician) produces a value (note) [29]. For a general optimization problem, the basic steps of HS can be summarized as follows [30]:

Step 1

The optimization problem is briefly described.

$$\begin{aligned} & Minimize\;({\text{or}}\;Maximize)\;f(\vec{x}) \\ & {\text{subject}}\;{\text{to}}\;x_{i} \in X_{i} , \quad i = 1, 2, \ldots , N. \end{aligned}$$
(2)

where \(f \left( \cdot \right)\) denotes a scalar objective function; \(\vec{x}\) represents a solution vector including decision variables \(x_{i}\); \(X_{i}\) denotes the set of possible range of values for each decision variable \(x_{i}\) and changes between the lower \(\left( {Lx_{i } } \right)\) and upper bounds (\(Ux_{i}\)) for each decision variable; and \(N\) represents the number of decision variables. In this step, harmony memory size (HMS), harmony memory considering rate (HMCR), pitch adjusting rate (PAR) and the number of improvisations (NI) are also initialized.

Step 2

The harmony memory (HM) is initialized using Eq. (3).

$$x_{i}^{j} = Lx_{i} + rand()\left( {Ux_{i} - Lx_{i} } \right)$$
(3)

where \(j = 1, 2, 3, \ldots ,HMS\) and \(rand()\) is a uniformly distributed random number between 0 and 1.

Step 3

This step is responsible for creating a new potential variation. A new harmony vector (\(\vec{x}^{\prime} = x_{1}^{\prime} , x_{2}^{\prime} , \ldots ,x_{N}^{\prime} )\) is created using memory consideration, pitch adjustment, and random selection. With probability HMCR, the value of each decision variable for the new vector is selected from the values already existing in the current HM; otherwise, it is chosen randomly from its possible range:

$$x_{i}^{\prime} \in \left\{ {\begin{array}{*{20}l} {x_{i}^{\prime} \in \left\{ {x_{i}^{1} ,x_{i}^{2} , \ldots ,x_{i}^{\text{HMS}} } \right\},} \hfill & {{\text{with}}\;{\text{probability}}\;{\text{HMCR}}} \hfill \\ {x_{i}^{\prime} \in X_{i} ,} \hfill & {{\text{with}} \;{\text{probability}} \;1 - {\text{HMCR}}} \hfill \\ \end{array} } \right.$$
(4)

Then, the parameter PAR is applied using the following equation:

$$x_{i }^{\prime} = \left\{ {\begin{array}{*{20}l} {x_{i}^{\prime} \pm rand() \times bw } \hfill & {{\text{with}}\;{\text{probability}}\;{\text{PAR}}} \hfill \\ {x_{i}^{\prime} } \hfill & {{\text{with }}\;{\text{probability}}\, \left( {1 - {\text{PAR}}} \right)} \hfill \\ \end{array} } \right.$$
(5)

where \(bw\) represents an arbitrary distance bandwidth.

For JRNN and RELM models, HM includes five parts. The number of hidden neurons, the number of context neurons, and transfer function type are represented by first, second, and third parts, respectively. The fourth part denotes the information on whether the considered technical indicator is selected or not. The last part of the HM represents the objective function. Note that similar representation of the HM matrix can be found in [27]. For NN and ELM models, HM includes four parts. The number of hidden neurons and transfer function type are represented by first and second parts, respectively. The third part denotes the input variable selection. The final part of the HM represents the objective function. For GLM, RT, and GPR models, HM only includes two parts. First one denotes input variables, and second part represents the objective function.
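For illustration, a single harmony for the HS-JRNN/HS-RELM case could be encoded as in the Python sketch below; the field names, value ranges, and pool size are hypothetical and are only meant to mirror the five parts described above.

```python
import random

# Hypothetical encoding of one harmony for HS-JRNN / HS-RELM.
N_INDICATORS = 44   # assumed size of the initial variable pool (see Table 1)
N_SELECTED = 5      # the number of selected indicators is fixed at five

def random_harmony():
    selected = random.sample(range(N_INDICATORS), N_SELECTED)
    return {
        "n_hidden": random.randint(1, 30),         # part 1: number of hidden neurons
        "n_context": random.randint(1, 20),        # part 2: number of context neurons
        "transfer_fn": random.randint(1, 8),       # part 3: transfer function index
        "indicators": [1 if i in selected else 0   # part 4: indicator selection bits
                       for i in range(N_INDICATORS)],
        "objective": None,                         # part 5: fitness (e.g. prediction error)
    }
```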

Step 4

A new harmony vector (\(\vec{x}^{\prime} = x_{1}^{\prime} , x_{2}^{\prime} , \ldots ,x_{N}^{\prime}\)) is improvised. The generated solution is then evaluated, and the HM is updated: if the new harmony’s fitness value is better than that of the worst harmony in the HM, it replaces the worst harmony; otherwise, it is discarded.

Step 5

HS terminates when a certain maximum number of improvisations is reached; otherwise, HS repeats steps 3 and 4. The HS parameter values for the proposed models are given in Table 4. Note that these values were determined by considering the upper bounds of the parameters in [31], where the HMS value ranges between 10 and 50, the HMCR value ranges between 0.7 and 0.95, and PAR is set between 0.01 and 0.3.

Table 4 HS parameter values
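To make steps 2–4 concrete, the following Python sketch implements a minimal continuous HS loop following Eqs. (3)–(5); the objective function, bounds, and parameter values in the sketch are placeholders rather than the settings of Table 4.

```python
import random

def harmony_search(f, lower, upper, hms=30, hmcr=0.95, par=0.3, bw=0.1, ni=2500):
    """Minimal continuous Harmony Search sketch (minimization)."""
    n = len(lower)
    # Step 2: initialize the harmony memory (HM) with random solutions, Eq. (3)
    hm = [[lower[i] + random.random() * (upper[i] - lower[i]) for i in range(n)]
          for _ in range(hms)]
    fitness = [f(x) for x in hm]

    for _ in range(ni):
        # Step 3: improvise a new harmony
        new = []
        for i in range(n):
            if random.random() < hmcr:                    # memory consideration, Eq. (4)
                xi = hm[random.randrange(hms)][i]
                if random.random() < par:                 # pitch adjustment, Eq. (5)
                    xi += (2 * random.random() - 1) * bw
                    xi = min(max(xi, lower[i]), upper[i])
            else:                                         # random selection
                xi = lower[i] + random.random() * (upper[i] - lower[i])
            new.append(xi)
        # Step 4: replace the worst harmony if the new one is better
        fx = f(new)
        worst = max(range(hms), key=lambda j: fitness[j])
        if fx < fitness[worst]:
            hm[worst], fitness[worst] = new, fx

    best = min(range(hms), key=lambda j: fitness[j])
    return hm[best], fitness[best]
```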

2.2 Hybrid models

Although various models and algorithms have been presented in the literature for input variable selection, determining the transfer function type is still controversial. However, the selection of transfer functions can strongly affect the complexity and performance of the models and is known to play a significant role in their convergence. Hence, we used HS for variable selection and parameter tuning, including the determination of the transfer function type and the optimal numbers of hidden and context neurons. Afterward, HS-NN, HS-JRNN, HS-ELM, HS-RELM, HS-GLM, HS-RT, and HS-GPR are created using the optimal solutions generated by HS. Thus, the proposed models calculate an output to the problem based on the parameters specified by HS, which takes a number of iterations to converge. Note that the number of iterations in the hybrid models is the same as in HS.

The general structure of the proposed hybrid models is shown in Fig. 1. Initially, the dataset is divided randomly into four subsets of equal size. Then, each subset is split into 80% of the data for training and 20% for testing. However, the split between training and testing data could also use other ratios such as 50/50, 60/40, 70/30, and 95/5. In the subsequent sections, each step of the models is described to give the details of the hybrid models.

Fig. 1 Block diagram of proposed hybrid models

2.2.1 Harmony Search-based Neural Network (HS-NN)

NNs are formed on a layer-by-layer basis. Each layer can include one or more computational neurons. The jth neuron of an NN can be expressed as:

$$y_{j} = \varphi_{j} \left( {\mathop \sum \limits_{i} w_{ji} x_{i} - b_{j} } \right)$$
(6)

where \(y_{j}\) is the output of the jth neuron, \(w_{ji}\) is the connection weight between the ith neuron and the jth neuron, \(x_{i}\) is the ith input, \(b_{j}\) is the bias at the jth neuron, and \(\varphi_{j} \left( \cdot \right)\) denotes the transfer function at the jth neuron. Since \(x\) represents the known input and \(y\) specifies the output to be computed, the remaining variables (weights and biases) must be determined through a training process that finds their optimal values. More details can be found in [32].
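As a simple illustration of Eq. (6), the Python snippet below computes the output of one neuron; the hyperbolic tangent is used here only as an example transfer function, and the numeric values are arbitrary.

```python
import numpy as np

def neuron_output(x, w, b, transfer=np.tanh):
    """Output of a single neuron as in Eq. (6): y_j = phi_j(sum_i w_ji x_i - b_j)."""
    return transfer(np.dot(w, x) - b)

x = np.array([0.2, 0.7, 0.5])   # inputs (e.g. normalized technical indicators)
w = np.array([0.4, -0.3, 0.8])  # connection weights w_ji
b = 0.1                         # bias b_j
print(neuron_output(x, w, b))
```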

The selected variables and optimized parameter values of the HS-NN model are given in Table 5. The hyperbolic tangent sigmoid transfer function (1) and the Elliot sigmoid transfer function (2) are generally used in HS-NN. Also, the most commonly used number of hidden neurons is 20. Typical price (42) is the most commonly used technical indicator among the input variables.

Table 5 Selected variables and optimized parameter values for HS-NN

2.2.2 Harmony Search-based Jordan Recurrent Neural Network (HS-JRNN)

JRNN can be considered an extended version of the NN models. It includes input, hidden, output, and context layers. Note that temporal information is extracted from the input data by means of the context layer. In JRNN, connections between the context layer and the output layer are fixed with a weight of one. The remaining weights are adjusted to carry out a specific application: the learning algorithm is performed in JRNN, and the input data are used to calculate the output of the network. In the proposed model, data from lower-layer neurons are propagated forward to upper-layer neurons through a feedforward connection network (see Hikawa and Araga [33] for details).

For HS-JRNN, the number of input variables, the transfer function type, and the numbers of hidden and context neurons are optimized with the HS algorithm (Table 6).

Table 6 Selected variables and optimized parameter values for HS-JRNN

In HS-JRNN, the two most commonly used transfer function types are the pure linear transfer function and the normalized radial basis transfer function. The most commonly used number of hidden layer neurons is 9, while the most commonly used number of neurons in the context layer is 10. K% stochastic (29) is the most commonly used technical indicator.

2.2.3 Harmony Search-based Extreme Learning Machine (HS-ELM)

ELM is a learning algorithm for the SLFN that overcomes several disadvantages of traditional NN training. ELM avoids becoming stuck in local optima and is considerably faster than traditional gradient-based algorithms. In addition, it yields NN models with better generalization performance [15]. ELM is able to reach the smallest training error and obtain the smallest norm of weights [34]. On the other hand, without proper input layer weights and biases, more hidden layer neurons are required to enhance the performance of ELM [15]. In ELM, the input weights and hidden biases are randomly determined, and the MP-generalized inverse is utilized to define the output weights [6]. Details can be found in [34]. Briefly, the steps of ELM are:

Given a training set \(\varPhi = \left\{ {\left( {x_{k} , t_{k} } \right)|\left( {x_{k} } \right) \in {\mathbb{R}}^{n} , t_{k} \in {\mathbb{R}},\quad k = 1,2, \ldots ,N} \right\}\), hidden neuron transfer function type \(h\left( x \right)\), and the number of hidden neurons, \(L\),

  1.

    Randomly initialize the hidden neuron parameters according to any continuous sampling distribution.

  2.

    Calculate the hidden layer output matrix \(H\):

    $$H = \left[ {\begin{array}{*{20}c} {h\left( {x_{1} } \right)} \\ \vdots \\ {h\left( {x_{N} } \right)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {h_{1} \left( {x_{1} } \right)} & \cdots & {h_{L} \left( {x_{1} } \right)} \\ \vdots & \ddots & \vdots \\ {h_{1} \left( {x_{N} } \right)} & \cdots & {h_{L} \left( {x_{N} } \right)} \\ \end{array} } \right]$$
    (7)

    where \(h_{i} \left( x \right)\) denotes a transfer function of the ith neuron. The ith column of \(H\) represents the ith hidden neuron output vector with respect to inputs \(x_{1} , x_{2} , \ldots ,x_{N}\).

  3.

    Calculate the output weights \(\beta_{i}\):

    $$\beta = H^{\dag} T$$
    (8)

    where \(\beta = \left[ {\beta_{1} ,\beta_{2} , \ldots ,\beta_{L} } \right]^{T}\) specifies the vector of the output weights, \(T = \left[ t_{1} ,t_{2} , \ldots ,t_{N} \right]^{\text{T}}\) represents the training data output matrix, and \(H^{\dag}\) denotes the MP-generalized inverse of matrix \(H\). \(\beta\) is a result of the minimization of approximation error:

    $$\min\| H\beta - T\|$$
    (9)

The output function of ELM is defined as (one output case):

$$f_{L} \left( x \right) = \mathop \sum \limits_{i = 1}^{L} f_{i} \left( x \right) = \mathop \sum \limits_{i = 1}^{L} \beta_{i} h_{i} \left( x \right) = h\left( x \right)\beta$$
(10)

where \(f_{i} \left( x \right)\) = \(\beta_{i} h_{i} \left( x \right)\) is the weighted output of the ith hidden neuron.

The output function \(f_{L} \left( x \right)\) is a linear combination of the transfer functions \(h_{i} \left( x \right)\) [35].
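For illustration, the ELM procedure of Eqs. (7)–(10) can be sketched in Python as below; the sigmoidal hidden layer, the random placeholder data, and the chosen number of hidden neurons are assumptions for this example only.

```python
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    """ELM training sketch: random input weights/biases, hidden output
    matrix H (Eq. 7), output weights beta = pinv(H) @ T (Eq. 8)."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, L))    # random input weights
    b = rng.standard_normal(L)                  # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # sigmoidal transfer function
    beta = np.linalg.pinv(H) @ T                # MP-generalized inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Output function of Eq. (10): f_L(x) = h(x) beta."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Example with random data: 100 samples, 5 selected indicators, 16 hidden neurons
rng = np.random.default_rng(1)
X, T = rng.random((100, 5)), rng.random(100)
W, b, beta = elm_train(X, T, L=16)
pred = elm_predict(X, W, b, beta)
```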

In HS-ELM, the triangular basis transfer function is used for the 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction models, except for the 10-day-ahead prediction model for EREGL, which uses a sigmoidal transfer function. The number of hidden neurons is usually 16. Previous high (2), 10-day simple moving average (7), 10-day triangular moving average (15), high price accelerator (21), and opening price momentum (24) are generally selected as technical indicators (Table 7).

Table 7 Selected variables and optimized parameter values for HS-ELM

2.2.4 Harmony Search-based Recurrent Extreme Learning Machine (HS-RELM)

RELM is a novel training approach for a single-hidden layer JRNN whose output can be defined, following Ertugrul [17], as:

$$T = \mathop \sum \limits_{j = 1}^{m} \beta_{j} g\left( {\mathop \sum \limits_{i = 1}^{n} w_{i,j} x_{i} + \mathop \sum \limits_{i = n + 1}^{n + r} w_{i,j} \delta \left( {t - i + n} \right) + b_{j} } \right)$$
(11)

where m is the number of neurons in the hidden layer and n is the number of neurons in the input layer. \(\delta\) denotes the delay, t denotes the instance order, and r represents the number of employed context neurons, which are backward connections from the output layer to the input layer.

The feedbacks in RELM are taken as new inputs with delay and attached to the H matrix. Note that the major difference between RELM and ELM is that RELM is built for sequential (time-ordered) datasets, whereas the order of the data in the dataset is not important in ELM. Details can be found in [17]. In this paper, the number of input variables, the transfer function type, and the numbers of hidden and context neurons are optimized. In HS-RELM, the sigmoidal transfer function type is applied in all models. The number of hidden layer neurons is generally 21, and the number of context neurons is usually 16 in the proposed models. In general, previous open (4), 10-day exponential moving average (11), Williams %R (33), lower Bollinger band (37), and median price (40) are selected as technical indicators (Table 8).

Table 8 Selected variables and optimized parameter values for HS-RELM
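To illustrate the feedback mechanism, the following Python sketch shows one possible way to append delayed output (context) values to the inputs before ELM training; it is an illustrative reading of Eq. (11) and [17], not the exact implementation used in this study.

```python
import numpy as np

def add_context_inputs(X, T, r):
    """RELM-style data preparation sketch: append r delayed target values
    (context feedbacks) to each input row; rows without a full history are
    dropped. The augmented matrix can then be passed to the ELM sketch above."""
    n = len(T)
    rows = []
    for t in range(r, n):
        context = T[t - r:t][::-1]              # delayed outputs, most recent first
        rows.append(np.concatenate([X[t], context]))
    return np.array(rows), np.asarray(T)[r:]
```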

2.2.5 Harmony Search-based Generalized Linear Model (HS-GLM)

In GLMs, the predictor variables \(X_{j} \left( {j = 1, \ldots ,p} \right)\) are combined to generate a linear predictor (LP) that is related to the expected value \(\mu = E\left( Y \right)\) of the response variable Y through a link function g(·), such that:

$$g\left( {E\left( Y \right)} \right) = LP = \alpha + X^{T} \beta$$
(12)

where Y is the response variable, \(\alpha\) is a constant denoting the intercept, \(X = \left( {X_{1} , \ldots ,X_{p} } \right)\) denotes a vector of p predictor variables, and \(\beta = \left\{ {\beta_{1} , \ldots ,\beta_{p} } \right\}\) represents the vector of p regression coefficients [36]. In HS-GLM, the close price accelerator (23) is found to be the most commonly used technical indicator, as seen in Table 9.

Table 9 Optimal variable subset for HS-GLM
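As a minimal illustration of Eq. (12), the Python sketch below fits a GLM with an identity link by ordinary least squares; the paper does not state which link function was used, so the identity link is an assumption made only for this example.

```python
import numpy as np

def fit_identity_glm(X, y):
    """Fit g(E(Y)) = alpha + X^T beta with g = identity (Eq. 12), via least squares."""
    Xa = np.column_stack([np.ones(len(X)), X])     # prepend intercept column (alpha)
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef[0], coef[1:]                       # alpha, beta

def predict_identity_glm(X, alpha, beta):
    return alpha + X @ beta

# Example on random placeholder data (5 selected indicators)
rng = np.random.default_rng(0)
X, y = rng.random((200, 5)), rng.random(200)
alpha, beta = fit_identity_glm(X, y)
print(predict_identity_glm(X[:3], alpha, beta))
```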

2.2.6 Harmony Search-based Regression Tree (HS-RT)

RT is an automatic classifier that creates a binary tree structure. Repeated splits of subsets into two descendant subsets are used to produce the tree. Every split is a query on the input variables, and the answers “yes” and “no” lead to the left and right descendant subsets, respectively. Details of RT can be found in [37]. In the HS-RT model, the 6-day simple moving average (6), 5-day triangular moving average (13), low price accelerator (22), close price accelerator (23), and K% stochastic (29) are selected as input variables for the 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction models and the three stocks (ECILC, EREGL, and AFYON). A short illustrative sketch of such a tree is given below.
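The sketch uses scikit-learn’s DecisionTreeRegressor on random placeholder data standing in for the five selected indicators; the depth limit is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative regression tree with binary splits, as described above.
rng = np.random.default_rng(0)
X, y = rng.random((200, 5)), rng.random(200)   # placeholder indicators and target
tree = DecisionTreeRegressor(max_depth=5).fit(X, y)
print(tree.predict(X[:3]))
```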

2.2.7 Harmony Search-based Gaussian Process Regression (HS-GPR)

GPR is a flexible model for handling nonlinear regression problems. In GPR, a Gaussian process is a collection of random variables \(t = \left( {t\left( {x_{1} } \right),t\left( {x_{2} } \right), \ldots } \right)\) which have a joint Gaussian distribution,

$$P\left( {t|C,x_{n} } \right) = \frac{1}{Z}\exp \left( { - \frac{1}{2}\left( {t - \mu } \right)^{T} C^{ - 1} \left( {t - \mu } \right)} \right)$$
(13)

for any input set \(\left\{ {x_{n} } \right\}\). \(C\) represents the covariance matrix given by the covariance function \(C\left( {x_{n} ,x_{m} ;\varTheta } \right)\), which is parametrized by hyperparameters \(\varTheta\), and \(\mu\) denotes the mean function. GPR predicts \(\tilde{t}\) at a new data point \(\tilde{x}\) by giving the predictive mean and variance of the posterior distribution:

$$\tilde{y}\left( {\tilde{x}} \right) = {\mathbf{k}}\left( {\tilde{x}} \right)C_{N}^{ - 1} t$$
(14)
$$\sigma_{{\tilde{y}}}^{2} \left( {\tilde{x}} \right) = C\left( {\tilde{x},\tilde{x}} \right) - k\left( {\tilde{x}} \right)C_{N}^{ - 1} k\left( {\tilde{x}} \right)$$
(15)

where \({\mathbf{k}}\left( {\tilde{x}} \right) = \left( {C\left( {x_{1} ,\tilde{x}} \right), \ldots , C\left( {x_{N} ,\tilde{x}} \right)} \right)\) represents the covariance between the training data and \(\tilde{x}\), and \(C_{N}\) denotes the \(N \times N\) covariance matrix of the training data points given the covariance function \(C\). \(C_{N}^{ - 1} {\mathbf{t}}\) is independent of the new data. Details of GPR can be found in [38]. In this study, the 6-day simple moving average (6), 5-day triangular moving average (13), low price accelerator (22), close price accelerator (23), and K% stochastic (29) are determined as the input variables for the HS-GPR model.
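For illustration, Eqs. (14) and (15) can be evaluated with the short Python sketch below; the squared-exponential covariance function, its hyperparameters, the zero mean assumption, and the noise term are illustrative choices, not those of the paper.

```python
import numpy as np

def rbf_cov(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance function C(x_n, x_m); hyperparameters are illustrative."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length ** 2)

def gpr_predict(X_train, t_train, X_new, noise=1e-2):
    """Predictive mean (Eq. 14) and variance (Eq. 15), assuming a zero mean
    function and additive observation noise on the diagonal of C_N."""
    C_N = rbf_cov(X_train, X_train) + noise * np.eye(len(X_train))
    k = rbf_cov(X_train, X_new)                 # covariance with the new points
    C_inv = np.linalg.inv(C_N)
    mean = k.T @ C_inv @ t_train                                         # Eq. (14)
    var = np.diag(rbf_cov(X_new, X_new)) - np.sum(k * (C_inv @ k), 0)    # Eq. (15)
    return mean, var
```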

3 Results and discussion

There exists ample evidence that stock markets serve as one of the leading indicators for the economy, and hence stock price prediction at higher frequencies can provide an early indication of the direction the economy might be headed, besides providing information about rising imbalances and risks. We presented how hybrid soft computing models can be applied effectively to the problem of analyzing the stock market and forecasting future prices in a useful and applicable manner. We aim to encourage researchers to seek and embrace new opportunities for soft computing models in stock markets. In this study, HS is utilized to construct seven robust hybrid models for stock price prediction. To ensure that the proposed hybrid models can be applicable to other stock markets, our work is also applied to three stocks (ECILC, EREGL, and AFYON). To clarify which prediction model performs best under different conditions, five statistical performance measures, MAPE, MAE, RMSE, TheilU, and DS, are used. MAPE is a relative measure that expresses errors as a percentage of the actual data. It is simple to calculate and easy to understand [39]. MAPE is calculated as follows:

$${\text{MAPE}} = \frac{100}{N}\mathop \sum \limits_{t = 1}^{N} \left| {\frac{{X_{t} - F_{t} }}{{X_{t} }}} \right|$$
(16)

\(F_{t}\) is the predicted stock price, \(X_{t}\) is the actual stock price, and N denotes the total number of test observations for all performance measures. The MAPE values of the 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction models are given in Table 10 for the three stocks. It is clearly seen that, although the prediction accuracy of the models changes according to the prediction terms and stocks, HS-JRNN is the best model according to the MAPE values. Also, the HS-RELM model for ECILC and AFYON shows good prediction performance in comparison with the other models. The MAPE performances for AFYON are considerably worse than for the other stocks (ECILC and EREGL).

Table 10 MAPE values for proposed models

MAE depends on the scale of the dependent variable, but it is less sensitive to large deviations than the usual squared loss. Its formula is:

$${\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left| {X_{t} - F_{t} } \right|$$
(17)

RMSE formula is given in Eq. (18).

$${\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{t = 1}^{N} (X_{t} - F_{t} )^{2} }$$
(18)

HS-JRNN is the best model according to the MAE and RMSE values (Tables 11, 12). Note that a smaller difference between RMSE and MAE means a smaller variance in the individual errors of the sample.

Table 11 MAE values for proposed models
Table 12 RMSE values for proposed models

TheilU is scale invariant and lies between 0 and 1. Note that 0 indicates a perfect fit, and hence low values represent the ideal case of a near-perfect forecast. TheilU is calculated as follows:

$${\text{TheilU}} = \frac{{\sqrt {\frac{1}{N}\mathop \sum \nolimits_{t = 1}^{N} (X_{t} - F_{t} )^{2} } }}{{\sqrt {\frac{1}{N}\mathop \sum \nolimits_{t = 1}^{N} (X_{t} )^{2} } + \sqrt {\frac{1}{N}\mathop \sum \nolimits_{t = 1}^{N} (F_{t} )^{2} } }}$$
(19)

The values assigned by DS lie between 0 and 100. Larger values of DS mean better performance in stock market prediction. DS is calculated using Eqs. (20) and (21). The TheilU and DS values for the 1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction models are given in Tables 13 and 14.

Table 13 TheilU values for proposed models
Table 14 DS values for proposed models
$${\text{DS}} = \frac{100}{N}\mathop \sum \limits_{t = 1}^{N} a_{t}$$
(20)
$$a_{t} = \left\{ {\begin{array}{*{20}l} {1, } \hfill & {\left( {X_{t} - X_{t - 1} } \right)\left( {F_{t} - X_{t - 1} } \right) \ge 0} \hfill \\ 0 \hfill & { {\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(21)
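For reference, the five performance measures of Eqs. (16)–(21) can be computed with the following Python sketch; DS is evaluated from the second observation onward, since it requires the previous actual price.

```python
import numpy as np

def performance_measures(X, F):
    """Eqs. (16)-(21): X are actual prices, F are predicted prices (1-D arrays)."""
    X, F = np.asarray(X, float), np.asarray(F, float)
    N = len(X)
    mape = 100.0 / N * np.sum(np.abs((X - F) / X))                      # Eq. (16)
    mae = np.mean(np.abs(X - F))                                        # Eq. (17)
    rmse = np.sqrt(np.mean((X - F) ** 2))                               # Eq. (18)
    theil_u = rmse / (np.sqrt(np.mean(X ** 2)) + np.sqrt(np.mean(F ** 2)))  # Eq. (19)
    # DS: fraction of steps where the predicted move matches the actual move direction
    a = ((X[1:] - X[:-1]) * (F[1:] - X[:-1]) >= 0).astype(float)        # Eq. (21)
    ds = 100.0 / len(a) * np.sum(a)                                     # Eq. (20)
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "TheilU": theil_u, "DS": ds}
```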

Although the prediction performance of the models varies according to the prediction terms and stocks, HS-JRNN is consistently better than the other models considered in this study. Furthermore, the comparison of RMSE and/or MAPE values shows that the HS-JRNN model, especially for 3-day-ahead prediction of AFYON stock, generally gives more promising forecasting results than many other studies [20, 21, 23, 40,41,42]. The findings support that HS-JRNN is able to extract remarkable forecasting results from the selected technical indicators. Also, Dematos et al. [43] showed that recurrent networks can be helpful for forecasting financial prices since they use information from the sequence or time dependence of the inputs as well as the inputs themselves. In this direction, we can say that HS-JRNN can be successfully used for stock market forecasting owing to this time-dependence advantage.

One of the most important contributions is that significant improvements are achieved through the proposed hybrid models. They have proven to be useful in variable selection and parameter determination, which is an active and fruitful field of research in soft computing. It is noteworthy that, according to the results of the performance measures, our proposed models can serve as a desirable approach to stock price prediction. The results also indicate that one should first determine the characteristics of the stocks in order to determine which model is the best to use. Also, the architecture of the model is very important for improving its prediction performance. The prediction performance of a model directly depends on how its parameters are selected, but an almost unlimited number of variations are available. The primary advantage of using an HS-based hybrid model is that it provides a generic framework that captures the underlying dynamics of noisy and high-dimensional stock market data. HS determines the most relevant technical indicators while simultaneously searching for the most appropriate transfer function type and the numbers of neurons in the hidden and context layers. The interactions of the parameters clearly influence the prediction performance, and therefore determining the critical parameters of the model increases the prediction speed and accuracy. However, not all parameters affecting the hybrid model were taken into account in this paper. In ongoing research, it would be interesting to consider other parameters, such as training functions and the iteration number, in the design of HS-based hybrid models. In this study, the following problems are solved in order to carry out accurate prediction:

  (i)

    How many input variables (technical indicators) should be used for the NN, JRNN, ELM, RELM, GLM, RT, and GPR models?

  (ii)

    How many neurons can be used in hidden layer?

  (iii)

    How many neurons can be used in context layer?

  (iv)

    Which transfer function should be used?

Our work can be helpful for determining a robust model architecture. Also, the prediction accuracy of our proposed hybrid models will enable speculators and investors to decide on investment strategies in stock markets.

4 Conclusion

It is important to keep in mind that the determination of model parameters is one of the most important factors in prediction. However, the process of determining the optimal architecture of models for a stock market problem is still a controversial issue. In particular, how to select relevant input variables to predict stock prices accurately is one of the most important issues in making an investment decision. Accordingly, the emphasis of previous studies has generally been on input variable selection, thereby neglecting the importance of transfer functions. However, the determination of transfer functions can strongly affect the complexity and performance of a model. The uniqueness of our proposed models comes from the fact that HS is used for both variable selection and parameter tuning. Thus, the transfer function type and the numbers of hidden and context neurons are successfully optimized with the HS algorithm. The findings of the analyses indicate that the close price accelerator (23) and K% stochastic (29) are the two most frequently used technical indicators in the proposed prediction models. The HS-NN and HS-JRNN models commonly use the Elliot sigmoid transfer function and the pure linear transfer function, while the sigmoidal transfer function type is generally used for HS-ELM and HS-RELM. The number of hidden neurons for HS-NN, HS-JRNN, HS-ELM, and HS-RELM varies between 3 and 29, the most commonly used number being 16. For HS-JRNN and HS-RELM, the number of context neurons varies between 1 and 16, the most commonly used number being 16.

For further comparison, the proposed hybrid models are applied to six different prediction terms (1-, 2-, 3-, 5-, 7-, and 10-day-ahead prediction) and three stocks (ECILC, EREGL, and AFYON). According to the obtained MAPE, MAE, RMSE, TheilU, and DS values, HS-JRNN outperforms the others. However, the other proposed models can also be very suitable for prediction with highly complex stock market data. HS-JRNN gives a promising direction for the study of stock price prediction and other stock markets. Although the proposed hybrid models have satisfactory prediction performance, some insufficiencies should be considered for enhancement. For instance, it may be better to use an improved HS for architecture optimization in soft computing models. Moreover, different metaheuristics such as genetic algorithms and PSO can be utilized to compare the performance of the hybrid models. Finally, fundamental indicators such as inflation rates, foreign exchange rates, money supply, import–export figures, employment rate, financial ratios of companies, and other stock index series can be added to the initial input variable pool.