1 Introduction

Stock/futures price forecasting is an important issue in investment and financial decision-making for individual investors, stock fund managers, and financial analysts, and it currently receives considerable attention from both researchers and practitioners. Stock/futures price forecasting is considered a challenging task because prices are highly volatile, complex, and dynamic. Various approaches have therefore been proposed for tackling stock/futures price forecasting problems. These approaches can be broadly classified into three categories: fundamental analysis, technical analysis, and traditional time series forecasting. Fundamental analysis examines macroeconomic statistics such as interest rates, money supply, and foreign exchange rates, as well as the basic financial information of a corporation, in order to forecast stock price movements [1, 2]. Technical analysis studies stock prices, recent and historical price trends and cycles, factors beyond the stock price itself, such as dividend payments, trading volume, group trends and popularity, and the volatility of a stock, to forecast future stock prices [3]. The third approach, traditional time series forecasting, uses statistical tools to forecast future values of a stock based on its past values and other variables. These tools include the moving average, the autoregressive integrated moving average (ARIMA) [4], generalized autoregressive conditional heteroskedasticity (GARCH) [5], and multivariate regression.

In recent years, data mining/computational intelligence techniques have become rapidly growing alternative methods for resolving, or assisting other approaches in resolving, the problems of stock/futures price forecasting. For example, Pai and Lin [6] exploited the strengths of the autoregressive integrated moving average (ARIMA) and the support vector machine (SVM) to develop a hybrid methodology. The ARIMA technique was used to model the linear pattern of the data, while the SVM was employed to deal with the nonlinear pattern in time series forecasting. The performance of the proposed model was evaluated on real datasets of ten stocks, and adequate results were obtained. Furthermore, comparison results showed that their methodology can significantly improve the forecasting performance of a single ARIMA or SVM model. Chang and Liu [7] presented a Takagi–Sugeno–Kang (TSK) type fuzzy rule-based system to tackle stock price forecasting problems. They used stepwise regression to analyze and select the technical indicators that are most influential to the stock price. The K-means clustering technique was then applied to partition the dataset into several clusters. Finally, a simplified fuzzy rule inference system was set up, with the optimal rule parameters trained by simulated annealing, to forecast stock prices. Their proposed approach was tested on the Taiwan Stock Exchange (TSE) and MediaTek Inc., and the experimental results showed that it outperformed other methodologies, such as backpropagation neural networks and multiple regression analysis. Ince and Trafalis [8] proposed an approach for stock price forecasting using kernel principal component analysis (kPCA) and support vector regression (SVR), based on the assumption that the future value of a stock price depends on its financial indicators. The kPCA was utilized to identify the most important technical indicators, and the SVR was employed to construct the relationship between the stock price and the selected technical indicators. For comparison, they also applied factor analysis to determine the most influential input technical indicators for a forecasting model and used multilayer perceptron (MLP) neural networks to build the forecasting model. The experiments were conducted on the daily stock prices of ten companies traded on the NASDAQ. The comparison results indicated that the proposed heuristic models produce better results than the studied kPCA-SVR model. Furthermore, there is no significant difference in forecasting performance between the MLP neural model and the SVR technique in terms of the mean square error. Huang and Tsai [9] proposed a hybrid procedure using filter-based feature selection, a self-organizing feature map (SOFM), and support vector regression (SVR) to forecast the stock market price index. The filter-based feature selection was first used to determine important input technical indicators, thus reducing data complexity. The SOFM algorithm was then applied to cluster the training data into several disjoint clusters such that the elements in each cluster are similar. Finally, the SVR technique was employed to construct an individual forecasting model for each cluster. Their proposed model was demonstrated through a case study on forecasting the next day's price index for Taiwan index futures (FITX), and the experimental results showed that the proposed approach can improve forecasting accuracy and reduce the training time compared with a traditional single SVR model. Liang et al.
[10] presented a two-stage approach to improve option price forecasting by using modified conventional option pricing methods, neural networks (NNs), and support vector regressions (SVRs). In their study, three modified conventional parametric methods (the binomial tree method, the finite difference method, and the Monte Carlo method) were utilized to forecast the option prices in the first stage. The NNs and SVRs were then employed to carry out nonlinear curve approximation to further reduce the forecasting errors in the second stage. The proposed approach was demonstrated by experimental studies on data taken from the Hong Kong options market, and the results showed that the NN and SVR approaches can significantly shrink the average forecast errors, thus improving forecasting accuracy. In yet another approach to stock price forecasting, Hadavandi et al. [11] presented a hybrid model integrating stepwise regression analysis, artificial neural networks, and genetic fuzzy systems. They first used stepwise regression analysis to select factors that significantly affect stock prices. A self-organizing map (SOM) neural network was then used to divide the raw data into several clusters such that the elements in each cluster are more homogeneous. Finally, the data in each cluster were fed into a genetic fuzzy system to forecast stock prices. Their proposed approach was demonstrated through an experiment on forecasting the next day's closing prices of two IT corporations and two airlines, and comparison results indicated that the proposed approach outperforms other previous methods. Lu [12] proposed an integrated stock price forecasting model using independent component analysis (ICA) and a backpropagation (BP) neural network. The ICA was first applied to the forecasting variables in order to filter out the noise contained in these variables, thus generating the independent components (ICs). The filtered forecasting variables then served as the input variables of the BP neural network to build the forecasting model. The proposed approach was illustrated by a case study on forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) closing cash index and the Nikkei 225 opening cash index. The comparison results revealed that the proposed forecasting model outperforms the integrated wavelet-BP model, the BP model without filtered forecasting variables, and the random walk model. Cheng et al. [13] employed the cumulative probability distribution approach (CDPA), the minimize entropy principle approach (MEPA), rough sets theory (RST), and genetic algorithms (GAs) to develop a hybrid approach for stock price forecasting. The CDPA and MEPA were first utilized to partition technical indicator values and daily price fluctuations into linguistic values according to the characteristics of the data distributions. The RST was then employed to generate linguistic rules from the linguistic technical indicator dataset. Finally, the GAs were applied to refine the extracted rules, thus improving forecasting accuracy and stock return modeling. Their proposed methodology was verified through a case study on forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), and adequate results were obtained.

The above-mentioned studies demonstrate that neural network techniques are an effective way to address the problems of stock/futures price forecasting. Furthermore, forecasting performance can be further improved by using feature selection methods to identify the most important factors that serve as input variables for a forecasting model; such methods include stepwise regression, kernel principal component analysis, factor analysis, and independent component analysis. The current study proposes a systematic procedure based on a backpropagation (BP) neural network, a feature selection technique, and genetic programming (GP) to tackle stock/futures price forecasting problems with the use of technical indicators. First, the BP neural network is used to construct a preliminary forecasting model that describes the complex nonlinear relationship between the technical indicators and future stock/futures prices. Next, a simulation-based feature selection technique is utilized to probe the preliminary forecasting model and select the technical indicators that are most important for forecasting stock/futures prices. In addition, GP is employed to build another forecasting model, thus automatically screening the vital technical indicators that closely correlate with future stock/futures prices. The BP neural network is then applied again to develop the final forecasting model, using as input variables the technical indicators produced by the simulation-based feature selection technique or screened by GP. Finally, the performance of the final forecasting model is evaluated and compared with that of the preliminary BP and GP forecasting models.

The remainder of this paper is organized as follows. In Sect. 2, BP neural networks, feature selection, and GP are discussed. The proposed integrated approach is presented in Sect. 3. Section 4 evaluates the feasibility and effectiveness of the proposed approach by presenting and analyzing a case study on forecasting the closing prices of Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) futures of the spot month. Finally, Sect. 5 concludes the paper.

2 Methodologies

Three methods are used in this study to develop the integrated procedure for solving a stock/futures price forecasting problem. This section briefly introduces these methods, beginning with the backpropagation neural network.

2.1 Backpropagation neural network

The backpropagation (BP) neural network is a widely used network trained through a supervised learning algorithm. The network is a multi-layered neural model consisting of one input layer, at least one hidden layer, and one output layer. Each layer, which comprises several nodes called neurons, is fully connected to the succeeding layer. The numbers of neurons in the input and output layers are determined directly by the numbers of input and output variables in a problem. On the other hand, the optimal number of hidden layers and of neurons in each hidden layer is usually determined through trial and error, based on the root mean squared errors (RMSEs) found in the training and test data. Theoretically, one hidden layer is sufficient for approximating any continuous functional relationship between input and output variables with an arbitrary degree of accuracy [14]. When applying a BP neural network to a specific problem, the neural model must be trained in advance. The learning process includes three stages: (1) feed-forward of the input training pattern, (2) calculation and backpropagation of the associated error, and (3) weight and bias adjustment. The learning process proceeds repeatedly until the RMSE associated with the training data converges sufficiently close to a global minimum. Furthermore, the optimal number of learning iterations is usually determined by feeding independent test data into the trained BP neural model and observing the progress of the test RMSE, in order to prevent over-training. During the learning process of a BP neural network, two major parameters, the learning rate and the momentum, must be specified by the user. Appropriate settings of the learning rate and momentum can help the BP neural model achieve the global minimum training and test RMSEs; however, the values of both parameters are problem dependent [15] and are usually determined through trial and error. Once the learning process is accomplished, the well-trained BP neural network can be utilized to estimate unknown values of the output variables through the recall process of feeding an input pattern into the model.
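To make the training-and-selection loop above concrete, the following minimal sketch emulates it with scikit-learn's MLPRegressor (the case study in Sect. 4 uses NeuralWorks Professional II/Plus instead); the random placeholder data, the 15-indicator input width, and the 1-to-30 hidden-neuron sweep are illustrative assumptions taken from the setup described later in this paper.

```python
# A minimal sketch, not the software actually used in this study: a BP-style
# network trained by stochastic gradient descent with momentum, sweeping the
# hidden-layer size and keeping the model with the lowest combined RMSE.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((400, 15)), rng.random(400)   # placeholder data
X_test, y_test = rng.random((100, 15)), rng.random(100)

best = None
for n_hidden in range(1, 31):           # trial-and-error hidden-layer sizing
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), solver="sgd",
                       learning_rate_init=0.25, momentum=0.75,
                       early_stopping=True,   # crude guard against over-training;
                       max_iter=2000,         # the paper watches the test RMSE
                       random_state=0)
    net.fit(X_train, y_train)
    rmse_tr = mean_squared_error(y_train, net.predict(X_train)) ** 0.5
    rmse_te = mean_squared_error(y_test, net.predict(X_test)) ** 0.5
    if best is None or rmse_tr + rmse_te < best[0]:   # proxy for "simultaneously
        best = (rmse_tr + rmse_te, n_hidden, net)     # minimizing" both RMSEs

print(f"selected structure: 15-{best[1]}-1")
```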

BP neural networks have attracted substantial research interest across a wide range of applications. As these applications demonstrate, adequate estimation results can be obtained through these neural networks [16–19].

2.2 Feature selection

Feature selection is a process of selecting a minimal subset of features based on certain reasonable criteria in order to remove irrelevant and redundant features, and hence noise. In this way, the original problem can be tackled as well as, if not better than, by using the full set of features [20]. Feature selection can prevent the selection of too many or too few features (i.e., more or fewer than necessary) and achieves data reduction, accelerating training and increasing computational efficiency [21]. If too many (irrelevant) features are selected, the effects of noise may duplicate or shadow the valuable information in the data; if too few features are chosen, the information contained in the selected subset might be insufficient. Feature selection approaches can be classified into two categories: the wrapper approach and the filter approach. In the wrapper approach, the feature selection algorithm acts as a wrapper around the induction algorithm [22] and searches for a good subset by evaluating feature subsets via the induction algorithm itself. In other words, the feature selection and induction algorithms interact while searching for the best feature subset. Comparatively, the filter approach utilizes a preprocessing step to filter features and thus find a good feature subset; this method entirely ignores the performance of the selected feature subset on the induction algorithm. That is, the feature selection algorithm proceeds independently of the induction algorithm [22].

Liu and Motoda [20] provided a unified model of feature selection, shown in Fig. 1. First, feature generation produces a candidate feature subset at each iteration using a certain search strategy on the training data. The generated feature subset, along with the training data, is then evaluated by a measuring function. The feature generation mechanism stops searching once the stopping criteria are fulfilled. While the stopping criteria vary case by case, a stop may be activated by any one of the following conditions: (1) the feature subset is good enough, (2) some bound is reached, and (3) the search completes [20]. Notably, a feature subset is determined to be adequate if its estimated accuracy, i.e., classification or forecasting accuracy, is better than the accuracy obtained by using the full set of features. Next, the training data with the specified feature subset, i.e., the best feature subset generated, are fed into the learning algorithm in order to construct the classifier (or forecaster). Finally, the classifier (or forecaster) is tested on the test data with the selected features, thus obtaining the classification accuracy (or forecasting errors).

Fig. 1

A unified model of feature selection

Researchers have attempted to resolve problems of feature selection using various techniques. These include the following: information-theoretic measures [23], rough sets and ant colony optimization [24], genetic algorithms [25], and mathematical programming [26]. Further discussions on feature selection can be found in the studies reported by Kohavi and John [22], Liu and Motoda [20], Piramuthu [21], and Mladenic [27].

2.3 Genetic programming

Genetic programming (GP) is an evolutionary methodology that can automatically create computer programs to solve a user-defined problem by genetically breeding a population of computer programs using the principles of Darwinian natural selection and biologically inspired operations. GP is closely related to the field of genetic algorithms (GAs). However, there are three major differences between GAs and GP. First, GP usually evolves a tree-based structure, while GAs evolve binary or real number strings. Second, the length of the binary or real number string is fixed in traditional GAs, while the parse trees in GP can vary in length throughout the run. Finally, the solutions of GP are active structures, meaning that they can be executed directly without post-processing, whereas the binary or real number strings in GAs are passive structures that require post-processing [28]. Figure 2 illustrates a tree-based representation of the solution \( [(x \times 5) + 3] - [\cos (y) \div 6] \) in GP. As seen, the components in the leaves of the branches, i.e., x, 5, 3, y, and 6, are called terminal elements. The terminal elements (e.g., the independent variables of the problem, zero-argument functions, random constants, etc.) available for each branch of the to-be-evolved computer program in GP are defined by a terminal set. On the other hand, the remaining components in Fig. 2, i.e., +, −, ×, ÷, and cos, are called primitive functions. The primitive functions available for the internal nodes of the to-be-evolved computer program, such as addition, square root, multiplication, sine, etc., are defined by a function set. As in GAs, the fitness (adaptability) of a solution in the GP population is explicitly or implicitly measured via a predefined fitness function. Furthermore, some parameters and the termination criterion must be specified in advance in order to apply GP to a problem. The major parameters that control a GP run include the population size, maximum size of programs, crossover rate, and mutation rate. In addition, the time to stop the evolutionary procedure of GP is determined by the termination criterion. The general steps of GP are presented in the studies of Koza et al. [29, 30] and Ciglarič and Kidrič [31] and are summarized as follows:

Fig. 2

Tree-based representation of a solution in GP
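Before walking through the general steps, the Fig. 2 solution can be encoded and executed directly, which illustrates why GP programs are called active structures. The nested-tuple encoding below is our own illustrative choice, not a construct of any particular GP package.

```python
# Hypothetical encoding of the Fig. 2 parse tree: each internal node is a
# tuple (primitive function, child, ...) and each leaf is a terminal element.
import math
import operator

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "cos": math.cos}

# [(x * 5) + 3] - [cos(y) / 6]
tree = ("-", ("+", ("*", "x", 5), 3), ("/", ("cos", "y"), 6))

def evaluate(node, env):
    """Recursively execute a program tree given terminal bindings."""
    if isinstance(node, tuple):
        func, *children = node
        return FUNCS[func](*(evaluate(c, env) for c in children))
    return env.get(node, node)   # variable lookup, or a numeric constant

print(evaluate(tree, {"x": 2.0, "y": 0.0}))   # (2*5 + 3) - cos(0)/6 = 12.8333...
```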

Step 1 Generating an initial population

This step generates an initial population of computer programs, which can be of different sizes and shapes appropriate to the problem, subject to a prespecified maximum size.

Step 2 Evaluating computer programs

Each program in the population is executed and measured by using a predefined fitness function to obtain the fitness value.

Step 3 Creating the next generation

A pool of programs is first selected from the population using a probability based on the fitness value. Next, the genetic operations, including reproduction, crossover, mutation, and architecture-altering operations, are applied to the selected programs thus producing the offspring. Finally, the current population is replaced with the population of offspring based on a certain strategy, e.g., the elitist strategy, in order to create the next generation.

Step 4 Examining the termination criterion

When the termination criterion is satisfied, the outcome is designated as the final results of the run. Otherwise, Steps 2–4 are executed iteratively.
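A compact, self-contained sketch of Steps 1–4 follows. For brevity it uses only reproduction (elitism) and subtree replacement as genetic operations, omitting crossover and the architecture-altering operations; the target function, population size, depth limits, and rates are arbitrary illustrative choices rather than settings from this study.

```python
# A simplified GP run over the nested-tuple program representation.
import math
import operator
import random

FUNCS = {"+": (operator.add, 2), "-": (operator.sub, 2),
         "*": (operator.mul, 2), "cos": (math.cos, 1)}
TERMS = ["x", 1.0, 2.0, 3.0]

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    f = random.choice(list(FUNCS))
    return (f,) + tuple(random_tree(depth - 1) for _ in range(FUNCS[f][1]))

def evaluate(node, x):
    if isinstance(node, tuple):
        f, _ = FUNCS[node[0]]
        return f(*(evaluate(c, x) for c in node[1:]))
    return x if node == "x" else node

def fitness(tree, target=lambda x: x * x + 1.0):   # RMSE; lower is fitter
    pts = [i / 10 for i in range(-10, 11)]
    return (sum((evaluate(tree, x) - target(x)) ** 2 for x in pts) / len(pts)) ** 0.5

def mutate(tree):
    return random_tree(2) if random.random() < 0.2 else tree

random.seed(1)
pop = [random_tree() for _ in range(200)]          # Step 1: initial population
for gen in range(30):                              # Step 4: termination criterion
    ranked = sorted(pop, key=fitness)              # Step 2: evaluate programs
    pop = ranked[:20] + [mutate(random.choice(ranked[:100]))   # Step 3: elitism,
                         for _ in range(180)]                  # selection, mutation
print(min(pop, key=fitness))
```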

As shown in the studies of Etemadi et al. [32], Hwang et al. [33], and Bae et al. [34], GP has produced many novel and outstanding results in resolving problems from numerous fields. Further discussions on GP, its applications and related resources can be found in the studies of Langdon and Poli [35] and Koza et al. [29, 30].

3 Proposed forecasting procedure

In this study, a systematic procedure based on a backpropagation (BP) neural network, a feature selection technique, and genetic programming (GP) is proposed to tackle stock/futures price forecasting problems with the use of technical indicators. The proposed forecasting procedure comprises three stages, described as follows:

3.1 Construction of preliminary forecasting models

In the first stage, the essential historical stock/futures trading data in each trading day, such as the opening price, highest price, lowest price, closing price, and trade volume, are first collected. Next, the required technical indicators, such as the moving average, Williams overbought/oversold index, psychological line, and commodity channel index, are calculated. To prevent variables with larger numeric ranges from dominating those with smaller numeric ranges, we normalize the technical indicators into a range between 0 and 1 using the following formula:

$$ x_{ij} = \frac{{{ti}_{ij} - {ti}_{j}^{\min } }}{{{ti}_{j}^{\max } - {ti}_{j}^{\min } }}\quad {\text{for}}\quad i = 1,2, \ldots ,n;\,j = 1,2, \ldots ,m $$
(1)

where \( {ti}_{ij} \) and \( x_{ij} \) are the original and normalized technical indicators, respectively; \( {ti}_{j}^{\max } \) and \( {ti}_{j}^{\min } \) are the maximum and minimum values of the jth technical indicator, respectively; and n and m are the total numbers of trading days and technical indicators, respectively. These normalized technical indicators, which constitute the full feature set shown previously in Fig. 1, are the independent input variables in the stock/futures price forecasting model. In addition, the closing price of the stock/futures in each trading day is also normalized into a range between 0 and 1, based on the following formula:

$$ y_{i} = \frac{{p_{i} - p^{\min } }}{{p^{\max } - p^{\min } }}\quad {\text{for}}\quad i = 1,2, \ldots ,n $$
(2)

where \( p_{i} \) and \( y_{i} \) are the original and normalized closing prices of the stock/futures, respectively; \( p^{\max } \) and \( p^{\min } \) are the maximum and minimum values of the closing price of the stock/futures, respectively; and n is the total number of trading days. The normalized technical indicators in each trading day, along with the normalized closing price at a future time, e.g., the next trading day or the next second trading day, denoted by \( (x_{i1} ,x_{i2} , \ldots ,x_{im} ,y_{i} ) \), are then partitioned into training, test, and validation data based on a prespecified proportion, e.g., 4:1:1. Next, a BP neural network is applied to the training and test data to construct several forecasting models. A preliminary optimal forecasting model, named BP_Modelpre, is then selected based on simultaneously minimizing the root mean squared errors (RMSEs) of the training and test data. For comparison, the GP algorithm is also utilized to construct several forecasting models, and a preliminary optimal forecasting model, denoted GP_Modelpre, is determined according to the training and test RMSEs to describe the functional relationship between the normalized technical indicators of each trading day and the forecasted normalized closing price of the stock/futures.
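A minimal sketch of the normalization in (1) and (2) and of the 4:1:1 partition is given below; the indicator matrix, the price series, and their sizes are random placeholders rather than the case-study data.

```python
# Min-max normalization of indicators (Eq. 1) and closing prices (Eq. 2),
# followed by a chronological 4:1:1 train/test/validation split.
import numpy as np

def minmax(a):
    """Scale each column of a 2-D array (or a 1-D series) into [0, 1]."""
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

ti = np.random.default_rng(0).random((600, 15))          # placeholder indicators
price = np.cumsum(np.random.default_rng(1).normal(size=601)) + 100.0

x = minmax(ti)               # Eq. (1): normalized technical indicators
y = minmax(price[1:])        # Eq. (2): next-day normalized closing price

n = len(x)
n_tr, n_te = n * 4 // 6, n // 6                          # 4:1:1 proportion
train = (x[:n_tr], y[:n_tr])
test = (x[n_tr:n_tr + n_te], y[n_tr:n_tr + n_te])
valid = (x[n_tr + n_te:], y[n_tr + n_te:])
```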

3.2 Feature selection

In the second stage, the simulation-based feature selection technique is utilized to determine the impact of each technical indicator on forecasting the closing price of the stock/futures. Suppose the total numbers of trading days in the training, test, and validation data are \( n_{\text{tr}} \), \( n_{\text{te}} \), and \( n_{\text{va}} \), respectively. First, the mean of the jth normalized technical indicator over t successive trading days, starting from the ith trading day in the training and test periods, is calculated by:

$$ \bar{x}_{ij}^{(t)} = \sum\limits_{k = 0}^{t - 1} {\frac{{x_{(i + k)j} }}{t}} \quad {\text{for}}\quad i = 1,2, \ldots ,n_{\text{tr}} + n_{\text{te}} ;\,j = 1,2, \ldots ,m. $$
(3)

Next, the standard deviation of the jth normalized technical indicator over t successive trading days, starting from the ith trading day in the training and test periods, can be calculated by:

$$ s_{ij}^{(t)} = \sqrt{\sum\limits_{k = 0}^{t - 1} {\frac{{\left( {x_{(i + k)j} - \bar{x}_{ij}^{(t)} } \right)^{2} }}{t - 1}} } \quad {\text{for}}\quad i = 1,2, \ldots ,n_{\text{tr}} + n_{\text{te}} ;\,j = 1,2, \ldots ,m. $$
(4)

Therefore, the mean of the standard deviations of the jth normalized technical indicator over t successive trading days, denoted by \( \bar{s}_{j}^{(t)} \), during the training and test periods can be expressed as:

$$ \bar{s}_{j}^{(t)} = \sum\limits_{i = 1}^{{n_{\text{tr}} + n_{\text{te}} }} {\frac{{s_{ij}^{(t)} }}{{n_{\text{tr}} + n_{\text{te}} }}} \quad {\text{for}}\quad j = 1,2, \ldots ,m. $$
(5)
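The window statistics in (3)–(5) amount to a forward-looking rolling standard deviation; a minimal numpy sketch is given below, assuming x is the normalized indicator matrix from (1) and holds at least \( n_{\text{tr}} + n_{\text{te}} + t - 1 \) rows so that every window is complete.

```python
# Mean windowed standard deviation per indicator, Eqs. (3)-(5).
import numpy as np

def mean_window_std(x, n_trte, t=10):
    """Return the vector of s-bar_j^(t) values, one per indicator j."""
    # Forward windows of t days, one starting at each of the first n_trte days
    windows = np.stack([x[i:i + t] for i in range(n_trte)])  # (n_trte, t, m)
    s = windows.std(axis=1, ddof=1)     # Eq. (4): per-window std, divisor t - 1
    return s.mean(axis=0)               # Eq. (5): average over starting days
```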

To assess the impact of the jth technical indicator on forecasting the closing price of the stock/futures, we assume that the jth technical indicator on the ith trading day during the training and test periods can vary according to a normal distribution with mean \( x_{ij} \) and standard deviation \( \bar{s}_{j}^{(t)} \). Hence, the probability with which the value \( x_{ij}^{*} \) will lie in the range from \( x_{ij} - z_{\alpha /2} \bar{s}_{j}^{(t)} \) to \( x_{ij} + z_{\alpha /2} \bar{s}_{j}^{(t)} \) is \( 1 - \alpha \). That is,

$$ P\left( {x_{ij} - z_{\alpha /2} \bar{s}_{j}^{(t)} \le x_{ij}^{*} \le x_{ij} + z_{\alpha /2} \bar{s}_{j}^{(t)} } \right) = 1 - \alpha \quad {\text{for}}\quad i = 1,2, \ldots ,n_{\text{tr}} + n_{\text{te}} ;\,j = 1,2, \ldots ,m $$
(6)

where \( z_{\alpha /2} \) is the critical value such that the probability that a standard normal variable z is greater than \( z_{\alpha /2} \) is \( \alpha /2 \); \( 1 - \alpha \) is the confidence level set by the user, e.g., 0.95 or 0.99. Next, a total of s sample points, denoted by \( x_{ij(l)}^{*} \), are drawn from the range \( \left( {x_{ij} - z_{\alpha /2} \bar{s}_{j}^{(t)} ,\,x_{ij} + z_{\alpha /2} \bar{s}_{j}^{(t)} } \right) \), based on the following formulas:

$$ x_{ij(l)}^{*} = x_{ij} + z_{{\alpha^{*} }} \bar{s}_{j}^{(t)} \quad {\text{for}}\quad i = 1,2, \ldots ,n_{\text{tr}} + n_{\text{te}} ;j = 1,2, \ldots ,m $$
(7)

where \( z_{{\alpha^{*} }} \) is the critical value such that the probability that a standard normal variable z is greater than \( z_{{\alpha^{*} }} \) is \( \alpha^{*} \), and

$$ \alpha^{*} = \frac{\alpha }{2} + (l - 1) \times \frac{1 - \alpha }{s - 1}\quad {\text{for}}\quad l = 1,2, \ldots ,s. $$
(8)
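Equations (7) and (8) define a deterministic grid of s points that spans the interval in (6); a short sketch using scipy's inverse normal CDF (norm.ppf) follows, where the argument names are ours.

```python
# Sample grid for one indicator value, Eqs. (7)-(8).
import numpy as np
from scipy.stats import norm

def sample_points(x_ij, s_bar_j, alpha=0.01, s=200):
    l = np.arange(1, s + 1)
    alpha_star = alpha / 2 + (l - 1) * (1 - alpha) / (s - 1)   # Eq. (8)
    z = norm.ppf(1 - alpha_star)    # upper-tail critical value z_{alpha*}
    return x_ij + z * s_bar_j       # Eq. (7): s points across the interval
```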

Then, the normalized technical indicators for the ith trading day, with the jth technical indicator replaced by the sample point \( x_{ij(l)}^{*} \), are fed into the well-trained BP neural network constructed in the previous stage (i.e., BP_Modelpre) to obtain the forecasted normalized closing price of the stock/futures, denoted by \( \hat{y}_{i(l)} \). Hence, the absolute percentage error (APE) of the forecast can be calculated by:

$$ {\text{APE}}_{i(l)} = \left| {\frac{{\hat{y}_{i(l)} - y_{i} }}{{y_{i} }}} \right| \times 100\% \quad {\text{for}}\quad i = 1,2, \ldots ,n_{\text{tr}} + n_{\text{te}} ;\,l = 1,2, \ldots ,s $$
(9)

where \( y_{i} \) is the normalized actual closing price of the stock/futures, as described in the first stage. Therefore, the impact of the jth normalized technical indicator on forecasting the normalized closing price of the stock/futures, denoted by \( {\text{IMP}}_{j} \), can be evaluated through the mean absolute percentage error (MAPE) expressed as:

$$ {\text{IMP}}_{j} = \sum\limits_{i = 1}^{{n_{\text{tr}} + n_{\text{te}} }} {\sum\limits_{l = 1}^{s} {\frac{{{\text{APE}}_{i(l)} }}{{s \times (n_{\text{tr}} + n_{\text{te}} )}}} } \quad {\text{for}}\quad j = 1,2, \ldots ,m. $$
(10)
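Combining the sampling grid with the preliminary model yields the impact measure of (9) and (10); the sketch below reuses sample_points from the previous sketch, with model standing in for any fitted BP_Modelpre exposing a scikit-learn-style predict method.

```python
# Impact IMP_j of indicator j, Eqs. (9)-(10).
import numpy as np

def impact(model, x, y, j, s_bar_j, alpha=0.01, s=200):
    n_trte, ape_sum = len(x), 0.0
    for i in range(n_trte):
        pts = sample_points(x[i, j], s_bar_j, alpha, s)
        x_rep = np.repeat(x[i:i + 1], s, axis=0)
        x_rep[:, j] = pts                      # perturb indicator j only
        y_hat = model.predict(x_rep)
        ape_sum += np.abs((y_hat - y[i]) / y[i]).sum() * 100.0   # Eq. (9)
    return ape_sum / (s * n_trte)              # Eq. (10)
```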

Next, the normalized technical indicators are ranked in ascending order of their \( {\text{IMP}}_{j} \) values, starting from rank one. The normalized technical indicators for the ith trading day are then fed into the well-trained BP_Modelpre model, with the technical indicators whose ranks are not greater than r (\( r = 0,1, \ldots ,m - 1 \)) replaced by 0.5, to acquire the forecasted normalized closing price of the stock/futures, denoted by \( \hat{\hat{y}}_{i(r)} \), during the training and test periods. The reason for replacing a normalized technical indicator with 0.5, the mean of its maximum and minimum values, is that we intend to weaken its effect on forecasting the closing price; this assists us in determining the best feature subset. The mean absolute percentage error (MAPE) of the forecast, while weakening the effect of the normalized technical indicators whose ranks are not greater than r, can then be calculated by:

$$ {\text{MAPE}}_{(r)} = \frac{1}{{n_{\text{tr}} + n_{\text{te}} }}\sum\limits_{i = 1}^{{n_{\text{tr}} + n_{\text{te}} }} {\left| {\frac{{\hat{\hat{y}}_{i(r)} - y_{i} }}{{y_{i} }}} \right|} \times 100\% \quad {\text{for}}\quad r = 0,1, \ldots ,m - 1. $$
(11)

The direction of change (CD) regarding the MAPE of the forecast while weakening the effect of one more normalized technical indicator can then be expressed by:

$$ {\text{CD}}_{(r)} = \begin{cases} +1 & {\text{if}}\;{\text{MAPE}}_{(r)} > {\text{MAPE}}_{(r - 1)} \\ 0 & {\text{if}}\;{\text{MAPE}}_{(r)} = {\text{MAPE}}_{(r - 1)} \\ -1 & {\text{if}}\;{\text{MAPE}}_{(r)} < {\text{MAPE}}_{(r - 1)} \end{cases} \quad {\text{for}}\quad r = 1,2, \ldots ,m - 1. $$
(12)

We then find the rank \( r^{*} \) such that \( {\text{CD}}_{{(r^{*} )}} = - 1 \) and \( {\text{CD}}_{(r)} = + 1 \) for \( r = r^{*} + 1,r^{*} + 2, \ldots ,m - 1 \). This implies that the absolute percentage error of the forecast always increases while weakening the effects of the normalized technical indicators with ranks \( r^{*} + 1 \), \( r^{*} + 2 \), …, and \( m - 1 \). Hence, the normalized technical indicators with ranks greater than \( r^{*} \), denoted by \( \left( {f_{i1} ,f_{i2} , \ldots ,f_{{i(m - r^{*} )}} } \right) \) for \( i = 1,2, \ldots ,n \), where n is the total number of trading days, are retained as input variables for constructing the final BP forecasting model. This completes the tasks of feature selection and finding the best feature subset described in Fig. 1.
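The whole subset search of (11) and (12), including the determination of \( r^{*} \), can be sketched as follows; the function assumes that the MAPE sequence decreases at least once and that, as in the text, all ranks above the last decrease show an increase.

```python
# Best-subset determination via MAPE_(r) and CD_(r), Eqs. (11)-(12).
import numpy as np

def best_subset(model, x, y, imp):
    order = np.argsort(imp)               # indicator indices by ascending IMP_j
    mape = []
    for r in range(len(imp)):             # r = 0, 1, ..., m - 1
        x_w = x.copy()
        x_w[:, order[:r]] = 0.5           # weaken indicators with rank <= r
        y_hat = model.predict(x_w)
        mape.append(np.mean(np.abs((y_hat - y) / y)) * 100.0)    # Eq. (11)
    cd = np.sign(np.diff(mape))           # Eq. (12): CD_(r) for r = 1..m-1
    r_star = int(np.where(cd < 0)[0][-1] + 1)   # last rank where MAPE fell
    return order[r_star:]                 # retained indicators (rank > r*)
```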

3.3 Construction and validation of final forecasting models

In the final stage, BP neural networks are re-applied to the training and test data, whose input variables are the normalized technical indicators retained in the previous stage, i.e., \( \left( {f_{i1} ,f_{i2} , \ldots ,f_{{i(m - r^{*} )}} } \right) \), to construct several forecasting models. Based on the simultaneous optimization of the training and test RMSEs, a final optimal BP neural model, named BP_BP_Modelfinal, is selected to represent the functional relationship between the retained normalized technical indicators and the forecasted normalized closing price of the stock/futures. Furthermore, the preliminary optimal GP model constructed in the first stage, i.e., GP_Modelpre, is in fact a computer program that is constructed automatically using the most important normalized technical indicators. Hence, BP neural networks are also applied to the training and test data whose input variables are the normalized technical indicators that appear in the GP_Modelpre model, and another final optimal BP neural model, named GP_BP_Modelfinal, is determined. This latter model is selected based on the training and test RMSEs and used for comparison with the BP_BP_Modelfinal model. The forecasted closing price of the stock/futures, denoted by \( \hat{p}_{i} \), can then be obtained by inputting the retained normalized technical indicators \( \left( {f_{i1} ,f_{i2} , \ldots ,f_{{i(m - r^{*} )}} } \right) \) for each trading day to BP_BP_Modelfinal, or by inputting the normalized technical indicators that appear in the GP_Modelpre model for each trading day to GP_BP_Modelfinal, and denormalizing the output. Finally, through the statistical metrics of the root mean squared error (RMSE) and mean absolute percentage error (MAPE), the effectiveness of the proposed forecasting procedure can be evaluated by applying the BP_BP_Modelfinal and GP_BP_Modelfinal models to the validation data that were not previously used. The RMSE and MAPE of the forecast for the validation data are defined as follows:

$$ {\text{RMSE}}_{\text{forecast}} = \sqrt {\sum\limits_{i = 1}^{{n_{\text{va}} }} {\frac{{(p_{i} - \hat{p}_{i} )^{2} }}{{n_{\text{va}} }}}} , $$
(13)
$$ {\text{MAPE}}_{\text{forecast}} = \frac{1}{{n_{\text{va}} }}\sum\limits_{i = 1}^{{n_{\text{va}} }} {\frac{{|p_{i} - \hat{p}_{i} |}}{{p_{i} }}} \times 100\% , $$
(14)

where \( p_{i} \) and \( \hat{p}_{i} \) are the actual and forecasted closing prices of the stock/futures, respectively, and \( n_{\text{va}} \) is the total number of trading days in the validation data. At this stage, the tasks shown in the lower part of Fig. 1 are completed.
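Both metrics transcribe directly into code; a minimal sketch over the denormalized validation prices:

```python
# Validation-period error metrics, Eqs. (13)-(14).
import numpy as np

def rmse_forecast(p, p_hat):
    return np.sqrt(np.mean((p - p_hat) ** 2))           # Eq. (13)

def mape_forecast(p, p_hat):
    return np.mean(np.abs(p - p_hat) / p) * 100.0       # Eq. (14)
```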

3.4 Proposed forecasting procedure

The proposed forecasting procedure in this study is conceptually illustrated in Fig. 3 and summarized as follows:

Fig. 3

Proposed forecasting procedure

Stage 1: Construction of preliminary forecasting models

  1. Collect the essential historical stock/futures trading data and calculate the required technical indicators.

  2. Normalize the technical indicators and closing prices of the stock/futures using (1) and (2), respectively.

  3. Partition the normalized technical indicators, along with the normalized closing price in the future, into training, test, and validation data.

  4. Apply the BP neural network and GP algorithm to the training and test data to construct preliminary optimal forecasting models.

Stage 2: Feature selection

  1. Calculate the mean and standard deviation of each normalized technical indicator over a number of successive trading days through (3)–(5).

  2. Assess the impact of each technical indicator on forecasting the closing price of the stock/futures through (10).

  3. Determine the best subset of normalized technical indicators according to the direction of change in the mean absolute percentage error of the forecast while weakening the effect of one more normalized technical indicator, as shown in (12).

Stage 3: Construction of final forecasting models and validation

  1. Apply the BP neural network to the training and test data, with the normalized technical indicators retained in Stage 2 as inputs, to construct the final optimal forecasting model.

  2. Using as inputs the normalized technical indicators that appeared in the preliminary optimal GP forecasting model in Stage 1, apply the BP neural network to the training and test data to construct another final optimal forecasting model.

  3. Evaluate via (13) and (14) the effectiveness of the proposed forecasting procedure by applying the two final optimal forecasting models constructed in Steps 1 and 2 of this stage to the validation data.

4 Case study

In this section, a case study on forecasting the closing prices of Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) futures of the spot month is presented to demonstrate the feasibility and effectiveness of the proposed forecasting procedure. The details of the case study and the procedure are described below.

4.1 Constructing preliminary forecasting models

4.1.1 Preparing the experimental data

A total of 2,353 pairs of daily trading data, including opening price, highest price, lowest price, closing price, and trade volume, are first collected from the Taiwan Stock Exchange Corporation (TWSE) over a period of approximately nine-and-a-half years, from January 2, 2001, to June 30, 2010. Fifteen technical indicators are then selected as the input variables for forecasting the closing price of futures. This is in line with previous studies by Kim and Han [36], Kim and Lee [37], Tsang et al. [38], Chang and Liu [7], Ince and Trafalis [8], Huang and Tsai [9], and Lai et al. [39]. The formulas for these technical indicators are presented in "Appendix", and they include the following: (1) 10-day moving average, (2) 20-day bias, (3) moving average convergence/divergence, (4) 9-day stochastic indicator K, (5) 9-day stochastic indicator D, (6) 9-day Williams overbought/oversold index, (7) 10-day rate of change, (8) 5-day relative strength index, (9) 24-day commodity channel index, (10) 26-day volume ratio, (11) 13-day psychological line, (12) 14-day plus directional indicator, (13) 14-day minus directional indicator, (14) 26-day buying/selling momentum indicator, and (15) 26-day buying/selling willingness indicator. The fifteen technical indicators are calculated according to the formulas in "Appendix" and the initial 2,353 pairs of daily trading data. Notably, the technical indicators for the first few days, beginning with January 2, 2001, are not available due to the technical indicator definitions. For example, the 9-day stochastic indicator K can be obtained only for the period after the 9th trading day from January 2, 2001. After accounting for this limitation and the validity of the technical indicators, there are 2,318 pairs of technical indicator data. These data, along with the closing prices from March 1, 2001, to June 30, 2010, are used as the sample data in the follow-up forecasting procedure.

4.1.2 Constructing models for a forecast of 1 day ahead

The above fifteen technical indicators and the closing price in each trading day are normalized into a range between 0 and 1 using the formulas in (1) and (2), respectively. These normalized technical indicators in each trading day from March 1, 2001, to June 29, 2010, along with the normalized closing price of the next trading day from March 2, 2001, to June 30, 2010, are then partitioned into training, test, and validation sample data groups based on the proportion of 4:1:1, as shown in Table 1. Notably, training data are typically allotted more observations than test data; thus, the size of the training data is set to four times the size of the test data. In addition, validation data of a size equal to that of the test data are designed to verify the capability of the well-trained BP neural network or GP models, obtained using the training and test data, to deal with data not previously used. With this approach, the influence of the sample data partition on the performance of the obtained forecasting model can be presumed to be eliminated. Table 1 contains 16 subsets, obtained by slicing the study period, to ensure that the training, test, and validation sample data groups cover the entire period of research. In this manner, the functional relationship between the normalized technical indicators and the normalized closing price of the next trading day can be estimated more accurately by the BP neural network or GP models constructed later. Based on the training and test data in Table 1, a three-layered BP neural network is then designed using NeuralWorks Professional II/Plus (http://www.neuralware.com) software to create a mathematical forecasting model. The numbers of neurons in the input and output layers are set to 15 (the number of technical indicators) and 1 (the closing price of futures on the next trading day), respectively. In addition, the learning rate and momentum are set, through trial and error, to 0.25 and 0.75, respectively, and the default values in the software are applied for the remaining parameters. Candidate networks with 1 to 30 neurons in the hidden layer are examined; for each candidate, the learning process is terminated when the training RMSE sufficiently converges, and over-training is prevented by observing the test RMSE. Table 2 summarizes the execution results, where the structure 15-X-1 indicates that the BP neural model consists of X neurons in the hidden layer. Based on the simultaneous consideration of the training and test RMSEs, the neural model with the structure 15-20-1 is selected as the preliminary optimal BP forecasting model for forecasting the normalized closing price of the next trading day; it is named BP_Modelpre_1_day. For comparison, the GP algorithm is also applied to the training and test data in Table 1 to establish a futures price forecasting model. Here, the Discipulus 4.0 (http://www.rmltech.com) software is employed, where the fitness of a program is evaluated through the RMSE. The termination criterion is 100 generations without improvement, and the remaining parameters are set to their default values in the software. The Discipulus 4.0 software is executed 5 times, and Table 3 summarizes the obtained results. Based on the training and test RMSEs, the 2nd model, named GP_Modelpre_1_day, is selected as the preliminary optimal GP forecasting model.

Table 1 Data partitioned into training, test, and validation data groups
Table 2 Execution results of BP neural networks (forecast of 1 day ahead)
Table 3 Execution results of GP algorithms (forecast of 1 day ahead)

4.1.3 Constructing models for a forecast of 2 and 3 days ahead

Similarly, we intend to construct the BP neural network and GP models for forecasting the closing futures prices of the next second and third trading days. For forecasting the closing price of the next second trading day, the fifteen normalized technical indicators in each trading day from March 1, 2001, to June 28, 2010, along with the normalized closing price of the second trading day, i.e., from March 5, 2001, to June 30, 2010, are partitioned into training, test, and validation sample data groups based on the proportion of 4:1:1. The total numbers of paired data in the training, test, and validation sample data groups are 1,391, 463, and 462, respectively. With regard to the forecasting of the closing price of the next third trading day, the sample data are formed by the paired normalized technical indicators from March 1, 2001, to June 25, 2010, along with the normalized closing price of the third trading day, i.e., from March 6, 2001, to June 30, 2010. The sample data are also partitioned into training, test, and validation sample data groups, with sample sizes of 1,391, 463, and 461, respectively, based on the proportion of 4:1:1.

Next, a three-layered BP neural network using NeuralWorks Professional II/Plus software is applied to the above training and test data regarding the closing price forecast of the next second and third trading days, where the BP parameters are set to those used previously. Table 4 summarizes the partial execution results. According to this table, the neural models with the structures 15-20-1 and 15-25-1 are selected as the preliminary optimal BP forecasting models, based on the training and test RMSEs, for forecasting the normalized closing prices of the next second and third trading days; they are identified, respectively, as BP_Modelpre_2_day and BP_Modelpre_3_day. Finally, the GP algorithm using Discipulus 4.0 software is reapplied to the above training and test data concerning the closing price forecast of the next second and third trading days, where the fitness function, termination criterion, and the other remaining parameters are set to those used previously. Table 5 summarizes the results obtained by executing the Discipulus 4.0 software 5 times. Based on the training and test RMSEs, the 1st and 4th models are selected as the preliminary optimal GP forecasting models for forecasting the normalized closing prices of the next second and third trading days; they are identified, respectively, as GP_Modelpre_2_day and GP_Modelpre_3_day.

Table 4 Partial execution results of BP neural networks (forecast of 2 and 3 days ahead)
Table 5 Execution results of GP algorithms (forecast of 2 and 3 days ahead)

4.2 Selecting features

4.2.1 Evaluating the impact of technical indicators

Consider the training and test periods. Starting from a certain trading date, the mean of each normalized technical indicator in successive trading days is first calculated by (3). Notably, parameters \( n_{\text{tr}} \) and \( n_{\text{te}} \) are set to 1,391 and 463, respectively, which are the total numbers of training and test paired data, as shown in Table 1. In addition, parameter m is set to 15, the total number of normalized technical indicators, and parameter t is set to 10, which stands for the 10 successive trading days considered in this study. Next, starting from a certain trading date, the standard deviation of each normalized technical indicator in the 10 successive trading days is calculated using (4). Thus, the mean of the standard deviation of each normalized technical indicator in the 10 successive trading days can be obtained by (5). Table 6 lists these values.

Table 6 Summary of \( \bar{s}_{j}^{(10)} \)s

Next, we intend to assess the impact of each technical indicator on forecasting the closing price of the next trading day. Consider the first trading day in our study, March 1, 2001. The first normalized technical indicator, i.e., the 10-day moving average, is 0.381902. Hence, 200 sample points are initially drawn using the formulas in (7) and (8) where parameters α and s are set to 0.01 and 200, respectively. For example, the first and fifth sample points are, respectively, obtained by (15) and (16), as shown:

$$ 0.381902 + z_{0.005} \bar{s}_{1}^{(10)} = 0.381902 + 2.576 \times 0.014178 = 0.418425 $$
(15)
$$ 0.381902 + z_{0.0249} \bar{s}_{1}^{(10)} = 0.381902 + 1.962 \times 0.014178 = 0.409719. $$
(16)
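Both worked values can be reproduced with the sampling-grid sketch of Sect. 3.2; scipy's norm.ppf supplies the critical values, and the tiny discrepancies in the last digits arise because the text rounds the critical values to 2.576 and 1.962.

```python
# Reproducing the sample points in Eqs. (15)-(16).
from scipy.stats import norm

x11, s_bar = 0.381902, 0.014178          # values used in Eqs. (15)-(16)
print(x11 + norm.ppf(1 - 0.005) * s_bar)     # ~0.418425, Eq. (15)
print(x11 + norm.ppf(1 - 0.0249) * s_bar)    # ~0.409719, Eq. (16)
```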

Then, the normalized technical indicators for the first trading day, with the first technical indicator replaced by the first sample point, i.e., 0.418425, are fed into the BP_Modelpre_1_day neural model. Thus, a forecasted value \( \hat{y}_{1(1)} \) of the normalized closing price of the next trading day, March 2, 2001, is obtained. The absolute percentage error (APE) of the forecast can be calculated based on (9) as follows:

$$ {\text{APE}}_{1(1)} = \left| {\frac{{\hat{y}_{1(1)} - y_{1} }}{{y_{1} }}} \right| \times 100\% $$
(17)

where \( y_{1} \) and \( \hat{y}_{1(1)} \) are the normalized actual and forecasted closing prices of futures on March 2, 2001, respectively. Similarly, the values of \( {\text{APE}}_{1(2)} \), \( {\text{APE}}_{1(3)} \), …, and \( {\text{APE}}_{1(200)} \) can be obtained by feeding the normalized technical indicators for the first trading day, with the first technical indicator replaced, in turn, by the remaining 199 sample points, into the BP_Modelpre_1_day neural model and calculating the forecast errors based on (9). Thus, the impact of the first normalized technical indicator on forecasting the normalized closing price of the next trading day can be evaluated through (10). By following the above procedure, the impact of the remaining normalized technical indicators on forecasting the normalized closing price of the next trading day can be obtained; Table 7 summarizes these values. This table also shows the ranks of the \( {\text{IMP}}_{j} \) values in ascending order.

Table 7 Summary of \( {\text{IMP}}_{j} \)s

4.2.2 Determining the best subset of technical indicators

Next, we intend to determine the best subset of technical indicators by evaluating the effect of weakening normalized technical indicators on forecasting the closing price. For example, the normalized technical indicators for the first trading day are fed into the BP_Modelpre_1_day neural model to acquire the forecasted normalized closing price of the next trading day \( \hat{\hat{y}}_{1(3)} \), while the technical indicators whose ranks are not greater than 3, i.e., the 10th, 11th, and 14th, are replaced by 0.5. Similarly, the values \( \hat{\hat{y}}_{2(3)} \), \( \hat{\hat{y}}_{3(3)} \), …, and \( \hat{\hat{y}}_{1,854(3)} \) are acquired by feeding the normalized technical indicators for each trading day during the training and test periods into the BP_Modelpre_1_day neural model, while replacing the technical indicators with ranks of 1, 2, or 3 by 0.5 (the subscript 1,854 in \( \hat{\hat{y}}_{1,854(3)} \) is the total number of sample data in the training and test periods, as shown in Table 1). Then, while weakening the effect of the normalized technical indicators whose ranks are not greater than 3, the mean absolute percentage error \( \left( {{\text{MAPE}}_{(3)} } \right) \) of the forecast can be obtained by (11), where parameters \( n_{\text{tr}} \), \( n_{\text{te}} \), and m are 1,391, 463, and 15, as described previously. Similarly, the values \( {\text{MAPE}}_{(0)} \), \( {\text{MAPE}}_{(1)} \), …, and \( {\text{MAPE}}_{(14)} \) can be acquired by applying the above procedure, and Table 8 summarizes these values. Hence, the direction of change (CD) regarding the MAPE of the forecast, while weakening the effect of one more normalized technical indicator, can be obtained through (12), as shown in Table 8.

Table 8 Summary of \( {\text{MAPE}}_{(r)} \)s and \( {\text{CD}}_{(r)} \)s

In Table 8, we can observe that \( {\text{CD}}_{(10)} \) is −1 and all of \( {\text{CD}}_{(11)} \), \( {\text{CD}}_{(12)} \), …, and \( {\text{CD}}_{(14)} \) are +1. Thus, the normalized technical indicators with ranks 11–15 in Table 7, i.e., the 7th, 2nd, 4th, 1st, and 5th normalized technical indicators, denoted by \( \left( {f_{i1}^{(1)} ,f_{i2}^{(1)} , \ldots ,f_{i5}^{(1)} } \right) \), are retained as input variables for the subsequent forecasting of the futures closing price of the next trading day. The above procedure is then applied to the training and test sample data groups for forecasting the closing prices of the next second and third trading days, using the previously constructed neural network models BP_Modelpre_2_day and BP_Modelpre_3_day. Thus, the impact of each normalized technical indicator on forecasting the normalized closing prices of the next second and third trading days can be obtained, as summarized in Tables 9 and 10. Furthermore, the MAPE of the forecast while weakening the effect of a certain normalized technical indicator, and the direction of change regarding the MAPE while weakening the effect of one more normalized technical indicator, can be acquired, as shown in Tables 11 and 12. According to Table 11, \( {\text{CD}}_{(10)} \) is −1 and \( {\text{CD}}_{(11)} \), \( {\text{CD}}_{(12)} \), …, and \( {\text{CD}}_{(14)} \) are all +1. Therefore, the normalized technical indicators with ranks 11–15 in Table 9, i.e., the 7th, 2nd, 4th, 1st, and 5th normalized technical indicators, denoted by \( \left( {f_{i1}^{(2)} ,f_{i2}^{(2)} , \ldots ,f_{i5}^{(2)} } \right) \), are retained as input variables for the subsequent forecasting of the futures closing price of the next second trading day. Similarly, according to Table 12, \( {\text{CD}}_{(9)} \) is −1 and \( {\text{CD}}_{(10)} \), \( {\text{CD}}_{(11)} \), …, and \( {\text{CD}}_{(14)} \) are all +1. Therefore, the normalized technical indicators with ranks 10–15 in Table 10, i.e., the 7th, 9th, 2nd, 1st, 4th, and 5th normalized technical indicators, denoted by \( \left( {f_{i1}^{(3)} ,f_{i2}^{(3)} , \ldots ,f_{i6}^{(3)} } \right) \), are retained as input variables for the subsequent forecasting of the futures closing price of the next third trading day.

Table 9 Summary of \( {\text{IMP}}_{j} \)s (forecast of 2 days ahead)
Table 10 Summary of \( {\text{IMP}}_{j} \)s (forecast of 3 days ahead)
Table 11 Summary of \( {\text{MAPE}}_{(r)} \)s and \( {\text{CD}}_{(r)} \)s (forecast of 2 days ahead)
Table 12 Summary of \( {\text{MAPE}}_{(r)} \)s and \( {\text{CD}}_{(r)} \)s (forecast of 3 days ahead)

4.3 Constructing the final forecasting models

To construct the final BP neural model for forecasting the futures closing price of the next trading day, a three-layered BP neural network is applied again to the training and test data in Table 1, where the input variables are the retained normalized technical indicators \( \left( {f_{i1}^{(1)} ,f_{i2}^{(1)} , \ldots ,f_{i5}^{(1)} } \right) \) and the output variable is the normalized closing price of the next trading day. Hence, the neural model consists of 5 and 1 neurons in the input and output layers, respectively, and the remaining parameters are the same as those in Sect. 4.1.2. Table 13 summarizes the partial execution results. The 5-24-1 neural model, named BP_BP_Modelfinal_1_day, is selected as the final optimal forecasting model. Similarly, BP neural networks can be applied to the training and test data to construct the final forecasting models for the forecast of the closing prices in the next second and third trading days, as described in Sect. 4.1.3. When forecasting the closing price of the next second trading day, the normalized technical indicators \( \left( {f_{i1}^{(2)} ,f_{i2}^{(2)} , \ldots ,f_{i5}^{(2)} } \right) \) retained in the previous section serve as the input variables and the normalized closing price of the second trading day serves as the output variable. On the other hand, when creating the BP model for the forecast of the closing price of the next third trading day, the input variables are the retained normalized technical indicators \( \left( {f_{i1}^{(3)} ,f_{i2}^{(3)} , \ldots ,f_{i6}^{(3)} } \right) \) and the output variable is the normalized closing price of the third trading day. Table 13 summarizes the partial execution results. The 5-26-1 and 6-35-1 neural models are selected as the final optimal model for forecasting the closing prices of the next second and third trading days and are, respectively, named BP_BP_Modelfinal_2_day and BP_BP_Modelfinal_3_day.

Table 13 Partial execution results of BP neural networks (with feature selection by BP)

In addition, the preliminary optimal GP forecasting model GP_Modelpre_1_day contains the 1st, 2nd, 4th, 5th, and 13th normalized technical indicators, denoted by \( \left( {f_{i1}^{\prime (1)} ,f_{i2}^{\prime (1)} , \ldots ,f_{i5}^{\prime (1)} } \right) \). Hence, to construct the final optimal forecasting model, a BP neural network is used again based on the training and test data in Table 1, where the input variables are \( \left( {f_{i1}^{\prime (1)} ,f_{i2}^{\prime (1)} , \ldots ,f_{i5}^{\prime (1)} } \right) \) and the output is the normalized closing price of the next trading day. Similarly, a BP neural network can be applied to the training and test data, as described in Sect. 4.1.2, to construct the final forecasting model for the closing price of the next second trading day, where the input variables are the 1st, 2nd, 5th, and 6th normalized technical indicators, denoted by \( \left( {f_{i1}^{\prime (2)} ,f_{i2}^{\prime (2)} ,f_{i3}^{\prime (2)} ,f_{i4}^{\prime (2)} } \right) \), that appeared in the preliminary optimal GP forecasting model (i.e., GP_Modelpre_2_day), and the output is the normalized closing price of the second trading day. When forecasting the closing price of the next third trading day, the 1st, 2nd, 4th, 5th, 9th, and 15th normalized technical indicators, denoted by \( \left( {f_{i1}^{\prime (3)} ,f_{i2}^{\prime (3)} , \ldots ,f_{i6}^{\prime (3)} } \right) \), that are contained in the preliminary optimal GP forecasting model (i.e., GP_Modelpre_3_day) serve as the input variables, and the normalized closing price of the third trading day serves as the output variable. Table 14 summarizes the partial execution results. The 5-23-1, 4-28-1, and 6-28-1 neural models are selected as the final optimal models for forecasting the closing prices of the next, next second, and next third trading days; they are, respectively, named GP_BP_Modelfinal_1_day, GP_BP_Modelfinal_2_day, and GP_BP_Modelfinal_3_day.

Table 14 Partial execution results of BP neural networks (with feature selection by GP)

Next, the retained normalized technical indicators \( \left( {f_{i1}^{(1)} ,f_{i2}^{(1)} , \ldots ,f_{i5}^{(1)} } \right) \), \( \left( {f_{i1}^{(2)} ,f_{i2}^{(2)} , \ldots ,f_{i5}^{(2)} } \right) \), and \( \left( {f_{i1}^{(3)} ,f_{i2}^{(3)} , \ldots ,f_{i6}^{(3)} } \right) \) for each trading day are fed into the corresponding forecast models, namely, BP_BP_Modelfinal_1_day, BP_BP_Modelfinal_2_day, and BP_BP_Modelfinal_3_day, to obtain the normalized forecasted futures prices for the next, next second, and next third trading days. The forecasted closing prices of futures can then be acquired by denormalizing these normalized forecasted futures prices. Similarly, the forecasted closing prices of the next, next second, and next third trading days can be acquired by inputting the normalized technical indicators \( \left( {f_{i1}^{\prime (1)} ,f_{i2}^{\prime (1)} , \ldots ,f_{i5}^{\prime (1)} } \right) \), \( \left( {f_{i1}^{\prime (2)} ,f_{i2}^{\prime (2)} ,f_{i3}^{\prime (2)} ,f_{i4}^{\prime (2)} } \right) \), and \( \left( {f_{i1}^{\prime (3)} ,f_{i2}^{\prime (3)} , \ldots ,f_{i6}^{\prime (3)} } \right) \) for each trading day into the corresponding final optimal BP models, namely, GP_BP_Modelfinal_1_day, GP_BP_Modelfinal_2_day, and GP_BP_Modelfinal_3_day, and denormalizing the output values.

4.4 Measuring the forecast performance

To evaluate the overall performance of the proposed forecasting procedure, the root mean squared error (RMSE) and mean absolute percentage error (MAPE), defined in (13) and (14), are used to assess the forecast errors; the results are summarized in Table 15. For comparison, this table also lists the performance of the preliminary optimal BP and GP forecasting models. In addition, the maximum and minimum absolute percentage errors (APEs), denoted by \( {\text{APE}}_{ \max } \) and \( {\text{APE}}_{ \min } \), respectively, are also reported. As shown in Table 15, several findings can be drawn from the forecasting results of the BP_BP_Modelfinal_1_day, BP_BP_Modelfinal_2_day, and BP_BP_Modelfinal_3_day models. The overall MAPEs of forecasting the closing futures prices of the next, next second, and next third trading days are a mere 1.36, 1.86, and 2.26%, respectively. Even in the most difficult case, the forecast of the closing futures price of the next third trading day, the absolute percentage difference between the actual and forecasted closing prices is, on average, only 2.26%. This indicates that the proposed procedure can yield highly precise forecasts of the closing futures prices over a horizon of three trading days. In forecasting the closing futures price of the next trading day, the minimum APE is as low as 0.0001%, and the maximum APE is around 10.5%. In forecasting the closing futures price of the next third trading day, the maximum APE grows to 15.2%, while the minimum APE remains as low as 0.0025%. The performance of forecasting the closing futures prices of the next three trading days via the BP_BP_Modelfinal_1_day, BP_BP_Modelfinal_2_day, and BP_BP_Modelfinal_3_day models is better in nearly every respect than that obtained through the GP_BP_Modelfinal_1_day, GP_BP_Modelfinal_2_day, and GP_BP_Modelfinal_3_day models, with respect to RMSE, MAPE, the minimum APE, and the maximum APE. The only exception is that the maximum APE of BP_BP_Modelfinal_1_day is slightly larger than that of GP_BP_Modelfinal_1_day. Hence, we can generally conclude that the feature selection method that uses the simulation technique based on the preliminary BP neural model is more effective in selecting important technical indicators for forecasting closing futures prices than the method that determines these indicators solely on the basis of the preliminary GP forecast models.

Table 15 Performance of the proposed forecasting procedure
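
For reference, the error measures reported in Table 15 can be computed as in the following sketch, which assumes the standard definitions of RMSE and MAPE corresponding to (13) and (14): RMSE as the square root of the mean squared forecast error, and MAPE as the mean of the per-day APEs.

```python
# Compute RMSE, MAPE, and the maximum/minimum APE over a forecast horizon,
# assuming the standard definitions for Eqs. (13) and (14).
import numpy as np

def forecast_errors(actual, forecast):
    """Return RMSE, MAPE (%), and the max/min absolute percentage errors."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ape = np.abs(actual - forecast) / actual * 100.0   # per-day APE (%)
    return {
        "RMSE":    float(np.sqrt(np.mean((actual - forecast) ** 2))),
        "MAPE":    float(np.mean(ape)),
        "APE_max": float(ape.max()),
        "APE_min": float(ape.min()),
    }
```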

In addition, the final BP forecast models whose input variables are the normalized technical indicators retained via the preliminary BP neural model and the simulation technique perform better than their corresponding preliminary BP forecast models. Similarly, the final BP forecast models whose input variables appear in the preliminary GP models perform better than their corresponding preliminary GP forecast models, with one exception: the minimum APE of GP_BP_Modelfinal_2_day is somewhat larger than that of GP_Modelpre_2_day. Therefore, we can generally conclude that the technical indicators critical to the forecast of futures prices can be determined by applying a feature selection method that uses either the simulation technique based on the preliminary BP neural model or the screening technique of the preliminary GP forecast model. Based on the above analysis, the proposed forecasting procedure can be considered an effective tool for forecasting futures prices over the next three trading days with the use of technical indicators.

5 Conclusions

With the inherent high volatility, complexity, and turbulence of stock/futures prices, financial forecasting remains a challenging task that continues to be investigated by researchers and practitioners. Past approaches to the stock/futures price forecasting problem can be broadly classified into three main categories: fundamental analysis, technical analysis, and traditional time series forecasting. Each of these approaches has its merits and limitations. In this study, a backpropagation (BP) neural network, a feature selection technique, and genetic programming (GP) were utilized to develop an integrated approach to the stock/futures price forecasting problem. First, the BP neural network was used to construct a preliminary forecasting model that describes the complex nonlinear relationship between technical indicators and future stock/futures prices. Next, the simulation-based feature selection technique was utilized to explore the preliminary forecasting model and thereby determine the technical indicators most important for forecasting stock/futures prices. For comparison, the GP algorithm was also employed to build another forecasting model that automatically screens the vital technical indicators closely correlated with future stock/futures prices. The BP neural network was then applied again to construct the final stock/futures price forecasting model, with the technical indicators retained by the feature selection technique or the GP algorithm used as the input variables. Finally, the performance of the final forecasting model was evaluated and compared with that of the preliminary BP and GP forecasting models.

The feasibility and effectiveness of the proposed forecasting procedure were verified through a case study on forecasting the closing prices of Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) futures of the spot month over the period from January 2, 2001, to June 30, 2010. The results showed that the overall mean absolute percentage errors (MAPEs) of forecasting the closing futures prices of the next and next second trading days were only 1.36 and 1.86%, respectively. Even in the most difficult case, the forecast of the closing futures price of the next third trading day, the absolute percentage difference between the actual and forecasted closing prices was only 2.26% on average. The analysis also indicated that the technical indicators critical to the forecast of futures prices can be determined by applying either a feature selection method that uses a simulation technique based on the preliminary BP neural model or the screening technique of the preliminary GP forecast model. Based on these results, the proposed forecasting procedure can be considered a feasible and effective tool for stock/futures price forecasting.

However, the proposed forecasting procedure still has some limitations. First, during the construction of both the preliminary and final forecasting models, there are no absolute criteria based on the RMSE of the training and test data for selecting the optimal BP neural network or GP models. Second, the key parameter settings of the BP neural networks and GP algorithms may influence the final training results, yet there are no exact rules for setting these parameters. Finally, to select important normalized technical indicators as input variables for the final BP forecasting models, this study developed a rule based on a simulation technique and the direction of change (CD) in the MAPE of the forecast. This is, however, only a heuristic method, and its feasibility and effectiveness can only be verified through practical implementation. Further research might use clustering techniques to divide the sample data into clusters so that the approximate functional model describing the implicit mathematical relationship between the technical indicators and the future closing stock/futures price can be constructed more easily and precisely. In addition, optimizing the parameters of the BP neural networks or GP algorithms through other soft computing methods (e.g., particle swarm optimization or ant colony optimization) might be another future research direction.