1 Introduction

An intelligent stock trading recommender system capable of recommending buy/sell decisions to the user can be a great asset to an individual attempting to profit by trading in stocks. However, due to the nonlinear nature of stock price movements, this is considered an extremely challenging task. Work over the past century, e.g. Cowles (1933) and the efficient market hypothesis proposed by Fama (1969, 1995), even suggests that it is not possible to obtain excess returns from the markets. Though a number of studies tend to support this hypothesis, e.g. Caporale et al. (2016), there is a large body of empirical evidence, e.g. Atsalakis and Valavanis (2009b) and Nair and Mohandas (2015a), to suggest that soft computing based techniques can be successfully employed to forecast stock price movements and hence can be utilized for developing a successful trading strategy. However, two major aspects of such soft computing based recommender systems play a significant role in their ability to make successful trading recommendations: (a) selection of the appropriate soft computing technique and (b) the choice of the optimal input feature set, to ensure that the model trained using the selected soft computing technique learns the patterns in stock price movements and is capable of making profitable trading recommendations. This study presents an innovative technique to address both these aspects through the design and empirical validation of classifier based stock trading recommender systems.

Soft computing/data mining based techniques have been widely used in financial applications, e.g. in stock signal prediction (Luo and Chen 2013), stock index and stock price prediction (Zarandi et al. 2013; Banik et al. 2007; Dai et al. 2012) and exchange rate forecasting (Khashei and Bijari 2011). Atsalakis and Valavanis (2009b) present a survey of applications of soft computing techniques in financial forecasting. Soft computing/data mining based approaches tend to outperform traditional techniques such as regression; however, very few studies are available on the effectiveness of classifier based recommender systems. This is especially true in the case of India, which also happens to be one of the largest emerging economies.

Selection of the optimal input feature set can significantly affect the performance of a forecasting/recommender system. As observed in Nair and Mohandas (2015a), there appears to be no consensus on the input features that should be used to ensure high prediction accuracy. Historical stock time series data itself has been used as input, e.g. in Atsalakis and Valavanis (2009a) and Saad et al. (1998). Technical indicators have been used, with varying degrees of success, e.g. in Chan and Teong (1995), Dempster et al. (2001), Kuo and Chen (2006), Lee and Chen (2007) and Nair and Mohandas (2015b). Fundamental and macroeconomic indicators have also been used, e.g. in Abbasi and Abouec (2008) and Ballings et al. (2015). However, since the recommender systems evaluated in the present study are designed to recommend trading decisions over a short time frame, fundamental and macroeconomic indicators might not be suitable as input features. In the present study, the impact of three different input feature sets on the performance of each recommender is empirically evaluated on 293 stocks drawn from the Group ‘A’ stocks in the Bombay Stock Exchange (BSE): (a) the daily stock data, (b) daily stock data along with technical indicators with all technical indicator window lengths set to the naïve assumption of the minimum, i.e. two days (referred to later in this study as daily data-naïve TI) and (c) daily stock data along with technical indicators with the optimal window length identified using Genetic Algorithms (GA) (referred to later in this study as daily data-GA-TI). An attempt is also made to: (1) empirically determine whether ensemble classifier based recommender systems are capable of outperforming single classifiers, as in Ballings et al. (2015) and (2) identify the set of input features that results in optimal performance.
In addition to accuracy, the financial implications of following the recommendations made by each classifier are also evaluated for all 293 stocks using eight financial performance measures, as employed in Nair and Mohandas (2015b).

The remainder of the paper is organized as follows: Sect. 2 presents the methodology followed in the design of the recommender systems, Sect. 3 presents the results and the conclusions are presented in Sect. 4.

2 Methodology

A block diagram of the recommender system evaluation process is presented in Fig. 1.

Fig. 1

Recommender system evaluation process block diagram

2.1 Data Extraction

The first step in the process is the extraction of historical stock price data. Daily data (daily open, high, low, close, adjusted close and traded volume) for the 293 firms that constitute the Group ‘A’ firms in the Bombay Stock Exchange (BSE) is extracted from Yahoo! Finance India (2015).

2.2 Pre-processing

The second step, preprocessing, involves extraction of features from the daily stock price data. As discussed in the previous section, three different input feature sets are evaluated to identify the one that generates the optimal recommender performance. The first feature set consists of six features (the daily stock price data alone), i.e. each input sample to the recommender at time t can be represented as:

$$ \varvec{X}_{\varvec{t}} = \left\{ {\begin{array}{*{20}c} {Open_{t} } & {High_{t} } & {Low_{t} } & {Volume_{t} } & {Close_{t} } & {Adj\_Close_{t} } \\ \end{array} } \right\}. $$

The second and third feature sets consist of the daily stock price data along with the technical indicator values at that instant. A total of nineteen technical indicators are considered in the present study based on the results reported in Nair et al. (2010) and Nair and Mohandas (2015b). The technical indicators can be classified into three broad categories:

(a) Price based indicators: (1) Stochastics, consisting of two indices, namely stochastic %K and %D, (2) Relative Strength Index (RSI), (3) Price Rate of Change (ProC), (4) Ulcer Index (UI), (5) Moving Average Convergence Divergence (MACD), consisting of two indices: Nine Period Moving Average (NPMA) and MACD Line, (6) Ease of Movement (EM), (7) Ultimate Oscillator (UO), (8) Acceleration (Acc), (9) Momentum (Mom), (10) William’s %R, (11) Highest High (HH), (12) Lowest Low (LL).

(b) Volume based indicators: (1) Percentage Volume Oscillator (PVO), (2) Positive Volume Index (PVI), (3) Negative Volume Index (NVI), (4) On-balance Volume (OBV).

(c) Overlays: (1) Twenty-five day Moving Average (MA (25)), (2) Sixty-five day Moving Average (MA (65)), (3) Bollinger Bands, composed of three indices: Bollinger Upper (BU), Bollinger Mid (BM) and Bollinger Lower (BL).
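As an illustration, two of the price-based indicators above can be sketched as follows. The window length n is the parameter that the third feature set tunes; the function names and the default n = 14 are illustrative, not the exact formulation used in the study.

```python
def rsi(closes, n=14):
    """Relative Strength Index over the last n closing prices."""
    gains, losses = [], []
    for prev, curr in zip(closes[-n - 1:-1], closes[-n:]):
        change = curr - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain, avg_loss = sum(gains) / n, sum(losses) / n
    if avg_loss == 0:
        return 100.0  # all-gain window: maximum RSI
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def stochastic_k(closes, highs, lows, n=14):
    """Stochastic %K: position of the latest close within the n-day range."""
    hh, ll = max(highs[-n:]), min(lows[-n:])  # Highest High, Lowest Low
    return 100.0 * (closes[-1] - ll) / (hh - ll)
```

Setting n = 2 everywhere yields the naïve feature set described below, while the GA-tuned feature set searches over n per indicator.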

Detailed discussions on the indicators listed above and their calculation can be found in Eng (1998) and Jobman (1998). The indicators listed above are widely used by the trading community; however, not all indicators are useful under all market conditions. Stock traders tend to select the relevant technical indicators, and the parameters used for calculating them, largely based on experience. Hence, the second feature set used in this study takes the naïve assumption and fixes the calculation window size for all the technical indicators to the lowest possible value (i.e. 2 days). The third feature set employs Genetic Algorithms (GAs) to find the optimal technical indicator window sizes. The objective function to be maximized was the profit factor (PF). A population size of 40 was considered, with elitist selection, uniform crossover and Gaussian mutation.
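The GA loop described above (population of 40, elitist selection, uniform crossover, Gaussian mutation) can be sketched as follows, a minimal illustration assuming integer window lengths between 2 and 65 days. The `fitness` callable, which in the study would be the profit factor of a backtest using the candidate windows, is a stand-in here.

```python
import random

def optimize_windows(fitness, n_windows, lo=2, hi=65,
                     pop_size=40, generations=50, elite=2, seed=0):
    """Evolve integer window lengths that maximize `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for _ in range(n_windows)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # best genomes first
        next_pop = [list(g) for g in pop[:elite]]    # elitist selection
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # parents: top half
            child = [a if rng.random() < 0.5 else b      # uniform crossover
                     for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, round(g + rng.gauss(0.0, 2.0))))
                     for g in child]                     # Gaussian mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

The generation count, mutation scale and parent-selection scheme are illustrative choices; the study specifies only the population size and the operator types.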

2.3 Recommender System Design

Once the features are extracted, the next step is to train the recommender system using the feature set. All the recommender systems considered in the present study use classification as the primary technique for recommending stock trading decisions. The recommenders generate ‘Buy’ or ‘Hold’ recommendations on a daily basis, based on the 3-month-ahead stock direction forecast: if the stock price is likely to go up by more than 5% from the current price 3 months from the current date (taken as 65 days ahead, considering a 5-day week for the BSE and holidays), a ‘Buy’ is recommended; otherwise, a ‘Hold’ is recommended. The value of 5% was chosen because some of the largest stock broking firms in India impose a maximum brokerage charge for small value trades that is typically quoted as a fixed amount or 2.5% of the traded value per trade, whichever is lower. For larger value trades, a brokerage of around 0.5% of the traded value is charged (this varies slightly depending on the broker and the type of trading plan a customer opts for). Detailed descriptions can be found in HDFC Securities Ltd. (2018) and ICICI Securities Ltd. (2018). Two trades (one buy and one sell) need to be completed for generating a profit (or loss). Hence, for the trader to generate a profit even for small value transactions after paying the brokerage charges, the price rise should exceed 5%.

The decision table used for training the recommender systems, consisting of N samples and M features, takes the form

$$\varvec{D} = \left( {\left. {\begin{array}{l} {\varvec{X}_{\varvec{t}} } \\ {\varvec{X}_{{\varvec{t} - 1}} } \\ \vdots \\ {\varvec{X}_{{\varvec{t} - \left( {\varvec{N} - 1} \right)}} } \\ \end{array} } \right|\begin{array}{*{20}c} {\varvec{C}_{\varvec{t}} } \\ {\varvec{C}_{{\varvec{t} - 1}} } \\ \vdots \\ {\varvec{C}_{{\varvec{t} - \left( {\varvec{N} - 1} \right)}} } \\ \end{array} } \right) $$

where \( \varvec{X}_{\varvec{t}} = \left\{ {\begin{array}{*{20}c} {x_{1t} } & {x_{2t} } & \ldots & {x_{Mt} } \\ \end{array} } \right\} \) are the M feature values at day t, and \( \varvec{C}_{\varvec{t}} \) is the corresponding output class at time t, given by

$$ \varvec{C}_{\varvec{t}} = \left\{ {\begin{array}{l} {Hold, \frac{{x_{t + 65} - x_{t} }}{{x_{t} }} < 0.05} \\ {Buy, \frac{{x_{t + 65} - x_{t} }}{{x_{t} }} \ge 0.05} \\ \end{array} } \right. $$

Hence, it becomes a binary classification problem with two classes, C1t = ‘Hold’ and C2t = ‘Buy’.
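The labeling rule above can be sketched as follows; `closes` is assumed to be the series of daily closing prices.

```python
HORIZON = 65      # ~3 months of 5-day trading weeks on the BSE
THRESHOLD = 0.05  # minimum rise that clears round-trip brokerage

def label_days(closes):
    """Return one 'Buy'/'Hold' label per day that has a 65-day-ahead close."""
    labels = []
    for t in range(len(closes) - HORIZON):
        ret = (closes[t + HORIZON] - closes[t]) / closes[t]
        labels.append("Buy" if ret >= THRESHOLD else "Hold")
    return labels
```

Note that the last 65 days of a series cannot be labeled, since their forward return is not yet observable.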

Two categories of classifiers were evaluated in this study: Single classifiers and Ensemble classifiers.

Single classifiers attempt to classify the data points using a single learner, while ensemble classifiers combine an ensemble of weak learners to generate a classifier. Single classifiers have been used, e.g. in Nair et al. (2011). Apart from the present paper, the only study on the evaluation of ensemble classifiers for stock direction prediction appears to be Ballings et al. (2015), which considered European stocks; no such study on Indian stocks appears to have been carried out so far. The following classifiers were considered for generating the recommenders:

2.3.1 K-Nearest Neighbor (kNN) Classifier

The kNN classifier (Han et al. 2006) can be considered a ‘lazy’ classifier since it does not require any training. All the training samples need to be stored in memory and, once a test sample Xt is available, the closest k samples to Xt are identified from the training dataset based on some distance measure such as the Euclidean distance. Xt is then simply assigned the class to which the majority of its k nearest neighbors in the training set belong.
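A minimal sketch of this rule, using Euclidean distance and a majority vote (the function name is illustrative):

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Assign x the majority class among its k nearest training samples."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```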

2.3.2 Naïve Bayes Classifier (NB)

The Naïve Bayes classifier (Han et al. 2006), based on Bayes theorem, attempts to find the posterior probabilities of both classes C1t and C2t, given by \( P\left( {C_{1t} |\varvec{X}_{\varvec{t}} } \right) = \frac{{P\left( {C_{1t} } \right)p(\varvec{X}_{\varvec{t}} |C_{1t} )}}{{p\left( {\varvec{X}_{\varvec{t}} } \right)}} \) and \( P\left( {C_{2t} |\varvec{X}_{\varvec{t}} } \right) = \frac{{P\left( {C_{2t} } \right)p(\varvec{X}_{\varvec{t}} |C_{2t} )}}{{p\left( {\varvec{X}_{\varvec{t}} } \right)}} \). The sample is assigned to the class for which the posterior probability is higher, with the naïve assumption being: \( p\left( {\varvec{X}_{\varvec{t}} |C_{it} } \right) = p\left( {x_{1t} |C_{it} } \right) \cdot p\left( {x_{2t} |C_{it} } \right) \cdots p\left( {x_{Mt} |C_{it} } \right) \), i = 1, 2.
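Assuming a Gaussian model for each per-feature likelihood p(xjt|Cit) (one common choice; the study does not specify the likelihood model), the classifier can be sketched as:

```python
import math

def gaussian_nb_predict(train_X, train_y, x):
    """Pick the class maximizing log P(C) + sum_j log p(x_j|C), with each
    per-feature likelihood modeled as a Gaussian (naive independence)."""
    best, best_score = None, -math.inf
    for c in set(train_y):
        rows = [xi for xi, yi in zip(train_X, train_y) if yi == c]
        score = math.log(len(rows) / len(train_X))   # log prior P(C)
        for j in range(len(x)):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            # 1e-9 is an illustrative guard against zero variance
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
            score += -0.5 * math.log(2 * math.pi * var) \
                     - (x[j] - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when the M per-feature likelihoods are multiplied together.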

2.3.3 Linear Discriminant Analysis Classifier (DISC)

Discriminant analysis (Teknomo 2015; Tufféry 2011) is also based on Bayes theorem; however, the probabilities \( p(\varvec{X}_{\varvec{t}} |C_{1t} ) \) and \( p(\varvec{X}_{\varvec{t}} |C_{2t} ) \) are modeled as multivariate normal distributions with the density function represented by:

$$ p\left( {\varvec{X}_{\varvec{t}} |C_{it} } \right) = \frac{1}{{\sqrt {\left( {2\pi } \right)^{M} \left| {{\varvec{\Sigma}}_{\varvec{i}} } \right|} }}e^{{ - \frac{1}{2}\left( {\varvec{X}_{\varvec{t}} -\varvec{\mu}_{\varvec{i}} } \right)^{T} {\varvec{\Sigma}}_{\varvec{i}}^{ - 1} \left( {\varvec{X}_{\varvec{t}} -\varvec{\mu}_{\varvec{i}} } \right)}} ,\;{\text{where}}\;i = 1, \, 2. $$
(1)

where \( {\varvec{\Sigma}}_{\varvec{i}} \) is the \( M \times M \) dimensional covariance matrix for the i-th class; \( \varvec{\mu}_{\varvec{i}} \) is the \( M \times 1 \) dimensional vector of means of the M features in the i-th class.

Since a linear discriminant is employed in the present study, \( {\varvec{\Sigma}}_{\varvec{i}} = {\varvec{\Sigma}}_{\varvec{j}} = {\varvec{\Sigma}} \).

As in the Bayes classifier, the sample Xt is assigned to the class for which \( P\left( {C_{it} } \right)p\left( {\varvec{X}_{\varvec{t}} |C_{it} } \right) \), i = 1, 2, is the highest.
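A simplified sketch of this decision rule, assuming (for brevity) a pooled diagonal covariance in place of the full shared Σ:

```python
import math

def lda_predict(train_X, train_y, x):
    """Score each class by log P(C) minus half the squared distance to the
    class mean, scaled by a pooled per-feature variance (diagonal Sigma)."""
    classes = sorted(set(train_y))
    n, m = len(train_X), len(train_X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [xi for xi, yi in zip(train_X, train_y) if yi == c]
        priors[c] = len(rows) / n
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    # pooled per-feature variance: diagonal stand-in for the shared Sigma
    var = [sum((xi[j] - means[yi][j]) ** 2
               for xi, yi in zip(train_X, train_y)) / n + 1e-9
           for j in range(m)]
    def score(c):
        return math.log(priors[c]) - 0.5 * sum(
            (x[j] - means[c][j]) ** 2 / var[j] for j in range(m))
    return max(classes, key=score)
```

The full-covariance version replaces the per-feature division with the quadratic form of Eq. (1); with a shared Σ the decision boundary is linear in Xt.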

2.3.4 Decision Tree Based Classifier (TREE)

The classification tree employed in the present study attempts to successively partition D into smaller subsets (nodes), splitting at each node on one of the M features, such that the Gini impurity at each node of the tree is minimized. The Gini impurity at node n is given by

$$ Gini_{n} = 1 - \left( {p_{{C_{1} }}^{2} + p_{{C_{2} }}^{2} } \right) $$
(2)

where \( Gini_{n} \) is the Gini impurity at node n, and \( p_{{C_{1} }} \), \( p_{{C_{2} }} \) are the fractions of samples of classes C1 and C2 that reach node n.

In this study, the partitioning stops at a node n (resulting in a leaf node) if \( Gini_{n} = 0 \) or if the number of samples in a node drops below 10.
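The impurity of Eq. (2) and the stopping rule above can be sketched as follows, for a single feature (illustrative only; class labels are hard-coded to the two classes of this study):

```python
def gini(labels):
    """Gini impurity of Eq. (2) for the two-class ('Buy'/'Hold') case."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_buy = labels.count("Buy") / n
    p_hold = 1.0 - p_buy
    return 1.0 - (p_buy ** 2 + p_hold ** 2)

def best_split(values, labels, min_samples=10):
    """Pick the threshold on one feature minimizing weighted child Gini;
    returns None when the node is pure or too small to split."""
    n = len(values)
    if n < min_samples or gini(labels) == 0.0:
        return None  # stopping rule: leaf node
    best_t, best_g = None, float("inf")
    for t in sorted(set(values))[1:]:
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        g = (len(left) * gini(left) + len(right) * gini(right)) / n
        if g < best_g:
            best_t, best_g = t, g
    return best_t
```

A full tree repeats this search over all M features at every node; the sketch shows one node and one feature.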

2.3.5 SVM Classifier

The SVM classifier used in the present study attempts to find, from the decision table D, the optimal hyperplane of the form \( C\left( \varvec{X} \right) =\varvec{\beta}^{T} \varvec{X} + b \) that separates the two classes C1 and C2 (typically represented using class labels + 1 and − 1), such that the hyperplane is at maximum distance from the points of the two classes. An SVM with a linear kernel is used in the present study. A detailed discussion of the SVM classifier can be found in Soman et al. (2010).
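A sketch of a linear SVM trained by subgradient descent on the hinge loss, one standard way to fit such a hyperplane (the study does not specify the solver; the learning-rate and regularization values here are illustrative):

```python
def train_linear_svm(X, y, epochs=500, lr=0.1, lam=0.01):
    """Fit beta, b of the hyperplane C(X) = beta^T X + b for labels +1/-1
    by full-batch subgradient descent on the regularized hinge loss."""
    m = len(X[0])
    beta, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_beta = [lam * w for w in beta]     # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(w * v for w, v in zip(beta, xi)) + b)
            if margin < 1:                      # point violates the margin
                grad_beta = [g - yi * v / len(X)
                             for g, v in zip(grad_beta, xi)]
                grad_b -= yi / len(X)
        beta = [w - lr * g for w, g in zip(beta, grad_beta)]
        b -= lr * grad_b
    return beta, b

def svm_predict(beta, b, x):
    """Class label from the sign of the decision function."""
    return 1 if sum(w * v for w, v in zip(beta, x)) + b >= 0 else -1
```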

2.3.6 Ensemble Classifiers

Bagging, as proposed in Breiman (1994), creates an ensemble of classifiers by training each classifier on N samples drawn randomly, with replacement, from the dataset, where N is the number of samples in the dataset.

Two boosting techniques are also evaluated: Adaboost (Freund and Schapire 1997) and RobustBoost (Freund 2009).
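The bagging scheme described above can be sketched as follows; `base_fit_predict` stands in for any weak learner (e.g. the tree or discriminant classifiers of the previous sections), and the ensemble size is illustrative:

```python
import random
from collections import Counter

def bagged_predict(train_X, train_y, x, base_fit_predict,
                   n_learners=25, seed=0):
    """Train each weak learner on N samples drawn with replacement
    (N = dataset size) and return the majority vote for x."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        bx = [train_X[i] for i in idx]
        by = [train_y[i] for i in idx]
        votes.append(base_fit_predict(bx, by, x))
    return Counter(votes).most_common(1)[0][0]
```

Boosting differs in that the learners are trained sequentially on reweighted samples rather than on independent bootstrap draws.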

2.4 Performance Evaluation Measures

Performance of the recommender systems was evaluated on two fronts:

(a) Efficiency of classification, as measured by classification accuracy, and

(b) Economic performance, as evaluated by eight performance measures suggested in Brabazon and O’Neill (2008).

Let the total number of trades be TT and the profit or loss made in each transaction (assuming a transaction cost of 0.5%) be represented as pi, where i = 1,2,…,TT. The set of profits (losses) generated by the transactions can be represented as: \( \varvec{P} = \left\{ {\begin{array}{*{20}c} {p_{1} } & {p_{2} } & \ldots & {p_{{\left| \varvec{P} \right|}} } \\ \end{array} } \right\} \)

The set of trades that result in profit can be represented as: \( \varvec{S} = \left\{ {\begin{array}{*{20}c} {p_{1} } & {p_{2} } & \ldots & {p_{{\left| \varvec{S} \right|}} } \\ \end{array} } \right\} \), \( p_{i} \ge 0 \), i = 1,2,…,|S| and \( \varvec{S} \subseteq \varvec{P} \)

The set of trades that result in loss can be represented as: \( \varvec{L} = \left\{ {\begin{array}{*{20}c} {p_{1} } & {p_{2} } & \ldots & {p_{{\left| \varvec{L} \right|}} } \\ \end{array} } \right\} \), \( p_{i} < 0 \), i = 1,2,…, |L| and \( \varvec{L} \subset \varvec{P} \)

where \( \left| \varvec{S} \right| + \left| \varvec{L} \right| = \left| \varvec{P} \right| \), \( \varvec{L} \cup \varvec{S} = \varvec{P} \) and \( \varvec{S} \cap \varvec{L} = \phi \).

1. Total Profit (TP): Total profit or loss made considering all trades, \( TP = \mathop \sum \limits_{{p \in \varvec{P}}} p \)

2. Average Profit (AP): Average profit or loss made considering all trades, \( AP = \frac{1}{{\left| \varvec{P} \right|}}\mathop \sum \limits_{{p \in \varvec{P}}} p \)

3. Profit per Successful Trade (P/ST): Average profit considering only the profit-making trades, \( P/ST = \frac{1}{{\left| \varvec{S} \right|}}\mathop \sum \limits_{{s \in \varvec{S}}} s \)

4. Loss per Loss-making Trade (L/LT): Average loss considering only the loss-making trades, \( L/LT = \frac{1}{{\left| \varvec{L} \right|}}\mathop \sum \limits_{{l \in \varvec{L}}} l \)

5. Maximum Drawdown (MD): The lowest profit (or the highest loss) incurred among all trades executed, \( MD = \mathop {\hbox{min} }\limits_{{p \in \varvec{P}}} p \)

6. Total Trades (TT): Total number of trades, \( TT = \left| \varvec{P} \right| \)

7. Profit Factor (PF): Ratio of the total profit generated by the profit-making trades to the total loss generated by the loss-making trades, \( PF = \frac{{\mathop \sum \nolimits_{{s \in \varvec{S}}} s}}{{\mathop \sum \nolimits_{{l \in \varvec{L}}} l}} \)

8. Win ratio (WR): Ratio of the total number of profit-making trades to the total number of loss-making trades, \( WR = \frac{{\left| \varvec{S} \right|}}{{\left| \varvec{L} \right|}} \)
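The eight measures above can be computed from the per-trade profit list as sketched below. One interpretational note: the magnitude of the total loss is used in the PF denominator so that PF comes out positive, the usual convention.

```python
def performance_measures(profits):
    """Compute the eight measures from the per-trade profit list P
    (transaction costs assumed already deducted from each p_i)."""
    S = [p for p in profits if p >= 0]   # profit-making trades
    L = [p for p in profits if p < 0]    # loss-making trades
    return {
        "TP": sum(profits),
        "AP": sum(profits) / len(profits) if profits else 0.0,
        "P/ST": sum(S) / len(S) if S else 0.0,
        "L/LT": sum(L) / len(L) if L else 0.0,
        "MD": min(profits) if profits else 0.0,
        "TT": len(profits),
        "PF": sum(S) / abs(sum(L)) if L else float("inf"),
        "WR": len(S) / len(L) if L else float("inf"),
    }
```

The no-trade (TT = 0) and no-loss cases are handled with illustrative defaults; the study's handling of these edge cases is discussed in Sect. 3.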

3 Results and Discussion

A comprehensive study comprising 293 Group ‘A’ stocks drawn from the BSE was carried out. The broad distribution of stocks across sectors is given in Table 1. Stock details are presented in "Appendix 10".

Table 1 Distribution of stocks across sectors

The daily stock data consists of the daily Open price, Close price, High price, Low price, Adjusted Closing price and Volume. The time frame considered was from May 25, 2012 to March 30, 2015. Data up to December 31, 2014 was used for training and the last 3 months' data for testing.

The effectiveness of each classifier was evaluated as the first step. It was observed that decision tree classifiers were able to outperform all other single classifiers. It was also observed that incorporating technical indicators into the input feature set improved the accuracy of the classification process. Classification accuracies for each of the 293 stocks for each of the three different input feature sets are presented in Appendices 1, 2 and 3, respectively. Figure 2 presents the scatter plots of the testing accuracy vs the training accuracy (in percentages) for all stocks and each of the single classifiers considered in the present study, trained using only the daily stock data. It must be noted that each point on the plot represents the testing vs training accuracy of one stock.

Fig. 2

Plots of training accuracy vs testing accuracy for standalone classifier based recommenders trained using only daily data: a NB, b SVM, c Tree, d Discriminant and e kNN. Training accuracy (%) is on the x axis and testing accuracy (%) is on the y axis

It can be observed from Fig. 2 (as well as from "Appendix 1") that, apart from the decision tree based and kNN based recommenders, the training accuracies of the other three classifiers varied widely from stock to stock. It was also observed that in the case of kNN classifier based recommender systems (Fig. 2e), while the training accuracy was close to 100% for most of the stocks, the forecasting accuracy for the testing dataset was poor. This could be attributed to the fact that, by its very nature, there is no real ‘learning’ taking place in the kNN classifier. The decision tree classifier based recommender offered relatively better performance for the testing data; however, it can be seen from Fig. 2c that while the training accuracies were between 90 and 98% for most of the stocks, the accuracy for the test data showed wide variation, with an average accuracy of around 50%.

Figure 3 shows the scatter plots of the testing accuracy vs the training accuracy (in percentages) for all 293 stocks and each of the single classifiers considered in the present study, trained using the daily stock data and technical indicators with the naïve assumption of setting all the technical indicator window sizes to their minimum (i.e. 2 days). Details for each stock have been presented in "Appendix 2".

Fig. 3

Plots of training accuracy vs testing accuracy for standalone classifier based recommenders trained using daily data-naïve TI for: a NB, b SVM, c Tree, d Discriminant, e kNN. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

It can be seen from Fig. 3 (and "Appendix 2") that the performance of the single classifiers shows significant improvement with the addition of technical indicators as features, for all except the SVM and kNN based recommenders. The training accuracies for naïve Bayes based recommenders (see Fig. 3a) improved, on average, from around 70% (when trained using daily data alone) to around 90%, with the test data accuracies also demonstrating a marked improvement. The improvement was most marked for the decision tree based recommender system, with training accuracy of around 100% and testing accuracy between 96 and 100% for all 293 stocks. Discriminant classifier based recommenders, as can be observed from Figs. 2d and 3d, also showed a marked improvement in accuracy. It must be noted that in Fig. 3c only three distinct data points are visible, due to the fact that, of the 293 stocks, the test data accuracies were found to be 98.4% for 12 stocks, 96.9% for one stock (MARUTI) and 100% for the rest.

The performance of the single classifiers when the technical indicator window sizes are tuned using GA, for all 293 stocks, is presented in Fig. 4. Performance figures for each stock have been listed in "Appendix 3".

Fig. 4

Plots of training accuracy vs testing accuracy for standalone classifier based recommenders trained using daily data-GA-TI for: a NB, b SVM, c Tree, d Discriminant, e kNN. Training accuracy (%) is on the x axis and testing accuracy (%) is on the y axis

It is observed that optimization of the window lengths of the TIs using GA did not offer any discernible improvement in accuracy for any of the single classifiers considered.

As the next step, the accuracy of ensemble classifier based recommender systems was evaluated. Adaboost, Robustboost and bagging ensemble classifier based recommender systems were evaluated, with two weak learners, namely discriminant and tree. Discriminant weak learner based ensemble classifier recommenders were evaluated first, followed by tree weak learner based ensemble classifier recommenders. Figure 5 presents the scatter plots of testing vs training accuracy for Adaboost classifier based recommender systems, with discriminant used as the weak learner. Accuracy values for each of the 293 stocks are presented in "Appendix 4".

Fig. 5

Plots of training accuracy vs testing accuracy for Adaboost ensemble classifier based recommenders employing discriminant as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

It is observed that for 198 of the 293 stocks the test accuracy was 100%, 72 stocks generated test accuracy between 90.1 and 99.9% and 7 stocks between 80 and 90%. The results remained consistently the same for all three input feature set combinations.

Training vs testing accuracy scatter plots for bagging classifier based recommender systems are presented in Fig. 6. Accuracies for each stock can be obtained from "Appendix 5".

Fig. 6

Plots of training accuracy vs testing accuracy for bagging classifier based recommenders employing discriminant as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

It is observed that bagging classifier based recommenders using discriminant as the weak learner were able to generate testing accuracy in excess of 60% for 261 of the 293 stocks when trained using the daily stock data alone. This dropped by one, to a total of 260 stocks, for the other two feature sets. However, while only 219 stocks were able to generate accuracies in the range 80–100% for bagging-discriminant recommenders trained using daily data alone, this number rose to 233 and 234 stocks for bagging-discriminant recommenders trained using the daily data-GA-TI and daily data-naïve TI feature sets, respectively. Only 85, 86 and 83 stocks reported a test accuracy of 100% for daily data, daily data-naïve TI and daily data-GA-TI trained bagging-discriminant classifiers, respectively. Accuracies for the training data were also found to be poorer when compared to Adaboost-discriminant based recommenders, with the training accuracies of only 3, 4 and 4 stocks, corresponding to the daily data alone, daily data-naïve TI and daily data-GA-TI input feature sets, being above 99%. For 259, 262 and 263 stocks, respectively, the training accuracies were in the range 90.1–99%.

Accuracies for Robustboost ensemble classifier based recommender systems for all the stocks considered in the present study are presented in Fig. 7, with discriminant used as the weak learner. Performance details for each of the 293 stocks can be found in "Appendix 6".

Fig. 7

Plots of training accuracy vs testing accuracy for Robustboost ensemble classifier based recommenders employing discriminant as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

It is observed that 134 of the 293 stocks considered show an accuracy of 100% for the test data when Robustboost ensemble classifier based recommenders employing discriminant as the weak learner are trained using daily data alone (Fig. 7a). This number improved slightly, to 138 stocks, for each of the other two input feature sets. Test data classification accuracies were between 90.1 and 99.9% for 105 stocks when the Robustboost-discriminant classifier based recommenders were trained using daily data, while for the other two input feature set combinations this improved slightly to 110 stocks in both cases.

Figure 8 presents the scatter plots of testing vs training accuracy for Adaboost classifier based recommender systems, with tree used as the weak learner. Accuracy values for each of the 293 stocks are presented in "Appendix 7".

Fig. 8

Plots of training accuracy vs testing accuracy for Adaboost ensemble classifier based recommenders employing tree as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

Compared to any of the discriminant weak learner based recommenders, Adaboost ensemble classifiers with tree as the weak learner performed poorly. When trained using only the daily stock data, the recommenders could not correctly classify even a single test data point for 33 of the 293 stocks considered. This further worsened to 44 stocks for the other two input feature sets. Test data classification accuracies of 100% were obtained for only 64 of the 293 stocks when trained using only the daily data, and this worsened to 55 stocks each when trained using the daily data-naïve TI and daily data-GA-TI feature sets. This could be attributed to very poor training results, since only three stocks demonstrated training accuracy in excess of 80% for all three input feature sets.

Accuracies for bagging ensemble classifier based recommender systems for all the stocks considered in the present study are presented in Fig. 9, with tree used as the weak learner. Performance details for each of the 293 stocks can be found in "Appendix 8".

Fig. 9

Plots of training accuracy vs testing accuracy for bagging classifier based recommenders employing tree as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on the x axis and testing accuracy (%) is on the y axis

It is observed that for bagging classifier based recommenders employing tree as the weak learner, the training accuracy for all 293 stocks, for all three input feature sets, was uniformly 100%. For the testing data, however, 264 stocks showed an accuracy of 100% when trained using daily data alone. Training using the daily data-naïve TI feature set resulted in 270 stocks demonstrating a testing accuracy of 100%. This improved slightly to 272 stocks with 100% accuracy when the recommenders were trained using the daily data-GA-TI input feature set.

Accuracies for Robustboost ensemble classifier based recommender systems using tree as the weak learner for all the stocks considered in the present study are presented in Fig. 10. Performance details for each of the 293 stocks can be found in "Appendix 9".

Fig. 10

Plots of training accuracy vs testing accuracy for Robustboost ensemble classifier based recommenders employing tree as the weak learner trained using: a daily data, b daily data-naïve TI and c daily data-GA-TI. Training accuracy (%) is on x axis and testing accuracy (%) is on the y axis

It is observed from the results that the training accuracy for Robustboost ensemble classifier based recommenders employing tree as the weak learner, for all 293 stocks and all three input feature sets, was uniformly 100%, similar to the results obtained for the bagging ensemble classifier based recommender using tree as the weak learner. Testing performance was, however, slightly better than that of the bagging-tree ensemble based recommenders, with 277 of 293 stocks classifying 100% of the test data correctly when trained using daily data alone. This further improved to 280 stocks showing 100% test accuracy when the input feature set employed was daily data-naïve TI, as well as when daily data-GA-TI was employed. Moreover, considering all the stocks and all three input feature sets, the worst test accuracy observed was 96.9%, while 290, 292 and 292 stocks showed test accuracy in excess of 98% when trained using daily data, daily data-naïve TI and daily data-GA-TI, respectively. Of all the recommender systems considered in the present study, Robustboost ensemble classifier based recommenders with tree as the weak learner, trained using either daily data-naïve TI or daily data-GA-TI, demonstrated the highest accuracy, followed very closely by the third variant, trained using only the daily data.

As the next step, the economic performance of each type of recommender system is evaluated. A summary of the average profits (AP) generated by the different classifier based recommender systems is presented in Tables 2 and 3. Both tables present the number of stocks for which the AP was less than zero rupees (< 0), equal to zero rupees (0), between zero and one hundred rupees (0–100) and greater than two hundred rupees (> 200). It must be noted that the AP can become equal to zero under two conditions: the sum of all the profits and losses (always considering a transaction cost of 0.5%) is exactly zero, or there are no trades at all (i.e. TT for the stock = 0). The first case was encountered rarely, while the second case was observed much more frequently. A summary analysis of the number of trades is presented in Table 5.

Table 2 Number of stocks with average profits in five ranges, for single classifiers considered
Table 3 Number of stocks with average profits in five ranges, for ensemble classifiers considered

From Table 2, it is seen that of all the single classifier based recommenders considered, tree based recommenders trained using daily data-naïve TI and daily data-GA-TI yield the smallest number of stocks with negative average returns. It is also seen that SVM based recommenders recommend ‘Hold’ for the entire time frame considered in more than 40% of the cases for all three input feature sets. kNN classifier based recommenders trained using daily data were also seen to be effective; however, the number of stocks for which the average returns turned out to be negative was three times that for the tree based recommenders discussed above.

From Table 3, it can be observed that Robustboost ensemble classifier based recommenders result in the minimal number of stocks with negative average returns for all three input feature set combinations.

Maximum drawdown (MD) is another important parameter that determines recommender performance. Ideally, a recommender should prevent large negative returns from trades. MD is a measure of the worst loss from among all the trades carried out. In the present study, if no loss-making trades are reported for a stock, the lowest profit generated is taken as the MD. Table 4 presents the average MD per stock for the proposed recommender systems. Average MD for each recommender is calculated as:

Table 4 Average MD for recommender systems considered
$$ Avg MD_{Recommender} = \frac{1}{293}\mathop \sum \limits_{i = 1}^{293} MD\left( i \right)_{Recommender} $$
(3)

where:

Recommender = Recommender considered.

\( Avg MD_{Recommender} \) = Average MD for the recommender considered over all 293 stocks.
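Under the definition above, MD for a stock reduces to the minimum of the per-trade net profits: if any trade lost money, that minimum is the worst loss, and if every trade was profitable, it is the lowest profit. A hedged sketch of this and of Eq. (3), with illustrative names (the averaging is written over however many stocks are supplied rather than fixed at 293):

```python
def max_drawdown(trade_profits):
    """Worst loss among all trades for one stock; if no trade lost
    money, the lowest profit is returned instead, per the definition
    used in this study. Returns None when no trades occurred."""
    if not trade_profits:
        return None
    return min(trade_profits)  # covers both the worst-loss and lowest-profit cases

def average_md(md_per_stock):
    """Average MD over all stocks considered, as in Eq. (3)."""
    return sum(md_per_stock) / len(md_per_stock)
```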

It can be seen from Table 4 that Robustboost ensemble classifier based recommenders employing tree as the weak learner tend to generate the best average MD when trained using daily data. Average MD values worsened slightly for the daily data-naïve TI and daily data-GA-TI based recommenders. The worst performance is demonstrated by NB based recommenders trained using daily data-naïve TI.

Total number of trades (TT) carried out during the time frame under consideration is a measure of how efficient the recommender is at identifying trading points. Ideally, a recommender should generate recommendations such that maximum profit can be realized using minimum number of trades. This will help the trader minimize the transaction costs that he/she might have to pay to the brokerage firm.
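One plausible way to derive TT from a recommender's output is to count completed Buy–Sell round trips in the recommendation sequence; this trade-counting model is an assumption for illustration, not a detail stated in the study:

```python
def total_trades(signals):
    """Count completed Buy -> Sell round trips in a sequence of
    'Buy' / 'Sell' / 'Hold' recommendations (assumed trade model)."""
    trades, holding = 0, False
    for s in signals:
        if s == 'Buy' and not holding:
            holding = True               # open a position
        elif s == 'Sell' and holding:
            holding = False              # close it: one completed trade
            trades += 1
    return trades
```

Under this model, a sequence that never leaves ‘Hold’ yields TT = 0, which is the no-trade condition under which AP and the other per-trade measures become ‘NA’.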

Table 5 presents the average TT per stock for proposed recommender systems. Average TT for each recommender is calculated as:

Table 5 Average TT for recommender systems considered
$$ Avg TT_{Recommender} = \frac{1}{293}\mathop \sum \limits_{i = 1}^{293} TT\left( i \right)_{Recommender} $$
(4)

where:

Recommender = Recommender considered.

\( Avg TT_{Recommender} \) = Average TT for the recommender considered over all 293 stocks.

From Table 5, it is seen that incorporating technical indicators into the input feature set along with the daily data results in reduction in the average number of trades recommended by Bagging and Robustboost ensemble classifier based recommenders. This trend is also observed in the case of Tree and NB based recommenders and to a small extent, in kNN based recommenders, as well.

It must be noted that, in addition to TT, the profit factor (PF) and the win-ratio (WR) must also be considered while evaluating any recommender system. As described in Sect. 2.4 above, PF is the ratio of the profit from profit-making trades to the loss from loss-making trades. Ideally, for a recommender, the number of loss-making trades should be minimal, and even where loss-making trades do exist, the total loss from them should be much lower than the total profit from profit-making trades. Hence, PF should be as high as possible for a good recommender system.

The number of stocks falling within four broad ranges of PF, for each of the recommender systems considered, is presented in Table 6. The heading ‘NA’ in Table 6 refers to the number of stocks for which PF could not be calculated (e.g. because no trade was executed during the time frame under consideration). The heading ‘Inf’ refers to the number of stocks for which the PF was infinite (i.e. the loss from loss-making trades was zero, implying no loss-making trade in the time frame). Stocks for which the loss from loss-making trades exceeds the profit from profit-making trades, and which thus have PF < 1, are listed under the heading (< 1); stocks for which the loss from loss-making trades is less than the profit from profit-making trades, and which thus have PF > 1, are listed under the heading (> 1).

Table 6 Number of stocks within each PF range for recommender systems considered

It can be seen from Table 6 that kNN based recommenders tend to generate only profit-making trades during the time frame considered for 221 of the 293 stocks. They are closely followed by Robustboost and Bagging ensemble classifier based recommenders using tree as the weak learner and trained using daily data. However, it must be noted that for kNN based recommenders, the number of stocks with PF < 1 was at least three times higher than for the above two ensemble classifier based recommenders.

The number of stocks falling within four broad ranges of WR, for each of the recommender systems considered, is presented in Table 7. The heading ‘NA’ in Table 7 refers to the number of stocks for which WR could not be calculated (again, because no trade was executed during the time frame under consideration). The heading ‘Inf’ refers to the number of stocks for which the WR was infinite (i.e. the number of loss-making trades was zero). Stocks for which the number of loss-making trades exceeds the number of profit-making trades, and which thus have WR < 1, are listed under the heading (< 1); stocks for which the number of loss-making trades is less than the number of profit-making trades, and which thus have WR > 1, are listed under the heading (> 1).
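The PF and WR bookkeeping described above, including the ‘NA’ and ‘Inf’ conventions, can be sketched as follows, assuming per-trade net profits are available for each stock (names are illustrative):

```python
import math

def profit_factor(trade_profits):
    """PF: total profit from winning trades over total loss from
    losing trades. None stands for 'NA' (no trades executed);
    math.inf stands for 'Inf' (no loss-making trade)."""
    if not trade_profits:
        return None
    gains = sum(p for p in trade_profits if p > 0)
    losses = -sum(p for p in trade_profits if p < 0)
    return math.inf if losses == 0 else gains / losses

def win_ratio(trade_profits):
    """WR: number of winning trades over number of losing trades,
    with the same 'NA' / 'Inf' conventions as PF."""
    if not trade_profits:
        return None
    wins = sum(1 for p in trade_profits if p > 0)
    losses = sum(1 for p in trade_profits if p < 0)
    return math.inf if losses == 0 else wins / losses
```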

Table 7 Number of stocks within each WR range for recommender systems considered

It is observed from Table 7 that kNN based recommenders tend to outperform the other recommenders in terms of WR. As in the case of PF, they are closely followed by Robustboost and Bagging ensemble classifier based recommenders using tree as the weak learner and trained using daily data. However, in this case too, for kNN based recommenders the number of stocks with WR < 1 was around three times higher than for the above two ensemble classifier based recommenders.

4 Conclusions

A comprehensive empirical study involving 293 Group-A stocks from the Indian stock exchange BSE was carried out to evaluate the effectiveness of classifier based recommender systems in recommending trading decisions for Indian stocks. Five single classifier and six ensemble classifier based recommender systems were considered. The effect of three different input feature sets on recommender system performance was also empirically evaluated. It was observed that Robustboost and Bagging ensemble classifier based recommender systems employing tree as the weak learner were able to generate high profits while minimizing the number of loss-making trades. kNN classifier based recommenders performed best among the single classifiers considered; however, the number of stocks for which the recommendations ended in a loss was much higher for kNN based recommenders than for the above two ensemble based recommenders. Another interesting observation was that the recommender systems showed their best performance in terms of PF and WR, the measures that indicate the ability of a recommender to recommend profitable trades, when trained using daily stock price data alone. Based on the results, it can be said that ensemble classifier based recommender systems employing tree as the weak learner and trained using daily stock price data as the input feature set can be successfully employed for trading in Indian stocks.