1 Introduction

In the twenty-first century, the well-being of every growing economy, country or society hinges largely on its market economy and stock prices, with the financial market as the pivot (Nassirtoussi et al. 2014; Göçken et al. 2016). It is therefore vital to study the financial market extensively. However, owing to a number of uncertainties (such as general economic conditions, social factors and political events at both domestic and international levels), financial markets are difficult to predict (Adebiyi et al. 2012, 2014b; Bisoi and Dash 2014; Ding et al. 2014; Rajashree et al. 2014; Rather et al. 2014; Lin 2018).

A stock market is a place for trading stocks (equity) and other financial instruments of publicly listed companies, where the price of a share is termed the “share price” or “stock price” (Wanjawa and Muchemi 2014). In reality, a firm's stock price level largely reflects how it “cuts its pie” (Chan et al. 2017). Investments in the stock market are often guided by some form of prediction (Wanjawa and Muchemi 2014; Ghaznavi et al. 2016). There are three main approaches to stock-market prediction, namely fundamental analysis, technical analysis (charting) and technological (machine learning) methods (Dunne 2015). However, some scholars group these three into two, namely technical analysis and fundamental analysis (Nassirtoussi et al. 2014; Dunne 2015; Gyan 2015; Prem Sankar et al. 2015; Ahmadi et al. 2018).

The fundamental analyst's approach is concerned with the company that underlies the stock rather than with the stock itself (Anbalagan and Maheswari 2014; Ghaznavi et al. 2016; Agarwal et al. 2017). The data used by fundamental analysts are usually unstructured, which poses a difficult challenge. However, such data have on occasion proven to be good predictors of stock-price movement, as in the works of Zhang et al. (2011), Li et al. (2014a, b, d), Rather et al. (2014), Ballings et al. (2015), Liu et al. (2015), Sun et al. (2016), Kumar et al. (2016), Checkley et al. (2017), Tsai and Wang (2017), Zhang et al. (2017), Coyne et al. (2017), Pimprikar et al. (2017) and Sorto et al. (2017).

In technical analysis, on the other hand, the analyst predicts the future price of stocks by studying past and present stock-price trends (Anbalagan and Maheswari 2014; Agarwal et al. 2017; Ahmadi et al. 2018). Both earlier studies (Akinwale Adio et al. 2009; Guresen et al. 2011; Ju-Jie et al. 2012) and more recent ones (Rather et al. 2014; Laboissiere et al. 2015; Adebayo et al. 2017; Thanh et al. 2018; Umoru and Nwokoye 2018; Zhou et al. 2018) have predicted future stock-price movement based on technical analysis. Figure 2 shows the general format of a technical analysis approach to stock-market prediction.

Globally, billions of dollars are traded daily on the stock market with the aim of making a profit (Dunne 2015). This makes stock-market prediction an attractive research area for researchers, investors and financial analysts, despite its difficulty (Ticknor 2013; Chen et al. 2014; Metghalchi et al. 2014; Rather et al. 2014; Prem Sankar et al. 2015; Wanjawa 2016; Agarwal et al. 2017; Tsai and Wang 2017; Lin 2018; Zhou et al. 2018). It has also led to the application of machine-learning and computational-intelligence techniques to the analysis of stock-market trends. These include hidden Markov models, neural networks, neuro-fuzzy inference systems, genetic algorithms, time-series analysis, regression, association-rule mining, support vector machines (SVM), principal component analysis (PCA) and rough set theory, among others (Chen et al. 2014; Lin 2018; Thanh et al. 2018).

This research performs a comprehensive systematic review of previous studies on stock-market prediction from the fundamental and technical analysts' points of view, clarifying the current state of the art and its possible future directions. In summary, this work contributes to the body of knowledge as follows:

  1. A well-organised review of pertinent literature, with an emphasis on the different factors that affect the movement of stock prices.

  2. Identification of the distinguishing factors among existing works in the literature and a comparative analysis of the methods used in the predictive models.

  3. Justified directions for future research that could alleviate the deficiencies identified in existing techniques and offer solutions for improvement.

The remaining sections of this paper are organised as follows. Section 2 provides insight into market predictability, machine-learning algorithms, technical and fundamental analysis and predictive model evaluation, and reviews related works. Section 3 presents the research design, research framework and sampling technique. Section 4 presents the results and discussion of findings. Section 5 summarises the findings. Finally, Sect. 6 concludes this work and outlines directions for future research.

2 The categorisation of stock market decision techniques

This section briefly discusses fundamental and technical analysis as decision-making tools for the stock market. The fundamental and technical analysis approaches to stock-market prediction are shown in Figs. 1 and 2, respectively.

Fig. 1

Fundamental analysts' approach

Fig. 2

Technical analysts' approach

Technical analysis

The technical analyst tries to predict the stock market by studying charts that depict historical market prices and technical indicators (Sureshkumar and Elango 2011; Wei et al. 2011; Suthar et al. 2012; de Oliveira et al. 2013; Ballings et al. 2015; Gaius 2015; Su and Cheng 2016). As shown in Fig. 2, historical stock prices are preprocessed, and appropriate indicators are calculated and fed into the predictive model. The technical indicators discussed in Anbalagan and Maheswari (2014), Bisoi and Dash (2014) and Rajashree et al. (2014) include the simple moving average (SMA), exponential moving average (EMA), moving average convergence/divergence (MACD), relative strength index (RSI) and on-balance volume (OBV), as shown in Fig. 3.

Fig. 3

Technical analysis

SMA

The SMA is computed by summing the n most recent closing prices of a stock and dividing the total by n, the number of periods in the average (Anbalagan and Maheswari 2014).
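To make the SMA calculation concrete, the following is a minimal Python sketch (not taken from the reviewed works), assuming the closing prices are held in a plain list ordered from oldest to newest; the function name `sma` is hypothetical. A running window sum is used so that each new average costs one addition and one subtraction.

```python
def sma(closes, n):
    """Simple moving average of the n most recent closing prices.

    Returns one average per position from index n - 1 onwards, so the
    result has len(closes) - n + 1 values (oldest first).
    """
    if n <= 0 or n > len(closes):
        raise ValueError("n must be between 1 and len(closes)")
    window_sum = sum(closes[:n])          # total of the first n closes
    averages = [window_sum / n]
    for i in range(n, len(closes)):
        # slide the window one day: add the newest close, drop the oldest
        window_sum += closes[i] - closes[i - n]
        averages.append(window_sum / n)
    return averages


# Hypothetical closing prices, oldest to newest
print(sma([10, 11, 12, 13, 14], 3))  # [11.0, 12.0, 13.0]
```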

EMA

The EMA is similar to the SMA, except that the EMA for a given day depends on the EMA values of all the preceding days (through the previous day's EMA), which gives more weight to recent prices.

$$ EMA = Price \left( t \right) \times k + EMA\left( y \right) \times \left( {1 - k} \right) $$
(1)

where t and y represent today and yesterday, respectively, N is the number of days in the EMA and k = 2/(N + 1) is the smoothing factor.
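Equation (1) translates directly into a loop. The Python sketch below is illustrative only; it assumes the series is seeded with the first closing price, which is one of several conventions in use (seeding with an N-day SMA is another), and the function name `ema` is hypothetical.

```python
def ema(closes, n):
    """Exponential moving average following Eq. (1).

    EMA_today = Price_today * k + EMA_yesterday * (1 - k), k = 2 / (n + 1).
    Assumption: the first EMA value is seeded with the first closing price.
    """
    if not closes:
        return []
    k = 2 / (n + 1)                   # smoothing factor
    values = [float(closes[0])]       # seed value (assumed convention)
    for price in closes[1:]:
        # each new value depends on yesterday's EMA, hence on all earlier days
        values.append(price * k + values[-1] * (1 - k))
    return values


print(ema([10, 11, 12, 13, 14], 3))
```

Because every value carries forward the previous one, the choice of seed matters mainly for the first few periods and fades as more prices are processed.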

MACD

The MACD is a momentum indicator that tries to predict stock-market trends by comparing short- and long-term trends. MACD is obtained by subtracting the 26-day EMA from the 12-day EMA:

$$ MACD = EMA_{k} - EMA_{d} $$
(2)

where k = 12 and d = 26 are the numbers of days in the short-term and long-term EMAs, respectively.
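As an illustration, the MACD line can be sketched in Python as the day-by-day difference between a 12-day and a 26-day EMA. This is a minimal sketch, assuming EMAs seeded with the first closing price; the function names are hypothetical, and the signal line often plotted alongside MACD is not included.

```python
def ema(closes, n):
    """EMA per Eq. (1), seeded with the first price (assumed convention)."""
    k = 2 / (n + 1)
    values = [float(closes[0])]
    for price in closes[1:]:
        values.append(price * k + values[-1] * (1 - k))
    return values


def macd(closes, short_span=12, long_span=26):
    """MACD line: short-term EMA minus long-term EMA for each day."""
    short_ema = ema(closes, short_span)
    long_ema = ema(closes, long_span)
    return [s - l for s, l in zip(short_ema, long_ema)]


# Hypothetical steadily rising prices: the faster EMA sits above the slower
# one, so the MACD line is positive.
print(macd(list(range(1, 61)))[-1])
```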

OBV

The OBV indicator is also a momentum indicator; it uses volume flow to predict movements in stock price. A falling OBV line signals a potential fall in the stock price, whereas a rising OBV line signals a potential future rise.
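The text does not give an OBV formula, so the sketch below uses the commonly used rule: add the day's volume when the close rises, subtract it when the close falls, and leave OBV unchanged otherwise. It is a minimal Python illustration under that assumption, with a hypothetical function name and an OBV series that starts at 0.

```python
def obv(closes, volumes):
    """On-balance volume: running total of volume signed by the price change."""
    if len(closes) != len(volumes):
        raise ValueError("closes and volumes must have the same length")
    if not closes:
        return []
    values = [0]                                   # assumed starting level
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            values.append(values[-1] + volumes[i])  # up day: add volume
        elif closes[i] < closes[i - 1]:
            values.append(values[-1] - volumes[i])  # down day: subtract volume
        else:
            values.append(values[-1])               # unchanged close
    return values


# Hypothetical daily closes and volumes
print(obv([10, 11, 10.5, 10.5, 12], [100, 200, 150, 80, 300]))  # [0, 200, 50, 50, 350]
```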

RSI

RSI is a momentum indicator that measures whether a stock is overbought or oversold. It is obtained as follows:

$$ RSI = 100 - \frac{100}{1 + RS} $$
(3)

where RS = average gain/average loss over the look-back period.
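A minimal Python sketch of RSI = 100 − 100/(1 + RS) is shown below. It assumes simple (unsmoothed) averages of the gains and losses over the last n price changes, with n = 14 as a common default; Wilder's original formulation uses smoothed averages instead, so values will differ somewhat. The function name `rsi` is hypothetical. By common convention, readings above 70 are read as overbought and below 30 as oversold.

```python
def rsi(closes, n=14):
    """RSI over the last n price changes using simple averages.

    Assumption: simple rather than Wilder-smoothed averages of gains/losses.
    """
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    changes = [closes[i] - closes[i - 1]
               for i in range(len(closes) - n, len(closes))]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        # no losses in the window (this also covers flat prices; conventions differ)
        return 100.0
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


print(rsi([1, 2, 1, 2, 1], n=4))  # 50.0
```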

Conventionally, most stock-market prediction methods employ technical analysis techniques to predict future trends in stock values. However, Li et al. (2015) argue that quantitative data alone cannot fully convey the wide variety of firms' financial conditions. Hence, the information contained in traditional news and social network sites (unstructured data) can complement quantitative data to enhance prediction models, particularly in this age of social media.

Fundamental analysis

Fundamental analysis uses the economic standing of the firm, its employees and board of directors, its financial status, annual report, balance sheets and income statements, geographical and climatic circumstances such as natural or man-made disasters, and political data to predict future stock prices (Tsai and Hsiao 2010; Anbalagan and Maheswari 2014; Ghaznavi et al. 2016; Agarwal et al. 2017). Owing to the unstructured nature of fundamental factors, automating fundamental analysis is difficult. However, the emergence of machine learning has enabled researchers to automate stock-market prediction based on unstructured data, which in some cases has yielded higher reported prediction accuracy. Nonetheless, fundamental analysis is useful for long-term stock-price movement but not suitable for short-term stock-price changes (Khan et al. 2011).

The fundamental analyst uses publicly available information about the stock to analyse stock-price movement in three dimensions, concerning the economy, the industry and the firm, as shown in Fig. 4. The fundamental analyst also considers various financial ratios of the firm. Some of the important ratios discussed in Renu and Christie (2018) include:

Fig. 4

Fundamental analysis

Return on equity (ROE)

This ratio gives an overview of how well the shareholders' funds were used and the gain made from this investment. A low ROE implies that the shareholders' funds were not used effectively. ROE is computed as:

$$ ROE = \frac{PTP}{SE} $$
(4)

where PTP = post-tax profit and SE = shareholder equity

Debt/equity ratio (D/E)

This ratio reveals the extent to which a firm is financed by debt relative to shareholders' equity. A low D/E value indicates that the firm has made little use of the credit available to it. D/E is computed as follows:

$$ \frac{ D}{E} = \frac{{{\text{Overall}}\,{\text{liabilities }}}}{{{\text{Shareholders}}'\,{\text{equity}}}} $$
(5)

Market capitalization (MC)

MC measures the total market value of a firm's outstanding shares. Based on MC, stocks can be categorised into three groups, namely small-cap, mid-cap and large-cap. MC is computed as:

$$ MC = {\text{Total}}\,{\text{shares}} \times {\text{Price}}\,{\text{per}}\,{\text{share}} $$
(6)

Price/sales ratio (P/S)

This ratio indicates whether the share price of a stock reflects the stock's value. P/S is calculated as:

$$ \frac{P}{S} = \frac{\text{Share price}}{\text{Revenue (sales) per share over a 12-month time frame}} $$
(7)

Price/book ratio (P/B)

This ratio compares the share price with the stock's book (fundamental) value and indicates whether the stock is undervalued or overvalued. P/B is computed as:

$$ \frac{ P}{ B} = \frac{{{\text{Stock}}\,{\text{Price}}}}{{\left( {{\text{Total}}\,{\text{assets}} - {\text{Intangible}}\,{\text{assets}}\,{\text{and}}\,{\text{liabilities}}} \right)}} $$
(8)

Earnings per share (EPS)

EPS indicates a firm's profitability and is determined by dividing the firm's net income by its total number of outstanding shares. EPS can be computed in two ways, as shown in Eqs. (9) and (10).

$$ EPS = \frac{{{\text{Net}}\,{\text{Income}}\,{\text{after}}\,{\text{Tax }}}}{{{\text{Total}}\,{\text{Number}}\,{\text{of}}\,{\text{Outstanding}}\,{\text{Shares }}}} $$
(9)
$$ {\text{Weighted}}\,{\text{earnings}}\,{\text{per}}\,{\text{share}} = \frac{{\left( {{\text{Net}}\,{\text{Income}}\,{\text{after}}\,{\text{Tax }} - {\text{Total}}\,{\text{Dividends}}} \right)}}{{{\text{Total}}\,{\text{Number}}\,{\text{of}}\,{\text{Remaining}}\,{\text{Stocks}}}} $$
(10)

Price/earnings ratio (P/E)

This ratio is a valuable metric for estimating the relative attractiveness of a firm's current stock price compared with its per-share earnings. It is calculated as:

$$ P/E = \frac{{{\text{Market}}\,{\text{value}}\,{\text{per}}\,{\text{share }}}}{EPS} $$
(11)

Return on assets (ROA)

This ratio expresses a firm's earnings relative to its overall assets or resources. It thus indicates how profitable a firm is relative to its total resources or assets.

$$ {\text{ROA}} = \frac{{{\text{Net}}\,{\text{Income }}}}{{{\text{Total}}\,{\text{Assets}}}} $$
(12)

When the cost of debt is excluded (by adding interest expense back to net income), ROA is given by:

$$ {\text{ROA}} = \frac{{\left( {{\text{Net}}\,{\text{Income}} + {\text{Interest}}\,{\text{Expense}}} \right) }}{{{\text{Average}}\,{\text{Total}}\,{\text{Assets}}}} $$
(13)
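Several of the ratios above can be computed together from a handful of balance-sheet and market figures. The Python sketch below is illustrative only: the function name, its keyword arguments and the example figures are hypothetical, net income after tax is assumed to equal the post-tax profit (PTP) of Eq. (4), and all monetary inputs are assumed to be in the same currency unit.

```python
def financial_ratios(*, net_income, equity, liabilities, shares_outstanding,
                     share_price, total_assets, average_total_assets,
                     interest_expense):
    """Selected ratios from Eqs. (4)-(6), (9) and (11)-(13)."""
    eps = net_income / shares_outstanding                     # Eq. (9)
    return {
        "ROE": net_income / equity,                           # Eq. (4)
        "D/E": liabilities / equity,                          # Eq. (5)
        "MC": shares_outstanding * share_price,               # Eq. (6)
        "EPS": eps,
        "P/E": share_price / eps,                             # Eq. (11)
        "ROA": net_income / total_assets,                     # Eq. (12)
        # Eq. (13): interest expense added back, over average total assets
        "ROA_ex_debt": (net_income + interest_expense) / average_total_assets,
    }


# Hypothetical firm (all figures invented for illustration)
r = financial_ratios(net_income=2_000_000, equity=10_000_000,
                     liabilities=5_000_000, shares_outstanding=1_000_000,
                     share_price=30.0, total_assets=15_000_000,
                     average_total_assets=16_000_000, interest_expense=400_000)
print(r["P/E"])  # 15.0
```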

This method of stock-market analysis has become common in recent years with the introduction of text-mining techniques. Many studies have used fundamental analysis for stock prediction; however, Talib et al. (2016), in their work titled “Text Mining-Techniques Applications and Issues”, argue that a number of problems associated with the text-mining process tend to affect the effectiveness and efficacy of decision making. Another critical issue is the handling of multilingual text, where even minor language-dependent changes create problems (Henriksson et al. 2016), and only a handful of available tools support multiple languages (Solanki 2013). Despite the growth of stock-market prediction from both the technical and fundamental analysis points of view, some scholars (Fama 1965, 1970; Malkiel 1999) hold the view that the stock market is unpredictable.

2.1 The unpredictability of the stock market

Fama (1965, 1970) and Malkiel (1999) hold the view that the stock market is stochastic and hence not predictable. This view led to two famous hypotheses, namely the random-walk hypothesis (RWH) and the efficient market hypothesis (EMH).

2.1.1 The random-walk hypothesis (RWH)

The random-walk hypothesis takes a pessimistic view of the predictability of the stock market. It holds that stock prices are fundamentally stochastic; hence, any effort to forecast or predict future stock prices will inevitably fail (Dunne 2015). If the market is indeed stochastic, there is little prospect of predicting it successfully.

2.1.2 The efficient market hypothesis (EMH)

The second hypothesis holding that the market is random, and hence not predictable, is the well-known EMH of Fama (1965), which states that the stock market is “informationally efficient”, that is, the market efficiently discovers the correct price of a stock. However, the credibility of this hypothesis is open to challenge, since Fama later revised it and divided it into three levels of efficiency: weak form, semi-strong form and strong form (Fama 1970). Which of these forms, if any, holds remains a matter of debate.

A careful study of these two hypotheses suggests that the stock market can still be predicted by those with fundamental and technical knowledge of it. That is, knowing and understanding a firm's historical stock data and its fundamental or financial data can lead to successful prediction of the firm's future stock price.

2.2 Markets’ predictability

Despite the stance of the EMH (Fama 1965) against stock-market forecasts based on historical, publicly available data and information, a considerable body of research suggests that many markets, particularly emerging markets, are not fully efficient, and that predicting future stock prices and returns may yield better outcomes than random selection (Zhang et al. 2014). Similarly, Chen et al. (2014) argue that the stock market is predictable to an extent from the viewpoint of behavioural economics and the socioeconomic theory of finance.

2.3 Machine learning

Machine learning is a branch of artificial intelligence (AI). It is a learning process that starts with identifying the learning domain and concludes with testing the learned results and employing them to solve a problem (Perwej and Perwej 2012). Many machine-learning algorithms have been developed and applied to stock-market prediction (Dunne 2015; Paik and Kumari 2017).

2.4 The general overview of predictive models

Figure 5 gives a general overview of stock-market predictive models, with fundamental data (unstructured data) or technical data (historical market data) serving as input datasets and predicted market values as the output.

Fig. 5

General overview of predictive models

2.5 Input data

The literature shows that effective economic prediction requires identifying which variables help predict other economic variables (de Oliveira et al. 2013). Financial data can generally be characterised as quantitative data (technical analysis) and qualitative data such as company reports and investor sentiment (fundamental analysis) (Li et al. 2015).

2.5.1 Quantitative (structured) data

Historical stock price (HSP)

Past stock-price data, which include the previous closing price, opening price, current closing price, price change, closing bid price, volume and closing offer price.

2.5.2 Qualitative (unstructured) data

These are largely textual information about listed firms and their stocks, and can be categorised as financial web news data (FWN), social media sentiment data (SMS) and macroeconomic variables (MVs). Lahmiri (2011) argues that these indicators are better predictors of stock-price movement than technical indicators.

2.5.3 Data pre-processing

The collected dataset is preprocessed to remove noise and data inconsistencies, followed by feature selection, data transformation and normalisation to improve performance and accuracy (Uysal and Gunal 2014).
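The normalisation method is not specified here; a common choice is min-max scaling to the range [0, 1]. The Python sketch below assumes that choice and fits the scaling parameters on the training portion only, so that no information from the test period leaks into training. The function names and the train/test split are hypothetical.

```python
def min_max_fit(train):
    """Learn the scaling parameters (min, max) from the training data only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def min_max_transform(values, lo, hi):
    """Scale values with parameters fitted on the training data."""
    return [(v - lo) / (hi - lo) for v in values]


prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]   # hypothetical closes
split = 4                                             # hypothetical split point
lo, hi = min_max_fit(prices[:split])
train_scaled = min_max_transform(prices[:split], lo, hi)
test_scaled = min_max_transform(prices[split:], lo, hi)
print(train_scaled)  # [0.0, 0.4, 0.2, 1.0]
print(test_scaled)   # [2.0, 1.6]
```

Test values can fall outside [0, 1] when prices move beyond the training range, which is expected behaviour for a scaler fitted only on past data.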

2.6 Model evaluation

Every prediction model needs to be evaluated to ascertain its accuracy. Some of the most commonly used metrics in the literature are the correlation coefficient (R), root mean squared error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE), which are defined in Javed et al. (2014), Rajashree et al. (2014) and Nayak et al. (2015) as follows.

  1. The correlation coefficient (R): this performance index reveals the degree of association between the predicted values and the actual values. It ranges from -1 to 1, and the closer R is to 1, the better the model performance.

    $$ R = \frac{\sum_{i=1}^{n} \left( t_{i} - \bar{t} \right)\left( y_{i} - \bar{y} \right)}{\sqrt{\sum_{i=1}^{n} \left( t_{i} - \bar{t} \right)^{2} \cdot \sum_{i=1}^{n} \left( y_{i} - \bar{y} \right)^{2}}} $$
    (14)

    where \( \bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_{i} \) and \( \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i} \) are the average values of \( t_{i} \) and \( y_{i} \), respectively.

  2. Root mean squared error (RMSE): this performance index estimates the residual between the actual value (\( t_{i} \)) and the predicted value (\( y_{i} \)), as given in Eq. (15).

    $$ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( t_{i} - y_{i} \right)^{2}} $$
    (15)

    where \( y_{i} \) is the predicted value produced by the model, \( t_{i} \) is the actual value and n is the total number of testing data points.

  3. Mean absolute percentage error (MAPE): this metric gives the average absolute percentage error; a lower MAPE is better than a higher one.

    $$ MAPE = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{t_{i} - y_{i}}{t_{i}} \right| $$
    (16)

  4. Mean absolute error (MAE): this index measures the average magnitude of the errors in a set of predictions, without considering their direction.

    $$ MAE = \frac{1}{n}\sum_{i=1}^{n} \left| t_{i} - y_{i} \right| $$
    (17)
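The four regression metrics can be computed in a single pass over paired actual and predicted values. The following Python sketch implements the standard definitions of R, RMSE, MAPE and MAE; the function name is hypothetical, MAPE is returned as a fraction (multiply by 100 for a percentage), and every actual value must be non-zero for MAPE to be defined.

```python
import math


def regression_metrics(actual, predicted):
    """R, RMSE, MAPE and MAE for actual values t_i and predictions y_i."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    t_bar = sum(actual) / n
    y_bar = sum(predicted) / n
    cross = sum((t - t_bar) * (y - y_bar) for t, y in zip(actual, predicted))
    ss_t = sum((t - t_bar) ** 2 for t in actual)
    ss_y = sum((y - y_bar) ** 2 for y in predicted)
    errors = [t - y for t, y in zip(actual, predicted)]
    return {
        "R": cross / math.sqrt(ss_t * ss_y),                          # Eq. (14)
        "RMSE": math.sqrt(sum(e ** 2 for e in errors) / n),           # Eq. (15)
        "MAPE": sum(abs(e / t) for e, t in zip(errors, actual)) / n,  # Eq. (16)
        "MAE": sum(abs(e) for e in errors) / n,                       # Eq. (17)
    }


# Hypothetical actual and predicted prices
m = regression_metrics([100, 200, 300], [120, 180, 300])
print(m)
```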

Volatility

Volatility predictions over 1-day, 1-week and 1-month-ahead horizons are compared in terms of the root mean squared prediction error (RMSPE), mean squared prediction error (MSPE) and mean absolute prediction error (MAPE), as defined in Minxia and Zhang (2014) and Nayak et al. (2015).

$$ RMSPE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\vartheta_{i} - \hat{\vartheta }_{i} } \right)^{2} } $$
(18)
$$ MSPE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\vartheta_{i} - \hat{\vartheta }_{i} } \right)^{2} $$
(19)
$$ MAPE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\vartheta_{i} - \hat{\vartheta }_{i} } \right| $$
(20)

where \( \vartheta_{i} \) is the realised volatility and \( \hat{\vartheta }_{i} \) is the predicted volatility.
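The three volatility error measures share the same differences between realised and predicted values, so a small Python helper can return all of them at once. This is a hedged sketch with a hypothetical function name; note that "MAPE" in Eq. (20) is the mean absolute prediction error, not the percentage error of Eq. (16).

```python
import math


def prediction_errors(realised, predicted):
    """MSPE, RMSPE and mean absolute prediction error, Eqs. (18)-(20)."""
    n = len(realised)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [v - v_hat for v, v_hat in zip(realised, predicted)]
    mspe = sum(d * d for d in diffs) / n
    return {
        "MSPE": mspe,                                  # Eq. (19)
        "RMSPE": math.sqrt(mspe),                      # Eq. (18)
        "MAPE": sum(abs(d) for d in diffs) / n,        # Eq. (20), absolute error
    }


print(prediction_errors([3, 5], [1, 5]))
```

Because Eqs. (21)-(23) for momentum have exactly the same form, the same helper can be applied to actual and predicted momentum series.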

Momentum

Momentum predictions over 1-day, 1-week and 1-month-ahead horizons are assessed in terms of MSPE, RMSPE and MAPE, as defined in Nayak et al. (2015):

$$ MSPE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {m_{i} - \hat{m}_{i} } \right)^{2} $$
(21)
$$ RMSPE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {m_{i} - \hat{m}_{i} } \right)^{2} } $$
(22)
$$ MAPE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {m_{i} - \hat{m}_{i} } \right| $$
(23)

where \( m_{i} \) is the actual momentum and \( \hat{m}_{i} \) is the predicted momentum.

The accuracy, precision, recall, and F-score of the models are evaluated as proposed in Kumar et al. (2016).

Accuracy

The accuracy of a prediction model is calculated as:

$$ Accuracy \left( \% \right) = \frac{TR + TF}{TR + FR + FF + TF} \times 100 $$
(24)

Precision

Precision is the proportion of predicted rises (or falls) in a stock that are actual rises (or falls).

$$ Precision\,for\,stock\,Rise\,(P^{R} ) = \frac{TR}{TR + FR} \times 100 $$
(25)
$$ Precision\,for\,stock\,Fall \left( {P^{F} } \right) = \frac{TF}{TF + FF} \times 100 $$
(26)

Recall

Recall is the proportion of actual rises (or falls) in a stock that are correctly predicted by the model.

$$ Recall\,for\,stock\,Rise\left( {R^{R} } \right) = \frac{TR}{TR + FF} \times 100 $$
(27)
$$ Recall\,for\,stock\,Fall\left( {R^{F} } \right) = \frac{TF}{TF + FR} \times 100 $$
(28)

where TR = number of correctly predicted rises in the stock price and TF = number of correctly predicted falls in the stock price;

FR = number of incorrectly predicted rises in the stock price and FF = number of incorrectly predicted falls in the stock price.

F-score

The F-score, denoted \( F_{sc} \), is the harmonic mean of precision and recall. It summarises the agreement between the actual stock movement (rise/fall) and that given by a predictor, weighting precision and recall equally.

$$ Score\,for\,stock\,Rise\left( {F_{sc}^{R} } \right) = \frac{{2 \times P^{R} \times R^{R} }}{{P^{R} + R^{R} }} $$
(29)
$$ Score\,for\,stock\,Fall\left( {F_{sc}^{F} } \right) = \frac{{2 \times P^{F} \times R^{F} }}{{P^{F} + R^{F} }} $$
(30)
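Equations (24)-(30) can all be derived from the four counts TR, TF, FR and FF. The Python sketch below counts them from hypothetical lists of "rise"/"fall" labels and then applies the formulas; the function name and label strings are illustrative assumptions, and a metric whose denominator is zero is reported as 0.0 by choice.

```python
def direction_metrics(actual, predicted):
    """Accuracy, precision, recall and F-score for rise/fall predictions."""
    pairs = list(zip(actual, predicted))
    tr = sum(1 for a, p in pairs if p == "rise" and a == "rise")  # correct rises
    tf = sum(1 for a, p in pairs if p == "fall" and a == "fall")  # correct falls
    fr = sum(1 for a, p in pairs if p == "rise" and a == "fall")  # wrong rises
    ff = sum(1 for a, p in pairs if p == "fall" and a == "rise")  # wrong falls

    def pct(num, den):
        return 100 * num / den if den else 0.0

    def f_score(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    p_rise, p_fall = pct(tr, tr + fr), pct(tf, tf + ff)   # Eqs. (25), (26)
    r_rise, r_fall = pct(tr, tr + ff), pct(tf, tf + fr)   # Eqs. (27), (28)
    return {
        "accuracy": pct(tr + tf, tr + fr + ff + tf),      # Eq. (24)
        "precision_rise": p_rise, "precision_fall": p_fall,
        "recall_rise": r_rise, "recall_fall": r_fall,
        "f_rise": f_score(p_rise, r_rise),                # Eq. (29)
        "f_fall": f_score(p_fall, r_fall),                # Eq. (30)
    }


actual = ["rise", "rise", "fall", "fall", "rise"]      # hypothetical outcomes
predicted = ["rise", "fall", "fall", "rise", "rise"]   # hypothetical predictions
scores = direction_metrics(actual, predicted)
print(scores["accuracy"])  # 60.0
```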

The normalised mean squared error (NMSE) assesses a financial time-series model relative to the random-walk (RW) model; it is defined in de Araújo (2010) as:

$$ NMSE = \frac{{\mathop \sum \nolimits_{j = 1}^{N - 1} \left( {target_{j} - output_{j} } \right)^{2} }}{{\mathop \sum \nolimits_{j = 1}^{N - 1} \left( {output_{j} - output_{j + 1} } \right)^{2} }} $$
(31)
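The Python sketch below implements Eq. (31) exactly as printed, with the 1-based sums over j = 1, ..., N − 1 mapped to 0-based indices. It is illustrative only and the function name is hypothetical. Many formulations of this random-walk benchmark (for example Theil's U statistic) normalise by the one-step changes of the target series rather than the model output, so the definition in de Araújo (2010) should be checked before reuse.

```python
def nmse(target, output):
    """NMSE per Eq. (31): squared errors over squared one-step output changes.

    Both sums run over j = 1..N-1 (0-based: j = 0..N-2).
    """
    n = len(target)
    if n < 2 or n != len(output):
        raise ValueError("need two sequences of equal length >= 2")
    num = sum((target[j] - output[j]) ** 2 for j in range(n - 1))
    den = sum((output[j] - output[j + 1]) ** 2 for j in range(n - 1))
    if den == 0:
        raise ValueError("output series is constant; NMSE undefined")
    return num / den


print(nmse([1, 2, 3, 4], [1, 2, 4, 4]))  # 0.2
```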

Prediction of change in direction (POCID): this indicator measures the ability of a predictive model to predict whether the future value of the series (the target of the prediction model) will decrease or increase relative to the previous value (de Araújo and Ferreira 2013).

$$ POCID = \frac{100}{N}\mathop \sum \limits_{j = 1}^{N} D_{j} $$
(32)

where

$$ D_{j} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\,(target_{j} - target_{j - 1} )(output_{j} - output_{j - 1} ) > 0} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
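A short Python sketch of POCID follows. It assumes the target and output series are indexed 0, ..., N so that all N direction comparisons in Eq. (32) are defined; the function name is hypothetical. A step counts as a hit (\( D_{j} = 1 \)) only when the actual and predicted changes have the same non-zero sign.

```python
def pocid(target, output):
    """Percentage of steps whose predicted direction matches the actual one."""
    if len(target) != len(output) or len(target) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    hits = 0
    for j in range(1, len(target)):
        # D_j = 1 when actual and predicted changes share the same sign
        if (target[j] - target[j - 1]) * (output[j] - output[j - 1]) > 0:
            hits += 1
    return 100 * hits / (len(target) - 1)    # N = number of comparisons


print(pocid([1, 2, 1, 3], [1, 3, 2, 2]))
```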

2.7 Relevant systematic reviews on stock market prediction

Although a number of studies on stock-market prediction exist, we have not found any dedicated and complete comparative review and analysis of the available systems based on the type of input data, the number of data sources, the technique used and the percentage split of training and testing data.

Baker and Wurgler (2007) reviewed investor sentiment and its effects on the stock market. Tziralis and Tatsiopoulos (2007) carried out an extensive discussion of the literature and proposed a classification scheme that categorises previous studies on market prediction into theoretical work, description, law and politics, and applications. Demyanyk and Hasan (2010) outlined the methodologies used and the experimental results achieved in various operations research and economics papers that explain, forecast or propose solutions for financial crises or bank defaults. Krollner et al. (2010b) summarised the machine-learning algorithms, input variables and performance metrics used. Nikfarjam et al. (2010) gave a brief review of text-mining methodologies for stock-market prediction.

Reviews of artificial neural networks (ANNs) in stock-market prediction have been carried out by Dase and Pawar (2010), Soni (2011), Neelima et al. (2012), Chang et al. (2013), Goel et al. (2016) and Murekachiro (2016). These works concluded that ANNs dominate stock-market prediction globally. Suthar et al. (2012) produced a comparative summary of predictive models for financial stock-market projections. Agrawal et al. (2013) presented an overview of the techniques employed in predicting the Indian stock market and the enhancements made to these techniques. In another study, Nassirtoussi et al. (2014) undertook a comprehensive systematic review of fundamental analysis techniques for stock-market prediction. Kearney and Liu (2014) surveyed the literature on textual sentiment, contrasting and relating the various data sources, content-analysis methods and experimental prototypes used to date.

Dunne (2015) analysed existing and new techniques (fundamental analysis, technical analysis and machine learning) in stock-market prediction to verify whether there is sufficient evidence to support the weak-form EMH. Renu and Christie (2018) examined the history and components of fundamental and technical analysis for stock-market decision making. Nazário et al. (2017) reviewed technical analysis on stock markets, categorising and coding published articles to summarise the research that has contributed to the development of stock-market prediction; the authors concluded that ANNs are most effective when trained with backpropagation (BP). Shobana and Umamakeswari (2016) gave a brief review of the data-mining techniques employed in 16 articles on stock-market prediction.

From the above reviews, it is evident that none of the previous studies considered:

  1. The number of data sources employed in stock prediction and how it influences the predictive models and methodology used.

  2. A comparison of self-reported accuracy among research works using the same soft-computing approach.

  3. The software packages used to build predictive models and the approach (technical analysis or fundamental analysis) used.

This paper fills this gap by reviewing past and current state-of-the-art stock-market prediction works based on the type of input data, the number of data sources and the soft-computing technique used, and by comparing their accuracy, time frames and the software packages used for modelling.

3 Research design

This section discusses the methods adopted in selecting literature and the systematic review criteria.

3.1 Research framework

One hundred and twenty-two (122) related papers were collected using a random sampling technique. These include journal articles, conference proceedings papers, doctoral dissertations, and other unpublished academic working papers and reports dated between 2007 and 2018.

First, the selected papers were grouped into three broad categories based on the input data used, namely textual data, historical market data, and a combination of textual and historical market data. Second, each group was examined based on its distinguishing features, as illustrated in the conceptual framework in Fig. 6.

Fig. 6

The conceptual framework for categorising stock predictive techniques

4 Results and discussions

This section presents the results and the discussion of the study based on the conceptual framework in Fig. 6.

4.1 Distribution of literature

Table 1 shows the distribution of the surveyed works across fundamental analysis, technical analysis and combined analysis. Eighty-one (81) were based on technical analysis (historical stock prices), twenty-eight (28) on fundamental analysis (web news, social media sentiments and macroeconomic variables) and thirteen (13) on combined analysis.
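As a quick arithmetic check on these counts, the three categories sum to the full sample and correspond to the following shares of the 122 surveyed works (rounded to one decimal place):

$$ 81 + 28 + 13 = 122, \qquad \frac{81}{122} \approx 66.4\%, \qquad \frac{28}{122} \approx 23.0\%, \qquad \frac{13}{122} \approx 10.7\% $$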

Table 1 Distribution of the literature based on categorisation

4.2 Technical analysis (quantitative stocks-market data)

The study reveals that 66% of the surveyed works were based on technical analysis, as shown in Table 1, owing to the structured nature of historical stock-price data. Again, 99.9% of the reviewed works predicted the stock market as opposed to the foreign exchange (FOREX) market, as shown in Table 2 (Appendix). The time frame and period of data collection were also examined: the minimum data-collection period was one month (Khan et al. 2011) and the maximum was 38 years (Anthony et al. 2011). The results also revealed that the most utilised technical indicator was the simple moving average (SMA). Moreover, a very high percentage of the studies that employed SMA were likely to also use EMA, MACD and RSI, as shown in Table 2 (Appendix). This outcome contradicts Krollner et al.'s (2010a) assertion that 75% of their reviewed papers depended on some form of lagged-index data. An interesting observation from the study is that as few as 3.28% of the 122 articles focused on the African market. Also, the predictive time frame of most studies was one day ahead, followed by intraday horizons of 1, 2, 3, 5 or 6 h, and then 1 week to 1 month, as shown in Table 2 (Appendix).

Table 2 References based on quantitative data

4.3 Fundamental analysis (qualitative data)

The results revealed that twenty-eight (28) of the 122 reviewed papers used fundamental analysis for stock-market prediction, and 98% of these works used sentiment analysis of social network sites (SNSs) as predictors of market movement, as shown in Table 3 (Appendix). This result is consistent with the report of Bollen et al. (2011) that analysing the daily content of Twitter feeds could raise the accuracy of DJIA predictions to 87.6%. Of these twenty-eight (28) works, social media accounted for 54%, financial web news for 29%, and search-engine queries and macroeconomic variables for 7% each. It is also revealed that macroeconomic variables as the sole data source for stock-market prediction have received little attention, as shown in Table 3 (Appendix), even though several authors (Adam and Tweneboah 2008; Kuwornu and Victor 2011; Adusei 2014; Boachie et al. 2016; Suhaibu et al. 2017; Ayub 2018; Kwofie and Ansah 2018; Pervaiz et al. 2018; Tsaurai 2018; Umoru and Nwokoye 2018) report a positive correlation between macroeconomic variables and stock-market returns. This gap in the literature creates a need for future research into stock-market prediction based on macroeconomic variables. Furthermore, the study revealed that 89% of the twenty-eight (28) works based on social media sentiments were on stock markets outside Africa. Thus, there is a need for studies measuring the influence of social media sentiment on African stock markets.

Table 3 Reference on fundamental analysis (textual data) for stock prediction

4.4 Combined (qualitative and quantitative data) analysis

As discussed in Sect. 2.6, data for stock-market prediction are quantitative, qualitative or both. Some researchers have sought to harness both data sources by formulating joint input data of fundamental and technical indicators to improve the accuracy of stock-price predictive models. The study revealed that thirteen (13) of the one hundred and twenty-two (122) reviewed works were based on this approach, as shown in Table 4 (Appendix). Of these, 77% used two (2) data sources and 23% used three (3). To the best of our knowledge, no previous study, as at the time of this paper, has used four (past stock data, social media, financial news and macroeconomic variables) or more data sources for stock-market prediction. This is another opportunity for future studies.
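Combining data sources ultimately means building one feature row per date from several tables. The sketch below inner-joins any number of `{date: {feature: value}}` tables, keeping only dates present in every source; the feature names and values in the example are made-up placeholders, and feature names are assumed to be unique across sources (a duplicate name would be overwritten by the later source).

```python
def join_sources(*sources):
    """Inner-join several {date: {feature: value}} tables on date.

    Only dates present in every source are kept, so each row carries the full
    joint feature set. Feature names are assumed unique across sources.
    """
    common = set(sources[0])
    for src in sources[1:]:
        common &= set(src)
    rows = {}
    for date in sorted(common):
        row = {}
        for src in sources:
            row.update(src[date])
        rows[date] = row
    return rows


# Illustrative made-up values for three of the four source types named above.
technical = {"2019-01-02": {"sma_5": 1.20, "rsi_14": 55.0},
             "2019-01-03": {"sma_5": 1.21, "rsi_14": 57.5}}
sentiment = {"2019-01-02": {"tweet_score": 0.3},
             "2019-01-03": {"tweet_score": -0.1}}
macro = {"2019-01-02": {"policy_rate": 17.0}}

joined = join_sources(technical, sentiment, macro)
print(joined)
```

In practice, lower-frequency series such as monthly macroeconomic variables would first need to be carried forward to daily dates; this sketch does not do that, which is why only one date survives the join above.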

Table 4 Combined (two or more data sources) analysis

4.5 Methods used for modelling and analysis

This section summarises the machine-learning algorithms used in the reviewed works. The main objective is to report what has been used and to obtain a clearer understanding of what is lacking, which could point to future research. The results reveal that 92% of the algorithms used were classification algorithms, as tabulated in Table 5 (Appendix). This implies that most of the reviewed works predicted stock-price movement (rise or fall); few predicted the actual future stock price. Further studies could therefore examine the difficulty of predicting the exact price as compared to the movement.
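The distinction between predicting movement and predicting price comes down to how the target is built from the same closing-price series. The sketch below derives both targets for a chosen horizon (for example one day or 30 days ahead); labelling an unchanged price as 0 ("no rise") is an assumption made here, and studies differ on how they treat ties.

```python
def make_targets(closes, horizon=1):
    """Build both target types for a horizon-step-ahead prediction.

    movement[t] = 1 if closes[t + horizon] > closes[t] else 0  (classification target)
    price[t]    = closes[t + horizon]                          (regression target)
    Ties are labelled 0 here, which is an assumption.
    """
    n = len(closes) - horizon
    movement = [int(closes[t + horizon] > closes[t]) for t in range(n)]
    price = [closes[t + horizon] for t in range(n)]
    return movement, price


movement, price = make_targets([10, 11, 10, 10], horizon=1)
print(movement, price)
```

A classifier only has to get the sign of the change right, while a regressor is penalised for every unit of price error, which is one reason the exact-price task is harder to score well on.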

Table 5 Methods, reported accuracy, evaluation metrics and output type

The study also reveals that DTs, SVM and ANN are the most used machine-learning algorithms in stock-market prediction, with ANN and SVM topping the list, as shown in Table 5 (Appendix). This outcome is consistent with the reports of Almeida et al. (2010) and Adebiyi et al. (2014a) that ANN and SVM achieve higher generalisation than their counterparts. Furthermore, more than 50% of the reviewed works used hybrid algorithms to compensate for the flaws of individual algorithms, and this is reflected in the higher accuracy reported for some hybrid models compared with other models of the same kind, as shown in Table 5 (Appendix). Further investigation of such hybrid algorithms for stock-market prediction may therefore yield novel insights for future researchers.

About 5% of the reviewed works used ensemble learning techniques for stock-market prediction on European markets, the Bovespa index and the Shanghai market, as shown in Table 5 (Appendix). However, Ballings et al. (2015) conclude that ensemble techniques should be benchmarked against other techniques using market data from different continents, because the accuracy of ensemble methods may differ across datasets from different continents. This is again an opportunity for future studies.

Term Frequency-Inverse Document Frequency (TF-IDF) was among the most common feature-representation techniques for textual data, and 99% of the reviewed works implemented feature-selection algorithms that depend on correlation analysis. The most used evaluation metrics identified in the literature were MSE, RMSE and MAPE. The frequent use of MSE and RMSE may be attributed to their effectiveness in measuring predictive-model performance for short-term prediction. Furthermore, MATLAB was the most used modelling tool for stock-market prediction, as shown in Table 5 (Appendix). For the prediction accuracy of stock-price movement, previous studies reported values between 36.55% and 97.8%, as shown in Table 5 (Appendix). The upper end of this range suggests that the stock market can be highly predictable.
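As a reference point for the TF-IDF representation mentioned above, the sketch below computes the plain textbook weights: term frequency as count divided by document length, and inverse document frequency as log(N / df). It uses simple whitespace tokenisation, which is an assumption; libraries differ in the details (scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf by default and L2-normalises each row), so their weights will not match these exactly.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({term: (c / total) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights


headlines = ["profit rises", "profit falls", "shares rise"]
print(tf_idf(headlines))
```

A term that appears in every headline receives weight zero, which is the intended effect: words shared by all documents carry no information for telling days apart.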

4.6 Training versus testing data volume

Every predictive model requires training and testing datasets, and Table 5 (Appendix) shows how most research works on stock-market prediction partitioned their datasets. Most of the reviewed papers allocate 70–80% of the dataset for training and the remaining 20–30% for testing. Only a few studies (de Araújo 2010; Nhu et al. 2013; Sun et al. 2016) divided the data into three sets: training, validation and testing.
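The partitions described above can be sketched as a split of time-ordered rows. The review does not state whether the surveyed studies shuffled their data; this sketch assumes a chronological split (no shuffling) so that no test observation precedes a training observation, which avoids leaking future information into training. The function name and defaults are illustrative.

```python
def chronological_split(rows, train=0.8, validation=0.0):
    """Split time-ordered rows into (train, validation, test) without shuffling.

    The first `train` fraction is for training, the next `validation` fraction
    (empty by default) for validation, and the remainder for testing.
    """
    if not 0 < train < 1 or validation < 0 or train + validation >= 1:
        raise ValueError("fractions must leave a non-empty test set")
    n = len(rows)
    i = int(n * train)
    j = int(n * (train + validation))
    return rows[:i], rows[i:j], rows[j:]


# 80/20 split, the upper end of the range reported above.
tr, va, te = chronological_split(list(range(10)))
print(tr, va, te)
```

Passing a non-zero `validation` fraction gives the three-way split used by the few studies that tuned models on a separate validation set.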

4.7 Empirical setup

To verify the key findings in the literature, three (3) of the most used machine-learning algorithms for stock-market prediction identified in Sect. 4.5 (DTs, SVM and ANN) were selected and modelled on the same data. Publicly available stock-market data from the official website of the Ghana Stock Exchange (GSE) (https://gse.com.gh) were downloaded for the period January 2009 to February 2019. The dataset includes the year high, year low, previous closing price, opening price, closing price, price change, closing bid price and closing offer. A multi-layer perceptron (MLP) was adopted as the ANN. The performance of the selected algorithms was then compared using the three most-used metrics identified in Sect. 4.5 (MSE, RMSE and MAPE). Missing values were handled by replacing each missing value with the average value. We then normalised the dataset for efficiency using Eq. (33), where \( a^{\prime} \) is the normalised value, \( a \) is the value to be normalised, and \( a_{min} \) and \( a_{max} \) are the minimum and maximum values of the dataset. The six (6) most-used technical indicators identified in Sect. 4.2 (SMA, EMA, MACD, RSI, OBV and volume ratio (VR)) were calculated from the dataset using Microsoft Excel 2013 and used as input parameters to predict a 30-day-ahead rise or fall of the stock price.
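The two preprocessing steps described above, mean imputation and the min-max normalisation of Eq. (33), can be sketched per column as follows. Applying both column by column is an assumption (the text says only "the average value" and the minimum and maximum "of the dataset"), as is the choice to map a constant column to zeros to avoid dividing by zero.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max(column):
    """Eq. (33): a' = (a - a_min) / (a_max - a_min), mapping the column onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # constant column: avoid division by zero (assumption)
    return [(v - lo) / (hi - lo) for v in column]


closing = [2.0, None, 6.0, 4.0]
print(min_max(impute_mean(closing)))
```

A common precaution, not stated in the text, is to take the mean, minimum and maximum from the training portion only and reuse them on the test portion, so that test-period values do not influence preprocessing.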

Based on the data partitioning identified in Sect. 4.6, our dataset was partitioned into two: 80% for training and 20% for testing. The implemented MLP has three hidden layers (HL): HL1 and HL2 with five (5) nodes each, and HL3 with ten (10) nodes; the maximum number of iterations was set to 5000, optimizer = limited-memory BFGS (lbfgs) and activation = relu. For the SVM, the radial basis function (RBF) kernel was used with regularisation parameter C = 100. The DT settings were criterion = entropy and max_depth = 4. The experiments were conducted using the scikit-learn library in Python (in which the MLP, DT and SVM are already implemented) under the Anaconda distribution.

$$ a^{\prime} = \frac{a - a_{min}}{a_{max} - a_{min}} $$
(33)
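The model settings reported for the experiment map onto scikit-learn estimators roughly as sketched below. This is not the authors' script: the classifier variants are chosen because the stated target is a rise or fall, the "optimizer" in the text corresponds to scikit-learn's `solver` argument, unstated hyperparameters keep library defaults, and the random feature matrix and labels are placeholders standing in for the six normalised indicators and the 30-day-ahead label.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """The three models with the hyperparameters stated in the text; others are library defaults."""
    return {
        "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(5, 5, 10), solver="lbfgs",
                                   activation="relu", max_iter=5000, random_state=seed),
        "SVM": SVC(kernel="rbf", C=100),
        "DT": DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=seed),
    }


# Placeholder data: 200 rows of six random features in [0, 1) standing in for the
# normalised indicators, and a synthetic 0/1 rise/fall label.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0.5).astype(int)

split = int(0.8 * len(X))  # 80/20 split, kept in time order (no shuffling)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

scores = {}
for name, model in build_models().items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # classification accuracy on the test rows
print(scores)
```

With real GSE data, `X` would hold the indicator columns after imputation and Eq. (33) normalisation, and the test-set predictions would be scored with the error metrics discussed in the results.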

Figure 7 shows the outcome of the experiment. The error-metric values (RMSE, MAE and MSE) of the ANN model (0.093, 0.009 and 0.00086), compared with those of the DT (0.0947, 0.024 and 0.00897) and SVM (0.0973, 0.063 and 0.00947), reveal a better fit of the ANN for stock-market prediction than the SVM and DT models. The DT model, in turn, gave a smaller error margin between the predicted and actual values than the SVM model. The experimental outcome therefore agrees with the high percentage of stock-market prediction studies that use ANN, as shown in Table 5 (Appendix). Likewise, the better performance of the DT over the SVM may help explain why it is among the most used machine-learning algorithms for building predictive models in stock-market prediction.
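For reference, the error metrics used above (and MAPE, named among the most-used metrics in Sect. 4.5) can be computed as sketched below. These are the standard definitions; the function names are illustrative. Note that MAPE divides by the actual value, so it is undefined whenever an actual value is zero, which matters if the target is encoded as a 0/1 rise-or-fall label.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

Because RMSE is the square root of MSE, reported pairs can be cross-checked: 0.0947² ≈ 0.00897 (DT) and 0.0973² ≈ 0.00947 (SVM), while 0.093² ≈ 0.0086 for the ANN.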

Fig. 7 Experimental results

5 Summary of findings

The extensive literature survey in this paper was undertaken to identify and assess academic articles on stock-market price and movement prediction from as many sources of stock-market prediction research as possible. This resulted in the identification and assessment of one hundred and twenty-two (122) relevant articles on stock-market prediction published between 2007 and 2018. We do not claim that this review is exhaustive, nor does it give detailed practical insight into the state of predictive-model research. The reviewed experiments show a strong bias towards technical indicators as input variables, which leaves a gap for future research that combines behavioural and fundamental input variables.

The most commonly used technical indicators for stock-market prediction were found to be SMA, EMA, MACD, RSI and rate of change (ROC), which confirms the findings of Krollner et al. (2010a) and Renu and Christie (2018). Moreover, none of the one hundred and twenty-two (122) reviewed works incorporated variables from past stock data, financial news, macroeconomic data and social-media sentiment together as the input dataset. If all these data sources served as input to a predictive model, higher prediction accuracy might be obtained, as argued by Geva and Zahavi (2014).

More than 87% of the reviewed papers reported that their model beat its benchmark model. However, some previous studies did not account for real-world constraints such as slippage and trading costs. A high percentage of stock-market prediction studies were carried out on Asian and European stock markets, but Kumar and Thenmozhi (2006) and Ballings et al. (2015) argue that benchmarking ensemble machine-learning algorithms against other techniques on markets from different continents is highly necessary, because some learning techniques tend to perform better, with higher accuracy, in some parts of the globe than in others.

Finally, an experiment with stock data from the Ghana Stock Exchange showed that the artificial neural network fitted the data better than the support vector machine and decision tree models, based on the RMSE, MAE and MSE error metrics.

6 Conclusion

Previous works have reviewed the literature on fundamental and technical analysis (Nazário et al. 2017; Renu and Christie 2018), while Dase and Pawar (2010), Soni (2011), Neelima et al. (2012), Chang et al. (2013) and Murekachiro (2016) reviewed machine-learning algorithms applied in stock prediction.

This study, by contrast, reviewed the pertinent literature on the fundamental and technical analyses used in stock-market prediction. In brief, it focused mainly on:

  1. The nature of the dataset and the number of data sources used.
  2. The data timeframe, the machine-learning algorithms and the tasks used.
  3. A comparison of self-reported accuracy, error metrics and software packages used for modelling in previous studies.
  4. An experimental setup to verify the findings in the literature.

The results revealed that ANN and SVM are the most commonly used machine-learning algorithms for stock prediction. However, much ongoing research seeks to improve stock prediction accuracy using hybrid and ensemble machine-learning methods. The review also suggests that considering both internal and additional external factors could yield more precise and accurate predictions. In addition, only a small percentage of market-prediction studies have been conducted on African markets, despite the volume of articles on stock prediction.

6.1 Direction for future research

Many gaps were identified as opportunities for future studies in Sect. 4; however, our future work will focus on the performance of ensemble techniques over diverse stock-data from different continents.