1 Introduction

Technical indicators, fundamental analysis, and statistical methods are all used for stock price prediction. The main condition that should be satisfied to gain more profit in the stock market is the “efficient market hypothesis (EMH)”. In other words, if the market is efficient, the prediction will be effective. Under the EMH, information has a significant impact on stock prices, and prices adjust themselves according to that information [1]. An efficient market ensures that investors have access to similar information. It rests on the assumption that no system can beat the market, because once such a system becomes public, everybody will use it and the market loses its potential profitability [2].

Neural networks are used for the prediction of stock prices because they are able to recognize nonlinear relationships between inputs and outputs [3]. Many researchers, including economists and financial experts, have acknowledged the chaos in the stock market and other complex systems [4]. With the capability of neural networks to learn nonlinear relationships, we can overcome the drawbacks of traditional analysis and other computational methods [5]. In addition to stock market prediction, neural networks are used for other financial tasks. Many neural-network-based systems have been implemented to track the demand for products in the market. Neural networks are also used in forecasting futures markets, Forex trading, financial planning, and corporate stability and bankruptcy [6]. While banks use neural networks to investigate loan applicants and estimate the probability of bankruptcy, financial managers use them for planning and for building profitable portfolios at the right time. As investment and transaction levels grow, people are looking for tools and methods such as neural network systems to maximize their profitability and minimize their risk.

Artificial neural networks are one of the main tools in machine learning. Machine learning and deep learning have become popular and effective methods that finance organizations commonly use to maximize their profits [7]. However, financial time series are highly nonlinear and their data appear to be completely random [8]. Traditional time series methods such as ARIMA and GARCH models are effective only when the time series is stationary [9]. This assumption is restrictive and requires the series to be preprocessed. Moreover, the main problem arises when these models are implemented in a live trading system, where there is no guarantee of stationarity as new data are added. Using neural networks can help solve this problem.

ANN also has some limitations and weaknesses. For example, the training phase is crucial: the network may suffer from overtraining, convergence or divergence problems, the risk of being trapped in local minima or maxima, and so forth. One of the main ways to overcome these drawbacks is to use hybrid models, in which meta-heuristic algorithms are combined with artificial neural networks as a robust method for prediction. This approach has advantages such as powerful exploration and exploitation, acceptable computational time, and being user-friendly.

In this paper, a genetic algorithm (GA) and a hybrid binary grey wolf optimization and particle swarm optimization algorithm (GWO-PSO) are used to choose the most appropriate input variables (i.e., feature selection). Applying GA as a feature selection (FS) method is common in the literature, but we apply GWO-PSO as well to examine its exploitation and exploration features. Several meta-heuristic algorithms, namely harmony search (HS), particle swarm optimization (PSO), moth flame optimization (MFO), modified particle swarm optimization (MPSO), modified particle swarm optimization with time-varying acceleration coefficients (MPSO-TVAC), the whale optimization algorithm (WOA), and the chimp optimization algorithm (ChOA), are also used to improve the prediction power of the artificial neural network (ANN) and to minimize the network error by obtaining optimized weights and the best number of hidden layers in the ANN. These metaheuristic algorithms differ in their mechanisms, such as the generation of the initial population, the exploration of the search space, the way an optimal solution is found, and the risk of being trapped in local minima or maxima. To compare the performance of the proposed algorithms and to choose the best one, we introduce eight estimation criteria for error assessment. In this regard, we collect the stock price data of the Khodro company over 5 years, from 2013 through 2018. Khodro is a big company in the automobile industry in Iran. We access these data through the TseClient software. In addition, we apply four software packages to process the data: (1) Microsoft Excel for getting data; (2) Alyuda NeuroIntelligence for data normalization and processing; (3) MATLAB for training the network; and (4) Neural Designer for additional details and complementary findings or analysis.

The experimental results show that a hybrid WOA has the best performance. Additionally, applying hybrid models can make the prediction more robust and offers advantages such as faster calculations, compatibility with complex data structures, and being more user-friendly than time series models.

To sum up, the main contributions of this paper are as follows:

  • First, we apply several different metaheuristic algorithms from different categories, such as evolutionary algorithms and swarm-based algorithms, and we compare their performance with a time series model called ARIMA. Applying different algorithms leads to different and interesting results. As such, we can compare and explain their pros and cons in practice.

  • Second, we attempt to use different hybrid metaheuristic algorithms (i.e., GWO-PSO) as feature selection methods.

  • Third, we analyze EMH by running some experiments.

In this paper, we want to answer the following questions:

  • Do hybrid neural networks have better results (i.e., high predictability with less error)?

  • Does the use of genetic algorithms to determine technical indicators affect the network error rate and computational speed?

  • Which hybrid algorithms have better predictive performance?

The structure of the paper is as follows. Section 2 is the literature review and covers papers on stock price prediction using different techniques, especially machine learning. Section 3 presents the methodology and the related formulas and equations, and introduces the techniques used. Section 4 is dedicated to findings and results; in this part, we compare the results and present the best methods based on error and predictability. Finally, the last section gives the conclusion and recommendations for future research.

2 Literature review

A stock market is a public market where the stocks of companies are traded [10]. This market provides opportunities for brokers and companies to invest and is one of the main indicators of the economic situation in each country. The stock market is characterized by features such as non-linearity, discontinuity, and volatile multifaceted elements because it is related to many factors such as political events, general economic conditions, and brokers' expectations [11]. Nowadays, data are processed quickly with high-tech tools, and the advent of communication systems causes stock prices to fluctuate very fast. As such, many banks, financial institutions, big investors, and brokers have to trade stocks within the shortest possible time [12]. Gaining more profit is the main goal of investors, so many researchers are looking for ways to forecast market behavior [13]. Based on the literature, there are two main viewpoints about market efficiency. The first is that markets are efficient and, as a result, returns cannot be predicted completely [14]. The second is that markets are inefficient and abnormal returns are possible.

The ANN is considered the best and most verified method for the prediction of stock prices [15]. There are many methods for training the ANN, and some of them are better than others at finding linear and non-linear relationships. Researchers have tried to introduce methods that have more accuracy and less error within an acceptable computation time. That is why metaheuristic algorithms are frequently used in this context. These algorithms are used to optimize the network and to find the best number of input and hidden layers. It has been shown that ANN models outperform traditional statistical models in forecasting stock prices, stock returns, exchange rates, and inflation [16, 17].

Göçken et al. [18] used technical indicators and hybrid ANN models with GA and HS to predict the price index in the Turkish stock market. The results showed that the error of the hybrid meta-heuristic models is lower than that of a simple ANN. They compared the hybrid ANN-HS with the ANN-GA model and found that the ANN-HS error is lower than the ANN-GA error. Qiu et al. [19] implemented fuzzy surfaces to select the optimal input variables. In their study, the optimal set of initial weights and biases is determined by means of GA or SA to increase the accuracy of the ANN. Hassanin et al. [20] used GWO to provide the ANN with good initial solutions. The results showed that the GWO-based ANN outperforms both the GA-based ANN and the PSO-based ANN. Faris et al. [21] reported that their approach gives very competitive results for the set of weights and biases of multi-layer perceptron networks. In addition, GA, PSO, DE, FFLY, and cuckoo search were used to compare the performance of the proposed method. Rather et al. [22] observed that the field of hybrid forecasting techniques has received a lot of attention from researchers seeking to form a robust model. Chong et al. [23] predicted the future market trend of South Korea by examining the effect of three unsupervised feature extraction methods (PCA, autoencoder, and restricted Boltzmann machine (RBM)) on a deep learning network with three loss functions: NMSE, RMSE, and MSE. Sezer et al. [24] proposed a stock trading system based on a deep neural network for buy–sell–hold predictions. The GA is used to optimize the technical analysis parameters and create the buy–sell points in the system.

Di Persio and Honchar [25] applied three different recurrent neural network (RNN) approaches, namely a basic RNN, the LSTM, and the gated recurrent unit (GRU), to the Google stock price to evaluate which variant of RNN performs better. The results show that the LSTM outperformed the other variants with a 72% accuracy rate on a 5-day horizon. The authors also explained the hidden dynamics of the RNN. Ahmed et al. [26] used ant colony optimization (ACO) to forecast stock prices on the Nigerian stock exchange. They compared ACO with three other methods: a price momentum oscillator, a stochastic method, and a moving average method. They concluded that ACO is more accurate, with lower error, than the other methods. Ghanbari and Arian [27] used support vector regression (SVR) and the butterfly optimization algorithm (BOA) to predict the stock market. They presented a novel BOA-SVR model and compared it with eleven other meta-heuristic algorithms on a number of stocks from NASDAQ. The results indicated that the presented model is capable of optimizing the SVR parameters very well. Indeed, it is one of the best models with regard to prediction accuracy and time consumption.

Kumar et al. [28] reviewed and organized the published papers on stock market prediction using computational intelligence. The papers were organized according to their datasets, input variables, pre-processing methods, feature selection techniques, forecasting methods, and the performance metrics used to evaluate the methods.

Farahani and Hajiagha [29] used ANN to predict five economic indicators: S&P500, DAX, FTSE100, Nasdaq, and DJI. They trained the network with new metaheuristic algorithms such as social spider optimization (SSO) and the bat algorithm (BA), using technical indicators as input variables. They then used a genetic algorithm (GA) as a heuristic for feature selection and for choosing the best indicators. Loss functions such as mean absolute error (MAE) served as error evaluation criteria. They also used time series forecasting models such as ARMA and ARIMA for the prediction of stock prices. Finally, they compared the results of the ANN-metaheuristic models with those of the time series models.

Recent papers on forecasting stock prices using neural networks, together with their methods, are listed in Table 1.

Table 1 Recent researches about forecasting the stock price using ANN and other methods

As Table 1 shows, researchers have tried to use hybrid models to obtain better results, and they have been successful. According to the results, the main merits of the hybridization technique are as follows:

  • Decreasing computation time

  • Decreasing model complexity

  • Avoiding local minima or maxima trap

  • Avoiding fast convergence, etc.

To clarify further, limitations of the previous methods and advantages and disadvantages of methods used in these articles are provided in Tables 2 and 3.

Table 2 Limitations of the previous methods
Table 3 Strengths and weaknesses of considered methods

According to Table 2, we cannot say which method is better because each one has its own pros and cons. Some methods have more capabilities, such as compatibility with non-linear data structures and faster calculations. On the other hand, they may have limitations such as being hard to train and being sensitive to noise and outliers. So, the method to use depends on the type of problem. The strengths and weaknesses of the applicable algorithms are presented in Table 3. This table shows that no method is perfect or free of limitations.

3 Methodology

3.1 Input variables selection

This section describes the input variables selection methodology. Initially, for each case, 42 technical indicators are investigated as input variables. This number of input variables increases the complexity of the model and at some point, they do not provide extra information. For this reason, we use GA to select the most informative input variables. As such, using GA we can evaluate the usefulness of indicators or eliminate irrelevant ones to simplify the proposed model. Table 4 demonstrates all considered technical indicators as input variables [18, 38].

Table 4 Important and most common technical indicators as input variables

In Table 4, the stochastic indicators (%\(K\) and %\(D\)) have two types: fast and slow. High and low are the maximum and minimum prices over the previous \(n\) periods, respectively. For the RSI indicator, the average gain and average loss are defined as follows:

$$\begin{aligned} \text{Average gain} &= \frac{\left( \text{previous average gain} \right) \times 13 + \text{current gain}}{14}, \\ \text{Average loss} &= \frac{\left( \text{previous average loss} \right) \times 13 + \text{current loss}}{14}. \end{aligned}$$
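The recursive update above can be illustrated with a minimal Python sketch. The function names `wilder_update` and `smoothed_gain_loss` are hypothetical, and seeding the first averages with simple means of the first 14 gains and losses is an assumption, since the text only gives the recursive step.

```python
def wilder_update(prev_avg, current, period=14):
    """One recursive step: (previous average * (period - 1) + current) / period."""
    return (prev_avg * (period - 1) + current) / period


def smoothed_gain_loss(closes, period=14):
    """Return (average gain, average loss) after processing all closing prices.

    Assumption: the first averages are simple means of the first `period`
    gains and losses; later values use the recursive update.
    """
    changes = [b - a for a, b in zip(closes, closes[1:])]
    if len(changes) < period:
        raise ValueError("need at least period + 1 closing prices")
    gains = [max(c, 0.0) for c in changes]    # upward moves only
    losses = [max(-c, 0.0) for c in changes]  # downward moves, as positive numbers
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = wilder_update(avg_gain, gain, period)
        avg_loss = wilder_update(avg_loss, loss, period)
    return avg_gain, avg_loss
```

For instance, a previous average gain of 1 and a current gain of 15 give (1 × 13 + 15) / 14 = 2.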

For the Bollinger band indicator, \(\mathrm{MA}\) stands for moving average, \(\mathrm{TP}\) means typical price, \(n\) is the number of periods, which is usually 20, and \(m\) is the number of standard deviations, which is often 2. The last notation, \(\sigma [\mathrm{TP}, n]\), is the standard deviation of \(\mathrm{TP}\) over the last \(n\) periods.
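As an illustration of this notation, the following sketch computes the bands for the most recent \(n\) periods. It assumes the usual definitions, which Table 4 may state differently: typical price TP = (high + low + close) / 3, bands at MA(TP, n) ± m·σ[TP, n], and a population standard deviation. The function name `bollinger_bands` and the default `m = 2.0` (the conventional multiplier) are choices made here for illustration.

```python
import statistics


def bollinger_bands(highs, lows, closes, n=20, m=2.0):
    """Return (lower, middle, upper) bands over the last n periods.

    Assumptions: TP = (high + low + close) / 3 and a population
    standard deviation for sigma[TP, n].
    """
    if not (len(highs) == len(lows) == len(closes)):
        raise ValueError("price series must have equal length")
    if len(closes) < n:
        raise ValueError("need at least n periods")
    tp = [(h + l + c) / 3.0
          for h, l, c in zip(highs[-n:], lows[-n:], closes[-n:])]
    middle = statistics.fmean(tp)   # MA(TP, n)
    sigma = statistics.pstdev(tp)   # sigma[TP, n]
    return middle - m * sigma, middle, middle + m * sigma
```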

3.2 Artificial neural network (ANN) model

At first, ANN is applied without adding any algorithm, and then hybrid ANN is used for selecting input variables and determining the number of input and hidden layers. In this article, we consider a multi-layer perceptron (MLP) with three layers (an input layer, an output layer, and one hidden layer). The input layer includes 42 input variables, which means there are 42 neurons in the input layer. Because the output layer has one variable, it has one neuron. The number of neurons in the hidden layer is obtained through trial and error: we examine 1–32 neurons in the hidden layer and choose the number that gives the most accurate results. For training the ANN, we use error back-propagation. The minimization algorithm used in learning the model is the Levenberg–Marquardt (LM) algorithm, which is used to find the minimum error point [39]. The number of training epochs is 1000 and the initial training rate is 0.01. We decrease this rate to 0.001 in order to obtain more accurate results. The output function of the hidden layer is the sigmoid function and the threshold function of the output layer is a linear function. Figure 1 represents the architecture of the proposed neural network [18].

Fig. 1 Architecture of the proposed neural network

In Fig. 1, \(P\) is the input pattern, \({b}_{1}\) is the vector of bias weights on the hidden neurons, and \({w}_{1}\) is the weight matrix between the 0th (i.e., input) layer and the 1st (i.e., hidden) layer. \({\mathrm{a}}_{1}\) is the vector containing the outputs from the hidden neurons, and \({\mathrm{n}}_{1}\) is the vector containing the net inputs going into the hidden neurons. \({\mathrm{a}}_{2}\) is the column-vector coming from the second (output) layer, and \({\mathrm{n}}_{2}\) is the column-vector containing the net inputs going into the output layer. \({\mathrm{w}}_{2}\) is the synaptic weight matrix between the 1st (i.e., hidden) layer and the 2nd (i.e., output) layer, and \({\mathrm{b}}_{2}\) is the column-vector containing the bias inputs of the output neurons. Each row of the \({\mathrm{w}}_{2}\) matrix contains the synaptic weights for the corresponding output neuron [18].
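To make the notation concrete, the forward pass of such a network can be sketched in plain Python as follows. This is only an illustration of the symbols above (sigmoid hidden layer, linear output layer); the function names are hypothetical and the weights in the usage note are arbitrary, not trained values from the study.

```python
import math


def sigmoid(x):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(p, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer MLP.

    p  : input pattern (list of floats)
    w1 : hidden weight matrix, one row per hidden neuron
    b1 : hidden bias vector
    w2 : output weight matrix, one row per output neuron
    b2 : output bias vector
    """
    # n1: net inputs of the hidden neurons, a1: their sigmoid outputs
    n1 = [sum(w * x for w, x in zip(row, p)) + b for row, b in zip(w1, b1)]
    a1 = [sigmoid(v) for v in n1]
    # n2: net inputs of the output layer, a2: linear (identity) output
    n2 = [sum(w * h for w, h in zip(row, a1)) + b for row, b in zip(w2, b2)]
    a2 = list(n2)
    return a2
```

With a zero input pattern, every hidden neuron outputs 0.5, so `forward([0.0, 0.0], [[1.0, 1.0], [2.0, -1.0]], [0.0, 0.0], [[2.0, 4.0]], [1.0])` returns `[4.0]`.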

This study includes two main parts. The first is calculating the technical indicators and selecting the most informative ones by using GA. The second is predicting the closing price with different hybrid ANN models and comparing their prediction errors. Figure 2 represents the research methodology [40] and the role of metaheuristic algorithms in the article. In this regard, we divide the stock price data from 2013 to 2018 into two parts: training and testing. The data are then analyzed with artificial intelligence algorithms, and we predict the next-day closing stock price. We use 70% of the data for training and the remaining 30% for validation and testing. Afterward, we compare the models using 8 criteria for prediction error. In this research, we used 42 technical indicators as input variables. To make them usable as input variables, they should be scaled and normalized between − 1 and 1, so that the largest number becomes "1" and the smallest number becomes "− 1". This can be done with the Alyuda NeuroIntelligence software. In Eq. 1, the index \(i\) runs over the \(N\) data points.

$$ \widetilde{S_{i}} = \frac{S_{i} - S_{\min}}{S_{\max} - S_{\min}}, \quad i = 1, 2, \ldots, N. \tag{1} $$
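A small sketch of Eq. 1 follows. Note that Eq. 1 as written maps the data onto [0, 1]; the additional affine step `2s − 1` shown in `to_symmetric` is an assumption about how the [− 1, 1] range mentioned in the text could be reached, and is not taken from the source. Both function names are hypothetical.

```python
def min_max(values):
    """Eq. 1: scale each value to [0, 1] using the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]


def to_symmetric(scaled):
    """Assumed extra step: map [0, 1] onto [-1, 1] with 2 * s - 1."""
    return [2.0 * s - 1.0 for s in scaled]
```

For example, the series 10, 15, 20 becomes 0, 0.5, 1 under Eq. 1 and − 1, 0, 1 after the assumed rescaling.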
Fig. 2 Research methodology

3.2.1 Hybrid GA-ANN model

In this method, GA is used as a feature selection approach. The applied encoding approach is a binary solution representation. Each chromosome contains 47 bits, of which the first 42 bits represent the existence or nonexistence of the input (technical indicator) variables: "1" represents the existence and “0” the non-existence of the corresponding variable. The remaining five bits encode a value from 1 to 32 (\({2}^{5}\)=\(32\)), which gives the number of neurons in the hidden layer. The population size of GA is assumed to be 20 [41]. The first population is generated randomly. The fitness function is the mean square error (MSE), and the individual with the smallest MSE is the preferred choice for the next forecasting period. To speed up the training phase, the number of epochs is set to 100. At first, the training (learning) rate is 0.01, which decreases during the iterations based on the obtained results. By increasing the number of epochs to 1000, it is possible to get better results. The considered parameters of the genetic algorithm are summarized in Table 5:

Table 5 GA parameters
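The 47-bit encoding described above can be sketched as follows. The mapping of the last five bits to the range 1–32 (binary value plus one, most significant bit first) is an assumption, since the text does not specify it; `random_chromosome` and `decode` are hypothetical names.

```python
import random

N_FEATURES = 42     # one bit per technical indicator
N_HIDDEN_BITS = 5   # 2**5 = 32 possible hidden-layer sizes


def random_chromosome(rng):
    """Generate one random 47-bit individual."""
    return [rng.randint(0, 1) for _ in range(N_FEATURES + N_HIDDEN_BITS)]


def decode(chromosome):
    """Split a chromosome into (selected indicator indices, hidden neurons).

    Assumption: the five trailing bits are read as a binary number
    (most significant bit first) and shifted by one to cover 1..32.
    """
    if len(chromosome) != N_FEATURES + N_HIDDEN_BITS:
        raise ValueError("chromosome must contain 47 bits")
    selected = [i for i, bit in enumerate(chromosome[:N_FEATURES]) if bit == 1]
    value = 0
    for bit in chromosome[N_FEATURES:]:
        value = value * 2 + bit
    return selected, value + 1
```

A chromosome whose last five bits are all 1 decodes to 32 hidden neurons, and one whose last five bits are all 0 decodes to a single hidden neuron.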

More details about GA mechanism as feature selection can be seen in Table 6.

Table 6 GA-ANN algorithm

Figure 3 represents the related flowchart of GA-ANN [19].

Fig. 3 Considered GA flowchart for training ANN

Among the 20 parents and 20 generated children, we select the 20 best individuals as the new generation. Each new generation repeats this procedure until a termination condition is reached. The first termination condition is that the best individual remains unchanged for 100 generations. If this condition is not met, the algorithm stops at the maximum number of iterations, which equals 2000. The mutation and crossover operators are shown in Fig. 4.

Fig. 4 Cross-over and mutation operator

The crossover of two parent strings produces offspring (new solutions) by swapping parts or genes of the chromosomes. Crossover is applied with a high probability, typically in the range of 0.8–0.95.
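The operators and the survivor selection described above can be sketched as follows. The specific operator types (single-point crossover and bit-flip mutation) are assumptions, since the text does not name them, and `fitness` stands for any function returning the MSE of the network built from a chromosome.

```python
import random


def single_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def select_survivors(parents, children, fitness, size=20):
    """Keep the `size` individuals with the lowest fitness (e.g. MSE)."""
    return sorted(parents + children, key=fitness)[:size]
```

Merging parents and children before ranking means the best individual found so far is never lost between generations.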

3.2.2 Hybrid PSO-ANN model

PSO begins with an initial population and moves toward an optimal solution over successive iterations [42]. In each iteration, two solutions are specified (\({X}_{j}^{Gbest}\) and \({X}_{j}^{i,pbest}\)), which represent the best location found by all particles and the best location found by the current particle, respectively. In each iteration, each particle sets its location in the search space with regard to the global best location and its own best location [42] (see Table 7).

Table 7 PSO parameters
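A single particle update can be sketched as below. The formula is the common textbook form of the PSO velocity and position update and is not quoted from the source; the coefficient values `c1 = c2 = 2.0` and the optional inertia factor `w` (set to 1 so that the previous velocity is carried over unchanged) are assumptions, and the actual settings are those of Table 7.

```python
import random


def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0, w=1.0, rng=None):
    """Return the new (position, velocity) of one particle.

    x, v   : current position and velocity (lists of floats)
    pbest  : best position found so far by this particle
    gbest  : best position found so far by the whole swarm
    c1, c2 : acceleration coefficients (assumed values)
    w      : inertia factor on the previous velocity (assumed, 1.0 = none)
    """
    rng = rng or random.Random()
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vel)
        new_x.append(xi + vel)
    return new_x, new_v
```

When a particle already sits at both its personal best and the global best, the two attraction terms vanish and it simply keeps moving with its previous velocity.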

In this study, we perform seven steps for training neural network by PSO that are summarized as follows:

  1. Collecting data.

  2. Creating network.

  3. Estimating network.

  4. Initializing weights and biases.

  5. Training network by PSO.

  6. Validating network.

  7. Using network.

3.2.3 Hybrid HS-ANN model

In this study, we use the HS algorithm to train the ANN and find the fittest number of input and hidden layers. HS consists of three basic phases: initialization, improvisation of a harmony vector, and updating the harmony memory (HM) [43]. In addition, the other parameters of HS should be determined. These parameters are the harmony memory size (HMS), which equals 100; the harmony memory considering rate (HMCR), which equals 0.95; the pitch adjusting rate (PAR), which is 0.3; and the bandwidth (bw), which is 0.2. The HM can be represented as an \(\mathrm{HMS}\times (N+1)\) matrix, where \(N\) is 42. The HS parameters are listed in Table 8.

Table 8 HS parameters
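The improvisation and memory-update phases can be sketched as follows, using the HMCR, PAR, and bw values given above as defaults. The update rules follow the commonly described form of harmony search and are an assumption here, as are the scalar bounds `lower` and `upper` and the function names.

```python
import random


def improvise(memory, lower, upper, hmcr=0.95, par=0.3, bw=0.2, rng=None):
    """Build one new harmony vector from the harmony memory.

    With probability hmcr a value is taken from memory (and, with
    probability par, shifted by at most bw); otherwise it is drawn
    uniformly from [lower, upper].
    """
    rng = rng or random.Random()
    new = []
    for j in range(len(memory[0])):
        if rng.random() < hmcr:
            value = rng.choice(memory)[j]          # memory consideration
            if rng.random() < par:
                value += bw * rng.uniform(-1.0, 1.0)  # pitch adjustment
        else:
            value = rng.uniform(lower, upper)      # random selection
        new.append(min(max(value, lower), upper))  # keep within bounds
    return new


def update_memory(memory, costs, candidate, cost):
    """Replace the worst stored harmony if the candidate has a lower cost."""
    worst = max(range(len(costs)), key=costs.__getitem__)
    if cost < costs[worst]:
        memory[worst], costs[worst] = candidate, cost
        return True
    return False
```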

3.2.4 Hybrid GWO-PSO algorithm

This hybrid algorithm combines attributes of the GWO and PSO optimization algorithms: it couples the exploitation ability of PSO with the exploration ability of GWO in order to obtain the strengths of both optimizers [40] (see Footnote 1).

3.2.5 MPSO algorithm

The PSO velocity equation has three parts: the first part is the previous velocity of the particle, and the second and third parts are the ones that contribute to the change in the velocity of the particle [44]. MPSO modifies this model by introducing a parameter called the inertia weight (see Footnote 2).

3.2.6 Hybrid MPSO-TVAC algorithm

To improve the quality of PSO in the optimization process and find the best solution, a novel modified PSO with time-varying acceleration coefficients (MPSO-TVAC) is proposed [46]. This method has a new parameter that increases the exploration capability and thus decreases the chance of being trapped in a local optimum (see Footnote 3).

3.2.7 MFO algorithm

MFO is an optimization algorithm that was proposed in 2016 by Mirjalili [47]. In the MFO algorithm, moths are candidate solutions and the positions of the moths in the space are the problem's variables (see Footnote 4).

3.2.8 WOA

WOA is designed based on the hunting technique used by humpback whales [48]. They have a hunting mechanism called the bubble-net feeding method. Humpback whales try to create bubbles and then encircle and attack the prey. They update their positions based on the current best candidate and near-optimal solution. After considering the best candidate, they update their positions based on the best search agent. The following steps are needed for the operation of WOA.

  Step 1. The standard whale optimization algorithm starts by setting the initial values of the population size \(n\), the parameter \(a\), the coefficients \(A\) and \(C\), and the maximum number of iterations max_itr.

  Step 2. Initialize the iteration counter \(t\).

  Step 3. The initial population of size \(n\) is generated randomly, and each search agent \({x}_{i}\) in the population is evaluated by calculating its fitness function \(f({x}_{i})\).

  Step 4. Assign the best search agent \(X\).

  Step 5. The following steps are repeated until the termination criterion is satisfied.

    Step 5.1. Update the iteration counter \(t = t + 1\).

    Step 5.2. All the parameters \(a\), \(A\), \(C\), \(l\), and \(p\) are updated.

    Step 5.3. Exploration and exploitation are applied according to the values of \(p\) and \(|A|\).

  Step 6. The best search agent \(X\) is updated.

  Step 7. The overall process is repeated until the termination criterion is satisfied.

  Step 8. Determine the best search agent (solution) found so far (\(X\)).
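The steps above can be sketched in Python as follows. The position-update formulas (encircling, random search, and the spiral move with shape constant 1) follow the commonly published form of WOA and are not given in this section, so they should be read as assumptions; the objective `f` stands in for the network error, and all parameter defaults are illustrative.

```python
import math
import random


def woa_minimize(f, dim, n_agents=20, max_itr=200, lower=-5.0, upper=5.0, seed=0):
    """Minimal whale optimization sketch; returns (best position, best cost)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    # Steps 1-4: random initial population and initial best agent
    agents = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_agents)]
    best = min(agents, key=f)[:]
    best_cost = f(best)

    for t in range(max_itr):                 # Step 5: main loop
        a = 2.0 - 2.0 * t / max_itr          # a decreases linearly from 2 to 0
        for i, x in enumerate(agents):
            A = 2.0 * a * rng.random() - a   # Step 5.2: update A, C, p, l
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:                      # Step 5.3
                # |A| < 1: exploit around the best agent; otherwise explore
                ref = best if abs(A) < 1.0 else agents[rng.randrange(n_agents)]
                new = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # spiral (bubble-net) move around the best agent
                new = [abs(bj - xj) * math.exp(l) * math.cos(2.0 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            agents[i] = [clip(v) for v in new]
            cost = f(agents[i])
            if cost < best_cost:             # Step 6: update the best agent
                best, best_cost = agents[i][:], cost
    return best, best_cost                   # Step 8
```

In the hybrid setting, a position vector would hold the network weights and biases, and `f` would return the training error for those weights.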

3.2.9 ChO algorithm

Generally, the hunting process of chimps is divided into two main phases: exploration, which consists of driving, blocking, and chasing the prey, and exploitation, which consists of attacking the prey [49].

The chimps' hunting behavior, that is, driving, blocking, chasing, and attacking, has been modeled mathematically (see Footnote 5).

3.3 ARIMA forecasting model

The auto-regressive integrated moving average (ARIMA) model is used for modeling time series that are stationary and in which no special pattern can be seen. When we use ARIMA, we would like to check whether there is a linear relationship between past data and future data. Building the ARIMA model involves several steps [50]. For example, the stationarity of the series should be checked first. If the series is non-stationary, it should be transformed into a stationary series. Several methods are available for checking this; one of them is the Kolmogorov–Smirnov test. Figure 5 shows the flowchart of the ARIMA method.

Fig. 5 ARIMA flowchart [33]

3.4 Testing efficient market hypothesis (EMH)

One of the main assumptions in market analysis is that the market is efficient. Whether the market is efficient or not affects investment decisions. When a market is efficient, abnormal returns cannot be earned by searching for mispriced stocks. So, the weak form of the EMH denies the value of technical analysis. As we mentioned, financial time series are not normally distributed and they are skewed, so a non-parametric test should be performed. Since the main focus of the article is on ANN, only a brief explanation of EMH is provided. To decide whether a sample comes from a population with a specific distribution, the Kolmogorov–Smirnov goodness-of-fit test is used [51]. The randomness of the data is also evaluated using a run test [52].
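As an illustration of the run test, the sketch below implements the Wald–Wolfowitz runs test about the median with a normal approximation. Which variant of the test the study used is not stated, so this choice is an assumption, and the function name `runs_test` is hypothetical.

```python
import math
import statistics


def runs_test(series):
    """Runs test for randomness about the median.

    Returns (number of runs, z statistic, two-sided p-value) using the
    normal approximation. Values equal to the median are dropped.
    """
    med = statistics.median(series)
    signs = [v > med for v in series if v != med]
    n1 = sum(signs)            # observations above the median
    n2 = len(signs) - n1       # observations below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("series must have values on both sides of the median")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("series too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value
```

A small p-value indicates that the sequence of values above and below the median has too many or too few runs to be consistent with randomness.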

3.5 Loss functions

To calculate the prediction error, we use several loss functions in MATLAB to determine the best-performing model, that is, the one with the highest accuracy and the lowest error. Table 9 summarizes the available loss functions in MATLAB. Finally, we compare the accuracy of the models with respect to the calculated loss functions.

Table 9 Most common loss functions
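For illustration, four widely used loss functions are sketched below in plain Python. Whether each of them appears in Table 9 is an assumption; MSE and MAE are mentioned elsewhere in the text, while RMSE and MAPE are included here as common companions.

```python
import math


def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(mse(y, y_hat))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean absolute percentage error (targets must be non-zero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)
```

For targets 1, 2, 3 and predictions 1, 2, 5, the MSE is 4/3 and the MAE is 2/3.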

4 Findings and results

In this section, we shall discuss the test data and numerical results obtained by using the presented algorithms.

4.1 Data statistics

First of all, as mentioned earlier, the data need to be normalized and scaled to the range [− 1, + 1]. Table 10 shows the normalized data.

Table 10 Data preview table

In this study, 42 technical indicators are used to predict stock prices. Among these indicators, 41 variables are used as input variables and one variable is the output or target variable, that is, the closing price for the next day. To run the experiments, daily stock price data of the Khodro company, a big company in the automobile industry in Iran, were collected from the beginning of 2013 to the end of 2018. The reasons for selecting this company are:

  1. Data availability and easy access to data.

  2. It is the biggest and most famous company in this industry in Iran.

To access the data, we accept Laboratory risk. The data was obtained through two different websites which are called TSETMC and CODAL (http://tsetmc.ir/ and https://www.codal.ir/). In addition, there is a financial data software called TSECLIENT 2.0 and you can download data easily according to the symbol name in the stock market.

The following pie chart shows the segmentation of data in the experiments. The total number of instances is 1082. The number of training instances is 650 (60.1%), the number of selection instances is 216 (20%), the number of testing instances is 216 (20%), and the number of unused instances is 0 (0%) (see Fig. 6).

Fig. 6 Instances pie chart
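The instance counts reported above can be reproduced with a short sketch. Rounding the selection and testing shares to the nearest integer and assigning the remainder to training is an assumption that happens to match the reported numbers, and splitting in chronological order is likewise assumed rather than stated in the text.

```python
def split_counts(n_instances, selection_frac=0.2, testing_frac=0.2):
    """Return (training, selection, testing) instance counts."""
    n_selection = round(n_instances * selection_frac)
    n_testing = round(n_instances * testing_frac)
    n_training = n_instances - n_selection - n_testing
    return n_training, n_selection, n_testing


def chronological_split(rows, selection_frac=0.2, testing_frac=0.2):
    """Split rows in time order: oldest for training, newest for testing."""
    n_train, n_sel, _ = split_counts(len(rows), selection_frac, testing_frac)
    return rows[:n_train], rows[n_train:n_train + n_sel], rows[n_train + n_sel:]
```

With 1082 instances, `split_counts(1082)` gives 650 training, 216 selection, and 216 testing instances.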

In the appendix, Table 33 shows the value of the correlations between all input and target variables. The maximum correlation (0.994050) is between the input variable “Typical Price” and the target variable.

4.2 ANN model

First, we predict stock prices by ANN without using any additional algorithm. We perform it in three steps: (1) finding the best architecture (designing); (2) training the network; (3) validation and testing. We use 70% of the data for training and the remaining is used for validation and testing. Table 11 presents the best architecture of the network.

Table 11 Best network architecture

The architecture highlighted in blue is the best one, with the highest R-squared: it has 41 neurons in the input layer, 50 neurons in the hidden layer, and a single output. The best network error during each iteration is also shown in Fig. 7. Additionally, the network properties are summarized in Table 12.

Fig. 7 Best network error

Table 12 Network properties

Figure 8 depicts the best performance in three parts of the method (training, validation, and testing). Regression and related plots are displayed in Fig. 8 with related statistics. More details about different loss estimations are summarized in Table 13.

Fig. 8 ANN regression

Table 13 Training, validation and testing error (before using GA)

In addition, we apply the quasi-Newton method as an optimization algorithm in the training phase. It is designed based on Newton's method, but it does not need to calculate the second derivatives. Instead, the quasi-Newton method computes an approximation of the inverse Hessian at each iteration of the algorithm, by only using gradient information. Table 34 shows the results of this training strategy. Figure 9 also shows the training and selection errors in each iteration. The blue line represents the training error and the orange line represents the selection error. The initial value of the training error is 15.9893, and the final value after 468 epochs is 0.000213652. The initial value of the selection error is 20.6414, and the final value after 468 epochs is 0.000372232.

Fig. 9 Training using Quasi-Newton optimization algorithm

Table 14 shows the training results by the quasi-Newton method. It includes some final values in the neural network, the loss function, and the optimization algorithm.

Table 14 Training error (using Quasi-Newton)

4.2.1 Hybrid GA-ANN model

The input variables selection is the way to find the optimal subset of inputs that has the minimum error. A growing input method is used here as an inputs selection algorithm Fig. 10 shows the error history for the different subsets during the growing input selection process. The blue line represents the training error and the orange line symbolizes the selection error.

Fig. 10 Growing input error plot

Table 15 shows the input selection results obtained with the growing inputs algorithm, including final values for the parameters of the neural network, the error function, and the input selection algorithm.

Table 15 GA results

A graphical representation of the deep architecture is depicted in Fig. 11. It contains a scaling layer, a neural network, and an un-scaling layer. The yellow, blue, and red circles represent scaling neurons, perceptron neurons, and un-scaling neurons, respectively. The number of inputs is 20, and the number of outputs is 1. The complexity, represented by the number of hidden neurons, is 1. Table 16 shows different types of errors in the training, selection, and testing phases.

Fig. 11 Final architecture

Table 16 GA error table

Figure 12 shows the test results of the network. The horizontal axis shows the closing price and the vertical axis shows the output range, which is normalized to [− 1, 1]. The outputs of the neural network (blue line) are very close to the target values (red line).
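
The normalization to [− 1, 1] and its inverse can be sketched with a simple min-max mapping. This assumes that the scaling and un-scaling layers use minimum-maximum scaling, which the text does not confirm; the helper name `fit_minmax` and the price values are hypothetical.

```python
def fit_minmax(values, low=-1.0, high=1.0):
    """Return (scale, unscale) functions mapping [min, max] onto [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    span = v_max - v_min

    def scale(v):
        return low + (v - v_min) * (high - low) / span

    def unscale(s):
        return v_min + (s - low) * span / (high - low)

    return scale, unscale


# Hypothetical closing prices, for illustration only.
scale, unscale = fit_minmax([100.0, 150.0, 200.0])
scaled_mid = scale(150.0)            # midpoint of the range maps to 0
restored = unscale(scale(175.0))     # round trip back to the price scale
```

Network outputs on the normalized scale would be passed through `unscale` to recover prices.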

Fig. 12 Target versus output trend (Khodro)

4.2.2 Hybrid PSO-ANN model

Since we want to predict the stock price (closing price), we first create a fitness function. We implement it as an M-file in MATLAB, using the parameters explained earlier. We initialize the algorithm, that is, the population and velocities together with the initial values of \(pbest\) and \(gbest\). We then set values for \({C}_{1}\) and \({C}_{2}\) and fix the number of iterations, which is 1000 here. The parameters are updated at every iteration to move toward the intended goal. Note that the network is feedforward. The regression is shown in Fig. 13, and Table 17 shows the estimation errors and different loss functions.
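
The steps above can be sketched as a basic PSO loop. This is a Python illustration rather than the MATLAB implementation used in the study. The values `c1 = c2 = 1.5`, the inertia weight `w = 0.7`, the swarm size, and the search bounds are assumptions, and the sphere function is a hypothetical stand-in for the fitness function, which in this setting would map a vector of network weights to a prediction error.

```python
import random


def pso_minimize(fitness, dim, n_particles=30, iterations=1000,
                 c1=1.5, c2=1.5, w=0.7, bounds=(-5.0, 5.0), seed=0):
    """Basic global-best PSO; returns (gbest position, gbest fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialize positions randomly and velocities at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull (pbest) + social pull (gbest).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical stand-in fitness with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=3)
```

To train a network this way, `fitness` would decode the particle into weights and biases, run the feedforward pass, and return the resulting error.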

Fig. 13 ANN-PSO regression

Table 17 Hybrid ANN-PSO

4.2.3 Hybrid HS-ANN model

As with GA and PSO, we perform several steps to train the network and solve the problem. The network structure is a feed-forward ANN (FFANN). We first set the number of iterations to 1000 and then, to achieve better results, increase it to 5000. The result after 5000 iterations is shown in Table 18.
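
For orientation, a basic harmony search loop can be sketched as follows. The harmony memory size, the memory considering rate `hmcr`, the pitch adjusting rate `par`, the bandwidth, and the bounds are assumed values that are not taken from this study, and `sphere_fitness` is a hypothetical stand-in for the network error as a function of the weights.

```python
import random


def harmony_search(fitness, dim, iterations=5000, memory_size=20,
                   hmcr=0.9, par=0.3, bandwidth=0.05,
                   bounds=(-5.0, 5.0), seed=0):
    """Basic harmony search; returns (best harmony, best fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Fill the harmony memory with random solutions.
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(memory_size)]
    scores = [fitness(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(dim):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]          # memory consideration
                if rng.random() < par:                 # pitch adjustment
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                value = rng.uniform(lo, hi)            # random selection
            new.append(min(hi, max(lo, value)))
        new_score = fitness(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere_fitness(x):
    """Hypothetical stand-in fitness with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


_, score_1000 = harmony_search(sphere_fitness, dim=2, iterations=1000)
_, score_5000 = harmony_search(sphere_fitness, dim=2, iterations=5000)
```

Because a new harmony only ever replaces the worst one in memory, the best stored score cannot get worse as iterations are added, which is consistent with raising the iteration count from 1000 to 5000.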

Table 18 Hybrid ANN-HS

As Table 18 shows, \({R}^{2}\) is 0.995.

4.2.4 Hybrid GWO-PSO algorithm

In this section, we report only the main results for brevity. The following important variables are considered as input variables (see Table 19).

Table 19 Feature selection using GWO-PSO algorithm

Among these 42 indicators, 12 are selected as input variables and the others are discarded. Table 20 shows the results.

Table 20 GWO-PSO feature selection results

4.2.5 MFO algorithm

First, we tune the parameters; the results are shown in Table 21. Figure 14 shows the fitness function and its convergence over the iterations. There is a clear decrease at each iteration until the best score, 8.0081e−32, is obtained.

Table 21 MFO parameters
Fig. 14 Test function and convergence curve

4.2.6 WOA

As with the MFO algorithm, we first tune the parameters; the results are shown in Table 22. Figure 15 shows the fitness function and its convergence over the iterations.

Table 22 WOA parameters
Fig. 15 Test function and convergence curve

4.2.7 MPSO, MPSO-TVAC, ChO algorithms

In this part, we run three algorithms together, but their results are depicted separately. Among the three algorithms (ChOA, MPSO, and MPSO-TVAC), ChOA clearly has the lowest error. Figures 16, 17, 18 and 19 show the chaotic map for types ChOA1 and ChOA2 after 500 iterations (see Table 23).

Fig. 16 Mathematical models of dynamic coefficients (f) related to independent groups for (a) ChOA1

Fig. 17 Mathematical models of dynamic coefficients (f) related to independent groups for (a) ChOA1

Fig. 18 Chaotic map

Fig. 19 Test function and convergence curve

Table 23 Parameters and errors

The chaser and driver groups behave almost identically, as do the attacker and barrier groups, but, as stated previously, they follow different strategies.

In ChOA2, it is clear that at iteration 400, three groups (attacker, barrier, and chaser) are close to each other.

Ranked from lowest to highest error, the three algorithms are ChOA, MPSO-TVAC, and MPSO. ChOA also shows a much sharper decline than the other two algorithms.

4.3 Time series forecasting (ARIMA)

Most economic and financial time series are not normally distributed and exhibit characteristics such as skewness and kurtosis. We should therefore check whether the time series is stationary. For this purpose, we used the augmented Dickey–Fuller (ADF) test. A correlogram plot is another of the main ways to reveal the existence of a unit root. The results are presented in Fig. 20 and Table 24.

Fig. 20 Correlogram of closing price

Table 24 Unit root test using ADF

The correlogram clearly indicates at least one unit root. Further results and details can be obtained from the ADF test.

From Table 24, we can see that the t statistic (i.e., − 2.108315) is higher than the critical values at the 1%, 5%, and 10% significance levels. Thus, the time series is not stationary, and we address this with first-order differencing.

Table 25 ADF test after one level differencing

After differencing, the t statistic (i.e., − 17.31204) is less than the critical values at all three significance levels (Table 25). So, the differenced series is stationary. Figure 21 presents more details.
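
The logic of the test can be illustrated with a simplified Dickey–Fuller regression with a constant and no lagged difference terms. This is a plain DF statistic, not the augmented version with lag selection that EViews computes, so its values will not match Tables 24 and 25. The function names and the simulated random walk are hypothetical, and − 2.86 is the commonly tabulated asymptotic 5% critical value for the constant-only case.

```python
import math
import random


def difference(series):
    """First-order differencing: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def dickey_fuller_t(series):
    """t statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = series[:-1]            # lagged levels y_{t-1}
    dy = difference(series)    # first differences dy_t
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant-only case

# Simulated random walk as a hypothetical non-stationary series.
rng = random.Random(1)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

t_level = dickey_fuller_t(walk)                    # statistic in levels
t_diff = dickey_fuller_t(difference(walk))         # statistic after differencing
stationary_after_diff = t_diff < CRITICAL_5PCT
```

A statistic below the critical value rejects the unit-root hypothesis, which is the pattern described above for the differenced series.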

Fig. 21 Correlogram after one level differencing

Now, we can use ARIMA as a prediction model. We used EViews 10 for the computation. The best model estimation is presented in Table 26, and the model selection criteria are summarized in Table 27. Fig. 22 illustrates the Akaike information criterion (AIC) values, while Table 28 gives the ARIMA forecasting summary.

Table 26 ARIMA forecasting
Table 27 Model selection criteria
Table 28 ARIMA forecasting summary
Table 29 Testing normality using K-S test
Fig. 22 Akaike information criteria (top 20 models)

From Table 28, we find that the best selected model is ARIMA(4,1,1), with an AIC value of − 5.0936.

4.4 Testing EMH

First, we need to check normality, so we used the Kolmogorov–Smirnov normality test (see Table 29).

The significance value (Sig.) is less than 0.05, which means that the time series is not normally distributed. A non-parametric test is therefore appropriate, and we use the runs test to check the EMH (see Table 30).

Table 30 Run test

The significance value is again less than 0.05, which means that the data are not random. So, the market is not efficient.
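
A minimal sketch of a runs test, using the median as the cut point and the usual normal approximation, is given below. The cut point is an assumption, since the option used to produce Table 30 is not stated, and the function name `runs_test` and the example series are hypothetical.

```python
import math


def runs_test(series):
    """Runs test about the median; returns (runs, z, two-sided p-value)."""
    ordered = sorted(series)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    # Classify each observation as above (True) or below (False) the median,
    # dropping values equal to the median.
    signs = [v > median for v in series if v != median]
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    # A new run starts whenever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    total = n1 + n2
    expected = 2.0 * n1 * n2 / total + 1.0
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - total)) / (total ** 2 * (total - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return runs, z, p_value


# Hypothetical strongly trending series: too few runs to be random.
runs, z, p_value = runs_test([float(i) for i in range(40)])
reject_randomness = p_value < 0.05
```

Too few runs (negative z) points to trending behavior and too many (positive z) to alternation; in both cases a p-value below 0.05 rejects randomness.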

4.5 Comparative study

In this section, we have reviewed some similar articles and compared our results with them in a table format (see Table 31). For better understanding of the results, we order the methods in accordance with their MSE (from minimum to maximum error).

Table 31 Comparative Study

5 Conclusions

In this paper, we used an artificial neural network as a prediction method to forecast Khodro stock prices. We used several important technical indicators such as SMA, EMA, and TMA as input variables, and selected the most important ones by using GA and GWO-PSO. Afterward, we trained the network using different meta-heuristic algorithms such as HS, PSO, MFO, MPSO, MPSO-TVAC, WOA, and ChOA, and also applied a time series model called ARIMA.

After obtaining optimum indicators and weights by GA and GWO-PSO, we computed different loss functions for each algorithm. As can be concluded from Table 32, WOA has the lowest training and testing error and MPSO has the highest. To evaluate the performance of the model, it should be tested on a new set of data, the testing data. Finally, we analyzed the EMH, and the results showed that the market is inefficient.

Table 32 Arrange algorithms based on MSE

The main advantages obtained by using meta-heuristic algorithms are as follows:

  • Speeding up calculations.

  • Reducing the model complexity.

  • Increasing the network accuracy.

  • Ease of using models.

On the other hand, as we mentioned earlier, these algorithms have some limitations:

  • These algorithms are sensitive to the values of their parameters. As such, these parameters should be tuned beforehand. In other words, setting parameters and assigning suitable values to each one can affect the outputs. Thus, if the tuning phase is not performed correctly, the model will face serious problems.

  • Another limitation of these algorithms (especially evolutionary algorithms) is that most of them can fall into local optima. In other words, there is no guarantee of global optimality. As a result, most of these algorithms have different strategies for exploitation and exploration. They have different approaches for generating the initial population, finding an optimal solution, etc.

  • The next limitation of these algorithms is that the obtained solutions are not repeatable. Each time you run these algorithms, you may reach different solutions.

So, in this research, we used different approaches to overcome the limitations of each algorithm and compared them to each other.

Our suggestion for future research is to concentrate on other parameters such as the number of hidden layers and the activation function, and to apply other models of HS such as HIS. In addition, researchers can train neural networks or select features with other new metaheuristic algorithms such as the bald eagle algorithm (BEA), sparrow search algorithm (SSA), Lichtenberg algorithm (LA), and so forth. Furthermore, we believe that predicting cryptocurrency prices with these algorithms and other AI-based methods such as deep learning and fuzzy logic could be a good direction for future research.