
1 Introduction

An accurate forecast is an important and crucial support for business and industry managers when making decisions. Forecasting is needed in many areas, such as customer demand, production capacity, trading costs, and expected net profit. Foreign exchange rate forecasting is one of them, and it is directly related to the net profit and the trading costs of companies involved in international trade. In addition, the business cycle is a series of economic fluctuations composed of four recurring phases: expansion, peak, contraction, and trough. The movement through these phases affects the long-term growth trend of Gross Domestic Product (GDP), which is closely related to the economic cycle. During the past two decades, global stock markets have undergone two significant episodes that the economic cycle can be used to sketch: the first took place in the US, where the bursting of the dot-com bubble sent global stock markets into a nose dive in 2001; the second was the subprime mortgage crisis, which culminated in the financial crisis of 2007 to 2008 and was followed by a worldwide bear market lasting over seven years. In accordance with the laws of the business cycle, global stock markets rebounded in 2016, and predicting the next trend has become a crucial problem to monitor. Investors can follow the movement of the stock market through industrial sector indices, which group stocks together and are the primary means of conveying price levels. These indices provide broader comparability and a wider view of industrial performance.

Based on the background above, a swarm intelligence algorithm called Interactive Artificial Bee Colony (IABC) is used to build the forecasting model. It forecasts the return index from four inputs: the trading volume, the number of trading records, the opening price, and the closing price. The same inputs are fed into a time-series model called EGARCH to produce a baseline for comparison. The rest of the paper is organized as follows: the literature review is given in Sect. 2, the experiment design is discussed in Sect. 3, the experimental results are given in Sect. 4, and the conclusion is given in Sect. 5.

2 Literature Review

Time-series models are a typical technique used in financial analysis [1]. By analysing large amounts of data, they can reveal patterns and infer the reactions taking place in the market. The Autoregressive Conditional Heteroscedasticity (ARCH) model was proposed by Robert F. Engle (1982) [5] and consists of a mean equation and a variance equation. The mean equation reflects the efficient-market view that long-term arbitrage and speculation are unworkable, while the variance equation indicates that risks are foreseeable and manageable. In the ARCH model, the conditional variance changes over time, which improves the accuracy of the model. In recent years, scholars have extended the ARCH model, through many improvements and developments, into a set of related time-series models called the ARCH family. For example, Tim Bollerslev (1986) proposed the Generalised Autoregressive Conditional Heteroskedastic (GARCH) model [2], built on the ARCH model [3], to simplify the setting of parameters; it assumes that the conditional variance is affected by both the squared error and the conditional variance of the previous period. Both the ARCH and the GARCH models assume that fluctuations are symmetric. Unlike these models, French et al. (1986) [10] approached the subject from a different angle. They pointed out that good news and bad news affect the market asymmetrically: when such an event occurs, bad news triggers more volatility than good news. In 1991, Nelson presented the Exponential GARCH (EGARCH) model [11] to take this asymmetric fluctuation into account. The EGARCH model has been shown to have better explanatory power in such analyses.

Unlike time-series models, swarm intelligence is a research field whose algorithms are designed by observing and simulating the collective behaviour of natural or artificial systems [12]. The Artificial Bee Colony (ABC) algorithm [13] was proposed by Karaboga (2005). A few years later, Tsai et al. (2009) embedded Newton's law of universal gravitation into the ABC algorithm to reform the selection mechanism and broaden the search capability of the onlooker bees; the resulting algorithm, which introduces interaction among the bees, is called the Interactive Artificial Bee Colony (IABC). The IABC algorithm has been applied to foreign exchange rate forecasting in 2014 and 2016 [8, 9], renewable energy unit distribution [6], and pattern recognition [7], and the results obtained in foreign exchange rate forecasting were satisfactory.

3 Experiment Design

In this work, the historical data of the trading volume, the number of trading records, the opening price, and the closing price over the past 30 days are collected and fed into both the EGARCH(1,1) and the IABC forecasting models. The experimental data cover the period from January 10th, 2009 to December 31st, 2015. Using a sliding-window strategy, the forecasting models can produce forecasts for every target period. According to the classification published by the Taiwan Stock Exchange (TWSE), the technology industry category contains seven independent stocks, and these stocks are the subject of this paper. The forecasting accuracy is measured by the Mean Absolute Percentage Error (MAPE), which objectively quantifies the relative deviation between the forecasted and the actual return index.

The experiment is composed of the steps listed as follows:

  1. Fundamental statistical tests of the input data: the input data are examined with the Ljung-Box Q test and the Augmented Dickey-Fuller (ADF) unit root test [4]. The details are given in the following subsection.

  2. Feeding the input data into both the EGARCH and the IABC forecasting models for training and testing.

  3. Calculating the MAPE values of the forecasting results of both models.

3.1 Input Data for the Forecasting Models

The study subjects are the stocks in the technology industry category of the TWSE market. This category includes seven stocks: computers and computer peripheral equipment, optoelectronics, communication technology and internet, electronic parts and components, electronic product distribution, information services, and other electronics.

For every stock, the collected trading volume, number of trading records, opening price, and closing price are examined to ensure that they have the expected statistical characteristics and distribution. In general, the logarithmic return of a stock price is expected to be approximately normally distributed. The rate of return is calculated by Eq. (1):

$$\begin{aligned} p_r = \log {(\frac{p_t}{p_{t-1}})} \end{aligned}$$
(1)

where \(p_t\) and \(p_{t-1}\) refer to the prices on days t and \(t-1\), respectively, and \(p_r\) denotes the rate of return.
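
The following Python sketch computes Eq. (1) for a list of prices. It assumes the natural logarithm (the base is not stated above) and chronologically ordered, positive prices; the function name `log_returns` is illustrative only.

```python
import math


def log_returns(prices):
    """Return p_r = ln(p_t / p_{t-1}) for each pair of consecutive prices (Eq. (1))."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive for a log return")
    # One return per pair of consecutive days, so the result is one element shorter.
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


closes = [100.0, 102.0, 99.96]
print(log_returns(closes))
```

Because log returns are additive, summing them recovers the log of the overall price ratio, which is a convenient sanity check on the input data.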

If the returns computed by Eq. (1) are approximately normally distributed, the input data are further examined with the Ljung-Box Q test, which checks for autocorrelation, by Eq. (2):

$$\begin{aligned} Q = n(n+2)\sum _{k=1}^{h}{\frac{\hat{\rho }_k^2}{n-k}} \end{aligned}$$
(2)

where Q denotes the test statistic, n is the sample size, \(\hat{\rho }_k\) is the sample autocorrelation at lag k, and h is the number of lags being tested.
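
A minimal pure-Python sketch of Eq. (2) follows, using the usual sample autocorrelation at lag k computed from mean-centred data. Under the null hypothesis of no autocorrelation, Q approximately follows a chi-squared distribution with h degrees of freedom, so large values indicate autocorrelation. The function name is hypothetical; library implementations such as statsmodels' `acorr_ljungbox` also report p-values.

```python
def ljung_box_q(series, h):
    """Ljung-Box Q statistic of Eq. (2) over lags 1..h."""
    n = len(series)
    if not 0 < h < n:
        raise ValueError("need 0 < h < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# A strictly alternating series is strongly (negatively) autocorrelated at lag 1.
print(ljung_box_q([1, -1, 1, -1, 1, -1], h=1))
```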

The ADF unit root test is applied after the Ljung-Box Q test to verify the stationarity of the series by Eq. (3):

$$\begin{aligned} \varDelta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta _1 \varDelta y_{t-1} + \cdots + \delta _{p-1}\varDelta y_{t-p+1} + \varepsilon _t \end{aligned}$$
(3)

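where \(\alpha \) is a constant, \(\beta \) is the coefficient on a time trend, \(\gamma \) is the coefficient examined by the test (the null hypothesis \(\gamma = 0\) corresponds to a unit root, i.e., a non-stationary series), \(\delta _1, \ldots , \delta _{p-1}\) are the coefficients of the lagged differences, \(\varepsilon _t\) is the error term, and p is the lag order of the Autoregressive (AR) process.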
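
The regression in Eq. (3) can be estimated by ordinary least squares. The NumPy sketch below returns the t-statistic of \(\gamma \); the function name `adf_statistic` and the default lag order are assumptions. This statistic does not follow Student's t distribution: it has to be compared with Dickey-Fuller critical values (roughly -3.41 at the 5% level, asymptotically, for the constant-plus-trend case), and a more negative value rejects the unit-root null. In practice, a library routine such as statsmodels' `adfuller` with `regression="ct"` also supplies p-values and automatic lag selection.

```python
import numpy as np


def adf_statistic(y, p=2):
    """t-statistic of gamma in Eq. (3) with constant and trend, using p-1 lagged differences."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)  # dy[i] = y[i+1] - y[i]
    lags = p - 1
    rows, target = [], []
    for t in range(lags + 1, len(y)):
        row = [1.0, float(t), y[t - 1]]                      # alpha, beta*t, gamma*y_{t-1}
        row += [dy[t - 1 - j] for j in range(1, lags + 1)]   # Delta y_{t-j}
        rows.append(row)
        target.append(dy[t - 1])                             # Delta y_t
    X, z = np.array(rows), np.array(target)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    sigma2 = resid @ resid / (len(z) - X.shape[1])           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                    # OLS covariance matrix
    return coef[2] / np.sqrt(cov[2, 2])


rng = np.random.default_rng(0)
noise = rng.standard_normal(500)   # stationary series
walk = np.cumsum(noise)            # unit-root series
print(adf_statistic(noise), adf_statistic(walk))
```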

3.2 The Time-Series Model: EGARCH

In this study, the classical EGARCH(1,1) model is used in the experiments. The EGARCH model is defined as follows:

$$\begin{aligned} y_t = x_tb + \varepsilon _t \end{aligned}$$
(4)

where \(y_t\) is expressed as a function of the exogenous variables \(x_t\) with coefficient vector b, \(x_t b\) is the conditional mean of \(y_t\), and \(\varepsilon _t\) denotes the residual.

$$\begin{aligned} \varepsilon _t \mid \varOmega _{t-1}\sim N(0,h_t) \end{aligned}$$
(5)

where \(\varOmega _{t-1}\) denotes the information set available up to period \(t-1\), and \(h_t\) is the conditional variance of \(y_t\).

$$\begin{aligned} \begin{array}{ll} \ln {(h_t)} = \alpha _0 + \sum _{i=1}^{q}{\alpha _i \Big \{ \gamma _i \frac{\varepsilon _{t-i}}{\sqrt{h_{t-i}}} + \vartheta _i \Big [ \frac{|\varepsilon _{t-i}|}{\sqrt{h_{t-i}}} - \sqrt{\frac{2}{\pi }} \Big ] \Big \}} + \sum _{j=1}^{p}{\beta _j \ln {(h_{t-j})}}\text {,}\\ \text {subject to } \alpha _0 > 0\text {, }\alpha _i \ge 0\text {, } \beta _j \ge 0\text {, } \forall i,j \end{array} \end{aligned}$$
(6)

where \(\gamma _i\) is the parameter of the asymmetric volatility effect, \(\gamma _i \frac{\varepsilon _{t-i}}{\sqrt{h_{t-i}}}\) denotes the sign effect, \(\vartheta _i\) indicates the parameter of scale of unexpected variation, and \(\vartheta _i \Big [ \frac{|\varepsilon _{t-i}|}{\sqrt{h_{t-i}}} - \sqrt{\frac{2}{\pi }}\Big ]\) represents the magnitude effect.
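
With \(p = q = 1\), Eq. (6) becomes a one-step recursion for \(\ln (h_t)\). The Python sketch below evaluates that recursion for given parameters; it does not estimate them (in practice they are fitted by maximum likelihood, for example with the `arch` Python package). Function and parameter names are illustrative. The constant \(\sqrt{2/\pi }\) is the expected absolute value of a standard normal variable, so the magnitude term is centred at zero under Eq. (5).

```python
import math


def egarch11_log_variance(eps_prev, h_prev, alpha0, alpha1, gamma1, theta1, beta1):
    """One step of Eq. (6) with p = q = 1; returns ln(h_t)."""
    z = eps_prev / math.sqrt(h_prev)                      # standardised shock
    sign_effect = gamma1 * z                              # asymmetric (sign) term
    magnitude_effect = theta1 * (abs(z) - math.sqrt(2.0 / math.pi))
    return alpha0 + alpha1 * (sign_effect + magnitude_effect) + beta1 * math.log(h_prev)


def egarch11_variances(residuals, h0, **params):
    """Filter conditional variances h_t for a residual series, starting from h0."""
    h = [h0]
    for eps in residuals[:-1]:  # h_t depends on eps_{t-1} and h_{t-1}
        h.append(math.exp(egarch11_log_variance(eps, h[-1], **params)))
    return h


params = dict(alpha0=0.05, alpha1=1.0, gamma1=-0.1, theta1=0.2, beta1=0.9)
print(egarch11_variances([0.1, -0.2, 0.3], h0=1.0, **params))
```

With a negative \(\gamma _1\), a negative shock raises the next-period variance more than a positive shock of the same size, which is the asymmetry EGARCH is designed to capture.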

3.3 The Interactive Artificial Bee Colony (IABC) Model

IABC is a variant of the ABC algorithm that incorporates universal gravitation into the onlooker movement process of the conventional ABC algorithm. The IABC algorithm proceeds as follows:

  1. Initialization: Randomly spread \(n_e\) percent of the population over the solution space, where \(n_e\) is the ratio of employed bees to the total population.

  2. Move the onlookers: Calculate the probability of selecting each food source by Eq. (7) and, for every onlooker bee, choose G employed bees as references by the roulette wheel scheme.

    $$\begin{aligned} P_i = \frac{F(\varTheta _i)}{\sum _{k=1}^{S}{F(\varTheta _k)}} \end{aligned}$$
    (7)

    where \(P_i\) denotes the probability of selecting the \(i^{th}\) employed bee, S is the total number of employed bees, and \(F(\varTheta _k)\) and \(F(\varTheta _i)\) represent the fitness values of employed bees k and i, respectively. Then move each onlooker by Eq. (8) based on its G selected reference employed bees.

    $$\begin{aligned} x_i(t+1) = \varTheta _i(t) + \sum _{k=1}^{G}{\{ \tilde{F}_{ik}\cdot [\varTheta _i(t) - \varTheta _k(t)] \}} \end{aligned}$$
    (8)

    where \(x_i\) is the coordinate of the \(i^{th}\) onlooker bee, t is the iteration number, \(\varTheta _i\) and \(\varTheta _k\) denote the coordinates of the \(i^{th}\) employed bee and of the \(k^{th}\) selected reference employed bee, respectively, G is the number of selected employed bees, and \(\tilde{F}_{ik}\) is the normalized universal gravitation between them.

  3. Move the scouts: An employed bee becomes a scout and is relocated if no onlooker bee has selected it as a reference for a predefined number of iterations (Limit). However, two randomly selected employed bees are exempted from this rule. The remaining employed bees that meet the condition become scouts and are moved by Eq. (9):

    $$\begin{aligned} \varTheta _{ij} = \varTheta _{jmin} + r(\varTheta _{jmax} - \varTheta _{jmin}) \end{aligned}$$
    (9)

    where \(\varTheta _{jmax}\) and \(\varTheta _{jmin}\) denote the maximum and the minimum values of the \(j^{th}\) dimension (i.e., the bounds of the search range in that dimension), and r is a random number in the range [0, 1].

  4. Update the near-best solution: Keep the best fitness value found so far and the coordinates of the corresponding bee.

  5. Termination check: If the termination conditions are satisfied, terminate the algorithm and output the kept near-best solution; otherwise, go back to Step 2 and repeat the process.
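
The five steps can be sketched as a small minimizer. This is a simplified illustration under several stated assumptions, not the authors' implementation: the normalized gravitation \(\tilde{F}_{ik}\) is taken as \(F(\varTheta _i)F(\varTheta _k)/r_{ik}^2\) normalized over the G references and multiplied by a random factor in [-1, 1]; the bee an onlooker moves is itself picked by the roulette wheel of Eq. (7); an improved onlooker position replaces that employed bee (greedy selection as in ABC); Eq. (9) uses the search-range bounds; and termination is a fixed iteration budget. All names are hypothetical.

```python
import random


def fitness(cost):
    # Assumed ABC-style fitness for minimisation: always > 0, larger is better.
    return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)


def iabc_minimize(cost_fn, dim, lower, upper, n_pop=20, n_e=0.5, G=3,
                  limit=10, max_iter=200, seed=0):
    """Simplified IABC sketch of Steps 1-5; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    n_emp = max(3, int(round(n_pop * n_e)))  # employed bees
    n_onl = max(1, n_pop - n_emp)            # onlooker bees

    def random_point():
        # Eq. (9) with the search-range bounds as Theta_jmin / Theta_jmax.
        return [lower + rng.random() * (upper - lower) for _ in range(dim)]

    # Step 1: spread the employed bees randomly over the solution space.
    emp = [random_point() for _ in range(n_emp)]
    costs = [cost_fn(x) for x in emp]
    idle = [0] * n_emp  # consecutive iterations without being a reference
    b = min(range(n_emp), key=costs.__getitem__)
    best_x, best_c = emp[b][:], costs[b]

    for _ in range(max_iter):  # Step 5: fixed iteration budget (assumption)
        fits = [fitness(c) for c in costs]
        referenced = set()
        for _ in range(n_onl):
            # Step 2, Eq. (7): roulette wheel with P_i proportional to fitness.
            i = rng.choices(range(n_emp), weights=fits)[0]
            others = [0.0 if k == i else f for k, f in enumerate(fits)]
            refs = rng.choices(range(n_emp), weights=others, k=G)
            referenced.update(refs)
            # Hypothetical gravitation F_ik = F_i * F_k / r_ik^2.
            grav = []
            for k in refs:
                r2 = sum((p - q) ** 2 for p, q in zip(emp[i], emp[k])) + 1e-12
                grav.append(fits[i] * fits[k] / r2)
            total = sum(grav)
            x = emp[i][:]
            for k, g in zip(refs, grav):
                f_tilde = rng.uniform(-1.0, 1.0) * g / total  # normalised, signed
                for d in range(dim):  # Eq. (8)
                    x[d] += f_tilde * (emp[i][d] - emp[k][d])
            x = [min(max(v, lower), upper) for v in x]
            c = cost_fn(x)
            if c < costs[i]:  # greedy replacement, as in ABC (assumption)
                emp[i], costs[i] = x, c

        # Step 3: bees not referenced for more than `limit` iterations become
        # scouts and are relocated by Eq. (9), except two random exempt bees.
        for k in range(n_emp):
            idle[k] = 0 if k in referenced else idle[k] + 1
        exempt = set(rng.sample(range(n_emp), 2))
        for k in range(n_emp):
            if idle[k] > limit and k not in exempt:
                emp[k] = random_point()
                costs[k] = cost_fn(emp[k])
                idle[k] = 0

        # Step 4: keep the near-best solution found so far.
        b = min(range(n_emp), key=costs.__getitem__)
        if costs[b] < best_c:
            best_x, best_c = emp[b][:], costs[b]
    return best_x, best_c


best_x, best_c = iabc_minimize(lambda x: sum(v * v for v in x),
                               dim=2, lower=-5.0, upper=5.0)
```

The usage line minimizes a two-dimensional sphere function as a smoke test. In the forecasting model, the cost would instead measure the error between the forecast and the actual return index for a candidate weighting mask W; how W is encoded as a bee position is not specified here.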

In the IABC forecasting model, the four input variables are merged into two variables by Eqs. (10) and (11):

$$\begin{aligned} TP = \frac{T_V}{T_R} \end{aligned}$$
(10)

where TP denotes the trading power, \(T_V\) is the trading volume, and \(T_R\) is the number of the trading records.

$$\begin{aligned} PD = \frac{P_O}{P_C} \end{aligned}$$
(11)

where PD stands for the price difference of a trading day, and \(P_O\) and \(P_C\) represent the opening price and the closing price of that trading day, respectively.
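
Eqs. (10) and (11) are simple per-day ratios. The sketch below takes the trading power as the trading volume \(T_V\) divided by the number of trading records \(T_R\), as defined in the text; the function and variable names are hypothetical.

```python
def merge_inputs(volume, records, open_price, close_price):
    """Merge four daily inputs into (TP, PD) following Eqs. (10) and (11)."""
    if records <= 0 or close_price <= 0:
        raise ValueError("records and closing price must be positive")
    trading_power = volume / records        # TP: volume per trading record
    price_diff = open_price / close_price   # PD: open-to-close ratio of the day
    return trading_power, price_diff


# One (TP, PD) pair per trading day.
days = [(1000, 50, 101.0, 100.0), (1500, 60, 99.0, 100.0)]
features = [merge_inputs(*day) for day in days]
print(features)
```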

In the training phase, the 30-day historical data (denoted by \(Data_{(t-31:t-1)}\)) are fed into the IABC model to train the corresponding weighting mask (denoted by W), so that the output is as close as possible to the actual return index on day \(t-1\). The trained W is then used in the testing phase with the input data \(Data_{(t-30:t)}\) to produce the forecast return index for day t.
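
The sliding-window scheme can be sketched as index ranges. The sketch assumes that \(Data_{(a:b)}\) is half-open (days a to \(b-1\)), which makes every window exactly 30 days long; this reading, the 0-based day indices, and the function name are assumptions. How W maps a window to a forecast is not specified here, so the sketch only produces the windows and their target days.

```python
def sliding_windows(n_days, window=30):
    """Yield (train_days, train_target, test_days, test_target) for each forecast day t.

    Assumes Data_(a:b) covers days a .. b-1, so each window holds `window` days.
    """
    for t in range(window + 1, n_days):
        train = range(t - window - 1, t - 1)  # Data_(t-31:t-1), fitted to day t-1
        test = range(t - window, t)           # Data_(t-30:t), forecasts day t
        yield train, t - 1, test, t


for train, y_train_day, test, y_test_day in sliding_windows(33):
    print(train, y_train_day, test, y_test_day)
```

Each step trains on one window and immediately forecasts the next day, so every target day after the first 31 receives exactly one out-of-sample forecast.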

3.4 Calculation of the MAPE Value

The MAPE is calculated by Eq. (12):

$$\begin{aligned} MAPE = \frac{1}{m}\sum _{t=1}^{m}{\frac{|\hat{S}_t - S_t|}{S_t}}\times 100 \% \end{aligned}$$
(12)

where \(\hat{S}_t\) and \(S_t\) stand for the predicted and the actual values, respectively, and m is the total number of data points.
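
A direct Python transcription of Eq. (12) follows (hypothetical function name). It divides by \(|S_t|\), which equals Eq. (12) whenever the actual values are positive, and it rejects zero actual values, for which MAPE is undefined.

```python
def mape(predicted, actual):
    """Mean Absolute Percentage Error of Eq. (12), in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(s == 0 for s in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    m = len(actual)
    return sum(abs(p - s) / abs(s) for p, s in zip(predicted, actual)) / m * 100.0


# Two forecasts, each 10% away from the actual value.
print(mape([110.0, 90.0], [100.0, 100.0]))
```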

4 Experiments and Experimental Results

The experiments cover the seven independent stocks in the technology industry category from 2009 to 2015, with 1731 observations in total. The results for each stock are presented in Fig. 1.

Fig. 1. MAPE values of all stocks in 2009–2015.

The results for the seven independent stocks show that the IABC forecasting model produces much smaller MAPE values than the EGARCH(1,1) model. However, IABC occasionally becomes trapped in a local optimum and produces a significant jump in the MAPE value.

5 Conclusions and Future Works

The goal of this work is to forecast the return index of individual stocks based on information observed in their historical trading data. The technology industry category of the Taiwan Stock Exchange (TWSE), which includes 7 independent stocks, is selected as the study subject. The EGARCH(1,1) and the IABC forecasting models are used to generate the forecasting results, and the forecasting accuracy is measured by the MAPE value. The experimental results indicate that the IABC forecasting model outperforms the EGARCH(1,1) model. In future work, we will focus on increasing the stability of the IABC forecasting model to reduce the unsteadiness of its outcomes. The forecasting models and the forecast return index provide shareholders with additional references for managing their investments.