
1 Introduction

In the field of time series forecasting, Box (2013) developed the autoregressive integrated moving average (ARIMA) methodology, whose restriction to linear structure has been pointed out by statisticians in several ways. Robust versions of various ARIMA models have therefore evolved into a powerful tool for modelling the empirical dependencies between successive failure times (Walls & Bendell, 1987). Nonetheless, a major disadvantage of these models is their fundamental assumption of a linear correlation among the time series values, which is at odds with most practical problems, where the relationships are typically non-linear. Because linear models give unsatisfactory estimates on real problems, alternative methods that thoroughly investigate non-linear interactions need to be developed. One radical alternative is the Artificial Neural Network (ANN) model, whose properties overlap considerably with statistical approaches (Bishop, 1995) and whose statistical analysis relies mainly on the availability of data. The evolution of applied artificial neural systems built on ANNs has enabled the forecasting of future values and knowledge-based decision support; their pattern recognition is considered uncomplicated and can be employed in a wide range of practical fields (Robert R. Trippi, 1996) where traditional statistical methods are dominant. A major advantage of neural networks lies in their flexibility in building models of non-linear relationships between economic factors, which adapt to the characteristics present in the data. ANNs can therefore be seen as a data-driven approach that suits many empirical data sets for which no theoretical guidance is available to suggest a proper data-generating process.
In time series forecasting, numerous empirical studies that apply large-scale data to produce economic forecasts have demonstrated the effectiveness of ANNs. Compared with ARIMA, ANNs can be regarded as nonparametric techniques that nonetheless share the same basic aim of building an adequate internal representation of the time series data.

Several studies have integrated ANNs with ARIMA models to exploit the advantages of both and obtain more accurate results (Kohzadi et al., 1996). Hybrid models, in which the distinct features of ANNs and ARIMA are combined so as to capture different patterns in the data, have been shown to improve forecasting accuracy (Bates & Granger, 1969). Both theoretical and empirical findings, including those from the well-known M-competition (Makridakis et al., 1982), indicate that combining more than one forecasting model is more effective and efficient (Palm & Zellner, 1992; Ginzburg & Horn, 1993; Luxhøj et al., 1996). Several hybrid models involving ANNs have been presented in the forecasting literature. Examples include a model combining radial basis function networks with Box–Jenkins models (Wedding & Cios, 1996) and a hybrid synthesis of multiple models (Sfetsos & Siriopoulos, 2004) that analyses how model efficiency relates to clustering algorithms and neural networks. Further examples are a hybrid artificial intelligence model that combines a rule-based system with neural network techniques to predict the daily trend of the S&P 500 Index (Tsaih et al., 1998), an integrated neural network and fuzzy model for predicting exchange rates (Kodogiannis & Lolis, 2002), and a short-term forecast of the price trend of the Taiwan stock index produced by a neural network trained on ARIMA outcomes (Wang & Leu, 1996). Some studies demonstrated that the hybrid model provides more accurate forecasts than either the ARIMA model or the ANN alone (P. G. Zhang, 2003), as reflected in lower mean squared error (MSE) values.
The hybrid model combining ARIMA and ANNs improves forecasting precision because ARIMA handles the linear correlation in the historical data, while the ANN captures the nonlinear interaction in the remaining, unexplained part of the data. A time series is commonly viewed as composed of four parts, namely trend (T), cyclical variation (C), seasonal variation (S) and irregular fluctuation (I), which leads to the common additive model TS = T + C + S + I. Likewise, the correlation structure of a time series consists of a linear part (L) and a nonlinear part (N), giving the additive model L + N.

The primary objective of this study is to build a hybrid of ARIMA and ANN models to forecast the GDP and CPI of Vietnam. The first reason for choosing the hybrid approach is that it sidesteps the problem of deciding which model is appropriate for a given time series, which may have linear structure, nonlinear structure or both. The second reason is that no single model fits every data sample in time series forecasting (Chatfield, 1988; Jenkins, 1982), because real problems are complex and any one model has limited ability to capture varied structures efficiently. G. P. Zhang et al. (2001), for instance, used ARIMA models to evaluate the performance of ANN models in time series forecasting. Makridakis et al. (1993) propose that a hybrid formed by integrating several different models is likely to forecast more accurately than a single model, regardless of which individual model is best. Makridakis also found that the popularity of the M2-competition, which demonstrated that combining different models improves forecast accuracy, significantly encouraged the application of hybrid models in time series forecasting. From the initial study of Bates and Granger (1969) to the comprehensive review of Clemen (1989), hybrid time series forecasting has undergone a long period of development around the core idea of exploiting each model's strengths to capture all types of structure in the dataset. For example, Wedding and Cios (1996) proposed a hybrid of radial basis function networks and Box–Jenkins models, and Naftaly et al. (1994) introduced a hybrid of multiple feedforward neural networks.

The rest of the paper is organized as follows. The next section reviews the theoretical basis of the ARIMA and ANN approaches to time series forecasting and of the hybrid model. Section 3 presents the selection and design of the hybrid model as well as the data description. Section 4 reports empirical results from two datasets. Finally, Sect. 5 offers concluding remarks.

2 Methodology

ARIMA Model

The ARIMA model assumes that the future value of a variable is a linear function of several past observations and random errors, which can be written as follows:

$$ y_t = \phi_0 + \phi_1 y_{t - 1} + \phi_2 y_{t - 2} + \cdots + \phi_p y_{t - p} + \varepsilon_t - \theta_1 \varepsilon_{t - 1} - \theta_2 \varepsilon_{t - 2} - \cdots - \theta_q \varepsilon_{t - q} $$
(1)

where \(y_t\) and \(\varepsilon_t\) are the actual value and random error at time t; \(\phi_i\) (i = 1, 2, …, p) and \(\theta_j\) (j = 1, 2, …, q) are model parameters; and p and q are integers referred to as the orders of the autoregressive and moving average polynomials. The random errors \(\varepsilon_t\) are assumed to be independently and identically distributed with zero mean and constant variance \(\sigma^2\). When q = 0, Eq. (1) reduces to an autoregressive (AR) model of order p. When p = 0, it reduces to a moving average (MA) model of order q. The seasonal model is expressed as ARIMA (p, d, q) (P, D, Q). The ARIMA model-building method developed by Box and Jenkins (1970) significantly advanced time series analysis and forecasting and has dominated many areas of time series forecasting for years. Box and Jenkins (1970) also proposed a modelling procedure consisting of (i) model identification, (ii) parameter estimation and (iii) diagnostic checking, based on the idea that a time series generated by an ARIMA process should exhibit certain theoretical autocorrelation patterns. They suggested using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the sample data to select the orders p and q of the ARIMA model.

An important requirement of the model identification stage is that the time series be stationary, meaning that its mean and autocorrelation pattern do not change over time. If the series shows a trend and heteroscedasticity, differencing, together with a variance-stabilizing transformation where needed, is applied to remove the trend and stabilize the variance. Second, the parameter estimation stage minimizes an overall measure of the errors by means of nonlinear optimization. Finally, diagnostic tests check the adequacy of the model, with the behaviour of the error term as the main criterion.
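The identification step described above can be illustrated with a minimal sketch, assuming a made-up trending series rather than the CPI data. The helper names `difference` and `sample_acf` are hypothetical and use only the Python standard library; a slowly decaying ACF hints at a trend, and differencing removes it.

```python
def difference(series, d=1):
    """Apply first differencing d times: w_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical trending series (not the CPI data): linear trend plus a small wiggle.
y = [2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(40)]
w = difference(y, d=1)

acf_y = sample_acf(y, 5)  # stays close to 1 at low lags: sign of a trend
acf_w = sample_acf(w, 5)  # strongly negative at lag 1: the trend is gone
```

In practice a formal unit-root test would complement this visual check.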

Artificial Neural Networks (ANNs) Model

The artificial neural network has a proven capability for mapping intricate nonlinear relationships without requiring prior assumptions about the nature of the relationship. A well-known ANN model is the feed-forward network with an input layer, a single hidden layer with nonlinear s-shaped transfer functions, and an output layer with a linear transfer function. The output \(y_t\) is related to the inputs \(y_{t-1}, y_{t-2}, \ldots, y_{t-p}\) as follows:

$$ y_t = \alpha_0 + \sum\nolimits_{j = 1}^q {\alpha_j g} \left( {\beta_{0j} + \sum\nolimits_{i = 1}^p {\beta_{ij} y_{t - i} } } \right) + \varepsilon_t $$
(2)

where \(\alpha_j\) (j = 0, 1, 2, …, q) and \(\beta_{ij}\) (i = 0, 1, 2, …, p; j = 1, 2, …, q) are the connection weights, p is the number of input nodes and q is the number of hidden nodes. This network with a single output node can approximate an arbitrary function provided the number of hidden nodes q is sufficiently large.

The logistic function is commonly used as the transfer function of the hidden layer:

$$ g\left( x \right) = \frac{1}{{1 + {\text{exp}}( - x)}} $$
(3)

In essence, the ANN expresses a nonlinear function that maps the past observations \(y_{t-1}, y_{t-2}, \ldots, y_{t-p}\) to the future value \(y_t\) as follows:

$$ y_t = f\left( {y_{t - 1} ,y_{t - 2} , \ldots ,y_{t - p} ,w} \right) + \varepsilon_t $$
(4)

where w is the vector of all parameters and f is a function determined by the network structure and connection weights. The ANN can therefore be seen as a nonlinear autoregressive model.
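As an illustration of Eqs. (2) and (3), the following minimal sketch evaluates the network output for one set of lagged inputs, leaving out the error term. The function names and all weight values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def logistic(x):
    """Logistic transfer function g(x) = 1 / (1 + exp(-x)), Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_forecast(lags, alpha, beta):
    """One-step output of the single-hidden-layer network in Eq. (2).

    lags  : [y_{t-1}, ..., y_{t-p}]
    alpha : [alpha_0, alpha_1, ..., alpha_q]  (output bias and weights)
    beta  : q rows, each [beta_0j, beta_1j, ..., beta_pj]  (hidden bias and weights)
    """
    hidden = []
    for row in beta:
        net = row[0] + sum(b * y for b, y in zip(row[1:], lags))
        hidden.append(logistic(net))
    return alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden))


# Hypothetical weights for p = 2 inputs and q = 2 hidden nodes (illustrative only).
alpha = [0.1, 0.5, -0.3]
beta = [[0.0, 0.0, 0.0],
        [0.2, 1.0, -1.0]]
y_hat = ann_forecast([0.4, 0.4], alpha, beta)
```

Training would adjust `alpha` and `beta` to minimize the forecast error; that step is not shown here.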

In designing a neural network, there is no systematic rule for selecting the appropriate number of hidden nodes q; it depends on the characteristics of the data under study. Choosing the number of lagged observations p, the dimension of the input vector, is arguably the most important step in model construction, because this parameter largely determines the nonlinear autocorrelation pattern the network can capture. Again, no theory guides the selection of p, which is instead determined by experiment. Selecting suitable values of q and p is therefore the main practical challenge, since it rests on experience rather than principle. For training, backpropagation (BP) is probably the dominant approach.

In neural network modelling, the dataset is normally divided into three subsets used for training, testing and validation. In contrast, the specification, estimation and validation of an ARIMA model usually use the whole dataset, because the model form is typically identified before the data are used to estimate the model order. The underlying assumption of ARIMA model building is that a model which fits the historical data well will also forecast well (Makridakis et al., 1982). For an ANN, both the nonlinear functional form and the order must be estimated from the data, which makes overfitting more likely. ARIMA models and ANNs are similar in that both comprise a rich class of models of different orders, both often require a data transformation such as differencing to give the best results, and both need a reasonably large sample.

Hybrid Model of ARIMA and ANNs

The hybrid model of ARIMA and ANNs has proven to be a reliable tool in empirical statistical practice for exploring the linear and nonlinear relationships in a dataset. ARIMA models are inadequate for estimating nonlinear patterns, while ANN approximations of linear patterns are not sufficiently reliable (Denton, 1995; Markham & Rakes, 1998), so the hybrid model is likely to be a more appropriate way to investigate both linear and nonlinear structures.

A well-known form of the hybrid model is the additive model:

$$ {\text{Additive Model}}:y_t = L_t + N_t $$
(5)

where \(L_t\) and \(N_t\) are the linear and nonlinear components, respectively, both of which must be estimated from the data.

The hybrid model is implemented in two stages. In the first stage, the ARIMA model is applied to the time series {\(y_t\); t = 1, 2, …} to capture its linear part and produce a series of forecasts {\(\widehat{L_t }\)}. These forecasts are used to compute the residuals {\(e_t\)}, which under the additive model contain only the nonlinear part:

$$ e_t = y_t - \widehat{L_t } $$
(6)

where \(e_t\) is the estimated residual at time t from the ARIMA (linear) model.

The output of the first stage is thus the nonlinear series that is fed into the ANN in the second stage. The ANN then forecasts the nonlinear part \(\{ \widehat{N_t }\}\) from the {\(e_t\)} series, modelled as follows:

$$ e_t = f\left( {e_{t - 1} ,e_{t - 2} , \ldots ,e_{t - n} } \right) + \varepsilon_t $$
(7)

where f is a nonlinear function determined by the neural network, n is the number of lagged residuals used as inputs and \(\varepsilon_t\) is the random error. This stage acts as an error correction applied by the ANN to the ARIMA-based forecast. Under the additive model, the combined forecast is:

$$ \widehat{y_t } = \widehat{L_t } + \widehat{N_t } $$
(8)

In summary, the hybrid model is implemented in two stages: identifying the ARIMA model and estimating its parameters to obtain the residuals that carry the nonlinear part, and then forecasting that nonlinear part with the ANN.
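The two-stage logic of Eqs. (5) to (8) can be sketched in a few lines. To stay self-contained, this sketch swaps in simple stand-ins that are not the models used in this paper: an AR(1) least-squares fit replaces ARIMA, and a quadratic regression on the lagged residual replaces the neural network. The function names and the test series are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def hybrid_fit_predict(y):
    """In-sample two-stage additive hybrid: y_hat_t = L_hat_t + N_hat_t.

    Stage 1 uses an AR(1) fit as a stand-in for ARIMA.
    Stage 2 uses a quadratic regression on the lagged residual as a
    stand-in for the neural network. Both are assumptions for illustration.
    """
    # Stage 1: linear forecasts L_hat_t and residuals e_t = y_t - L_hat_t (Eq. 6).
    a, b = fit_line(y[:-1], y[1:])
    l_hat = [a + b * prev for prev in y[:-1]]
    e = [obs - lin for obs, lin in zip(y[1:], l_hat)]

    # Stage 2: model e_t = f(e_{t-1}) + eps_t (Eq. 7 with a single lag).
    c, d = fit_line([r * r for r in e[:-1]], e[1:])
    n_hat = [c + d * r * r for r in e[:-1]]

    # Combine (Eq. 8); the first residual has no lag, so the first point is dropped.
    y_hat = [lin + non for lin, non in zip(l_hat[1:], n_hat)]
    return l_hat[1:], n_hat, y_hat, y[2:]


# Hypothetical nonlinear series (a logistic map), not the CPI data.
series = [0.3]
for _ in range(59):
    series.append(3.9 * series[-1] * (1.0 - series[-1]))

l_hat, n_hat, y_hat, actual = hybrid_fit_predict(series)
sse_linear = sum((obs - lin) ** 2 for obs, lin in zip(actual, l_hat))
sse_hybrid = sum((obs - hyb) ** 2 for obs, hyb in zip(actual, y_hat))
```

Because the second stage is fitted by least squares with an intercept, the in-sample error of the combined forecast cannot exceed that of the linear stage alone. Out-of-sample gains are not guaranteed and have to be checked on held-out data.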

3 Model Design and Data Description

Hybrid Model

Autoregressive Integrated Moving Average Model

The data analysis for the ARIMA model includes (i) choosing the order of differencing d needed to make the data stationary, and (ii) examining the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to decide the most appropriate values of p and q. The Vietnam CPI data from January 1995 to July 2022 (CPI series), provided by the IMF, show a seasonal pattern, because the autocorrelation values at different lags fall outside the confidence bounds in the ACF plot of the CPI series (Fig. 1). The first difference of the CPI series (Fig. 2) appears stationary, without the seasonal pattern; its ACF and PACF both have large values at lag 1 (Figs. 3 and 4), judged against the 5% significance bounds. Three candidate models, ARIMA (0,1,1), ARIMA (1,1,0) and ARIMA (1,1,1), are therefore compared using the AIC and BIC criteria, as shown in Table 1.

Table 1. Goodness of Fit of three possible ARIMA models

According to the AIC and BIC values, ARIMA (0,1,1) is the best model for the data, since it has the smallest AIC and BIC. The ARIMA model in this research assumes a Gaussian distribution for the errors.
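For reference, the two criteria can be computed from a model's residuals under the Gaussian assumption. The sketch below is a minimal illustration with hypothetical residuals, not a reproduction of Table 1. Whether the error variance is counted among the parameters is a convention that differs between software packages.

```python
import math


def gaussian_aic_bic(residuals, n_params):
    """AIC and BIC for a model with Gaussian errors.

    Uses the maximised log-likelihood with sigma^2 = RSS / n:
        log L = -n/2 * (log(2*pi*sigma^2) + 1)
        AIC   = 2k - 2 log L
        BIC   = k log(n) - 2 log L
    where k = n_params is the number of estimated parameters.
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    aic = 2.0 * n_params - 2.0 * log_lik
    bic = n_params * math.log(n) - 2.0 * log_lik
    return aic, bic


# Hypothetical residuals; the same fit scored with 2 and then 3 parameters.
res = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
aic2, bic2 = gaussian_aic_bic(res, n_params=2)
aic3, bic3 = gaussian_aic_bic(res, n_params=3)
```

With identical residuals, the extra parameter raises AIC by 2 and BIC by log(n), which is why a richer model must reduce the residual variance enough to be preferred.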

Fig. 1. ACF of CPI time series

Fig. 2. First difference of the CPI series

Fig. 3. ACF of the first difference of the CPI series

Fig. 4. PACF of the first difference of the CPI series

Table 2 reports the result of the augmented Dickey-Fuller (ADF) test, which checks whether a time series is stationary. The test confirms that the first difference of the Vietnam CPI series from 1995M01 to 2022M07 (Fig. 2) is stationary.

Table 2. Augmented Dickey-Fuller Test

Nonlinear Autoregressive Neural Network (NAR)

Artificial neural networks (ANNs), mathematical models inspired by the biological neural system, are able to capture nonlinear features of time series data when modelling dynamic nonlinear series. A network consists of clusters of artificial neurons linked to each other by weighted connections; each neuron processes its inputs by passing their weighted sum through an activation function. Similarly, the output of the network is computed by an activation function with a fitted threshold, as follows:

$$ y = f(b + \sum\nolimits_i {w_i x_i }) $$
(9)

where y is the output; f is the activation function; b is the bias of the neuron, which shifts the signal relative to the threshold of the activation function; \(w_i\) are the weights; and \(x_i\) are the inputs.

The Nonlinear Autoregressive Neural Network (NAR) extends the linear autoregressive model with feedback connections and is widely used for multi-step-ahead prediction of a time series, taking past values of the series as inputs:

$$ y_t = f\left( {y_{t - 1} , \ldots ,y_{t - d} } \right) + \varepsilon_t $$
(10)

where f is a nonlinear function, so that the forecast depends on a series of past observations; \(y_{t - 1} , y_{t - 2} , \ldots ,y_{t - d}\) are the feedback delays; and d is the time delay.

The NAR is designed and trained in open loop, with the actual target values supplied as inputs, so that the training inputs are as accurate as possible. The network is then converted to closed loop, in which its own predictions are fed back as new inputs. Training estimates the model by optimizing the network weights and neuron biases.
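The difference between open-loop and closed-loop operation can be sketched as follows. The predictor `mean_of_lags` is a hypothetical stand-in for a trained network f, and the data are made up; only the way inputs are supplied matters here.

```python
def open_loop(predict, series, d):
    """One-step-ahead predictions: every input lag is an observed value."""
    return [predict(series[t - d:t]) for t in range(d, len(series))]


def closed_loop(predict, history, d, steps):
    """Multi-step predictions: each forecast is fed back as a future input."""
    window = list(history[-d:])
    forecasts = []
    for _ in range(steps):
        y_next = predict(window)
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # drop the oldest lag, append the forecast
    return forecasts


# Hypothetical stand-in for a trained network: the mean of the d most recent values.
def mean_of_lags(lags):
    return sum(lags) / len(lags)


observed = [1.0, 3.0, 2.0, 4.0, 3.0]
one_step = open_loop(mean_of_lags, observed, d=2)                # [2.0, 2.5, 3.0]
multi_step = closed_loop(mean_of_lags, observed, d=2, steps=3)   # [3.5, 3.25, 3.375]
```

In closed loop, forecast errors can accumulate because later inputs are themselves predictions, which is why multi-step accuracy is usually lower than one-step accuracy.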

In more detail, the NAR model can be written as:

$$ y_t = \alpha_0 + \sum\nolimits_{j = 1}^k {\alpha_j \phi } \left( {\sum\nolimits_{i = 1}^a {\beta_{ij} y_{t - i} } + \beta_{0j} } \right) + \varepsilon_t $$
(11)

where k is the number of hidden units with activation function \(\phi\); a is the number of entries (lagged inputs); \(\alpha_j\) is the weight of the connection between hidden unit j and the output unit; \(\beta_{ij}\) is the weight of the connection between input unit i and hidden unit j; and \(\beta_{0j}\) and \(\alpha_0\) are the constants of hidden unit j and the output unit, respectively.

The dataset is divided into three subsets according to their role:

  • Training: 70% of the dataset is used for training, during which the network is adjusted according to its error.

  • Validation: 15% of the dataset is used for validation, which measures network generalization and stops training when generalization no longer improves.

  • Testing: 15% of the dataset is held out for testing; it has no effect on training and therefore provides an independent measure of network performance during and after training.
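A 70/15/15 division can be sketched as below. This version assumes contiguous blocks in time order, which is one possible choice and not necessarily the division used in the toolbox run reported here; `split_blocks` and the sample data are hypothetical.

```python
def split_blocks(series, train=0.70, val=0.15):
    """Split a series into contiguous training, validation and test blocks."""
    n = len(series)
    n_train = round(n * train)
    n_val = round(n * val)
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])


data = list(range(100))  # hypothetical series of 100 observations
train_set, val_set, test_set = split_blocks(data)
```

Keeping the blocks in time order avoids using future observations to fit a model that is later evaluated on earlier ones.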

The Selection of Hidden Layer Number

This research uses a three-layer back-propagation network consisting of an input layer, an output layer and a single hidden layer. The number of neurons in the hidden layer strongly affects the network structure; a single hidden layer is used because Cybenko (1989), Hornik et al. (1989) and G. Zhang et al. (1998) showed that one hidden layer is sufficient for a neural network to approximate any complex non-linear relationship to the desired accuracy. The Tan-Sigmoid function is the transfer function of the hidden layer, and the Sigmoid function is used for the output layer.

The structure of the neural network is [2,10,2]: the single hidden layer has 10 neurons (the default layer size of the NAR model in Matlab), and the input and output layers have 2 neurons each. The time delay (number of lags) is 2, the default setting of the NAR model in Matlab.

The Evaluation Criteria

In this research, the NAR model is implemented with the Neural Network Toolbox of MatLab, and its performance criterion is the mean squared error (MSE). The MSE goal is set at 0.001 (or a maximum of 1000 iterations), and the training process is replicated 10 times to obtain averages of the MSE, Akaike information criterion (AIC) and Bayesian information criterion (BIC).

For the NAR model, this paper assesses the effect of the time delay d on training performance using the MSE. The time delay d is set at 2, the default of the NAR model. The training function is Bayesian regularization backpropagation, and training stops when generalization stops improving, as indicated by an increase in the MSE on the validation set.

Proposed Hybrid Model for Forecasting

The CPI time series is nonstationary, as shown in Fig. 5. A hybrid of the ARIMA and NAR models is therefore proposed: the ARIMA model processes the nonstationary data and yields a residual series, which serves as the input of the NAR model. The process includes three stages:

  • Stage 1: The residual series (nonlinear part) produced by the ARIMA (0,1,1) model is collected for use as the input of the NAR model in Stage 2.

  • Stage 2: The NAR model is trained, validated and tested on the residual series obtained in Stage 1.

  • Stage 3: The fitted NAR model is applied to produce multi-step predictions of the CPI series, which are compared with the actual values.
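Stage 1 can be illustrated with a minimal sketch of how an ARIMA (0,1,1) model turns a series into one-step forecasts and residuals. It follows the sign convention of Eq. (1) and starts the recursion from a zero error, a common conditional choice. The function name, the value of `theta` and the index values are hypothetical, not estimates from this paper.

```python
def arima011_residuals(y, theta, const=0.0):
    """One-step forecasts and residuals of an ARIMA(0,1,1) model.

    With w_t = y_t - y_{t-1}, the model is
        w_t = const + eps_t - theta * eps_{t-1},
    so the forecast of y_t is y_{t-1} + const - theta * eps_{t-1}.
    """
    forecasts, residuals = [], []
    eps_prev = 0.0  # assumed starting value for the unobserved error
    for t in range(1, len(y)):
        y_hat = y[t - 1] + const - theta * eps_prev  # linear forecast L_hat_t
        eps = y[t] - y_hat                           # residual e_t = y_t - L_hat_t
        forecasts.append(y_hat)
        residuals.append(eps)
        eps_prev = eps
    return forecasts, residuals


# Hypothetical index values and a hypothetical theta.
cpi = [100.0, 101.0, 103.0, 104.0]
l_hat, e = arima011_residuals(cpi, theta=0.5)
# l_hat == [100.0, 100.5, 101.75], e == [1.0, 2.5, 2.25]
```

The list `e` plays the role of the residual series passed to the NAR model in Stage 2.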

Data Description

Forecasting fluctuations in the CPI, which reflects inflation, one of the most significant factors in the economy, remains a tough challenge in econometrics. From a theoretical perspective, although several linear and nonlinear models have been developed, only some of them are likely to outperform the simple random-walk model in out-of-sample prediction. The data used in this paper are the monthly series of Vietnam CPI from January 1995 (1995M1) to July 2022 (2022M7), used for training, testing and validating the ARIMA, ANN and hybrid models. The monthly observations are plotted in Fig. 5 and show a non-stationary pattern. Because the CPI series is non-stationary, the data are differenced to obtain a stationary series that meets the basic assumption of the ARIMA model. The dataset is divided into a training sample from January 1995 to December 2008 (168 observations) and a testing sample from January 2009 to July 2022 (163 observations), as shown in Table 3. The training set is used for model development, and the test set is used to evaluate the fitted model by measuring the gap between the actual observations and the forecasts.
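The sample sizes quoted above follow directly from the date ranges, as this short check shows; the helper name `months_inclusive` is hypothetical.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of monthly observations from start to end, both included."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


n_train = months_inclusive(1995, 1, 2008, 12)  # 168
n_test = months_inclusive(2009, 1, 2022, 7)    # 163
n_total = months_inclusive(1995, 1, 2022, 7)   # 331
```

The training sample therefore covers about 51% of the 331 monthly observations.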

Fig. 5. Vietnam CPI monthly series (1995M01 – 2022M07)

Table 3. Data sample

4 Research Result

Tables 4 and 6 present the result of the ARIMA model and the goodness of fit of the ARIMA and NAR models, respectively. Figure 6 shows the performance result of the NAR model.

Table 4. Result of ARIMA (0,1,1)

For the NAR model, Table 5 shows the MSE, which is used to evaluate model performance, and the regression R-value, which reflects the correlation between outputs and responses, for both the training results and the additional test results.

Table 5. Training results and Additional test results of NAR model
Table 6. Goodness of Fit of ARIMA and NAR models
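The two measures reported for the NAR model can be computed as in the following minimal sketch, which assumes that the R-value is the Pearson correlation between targets and outputs. The numbers are made up to show that the two measures capture different things.

```python
import math


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


targets = [1.0, 2.0, 3.0, 4.0]
outputs = [1.5, 2.5, 3.5, 4.5]  # hypothetical outputs with a constant bias of 0.5

error = mse(targets, outputs)      # 0.25
corr = pearson_r(targets, outputs)  # 1.0
```

A constant bias leaves R at 1 while the MSE stays positive, so the two measures are best read together.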

The Bayes factor of the ARIMA model shown in Table 6 is computed with the equation of Wagenmakers (2007). Its value of 0.083 suggests strong evidence in favor of ARIMA (0,1,1) over the alternatives, according to the interpretation of the Bayes factor developed by Jeffreys (1961).
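A common way to obtain a Bayes factor from information criteria is the BIC approximation, sketched below. Which pair of models and which direction of comparison produced the value in Table 6 is not stated, so the BIC values and the function name here are hypothetical.

```python
import math


def bayes_factor_bic(bic_a, bic_b):
    """Approximate Bayes factor of model A against model B from their BICs.

    BF_AB ~= exp((BIC_B - BIC_A) / 2); values above 1 favour model A.
    """
    return math.exp((bic_b - bic_a) / 2.0)


# Hypothetical BIC values, not those of Table 6.
bf_ab = bayes_factor_bic(100.0, 105.0)  # exp(2.5), about 12.18
bf_ba = bayes_factor_bic(105.0, 100.0)  # the reciprocal, about 0.082
```

If a factor of 0.083 is read as the evidence for a competing model against the preferred one, its reciprocal is about 12, which falls between 10 and 30 and is conventionally labelled strong evidence.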

The forecasts of the ARIMA (0,1,1), NAR and Hybrid models are illustrated in Fig. 7. The plots indicate that neither the ARIMA model nor the NAR model alone produces an accurate prediction over the 163-month period, whereas the Hybrid model forecast is more accurate, as reflected in the narrower gap between the actual and predicted lines. This suggests that neither the ARIMA model nor the NAR model alone is statistically reliable enough to capture both the linear and the nonlinear parts of the time series and produce a highly accurate prediction. The forecast of the Hybrid model shows that integrating the two models can substantially reduce the overall forecasting error.

Fig. 6. Performance result of NAR model based on MSE

Fig. 7. (a) ARIMA (0,1,1) model forecast; (b) NAR model forecast; (c) Hybrid model forecast

5 Conclusion

While the ARIMA model has brought considerable advances to time series forecasting, the artificial neural network model has also shown great promise in that area through its ability to capture nonlinear patterns. However, neither can be regarded as the best model for every type of time series dataset. This paper therefore proposes a hybrid model that integrates ARIMA and ANN models to produce more accurate predictions for time series data. In other words, it takes full advantage of ARIMA's capability to process linear patterns and the ANN's capability to process nonlinear patterns. The empirical result for Vietnam CPI prediction indicates that the hybrid model is likely to outperform a single ARIMA model or ANN in multi-step prediction of time series data.