1 Introduction

An exchange rate reflects the relative value of two currencies and is one of the most important financial and macroeconomic indicators in an economy. The fluctuation of exchange rates affects international trade, capital flows, and asset portfolio management. Many financial time series forecasting models (Adhikari and Agrawal 2013; Wu and Chang 2012; Zhiqiang et al. 2012) have been developed, and they play a critical role in the world economy because of their ability to forecast economic benefits and influence economic development. These models have attracted increased attention from academic researchers and business people for their theoretical possibilities and practical applications (Hadavandi et al. 2010; Lu et al. 2009). The ostensible purpose of breaking down financial market boundaries was to enhance the efficiency of capital funding (for example, the Bretton Woods system of monetary management was officially ended in 1973). As a result, internationally traded currencies have become crucial economic indices for international trade, financial markets, the alignment of economic policy by governments, and corporate financial decision-making.

However, financial time series forecasting is a challenging task because of the inherent nonlinearity and nonstationarity of such series. In the last few decades, these characteristics have attracted increased attention from many academic researchers. The forecasting approaches used in the literature can be classified into two types of models: statistical and artificial intelligence (Wang et al. 2012; Zhu and Wei 2013). Linear statistical models such as exponential smoothing (Lemke and Gabrys 2010) and the autoregressive integrated moving average (ARIMA) (Box and Jenkins 1970) have found immense application in forecasting financial data. A subclass of the ARIMA model, namely the Naïve random walk (RW) (Sun 2005; Tyree and Long 1995), has become the benchmark statistical technique in this domain. In a simple RW model, each forecast is assumed to be the sum of the most recent observation and a random error term. After the pioneering work of Meese and Rogoff (1983), the RW model has been used extensively by many researchers for foreign exchange rate forecasting. Currently, the simple RW is the dominant linear model in the financial time series literature and, especially, in exchange rate forecasting (Zhang 2003).

Despite the simplicity and notable forecasting accuracy of RW models, their main drawback is their inherently linear form. Such statistical models cannot effectively capture nonlinear patterns hidden in financial time series because they are developed under the assumption that the series being forecasted are linear and stationary (Huang et al. 2010). To overcome this limitation, several nonlinear models have been proposed. Among them, the artificial neural network (ANN) has attracted considerable interest from researchers because of its excellent nonlinear modeling capability (Zhang and Wu 2009; Chen et al. 2012a; Jaeger and Haas 2004; Deng et al. 2015; Vasilakis et al. 2013). Many studies have concluded that the ANN model outperforms conventional statistical models. However, the ANN suffers from local minimum traps and difficulty in determining the hidden layer size and learning rate (Kazem et al. 2013). A learning algorithm for the single hidden layer feed-forward neural network (SLFN), known as the extreme learning machine (ELM), has been proposed to overcome these disadvantages (Huang et al. 2006a; Chen and Ou 2011). In the learning process of ELM, the input weights and hidden biases are randomly selected, and the output weights are analytically determined by using the Moore-Penrose generalized inverse. ELM learns much faster and achieves higher generalization performance than traditional gradient-based learning algorithms. In addition, ELM avoids the problems of stopping criteria, learning rates, learning epochs, and local minima (Huang et al. 2006; Chen and Ou 2011; Xia et al. 2012; Lu and Shao 2012). In recent years, ELM has attracted considerable attention and become an important method in nonlinear modeling (Chen and Ou 2011; Xia et al. 2012; Lu and Shao 2012).

When intelligent prediction models are built directly on the original values, obtaining satisfactory forecasts is difficult because of the high-frequency, nonstationary, and chaotic properties of financial data. Hence, to further improve prediction performance, recent research on modeling time series with complex nonlinearity, dynamic variation, and high irregularity first applies information extraction techniques to extract features hidden in the data and then uses these extracted characteristics to construct a forecasting model (Lu et al. 2009; Chen et al. 2012b; Liu and Wang 2011; Lu 2010). In other words, by means of suitable feature extraction or signal processing methods, useful or interesting information that cannot be observed directly in the original data can be revealed in the extracted features. Therefore, an effective forecasting model possessing more precise prediction capabilities must be developed.

Empirical mode decomposition (EMD), based on the Hilbert-Huang transform (HHT), is suitable for decomposing nonlinear and nonstationary time series because it adaptively represents the local characteristics of the given signal (Huang et al. 1998, 2003). Through the use of EMD, any complicated signal can be decomposed into a finite and often small number of intrinsic mode functions (IMFs). IMFs possess simple frequency components and strong correlations, and thus are easy to forecast accurately (Jaeger and Haas 2004). EMD has been widely used in many fields, including the analysis of atmospheric time series (Xuan and Yang 2008), river water turbidity forecasting (Wang and Qi 2009), crude oil price prediction (Yang et al. 2010), short-term wind power prediction, and others (Jaeger and Haas 2004; Lu and Shao 2012; Chen et al. 2012a; Bao et al. 2012; Ye and Liu 2011).

Another critical reason that financial time series are notoriously difficult to predict is their chaotic nature. Chaos is often identified in physics and the other natural sciences, and empirical evidence of chaotic behavior in financial time series has also been found (Barkoulas and Travlos 1998; Gilmore 2001; McKenzie 2001). Chaos theory indicates that an adequate method can reveal the underlying information in complicated systems believed to be unpredictable (Takens 1981). For chaotic time series, prediction techniques based on phase space reconstruction (PSR) can be employed to extract the information and characteristics hidden in the dynamic system underlying the series. PSR transforms a one-dimensional signal into a structure embedded in a sufficiently high-dimensional space; in this new space, the resulting structure is topologically equivalent to the original phase space. This property has led some researchers to apply chaos theory to time series forecasting.

In this study, we propose a hybrid exchange rate forecasting model that integrates EMD, PSR, and ELM (EMD \(+\) PSR \(+\) ELM). First, the original exchange rate time series is decomposed into a finite number of independent IMFs with different frequencies. Second, based on PSR, separate ELM models are used to model and forecast each reconstructed sub-series. Finally, these forecasting results are combined to produce the ultimate forecast. Experimental results on four sets of real exchange rate data demonstrate that the proposed hybrid method outperforms the Naïve RW, the single ELM, and other hybrid models in terms of mean absolute error (MAE), root mean-square error (RMSE), and mean absolute percentage error (MAPE).

2 Literature Review of Major Methods

2.1 EMD

The EMD method, based on the HHT, rests on the simple assumption that any signal consists of different but simple intrinsic mode oscillations. The essence of the method is to identify the intrinsic oscillatory modes (IMFs) (Huang et al. 1998) by their characteristic time scales in the signal and then decompose the signal accordingly. A characteristic time scale is defined by the time lapse between successive extrema.

To extract the IMFs from a given data set, the sifting process is implemented as follows. First, we identify all local extrema and connect all local maxima by a cubic spline line, which acts as the upper envelope. We then repeat the procedure for the local minima to produce the lower envelope. The upper and lower envelopes should cover all the data between them. Their mean is designated \(m_{1}(t)\), and the difference between the data \(x(t)\) and \(m_{1}(t)\) is \(h_{1}(t)\), given by the following:

$$\begin{aligned} x(t)-m_1 (t)=h_1 (t). \end{aligned}$$
(1)

Ideally, \(h_{1}(t)\) should be an IMF. Because the construction of \(h_{1}(t)\) described previously does not guarantee that the result satisfies the definition of an IMF, we demand the following conditions: (i) \(h_{1}(t)\) should be free of riding waves, that is, the first component should not display under- or over-shoots that ride on the data and produce local extrema without zero crossings; (ii) the upper and lower envelopes should be symmetric with respect to zero; and (iii) the numbers of zero crossings and extrema should be equal.

The sifting process must be repeated as many times as required to reduce the extracted signal to an IMF. In the subsequent sifting process steps, \(h_{1}(t)\) is treated as the data:

$$\begin{aligned} h_1 (t)-m_{11} (t)=h_{11} (t), \end{aligned}$$
(2)

where \(m_{11}(t)\) is the mean of the upper and lower envelopes of \(h_{1}(t)\). This process can be repeated up to k times, and \(h_{1k}(t)\) is then defined as:

$$\begin{aligned} h_{1(k-1)} (t)-m_{1k} (t)=h_{1k} (t). \end{aligned}$$
(3)

After each processing step, we must confirm that the number of zero crossings equals the number of extrema. The resulting time series is the first IMF and is designated \(c_{1}(t)=h_{1k}(t)\). The first IMF component contains the highest oscillation frequencies found in the original data \(x(t)\).

The first IMF is then subtracted from the original data, and the difference, called the residue \(r_{1}(t)\), is given by the following:

$$\begin{aligned} x(t)-c_1 (t)=r_1 (t). \end{aligned}$$
(4)

The residue \(r_{1}(t)\) is treated as if it were the original data, and the sifting process is reapplied to it. This process of locating additional intrinsic modes \(c_{j}(t)\) continues until the last mode is found. The final residue is either a constant or a monotonic function; in the latter case, it represents the general trend of the data. The original signal can then be expressed as:

$$\begin{aligned} x(t)=\sum _{j=1}^{n} {c_{j}(t)+r_n (t)} . \end{aligned}$$
(5)

Thus, the data is decomposed into n-empirical IMF modes plus a residue, \(r_{n}(t)\), which can be either the mean trend or a constant.
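To make the sifting procedure concrete, the following is a minimal Python sketch of EMD, assuming cubic-spline envelopes as described above. The fixed iteration caps and the simple extrema-count stopping rule are illustrative simplifications, not the exact criteria of the HHT MATLAB program used later in this study (Sect. 4.4).

```python
# Minimal EMD sketch: cubic-spline envelopes, Eqs. (1)-(5).
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(h, t):
    """One sifting step, Eq. (2): subtract the mean of the two envelopes."""
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None                                # too few extrema: residue reached
    upper = CubicSpline(t[maxima], h[maxima])(t)   # upper envelope
    lower = CubicSpline(t[minima], h[minima])(t)   # lower envelope
    return h - (upper + lower) / 2.0               # h(t) - m_1k(t)

def emd(x, max_imfs=10, max_sift=50):
    """Decompose x into IMFs c_j(t) plus a residue r_n(t), Eq. (5)."""
    t = np.arange(len(x), dtype=float)
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(max_sift):                  # repeated sifting, Eqs. (2)-(3)
            h_new = sift_once(h, t)
            if h_new is None:                      # monotonic/constant residue: stop
                return imfs, residue
            h = h_new
        imfs.append(h)                             # c_j(t) = h_jk(t)
        residue = residue - h                      # Eq. (4): r_j(t) = x(t) - c_j(t)
    return imfs, residue
```

By construction, `sum(imfs) + residue` recovers the original signal, mirroring Eq. (5).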

2.2 Phase Space Reconstruction

The analysis of time series generated by nonlinear dynamic systems can be accomplished in accordance with Takens' embedding theory (Takens 1981). Given a univariate time series \(\{x_{i}\}_{i=1}^{N}\) generated from a d-dimensional chaotic attractor, where N is the length of the time series, a phase space \(R^{d}\) of the attractor can be reconstructed by using delay coordinates defined as:

$$\begin{aligned} X_{i}=(x_{i}, x_{i-\tau },\ldots , x_{i-(m-1)\tau }), \end{aligned}$$
(6)

where m is the embedding dimension of the reconstructed phase space and \(\tau \) is the time delay constant. Choosing the correct embedding dimension is crucial for predicting \(x_{t+1}\). Takens (1981) showed that a sufficient condition on the embedding dimension is \(m\ge 2d+1\). However, an embedding dimension that is too large requires additional observations and complex computation. Moreover, if the embedding dimension is too large, noise and other unwanted inputs are embedded together with the real source information, which may corrupt the underlying system dynamics. Therefore, in accordance with Sauer et al. (1991), if the dimension of the original attractor is d, then an embedding dimension of \(m=2d+1\) is adequate for reconstructing the attractor.

An efficient method of locating the minimal sufficient embedding dimension is the false nearest neighbors (FNN) procedure proposed by Kennel et al. (1992). Two nearby points in the reconstructed phase space are called false neighbors if they are considerably far apart in the original phase space. Such a phenomenon occurs when the selected embedding dimension is lower than the minimal sufficient value, so that the reconstructed attractor does not preserve the topological properties of the real phase space; in this case, points are projected into the false neighborhood of other points. The idea behind the FNN procedure is as follows. Suppose \(X_{i}\) has a nearest neighbor \(X_{j}\) in an m-dimensional space. Calculate the Euclidean distance \(||X_{i}-X_{j}||\) and compute the following:

$$\begin{aligned} R_i =\frac{\left\| {X_{i+1}-X_{j+1}}\right\| }{\left\| {X_i -X_j}\right\| }. \end{aligned}$$
(7)

If \(R_{i}\) exceeds a given threshold \(R_{tol}\) (say, 10 or 15), the point \(X_{j}\) is considered a false nearest neighbor in dimension m. The embedding dimension m is sufficiently high if the fraction of points that have false nearest neighbors is zero or considerably small.

Estimation of the time delay \(\tau \) is another major concern. If \(\tau \) is too small, redundancy occurs; if \(\tau \) is too large, it probably leads to a complex phenomenon called irrelevance. In this study, we use the first minimum of the mutual information (MI) function (Huang et al. 2006a) to determine \(\tau \) as follows:

$$\begin{aligned} MI(\tau )=\sum _{n=1}^{N-\tau } {P(x_n ,x_{n+\tau })\log _2 \left( {\frac{P(x_n ,x_{n+\tau })}{P(x_n )P(x_{n+\tau })}} \right) } , \end{aligned}$$
(8)

where \(P(x_{n})\) is the probability density of \(x_{n}\) and \(P(x_{n}, x_{n+\tau })\) is the joint probability density of \(x_{n}\) and \(x_{n+\tau }\).
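As an illustration of how \(\tau \) and m can be obtained in practice, the following Python sketch estimates the MI function of Eq. (8) with a simple histogram density estimate and applies the FNN test in its common added-coordinate form (equivalent in spirit to the ratio in Eq. (7)). The bin count and the threshold value are illustrative assumptions; our experiments used Hao Cheng's Fractal MATLAB toolbox (Sect. 4.5).

```python
# Delay and embedding-dimension selection: MI of Eq. (8) and an FNN test.
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of MI between x_n and x_{n+tau}, Eq. (8)."""
    pxy, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy = pxy / pxy.sum()                          # joint density P(x_n, x_{n+tau})
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)      # marginal densities
    nz = pxy > 0                                   # skip empty cells (0 log 0 = 0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))

def first_minimum_tau(x, max_tau=50):
    """The first local minimum of MI(tau) gives the optimal delay."""
    mi = [mutual_information(x, tau) for tau in range(1, max_tau + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i - 1] > mi[i] < mi[i + 1]:
            return i + 1                           # taus start at 1
    return int(np.argmin(mi)) + 1                  # fallback: global minimum

def fnn_fraction(x, m, tau, r_tol=15.0):
    """Fraction of false nearest neighbours at dimension m (cf. Eq. (7))."""
    M = len(x) - m * tau                           # points whose (m+1)-th coordinate exists
    X = np.column_stack([x[j * tau: j * tau + M] for j in range(m)])
    extra = x[m * tau: m * tau + M]                # coordinate added in dimension m+1
    false = 0
    for i in range(M):                             # O(M^2) search: fine for a sketch
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # exclude the point itself
        j = int(np.argmin(d))                      # nearest neighbour in dimension m
        if d[j] > 0 and abs(extra[i] - extra[j]) / d[j] > r_tol:
            false += 1                             # neighbourhood breaks in m+1 dims
    return false / M
```

The minimal sufficient m is then the smallest value for which `fnn_fraction` is zero or negligibly small.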

2.3 ELM

ELM is an improved learning algorithm for the SLFN architecture. ELM differs from traditional neural network methodology in that the parameters of the feed-forward network (input weights and hidden layer biases) are not required to be tuned. The ability of SLFNs with randomly chosen input weights, hidden layer biases, and a nonzero activation function to approximate any continuous function on any input set has been demonstrated (Rao and Mitra 1971). An SLFN with randomly chosen input weights and hidden layer biases can be considered a linear system, and the output weights that link the hidden layer to the output layer can then be analytically determined through a simple generalized inverse operation on the hidden layer output matrix. This simple approach makes ELM extremely efficient and many times faster than traditional feed-forward learning algorithms.

The structure of ELM consists of an SLFN in which the input weight matrix W is randomly chosen and the output weight matrix \(\beta \) is analytically determined. Suppose we are given a data set with N arbitrary distinct samples \((x_{i}, t_{i})\), where \(x_{i}=[x_{i1}, x_{i2}, \ldots , x_{in}]^{T} \in R^{n}\) and \(t_{i}=[t_{i1}, t_{i2}, \ldots , t_{im}]^{T} \in R^{m}\). The mathematical model of a standard SLFN with \(\tilde{N}\) hidden nodes and activation function g(x) for the given data can be formulated as follows (Huang et al. 2006):

$$\begin{aligned} \sum _{i=1}^{\tilde{N}} {\beta _i g_i (x_j )=} \sum _{i=1}^{\tilde{N}} {\beta _i g(w_i \cdot x_j +b_i )=y_j ,\quad j=1,\ldots ,N} , \end{aligned}$$
(9)

where \(w_{i}=[w_{i1}, w_{i2}, \ldots , w_{in}]^{T}\) denotes the weight vector that connects the input nodes to the \(i\hbox {th}\) hidden node, \(\beta _{i}=[\beta _{i1}, \beta _{i2}, \ldots , \beta _{im}]^{T}\) denotes the weight vector that connects the \(i\hbox {th}\) hidden node to the output nodes, and \(b_{i}\) is the threshold of the \(i\hbox {th}\) hidden node. The inner product of \(w_{i}\) and \(x_{j}\) is denoted by the operation \(w_{i}\cdot x_{j}\) in (9). Suppose that standard SLFNs with \({\tilde{N}}\) hidden nodes employing activation function g(x) can approximate these N samples with zero error. In such a situation, we obtain the following equation:

$$\begin{aligned} \sum _{j=1}^N {\left\| {y_j -t_j } \right\| =0} , \end{aligned}$$
(10)

where y denotes the actual output value of the SLFN. This indicates the existence of \(\beta _{i}\), \(w_{i}\), and \(b_{i}\) such that:

$$\begin{aligned} \sum _{i=1}^{\tilde{N}}{\beta _{i} g(w_{i} \cdot x_{j} +b_{i})=t_{j},\quad j=1,\ldots ,N} . \end{aligned}$$
(11)

A succinct expression of the previous N equations can be written as:

$$\begin{aligned} H\beta =T, \end{aligned}$$
(12)

where H is the hidden layer output matrix.

$$\begin{aligned} H= & {} \left[ {{\begin{array}{c} h(x_1) \\ \vdots \\ h(x_N) \\ \end{array}}} \right] =\left[ {{\begin{array}{ccc} h_{1} (x_1)&{}\quad \cdots &{}\quad h_{\tilde{N}} (x_1) \\ \vdots &{}\quad \ddots &{}\quad \vdots \\ h_{1} (x_N)&{}\quad \cdots &{}\quad h_{\tilde{N}} (x_N) \\ \end{array}}} \right] , \end{aligned}$$
(13)
$$\begin{aligned} \beta= & {} \left[ {{\begin{array}{c} \beta _{1}^{T} \\ \vdots \\ \beta _{\tilde{N}}^{T}\\ \end{array}}} \right] , \end{aligned}$$
(14)
$$\begin{aligned} T= & {} \left[ {{\begin{array}{c} T_{1}^{T} \\ \vdots \\ T_{N}^{T} \\ \end{array}}} \right] . \end{aligned}$$
(15)

As previously discussed, the input weights and hidden biases are randomly generated and do not require any tuning, unlike in the traditional SLFN methodology. Evaluating the output weights that link the hidden layer to the output layer is equivalent to determining the least-square solution of the given linear system. The minimum norm least-square (LS) solution to the linear system defined in (12) is:

$$\begin{aligned} \hat{\beta }=H^{+}T. \end{aligned}$$
(16)

\(H^{+}\) in the previous equation is the Moore-Penrose (MP) generalized inverse of the matrix H (Babovic et al. 2000). The minimum norm LS solution is unique and has the smallest norm among all LS solutions. The MP-inverse-based ELM achieves quality generalization performance with a radically increased learning speed. A general algorithm for ELM can be stated as follows. Given a training set, an activation function g(x), and a hidden neuron number L:

  1. Step 1:

    Assign random input weight \(w_i \) and bias \(b_{i}, i=1, . . ., L\).

  2. Step 2:

    Calculate the hidden layer output matrix H.

  3. Step 3:

    Calculate the output weight \(\beta :\beta =H^{+}T\).
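The three steps above translate almost directly into code. The following is a minimal Python sketch of an ELM regressor; the sigmoid activation, the uniform \([-1, 1]\) weight initialization, and the default hidden-layer size are illustrative assumptions.

```python
# Minimal ELM sketch: Steps 1-3 above, Eqs. (12)-(16).
import numpy as np

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.L = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        """Hidden layer outputs h(x) with sigmoid activation g."""
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        n = X.shape[1]
        self.W = self.rng.uniform(-1, 1, (n, self.L))  # Step 1: random input weights w_i
        self.b = self.rng.uniform(-1, 1, self.L)       #         and random biases b_i
        H = self._hidden(X)                            # Step 2: hidden output matrix H, Eq. (13)
        self.beta = np.linalg.pinv(H) @ T              # Step 3: beta = H^+ T, Eq. (16)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta             # y = H beta, Eq. (12)
```

Because `np.linalg.pinv` computes the Moore-Penrose inverse directly, no iterative training, learning rate, or stopping criterion is involved.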

3 Proposed Model

The proposed hybrid approach for exchange rate forecasting (EMD-PSR-ELM) combines EMD, PSR, and ELM, and consists of four main stages. These four stages are described as follows.

Stage 1 EMD Decomposition

The original time series \(x(t), t = 1, 2,{\ldots },N\) is decomposed into n IMF components, \(c_{\mathrm{j}}(t), j = 1, 2, {\ldots }, n\), and one residual component \(r_{\mathrm{n}}(t)\) by using EMD.

Stage 2 Phase Space Reconstruction

First, the MI function in (8) is calculated for each \(c_{\mathrm{j}}(t)\) and \(r_{\mathrm{n}}(t)\) time series. Second, the first delay time in which the MI function minimum value occurs is considered the optimum time delay \(\tau \). Third, the FNN method is employed to find the minimum sufficient embedding dimension m. Fourth, according to the optimum time delay \(\tau \) and embedding dimension m, the time series phase space is reconstructed to reveal its unseen dynamics.

Fig. 1 Flow chart of the proposed EMD-PSR-ELM model

Therefore, the input and output samples can be represented by the matrices X and Y, respectively, in the following forms (where x can denote \(c_{j}\) or \(r_{n}\)):

$$\begin{aligned} X=\left[ {{\begin{array}{cccc} x(1)&{}\quad x(1+\tau )&{}\quad \cdots &{}\quad x(1+(m-1)\tau ) \\ x(2)&{}\quad x(2+\tau )&{}\quad \cdots &{}\quad x(2+(m-1)\tau ) \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots \\ x(M)&{}\quad x(M+\tau )&{}\quad \cdots &{}\quad x(M+(m-1)\tau ) \\ \end{array}}} \right] , \quad Y=\left[ {{\begin{array}{c} x(1+(m-1)\tau +lag) \\ x(2+(m-1)\tau +lag) \\ \vdots \\ x(M+(m-1)\tau +lag) \\ \end{array}}}\right] . \end{aligned}$$
(17)

Forecasting techniques for chaotic time series typically fix the selected time lag at 1 (Liong and Sivapragasam 2002; Makridakis 1993). Therefore, in this study, we also fix the time lag at 1.
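As a sketch, the construction in (17) with the time lag fixed at 1 can be expressed as follows; the helper name `embed_series` is ours.

```python
# Build the input matrix X and output vector Y of Eq. (17) for one sub-series.
import numpy as np

def embed_series(x, m, tau, lag=1):
    """Rows of X are (x(i), x(i+tau), ..., x(i+(m-1)tau)); Y leads X by `lag`."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau - lag               # number of usable samples
    X = np.column_stack([x[j * tau: j * tau + M] for j in range(m)])
    Y = x[(m - 1) * tau + lag: (m - 1) * tau + lag + M]
    return X, Y
```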

Stage 3 ELM Modeling

The reconstructed time series datasets are divided into training and testing datasets. The training datasets are used to build ELM models.

Stage 4 Result Composition

An ELM regression forecasting model is set up for each IMF and for the residue. The final prediction result is obtained by compositing the individual prediction values. Let \(F_{j}\) denote the ELM predictor function of the jth component; the final forecasting result is \(\sum \nolimits _{j=1}^{n} {F_j (c_j (t))} +F_{n+1} (r_n (t))\).

The flow chart of the proposed EMD-PSR-ELM model is shown in Fig. 1.
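Combining the sketches from Sects. 2.1-2.3 with `embed_series` above, an end-to-end one-step-ahead forecast for a single exchange rate series could be organized as follows. The FNN cutoff of 0.01, the cap of 10 on m, and the hidden-layer size are illustrative assumptions, not the settings used in our experiments.

```python
# End-to-end sketch of the EMD-PSR-ELM pipeline (Stages 1-4), reusing the
# emd, first_minimum_tau, fnn_fraction, embed_series, and ELM sketches above.
import numpy as np

def emd_psr_elm_forecast(x, n_hidden=50):
    imfs, residue = emd(np.asarray(x, dtype=float))             # Stage 1: EMD
    total = 0.0
    for series in imfs + [residue]:                             # one model per component
        tau = first_minimum_tau(series)                         # Stage 2: delay via MI
        m = next((m for m in range(1, 11)                       # smallest m with few
                  if fnn_fraction(series, m, tau) < 0.01), 10)  # false neighbours
        X, Y = embed_series(series, m, tau)                     # Eq. (17), lag = 1
        model = ELM(n_hidden).fit(X, Y)                         # Stage 3: ELM modeling
        last = series[len(series) - 1 - (m - 1) * tau:: tau]    # latest embedding vector
        total += model.predict(last[None, :]).item()            # Stage 4: composition
    return total
```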

4 Experimental Results and Analysis

4.1 Data Sets

Daily exchange rate values for USD/TWD, EUD/TWD, GBP/TWD, and AUD/TWD were extracted from the data stream provided by OANDA (http://www.oanda.com). The entire data set covers the period from January 1, 2007 to December 31, 2013, yielding a total of 2557 observations. The data set was divided into training and testing sets. The daily data from January 1, 2007 to April 19, 2013, a total of 2301 observations, were used as the training set. The remaining daily data, from April 20, 2013 to December 31, 2013, a total of 256 observations, were used as the testing set. In the next section, we explain how we implement our EMD-PSR-ELM model.

4.2 Benchmark Prediction Models

As mentioned in Sect. 1, this study adopts the Naïve RW, ELM, EMD-ELM, and PSR-ELM as the benchmarks for the experiment.

  1. (1)

Naïve RW: the Naïve RW simply takes the current value as the forecast of the next value. Thus, no fitting process is required.

  2. (2)

ELM: the original time series x(t) is directly used to build ELM models and to forecast the final results. The function can be expressed as \(\hat{x} (t+1)=F(x(t))\), where F refers to the ELM predictor function.

  3. (3)

EMD-ELM: first, the original time series is decomposed by EMD into several IMF time series and one residual time series. These decomposed datasets are then used to build the ELM models described previously, yielding the EMD-ELM model.

  4. (4)

PSR-ELM: we use the PSR method to reconstruct the original time series space, from which we obtain the optimum embedding dimension m and delay time \(\tau \). The reconstructed datasets are then used to build ELM models (see the sketches after this list). The function can be expressed as follows:

$$\begin{aligned} \hat{x} (t+1)=F(x(t),x(t-\tau ),\ldots ,x(t-(m-1)\tau )). \end{aligned}$$
    (18)
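For concreteness, hedged sketches of the first and fourth benchmarks follow; the single ELM and EMD-ELM variants differ only in the series fed to the model. These reuse the `embed_series` and `ELM` sketches given earlier.

```python
# Benchmark sketches: the Naïve RW and the PSR-ELM predictor of Eq. (18).
import numpy as np

def naive_rw_forecast(x):
    return x[-1]                                   # next value = current value

def psr_elm_forecast(x, m, tau, n_hidden=50):
    X, Y = embed_series(x, m, tau)                 # reconstructed inputs, Eq. (18)
    model = ELM(n_hidden).fit(X, Y)
    last = np.asarray(x, dtype=float)[len(x) - 1 - (m - 1) * tau:: tau]
    return model.predict(last[None, :]).item()     # one-step-ahead forecast
```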

4.3 Evaluation Criteria

To evaluate the forecasting performance of the proposed model, we adopt the MAE, RMSE, and MAPE. These measures are defined as follows:

$$\begin{aligned} \hbox {MAE}= & {} N^{-1}\sum _{t=1}^{N} {\left| Y_{(t)}-{\hat{Y}}_{(t)}\right| }, \end{aligned}$$
(19)
$$\begin{aligned} \hbox {RMSE}= & {} \left( N^{-1}\sum _{t=1}^{N} (Y_{(t)}-{\hat{Y}}_{(t)})^{2}\right) ^{1/2}, \end{aligned}$$
(20)
$$\begin{aligned} \hbox {MAPE}= & {} N^{-1}\sum _{t=1}^{N}{\left| (Y_{(t)} -{\hat{Y}}_{(t)})/Y_{(t)}\right| }, \end{aligned}$$
(21)

where \(Y_{(t)}\) and \({\hat{Y}}_{(t)}\) are the actual and predicted values, respectively, at time t, and N is the sample size. Note that MAE, RMSE, and MAPE measure the deviation between actual and predicted values; forecasting performance therefore improves as these measures decrease. If the results are not consistent among these criteria, we choose MAPE as the benchmark, as suggested by Makridakis (1993), because MAPE is relatively more stable than the other criteria.
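The three criteria are straightforward to compute; a minimal NumPy sketch follows, with the actual value \(Y_{(t)}\) in the MAPE denominator as in (21).

```python
# Evaluation criteria of Eqs. (19)-(21); y and y_hat are NumPy arrays of
# actual and predicted values.
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))              # Eq. (19)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))      # Eq. (20)

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y))        # Eq. (21)
```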

4.4 Implementation of EMD

Following the steps described in Sect. 3, we conducted the prediction experiments. First, using the EMD technique, the four exchange rate series (USD/TWD, EUD/TWD, GBP/TWD, and AUD/TWD) were each decomposed into 10 IMFs (IMF1-IMF10) and one residual, as shown in Fig. 2. The extracted IMF components are plotted in the order in which they were extracted, that is, from the highest frequency (shortest period) to the lowest. The last component is the residue of the sifting, which generally represents the trend of the time series. In this study, the EMD components were obtained by using the HHT MATLAB program (http://rcada.ncu.edu.tw/research1_clip_program.htm).

Fig. 2 IMFs and residual. a USD/TWD. b EUD/TWD. c GBP/TWD. d AUD/TWD

4.5 Implementation of PSR

In the PSR stage, MI was used to select the optimal delay time \(\tau \), based on the first minimum value of the MI function. After the optimal \(\tau \) was selected, FNN was used to extract the minimum embedding dimension. Table 1 shows the optimal m and \(\tau \) for each IMF and the residual. These optimal embedding dimensions and delay times were used to construct the input matrix X. The data were fed to ELM forecast models set up for each IMF and the residual. The final prediction results of the EMD-PSR-ELM model were obtained by compositing (i.e., combining the separate prediction values into one value). We used Hao Cheng's Fractal MATLAB toolbox to implement the MI and FNN functions.

Table 1 Optimal m and \(\tau \) for each IMF and residual

4.6 Forecasting Results and Analysis

To compare the performance of the different models, we first applied the benchmarks, namely the Naïve RW, ELM, PSR \(+\) ELM, and EMD \(+\) ELM, to forecast the four exchange rates. The performance comparison of the seven models (Naïve RW, ARIMA, the back propagation neural network (BPNN), ELM, PSR \(+\) ELM, EMD \(+\) ELM, and EMD \(+\) PSR \(+\) ELM) according to the three evaluation criteria (MAPE, MAE, and RMSE) is reported in Table 2. Relative errors, defined as "the ratio of error to the actual value," of these models are shown in Fig. 3.

The empirical analysis confirms that EMD \(+\) PSR \(+\) ELM performs best among the compared models on all four exchange rates. The empirical results demonstrate the usefulness of the two-stage data preprocessing (stage 1 EMD, stage 2 PSR) of our proposed ELM model. Several phenomena in Fig. 3 point to possible sources of this superiority: at the high-frequency points, the relative errors of the hybrid model are much smaller than those of the other models. This observation suggests that the EMD method can reduce the noise contained in the time series and thus enhance accuracy.

The average error of the pure ELM was the worst in terms of MAPE, MAE, and RMSE for the four exchange rates; it was even worse than the Naïve RW in nearly all measures. This indicates that a single ELM is unsuitable for exchange rate time series forecasting. However, combining the EMD data processing method with the single ELM model, that is, the EMD \(+\) ELM model, improves its performance.

Table 2 reveals that the accuracy of the PSR \(+\) ELM model for the four exchange rates is no better than that of the single ELM model; it is even the worst on the AUD/TWD exchange rate dataset. The optimal embedding dimensions m and delay times \(\tau \) obtained for the four exchange rates by using the PSR method are \((m=1, \tau =7)\), \((m=1, \tau =7)\), \((m=1, \tau =7)\), and \((m=6, \tau =7)\), as presented in Table 3. When \(m=1\), the constructed input matrix X is the same as that obtained without the PSR method. Therefore, the performance for the three exchange rates USD/TWD, EUD/TWD, and GBP/TWD is the same as that of the single ELM presented in Table 2. Hence, we have insufficient evidence to conclude that PSR does not enhance accuracy in exchange rate forecasting.

Furthermore, to demonstrate the effectiveness of the EMD \(+\) PSR \(+\) ELM model, we also compared our model with ARIMA and with BPNN, one of the most popular neural network models. The experimental results reveal that our model outperforms these two models with respect to the MAPE, MAE, and RMSE criteria on all four data sets, as also shown in Table 2. This demonstrates the strong robustness of our proposed hybrid model. The optimal parameters of ARIMA, BPNN, and PSR \(+\) ELM for the four data sets are shown in Table 3.

Table 2 Model performance for different exchange rate datasets
Fig. 3 Corresponding relative errors of the different models. a USD/TWD. b EUD/TWD. c GBP/TWD. d AUD/TWD

5 Conclusions

Designing an appropriate model to forecast financial data is a major challenge for time series analysts and researchers, mainly because the irregular movements and numerous turning points of these series are extremely difficult to understand and predict. In this study, a new hybrid model that intelligently combines EMD, PSR, and ELM (EMD \(+\) PSR \(+\) ELM) is proposed to forecast exchange rates. From the experimental results of this study, we can draw the following conclusions:

  1. (1)

EMD can fully capture the local fluctuations of the data and can be used as a preprocessor to decompose the complicated raw data into a finite set of IMFs and a residue, which improves exchange rate prediction accuracy.

  2. (2)

The network topology of the ELM model has a major influence on prediction performance. Identifying the chaotic characteristics of exchange rate time series and determining the embedding dimension of the reconstructed phase space through the FNN function is a more objective procedure, and the determined embedding dimension can then serve as the number of nodes in the input layer of the SLFN.

  3. (3)

Empirical results from four real-world exchange rate time series clearly suggest that our hybrid method substantially improves overall forecasting accuracy and outperforms both a statistical model (Naïve RW) and an artificial intelligence model (ELM). Therefore, the proposed method is highly suitable for predicting nonlinear, nonstationary, and highly complex data and is an efficient method for exchange rate prediction.

Table 3 Optimal parameters for ARIMA, BPNN, and PSR \(+\) ELM

Future research should consider the properties of the data when combining time series techniques with AI methods. Direction prediction criteria are crucial to the trading strategies of investors. In our model, we select only the one-dimensional time series of exchange rates as input variables. Future research might attempt to enhance the performance of prediction models by including other informative input variables, such as macroeconomic variables, and by using diverse data to assess feasibility. One possibility is to identify important input variables by adopting strong or emerging mathematical methods, such as MARS or CMARS, to build a more complete integrated model. In addition, the relationships between different markets and their trading information might be examined.