1 Introduction

Time series prediction is an important research and application area. Prediction results for time series can be applied in many fields, such as business, engineering, economics, weather and stock market forecasting, inventory and production control, and signal processing. Exchange rates are among the most important economic indices in the international monetary markets and play an important role in controlling the dynamics of the exchange market. Large companies engaged in trading use foreign investment and make currency transfers in the course of business. Exchange rates are affected by many economic, political, and even psychological factors. An exchange rate series is characterized by complexity, volatility, and unpredictability, and is often highly nonlinear. Accurate forecasting of exchange rate movements can substantially improve a firm's overall profitability and is very important for the success of many business and fund managers. Numerous techniques have been developed to explore this nonlinearity and improve the accuracy of exchange rate prediction. These include the well-known Box–Jenkins method [1], the autoregressive random variance (ARV) model [2], autoregressive conditional heteroscedasticity (ARCH) [3], and generalized autoregressive conditional heteroscedasticity (GARCH) [4]. While these models may work well in particular situations, they do not give satisfactory results for nonlinear time series [5]. Traditional prediction methods are based on technical analysis of the time series, such as looking for trends, stationarity, seasonality, random noise variation, and moving averages. Most of them are linear approaches and have corresponding shortcomings.

Hence, the idea of applying nonlinear models, such as soft-computing technologies like neural networks, fuzzy systems, and genetic algorithms, has become important for time series prediction. These methods have shown clear advantages over traditional statistical ones [6, 7]. Recently, neural networks [8–11], radial basis function networks [12–14], and neuro-fuzzy networks [15, 16] have been widely used for time series forecasting. In [17], it was found that neural networks are better than random walk models in predicting exchange rates. In [18, 19], a multilayer perceptron network is applied to predict the exchange rate between USD and DEM. In [20–22], different neural network structures are used to forecast exchange rates.

In this paper, to increase prediction accuracy and to reduce the search space and time for achieving the optimal solution, a combination of soft-computing technologies, namely wavelet neural networks with a fuzzy knowledge base, is used for time series prediction, in particular for the prediction of the exchange rate between the US Dollar and the Turkish Lira (USD/TL).

Fuzzy technology is an effective tool for dealing with complex, nonlinear processes characterized by ill-defined and uncertain factors. Traditionally, to develop a fuzzy system, human experts generate the IF–THEN rules by expressing their knowledge. For complicated processes, it is difficult for human experts to examine all the input–output data and find the necessary rules for the fuzzy system. To solve this problem and simplify the generation of IF–THEN rules, several approaches have been applied [23, 24]. Nowadays, neural networks (NNs) are increasingly used for this purpose. In this paper, the integration of NNs and wavelet functions is considered. A wavelet is a waveform that has limited duration and an average value of zero. Integrating the localization properties of wavelets with the learning abilities of NNs gives wavelet neural networks (WNNs) advantages over NNs in complex nonlinear system modeling. WNNs that use wavelet activation functions have been proposed for solving approximation and classification problems [25–28]. Wavelet neural networks are used for the prediction of chaotic time series [29, 30] and for short-term and long-term prediction of electricity load [31, 32]. Wavelet analysis approximates the decomposed time series at different levels of resolution. A fuzzy wavelet neural network (FWNN) combines wavelet theory, fuzzy logic, and neural networks. The synthesis of a fuzzy wavelet neural inference system includes finding the optimal definitions of the premise and consequent parts of the fuzzy IF–THEN rules through the training capability of wavelet neural networks, evaluating the error response of the system. Combinations of fuzzy technology and WNNs have been considered for solving signal-processing and control problems [33–37]. A wavelet network model of a fuzzy inference system [33, 34] and fuzzy systems with linear combinations of basis functions [35, 36] have been proposed.
Thuillard [33] proposed choosing the membership functions from the family of scaling functions and constructing the fuzzy system using wavelet techniques. A fuzzy wavelet network that combines three subnets (a pattern recognition subnet, a fuzzy reasoning subnet, and a control synthesis subnet) is introduced in [36]. An FWNN structure constructed on the basis of a set of fuzzy rules is proposed and used for approximating nonlinear functions [38]. FWNN-based controllers have been developed for the control of dynamic plants [39, 40] and for the prediction of electricity consumption [41, 42]. The combination of a wavelet network and fuzzy logic allows us to develop a system that has fast training speed and that can describe nonlinear objects characterized by uncertainty. The wavelet transform has the ability to analyze nonstationary signals and discover their local details. Fuzzy logic reduces the complexity of the data and deals with uncertainty. The neural network's self-learning characteristic increases the accuracy of the prediction. In this paper, these methodologies are used to construct a fuzzy wavelet neural inference system to solve the exchange rate prediction problem.

During the design of the FWNN system, one of the important problems is its learning and convergence. Recently, a number of different approaches have been used for the design of fuzzy neural network systems: clustering techniques [43–47], the table look-up scheme [48, 49], the least-squares method (LSM) [24, 47], gradient algorithms [24, 39–42], and genetic algorithms [24, 42, 50]. Abiyev [42] uses a combination of gradient and genetic algorithms for FWNN design. Genetic algorithms can find the global optimal solution of the problem but need more time for parameter updating. In this paper, to decrease learning time, fuzzy clustering combined with a gradient algorithm is applied to design the FWNN prediction system, as demonstrated in Sect. 3.

The paper is organized as follows: Sect. 2 presents the structure of the FWNN prediction model. Section 3 presents the parameter update rules of the FWNN system, together with descriptions of the fuzzy c-means clustering algorithm and the gradient descent algorithm used for learning. Section 4 contains simulation results of the FWNN applied to the prediction of a chaotic time series and to the prediction of exchange rates, together with comparative results of different models for time series prediction. Finally, a brief conclusion is presented in Sect. 5.

2 Fuzzy wavelet neural network

The kernel of a fuzzy inference system is its fuzzy knowledge base. In a fuzzy knowledge base, information consisting of the input–output data points of the system is interpreted into linguistically interpretable fuzzy rules of IF–THEN form. Fuzzy systems are generally designed using either Mamdani or Takagi–Sugeno–Kang (TSK) type IF–THEN rules. In the former type, both the antecedent and the consequent parts use fuzzy values. TSK type fuzzy rules use fuzzy values in the antecedent part and crisp values, often linear functions, in the consequent part. Many research works have shown that TSK type fuzzy neural systems can achieve better learning accuracy than Mamdani type fuzzy neural systems [24, 51]. The theory and design methodologies of Mamdani and TSK fuzzy systems are presented in [23, 24, 50–52]. This paper presents a fuzzy wavelet neural network that integrates wavelet functions with the TSK fuzzy model. The consequent parts of TSK type fuzzy IF–THEN rules are represented by either a constant or a function, and most fuzzy and neuro-fuzzy models use linear functions. Neuro-fuzzy systems thus describe the considered problem by means of combinations of linear functions, and sometimes they need many rules to model complex nonlinear processes with the desired accuracy. Increasing the number of rules increases the number of neurons in the hidden layer of the network. To improve the computational power of the neuro-fuzzy system, we use wavelets in the consequent part of each rule. The fuzzy rules constructed with wavelets have the following form.

$$ \begin{gathered} {\text{If}}\,x_{ 1} \,{\text{is}}\,A_{ 1 1} \,{\text{and}}\,x_{ 2} \,{\text{is}}\,A_{ 1 2} \,{\text{and}}\, \ldots \,{\text{and}}\,x_{m} \,{\text{is}}\,A_{ 1m} ,\,{\text{then}}\,y_{ 1} \,{\text{is}}\,\sum\limits_{i = 1}^{m} {w_{i1} (1 - z_{i1}^{2} ){\text{e}}^{{ - {\frac{{z_{i1}^{2} }}{2}}}} } \, \hfill \\ {\text{If}}\,x_{ 1} \,{\text{is}}\,A_{ 2 1} \,{\text{and}}\,x_{ 2} \,{\text{is}}\,A_{ 2 2} \,{\text{and}}\, \ldots \,{\text{and}}\,x_{m} \,{\text{is}}\, \, A_{ 2m} ,\,{\text{then}}\,y_{ 2} \,{\text{is}}\,\sum\limits_{i = 1}^{m} {w_{i2} (1 - z_{i2}^{2} ){\text{e}}^{{ - {\frac{{z_{i2}^{2} }}{2}}}} } \hfill \\ \cdots \hfill \\ {\text{If}}\,x_{ 1} \,{\text{is}}\,A_{n 1} \,{\text{and}}\,x_{ 2} \,{\text{is}}\,A_{n 2} \,{\text{and}}\, \ldots \,{\text{and}}\,x_{m} \,{\text{is}}\,A_{nm} ,\,{\text{then}}\,y_{n} \,{\text{is}}\,\sum\limits_{i = 1}^{m} {w_{in} (1 - z_{in}^{2} ){\text{e}}^{{ - {\frac{{z_{in}^{2} }}{2}}}} } \hfill \\ \end{gathered} $$
(1)

Here, x 1, x 2, …, x m are input variables, y 1, y 2, …, y n are output variables computed with Mexican hat wavelet functions, and A ij is the membership function of the j-th input in the i-th rule, defined as a Gaussian membership function. n is the number of fuzzy rules. The conclusion parts of the rules contain wavelet neural networks (WNNs), which include wavelet functions. Wavelets are defined in the following form

$$ \Uppsi_{j} (x) = {\frac{1}{{\sqrt {\left| {a_{j} } \right|} }}}\psi \left( {{\frac{{x - b_{j} }}{{a_{j} }}}} \right),\quad a_{j} \ne 0,\;\;j = 1, \ldots ,n $$
(2)

Ψ j (x) represents the family of wavelets obtained from the single function Ψ(x) by dilations and translations, where \( a_{j} = \{ a_{1j} ,a_{2j} , \ldots ,a_{mj} \} \,{\text{and}} \,b_{j} = \{ b_{1j} ,b_{2j} , \ldots ,b_{mj} \} \) are the dilation and translation parameters, respectively, and x = {x 1, x 2, …, x m } are the input signals. Ψ(x) is localized in both time space and frequency space and is called a mother wavelet. The output of the WNN is calculated as

$$ y = \sum\limits_{j = 1}^{k} {w_{j} \Uppsi_{j} (x) = } \sum\limits_{j = 1}^{k} {w_{j} \left| {a_{j} } \right|^{{ - \frac{1}{2}}} \psi (a_{j}^{ - 1} x - d_{j} )} $$
(3)

Here, \( d_{j} = a_{j}^{ - 1} *b_{j} \). Ψ j (x) is the wavelet function of the j-th unit of the hidden layer, w j are the weight coefficients between the hidden and output layers, and a j and b j are the parameters of the wavelet functions. Wavelet networks place wavelet functions in the neurons of the hidden layer. A WNN has good generalization ability, can approximate complex functions very compactly to a given precision, and can be trained more easily than other networks, such as multilayer perceptrons and radial basis function networks [32, 53]. Good initialization of the parameters of a WNN yields fast convergence. A number of methods have been implemented for initializing wavelets, such as the orthogonal least squares procedure and clustering methods [25, 26]. Optimal dilation and translation of the wavelets increases training speed and yields fast convergence. The approximation and convergence properties of WNNs are presented in [27].
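To make the wavelet family in Eqs. (2) and (3) concrete, the following minimal Python sketch (illustrative, not from the paper; the function names are ours) evaluates the Mexican hat mother wavelet used throughout and the WNN output for a scalar input:

```python
import numpy as np

def mexican_hat(z):
    """Mother wavelet psi(z) = (1 - z^2) * exp(-z^2 / 2)."""
    return (1.0 - z**2) * np.exp(-z**2 / 2.0)

def wnn_output(x, w, a, b):
    """WNN output, cf. Eq. (3): y = sum_j w_j |a_j|^(-1/2) psi((x - b_j) / a_j).

    x : scalar input; w, a, b : length-k arrays of weights, dilations, translations.
    """
    z = (x - b) / a                              # dilated and translated argument
    return np.sum(w * np.abs(a) ** -0.5 * mexican_hat(z))
```

With unit weight, dilation 1, and translation 0, the output at x = 0 is simply psi(0) = 1.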

In formula (1), the fuzzy rules determine the contribution of each WNN to the output of the FWNN. The use of WNNs with different dilation and translation values makes it possible to capture different behaviors and essential features of the nonlinear model under these fuzzy rules. A proper fuzzy model described by IF–THEN rules can be obtained by learning the dilation and translation parameters of the conclusion parts and the membership function parameters of the premise parts. Because of the use of wavelets, the computational strength and generalization ability of the FWNN are improved, and the FWNN can describe nonlinear processes with the desired accuracy.

The structure of the fuzzy wavelet system is given in Fig. 1. The fuzzy wavelet neural network includes seven layers. In the first layer, the number of nodes is equal to the number of input signals; these nodes distribute the input signals. In the second layer, each node corresponds to one linguistic term. For each input signal entering the system, the degree to which the input value belongs to a fuzzy set is calculated. To describe the linguistic terms, the Gaussian membership function is used.

$$ \eta_{j} (x_{i} ) = {\text{e}}^{{ - {\frac{{(x_{i} - c_{ij} )^{2} }}{{2\sigma_{ij}^{2} }}}}} \quad i = 1, \ldots ,m,\;j = 1, \ldots ,n $$
(4)

Here, m is the number of input signals and n is the number of fuzzy rules (hidden neurons in the third layer). \( c_{ij} \,{\text{and}}\,\sigma_{ij} \) are the center and width of the Gaussian membership functions, respectively. η j (x i ) is the membership function of the i-th input variable for the j-th term.

Fig. 1

Structure of Fuzzy wavelet neural network

In the third layer, the number of nodes corresponds to the number of rules R 1, R 2, …, R n ; each node represents one fuzzy rule. The number of fuzzy rules and the number of membership functions are determined using the clustering algorithm described in Sect. 3. The third layer realizes the inference engine: the prod t-norm operator is applied to calculate the degree to which the given input signals satisfy each rule.

$$ \mu_{j} (x) = \eta_{j} (x_{1} ) * \eta_{j} (x_{2} ) * \cdots * \eta_{j} (x_{m} ),\quad j = 1, \ldots ,n $$
(5)

Here, * is the prod t-norm operator.

These μ j (x) signals are the inputs to the next, consequent layer, which includes n wavelet neural networks denoted WNN1, WNN2, …, WNN n . In the fifth layer, the output signals of the third layer are multiplied by the output signals of the wavelet networks. The output of the j-th wavelet network is calculated as

$$ y_{j} = w_{j} \Uppsi_{j} (z);\quad \Uppsi_{j} (z) = \sum\limits_{i = 1}^{m} {{\frac{1}{{\sqrt {\left| {a_{ij} } \right|} }}}(1 - z_{ij}^{2} ){\text{e}}^{{ - {\frac{{z_{ij}^{2} }}{2}}}} } $$
(6)

Here, \( z_{ij} = {\frac{{x_{i} - b_{ij} }}{{a_{ij} }}} \), where \( a_{ij} \,{\text{and}}\,b_{ij} \, \) are the parameters of the wavelet function between the i-th (i = 1,…,m) input and the j-th (j = 1,…,n) WNN. In the sixth and seventh layers, defuzzification is carried out to calculate the output of the whole network; here the contribution of each WNN to the output of the FWNN is determined.

$$ u = {{\sum\limits_{j = 1}^{n} {\mu_{j} (x)y_{j} } } \mathord{\left/ {\vphantom {{\sum\limits_{j = 1}^{n} {\mu_{j} (x)y_{j} } } {\sum\limits_{j = 1}^{n} {\mu_{j} (x)} }}} \right. \kern-\nulldelimiterspace} {\sum\limits_{j = 1}^{n} {\mu_{j} (x)} }} $$
(7)

Here, y j are the output signals of wavelet neural networks.
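The layer computations in Eqs. (4)–(7) can be read as a single forward pass. The Python sketch below is an illustrative interpretation of the structure, not the authors' code; all array shapes and names are our own assumptions:

```python
import numpy as np

def fwnn_forward(x, c, sigma, w, a, b):
    """One forward pass through the FWNN for input vector x (length m), n rules.

    c, sigma : (m, n) Gaussian membership centers/widths, Eq. (4)
    w        : (n,)   consequent weights
    a, b     : (m, n) wavelet dilations/translations, Eq. (6)
    Returns the defuzzified output u, Eq. (7).
    """
    eta = np.exp(-((x[:, None] - c) ** 2) / (2.0 * sigma**2))  # Eq. (4): memberships
    mu = np.prod(eta, axis=0)                                  # Eq. (5): prod t-norm
    z = (x[:, None] - b) / a                                   # wavelet arguments z_ij
    psi = np.sum((1.0 - z**2) * np.exp(-z**2 / 2.0)
                 / np.sqrt(np.abs(a)), axis=0)                 # Eq. (6): Psi_j(z)
    y = w * psi                                                # Eq. (6): WNN outputs y_j
    return np.sum(mu * y) / np.sum(mu)                         # Eq. (7): weighted average
```

With a single rule the firing strength cancels in Eq. (7), so the output reduces to that rule's WNN output.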

After the output signal of the FWNN is calculated, the training of the network starts. The training includes the adjustment of the parameters of the membership functions c ij and σ ij in the second layer and of the parameters of the wavelet functions w j , a ij , b ij (i = 1,…,m, j = 1,…,n) in the fourth layer. In the next section, the learning of the FWNN is derived.

3 Parameter update rules

The design of the FWNN (Fig. 1) includes the determination of the unknown parameters, that is, the parameters of the antecedent and consequent parts of the fuzzy IF–THEN rules (1). In the antecedent parts, the input space is divided into a set of fuzzy regions, and in the consequent parts the system behavior in those regions is described. As mentioned earlier, a number of different approaches have been used for designing fuzzy IF–THEN rules, based on clustering [43–47], the table look-up scheme [48, 49], the least-squares method (LSM) [24, 47], gradient algorithms [24, 39–42], and genetic algorithms [24, 42, 50]. In [47], the LSM and a clustering algorithm are used for updating the parameters of a neuro-fuzzy inference system. In this paper, fuzzy clustering is applied to design the antecedent (premise) parts, and the gradient algorithm is applied to design the consequent parts of the fuzzy rules. Fuzzy clustering is an efficient technique for constructing the antecedent structures. The aim of clustering methods is to identify groups in a large data set such that a concise representation of the system's behavior is produced. Each cluster center can be translated into a fuzzy rule identifying the class. Different clustering algorithms have been developed; for fuzzy systems, subtractive clustering [44, 46] and fuzzy c-means clustering [43] are widely used. Subtractive clustering [44] is an extension of grid-based mountain clustering [45]. It is unsupervised: the number of clusters is determined by the algorithm itself. Sometimes we need to control the number of clusters in the input space; in such cases, supervised clustering algorithms are of primary concern. Fuzzy c-means clustering is one of them and can be used efficiently for fuzzy systems [43] with a simple structure and sufficient accuracy.
In this paper, the fuzzy c-means clustering technique is used for structuring the premise part of the fuzzy system.

In the first step, fuzzy c-means clustering is applied in order to partition the input data and construct the antecedent parts of the fuzzy IF–THEN rules. Fuzzy c-means clustering is based on the minimization of the following objective function:

$$ J_{m} = \sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{C} {u_{ij}^{m} d_{ji}^{2} ,\quad {\text{where}}\,} d_{ji} = \left\| {x_{i} - c_{j} } \right\|,\;\; 1\le m < \infty } $$
(8)

where m is any real number greater than 1, u ij is the degree of membership of x i in cluster j, x i is the i-th of the d-dimensional measured data, c j is the d-dimensional center of the cluster, and ||*|| is any norm expressing the similarity between the measured data and the center.

Fuzzy partitioning is carried out through an iterative optimization of the objective function shown earlier, with updates of the membership degrees u ij and the cluster centers c j . The algorithm is composed of the following steps.

  1. Initialize the membership matrix U = [u ij ], U (0).

  2. At step t, calculate the center vectors C (t) = [c j ] using U (t):

    $$ c_{j} = \frac{\sum\nolimits_{i = 1}^{N} u_{ij}^{m} x_{i}}{\sum\nolimits_{i = 1}^{N} u_{ij}^{m}} $$

  3. Update U (t) to obtain U (t+1):

    $$ u_{ij} = \frac{1}{\sum\nolimits_{k = 1}^{C} \left( \frac{d_{ji}}{d_{ki}} \right)^{\frac{2}{m - 1}}} $$

  4. If \( \left\| {U^{(t + 1)} - U^{(t)} } \right\| < \varepsilon \), stop; otherwise set t = t + 1 and return to step 2.

After partitioning, each cluster center corresponds to the center of a membership function used in the second layer of the FWNN. The width of each membership function is determined using the distance between cluster centers.
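The clustering steps above can be sketched as follows. This is a generic fuzzy c-means implementation for illustration, not the authors' code; the tolerance, iteration limit, and random seed are our own choices:

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means: alternate center and membership updates until convergence.

    X : (N, d) data. Returns (centers of shape (C, d), memberships U of shape (N, C)).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)                    # step 1: random fuzzy partition
    for _ in range(max_iter):
        Um = U**m
        centers = Um.T @ X / Um.sum(axis=0)[:, None]     # step 2: weighted centers c_j
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        d = np.fmax(d, 1e-12)                            # guard against division by zero
        # step 3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        if np.abs(U_new - U).max() < eps:                # step 4: stopping criterion
            return centers, U_new
        U = U_new
    return centers, U
```

On two well-separated one-dimensional groups, the two recovered centers land near the group means, and each row of U sums to one.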

After the design of the antecedent parts by fuzzy clustering, the gradient descent algorithm is applied to design the consequent parts of the fuzzy rules. In what follows, the parameter update rules are derived for the FWNN. At the beginning, the parameters of the consequent parts of the FWNN are generated randomly; to obtain a proper FWNN model, training of the parameters is then carried out. In this paper, we apply gradient learning with an adaptive learning rate, which guarantees convergence and speeds up the learning of the network.

First, the value of the cost function is calculated at the output of the network.

$$ E = \frac{1}{2}\sum\limits_{i = 1}^{O} {(u_{i}^{d} - u_{i} )^{2} } $$
(9)

Here, O is the number of output signals of the network (in the given case, O = 1), and \( u_{i}^{d} {\text{ and }}u_{i} \) are the desired and current output values of the network, respectively. Using the gradient algorithm, the parameters w j , a ij , b ij (i = 1,…,m, j = 1,…,n) of the wavelet neural networks and the parameters \( {\text{c}}_{ij} {\text{ and }}\sigma_{ij} \) (i = 1,…,m, j = 1,…,n) of the membership functions are adjusted using the following formulas.

$$ w_{j} (t + 1) = w_{j} (t) - \gamma {\frac{\partial E}{{\partial w_{j} }}};\quad a_{ij} (t + 1) = a_{ij} (t) - \gamma {\frac{\partial E}{{\partial a_{ij} }}};\quad b_{ij} (t + 1) = b_{ij} (t) - \gamma {\frac{\partial E}{{\partial b_{ij} }}} $$
(10)
$$ c_{ij} (t + 1) = c_{ij} (t) - \gamma {\frac{\partial E}{{\partial c_{ij} }}},\quad \sigma_{ij} (t + 1) = \sigma_{ij} (t) - \gamma {\frac{\partial E}{{\partial \sigma_{ij} }}} $$
(11)

Here, γ is the learning rate, i = 1,…,m, j = 1,…,n, m is the number of input signals of the network (input neurons), and n is the number of fuzzy rules (hidden neurons).

The values of derivatives in (10) are computed using the following formulas.

$$ \begin{aligned} \frac{\partial E}{\partial w_{j}} &= \frac{\partial E}{\partial u}\frac{\partial u}{\partial y_{j}}\frac{\partial y_{j}}{\partial w_{j}} = (u(t) - u^{d}(t)) \cdot \mu_{j} \cdot \psi_{j}(z) \Big/ \sum\limits_{j = 1}^{n} \mu_{j} \\ \frac{\partial E}{\partial a_{ij}} &= \frac{\partial E}{\partial u}\frac{\partial u}{\partial y_{j}}\frac{\partial y_{j}}{\partial \psi_{j}}\frac{\partial \psi_{j}}{\partial z_{ij}}\frac{\partial z_{ij}}{\partial a_{ij}} = \delta_{j} \left( 3.5 z_{ij}^{2} - z_{ij}^{4} - 0.5 \right) {\text{e}}^{- \frac{z_{ij}^{2}}{2}} \Big/ \sqrt{a_{ij}^{3}} \\ \frac{\partial E}{\partial b_{ij}} &= \frac{\partial E}{\partial u}\frac{\partial u}{\partial y_{j}}\frac{\partial y_{j}}{\partial \psi_{j}}\frac{\partial \psi_{j}}{\partial z_{ij}}\frac{\partial z_{ij}}{\partial b_{ij}} = \delta_{j} \left( 3 z_{ij} - z_{ij}^{3} \right) {\text{e}}^{- \frac{z_{ij}^{2}}{2}} \Big/ \sqrt{a_{ij}^{3}} \\ \text{where } \delta_{j} &= \frac{\partial E}{\partial u}\frac{\partial u}{\partial y_{j}}\frac{\partial y_{j}}{\partial \psi_{j}} = (u(t) - u^{d}(t)) \cdot \mu_{j} \cdot w_{j} \Big/ \sum\limits_{j = 1}^{n} \mu_{j}, \quad i = 1, \ldots ,m,\; j = 1, \ldots ,n \end{aligned} $$
(12)

The derivatives in (11) are determined by the following formulas.

$$ {\frac{\partial E}{{\partial c_{ij} }}} = \sum\limits_{j}^{{}} {{\frac{\partial E}{\partial u}}{\frac{\partial u}{{\partial \mu_{j} }}}{\frac{{\partial \mu_{j} }}{{\partial \eta_{ij} }}}{\frac{{\partial \eta_{ij} }}{{\partial c_{ij} }}}} $$
(13)
$$ {\frac{\partial E}{{\partial \sigma_{ij} }}} = \sum\limits_{j}^{{}} {{\frac{\partial E}{\partial u}}{\frac{\partial u}{{\partial \mu_{j} }}}{\frac{{\partial \mu_{j} }}{{\partial \eta_{ij} }}}{\frac{{\partial \eta_{ij} }}{{\partial \sigma_{ij} }}}} $$
(14)

Here,

$$ {\frac{\partial E}{\partial u}} = u(t) - u^{d} (t);\quad {\frac{\partial u}{{\partial \mu_{j} }}} = {\frac{{y_{j} - u}}{{\sum\nolimits_{j = 1}^{n} {\mu_{j} } }}};\quad {\frac{{\partial \mu_{j} }}{{\partial \eta_{ij} }}} = \prod\limits_{\begin{subarray}{l} k = 1 \\ k \ne i \end{subarray} }^{m} {\eta_{kj} } ;\quad i = 1, \ldots ,m;\;j = 1, \ldots ,n $$
(15)

∏ is t-norm product operator.

$$ {\frac{{\partial \eta_{j} (x_{i} )}}{{\partial c_{ij} }}} = \eta_{j} (x_{i} ){\frac{{(x_{i} - c_{ij} )}}{{\sigma_{ij}^{2} }}};\quad {\frac{{\partial \eta_{j} (x_{i} )}}{{\partial \sigma_{ij} }}} = \eta_{j} (x_{i} ){\frac{{(x_{i} - c_{ij} )^{2} }}{{\sigma_{ij}^{3} }}} $$
(16)

The parameters of the FWNN can thus be updated using (10) and (11) together with (12)–(16).

One important problem in learning algorithms is convergence. The convergence of the gradient descent method depends on the selection of the initial value of the learning rate. Usually, this value is selected in the interval [0, 1]. A large learning rate may lead to unstable learning, while a small one results in slow learning. In this paper, an adaptive approach is used for updating this parameter. The learning of the FWNN parameters starts with a small learning rate γ. During learning, γ is increased if the change of error ΔE = E(t) − E(t + 1) is positive and decreased if it is negative. This strategy ensures stable learning of the FWNN, guarantees convergence, and speeds up the learning.
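As a sketch of this adaptive strategy on a generic scalar parameter (the increase/decrease factors below are illustrative choices, not values given in the paper), a gradient descent loop might adjust γ as follows:

```python
def adaptive_gradient_descent(grad, loss, w0, gamma=0.01, up=1.05, down=0.7, steps=200):
    """Gradient descent with the adaptive learning-rate rule described above:
    gamma grows when the error decreases (dE = E(t) - E(t+1) > 0) and shrinks otherwise.
    """
    w, E = w0, loss(w0)
    for _ in range(steps):
        w_new = w - gamma * grad(w)          # descent step, cf. Eqs. (10)-(11)
        E_new = loss(w_new)
        if E - E_new > 0:                    # error decreased: accept step, speed up
            w, E, gamma = w_new, E_new, gamma * up
        else:                                # error increased: reject step, slow down
            gamma *= down
    return w
```

On a convex toy loss such as (w − 3)², the loop converges to the minimizer even though γ is repeatedly grown, because steps that increase the error are rejected and γ is cut back.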

The optimal value of the learning rate at each time instant can be obtained using a Lyapunov function [54]. Let γ(t) be the learning rate for the parameters \( W = [w_{j} ,a_{ij} ,b_{ij} ,c_{ij} ,\sigma_{ij} ] \) of the FWNN, trained using (10) and (11). Convergence is guaranteed if the following condition is satisfied.

$$ 0 < \gamma (t) < {\frac{2}{{\left( {\mathop {\max }\limits_{t} \left\| {{\frac{\partial u(t)}{\partial W}}} \right\|} \right)^{2} }}} $$
(17)

The derivation of this condition is given in Appendix.

4 Simulation

4.1 Time series prediction

The FWNN structure and its learning algorithms are applied to modeling and predicting the future values of chaotic time series. As an example, the Box–Jenkins gas furnace data (series J) [1] prediction problem is considered. The data set was recorded from a combustion process of a methane–air mixture and is a benchmark problem used for testing identification and prediction algorithms. It includes 296 pairs of input–output measurements, where the input x(t) is the gas flow into the furnace and the output y(t) is the concentration of carbon dioxide (CO2) in the outlet gas. The sampling interval is 9 s. Following previous research, the inputs of the prediction model are selected as x(t − 4) and y(t − 1), and the output is y(t).
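Assuming the two series are available as aligned arrays, the input–output pairs x(t − 4), y(t − 1) → y(t) can be assembled as in this illustrative sketch (function name ours):

```python
import numpy as np

def make_gas_furnace_pairs(x, y):
    """Build training pairs for the gas furnace model: inputs (x(t-4), y(t-1)), target y(t).

    x, y : aligned 1-D series of gas flow and CO2 concentration (296 points each).
    The first usable target index is t = 4 because of the x(t-4) lag.
    """
    X = np.column_stack([x[:-4], y[3:-1]])   # x(t-4) and y(t-1) for t = 4 .. N-1
    target = y[4:]                           # y(t)
    return X, target
```

For 296 measurements this yields 292 samples, of which the first 200 are used for training and the remaining 92 for testing, as described below.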

Using these statistical data, the training of the FWNN has been carried out. Each training sample for the prediction system consists of a two-dimensional input vector and the corresponding predicted output.

The unknown parameters of the FWNN are the parameters of the membership functions of the second layer (σ and c) and the parameters of the wavelet functions (a and b). The learning of the parameters is accomplished using fuzzy c-means clustering and the gradient descent algorithm. The clustering involves the determination of clusters in the data space of the input signals x(t − 4) and y(t − 1) and the translation of these clusters into fuzzy rules such that the resulting model is close to the identified system. First, the fuzzy clustering approach described in Sect. 3 is applied to the input data points of the plant in order to determine the cluster centers. The obtained cluster centers are then used to organize the premise parts of the fuzzy rules. Note that with this approach, the number of parameters to be determined for the antecedent part is reduced significantly: with two cluster centers for each input, a total of four membership functions exist with 2 parameters each, giving 8 parameters in the antecedent part of the FWNN in total. Using these four membership functions, two for each input signal, four rules can be constructed. On the other hand, if the membership functions were generated separately for each rule, 8 membership functions would be needed for the 4 rules, requiring the determination of 16 parameters.

After the determination of the parameters of the antecedent part, the gradient algorithm is applied to learn the parameters of the consequent part, that is, the parameters of the wavelet functions of layer 4. The initial values of a and b are selected randomly in the interval [−1, 1]. Using the parameter update rules derived earlier, they are updated for the given input signals.

For learning, the data were partitioned into 200 data points as a training set and the remaining 92 points as a test set for evaluating the performance of the evolved model. The whole data set is scaled between 0 and 1. The training of the FWNN is carried out for 500 epochs.

Simulation was performed for two cases: when the number of clusters is 2 and when it is 3. In the first stage, the fuzzy rules are constructed using two clusters for each input variable and taking all possible combinations, so four fuzzy rules are constructed from the different combinations of clusters for the two inputs. After clustering of the input space, the gradient descent algorithm is used for learning the consequent parts of the fuzzy rules, that is, the parameters of the wavelet functions of the 4th layer of the FWNN.

For comparative analysis, the obtained results are compared with existing online models applied to the same task. As the performance criterion, the root mean square error (RMSE) is used:

$$ {\text{RMSE}} = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {(x_{i}^{d} - x_{i} )^{2} } } $$
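This criterion can be computed directly; a short Python helper (ours, for illustration):

```python
import numpy as np

def rmse(desired, predicted):
    """Root mean square error over N samples, as in the formula above."""
    desired, predicted = np.asarray(desired), np.asarray(predicted)
    return np.sqrt(np.mean((desired - predicted) ** 2))
```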

During learning, the value of RMSE was 0.014; after learning, in the generalization step, it was 0.019. The total number of parameters is 8 (premise part) + 20 (consequent part) = 28. Figure 2 depicts the RMSE values obtained during training. Figure 3 shows the trajectories of the desired and predicted values and the prediction error for both training and checking data. Here, the solid line indicates the trajectory of the statistical data, and the dashed line indicates the predicted value of the time series; 200 data points are used as a training set, and the remaining 92 points as a test set. As shown in Fig. 3, the value of RMSE for the training data (up to sample 200) is smaller than for the checking data (after sample 200). In the second stage, the training of the prediction model is carried out using 3 cluster centers for each input. Using the combinations of these cluster centers, 9 fuzzy rules were generated. The total number of parameters is 12 (premise part) + 45 (consequent part) = 57. After clustering of the input space, the gradient descent algorithm is applied for learning the consequent parts of the fuzzy rules, that is, the parameters of the wavelet functions of the 4th layer of the FWNN. During learning, the values of RMSE for the training and testing data were 0.013 and 0.016, respectively. The results of simulations for different numbers of fuzzy rules are given in Table 1. Because of the approach used in learning, the computational time of the FWNN system is decreased: the parameters of the antecedent part are determined using a clustering technique that takes very little time (0.15–0.2 s), while it is the gradient technique that takes time. That is, the training time of the FWNN is basically the time required to learn the parameters of the consequent parts.
Because clustering takes less time than the gradient technique, the training time of an FWNN that uses clustering for the antecedent part and the gradient approach for the consequent part is less than the training time of an FWNN that uses the gradient technique for both parts with the same number of rules (Table 1).
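The parameter counts quoted above follow from the network geometry. A minimal sketch, assuming each Gaussian membership function carries two parameters (center and width) and each rule's wavelet consequent carries a dilation and a translation per input plus one output weight — a hypothetical breakdown, but one that reproduces the totals reported in the text:

```python
def fwnn_param_count(n_inputs, n_clusters):
    """Count premise and consequent parameters of an FWNN.

    Assumed breakdown (consistent with the totals in the text):
    each Gaussian has a center and a width; each rule's wavelet has a
    dilation and a translation per input plus one output weight.
    """
    premise = n_inputs * n_clusters * 2          # Gaussians: center + width each
    n_rules = n_clusters ** n_inputs             # one rule per cluster combination
    consequent = n_rules * (2 * n_inputs + 1)    # wavelet params + weight per rule
    return premise, consequent, n_rules
```

With 2 inputs and 2 clusters per input this gives 8 + 20 = 28 parameters and 4 rules; with 3 clusters per input, 12 + 45 = 57 parameters and 9 rules, matching the two stages above.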

Fig. 2
figure 2

RMSE values obtained during training

Fig. 3
figure 3

Box–Jenkins time series data, model output and prediction error for training and test samples (the first 200 points are used for training, the remaining 92 data points for testing)

Table 1 Prediction results for gas furnace data

Table 2 shows the comparison of test results of different models for the Box–Jenkins data series. The papers [55–57] use reference fuzzy sets in the premise parts; therefore, the total number of parameters is omitted, and a direct comparison is not meaningful.

Table 2 Comparisons of test results of different prediction models for Box–Jenkins time series

4.2 Forecasting of exchange rates

The FWNN system is applied to construct a prediction model of the USD/TL exchange rate. Exchange rates play an important role in the dynamics of the exchange market; the exchange rate series is complex, volatile, unpredictable, and high-order nonlinear. Accurate prediction of exchange rates is therefore an important financial problem and is crucial for the success of many businesses and fund managers.

The FWNN structure and its learning algorithm are used to construct the prediction model. The prediction problem is to predict the near-future value of the exchange rate, x(t + pr), on the basis of the sample data points {x(t − (D − 1)Δ), …, x(t − Δ), …, x(t)}, where pr is the prediction step. Four data points [x(t − 3), x(t − 2), x(t − 1), x(t)] are used as input to the prediction model, and the output training data correspond to x(t + 3). In other words, since the exchange rates are sampled daily, the value to be predicted is pr = 3 days ahead. Each training input/output pair for the prediction system thus consists of a four-dimensional input vector and the corresponding output value to be predicted.
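The construction of such training pairs from a raw series can be sketched as follows (a minimal illustration; the function name and layout are assumptions, not from the original):

```python
def make_pairs(series, n_inputs=4, pr=3):
    """Build (input window, target) pairs for pr-step-ahead prediction.

    Each input is the window [x(t-3), x(t-2), x(t-1), x(t)] and the
    target is x(t + pr), as in the three-step-ahead setup above.
    """
    pairs = []
    for t in range(n_inputs - 1, len(series) - pr):
        window = series[t - n_inputs + 1 : t + 1]
        target = series[t + pr]
        pairs.append((window, target))
    return pairs
```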

To start the training, the FWNN structure is generated. It includes four input neurons and one output neuron. As before, fuzzy clustering is applied in order to partition the input space and select the parameters of the premise parts, that is, the parameters of the Gaussian membership functions used in the second layer of the FWNN. Fuzzy c-means clustering is used for the input space with 2 clusters for each input. Sixteen fuzzy rules are constructed using the different combinations of these clusters for the four inputs. After clustering the input space, the gradient descent algorithm is used for learning the consequent parts of the fuzzy rules, that is, the parameters of the wavelet functions in the 4th layer of the FWNN. The initial values of the wavelet function parameters are randomly generated in the interval [−1, 1] and, using the gradient algorithm derived earlier, they are updated for the given input–output training pairs. As the performance criterion, RMSE is used.
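The premise evaluation described here — Gaussian memberships per input, combined over all cluster combinations into rule firing strengths — can be sketched as follows (a simplified illustration; product inference is an assumption not stated explicitly above):

```python
import itertools
import math

def gaussian_mf(x, center, width):
    """Gaussian membership degree of x for one fuzzy cluster."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def firing_strengths(inputs, centers, widths):
    """Firing strength of every rule (one rule per cluster combination).

    centers[i] / widths[i] hold the cluster parameters for input i; with
    4 inputs and 2 clusters each this yields 2**4 = 16 rules. Product
    inference (an assumption) combines the per-input memberships.
    """
    per_input = [
        [gaussian_mf(x, c, w) for c, w in zip(cs, ws)]
        for x, cs, ws in zip(inputs, centers, widths)
    ]
    strengths = []
    for combo in itertools.product(*per_input):
        s = 1.0
        for mu in combo:
            s *= mu
        strengths.append(s)
    return strengths
```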

For training of the system, the statistical data describing the daily exchange rates from January 2007 to April 2009 are considered. The data set consists of 557 points, of which 500 are used for training and the next 50 for diagnostic testing. All input and output data are scaled to the interval [0, 1]. The training is carried out for 500 epochs, and the values of the parameters of the FWNN system are determined at the conclusion of training. Once the FWNN has been successfully trained, it is used for the prediction of the daily exchange rates. During learning, the value of RMSE was 0.0164; after learning, for the test data, the value of RMSE was 0.0226. Figure 4 depicts the RMSE values obtained during training.
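The scaling to [0, 1] mentioned here is typically a min–max transform; a minimal sketch (the exact scaling procedure is not specified in the text, so this is an assumption):

```python
def minmax_scale(values):
    """Scale a sequence linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span for v in values]
```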

Fig. 4
figure 4

RMSE values obtained during training

In Fig. 5, the output of the FWNN system for three-step ahead prediction of exchange rates is shown for the learning and generalization steps. Here, the solid line is the desired output, and the dashed line is the FWNN output.

Fig. 5
figure 5

Three-step ahead prediction. Plot of the desired signal (solid line) and the signal predicted by the FWNN (dotted line)

The plot of the prediction error is shown in Fig. 6. As shown in the figure, in the generalization step (the end part of the error curve), the value of the error increases. Figure 7 demonstrates the three-step ahead prediction of the FWNN for test data. The result of the simulation of the FWNN prediction model is compared with the result of the simulation of the NN-based prediction model. To estimate the performance of the neural and FWNN prediction systems, the RMSE values of the errors between the predicted and actual output signals are compared.

Fig. 6
figure 6

Plot of prediction error

Fig. 7
figure 7

Three-step ahead prediction. Curves describing testing data

In the next experiment, six-step ahead prediction of exchange rates, x(t + 6), is performed. The data points [x(t − 6) x(t − 2) x(t − 1) x(t)] are used as input for the system. The 500 data points are used for learning, and the last 50 days are used for testing. As a result of learning, the parameters of the FWNN were found. The training and test values of RMSE were 0.0228 and 0.0332, respectively.

The simulation results confirm the efficiency of applying FWNN technology to constructing a prediction model of exchange rates. In Table 3, the comparative results of the simulations are given. As shown in the table, the performance of the FWNN prediction is better than the performance of the NN model.

Table 3 Comparative results of simulation

5 Conclusion

The time series prediction model is developed by integrating fuzzy logic, neural networks, and wavelet technology. Wavelet networks are used to construct the fuzzy rules, and the functionality of the fuzzy system is realized by the neural network structure. The FWNN prediction model is constructed using fuzzy clustering and a gradient algorithm. Fuzzy clustering is applied in order to select the parameters of the antecedent parts of the fuzzy rules, that is, the parameters of the second layer. The gradient algorithm is used for training the parameters of the consequent part, the 4th layer of the FWNN structure. The structure and parameter update rules of the FWNN system are applied to the modeling and prediction of complex time series. Simulation results demonstrated that the applied FWNN structure has better performance than other models. The developed FWNN structure is applied to build a model for predicting future values of exchange rates, a process that is high-order nonlinear. Using statistical data, the prediction model is constructed. The test results of the developed system are compared with those obtained from a feed-forward NN-based system, and the FWNN has demonstrated better performance.