
1 Introduction

Nowadays computational intelligence methods, and especially hybrid systems of computational intelligence [1–3], are widely used for Data Mining tasks in many areas where the investigated objects are characterized by uncertainty, non-stationarity, nonlinearity, stochasticity, or chaotic behaviour, first of all in control, identification, prediction, classification, emulation, etc. These systems are flexible because they combine the effective approximating properties and learning abilities of artificial neural networks, the transparency and interpretability of the results obtained with neuro-fuzzy systems, and the compact description of the local features of non-stationary signals provided by wavelet neural networks and the more advanced wavelet-neuro-fuzzy systems.

At the same time, within such directions as Dynamic Data Mining (DDM) and Data Stream Mining (DSM) [4–6], most (if not all) of these systems turn out to be either ineffective or entirely inoperative. This is because DDM and DSM problems must be solved (including the learning process) in on-line mode, where the data arrive for processing sequentially, often in real time. It is clear that a traditional multilayer perceptron, trained with the back-propagation algorithm and requiring a predetermined training sample, cannot operate under such conditions.

The on-line learning process can be implemented in neural networks whose output signal depends linearly on the tuned synaptic weights, for example, radial basis function networks (RBFN), normalized radial basis function networks (NRBFN) [7, 8], generalized regression neural networks (GRNN) [9], and similar networks. However, the use of architectures based on kernel activation functions is complicated by the so-called curse of dimensionality. This problem arises especially often when “lazy learning” [10] based on the “neurons at data points” concept [11] is employed.

Neuro-fuzzy systems have undoubted advantages over neural networks, first of all a significantly smaller number of tuned synaptic weights, which reduces the time of the learning process. Notable here are the TSK system [12] and its simplest version, the Wang-Mendel system [13], as well as ANFIS, DENFIS, SAFIS [14, 15], etc. However, to provide the required approximating properties in these systems, not only the synaptic weights but also the membership function parameters (centres and widths) must be tuned. Furthermore, these parameters are trained using back-propagation algorithms in batch mode.

Hybrid wavelet-neuro-fuzzy systems [16], while having a number of advantages, are too cumbersome from the computational point of view, which complicates their use in real-time tasks. So-called generalized additive models [17] are well suited to such problems, but they do not operate under non-stationary, nonlinear, and chaotic conditions.

For this reason it is preferable to develop a hybrid system of computational intelligence that combines the main advantages of wavelet-neuro-fuzzy systems (learning ability, approximation and extrapolation properties, identification of local signal features, transparency and interpretability of results) with the simplicity and learning speed of generalized additive models.

2 Hybrid Generalized Additive Wavelet-Neuro-Fuzzy System Architecture

Figure 1 shows the architecture of the proposed hybrid generalized additive wavelet-neuro-fuzzy system (HGAWNFS).

Fig. 1. Hybrid generalized additive wavelet-neuro-fuzzy system architecture

This system consists of four layers of information processing; the first and second layers are similar to the corresponding layers of the TSK neuro-fuzzy system. The only difference is that the “Mexican hat” wavelet membership functions, which are “close relatives” of Gaussians, are used in the first hidden layer instead of the conventional bell-shaped Gaussian membership functions:

$$ \varphi_{li} (x_{i} (k)) = (1 - \tau_{li}^{2} (k))\exp ( - \tau_{li}^{2} (k)/2) $$
(1)

where \( x(k) = (x_{1} (k), \ldots ,x_{i} (k), \ldots ,x_{n} (k))^{T} \) is the \( (n \times 1) \) vector of input signals, \( k = 1,2, \ldots \) is the current discrete time instant, \( \tau_{li} (k) = (x_{i} (k) - c_{li} )\sigma_{li}^{ - 1} \); \( c_{li} ,\sigma_{li} \) are the centre and width of the corresponding membership function, with \( {\underline{c}} \le c_{li} \le \bar{c} \), \( {\underline{\sigma }} \le \sigma_{li} \le \bar{\sigma } \), \( i = 1,2, \ldots ,n \), \( l = 1,2, \ldots ,h \); n is the number of inputs, and h is the number of membership functions per input.
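For illustration, a minimal sketch of the membership function (1) in Python could look as follows (NumPy is assumed; the parameter values in the usage line are chosen arbitrarily for the example):

```python
import numpy as np

def mexican_hat(x, c, sigma):
    """Eq. (1): phi(x) = (1 - tau^2) * exp(-tau^2 / 2), tau = (x - c) / sigma."""
    tau = (x - c) / sigma
    return (1.0 - tau ** 2) * np.exp(-tau ** 2 / 2.0)

print(mexican_hat(0.5, c=0.0, sigma=1.0))  # membership level of x = 0.5
```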

It is necessary to note that using wavelet functions instead of the common bell-shaped positive membership functions gives the system more flexibility [18], and using such wavelets for fuzzy reasoning does not contradict the ideas of fuzzy inference, because the negative values of these functions can be interpreted as non-membership levels [19].

Thus, when the input vector \( x(k) \) is fed to the system, the first layer computes the hn membership levels \( \varphi_{li} (x_{i} (k)) \), and in the second layer h product blocks perform the aggregation of these memberships in the form

$$ \tilde{x}_{l} (k) = \prod\limits_{i = 1}^{n} {\varphi_{li} (x_{i} (k))} . $$
(2)

This means that the input layers transform information similarly to the neurons of wavelet neural networks [20, 21], which form multidimensional activation functions providing a scatter partitioning of the input space.
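A sketch of these first two layers under the same assumptions: C and S are hypothetical \( (h \times n) \) matrices of the centres and widths, and the product (2) is taken over the n inputs:

```python
import numpy as np

def aggregate(x, C, S):
    """Layers 1-2: memberships (1) for h rules and n inputs,
    then the product aggregation (2); returns x_tilde of shape (h,)."""
    tau = (x[None, :] - C) / S                        # shape (h, n)
    phi = (1.0 - tau ** 2) * np.exp(-tau ** 2 / 2.0)  # phi_li(x_i)
    return phi.prod(axis=1)                           # x_tilde_l, eq. (2)
```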

Therefore, in the regions of the input space that are remote from the centres \( c_{l} = (c_{l1} , \ldots ,c_{li} , \ldots ,c_{ln} )^{T} \) of the multivariable activation functions

$$ \prod\limits_{i = 1}^{n} {(1 - \tau_{li}^{2} (k))\exp ( - \tau_{li}^{2} (k)/2)} , $$
(3)

the provided approximation quality can be low, which is a common disadvantage of all such systems.

To provide the required approximation properties, the third layer of the system is formed on the basis of a type-2 fuzzy wavelet neuron (T2FWN) [22, 23]. This neuron consists of two adaptive wavelet neurons (AWN) [24], whose prototype is the wavelet neuron of Yamakawa [25]. The wavelet neuron differs from the popular neo-fuzzy neuron [25] in that it uses wavelet functions instead of the common triangular membership functions. These wavelet membership functions, which form the wavelet synapses \( WS_{1} , \ldots ,WS_{l} , \ldots ,WS_{h} \), provide higher approximation quality in comparison with the nonlinear synapses of neo-fuzzy neurons.

In this way the wavelet neuron performs the nonlinear mapping in the form

$$ f(\tilde{x}(k)) = \sum\limits_{l = 1}^{h} {f_{l} (\tilde{x}_{l} (k))} $$
(4)

where \( \tilde{x}(k) = (\tilde{x}_{1} (k), \ldots ,\tilde{x}_{l} (k), \ldots ,\tilde{x}_{h} (k))^{T} \) and \( f(\tilde{x}(k)) \) is the scalar output of the wavelet neuron.

Each wavelet synapse \( WS_{l} \) consists of g wavelet membership functions \( \tilde{\varphi }_{jl} (\tilde{x}_{l} ) \), \( j = 1,2, \ldots ,g \) (g is the number of wavelet membership functions in each synapse of the wavelet neuron), and the same number of tuned synaptic weights \( w_{jl} \). Thus, at the k-th time instant each wavelet synapse \( WS_{l} \) implements the transform

$$ f_{l} (\tilde{x}_{l} (k)) = \sum\limits_{j = 1}^{g} {w_{jl} (k - 1)\tilde{\varphi }_{jl} (\tilde{x}_{l} (k))} $$
(5)

(here \( w_{jl} (k - 1) \) is the value of the synaptic weight computed from the previous \( k - 1 \) observations), and the wavelet neuron as a whole performs the nonlinear mapping in the form

$$ \tilde{f}(\tilde{x}(k)) = \sum\limits_{l = 1}^{h} {\sum\limits_{j = 1}^{g} {w_{jl} (k - 1)\tilde{\varphi }_{jl} (\tilde{x}_{l} (k))} } $$
(6)

i.e., in fact, this is a generalized additive model [17], which is characterized by computational simplicity and high approximation quality.
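A sketch of the wavelet neuron mapping (5)-(6); here W, Cw, and Sw are hypothetical \( (g \times h) \) arrays holding the weights, centres, and widths of the wavelet synapses (the layout is an assumption made for the example):

```python
import numpy as np

def wavelet_neuron(x_tilde, W, Cw, Sw):
    """Eqs. (5)-(6): each wavelet synapse WS_l applies g wavelet membership
    functions to x_tilde_l and weights them; the outputs are summed over l."""
    tau = (x_tilde[None, :] - Cw) / Sw                # shape (g, h)
    phi = (1.0 - tau ** 2) * np.exp(-tau ** 2 / 2.0)  # phi_jl(x_tilde_l)
    return float((W * phi).sum())                     # double sum (6)
```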

The output layer of the system is formed by a summation unit, which can be written in the form

$$ \sum\limits_{l = 1}^{h} {\prod\limits_{i = 1}^{n} {\varphi_{li} (x_{i} (k)) = \sum\limits_{l = 1}^{h} {\tilde{x}_{l} (k)} } } $$
(7)

and by a division unit, which performs normalization to avoid the appearance of “gaps” in the parameter space.

In this way the output of the HGAWNFS can be written in the form

$$ \begin{aligned} \hat{y}(k) & = \frac{{\sum\nolimits_{l = 1}^{h} {\sum\nolimits_{j = 1}^{g} {w_{jl} (k - 1)\tilde{\varphi }_{jl} (\tilde{x}_{l} (k))} } }}{{\sum\nolimits_{l = 1}^{h} {\tilde{x}_{l} (k)} }} = \frac{{\sum\nolimits_{l = 1}^{h} {\sum\nolimits_{j = 1}^{g} {w_{jl} (k - 1)\tilde{\varphi }_{jl} \left( {\prod\nolimits_{i = 1}^{n} {\varphi_{li} (x_{i} (k))} } \right)} } }}{{\sum\nolimits_{l = 1}^{h} {\prod\nolimits_{i = 1}^{n} {\varphi_{li} (x_{i} (k))} } }} \\ & = \sum\limits_{l = 1}^{h} {\sum\limits_{j = 1}^{g} {w_{jl} (k - 1)\frac{{\tilde{\varphi }_{jl} (\tilde{x}_{l} (k))}}{{\sum\nolimits_{l = 1}^{h} {\tilde{x}_{l} (k)} }}} } = \sum\limits_{l = 1}^{h} {\sum\limits_{j = 1}^{g} {w_{jl} (k - 1)} } \tilde{\psi }_{jl} (\tilde{x}(k)) = w^{T} (k - 1)\tilde{\psi }(\tilde{x}(k)) \\ \end{aligned} $$
(8)

where \( \tilde{\psi }_{jl} (\tilde{x}(k)) = \tilde{\varphi }_{jl} (\tilde{x}_{l} (k))\left( {\sum\nolimits_{l = 1}^{h} {\tilde{x}_{l} (k)} } \right)^{ - 1} = \tilde{\varphi }_{jl} \left( {\prod\nolimits_{i = 1}^{n} {\varphi_{li} (x_{i} (k))} } \right)\left( {\sum\nolimits_{l = 1}^{h} {\prod\nolimits_{i = 1}^{n} {\varphi_{li} (x_{i} (k))} } } \right)^{ - 1} \), \( w(k - 1) = (w_{11} (k - 1),w_{21} (k - 1), \ldots ,w_{g1} (k - 1), \) \( w_{12} (k - 1), \ldots ,w_{jl} (k - 1), \ldots ,w_{gh} (k - 1))^{T} \), \( \tilde{\psi }(\tilde{x}(k)) = (\tilde{\psi }_{11} (\tilde{x}(k)),\tilde{\psi }_{21} (\tilde{x}(k)), \ldots ,\tilde{\psi }_{jl} (\tilde{x}(k)), \ldots ,\tilde{\psi }_{gh} (\tilde{x}(k)))^{T} \).
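Putting the pieces together, a sketch of the complete forward pass (8) under the same hypothetical shapes as above (\( (h \times n) \) matrices C, S for the first layer; \( (g \times h) \) arrays W, Cw, Sw for the wavelet synapses):

```python
import numpy as np

def hgawnfs_forward(x, C, S, W, Cw, Sw):
    """Eq. (8): y_hat = sum_jl w_jl * psi_jl, psi normalized by sum_l x_tilde_l."""
    tau1 = (x[None, :] - C) / S                          # first layer, (h, n)
    x_tilde = ((1.0 - tau1 ** 2) * np.exp(-tau1 ** 2 / 2.0)).prod(axis=1)
    tau2 = (x_tilde[None, :] - Cw) / Sw                  # wavelet synapses, (g, h)
    phi = (1.0 - tau2 ** 2) * np.exp(-tau2 ** 2 / 2.0)
    psi = phi / x_tilde.sum()                            # normalized basis psi_jl
    return float((W * psi).sum()), psi                   # y_hat and psi for learning
```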

3 Adaptive Learning Algorithm of Hybrid Generalized Additive Wavelet-Neuro-Fuzzy System

In the simplest case, the learning process of the HGAWNFS reduces to tuning the synaptic weights of the wavelet neuron in the third hidden layer. For tuning the wavelet neuron, its authors [25] used a gradient procedure that minimizes the learning criterion

$$ E(k) = \frac{1}{2}\left( {y(k) - \hat{y}(k)} \right)^{2} = \frac{1}{2}e^{2} (k) = \frac{1}{2}\left( {y(k) - \sum\limits_{l = 1}^{h} {\sum\limits_{j = 1}^{g} {w_{jl} \tilde{\psi }_{jl} (\tilde{x}(k))} } } \right)^{2} $$
(9)

which can be written in the form

$$ \begin{aligned} w_{jl} (k) & = w_{jl} (k - 1) + \eta e(k)\tilde{\psi }_{jl} (\tilde{x}(k)) = w_{jl} (k - 1) + \eta (y(k) - \hat{y}(k))\tilde{\psi }_{jl} (\tilde{x}(k)) \\ & = w_{jl} (k - 1) + \eta \left( {y(k) - \sum\limits_{l = 1}^{h} {\sum\limits_{j = 1}^{g} {w_{jl} (k - 1)\tilde{\psi }_{jl} (\tilde{x}(k))} } } \right)\tilde{\psi }_{jl} (\tilde{x}(k)) \\ \end{aligned} $$
(10)

where \( y(k) \) is the reference signal, \( e(k) \) is the learning error, and \( \eta \) is a fixed learning rate parameter.
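A sketch of one step of the gradient procedure (10), operating on the weight array W and the normalized basis psi returned by the forward-pass sketch above (eta is an illustrative fixed learning rate):

```python
import numpy as np

def gradient_step(W, psi, y, eta=0.1):
    """Eq. (10): w_jl(k) = w_jl(k-1) + eta * e(k) * psi_jl."""
    e = y - float((W * psi).sum())  # learning error e(k)
    return W + eta * e * psi, e
```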

To accelerate the tuning of the synaptic weights under non-stationary conditions, the exponentially weighted recurrent least squares method can be used in the form

$$ \left\{ {\begin{array}{*{20}l} {w(k) = w(k - 1) + \frac{{P(k - 1)e(k)\tilde{\psi }(\tilde{x}(k))}}{{\beta + \tilde{\psi }^{T} (\tilde{x}(k))P(k - 1)\tilde{\psi }(\tilde{x}(k))}},} \hfill \\ {P(k) = \frac{1}{\beta }\left( {P(k - 1) - \frac{{P(k - 1)\tilde{\psi }(\tilde{x}(k))\tilde{\psi }^{T} (\tilde{x}(k))P(k - 1)}}{{\beta + \tilde{\psi }^{T} (\tilde{x}(k))P(k - 1)\tilde{\psi }(\tilde{x}(k))}}} \right)} \hfill \\ \end{array} } \right. $$
(11)

(where \( 0 < \beta \le 1 \) is the forgetting factor); this method, however, can be numerically unstable when the number of tuned parameters is high.
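A sketch of the update (11) with w and psi flattened to \( (gh \times 1) \) vectors; initializing P as a large multiple of the identity matrix is a common choice, assumed here rather than taken from the text:

```python
import numpy as np

def rls_step(w, P, psi, y, beta=0.99):
    """Eq. (11): exponentially weighted recurrent least squares."""
    psi = psi.ravel()
    e = y - w @ psi                                      # a priori error e(k)
    denom = beta + psi @ P @ psi
    w = w + P @ psi * e / denom
    P = (P - np.outer(P @ psi, psi @ P) / denom) / beta  # forgetting update
    return w, P
```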

Under uncertain, stochastic, or chaotic conditions it is more effective to use the adaptive wavelet neuron (AWN) [26] instead of the common wavelet neuron. In this case we can tune not only the synaptic weights but also the centres, widths, and shape parameters.

The basis of the adaptive wavelet neuron is the adaptive wavelet activation function proposed in [22], which can be written in the form

$$ \tilde{\varphi }_{jl} (\tilde{x}_{l} (k)) = (1 - \alpha_{jl} (k)\tau_{jl}^{2} (k))\exp ( - \tau_{jl}^{2} (k)/2) $$
(12)

where \( 0 \le \alpha_{jl} \le 1 \) is the shape parameter of the adaptive wavelet function: for \( \alpha_{jl} = 0 \) it is a conventional Gaussian, for \( \alpha_{jl} = 1 \) it is the “Mexican hat” wavelet, and for \( 0 < \alpha_{jl} < 1 \) it is a hybrid activation-membership function (see Fig. 2).

Figure 2 shows the adaptive wavelet activation function for different values of the parameters \( \alpha \) and \( \sigma \).

Fig. 2. Adaptive wavelet activation function: a \( \alpha = 1,\sigma = 1 \); b dashed line \( \alpha = 0.3,\sigma = 1.5 \), solid line \( \alpha = 0.6,\sigma = 0.5 \); c \( \alpha = 0,\sigma = 1 \)
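A sketch of the adaptive activation function (12); sweeping alpha from 0 to 1 moves between the Gaussian and “Mexican hat” shapes shown in Fig. 2:

```python
import numpy as np

def adaptive_wavelet(x, c, sigma, alpha):
    """Eq. (12): (1 - alpha * tau^2) * exp(-tau^2 / 2), 0 <= alpha <= 1."""
    tau = (x - c) / sigma
    return (1.0 - alpha * tau ** 2) * np.exp(-tau ** 2 / 2.0)
```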

In principle, to tune the centres, widths, and shape parameters we can minimize the learning criterion (9) by the gradient procedure (10), computing the partial derivatives with respect to \( c_{jl} ,\sigma_{jl}^{ - 1} \), and \( \alpha_{jl} \); however, to increase the speed of the learning process we can use a one-step modification of the Levenberg-Marquardt algorithm [27], tuning all parameters of each wavelet synapse simultaneously.

Introducing the \( (g \times 1) \)-vectors \( w_{l} (k) = (w_{1l} (k),w_{2l} (k), \ldots ,w_{gl} (k))^{T} \), \( \tilde{\psi }_{l} (\tilde{x}_{l} (k)) = (\tilde{\psi }_{1l} (\tilde{x}_{l} (k)),\tilde{\psi }_{2l} (\tilde{x}_{l} (k)), \ldots ,\tilde{\psi }_{gl} (\tilde{x}_{l} (k)))^{T} \), \( c_{l} (k) = (c_{1l} (k),c_{2l} (k), \ldots ,c_{gl} (k))^{T} \), \( \sigma_{l}^{ - 1} (k) = (\sigma_{1l}^{ - 1} (k),\sigma_{2l}^{ - 1} (k), \ldots ,\sigma_{gl}^{ - 1} (k))^{T} \), \( \alpha_{l} (k) = (\alpha_{1l} (k),\alpha_{2l} (k), \ldots ,\alpha_{gl} (k))^{T} \), \( \tau_{l} (k) = (\tau_{1l} (k),\tau_{2l} (k), \ldots ,\tau_{gl} (k))^{T} \), we can write the learning algorithm in the form

$$ \left\{ \begin{aligned} & w_{l} (k) = w_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l} (\tilde{x}_{l} (k))}}{{\eta^{w} + \left\| {\tilde{\psi }_{l} (\tilde{x}_{l} (k))} \right\|^{2} }} = w_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l} (\tilde{x}_{l} (k))}}{{\eta^{w} }}, \hfill \\ & c_{l} (k) = c_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))}}{{\eta^{c} + \left\| {\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))} \right\|^{2} }} = c_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))}}{{\eta^{c} }}, \hfill \\ &\sigma_{l}^{ - 1} (k) = \sigma_{l}^{ - 1} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))}}{{\eta^{\sigma } + \left\| {\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))} \right\|^{2} }} = \sigma_{l}^{ - 1} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))}}{{\eta^{\sigma } }}, \hfill \\ &\alpha_{l} (k) = \alpha_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))}}{{\eta^{\alpha } + \left\| {\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))} \right\|^{2} }} = \alpha_{l} (k - 1) + \frac{{e(k)\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))}}{{\eta^{\alpha } }} \hfill \\ \end{aligned} \right. $$
(13)

where \( \tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k)) = 2w_{l} (k - 1)\sigma_{l}^{ - 1} (k - 1)((2\alpha_{l} (k - 1) + 1)\tau_{l} (\tilde{x}_{l} (k)) - \alpha_{l} (k - 1)\tau_{l}^{3} (\tilde{x}_{l} (k)))\exp ( - \tau_{l}^{2} (\tilde{x}_{l} (k))/2) \); \( \tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k)) = w_{l} (k - 1)(\tilde{x}_{l} (k)I_{l} - c_{l} (k - 1))(\alpha_{l} (k - 1)\tau_{l}^{3} (\tilde{x}_{l} (k)) - (2\alpha_{l} (k - 1) + 1)\tau_{l} (\tilde{x}_{l} (k)))\exp ( - \tau_{l}^{2} (\tilde{x}_{l} (k))/2) \); \( \tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k)) = - w_{l} (k - 1)\tau_{l}^{2} (\tilde{x}_{l} (k))\exp ( - \tau_{l}^{2} (\tilde{x}_{l} (k))/2) \); \( \tau_{l}^{2} (\tilde{x}_{l} (k)) = \sigma_{l}^{ - 1} (k) \odot \sigma_{l}^{ - 1} (k) \odot (\tilde{x}_{l} (k)I_{l} - c_{l} (k - 1)) \odot (\tilde{x}_{l} (k)I_{l} - c_{l} (k - 1)) \); \( \odot \) is the symbol of the elementwise (direct) product, \( I_{l} \) is the \( (g \times 1) \) unit vector, and \( \eta^{w} ,\eta^{c} ,\eta^{\sigma } ,\eta^{\alpha } \) are nonnegative momentum terms.

To improve the filtering properties of the learning procedure, the denominators in the recurrent equation system (13) can be modified as follows:

$$ \left\{ \begin{aligned} \eta^{w} (k) = \beta \eta^{w} (k - 1) + \left\| {\tilde{\psi }_{l} (\tilde{x}_{l} (k))} \right\|^{2} , \hfill \\ \eta^{c} (k) = \beta \eta^{c} (k - 1) + \left\| {\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))} \right\|^{2} , \hfill \\ \eta^{\sigma } (k) = \beta \eta^{\sigma } (k - 1) + \left\| {\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))} \right\|^{2} , \hfill \\ \eta^{\alpha } (k) = \beta \eta^{\alpha } (k - 1) + \left\| {\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))} \right\|^{2} \hfill \\ \end{aligned} \right. $$
(14)

where \( 0 \le \beta \le 1 \) has the same meaning as in algorithm (11).
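As a sketch, the weight row of (13) combined with the filtered denominator (14) might look as follows (assuming, as the combined form of (13)-(14) suggests, that the denominator is refreshed before the parameter update); the centre, width, and shape rows are updated analogously with their sensitivity vectors \( \tilde{\psi }_{l}^{c} \), \( \tilde{\psi }_{l}^{\sigma } \), \( \tilde{\psi }_{l}^{\alpha } \):

```python
import numpy as np

def awn_weight_step(w_l, psi_l, e, eta_w, beta=0.95):
    """One row of (13) with the denominator tracked by (14)."""
    eta_w = beta * eta_w + psi_l @ psi_l  # eq. (14): filtered squared norm
    w_l = w_l + e * psi_l / eta_w         # eq. (13): weight update
    return w_l, eta_w
```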

4 Robust Learning Algorithm of Hybrid Generalized Additive Wavelet-Neuro-Fuzzy System

Although the quadratic criterion yields an optimal estimate when the processed signal and disturbances have a Gaussian distribution, when the distribution has so-called “heavy tails” (for example, the Laplace or Cauchy distributions), estimates based on the quadratic criterion can be inadequate. In this case, robust methods based on an M-criterion are more effective [28].

Let us introduce the modified Welsh robust identification criterion in the form

$$ E_{R} (k) = 1 - \exp ( - \delta e^{2} (k)) $$
(15)

where \( e(k) \) is the learning error and \( \delta \) is a positive parameter, set from empirical considerations, that defines the size of the insensitivity zone for outliers.

Figure 3 compares the robust identification criterion for different values of the parameter \( \delta \) with the least squares criterion.

Fig. 3. Robust identification criterion (15) for different values of the parameter \( \delta \)

Performing a sequence of similar transformations, we can write the robust learning algorithm in the form

$$ \left\{ {\begin{array}{*{20}l} {w_{l} (k) = w_{l} (k - 1) + \lambda_{w} \frac{{\delta e(k)\exp ( - \delta e^{2} (k))\tilde{\psi }_{l} (\tilde{x}_{l} (k))}}{{\eta^{w} }},} \hfill \\ {\eta^{w} (k) = \beta \eta^{w} (k - 1) + \left\| {\tilde{\psi }_{l} (\tilde{x}_{l} (k))} \right\|^{2} ,} \hfill \\ {c_{l} (k) = c_{l} (k - 1) + \lambda_{c} \frac{{\delta e(k)\exp ( - \delta e^{2} (k))\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))}}{{\eta^{c} }},} \hfill \\ {\eta^{c} (k) = \beta \eta^{c} (k - 1) + \left\| {\tilde{\psi }_{l}^{c} (\tilde{x}_{l} (k))} \right\|^{2} ,} \hfill \\ \end{array} } \right. $$
(16)
$$ \left\{ {\begin{array}{*{20}l} {\sigma_{l}^{ - 1} (k) = \sigma_{l}^{ - 1} (k - 1) + \lambda_{\sigma } \frac{{\delta e(k)\exp ( - \delta e^{2} (k))\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))}}{{\eta^{\sigma } }},} \hfill \\ {\eta^{\sigma } (k) = \beta \eta^{\sigma } (k - 1) + \left\| {\tilde{\psi }_{l}^{\sigma } (\tilde{x}_{l} (k))} \right\|^{2} ,} \hfill \\ {\alpha_{l} (k) = \alpha_{l} (k - 1) + \lambda_{\alpha } \frac{{\delta e(k)\exp ( - \delta e^{2} (k))\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))}}{{\eta^{\alpha } }},} \hfill \\ {\eta^{\alpha } (k) = \beta \eta^{\alpha } (k - 1) + \left\| {\tilde{\psi }_{l}^{\alpha } (\tilde{x}_{l} (k))} \right\|^{2} .} \hfill \\ \end{array} } \right. $$
(17)
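A sketch of the weight rows of (16); lambda_w, delta, and beta are illustrative values, and the factor delta * e * exp(-delta * e^2) is the gradient of the Welsh criterion (15) with respect to the error, up to a constant, so large outliers contribute almost nothing to the update:

```python
import numpy as np

def robust_weight_step(w_l, psi_l, e, eta_w, lam_w=1.0, delta=1.0, beta=0.95):
    """Weight and eta^w rows of (16); the rows of (17) are analogous."""
    g = delta * e * np.exp(-delta * e ** 2)  # robustified error from (15)
    w_l = w_l + lam_w * g * psi_l / eta_w    # outliers are effectively ignored
    eta_w = beta * eta_w + psi_l @ psi_l     # filtered denominator
    return w_l, eta_w
```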

5 Conclusions

In this paper, the hybrid generalized additive wavelet-neuro-fuzzy system and its learning algorithms have been proposed. This system combines the advantages of the Takagi-Sugeno-Kang neuro-fuzzy systems, wavelet neural networks, and the generalized additive models of Hastie and Tibshirani.

The proposed hybrid system is characterized by computational simplicity and improved approximation and extrapolation properties, as well as a high speed of the learning process. The hybrid generalized additive wavelet-neuro-fuzzy system can be used to solve a wide class of Dynamic Data Mining and Data Stream Mining tasks related to on-line processing (prediction, emulation, segmentation, on-line fault detection, etc.) of non-stationary stochastic and chaotic signals corrupted by disturbances. Computational experiments confirm the effectiveness of the developed approach.