1 Introduction

Today artificial neural networks (ANNs) are successfully used in a wide range of data processing problems, where the data are presented either in the form of “object-property” tables or as time series, often produced by non-stationary, nonlinear stochastic or chaotic systems. The advantages of ANNs over other existing approaches derive from their universal approximation capabilities and their ability to learn.

Conventionally, “learning” is defined as the process of adjusting synaptic weights by means of an optimization procedure that searches for the extremum of a given learning criterion. The quality of learning can be improved by adjusting the network topology along with its synaptic weights (Haykin 1999; Cichocki and Unbehauen 1993). This idea is the foundation of evolving computational intelligence systems (Kasabov 2001, 2003, 2007; Kasabov and Song 2002; Kasabov et al. 2005; Lughofer 2011; Angelov and Filev 2004, 2005; Angelov and Kasabov 2005; Angelov and Lughofer 2008; Angelov and Zhou 2006, 2008; Angelov et al. 2004, 2005, 2006, 2007, 2008, 2010; Lughofer and Klement 2003, 2004; Lughofer and Bodenhofer 2006; Lughofer and Guardiola 2008a, b; Lughofer and Kindermann 2008, 2010; Lughofer and Angelov 2009, 2011; Lughofer 2006, 2008a, b, c, 2010a, b; Lughofer et al. 2003, 2004, 2005, 2007, 2009). Under this approach, the best-known architectures are DENFIS by Kasabov and Song (2002), eTS by Angelov and Filev (2004) and FLEXFIS by Lughofer (2008c). These systems are essentially five-layer Takagi–Sugeno networks whose evolution takes place in the fuzzification layer. They can process data in an online mode: a clustering task is solved in the antecedent part (unsupervised learning), while the consequent parameters are tuned by supervised learning using the exponentially weighted recurrent least-squares method. Although these systems have good approximation properties and can process non-stationary signals, they require large samples to tune their parameters, and the clustering procedure, like any procedure based on self-learning, cannot be optimized in terms of speed.

A rather interesting class of computational intelligence systems whose architecture evolves during learning is cascade-correlation neural networks (Fahlman and Lebiere 1990; Prechelt 1997; Schalkoff 1997; Avedjan et al. 1999), owing to their high efficiency and the simplicity with which both the synaptic weights and the network topology are learned. Such a network starts with a simple architecture consisting of a pool (ensemble) of neurons that are trained independently (the first cascade). Each neuron in the pool may have a different activation function and/or a different learning algorithm, and the neurons do not interact with each other during training. After all the neurons in the pool of the first cascade have had their weights adjusted, the neuron that is best with respect to the learning criterion forms the first cascade, and its synaptic weights are no longer adjusted. The second cascade is then formed, usually from a similar pool of neurons; the only difference is that each neuron trained in the pool of the second cascade has an additional input (and, therefore, an additional synaptic weight), namely the output of the first cascade. As in the first cascade, all but the best-performing neuron are eliminated, and the winner's synaptic weights are thereafter fixed. Neurons of the third cascade have two additional inputs, the outputs of the first and second cascades. The network continues to add new cascades to its architecture until the desired quality of problem solving over the given training set is reached; a sketch of this growth procedure is given below.
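The classical growth procedure just described can be outlined in a minimal Python sketch; here `make_pool` and `train` are hypothetical placeholders for the candidate pool and the per-neuron training routine, and the stopping test is illustrative rather than part of any specific published algorithm:

```python
import numpy as np

def augmented_inputs(X, frozen_predictors):
    """Stack the original features with the outputs of the already-frozen cascades."""
    Z = X
    for predict in frozen_predictors:          # each winner acts on the current Z
        Z = np.hstack([Z, predict(Z).reshape(-1, 1)])
    return Z

def grow_cascade_network(X, y, make_pool, train, target_mse, max_cascades=10):
    """Cascade-correlation-style growth: train a pool of candidates,
    keep only the best one, freeze its weights, and repeat."""
    mse = lambda y_hat: float(np.mean((y_hat - y) ** 2))
    frozen = []                                # winning neuron of each cascade
    for _ in range(max_cascades):
        Z = augmented_inputs(X, frozen)        # n inputs + one per frozen cascade
        pool = make_pool(Z.shape[1])           # independent candidate neurons
        trained = [train(neuron, Z, y) for neuron in pool]
        best = min(trained, key=lambda f: mse(f(Z)))
        frozen.append(best)                    # freeze the winner's weights
        if mse(best(Z)) <= target_mse:         # desired quality reached
            break
    return frozen
```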

The authors of the most popular cascade neural network, the cascade-correlation learning architecture (CasCorLA), S. E. Fahlman and C. Lebiere, used elementary Rosenblatt perceptrons with traditional sigmoidal activation functions and adjusted the synaptic weights with the Quickprop algorithm (a modification of the \(\delta \)-learning rule). Since the output signal of such a neuron depends non-linearly on its synaptic weights, the rate of learning cannot be increased substantially. To avoid multi-epoch learning (Bodyanskiy et al. 2008, 2009, 2011a, b, c; Bodyanskiy and Viktorov 2009a, b; Bodyanskiy and Kolodyazhniy 2010), nodes whose outputs depend linearly on the synaptic weights should be used instead. This makes it possible to use learning algorithms that are optimal in terms of speed and to process data as they arrive at the network input. However, when the network learns in an online mode, it is impossible to determine the best neuron in the pool: for non-stationary objects, one neuron of the pool may be the best on one part of the training set but not on the others. We therefore suggest retaining all the neurons in the training pool and using an optimization procedure, derived from a general network quality criterion, to determine the output of each cascade. In this paper, we construct such a hybrid neural network with an optimized neuron pool in each cascade.

2 An optimized cascade neural network architecture

The architecture of the desired hybrid neural network with an optimized pool of neurons in each cascade is shown in Fig. 1.

Fig. 1
figure 1

The optimized cascade neural network architecture

An input of the network (the so-called “receptive layer”) is a vector signal

$$\begin{aligned} x(k)=\left( x_1(k),x_2(k), \ldots , x_n(k)\right) ^T\!, \end{aligned}$$

where \(k=1,2,\ldots \) is either the index of a sample in the “object-property” table or the current discrete time. These signals are fed to the inputs of every neuron \(N_j^{[m]}\) of the network (\(j=1,2,\ldots , q\), where \(q\) is the number of neurons in the training pool, and \(m=1,2,\ldots \) is the number of the cascade), each of which produces an output \(\hat{y}_j^{[m]}(k)\). These outputs are then combined by a generalizing neuron \(GN^{[m]}\), which generates the optimal output \(\hat{y}^{*[m]}(k)\) of the \(m\)-th cascade. While the input of the neurons in the first cascade is \(x(k)\) (possibly extended with a threshold component \(x_0(k)\equiv 1\)), the neurons of the second cascade have an additional input for the generated signal \(\hat{y}^{*[1]}(k)\), the neurons of the third cascade have two additional inputs \(\hat{y}^{*[1]}(k)\), \(\hat{y}^{*[2]}(k)\), and the neurons of the \(m\)-th cascade have \((m-1)\) additional inputs \(\hat{y}^{*[1]}(k), \hat{y}^{*[2]}(k), \ldots ,\hat{y}^{*[m-1]}(k)\), as sketched below. New cascades are added to the network during training when it becomes clear that the existing cascades do not provide the desired quality.
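To make the data flow concrete, here is a minimal Python sketch of one forward pass through this architecture; the callables standing in for the pool neurons \(N_j^{[m]}\) and the generalizing neuron \(GN^{[m]}\) are hypothetical placeholders (the actual optimization of the generalizing neuron is described below in Section 5):

```python
import numpy as np

def forward_pass(x, cascades, generalize):
    """One forward pass through the optimized cascade network.

    x          : input vector x(k) of the receptive layer, shape (n,)
                 (a bias component x_0 = 1 is assumed to be handled by the nodes)
    cascades   : list of pools; cascades[m] is a list of callables, one per
                 neuron N_j^{[m+1]}, each mapping the cascade input to a scalar
    generalize : placeholder for GN, combining pool outputs into y*^{[m]}
    """
    prev_optimal = []                        # y*^{[1]}, ..., y*^{[m-1]}
    for pool in cascades:
        # input of the m-th cascade: x(k) plus all previous optimal outputs
        x_m = np.concatenate([np.asarray(x, dtype=float), np.array(prev_optimal)])
        pool_outputs = np.array([neuron(x_m) for neuron in pool])
        prev_optimal.append(generalize(pool_outputs))
    return prev_optimal[-1]                  # output of the last cascade
```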

3 Training elementary Rosenblatt perceptrons in a cascade neural network

For now, let us assume that the \(j\)-th node in the \(m\)-th cascade is an elementary Rosenblatt perceptron with the activation function

$$\begin{aligned} 0<\sigma ^{[m]}_j({\gamma _j}^{[m]} u_j^{[m]})=\frac{1}{1+e^{- \gamma ^{[m]}_j u_j^{[m]}}}<1, \end{aligned}$$

where \(u_j^{[m]}\) is an internal activation signal of the \(j\)-th neuron in the \(m\)-th cascade, and \(\gamma _j^{[m]}\) is a gain parameter. In such a case, the neurons in the pool of the first cascade will have the following outputs:

$$\begin{aligned} \hat{y}_j^{[1]}=\sigma _j^{[1]} \left( \gamma _j^{[1]} \sum _{i=0}^{n} w_{ji}^{[1]}x_i\right) =\sigma _j^{[1]}\left( \gamma _j^{[1]}w_j^{[1]T} x \right) , \end{aligned}$$

where \(w_{ji}^{[1]}\) is the \(i\)-th synaptic weight of the \(j\)-th neuron in the first cascade. The outputs of the neurons of the second cascade are

$$\begin{aligned} \hat{y}_j^{[2]}=\sigma _j^{[2]} \left( \gamma _j^{[2]} \left( \sum _{i=0}^{n} w_{ji}^{[2]}x_i+w_{j,n+1}^{[2]}\hat{y}^{*[1]}\right) \right) , \end{aligned}$$

and the outputs of the \(m\)-th cascade are

$$\begin{aligned} \begin{aligned} \hat{y}_j^{[m]}&=\sigma _j^{[m]}\bigg ( \gamma _j^{[m]} \bigg ( \sum _{i=0}^{n} w_{ji}^{[m]}x_i+w_{j,n+1}^{[m]}\hat{y}^{*[1]}\\&\quad +w_{j,n+2}^{[m]}\hat{y}^{*[2]}+\cdots +w_{j,n+m-1}^{[m]}\hat{y}^{*[m-1]}\bigg )\bigg )\\&=\sigma _j^{[m]}\left( \gamma _j^{[m]}\sum _{i=0}^{n+m-1}w_{ji}^{[m]}x_i^{[m]}\right) =\sigma _j^{[m]}\left( \gamma _j^{[m]}w_j^{[m]T}x^{[m]}\right) \end{aligned}, \end{aligned}$$

where \(x^{[m]}=\left( x^T, \hat{y}^{*[1]},\ldots , \hat{y}^{*[m-1]}\right) ^T\).

Thus a cascade network that uses Rosenblatt perceptrons as nodes and contains \(m\) cascades depends on \(\big (m(n+2)+\sum _{p=1}^{m-1}p\big )\) adjustable parameters, including the gain parameters \(\gamma _j^{[p]}\), \(p=1,2,\ldots ,m\). We use a conventional quadratic function as the learning criterion

$$\begin{aligned} E_j^{[m]}&= \frac{1}{2}\left( e_j^{[m]}(k)\right) ^2=\frac{1}{2}\left( y(k)-\hat{y}_j^{[m]}(k)\right) ^2\nonumber \\&= \frac{1}{2}\left( y(k)-\sigma _j^{[m]}\left( \gamma _j^{[m]}w_j^{[m]T}x^{[m]}(k)\right) \right) ^2, \end{aligned}$$
(1)

where \(y(k)\) is a reference signal. Gradient minimization of criterion (1) with respect to \(w_j^{[m]}\) gives

$$\begin{aligned} w_j^{[m]}(k+1)&= w_j^{[m]}(k)+\eta _j^{[m]}(k+1)e_j^{[m]}(k+1)\gamma _j^{[m]}\nonumber \\&\times \hat{y}_j^{[m]}(k\!+\!1)\left( 1\!-\!\hat{y}_j^{[m]}(k\!+\!1)\right) x^{[m]}(k\!+\!1)\nonumber \\&= w_j^{[m]}(k)+\eta _j^{[m]}(k+1)e_j^{[m]}(k+1)\nonumber \\&\times \gamma _j^{[m]}J_j^{[m]}(k+1), \end{aligned}$$
(2)

(here \(\eta _j^{[m]}(k+1)\) is a learning rate parameter), and minimization of (1) with respect to \(\gamma _j^{[m]}\) can be performed using the Kruschke–Movellan algorithm (Kruschke and Movellan 1991):

$$\begin{aligned} \gamma _j^{[m]}(k\!+\!1)&\!=\!\gamma _j^{[m]}(k)\!+\!\eta _j^{[m]}(k\!+\!1)e_j^{[m]}(k\!+\!1)\hat{y}_j^{[m]}(k\!+\!1)\nonumber \\&\quad \times \left( 1-\hat{y}_j^{[m]}(k+1)\right) u_j^{[m]}(k+1). \end{aligned}$$
(3)

Combining (2) and (3), we obtain a general learning algorithm for the \(j\)-th neuron in the \(m\)-th cascade:

$$\begin{aligned} \begin{aligned} \left( \begin{array}{c} w_j^{[m]}(k+1)\\ \gamma _j^{[m]}(k+1) \end{array}\right)&=\left( \begin{array}{c} w_j^{[m]}(k)\\ \gamma _j^{[m]}(k) \end{array}\right) \\&\quad +\eta _j^{[m]}(k+1)e_j^{[m]}(k+1)\hat{y}_j^{[m]}(k+1)\\&\quad \times \left( 1-\hat{y}_j^{[m]}(k+1)\right) \left( \begin{array}{c} \gamma _j^{[m]}x^{[m]}(k+1)\\ u_j^{[m]}(k+1) \end{array}\right) , \end{aligned} \end{aligned}$$

or, introducing new variables, in a more compact form:

$$\begin{aligned}&\tilde{w}_j^{[m]}(k+1)\\&\quad =\tilde{w}_j^{[m]}(k)+\eta _j^{[m]}(k+1)\\&\qquad \times e_j^{[m]}(k\!+\!1) \hat{y}_j^{[m]}(k\!+\!1)(1\!-\!\hat{y}_j^{[m]}(k\!+\!1))\tilde{x}^{[m]}(k\!+\!1)\\&\quad =\tilde{w}_j^{[m]}(k)+\eta _j^{[m]}(k+1)e_j^{[m]}(k+1)\tilde{J}_j^{[m]}(k+1). \end{aligned}$$

Weight adjustment can be improved by introducing a momentum term to the learning process (Chan and Fallside 1987; Almeida and Silva 1990; Holmes and Veitch 1991), so that instead of the learning criterion (1) we use the function

$$\begin{aligned}&E_j^{[m]}(k)=\frac{\eta }{2}(e_j^{[m]}(k))^2\nonumber \\&\quad +\frac{1-\eta }{2}\Vert \tilde{w}_j^{[m]}(k)-\tilde{w}_j^{[m]}(k-1)\Vert ^2,\quad 0<\eta \leqslant 1 \end{aligned}$$
(4)

and the algorithm is

$$\begin{aligned} \tilde{w}_j^{[m]}(k+1)&=\tilde{w}_j^{[m]}(k)+\eta _j^{[m]}(k+1)\Big (\eta e_j^{[m]}(k+1)\tilde{J}_j^{[m]}(k+1)\nonumber \\&\quad +(1-\eta )\big (\tilde{w}_j^{[m]}(k)-\tilde{w}_j^{[m]}(k-1)\big )\Big ), \end{aligned}$$
(5)

which is a modification of the Silva–Almeida procedure (Almeida and Silva 1990).

Using the approach suggested in Bodyanskiy et al. (2001b, 2003b), we can endow the algorithm with both tracking and filtering properties, so the final version of the algorithm is

$$\begin{aligned} \left\{ \begin{array}{l} \tilde{w}_j^{[m]}(k+1)=\tilde{w}_j^{[m]}(k)\\ \quad +\frac{\eta e_j^{[m]}(k+1)\tilde{J}_j^{[m]}(k+1)}{r_j^{[m]}(k+1)}\\ \quad +\frac{(1-\eta )(\tilde{w}_j^{[m]}(k)-\tilde{w}_j^{[m]}(k-1))}{r_j^{[m]}(k+1)}, \\ r_j^{[m]}(k+1)=r_j^{[m]}(k)+||\tilde{J}_j^{[m]}(k+1)||^2\\ \quad -||\tilde{J}_j^{[m]}(k-s)||^2 \end{array}\right. \end{aligned}$$
(6)

where \(s\) is a sliding window size.

It is interesting to note that for \(s=1\) and \(\eta =1\) we obtain a nonlinear version of the well-known Kaczmarz–Widrow–Hoff algorithm (Kaczmarz 1937, 1993; Hoff and Widrow 1960):

$$\begin{aligned} \tilde{w}_j^{[m]}(k+1)=\tilde{w}_j^{[m]}(k)+\frac{e_j^{[m]}(k+1)\tilde{J}_j^{[m]}(k+1)}{\Vert \tilde{J}_j^{[m]}(k+1)\Vert ^2}, \end{aligned}$$

which is widely used for training artificial neural networks and is characterized by a high convergence rate.
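For illustration, a minimal online-training sketch of procedure (6) for a single perceptron node is given below (Python/NumPy); the initial value of \(r\), the window length and the default \(\eta\) are assumptions made for the sketch, not part of the original formulation. With `window=1` and `eta=1` the update reduces to the Kaczmarz–Widrow–Hoff form given above.

```python
import numpy as np
from collections import deque

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PerceptronNode:
    """One elementary perceptron of a cascade trained by procedure (6).

    The augmented weight vector stacks the synaptic weights w (the bias is
    assumed to be carried by a component x_0 = 1 of x) and the gain gamma;
    the augmented regressor stacks gamma*x and the internal activation u.
    """

    def __init__(self, dim, eta=0.9, window=10):
        self.w = np.zeros(dim)                   # synaptic weights
        self.gamma = 1.0                         # gain parameter
        self.prev_step = np.zeros(dim + 1)       # w_tilde(k) - w_tilde(k-1)
        self.eta = eta                           # momentum weighting, 0 < eta <= 1
        self.r = 1e-3                            # r(k), kept strictly positive
        self.norms = deque(maxlen=window)        # ||J_tilde||^2 inside the window

    def predict(self, x):
        return sigmoid(self.gamma * (self.w @ x))

    def update(self, x, y):
        u = self.w @ x                           # internal activation signal
        y_hat = sigmoid(self.gamma * u)
        e = y - y_hat                            # prediction error
        x_tilde = np.concatenate([self.gamma * x, [u]])
        J = y_hat * (1.0 - y_hat) * x_tilde      # instantaneous gradient factor
        # r(k+1) = r(k) + ||J(k+1)||^2 - ||J(k-s)||^2 over the sliding window
        if len(self.norms) == self.norms.maxlen:
            self.r -= self.norms[0]
        self.norms.append(J @ J)
        self.r += J @ J
        step = (self.eta * e * J + (1.0 - self.eta) * self.prev_step) / self.r
        self.prev_step = step
        self.w += step[:-1]
        self.gamma += step[-1]
        return y_hat
```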

4 Training neo-fuzzy neurons in a cascade neural network

The low learning rate of Rosenblatt perceptrons, together with the difficulty of interpreting the results (inherent to ANNs in general), encourages the search for alternative approaches to the synthesis of evolving systems in general and cascade neural networks in particular. High interpretability and transparency, together with good approximation capabilities and the ability to learn, are the main features of neuro-fuzzy systems (Jang et al. 1997), which are the foundation of hybrid artificial intelligence systems.

In Bodyanskiy and Viktorov (2009a, b) and Bodyanskiy and Kolodyazhniy (2010), hybrid cascade systems were introduced that use neo-fuzzy neurons (Kusanagi et al. 1992; Uchino and Yamakawa 1997; Miki and Yamakawa 1999) as network nodes, which makes it possible to significantly increase the rate of synaptic weight adjustment. A neo-fuzzy neuron (NFN) is a non-linear system providing the following mapping:

$$\begin{aligned} \hat{y}=\sum _{i=1}^{n} f_i(x_i), \end{aligned}$$

where \(x_i\) is the \(i\mathrm{th}\) input (\(i=1,2,\ldots ,n\)) and \(\hat{y}\) is the output of the neo-fuzzy neuron. The structural units of the neo-fuzzy neuron are the non-linear synapses \(NS_i\), which transform the input signals in the following way:

$$\begin{aligned} f_i(x_i)=\sum _{l=1}^{h} w_{li} \mu _{li}(x_i), \end{aligned}$$

where \(w_{li}\) is the \(l\mathrm{th}\) synaptic weight of the \(i\mathrm{th}\) non-linear synapse, \(l=1,2,\ldots ,h\), and \(h\) is the total number of synaptic weights and, therefore, of membership functions \(\mu _{li}(x_i)\) in the synapse. Thus \(NS_i\) implements fuzzy inference of the form

$$\begin{aligned} \text {IF}\, x_i \,\text {IS}\, X_{li} \,\text {THEN THE OUTPUT IS} \,w_{li}, \end{aligned}$$

where \(X_{li}\) is a fuzzy set with membership function \(\mu _{li}\) and \(w_{li}\) is a singleton (a synaptic weight in the consequent). It can be seen that the non-linear synapse in fact implements zero-order Takagi–Sugeno fuzzy inference.

Figure 2 shows the \(j\mathrm{th}\) neo-fuzzy neuron of the first cascade (according to the network topology shown in Fig. 1).

Fig. 2
figure 2

A neo-fuzzy neuron of the first cascade

$$\begin{aligned} \left\{ \begin{array}{l} \hat{y}_j^{[1]}(k)=\sum _{i=1}^{n}f_{ji}^{[1]}(x_i(k))=\sum _{i=1}^{n}\sum _{l=1}^{h} w_{jli}^{[1]} \mu _{jli}^{[1]}(x_i(k)),\\ \text {IF}\, x_i(k) \,\text {IS}\, X_{jli}\, \text {THEN THE OUTPUT IS}\, w_{jli}^{[1]}. \end{array}\right. \end{aligned}$$
(7)

The authors of the neo-fuzzy neuron (Kusanagi et al. 1992; Uchino and Yamakawa 1997; Miki and Yamakawa 1999) used traditional triangular membership functions satisfying the conditions of a Ruspini partition (partition of unity):

$$\begin{aligned} \mu _{jli}^{[1]}(x_i)= \left\{ \begin{array}{ll} &{} \frac{x_i-c_{j,l-1,i}^{[1]}}{c_{jli}^{[1]}-c_{j,l-1,i}^{[1]}},\,\quad \text {if}\, x_i \in [c_{j,l-1,i}^{[1]},c_{jli}^{[1]}], \\ &{} \frac{c_{j,l+1,i}^{[1]}-x_i}{c_{j,l+1,i}^{[1]}-c_{jli}^{[1]}},\,\quad \text {if}\, x_i \in [c_{jli}^{[1]},c_{j,l+1,i}^{[1]}] \\ &{} 0,\,\quad \text {otherwise}, \end{array}\right. \end{aligned}$$
(8)

where \(c_{jli}^{[1]}\) are the centers of the membership functions, chosen more or less arbitrarily (usually evenly distributed) over the interval [0,1]; naturally, it is assumed that \(0\le x_i \le 1\). This choice ensures that the input signal \(x_i\) activates only two neighboring membership functions, whose sum is always equal to 1, which means that

$$\begin{aligned} \mu _{jli}^{[1]}(x_i)+\mu _{j,l+1,i}^{[1]}(x_i)=1 \end{aligned}$$

and

$$\begin{aligned} f_{ji}^{[1]}(x_i)=w_{jli}^{[1]}\mu _{jli}^{[1]}(x_i)+w_{j,l+1,i}^{[1]}\mu _{j,l+1,i}^{[1]}(x_i). \end{aligned}$$
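As an illustration, a single non-linear synapse with the triangular membership functions (8) can be sketched as follows (Python/NumPy); the evenly spaced centers on [0, 1] follow the text, while the clipping of out-of-range inputs and the example weights are illustrative assumptions:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of a Ruspini (unity) partition built from triangular
    functions with sorted centers; at most two neighbouring functions are
    non-zero and their values sum to one, as in (8)."""
    mu = np.zeros(len(centers))
    x = np.clip(x, centers[0], centers[-1])          # assume x in [0, 1]
    r = np.searchsorted(centers, x)                  # index of the right neighbour
    if r == 0:
        mu[0] = 1.0
    else:
        l = r - 1
        mu[r] = (x - centers[l]) / (centers[r] - centers[l])   # rising branch
        mu[l] = 1.0 - mu[r]                                    # falling branch
    return mu

def nonlinear_synapse(x, weights, centers):
    """f_i(x_i) = sum_l w_li * mu_li(x_i): zero-order Takagi-Sugeno inference."""
    return weights @ triangular_memberships(x, centers)

# usage: h = 5 evenly spaced centers on [0, 1] and illustrative singleton weights
centers = np.linspace(0.0, 1.0, 5)
weights = np.array([0.1, 0.4, 0.9, 0.4, 0.1])
print(nonlinear_synapse(0.33, weights, centers))     # 0.68*0.4 + 0.32*0.9 = 0.56
```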

Approximating capabilities can be improved using cubic splines (Bodyanskiy and Viktorov 2009b) instead of triangular membership functions:

$$\begin{aligned} \mu _{jli}^{[1]}(x_i)= \left\{ \begin{array}{ll} &{} \frac{1}{4}\left( 2+3\frac{2x_i-c_{jli}^{[1]}-c_{j,l-1,i}^{[1]}}{c_{jli}^{[1]}-c_{j,l-1,i}^{[1]}}\right. \\ &{}\left. \quad -\left( \frac{2x_i-c_{jli}^{[1]}-c_{j,l-1,i}^{[1]}}{c_{jli}^{[1]}-c_{j,l-1,i}^{[1]}}\right) ^3\right) ,\\ &{}\text {if}\, x_i \in [c_{j,l-1,i}^{[1]},c_{jli}^{[1]}], \\ &{}\frac{1}{4}\Bigg (2-3\frac{2x_i-c_{j,l+1,i}^{[1]}-c_{jli}^{[1]}}{c_{j,l+1,i}^{[1]}-c_{jli}^{[1]}}\\ &{}\quad +\Bigg (\frac{2x_i-c_{j,l+1,i}^{[1]}-c_{jli}^{[1]}}{c_{j,l+1,i}^{[1]}-c_{jli}^{[1]}}\Bigg )^3\Bigg ),\\ {} &{}\text {if}\, x_i \in [c_{jli}^{[1]},c_{j,l+1,i}^{[1]}], \\ &{} 0, \text {otherwise} \end{array}\right. \end{aligned}$$
(9)

or B-splines (Bodyanskiy and Kolodyazhniy 2010):

$$\begin{aligned} \mu _{jli}^{g[1]}(x_i)=\left\{ \begin{array}{ll} \left\{ \begin{array}{ll} 1, &{}\quad \text {if}\, x_i \in [c_{jli}^{[1]},c_{j,l+1,i}^{[1]}], \\ 0, &{}\quad \text {otherwise}, \end{array}\right. &{}\quad \text {for}\, g=1, \\ \dfrac{x_i-c_{jli}^{[1]}}{c_{j,l+g-1,i}^{[1]}-c_{jli}^{[1]}}\, \mu _{jli}^{g-1,[1]}(x_i) +\dfrac{c_{j,l+g,i}^{[1]}-x_i}{c_{j,l+g,i}^{[1]}-c_{j,l+1,i}^{[1]}}\, \mu _{j,l+1,i}^{g-1,[1]}(x_i), &{}\quad \text {for}\, g>1, \end{array}\right. \end{aligned}$$
(10)

where \(\mu _{jli}^{g[1]}(x_i)\) is the \(l\)-th spline of order \(g\). It can be seen that for \(g=2\) we obtain the triangular membership functions (8). B-splines also ensure a partition of unity, but in the general case they can activate an arbitrary number of membership functions and can be defined beyond the interval [0,1], which may be useful for the subsequent cascades (i.e., those following the first).
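Recursion (10) can be coded directly; a small recursive sketch follows (Python), where the half-open interval convention for the order-1 splines is an implementation assumption made to avoid double counting at the knots:

```python
def bspline_membership(x, centers, l, g):
    """mu_l^g(x): l-th B-spline of order g over the knot sequence 'centers',
    following recursion (10). For g = 2 this reproduces the triangular
    membership functions (8), with the index shifted by one."""
    if g == 1:
        # half-open interval [c_l, c_{l+1}) is an implementation assumption
        return 1.0 if centers[l] <= x < centers[l + 1] else 0.0
    left = (x - centers[l]) / (centers[l + g - 1] - centers[l])
    right = (centers[l + g] - x) / (centers[l + g] - centers[l + 1])
    return (left * bspline_membership(x, centers, l, g - 1)
            + right * bspline_membership(x, centers, l + 1, g - 1))

# usage: an order-2 spline on evenly spaced centers is a triangle on [0.25, 0.75]
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
print(bspline_membership(0.33, centers, l=1, g=2))   # 0.32, rising branch
```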

It is clear that other structures such as polynomial harmonic functions, wavelets, orthogonal functions, etc. can be used as membership functions for non-linear synapses.

It is still unclear which of these functions provides the best results, which is why the idea of using not a single neuron but a pool of neurons with different membership and activation functions appears promising.

Similarly to (7), we can determine the outputs of the remaining cascades: the outputs of the neurons of the second cascade are

$$\begin{aligned} \hat{y}_{j}^{[2]}\!=\!\sum _{i=1}^{n}\sum _{l=1}^{h} w_{jli}^{[2]} \mu _{jli}^{[2]}(x_i)\!+\!\sum _{l=1}^{h} w_{jl,n+1}^{[2]} \mu _{jl,n\!+\!1}^{[2]} (\hat{y}^{*[1]}), \end{aligned}$$

and the outputs of the \(m\mathrm{th}\) cascade are

$$\begin{aligned} \begin{aligned} \hat{y}_{j}^{[m]}&=\sum _{i=1}^{n}\sum _{l=1}^{h} w_{jli}^{[m]} \mu _{jli}^{[m]}(x_i)\\&\quad +\sum _{p=n+1}^{n+m-1}\sum _{l=1}^{h} w_{jlp}^{[m]} \mu _{jlp}^{[m]} (\hat{y}^{*[p-n]}). \end{aligned} \end{aligned}$$

Thus, a cascade network formed from neo-fuzzy neurons and consisting of \(m\) cascades contains \(h\big ( mn+\sum _{p=1}^{m-1}p \big )\) adjustable parameters. Introducing a vector of membership functions for the \(j\)-th neo-fuzzy neuron in the \(m\)-th cascade,

$$\begin{aligned} \begin{aligned}&\mu _{j}^{[m]}(k)=(\mu _{j11}^{[m]}(x_{1}(k)),\ldots ,\mu _{jh1}^{[m]}(x_{1}(k)),\mu _{j12}^{[m]}(x_{2}(k)),\\&\qquad \ldots ,\mu _{jh2}^{[m]}(x_{2}(k)),\ldots ,\mu _{jli}^{[m]}(x_{i}(k)),\ldots ,\mu _{jhn}^{[m]}(x_{n}(k)),\\&\qquad \mu _{j1,n+1}^{[m]}(\hat{y}^{*[1]}(k)),\ldots ,\mu _{jh,n+m-1}^{[m]}(\hat{y}^{*[m-1]}(k)))^{T} \end{aligned} \end{aligned}$$

and a corresponding vector of synaptic weights,

$$\begin{aligned} \begin{aligned} w_{j}^{[m]}&=(w_{j11}^{[m]},\ldots ,w_{jh1}^{[m]},w_{j12}^{[m]},\ldots ,w_{jh2}^{[m]},\ldots ,\\&w_{jli}^{[m]},\ldots ,w_{jhn}^{[m]},w_{j1,n+1}^{[m]},\ldots ,w_{jh,n+m-1}^{[m]})^{T}, \end{aligned} \end{aligned}$$

we obtain an output

$$\begin{aligned} \hat{y}_{j}^{[m]}(k)=w_{j}^{[m]T}\mu _{j}^{[m]}(k). \end{aligned}$$

The learning criterion (1) for this case will be

$$\begin{aligned} E_{j}^{[m]}(k)=\frac{1}{2}(e_{j}^{[m]}(k))^{2}=\frac{1}{2}(y(k)-w_{j}^{[m]T}\mu _{j}^{[m]}(k))^{2} \end{aligned}$$
(11)

and its minimization can be achieved using a “sliding window” modification of the procedure from Bodyanskiy et al. (1986):

$$\begin{aligned} \left\{ \begin{array}{ll} &{} w_{j}^{[m]}(k+1)=w_{j}^{[m]}(k) \\ &{}\,\,\,+\frac{e_{j}^{[m]}(k+1)\mu _{j}^{[m]}(k+1)}{r_{j}^{[m]}(k+1)} \\ &{} r_{j}^{[m]}(k+1)=r_{j}^{[m]}(k)+||\mu _{j}^{[m]}(k+1)||^{2}\\ &{} \,\,\,\,-\Vert \mu _{j}^{[m]}(k-s)\Vert ^{2} \end{array} \right. \end{aligned}$$
(12)

or when \(s=1\) (Bodyanskiy et al. 2003a):

$$\begin{aligned} w_{j}^{[m]}(k+1)=w_{j}^{[m]}(k)+\frac{e_{j}^{[m]}(k+1)\mu _{j}^{[m]}(k+1)}{\Vert \mu _{j}^{[m]}(k+1)\Vert ^{2}}, \end{aligned}$$

which reduces to the one-step-optimal Kaczmarz–Widrow–Hoff algorithm. It is clear that other algorithms could be used instead of (12), for example, the exponentially weighted recurrent least-squares method (EWRLSM) used in DENFIS (Kasabov and Song 2002), eTS (Angelov and Filev 2004) and FLEXFIS (Angelov et al. 2005; Lughofer 2008c). One should remember, however, that the EWRLSM may become unstable when the forgetting factor is rather small. When the criterion with the momentum term (4) is used instead of (11), we obtain the final learning algorithm for the neo-fuzzy neuron:

$$\begin{aligned} \left\{ \begin{array}{ll} &{} w_{j}^{[m]}(k+1)=w_{j}^{[m]}(k) \\ &{}\quad +\Bigg (\frac{\eta e_{j}^{[m]}(k+1)\mu _{j}^{[m]}(k+1)}{r_{j}^{[m]}(k+1)}\\ &{}\quad +\frac{(1-\eta )(w_{j}^{[m]}(k)-w_{j}^{[m]}(k-1))}{r_{j}^{[m]}(k+1)}\Bigg ), \\ &{} r_{j}^{[m]}(k+1)=r_{j}^{[m]}(k)+||\mu _{j}^{[m]}(k+1)||^{2}\\ &{}\quad -\Vert \mu _{j}^{[m]}(k-s)\Vert ^{2}. \end{array} \right. \end{aligned}$$
(13)

It should be kept in mind that, since the NFN output depends linearly on its synaptic weights, any adaptive algorithm of linear identification (Ljung 1999) can be used (second-order recursive least-squares methods, robust algorithms, algorithms that discard outdated information, etc.), which allows non-stationary signals to be processed in an online mode.
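A minimal sketch of one neo-fuzzy node trained by procedure (13) is given below (Python/NumPy); the node operates directly on the membership vector \(\mu _j^{[m]}(k)\) (built, for example, with the triangular functions sketched earlier), and the initial value of \(r\), the window length and the default \(\eta\) are illustrative assumptions. With `window=1` and `eta=1` the update reduces to the Kaczmarz–Widrow–Hoff form shown above.

```python
import numpy as np
from collections import deque

class NeoFuzzyNode:
    """Online training of a neo-fuzzy neuron by the sliding-window procedure (13).

    The node is linear in its weights: y_hat = w^T mu, where mu is the vector
    of membership degrees of all non-linear synapses of the node.
    """

    def __init__(self, dim, eta=0.9, window=10):
        self.w = np.zeros(dim)                 # synaptic weights (singletons)
        self.prev_step = np.zeros(dim)         # w(k) - w(k-1), momentum term
        self.eta = eta                         # 0 < eta <= 1
        self.r = 1e-3                          # r(k), kept strictly positive
        self.norms = deque(maxlen=window)      # ||mu||^2 values inside the window

    def predict(self, mu):
        return self.w @ mu

    def update(self, mu, y):
        e = y - self.w @ mu                    # a priori error e(k+1)
        # r(k+1) = r(k) + ||mu(k+1)||^2 - ||mu(k-s)||^2
        if len(self.norms) == self.norms.maxlen:
            self.r -= self.norms[0]
        self.norms.append(mu @ mu)
        self.r += mu @ mu
        step = (self.eta * e * mu + (1.0 - self.eta) * self.prev_step) / self.r
        self.prev_step = step
        self.w += step
        return self.w @ mu                     # a posteriori output
```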

5 Optimization of the pool output

The outputs generated by the neurons of each pool are combined by the corresponding generalizing neuron \(GN^{[m]}\), whose output \(\hat{y}^{*[m]}(k)\) must be more accurate than any individual output \(\hat{y}_{j}^{[m]}(k)\). This task can be solved within the framework of neural network ensembles. Since the well-known ensemble algorithms are not designed to operate in an online mode, adaptive generalizing forecasting can be used in this case (Bodyanskiy et al. 1983, 1989, 1999, 2001a; Bodyanskiy and Pliss 1990; Bodyanskiy and Vorobyov 2000).

Let us introduce the vector of pool outputs of the \(m\)-th cascade (the inputs of \(GN^{[m]}\)):

$$\begin{aligned} \hat{y}^{[m]}(k)=(\hat{y}_{1}^{[m]}(k),\hat{y}_{2}^{[m]}(k),\ldots ,\hat{y}_{q}^{[m]}(k))^{T}; \end{aligned}$$

then an optimal output of the neuron \(GN^{[m]}\), which is in essence an adaptive linear associator (Cichocki and Unbehauen 1993; Haykin 1999), can be defined as

$$\begin{aligned} \hat{y}^{*[m]}(k)=\sum _{j=1}^{q} c_{j}^{[m]}\hat{y}_{j}^{[m]}(k)=c^{[m]T}\hat{y}^{[m]}(k) \end{aligned}$$

subject to the unbiasedness constraint

$$\begin{aligned} \sum _{j=1}^{q} c_{j}^{[m]}=E^{T}c^{[m]}=1, \end{aligned}$$
(14)

where \(c^{[m]}=(c_{1}^{[m]},c_{2}^{[m]},\ldots ,c_{q}^{[m]})^{T}\), \(E=(1,1,\ldots ,1)^{T}\) are \((q\times 1)\)-vectors.

Introducing a learning criterion on a sliding window

$$\begin{aligned} \begin{aligned} E^{[m]}(k)&=\frac{1}{2}\sum _{\tau =k-s+1}^{k}(y(\tau )-\hat{y}^{*[m]}(\tau ))^{2}\\&=\frac{1}{2}\sum _{\tau =k-s+1}^{k}(y(\tau )-c^{[m]T}\hat{y}^{[m]}(\tau ))^{2}, \end{aligned} \end{aligned}$$

and taking into account the constraint (14), we obtain the Lagrangian function

$$\begin{aligned} L^{[m]}(k)=E^{[m]}(k)+\lambda (1-E^{T}c^{[m]}) \end{aligned}$$
(15)

where \(\lambda \) is an undetermined Lagrange multiplier.

Direct minimization of (15) with respect to \(c^{[m]}\) gives

$$\begin{aligned} \left\{ \begin{array}{ll} &{} \hat{y}^{*[m]}(k+1)=\frac{\hat{y}^{[m]T}(k+1)P^{[m]}(k+1)E}{E^{T}P^{[m]}(k+1)E}, \\ &{} P^{[m]}(k+1)=\Bigg (\sum _{\tau =k-s+2}^{k+1}\hat{y}^{[m]}(\tau )\hat{y}^{[m]T}(\tau )\Bigg )^{-1} \end{array} \right. \end{aligned}$$
(16)

or in a recurrent form:

$$\begin{aligned} \left\{ \begin{array}{ll} &{} \tilde{P}^{[m]}(k+1)=P^{[m]}(k)\\ &{}\quad -\frac{P^{[m]}(k)\hat{y}^{[m]}(k+1)\hat{y}^{[m]T}(k+1)P^{[m]}(k)}{1+\hat{y}^{[m]T}(k+1)P^{[m]}(k)\hat{y}^{[m]}(k+1)}, \\ &{} P^{[m]}(k+1)=\tilde{P}^{[m]}(k+1)\\ &{}\quad +\frac{\tilde{P}^{[m]}(k+1)\hat{y}^{[m]}(k-s+1)\hat{y}^{[m]T}(k-s+1)\tilde{P}^{[m]}(k+1)}{1-\hat{y}^{[m]T}(k-s+1)\tilde{P}^{[m]}(k+1)\hat{y}^{[m]}(k-s+1)},\\ &{} \hat{y}^{*[m]}(k+1)=\frac{\hat{y}^{[m]T}(k+1)P^{[m]}(k+1)E}{E^{T}P^{[m]}(k+1)E}.\\ \end{array} \right. \end{aligned}$$
(17)

When \(s=1\), expressions (16) and (17) take an extremely simple form:

$$\begin{aligned} \hat{y}^{*[m]}(k+1)&= \frac{\hat{y}^{[m]T}(k+1)\hat{y}^{[m]}(k+1)}{E^{T}\hat{y}^{[m]}(k+1)}\nonumber \\&= \frac{||\hat{y}^{[m]}(k\!+\!1)||^{2}}{E^{T}\hat{y}^{[m]}(k\!+\!1)} =\frac{\sum _{j=1}^{q} (\hat{y}_{j}^{[m]}(k\!+\!1))^{2}}{\sum _{j=1}^{q} \hat{y}_{j}^{[m]}(k\!+\!1)}.\nonumber \\ \end{aligned}$$
(18)

It is important to note that the training of both the neo-fuzzy neurons and the generalizing neurons can be organized in an online adaptive mode. In this case the weights of the neurons in all previous cascades are not frozen but are constantly adjusted, and the number of cascades can both grow and shrink in real time, which distinguishes the proposed neural network from other well-known cascade systems.
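A direct sketch of the generalizing neuron computed from (16) over a sliding window of pool outputs is shown below (Python/NumPy); the use of a pseudo-inverse and the fallback to the plain mean for a degenerate window are numerical safeguards assumed for the sketch rather than parts of (16). For a window of length one the result coincides with (18).

```python
import numpy as np
from collections import deque

class GeneralizingNeuron:
    """Optimal unbiased combination of the pool outputs, Eq. (16)."""

    def __init__(self, q, window=20):
        self.window = deque(maxlen=window)     # last s pool-output vectors
        self.E = np.ones(q)                    # vector of ones

    def combine(self, y_pool):
        """y_pool: vector (y_1^[m](k+1), ..., y_q^[m](k+1)) of the current pool."""
        y_pool = np.asarray(y_pool, dtype=float)
        self.window.append(y_pool)
        Y = np.array(self.window)              # s x q matrix of recent outputs
        # P = (sum_tau y(tau) y(tau)^T)^{-1}; pinv used as a numerical safeguard
        P = np.linalg.pinv(Y.T @ Y)
        num = y_pool @ P @ self.E
        den = self.E @ P @ self.E
        if den == 0.0:                         # degenerate window: fall back to mean
            return float(np.mean(y_pool))
        return float(num / den)

# for a window of length 1 this reduces to Eq. (18):
# y* = ||y_pool||^2 / sum_j y_j
```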

6 Experimental results

For a comparison with the evolving approaches eTS (Angelov and Filev 2004), \(\text {Simp}\_{\text {eTS}}\) (Angelov and Filev 2005), SAFIS (Rong et al. 2006), MRAN (Yingwei et al. 1997), RANEKF (Kadirkamanathan and Niranjan 1993), and FLEXFIS (Angelov et al. 2005; Lughofer 2008c), the following nonlinear dynamic system identification example was used (the results of these methods are reported in Angelov and Zhou (2006)):

$$\begin{aligned} y(n+1)=\frac{y(n)y(n-1)(y(n)-0.5)}{1 + y^{2}(n) + y^{2}(n-1)} + u(n), \end{aligned}$$
(19)

where \(u(n)=\sin (2\pi n/25)\), \(y(0)=0\), and \(y(1)=0\). For the incremental and evolving training procedures, 5,000 samples were created starting with \(y(0)=0\); a further 200 test samples were created for computing the root mean-squared error (RMSE), which serves as a reliable estimate of the generalization error.
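For reference, the data for (19) can be generated in a few lines (Python/NumPy), using \(u(n)=\sin (2\pi n/25)\) as given above; the choice to take the 200 test samples as a continuation of the same trajectory is an assumption of the sketch:

```python
import numpy as np

def generate_system_19(n_samples, y0=0.0, y1=0.0):
    """Simulate the nonlinear dynamic system (19) with u(n) = sin(2*pi*n/25)."""
    y = np.zeros(n_samples)
    y[0], y[1] = y0, y1
    for n in range(1, n_samples - 1):
        u = np.sin(2.0 * np.pi * n / 25.0)
        y[n + 1] = (y[n] * y[n - 1] * (y[n] - 0.5)
                    / (1.0 + y[n] ** 2 + y[n - 1] ** 2) + u)
    return y

series = generate_system_19(5200)
train, test = series[:5000], series[5000:]    # 5,000 training + 200 test samples
```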

For a comparison with the evolving fuzzy modelling approaches DENFIS (Kasabov and Song 2002), eTS (Angelov and Filev 2004) and its extension exTS (Angelov and Zhou 2006) for the multi-input single-output Takagi–Sugeno model case, FLEXFIS was used for predicting the Mackey–Glass chaotic time series, given below (the results of the first method are reported in Kasabov and Song (2002), while those of the other two are reported in Angelov and Zhou (2006)):

$$\begin{aligned} \frac{\mathrm{d}x(t)}{\mathrm{d}t}=\frac{0.2x(t-\tau )}{1 + x^{10}(t-\tau )}- 0.1x(t), \end{aligned}$$
(20)

where \(x(0)=1.2\), \(\tau =17\), and \(x(t)=0\) for \(t<0\). The task is to predict \(x(t+85)\) from the input vector \([x(t-18), x(t-12), x(t-6), x(t)]\) for any value of \(t\). 3,000 training samples were collected for \(t\) in the interval [201, 3,200], and 500 test samples in the interval [5,001, 5,500] were used to compute the non-dimensional error index (NDEI), i.e., the RMSE divided by the standard deviation of the target series, on unseen data (Table 1).

Table 1 FLEXFIS, eTS, \(\text {Simp}\_{\text {eTS}}\), SAFIS, MRAN, RANEKF and Cascade NN

It should be mentioned that the proposed cascade NN works in an online mode (unlike the other systems in Table 2).

Table 2 FLEXFIS, eTS, \(\text {Simp}\_{\text {eTS}}\), SAFIS, MRAN, RANEKF and Cascade NN

The electric load data were provided by a local supplier from Kharkiv, Ukraine. The data describe the hourly electric load in that region in 2007 (8,760 samples with a sampling time of one hour); 6,132 data points were used for training and 2,628 for testing. The cascade neural network produced forecasts one hour ahead.

The input variables of the cascade neural network were the load one week earlier and one day earlier at the same hour as the predicted one, the load one hour earlier, the current load, the change in load from the previous hour to the current one, and the number of the current hour within the year (6 inputs altogether).

The cascade neural network used for prediction contained 4 cascades (3 neo-fuzzy neurons in each cascade, with 5 membership functions per input in each neo-fuzzy neuron). After training, the network achieved a prediction mean squared error (MSE) of 0.02165. Figure 3 shows the forecast for 3 weeks at the end of March and the beginning of April 2007 (Table 3).

Fig. 3
figure 3

The electric energy forecast

Table 3 Cascades’ forecasting accuracy

This period includes the Easter holidays and the change-over from winter to summer time, so electricity consumption is less regular than during most other weeks of the year. For this period alone, the cascade network provided a prediction with MSE = 0.02886; for comparison, the MSE for July 2007 was 0.0223, since there were no holidays during that month. These results are about 40 % more accurate than those previously obtained for the same time series by an RBFN and an MLP (Table 4).

Table 4 Forecasting results

The proposed cascade NN was also applied to the prediction of the Narendra time series (Narendra and Parthasarathy 1990):

$$\begin{aligned} \begin{aligned}&y(k+1)=\frac{y(k)}{1 + y^{2}(k)} + f(u(k)),\\&f(u(k))=u^{3}(k),\\&u(k)=\sin (\pi k/250), \quad \text {if} \, k<500,\\&u(k)=0.8\sin (\pi k/250) + 0.2\sin (\pi k/25), \quad \text {if} \, k\ge 500.\\ \end{aligned} \end{aligned}$$
(21)
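The Narendra series (21) can be reproduced as follows (Python/NumPy); the length of the simulated series and the zero initial condition are illustrative assumptions:

```python
import numpy as np

def generate_narendra(n_samples=1000, y0=0.0):
    """Simulate the Narendra benchmark (21)."""
    y = np.zeros(n_samples)
    y[0] = y0
    for k in range(n_samples - 1):
        if k < 500:
            u = np.sin(np.pi * k / 250.0)
        else:
            u = 0.8 * np.sin(np.pi * k / 250.0) + 0.2 * np.sin(np.pi * k / 25.0)
        y[k + 1] = y[k] / (1.0 + y[k] ** 2) + u ** 3   # f(u) = u^3
    return y

series = generate_narendra()
```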

The number of inputs is 6, the number of cascades is 5, and the number of neurons in each cascade is 3 (Table 5).

Table 5 The Narendra time series prediction

7 Summary

This paper proposes a new architecture and learning algorithms for a hybrid cascade neural network with pool optimization in each cascade. The proposed system differs from existing cascade systems in its ability to operate in an online mode, which allows it to process non-stationary, stochastic and chaotic nonlinear signals with the required accuracy. Compared with the well-known evolving neuro-fuzzy systems based on Takagi–Sugeno fuzzy reasoning, the proposed system is computationally simpler and possesses both tracking and filtering capabilities.