1 Introduction

At a very fundamental and philosophical level, the field of neural networks (NNs) deals with understanding the brain as an information processing machine. Conversely, from a computational perspective, it concerns utilizing our knowledge of the (human) brain to build more intelligent computational models and computer systems. Thus, in the last few decades, NNs have been widely and successfully applied to an ensemble of information processing problems, and the areas of Pattern Recognition (PR) and forecasting are primary application domains. For example, the authors of [11] applied the classical Back Propagation NN to the segmentation of Meteosat images. Similarly, [9] reported an alternative approach to the class prediction of protein structures by means of NNs. Its authors stated that their proposed scheme maintains the same level of classification accuracy as its competitors, while requiring less computational time. In [17], Oliver and his co-authors verified the effectiveness of an adapted NN in the design and development of solutions for some urban network problems.

Since classic NNs have attracted so much attention, many researchers have proposed various improved or novel NN models, such as cellular NNs [7, 8], spiking NNs [12], random NNs [27], chaotic NNs, etc. In this paper, we concentrate on the philosophy and paradigm of chaotic neural networks (CNNs).

It is an accepted fact that a well-defined artificial neural system can display three typical properties: convergence, oscillation and chaos. Indeed, Freeman's clinical work has clearly demonstrated that the brain, at the individual neural level and at the global level, possesses chaotic properties. He demonstrated that the quiescent state of the brain is chaos. However, during perception, when attention is focused on any sensory stimulus, the brain activity becomes more periodic [10]. Thus, as applied scientists, if we are able to develop a system which mimics the brain, it could lead to a new model of NNs, which is the motivation for designing and utilizing CNNs.

The theory of CNNs has been extensively studied in the last two decades, since Aihara, Adachi et al. proposed the first CNN model [1, 2] (named the AdNN in our previous papers [19, 21, 22]). The AdNN was actually based on the modeling of the giant axons of squids. CNN models with biological plausibility have been widely used in various fields such as multi-valued content addressing, optimization, image segmentation, information retrieval and data encryption [16, 23, 29]. Motivated by the work of Aihara, Adachi et al., various types of CNNs have been proposed to solve a number of optimization problems (such as the Traveling Salesman Problem (TSP)), or to obtain Associative Memory (AM) and/or PR properties.

An interesting step in this regard was the work reported in [28], where the authors utilized the delayed feedback and the Ikeda map to design a CNN to mimic the biological phenomena observed by Freeman [10]. Later, Hiura and Tanaka investigated several CNNs based on a piecewise sine map or the Duffing’s equation to solve the TSP [13, 25, 26]. Another valuable work related to chaos and NNs was reported in [3]. The authors of [3] applied CNNs to the family of the original Bidirectional Associative Memory (BAM) NNs, and also designed the Chaotic BAM (C-BAM) models. Their work demonstrated that the C-BAM family can access patterns that members of the original BAM family were incapable of accessing.

The primary aim of this paper is to provide a general framework for the design of CNNs. By appropriately choosing the self-feedback, the coupling functions and the external stimulus, we are able to drive a dynamical system, defined by discrete time feedback equations, towards interesting chaotic trajectories. Our general framework has the same topological structure (completely connected) as the CNNs listed above. It is characterized by the definition of a recurrent NN described in terms of a Present-State/Next-State function and a State/Output function. The sigmoid function is still used as the transfer function. More importantly, the work presented in this article is a prelude to a novel strategy for the design of CNNs. Essentially, the chaotic feedback module of this newly proposed framework can be easily modified, or substituted with other functions that satisfy the required conditions. Simultaneously, the network can be controlled so as to possess analogous properties, as long as the respective parameters are appropriately tuned.

2 State of the Art

A classical NN can be characterized by five elements: the dynamics of the individual neuron, the network’s topology, the learning rule used, the input and the output. In this paper, we mainly consider a typical NN with chaotic and recurrent characteristics.

A recurrent NN usually has a structure as shown in Fig. 1; it is usually fully connected. Each neuron has an external input \(s_i\) and an output \(u_i\). Each neuron also receives the outputs of the other neurons. The summation of the external input, the threshold value and the weighted outputs from the other neurons is called the "net input". The output of a neuron is given by a transfer function, which is usually a sigmoid or a hard-limiting function. The essential difference between a classical NN and a chaotic NN is that a CNN's neuron displays obvious chaotic properties, whereas a classical NN's does not.

Fig. 1

a The structure of a recurrent NN with four neurons. b The typical structure of an individual neuron

2.1 The Adachi Neural Network (AdNN) and Its Variants

The AdNN is a network of neurons with weights associated with the edges, a well-defined Present-State/Next-State function, and a well-defined State/Output function. It is composed of N neurons which are topologically arranged as a completely connected graph, as shown in Fig. 1a. A neuron, identified by the index i, is characterized by two internal states, \(\eta _i(t)\) and \(\xi _i(t)\) (\(i=1...N\)) at time t, whose values are defined by Eqs. (1) and (2) respectively. The output of the ith neuron, \(u_i(t)\), is given by Eq. (3), which specifies the so-called sigmoid function.

$$\begin{aligned} \eta _i(t+1)= & {} k_f\eta _i(t)+\sum _{j=1}^{N} w_{ij}u_j(t), \end{aligned}$$
(1)
$$\begin{aligned} \xi _i(t+1)= & {} k_r\xi _i(t)-\alpha u_i(t)+a_i. \end{aligned}$$
(2)
$$\begin{aligned} u_i(t+1)= & {} f(\eta _i(t+1)+\xi _i(t+1)). \end{aligned}$$
(3)

As per the above definitions and illustration, the structure of a single AdNN neuron can be described by Fig. 2. The reader will observe that it is almost the same as the one in Fig. 1b, except that there is one additional nonlinear unit, which leads to the chaotic phenomena. A further fundamental difference between Fig. 2 and Fig. 1b is that the former has a self-feedback connection which the latter does not have.

Fig. 2

The structure of an AdNN’s single neuron

The AdNN uses the Hebbian rule to determine the weights of the connections. Under certain settings, the AdNN can behave as a dynamic AM. It can dynamically recall all the memorized patterns as a consequence of an input which serves as a “trigger”. Further, if the external stimulations correspond to trained patterns, the AdNN can behave like a PR system.
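
To make the above description concrete, the following Python sketch iterates Eqs. (1)–(3) and builds the weights with a common form of the Hebbian rule for binary patterns. The specific parameter values, the stimulus definition and the random patterns are illustrative assumptions, and are not necessarily the settings used by Adachi et al.

```python
import numpy as np

def hebbian_weights(patterns):
    """A common Hebbian rule for binary {0,1} patterns:
    w_ij = (1/P) * sum_p (2*x_i^p - 1) * (2*x_j^p - 1)."""
    X = 2.0 * np.asarray(patterns, dtype=float) - 1.0  # map {0,1} to {-1,+1}
    P, N = X.shape
    W = X.T @ X / P
    np.fill_diagonal(W, 0.0)                           # no self-connections
    return W

def adnn_step(eta, xi, u, W, a, k_f=0.2, k_r=0.9, alpha=10.0, steep=0.015):
    """One synchronous update of Eqs. (1)-(3); the parameter values here
    are placeholders for illustration only."""
    eta_next = k_f * eta + W @ u                       # Eq. (1): feedback state
    xi_next = k_r * xi - alpha * u + a                 # Eq. (2): refractory state
    u_next = 1.0 / (1.0 + np.exp(-(eta_next + xi_next) / steep))  # Eq. (3)
    return eta_next, xi_next, u_next

# Usage sketch with random binary patterns (4 patterns, 100 neurons).
rng = np.random.default_rng(0)
patterns = rng.integers(0, 2, size=(4, 100))
W = hebbian_weights(patterns)
a = patterns[0].astype(float)                          # external stimulus ("trigger")
eta, xi, u = np.zeros(100), np.zeros(100), patterns[0].astype(float)
for t in range(50):
    eta, xi, u = adnn_step(eta, xi, u, W, a)
```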

By invoking an analysis based on Lyapunov Exponents (LEs), one can show that the AdNN has 2N negative LEs. In order to obtain positive LEs, Calitoiu et al. [4] proposed a modified model, the M-AdNN, which alters the AdNN so as to enhance its PR capabilities. The most significant difference lies in the updating of the values of the internal states. Unlike the AdNN, which incorporates all the internal states to achieve the dynamical behavior, the M-AdNN uses two global internal states which are both associated with a single neuron, for example, the \(N^{th}\) neuron. Thus, Eqs. (1) and (2) get modified to become Eqs. (4) and (5) respectively:

$$\begin{aligned} \eta _i(t+1)= & {} k_f\eta _N(t)+\sum _{j=1}^{N} w_{ij}u_j(t), \end{aligned}$$
(4)
$$\begin{aligned} \xi _i(t+1)= & {} k_r\xi _N(t)-\alpha u_i(t)+a_i. \end{aligned}$$
(5)

By resorting to this modification, the M-AdNN has two positive LEs, namely: \(\lambda _N=\ln k_f+\frac{1}{2}\ln N\) and \(\lambda _{2N}=\ln k_r+\frac{1}{2}\ln N\), which renders the M-AdNN to be truly chaotic.

Calitoiu and his co-authors also proposed a new approach for modeling the problem of blurring or inaccurate perception, and demonstrated that the quality of a system can be modified without manipulating the quality of the stimulus. This new model has been termed the Mb-AdNN [5]. As opposed to the M-AdNN, where the updates are based on two global states, the updates in the Mb-AdNN are based on the states of the first m neurons. In the interest of brevity, the details of the Mb-AdNN are omitted here.

2.2 Our Previous Work

More recently, in our previous paper [18], we presented a collection of previously unreported properties of the AdNN. We have shown that it goes through a spectrum of characteristics as one of its crucial parameters, \(\alpha \), changes. As \(\alpha \) increases, it is first an AM, and it then becomes quasi-chaotic. The system is subsequently distinguished by two phases which really do not have clear boundaries of demarcation, where in the former it is quasi-chaotic for some patterns, and periodic for others, and in the latter, it exhibits PR properties. It is fascinating that the AdNN also possesses the capability to recognize masked or occluded patterns, and even patterns that are completely inverted.

Later, we investigated the problem of reducing the computational cost of the AdNN and its variants. Because their structures involve a completely connected graph, the computational complexity of the AdNN (and its variants) is quadratic in the number of neurons. In [20], we considered how the computations can be significantly reduced by merely using a linear number of inter-neuron connections. To achieve this, we extracted from the original completely connected graph, one of its spanning trees, and then computed the best weights for this spanning tree by using a gradient-based algorithm. By a detailed experimental analysis, we showed that the new linear-time AdNN-like network possesses chaotic and PR properties for different settings.

2.3 Overview of Other Chaotic Neural Networks

2.3.1 A Duffing’s Equation Based CNN

The CNN based on Duffing's equation (referred to as the Du-CNN in this paper) was initially proposed in [13]. This model can be summarized as follows: For a single neuron, the internal state is determined from the variable x(t) of Duffing's equation, which is defined by:

$$\begin{aligned} \left\{ \begin{array}{rrl} \frac{dx}{dt} &{}=&{} y \\ \frac{dy}{dt} &{}=&{} \varepsilon -\alpha y + \beta x -\gamma x^3 + f \cos z \\ \frac{dz}{dt} &{}=&{} \omega , \end{array} \right. \end{aligned}$$
(6)

where the constant \(\alpha \) is a damping coefficient, \(\beta \) and \(\gamma \) are the coefficients of the "double well" potential, and f and \(\omega \) are the amplitude and the frequency of a periodic driving force, respectively. \(\varepsilon \) is a gradient parameter of an external driving field. Equation (6) describes a dynamical system that exhibits chaotic behavior. In the Du-CNN, all the neurons are completely connected. The total net input of the \(i^{th}\) neuron at time t is given by:

$$\begin{aligned} I_i(t)=\sum _j w_{ij}u_j(t)+s_i-\theta _i, \end{aligned}$$
(7)

where \(w_{ij}\) is the coupling strength between the \(i^{th}\) and the \(j^{th}\) neurons, and \(s_i\) and \(\theta _i\) are the external input and the threshold value of the \(i^{th}\) neuron, respectively. \(u_j(t)\) is the output of the \(j^{th}\) neuron. The final output of the \(i^{th}\) neuron is dictated by a sigmoid function:

$$\begin{aligned} u_i=f(x_i)=\frac{1}{1+e^{-x_i/T}}, \end{aligned}$$
(8)

where T is a temperature-like parameter used to control the uncertainty associated with the firing of the neuron. The reader must note that the output at the next time instant depends significantly on the previous net input. This is achieved by controlling the parameter \(\varepsilon \) in Eq. (6):

$$\begin{aligned} \varepsilon =\lambda \arctan (I_i(t)). \end{aligned}$$
(9)

As a result, the structure of the neuron in this NN is slightly different from the AdNN’s, as shown in Fig. 3. Compared to Fig. 2, there is one more control function T, which is used to control \(\varepsilon \). The nonlinear unit is defined by the Duffing’s equation, Eq. (6).
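
A minimal sketch of how such a neuron could be updated is given below: the net input of Eq. (7) sets \(\varepsilon \) via Eq. (9), Eq. (6) is advanced with a fourth-order Runge-Kutta scheme, and the output is read through the sigmoid of Eq. (8). The coefficient values, the step size and the number of integration sub-steps per network update are assumptions made purely for illustration.

```python
import numpy as np

def duffing_rhs(state, eps, alpha=0.1, beta=1.0, gamma=1.0, f=0.3, omega=1.0):
    """Right-hand side of Eq. (6); the coefficient values are illustrative."""
    x, y, z = state
    return np.array([y,
                     eps - alpha * y + beta * x - gamma * x ** 3 + f * np.cos(z),
                     omega])

def rk4_step(state, eps, h=0.01):
    """One fourth-order Runge-Kutta step of Eq. (6) with a fixed epsilon."""
    k1 = duffing_rhs(state, eps)
    k2 = duffing_rhs(state + 0.5 * h * k1, eps)
    k3 = duffing_rhs(state + 0.5 * h * k2, eps)
    k4 = duffing_rhs(state + h * k3, eps)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def du_cnn_neuron_step(state, net_input, lam=1.0, T=0.05, substeps=100):
    """Update a single Du-CNN neuron: Eq. (9) maps the net input of Eq. (7)
    to epsilon, Eq. (6) evolves the internal state, and Eq. (8) gives the
    output."""
    eps = lam * np.arctan(net_input)            # Eq. (9)
    for _ in range(substeps):                   # advance the continuous dynamics
        state = rk4_step(state, eps)
    u = 1.0 / (1.0 + np.exp(-state[0] / T))     # Eq. (8)
    return state, u
```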

Fig. 3

The structure of a Du-CNN’s single neuron

2.3.2 A PWSM Based CNN

Essentially, the CNN based on the Piece-Wise Sine Map (PWSM) is very similar to the CNN illustrated in Sect. 2.3.1. It is also a network with a fully-connected structure. The internal state of a single neuron is determined from the variable \(x_i(t)\) of a PWSM:

$$\begin{aligned} x_i(t+1)=g_i(x_i(t)), \end{aligned}$$
(10)

where the function \(g_i(\cdot )\) is the Piece-Wise Sine Map, defined by:

$$\begin{aligned} g_i(x) = \left\{ \begin{array}{l} g^-_i(x)=\frac{1}{2}\sin [2\{\pi +\sin ^{-1}(2\varepsilon ^-_i)\}x] ~~~(-0.5\le x <0), \\ g^+_i(x)=\frac{1}{2}\sin [2\{\pi +\sin ^{-1}(2\varepsilon ^+_i)\}x] ~~~(0\le x \le 0.5),\\ \end{array} \right. \end{aligned}$$
(11)

where \(\varepsilon ^{\pm }_i\) are the values of \(|g^{\pm }_i(x)|\) at \(x=\pm 0.5\). Both \(\varepsilon ^+_i\) and \(\varepsilon ^-_i\) are positive and satisfy \(\varepsilon ^+_i+\varepsilon ^-_i=0.5\), and t is a discrete time index with respect to the time evolution of the map. Just as in the case of the Du-CNN, the net input of neuron i is given by Eq. (7). The impact of the net input on the network is, similar to the Du-CNN, achieved by controlling the parameters \(\varepsilon ^{\pm }_i\), defined by:

$$\begin{aligned} \varepsilon ^{\pm }_i(t)=\frac{0.25}{1+\eta I^2_i(t)}, \\ \varepsilon ^{\mp }_i(t)=0.5-\varepsilon ^{\pm }_i(t). \end{aligned}$$

Finally, the output of each neuron is given by a step function:

$$\begin{aligned} u_i=f(x_i)=\left\{ \begin{array}{l} 1, \quad 0 \le x_i(t) \le 0.5\\ 0, \quad -0.5 \le x_i(t) <0 \end{array} \right. \end{aligned}$$
(12)

Obviously, this PWSM-based CNN has exactly the same structure as the Du-CNN. Furthermore, both of these CNNs also use the Hebbian rule to determine the connection weights, as reported in [13, 25, 26]. In the interest of brevity, further details of this CNN are omitted here.
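
A minimal sketch of a single PWSM neuron update is given below. Since the control law for \(\varepsilon ^{\pm }_i\) is stated compactly above, the sketch adopts one reading of it (the net input drives \(\varepsilon ^+_i\), and \(\varepsilon ^-_i\) is its complement); this choice and the value of \(\eta \) are assumptions made for illustration only.

```python
import numpy as np

def pwsm(x, eps_minus, eps_plus):
    """The piecewise sine map of Eq. (11) for a scalar state x in [-0.5, 0.5]."""
    if x < 0.0:
        return 0.5 * np.sin(2.0 * (np.pi + np.arcsin(2.0 * eps_minus)) * x)
    return 0.5 * np.sin(2.0 * (np.pi + np.arcsin(2.0 * eps_plus)) * x)

def net_input(i, W, u, s, theta):
    """Net input of Eq. (7) for neuron i."""
    return W[i] @ u + s[i] - theta[i]

def pwsm_neuron_step(x_i, I_i, eta=1.0):
    """One update of a PWSM-CNN neuron: the net input controls the map's
    epsilon parameters, the map is iterated once (Eq. (10)), and the step
    function of Eq. (12) yields the output."""
    eps_plus = 0.25 / (1.0 + eta * I_i ** 2)   # one reading of the control law
    eps_minus = 0.5 - eps_plus                 # so that eps_plus + eps_minus = 0.5
    x_next = pwsm(x_i, eps_minus, eps_plus)    # Eq. (10)
    u_next = 1.0 if x_next >= 0.0 else 0.0     # Eq. (12)
    return x_next, u_next
```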

2.3.3 Time Delayed Differential Equation Based CNNs

Delayed CNNs have been widely investigated in the past few decades. They are also a kind of Hopfield-like NN which exhibits rich nonlinear dynamics. A time delayed differential equation-based CNN usually has the general form:

$$\begin{aligned} \frac{d{x_i}(t)}{dt} = - {c_i}{x_i}(t) + \sum \limits _{j = 1}^n {a_{ij}}f({x_j}(t)) + \sum \limits _{j = 1}^n {b_{ij}}f({x_j}(t - {\tau _{ij}}(t))) + {I_i}(t), \end{aligned}$$
(13)

where n denotes the number of units in the CNN, \(x(t) = [x_1(t), \ldots , x_n(t)]^T \in \mathcal {R}^n\) is the state vector associated with the neurons, \(I = [I_1, I_2, \ldots , I_n]^T \in \mathcal {R}^n\) is the external input vector, \(f(x(t)) = [f_1(x_1(t)), f_2(x_2(t)), \ldots , f_n(x_n(t))]^T \in \mathcal {R}^n\) corresponds to the activation functions of the neurons, and \(\tau _{ij}(t)~(i, j = 1, 2, \ldots , n)\) are the time delays. \(C = diag(c_1, c_2, \ldots , c_n)\) is a diagonal matrix, and \(A = (a_{ij})_{n\times n}\) and \(B = (b_{ij})_{n\times n}\) are the connection weight matrix and the delayed connection weight matrix, respectively. The dynamics of Eq. (13) have been well studied, and it has been reported that it can exhibit rich chaotic phenomena, for example, if the parameters are: \(A=\left( \begin{array}{cc} 2.0 &{} -0.1 \\ -5.0 &{} 3.0 \\ \end{array} \right) \), \(B=\left( \begin{array}{cc} -1.5 &{} -0.1 \\ -0.5 &{} -2.5 \\ \end{array} \right) \), \(C=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 1 \\ \end{array} \right) \).

Further, if \(f_i(x_i(t))=\tanh (x_i(t))\), \(\tau _{ij}(t)=1+0.1\sin (t)\), and \(I=0\), the trajectory of Eq. (13) is as shown in Fig. 4, and is apparently chaotic.
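
To illustrate how such a trajectory can be generated, the sketch below integrates Eq. (13) with the parameters listed above, using a simple fixed-step Euler scheme with linear interpolation of the delayed state (Fig. 4 itself was produced with the fourth-order Runge-Kutta method, which would follow the same pattern). The constant initial history \(x(t)=(0.1, 0.1)\) for \(t\le 0\) is an assumption.

```python
import numpy as np

# Parameters of Eq. (13) as given in the text (n = 2, I = 0, f = tanh, C = I).
A = np.array([[ 2.0, -0.1],
              [-5.0,  3.0]])
B = np.array([[-1.5, -0.1],
              [-0.5, -2.5]])
c_diag = np.array([1.0, 1.0])

h = 200.0 / 30000                       # step size matching the grid of Fig. 4
steps = 30000
max_delay = 1.1                         # tau(t) = 1 + 0.1*sin(t) <= 1.1
hist = int(np.ceil(max_delay / h)) + 1  # number of stored history points

# X[hist-1] corresponds to t = 0; X[:hist] is the (assumed) constant history.
X = np.zeros((steps + hist, 2))
X[:hist] = np.array([0.1, 0.1])

for k in range(hist, steps + hist):
    t = (k - hist) * h                  # time of the current state X[k-1]
    tau = 1.0 + 0.1 * np.sin(t)
    c = (k - 1) - tau / h               # fractional array index of x(t - tau)
    j = int(np.floor(c))
    frac = c - j
    x_delayed = (1.0 - frac) * X[j] + frac * X[j + 1]  # linear interpolation
    x = X[k - 1]
    dx = -c_diag * x + A @ np.tanh(x) + B @ np.tanh(x_delayed)
    X[k] = x + h * dx                   # explicit Euler step
# X[hist:, 0] and X[hist:, 1] trace out the (apparently chaotic) trajectory.
```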

Summary As we can see from the above discussions, the key point in designing a CNN lies in the manner in which we build a neuron that possesses chaotic properties. Apart from the above-mentioned CNNs, other NNs have also been reported (e.g., the models analyzed in [15, 24]) which use a common method that directly chooses a variable of a chaotic equation to represent a neuron's internal state. In every case, all the neurons are fully inter-connected. Also, the connection weights are determined by a given learning rule, e.g., the Hebbian rule. It has been reported that such models can be trained to perform different tasks such as solving the TSP, PR, AM and so on.

Fig. 4

The trajectories of Eq. (13). In this figure, the values of x(t) and y(t) are calculated by means of the fourth-order Runge-Kutta method. The time span is from 0 to 200 with a total of 30,000 steps

In this paper, we intend to propose a novel and universal way to design CNNs.

3 Preliminaries

A discrete time Present-state/Next-state recurrent NN with n neurons can be described by:

$$\begin{aligned} {\varvec{x}}(t+1)={\varvec{f}}\,({\varvec{x}}(t))+{\varvec{g}}({\varvec{x}}(t))+{\varvec{h}}({\varvec{I}}(t)), \end{aligned}$$
(14)

where \({{\varvec{x}}}\in \mathcal {R}^n\), and \({{\varvec{x}}}(t)=[x_1(t),x_2(t),\ldots ,x_n(t)]^T\) is the network state vector at time step t. \({{\varvec{f}}}\), \({{\varvec{g}}}\) and \({{\varvec{h}}}\) are continuously differentiable functions, defined as follows:

1. \({{\varvec{f}}}({{\varvec{x}}})=[f_1(x_1),\, f_2(x_2),\ldots ,f_n(x_n)]^T\) is a self-feedback function;
2. \({{\varvec{g}}}({{\varvec{x}}})=[g_1({{\varvec{x}}}),g_2({{\varvec{x}}}),\ldots ,g_n({{\varvec{x}}})]^T\) is a coupling function;
3. \({{\varvec{h}}}({{\varvec{I}}}(t))=[h_1(I_1(t)),h_2(I_2(t)),\ldots ,h_n(I_n(t))]^T\) is an external stimulus transfer function.

\({{\varvec{I}}}(t)=[I_1(t), I_2(t),\ldots ,I_n(t)]^T\) is the external stimulation vector.

To simplify the model, we let \(f_i(\cdot )=f_j(\cdot )\), \(g_i(\cdot )=g_j(\cdot )\) and \(h_i(\cdot )=h_j(\cdot )\) for every pair, i and j.

The model given by Eq. (14) characterizes most discrete recurrent NNs, such as the discrete Hopfield network, a single neuron of which is typically described by:

$$\begin{aligned} x_i(t+1)=kx_i(t)+\alpha \sum ^n_{j=1}w_{ij}y_j(t)+I_i, \end{aligned}$$
(15)

where k is a constant refractory factor, and \(y_i(t)\) is an output function of the state \(x_i(t)\) at time t. By comparing Eqs. (14) and (15), we can observe that the Hopfield model is a very special case of our “universal” model.
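
The following sketch makes this observation concrete: Eq. (14) is written as a generic update with pluggable \({{\varvec{f}}}\), \({{\varvec{g}}}\) and \({{\varvec{h}}}\), and the Hopfield neuron of Eq. (15) is recovered by one particular choice of these functions. The numerical values of k, \(\alpha \), W and the output function y are placeholders.

```python
import numpy as np

def recurrent_step(x, f, g, h, I):
    """One step of the general model of Eq. (14):
    x(t+1) = f(x(t)) + g(x(t)) + h(I(t))."""
    return f(x) + g(x) + h(I)

def hopfield_functions(W, k=0.9, alpha=1.0, y=np.tanh):
    """The choices that reduce Eq. (14) to the Hopfield neuron of Eq. (15)."""
    f = lambda x: k * x                 # self-feedback: k * x_i(t)
    g = lambda x: alpha * (W @ y(x))    # coupling: alpha * sum_j w_ij * y_j(t)
    h = lambda I: I                     # external stimulus passed through
    return f, g, h

# Usage sketch with arbitrary weights and zero input.
rng = np.random.default_rng(1)
W = rng.random((5, 5))
f, g, h = hopfield_functions(W)
x, I = rng.standard_normal(5), np.zeros(5)
for t in range(100):
    x = recurrent_step(x, f, g, h, I)
```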

Before we proceed, we state the following definition and proposition, so that the claims that follow can be justified.

Definition 1

An \(n\times n\) real matrix A is positive definite (denoted as \(A>0\)) if \(Z^TAZ > 0\) for all non-zero column vectors Z with real entries (\(Z\in \mathcal {R}^n\)), where \(Z^T\) denotes the transpose of Z.

Proposition 1

For any given symmetric diagonally dominant matrix \(A=[a_{ij}]\), \(a_{ij}=a_{ji}\), A is positive definite if \(a_{ii}>0\) for all \(i=1,2,\ldots ,n\).

Proof

This can be proven easily from the definition of positive definiteness. If \(A=[a_{ij}]\) with \(a_{ii}>0\), A is diagonally dominant if \(a_{ii}>\sum ^n_{j=1,j\ne i}|a_{ij}|\) for all \(i=1,2,\ldots ,n\). Now consider \(Z=[z_1,z_2,\ldots ,z_n]^T\), with \(Z\ne 0\). Then:

$$\begin{aligned} Z^TAZ= & {} \sum ^{n}_{i=1}\left( z_i\sum ^{n}_{j=1}a_{ji}z_j\right) = \sum ^n_{i=1}\sum ^n_{j=1}z_iz_ja_{ij} \\= & {} \sum ^{n}_{i=1}\sum ^n_{j=1,j\ne i}z_iz_ja_{ij}+\sum ^n_{i=1}z^2_ia_{ii}. \end{aligned}$$

Thus,

$$\begin{aligned} Z^TAZ= & {} \sum ^{n}_{i=1}\sum ^n_{j=1,j\ne i}z_iz_ja_{ij}+\sum ^n_{i=1}z^2_ia_{ii} \nonumber \\> & {} \sum ^n_{i=1}\sum ^n_{j=1,j\ne i}z_iz_ja_{ij}+\sum ^n_{i=1}z^2_i\sum ^n_{j=1,j\ne i}|a_{ij}| \nonumber \\= & {} |a_{12}|(z_1\pm z_2)^2+|a_{13}|(z_1\pm z_3)^2+\cdots \nonumber \\&+\, |a_{n(n-1)}|(z_n\pm z_{n-1})^2 \nonumber \\\ge & {} 0. \end{aligned}$$
(16)

Hence the result. \(\square \)

Theorem 1

Let A and B be real symmetric matrices. Then the roots \(\{\kappa _i\}\) of the characteristic equation \(det[A-\kappa B]=0\) satisfy \(\kappa _i\ge 1\), \((i=1,2,\ldots ,n)\) if \(A>0\), \(B>0\) and \(A-B\ge 0\).

Proof

A is real and symmetric. Thus there exists an orthogonal matrix Q such that \(Q^TAQ={\varLambda }_a\) where \({\varLambda }_a=diag(a_1, a_2, \ldots , a_n)\), where \(a_i ~(i=1, 2, \ldots , n)\) are the eigenvalues of A. Obviously, \(a_i>0\) for all \(i=1, 2, \ldots , n\) because A is positive definite.

Let \(C=diag(c_1, c_2, \ldots , c_n)\) where \(c_i=\sqrt{\frac{1}{a_i}}\). Clearly, we have \(C^TQ^T\textit{AQC}=I\) where I is the Identity matrix. If we denote \(P=QC\), we get:

$$\begin{aligned} P^T\textit{AP}=I. \end{aligned}$$
(17)

Similarly, B is real and symmetric, and so there exists an orthogonal matrix H such that \(H^T\textit{BH}={\varLambda }_b\) where \({\varLambda }_b=diag(b_1, b_2, \ldots , b_n)\), where \(b_i>0 ~(i=1, 2, \ldots , n)\) are the eigenvalues of B. Consequently, we obtain

$$\begin{aligned} P^TBP=P^TH{\varLambda }_bH^TP=(H^TP)^T{\varLambda }_b(H^TP). \end{aligned}$$
(18)

We now denote \(R=H^TP\), and thus:

$$\begin{aligned} P^T\textit{BP}=R^T{\varLambda }_bR. \end{aligned}$$
(19)

Since H, Q and C are invertible, R is also invertible. As a result,

$$\begin{aligned} (R^T)^{-1}P^T\textit{BPR}^{-1}={\varLambda }_b. \end{aligned}$$
(20)

On the other hand, observe that

$$\begin{aligned} R^TR=P^T\textit{HH}^TP=P^TP=C^TQ^T\textit{QC}={\varLambda }_a^{-1}. \end{aligned}$$
(21)

According to Eqs. (17) and (21), we get

$$\begin{aligned} (R^T)^{-1}P^T\textit{APR}^{-1}=(R^TR)^{-1}={\varLambda }_a. \end{aligned}$$
(22)

If we now denote \((PR^{-1})=S\), it implies that \(S^T\textit{AS}={\varLambda }_a\) and \(S^T\textit{BS}={\varLambda }_b\), where S is also invertible. One can easily verify that:

\(S^TS=(\textit{PR}^{-1})^T(\textit{PR}^{-1})=(\textit{PP}^{-1}(H^T)^{-1})^T(\textit{PP}^{-1}(H^T)^{-1})=I\), which indicates the matrix S is orthogonal. Consequently:

\(S^T(A-B)S=diag(a_1-b_1, a_2-b_2, \ldots , a_n-b_n)\).

We now use the fact that \(A-B\ge 0\), which means that \(a_i\ge b_i\) for \(i=1, 2, \ldots , n\). Therefore,

\(det[A-\kappa B]=0\,\Rightarrow \,det[S^T\textit{AS}-S^T\kappa \textit{BS}]=0\)

\(\Rightarrow \) \(det[S^T(A-\kappa B)S]=0\)

\(\Rightarrow \) \(\kappa _i=a_i/b_i\ge 1\), proving the result. \(\square \)

4 A Framework for the Design of CNNs

We are now in a position to discuss how we can force the system described by Eq. (14) to yield chaotic properties.

First of all, we do not expect the states of the network to tend towards infinity (i.e., unbounded values), because if they did the network would be "unusable". We thus constrain the self-feedback, the coupling function and the external stimulus to be uniformly bounded, so that, for all \({{\varvec{x}}}(t)\in \mathcal {R}^n\):

1. \(||{{\varvec{f}}}({{\varvec{x}}}(t))||_{\infty }\le \epsilon \),
2. \(||{{\varvec{g}}}({{\varvec{x}}}(t))||_{\infty }\le G\), and
3. \(||{{\varvec{h}}}({{\varvec{I}}}(t))||_{\infty }\le H\).

We also set \({{\varvec{g}}}(\mathbf 0 )=\mathbf 0 \), which means that there is no "coupling stimulation" when a neuron's net input is 0.

As a result, for any given initial point \({{\varvec{x}}}(0)\), the trajectory satisfies:

\(||{{\varvec{x}}}(t+1)||_{\infty }\,\le ||{{\varvec{f}}}({{\varvec{x}}}(t))||_{\infty }+ ||{{\varvec{g}}}({{\varvec{x}}}(t))||_{\infty }+ ||{{\varvec{h}}}({{\varvec{I}}}(t))||_{\infty } \le \epsilon +G+H\),

which implies that the network states remain bounded.

Our goal is to find a set of \({{\varvec{f}}}(\cdot )\), \({{\varvec{g}}}(\cdot )\) and \({{\varvec{h}}}(\cdot )\) such that at least one of the LEs of Eq. (14) is positive, which, in turn, would imply chaotic behavior. That is:

$$\begin{aligned} 0<c\le \lambda _i({{\varvec{x}}}(0))<\infty , \quad i\in \{1,2,\ldots ,n\}, \end{aligned}$$
(23)

where c is a given constant.

Indeed, we can succeed in making the system described by Eq. (14) chaotic, as formalized in Theorem 2.

Theorem 2

The system described by Eq. (14) is chaotic if \(M_t=[{\varGamma }^T_t{\varGamma }_t]-e^{2tc}I\) is diagonally dominant, where \({\varGamma }_t=J_t\cdot {\varGamma }_{t-1}\), \({\varGamma }^T_t\) denotes the transpose of \({\varGamma }_t\), \(J_t\) is the Jacobian matrix of the system described by Eq. (14) at time t, I is an \(n \times n\) Identity matrix, and c is a given positive constant.

Proof

First of all, we mention that the existence of a single positive LE implies chaos. Thus, we intend to prove, by induction, that the dynamical system described by Eq. (14) indeed possesses at least one positive LE.

We know that the Jacobian matrix is involved in calculating the LEs. The Jacobian matrix of Eq. (14) at point \({{\varvec{x}}}(t)\) is defined as:

$$\begin{aligned} J_t={{\varvec{f}}}'({{\varvec{x}}}(t))+{{\varvec{g}}}'({{\varvec{x}}}(t))+{{\varvec{h}}}'({{\varvec{I}}}(t)), \end{aligned}$$
(24)

Let us assume that the external stimulus is independent of \({{\varvec{x}}}\), which is a very reasonable assumption. Thus \({{\varvec{h}}}'({{\varvec{I}}}(t))=\mathbf 0 \), and Eq. (24) reduces to:

$$\begin{aligned} J_t={{\varvec{f}}}'({{\varvec{x}}}(t))+{{\varvec{g}}}'({{\varvec{x}}}(t))={\varLambda }_{{{\varvec{x}}}(t)} +{{\varvec{g}}}'({{\varvec{x}}}(t)), \end{aligned}$$
(25)

where \({\varLambda }_{{{\varvec{x}}}(t)}=diag\{f'(x_1(t)),f'(x_2(t)),\ldots ,f'(x_n(t))\}\) is a diagonal matrix.

Let \({{\varvec{g}}}'_t\) and \({\varLambda }_{t}\) denote \({{\varvec{g}}}'(x(t))\) and \({\varLambda }_{{{\varvec{x}}}(t)}\) at time t, respectively. In the interest of simplicity, we let \(f(x_i(t))=(-1)^m(\sigma (t) x_i(t)-2m\epsilon )\), \(m=0,\pm 1,\pm 2,\ldots \) (which is called the "sawtooth" function, as shown in Fig. 5). Consequently:

\(|f(x_i(t))|=|(-1)^m(\sigma (t) x_i(t)-2m\epsilon )|\le \epsilon \) and \(|f'(x_i(t))|=\sigma (t)\).

Thus, when \(t=0\),

$$\begin{aligned} {\varGamma }^T_0{\varGamma }_0= & {} J^T_0J_0 = ({\varLambda }_{0}+{{\varvec{g}}}'_0)^T\cdot ({\varLambda }_{0}+{{\varvec{g}}}'_0) \nonumber \\= & {} ({{\varvec{g}}}'_0)^T{{\varvec{g}}}'_0+ {\varLambda }_{0}[{{\varvec{g}}}'_0+({{\varvec{g}}}'_0)^T]+{\varLambda }^2_{0}. \end{aligned}$$
(26)

Observe that \([{\varGamma }^T_0{\varGamma }_0]\) is symmetric and \({\varLambda }_0\) is diagonal. We can thus, certainly, find a large enough \(\sigma (0)\) so as to make \([{\varGamma }^T_0{\varGamma }_0]\) diagonally dominant. According to Proposition 1, a symmetric diagonally dominant matrix with a positive diagonal is positive definite, which means that \([{\varGamma }^T_0{\varGamma }_0]>0\) and that \([{\varGamma }^T_0{\varGamma }_0]^{-1}>0\).

At any time t, we now demonstrate that an appropriate \(\sigma (t)\) will render \(M_t=[{\varGamma }^T_t{\varGamma }_t]-e^{2tc}I\) diagonally dominant.

When \(t=1\),

$$\begin{aligned} M_1= & {} {\varGamma }^T_1{\varGamma }_1-e^{2c}I \nonumber \\= & {} (J_1{\varGamma }_0)^T\cdot J_1{\varGamma }_0-e^{2c}I \nonumber \\= & {} {\varGamma }^T_0({\varLambda }_{1}+{{\varvec{g}}}'_1)^T\cdot ({\varLambda }_{1}+{{\varvec{g}}}'_1){\varGamma }_0-e^{2c}I \nonumber \\= & {} \left( {\varLambda }^2_{1}{\varGamma }^T_0{\varGamma }_0-e^{2c}I\right) +{\varLambda }_{1}{\varGamma }^T_0\left[ {{\varvec{g}}}'_1+({{\varvec{g}}}'_1)^T\right] {\varGamma }_0 + {\varGamma }^T_0({{\varvec{g}}}'_1)^T{{\varvec{g}}}'_1{\varGamma }_0. \end{aligned}$$
(27)

The first term contains \({\varGamma }^T_0{\varGamma }_0\), which is diagonally dominant. Further, the second and third terms are symmetric, and thus a proper \(\sigma (1)\) can force the matrix given by Eq. (27) to be diagonally dominant, which implies that \(M_1\) is positive definite as per Proposition 1. Thus,

$$\begin{aligned} M_1={\varGamma }^T_1{\varGamma }_1-e^{2c}I= & {} \left[ (J_1{\varGamma }_0)^T\cdot J_1{\varGamma }_0\right] -e^{2c}I \end{aligned}$$
(28)
$$\begin{aligned}= & {} {\varGamma }^T_0J^T_1J_1{\varGamma }_0-e^{2c}I > 0. \end{aligned}$$
(29)

Since \({\varGamma }^T_0{\varGamma }_0>0\) and \(\left[ {\varGamma }^T_0{\varGamma }_0\right] ^{-1}>0\), the inequality Eq. (29) can be rewritten as:

$$\begin{aligned} J^T_1J_1-e^{2c}\left[ {\varGamma }_0{\varGamma }^T_0 \right] ^{-1}>0. \end{aligned}$$
(30)

Consequently, \(J^T_1J_1>0\).

In a similar way, we may choose a sequence \(\{\sigma (t)\}\) so that the following conditions are fulfilled:

$$\begin{aligned}&\displaystyle {\varGamma }^T_{t-1}{\varGamma }_{t-1} > 0. \end{aligned}$$
(31)
$$\begin{aligned}&\displaystyle \quad J^T_tJ_t-e^{2tc}\left[ {\varGamma }_{t-1}{\varGamma }^T_{t-1}\right] ^{-1} >0. \end{aligned}$$
(32)
$$\begin{aligned}&\displaystyle J^T_tJ_t >0. \end{aligned}$$
(33)
Fig. 5

An example of the sawtooth function. In this figure, \(\sigma =2\) and \(\epsilon =4\)

Consider the inequality Eq. (32). If we let \(A=J^T_tJ_t\) and \(B=e^{2tc}\left[ {\varGamma }_{t-1}{\varGamma }^T_{t-1}\right] ^{-1}\), we observe that \(A>0\), \(B>0\) and \(A-B>0\). As per Theorem 1, all the roots of the characteristic equation \(det[A-\kappa B]=0\) are then no less than unity. Let us now substitute \(J_t\) and \(J^T_t\) by \({\varGamma }_t{\varGamma }^{-1}_{t-1}\) and \([{\varGamma }^{-1}_{t-1}]^T{\varGamma }^T_{t}\) respectively. Then

$$\begin{aligned} |A-\kappa B|= & {} \left| J^T_tJ_t-\kappa e^{2tc}\left[ {\varGamma }_{t-1}{\varGamma }^T_{t-1}\right] ^{-1}\right| \nonumber \\= & {} \left| \left[ {\varGamma }^{-1}_{t-1}\right] ^T{\varGamma }^T_{t}{\varGamma }_t{\varGamma }^{-1}_{t-1}-\kappa e^{2tc}\left[ {\varGamma }_{t-1}{\varGamma }^T_{t-1}\right] ^{-1}\right| \nonumber \\= & {} e^{2tc}\left| \left[ {\varGamma }^{-1}_{t-1}\right] ^T\cdot \left[ e^{-2tc}{\varGamma }^T_t{\varGamma }_t-\kappa I \right] \cdot {\varGamma }^{-1}_{t-1}\right| , \end{aligned}$$
(34)

which means that all the roots of the characteristic equation \(\left| e^{-2tc}{\varGamma }^T_t{\varGamma }_t-\kappa I \right| =0\) are no less than unity. In other words, all the eigenvalues of \({\varGamma }^T_t{\varGamma }_t\) are no less than \(e^{2tc}\). Therefore, according to the definition of LEs,

$$\begin{aligned} \lambda _i= & {} \lim _{t\rightarrow \infty }\frac{1}{2t}\log \left| \mu _i\left[ {\varGamma }^T_t{\varGamma }_t \right] \right| \nonumber \\\ge & {} \lim _{t\rightarrow \infty }\frac{1}{2t}\log (e^{2tc})=c>0. \end{aligned}$$
(35)

where \(\mu _i[\cdot ]\) denotes the ith eigenvalue of its argument. We have thus proven that the system described by Eq. (14) is truly chaotic, since \(\lambda _i\ge c>0\). \(\square \)

5 Experimental Results

Based on the analysis above, we can obtain one of the simplest CNN models that can be characterized by:

$$\begin{aligned} x_i(t+1)=f(x_i(t))+\sum ^n_{j=1}w_{ij}\tanh \left( \sum ^n_{k=1}v_{jk}x_k(t)\right) , \end{aligned}$$
(36)

where \(f(\cdot )\) is the sawtooth function, and the entries of W and V are randomly generated between 0 and 1, as given in Eq. (37). The total discrete time length is \(T=2000\).

$$\begin{aligned} W=\left( \begin{array}{cccc} 0.21 &{} 0.03 &{} 0.05 \\ 0.55 &{} 0.61 &{} 0.49 \\ 0.63 &{} 0.36 &{} 0.19 \\ \end{array} \right) ,\quad V=\left( \begin{array}{cccc} 0.12 &{} 0.19 &{} 0.28\\ 0.21 &{} 0.04 &{} 0.54\\ 0.15 &{} 0.64 &{} 0.70\\ \end{array} \right) . \end{aligned}$$
(37)

The topological structure of this CNN model is shown in Fig. 6.
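
A minimal simulation sketch of Eq. (36) is given below, using the sawtooth self-feedback and the weights of Eq. (37). The initial state, and the way the branch index m of the sawtooth is selected, are assumptions made for illustration.

```python
import numpy as np

def sawtooth(x, sigma, eps):
    """The 'sawtooth' feedback f(x) = (-1)^m (sigma*x - 2*m*eps), with m chosen
    so that |f(x)| <= eps (a triangular fold of the line sigma*x)."""
    m = np.round(sigma * x / (2.0 * eps))
    sign = np.where(m % 2 == 0, 1.0, -1.0)
    return sign * (sigma * x - 2.0 * m * eps)

# Weights of Eq. (37).
W = np.array([[0.21, 0.03, 0.05],
              [0.55, 0.61, 0.49],
              [0.63, 0.36, 0.19]])
V = np.array([[0.12, 0.19, 0.28],
              [0.21, 0.04, 0.54],
              [0.15, 0.64, 0.70]])

def cnn_step(x, sigma, eps):
    """One synchronous update of Eq. (36)."""
    return sawtooth(x, sigma, eps) + W @ np.tanh(V @ x)

# Iterate the network for T = 2000 steps; sigma and eps as used in Sect. 5.
sigma, eps = 1.4, 1.0
x = np.array([0.1, 0.2, 0.3])           # assumed initial state
traj = np.empty((2000, 3))
for t in range(2000):
    x = cnn_step(x, sigma, eps)
    traj[t] = x
# The columns of traj give x_1(t), x_2(t), x_3(t) for the phase diagrams
# of Figs. 7 and 8.
```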

Fig. 6

The topological structure of the CNN model defined by Eq. (36). It consists of an input layer, an output layer, a hidden layer and a self-feedback component. In each layer, only 2 neurons are plotted in the interest of simplicity

It possesses rich dynamical properties. As one can verify:

1. \(|g_i({{\varvec{x}}})|=|\sum ^n_{j=1} w_{ij}\tanh (\sum ^n_{k=1}v_{jk}x_k(t))|\le n\),
2. \(|\frac{\partial g_i({{\varvec{x}}})}{\partial x_j}| \le w_{ij}\),
3. \(|f(x)|=|(-1)^m(\sigma x-2m\epsilon )|\le \epsilon \), and
4. \(|f'(x)|=\sigma \) and \(g(\mathbf 0 )=0\).

We can visualize the system’s dynamical behavior by plotting the phase diagram of three neurons: \(x_1(t)\), \(x_2(t)\) and \(x_3(t)\). These phase diagrams are shown in Figs. 7 and 8.

Fig. 7

The network's trajectory converges to a series of period-n points when there is no self-feedback (i.e., when \(\epsilon =0\)). Each dotted line indicates a period-n point

Fig. 8

Period-doubling bifurcation and chaos occur at different values of \(\sigma \)

Fig. 9

The \(10 \times 10\) patterns used by Adachi et al. The first four patterns are used to train the network. The fifth pattern is obtained from the fourth pattern by adding 15 % noise. The sixth pattern is an untrained pattern

Fig. 10

PR properties: The Hamming distance between the output and the trained patterns. The input pattern is the fourth pattern of Fig. 9

We explain the system dynamics as follows:

1. Let \(\epsilon =0\). In this setting, there is no self-feedback, according to the sawtooth function definition. In this case, no chaotic phenomenon is observed; instead, the trajectory converges to a period-n orbit, as shown in Fig. 7.

2. Let \(\epsilon \) increase, e.g., \(\epsilon =1.0\). Here, the phase space of the network varies with the value of \(\sigma {:}\)

    (a) If \(\sigma <0.95\), there is still no chaos; the trajectory converges to a fixed point.

    (b) A period-doubling bifurcation occurs when \(\sigma \approx 0.95\). We must point out that it is not easy to calculate the exact value of \(\sigma \) at which this bifurcation occurs, since the network is a high-dimensional system; \(\sigma \approx 0.95\) can be observed from Fig. 8.

    (c) Chaotic windows appear as \(\sigma \) increases, e.g., \(\sigma =1.4\). As we can verify, in this case, such values of \(\sigma \), W and V satisfy the conditions of Eqs. (31), (32) and (33), which implies chaos as per Theorem 2. A numerical check of the largest LE for this setting is sketched below.
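
The numerical check referred to in item (c) can be carried out with the standard QR-based estimate of the Lyapunov spectrum, as sketched below. The sketch reuses the sawtooth, W, V and cnn_step definitions from the simulation sketch above; the initial state and the number of iterations are assumptions.

```python
import numpy as np
# Reuses sawtooth, W, V and cnn_step from the simulation sketch of Eq. (36).

def jacobian(x, sigma, eps):
    """Jacobian of Eq. (36): diag(f'(x)) + W * diag(1 - tanh^2(Vx)) * V."""
    m = np.round(sigma * x / (2.0 * eps))
    fprime = np.where(m % 2 == 0, 1.0, -1.0) * sigma
    D = np.diag(1.0 - np.tanh(V @ x) ** 2)
    return np.diag(fprime) + W @ D @ V

def lyapunov_spectrum(sigma, eps, T=2000):
    """QR-based estimate of the Lyapunov exponents of Eq. (36)."""
    x = np.array([0.1, 0.2, 0.3])       # assumed initial state
    Q = np.eye(3)
    acc = np.zeros(3)
    for _ in range(T):
        Q, R = np.linalg.qr(jacobian(x, sigma, eps) @ Q)
        acc += np.log(np.abs(np.diag(R)))
        x = cnn_step(x, sigma, eps)
    return acc / T

# If sigma = 1.4, eps = 1.0 indeed falls in a chaotic window, the largest
# estimate returned here is expected to be positive.
print(lyapunov_spectrum(1.4, 1.0))
```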

6 Applications of the Designed Model

6.1 Chaotic Pattern Recognition

We shall now report the PR properties of the designed model specified by Eq. (36). These properties have been gleaned by examining the Hamming distance between the input pattern and the patterns that appear at the output. The experiment was conducted with the data sets used in [1], which are given in Fig. 9 and are referred to as the Adachi data sets. The patterns were described by \(10 \times 10\) pixel images, and the network thus had 100 neurons.
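
The Hamming-distance curves of Figs. 10, 11 and 12 can be computed as sketched below. The thresholding of the sigmoid outputs at 0.5 is an assumption about how the outputs are binarized before comparison.

```python
import numpy as np

def hamming_distances(outputs, stored):
    """Hamming distance, at every time step, between the binarized network
    output and each of the stored (trained) patterns.

    outputs: array of shape (T, 100), the network outputs over time
    stored:  array of shape (4, 100), the four trained Adachi patterns
    returns: array of shape (T, 4) of pixel-wise disagreement counts
    """
    binary = (np.asarray(outputs) >= 0.5).astype(int)   # threshold the outputs
    stored = np.asarray(stored).astype(int)
    return (binary[:, None, :] != stored[None, :, :]).sum(axis=2)

# A trained pattern is considered recalled at step t when its distance is 0,
# which is how the plots in Figs. 10, 11 and 12 are read.
```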

Fig. 11

PR properties: The Hamming distance between the output and the trained patterns. The input pattern is the fifth pattern of Fig. 9

Fig. 12

PR properties: a The Hamming distance between the output and the trained patterns. The input pattern is the sixth pattern of Fig. 9. The Hamming distance does not imply convergence. b The chaotic trajectory of a randomly chosen neuron

Before we proceed, we emphasize that there is a marked difference between the basic principles of achieving PR using CNNs and those of using a classical NN. Traditionally, a classical NN will "stay" at a certain known pattern if the input can be recognized. As opposed to this, if the input cannot be recognized, the network outputs an unknown pattern, implying that the pattern is not one of the trained patterns. Observe that in both these situations, the outputs of the network are stable. However, the basic principle of chaotic PR is significantly different. In the ideal setting, we would prefer the CNN to be chaotic when it is exposed to untrained patterns, and the output to appear stable when it is exposed to trained patterns.

To obtain the desired PR properties of the model described by Eq. (36), the parameters were set as follows: W is the synaptic weight matrix obtained by the Hebbian rule, V is set, for simplicity, to the Identity matrix, and the transfer function is a sigmoid function. We enumerate three cases below:

1. The initial input of the network is a known pattern, say P4.

    The Hamming distance converges to 0 immediately, which implies that the output converges to the input pattern, as shown in Fig. 10. Obviously, we can conclude that the input pattern can be recognized successfully if it is one of the known patterns.

2. The initial input of the network is a noisy pattern, in this case P5, which is a noisy version of P4.

    It is interesting to see that the output converges to the original pattern P4, instead of the initial input P5, after only one step. That is, even if the initial input contains some noise, the network is still able to recognize it correctly (Fig. 11).

3. The initial input of the network is an unknown pattern, P6.

    From Fig. 12 we see that the output does not converge to any known/unknown pattern.

From the above three figures we can conclude the following: If a CNN is appropriately defined, it converges immediately when presented with known patterns, or when the input is a noisy version of a trained pattern. On the other hand, it demonstrates chaos when presented with unknown patterns. In this sense, we confirm that the proposed CNN possesses chaotic PR properties.

6.2 Solving the Traveling Salesman Problem

In this section, we report the results of utilizing the CNN to solve the Traveling Salesman Problem (TSP). We adopt the same coordinates for the cities as those used in [14]. In the interest of brevity, we omit the details of how we apply CNNs to solve the TSP. In fact, we used the exact same network topology, weights and energy function as Hopfield et al. used in [13, 14, 25]. The only difference was the dynamics of the neurons. To be specific, all the neurons' states were updated according to the model given by Eq. (36), which has been proven to possess chaotic properties.
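
Although the mapping itself is omitted here, the sketch below shows how a candidate solution is typically scored in such an encoding: the \(N\times N\) output matrix (rows indexing cities, columns indexing visit order, as in the Hopfield-Tank formulation) is decoded into a tour, and its total length is computed. The argmax-based decoding is an assumption for illustration, and is not necessarily the exact procedure used in our experiments.

```python
import numpy as np

def tour_length(output_matrix, coords):
    """Decode an N x N (near-binary) neuron-output matrix into a tour and
    return the total length of the closed tour."""
    order = np.argmax(output_matrix, axis=0)        # city visited at each position
    cities = coords[order]
    diffs = cities - np.roll(cities, -1, axis=0)    # closed tour: return to start
    return np.sqrt((diffs ** 2).sum(axis=1)).sum()
```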

Fig. 13

The solution (minimum distance summation) of the TSP varying with the evolution of the network in 1000 steps

As per Fig. 13, we find that during the network's evolution, the optimal solution was located 16 times (labeled here by red squares) within 1000 iterations. With regard to its searching ability (compared to the 37 hits within 38,558 iterations reported in [25]), we affirm that our newly proposed model performs better, even though it is a very general model.

7 Conclusions

In this paper we have investigated how to design a chaotic neural network (CNN) by appropriately applying a self-feedback function to a recurrent neural network. By means of Jacobian matrix analysis, diagonally dominant matrices and Lyapunov exponent analysis, we have proved that a two-layer recurrent NN with a carefully-devised feedback function can lead to chaotic behavior. Numerical results have been presented that are based on experiments conducted by using the sawtooth function as the self-feedback function and the hyperbolic tangent function as the coupling function. The results show that our general CNN model is able to present rich periodic and chaotic properties when appropriate control parameters are chosen. Applications of the model to PR and to solving the TSP have also been included.

The future work spawning from this paper would be to train this model by using suitable learning algorithms so that it can be applied in various areas including pattern recognition, associative memory, cryptography, etc.