1 Introduction

Classical neural networks suffer from size and temporal superposition effects, generally called the stability-plasticity dilemma in the Artificial Neural Network (ANN) literature [1,2,3]. Neuroscience studies describe these effects as the binding problem and the superposition catastrophe [4]. Some researchers take this a step further and state that electrical oscillations in biological sensory systems trigger cells in a sequence, effectively serializing certain recognition activity in time and adding a new Degree of Freedom (DoF) to the biological recognition engine [5]. As the number of neural cells in a network does not increase over time, the initial neural network has to be large enough to accommodate the knowledge likely to be acquired over time. Each learning experience has to bind to a subset of the existing neural infrastructure. However, as the knowledge base grows, newer knowledge has to superimpose on the existing infrastructure, possibly fragmenting the existing subsets. Old knowledge is replaced or distorted by new knowledge, effectively destabilizing the established subsets. This may not always be damaging, but it does distort or refine old subsets.

Kohonen networks, called Self Organizing Maps (SOM), start with a predefined set of nodes initialized to random weights [6]. As an input is applied, some of the nodes produce large outputs and one among them is chosen as the winner. The key to Kohonen's networks is that the neighbors of the winning node adjust their weights towards those of the winner. Over a period of time, repetition of this process creates a neighborhood of nodes that recognize similar inputs, and each such neighborhood represents one class of input. As the neighborhood is not constrained to be convex, it should be possible to support nonlinear classification. If the number of input classes exceeds a certain number in relation to the total number of nodes in the network, the neighborhoods have to split and merge to accommodate new classes. Therefore, SOMs are subject to the superposition catastrophe. Notice that they do not suffer from the binding problem, as data classes are associated with sets of nodes.

On the other hand, neural networks based on Adaptive Resonance Theory (ART) [1] start with very few or no cells at all and add cells to the network when an input set cannot be recognized (classified). In these networks, a cell generates the winning output when the input matches its stored value; other cells produce smaller outputs. Both the input range covered by a cell and the number of cells for a given range of values can be optimized dynamically. Therefore, ART networks are minimal in terms of the number of nodes required to build the network. Extensions of ART to real-valued inputs and outputs, called ART2, scale and translate the input before storing it in short-term memory [1, 7]. However, the complexity of ART2 networks sacrifices the simplicity and elegance of ART1 to the extent that these networks lose their relation to biological equivalents: a biological neuron would be simple, repetitive and connected.

In the following sections, a new type of neural network called the Auto Resonance Network (ARN) is described. An ARN can classify real-valued multi-dimensional input and has an adjustable acceptance threshold (ρ) for each node in the network. These networks can be used as generic data classifiers by adding a node labeling method or a neuronal layer, and they find applications in various areas of artificial intelligence.

2 Auto Resonance Network (ARN)

An Auto Resonance Network has a single layer of neuronal nodes, each containing a pattern to match a particular input set. Each node stores a specific pattern that is different from those of the other nodes. At the basic level, the behaviour of the network is as follows. Input is applied as a set, one set at a time. The output layer consists of a single layer of nodes, all of which are connected to all inputs. Each node is tuned to recognize a specific pattern of the input vector. The internal memory of a node may hold an exact, approximate or transformed version of the input it matches. When a new input is applied, (i) one of the nodes is at resonance, (ii) some nodes are near resonance, or (iii) none of the nodes is in resonance. In the first case that node is the winner. In the second case, the node with the highest output is selected as the winner if its output is above a selection threshold. In all other cases there is no recognition, and a new node may be created and tuned to match the input, as sketched below. The success of this network depends on finding a suitable resonance function that offers good tunability and variable coverage.
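The following Python sketch (not the authors' code; the node resonance function and threshold ρ are placeholders defined by the equations in the rest of this section) summarizes this recognize-or-append behaviour:

```python
def classify_or_grow(nodes, x, rho, make_node):
    """Return the winning node for input x, appending a new pre-tuned node if none resonates."""
    if nodes:
        best = max(nodes, key=lambda node: node.output(x))
        if best.output(x) >= rho:      # cases (i) and (ii): resonance or near-resonance
            return best
    new_node = make_node(x)            # case (iii): no recognition, append a node tuned to x
    nodes.append(new_node)
    return new_node
```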

When a new node is appended to the ARN, it is pre-tuned to resonate with the current input. We illustrate the concept with a single-input network. Resonance of a node can be described using the simple equation

$$y_p = x(1 - x)$$
(1)

where \(x\) is the input, a real number assumed to be normalized to the range {0…1}. Equation (1) yields a maximum value of \(1/4\) when the input \(x=1/2\), i.e., the node resonates if the input is \(1/2\). Therefore, we can use

$$y=4(1-x)x$$
(2)

to normalize the result to 1. In order to set the resonance at any value of \({x}_{r}\in \left\{0\ldots 1\right\}\), we can scale the input such that

$$w x_r = \frac{1}{2} \quad\text{or}\quad w = \frac{1}{2 x_r}$$
(3)

and calculate the output of the node as

$$y = 4\left(1 - wx\right) wx$$
(4)

The resonant weight \(w\) is computed when the node is inserted into the network and remains largely unaltered, serving as a memory impression of the input \(x_r\) present at the time the node was created.
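As a minimal illustration of Eqs. (1)–(4) (function names are ours, not from the paper), the following sketch computes the resonant weight for a chosen \(x_r\) and evaluates the node output, which peaks at 1 exactly when \(x = x_r\):

```python
def resonant_weight(x_r: float) -> float:
    """Eq. (3): weight that places the resonance peak at x_r."""
    return 1.0 / (2.0 * x_r)

def node_output(w: float, x: float) -> float:
    """Eq. (4): normalized resonance output, equal to 1.0 at x = 1/(2w)."""
    return 4.0 * (1.0 - w * x) * (w * x)

w = resonant_weight(0.4)
print(node_output(w, 0.4))   # 1.0 at resonance
print(node_output(w, 0.5))   # 0.9375, falls off away from x_r
```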

The chart in Fig. 1 shows the behavior of nodes tuned to various points of resonance identified by \(x_r\). The overall structure of the ARN is shown in Fig. 2. Fig. 2a shows the scaling of a single input by the resonant weight and the computation of the resonator output; this is the basic module of the ARN. A node with N inputs is shown in Fig. 2b. The input to a node consists of a vector

Fig. 1: Resonance curves for various \(x_r\)

Fig. 2: Auto Resonance Network structure

$$X = \{x_1, x_2, \dots, x_N\}, \quad x_i \in \{0 \dots 1\}, \quad i = 1 \dots N$$
(5)

For each input, the output of the corresponding resonator module given by (2) is computed; these outputs are summed and normalized to form the output of the node. A layer of ARN has several such nodes, as shown in Fig. 2c. Assuming that there are K nodes, each one is tuned to a different input vector \({X}_{k}{|}_{t= {t}_{k}}\), where \({t}_{k}\) is the time at which the k-th node was created. We can extend (2) to describe the output of the k-th node as

$$y_k = \frac{4}{N}\sum_{i}\left(1 - w_{ki} x_i\right)\left(w_{ki} x_i\right), \quad i = 1 \dots N, \quad k = 1 \dots K$$
(6)

where \(w_{ki}\) is the scaling factor for the i-th input of the k-th node. The resonant weight \(w_{ki}\) represents the in-the-node impression of the input \(x_{ki}\) stored as memory in the node. Note that \(x_i\) is the i-th element of the input vector, while \(x_{ki}\) represents the value of \(x_i\) at which the \(k\)-th node resonates. The \(k\)-th node produces the maximum value of 1 when \(x_i = \frac{1}{2 w_{ki}}, i = 1 \dots N\). In Fig. 2, the resonator function is indicated as \(d(1-d)\), where \(d = wx\).

To summarize, each resonator corresponds to one input of one node. Each node has N inputs and the same number of resonators. The output of a node is maximal when all of its resonators produce maximum output. Therefore, the resonant weight vector of a node is \(W = 1/(2 X_r)\), computed element-wise. The resonant weights are calculated only once, when the node is added to the network. Only one node maximally matches an exact combination of inputs. For these reasons, we call this network an Auto Resonance Network (ARN).
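A compact sketch of a K-node ARN layer over N-dimensional inputs, following Eq. (6); the class and attribute names are illustrative rather than taken from the paper:

```python
import numpy as np

class ARNNode:
    """One ARN node: resonant weights are fixed when the node is created."""
    def __init__(self, x_r):
        self.x_r = np.asarray(x_r, dtype=float)   # remembered resonance point X_r
        self.w = 1.0 / (2.0 * self.x_r)           # W = 1/(2 X_r), element-wise (Eq. 3)

    def output(self, x) -> float:
        d = self.w * np.asarray(x, dtype=float)   # d = w_ki * x_i for each resonator
        return 4.0 / d.size * float(np.sum((1.0 - d) * d))   # Eq. (6)

# Two nodes tuned to different 2-D inputs.
nodes = [ARNNode([0.2, 0.7]), ARNNode([0.6, 0.3])]
x = [0.21, 0.69]
print([node.output(x) for node in nodes])   # first node is near resonance, second is not
```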

2.1 Envelope functions

It can be seen from Fig. 1 that the output of a resonating node decreases as a continuous function of the input on either side of the resonance value. A node whose output is above a threshold value is a winner. By reducing or increasing the threshold, the range of input values to which a node responds can be adjusted: if the threshold is reduced, the range increases, and vice versa. The set of all inputs for which the output of a node is above the threshold is called the coverage of the node.

The coverage of the k-th node can be expressed as

$$C_k = \{X \mid (y_k > \rho) \;\text{and}\; (y_k > y_i, \; \forall i \ne k)\}$$
(7)

where ρ is the threshold value.
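As a one-function sketch of the winner test implied by Eq. (7) (reusing the illustrative ARNNode class above), the index of the covering node is returned only when its output exceeds ρ:

```python
def winner(nodes, x, rho: float):
    """Eq. (7): index of the covering node for input x, or None if no output exceeds rho."""
    outputs = [node.output(x) for node in nodes]
    k = max(range(len(outputs)), key=outputs.__getitem__)
    return k if outputs[k] > rho else None
```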

We may further note from Fig. 1 that the coverage of each node is not the same for a given threshold. For example, the peak for \(x_r = 0.1\) is significantly sharper than the one at \(x_r = 0.2\). We could set separate thresholds for individual nodes such that all nodes have similar coverage. However, a better way to correct this situation is to use a non-linear scaling of the input. We call these functions envelope functions.

Envelope functions provide several advantages. For example, they can transform an unbounded input \(x \in \mathbb{R}\) into a bounded region such as {0…1}. If the envelope function modifies the input as \(x_s = g(x)\), then the resonant weights should also be scaled with the same function:

$$w_{ki} = \frac{1}{2\, g(x_{ki})}$$
(8)
$$y_k = \frac{4}{N}\sum_{i} \left(1 - w_{ki}\, g(x_i)\right) w_{ki}\, g(x_i)$$
(9)

Envelope functions stretch or compress a specific part of the input range in order to emphasize an area of interest. The effect of some of these functions is shown in Fig. 3.

Fig. 3: Effect of envelope functions on classification

The simple scaled log function shown in Fig. 3a uses

$$g(x) = \sigma \ln(x)$$
(10)

where σ is a scaling factor. By adjusting σ we can effectively compensate for the non-linear coverage of (4). Figure 3a shows the effect of (10) on coverage with \(\sigma = 1\). The modified sigmoid function shown in Fig. 3b exhibits a controllable, nearly linear coverage. It uses the envelope function

$$g(x) = \frac{1}{1 + e^{-\sigma(x - 1)}}$$
(11)

A value of σ = 4 is used for illustration. It is clear that envelope functions can reduce the non-uniform coverage across the input range.
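A minimal sketch of the envelope mechanism of Eqs. (8)–(11) (function names are ours): the same g() must be applied both to the input and to the stored resonance point before evaluating the resonator.

```python
import math

def g_log(x: float, sigma: float = 1.0) -> float:
    return sigma * math.log(x)                           # Eq. (10): scaled log

def g_sigmoid(x: float, sigma: float = 4.0) -> float:
    return 1.0 / (1.0 + math.exp(-sigma * (x - 1.0)))    # Eq. (11): modified sigmoid

def enveloped_output(x: float, x_r: float, g) -> float:
    """Single-input node output with envelope g applied to input and memory (Eqs. 8-9)."""
    w = 1.0 / (2.0 * g(x_r))      # Eq. (8)
    d = w * g(x)
    return 4.0 * (1.0 - d) * d    # Eq. (9) with N = 1

print(enveloped_output(0.3, 0.3, g_sigmoid))   # 1.0 at resonance (x equals the stored x_r)
```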

2.2 Extending the input range

Though accepting input only in the limited range {0…1} need not be a limitation, it would be convenient to have other functions that provide a larger input range yet maintain the resonance property. Interestingly, there are many other monotonic functions that can implement such a resonance and build an ARN. A generic approach is to define an additive inverse of the function over a range and multiply the two to obtain a resonance function. One such simple function is the difference function \((M_{ki} - x_i)\), which gives

$$y_k = 1 + \sum_{i}\left(M_{ki} - x_i\right)\left(x_i - M_{ki}\right)$$
(12)

where \(M\) is the memory copy of the tuned input.
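A short sketch of the difference-based resonance of Eq. (12): the output is 1 minus the squared distance to the stored memory, so it peaks at 1 when the input equals \(M\).

```python
def diff_node_output(x, M):
    """Eq. (12): resonance built from the difference function and its additive inverse."""
    return 1.0 + sum((Mi - xi) * (xi - Mi) for xi, Mi in zip(x, M))

print(diff_node_output([0.3, 0.4], [0.3, 0.4]))   # 1.0 at resonance
print(diff_node_output([0.5, 0.4], [0.3, 0.4]))   # < 1, decays with squared distance from M
```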

Another good candidate is the Scaled and Shifted Sigmoid Function (3SF):

$$y_s = \frac{1}{1 + e^{-\sigma(x - M)}}$$
(13)

This function maps a real number in the range \(-\infty\) to \(\infty\) to a monotonically increasing value in the range 0 to 1. Interestingly, the function has a value of \(0.5\) at \(x = M\). Therefore, we can replace \(w_{ki} x_i\) in (4) with this function. It may be noted that \(1 - y_s\) can be easily computed as

$$y_{s-} = \frac{1}{1 + e^{\sigma(x - M)}}$$
(14)

Therefore, for N = 1, we can rewrite (2) as

$$y = 4\left[\frac{1}{1 + e^{-\sigma(x - M)}}\right]\left[\frac{1}{1 + e^{\sigma(x - M)}}\right]$$
(15)

The sigmoid nature of this curve fits well with physical neuron activation, which saturates as the input increases rather than growing without bound. Equation (15) allows a node to be set to resonate at any \(x_i \in \mathbb{R}\). Note that \(M\) can be used to select the point of resonance while \(\sigma\) can be used to control the tuning, and hence the coverage of the node, as shown in Fig. 4a and b respectively. Equation (15) provides a generalized function for implementing ARN nodes, albeit with increased complexity. The tradeoff between complexity and flexibility can tilt towards (4) or (15) depending on the end use.
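A sketch of the 3SF resonance of Eq. (15) (function names are ours): \(M\) places the point of resonance anywhere on the real line and \(\sigma\) controls the sharpness.

```python
import math

def arn_3sf(x: float, M: float, sigma: float = 1.0) -> float:
    """Eq. (15): product of the 3SF (Eq. 13) and its complement (Eq. 14)."""
    y_s  = 1.0 / (1.0 + math.exp(-sigma * (x - M)))   # Eq. (13)
    y_sn = 1.0 / (1.0 + math.exp( sigma * (x - M)))   # Eq. (14), equals 1 - y_s
    return 4.0 * y_s * y_sn                           # peaks at 1 when x == M

print(arn_3sf(2.5, M=2.5))               # 1.0 at resonance
print(arn_3sf(3.0, M=2.5, sigma=4.0))    # ~0.42: a sharper node falls off faster
```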

Fig. 4: Tunability of ARN using (15)

2.3 Tuning the nodes and coverage

The coverage of a node acts like a noise margin by providing near-maximal output when the input is close to the resonating value. Therefore, each node can recognize noisy input as long as it falls within the coverage area. The coverage of a node can vary dynamically. A node may slowly shift to a different tuning point depending on the statistical properties of the incoming data, but it stays close to the original tuned location.

Controlling the quality of resonance as shown in Fig. 4b can be used to sharpen the performance of ARN nodes. For example, if a node receives exactly the same input repeatedly, we can increase its sharpness by increasing σ. On the other hand, if the input varies around the resonance value but stays within the threshold, we can reduce σ to increase the coverage. A simple relation that can be used to tune the resonance is given by

$$\sigma_{n+1} = \sigma_n + \eta\, \kappa f/(1 - \nu)$$
(16)

where η is the learning rate, \(\kappa\) is a proportionality constant, \(f\) is the statistical frequency (the number of times the node matched the input) and \(\nu\) is related to the signal variance. This equation therefore provides a basis for reinforcement learning on ARN nodes.
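A minimal sketch of the update in Eq. (16), with η, κ, f and ν as described above; the default values are illustrative only.

```python
def update_sigma(sigma: float, f: int, nu: float,
                 eta: float = 0.1, kappa: float = 1.0) -> float:
    """Eq. (16): adjust the tuning factor from the match frequency f and the
    variance-related term nu (0 <= nu < 1)."""
    return sigma + eta * kappa * f / (1.0 - nu)
```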

Stable nodes can undergo further tuning to increase or decrease the area they cover. This can be achieved by varying the selection threshold or the σ value associated with the node, which requires that the nodes compute statistical moments as they are accessed. For a node described by (15), knowing that \(y_{max} = 1\) and assuming \(x_r = 0\), we can write the value of \(x\) at the threshold \(y = \rho\) as

$$\rho = \frac{4}{N}\left[\frac{1}{\left(1 + e^{-\sigma x}\right)\left(1 + e^{\sigma x}\right)}\right]$$
(17)

which gives an expression for coverage of an ARN node as a function of threshold and tuning factor. Equation (17) can be rewritten as

$$x = \frac{1}{\sigma}\cosh^{-1}\left(\frac{2}{N \rho} - 1\right)$$
(18)

which gives the coverage of an ARN node around its peak for various values of the threshold and the scale factor.
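A sketch of Eq. (18) (function name ours): the half-width of the coverage of a 3SF node around its resonance point for a given threshold ρ, tuning factor σ and input dimension N; the expression requires \(N\rho \le 1\) so that the argument of \(\cosh^{-1}\) is at least 1.

```python
import math

def coverage_halfwidth(rho: float, sigma: float, N: int = 1) -> float:
    """Eq. (18): |x - x_r| at which the output of Eq. (15) drops to the threshold rho."""
    return math.acosh(2.0 / (N * rho) - 1.0) / sigma

print(coverage_halfwidth(rho=0.7, sigma=1.0))   # wider coverage at a lower threshold
print(coverage_halfwidth(rho=0.9, sigma=1.0))   # narrower coverage at a higher threshold
```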

2.4 Types of ARN nodes

Typically, ARN nodes are created when an input does not produce resonance in existing nodes and the expected value of output is known. Such nodes have a well defined point of resonance, output mapping and adjustable coverage. We will call these as Type-1 nodes. Additional nodes can be created in absence of input by interpolating properties derived from Type-1 nodes. The output and associated data can be estimated as a perturbation of values of Type-1 nodes or interpolated using piecewise linear or other suitable approximation. We will call them as Type-2 nodes.
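The sketch below (our illustration; the paper does not prescribe a specific scheme) derives a Type-2 node by piecewise-linear interpolation between two Type-1 nodes, blending the resonance point and the associated output value without requiring a new training input.

```python
def make_type2(xr_a, out_a, xr_b, out_b, blend: float = 0.5):
    """Interpolate a Type-2 resonance point and output from two Type-1 nodes."""
    xr_new = [(1.0 - blend) * a + blend * b for a, b in zip(xr_a, xr_b)]
    out_new = (1.0 - blend) * out_a + blend * out_b   # piecewise-linear estimate
    return xr_new, out_new

print(make_type2([0.2, 0.7], 1.0, [0.6, 0.3], 3.0))   # midpoint of the two parent nodes
```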

3 Results

ARN can perform real-world input classification. Figures 5, 6 and 7 show pattern classification using the scaled shifted sigmoid envelope function for different thresholds. Each bubble indicates an input and each color represents a node. If a node resonates when an input is applied, the input is drawn in the color of the resonating node at the location specified by the input. If no node resonates, a new node is created and appended to the existing network; in other words, a new node is added when the maximum output of all nodes in the recognition layer is below the threshold. The new node is assigned a color from a fixed list, so the color of a node depends on the order in which inputs arrive during training and may vary from run to run. The group of inputs that drive the same node towards resonance (shown in the same color) represents the coverage of that node. Increasing the threshold from 0.7 to 0.9 increases the number of output nodes.

Fig. 5: 1D pattern classification using the scaled shifted sigmoid envelope function for different thresholds

Fig. 6: 2D pattern classification for different thresholds

Fig. 7: 3D pattern classification for different thresholds

4 Discussion

Existing neural models may not always work in newer application domains; application of ANNs to specific functionality may therefore require the development of specific neural structures. For example, Convolutional Neural Networks (CNN) are used in image recognition [8] while Long Short-Term Memory (LSTM) networks are used for time series prediction [9]. More recently, spiking neural networks are being explored for robotics and several other areas of Artificial General Intelligence (AGI). In this paper, we described a novel neural network structure, based on synergistic control of musculo-skeletal systems, called the Auto Resonance Network (ARN). ARN can classify real-valued multi-dimensional input and has an adjustable acceptance threshold (ρ) for each node in the network. Each layer of an ARN has a specific goal, and searches are always local to a layer. A hierarchical ARN is a feed-forward network of ARN layers and other cellular automata. Every node in an ARN resonates within a small controllable volume of the input space, called the coverage of the node. Resonance allows approximation around the locus of resonance, giving the network the ability to respond to input different from the training set. This tunable approximation is critical to the working of the ARN architecture. To a certain extent, ARN is similar to a Radial Basis Function (RBF) network, but with a different control algorithm. The size of the network grows with the input and therefore overcomes the binding problem and the stability-plasticity dilemma. These networks can be used as generic data classifiers by adding a node labeling method or a neuronal layer. We illustrate their application to robotic motion.

An original heuristic modelling algorithm, expressed in terms of homogeneous combinations of classical system dynamics and the Bayesian degree of truth employed in modeling, is reported in [10]. Multi-relational data can reside in relational databases, where they consist of multiple relations linked together by entity-relationship links. The class label can be predicted by correlating the information of related data, and labels can then be propagated to create the paths [11].

Deep learning methods for different applications are reported in [12]. Several of the existing types of neural networks discussed in [13, 14] can be used for path planning of robotic motion. A hierarchical network for path planning built using ARN has been reported in [15, 16]. ARNs can also be used as generic data classifiers and find applications in various areas of artificial intelligence; a modified version of the algorithm has been reported for image classification [17].

5 Conclusions

The sparseness of ARN and the simplicity of the resonance equations make ARN suitable for implementation in embedded systems. An important advantage of ARN is the ease of control over coverage and the sparse node assignment. The order in which input is applied has a strong effect on how the ARN is created, but the nodes responsible for a specific output can always be identified. The network can be refined by successively applying new data that covers a labeled data class to increase the accuracy of classification. Therefore, it is easy to see how the network has interpreted the data. A single-layer ARN classifies convex data sets and therefore requires labeling at higher levels of the hierarchy in a typical deep learning neural network; ARN can be used at the lower levels of such structures to provide data classification. Efforts are underway to use this structure in various areas of current research.