1 Introduction

In recent years detailed partial connectomes of insect neural networks have been produced. An example is the elementary motion detection (EMD) network of Drosophila’s visual system [2,3,4]. The insect has two large eyes composed of repeating hexagonal columns called ommatidia. Each ommatidium has an identical structure which processes visual information from a small region of the full visual field. The visual system is composed of four distinct layers of neuropil: the retina, lamina, medulla and lobula, as shown in Fig. 1(a). The neural pathway for detecting elementary motion within this structure is depicted schematically in Fig. 1(b). It begins at the retina where cells transduce light into electrical signals. These cells then synapse onto L1 and L2 cells in the lamina. At this stage the information is rectified into ON and OFF pathways: ON pathways carry information on luminance increments and OFF pathways on luminance decrements. The L1 and L2 cells in the lamina synapse predominantly onto the Mi1 and Tm3 cells and the Tm1 and Tm2 cells in the medulla, respectively. These cells are implicated in implementing temporal delays between activity in adjacent spatial regions of the visual field [5]. The pairs of cells in the medulla synapse onto T4 and T5 cells in the lobula, which are then excited if a spatiotemporal correlation exists between their presynaptic cells, indicating motion. Subsets of T4 and T5 cells are sensitive to motion in one of the four cardinal directions (up, down, left and right) and terminate independently in one of four distinct layers in the lobula plate (LP). Groups of lobula plate tangential cells (LPTC) are excited by activity in one layer of the LP and fire to indicate motion in a specific direction. Another group, the lobula plate/lobula columnar type two (LPLC2) cells, is excited only by diverging motion in each layer of the LP such that these cells indicate a looming stimulus on a collision course [6].
It has also been found that the neuromodulator octopamine tunes the dynamics of the EMDs in the Drosophila visual system depending on whether the insect is resting or flying [7,8,9]. This allows the insect to adapt its sensitivity to different stimulus velocities as well as to reduce power consumption whilst in a resting state. In Fig. 1(c) the response of an LPTC cell is reported for Drosophila stimulated with a moving grating in its resting and flying states. The area under the curve for the insect in its resting state is greatly reduced relative to that of its flying state, which is thought to be an evolutionary adaptation to optimize its consumption of energy.

Fig. 1.

The architectural layout of the Drosophila visual system and its detection of motion. (a) The four layers of neuropil in the Drosophila visual system [3]. (b) The specific cells identified in the pathway from the retina to the lobula plate involved in elementary motion detection [5]. (c) Temporal frequency sensitivity tuning curve of the mean response of an LPTC neuron in Drosophila in resting and flying states as modulated by octopamine [7]. (d) A schematic of the HR-EMD model often implemented in hardware motion detectors [14].

The Hassenstein-Reichardt EMD (HR-EMD) is a popular model which reproduces experimental observations of Drosophila’s EMDs [10]. Photoexcitation at adjacent regions in the visual field propagates signals through crossing low pass filters before they are recombined at a multiplication unit to detect spatiotemporal correlations. The output of the unit can be either positive or negative, indicating motion in one of two directions as in Fig. 1(d). A number of previous works have implemented the HR-EMD in analogue very large scale integrated (aVLSI) systems for point [11], 1D [12], 2D [13] and rotational motion [14] detection but the platform is yet to be adopted for applications. Another perspective on hardware based motion detection is offered by token and feature based EMDs [15,16,17,18]. Here we propose an alternative approach which stays true to the guiding principle of detecting spatiotemporal correlations but takes advantage of the ability to integrate resistive memory with CMOS technology and of recent insights into the structural organization of Drosophila’s elementary motion detection neural network. We use a dynamic vision sensor (DVS) to provide input for a spiking neural network (SNN) composed of CMOS leaky integrate-and-fire (LIF) neurons interconnected by resistive random access memory (RRAM) synapses to perform direction detection and detect collisions. First we present the network topology, followed by relevant electrical measurements of RRAM matrices, and finally we demonstrate through simulation the ability of the network to compute the direction of motion of a grating.
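As an illustration of the HR-EMD principle described above, the following minimal Python sketch passes two adjacent inputs through crossing first-order low pass filters and combines them in an opponent multiplication stage. The discrete filter form, the smoothing factor `alpha`, the helper `pulse` and the sign convention (positive for left-to-right motion) are assumptions made for this example and are not taken from the cited implementations.

```python
def low_pass(signal, alpha=0.5):
    """First-order discrete low pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def hr_emd(left, right, alpha=0.5):
    """Summed opponent output of a two-input HR-EMD.

    Each input is delayed by a low pass filter and multiplied with the
    undelayed neighbouring input; the two products are subtracted.
    Positive output means left-to-right motion (assumed convention).
    """
    lp_left, lp_right = low_pass(left, alpha), low_pass(right, alpha)
    return sum(dl * r - dr * l
               for dl, r, dr, l in zip(lp_left, right, lp_right, left))


def pulse(n, t):
    """Hypothetical helper: a unit pulse at step t in a signal of length n."""
    return [1.0 if i == t else 0.0 for i in range(n)]


# A stimulus reaching the left input first gives a positive output,
# the reverse order gives a negative output of equal magnitude.
rightward = hr_emd(pulse(12, 5), pulse(12, 8))
leftward = hr_emd(pulse(12, 8), pulse(12, 5))
```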

2 The Network

This section introduces the sub-units of the full system individually before outlining how they fit together.

2.1 Delay and Correlate Network

To detect motion in one direction along a single dimension, a temporal delay can be implemented between two spatially adjacent inputs, followed by a downstream mechanism for detecting spatiotemporal correlations. In Drosophila this delay is thought to be implemented between pairs of neurons in the medulla [5] and the correlation performed by the postsynaptic neurons in the lobula. A delay can be implemented in a LIF CMOS neuron by controlling the input time constant of the neuron, and a correlation operation can be performed by parameterizing a neuron to fire only when two presynaptic spikes arrive within a short time window. These features are captured by the one dimensional delay and correlate spiking neural network shown in Fig. 2(a). Neurons in the input layer fire to denote spatial activity and synapse with excitatory connections onto a second delay layer. In this layer the input time constants of the B1 and B3 neurons are larger than that of the central B2 neuron. B1 and B2 synapse with excitatory connections onto the correlator neuron C1 in the output layer, and B3 synapses with an inhibitory connection. As depicted in Fig. 2(b), if the input layer is excited in the sequence A1, A2 then A3, along its preferred direction, the firing times of B1 and B2 should be similar and the firing time of B3 will be delayed. This allows the two excitatory spikes from B1 and B2 to arrive at C1 within a short time window such that it fires. For motion against the preferred direction, with excitation in the sequence A3, A2 then A1, the firing times of B3 and B2 will be similar and that of B1 will be delayed. Consequently C1 will not be sufficiently excited, given the combination of inhibition and temporal separation of the excitatory input. In a third case, where all input elements are excited simultaneously, B1 and B3 will fire at the same time, negating each other's contribution, such that C1 does not receive sufficient excitation from B2 alone to fire.
For detection of motion in two dimensions four identical 1D delay and correlate SNNs, sharing the same central low input time constant node, can be arranged orthogonally. If the input neurons are spatially arranged in a cross then spatiotemporal correlations can be detected corresponding to UP, DOWN, LEFT and RIGHT motion within a small region of the visual field. A 2D delay and correlate SNN and the spatial organization of its input is shown in Fig. 3. To extract information from an entire visual field it then follows that an array of 2D delay and correlate SNNs can be connected to an input such that they receive spatially corresponding excitation from across the visual field.
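The three cases described for the one dimensional delay and correlate network can be reproduced with a minimal event-based sketch. It is a simplification under stated assumptions: the delay neurons B1 to B3 are reduced to fixed delays (`d_slow` for B1 and B3, `d_fast` for B2), C1 is a leaky integrator with time constant `tau_c`, and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def correlator_fires(input_times, d_slow=10.0, d_fast=1.0,
                     tau_c=2.0, w=1.0, threshold=1.5):
    """Return True if C1 fires for the given spike times (ms) of A1, A2, A3.

    B1 and B3 are modelled as slow (long delay), B2 as fast (short delay).
    B1 and B2 excite C1 with weight +w, B3 inhibits it with weight -w.
    All parameter values are illustrative assumptions.
    """
    t_a1, t_a2, t_a3 = input_times
    events = sorted([
        (t_a1 + d_slow, +w),   # B1 -> C1, excitatory
        (t_a2 + d_fast, +w),   # B2 -> C1, excitatory
        (t_a3 + d_slow, -w),   # B3 -> C1, inhibitory
    ])
    v, t_prev = 0.0, events[0][0]
    for t, weight in events:
        # Exponential leak since the previous event, then add the new input.
        v = v * math.exp(-(t - t_prev) / tau_c) + weight
        t_prev = t
        if v >= threshold:
            return True
    return False


preferred = correlator_fires((0.0, 10.0, 20.0))    # A1, A2 then A3
opposite = correlator_fires((20.0, 10.0, 0.0))     # A3, A2 then A1
simultaneous = correlator_fires((0.0, 0.0, 0.0))   # all inputs at once
```

With these values only the preferred sequence brings the two excitatory spikes close enough together for C1 to cross its threshold.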

Fig. 2.

The one dimensional delay and correlate spiking neural network and its functional basis. (a) A 1D delay and correlate SNN. Open blue circles represent LIF CMOS neurons, green arrows excitatory synapses and red dashed arrows inhibitory synapses. Vertical lines separate the three layers of the network into the abstracted functions as performed by the different layers of neuropil in the elementary motion detection system. (b) A time domain plot for the three neurons involved in detection of motion, with B1 and B2 being excited in sequence and their resulting signals meeting at C1. Red spikes correspond to the activity of a presynaptic neuron, blue spikes to the activity of the neuron associated with the plot and the green trace is the neuron input voltage resulting from the incoming presynaptic spikes. (Color figure online)

Fig. 3.

The two dimensional equivalent of the delay and correlate network. Blue filled circles represent LIF CMOS neurons where the fill colour indicates the pathway involved in detecting the directions UP (green), DOWN (purple), LEFT (yellow), and RIGHT (blue) while the central shared low time constant pathway is coloured red. Green arrows represent excitatory synapses and red dashed arrows inhibitory ones. The spatial organization (left) of the inputs to a 2D delay and correlate SNN (right) is shown. (Color figure online)

Fig. 4.

The connection pattern required between the 2D delay and correlate SNNs within the four quadrants of the visual field and the five readout network neurons for motion detection and detecting collisions. The coloured blue circles correspond to the output layer neurons from Fig. 3 with their spatial organization indicating which direction they detect. From all four quadrants excitatory (green) connections project to the corresponding readout neuron per direction. Diverging directions in the four quadrants provide excitation (green) to the LPLC2 neuron on the left hand side of the figure while directions converging towards the centre make inhibitory (red dashed) connections. Therefore, four readout neurons denote motion and one is excited during an expanding edge denoting impending collision. (Color figure online)

2.2 Readout Network

Inspired by the connectivity pattern observed in Drosophila, which allows for motion and collision detection, we connect the 2D delay and correlate SNN array to a readout network of five neurons in a similar manner. The connection pattern between the 2D delay and correlate SNNs across the visual field and the readout network is depicted in Fig. 4. To read out the direction of motion, four neurons modelling the LPTCs, corresponding to UP, DOWN, LEFT and RIGHT motion, are excited by the corresponding directional output of each of the 2D delay and correlate SNNs across the visual field. Additionally, one further neuron, modelling the LPLC2 neuron, is included. This neuron splits the visual field into four quadrants. From each quadrant excitatory connections are made from the 2D delay and correlate SNN directions which diverge from the centre of the visual field and inhibitory connections from directions corresponding to movement converging towards the centre. A looming stimulus on a collision course is defined as an expanding dark edge from the centre to the edges of the visual field and therefore this neuron will fire to denote looming.
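The readout connectivity described above can be sketched as a simple weight rule. The sketch below assumes a coordinate system with y increasing upwards, unit weights, and a detector position given as `(x, y)`; the function and variable names are hypothetical and only illustrate the sign of each connection, not the conductance values used in the network.

```python
# Unit vectors of the four detected directions (y assumed to increase upwards).
DIRECTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def lplc2_weight(x, y, direction, centre):
    """+1 if the direction diverges from the centre in this quadrant, else -1."""
    dx, dy = DIRECTIONS[direction]
    # Sign vector pointing from the centre towards the detector's quadrant.
    qx = 1 if x > centre[0] else -1
    qy = 1 if y > centre[1] else -1
    return 1 if dx * qx + dy * qy > 0 else -1


def readout_drive(detector_outputs, centre):
    """Net drive to the five readout neurons.

    detector_outputs: list of (x, y, direction), one entry per directional
    output of a 2D delay and correlate SNN that fired.
    """
    drive = {name: 0 for name in DIRECTIONS}
    drive["LPLC2"] = 0
    for x, y, direction in detector_outputs:
        drive[direction] += 1                                   # LPTC-like neurons
        drive["LPLC2"] += lplc2_weight(x, y, direction, centre)  # looming neuron
    return drive


centre = (4.5, 4.5)
# An expanding edge: every quadrant reports the directions pointing outwards.
expanding = [(7, 7, "UP"), (7, 7, "RIGHT"), (2, 7, "UP"), (2, 7, "LEFT"),
             (2, 2, "DOWN"), (2, 2, "LEFT"), (7, 2, "DOWN"), (7, 2, "RIGHT")]
# Uniform upward translation: excitation and inhibition of LPLC2 cancel.
translating = [(7, 7, "UP"), (2, 7, "UP"), (2, 2, "UP"), (7, 2, "UP")]
```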

2.3 System Architecture

Activity in the visual field excites a DVS camera [19, 20] which propagates spikes presynaptically to spatially corresponding 2D delay and correlate SNNs using the address event representation (AER) protocol [21]. Two dimensional spatiotemporal correlations of the input data are detected by the 2D delay and correlate SNNs before their outputs are integrated together in a readout layer which computes the direction of motion and detects impending collisions.

3 Relevant Properties of RRAM Synapses

In oxide based resistive random access memories (RRAM) the storage element consists of a transition metal oxide (TMO) material, sandwiched between metal electrodes, which can be switched between two distinct stable conductance values. The low resistance state (LRS) or the high resistance state (HRS) can be achieved through SET or RESET operations respectively by applying electric fields of opposite polarities between the electrodes. RRAM devices can be integrated in matrices [22] whereby each device in the matrix is connected in series with the drain of a transistor (1T1R). This allows each memory device to be read from and written to individually. Each 1T1R structure in the matrix is addressed using the bit line (BL) and source line (SL), which connect to the top electrode of the device and the source of the transistor respectively, and a word line (WL) which connects to the gate of the transistor. The WL voltage therefore regulates the transistor drain-source current, in this case termed the programming current (ICC), which determines the device conductance, and the variability in this value, after a SET operation. The method of connecting LIF CMOS neurons together through an RRAM matrix is depicted in Fig. 5. Connections between neurons are defined by a device in the LRS whereas non-connections are defined by a device in the HRS, and specific devices can be programmed individually. In this manner specific feed-forward network topologies can be realised. The data plotted in Fig. 6 corresponds to measurements from a 4kbit matrix of HfO2 based TMO RRAM devices with titanium nitride (TiN) electrodes integrated in a 130 nm CMOS technology node [23]. In Fig. 6(a) the cumulative distribution function (CDF) of the conductance values per RRAM device state in the matrix is plotted, whereby the LRS and HRS conductance value of each device is shown. Additionally, in Fig. 6(b) the effect of varying the gate voltage of the transistor, and therefore the programming current, on the mean value of the LRS conductance CDF is shown. Crucially, the ability to modulate RRAM device conductance allows emulation of neural plasticity [24], an essential property of neural networks for learning, adapting and maintaining homeostasis.
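The mapping from a feed-forward topology onto an RRAM matrix can be sketched as follows. A binary mask marks which devices are programmed to the LRS (connection) and which remain in the HRS (non-connection). The HRS conductance `G_HRS`, the Gaussian form of the device-to-device variability and the default LRS value are assumptions for this example; only the LRS range of \(5\times 10^{-5}\) S to \(3\times 10^{-4}\) S and the 10% variability are taken from the text.

```python
import random

G_HRS = 1e-6  # siemens; hypothetical HRS conductance, not a measured value


def program_matrix(mask, g_lrs=1e-4, rel_sigma=0.10, seed=0):
    """Return a conductance matrix for a binary connectivity mask.

    mask[i][j] == 1 programs device (i, j) to the LRS with a conductance
    drawn around g_lrs (Gaussian spread is an assumption); 0 leaves it in HRS.
    """
    rng = random.Random(seed)
    return [[rng.gauss(g_lrs, rel_sigma * g_lrs) if bit else G_HRS
             for bit in row] for row in mask]


def column_currents(g, v_in):
    """Current delivered to each postsynaptic neuron: I_j = sum_i G[i][j] * V_i."""
    n_out = len(g[0])
    return [sum(g[i][j] * v_in[i] for i in range(len(g))) for j in range(n_out)]


# Three presynaptic neurons (rows) projecting onto two postsynaptic neurons (columns).
mask = [[1, 0],
        [0, 1],
        [1, 1]]
g = program_matrix(mask)
currents = column_currents(g, [1.0, 0.0, 0.0])  # only the first row is driven
```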

Fig. 5.

The method of networking neurons with RRAM matrices. CMOS LIF neurons arranged in this way around an RRAM matrix (left) implement a two layer feed forward network (right). Open blue circles correspond to LIF CMOS neurons while filled navy and red circles correspond to RRAM memories in different conductance states. Arrows indicate the direction of spike propagation between the output of one layer of neurons and the input of the following. (Color figure online)

Fig. 6.

Data from electrical measurement of a 4kbit RRAM matrix relevant for the application. (a) The cumulative distribution function of LRS (red) and HRS (blue) of HfO2-based RRAM devices integrated in a 4kbit matrix. A SET operation is performed with VBL = 2 V, VWL = 1.3 V (corresponding to ICC = 200 \(\mu \)A) and Tpulse = 100 ns, whereas RESET is performed with VSL = 2.5 V, VWL = 3.5 V and Tpulse = 100 ns. (b) The cumulative distribution function of the LRS with a range of programming currents when performing a SET operation over the 4kbit matrix. (Color figure online)

4 Simulation Results

A simulated DVS of resolution 10 \(\times \) 10 is stimulated with a horizontal grating moving vertically upwards over its input in time, providing OFF pathway spikes for the network topology. Noise is also simulated by setting each DVS pixel to spike with a probability of 0.025 per step of the grating. The frequency of the grating is defined as pixels crossed per second of simulation time. Twenty 2D delay and correlate SNNs are used to span the visual field, each containing thirteen neurons and twenty-one synapses. In total the topology requires 256 neurons and 580 synapses for a DVS of resolution 10 \(\times \) 10. The simulation advances in discrete time steps at which the synaptic currents and neuron input and output voltages are updated with respect to their values at the previous time step in accordance with the LIF neuron model and the first order dynamic synapse model [25]. The desired outcome is for the readout layer neuron denoting upward motion to fire at an elevated rate relative to the others. UP, DOWN, LEFT and RIGHT correspond to the number of times each readout neuron fires within one vertical sweep of the grating. The F1 score corresponding to correctly identified upward motion, defined in Eq. 1, is used to assess the network performance. The F1 score ranges from a minimum value of zero to a maximum value of one. The grating is swept over the input at a range of frequencies and the F1 score is calculated per frequency. We refer to the plot of F1 score against grating frequency as the sensitivity tuning curve (STC).
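The discrete time update described above can be illustrated with a minimal forward-Euler sketch of one synapse driving one LIF neuron. The exact equations of the models in [25] are not reproduced here; the update rule, the reset to zero and the dimensionless parameter values below are assumptions chosen to show how a larger input time constant delays the output spike.

```python
def simulate_lif(pre_spikes, weight, tau_syn, tau_in, v_th, dt=1.0):
    """Forward-Euler update of a first order synapse driving a LIF neuron.

    pre_spikes: list of 0/1 presynaptic spikes, one per time step.
    The synaptic current decays with tau_syn and jumps by `weight` per spike;
    the neuron input voltage relaxes towards the current with tau_in and is
    reset to zero after crossing v_th (assumed reset behaviour).
    """
    i_syn, v, out = 0.0, 0.0, []
    for s in pre_spikes:
        i_syn += -dt * i_syn / tau_syn + weight * s
        v += dt * (-v + i_syn) / tau_in
        if v >= v_th:
            out.append(1)
            v = 0.0
        else:
            out.append(0)
    return out


spikes_in = [1] + [0] * 49
fast = simulate_lif(spikes_in, weight=5.0, tau_syn=10.0, tau_in=2.0, v_th=1.0)
slow = simulate_lif(spikes_in, weight=5.0, tau_syn=10.0, tau_in=10.0, v_th=1.0)
# The neuron with the larger input time constant fires later.
first_fast, first_slow = fast.index(1), slow.index(1)
```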

$$\begin{aligned} F_{1} = \frac{2\,\mathrm{UP}}{2\,\mathrm{UP} + (\mathrm{DOWN} + \mathrm{LEFT} + \mathrm{RIGHT})} \end{aligned} \tag{1}$$

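Eq. 1 translates directly into a short function over the four readout spike counts of one sweep. Returning zero when no readout neuron fires at all is an assumption made here to avoid a division by zero; the text does not specify this case.

```python
def f1_up(up, down, left, right):
    """F1 score of Eq. 1 for upward motion, from readout spike counts."""
    denominator = 2 * up + (down + left + right)
    if denominator == 0:
        return 0.0  # assumed convention when no readout neuron fires
    return 2 * up / denominator


perfect = f1_up(10, 0, 0, 0)   # only the UP neuron fires
mixed = f1_up(3, 1, 1, 1)      # 6 / 9
```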
Fig. 7.

Results of the spiking network simulation demonstrating the performance and power consumption of the two network configurations over a range of grating frequencies. (a) The sensitivity tuning curve corresponding to the performance of the topology in detecting the correct direction of motion over a range of grating frequencies. The red points correspond to the Slow network configuration and the green points to the Fast network configuration using the parameters from the genetic optimizations. (b) The firing-rate tuning curve corresponding to the response frequency of the neuron, defined as the number of times the UP neuron fires per second, over the same range of grating frequencies. Owing to the reduced response in the Slow configuration, the power consumption is reduced in this state relative to that of the Fast configuration. (Color figure online)

Fig. 8.

The network parameters used to produce the plots in Fig. 7. (a) The blue open circles represent LIF CMOS neurons, the green arrows excitatory synapses and the red dashed arrows inhibitory synapses. (b) The RRAM synapse time constants and conductances resulting from the genetic optimization for the Slow and Fast configurations. The colours of the conductance values refer to the colour of the trace in Fig. 6(b) which correspond to the value of ICC required to obtain the conductance value. (c) The LIF CMOS neuron time constants and threshold voltages resulting from the genetic optimization for the Slow and Fast configurations. Since the threshold voltages and readout synapse and neuron values are fixed they are the same for both configurations and are written in bold. The value of RRAM conductance variability for both configurations was set to 10% as observed in Fig. 6 for SET operations. (Color figure online)

Genetic algorithms (GA) are bio-inspired approaches to multi-parameter optimization problems which simulate the process of natural selection to arrive at a near optimal solution [26] and are readily applied to optimize neural networks [27]. Here a GA is used to set the free parameters of the network. These are the conductances and time constants of the synapses and the threshold voltages and input time constants of the neurons, for both the repeating 2D delay and correlate SNN and the readout network. The parameters for each of the four orthogonal 1D delay and correlate SNNs are the same such that each EMD is composed of four identical rotations of the same network sharing a common central node. Similarly, parameters for all readout layer neurons and synapses are set equal such that the full parameter space has twenty-four dimensions. The values for RRAM conductance are bounded between \(5\times 10^{-5}\) S and \(3\times 10^{-4}\) S, corresponding to the range of values obtained through measurement of the RRAM LRS with different values of ICC as in Fig. 6(b). Sixty networks are created per generation. First generation parameters are assigned by sampling from a uniform distribution between a lower and an upper bound per parameter. An STC is produced for each of the sixty networks and the ten with the largest area under the curve (AUC), in addition to two randomly selected ones, are recombined in pairs to produce the next generation. Parameters are assigned with a certain probability of mutation. A hard mutation occurs with a probability of 0.05, whereby the parameter is randomly assigned a value from a uniform distribution between its lower and upper bound. A soft mutation occurs with a probability of 0.5, whereby the parameter is reassigned from a normal distribution around the value inherited from the parent with a standard deviation of 1%. The optimization terminates when the F1 score plateaus.
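One generation of the genetic optimization described above can be sketched as follows. The population size, the selection of ten elite and two random parents and the mutation probabilities follow the text. The uniform crossover, the treatment of hard and soft mutations as mutually exclusive, the interpretation of the 1% standard deviation as relative to the inherited value, the clipping to the bounds and the placeholder fitness function are all assumptions; in the actual optimization the fitness is the area under the STC obtained from a network simulation.

```python
import random


def next_generation(population, fitness, bounds, rng, n_elite=10, n_random=2,
                    p_hard=0.05, p_soft=0.5, soft_sd=0.01):
    """Produce the next generation from the current population.

    population: list of parameter lists; bounds: list of (low, high) per parameter.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    # Ten best networks plus two randomly selected from the remainder.
    parents = ranked[:n_elite] + rng.sample(ranked[n_elite:], n_random)
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        child = []
        for k, (low, high) in enumerate(bounds):
            value = rng.choice((a[k], b[k]))       # uniform crossover (assumed)
            r = rng.random()
            if r < p_hard:                          # hard mutation
                value = rng.uniform(low, high)
            elif r < p_hard + p_soft:               # soft mutation
                value = rng.gauss(value, soft_sd * abs(value))
            child.append(min(max(value, low), high))
        children.append(child)
    return children


def placeholder_fitness(params):
    """Stand-in for the area under the STC; not the real objective."""
    return -sum((p - 0.5) ** 2 for p in params)


rng = random.Random(0)
bounds = [(0.0, 1.0)] * 24   # twenty-four free parameters, normalized bounds assumed
population = [[rng.uniform(low, high) for low, high in bounds] for _ in range(60)]
offspring = next_generation(population, placeholder_fitness, bounds, rng)
```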
Using the parameters of the best performing network after termination of the genetic optimization, ten sensitivity tuning curves were obtained with ten instances of the topology and their means were plotted, since the performance of each instance differs slightly due to the modelled inherent RRAM variability and the simulated DVS noise. Two network configurations were found, inspired by the way Drosophila uses octopamine to adapt its elementary motion detection neural network between a flying and a resting state. One configuration is optimized to detect lower frequency gratings and the other to detect higher frequency gratings; they are referred to as the Slow and Fast network states respectively. The Slow and Fast configurations are plotted with a red dashed and a green trace in Fig. 7(a), which shows that the Slow state accurately detects the direction of motion within a range of grating frequencies of 0.7–3 Hz while the Fast state does so within a range of 2–20 Hz. Further, for closer correspondence with the experimental results reported in Fig. 1(c), a firing-rate tuning curve (FTC) is plotted in Fig. 7(b), corresponding to the frequency at which the UP neuron fires in the Slow and Fast network states. Since the area under the FTC of Fig. 7(b) over the range of grating frequencies is greater when the network is in the Fast state than in the Slow state, the power consumed by the readout network in the Slow state is reduced by 42% relative to that of the Fast state. In Fig. 8(a) the schematic of the repeating structure across the topology is depicted. Figure 8(b) and (c) report the synapse and neuron parameters used to achieve the results of Fig. 7. To arrive at these values for the Slow state the network was optimized with the described genetic algorithm to have the greatest area under the STC within the range of 1–3 Hz.
For the Fast state, the values of the RRAM conductances were multiplied by one and a half and a second genetic optimization, maximizing the area under the STC between 10 and 25 Hz, was performed with only the synapse and neuron input time constants as free parameters. The result of Fig. 7 shows that, like Drosophila, the range of stimulus velocities to which the system is most sensitive can be adapted by tuning the RRAM synapse conductances and the network component time constants. Additionally, when the network is switched to the Slow state the power consumption of the system is reduced since it no longer responds to non-relevant higher velocity stimuli.
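The comparison of the two states through the area under their firing-rate tuning curves can be sketched with a trapezoidal integration. The sketch assumes that power scales with the area under the FTC, as the argument above implies. The sample curves are made-up illustrative numbers, not the simulated data of Fig. 7(b), and the function names are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def relative_reduction(freqs, slow_rates, fast_rates):
    """Fractional reduction of the area under the Slow FTC relative to the Fast FTC."""
    return 1.0 - trapezoid_area(freqs, slow_rates) / trapezoid_area(freqs, fast_rates)


# Illustrative values only: grating frequencies (Hz) and UP neuron firing rates.
freqs = [0.0, 1.0, 2.0]
slow_rates = [0.0, 1.0, 0.0]
fast_rates = [0.0, 2.0, 0.0]
reduction = relative_reduction(freqs, slow_rates, fast_rates)
```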

5 Conclusions

Making use of resistive memory and spiking neural networks, an alternative approach to elementary motion detection was proposed. A DVS camera provided input on the visual field to an array of delay and correlate spiking neural networks, composed of simulated LIF CMOS neurons and RRAM synapses based on HfO2 technology, to compute the direction of motion and to detect collisions. The network was parameterized using a genetic algorithm, and sensitivity tuning curves obtained through simulation demonstrated that it was able to detect the motion of a grating over a range of frequencies. As observed in Drosophila, the range of grating frequencies to which the network is most sensitive can be adapted by tuning the RRAM conductances and the network component time constants. In addition it was seen that, like Drosophila, when the topology was parameterized such that it was sensitive to lower velocity stimuli the power consumption was reduced relative to the configuration for higher velocity stimuli.